Fault management in Map-Reduce

Abstract

The purpose of this project is to evaluate the performance penalties experienced by Map-Reduce jobs in the presence of different types of injected faults. We will begin with the Hadoop implementation of Map-Reduce. Hadoop has built-in fault-tolerance mechanisms. However, these mechanisms incur performance penalties in the presence of faults, as indicated by prior research on our in-house clusters and by other recent literature. This project will enable us to evaluate these penalties at large scale, especially in the heterogeneous environment provided by FutureGrid.

Intellectual Merit

Predicting performance for distributed applications is a challenging problem. The ability to quantify the performance of Map-Reduce applications in the presence of faults will enable us to propose mechanisms to overcome the resulting penalties, so that Map-Reduce can be more readily used for applications requiring performance guarantees.

Broader Impact

Performance in the presence of faults is a critical goal for applications executing in enterprise data centers and cloud computing environments. The technologies to achieve this will be helpful to a wide range of communities in academia, industry, and government that use Map-Reduce for bioinformatics, text mining, machine learning, web indexing, ad analytics, etc.

Use of FutureGrid

The scale and heterogeneity of FutureGrid resources will help extend research conducted on in-house Map-Reduce clusters.

Scale Of Use

I would like to begin with a 16-VM cluster and be able to expand to a few hundred VMs as my experiments proceed.

Publications


FG-235
Selvi Kadirvel
University of Florida
Active

Timeline

2 years 14 weeks ago