Fault management in Map-Reduce

Project Information

Computer Science (401) 

The purpose of this project is to evaluate the performance penalties experienced by Map-Reduce jobs in the presence of different types of injected faults. We will begin with the Hadoop implementation of Map-Reduce. Hadoop has built-in fault-tolerance mechanisms. However, these mechanisms incur performance penalties in the presence of faults, as indicated by prior research on our in-house clusters and by other recent literature. This project will enable us to evaluate these penalties at large scale, especially in the heterogeneous environment provided by FutureGrid.

Intellectual Merit

Predicting performance for distributed applications is a challenging problem. Quantifying the performance penalties that Map-Reduce applications incur under faults will enable us to propose mechanisms to overcome these penalties, so that Map-Reduce can be more readily used for applications requiring performance guarantees.

Broader Impacts

Performance in the presence of faults is a critical goal for applications executing in enterprise data centers and cloud computing environments. The technologies to achieve this will be helpful to a wide range of communities in academia, industry, and government that use Map-Reduce for bioinformatics, text mining, machine learning, web indexing, ad analytics, etc.

Project Contact

Project Lead
Selvi Kadirvel (selvik) 
Project Manager
Selvi Kadirvel (selvik) 

Resource Requirements

Hardware Systems
  • alamo (Dell OptiPlex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • sierra (IBM iDataPlex at SDSC)
Use of FutureGrid

The scale and heterogeneity of FutureGrid resources will help extend research previously conducted on in-house Map-Reduce clusters.

Scale of Use

I would like to begin with a 16-VM cluster and be able to expand to a few hundred VMs as my experiments proceed.

Project Timeline

06/29/2012 - 06:43