Evaluation of Hadoop for IO-intensive applications
Project Information
- Discipline
- Computer Science (401)
- Orientation
- Research
One advantage of MapReduce is its data-affinity-aware scheduling, which makes MapReduce more efficient than traditional HPC systems for data-intensive applications. In this project, we evaluate the performance of Hadoop for IO-intensive applications.
Intellectual Merit
We closely investigate how Hadoop performs when running IO-intensive applications. The execution time of a MapReduce job in Hadoop depends on many factors. We select several important ones (e.g., input data size and the number of nodes) and measure how each affects job run time.
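As an illustration only, a measurement of this kind could be organized as a parameter sweep that times one job per combination of factors. The sketch below is not the project's actual harness: `sweep`, `run_job`, and the input sizes are hypothetical names and values introduced here, and `run_job` stands in for whatever mechanism launches a MapReduce job and blocks until it finishes. The node counts are taken from the planned scale of 20-60 nodes.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, List


def sweep(run_job: Callable[[int, int], None],
          input_sizes_gb: Iterable[int],
          node_counts: Iterable[int],
          repeats: int = 3) -> List[Dict[str, float]]:
    """Time run_job for every (input size, node count) pair.

    run_job is a hypothetical callable that launches one job with the
    given input size (GB) and node count and returns when it completes.
    """
    results = []
    for size_gb, nodes in itertools.product(input_sizes_gb, node_counts):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_job(size_gb, nodes)
            timings.append(time.perf_counter() - start)
        results.append({
            "input_size_gb": size_gb,
            "nodes": nodes,
            "mean_seconds": sum(timings) / len(timings),
            "min_seconds": min(timings),
        })
    return results


def placeholder_job(size_gb: int, nodes: int) -> None:
    """Assumed stand-in for a real job launch; does no work."""
    return None


if __name__ == "__main__":
    # Input sizes are illustrative assumptions, not values from the project.
    for row in sweep(placeholder_job, [10, 50, 100], [20, 40, 60]):
        print(row)
```

Repeating each configuration and recording both the mean and the minimum is one common way to reduce the effect of run-to-run noise; the project may have used a different procedure.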
Broader Impacts
This project helps Hadoop users understand how the factors we consider influence performance. Users can then tune those factors to maximize performance in their specific environments.
Project Contact
- Project Lead
- Zhenhua Guo (zhguo)
- Project Manager
- Zhenhua Guo (zhguo)
Resource Requirements
- Hardware Systems
- hotel (IBM iDataPlex at U Chicago)
- india (IBM iDataPlex at IU)
- sierra (IBM iDataPlex at SDSC)
We wish to use bare metal machines to test how IO-intensive applications perform on Hadoop in FutureGrid.
Scale of Use
We plan to use 20-60 HPC nodes.
Project Timeline
- Submitted
- 10/28/2010 - 15:21
- Completed
- 09/06/2012