Evaluation of Hadoop for IO-intensive applications

Project Information

Discipline: Computer Science (401)
Orientation: Research

Abstract

One advantage of MapReduce is its data-affinity-aware scheduling, which can make it more efficient than traditional HPC systems for data-intensive applications. In this project, we evaluate the performance of Hadoop for IO-intensive applications.

Intellectual Merit

We closely investigate how Hadoop performs when running IO-intensive applications. The execution time of a MapReduce job on Hadoop depends on many factors. We select several important ones (e.g., input data size and number of nodes) and measure how each affects job run time.
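
As an illustration only, the measurement described above could be organized as a sweep over the chosen factors. The sketch below is not the project's actual harness: the function name `sweep`, its parameters, and the `run_job` callable are hypothetical, and `run_job` stands in for whatever code submits a MapReduce job and blocks until it finishes.

```python
from itertools import product
from statistics import mean
import time


def sweep(run_job, input_sizes_gb, node_counts, repeats=3):
    """Time run_job(size_gb, nodes) for every factor combination.

    run_job is a hypothetical callable supplied by the caller; it is
    assumed to submit one job and return only when the job completes.
    Returns a dict mapping (size_gb, nodes) to mean wall-clock seconds.
    """
    results = {}
    for size_gb, nodes in product(input_sizes_gb, node_counts):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_job(size_gb, nodes)
            timings.append(time.perf_counter() - start)
        # Averaging over repeats reduces the effect of run-to-run noise.
        results[(size_gb, nodes)] = mean(timings)
    return results
```

Calling `sweep(run_job, sizes, nodes)` with lists of candidate input sizes and node counts yields one mean run time per combination, which can then be compared to see how each factor affects the result. The specific values to sweep are left to the experimenter.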

Broader Impacts

This project helps Hadoop users understand how the factors we considered influence performance. Users can then tune those factors to maximize performance in their specific environments.

Project Contact

Project Lead: Zhenhua Guo (zhguo)
Project Manager: Zhenhua Guo (zhguo)

Resource Requirements

Hardware Systems

hotel (IBM iDataPlex at U Chicago)
india (IBM iDataPlex at IU)
sierra (IBM iDataPlex at SDSC)

Use of FutureGrid

We wish to use bare-metal machines in FutureGrid to test how IO-intensive applications perform on Hadoop.

Scale of Use

We plan to use 20-60 HPC nodes.

Project Timeline

Submitted: 10/28/2010 - 15:21
Completed: 09/06/2012
