Evaluation of Hadoop for IO-intensive applications

Abstract

One advantage of MapReduce is its data-affinity-aware scheduling, which makes it more efficient than traditional HPC systems for data-intensive applications. In this project, we evaluate the performance of Hadoop for IO-intensive applications.

Intellectual Merit

We closely investigate how Hadoop performs when running IO-intensive applications. The execution time of a MapReduce job on Hadoop is affected by many factors. We select several important ones (e.g., input data size and the number of nodes) and measure how each affects job run time.
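
A minimal sketch of how such a measurement could be organized, assuming a simple full-factorial sweep over the chosen factors. The names `sweep`, `fake_job`, `input_gb`, and `nodes` are hypothetical and do not come from the project; `fake_job` is only a stand-in for submitting a real MapReduce job and waiting for it to complete.

```python
import itertools
import time


def sweep(factors, run_job):
    """Time run_job once for every combination of factor values.

    factors: dict mapping a factor name to the list of values to try.
    run_job: callable accepting the factor names as keyword arguments.
    Returns a list of (config, elapsed_seconds) tuples.
    """
    names = list(factors)
    results = []
    for values in itertools.product(*(factors[n] for n in names)):
        config = dict(zip(names, values))
        start = time.perf_counter()
        run_job(**config)
        elapsed = time.perf_counter() - start
        results.append((config, elapsed))
    return results


def fake_job(input_gb, nodes):
    # Hypothetical placeholder: a real harness would launch the Hadoop job
    # here and block until it finishes.
    time.sleep(0.001 * input_gb / nodes)


if __name__ == "__main__":
    # Assumed example values; the node counts mirror the 20-60 range below.
    grid = {"input_gb": [1, 2, 4], "nodes": [20, 40, 60]}
    for config, seconds in sweep(grid, fake_job):
        print(config, f"{seconds:.4f}s")
```

Varying one factor while holding the others fixed makes it possible to read off each factor's effect on run time separately; in practice each configuration would likely be repeated several times to average out noise.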

Broader Impact

This project helps Hadoop users understand how the factors we consider influence performance. Users can then tune those factors to maximize performance in their specific environments.

Use of FutureGrid

We wish to use bare-metal machines in FutureGrid to test how IO-intensive applications perform on Hadoop.

Scale of Use

We plan to use 20-60 HPC nodes.

Publications


Results

FG-27
Zhenhua Guo
Indiana University
Closed

Keywords

Timeline

2 years 5 weeks ago
1 year 28 weeks ago