Improve resource utilization in MapReduce
Abstract
Hadoop partitions physical resources into conceptual map and reduce slots to control the maximum number of tasks that can run concurrently on each slave node. We observed that this mechanism can result in low resource utilization when not all task slots on a node are used. We propose a mechanism called resource stealing to increase resource utilization. In addition, Hadoop's default trigger for speculative execution can launch many non-beneficial speculative tasks that are killed before completion. We therefore propose Benefit Aware Speculative Execution (BASE), which reduces the number of non-beneficial speculative tasks without sacrificing performance.
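To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the implementation from the paper. It assumes a node is described only by its configured and occupied slot counts, and it assumes a simple linear progress model for deciding whether a speculative copy would be beneficial. The names `Node`, `idle_slots`, `slot_utilization`, and `is_beneficial` are hypothetical and do not correspond to Hadoop APIs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical view of one slave node's slot configuration."""
    map_slots: int        # configured map slots
    reduce_slots: int     # configured reduce slots
    running_maps: int     # map tasks currently running
    running_reduces: int  # reduce tasks currently running


def idle_slots(node: Node) -> int:
    """Configured slots that no task currently occupies."""
    return (node.map_slots - node.running_maps) + (
        node.reduce_slots - node.running_reduces
    )


def slot_utilization(node: Node) -> float:
    """Fraction of configured slots that are occupied (0.0 if none configured)."""
    total = node.map_slots + node.reduce_slots
    if total == 0:
        return 0.0
    return (node.running_maps + node.running_reduces) / total


def is_beneficial(progress: float, elapsed_s: float, est_fresh_runtime_s: float) -> bool:
    """Assumed benefit check: speculate only if a fresh copy is expected
    to finish before the original task does.

    progress is the original task's progress in [0, 1]; the remaining
    time is extrapolated linearly from the elapsed time (an assumption).
    """
    if progress >= 1.0:
        return False  # the original task has already finished
    if progress <= 0.0:
        remaining_s = float("inf")  # no progress yet, so no finite estimate
    else:
        remaining_s = elapsed_s * (1.0 - progress) / progress
    return est_fresh_runtime_s < remaining_s
```

For example, a node with 4 map slots and 2 reduce slots that runs a single map task has 5 idle slots, which is the kind of unused capacity that resource stealing targets. Under the assumed progress model, a task that is 90% done after 90 seconds is not worth speculating on if a fresh copy is expected to take 60 seconds.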
Intellectual Merit
This project addresses two inefficiencies in Hadoop: task slots that sit unused and speculative tasks that bring no benefit. Our proposed resource stealing increases resource utilization without interfering with normal Hadoop task scheduling. In addition, our proposed Benefit Aware Speculative Execution (BASE) can eliminate most of the non-beneficial speculative tasks without degrading performance.
Broader Impact
MapReduce/Hadoop is used by both industry and academia to run large-scale data processing applications. The approaches evaluated in this project increase resource utilization, which can improve throughput. This lets users run MapReduce jobs more efficiently and reduces job run time. As a result, scientists can get results faster and tune their applications accordingly, which increases their productivity.
Use of FutureGrid
We used the High-Performance Computing (HPC) environments provided by FutureGrid to run experiments to evaluate our proposed approaches.
Scale of Use
We used 20 to 40 bare-metal machines on a periodic basis.
Publications
- [ResStealAndBASE] Guo, Z., G. Fox, M. Zhou, and Y. Ruan, "Improving Resource Utilization in MapReduce", the 2012 IEEE International Conference on Cluster Computing, Beijing, China, IEEE Computer Society, 2012.
Results
The detailed results of this project are presented in our paper "Improving Resource Utilization in MapReduce" [ResStealAndBASE].
References
- [ResStealAndBASE] Guo, Z., G. Fox, M. Zhou, and Y. Ruan, "Improving Resource Utilization in MapReduce", the 2012 IEEE International Conference on Cluster Computing, Beijing, China, IEEE Computer Society, 2012.