FG-260
Improving Resource Utilization in MapReduce
Project Details
- Project Lead: Zhenhua Guo
- Project Manager: Zhenhua Guo
- Institution: Indiana University, Pervasive Technology Institute
- Discipline: Computer Science (401)
Abstract
Hadoop partitions physical resources into conceptual map and reduce slots to control the maximum number of tasks that can run concurrently on each slave node. We observed that this mechanism can result in low resource utilization when not all task slots on a node are in use. To increase resource utilization, we propose a new mechanism called resource stealing. In addition, the default mechanism for triggering speculative execution can launch many non-beneficial speculative tasks, which are killed before they complete. To address this, we propose Benefit Aware Speculative Execution (BASE), which reduces the number of non-beneficial speculative tasks without sacrificing performance.
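The slot underutilization described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the mechanism from the paper: the `Node` class, the `stolen_share` function, and the even-split lending policy are all hypothetical names and choices introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one slave node with fixed map and reduce slots."""
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def idle_slots(self) -> int:
        # Slots that are configured but currently have no task assigned.
        return (self.map_slots - self.running_maps) + (
            self.reduce_slots - self.running_reduces
        )

    def slot_utilization(self) -> float:
        # Fraction of configured slots that are occupied by a task.
        total = self.map_slots + self.reduce_slots
        return (self.running_maps + self.running_reduces) / total


def stolen_share(node: Node) -> dict:
    """Assumed policy: lend idle slots evenly to the tasks already running.

    The lent capacity would be handed back as soon as the scheduler
    assigns a new task to the node, so normal scheduling is unaffected.
    """
    running = node.running_maps + node.running_reduces
    if running == 0:
        # Nothing is running, so there is no task to lend idle capacity to.
        return {"per_task_extra": 0, "leftover": node.idle_slots()}
    per_task_extra, leftover = divmod(node.idle_slots(), running)
    return {"per_task_extra": per_task_extra, "leftover": leftover}


node = Node(map_slots=4, reduce_slots=2, running_maps=2)
print(node.idle_slots())   # idle slots on this node
print(stolen_share(node))  # extra slots each running task could borrow
```

With 4 map slots, 2 reduce slots, and only 2 map tasks running, four of the six slots sit idle; under the assumed even split, each running task could temporarily use the capacity of two additional slots.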
Intellectual Merit
This project addresses two inefficiencies in Hadoop: idle task slots and wasteful speculative execution. Our proposed resource stealing increases resource utilization without interfering with normal Hadoop task scheduling. In addition, our proposed Benefit Aware Speculative Execution (BASE) can eliminate most of the non-beneficial speculative tasks without degrading performance.
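The idea of a benefit-aware check can be sketched as follows. This is an illustrative assumption rather than the BASE algorithm itself: the function name `should_speculate`, its parameters, and the linear progress-rate estimate are hypothetical. The sketch launches a speculative copy only when the copy is expected to finish before the original task would.

```python
def should_speculate(progress: float, elapsed_s: float,
                     est_fresh_runtime_s: float,
                     margin_s: float = 0.0) -> bool:
    """Return True if a speculative copy is expected to be beneficial.

    progress            -- fraction of the original task completed, in [0, 1]
    elapsed_s           -- seconds the original task has been running
    est_fresh_runtime_s -- assumed estimate of a fresh copy's total run time
    margin_s            -- optional safety margin added to the copy's estimate
    """
    if progress <= 0.0 or elapsed_s <= 0.0:
        # No progress rate can be estimated yet, so do not speculate.
        return False
    if progress >= 1.0:
        # The original task has already finished.
        return False
    rate = progress / elapsed_s            # assumes a constant progress rate
    remaining_s = (1.0 - progress) / rate  # estimated time left for the original
    # Speculate only if the copy would finish before the original.
    return est_fresh_runtime_s + margin_s < remaining_s


# A slow task: 20% done after 100 s, so roughly 400 s remain.
print(should_speculate(0.2, 100.0, est_fresh_runtime_s=120.0))
# A nearly finished task: 90% done after 100 s, so roughly 11 s remain.
print(should_speculate(0.9, 100.0, est_fresh_runtime_s=120.0))
```

In the second call a speculative copy would almost certainly be killed before completion, which is the kind of non-beneficial task a benefit-aware trigger is meant to avoid.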
Broader Impacts
MapReduce/Hadoop is used in both industry and academia to run large-scale data processing applications. The approaches evaluated in this project increase resource utilization, which can improve throughput. They enable users to run MapReduce jobs more efficiently and therefore reduce job run time. As a result, scientists become more productive because they get results faster and can tune their applications accordingly.
Scale of Use
We used 20 to 40 bare-metal machines on a periodic basis.
Results
We ran CPU-, IO-, and network-intensive applications to evaluate our algorithms. The results show resource stealing can achieve higher resource utilization and thus reduce job run time. Our BASE optimization reduces the number of non-beneficial speculative tasks significantly without incurring performance degradation.
The detailed results of this project are presented in our paper "Improving Resource Utilization in MapReduce" [bib]ResStealAndBASE[/bib].