MapReduce Scheduling in Cloud Environments
Abstract
This project aims to develop modules for a MapReduce framework that can use the resources of a cloud environment efficiently. We will study the problems with existing MapReduce implementations, along with the components required for, and the trade-offs involved in, scheduling and migrating tasks dynamically.
Intellectual Merit
This project will advance the state of the art in scheduling for large-scale, data-intensive applications by taking into account the heterogeneity of the infrastructure in large grids and clouds.
Broader Impact
The software modules resulting from this project will be released as open-source software to the HPC community.
Use of FutureGrid
We plan to use FutureGrid resources to increase the scale of our experiments from a few nodes to much larger sizes.
Scale of Use
100-300 nodes, with experiments run a few times a week over the next 6-12 months.