Distributed MapReduce

Abstract

Data is now generated rapidly at locations all over the world. Because this data is widely dispersed and often geographically distant from the data centers that process it, analyzing it is a major challenge both for large companies and for pay-as-you-go cloud services. In earlier work we experimentally compared the performance of different Hadoop architectures on widely scattered data sets and proposed performing the computation as close to the data as possible. We also pointed out that several factors must be balanced when the data to be processed resides outside the cloud. http://www-users.cs.umn.edu/~cardosa/cardosa-mapred11.pdf
This project conducts further experiments on the implementation of this distributed MapReduce system.

Intellectual Merit

This project will advance distributed systems research, especially on workflows in which moving data is a costly part of the overall computation (for example, scientific data that must be imported into computing clusters). It will also help build an improved Hadoop prototype that performs better on widely distributed data sets. The prototype will be open sourced so that it can be used for other scientific experiments.
Our previously published paper showed that this problem is worth studying:
http://www-users.cs.umn.edu/~cardosa/cardosa-mapred11.pdf

Broader Impact

The project involves students ranging from undergraduates to Ph.D. candidates, and it strengthens collaboration among several research groups, including networking, databases, and distributed systems. The finished software could support further research in areas such as spatial data mining and social networks.

Use of FutureGrid

Run simulations of different data transfer strategies;
Deploy our modified Hadoop;
Run experiments with the modified Hadoop to measure its performance.
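
As a rough illustration of what the first item above could look like, the following Python sketch compares two data transfer strategies with a toy cost model: copying all raw input to one central cluster before processing, versus processing at each site and shipping only the output. It is not the project's simulator. The names Site, centralized_time, and local_first_time, the bandwidth and throughput figures, and the simplifications (uploads run in parallel, every site computes at the same rate, the final merge is free) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A data source outside the cloud (all values are hypothetical)."""
    name: str
    data_gb: float   # input data held at this site, in gigabytes
    wan_gbps: float  # bandwidth from this site to the central cluster, in gigabits/s


def transfer_seconds(size_gb: float, gbps: float) -> float:
    """Time to send size_gb gigabytes over a link of gbps gigabits per second."""
    return size_gb * 8 / gbps


def centralized_time(sites, compute_gb_per_s: float) -> float:
    """Copy all raw input to one cluster, then run a single job there."""
    # Sites upload in parallel, so the slowest upload bounds the copy phase.
    copy = max(transfer_seconds(s.data_gb, s.wan_gbps) for s in sites)
    total_gb = sum(s.data_gb for s in sites)
    return copy + total_gb / compute_gb_per_s


def local_first_time(sites, compute_gb_per_s: float, output_ratio: float) -> float:
    """Process at each site, then ship only the output (input size * output_ratio)."""
    # Sites work in parallel; the slowest site determines the finish time.
    return max(
        s.data_gb / compute_gb_per_s
        + transfer_seconds(s.data_gb * output_ratio, s.wan_gbps)
        for s in sites
    )


if __name__ == "__main__":
    sites = [Site("a", 100, 1.0), Site("b", 50, 0.5)]
    for ratio in (0.1, 2.0):
        print(
            f"output_ratio={ratio}: "
            f"centralized={centralized_time(sites, 1.0):.0f}s, "
            f"local-first={local_first_time(sites, 1.0, ratio):.0f}s"
        )
```

In this toy model the better strategy depends on output_ratio: when a job shrinks its input, processing near the data wins, and when it expands the input, moving the raw data first can be cheaper. This is one example of the kind of factor the abstract says must be balanced.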

Scale Of Use

A few VMs per experiment.

Publications


FG-134
Chenyu Wang
University of Minnesota, Twin Cities
Active

Project Members

Bingjing Zhang
Jerome Mitchell

Timeline

3 years 12 weeks ago