Distributed MapReduce
Project Information
- Discipline
- Computer Science (401)
- Subdiscipline
- 11.07 Computer Science
- Orientation
- Research
Data is now generated rapidly all over the world. The wide dispersion of this data and its geographical distance from data centers pose a major challenge both for data analysis at large companies and for on-demand cloud services. We have experimentally compared the performance of different Hadoop architectures on several widely scattered data sets and proposed an approach that performs the computation as close to the data as possible. We also pointed out that several factors must be balanced when the data to be processed resides outside the cloud (see http://www-users.cs.umn.edu/~cardosa/cardosa-mapred11.pdf). This project will carry out further experiments on the implementation of this distributed MapReduce system.
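The trade-off between moving data and moving computation can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not the model used in the paper: it considers a single data site outside the cloud, a fixed WAN bandwidth, fixed processing throughputs, and a known reduction ratio (output size divided by input size), and it assumes transfer and processing run one after the other with no pipelining. The class `Site` and all of its fields are hypothetical. The sketch estimates job time for shipping raw data to the cloud versus processing near the data and shipping only the reduced output, then picks the cheaper plan.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One data source outside the cloud. All fields are hypothetical inputs."""
    data_gb: float          # raw input data held at the site (GB)
    wan_gbps: float         # site-to-cloud bandwidth (Gbit/s)
    local_gb_per_s: float   # processing throughput near the data (GB/s); 0 = none
    cloud_gb_per_s: float   # processing throughput in the cloud (GB/s)
    reduction: float        # output size / input size after local processing


def transfer_seconds(size_gb: float, gbps: float) -> float:
    """Time to move size_gb gigabytes over a gbps link (8 bits per byte)."""
    return size_gb * 8 / gbps


def ship_raw(site: Site) -> float:
    """Plan A: move all raw data to the cloud, then process it there."""
    return transfer_seconds(site.data_gb, site.wan_gbps) + site.data_gb / site.cloud_gb_per_s


def compute_near_data(site: Site) -> float:
    """Plan B: process at the site, then move only the reduced output."""
    if site.local_gb_per_s <= 0:
        return float("inf")  # no compute capacity near the data
    local = site.data_gb / site.local_gb_per_s
    return local + transfer_seconds(site.data_gb * site.reduction, site.wan_gbps)


def choose_plan(site: Site) -> tuple[str, float]:
    """Return the name and estimated time (seconds) of the cheaper plan."""
    plans = {"ship_raw": ship_raw(site), "compute_near_data": compute_near_data(site)}
    best = min(plans, key=plans.get)
    return best, plans[best]
```

For example, `choose_plan(Site(data_gb=100, wan_gbps=1, local_gb_per_s=0.5, cloud_gb_per_s=5, reduction=0.1))` favours computing near the data (about 280 s versus 820 s for shipping raw data), while raising `wan_gbps` to 100 flips the choice to shipping raw data (28 s versus about 201 s). The point is only that bandwidth, compute capacity, and data reduction have to be weighed together.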
Intellectual Merit
This project will advance research in distributed systems, especially on problems where moving data can be a costly part of the overall computation workflow (for example, scientific data that must first be imported into a computing cluster). It will also produce an improved Hadoop prototype that performs better on widely distributed data sets; the prototype will be open-sourced so that it can be used in other scientific experiments. Our previously published paper shows that this problem is worth pursuing: http://www-users.cs.umn.edu/~cardosa/cardosa-mapred11.pdf
Broader Impacts
The participating students range from undergraduates to Ph.D. students, and the project also strengthens collaborative research across several research groups, including networking, databases, and distributed systems. The finished software could be used in further research on topics such as spatial data mining and social networks.
Project Contact
- Project Lead
- Chenyu Wang (wang2143)
- Project Manager
- Chenyu Wang (wang2143)
- Project Members
- Jerome Mitchell, Bingjing Zhang
Resource Requirements
- Hardware Systems
- foxtrot (IBM iDataPlex at UF)
- hotel (IBM iDataPlex at U Chicago)
- india (IBM iDataPlex at IU)
Run simulations of different data transfer strategies; deploy our modified Hadoop; and run experiments with the modified Hadoop to evaluate its performance.
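One simple way to frame a simulation of data transfer strategies is to estimate how long the push phase (moving input data from its source sites to compute clusters) takes under each strategy. The sketch below is a toy model under stated assumptions: the site names, cluster names, data sizes, and bandwidths are hypothetical, each transfer is bounded by its own link, and all transfers into one cluster share that cluster's ingress bandwidth. It compares a centralized strategy (all data to one cluster) with a nearest-cluster strategy (each site pushes to the cluster it has the fastest link to).

```python
from collections import defaultdict

# Hypothetical example inputs (not measurements): GB of input per source site,
# Gbit/s between each (site, cluster) pair, and Gbit/s of ingress per cluster.
DATA_GB = {"site_a": 200.0, "site_b": 50.0, "site_c": 120.0}
CLUSTERS = ["cluster_1", "cluster_2"]
LINK_GBPS = {
    ("site_a", "cluster_1"): 1.0, ("site_a", "cluster_2"): 0.2,
    ("site_b", "cluster_1"): 0.3, ("site_b", "cluster_2"): 1.0,
    ("site_c", "cluster_1"): 0.5, ("site_c", "cluster_2"): 0.8,
}
INGRESS_GBPS = {"cluster_1": 2.0, "cluster_2": 2.0}


def push_phase_seconds(assignment, data_gb, link_gbps, ingress_gbps):
    """Estimate when the last input byte reaches its assigned cluster.

    Each transfer is bounded by its own link bandwidth, and all transfers
    into one cluster share that cluster's ingress bandwidth. The larger of
    the two bounds is used; real contention effects are ignored.
    """
    sites_per_cluster = defaultdict(list)
    for site, cluster in assignment.items():
        sites_per_cluster[cluster].append(site)

    finish = 0.0
    for cluster, sites in sites_per_cluster.items():
        # GB * 8 = Gbit; Gbit / (Gbit/s) = seconds
        slowest_link = max(data_gb[s] * 8 / link_gbps[(s, cluster)] for s in sites)
        shared_ingress = sum(data_gb[s] for s in sites) * 8 / ingress_gbps[cluster]
        finish = max(finish, slowest_link, shared_ingress)
    return finish


def centralized(sites, cluster):
    """Strategy 1: push every site's data to a single cluster."""
    return {site: cluster for site in sites}


def nearest(sites, clusters, link_gbps):
    """Strategy 2: push each site's data to the cluster it has the fastest link to."""
    return {site: max(clusters, key=lambda c: link_gbps[(site, c)]) for site in sites}


if __name__ == "__main__":
    strategies = {f"centralized@{c}": centralized(DATA_GB, c) for c in CLUSTERS}
    strategies["nearest"] = nearest(DATA_GB, CLUSTERS, LINK_GBPS)
    for name, plan in strategies.items():
        print(f"{name}: {push_phase_seconds(plan, DATA_GB, LINK_GBPS, INGRESS_GBPS):.0f} s")
```

The push phase is only part of the picture: splitting the data across clusters avoids slow links but typically requires a later step to combine per-cluster results, which this sketch does not model.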
Scale of Use
A few VMs for an experiment
Project Timeline
- Submitted
- 07/18/2011 - 13:50