Comparison of MapReduce Systems

Abstract

We are in the era of Big Data. The rapid growth of information in science requires processing large amounts of scientific data. One proposed solution is to apply data flow languages and runtimes to data-intensive applications. Example systems include Google MapReduce, Microsoft Dryad, and CGL Twister. In this project, we study the applicability and performance of these runtimes for solving Big Data problems in science.

Intellectual Merit

Deploy runtimes such as Twister, Hadoop, and Dryad on FutureGrid resources.
Explore the applicability of new programming models and runtime technologies that can be used to solve Big Data problems in science.

Broader Impact

Explore the feasibility of running data-intensive applications with MapReduce-related technology on a dynamic, elastic, provisioned resource infrastructure.
Abstract the design patterns of scientific applications for runtimes such as Dryad, Twister, and Hadoop in HPC, cluster, and cloud environments.

Use of FutureGrid

Deploy runtimes such as Twister, Hadoop, and Dryad on FutureGrid resources.

Scale Of Use

Sample workloads and their scale include:
1) Parallel SW-G job with 10,000 sequences on 32 nodes of a Hadoop cluster.
2) Parallel matrix multiplication of order 31,200 on 16 nodes of a Dryad cluster.
3) Parallel PageRank on a 10 GB web graph on 16 nodes of a Twister cluster.
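
To make the third workload concrete, the following is a minimal sketch of how one PageRank iteration can be expressed as a map phase followed by a reduce phase. It is not the project's Twister code: it runs in memory on a single machine, the names `map_phase`, `reduce_phase`, `pagerank`, and `toy_graph` are hypothetical, the damping factor of 0.85 is an assumed conventional value, and it assumes every page has at least one outgoing link.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each page splits its current rank evenly over its out-links."""
    for page, links in graph.items():
        for target in links:
            yield target, ranks[page] / len(links)


def reduce_phase(pairs, pages, damping=0.85):
    """Reduce step: sum the contributions per page and apply damping."""
    sums = defaultdict(float)
    for target, contribution in pairs:
        sums[target] += contribution
    n = len(pages)
    return {p: (1 - damping) / n + damping * sums[p] for p in pages}


def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce rounds from a uniform start."""
    ranks = {p: 1.0 / len(graph) for p in graph}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), graph.keys(), damping)
    return ranks


# Hypothetical three-page graph: page -> list of pages it links to.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_graph)
```

Because each round consumes the ranks produced by the previous one, the same map and reduce steps are repeated many times over the same graph, which is why iterative workloads like this are a useful point of comparison between runtimes.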

Publications

Results

We started a project to evaluate the latest version of DryadLINQ, founded by MS in December 2010.
We evaluated the programmability and performance of DryadLINQ CTP for data-intensive applications on an HPC cluster.

Concrete results include:
1) A technical report on the DryadLINQ CTP evaluation, July 2011
2) A technical paper at DataCloud-SC11, November 2011

Note: we did not use FutureGrid (FG) resources for this work. We would like to evaluate Dryad clusters when they become available on FutureGrid.

Project Number: FG-17

Project Lead: Judy Qiu

Project Manager: Yang Ruan

Institution: Indiana University

Project Status: Closed

Project Members

Judy Qiu

Yang Ruan

Yuduo Zhou

Project Alumni

Hui Li

Keywords

dryad, hadoop, twister

Timeline

Completed: 1 year 26 weeks ago

Updated: 1 year 26 weeks ago