Wide area distributed file system for MapReduce applications on FutureGrid platform

Abstract

MapReduce is a software framework introduced by Google for processing large datasets in parallel across a large number of compute nodes. It has received significant attention as a programming model for scientific problems on a wide variety of computing architectures, including large-scale compute clusters, GPGPUs, and multi-core architectures. However, some data-intensive computing applications, such as those in the Large Hadron Collider (LHC) computing project, earth observation sciences, and biomedicine, require large-scale distributed data processing across multiple data centers connected by high-speed networks. In this project, we aim to develop a software framework for MapReduce across distributed data centers in support of such data-intensive applications.
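For readers unfamiliar with the programming model, the following is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the example. It is an illustration only: the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical, it does not use the Apache Hadoop API, and it is not the project's code. In a real deployment the map and reduce calls would be spread over many nodes, possibly in different data centers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    # In a real framework each map call could run on a different node;
    # in this sketch they simply run one after another in one process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    records = ["the grid moves data", "the data moves the grid"]
    print(run_job(records))
```

The shuffle step is where data locality matters most: every value for a given key must reach the same reducer, so when the input is spread across data centers this step implies wide-area data movement.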

Intellectual Merit

We will develop a wide-area distributed file system, the Cyberaide Farm, a high-performance, large-scale file system that integrates existing data management toolkits. The Cyberaide Farm is not expected to provide a complete set of file system functionality; however, it will deliver high-performance data transfer and replication management across distributed data centers. We will also develop a Cyberaide Farm Runtime System, which extends the Apache Hadoop framework and provides a set of APIs for the developer and user communities.

Broader Impact

Our work will be implemented and tested on distributed production data centers powered by the FutureGrid project. We will use standard data-intensive computing benchmarks, high-performance engineering applications, and data-intensive social computing applications to evaluate our implementations.

Use of FutureGrid

We plan to use FutureGrid as a test bed to set up several clusters and deploy a wide-area parallel file system for MapReduce applications.

Scale of Use

Distributed Hadoop clusters spanning a wide area, each with at least 8 compute nodes.

Publications


FG-60
Lizhe Wang
Indiana University
Closed

Project Alumni

Fugang Wang
Gregor von Laszewski

Timeline

1 year 38 weeks ago
1 year 38 weeks ago