Cloud Peer

Abstract

Scope of Research Work:

Data replication plays a vital role in cloud computing. Our goal is to build an efficient data replication algorithm that increases data availability and decreases bandwidth consumption. We will store large amounts of data in the Hadoop Distributed File System (HDFS) and design an efficient searching algorithm to find replicated data in the datacenter. We will use Java or Python as our development language and the Map/Reduce framework provided by Hadoop.
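As a rough illustration of the searching step, the sketch below groups data blocks by a content hash so that replicated blocks can be located. This is a minimal pure-Python stand-in, not the proposed algorithm itself; the `find_replicas` helper and the sample block IDs are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict

def find_replicas(blocks):
    """Group data blocks by content hash; any group with more than
    one block ID is a set of replicated blocks."""
    groups = defaultdict(list)
    for block_id, data in blocks.items():
        digest = hashlib.sha256(data).hexdigest()
        groups[digest].append(block_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample blocks: blk_3 holds the same bytes as blk_1.
blocks = {
    "blk_1": b"alpha",
    "blk_2": b"beta",
    "blk_3": b"alpha",
}
print(find_replicas(blocks))  # [['blk_1', 'blk_3']]
```

In a real deployment this grouping would run distributedly (e.g. as a Map/Reduce job over block checksums) rather than in a single process.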

Research Objectives:

We will use the resources to generate a large amount of data and to implement the replication algorithm. We will then pre-process the data, store it in a Hadoop cluster, and query it using Map/Reduce programming. This is challenging in itself and requires a large amount of disk space. We will try to find the best possible way to query the data with Map/Reduce programming.
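The Map/Reduce query model mentioned above can be sketched in a few lines of pure Python. This is only a single-process illustration of the map, shuffle, and reduce phases that a Hadoop job would execute distributedly; the function names and the word-count query are assumptions chosen for illustration.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit (word, 1) for every word in every record.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate (key, value) pairs by key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each key.
    return {key: sum(values) for key, values in grouped.items()}

records = ["a b a", "b c"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In Hadoop the same structure is expressed as Mapper and Reducer classes, with the framework performing the shuffle across the cluster.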

Required Open Cirrus Resources:

For the data generation and pre-processing phase: 8 cores with 16 GB of main memory per node and 3 TB of storage per node. For the query phase: 128 cores with 32 GB of main memory per node and 1 TB of storage per node, in addition to the data storage.

Intellectual Merit

The availability of the data can be increased, so users benefit, and the cost of replication can be decreased by using effective replication strategies.

Broader Impact

The availability of the data can be increased, so users benefit, and the cost of replication can be decreased by using effective replication strategies.

Use of FutureGrid

FutureGrid resources are highly relevant to this project.

Scale Of Use

We want to run a set of comparisons on entire systems, and each comparison will require several days to complete.

Publications


FG-158
Kiruba Karan
Anna University
Active

FutureGrid Experts

Zhenhua Guo

Keywords

Timeline

3 years 1 week ago