Distributed Real-time Computation System

Abstract

This is a course project. Our team will study the real-time computation system Apache Storm and compare it with Hadoop. Our work steps are as follows:

  1. Deploy Apache Storm and Hadoop on FutureGrid.
  2. Run basic test projects such as WordCount, BLAST, and PageRank on both systems and compare their performance.
  3. Run a real-time data project; we have decided to use Twitter tweet data.
  4. After finishing the previous steps, consult the professor about further work.

Intellectual Merit

Comparison between a real-time distributed computation system and Hadoop MapReduce in batch processing

Broader Impact

First, we can compare the performance of Storm and Hadoop in batch processing. Then we hope to find or develop a 'real-time' version of Hadoop and test it against Storm on real-time processing. The results could be useful to others choosing between the two systems.

Use of FutureGrid

  1. Deploy Apache Storm and Hadoop on FutureGrid
  2. Run basic test projects such as WordCount, BLAST, and PageRank on both systems
  3. Run a real-time data project, such as Twitter tweet data

Scale Of Use

Not sure yet; we first need to figure out how to deploy Storm. We think we will need several physical machines and will run computations for a few hours, several times a week, over the next two months.

Publications


Results


FG-419
Yukai Xiao
Indiana University Bloomington
Active

Project Members

Hsi-Yun Cheng
Tianhao Cao
Wenlien Tsao

Keywords

Timeline

28 weeks 5 days ago