Distributed Real-time Computation System

Project Information

Discipline
Computer Science (401) 
Orientation
Education 
Abstract

This is a course project. My team is going to learn about the real-time computation system Storm and compare it with Hadoop. Our planned work steps are:
1. Deploy Apache Storm and Hadoop on FutureGrid.
2. Run some basic test projects, such as WordCount, BLAST and PageRank, on both systems and compare their performance.
3. Test a real-time data project; we have decided to use Twitter tweet data.
4. After finishing the previous steps, consult the professor about further work.

Intellectual Merit

Comparison between a real-time distributed computation system (Storm) and Hadoop MapReduce in batch processing

Broader Impacts

First, we will compare the performance of Storm and Hadoop in batch processing. Then we hope to find or develop a 'real-time' version of Hadoop and compare it with Storm on real-time processing. The results could help show which system is better suited to batch workloads and which to real-time workloads.

Project Contact

Project Lead
Yukai Xiao (xiaoyuk) 
Project Manager
Yukai Xiao (xiaoyuk) 
Project Members
Hsi-Yun Cheng, Wenlien Tsao, Tianhao Cao  

Resource Requirements

Hardware System
  • Not sure
 
Use of FutureGrid

1. Deploy Apache Storm and Hadoop on FutureGrid.
2. Run some basic test projects, such as WordCount, BLAST and PageRank, on both systems.
3. Test a real-time data project, such as one using Twitter tweet data.

Scale of Use

Not sure yet; we first need to figure out how to deploy Storm. We expect to need several physical machines and to run computations for a few hours, several times a week, over the next two months.

Project Timeline

Submitted
03/24/2014 - 22:02