Distributed Real-time Computation System
Abstract
This is a course project in which our team will study the real-time computation system Apache Storm and compare it with Hadoop. Our work plan is as follows:
1. Deploy Apache Storm and Hadoop on FutureGrid.
2. Run basic test applications such as WordCount, BLAST, and PageRank on both systems and compare their performance.
3. Run a real-time data project; we have decided to use Twitter tweet data.
4. After finishing the previous steps, consult our professor about further work.
Intellectual Merit
Comparison between a real-time distributed computation system (Storm) and Hadoop MapReduce in batch processing.
Broader Impact
First, we will compare the performance of Storm and Hadoop in batch processing. We then hope to find or develop a 'real-time' version of Hadoop and compare it with Storm on real-time processing. The results should help others choose between the two systems for batch and real-time workloads.
Use of FutureGrid
- Deploy Apache Storm and Hadoop on FutureGrid
- Run basic test applications such as WordCount, BLAST, and PageRank on both systems
- Run a real-time data project using data such as Twitter tweets
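
To illustrate the difference between the batch and real-time workloads listed above, here is a minimal sketch of WordCount in both styles. It is plain Python and does not use the Storm or Hadoop APIs; the function names `batch_word_count` and `streaming_word_count` are hypothetical and chosen only for this illustration. The batch version returns one final result after reading all input, while the streaming version emits an updated count as each word arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch style: consume the whole input, then return one final result."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def streaming_word_count(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming style: yield an updated (word, count) pair for every incoming word."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]


if __name__ == "__main__":
    # Hypothetical sample input standing in for a file or a tweet stream.
    sample = ["storm and hadoop", "storm on futuregrid"]
    print(batch_word_count(sample))
    for word, count in streaming_word_count(sample):
        print(word, count)
```

In the batch case nothing is available until the input is exhausted, whereas in the streaming case a running result exists after every word, which is the property we want to measure on tweet data.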
Scale Of Use
Not sure yet; we first need to figure out how to deploy Storm. We expect to need several physical machines and to run computations for a few hours, several times a week, over the next two months.