Using Hadoop to Find Popular Words in Open Source Java Code

Abstract

Open source projects make a large amount of code freely available. Code with similar functionality often shares common “words”, such as class names, variable names, and method names. Based on these shared features, we could help programmers find the most closely related code written by others for reference. Finding similar code may be difficult; finding common words is easier. This project finds the popular words used in open source Java projects as a first step toward that goal. We use Hadoop for the word counting because it is convenient and scales easily to larger data sets.
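
The word-counting idea can be sketched in plain Java without a cluster. This is only an illustration of the map and reduce steps, not the project's actual Hadoop job: the class name IdentifierCount, the rule of splitting identifiers on camelCase boundaries and underscores, and the lower-casing of words are all assumptions made for the example. It also counts Java keywords and words inside comments, which a real job would probably filter out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: simulates the map and reduce phases of a word count
// over Java source lines, using only the standard library.
class IdentifierCount {
    // Assumption: a candidate "word" source is anything shaped like a Java identifier.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Map phase: emit one lower-cased word for each part of each identifier.
    // Identifiers are split at lower-to-upper case boundaries and at underscores.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(line);
        while (m.find()) {
            for (String part : m.group().split("(?<=[a-z0-9])(?=[A-Z])|_")) {
                if (!part.isEmpty()) {
                    words.add(part.toLowerCase());
                }
            }
        }
        return words;
    }

    // Reduce phase: sum the occurrences of each word.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Runs the map phase over every line, then one reduce over all emitted words.
    static Map<String, Integer> count(List<String> lines) {
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "String userName = getUserName(user_name);",
                "int userCount = countUsers();");
        // Print each word with its count, in alphabetical order.
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

In a Hadoop job, the map method would correspond to a Mapper that emits (word, 1) pairs and the reduce method to a Reducer that sums them per key; Hadoop handles the grouping by key that this sketch does with a single in-memory map.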

Intellectual Merit

This project has no intellectual merit to report yet, since it is a course project.

Broader Impact

This project might help programmers find functionally related open source Java code, which would make their coding and debugging more convenient.

Use of FutureGrid

FutureGrid provides myHadoop, which is a good way to run map and reduce jobs, so I will apply for around 20 nodes on FutureGrid as the project platform and run myHadoop on them.

Scale Of Use

I need to apply for around 20 VMs for this purpose.

Publications


FG-400
Lin Liu
University of Nebraska-Lincoln
Active

Timeline

43 weeks 5 days ago