Supporting multi-dimensional indexes in HDFS on the FutureGrid platform

Project Information

Orientation
Research 
Abstract

MapReduce is a patented software framework introduced by Google to support distributed computing on large data sets across clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in MapReduce differs from their original forms. Hadoop is an open-source framework that implements the MapReduce paradigm. Data processing on the Hadoop Distributed File System (HDFS) currently uses key/value pairs as its index.

Many application domains, such as Geographic Information Systems (GIS), now need to process large volumes of high-dimensional data. These systems require complex operations such as range queries, nearest-neighbor queries, and distance join queries, and they organize multi-dimensional data with hierarchical index structures such as the R-tree.

This independent study develops a research strategy for supporting R-tree-like multi-dimensional indexes in HDFS. The plan is to develop an HDFS upperware layer that maps the R-tree index onto the key/value pair index and efficiently handles these complex data processing requirements.
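The idea of mapping an R-tree-like index onto key/value pairs can be illustrated with a minimal Python sketch. It assumes 2-D points and packs them into R-tree-style leaf nodes with Sort-Tile-Recursive (STR) bulk loading, a simplification chosen here because the project does not specify its packing method. Each leaf is stored as one key/value pair whose key is a hypothetical leaf id and whose value holds the leaf's minimum bounding rectangle (MBR) and its points. A range query then scans the pairs, much as a map task might scan HDFS records, and uses the MBRs to skip leaves that cannot contain results. The names `pack_leaves`, `range_query`, and the `leaf-NNNN` key format are illustrative only and are not part of Hadoop or HDFS.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch: R-tree-style leaves stored as key/value pairs.
# Not the project's actual design; STR packing is an assumed simplification.
Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
LeafPairs = Dict[str, Tuple[Rect, List[Point]]]


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap (boundaries included)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def pack_leaves(points: List[Point], capacity: int) -> LeafPairs:
    """Group points into leaves with Sort-Tile-Recursive packing and emit
    each leaf as a key/value pair: key = leaf id, value = (MBR, points)."""
    n_leaves = math.ceil(len(points) / capacity)
    n_slices = math.ceil(math.sqrt(n_leaves))
    slice_size = n_slices * capacity
    by_x = sorted(points)  # sort by x (ties broken by y)
    pairs: LeafPairs = {}
    for s in range(0, len(by_x), slice_size):
        # Each vertical slice is sorted by y and cut into full leaves.
        vertical_slice = sorted(by_x[s:s + slice_size], key=lambda p: p[1])
        for i in range(0, len(vertical_slice), capacity):
            leaf = vertical_slice[i:i + capacity]
            pairs[f"leaf-{len(pairs):04d}"] = (mbr(leaf), leaf)
    return pairs


def range_query(pairs: LeafPairs, window: Rect) -> List[Point]:
    """Scan the key/value pairs, prune leaves whose MBR misses the window,
    then filter the points inside the surviving leaves."""
    hits: List[Point] = []
    for _key, (leaf_mbr, leaf_points) in sorted(pairs.items()):
        if not intersects(leaf_mbr, window):
            continue  # the whole leaf is skipped without reading its points
        hits.extend(
            p for p in leaf_points
            if window[0] <= p[0] <= window[2] and window[1] <= p[1] <= window[3]
        )
    return hits


if __name__ == "__main__":
    grid = [(x, y) for x in range(10) for y in range(10)]
    leaf_pairs = pack_leaves(grid, capacity=10)
    print(len(leaf_pairs))  # 10
    print(sorted(range_query(leaf_pairs, (2, 2, 3, 3))))  # [(2, 2), (2, 3), (3, 2), (3, 3)]
```

This sketch stores only the leaf level. A full R-tree mapping would also store internal nodes as pairs, so that a query could descend from the root and read far fewer records than a scan over every leaf.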

Intellectual Merit

I will update it later.

Broader Impacts

This is an independent study project at Indiana University. The goal is to develop upperware for HDFS that supports multi-dimensional data.

Project Contact

Project Lead
Abhijeet Kodgire (akodgire) 
Project Manager
Abhijeet Kodgire (akodgire) 
Project Members
Lizhe Wang, Gregor von Laszewski  

Resource Requirements

Hardware System
  • india (IBM iDataPlex at IU)
 
Use of FutureGrid

Yes.

Scale of Use

2-3 days a week

Project Timeline

Submitted
02/03/2011 - 10:35 
Completed
05/23/2013