Supporting multi-dimensional indexes in HDFS on the FutureGrid platform

Abstract

MapReduce is a patented software framework introduced by Google to support distributed computing on large data sets across clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original forms. Hadoop is an open source framework that implements the MapReduce paradigm. The Hadoop Distributed File System (HDFS) currently uses key-value pairs as the index for data processing. Requirements to process large, high-dimensional data sets have emerged in many application domains, such as Geographic Information Systems (GIS). Such systems involve complex data processing operations, including range queries, nearest neighbor queries, and distance join queries. Hierarchical index structures, such as the R-tree, are used to organize multi-dimensional data. This independent study develops a research strategy for supporting an R-tree-like multi-dimensional index in HDFS. The plan is to develop an HDFS upperware layer that maps an R-tree index to a key-value pair index and efficiently handles complex data processing requirements.
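To make the idea of mapping an R-tree-like index onto key-value pairs more concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the project's design: a plain dictionary stands in for the key-value store, the tree has only one level of leaves under a root, and the names `build_flat_index`, `range_query`, and the `leaf-N` / `root` key scheme are hypothetical. Each node is stored under a key, and its value holds the node's minimum bounding rectangle (MBR) plus either its points or the keys of its children.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = Tuple[float, float]


def bounding_rect(points: List[Point]) -> Rect:
    """Smallest axis-aligned rectangle containing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle containing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def build_flat_index(points: List[Point], leaf_size: int = 2) -> Dict[str, dict]:
    """Pack points into leaves and store every node as key -> value.

    Assumption: points are packed after a simple sort by (x, y); a real
    R-tree would use a more careful packing or insertion strategy.
    """
    if not points:
        raise ValueError("at least one point is required")
    ordered = sorted(points)
    store: Dict[str, dict] = {}
    leaf_keys: List[str] = []
    for i in range(0, len(ordered), leaf_size):
        chunk = ordered[i:i + leaf_size]
        key = f"leaf-{i // leaf_size}"  # hypothetical key scheme
        store[key] = {"mbr": bounding_rect(chunk), "points": chunk}
        leaf_keys.append(key)
    store["root"] = {
        "mbr": union([store[k]["mbr"] for k in leaf_keys]),
        "children": leaf_keys,
    }
    return store


def range_query(store: Dict[str, dict], window: Rect) -> List[Point]:
    """Return all points inside the window, using only key lookups."""
    root = store["root"]
    if not intersects(root["mbr"], window):
        return []
    hits: List[Point] = []
    for key in root["children"]:
        leaf = store[key]
        # Skip leaves whose bounding rectangle cannot contain a match.
        if intersects(leaf["mbr"], window):
            hits.extend(
                p for p in leaf["points"]
                if window[0] <= p[0] <= window[2] and window[1] <= p[1] <= window[3]
            )
    return hits
```

For example, `range_query(build_flat_index(points), (5, 0, 10, 4))` visits only the leaves whose MBR overlaps the query window. The design choice illustrated here is that tree traversal reduces to a sequence of key lookups, which is the kind of access a key-value index can serve.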

Intellectual Merit

I will update it later.

Broader Impact

This is an independent study project at Indiana University. The goal is to develop upperware for HDFS that supports multi-dimensional data.

Use of FutureGrid

Yes.

Scale Of Use

2-3 days a week

Publications


FG-89
Abhijeet Kodgire
Indiana University
Closed

Project Members

Gregor von Laszewski
Lizhe Wang

Timeline

1 year 20 weeks ago
1 year 20 weeks ago