Development of an Index File System to Support Geoscience Data with Hadoop

Project Information

Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 
Orientation
Research 
Abstract

Background - Science

A geographic information system (GIS) captures, stores, analyzes, manages, and presents data that are linked to location(s) [1]. GIS analysis and simulation are used to investigate and understand the environment around us. The amount of data to be processed increases as populations grow, communities become more complex, or the size of the processing area grows. As the data grows, so does the processing time required for simulation and analysis. Furthermore, the analysis of complex problems, such as research on climate change [2], uses more than one dataset, which increases the computational requirements even more.

Background - Technology

  • MapReduce [3] is a patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework differs from their original forms.
  • Hadoop [4] is an open source framework that implements the MapReduce paradigm.

This independent study plans to develop an indexed file system that stores Geoscience data, for example, a file that records the name and location of each Geoscience data file. The index will be used to generate the key/value pairs and, later, to run parallel GIS operations with the Hadoop framework. Open source GIS software, such as GRASS [5] or PostGIS [6], will be used for development.

The FutureGrid project [7] is developing a high-performance grid test bed that will allow scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing. This independent study will use FutureGrid as its development platform.
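The index idea in the abstract can be illustrated with a minimal sketch. It assumes a tab-separated index in which each line holds a data file name and its location, and it imitates the map and grouping steps of MapReduce in plain Python rather than calling Hadoop. The index format, the function names, and the sample paths are all hypothetical and are not taken from the project.

```python
from collections import defaultdict

# Hypothetical index content: "file_name<TAB>location" per line.
# The paths are made up for illustration only.
SAMPLE_INDEX = (
    "elevation.tif\t/geo/raster/elevation.tif\n"
    "landuse.shp\t/geo/vector/landuse.shp\n"
    "elevation.tif\t/backup/elevation.tif\n"
    "broken line without tab\n"
)


def map_index_line(line):
    """Map step: turn one index line into a (key, value) pair.

    The key is the file name and the value is its location.
    Returns None for lines that do not have exactly two non-empty fields.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2 or not parts[0] or not parts[1]:
        return None
    file_name, location = parts
    return file_name, location


def build_pairs(index_text):
    """Apply the map step to every line and keep only the valid pairs."""
    pairs = []
    for line in index_text.splitlines():
        pair = map_index_line(line)
        if pair is not None:
            pairs.append(pair)
    return pairs


def group_by_key(pairs):
    """Imitate the shuffle phase: collect all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


if __name__ == "__main__":
    for name, locations in sorted(group_by_key(build_pairs(SAMPLE_INDEX)).items()):
        print(name, locations)
```

In a real Hadoop job the grouping would be done by the framework between the map and reduce phases. The sketch only shows how an index record could become a key/value pair that a later GIS operation consumes.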

Intellectual Merit

(provided later)

Broader Impacts

This is also an independent study project at Indiana University.

Project Contact

Project Lead
Sonali Karwa (skarwa) 
Project Manager
Sonali Karwa (skarwa) 
Project Members
Lizhe Wang, Gregor von Laszewski, Geoffrey Fox  

Resource Requirements

Hardware System
  • india (IBM iDataPlex at IU)
 
Use of FutureGrid

FutureGrid will be used as the development platform.

Scale of Use

Use will be frequent, about 5 days a week.

Project Timeline

Submitted
01/20/2011 - 09:48 
Completed
05/23/2013