Development of an Index File System to Support Geoscience Data with Hadoop

Abstract

Background - Science
A geographic information system (GIS) captures, stores, analyzes, manages, and presents data that are linked to location(s) [1]. GIS analysis and simulation are used to investigate and understand the environment around us. The amount of data to be processed increases as populations grow, communities become more complex, or the processing area expands. As the data grow, so does the processing time required for simulation and analysis. Furthermore, the analysis of complex problems, such as research on climate change [2], uses more than one dataset, which increases the computational requirements even more.

Background - Technology
* MapReduce [3] is a patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers. The framework is inspired by the map and reduce functions commonly used in functional programming, although their purpose in the MapReduce framework is not the same as in their original form.
* Hadoop [4] is an open source framework that implements the MapReduce paradigm.

This independent study plans to develop an indexed file system that stores Geoscience data, for example, an index file that records the name and location of each Geoscience data file. The index will be used to generate the <key, value> pairs and, later, to run parallel GIS operations with the Hadoop framework. Development is planned to use open source GIS software, such as GRASS [5] or PostGIS [6].

The FutureGrid project [7] is developing a high-performance grid test bed that will allow scientists to collaboratively develop and test innovative approaches to parallel, grid, and cloud computing. This independent study will use FutureGrid as its development platform.
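
As an illustration of the index idea described above, the following is a minimal sketch, not the project's implementation. It assumes the index is a plain tab-separated text file with one line per Geoscience data file, where the file name is the key and the storage location is the value. The names IndexEntry, write_index, read_pairs, and build_demo_pairs, as well as the file names and hdfs paths, are hypothetical and chosen only for this example.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Tuple


class IndexEntry(NamedTuple):
    """One row of the hypothetical index: a data file's name and its location."""
    name: str
    location: str


def write_index(entries: Iterable[IndexEntry], out) -> None:
    """Write one tab-separated 'name<TAB>location' line per entry."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for entry in entries:
        writer.writerow([entry.name, entry.location])


def read_pairs(index_file) -> Iterator[Tuple[str, str]]:
    """Yield (key, value) pairs from the index: key = file name, value = location."""
    for row in csv.reader(index_file, delimiter="\t"):
        if len(row) == 2:  # skip blank or malformed lines
            yield row[0], row[1]


def build_demo_pairs():
    """Round-trip two made-up entries through an in-memory index file."""
    entries = [
        IndexEntry("elevation_tile_001.tif", "hdfs:///geodata/elevation/tile_001.tif"),
        IndexEntry("landcover_tile_001.tif", "hdfs:///geodata/landcover/tile_001.tif"),
    ]
    buffer = io.StringIO()
    write_index(entries, buffer)
    buffer.seek(0)
    return list(read_pairs(buffer))


if __name__ == "__main__":
    for key, value in build_demo_pairs():
        print(key, value, sep="\t")
```

A tab-separated layout is used here because Hadoop Streaming, by default, treats the text before the first tab of a line as the key and the remainder as the value, so each index line could be passed to a map task as one pair. Whether the project adopts this layout is an assumption of the sketch.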

Intellectual Merit

(provided later)

Broader Impact

This is also an independent study project at Indiana University.

Use of FutureGrid

FutureGrid will be used as the development platform.

Scale Of Use

FutureGrid will be used frequently, about 5 days a week.

Publications


FG-84
Sonali Karwa
Indiana University
Closed

Project Members

Geoffrey Fox
Gregor von Laszewski
Lizhe Wang

Keywords

Timeline

1 year 20 weeks ago