HBase Application and Investigation
Project Information
- Discipline
- Computer Science (401)
- Subdiscipline
- 11.04 Information Sciences and Systems
- Orientation
- Research
HBase is the Hadoop implementation of Google's BigTable system. It is designed to store and serve huge amounts of data, organized in a non-relational data model, reliably and efficiently. Although HBase has been available for some time, little research has applied it to scientific data storage or investigated its performance in supporting scientific computing. In this project, we will use a distributed HBase deployment to store the metadata and data of a digital library system, and investigate its performance and related issues such as data locality, indexing, and load balancing in supporting a search-oriented application as well as some data mining jobs.
Intellectual Merit
We will experiment with applying HBase to data-intensive problems and use that experience to try to improve it. We will also investigate indexing mechanisms for HBase-style storage solutions.
Broader Impacts
Provide insight into indexing for non-relational databases.
Project Contact
- Project Lead
- Judy Qiu (xqiu)
- Project Manager
- Xiaoming Gao (gaoxm)
- Project Members
- Xiaoming Gao, Evan Roth, Pavan Venkatramanachar, Rohit Khapare, Amey Jahagirdar
Resource Requirements
- Hardware Systems
- alamo (Dell optiplex at TACC)
- hotel (IBM iDataPlex at U Chicago)
We will use some physical nodes in FutureGrid to build a stable Hadoop cluster on which HBase and our application will run.
Scale of Use
5 or 6 physical nodes.
Project Timeline
- Submitted
- 06/20/2011 - 19:21