Laboratory for Cosmological Data Mining
Project Information
- Discipline
- Astronomy (201)
- Subdiscipline
- 40.03 Astrophysics
- Orientation
- Research
We will evaluate the use of Hadoop, and cloud computing in general, for large-scale cosmological data mining. Specifically, we will explore the use of Mahout classification and clustering codes to determine source classifications and distance estimates for objects detected in large photometric surveys. We will also explore porting specific clustering-measurement codes, such as the two-point correlation function, to the Hadoop MapReduce framework. Finally, we will look to push the machine learning tasks to the calibrated image data themselves, in order to obtain more accurate classifications.
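The two-point correlation function reduces naturally to pair counting, which can be expressed in map/reduce style: a mapper emits a separation bin for every pair of sources, and a reducer sums the counts per bin. The following is a minimal in-memory sketch of that decomposition; the function names and the toy coordinates are illustrative, not part of an actual Hadoop job, and a full estimator (e.g. Landy-Szalay) would also require DR and RR counts against a random catalog.

```python
# Sketch of two-point correlation pair counting in map/reduce style.
# Names (map_pairs, reduce_counts) are illustrative, not a Hadoop API.
import itertools
import math
from collections import Counter

def map_pairs(points, bin_width):
    """Mapper: emit (separation_bin, 1) for every pair of sources."""
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        r = math.hypot(x1 - x2, y1 - y2)
        yield int(r // bin_width), 1

def reduce_counts(pairs):
    """Reducer: sum the pair counts per separation bin."""
    counts = Counter()
    for bin_idx, count in pairs:
        counts[bin_idx] += count
    return dict(counts)

# Toy usage: three sources, separations 1.0, 3.0, and sqrt(10) ~ 3.16.
data = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0)]
dd = reduce_counts(map_pairs(data, bin_width=1.0))
```

In a real MapReduce port the all-pairs enumeration itself must be distributed (for example by spatially partitioning the catalog so each reducer only compares nearby cells), which is exactly the kind of non-traditional decomposition this project would explore.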
Intellectual Merit
Our project will explore large-scale data mining on the FutureGrid system. Most of the algorithms we will use are not traditional map-reduce tasks, so we will help develop the cloud computing approach to general-purpose data mining. In addition, our image data mining will help lead the way for other researchers who need to perform bulk image analysis and mining.
Broader Impacts
Beyond guiding others, in our field and outside it, who may be interested in our data mining efforts, we will also teach students in our research group how to use FutureGrid and Hadoop, as well as the specific data mining algorithms and implementations that exist in Mahout.
Project Contact
- Project Lead
- Robert Brunner (bigdog)
- Project Manager
- Robert Brunner (bigdog)
- Project Members
- Edward Kim, Robert Santucci, Fanshi Liu, Nick Ciaglia
Resource Requirements
- Hardware System
- Not sure
We want to test the deployment of data mining virtual machines and the use of Mahout on Hadoop.
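As a sketch of the kind of job we would test once the virtual machines are deployed, a Mahout k-means run over HDFS might look like the following. All paths and parameter values are hypothetical placeholders; the flags follow Mahout's 0.x `kmeans` command-line driver.

```shell
# Hypothetical Mahout k-means invocation; HDFS paths and k are placeholders.
#   -i  input vectors      -c  initial cluster centers
#   -o  output directory   -k  number of clusters
#   -x  max iterations     -cl assign points to clusters after convergence
mahout kmeans \
  -i /user/bigdog/photometry/vectors \
  -c /user/bigdog/photometry/initial-clusters \
  -o /user/bigdog/photometry/clusters \
  -k 20 -x 10 -cl
```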
Scale of Use
We wish to scale as large as possible. We will also try Mahout on GPUs if that proves feasible.
Project Timeline
- Submitted
- 11/22/2013 - 11:03