FG-132
Adaptive Energy Forecasting and Information Diffusion for Smart Power Grids
Scalable Regression Tree Learning on Hadoop using OpenPlanet
Large scale data analytics
Project Details
- Project Lead
- Yogesh Simmhan
- Project Manager
- Yogesh Simmhan
- Project Members
- Alok Gautam Kumbhare, Charith Wickramaarachchi, Nam Ma, Hsuan-Yi Chu
- Supporting Experts
- Bingjing Zhang
- Institution
- University of Southern California, Computer Engineering Division
- Discipline
- Electrical and Related Engineering (106)
- Subdiscipline
- 11.04 Information Sciences and Systems
Abstract
The pervasive deployment of environmental sensors and instruments that monitor natural and human activities is leading to the generation of large data sets at fine granularities of time and space. Science and engineering applications can shift from an approach based on modeling and empirical hypothesis testing to one that defines predictive models from current and historical information. Such data analytics applications for eScience and eEngineering leverage data mining and machine learning methods to analyze large-scale information in support of research, development, and even operations. However, these applications are data and compute intensive, and the changing nature of the data requires them to be run often. In this project, we propose to develop and scale machine learning algorithms on elastic Cloud infrastructure to build predictive models for power forecasting in smart electricity grids. These models can subsequently be used to make real-time predictions of energy usage at campus and city scales for energy conservation and planning. Programming models such as Map-Reduce/Hadoop and DAGs will be used to describe these applications and execute them on public Cloud platforms such as Eucalyptus to evaluate their efficacy for both static and streaming datasets.
Intellectual Merit
Large-scale data mining and machine learning are compute and data intensive, but their execution on distributed systems is less studied. This limits users to running them on smaller samples of a dataset that fit on a single machine, even when larger datasets are available. Most algorithms that have been ported to the Cloud are inherently loosely coupled, but several commonly used modeling techniques are not natively loosely coupled. We will study scalable machine learning models for classification, such as regression trees and artificial neural networks (ANNs), that use novel algorithms or mapping techniques for scalable execution on the Cloud.
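As a rough illustration of the kind of mapping technique meant here, the sketch below shows how choosing one regression tree split can be phrased as a map step and a reduce step over per-partition sufficient statistics (count, sum, and sum of squares of the target). It is a minimal single-feature sketch in plain Python, not the project's or OpenPlanet's implementation; the function names, the sample data, and the candidate thresholds are all hypothetical.

```python
from collections import defaultdict

# Hypothetical helper names; this is an illustrative sketch, not OpenPlanet code.

def map_partition(rows, thresholds):
    """Map step: compute per-threshold sufficient statistics for one partition.

    rows: iterable of (x, y) pairs; thresholds: candidate split points on x.
    Returns {threshold: [n_left, sum_left, sumsq_left,
                         n_right, sum_right, sumsq_right]}.
    """
    stats = {t: [0, 0.0, 0.0, 0, 0.0, 0.0] for t in thresholds}
    for x, y in rows:
        for t in thresholds:
            offset = 0 if x <= t else 3  # left branch or right branch
            s = stats[t]
            s[offset] += 1
            s[offset + 1] += y
            s[offset + 2] += y * y
    return stats

def reduce_stats(partials):
    """Reduce step: add up the statistics produced by every partition."""
    total = defaultdict(lambda: [0, 0.0, 0.0, 0, 0.0, 0.0])
    for stats in partials:
        for t, s in stats.items():
            total[t] = [a + b for a, b in zip(total[t], s)]
    return dict(total)

def sse(n, s, ss):
    """Sum of squared errors around the mean, from count, sum, sum of squares."""
    return ss - s * s / n if n else 0.0

def best_split(total):
    """Pick the threshold with the lowest combined squared error of both sides."""
    return min(total, key=lambda t: sse(*total[t][:3]) + sse(*total[t][3:]))

# Two made-up partitions of (feature, target) pairs, standing in for data blocks.
partitions = [
    [(1.0, 10.0), (2.0, 11.0), (8.0, 30.0)],
    [(3.0, 10.5), (9.0, 31.0), (10.0, 29.0)],
]
thresholds = [2.5, 5.0, 8.5]

total = reduce_stats(map_partition(p, thresholds) for p in partitions)
split = best_split(total)
```

Because each partition contributes only a small table of sums, the partitions can be processed independently and combined afterwards, which is what makes this step fit a Map-Reduce style of execution even though tree construction as a whole is sequential.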
Broader Impacts
The results of our work will allow for broader and more effective use of data mining in eScience and eEngineering. We will apply the tools and algorithms we develop to the smart power grid domain for energy use forecasting, but the machine learning algorithms themselves will be generally applicable. All our research and development will be publicly available, and the research results will be published in workshops and conferences for access by the broader community.
Scale of Use
We expect to use a few VMs for regular (daily) experiments and hundreds of VMs for weekly scalability tests.