Course: UCF EEL6938 Data-intensive computing and Cloud Class

Project Information

Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 
Orientation
Education 
Abstract

Using large-scale computing systems to solve data-intensive real-world problems has become indispensable for many scientific and engineering disciplines. This course provides a broad introduction to the fundamentals of data-intensive computing and its enabling system architectures, such as MapReduce, cloud computing, and storage, with a focus on system architecture, middleware and building blocks, programming models, algorithmic design, and application development. Selected scientific applications will be used as case studies.

Intellectual Merit

Data-intensive computing and cloud computing have become important forms of computing, and both appear poised to grow into dominant roles. Data-intensive computing (DISC) refers to analysis and information extraction from large and sometimes dynamic data corpora. Cloud computing refers to shared (multi-tenant) use of third-party computing and storage resources (and sometimes software setups) in place of dedicated resources. Each is interesting on its own, and the use of cloud computing for data-intensive computing is both inevitable and critical. In this course, we will explore the state of the art and research directions relating to data-intensive computing and cloud computing. The scope includes case studies of existing systems, compute and storage architectures, programming models, middleware and building blocks, and administration/automation. In our discussions, we will compare alternative approaches on metrics such as efficiency, performance, robustness, complexity, and ease of use.

Broader Impacts

This class project will have broad impact in the data-intensive computing and HPC community by delivering a new cost-effective, fault-tolerant, and scalable solution for cloud and HPC data analytics. We believe it may form the basis of a new generation of HPC systems that meet the demands of emerging data-intensive analytics. We will disseminate our findings through new data-intensive HPC curriculum development, publications, and outreach activities for K-12 students and teachers. We will train both graduate and undergraduate students and place them in internships that help transfer our results to the data-intensive computing and HPC community. This will prepare a new workforce for data-intensive computing and HPC.

Project Contact

Project Lead
Jun Wang (wangjun) 
Project Manager
Jun Wang (wangjun) 
Project Members
Francis Luna, Junyao Zhang, Anthony Wertz, Jie Chen, Yuyan Bao, Juan Carcheri, Steven Zittrower, Lauren Ball, Qiangju Xiao, Liuva Mendez, Abdullah Mahmud

Resource Requirements

Hardware Systems
  • alamo (Dell OptiPlex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • xray (Cray XT5m at IU)
  • bravo (large memory machine at IU)
  • Network Impairment Device
 
Use of FutureGrid

We will use FutureGrid resources as a testbed for course term projects.

Scale of Use

The class has about 16 students. Students may need a Hadoop cluster of up to 50 nodes as a testbed.

Project Timeline

Submitted
02/21/2012 - 09:34 
Completed
07/10/2013