Course: UCF EEL6938 Data-intensive computing and Cloud Class

Abstract

Using large-scale computing systems to solve data-intensive real-world problems has become indispensable for many scientific and engineering disciplines. This course provides a broad introduction to the fundamentals of data-intensive computing and its enabling architectures, such as MapReduce and cloud computing and storage. It focuses on system architecture, middleware and building blocks, programming models, algorithm design, and application development. Selected scientific applications will be used as case studies.

Intellectual Merit

Data-intensive computing and cloud computing have become important forms of computing, and both appear poised to grow into dominant roles. Data-intensive computing (DISC) refers to analysis of, and information extraction from, large and sometimes dynamic data corpora. Cloud computing refers to shared (multi-tenant) use of third-party computing and storage resources (and sometimes software setups) in place of dedicated resources. Each is interesting on its own, and the use of cloud computing for data-intensive computing is both inevitable and critical.

In this course, we will explore the state of the art and research directions in data-intensive computing and cloud computing. The scope includes case studies of existing systems, compute and storage architectures, programming models, middleware and building blocks, and administration/automation. In our discussions, we will compare alternative approaches using metrics such as efficiency, performance, robustness, complexity, and ease of use.

Broader Impact

This class project will have broad impact in the data-intensive computing and HPC community by delivering a new cost-effective, fault-tolerant, and scalable solution for cloud and HPC data analytics. We believe it may form the basis of a new generation of HPC systems that meet the demands of emerging data-intensive analytics. We will disseminate our findings through new data-intensive HPC curriculum development, publications, and outreach activities for K-12 students and teachers. We will train both graduate and undergraduate students and place them in internships that help transfer our results to the data-intensive computing and HPC community. This will prepare a new workforce for data-intensive computing and HPC.

Use of FutureGrid

We will use FutureGrid resources as a testbed for course term projects.

Scale Of Use

We have about 16 students in this class. Students may need a Hadoop cluster of up to 50 nodes as their testbed.

Publications


FG-191
Jun Wang
University of Central Florida
Closed

Project Members

Abdullah Mahmud
Anthony Wertz
Francis Luna
Jie Chen
Juan Carcheri
Junyao Zhang
Lauren Ball
Liuva Mendez
Qiangju Xiao
Steven Zittrower
Yuyan Bao

FutureGrid Experts

Tak-Lon Wu