Course: UCF EEL6938 Data-Intensive Computing and Cloud Computing
Project Information
- Discipline
- Computer Science (401)
- Subdiscipline
- 14.09 Computer Engineering
- Orientation
- Education
Using large-scale computing systems to solve data-intensive, real-world problems has become indispensable for many scientific and engineering disciplines. This course provides a broad introduction to the fundamentals of data-intensive computing and its enabling system architectures, such as MapReduce and cloud computing and storage, with a focus on system architecture, middleware and building blocks, programming models, algorithmic design, and application development. Selected scientific applications will be used as case studies.
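The MapReduce programming model named above can be sketched in a few lines of plain Python. This is a toy word-count illustration of the map/shuffle/reduce stages, not Hadoop's actual API; all function names here are hypothetical:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud stores data", "the data grows"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["the"] == 2 and counts["data"] == 2
```

In a real Hadoop deployment the map and reduce functions run on different cluster nodes and the shuffle moves data over the network; the toy version only shows the dataflow.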
Intellectual Merit
Data-intensive computing and cloud computing have become important forms of computing, and both appear poised to grow into dominant roles. Data-intensive computing (DISC) refers to analysis and information extraction from large and sometimes dynamic data corpora. Cloud computing refers to shared (multi-tenant) use of third-party computing and storage resources (and sometimes software setups) in place of dedicated resources. Each is interesting on its own, and the use of cloud computing for data-intensive computing is both inevitable and critical. In this course, we will explore the state of the art and research directions in data-intensive computing and cloud computing. This scope includes case studies of existing systems, compute and storage architectures, programming models, middleware and building blocks, and administration/automation. In our discussions, we will explore various metrics of goodness for alternative approaches, including efficiency, performance, robustness, complexity, and ease of use.
Broader Impacts
This class project will have a broad impact on the data-intensive computing and HPC community by delivering a new cost-effective, fault-tolerant, and scalable solution for cloud and HPC data analytics. We believe it may form the basis of a new generation of HPC systems meeting the demands of emerging data-intensive analytics. We will disseminate our findings through new data-intensive HPC curriculum development, publications, and outreach activities to K-12 students and teachers. We will train both graduate and undergraduate students and place them in internships that help transfer our results to the data-intensive computing and HPC community. This will prepare a new workforce for data-intensive computing and HPC.
Project Contact
- Project Lead
- Jun Wang (wangjun)
- Project Manager
- Jun Wang (wangjun)
- Project Members
- Francis Luna, Junyao Zhang, Anthony Wertz, Jie Chen, Yuyan Bao, Juan Carcheri, Steven Zittrower, Lauren Ball, Qiangju Xiao, Liuva Mendez, Abdullah Mahmud
Resource Requirements
- Hardware Systems
- alamo (Dell OptiPlex at TACC)
- foxtrot (IBM iDataPlex at UF)
- hotel (IBM iDataPlex at U Chicago)
- india (IBM iDataPlex at IU)
- sierra (IBM iDataPlex at SDSC)
- xray (Cray XT5m at IU)
- bravo (large memory machine at IU)
- Network Impairment Device
We will use FutureGrid resources as a testbed for course term projects.
Scale of Use
We have about 16 students in this class. Students may need Hadoop clusters of up to 50 nodes as their testbeds.
Project Timeline
- Submitted
- 02/21/2012 - 09:34
- Completed
- 07/10/2013