Course: Big Data Open Source Software and Projects (Data Science Curriculum)

Project Information

Discipline
Computer Science (401) 
Orientation
Education 
Abstract

This course studies software used in many commercial activities to analyze Big Data. The backdrop for the course is the roughly 150 software subsystems illustrated at http://hpc-abds.org/kaleidoscope/.
We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack).

The course covers the following material:
a) The cloud computing architecture underlying ABDS, and how it contrasts with HPC.
b) The software architecture, with its different layers, at http://hpc-abds.org/kaleidoscope/, covering the broad functionality of and rationale for each layer.
c) We will give application examples.
d) Then we will go through selected software systems (about 10% of those in the Kaleidoscope) that have already been deployed on FutureGrid systems using OpenStack and Chef recipes.
e) Each student will choose one other open source member of the Kaleidoscope and deploy it as in d).
f) The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
g) Teams of up to 3 students can be formed, with a corresponding increase in the scope of activities e) and f).

Intellectual Merit

One of the main data science classes, offered for the first time in Fall 2014 with online and residential sections.

Broader Impacts

The course's MOOC-style online delivery ensures broad impact.

Project Contact

Project Lead
Geoffrey Fox (gcf) 
Project Manager
Sidd Maini (sidds1601) 
Project Members
Fugang Wang, Sidd Maini, Gregor von Laszewski, Scott McCaulay, Abhik Seal, Anesu Chaora, Sriram Pulipaka, Fazle Rabbi, Naveen Madhire, Aravindh Varadharaju, Harsh Seth, Rakesh Menon, Rahul Singhania, Amritanshu Joshi, Satwik Narlanka, William K., Priyank Kabaria, Karthik Mohandas Bangera, Pushkar Raj, Ian Wood, Siddhardha Raju Mandapati, Hyungro Lee, Yukai Xiao

Resource Requirements

Hardware System
  • india (IBM iDataPlex at IU)
 
Use of FutureGrid

For student projects

Scale of Use

Modest, as a class

Project Timeline

Submitted
09/11/2014 - 14:13