Course: Big Data Open Source Software and Projects (Data Science Curriculum)
Abstract
This course studies software used in many commercial activities to analyze Big Data. The backdrop for the course is the ~150 software subsystems illustrated at http://hpc-abds.org/kaleidoscope/.
We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack).
The course covers the following material:
a) The cloud computing architecture underlying ABDS and how it contrasts with HPC.
b) The layered software architecture shown at http://hpc-abds.org/kaleidoscope/, covering the broad functionality of each layer and the rationale for it.
c) We will give application examples.
d) We will then go through selected software systems (about 10% of those in the Kaleidoscope) that have already been deployed on FutureGrid systems using OpenStack and Chef recipes.
e) Each student will choose one other open source member of the Kaleidoscope and deploy it as in d).
f) The main activity of the course will be building a significant project using multiple HPC-ABDS subsystems combined with user code and data.
g) Teams of up to 3 students can be formed, with a corresponding increase in the scope of activities e) and f).
Intellectual Merit
One of the main data science classes, being offered for the first time in Fall 2014 with online and residential sections.
Broader Impact
The MOOC-style delivery of the course ensures broad impact.
Use of FutureGrid
For student projects
Scale Of Use
Modest, as the use is for a class.