Please use IU Canvas for submission of some assignments and checking your grades.
We will use FutureSystems (previously FutureGrid) facilities. Cloud computing experience is helpful but not essential. Good working experience with Java is required.
Geoffrey Charles Fox
Senior Associate Dean for Research
Distinguished Professor of Physics,
Computer Science and Informatics
This course covers the software used in many commercial activities to study Big Data. The backdrop for the course is the ~300 software subsystems illustrated at http://hpc-abds.org/kaleidoscope/. We will describe the software architecture represented by this collection, which we term HPC-ABDS (High Performance Computing - enhanced Apache Big Data Stack).
This course will be updated incrementally with new material. A full set of course material from the previous Fall 2014 offering of the class is accessible here.
This week, you will learn how to gain access to the FutureSystems resources. Some of the lessons are prepared for beginners, to help them understand the basics of the Linux operating system and of collaboration tools such as GitHub, Google Hangouts, and remote desktop. Please watch the video lessons and read the online materials on this page. This week also covers Unix shell scripting, SSH, and other utilities, with various exercises.
This week, you will learn about OpenStack and public clouds. OpenStack is an open-source cloud computing software platform and a community-driven project. You can use OpenStack to build a cloud infrastructure in your public or private network, or you can simply use cloud software for your services. The lessons this week are designed to let you try the OpenStack software and to give you confidence in, and an understanding of, using IaaS cloud platforms. There are tutorial lessons that explore the OpenStack web dashboard (Horizon) and compute engine (Nova), as well as public clouds such as Amazon EC2 and Microsoft Azure.
This week, you will learn about Cloudmesh, a cloud resource management tool written in Python. It automates launching multiple VM instances across different cloud platforms, including Amazon EC2, Microsoft Azure Virtual Machines, HP Cloud, OpenStack, and Eucalyptus. The Cloudmesh web interface helps users and administrators manage their entire cloud resources. Cloudmesh builds its web service, command-line tools, and REST APIs on technologies such as Apache Libcloud, Celery, IPython, Flask, Fabric, Docopt, YAML, MongoDB, and Sphinx.
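To make the idea of launching VMs across several cloud platforms more concrete, here is a minimal Python sketch of a common provider interface. It is not Cloudmesh's code or API: the names CloudProvider, FakeProvider, VM, and launch_across are hypothetical, and the providers are in-memory stand-ins that never contact a real cloud.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class VM:
    """Record of one launched virtual machine (hypothetical structure)."""
    provider: str
    name: str


class CloudProvider(ABC):
    """Hypothetical common interface that hides provider differences."""

    @abstractmethod
    def launch(self, vm_name):
        """Start one VM and return a VM record."""


class FakeProvider(CloudProvider):
    """In-memory stand-in; a real driver would call the provider's API."""

    def __init__(self, name):
        self.name = name
        self.running = []

    def launch(self, vm_name):
        vm = VM(provider=self.name, name=vm_name)
        self.running.append(vm)
        return vm


def launch_across(providers, per_provider, prefix="vm"):
    """Launch `per_provider` VMs on every provider and return them all."""
    ids = count(1)  # one shared counter so VM names stay unique
    launched = []
    for provider in providers:
        for _ in range(per_provider):
            launched.append(provider.launch(f"{prefix}-{next(ids)}"))
    return launched


if __name__ == "__main__":
    clouds = [FakeProvider("openstack"), FakeProvider("ec2")]
    for vm in launch_across(clouds, per_provider=2):
        print(vm.provider, vm.name)
```

The point of the design is that the calling code never branches on the cloud type; supporting another platform would mean adding one more class that implements launch.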
This week, you will learn about open-source configuration management (CM) software as part of IT automation and orchestration. We focus on Ansible and OpenStack Heat to review system configuration and management, and we also introduce Salt, Puppet, Chef, and Juju so that you can explore other tools. By comparing the features of these tools, you will see which one best suits your system environment and learn basic CM techniques. A few lab sessions provide hands-on experience with deploying and configuring applications on IT infrastructure.
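One basic CM technique worth seeing in code is idempotence: a task describes a desired state and changes the system only when that state is not already met, so running it twice is safe. The sketch below illustrates the idea in plain Python. It is not Ansible's implementation or API, and the function name ensure_line and the file name app.conf are assumptions made for this example.

```python
from pathlib import Path


def ensure_line(path, line):
    """Make sure `line` is present in the file; return True if it changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already in the desired state, so do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "app.conf"  # hypothetical config file
        print(ensure_line(conf, "port=8080"))  # first run changes the file
        print(ensure_line(conf, "port=8080"))  # second run is a no-op
```

Reporting whether anything changed, instead of blindly rewriting the file, is what lets a CM run be repeated on the same machine without side effects.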
This week, you will learn the basics of virtual clusters. Analyzing large data sets that contain unstructured data typically requires distributed computing resources that offer high performance, scalability, and availability. With virtualization technology, cluster computing can be more flexible, effective, and cost-efficient in terms of resource utilization. Three basic tutorials cover deploying a virtual cluster, a Hadoop cluster, and a MongoDB sharded cluster. They give you a chance to gain experience in setting up virtual clusters manually and in configuring software with Cloudmesh.
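As a rough illustration of why a sharded cluster scales, the following Python sketch routes document keys to shards by hashing them. It is a simplified model under stated assumptions, not MongoDB's actual hashing or chunk assignment: the function shard_for, the use of SHA-256, and the "user-N" keys are all choices made for this example.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]  # hypothetical document keys
    load = Counter(shard_for(k, 3) for k in keys)
    for shard, n in sorted(load.items()):
        print(f"shard {shard}: {n} keys")
```

Because the hash is stable, every node computes the same shard for a given key, and the keys spread roughly evenly, so no single machine has to hold or serve the whole data set.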
This section covers the ~300 technologies in HPC-ABDS, described with roughly one page per technology. It is divided into several lessons, each in a separate PowerPoint, organized by the layers of HPC-ABDS technology.