CCTools Scalability Testing

Abstract

This FutureGrid (FG) allocation will enable extended scalability and correctness testing of the Cooperative Computing Tools (CCTools), a software project supported by the NSF SI2 program. CCTools enables non-privileged users to harness hundreds to thousands of cores from multiple clusters, clouds, and grids simultaneously. The main components of the package include Parrot, a virtual file system that interfaces with multiple distributed storage systems, and Makeflow, a workflow engine that interfaces with multiple computing systems. Using existing services (such as the NMI Build and Test Lab), we can currently perform basic verification of portability across operating systems. However, full functionality testing requires regular access to a reproducible distributed system to verify, for example, that the software achieves the desired throughput at the scale of 1000 cores. Using FG, we will establish a distributed testing methodology to obtain rigorous quality control in our software development process.

Intellectual Merit

To our knowledge, there is no well-established methodology -- much less software -- for evaluating the correctness of distributed systems at scale in a continuous integration environment. This project will break new ground in the distributed testing and evaluation of complex software.

Broader Impact

This FG allocation will enhance the impact of an existing NSF award, which supports a variety of high-impact scientific applications in fields such as bioinformatics, biometrics, data mining, high energy physics, and molecular dynamics. Users run these applications on a wide variety of infrastructure, ranging from national-scale systems (XSEDE and OSG) to local private clusters.

Use of FutureGrid

We will develop a framework for connecting our continuous integration environment to FutureGrid, so that key software builds can be automatically dispatched and evaluated at the scale of hundreds to thousands of nodes.
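As a rough illustration of what "evaluated at scale" could mean in practice, the sketch below shows one way a continuous integration job might judge a scalability run: it compares per-core throughput against a small baseline run and fails the build if any task failed or if efficiency drops below a threshold. Everything here is an assumption made for illustration. The `ScaleRun` record, the function names, the 0.7 threshold, and the sample numbers are hypothetical and are not part of CCTools or FutureGrid.

```python
from dataclasses import dataclass


@dataclass
class ScaleRun:
    """One measured run of a build at a given scale (hypothetical record)."""
    cores: int             # number of cores used in the run
    tasks_completed: int   # tasks that finished with correct output
    tasks_failed: int      # tasks that failed or produced wrong output
    wall_seconds: float    # elapsed wall-clock time of the run

    @property
    def throughput(self) -> float:
        """Completed tasks per second."""
        return self.tasks_completed / self.wall_seconds


def parallel_efficiency(baseline: ScaleRun, run: ScaleRun) -> float:
    """Per-core throughput of `run` relative to per-core throughput of `baseline`."""
    base_per_core = baseline.throughput / baseline.cores
    run_per_core = run.throughput / run.cores
    return run_per_core / base_per_core


def passes(baseline: ScaleRun, run: ScaleRun, min_efficiency: float = 0.7) -> bool:
    """Pass only if no task failed and efficiency meets the (assumed) threshold."""
    return run.tasks_failed == 0 and parallel_efficiency(baseline, run) >= min_efficiency


# Illustrative, made-up measurements: a 10-core baseline and a 1000-core burst.
baseline = ScaleRun(cores=10, tasks_completed=1000, tasks_failed=0, wall_seconds=100.0)
large = ScaleRun(cores=1000, tasks_completed=100000, tasks_failed=0, wall_seconds=125.0)

if __name__ == "__main__":
    print(f"efficiency at {large.cores} cores: {parallel_efficiency(baseline, large):.2f}")
    print("PASS" if passes(baseline, large) else "FAIL")
```

A check of this shape covers both goals named in the proposal: correctness (no failed tasks) and scalability (throughput per core holds up as the core count grows). A real framework would also need to provision the VMs and collect the measurements, which this sketch leaves out.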

Scale of Use

For continuous build activities:
Up to 10 VMs continuously.

For distributed scalability and correctness testing:
Burst to hundreds of VMs for a day every few weeks.
Burst to thousands of VMs for a few days several times a year.

Publications


FG-234
Douglas Thain
Dinesh Rajan Pandiarajan
University of Notre Dame
Active

Project Members

Benjamin Tovar
Casey Robinson
Chris Bauschka
Dinesh Rajan Pandiarajan
Iheanyi Ekechukwu
Joe Fetsch
Kyle Mulholland
Li Yu
Michael Albrecht
Nate Wickham
Nicholas Hazekamp
Nick Jaeger
Patrick Donnelly
Peter Sempolinski
Rob Wirthman

FutureGrid Experts

Gregor von Laszewski