CCTools Scalability Testing

Abstract

This FutureGrid (FG) allocation will enable extended scalability and correctness testing of the Cooperative Computing Tools (CCTools), a software project supported by the NSF SI2 program. CCTools enables non-privileged users to harness hundreds to thousands of cores from multiple clusters, clouds, and grids simultaneously. The main components of the software package include Parrot, a virtual file system that interfaces with multiple distributed storage systems, and Makeflow, a workflow engine that interfaces with multiple computing systems. Using existing services (such as the NMI Build and Test Lab), we can currently perform basic verification of portability across operating systems. However, full functionality testing requires regular access to a reproducible distributed system to verify, for example, that the software achieves the desired throughput at a scale of 1000 cores. Using FG, we will establish a distributed testing methodology to obtain rigorous quality control in our software development process.

Intellectual Merit

To our knowledge, there is no well-established methodology -- much less software -- for evaluating the correctness of distributed systems at scale in a continuous integration environment. This project will break new ground in the distributed testing and evaluation of complex software.

Broader Impact

This FG allocation will enhance the impact of an existing NSF award, which supports a variety of high-impact scientific applications in fields such as bioinformatics, biometrics, data mining, high energy physics, and molecular dynamics. Users of these applications run them on a wide variety of infrastructure, ranging from national scale (XSEDE and OSG) to local private clusters.

Use of FutureGrid

We will develop a framework for connecting our continuous integration environment to FutureGrid, so that key software builds can be automatically dispatched and evaluated at the scale of hundreds to thousands of nodes.
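
As a rough illustration of what "evaluated at scale" could mean in practice, the sketch below shows one way a continuous integration job might judge the results of a scaled test run: each run must finish with no failed tasks and must reach a minimum throughput that grows with the number of workers. This is not part of CCTools or FutureGrid; the names ScaleResult, throughput, and evaluate, and the per-worker throughput threshold, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class ScaleResult:
    """Outcome of one test run at a given scale (hypothetical record format)."""
    workers: int            # number of workers (cores or VMs) used in the run
    tasks_completed: int    # tasks that finished successfully
    tasks_failed: int       # tasks that failed or returned wrong output
    elapsed_seconds: float  # wall-clock duration of the run


def throughput(result):
    """Completed tasks per second for a single run."""
    if result.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return result.tasks_completed / result.elapsed_seconds


def evaluate(results, min_tasks_per_sec_per_worker):
    """Check each run for correctness (no failures) and for throughput.

    Returns a list of (workers, passed, tasks_per_sec) tuples,
    ordered by increasing scale.
    """
    report = []
    for r in sorted(results, key=lambda r: r.workers):
        required = min_tasks_per_sec_per_worker * r.workers
        actual = throughput(r)
        passed = (r.tasks_failed == 0) and (actual >= required)
        report.append((r.workers, passed, actual))
    return report


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    runs = [
        ScaleResult(workers=100, tasks_completed=1000, tasks_failed=0, elapsed_seconds=100.0),
        ScaleResult(workers=1000, tasks_completed=5000, tasks_failed=0, elapsed_seconds=100.0),
    ]
    for workers, passed, rate in evaluate(runs, min_tasks_per_sec_per_worker=0.08):
        print(workers, "PASS" if passed else "FAIL", rate)
```

A check of this kind would let a build pass at small scale yet be flagged when throughput stops growing with the worker count, which is the sort of regression that portability testing alone cannot detect.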

Scale Of Use

For continuous build activities:
Up to 10 VMs continuously.

For distributed scalability and correctness testing:
Burst to hundreds of VMs for a day every few weeks.
Burst to thousands of VMs for a few days several times a year.

Publications

Project Number: FG-234

Project Lead: Douglas Thain

Project Manager: Dinesh Rajan Pandiarajan

Institution: University of Notre Dame

Project Status: Active

Project Members

Benjamin Tovar

Casey Robinson

Chris Bauschka

Dinesh Rajan Pandiarajan

Iheanyi Ekechukwu

Joe Fetsch

Kyle Mulholland

Li Yu

Michael Albrecht

Nate Wickham

Nicholas Hazekamp

Nick Jaeger

Patrick Donnelly

Peter Sempolinski

Rob Wirthman

FutureGrid Experts

Gregor von Laszewski

Keywords

cctools, chirp, makeflow, parrot, software testing, work queue