STAMPEDE

Project Information

Discipline
Computer Science (401) 
Subdiscipline
30.06 Systems Science 
Orientation
Research 
Abstract

Large-scale applications today make use of distributed resources to support their computations and, as part of their execution, generate large amounts of log information. Up to now, we have been using the NetLogger analysis tools to perform off-line log analysis. Stampede extends this off-line workflow log analysis capability into a comprehensive middleware solution that allows users of complex scientific applications to track the status of their jobs in real time, to detect execution anomalies automatically, and to perform on-line troubleshooting without logging in to remote nodes or searching through thousands of log files.
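
As a concrete illustration (a sketch, not part of the project description), NetLogger-style application logs are lines of space-separated name=value pairs carrying a timestamp and an event name; the sample event and field names below are hypothetical. A minimal Python parser for one such line might look like:

    # Minimal sketch: parsing a NetLogger Best Practices (BP) style log line.
    # The sample event name and attributes are hypothetical.
    import shlex

    sample = ('ts=2011-12-15T10:31:00.000000Z '
              'event=workflow.job.end level=INFO '
              'job_id=ID0000001 status=0 duration=42.7')

    def parse_bp_line(line):
        """Split a space-separated line of name=value pairs into a dict."""
        record = {}
        for token in shlex.split(line):  # shlex preserves quoted values
            name, _, value = token.partition('=')
            record[name] = value
        return record

    event = parse_bp_line(sample)
    print(event['event'], event['status'])  # -> workflow.job.end 0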

Intellectual Merit

The system will be able to capture application-level logs from jobs as they execute on the cyberinfrastructure. It will also collect log information from the underlying cyberinfrastructure services, such as resource management and data transfer. These end-to-end logs will be combined and brokered through a subscription interface, which external components will use to provide monitoring services.
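
For illustration only, an external monitoring component could subscribe to the brokered event stream over a message bus such as AMQP. The sketch below uses the pika Python client; the exchange name, routing-key pattern, broker host, and event fields are assumptions for the example, not the project's published interface.

    # Hypothetical sketch of a monitoring component subscribing to the
    # brokered log-event stream over AMQP. Exchange name, routing-key
    # pattern, broker host, and event fields are illustrative assumptions.
    import json
    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host='broker.example.org'))
    channel = connection.channel()

    # Topic exchange carrying end-to-end workflow log events (assumed name).
    channel.exchange_declare(exchange='stampede.events', exchange_type='topic')

    # Private queue bound to all job-level events.
    result = channel.queue_declare(queue='', exclusive=True)
    channel.queue_bind(exchange='stampede.events',
                       queue=result.method.queue,
                       routing_key='workflow.job.#')

    def on_event(ch, method, properties, body):
        """Inspect each event; flag non-zero exit codes as anomalies."""
        event = json.loads(body)
        if event.get('status', 0) != 0:
            print('anomaly:', event.get('job_id'), event)

    channel.basic_consume(queue=result.method.queue,
                          on_message_callback=on_event,
                          auto_ack=True)
    channel.start_consuming()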

Broader Impacts

We build on an important class of applications, scientific workflows, which are used today in a number of scientific disciplines, including astronomy, biology, ecology, earthquake science, and gravitational-wave physics, and which run on large-scale infrastructure such as the Open Science Grid (OSG) and the TeraGrid. The solution will be modular, distributed, and reusable across a broad class of applications and workflow systems.

Project Contact

Project Lead
Dan Gunter (dang) 
Project Manager
Dan Gunter (dang) 
Project Members
Gaurang Mehta, Taghrid Samak, Ahmed El-Hassany, Karan Vahi  

Resource Requirements

Hardware Systems
  • alamo (Dell optiplex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • xray (Cray XT5m at IU)
  • Network Impairment Device
 
Use of FutureGrid

Large-scale workflow experiments with induced failures.

Scale of Use

From one to hundreds of VMs for hours at a time. See also http://pegasus.isi.edu/projects/stampede

Project Timeline

Submitted
12/15/2011 - 10:31