GridProphet, A workflow execution time prediction system for the Grid

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

Workflow applications provide a dynamic and heterogeneous paradigm for executing computationally large experiments on the Grid and have greatly increased the pace of scientific work. Through their distributed, task-based execution mechanism, they have eliminated the need for resource homogeneity. A Grid workflow application is a collection of computational tasks (activities), suitable for execution on the Grid, that are interconnected in a directed graph through control and data flow dependencies. Workflows have grown more complex over the years as scientific applications themselves have become more complex. A common performance measure for scientific workflow applications is the total execution time needed to finish the entire workflow. The objective of this project is to develop a Grid performance prediction system that estimates the execution time of individual workflow tasks, single-entry-single-exit sub-workflows (e.g. loops), and entire workflows for scientific applications, so that the predictions can be used to rank different workflow transformations or workflow versions by their execution time behavior. The proposed system can be used to optimize workflow applications, enabling scientists to make better use of computing resources and reach their scientific results in less time.
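As an illustration of how per-task predictions could be combined into a whole-workflow estimate and used to rank workflow versions, the following is a minimal sketch, not the project's method. It assumes unlimited resources and ignores data transfer and queueing time, so the estimate reduces to the critical path of the task graph. The function names, task names, and durations are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def estimate_makespan(durations, dependencies):
    """Return the critical-path length of a workflow task graph.

    durations:    task name -> predicted run time in seconds
    dependencies: task name -> set of tasks that must finish first
    Assumption: every task starts as soon as all predecessors finish.
    """
    graph = {task: set(dependencies.get(task, ())) for task in durations}
    finish = {}
    # static_order() yields each task only after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[task]), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values(), default=0.0)


def rank_versions(versions):
    """Sort (name, durations, dependencies) tuples by estimated makespan."""
    scored = [(estimate_makespan(d, deps), name) for name, d, deps in versions]
    return [name for _, name in sorted(scored)]


# Hypothetical example: a parallel and a sequential version of one workflow.
parallel = (
    "parallel",
    {"prepare": 10, "sim_1": 40, "sim_2": 55, "merge": 5},
    {"sim_1": {"prepare"}, "sim_2": {"prepare"}, "merge": {"sim_1", "sim_2"}},
)
sequential = (
    "sequential",
    {"prepare": 10, "sim": 80, "merge": 5},
    {"sim": {"prepare"}, "merge": {"sim"}},
)
ranking = rank_versions([sequential, parallel])
```

In this example the parallel version is ranked first, because its longest dependency chain (prepare, sim_2, merge) is shorter than the sequential chain.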

Intellectual Merit

The intellectual merit of this research lies in its contributions to the fields of scientific workflows and Grid computing. The project will develop a prediction model, based on advanced statistical techniques and machine learning methods, that supports:
  1. Modeling the execution behaviour of highly distributed Grid workflows.
  2. An execution trace collection and performance prediction system for Grid workflow execution environments.
  3. On-the-fly querying and use of historical trace data for accurate prediction of Grid workflow execution time with a machine-learning-based prediction system.
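The idea of predicting execution time from historical traces can be sketched as follows. This is only an illustration under stated assumptions: the proposal does not name a specific learning method, so ordinary least squares on a single feature stands in for it here, and the trace layout (task type, input size in MB, run time in seconds) and all names are hypothetical.

```python
from collections import defaultdict


def fit_models(traces):
    """Fit run_time = slope * input_mb + intercept for each task type.

    traces: iterable of (task_type, input_mb, seconds) tuples taken from
    past executions (a hypothetical trace layout).
    """
    by_task = defaultdict(list)
    for task, input_mb, seconds in traces:
        by_task[task].append((input_mb, seconds))

    models = {}
    for task, points in by_task.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        # With no variation in input size, fall back to the mean run time.
        slope = sxy / sxx if sxx else 0.0
        models[task] = (slope, mean_y - slope * mean_x)
    return models


def predict(models, task, input_mb):
    """Predict the run time in seconds of one task from its input size."""
    slope, intercept = models[task]
    return slope * input_mb + intercept


# Hypothetical historical traces for a single task type.
history = [("render", 100, 12.0), ("render", 200, 22.0), ("render", 300, 32.0)]
models = fit_models(history)
estimate = predict(models, "render", 400)
```

A real system would use more features (for example the target machine and its current load) and a richer model; the sketch only shows the flow from stored traces to a per-task estimate.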

Broader Impacts

If successful, this project will provide a general-purpose tool for predicting the execution time of Grid workflows and will help Grid users utilize Grid resources efficiently. The tool will also be customizable for use with other Grid workflow systems.

Project Contact

Project Lead
Thomas Fahringer (tfahringer) 
Project Manager
Thomas Fahringer (tfahringer) 
Project Members
Muhammad Junaid Malik, Juan J. Durillo, Simon Ostermann  

Resource Requirements

Hardware Systems
  • alamo (Dell optiplex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • xray (Cray XM5 at IU)
  • bravo (large memory machine at IU)
  • I don't care (what I really need is a software environment and I don't care where it runs)
 
Use of FutureGrid

The primary use of the FutureGrid infrastructure is to execute a large number of Grid workflow applications and collect their execution traces, which will be used to train the machine learning system. If the required software deployments are available, the existing Globus-based Grid infrastructure will be used after deploying the necessary software and services onto the Globus Grid server. Otherwise, we will deploy customized OS images on multiple VMs to create a dedicated Grid environment for our experiments.

Scale of Use

The project will be completed in two phases. The first phase collects execution traces by executing a large number of Grid workflows, which calls for extensive resource reservation and utilization. Planned resource usage during this phase will start with a Grid setup of around 128 cores and gradually increase up to 2048 cores, or more if available. In the second phase, the collected trace data will be used to develop the machine-learning-based prediction system. This phase will use a smaller set of FutureGrid resources, but for longer periods, as required by the training phase of the machine learning algorithm.

Project Timeline

Submitted
11/24/2011 - 09:21