FG-130

Optimizing Scientific Workflows on Clouds

Fault Tolerant Clustering in Scientific Workflows

[fg-2636] Chen, W., and E. Deelman, "Fault Tolerant Clustering in Scientific Workflows", IEEE International Workshop on Scientific Workflows (SWF), in conjunction with the 8th IEEE World Congress on Services, Honolulu, Hawaii, Jun 2012.

Imbalance Optimization in Scientific Workflows

[fg-2589] Chen, W., E. Deelman, and R. Sakellariou, "Imbalance Optimization in Scientific Workflows", International Conference on Supercomputing (ICS 2013), Eugene, Oregon, Jun 10-14, 2013.

Balanced Task Clustering in Scientific Workflows

[fg-2588] Chen, W., R. F. da Silva, E. Deelman, and R. Sakellariou, "Balanced Task Clustering in Scientific Workflows", The 9th IEEE International Conference on e-Science (eScience 2013), to appear, Beijing, China, Oct 23-25, 2013.


Project Details

Project Lead
Weiwei Chen 
Project Manager
Weiwei Chen 
Project Members
Craig Ward, David Smith, Soma Prathibha, Jia Li
Supporting Experts
Tak-Lon Wu, Saliya Ekanayake  
Institution
University of Southern California, Information Sciences Institute  
Discipline
Computer Science (401) 
Subdiscipline
11.04 Information Sciences and Systems 

Abstract

This project aims to run scientific workflows on clouds and to optimize their performance by exploiting attractive cloud features such as virtualization and on-demand provisioning. We plan to examine several benchmark workflows, such as Montage (an astronomy application), Epigenomics (a pipeline workflow), and CyberShake (a seismographic application). This project also aims to integrate the Pegasus Workflow Management System and the Virtual Infrastructure System.

Intellectual Merit

This project addresses a newly emerging problem: how to improve the performance of scientific workflows on popular cloud platforms. Scientists are considering migrating their workflow execution environments from their own infrastructure to more cost-effective platforms. The challenge is to create a virtualization system that seamlessly integrates the workflow management system with the execution engine. The team is well prepared to undertake these challenges, with strong experience in data-intensive workflows, data placement services, dynamic virtual machine provisioning, grid computing, and other past projects.

Broader Impacts

This project aims to enhance the understanding of scientific workflows and cloud computing. Building on this work, the team members will give science and engineering presentations to the community and participate in multi- and interdisciplinary conferences, workshops, and other research activities.

Scale of Use

Each experiment requires about 32 VMs for a few days.

Results

We have two on-going projects that have utilized resources provided by FutureGrid.
The first project addresses the problem of scheduling large workflows onto multiple execution sites with storage constraints. Three heuristics are proposed that first partition the workflow into sub-workflows and then schedule them to the most suitable execution sites. In our experiments, we deployed multiple clusters with Eucalyptus using up to 32 virtual machines. Each execution site contains a Condor pool and a head node visible to the network. The performance with three real-world workflows shows that our approach is able to satisfy storage constraints and improve the overall runtime by up to 48% over default whole-workflow scheduling. A paper [1] has been accepted based on this work.
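To make the partitioning idea concrete, here is a minimal sketch of a greedy storage-constrained partitioner. It is a hypothetical illustration only: the three heuristics in [1] are more involved, and the task names and data sizes below are invented for the example.

```python
# Hypothetical sketch of a storage-constrained partitioning heuristic.
# Illustrative only; not the actual heuristics proposed in [1].

def partition_workflow(tasks, storage_limit):
    """Greedily group tasks into sub-workflows whose combined data
    footprint (in GB) stays within a single site's storage limit.

    tasks: list of (task_name, data_size_gb) tuples in topological order.
    Returns a list of sub-workflows (each a list of task names).
    """
    partitions, current, used = [], [], 0.0
    for name, size in tasks:
        if used + size > storage_limit and current:
            partitions.append(current)   # close the current sub-workflow
            current, used = [], 0.0
        current.append(name)
        used += size
    if current:
        partitions.append(current)
    return partitions

# Example: a small Montage-like pipeline with made-up per-task data sizes.
tasks = [("mProject_1", 4.0), ("mProject_2", 4.0),
         ("mDiffFit", 3.0), ("mConcatFit", 1.0), ("mBackground", 5.0)]
print(partition_workflow(tasks, storage_limit=8.0))
# -> [['mProject_1', 'mProject_2'], ['mDiffFit', 'mConcatFit'], ['mBackground']]
```

Each resulting sub-workflow can then be scheduled to an execution site whose available storage accommodates its footprint.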
The second project aims to identify the different overheads in workflow execution and to evaluate how optimization methods help reduce overheads and improve runtime performance. In this project, we present the workflow overhead analysis for our runs on FutureGrid deployed with Eucalyptus. We present the overhead distribution and conclude that the overheads satisfy an exponential or uniform distribution. We compared three metrics for calculating the cumulative sum of overheads while considering the overlap between overheads. In addition, we indicated how experimental parameters impact the overheads and thereby the overall performance. We then showed that an integrated view of the overheads helps us better understand the performance of optimization methods. A paper [2] based on this work has been accepted. In the future, we plan to evaluate the effectiveness of our approach with additional optimization methods. Additionally, our current work is based on static provisioning, and we plan to analyze the performance under dynamic provisioning as well.
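The overlap issue can be illustrated with a small sketch: if two overheads (say, a queue delay and a data transfer) occur over overlapping time intervals, naively summing their durations double-counts the overlapped period. One simple metric, shown below, measures the length of the union of the intervals. This is an illustrative assumption, not necessarily one of the three metrics compared in [2], and the interval values are invented.

```python
# Illustrative sketch: summing workflow overheads without double-counting
# overlapping time periods. Values and interval semantics are hypothetical.

def cumulative_overhead(intervals):
    """Total overhead time, computed as the length of the union of
    (start, end) intervals so that overlapping overheads count once."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:   # disjoint: flush previous run
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:                                    # overlapping: extend the run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# A queue delay (0-10s) overlapping a data transfer (5-12s),
# plus a later, disjoint overhead (20-25s):
overheads = [(0, 10), (5, 12), (20, 25)]
print(cumulative_overhead(overheads))  # -> 17.0, not the naive sum of 22
```

The gap between the naive sum (22) and the overlap-aware sum (17) shows why the choice of metric matters when attributing runtime to individual overheads.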
Furthermore, we have developed a workflow simulator called WorkflowSim [5], based on traces collected from experiments run on FutureGrid.

References:
[1] Partitioning and Scheduling Workflows across Multiple Sites with Storage Constraints, Weiwei Chen, Ewa Deelman, 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Poland, Sep 2011
[2] Workflow Overhead Analysis and Optimizations, Weiwei Chen, Ewa Deelman, The 6th Workshop on Workflows in Support of Large-Scale Science, in conjunction with Supercomputing 2011, Seattle, Nov 2011
[3] FutureGrid - a reconfigurable testbed for Cloud, HPC and Grid Computing, Geoffrey C. Fox, Gregor von Laszewski, Javier Diaz, Kate Keahey, Jose Fortes, Renato Figueiredo, Shava Smallen, Warren Smith, and Andrew Grimshaw, Chapter in "Contemporary High Performance Computing: From Petascale toward Exascale", editor Jeff Vetter, April 23, 2013, Chapman and Hall/CRC
[4] Functional Representations of Scientific Workflows, Noe Lopez-Benitez, JSM Computer Science and Engineering 1(1): 1001
[5] WorkflowSim: A Toolkit for Simulating Scientific Workflows in Distributed Environments, Weiwei Chen, Ewa Deelman, The 8th IEEE International Conference on eScience 2012 (eScience 2012), Chicago, Oct 8-12, 2012
