e-Science 2008 4th IEEE International Conference on e-Science

Main Conference Sessions

Distributed Replica-Exchange Simulations on Production Environments Using SAGA and Migol

Authors

  • Shantenu Jha, LSU
  • Andre Luckow

Abstract

There exists a class of scientific applications for which utilizing distributed resources is critical for reducing the time-to-solution. However, orchestrating many distributed jobs in dynamic and inherently unreliable distributed environments is a major challenge. The more resources and components involved, the more complicated and error-prone the system becomes. We discuss a specific class of applications, Replica-Exchange simulations, where utilizing as many (often heterogeneous) distributed resources as possible is critical for the effective solution of the scientific problem. Such applications require effective mechanisms to handle the unreliability inherent in dynamic distributed systems. In this paper, we describe the design, development, and deployment of a unique framework for constructing fault-tolerant distributed simulations. The framework is scalable, general purpose, and extensible, and consists of two primary components: SAGA and Migol. SAGA is a high-level programmatic abstraction layer that provides a standardized interface for the primary distributed functionality required for application development. We provide details of a newly developed capability in SAGA, the Checkpoint and Recovery (CPR) API. Migol is an adaptive Grid middleware that addresses the fault tolerance of Grid applications and services by providing the capability to recover applications from checkpoint files transparently. In addition to describing the integration of SAGA-CPR with the Migol infrastructure, we outline our experiences with running a large-scale, general-purpose, SAGA-CPR-based Replica-Exchange application in a production distributed environment.

Date and Time

Wednesday, December 10, 10:45 a.m. to 11:15 a.m.

Room Number

206
