Cloud-Based Support for Distributed Multiscale Applications

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

Multiscale modeling is one of the most significant challenges facing science today. The goal of our research is to build an environment that supports the composition of multiscale simulations from single-scale models encapsulated as scientific software components and distributed across grid and cloud e-infrastructures. We plan to construct a programming and execution environment for such applications by investigating and integrating solutions from:

1. virtual experiment frameworks, such as the GridSpace platform (http://dice.cyfronet.pl/gridspace);
2. tools supporting multiscale computing, such as MUSCLE (http://muscle.berlios.de);
3. Cloud, Grid, and HPC infrastructures.

We plan to extend the capabilities of the GridSpace platform [GS], developed as a basis for the Virtual Laboratory in the ViroLab project (http://www.virolab.org) and currently further developed in the MAPPER project (http://www.mapper-project.eu). GridSpace is a framework enabling researchers to conduct virtual experiments on Grid, Cloud, and HPC resources. We have already performed several experiments using GridSpace with multiscale simulations:

1. modules taken from the AMUSE framework (http://www.amusecode.org) were orchestrated by a GridSpace experiment and communicated using the High Level Architecture (IEEE Standard 1516) [ComHLA];
2. modules of a computational biology application were orchestrated by a GridSpace experiment and communicated using MUSCLE.

Both experiments shared a local cluster managed by a Portable Batch System (PBS). Thanks to FutureGrid resources, we hope to gain the opportunity to run multiscale simulations on Cloud resources and compare the results. As case studies, we plan to investigate the following applications:

1. The In-stent Restenosis application, which simulates the biological response of cellular tissue to the treatment of atherosclerosis, based on complex automata [ISR].
2. The Nanopolymer application, which uses the LAMMPS Molecular Dynamics Simulator (http://lammps.sandia.gov/).
3. The Brain Aneurysm application from the VPH-Share project, which attempts to model cerebral blood flow dynamics (http://uva.computationalscience.nl/research/projects/vph-share).

References:
[GS] E. Ciepiela, D. Harezlak, J. Kocot, T. Bartynski, M. Kasztelnik, P. Nowakowski, T. Gubała, M. Malawski, M. Bubak: Exploratory Programming in the Virtual Laboratory, in Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 621-628.
[ISR] A. G. Hoekstra, A. Caiazzo, E. Lorenz, J.-L. Falcone, B. Chopard: Complex Automata: Multi-scale Modeling with Coupled Cellular Automata, in A. G. Hoekstra, J. Kroc, and P. M. A. Sloot (Eds.), Modelling Complex Systems with Cellular Automata, Springer Verlag, July 2010.
[ComHLA] K. Rycerz, M. Bubak, P. M. A. Sloot: HLA Component Based Environment for Distributed Multiscale Simulations, in T. Priol and M. Vanneschi (Eds.), From Grids to Service and Pervasive Computing, Springer, 2008, pp. 229-239.

Intellectual Merit

Using cloud computing for scientific applications remains an open field. In our research we focus on multiscale simulations and how they can benefit from cloud solutions. We have already conducted preliminary experiments using GridSpace with cloud computing [ICCS11]. Moreover, we have experimented with multiscale simulations using various supporting tools (MUSCLE, HLA), local resource management systems on a cluster (PBS), and scripting languages (distributed Ruby). We have designed and implemented an environment supporting the construction and execution of multiscale applications consisting of HLA-based components in the Grid environment, tested on the DAS-3 infrastructure (http://www.cs.vu.nl/das3/) [HLA11]. We have also developed a tool based on distributed Ruby that orchestrates the setup of MUSCLE-based multiscale applications using PBS allocation. In this project we would like to:

1. design a system supporting the execution of multiscale applications on cloud resources;
2. comparatively evaluate the performance of the local cluster approach against cloud-based solutions, in particular the performance of their resource management mechanisms and of multiscale application execution;
3. study the differences between various cloud computing stacks and assess the relevant programming models;
4. design a user-friendly interface suitable for scientists working on multiscale problems (computational biologists, physicists) without a computer science background.

The aim is to extend our knowledge of scientific multiscale applications and of how their requirements can be matched to the features of cloud computing. We plan to investigate a solution that allows each module of a multiscale simulation to run on a different VM and to communicate using various mechanisms, either direct (as in MUSCLE or HLA) or indirect (file system, database, messaging systems). Part of this work will be performed within the scope of M.Sc. theses prepared by students of the AGH University of Science and Technology. The results will also be published in peer-reviewed journals and conferences.

[ICCS11] M. Malawski, J. Meizner, M. Bubak, P. Gepner: Component Approach to Computational Applications on Clouds, accepted for publication at the ICCS 2011 conference.
[HLA11] K. Rycerz, M. Bubak: Collaborative Environment for HLA Component-Based Distributed Multiscale Simulations, accepted for W. Dubitzky et al. (Eds.), "Large Scale Computing Technologies for Complex System Applications", Wiley & Sons.
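To make the indirect coupling style concrete, the sketch below shows two single-scale modules exchanging boundary data through message queues, as they might when each module runs in a separate VM with a messaging system between them. This is an illustrative toy, not GridSpace, MUSCLE, or HLA code; the module names, the 0.1 correction factor, and the coupling scheme are all invented for the example.

```python
# Hypothetical sketch of indirect (message-based) coupling between two
# single-scale modules; names and the coupling scheme are illustrative only.
from multiprocessing import Process, Queue

def micro_model(from_macro, to_macro):
    """Fine-scale module: refines each boundary value it receives."""
    while True:
        boundary = from_macro.get()
        if boundary is None:          # termination signal from the macro side
            break
        to_macro.put(boundary * 0.1)  # toy "fine-scale" correction

def run_coupled(steps=3, state=10.0):
    """Macro-scale driver: exchanges boundary data with the micro module."""
    to_micro, to_macro = Queue(), Queue()
    worker = Process(target=micro_model, args=(to_micro, to_macro))
    worker.start()
    for _ in range(steps):
        to_micro.put(state)           # send boundary condition downstream
        state += to_macro.get()       # apply the fine-scale correction
    to_micro.put(None)
    worker.join()
    return state

if __name__ == "__main__":
    print(round(run_coupled(), 6))    # 13.31
```

In a real deployment the two `Queue` objects would be replaced by an external broker or database reachable from both VMs, which is exactly what makes this style attractive for modules that cannot share a direct network connection.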

Broader Impacts

The proposed activity will impact research infrastructure, teaching, and society. The proposed environment will contribute to the application of novel cyberinfrastructure technologies, such as cloud computing, to a new class of multiscale applications. The results are planned to be adopted into the academic curricula of the partner universities, where the involvement of undergraduate and graduate students is significant. Finally, the applications we support are of great importance to society, since they tackle vital problems of modern medicine, including cardiovascular issues related to restenosis and aneurysms. The results of this project will be published in peer-reviewed journals (e.g. Future Generation Computer Systems, International Journal of High Performance Computing Applications, International Journal for Multiscale Computational Engineering) and conferences (e.g. the International Conference on Computational Science). Additionally, the work conducted within the project will be used in two Master of Science theses. The results of the project will also be exploited as educational aids in computer science courses at AGH and UvA.

Project Contact

Project Lead
Katarzyna Rycerz (krycerz) 
Project Manager
Katarzyna Rycerz (krycerz) 
Project Members
Pawel Pierzchala, Maciej Malawski, Marcin Nowak, Pawel Koperek, Darko Chadievski, Wojciech Kruczkowski  

Resource Requirements

Hardware Systems
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
 
Use of FutureGrid

We plan to compare the behavior of multiscale applications in two environments:

1. a cluster with a local resource management system, using a separate cluster node for each simulation module;
2. a cloud of VMs, using a separate VM for each simulation module.

In both environments we plan to compare:

1. application setup time, i.e. how long it takes to start the application;
2. application execution time.
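The comparison reduces to timing two phases per run: provisioning (PBS node allocation or VM startup) and simulation. A minimal harness for that measurement might look like the following; `provision` and `simulate` are placeholders for the real actions, and the dummy lambdas exist only so the sketch runs end to end.

```python
# Illustrative timing harness; provision() and simulate() are placeholders
# for the real actions (PBS job allocation or VM boot, then the multiscale
# simulation run).
import time

def timed_run(provision, simulate):
    """Return (setup_time, execution_time) in seconds for one run."""
    t0 = time.perf_counter()
    handle = provision()              # e.g. qsub + wait, or boot the VMs
    t1 = time.perf_counter()
    simulate(handle)                  # run the coupled simulation modules
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

if __name__ == "__main__":
    # Dummy stand-ins for a cluster run and a cloud run of the same app.
    setup, execution = timed_run(lambda: "nodes",
                                 lambda h: sum(range(100_000)))
    print(setup >= 0 and execution >= 0)
```

Repeating such runs for the cluster and cloud variants of the same application would yield directly comparable setup-time and execution-time distributions.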

Scale of Use

We would like to use the Eucalyptus installations on the India and Sierra clusters and compare the results with HPC jobs (PBS). For the in-stent restenosis application we plan to run about 8-10 VMs per experiment (run on 8-10 nodes). For prototyping and development, we plan to run a set of simple experiments that will not consume many resources. For performance tests we plan to conduct larger experiments (execution time on the order of 72 hours, about 4 GB of output data). For the nanopolymer application we will need ca. 32 nodes, and for the aneurysm simulation application ca. 128 nodes. Additionally, we would like to compare Eucalyptus with Nimbus, OpenStack, and OpenNebula. We would require approximately 12 months to develop and test the whole system.

Project Timeline

Submitted
03/09/2011 - 04:27