Federating HPC, Cyberinfrastructure and Clouds using CometCloud

Project Information

Discipline
Computer Science (401) 
Subdiscipline
11.04 Information Sciences and Systems 
Orientation
Research 
Abstract

Clouds are rapidly joining high-performance Grids as viable computational platforms for scientific exploration and discovery, and it is clear that production computational infrastructures will integrate both paradigms in the near future. As a result, understanding usage modes that are meaningful in such a hybrid infrastructure is critical. For example, there are interesting application workflows that can benefit from such hybrid usage modes to reduce time to solution, reduce costs (in terms of currency or resource allocation), or handle unexpected runtime situations (e.g., unexpected delays in scheduling queues or unexpected failures). The primary goal of this project is to use CometCloud to create a federation that integrates resources from different infrastructures, namely FutureGrid, XSEDE, Grid’5000, Rutgers Discovery Informatics Institute resources and, possibly, resources from other collaborating institutions. Moreover, the infrastructure will enable on-demand scale-out to public clouds such as Amazon EC2. In this way, we will be able to expose the different cyberinfrastructure ecosystems to scientific and engineering applications as Cloud services. Additionally, scientists will be able to create their own applications on top of CometCloud using different programming models such as master/worker, workflow, and MapReduce, as illustrated below.
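
To make the master/worker model concrete, the following is a minimal, runnable sketch written in the spirit of CometCloud's coordination-space model. The real system uses a distributed tuple space spanning federated sites; here a single in-memory queue stands in for it, and all class names are illustrative assumptions rather than the actual CometCloud API.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Master/worker sketch: a master publishes a bag of tasks into a shared
    // "space" and workers pull tasks and push results back. An in-memory
    // queue stands in for CometCloud's distributed coordination space.
    public class MasterWorkerSketch {
        record Task(int id, String payload) {}
        record Result(int taskId, String value) {}

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Task> tasks = new LinkedBlockingQueue<>();
            BlockingQueue<Result> results = new LinkedBlockingQueue<>();

            // Workers: consume tasks from the shared space, produce results.
            for (int w = 0; w < 3; w++) {
                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            Task t = tasks.take();
                            results.put(new Result(t.id(), t.payload().toUpperCase()));
                        }
                    } catch (InterruptedException e) { /* shut down */ }
                });
                worker.setDaemon(true);
                worker.start();
            }

            // Master: publish all tasks, then collect every result.
            String[] inputs = {"alamo", "india", "sierra", "hotel"};
            for (int i = 0; i < inputs.length; i++) tasks.put(new Task(i, inputs[i]));
            for (int i = 0; i < inputs.length; i++) {
                Result r = results.take();
                System.out.println("task " + r.taskId() + " -> " + r.value());
            }
        }
    }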

Intellectual Merit

The proposed federation will be built on top of the CometCloud [1] dynamic infrastructure, which has been used effectively to federate US cyberinfrastructure such as XSEDE, OSG, FutureGrid, NERSC, and Amazon EC2 resources. CometCloud is an autonomic computing engine that enables the dynamic, on-demand federation of computational resources as well as the deployment and robust execution of applications on federated infrastructures. It combines highly heterogeneous and dynamic Cloud/Grid infrastructures, enabling the integration of public/private Clouds and autonomic cloudbursts, i.e., dynamic scale-out to Clouds to address extreme requirements such as heterogeneous and dynamic workloads and spikes in demand [2]. The CometCloud programming layer provides a platform for application development and management, supporting a range of paradigms including MapReduce, Workflow, and Master/Worker/BOT.

[1] CometCloud web site. http://nsfcac.rutgers.edu/CometCloud/
[2] H. Kim, Y. el-Khamra, S. Jha, I. Rodero, and M. Parashar, “Autonomic Management of Application Workflow on Hybrid Computing Infrastructure”, Scientific Programming, 19(2-3): 75-89, 2011.
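
The following is a hedged sketch of the kind of autonomic cloudburst decision described above: if the projected completion time of the pending workload on currently federated resources exceeds a deadline, additional cloud workers are provisioned. The policy, names, and parameters are assumptions made for illustration, not CometCloud's actual interfaces.

    // Cloudburst policy sketch: compute how many extra cloud workers are
    // needed to finish the pending backlog before a deadline.
    public class CloudburstSketch {
        interface Provisioner { void scaleOut(int workers); }

        // pendingTasks: backlog size; siteRate: tasks/sec across federated
        // sites; workerRate: tasks/sec contributed by one extra cloud worker.
        static int workersNeeded(int pendingTasks, double siteRate,
                                 double workerRate, double deadlineSec) {
            double projected = pendingTasks / siteRate;        // sec at current rate
            if (projected <= deadlineSec) return 0;            // deadline already met
            double requiredRate = pendingTasks / deadlineSec;  // rate needed to meet it
            return (int) Math.ceil((requiredRate - siteRate) / workerRate);
        }

        public static void main(String[] args) {
            Provisioner ec2 = n -> System.out.println("bursting: +" + n + " EC2 workers");
            int need = workersNeeded(/*pending=*/1200, /*siteRate=*/2.0,
                                     /*workerRate=*/0.5, /*deadlineSec=*/300);
            if (need > 0) ec2.scaleOut(need);   // prints "bursting: +4 EC2 workers"
        }
    }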

Broader Impacts

We intend to explore different computational models to better support science. Furthermore, users will be able to develop and run their applications on top of CometCloud and exploit the whole federation (as long as they have access to the underlying resources).

Project Contact

Project Lead
Javier Diaz Montes (jdiaz) 
Project Manager
Javier Diaz Montes (jdiaz) 
Project Members
Moustafa AbdelBaky, Ivan Rodero, Mengsong Zou, Daihou Wang, Jaroslaw Zola, Hoang Bui, Mehmet Aktas, Alejandro Pelaez, Rafael Tolosana Calasanz, Manuel Diaz-Granados  

Resource Requirements

Hardware Systems
  • alamo (Dell PowerEdge at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • xray (Cray XT5m at IU)
  • bravo (large memory machine at IU)
  • delta (GPU Cloud)
  • Network Impairment Device
 
Use of FutureGrid

We intend to include different FutureGrid sites in the federation so that applications can scale across sites. FutureGrid cloud services would be used as part of the infrastructure or to scale out the infrastructure on demand through the CometCloud cloudburst capabilities, as sketched below.
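
As an illustration of how such a cross-site federation might be assembled, the sketch below joins static FutureGrid sites to an overlay and registers a public cloud as an elastic, burst-only member. Class names, method names, and worker counts are assumptions; the actual CometCloud federation API differs in its details.

    import java.util.ArrayList;
    import java.util.List;

    // Federation assembly sketch: static FutureGrid members plus one
    // elastic cloud member used only when the cloudburst policy fires.
    public class FederationSketch {
        record Site(String endpoint, int staticWorkers, boolean elastic) {}

        private final List<Site> members = new ArrayList<>();

        void join(Site s) { members.add(s); }   // connect a site to the overlay

        void print() {
            for (Site s : members)
                System.out.printf("%s: %d workers%s%n", s.endpoint(),
                        s.staticWorkers(), s.elastic() ? " (burst-only)" : "");
        }

        public static void main(String[] args) {
            FederationSketch fed = new FederationSketch();
            // Static members: FutureGrid clusters requested in this project.
            fed.join(new Site("india.futuregrid.org", 32, false));
            fed.join(new Site("sierra.futuregrid.org", 32, false));
            fed.join(new Site("hotel.futuregrid.org", 16, false));
            // Elastic member: Amazon EC2, provisioned on demand.
            fed.join(new Site("ec2.us-east-1.amazonaws.com", 0, true));
            fed.print();
        }
    }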

Scale of Use

We will need a few VMs to set up the environment and prepare the actual experiments; this may take a few weeks. We will then perform a set of runs at different scales using different systems and configurations. Analysis will be performed between experiments so that only key scenarios of interest are evaluated. This process may take a few months, but the use of the resources will not be continuous.

Project Timeline

Submitted
11/09/2012 - 18:33