Optimize rapid deployment and updating of VM images at the remote compute cluster

Project Information

Discipline
Physics (203) 
Orientation
Research 
Abstract

Monte Carlo simulations of future particle physics experiments usually require complex simulation packages developed and maintained over many years by a broad community, e.g. Geant at CERN. It is relatively straightforward to deploy the most recent version of such software on a private machine and customize it to the needs of a particular experiment. It is more challenging to scale up to 10-20 compute nodes, which may have different hardware and computing environments. One often uses 'spare cycles' at a computing facility affiliated with the institutions supporting the new project. However, there is typically a tension between the throughput and stability a large computing facility must maintain for its core mission and the need of a new experiment to customize the environment and update existing libraries or the operating system. The advent of virtual machines (VMs) allows a hardware-agnostic clone of an experiment-specific computing environment to be deployed at an arbitrary computing facility. This project intends to explore the practicality of this approach. We would like to investigate how easy it is to frequently build new VMs locally, ship them to a FutureGrid computing facility, deploy 10-20 copies, run for a week, and transfer back (moderate-size) results. Then erase it all and start over.
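The build-ship-deploy-retrieve cycle described above can be sketched as an ordered command plan. Everything here is illustrative: the gateway host name, image paths, and the remote `start-vm` launcher are placeholders, not actual FutureGrid tooling; only `scp`, `ssh`, and `rsync` are real commands.

```python
# Sketch of one deployment cycle for the workflow in the abstract.
# All names are assumptions: 'user@futuregrid-gateway', the /images/
# path, and 'start-vm' are placeholders for the facility's real tools.

def deployment_cycle(n_vms: int,
                     image: str = "darklight-prod.img",
                     gateway: str = "user@futuregrid-gateway") -> list[str]:
    """Return the ordered shell commands for one build-to-teardown cycle."""
    cmds = [f"scp {image} {gateway}:/images/"]               # ship the fresh image
    cmds += [f"ssh {gateway} start-vm /images/{image} sim-{i:02d}"
             for i in range(1, n_vms + 1)]                   # boot N copies
    cmds.append(f"rsync -av {gateway}:/results/ ./results/") # pull results back
    cmds.append(f"ssh {gateway} rm -rf /images/{image}")     # erase, start over
    return cmds

for cmd in deployment_cycle(20):
    print(cmd)
```

Timing each of these steps for 10-20 copies would give the door-to-door wall-clock numbers this project sets out to measure.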

Intellectual Merit

Optimization of the design of the future DarkLight experiment intended to run at JLab

Broader Impacts

Explores usability of cloud-like resources for physics experiment design

Project Contact

Project Lead
Jan Balewski (balewski) 
Project Manager
Jan Balewski (balewski) 
Project Members
Michael Poat, Justin Stevens  

Resource Requirements

Hardware Systems
  • Not sure
  • I don't care (what I really need is a software environment and I don't care where it runs)
 
Use of FutureGrid

The questions we would like to answer are:
  • What is the typical door-to-door wall-clock time between a VM image being ready to ship on a local machine and having 20 VM images up and running at the remote facility?
  • Is there a benefit to creating two VM images: a 'storage VM' holding output that does not change with time, and a 'production VM' with the simulation software that does? Is it more efficient to keep the unchanged, large (50 GB) storage VM at the remote facility and update only the smaller (10 GB) production VMs?
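As a back-of-the-envelope check on the two-image question, the transfer-time savings can be estimated from the image sizes given above (50 GB storage VM, 10 GB production VM). The sustained link speed used here is purely an assumption:

```python
# Rough transfer-time comparison: re-shipping both images vs. only the
# production VM. Image sizes come from the text; the assumed sustained
# bandwidth of 100 MiB/s is hypothetical.

BANDWIDTH_MIB_S = 100  # assumed sustained transfer rate, MiB/s

def transfer_minutes(size_gib: float) -> float:
    """Minutes to ship `size_gib` GiB at the assumed bandwidth."""
    return size_gib * 1024 / BANDWIDTH_MIB_S / 60

both = transfer_minutes(50 + 10)   # storage VM + production VM
prod_only = transfer_minutes(10)   # storage VM stays at the facility

print(f"ship both images:     {both:.1f} min")
print(f"ship production only: {prod_only:.1f} min")
```

On these assumptions the incremental update is roughly six times cheaper per cycle; the actual ratio (including image build and boot time) is what the experiment runs would measure.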

Scale of Use

A few weeks of CPU time on 10-20 cores, used over a few months

Project Timeline

Submitted
08/29/2013 - 12:40