V3VEE Project

Project Information

Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 
Orientation
Research 
Abstract

We plan to use FutureGrid to help evaluate virtualization technologies for high performance computing. In particular, we seek a testbed for scaling studies involving our Palacios VMM and its components (e.g., the VNET/P overlay network).

Intellectual Merit

The V3VEE project (v3vee.org) is creating a virtual machine monitor framework for modern architectures (those with hardware virtualization support) that will permit the compile-time creation of VMMs with different structures, including those optimized for computer architecture research and use in high performance computing. V3VEE began as an NSF-funded collaborative project between Northwestern University and the University of New Mexico. It currently involves five DOE-funded partner institutions: Northwestern University, the University of New Mexico, the University of Pittsburgh, Sandia National Laboratories, and Oak Ridge National Laboratory. V3VEE is a community resource development effort that anyone can contribute to.

Broader Impacts

The infrastructure developed in the V3VEE project is used extensively in research and education. The codebase is freely available and BSD licensed. The project reaches underrepresented groups through Northwestern's AGEP program and through UNM, a minority-serving university. More information on the project can be found at http://v3vee.org.

Project Contact

Project Lead
Peter Dinda (pdinda) 
Project Manager
Peter Dinda (pdinda) 
Project Members
Zheng Cui, Lei Xia, Kyle Hale, Maciej Swiech

Resource Requirements

Hardware System
  • Not sure
 
Use of FutureGrid

We want to run scaling studies of Palacios combined with its VNET/P overlay network on 1-10 Gbps Ethernet and/or InfiniBand configurations. That is, we want to run on as many nodes as possible and observe the effects of scale.
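
To make the intended measurement loop concrete, the following is a minimal sketch (an assumption on our part, not an established V3VEE harness) of a driver that launches a benchmark at increasing node counts and records wall-clock time to a CSV file. The node counts, the mpirun launcher, and the ./app binary are placeholders; the actual suite would follow the protocol cited under Scale of Use below.

#!/usr/bin/env python3
# Illustrative scaling-study driver (hypothetical): run a benchmark at
# increasing node counts and log elapsed wall-clock time per run.
# NODE_COUNTS, the mpirun launcher, and ./app are placeholder assumptions.
import csv
import subprocess
import time

NODE_COUNTS = [2, 4, 8, 16, 32, 64]   # hypothetical scaling points

with open("scaling_results.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["nodes", "seconds"])
    for n in NODE_COUNTS:
        start = time.time()
        # Placeholder launch; in our studies the guest would run under
        # Palacios with the VNET/P overlay carrying its traffic.
        subprocess.run(["mpirun", "-np", str(n), "./app"], check=True)
        writer.writerow([n, time.time() - start])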

Scale of Use

There are four phases I currently envision:

1. Hardware interrogation. We will want to log in to the various resources (or have someone do so) to interrogate the hardware. Palacios and VNET/P have some specific hardware requirements, and we must first determine which FutureGrid hardware would work (a minimal probe is sketched after this list). This will take only an hour per environment, and we need only a single representative machine in each cluster.

2. Bring-up. We will initially need a small number of machines (say, two) to bring up Palacios and VNET/P in the FutureGrid environment. This will let us create a configuration (either a kernel module plus images and tools, or a whole OS image) that can then be replicated. The time required depends heavily on the hurdles encountered, anywhere from a day to a month; we would know very quickly whether things will go fast.

3. Scaling studies. Here we would use a single cluster (the largest possible) to study the performance of benchmarks and applications as a function of scale. The experimental protocol and the test suite we would likely use are described in L. Xia, et al., "VNET/P: Bridging the Cloud and High Performance Computing Through Fast Overlay Networking," HPDC 2012 (and tech report), available from v3vee.org. Based on our experience running on Red Storm, I anticipate this would take several days.

4. Cross-cluster experiments (possibly). This would depend on the challenges of porting in phase 2; if we do it, it would consume a couple of clusters for a day or two.
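
The phase 1 probe can be as simple as the following sketch (an illustration, not a V3VEE tool): it scans /proc/cpuinfo on a representative Linux node for the Intel VT-x (vmx) or AMD-V (svm) CPU flag, since Palacios requires hardware virtualization support. Any further checks (e.g., of the NICs) would be added in the same spirit.

#!/usr/bin/env python3
# Illustrative phase 1 probe (hypothetical): report whether this node's
# CPUs expose the hardware virtualization extensions Palacios requires.

def virt_extension(cpuinfo_path="/proc/cpuinfo"):
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V), or None."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                for ext in ("vmx", "svm"):
                    if ext in flags:
                        return ext
    return None

if __name__ == "__main__":
    ext = virt_extension()
    if ext:
        print("hardware virtualization available:", ext)
    else:
        print("no vmx/svm flag found; Palacios would not run on this node")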

Project Timeline

Submitted
06/07/2012 - 10:33