Cloud Virtualization Environment Analysis towards High Performance Storage Solutions

Abstract

InfiniBand, as a high-bandwidth, low-latency network interconnect, has for most of the last decade been regarded as the fabric of choice for HPC clusters. It is deployed in many commodity clusters and, as of the June 2011 TOP500 list, is used as the communication network in 41.2% of all systems. However, even as InfiniBand usage continues to grow, several factors continue to hinder full utilization of the technology's capabilities. Recent advances in virtualization tools, namely Single Root I/O Virtualization (SR-IOV), have significantly reduced performance overhead. Specifically, SR-IOV exposes the host machine's physical cards to multiple virtual machines (VMs), rather than requiring full emulation or passthrough of the device into the VM. This advancement allows VMs not only to share the same physical device concurrently, but to do so with minimal overhead compared to previous virtualization techniques, which impose a 10-15% performance penalty. We intend to perform an evaluation study of network interconnect performance and overhead under different virtualization environments. Specifically, we will begin by evaluating InfiniBand performance of HPC and cloud applications on both native and virtual machines. Further analysis includes evaluating parallel file system (PFS) implementations such as Lustre and GlusterFS, toward more effective cloud environment integration.
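
As an illustration of the mechanism, the minimal sketch below reads the number of Virtual Functions (VFs) a device currently exposes from the Linux sysfs interface. This assumes a Linux host with an SR-IOV-capable kernel and driver; the PCI address is a placeholder to be replaced with the actual address of the InfiniBand HCA (e.g., as reported by lspci).

    /* Minimal sketch: query how many SR-IOV Virtual Functions (VFs)
     * a PCI device currently exposes via the kernel's sysfs interface.
     * The PCI address is a placeholder, not taken from our testbed. */
    #include <stdio.h>

    int main(void)
    {
        const char *path =
            "/sys/bus/pci/devices/0000:03:00.0/sriov_numvfs";
        FILE *f = fopen(path, "r");
        if (!f) {
            perror("fopen");   /* device absent or SR-IOV not enabled */
            return 1;
        }
        int numvfs = 0;
        if (fscanf(f, "%d", &numvfs) == 1)
            printf("SR-IOV virtual functions enabled: %d\n", numvfs);
        fclose(f);
        return 0;
    }

Each VF enumerated here appears as its own PCI function that can be assigned to a VM, which is what lets multiple guests share one physical HCA without full device emulation.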

Intellectual Merit

Our project will be the first to provide a comprehensive evaluation of virtualization's impact on both application and network performance. Isolating, categorizing, and sampling the frequency of measured performance overheads provides multiple benefits. First, developers will have a better sense of where the performance bottlenecks are and how they can be optimized. Second, application users will be able to gauge how certain applications will perform under different environmental constraints.

Broader Impact

Our project may pave the way for the HPC community to adopt virtualization techniques if it is shown that virtualized performance is on par with native performance for the majority of execution time. It will also allow current users of virtualization, primarily the cloud IT community, to better schedule and allocate resources depending on the expected virtualization impact.

Use of FutureGrid

We intend to install a custom operating system and device drivers in order to enable SR-IOV. This setup is required for us to run performance evaluations of different benchmark suites across various combinations and configurations of virtual machines and native hardware. The benchmarks use MPI for communication (a minimal example appears below). We also intend to install Lustre, a parallel file system (PFS), which will require one set of nodes to serve as the object storage servers and targets (OSS/OSTs) across which files are distributed, as well as another set of nodes to act as clients.
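
As a sketch of the kind of point-to-point measurement these benchmark suites perform, the following minimal MPI ping-pong program reports average one-way latency between two ranks. The message size and iteration count are illustrative choices, not parameters taken from any particular suite.

    /* Minimal MPI ping-pong latency sketch: rank 0 and rank 1 bounce a
     * small message back and forth and report the average one-way
     * latency. Compile with mpicc; run with at least two ranks. */
    #include <mpi.h>
    #include <stdio.h>

    #define ITERS    1000
    #define MSG_SIZE 8      /* bytes; small messages expose latency */

    int main(int argc, char **argv)
    {
        int rank, size;
        char buf[MSG_SIZE] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2) {
            if (rank == 0)
                fprintf(stderr, "run with at least 2 ranks\n");
            MPI_Finalize();
            return 1;
        }

        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();
        for (int i = 0; i < ITERS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double elapsed = MPI_Wtime() - start;
        if (rank == 0)
            printf("avg one-way latency: %.3f us\n",
                   elapsed / (2.0 * ITERS) * 1e6);

        MPI_Finalize();
        return 0;
    }

Running the same binary natively and inside SR-IOV-backed VMs (e.g., mpirun -np 2 ./pingpong) gives a direct latency comparison between the two environments.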

Scale Of Use

We want to run a set of comparisons on the entire set of allocated machines, for which we may need 2-3 weeks. This time period includes installation and setup time.

Publications


FG-366
Malek Musleh
Purdue University
Active

Project Members

Andrew Younge
John Paul Walters
Vijay Pai

FutureGrid Experts

Andrew Younge