Evaluation of Network Interrupt Tuning on Virtualized Application Performance

Project Information

Discipline
Electrical and Related Engineering (106) 
Subdiscipline
14.09 Computer Engineering 
Orientation
Research 
Abstract

InfiniBand, a high-bandwidth, low-latency network interconnect, has for most of the last decade been regarded as the fabric of choice for HPC clusters. It is deployed in many commodity clusters and, as of the June 2011 TOP500 list, is used as the communication network in 41.2% of all systems. However, even as InfiniBand usage continues to grow, several factors continue to hinder full utilization of the technology’s capabilities. A recent paper proposed that tuning network interrupt parameters can make virtualized performance more competitive with native performance. We aim to further evaluate the impact of network interrupt tuning on virtualized application performance, including low-level metrics such as cache misses and TLB misses.

Intellectual Merit

Our project will be the first to provide a comprehensive evaluation of the impact of network tuning optimizations on both application and network performance for PCI-passthrough.

Broader Impacts

Our project may pave the way for the HPC community to adopt virtualization techniques if it shows that virtualized performance is on par with native performance for the majority of execution time. It will also allow current users of virtualization, primarily the cloud IT community, to better schedule and allocate resources depending on the expected virtualization impact.

Project Contact

Project Lead
Malek Musleh (musleh) 
Project Manager
Malek Musleh (musleh) 

Resource Requirements

Hardware System
  • bravo (large memory machine at IU)
 
Use of FutureGrid

We intend to install a custom operating system and device drivers in order to enable PCI-passthrough. This setup is required to evaluate the performance of different benchmark suites across different configurations of virtual machines and native execution. The benchmarks use MPI for communication.

Scale of Use

We want to run a set of comparisons on the entire set of allocated machines, for which we may need 2 months. This period should also cover installation and setup time.

Project Timeline

Submitted
07/03/2014 - 13:07