Optimizing Partitioned Address-space Programs for Shared Memory and Hybrid Clusters

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

Parallel programming with partitioned address spaces has several advantages, including explicit data dependencies, programmer awareness of data locality, and the absence of obscure bugs caused by forgotten synchronization around shared data. However, partitioned address spaces create problems of their own. Our past and ongoing research has shown that productivity-related issues can be addressed by devising abstractions for specifying communication. However, an important shortcoming of programming with partitioned address spaces is that it is difficult to leverage hardware shared memory for such programs, due to program semantics and the limitations of the underlying communication libraries.

MPI is the leading communication library used for partitioned address-space programming. In this project, we will develop techniques that combine MPI-aware compiler analysis and run-time systems to optimize MPI programs for shared memory by achieving zero-copy communication in a large number of cases. We avoid the difficult problem of matching sends and receives in MPI programs by developing a smart run-time system that serves as a drop-in replacement for standard MPI primitives. A source-level compiler, based on LLNL's ROSE framework, will optimize the original MPI program by converting the original MPI calls to the replacement primitives and selectively globalizing communication buffers into shared space to enable zero-copy communication. We will extend our analysis and implementation to also work with Kanor, a declarative language for specifying communication that we have been developing. A small sketch of the kind of transformation we target follows below.
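
To make the intended transformation concrete, the following sketch shows what a globalized communication buffer can look like on a shared-memory node. It is an illustration only: the project's own run-time and its primitives are not specified in this abstract, so standard MPI-3 shared-memory windows (MPI_Win_allocate_shared and MPI_Win_shared_query) stand in for them here, and the buffer size and variable names are assumptions. The point is that the producer writes the data in place and consumers read it directly, where a matching MPI_Send/MPI_Recv pair would have copied the entire buffer.

/* Illustrative sketch only; MPI-3 shared-memory windows stand in for
 * the project's drop-in run-time.  Buffer size N is an assumption. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Restrict to ranks that share physical memory on one node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int rank, size;
    MPI_Comm_rank(node, &rank);
    MPI_Comm_size(node, &size);

    /* "Globalize" the communication buffer: rank 0 owns the allocation,
     * every rank on the node maps it directly. */
    double *buf;
    MPI_Win win;
    MPI_Aint bytes = (rank == 0) ? N * sizeof(double) : 0;
    MPI_Win_allocate_shared(bytes, sizeof(double), MPI_INFO_NULL,
                            node, &buf, &win);
    if (rank != 0) {
        MPI_Aint qsize; int disp;
        MPI_Win_shared_query(win, 0, &qsize, &disp, &buf);
    }

    /* Producer fills the buffer in place; consumers read it with no
     * intermediate copy. */
    MPI_Win_fence(0, win);
    if (rank == 0)
        for (int i = 0; i < N; i++) buf[i] = (double)i;
    MPI_Win_fence(0, win);

    if (rank == size - 1)
        printf("rank %d read buf[N-1] = %f with zero copies\n",
               rank, buf[N - 1]);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

In the project itself, the compiler would perform this rewriting automatically, deciding which buffers are safe to place in shared space; the run-time would preserve MPI matching semantics without requiring the compiler to pair sends and receives statically.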

The increasing prevalence of many-core machines provides a powerful motivation for optimizing partitioned address-space programs, such as those using MPI, for shared memory. However, to achieve improved scaling, we will also explore optimization challenges on hybrid platforms consisting of clusters of shared-memory machines.

Intellectual Merit

Automatic optimization of programs written for distributed memory (such as those using MPI) for shared memory remains an open problem. This project aims to address this important problem, which will be a significant step toward automatic migration of legacy code to next-generation machines. By leveraging our work on Kanor, this project will also help develop a uniform parallel programming environment that can deliver high performance at multiple scales.

Broader Impacts

One PhD student and one MS student will gain first-hand experience working on a shared high-performance computing resource through FutureGrid. The outcomes of the research will be published in leading international conferences, which will also give the MS student exposure to computer science research. The source-level compiler, based on ROSE, and the run-time system will be made available under an open-source license.

Project Contact

Project Lead
Arun Chauhan (achauhan) 
Project Manager
Arun Chauhan (achauhan) 
Project Members
Nilesh Mahajan, Uday Pitambare  

Resource Requirements

Hardware Systems
  • india (IBM iDataPlex at IU)
  • delta (GPU Cloud)
 
Use of FutureGrid

We will run experiments on Delta to assess the performance of our techniques on multiple cores of a node. Later, we will use multiple nodes of a cluster to evaluate the performance in hybrid scenarios.

Scale of Use

One node at a time, initially. Multiple nodes in the later part of the project.

Project Timeline

Submitted
06/19/2012 - 11:11