Parallelization of heterogeneous workloads for Imaging Genomic Browser

Abstract

With collaborators in the IU Medical School, we are applying our next-generation parallel programming libraries to a recent genome analysis application (described at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3065788/). The application enables a user to explore correlations between genotypes and brain structure. It presents a challenging target for parallelization: first, workloads are dynamic, driven by a user manipulating a GUI; second, workloads include both 3D image processing and genome analysis components, the former of which is a good candidate for GPU execution. Our software framework balances parallelism between CPUs and GPUs on multiple nodes, so the ideal platform for evaluating our techniques is a cluster with both GPUs and a high number of CPU cores per node (allowing us to test the scaling of multi-threading, distribution, and CPU/GPU partitioning simultaneously). For this reason we are interested in using the new Delta cluster.

Intellectual Merit

Heterogeneous distributed platforms present critical new problems to
the software development process. Much recent research has attempted
to address these problems. Our particular approach uses high-level
domain-specific languages that present enough information to the
compiler to enable code generation for different platforms
(e.g., CPU/GPU). Further, we use new dynamic load balancing techniques
to manage load between nodes and between multiple CPUs and multiple
GPUs.
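
To make the idea of dynamic load balancing concrete, the following is a
minimal sketch of one generic scheme (a shared work queue from which
each worker pulls its next task as soon as it is free). It is not the
project's runtime or the monad-par API; the function run_balanced, the
worker names, and the stand-in kernels are hypothetical and exist only
for illustration. Threads stand in for CPU cores and GPU devices.

```python
import queue
import threading


def run_balanced(tasks, workers):
    """Run every task on whichever worker becomes free first.

    tasks:   list of inputs.
    workers: dict mapping a (hypothetical) worker name to a function
             that processes one input.
    Returns (results, counts): results[i] is the output for tasks[i],
    and counts[name] is the number of tasks that worker handled.
    """
    pending = queue.Queue()
    for item in enumerate(tasks):
        pending.put(item)

    results = [None] * len(tasks)
    counts = {name: 0 for name in workers}

    def loop(name, fn):
        while True:
            try:
                index, value = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            results[index] = fn(value)
            counts[name] += 1  # each thread updates only its own key

    threads = [
        threading.Thread(target=loop, args=(name, fn))
        for name, fn in workers.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, counts


def cpu_kernel(x):
    return x * x  # stand-in for a CPU implementation


def gpu_kernel(x):
    return x * x  # stand-in for a GPU implementation of the same operation


if __name__ == "__main__":
    results, counts = run_balanced(
        list(range(100)),
        {"cpu0": cpu_kernel, "cpu1": cpu_kernel, "gpu0": gpu_kernel},
    )
    print(counts)  # the split between workers varies from run to run
```

Because no task is assigned to a worker in advance, a faster device
simply takes more tasks from the queue, which is the property that
matters when CPU and GPU throughput differ and the workload is not
known ahead of time.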

The research goals supported by this project are twofold,
corresponding to the collaborators involved: first, advances in
compilers and language runtimes, and second, advances in the analysis
of large imaging/genomic datasets.

Broader Impact

The software libraries that we have been developing are already used
by an open-source software community (e.g., major packages on
hackage.org depend on our monad-par library). All software developed
in the course of this project will likewise be made available and
supported.

Use of FutureGrid

The software techniques we are developing are specifically targeted
at HPC platforms, access to which is critical for the continued
development of the software.

Scale Of Use

At this stage of the project we will primarily be running benchmarks
to evaluate the scalability of our software. Running our benchmark
suites can take from one hour to a few hours but requires exclusive
access to a set of machines. Ideally, we would be able to run a
benchmark suite periodically (say, every week or two) as we
incrementally improve the software.

Publications


FG-205
Ryan Newton
Indiana University
Active

Project Members

Aaron Hsu
Aaron Todd
Abhishek Kulkarni
Edward Amsden
Eric Holk
Eric Jiang
Li Shen
Rebecca Swords
Sajith Sasidharan
Sungeun Kim

Timeline

1 year 29 weeks ago