Next Generation Sequencing in the Cloud

Abstract

This project will analyze next generation sequencing (NGS) algorithms and workflows in the cloud.

Intellectual Merit

Many genomic data sets are already hosted publicly or in clouds such as Amazon's. Many researchers have created algorithms using the Map/Reduce paradigm for pleasingly parallel problems. These algorithms fit nicely in clouds; however, we are also interested in better understanding how well other NGS algorithms map to clouds. Questions such as "Are there limits to using clouds for certain algorithms?" and "Can current NGS algorithms be modified to perform well in the cloud?" are important for researchers to answer.
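As a rough illustration of why Map/Reduce-style algorithms fit clouds well, the sketch below counts k-mers across a set of reads. It is not taken from this project: the task (k-mer counting), the function names, and the example reads are all assumptions chosen for illustration. The point it shows is that the map step handles each read independently, so the work could in principle be spread across many VMs, with only the reduce step combining results.

```python
from collections import Counter
from functools import reduce


def map_read(read, k):
    """Map step (hypothetical helper): count the k-mers in one read.

    Each call depends only on its own read, which is what makes the
    problem pleasingly parallel.
    """
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step (hypothetical helper): merge two partial tallies."""
    return left + right


def count_kmers(reads, k):
    # In a cloud setting the map calls could run on separate VMs;
    # here they run sequentially in one process for simplicity.
    partials = (map_read(read, k) for read in reads)
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    example_reads = ["ACGTAC", "GTACGT"]  # made-up reads, not real data
    print(count_kmers(example_reads, 3))
```

Algorithms that need frequent communication between workers do not decompose this cleanly, which is the kind of case the questions above are aimed at.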

Broader Impact

This work will enhance scientific understanding of how next generation sequencing (NGS) algorithms operate in cloud computing infrastructures. Through it, researchers will gain a better understanding of how to run NGS algorithms and workflows in computing environments such as clouds, which provide the necessary scale of resources.

Use of FutureGrid

We will create virtual clusters in the FutureGrid Eucalyptus environments to test a variety of NGS software and algorithms and to explore architecture decisions such as storage options.

Scale Of Use

I will request a few VMs per experiment during initial testing. To perform some tests at small scale, I may request tens of VMs for a virtual cluster. How long the VMs run will depend on the analysis or workflow being tested.

Publications


FG-168
Jonathan Klinginsmith
Indiana University
Active

FutureGrid Experts

Saliya Ekanayake
Zhenhua Guo

Timeline

2 years 49 weeks ago