Next Generation Sequencing in the Cloud

Project Information

Discipline
Computer Science (401) 
Subdiscipline
26.0613 Genetics, Plant and Animal 
Orientation
Research 
Abstract

This project will analyze how next generation sequencing (NGS) algorithms and workflows perform in the cloud.

Intellectual Merit

Many genomic data sets are already hosted either publicly or in clouds such as Amazon. Many researchers have used the Map/Reduce paradigm to implement pleasingly parallel algorithms. These algorithms fit nicely in clouds; however, we are also interested in better understanding how well other NGS algorithms map to clouds. Questions such as "Are there limits to using clouds for certain algorithms?" and "Can current NGS algorithms be modified to perform well in the cloud?" are important for researchers to answer.

Broader Impacts

This work will enhance scientific understanding of how next generation sequencing (NGS) algorithms operate in cloud computing infrastructures. Through this work, researchers will gain a better understanding of how to run NGS algorithms and workflows in computing environments such as clouds, which provide the necessary scale of resources.

Project Contact

Project Lead
Jonathan Klinginsmith (jklingin) 
Project Manager
Jonathan Klinginsmith (jklingin) 

Resource Requirements

Hardware Systems
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
 
Use of FutureGrid

Create virtual clusters in the FutureGrid Eucalyptus environments to test a variety of NGS software and algorithms, and to explore architecture decisions such as storage options.

Scale of Use

I will request a few VMs per experiment during initial testing. To perform some tests at small scale, I may request tens of VMs for a virtual cluster. How long the VMs run will depend on the analysis and/or workflow being tested.

Project Timeline

Submitted
10/23/2011 - 20:57