Next Generation Sequencing in the Cloud
Project Information
- Discipline
- Computer Science (401)
- Subdiscipline
- 26.0613 Genetics, Plant and Animal
- Orientation
- Research
We will use this work to analyze next generation sequencing (NGS) algorithms and workflows in the cloud.
Intellectual Merit
Many genomic data sets are already hosted publicly or in clouds such as Amazon's. Many researchers have implemented pleasingly parallel algorithms using the Map/Reduce paradigm, and these algorithms fit naturally in clouds. However, we also want to better understand how well other NGS algorithms map to clouds. Researchers need answers to questions such as "Are there limits to using clouds for certain algorithms?" and "Can current NGS algorithms be modified to perform well in the cloud?"
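As an illustration only (not this project's code), the Map/Reduce pattern for a pleasingly parallel NGS task can be sketched with k-mer counting over sequencing reads. The choice of k-mer counting and the function names `map_read`, `shuffle`, `reduce_counts`, and `count_kmers` are hypothetical. The sketch runs the map, shuffle, and reduce steps in a single process, whereas a framework such as Hadoop would distribute them across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_read(read, k):
    """Map step (hypothetical): emit a (k-mer, 1) pair for every k-length substring of one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as a Map/Reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values emitted for each k-mer."""
    return {kmer: sum(values) for kmer, values in groups.items()}


def count_kmers(reads, k):
    # Each read is mapped independently of every other read, which is what
    # makes this step pleasingly parallel across many workers or VMs.
    mapped = chain.from_iterable(map_read(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]  # toy reads, not real data
    print(count_kmers(reads, 3))
    # prints {'ACG': 2, 'CGT': 2, 'GTA': 2, 'TAC': 2}
```

Only the reduce step needs to see all values for a key. Algorithms whose steps depend on global state (for example, some assembly graph traversals) do not decompose this cleanly, which is the kind of limit the questions above ask about.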
Broader Impacts
This work will enhance scientific understanding of how next generation sequencing (NGS) algorithms operate in cloud computing infrastructures. Through this work, researchers will gain a better understanding of how to run NGS algorithms and workflows in computing environments, such as clouds, that provide the necessary scale of resources.
Project Contact
- Project Lead
- Jonathan Klinginsmith (jklingin)
- Project Manager
- Jonathan Klinginsmith (jklingin)
Resource Requirements
- Hardware Systems
- india (IBM iDataPlex at IU)
- sierra (IBM iDataPlex at SDSC)
Create virtual clusters in the FutureGrid Eucalyptus environments to test a variety of NGS software and algorithms and to explore architecture decisions such as storage options.
Scale of Use
During initial testing, I will request a few VMs per experiment. For some small-scale tests, I may request tens of VMs for a virtual cluster. How long the VMs run will depend on the analysis or workflow being tested.
Project Timeline
- Submitted
- 10/23/2011 - 20:57