De novo assembly of genomes and metagenomes from next-generation sequencing data

Project Information

Discipline
Biology (603) 
Subdiscipline
14.14 Environmental/Environmental Health Engineering 
Orientation
Research 
Abstract

We will use the FutureGrid computing resource to assemble next-generation sequencing (NGS) reads from eukaryotic genome projects and metagenome projects, including the Human Microbiome Project and the Earth Microbiome Project. The massive volumes of sequencing data generated by NGS sequencers have revolutionized many fields of biology, but they require extensive computing resources to analyze. In particular, we would like to use the FutureGrid computer clusters with large contiguous RAM to test assembly algorithms we developed for NGS data and to analyze large datasets from microbiome projects, which may lead to new findings.

Intellectual Merit

Because NGS datasets are so large, testing and improving assembly algorithms for them is very time-consuming. FutureGrid resources provide a unique opportunity to test these algorithms on large real-world datasets. The results will help the genomics community develop and improve assembly algorithms.

Broader Impacts

NGS techniques have been applied in many areas, ranging from biology to environmental science and new energy. The success of the proposed project will have a great impact on these application areas.

Project Contact

Project Lead
Haixu Tang (hatang) 
Project Manager
Haixu Tang (hatang) 
Project Members
Heewook Lee, Mina Rho, Ram Podicheti, Gregory Zynda, Mingjie Wang  

Resource Requirements

Hardware Systems
  • india (IBM iDataPlex at IU)
  • Not sure
 
Use of FutureGrid

We will run the assembly algorithms we developed on large datasets from microbiome projects.

Scale of Use

We need compute nodes with large RAM for one week.

Project Timeline

Submitted
06/01/2011 - 21:28