De novo assembly of genomes and metagenomes from next generation sequencing data
Project Information
- Discipline
- Biology (603)
- Subdiscipline
- 14.14 Environmental/Environmental Health Engineering
- Orientation
- Research
We will use the FutureGrid computing resource to assemble next-generation sequencing (NGS) reads from eukaryotic genome projects and metagenome projects, including the Human Microbiome Project and the Earth Microbiome Project. The massive sequencing data generated by NGS sequencers have revolutionized many fields of biology, but they require extensive computing resources to analyze. In particular, we would like to use the FutureGrid compute clusters with large contiguous RAM to test assembly algorithms we developed for NGS data and to analyze large datasets from microbiome projects, which may lead to new findings.
Intellectual Merit
Because NGS datasets are so large, testing and improving assembly algorithms is very time consuming. FutureGrid resources provide a unique opportunity to test these algorithms on large real datasets. The results will help the genomics community develop and improve assembly algorithms.
Broader Impacts
NGS techniques have been applied to many different topics, ranging from biology to environmental sciences and new energy. The success of the proposed project will have great impact in these application areas.
Project Contact
- Project Lead
- Haixu Tang (hatang)
- Project Manager
- Haixu Tang (hatang)
- Project Members
- Heewook Lee, Mina Rho, Ram Podicheti, Gregory Zynda, Mingjie Wang
Resource Requirements
- Hardware Systems
- india (IBM iDataPlex at IU)
- Not sure
We will run the assembly algorithms we developed on large datasets from microbiome projects.
Scale of Use
We need to use compute nodes with large RAM for a week.
Project Timeline
- Submitted
- 06/01/2011 - 21:28