Metagenome analysis of benthic marine invertebrates

Project Information

Discipline
Biology (603) 
Subdiscipline
40.05 Chemistry 
Orientation
Research 
Abstract

We are carrying out deep sequencing of environmental DNA from benthic marine organisms that are important components of their communities but have not been extensively examined genomically. In these organisms, symbiotic bacteria are demonstrably critical to host survival. The metagenomes are extremely complex, yet robust assemblies can sometimes be achieved. These properties make benthic marine invertebrates excellent models for evaluating next-generation sequencing (NGS) technology. In this project, we will use FutureGrid resources to carry out de novo assembly of marine invertebrate metagenomic sequence data, a process that requires large amounts of memory and CPU power because of the volume of data.

Intellectual Merit

This work will help determine the potential utility in metagenomics of NGS technology, which produces large amounts of data but in the form of relatively short reads.

Broader Impacts

In the course of our work we will determine the practical aspects of processing large and complex Illumina sequencing datasets to obtain de novo genome assemblies of very minor members of the metagenome. This knowledge should be of broad use to the metagenomics community.

Project Contact

Project Lead
Malcolm Zachariah (mzachariah) 
Project Manager
Malcolm Zachariah (mzachariah) 
Project Members
Earl Middlebrook, Diarey Tianero, Thomas Waller, Thomas Kakule, Malcolm Zachariah, Jason Kwan, Russell Green, Zhejian Lin, Ashaimaa Moussa  

Resource Requirements

Hardware Systems
  • india (IBM iDataPlex at IU)
  • bravo (large memory machine at IU)
  • delta (GPU Cloud)
 
Use of FutureGrid

FutureGrid will be used for de novo assembly of metagenomic sequence data generated by Illumina technology. FG will also be used to analyze the assembled data, including automatic annotation and large-scale BLAST searches.
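Large-scale BLAST searches over assembled contigs are commonly parallelized by splitting the query FASTA file into independent chunks, each searched as a separate job. The sketch below illustrates that splitting step only; it is a minimal example, not the project's actual pipeline, and the function names (`read_fasta`, `split_round_robin`, `to_fasta`) are hypothetical. Round-robin assignment is one assumed strategy for keeping chunk sizes roughly balanced.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq_parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        else:
            seq_parts.append(line.strip())
    if header is not None:
        yield header, "".join(seq_parts)


def split_round_robin(records: Iterable[tuple[str, str]],
                      n_chunks: int) -> list[list[tuple[str, str]]]:
    """Distribute records across n_chunks lists in round-robin order."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    chunks: list[list[tuple[str, str]]] = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append(rec)
    return chunks


def to_fasta(records: Iterable[tuple[str, str]], width: int = 60) -> str:
    """Serialize records back to FASTA text, wrapping sequences at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">contig1\nACGT\nACGT\n>contig2\nGGGG\n>contig3\nTTTT\n"
    chunks = split_round_robin(read_fasta(example.splitlines()), 2)
    for n, chunk in enumerate(chunks):
        print(f"chunk {n}:")
        print(to_fasta(chunk), end="")
```

Each chunk would then be written to its own file and passed to a separate BLAST+ job; because the searches are independent, the chunks can run on different nodes without coordination.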

Scale of Use

Assemblies using the program Meta-Velvet require a single node with a large amount of memory (~150 GB). Ideally, we would be able to SSH into a single node to run the assembly. In the long term, we may explore more distributed workflows.
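Meta-Velvet builds on Velvet, a de Bruijn graph assembler, and the memory such assemblers need grows roughly with the number of distinct k-mers in the input. The sketch below counts distinct k-mers as a rough, assumed proxy for that pressure. It is illustrative only and does not reproduce Velvet's internal data structures. Treating each k-mer and its reverse complement as one "canonical" k-mer is a common convention for double-stranded DNA, and the helper names are hypothetical.

```python
from typing import Iterable

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def distinct_kmers(reads: Iterable[str], k: int) -> int:
    """Count distinct canonical k-mers across reads.

    k-mers containing N or other ambiguity codes are skipped. The size of
    the `seen` set is a rough stand-in for the number of graph nodes an
    assembler must hold in memory.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    seen: set[str] = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                seen.add(canonical(kmer))
    return len(seen)


if __name__ == "__main__":
    clean = ["ACGTACGTTG", "ACGTACGTTG"]
    with_error = ["ACGTACGTTG", "ACGTTCGTTG"]  # one substitution in read 2
    print(distinct_kmers(clean, 4), distinct_kmers(with_error, 4))
```

Duplicate reads add no new k-mers, whereas a single sequencing error can introduce up to k new ones. Across hundreds of millions of Illumina reads from a complex metagenome, this set of distinct k-mers grows very large, which illustrates why assembly favours a single large-memory node.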

Project Timeline

Submitted
09/02/2011 - 20:43