Biome representational in silico karyotyping

Project Information

Discipline
Ophthalmology (709) 
Orientation
Research 
Abstract

Characterizing complex metagenomes remains a challenge for both biochemical techniques and bioinformatics. We have designed a novel modification of digital karyotyping, biome representational in silico karyotyping (BRISK), as a general technique for analyzing a defined representation of all DNA present in a sample. BRISK uses a Type IIB DNA restriction enzyme to create a defined representation of 27-mer DNA fragments from a sample. Massively parallel sequencing of this representation allows construction of high-resolution karyotypes and identification of multiple species within a biome. We propose to develop a distributed bioinformatics processing chain using Hadoop to perform complex analyses of microbiomes with the sequencing output from BRISK.
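To make the tag representation concrete, the following is a minimal Python sketch of in silico Type IIB digestion followed by tag counting. It is an illustration under stated assumptions, not the project's pipeline: the abstract does not name the enzyme, so the recognition pattern used here (AC, five arbitrary bases, CTCC, with 9 flanking bases upstream and 7 downstream, giving 27 bases in total) is an assumed BsaXI-like site, and the function names `extract_tags` and `count_tags` are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: a BsaXI-like Type IIB site, AC-N5-CTCC, flanked by 9 bases
# upstream and 7 downstream, so each tag is 9 + 11 + 7 = 27 bases long.
# The lookahead lets overlapping sites be reported as well.
SITE = re.compile(r"(?=([ACGT]{9}AC[ACGT]{5}CTCC[ACGT]{7}))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_tags(seq):
    """Return every 27-mer tag found on either strand of seq.

    Tags are reported in the orientation of the recognition site, so a
    site on the reverse strand yields the same tag as its forward copy.
    """
    seq = seq.upper()
    tags = []
    for strand in (seq, reverse_complement(seq)):
        tags.extend(match.group(1) for match in SITE.finditer(strand))
    return tags


def count_tags(sequences):
    """Count tag occurrences across many sequences (the 'reduce' step)."""
    counts = Counter()
    for seq in sequences:
        counts.update(extract_tags(seq))
    return counts
```

In a Hadoop setting, `extract_tags` would correspond roughly to the map step (emit one record per tag) and `count_tags` to the reduce step (sum per tag); how the actual processing chain divides this work is not specified in the abstract.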

Intellectual Merit

Characterization of complex microbiomes using novel quantitative biochemical methods. Development of new bioinformatic heuristics for handling large data sets from next-generation sequencing platforms.

Broader Impacts

All software developed will be released as open source.

Project Contact

Project Lead
Aaron Lee (ayl) 
Project Manager
Aaron Lee (ayl) 

Resource Requirements

Hardware Systems
  • alamo (Dell optiplex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • delta (GPU Cloud)
  • Not sure
  • I don't care (what I really need is a software environment and I don't care where it runs)
 
Use of FutureGrid

Long-term access to large computing capacity is needed to handle next-generation sequencing data.

Scale of Use

A few VMs for development and scaling experiments, then possibly more computing resources to process and analyze BRISK output data.

Project Timeline

Submitted
03/07/2013 - 15:35