Biome representational in silico karyotyping

Abstract

Characterization of complex metagenomes remains a challenge for both biochemical techniques and bioinformatics. We have designed a novel modification of digital karyotyping, biome representational in silico karyotyping (BRISK), as a general technique for analyzing a defined representation of all DNA present in a sample. BRISK uses a Type IIB restriction enzyme, which cleaves on both sides of its recognition site, to excise a defined representation of 27-mer DNA tags from a sample. Massively parallel sequencing of this representation allows construction of high-resolution karyotypes and identification of multiple species within a biome. We propose to develop a distributed bioinformatics processing chain using Hadoop to perform complex analyses of microbiomes from the BRISK sequencing output.
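The tag-generation step can be illustrated with an in silico digestion. The following is a minimal Python sketch, assuming a Type IIB-like enzyme described by a degenerate recognition site plus a fixed number of bases retained on each side of it. The site `ACNNNNNCTCC` and the 9/7 offsets are placeholder values chosen only so that every tag is 27 bp long; they are not presented as the enzyme or parameters used in BRISK. The helper names (`extract_tags`, `unique_tag_hits`) are hypothetical. Both strands are scanned, because a non-palindromic site can occur on either one, and observed tags are attributed to a reference genome only when the tag is unique to that genome.

```python
import re
from collections import Counter

# Hypothetical Type IIB-like enzyme model (placeholder values, not BRISK's):
# a degenerate recognition site plus the bases kept on each side of it.
SITE = "ACNNNNNCTCC"   # 11 bp; N matches any base
UPSTREAM = 9           # bases kept 5' of the site
DOWNSTREAM = 7         # bases kept 3' of the site
TAG_LEN = UPSTREAM + len(SITE) + DOWNSTREAM  # 9 + 11 + 7 = 27

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_regex(site: str) -> re.Pattern:
    """Compile the site; the lookahead reports overlapping occurrences too."""
    body = site.replace("N", "[ACGT]")
    return re.compile(f"(?=({body}))")


def extract_tags(seq: str) -> list[str]:
    """Return every full-length tag excised from either strand of seq."""
    pattern = site_regex(SITE)
    tags = []
    for strand in (seq.upper(), reverse_complement(seq.upper())):
        for m in pattern.finditer(strand):
            start = m.start() - UPSTREAM
            end = m.start() + len(SITE) + DOWNSTREAM
            # Skip sites too close to a sequence end to yield a full tag.
            if start >= 0 and end <= len(strand):
                tags.append(strand[start:end])
    return tags


def unique_tag_hits(observed: Counter, references: dict[str, str]) -> Counter:
    """Sum observed tag counts per reference, using only tags unique to one reference."""
    owner: dict[str, set[str]] = {}
    for name, genome in references.items():
        for tag in extract_tags(genome):
            owner.setdefault(tag, set()).add(name)
    hits = Counter()
    for tag, n in observed.items():
        names = owner.get(tag, set())
        if len(names) == 1:
            hits[next(iter(names))] += n
    return hits
```

Digesting each reference genome and comparing it against a `Counter` of sequenced tags gives per-reference hit counts. Ignoring tags shared by several references is only a simple way to avoid ambiguous assignments in this sketch; a real analysis would likely need a more careful model of shared tags and sequencing errors.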

Intellectual Merit

Characterization of complex microbiomes using novel quantitative biochemical methods.

Development of new bioinformatic heuristics for handling the large data sets produced by next-generation sequencing platforms.
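Because sequencing a BRISK representation yields very large numbers of short, fixed-length reads, tag counting maps naturally onto MapReduce. The following is a minimal sketch, assuming the Hadoop Streaming convention of tab-separated key/value lines, with the framework sorting mapper output by key before the reducer sees it. The names `mapper`, `reducer`, and `run_local` are hypothetical, `run_local` only simulates the shuffle on a single machine for testing, and reads are assumed to be already trimmed to the 27-mer tag.

```python
from itertools import groupby
from typing import Iterable, Iterator

TAG_LEN = 27  # tag length stated for BRISK


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'tag<TAB>1' for each read that is a clean full-length tag."""
    for line in lines:
        read = line.strip().upper()
        if len(read) == TAG_LEN and set(read) <= set("ACGT"):
            yield f"{read}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per tag; assumes input is sorted by key, as Hadoop provides."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for tag, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{tag}\t{sum(int(count) for _, count in group)}"


def run_local(reads: Iterable[str]) -> dict[str, int]:
    """Simulate map -> sort (shuffle) -> reduce on one machine."""
    shuffled = sorted(mapper(reads))
    result = {}
    for line in reducer(shuffled):
        tag, count = line.split("\t")
        result[tag] = int(count)
    return result
```

In an actual Hadoop Streaming job, the mapper and reducer would each read lines from standard input and print their output lines; keeping them as plain generator functions lets the same logic be tested locally without a cluster.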

Broader Impact

All software developed will be released as open source.

Use of FutureGrid

Long-term access to large-scale computing capacity is needed to handle next-generation sequencing data.

Scale Of Use

A few VMs for development and scaling experiments, followed possibly by additional computing resources to process and analyze BRISK output data.

Publications


FG-315
Aaron Lee
Washington University in St. Louis, School of Medicine
Active

FutureGrid Experts

Saliya Ekanayake
