Analyzing Large-scale Cancer Genomics Sequencing Data with Next Generation Sequencing (NGS) Data Analysis Tools in Hybrid Cloud
Project Information
- Discipline
- Biology (603)
- Orientation
- Research
This project will use FutureGrid resources to explore, test, and evaluate NGS data analysis tools for detecting and annotating all types of genetic variation (e.g., SNPs, indels, SVs, and CNVs) in large-scale cancer genomics sequencing data, using the MapReduce framework of the HugeSeq computational pipeline. The project will also use virtual machines (VMs) and hybrid cloud computing to develop a portable, stand-alone, hybrid-cloud-enabled VM software package, bundled with these NGS tools in the HugeSeq MapReduce framework. The package will let researchers in genomic medicine easily run computationally intensive NGS analyses in a hybrid cloud. This setup keeps sensitive data in a private cloud while providing the scalability, computational resources, and cost-effectiveness of the public cloud, which especially benefits researchers without extensive bioinformatics resources.
Broader Impacts
The portable hybrid-cloud-enabled VM package developed in this project will be made available:
(1) for download as open-source software through SourceForge;
(2) to researchers in medical genomics and the wider NGS data analysis community, for analyzing large-scale sequence data in a hybrid cloud to detect and annotate all types of genetic variation (SNPs, indels, SVs, and CNVs) in genomic sequences; and
(3) for education, training, and outreach on cloud-based NGS data analysis, through online tutorials, webinars, workshops, and free online classes open to everyone worldwide through Coursera (https://www.coursera.org/).
Project Contact
- Project Lead
- Linda McMahan (mcmahan)
- Project Manager
- Linda McMahan (mcmahan)
Resource Requirements
- Hardware Systems
- alamo (Dell OptiPlex at TACC)
- hotel (IBM iDataPlex at U Chicago)
- india (IBM iDataPlex at IU)
- sierra (IBM iDataPlex at SDSC)
I plan to store small-scale cancer genomics sequence data on FutureGrid. I will also create and test a VM there for applying NGS analysis tools with the MapReduce framework of the HugeSeq computational pipeline in a hybrid cloud. The VM will then be made available to researchers in cancer genomics, genomic medicine, and the wider NGS data analysis research community.
Scale of Use
A few VMs to test and process small-scale cancer genomics sequence data. VM running times will depend on the NGS analysis tools being tested in the HugeSeq MapReduce framework. If possible, I would like long-term use of the service. This would let me store small-scale sequence data and continue exploring, testing, and evaluating NGS data analysis tools for detecting and annotating all types of genetic variation in genomic sequence data.
Project Timeline
- Submitted
- 08/28/2012 - 14:47