Sequence Alignment for Phylogenetic Tree Generation on Big Data Sets

Project Information

Discipline
Computer Science (401) 
Subdiscipline
26.01 Biology, General 
Orientation
Research 
Abstract

As sequence generation becomes faster, the computing power available to process these sequences must increase as well. For our dataset, we need to handle sequences at the million scale, cluster and visualize them, and find reference sequences in each cluster.
However, multiple sequence alignment (MSA) remains a challenge for us, as it can easily overwhelm traditional compute nodes. We need to test different alignment methods on high-performance computers, and possibly GPUs, to address the issues raised by MSA. After sequence alignment, it becomes possible to generate a phylogenetic tree with thousands of branches and visualize it in 3D.
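
To give a rough sense of why a million-scale dataset strains a single compute node, the sketch below counts the unordered sequence pairs and the memory needed to hold one distance value per pair. It assumes an all-pairs distance step stored as 4-byte floats; both the function name and that storage choice are illustrative assumptions, not a description of the project's actual pipeline.

```python
def pairwise_workload(num_sequences, bytes_per_distance=4):
    """Return (unordered pair count, bytes to store one distance per pair).

    Assumption: every pair of sequences is compared once and each result
    is kept as a fixed-size number (4 bytes by default).
    """
    pairs = num_sequences * (num_sequences - 1) // 2
    return pairs, pairs * bytes_per_distance


pairs, nbytes = pairwise_workload(1_000_000)
print(f"{pairs:,} pairs, about {nbytes / 1e12:.1f} TB of distances")
```

Under these assumptions the pair count grows quadratically with the number of sequences, which is why clustering and picking reference sequences per cluster helps keep the later alignment steps manageable.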

Intellectual Merit

We are running experiments that make it possible to perform multiple sequence alignment on, and visualize, thousands of sequences with lengths from 1000 to 5000.

Broader Impacts

If we can successfully run MSA on our dataset in an acceptable amount of time on FutureGrid, we will be able to offer this as a service to biologists who need phylogenetic trees for their research.

Project Contact

Project Lead
Yang Ruan (yangruan) 
Project Manager
Yang Ruan (yangruan) 
Project Members
Saliya Ekanayake, Geoffrey Fox, Loran Saggu, Khaliq Satchell  

Resource Requirements

Hardware Systems
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • bravo (large memory machine at IU)
  • delta (GPU Cloud)
 
Use of FutureGrid

Use FutureGrid for large-scale sequence alignment, including multiple sequence alignment and pairwise sequence alignment.
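
As an illustration of the pairwise alignment step named above, here is a minimal sketch of a global alignment score computed with Needleman-Wunsch style dynamic programming. The scoring values (match, mismatch, gap) and the function name are hypothetical choices for the example; the project does not specify which aligner or scoring scheme it uses.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score the best global alignment of strings a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)) while
    time stays O(len(a) * len(b)).
    """
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACT"))
```

For two sequences of length 5000, the table has about 25 million cells, so a single pair is cheap; the cost comes from repeating this over very many pairs, which is where additional nodes or GPUs may help.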

Scale of Use

A few nodes with faster CPUs and large memory.

Project Timeline

Submitted
09/28/2012 - 21:46