Sequence alignment for Phylogenetic Tree Generation on Big Data Set
Project Information
- Discipline
- Computer Science (401)
- Subdiscipline
- 26.01 Biology, General
- Orientation
- Research
As sequence generation becomes faster, the computing power available to process these sequences must increase as well. In our dataset, we need to handle sequences at the million scale, cluster and visualize them, and find reference sequences in each cluster.
However, multiple sequence alignment (MSA) is still a challenge for us, as it can easily overwhelm traditional compute nodes. We need to test different alignment methods on high-performance computers, and possibly GPUs, to address the issues raised by MSA. After sequence alignment, it is possible to generate a phylogenetic tree with thousands of branches and visualize it in 3D.
We are running experiments to make multiple sequence alignment and visualization of thousands of sequences, with lengths from 1000 to 5000, possible.
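To give a rough sense of why alignment at this scale strains ordinary compute nodes, here is a minimal Python sketch of a global pairwise alignment score (Needleman-Wunsch with a linear gap penalty), together with a count of the sequence pairs in a dataset. It is illustrative only and is not the project's alignment method: the function names and the scoring values (match=1, mismatch=-1, gap=-2) are assumptions chosen for the example.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b (Needleman-Wunsch).

    The scoring values are illustrative assumptions, not project settings.
    Time is O(len(a) * len(b)); only two rows are kept in memory.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr
    return prev[-1]


def pairwise_count(n):
    """Number of distinct sequence pairs among n sequences."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    print(nw_score("ACGT", "AGT"))
    print(pairwise_count(1_000_000))
```

Each pairwise score costs time proportional to the product of the two sequence lengths, and the number of pairs grows quadratically with the number of sequences, so an all-pairs comparison at the million-sequence scale quickly exceeds what a single conventional node can handle.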
Broader Impacts
If we can successfully perform MSA on our dataset within an acceptable time on FutureGrid, we will be able to offer this as a service to biologists who need phylogenetic trees for their research.
Project Contact
- Project Lead
- Yang Ruan (yangruan)
- Project Manager
- Yang Ruan (yangruan)
- Project Members
- Saliya Ekanayake, Geoffrey Fox, Loran Saggu, Khaliq Satchell
Resource Requirements
- Hardware Systems
- hotel (IBM iDataPlex at U Chicago)
- india (IBM iDataPlex at IU)
- bravo (large memory machine at IU)
- delta (GPU Cloud)
Use FutureGrid for large-scale sequence alignment, including multiple sequence alignment and pairwise sequence alignment.
Scale of Use
A few nodes with faster CPUs and large memory
Project Timeline
- Submitted
- 09/28/2012 - 21:46