Sequence Alignment for Phylogenetic Tree Generation on Big Data Sets

Abstract

As sequence generation becomes faster, the computing power available to process these sequences must increase as well. In our dataset, we need to handle sequences at the scale of millions, cluster and visualize them, and find reference sequences in each cluster.
However, multiple sequence alignment (MSA) is still a challenge for us, as it can easily overwhelm traditional compute nodes. We need to test different alignment methods on high-performance computers, and possibly GPUs, to address the issues raised by MSA. After sequence alignment, it is possible to generate a phylogenetic tree with thousands of branches and visualize it in 3D.

Intellectual Merit

We are running experiments that make it possible to perform multiple sequence alignment on, and to visualize, thousands of sequences with lengths from 1000 to 5000.

Broader Impact

If we can successfully perform MSA on our dataset in an acceptable amount of time on FutureGrid, we will be able to provide such a service to biologists who need phylogenetic trees for their research.

Use of FutureGrid

Use FutureGrid for large-scale sequence alignment, including multiple sequence alignment and pairwise sequence alignment.
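
As a rough illustration of the pairwise case, the sketch below computes a global alignment score with the standard Needleman-Wunsch dynamic program. It is not the project's implementation: the function name and the scoring values (match, mismatch, gap) are assumptions chosen only for illustration. It keeps two rows of the score table at a time, so memory grows with the length of one sequence, while time still grows with the product of the two lengths.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions (linear gap penalty),
    not parameters taken from the project.
    """
    # prev[j] is the best score for aligning the current prefix of a
    # (initially the empty prefix) against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] against an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

For sequences of length 1000 to 5000, each pair needs on the order of millions of table cells, which is why running all pairs over a large dataset calls for many cores or GPUs.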

Scale Of Use

A few nodes with faster CPUs and large memory.

Publications


FG-271
Yang Ruan
Indiana University
Active

Project Members

Geoffrey Fox
Khaliq Satchell
Loran Saggu
Saliya Ekanayake

Timeline

18 weeks 3 days ago