Sequence Alignment for Phylogenetic Tree Generation on Big Data Sets
Abstract
As sequence generation becomes faster, the computing power available to process these sequences must increase as well. For our dataset, we need to handle sequences at the scale of millions, cluster and visualize them, and find reference sequences in each cluster.
However, multiple sequence alignment (MSA) is still a challenge for us, as it can easily overwhelm traditional compute nodes. We need to test different alignment methods on high-performance computers, and possibly GPUs, to address the issues raised by MSA. After sequence alignment, it becomes possible to generate a phylogenetic tree with thousands of branches and visualize it in 3D.
Intellectual Merit
We are conducting experiments that make it possible to perform multiple sequence alignment on, and to visualize, thousands of sequences with lengths from 1000 to 5000.
Broader Impact
If we can successfully perform MSA on our dataset in an acceptable amount of time on FutureGrid, we will be able to offer this as a service to biologists who need phylogenetic trees for their research.
Use of FutureGrid
We will use FutureGrid for large-scale sequence alignment, including both multiple sequence alignment and pairwise sequence alignment.
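To illustrate the pairwise case, here is a minimal sketch of a global alignment score computed with the Needleman-Wunsch dynamic program. It is not the project's actual pipeline: the function name and the match, mismatch, and gap values are assumptions chosen only for illustration. The cost grows with the product of the two sequence lengths, which is why aligning many long sequences quickly becomes expensive.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not values from the project.
    Only two rows of the dynamic-programming table are kept, so memory use
    is proportional to len(b) while time is proportional to len(a) * len(b).
    """
    # Row 0: aligning an empty prefix of a against each prefix of b (all gaps).
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against an empty prefix of b (all gaps).
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1], as a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Or place a gap in b (up) or a gap in a (left).
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Computing such a score for every pair in a set of n sequences requires n(n-1)/2 alignments, so the work parallelizes naturally across nodes, but the total still grows quadratically with the number of sequences.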
Scale Of Use
A few nodes with fast CPUs and large memory.