Workshops & Special Sessions
eScience for cheminformatics and drug discovery
SQMD: Architecture for Scalable, Distributed Database System built on Virtual Private Servers
Presenters and Authors
- Kangseok Kim
- Rajarshi Guha
- Marlon Pierce
Abstract
Many scientific fields routinely generate huge datasets. In many cases, these datasets are not static but grow rapidly in size. Handling such datasets, and supporting sophisticated queries against them, requires scalable distributed database systems that let scientists search the data efficiently. In this paper we present the architecture, implementation and performance analysis of a scalable, distributed database system built on software-based virtualization environments. The system architecture combines a software partitioning of the database based on data clustering, the SQMD (Single Query Multiple Database) mechanism, a web service interface, and virtualization software technologies. Through the SQMD mechanism, which is based on the publish/subscribe paradigm, the system provides uniform, concurrent access to the distributed databases. We highlight the scalability of our architecture by applying it to a database of 17 million chemical structures. In addition to simple identifier-based retrieval, we present performance results for shape similarity queries, which are extremely time-intensive with traditional architectures.
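The abstract describes SQMD only at a high level. As a rough illustration of the general pattern, assuming an in-process broker in place of real messaging middleware and a precomputed cluster label on each record, the sketch below splits records into partitions by cluster, subscribes each partition to a query topic, evaluates one published query concurrently on every partition, and merges the partial results. The names `Broker`, `Partition`, `partition_by_cluster`, and `sqmd_query` are hypothetical and are not part of the described system.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Record = Dict[str, object]
Query = Callable[[Record], bool]
Handler = Callable[[Query], List[Record]]


class Broker:
    """Minimal in-process publish/subscribe broker (hypothetical stand-in
    for a real messaging layer)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, query: Query) -> List[List[Record]]:
        handlers = self._subscribers.get(topic, [])
        # Deliver the same query to every subscriber concurrently;
        # pool.map returns the partial results in subscription order.
        with ThreadPoolExecutor(max_workers=max(1, len(handlers))) as pool:
            return list(pool.map(lambda h: h(query), handlers))


class Partition:
    """One database partition holding a subset of the records."""

    def __init__(self, name: str, records: List[Record]) -> None:
        self.name = name
        self.records = records

    def handle(self, query: Query) -> List[Record]:
        # Evaluate the query locally against this partition only.
        return [r for r in self.records if query(r)]


def partition_by_cluster(records: List[Record], key: str) -> List[Partition]:
    """Group records into partitions by an assumed, precomputed cluster label."""
    groups: Dict[object, List[Record]] = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    return [Partition(f"cluster-{label}", rs) for label, rs in groups.items()]


def sqmd_query(broker: Broker, topic: str, query: Query) -> List[Record]:
    """Single query, multiple databases: publish once, then merge all partial results."""
    partials = broker.publish(topic, query)
    return [record for part in partials for record in part]


if __name__ == "__main__":
    records = [
        {"id": 1, "cluster": 0},
        {"id": 2, "cluster": 0},
        {"id": 3, "cluster": 1},
        {"id": 4, "cluster": 1},
    ]
    broker = Broker()
    for part in partition_by_cluster(records, "cluster"):
        broker.subscribe("queries", part.handle)

    # An identifier-based lookup fanned out to both partitions.
    hits = sqmd_query(broker, "queries", lambda r: r["id"] in {2, 3})
    print(sorted(r["id"] for r in hits))  # [2, 3]
```

Because every partition receives the same query, adding partitions spreads the per-query work without changing the client code; a real deployment would replace the in-process broker with networked messaging and the list scans with database queries.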
Date and Time
Friday, December 12, 10–10:30 a.m.