Scaling-out CloudBLAST: Deploying Elastic MapReduce across Geographically Distributed Virtualized Resources for BLAST
Project Information
- Discipline
- Electrical and Related Engineering (106)
- Subdiscipline
- 14.09 Computer Engineering
- Orientation
- Research
This project proposes and evaluates an approach to the parallelization, deployment and management of embarrassingly parallel bioinformatics applications (e.g., BLAST) that integrates several emerging technologies for distributed computing. In particular, it evaluates scaling out applications on a geographically distributed system formed by resources from distinct cloud providers, which we refer to as a sky-computing system. Such environments are inherently disconnected and heterogeneous in performance, so efficiently scaling out applications, in terms of both management and performance, requires combining and extending several existing technologies.
Intellectual Merit
An end-to-end approach to sky computing is proposed, integrating several technologies and techniques:
- an Infrastructure-as-a-Service cloud toolkit (Nimbus) that creates virtual machines (VMs) on demand, with contextualization services that facilitate the formation and management of a logical cluster;
- a virtual network (ViNe) that connects VMs on private networks or behind firewalls;
- virtual networking (TinyViNe) that overcomes additional connectivity limitations imposed by cloud providers or middleware;
- a MapReduce framework (Hadoop) for parallel, fault-tolerant execution of unmodified applications;
- extensions to Hadoop that handle inputs such as those required by BLAST; and
- skewed task distribution to deal with resource imbalance.
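BLAST query files are typically in FASTA format, where each sequence is a header line starting with `>` followed by one or more sequence lines. A purely line-oriented split (as with Hadoop's default text input, which treats each line as a record) could cut a sequence in half, which is why BLAST inputs need special handling. The following is a minimal Python sketch, not the project's actual Hadoop extension: it assumes hypothetical helpers `parse_fasta_records` and `chunk_records` that group lines into whole records and then pack a fixed number of whole sequences into each chunk, each chunk standing in for the input of one map task.

```python
from typing import Iterable, List


def parse_fasta_records(lines: Iterable[str]) -> List[str]:
    """Group FASTA lines into whole records (header plus sequence lines)."""
    records: List[str] = []
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">") and current:
            # A new header closes the previous record.
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def chunk_records(records: List[str], per_chunk: int) -> List[List[str]]:
    """Hypothetical splitter: at most per_chunk whole sequences per chunk."""
    if per_chunk < 1:
        raise ValueError("per_chunk must be at least 1")
    return [records[i:i + per_chunk] for i in range(0, len(records), per_chunk)]


if __name__ == "__main__":
    fasta = """>seq1 example
ACGTACGT
ACGT
>seq2
GGGG
>seq3
TTTTCCCC
""".splitlines()
    recs = parse_fasta_records(fasta)
    for i, chunk in enumerate(chunk_records(recs, per_chunk=2)):
        # Print only the header of each record in the chunk.
        print(i, [r.split("\n")[0] for r in chunk])
```

Because splitting happens only at record boundaries, every map task would receive complete sequences and could pass them unmodified to the BLAST executable.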
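Skewed task distribution addresses the performance imbalance between sites: rather than giving every site the same number of tasks, faster sites receive proportionally more. The sketch below illustrates one simple way to do this and is an assumption, not necessarily the policy used in this project. The hypothetical `skewed_allocation` function splits a task count in proportion to per-site relative speeds (for example, obtained from a short benchmark) and uses the largest-remainder method so that the counts always sum to the total. Site names and speeds in the example are illustrative placeholders, not measured values.

```python
from typing import Dict


def skewed_allocation(num_tasks: int, speeds: Dict[str, float]) -> Dict[str, int]:
    """Split num_tasks among sites in proportion to their relative speeds.

    Uses the largest-remainder method so the counts always sum to num_tasks.
    """
    if num_tasks < 0:
        raise ValueError("num_tasks must be non-negative")
    if any(s < 0 for s in speeds.values()):
        raise ValueError("speeds must be non-negative")
    total_speed = sum(speeds.values())
    if total_speed <= 0:
        raise ValueError("at least one site must have a positive speed")
    # Ideal (fractional) share of tasks for each site.
    ideal = {site: num_tasks * s / total_speed for site, s in speeds.items()}
    # Start from the floor of each share.
    alloc = {site: int(share) for site, share in ideal.items()}
    leftover = num_tasks - sum(alloc.values())
    # Give the remaining tasks to sites with the largest fractional remainders.
    by_remainder = sorted(ideal, key=lambda site: ideal[site] - alloc[site], reverse=True)
    for site in by_remainder[:leftover]:
        alloc[site] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical relative speeds: site_b is twice as fast as the others.
    print(skewed_allocation(10, {"site_a": 1.0, "site_b": 2.0, "site_c": 1.0}))
    # {'site_a': 3, 'site_b': 5, 'site_c': 2}
```

A static proportional split like this assumes the relative speeds are known in advance. Hadoop's own scheduler also balances load dynamically, since idle nodes pull new tasks as they finish.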
Broader Impacts
The outcomes of this project are made available in the form of publications, demos, appliances, presentations and tutorials. This material can be transformative in two ways: it can accelerate future computer engineering developments by building on a proven, integrated cloud-based solution, and it can make complex systems usable by non-experts in bioinformatics by offering an end-to-end solution for running BLAST that does not require in-depth knowledge of the underlying cyberinfrastructure technologies.
Project Contact
- Project Lead
- Andrea Matsunaga (ammatsun)
- Project Manager
- Andrea Matsunaga (ammatsun)
- Project Members
- Mauricio Tsugawa
Resource Requirements
- Hardware Systems
- alamo (Dell optiplex at TACC)
- foxtrot (IBM iDataPlex at UF)
- hotel (IBM iDataPlex at U Chicago)
- sierra (IBM iDataPlex at SDSC)
Perform experiments to evaluate the proposed solution and develop a tutorial.
Scale of Use
Every available system, in blocks of a few days, for running large experiments (already provided). A few VMs for upgrading and maintaining the existing solution.
Project Timeline
- Submitted
- 05/14/2012 - 19:14