Cost-Aware Cloud Computing

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

A significant driving force behind cloud computing is its potential for executing scientific applications. Traditional large-scale scientific computing applications are typically executed on locally accessible clusters, or possibly on national laboratory supercomputers. However, such machines are often oversubscribed, which causes long wait times (potentially weeks) just to start an application. Furthermore, this time increases along with both the number of requested processors and the amount of requested time. The key to scientific cloud computing is that the user can run a job immediately, albeit for a certain cost. Also important is that, conceptually, cloud computing can, if fully successful, allow sites to rid themselves of their local clusters, which carry a large total cost of ownership.

Traditionally, both computational and computer scientists use metrics like run time and throughput to evaluate high-performance applications. With the cloud, however, cost is additionally a critical factor in evaluating alternative application designs. Cloud computing installations generally provide bundled services, each at a different cost. Applications therefore must evaluate different sets of services from different cloud providers to find the lowest-cost alternative that satisfies their particular performance constraints.

In the particular case of iPlant, cost and performance are most certainly a factor. Specifically, iPlant has, as part of its funding, money to potentially spend on running jobs on Amazon EC2, the most popular cloud installation. This raises several questions: (1) Which iPlant applications will execute efficiently on the cloud? (2) What cloud configuration should be used? For example, Amazon sells a "quadruple extra large" virtual machine instance, which is powerful yet expensive; is that better than buying several small virtual machine instances? (3) How can these decisions be made without spending precious dollars executing applications on the cloud?

A specific example is iPlant's GLM code, which we are currently extending to execute on multiple nodes, each with a GPU for acceleration. While we have been granted compute hours on the TACC cluster, it is clear that the large data sets desired make this potentially an out-of-core application: the primary data set, consisting of millions of SNPs, will likely not fit in the aggregate memory even if we are able to obtain all TACC nodes. (And it is rather unlikely that we can obtain them all; our experiments on other supercomputers have shown that the wait time to get all nodes is essentially infinite.) GLM is likely an excellent application to run on the cloud; in fact, the data set may fit in the aggregate memory of the cloud nodes, though at a price.
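
To illustrate the kind of decision our techniques aim to automate, the following minimal sketch (in Python) compares candidate cloud configurations and selects the cheapest one that meets a performance constraint. The instance names, hourly rates, and predicted runtimes below are hypothetical placeholders, not actual Amazon EC2 offerings or measurements.

    from dataclasses import dataclass

    @dataclass
    class Config:
        name: str               # e.g., "8 x small" or "1 x quadruple extra large"
        hourly_rate: float      # total $/hour for all instances in the configuration
        predicted_hours: float  # predicted application runtime on this configuration

    def cheapest_feasible(configs, deadline_hours):
        """Return the lowest-cost configuration whose predicted runtime meets the deadline."""
        feasible = [c for c in configs if c.predicted_hours <= deadline_hours]
        if not feasible:
            return None
        return min(feasible, key=lambda c: c.hourly_rate * c.predicted_hours)

    if __name__ == "__main__":
        # Illustrative numbers only; real rates would come from the provider's
        # price list and real runtimes from performance prediction.
        candidates = [
            Config("8 x small", 8 * 0.10, predicted_hours=6.0),
            Config("1 x quadruple extra large", 2.00, predicted_hours=1.5),
        ]
        best = cheapest_feasible(candidates, deadline_hours=4.0)
        print(best.name if best else "no feasible configuration")

In practice, the hourly rates would come from the provider's published price list, and the predicted runtimes from a performance model of the application on each configuration, which is the capability this project seeks to develop.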

Intellectual Merit

The intellectual merit of the proposal will be in the design and implementation of techniques, both for iPlant and in general, to determine automatically what cloud resources to purchase for the most cost-effective solution.

Broader Impacts

The broader impact of our proposal is in developing tools and techniques that are broadly applicable to both the iPlant project and the general community. Our research agenda is focused on empowering application developers, especially those involved with iPlant, by reducing their costs without sacrificing performance. More generally, our work can lower the barrier to entry for a new generation of cloud applications. In addition, it may lead to cloud providers improving the way they bundle their services.

Project Contact

Project Lead
David Lowenthal (dlowenthal) 
Project Manager
David Lowenthal (dlowenthal) 
Project Members
Aniruddha Marathe, Matt Justice, Stephen Robinson  

Resource Requirements

Hardware System
  • Not sure
 
Use of FutureGrid

We want to predict the performance of scientific applications on the cloud, especially those of interest to the NSF iPlant project.

Scale of Use

Hundreds to thousands of dedicated machines.

Project Timeline

Submitted
12/06/2010 - 13:28