FG-52

Cost-Aware Cloud Computing

A comparative study of high-performance computing on the cloud

Marathe, A., R. Harris, D. K. Lowenthal, B. R. de Supinski, B. Rountree, M. Schulz, and X. Yuan, "A comparative study of high-performance computing on the cloud", Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '13), New York, NY, USA, ACM, pp. 239–250, 2013.

Project Details

Project Lead
David Lowenthal 
Project Manager
David Lowenthal 
Project Members
Aniruddha Marathe, Matt Justice, Stephen Robinson  
Supporting Experts
Saliya Ekanayake  
Institution
University of Arizona, Dept. of Computer Science  
Discipline
Computer Science (401) 

Abstract

A significant driving force behind cloud computing is its potential for executing scientific applications. Traditional large-scale scientific computing applications are typically executed on locally accessible clusters, or possibly on national laboratory supercomputers. However, such machines are often oversubscribed, which causes long wait times (potentially weeks) just to start an application. Furthermore, this time increases along with both the number of requested processors and the amount of requested time. The key to scientific cloud computing is that the user can run a job immediately, albeit for a certain cost. Also important is that, conceptually, cloud computing can, if fully successful, allow sites to rid themselves of their local clusters, which have a large total cost of ownership.

Traditionally, both computational and computer scientists use metrics like run time and throughput to evaluate high-performance applications. However, with the cloud, cost is additionally a critical factor in evaluating alternative application designs. Cloud computing installations generally provide bundled services, each at a different cost. Applications therefore must evaluate different sets of services from different cloud providers to find the lowest-cost alternative that satisfies their particular performance constraints.

In the particular case of iPlant, cost and performance are most certainly factors. In particular, iPlant has, as part of its funding, money to potentially spend on running jobs on Amazon EC2, the most popular cloud installation. This raises several questions: (1) Which iPlant applications will execute efficiently on the cloud? (2) What cloud configuration should be used? For example, Amazon sells a "quadruple extra large" virtual machine instance, which is powerful yet expensive. Is that better than buying several small virtual machine instances? (3) How can these decisions be made without spending precious dollars executing applications on the cloud?

A specific example is iPlant's GLM code, which we are currently extending to execute on multiple nodes, each with a GPU for acceleration. While we have been granted compute hours on the TACC cluster, it is clear that the large data sets desired make this potentially an out-of-core application: the primary data set, consisting of millions of SNPs, will likely not fit in the aggregate memory even if we are able to obtain all TACC nodes. (And it is rather unlikely that we can obtain them all; our experiments on other supercomputers have shown that the wait time to get all nodes is essentially infinite.) GLM is likely an excellent application to run on the cloud; in fact, the data set may fit in the aggregate memory of the cloud nodes, at a price.
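
To make question (2) concrete, the sketch below compares candidate configurations (one large instance versus several small ones) against a runtime deadline and picks the cheapest that meets it. The instance names, hourly prices, and runtimes are hypothetical placeholders for illustration, not measured Amazon EC2 values.

    # Illustrative sketch: choose the cheapest instance configuration that
    # meets a runtime deadline.  All names, prices, and runtimes below are
    # hypothetical placeholders, not actual Amazon EC2 figures.

    import math

    # (description, instance count, price per instance-hour in $, estimated runtime in hours)
    candidates = [
        ("1 x quadruple extra large", 1, 2.00, 3.0),
        ("8 x small",                 8, 0.10, 5.0),
        ("16 x small",               16, 0.10, 3.5),
    ]

    DEADLINE_HOURS = 4.0

    def total_cost(instances, price_per_hour, runtime_hours):
        # Clouds typically bill per instance-hour, so cost scales with both
        # the instance count and the (rounded-up) runtime.
        return instances * price_per_hour * math.ceil(runtime_hours)

    feasible = [(name, total_cost(n, p, t))
                for name, n, p, t in candidates
                if t <= DEADLINE_HOURS]

    if feasible:
        name, cost = min(feasible, key=lambda pair: pair[1])
        print("cheapest configuration meeting the deadline: %s ($%.2f)" % (name, cost))
    else:
        print("no configuration meets the deadline")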

Intellectual Merit

The intellectual merit of the proposal will be in the design and implementation of techniques, both for iPlant and in general, to determine automatically what cloud resources to purchase for the most cost-effective solution.

Broader Impacts

The broader impact of our proposal is in developing tools and techniques that are broadly applicable to both the iPlant project and the general community. Our research agenda is focused on empowering application developers, especially those involved with iPlant, by reducing their cost without sacrificing performance. More generally, our work can have the effect of lowering the barrier to entry of a new generation of cloud applications. In addition, it may lead to cloud providers improving the way they bundle their services.

Scale of Use

Hundreds to thousands of dedicated machines.

Results

Abstract:

Minimizing operational cost while improving application execution times
and maximizing resource usage is a key research topic in cloud computing.
Different virtual machine configurations, their associated costs, and varying
input sizes make it challenging for the user to maximize resource usage while
minimizing total cost. In this project, we attempt to maximize resource
usage by finding the largest possible input size subject to user
constraints on execution time and operational cost.
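
One way to read this objective is as a search over input sizes: grow the input until either the predicted execution time or the predicted cost would violate the user's constraints. The sketch below illustrates that idea; the linear performance model, instance count, and hourly price are stand-in assumptions, not the project's actual model or prices.

    # Illustrative sketch: find the largest input size whose predicted
    # runtime and cost stay within user constraints.  The linear performance
    # model and the price below are stand-ins, not measured values.

    import math

    PRICE_PER_INSTANCE_HOUR = 0.10   # hypothetical price
    NUM_INSTANCES = 16               # hypothetical VM count

    def predict_runtime_hours(input_size):
        # Placeholder model: runtime grows linearly with input size.
        return 0.001 * input_size

    def predict_cost(input_size):
        hours = math.ceil(predict_runtime_hours(input_size))
        return NUM_INSTANCES * PRICE_PER_INSTANCE_HOUR * hours

    def largest_feasible_input(max_hours, max_cost, sizes):
        # Scan candidate sizes from largest to smallest and return the
        # first one that satisfies both constraints.
        for size in sorted(sizes, reverse=True):
            if predict_runtime_hours(size) <= max_hours and predict_cost(size) <= max_cost:
                return size
        return None

    print(largest_feasible_input(max_hours=10.0, max_cost=50.0,
                                 sizes=range(1000, 20001, 1000)))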

Work done:

As Amazon EC2 is our commercial target platform, we devised corresponding
VM specifications. To understand system characteristics, we wrote our own
synthetic benchmarks. The following are the benchmarks we ran on FutureGrid:

- Pingpong (latency/bandwidth) tests (a minimal sketch follows this list)
- Compute-bound application tests, which we use in both strong and weak
scaling modes
- Memory access tests
- Scalability tests with NAS, ASC Purple, and synthetic benchmarks
on larger numbers of cores (both intra- and inter-VM)
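
As a reference point for the pingpong item above, the block below is a minimal two-process latency/bandwidth measurement written with mpi4py; it illustrates the pattern of such a synthetic benchmark and is not the project's actual benchmark code.

    # Minimal two-process pingpong latency/bandwidth sketch (mpi4py).
    # Run with:  mpirun -np 2 python pingpong.py

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    REPS = 1000
    msg_bytes = 1 << 20                      # 1 MiB message
    buf = np.zeros(msg_bytes, dtype=np.uint8)

    comm.Barrier()
    start = MPI.Wtime()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    elapsed = MPI.Wtime() - start

    if rank == 0:
        round_trip = elapsed / REPS
        print("avg round-trip latency: %.6f s" % round_trip)
        # Two messages of msg_bytes each cross the link per round trip.
        print("bandwidth: %.2f MB/s" % (2 * msg_bytes / round_trip / 1e6))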

Achievements/Results:

- We executed and studied benchmarks at different sites within FutureGrid.

- We used the Eucalyptus and Nimbus clients extensively to develop and
test a set of scripts intended for use with Amazon EC2 (see the sketch
after this list). This was possible because of the compatibility between
the EC2 and Eucalyptus APIs.
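
The EC2/Eucalyptus compatibility mentioned above means the same client code can target either service simply by switching the endpoint. The snippet below sketches that idea with the modern boto3 library (the project itself used the Eucalyptus and Nimbus client tools); the endpoint URL, credentials, image ID, and instance type are placeholders.

    # Sketch of an EC2-compatible launch script using boto3; a modern
    # stand-in for the Eucalyptus/Nimbus client scripts described above.
    # Endpoint, credentials, image ID, and instance type are placeholders.

    import boto3

    ec2 = boto3.client(
        "ec2",
        endpoint_url="https://cloud.example.org:8773/services/compute",  # Eucalyptus-style endpoint (placeholder)
        region_name="us-east-1",
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    # Launch one instance of a chosen VM type; the same call works against
    # Amazon EC2 if the endpoint_url argument is dropped.
    resp = ec2.run_instances(
        ImageId="emi-00000000",      # placeholder image ID
        InstanceType="m1.small",     # placeholder instance type
        MinCount=1,
        MaxCount=1,
    )
    print(resp["Instances"][0]["InstanceId"])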

Overall, based on all of this work, we have launched a project to develop a cloud service that automatically chooses the most cost-effective cloud instance for a scientific application. FutureGrid has been extremely valuable to our research.
