Cloud-TM

Abstract

Cloud Computing has emerged as a new paradigm for deploying, managing, and offering services through a shared infrastructure. Its foreseen benefits are very compelling from the perspectives of both cloud consumers and cloud service providers: freeing corporations from large IT capital investments via usage-based pricing schemes, leveraging economies of scale for both providers and users of the cloud, and easing the deployment of services. One of the main challenges in materializing these benefits is to identify innovative distributed programming models that simplify the development of Cloud-based services, allowing ordinary programmers to take full advantage of the seemingly unbounded amount of computational power and storage available on demand in large scale Cloud infrastructures. This project aims at designing, building, and evaluating a novel platform for the implementation of Cloud-based services: Cloud-TM (Cloud-Transactional Memory). Cloud-TM offers a simple and intuitive programming model for large scale distributed applications that integrates the familiar notion of atomic transaction as a first-class programming language construct, sparing programmers from the burden of implementing low-level, error-prone mechanisms (e.g. locking, persistence and fault-tolerance) and permitting major reductions in the time and cost of the development process. Cloud-TM will also embed a set of autonomic mechanisms to simplify service monitoring and administration, a major source of costs in dynamic and elastic environments such as the cloud. These mechanisms aim at ensuring user-defined Quality of Service levels at minimum operational cost, by automating the provisioning of resources from the cloud and by self-tuning the middleware platform for optimal efficiency in the utilization of resources.

Intellectual Merit

The research and development activities that will be carried out within this project relate to the following main areas:
- Programming paradigms for Cloud Computing
- (Distributed) Transactional Memories
- Persistence support
- Replicated and distributed databases

In the following, we highlight the advances beyond the state of the art that this project will bring about.

-- Programming Paradigms for the Cloud --

This project introduces a novel programming model explicitly aimed at simplifying the development of service-oriented applications deployed in Cloud Computing infrastructures.
To this end, the Cloud-TM paradigm tackles one of the main issues in the development of applications/services deployed in large scale distributed platforms: the design of highly scalable, fault-tolerant schemes for ensuring the consistency of the application’s shared state.
This is achieved by transparently extending the mainstream object-oriented programming model, exposing to programmers a simple, innovative Distributed Transactional Memory interface that hides the issues related to both concurrency and distribution.
Rather than forcing the adoption of unfamiliar programming approaches, as is the case, for instance, with MapReduce, the Cloud-TM paradigm aims for a truly general-purpose approach to the development of Cloud applications by integrating the transaction abstraction as a first-class construct of a widely adopted object-oriented programming language, such as Java.
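To give a concrete flavor of this programming model, the following minimal sketch shows how a money transfer could be expressed; the @Atomic annotation and its semantics are illustrative assumptions for exposition, not the platform's actual API:

    import java.lang.annotation.*;

    // Hypothetical annotation standing in for Cloud-TM's first-class
    // transaction construct.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Atomic {}

    public class Account {
        private long balance; // shared application state, replicated by the platform

        @Atomic // the runtime would execute this method as a distributed transaction
        public void transferTo(Account destination, long amount) {
            this.balance -= amount;
            destination.balance += amount;
            // No explicit locking, persistence or fault-tolerance code:
            // on a conflict the runtime transparently aborts and re-executes.
        }
    }

Under such a model, consistency of the shared state is delegated entirely to the runtime, which is what would allow the same code to run unchanged on a single node or across a cloud deployment.
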
Rather than relying on a conventional relational database to ensure ACID transactional properties, Cloud-TM will integrate novel, highly scalable and adaptive transactional mechanisms optimized for elastic, large scale computing infrastructures.
The development of these innovative mechanisms will also contribute to advancing the state of the art in the recent areas of (Distributed) Transactional Memories, which are discussed next.

-- Transactional Memories --

The Cloud-TM project aims at overcoming the limitations of current TMs in several complementary ways.
First, to ensure optimal performance in the face of varying workload conditions, the Cloud-TM platform will embed autonomic mechanisms to adapt its contention management strategies to the workload at hand.
Second, in order to meet the scalability and reliability requirements of real-world service-oriented applications, we aim at enriching the traditional STM model to transparently leverage the resources of large scale Cloud Computing platforms, hiding the complexities related to implementing consistent, scalable and fault-tolerant replication schemes.
Finally, to unleash the full power of the TM paradigm, Cloud-TM will integrate novel algorithms for parallel nesting capable of retaining the competitive performance of serial-nesting STMs, no matter how deep the nesting may be.
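
As a purely illustrative sketch of what parallel nesting offers the programmer (the Txn interface below is an assumption for exposition, not the project's API), a parent transaction can fork sub-transactions that run concurrently with one another yet commit atomically with their parent:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.Future;

    class ParallelNestingSketch {
        // Hypothetical handle to the enclosing transaction.
        interface Txn {
            // Runs 'work' as a nested transaction, in parallel with its siblings;
            // on commit, its write-set merges into the parent, not into shared state.
            <T> Future<T> forkNested(Callable<T> work);
        }

        static long totalBalance(Txn parent, List<Callable<Long>> partitions)
                throws Exception {
            List<Future<Long>> pending = new ArrayList<>();
            for (Callable<Long> partition : partitions) {
                pending.add(parent.forkNested(partition)); // siblings run concurrently
            }
            long total = 0;
            for (Future<Long> result : pending) {
                // A conflicting sibling is aborted and retried in isolation,
                // without rolling back the whole parent transaction.
                total += result.get();
            }
            return total;
        }
    }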

-- Distributed Transactional Memories --

The project will design, develop and evaluate a DSTM platform that:
I. Will integrate innovative data distribution, replication, and caching mechanisms aimed at minimizing the costs associated with remote network accesses. This will be done in a manner transparent to programmers, based on run-time monitoring and prediction of the applications' data access patterns. The result will be a significant simplification, and cost reduction, of the development process, as programmers will be able to focus on implementing the real differentiating value of their services, rather than crafting complex, low-level caching schemes.
II. Will embed autonomic mechanisms for automatically scaling up and down the number of resources acquired from the cloud, so as to match predefined QoS criteria at minimal operational cost. To maximize efficiency while the number of nodes varies elastically to accommodate traffic oscillations, the DSTM platform will transparently adapt the policies it adopts for local and global contention management, fault-tolerance, data partitioning, and caching.
Achieving this result will advance the state of the art not only in the area of performance evaluation, through novel models able to capture the complex dynamics of DSTM systems. It will also lead to advances in the area of adaptive and self-stabilizing systems, through the definition of novel mechanisms that allow safe and efficient transitions between alternative implementations of fundamental services (such as caching and fault-tolerance) that are widely employed in other contexts as well.
III. Will include optimized interfaces towards highly scalable object-oriented persistent storage systems specifically designed to operate in Cloud Computing environments, such as Google's BigTable and Amazon's SimpleDB. Since, unlike in database environments, durability is only an optional requirement in a DSTM-based approach, the Cloud-TM platform will provide simple declarative mechanisms (e.g. at the programming language level or via external configuration files) to identify the portions of the application state that have to be transparently persisted, as sketched below.
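
A minimal sketch of what such a language-level mechanism might look like follows; the @Persistent annotation is a hypothetical illustration, since the concrete mechanism (annotations versus configuration files) is a design choice left to the platform:

    import java.lang.annotation.*;
    import java.util.List;

    // Hypothetical marker: fields carrying it would be transparently made
    // durable by the platform; unmarked fields live only in the in-memory,
    // replicated transactional state.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Persistent {}

    class ShoppingCart {
        @Persistent
        private List<String> completedOrders;   // must survive full restarts: persisted

        private List<String> currentSelection;  // session-only: kept in memory
    }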

-- Persistence Support --

Current support for persistence in cloud computing environments focuses on high availability of resources, on-demand scalability, flexibility of the persistent representation, and simplifying the distribution of resources across the cloud. On the other hand, these systems are not yet programmer-friendly: they force application designs to fit specific API limitations and, in some respects, represent a step back from some of the best object-oriented programming practices. They also impose an impedance mismatch, as the programmer must explicitly code the persistence of non-transient data, which implies a cost in development time. Last but not least, existing persistence mechanisms either cannot be composed with transactions, or support only the persistence of data local to a single node in the context of a transaction.

To address some of the limitations identified above, we propose a solution where the characteristics of the non-transient data of the application domain model are specified in a declarative way. This is achieved through the use of a Domain Modeling Language (DML) [Cachopo and Silva, 2006]. The DML is used to specify just the structural part of the persistent entities of the application domain model, in a succinct and declarative Java-like syntax. It is easy to use and allows the programmer to add specific behavior to the persistent domain entities. A compiler then receives the DML specification and automatically generates the code responsible for supporting the persistence of the non-transient entities. This way, the programmer does not have to write a single line of code concerning the persistence of data. Moreover, we will be able to automatically produce persistence code for the non-transient data that is best suited to the specific requirements of the application.
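For illustration, a DML specification for a tiny domain model might look roughly as follows; the syntax shown is indicative of the DML described in [Cachopo and Silva, 2006], and exact details may differ:

    class Author {
        String name;
    }

    class Book {
        String title;
        int year;
    }

    relation AuthorHasBooks {
        Author playsRole author { multiplicity *; }
        Book playsRole book { multiplicity *; }
    }

From such a specification, the compiler generates the Java classes implementing these entities and their relations, together with all the persistence-related code, which the programmer can then extend with domain-specific behavior.
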
The proposed persistence component will take advantage of the local disk storage of each node for storing persistent entities, so that no resources are wasted. It will support the persistence of non-transient data accessed in the context of a transaction that spans multiple nodes of the cloud.
Finally, we aim at implementing a persistence mechanism that is self-tuning and auto-scaling, so as to automatically choose the best algorithms and optimize the number of resources necessary to support the current workload of the application.

-- Replicated and Distributed Databases --

The literature on replicated databases certainly represents a source of inspiration for the design of replication schemes for TMs.
However, as also highlighted by our work in [Romano et al., 2008], typical workloads of TMs and databases exhibit at least two key differences that make it highly desirable to develop replication schemes specifically tailored to the requirements of TMs. On the one hand, the transaction execution time in STM-based systems is often several orders of magnitude smaller than in database applications. This leads to a corresponding amplification of the overhead of Atomic-Broadcast-based (AB-based) synchronization with respect to conventional database replication, highlighting the need for novel replication schemes explicitly optimized for STM environments. On the other hand, the workloads of standard STM benchmarks are much more heterogeneous than those of traditional databases [Romano et al., 2008].
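To gauge the magnitude of this effect with purely illustrative figures: if an STM transaction executes locally in 50 microseconds while an Atomic Broadcast takes 1 millisecond to deliver, the coordination step alone costs 20 times the transaction's execution time; a database transaction lasting 100 milliseconds would instead perceive the same broadcast as a 1% overhead.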
This motivates the need for novel, highly efficient replication schemes specifically designed to match the unique requirements of DSTMs. This is indeed an important goal of this project, and one that will advance more than just the state of the art on DSTMs: we expect that the innovative replication mechanisms developed within the project will prove attractive also in the more traditional context of database replication.

Broader Impact

The innovative technologies that will be developed by this project aim at overcoming some of the major roadblocks to the adoption of the Cloud Computing model, namely:
• the lack of programming paradigms that simplify the development of service-oriented applications (such as e-Commerce, online games, or social networks) to be deployed in the Cloud. The Cloud-TM platform aims at achieving very high scalability levels, and at strongly reducing development costs and time-to-market by freeing programmers from the burden of implementing low-level, error-prone mechanisms for, e.g., caching, replication and fault-tolerance.
• the lack of autonomic mechanisms simplifying service monitoring and administration, a major source of costs in dynamic and elastic environments such as the Cloud. By self-optimizing the system depending on the current workload characteristics, Cloud-TM will strongly reduce not only the administration costs, but also the operational costs, by ensuring that the resources acquired from the Cloud infrastructure are never in excess and always optimally utilized.

In light of the above observations, this project aims at:

• Deep technological advances in software/service engineering: new software technologies that improve the scalability and predictability of distributed systems, as well as their responsiveness and throughput, and a more competitive environment in which infrastructure operators move up the value chain with innovative service offerings on scalable infrastructures.

• Lowered barriers for service providers, in particular SMEs, to develop services through standardised open (source) platforms and interfaces

• A strengthened industry for software, software services and Web services, offering a greater number of more reliable and affordable services, enabled by flexible and resilient platforms for software/service engineering, design, development, management and interoperability, with technologies tailored to meet key societal and economic needs

• Lower management costs of large networked systems, through the ability to adapt to changing environments and patterns of use, and through a greater degree of flexibility and reliability.

• More efficient use of resources such as processing power, energy and bandwidth, through autonomic decisions based on awareness of the system's state and workload

Use of FutureGrid

Several partners have small scale clusters available at their premises that can be used for development, performance debugging, and preliminary experimentation. For instance, INESC-ID has a cluster with around 20 physical multi-core nodes (8 cores per node), while CINI has a smaller cluster of three highly parallel nodes (32 cores per node).

On the other hand, one of the key objectives of the project is to achieve extremely high scalability levels, namely hundreds or thousands of nodes. FutureGrid will therefore represent an ideal test-bed for performance and scalability studies of the Cloud-TM middleware on a very large scale platform.

Scale Of Use

We plan two main types of experiments:
- large scale performance/scalability evaluation studies, involving from 10 up to 100-200 nodes in a single data center
- inter-data-center performance/scalability evaluation studies, involving 2-3 data centers, each using 10-20 nodes

Each experimental session would typically run for one or two hours.

Publications


2012

When Scalability Meets Consistency: Genuine Multiversion Update Serializable Partial Data Replication, Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, and Luis Rodrigues, 32nd International Conference on Distributed Computing Systems (ICDCS'12)

Automated Workload Characterization in Cloud-based Transactional Data Grids, Diego Didona, Pierangelo Di Sanzo, Roberto Palmieri, Sebastiano Peluso, Francesco Quaglia and Paolo Romano, 17th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems (DPDNS'12)

Elastic, scalable and self-tuning data replication in the Cloud-TM platform, Paolo Romano, Proceedings of the 1st European Workshop on Dependable Cloud Computing (EWDCC'12)

Self-tuning Batching in Total Order Broadcast Protocols via Analytical Modelling and Reinforcement Learning, Paolo Romano and M. Leonetti, IEEE International Conference on Computing, Networking and Communications, Network Algorithm & Performance Evaluation Symposium (ICNC'12), Jan. 2012

2011

Self-optimizing transactional data grids for elastic cloud environments, P. Romano, CloudViews 2011

Boosting STM Replication via Speculation, P. Romano, R. Palmieri, F. Quaglia, L. Rodrigues, 3rd Workshop on the Theory of Transactional Memory

Data Access Pattern Analysis and Prediction for Object-Oriented Applications, S. Garbatov, J. Cachopo, INFOCOMP Journal of Computer Science, December 2011

Software Cache Eviction Policy based on Stochastic Approach, S. Garbatov, J. Cachopo, The Sixth International Conference on Software Engineering Advances (ICSEA 2011), October 2011

Optimal Functionality and Domain Data Clustering based on Latent Dirichlet Allocation, S. Garbatov, J. Cachopo, The Sixth International Conference on Software Engineering Advances (ICSEA 2011), October 2011

Strict serializability is harmless: a new architecture for enterprise applications, S. Fernandes, J. Cachopo, Proceedings of the ACM International Conference on Object-Oriented Programming Systems Languages and Applications Companion

Towards a simple programming model in Cloud Computing platforms, J. Martins, J. Pereira, S.M. Fernandes, J. Cachopo, First International Symposium on Network Cloud Computing and Applications (NCCA 2011)

On Preserving Domain Consistency for an Evolving Application, J. Neves, J. Cachopo, Terceiro Simpósio de Informática, September 2011

Oludap, an AI approach to web gaming in the Cloud, V. Ziparo, Open World Forum 2011, September 2011

Towards Autonomic Transactional Replication for Cloud Environments, M. Couceiro, P. Romano, L. Rodrigues, European Research Activities in Cloud Computing

SPECULA: um Protocolo de Replicação Preditiva para Memória Transaccional por Software Distribuída [SPECULA: a Predictive Replication Protocol for Distributed Software Transactional Memory], J. Fernandes, P. Romano, L. Rodrigues, Simpósio de Informática, Universidade de Coimbra (INFORUM 2011)

Replicação Parcial em Sistemas de Memória Transaccional [Partial Replication in Transactional Memory Systems], P. Ruivo, P. Romano, L. Rodrigues, Simpósio de Informática, Universidade de Coimbra (INFORUM 2011)

Integrated Monitoring of Infrastructures and Applications in Cloud Environments, R. Palmieri, P. Di Sanzo, F. Quaglia, P. Romano, S. Peluso, D. Didona, Workshop on Cloud Computing: Projects and Initiatives (CCPI 2011)

PolyCert: Polymorphic Self-Optimizing Replication for In-Memory Transactional Grids, M. Couceiro, P. Romano and L. Rodrigues, ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011)

Exploiting Total Order Multicast in Weakly Consistent Transactional Caches, P. Ruivo, M. Couceiro, P. Romano and L. Rodrigues, Proc. IEEE 17th Pacific Rim International Symposium on Dependable Computing (PRDC'11)

Tutorial on Distributed Transactional Memories, M. Couceiro, P. Romano and L. Rodrigues, 2011 International Conference on High Performance Computing & Simulation, July 2011

Keynote Talk: Autonomic mechanisms for transactional replication in elastic cloud environments, P. Romano, 2nd Workshop on Software Services (WOSS), Timisoara, Romania, June 2011

Self-tuning Batching in Total Order Broadcast Protocols via Analytical Modelling and Reinforcement Learning, P. Romano and M. Leonetti, ACM Performance Evaluation Review, to appear (also presented as a poster at the IFIP Performance 2011 Symposium)

On the Analytical Modeling of Concurrency Control Algorithms for Software Transactional Memories: the Case of Commit-Time-Locking, P. Di Sanzo, B. Ciciani, F. Quaglia, R. Palmieri and P. Romano, Elsevier Performance Evaluation Journal (to appear)

OSARE: Opportunistic Speculation in Actively REplicated Transactional Systems, R. Palmieri, F. Quaglia and P. Romano, The 30th IEEE Symposium on Reliable Distributed Systems (SRDS 2011), Madrid, Spain, to appear

A Generic Framework for Replicated Software Transactional Memories, N. Carvalho, P. Romano and L. Rodrigues, Proceedings of the 9th IEEE International Symposium on Network Computing and Applications (NCA), Cambridge, Massachusetts, USA, IEEE Computer Society Press, August 2011


SCert: Speculative Certification in Replicated Software Transactional Memories, N. Carvalho, P. Romano and L. Rodrigues, Proceedings of the 4th Annual International Systems and Storage Conference (SYSTOR 2011), Haifa, Israel, June 2011

2010

Asynchronous Lease-based Replication of Software Transactional Memory, N. Carvalho, P. Romano and L. Rodrigues, Proceedings of the ACM/IFIP/USENIX 11th Middleware Conference (Middleware), Bangalore, India, ACM Press, November 2010

Analytical Modeling of Commit-Time-Locking Algorithms for Software Transactional Memories, P. Di Sanzo, B. Ciciani, F. Quaglia, R. Palmieri and P. Romano, Proceedings of the 35th International Computer Measurement Group Conference (CMG), Orlando, Florida, Computer Measurement Group, December 2010 (also presented at the 1st Workshop on "Informatica Quantitativa" (InfQ), Pisa, July 2010)

Do we really need parallel programming or should we strive for parallelizable programming instead?, J. Cachopo, SPLASH 2010 Workshop on Concurrency for the Application Programmer, October 2010

A Machine Learning Approach to Performance Prediction of Total Order Broadcast Protocols, M. Couceiro, P. Romano and L. Rodrigues, Proceedings of the 4th IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO), Budapest, Hungary, IEEE Computer Society Press, September 2010

An Optimal Speculative Transactional Replication Protocol, P. Romano, R. Palmieri, F. Quaglia, N. Carvalho and L. Rodrigues, Proceedings of the 8th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), Taipei, Taiwan, IEEE Computer Society Press, September 2010

FG-172
Paolo Romano
INESC ID, Lisbon
Active

Project Members

Diego Didona
Fabio Cottefoglie
Francesco Maria Molfese
Hugo Pimentel
Maria Couceiro
Nuno Diegues
Pedro Ruivo
Sebastiano Peluso
Vittorio Amos Ziparo

FutureGrid Experts

Tak-Lon Wu
Zhenhua Guo