HyFlowTM

Project Information

Discipline
Computer Science (401) 
Subdiscipline
14.09 Computer Engineering 
Orientation
Research 
Abstract

The HyFlowTM project is developing distributed transactional memory (or DTM) as an alternative to lock-based distributed concurrency control. HyFlow is a framework for DTM, with pluggable support for policies for directory lookup, transactional synchronization and recovery, contention management, and cache coherence. HyFlow exports a distributed programming model that excludes locks: using (Java 5’s) annotations, a programmer can define atomic sections as transactions, in which reads and writes to shared, local and remote objects appear to take effect instantaneously. No changes are needed to the underlying virtual machine or compiler.

Intellectual Merit

We propose fault-tolerant distributed transactional memory. Distributed transactional memory (TM) promises to alleviate the problems of lock-based distributed concurrency control-- e.g., distributed deadlocks, livelocks, and lock convoying. Distributed TM exports a simple programming interface, which avoids locks. Fault-tolerance is essential to distributed TM to cope with node/network failures, but that must be achieved with scalability. Object replication is central to fault-tolerant D-STM. However, replication protocols must allow read concurrency and ensure transactional consistency and progress properties, while being message-efficient on metric-space networks (i.e., one in which communication cost between nodes form a metric). This is an open problem. Broadcasting -- classical replication primitive in database systems -- is inherently non-scalable in metric-space networks, as messages transmitted grow quadratically with the number of nodes. The project proposes a novel replicated distributed TM framework, whose key idea is to split a transaction into two independent phases with orthogonal responsibilities: 1) regular read/write phases, in which latest copies of required objects are quickly collected, in the presence of node/link failures, without any consistency guarantees, and 2) a request-commit phase, in which conflicts are detected and consistency is ensured by consulting the results collected from conflict resolution modules of other nodes. The project proposes quorum-based replication protocols, in which a transaction reads an object by reading object copies from a read quorum, and writes an object by writing copies to a write quorum. Additionally, the project proposes correctness criterion including weaker consistency models, and distributed conflict resolution protocols and cache coherence protocols. The proposed framework and protocols will be implemented in the HyFlow distributed TM Java package, which is open-source and freely distributed.

Broader Impacts

Project's broader impacts include: 1) integration of the research results into a graduate course taught at Blacksburg, VA and VT-MENA/Egypt; 2) dissemination of the project's results through an open-source implementation of the project's distributed TM infrastructure and publication of results at relevant conferences; and 3) increasing the cultural interaction between students and faculty in the US and those in the Middle East and North Africa region, by advisement of VT-MENA students and teaching an advanced course at VT-MENA/Egypt.

Project Contact

Project Lead
Roberto Palmieri (palmieri) 
Project Manager
Roberto Palmieri (palmieri) 
Project Members
Roberto Palmieri  

Resource Requirements

Hardware Systems
  • alamo (Dell optiplex at TACC)
  • foxtrot (IBM iDataPlex at UF)
  • hotel (IBM iDataPlex at U Chicago)
  • india (IBM iDataPlex at IU)
  • sierra (IBM iDataPlex at SDSC)
  • xray (Cray XM5 at IU)
  • bravo (large memory machine at IU)
  • Network Impairment Device
 
Use of FutureGrid

On of the main purposes of the HyFlowTM project is to build scalable protocols for Distributed Transactional Memories. Scalability in terms of the size of the system so we would use the future grid resources for running experiments using several machines (>8).

Scale of Use

The size of the system is composed by several VMs for not a long time (e.g., 20 VMs for 1 day)

Project Timeline

Submitted
02/26/2013 - 12:31