DRAFT: This document is not yet finalized. Please do not quote.

Java Grande Forum

Report of the Concurrency & Application Working Group

September 28, 1998

Preface

By Dennis Gannon and George Thiruvathukal

The primary concern of the Java Grande Forum (hereafter JGF) is to ensure that the Java language, libraries and virtual machine can become the implementation vehicle of choice for future scientific and engineering applications. The first step in meeting this goal is to implement the complex and numerics proposals described in the report of the Numerics Working Group. Accomplishing this task provides the essential language semantics needed to write high quality scientific software. However, more will be required of the Java class libraries and runtime environment if we wish to capitalize on these language changes. The Java Grande Forum Applications & Concurrency Working Group (hereafter ACG) focuses on these issues.

It is possible that many of the needed improvements will be driven by commercial sector efforts to build server side enterprise applications. Indeed, the requirements of technical computing overlap with those of large enterprise applications in many ways. For example, both technical and enterprise computing applications can be very large and they will stress the memory management of the VM. The demand for very high throughput on network and I/O services is similar for both. Many of the features of the Enterprise Bean model will be of great importance to technical computing.

But there are also areas where technical computing is significantly different from Enterprise applications. For example, fine-grained concurrency performance is substantially more critical in technical computing, where a single computation may require 10,000 threads that synchronize in frequent, regular patterns. These computations would need to run on desktops as well as very large, shared memory multiprocessors. In technical applications, the same data may be accessed again and again, while in enterprise computing there is a great emphasis on transactions involving different data each time. Consequently, memory locality optimization may be more important for Grande applications than it is elsewhere in the Java world. Some technical applications will require the ability to link together multiple VMs concurrently executing on a dedicated cluster of processors which communicate through special high performance switches. On such a system, specialized, ultra-low-latency versions of the RMI protocol would be necessary. (In such an environment, an interface to shared memory, via RMI or the VM, would also be desirable.)

It is important to observe that there are problems which can be described as technical computing today which will become part of the enterprise applications of the future. For example, image analysis and computer vision are closely tied to applications of data mining. The processing and control of data from arrays of sensors has important applications in manufacturing and medicine. The large scale simulation of non-linear mathematical systems is already finding its way into financial and marketing models. It is also the case that many technical computing applications impact our day-to-day lives, such as aircraft simulation (the recent design of the Boeing 777) and weather forecasting. At least in the case of aircraft design, the industry has a valuation in the billions of dollars, which means it is far from being merely a niche area of limited interest.

This document is part of a larger report being written by the Java Grande Forum members. There are two major sections of this report: Numerics and Concurrency/Applications. For all practical purposes, each of these documents is self-contained and thus can be read separately.
 

Organization

This section of the Java Grande Report pertains to Concurrency/Applications. It is organized as follows:
  1. critical JDK issues
  2. highest priority issues, mostly related to Remote Method Invocation
  3. benchmarks
  4. seamless computing
  5. other parallel and distributed computing issues
In this report, we present preliminary findings of the working group. We welcome a continuing discussion of these issues. Please send questions or comments to javagrandeforum@npac.syr.edu.


I. Critical JDK Issues

By Michael Philippsen and George Thiruvathukal

Sequential VM performance is of utmost importance for developing Grande applications. Since there are many groups working on this issue, the ACG simply provides some additional kernel benchmarks illustrating performance aspects in areas that are particularly important for Grande applications.

In addition to sequential VM performance, Grande applications require high performance for parallel and distributed computing. Although more research is needed on other paradigms that might be better suited to parallelism in Java, this report will focus on RMI (Java's remote method invocation mechanism), since there is widespread agreement on both the general usefulness and the deficiencies of RMI.

In general, RMI allows objects running in different JVMs to collaborate. The current RMI is specifically designed for peer-to-peer client/server applications that communicate over the commodity Internet or over Ethernet. For high performance scientific applications, however, some of RMI's design goals and implementation decisions are inappropriate and cause serious performance limitations. This is especially troublesome on the platforms targeted by the Java Grande community, i.e., closely connected environments such as clusters of workstations and DMPs based on user-space interconnects like Myrinet, ParaStation, or the SP2. On these platforms, a remote method invocation should take no more than a few tens of microseconds.

The choice of a client/server model may prove too limited for many applications in scientific computing, which usually take advantage of collective communication as found in the Message Passing Interface (MPI); however, the goal of this document is not to propose such sweeping changes to RMI. Instead, we focus on how to make RMI, a client/server design, suitable for Grande applications without changing the core model itself. It is our hope that a better RMI design and implementation will stimulate community activities to support communication models that are better suited to solving community problems.
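To make the client/server model concrete, the following minimal sketch shows the shape of an RMI remote service; the Solver interface and its method are hypothetical, and in a real application the implementation would be registered with a naming service and invoked through a stub in another JVM.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// A hypothetical remote interface: methods that may be invoked from
// another JVM must be declared in an interface extending Remote.
interface Solver extends Remote {
    double[] iterate(double[] grid) throws RemoteException;
}

// Server-side implementation; exporting via UnicastRemoteObject
// makes the object reachable through the RMI runtime.
class SolverImpl extends UnicastRemoteObject implements Solver {
    SolverImpl() throws RemoteException { super(); }

    public double[] iterate(double[] grid) throws RemoteException {
        // The grid is passed by value: it is serialized on the caller's
        // side and deserialized here -- the cost discussed in Issue 1.
        double[] next = new double[grid.length];
        for (int i = 1; i < grid.length - 1; i++)
            next[i] = 0.5 * (grid[i - 1] + grid[i + 1]);
        return next;
    }
}
```

Every remote call on such an object pays the serialization cost for its parameters and result, which is why the following issue focuses on that mechanism.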
 

ISSUE 1 : Performance of Object Serialization

Requirement: Fast remote method invocations with low latency and high bandwidth are essential, especially in fine grained areas of science and engineering. Since object serialization and parameter marshaling are the mechanisms used for passing parameters to remote calls, and the associated costs amount to a significant portion of the cost of a remote method invocation, serialization should be as fast as possible.

In an ideal solution, the exact byte representation of an object would be sent over the network and turned back into an object at the recipient's side without any unnecessary buffering and copying.

The ACG understands that the JDK's object serialization is used for several purposes, e.g., for long-term storage of persistent objects and for dynamic class loading on remote hosts via HTTP servers. It is obvious that some of these special purpose uses require properties that are either costly to compute at runtime (latency) or that are verbose in their wire representation (bandwidth).

However, since some of these features are not used in Grande applications, there is room for improvement. The following subsections identify particular aspects of the current implementation of serialization that result in poor performance. The problems are described, and some solutions are suggested. Where possible, benchmark results demonstrate the quantitative effects of the proposed solution.

Experiment: Experiments at Amsterdam [4] indicate that up to 30% of the runtime of a remote method invocation is easily spent in serialization, most of which can be avoided by compile-time serialization.
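The per-call serialization cost can be made visible with a rough micro-benchmark along the following lines (this sketch is ours, not the Amsterdam experiment): a small object is serialized into a fresh stream on every iteration, which is effectively what RMI does per invocation.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerCost {
    // A tiny, illustrative parameter object.
    static class Point implements Serializable {
        double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(1.0, 2.0);
        int n = 10_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // A fresh stream per call: the full type descriptor
            // is written every time, as with per-invocation RMI streams.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(p);
            out.close();
        }
        long perCall = (System.nanoTime() - t0) / n;
        System.out.println("serialization: ~" + perCall + " ns per object");
    }
}
```

The absolute numbers depend on the VM, but the ratio of descriptor bytes to payload bytes (two doubles, i.e., 16 bytes) illustrates the overhead the following subsections attack.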

1.1 : Slim Encoding of Type Information

Problem: For every type of object that is serialized, the current implementation prepends a complete description of the type, i.e., all fields of the type are described verbosely. For a single serialization connection, every type is marshaled only once. Subsequent objects of the same type use a reference number to refer to that type description. Type description is useful when objects are stored persistently and when the recipient does not have access to the byte code representation of the type.

When RMI uses serialization to marshal method parameters, a new serialization connection is opened for every single method invocation. (More specifically, the reset method is called on the serialization stream.) Hence, type information is marshaled over and over again, consuming both latency and bandwidth. The current implementation cannot keep the connection open, because serialization would otherwise refrain from re-sending arguments whose instance variables have been modified. (Note that the whole structure of objects reachable from the argument objects is serialized; one of the objects deeply buried in that graph might have changed.)
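The effect of reset can be observed directly with the standard serialization classes. The following sketch (class names are ours) compares the number of bytes needed for a second object of the same type with and without an intervening reset():

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ResetCost {
    static class Arg implements Serializable { int value; }

    // Returns the number of bytes the *second* object adds to the stream.
    static int bytesFor(boolean resetBetween) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(new Arg());
        int afterFirst = buf.size();
        if (resetBetween) out.reset(); // what RMI effectively does per call
        out.writeObject(new Arg());
        out.close();
        return buf.size() - afterFirst;
    }
}
```

Without the reset, the second object refers to the already-transmitted type description by a handle and costs only a few bytes; with the reset, the full type descriptor is written again.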

Approach: For Grande applications it can be assumed that all JVMs collaborating on a parallel application use the same file system, i.e., can load all classes from a common CLASSPATH. Furthermore, it is safe to assume that the lifetime of an object falls within the overall runtime of the application. Hence, there is no need to completely encode and decode the type information in the byte stream and to transmit that information over the network.
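A minimal sketch of this idea can be built on the writeClassDescriptor/readClassDescriptor hooks of the serialization streams (these protected hooks appeared in JDKs after this report; the class names below are ours): only the class name crosses the wire, and the receiver rebuilds the descriptor from its locally loaded class. Both sides must use the matching stream subclasses.

```java
import java.io.*;

// Sender: transmit only the class name, not the verbose field descriptions.
class SlimOutputStream extends ObjectOutputStream {
    SlimOutputStream(OutputStream out) throws IOException { super(out); }

    protected void writeClassDescriptor(ObjectStreamClass desc)
            throws IOException {
        writeUTF(desc.getName()); // name only; fields are known locally
    }
}

// Receiver: rebuild the descriptor from the shared CLASSPATH.
class SlimInputStream extends ObjectInputStream {
    SlimInputStream(InputStream in) throws IOException { super(in); }

    protected ObjectStreamClass readClassDescriptor()
            throws IOException, ClassNotFoundException {
        return ObjectStreamClass.lookup(Class.forName(readUTF()));
    }
}
```

This trades the self-describing wire format (needed for persistence and for receivers without the byte code) for the slimmer encoding that suffices under the common-CLASSPATH assumption above.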

Experiment: At Karlsruhe University, a prototype of slim type encoding has been implemented [1]. It improved the performance of serialization significantly by avoiding the latency of the complicated type encoding and decoding mechanisms. Moreover, some bandwidth is saved due to the slimmer format. Figure AC-1 shows the runtime of standard serialization in the first/blue row and the runtime of the improved serialization with slim type encoding in the second/red row. The effect is much more prominent on the reader's side (right two bars, 2) than on the writer's side (left two bars, 1).

Figure AC-1: Serialization with slim type encoding

Solution: The ACG sees two options to avoid costly encoding of type information.