DRAFT: This document is not yet finalized. Please do not quote.
Report of the Concurrency & Application Working Group
Dennis Gannon, Michael Philippsen, George Thiruvathukal
September 10, 1998
Contents
Preface
The primary concern of the Java Grande Forum (hereafter JGF) is to ensure
that the Java language, libraries and virtual machine can become the implementation
vehicle of choice for future scientific and engineering applications. The
first step in meeting this goal is to implement the complex and numerics
proposals described in the report of the Numerics Working Group. Accomplishing
this task provides the essential language semantics needed to write high
quality scientific software. However, more will be required of the Java
class libraries and runtime environment if we wish to capitalize on these
language changes. The Java Grande Forum Applications & Concurrency
Working Group (hereafter JGACWG) focuses on these issues.
It is possible that many of the needed improvements will be driven by
commercial sector efforts to build server side enterprise applications.
Indeed, the requirements of technical computing overlap with those of large
enterprise applications in many ways. For example, both technical and enterprise
computing applications can be very large and they will stress the memory
management of the VM. The demand for very high throughput on network and
I/O services is similar for both. Many of the features of the Enterprise
Bean model will be of great importance to technical computing.
But there are also areas where technical computing is significantly
different from Enterprise applications. For example, fine grained concurrency
performance is substantially more critical in technical computing where
a single computation may require 10,000 threads that synchronize in frequent,
regular patterns. These computations would need to run on desktops as well
as very large, shared memory multiprocessors. In technical applications,
the same data may be accessed again and again, while in enterprise computing
there is a great emphasis on transactions involving different data each
time. Consequently, memory locality optimization may be more important
for Grande applications than it is elsewhere in the Java world. Some technical
applications will require the ability to link together multiple VMs concurrently
executing on a dedicated cluster of processors which communicate through
special high performance switches. On such a system, specialized, ultra
low latency versions of the RMI protocol would be necessary. (In such an
environment, an interface to shared memory, via RMI or the VM, would also
be desirable.)
It is important to observe that there are problems which can be described
as technical computing today which will become part of the enterprise applications
of the future. For example, image analysis and computer vision are closely
tied to application of data mining. The processing and control of data
from arrays of sensors has important applications in manufacturing and
medicine. The large scale simulation of non-linear mathematical systems
is already finding its way into financial and marketing models. It is also
the case that many technical computing applications do impact our day-to-day
lives, such as aircraft simulation (the recent design of the Boeing 777)
and weather forecasting. At least in the case of aircraft design, the industry
has a valuation in the billions of dollars, which means it is far from
being merely niche area being of limited interest.
This document is part of a larger report being written by the Java Grande
Forum members. There are two major sections of this
report: Numerics and Concurrency/Applications. For all practical purposes,
each of these documents is self-contained and thus can be read separately.
Organization
This section of the Java Grande Report pertains to Concurrency/Applications.
It is organized as follows:
-
critical JDK issues
highest priority issues, mostly related to Remote Method Invocation
-
benchmarks
-
seamless computing
-
other parallel and distributed computing issues
In this report, we present preliminary findings of the working group. We
welcome a continuing discussion of these issues. Please send questions
or comments to javagrandeforum@npac.syr.edu.
I. Critical JDK Issues
Sequential VM performance is of utmost importance to develop Grande applications.
Since there are many groups working on this issue, the JGACWG simply provides
some additional kernel benchmarks illustrating
performance aspects in areas that are particularly important for Grande
applications.
In addition to sequential VM performance, Grande applications require
high performance for parallel and distributed computing. Although some
more research is needed on other paradigms that might be better suited
for parallelism in Java, this report will focus on RMI (Java's remote method
invocation mechanism), since there is wide-spread agreement on both the
general usefulness and the deficiencies of RMI.
In general, RMI provides the capability of allowing objects running
in different JVMs to collaborate. Current RMI is specifically designed
for peer-to-peer client/server-applications that communicate over the commodity
Internet. For high performance scientific applications, however, some of
RMI's design goals and implementation decisions are inappropriate and cause
serious performance limitations. This is especially troublesome in closely
connected environments, e.g. clusters of workstations and DMPs.
The choice of a client/server model may prove too limited for many applications
in scientific computing, which usually take advantage of collective communication
as found in Message Passing Interface (MPI); however, the goal of this
document is not to propose such sweeping changes to RMI. We rather choose
to focus on how to make RMI, a client/server design, suitable for Grande
applications with no changes to the core model itself. It is our hope that
a better RMI design and implementation will stimulate community activities
to support better communication models that are well-suited to solving
community problems.
ISSUE 1 : Performance of Object Serialization
Requirement: Fast remote method invocations with low latency and
high bandwidth are essential, especially in fine grained areas of science
and engineering. Since object serialization and parameter marshaling are
the mechanisms used for passing parameters to remote calls and the associated
cost(s) amount to a significant portion of the cost ot remote method invocation,
serialization should be as fast as possible.
In an ideal solution, the exact byte representation of an object would
be sent over the network and turned back into an object at the recipient's
side without any unnecessary buffering and copying.
The JGACWG understands that the JDK's object serialization is used for
several purposes, e.g. for long term storage of persistent objects and
for dynamic class loading on remote hosts via http-servers. It is obvious
that some of these special purpose uses require properties that are either
costly to compute at runtime (latency) or that are verbose in their wire
representation (bandwidth).
However, since some of these features are not used in Grande applications,
there is room for improvement. The following subsections identify particular
aspects of the current implementation of serialization that result in bad
performance. The problems are described, and some solutions are suggested.
Where possible, some benchmark results demonstrate the quantitative effects
of the proposed solution.
Experiment: Experiments at Amsterdam [4]
indicate that easily up to 30% of the run time of a rmote method
invocation are spent in the serialization.
1.1 : Slim Encoding of Type Information
Problem: For every type of object that is serialized, the current
implementation prepends a complete description of the type, i.e., all fields
of the type are described verbosely. For a single serialization connection,
every type is marshaled only once. Subsequent objects of the same type
use a reference number to refer to that type description. Type description
is useful when objects are stored persistently and when the recipient does
not have access to the byte code representation of the type.
When RMI uses the serialization for marshaling of method parameters,
a new serialization connection is opened for every single method
invocation. (More specifically, the reset method is
called on the serialization stream.)
Hence, type information is marshaled over and over again, thus consuming
both latency and bandwidth. The current implementation cannot keep
the connection open, because
the serialization would otherwise refrain from re-sending arguments with
modified instance variables. (Note, that the whole structure of objects
reachable from argument objects is serialized; one of the
objects deeply burried in that graph might have changed.)
Approach: For Grande applications it can be assumed that all
JVMs that collaborate on a parallel application use the same file system,
i.e., can load all classes from a common
CLASSPATH. Furthermore,
it is safe to assume that the life time of an object is within the boundaries
of the overall runtime of the application. Hence, there is no need to completely
encode and decode the type information in the byte stream and to transmit
that information over the network.
Experiment: At Karlsruhe University, a slim type encoding has
been implemented prototypically [1]. It has improved
the performance of serialization significantly by avoiding the latency
of complicated type encoding and decoding mechanisms. Moreover, some bandwidth
can be saved due to the slimmer format. Figure AC-1 shows the runtime of
standard serialization in the first/blue row and the runtime of the improved
serialization with slim type encoding in the second/red row. The effect
is much more prominent on the side of the reader (right two bars, 2) than
on the side of the writer (left two bars, 1).
Figure AC-1: Serialization with slim type encoding
Solution: The JGACWG sees two options to avoid costly encoding
of type information.
-
An additional ProtocolVersion is added to the implementation of
serialization. Both the rmiregistry and the RMI user programs
must select the new protocol version. For that purpose, the
rmiregistry
will need a new command line option (protocol number). The RMISocketFactory
will need a new method
setProtocolVersion(int) that RMI user programs
must call.
-
Alternatively, some performance improvements would be reached, if the object
serialization would offer two types of reset methods. Similar
to the current reset method, one method would reset the cached
information on both types and objects. The other method would reset only
the information on objects that have already been sent.
-
The more general solution is similar to the socket factory approach where
the user can provide the name of the class that implements his own implementation
of object serialization. Both the
rmiregistry and the RMI user
programs must select the new protocol version. For that purpose, the rmiregistry
will need a new command line option (class name). Similar to
RMISocketFactory,
an RMISerializationFactory is added to the API that RMI user programs
must initialize.
The JGACWG favors the first approach because it will require less
changes in the current specification. Although the second approach
would help, a little, the complete type information would still be
sent, yet less frequently.
1.2 : Handling of Floats and Doubles
Problem: Since in scientific applications, floats and arrays of
floats are used frequently (the same holds for doubles), it is absolutely
essential, that these data types are packed and unpacked efficiently. Nevertheless,
the current serialization does not handle these primitive types efficiently.
The conversion of these primitive data types into their equivalent byte
representation is (on most machines) a matter of a type cast. However,
in the current implementation, the type cast is implemented in the JNI
and hence requires various time consuming operations for check pointing
and state recovery upon JNI entry and JNI exit. Moreover, the serialization
of float arrays (and double arrays) currently invokes the above mentioned
JNI routine for every single array element.
Approach:
- The costly JNI-entry/exit can be avoided by teaching the JITs
to recognize and inline the conversion of single floats or doubles
into byte arrays. The conversion routine could even be included into
the JVM.
- For arrays, it is much faster to enter the JNI only once for the
whole array (or at least for the section that still fits in the available
communication packet).
- If the JNI code must be retained, at a
minimum we believe a single JNI call should be made to convert multiple
scalars. See section on Reflection
Enhancements.
Experiment: A prototype at Karlsruhe University (see Figure AC-2,
[1])
stresses the effect of this approach.
Figure AC-2: Serialization of float arrays
Solution: It is absolutely essential that
-
floats and doubles are turned into their corresponding wire representation
as efficiently as possible and that
-
arrays of floats and doubles are serialized efficiently.
And since there are easy ways to do it that do not require any change of
an API, this should be done in the next release of the JDK.
1.3 : Reflection Enhancements
Problem: A byte array is used as communication packet. During serialization,
every single instance variable is copied into/from that byte array individually.
Most of this copying is done at the Java level using reflection. (As mentioned
above, only for floats and doubles the JNI is asked to return the appropriate
byte representation.)
In contrast to zero-copy protocols that are used (or attempted to be
used) in other messaging protocols, at least two complete copy operations
(one at the sender side and one at the recipient side) are needed. Although
the JGACWG regrets it, it seems very unlikely that future JVM implementations
will closely interact with the communication mechanisms to allow for zero-copy
protocols. This seems to be one of the price tags caused by Java's portability.
Approach: Obviously, there should be as few copy operations as
possible. Those that remain should be performed as fast as possible.
Solution: Better performance can be achieved in two ways
-
The object stream opens up its communication buffer for public access so
that the object itself can use a writeExternal routine to write
into that buffer immediately, i.e., without reflection.
Since the JGACWG does not believe that the communication buffer will ever
be made public, the following proposal seems to be more promising.
-
The reflection mechanism should be enhanced so that it can copy all instance
variables into a buffer at once with a single method call. For example,
class Class could be extended to return an object of a new class
ClassInfo:
ClassInfo getClassInfo(Field[] fields);
The object of type ClassInfo then provides two routines that
do the copying to/from the communication buffer.
int toByteArray(Object obj, int objectoffset,
byte[] buffer, int bufferoffset);
int fromByteArray(Object obj, int objectoffset,
byte[] buffer, int bufferoffset);
The first routine copies the bytes that represent all the instance
variables into the buffer, starting at the given buffer offset. The first
objectoffset
bytes are left out. The routine returns the number of bytes that have actually
been copied. Hence, if the communication puffer is too small to hold all
bytes, the routine must be called again, with appropriately modified
offsets.
Experiment: A serialization with better buffering and less copying
has been implemented at Karlsruhe University [1].
The prototype is based on an enhanced reflection mechanism, that can deal
with all instance variables of an object at once. (This has been hacked
into native code that is called through the JNI.) The performance numbers
are shown in figure AC-3 [1]. In the figures,
"std" refers to the standard serialization, "ext" indicates the fact that
the object provides a writeExternal routine, "buf" incorporates
more efficient internal buffering, and "native" uses the hack that is supposed
to be replaced by an enhance reflection.
Figure AC-3: Serialization of several instance variables
The JGACWG strongly recommends to allow for more efficient handling
of the communication buffer, since performance of serialization can be
increased by a factor of more than 10.
1.4 : Implementation Improvements
Problem: In addition to the suggestions for improved implementation
of object serialization, object serialization should take about the same
time for writing and reading. Otherwise, the pipeline between sender and
receiver shows an imbalance resulting in wasted time.
Experiment: In the current implementation, it is much faster
to produce the byte array representation of an object than it is to recreate
the object. The severity of this problem can be seen in Figures AC-1 to
AC-3 above.
Approach: The JGACWG cannot provide any specific suggestions
here.
ISSUE 2 : Performance of RMI
Requirement: Fast remote method invocations with low latency and
high bandwidth are essential, especially in fine grained areas of science
and engineering. Apart from object serialization, there are other aspects
of RMI that should be improved.
Experiment: Experiments at Amsterdam [4]
indicate that easily up to 30% of the run time of a rmote method
invocation are spent in the RMI implementation. Other 30% are spent in
the network.
2.1 : Improved Connection Management
Problem: The current RMI implementation closes and re-opens connections
to other hosts too frequently. For object serialization streams, that causes
the re-transmission of type information, as has been discussed above.
Re-creation of socket connections is a costly operating system job.
Approach: Whereas RMI's approach seems to be useful in case of
unstable wide area networks, for closely connected JVMs this approach is
far too pessimistic. Socket connections should be left open for the duration
of the user program. At least, it might make more sense to keep a
working set and use a least recently used algorithm to kick
connections out of the working set.
Solution: To achieve this, the user should have an option to
switch RMI into that mode. Again, this requires a command line option for
the rmiregistry and another method in the
RMISocketFactory
class.
2.2 : Careful Resource Consumption
Problem: In the current RMI implementation, every single remote
object uses a port of its own. In addition, a thread is started monitoring
that port. Every remote method invocation causes the creation of a new
socket and a new thread. In addition, a watch dog thread monitors the state
of a connection. Since object creation is known to be quite slow in current
JVMs and since thread creation is even slower, this approach is not amenable
for high performance comptuing, especially for fine grained problems.
Approach: Instead of creating new objects and threads almost
on every RMI activity, the internal layers of RMI should reuse
sockets. Moreover, worker threads from a thread pool should repeatedly
service incoming requests. The underlying asumption is
that the
network is quite stable and
communication failure will cause failure of the whole application.
2.3 : Custom Transport
Requirement: Fast remote
method invocations with low latency and high bandwidth are essential, especially
in fine grained areas of science and engineering. It is absolutely essential
that non-Ethernet special purpose high-end communication hardware can be
used.
Problem: It is almost impossible, to use very specialized, high
performance network protocols through RMI. Although SCI, ATM AAL5, Myrinet,
ParaStation, Active Messages are all used in technical applications, there
is no straightforward way for Java applications to make use of them.
-
Non-Socket-Services. Some available message protocols do not
operate on the socket level, e.g. FM and Nexus. Although RMI offers a
socket factory where user defined socket classes can be plugged in,
this is unsufficient because it would require a costly socket protocol
to be implemented on top of the base functionality.
There should be a way to make use of non-socket protocols in
RMI. The following aspects need to be considered:
- For sockets (and TCP/IP) most of the necessary protocol
initialization is done by the operating system. This explains that the
current implementation does not offer appropriate interfaces to plug
in specific protocol initialization code as needed by other protocols,
especially by Nexus. It might be possible to use the static class
initializer of a abstract transport protocol class for that purpose
but that has to be determined.
- Another related problem that is caused by the fact that the
current approach is based on sockets is the lack of an interface to
provide addressing information to interact with other transport
protocols. IP and DNS names are not the only meaningful forms. For
example, on the SP2 or a Cray T3E, a combination of IP# and node
number. (e.g., quad.mcs.anl.gov/25, for node 25 at quad.mcs.anl.gov)
is used.
-
User-Level-Services. Several high speed protocols offer socket
like functionality in the user level, i.e., the operating system is
left out for performance reasons.
Since these protocols offer socket functionality, it seems to be easy
to wrap them in a Java socket class that is then plugged into the RMI socket
factory. Unfortunately, that approach does not work properly.
The problem is that SocketImpl returns file descriptors that
are later on used by the JVM's thread scheduler. It issues calls on read/write/select
methods which are only useful on the level of the operating system. For
example, a select might be executed that waits for the arrival of a communication
packet in the kernel although that packet has already arrived at the user
level.
Hack around. The only reasonable way to go seems to be to
load a dynamic library that replaces the standard operating system
calls with those that can handle the high end communication
hardware. This replacement is done unnoticed by the JVM. Such an
approach has been successfully tested at the University of Karlsruhe
on ParaStation
hardware, it required an undesirable source code modification of
the rmiregistry implementation (to load the dynamic library
before main).
The JPACWG strongly recommends that RMI
will be extended to allow for high-speed communication hardware to be
used. Since the source code of the lower layers of the RMI
implementation are not public, no specific suggestions can be made in
this report.
ISSUE 4 : Other Suggestions
4.1 : Source Code Availability
Problem: The RMI implementation needs class files from several packages.
A lot of the code is contained in sun.rmi.*. For this code, the
Java source is not released. In particular, there is no source code for
the various versions of JDK 1.2.
There are people that are willing to work on that code to improve it's
runtime efficiency. Because of the missing source code (and the fact that
de-compilers do not recreate the comments) the missing source code these
people are prevented from doing work the JGACWG is interested in.
Solution: The JGACWG suggests that Sun will include the source
code of current RMI implementations either in the standard JDK distribution
or will create a process so that interested research groups can get access
to it before a release gets final.
4.2 : Class Compatibility
Problem: The class compatibility test is too stringent. In the case
where one is not using NFS (more common than the case where one is using
it), it appears one can take the same code (unmodified), compile it with
the same version of JDK (resulting in the same class files), and cause
an exception, because the classes are deemed incompatible.
Solution: There may be no easy fix for this. Compatibility should
probably be defined in terms of the version of Java being used as it is
now; however, two classes with the same class and instance variables should
be compatible, provided they are compiled with the same Java compiler.
Experiment: None known. Related to this problem is section 4.3
for which an experimental prototype has been constructed.
4.3 : Dynamic Class Loading from Remote Hosts
Problem: RMI's implementation currently makes use of a network class
loader, but in a very limited way. It would be nice if a client could be
started very thin (with no classes) and have all needed classes be downloaded.
In a browser, for security reasons, this feature should be optional. For
Grande applications, most of which do not depend on a browser, this has
a very desirable deployment property.
First, NFS (or File Sharing) or a web browser would not be a requirement
for deploying application code. Grande applications tend to run on large
networks or multicomputers, where a large number of nodes need to be bootstrapped.
It should be possible for all code to be dynamically loaded by pointing
Java at the network class server with an appropriate, fully-qualified,
class name.
Second, the ability to take advantage of class loading is one way of
dealing with the class incompatibility problem. The network class loader
can effectively be used to replicate a directory of classes onto multiple
nodes at startup time. This ensures that all nodes participating in an
RMI computation will be coherent (well, almost, but with a good network
class loader design, it is possible).
Experiment: The RRMI research done by Loyola University and JHPC
has resulted in a prototype of this approach.
II. Benchmarks
Requirement: The performance of the JVM should be as high as possible,
especially in areas that are of particular importance to Grande applications.
Since the JGACWG cannot improve the JVM implementations, the group has
started a community activity to collect benchmarks that are helpful to
guide JVM builders.
ACTIVITY 1 : Performance of the JVM
Technical computing often involves application components that require
multi-gigabyte images. Unfortunately, many current VM implementations have
restrictions on the size of application heap and many others demonstrate
poor memory management and garbage collection performance. JVM performance
areas that are of specific importance to JGACWG include:
-
Process/Threads:
-
Large numbers of threads. Does thread synchronization scale as the number
and size of thread objects grow?
-
Support light weight process structure that are tuned for high-end SMPs.
In particular: Affinity/co-scheduling
Memory management:
-
How does the VM optimize deep memory hierarchies?
-
Large objects? Limitations
-
Large number of objects
-
Staging large objects (3d graphics problem)
-
Scalable I/O
The JGACWG is putting together a suite of kernel benchmarks to test VM
performance in these areas.
Threads - synchronization, scalability, threads versus runable |
Dennis Gannon |
I/O |
Dan Reed ? |
Memory management - object size, object number, garbage collection |
Piyush Mehrorta |
ACTIVITY 2 : Application benchmarks
The JGACWG has identified a series of real technical applications that
can be provided to the community to support compiler and VM optimization
efforts. This project has goals similar to the original NAS, Splash, and
Perfect Benchmarks which were used by the high performance computer and
compiler designers to gauge their progress. In the case of the Perfect
Benchmarks, many of them are now being integrated into the SPEC suite which
is the standard for the industry. The Grande benchmarks should play the
same role in the Java computing industry. The following table gives the
area and the organization that promised to supply the benchmark.
Monte Carlo Simulations |
NCSA and EPCC |
Image Analysis, Radio Astronomy |
NCSA |
Gravitational N-Body Simulations |
Indiana |
Computational Fluid Dynamics |
Syracuse |
Geophysics |
University of Karlsruhe, CD
available |
Discrete Event Simulation |
INRIA and EPCC |
ACTIVITY 3 : Other Kernel Benchmarks
Spec Java Kernels |
Mpeg3 audio decoder |
Javac |
Jess |
Ray tracing |
Sorting |
George Thiruvathukal |
RMI |
Michael Philippsen and Bernhard Haumacher |
Serialization |
Michael Philippsen and Bernhard Haumacher |
Sun benchmarks? |
Low level arithmetics |
Numerics Group |
Linpack |
NG |
Jama |
NG |
III. Seamless Computing
For the average scientist and engineer one of the greatest difficulties
in doing large scale computation is the constant struggle requires to port
applications to a new environment. This involves the following tasks:
-
Dealing with authentication and authorization at a remote site.
-
Finding the required libraries to be able to link the application.
-
Understanding the batch scheduler.
-
Moving files from one site to the new one and then moving results back.
-
Comprehending the local parallel file system.
-
Cataloging and recording application changes and experimental results.
A seamless technical computing environment would allow a Java based programming
environment that could provide a uniform interface to all these remote
resources. Java based agents can be installed at each site which cooperate
with the user and guide him through the resource discovery and authorization
process and provide an integrated development environment for using these
remote resources.
It is possible that such a system can be built on top of some of the
existing and emerging meta-computing infrastructures. Many of these provide
the tool kit and components to build on, and a few have partial solutions
to the problems listed above.
Related projects are:
-
Unicore
-
Globus
-
Legion
-
Condor
-
Sweb
-
Websubmit
With a collective effort of the JGACWG it should be possible to do much
more. The JGACWG envisions an interface that is widely supported by the
hardware vendors, much like the JDBC interface for database access.
Such an interface should deal with the following issues:
-
Authentication
-
Resource discovery (metadata)
-
Resource allocation (Grande cache)
-
Accounting
-
Resource and job Scheduling
-
Job monitoring
-
Code/Data-filesystem
-
Connected to resource discovery service (JINI)
To define such an interface, a close examination of the various technologies
is in order.
IV. Other Parallel Issues
The role parallel computation plays in high performance technical computing
cannot be under estimated. There are several ways to building a Java parallel
computing environment. The following approaches are controversial and require
community research.
-
Grande parallelism design patterns. One approach is to take the experience
of the last ten years of parallel programming and build a set of Grande
parallelism design patterns that can be cast as a set of interfaces and
base classes that simplify the task of writing parallel Grande applications.
This API can then be hosted on either a set of concurrently executing JVMs
or in an environment where large numbers of native threads are well supported.
Such an API may be as simple as defining a truly object oriented version
of MPI (see below), or it may define a new category of distributed object
aggregates and collective operations.
-
Grande Beans. A second approach that may be more consistent with current
Java directions would be to design a Grande Bean specification that extends
the basic Bean model to one appropriate for technical applications. This
would follow what has been done with Enterprise Beans for transaction oriented
business applications. The Enterprise Beans model has allowed CORBA based
resources to be woven into a unified component model. Grande beans can
build upon this to incorporate high end, parallel computational modules
and visualization and VR tools into a grid of resources controlled by the
JVM on the users desktop system.
-
MPI. The JGACWG agrees that MPI is necessary in the Java environment. However,
it is controversial how to make MPI available.
-
Pure bindings
-
Interface style issue
-
Not feasible or sensible to do whole MPI
-
Issues with heterogeneous language interaction
-
Extended MPI flavor, JMPI, truly object oriented
The JGACWG currently considers to release a Request for Proposals for an
MPI in Java.
-
Object model: value semantic in procedure calls, remote reference management,
problems for large objects [API]
-
Threads issues?
V. Participants in the Applications & Concurrency
Working Group
The following individuals contributed to the development of this document
at the Java Grande Forum meetings on May 9-10 and August 6-7 in Palo Alto,
California.
-
Geoffrey Fox, University of Syracuse
-
Dennis Gannon, Indiana, Chair
-
Vladimir Getov, University of Westminster, UK
-
Piyush Mehrotra, ICASE
-
Michael Philippsen, University of Karlsruhe, Germany
-
Omer Rana, University of Wales, Cardiff, UK
-
Tony Skjellum. MPI-Softtech
-
Henry Sowizral, Sun
-
George Thiruvathukal, Loyola University, Chicago
-
Julien J.P. Vayssiere, Schlumberger House, Norway
-
Martin Westhead, EPCC, University of Edinburgh, UK
The following additional individuals also contributed comments which helped
in the development of this document.
-
Denis Caromel, INRIA, France
-
Bernhard Haumacher, University of Karlsruhe, Germany
VI. References
[1] |
The benchmarks have been performed on a 300 Mhz UltraSparc II with
JDK 1.2beta3, JIT enabled. JDK 1.2 (production release) shows numbers that
are similar in ratio. |
[2] |
Fabian Breg, Shridhar Diwan, Juan Villacis,
Jayashree Balasubramanian, Esra Akman, and Dennis
Gannon:
RMI Performance and Object Model Interoperability: Experiments with Java/HPC++.
Concurrency: Practice and Experience, volume 10, 1998, to appear.
|
[3] |
G.K. Thiruvathukal,
L.S. Thomas, and A. T. Korczynski:
Reflective Remote Method Invocation.
Concurrency: Practice and
Experience, volume 10, 1998, to appear. |
[4] |
Ronald Veldema, Rob van
Nieuwpoort, Jason Maassen, Henri E. Bal, and Aske Plaat: Efficient
Remote Method Invocation. Technical Report IR-450, Dept. of Computer
Science, Vrjie Universiteit Amsterdam, The Netherlands,
September 1998.
|
[5] |
Jim Waldo: Remote Procedure Calls and Java Remote Method
Invocation. IEEE Concurrency. Pages 5-7, September 1998.
|