Java Grande/ISCOPE 2001 Special Issue of Concurrency and
Computation:Practice and Experience
Call for papers: Special Journal
Issue for Java Grande/ISCOPE 2001
Standard Referee Form
WILEY
Journal
Home Page
- C559: Kava: A Java dialect with a uniform object model for
lightweight classes
- Abstract: Object-oriented programming languages have
always distinguished between primitive and user-defined
data types, and in the case of languages like C++ and Java, the primitives are
not even treated as objects, further fragmenting the programming model. The
distinction is especially problematic when a particular programming community
requires primitive-level support for a new data type, as for complex,
intervals, fixed-point numbers, and so on. We present Kava, a design for a
backward-compatible version of Java that solves the problem of programmable
lightweight objects in a much more aggressive and uniform manner than previous
proposals. In Kava, there are no primitive types; instead, object-oriented
programming is provided down to the level of single bits, and types such as int
can be explicitly programmed within the language. While the language maintains
a uniform object reference semantics, efficiency is obtained by making heavy
use of unboxing and semantic expansion. We describe Kava as a dialect of the
Java language, show how it can be used to define various primitive types,
describe how it can be translated into Java, and compare it to other approaches
to lightweight objects.
- David F. Bacon
- IBM T.J. Watson Research Center P.O. Box 704, Yorktown Heights,
NY 10598, U.S.A.
- email: dfb@watson.ibm.com
- Submitted July 31 2001
- Original PDF: C559bacon/c559Bacon02Kava.pdf
- C560: Framework for Testing Multi-threaded Java
Programs
- Abstract: Finding bugs due to race conditions in
multi-threaded programs is difficult, mainly because there are many possible
interleavings, any of which may contain a fault. In this work we present a
methodology for testing multi-threaded programs which has minimal impact on the
user and is likely to find interleaving bugs. Our method reruns existing tests
in order to detect synchronization faults. We found that a single test executed
a number of times in a controlled environment, may be as effective in finding
synchronization faults as many different tests. A great deal of resources are
saved since tests are very expensive to write and maintain. We observed that
simply re-running tests, without ensuring in some way that the interleaving
will change, yields almost no benefits. We implemented the methodology in our
test generation tool -- ConTest. ConTest combines the replay algorithm, which
is essential for debugging, with our interleaving test generation heuristics.
ConTest also contains an instrumentation engine, a coverage analyzer, and a
race detector (not finished yet) that enhance bug detection capabilities. The
greatest advantage of ConTest, besides finding bugs of course, is its minimal
effect on the user. When ConTest is combined into the test harness, the user
may not even be aware that ConTest is being used.
- Orit Edelstein, Eitan Farchi, Evgeny Goldin, Yarden Nir, Gil
Ratsaby, and Shmuel Ur
- IBM Research Laboratory in Haifa
- email: ur@il.ibm.com
- Submitted August 16 2001
- Original PDF: C560ur/c560conTestExp.pdf
- C561: Integration and Application of TAU in Parallel Java
Environments
- Abstract: Parallel Java environments present challenging
problems for performance tools because of Java's rich language system and its
multi-level execution platform combined with the integration of native-code
application libraries and parallel runtime software. In addition to the desire
to provide robust performance measurement and analysis capabilities for the
Java language itself, the coupling of different software execution contexts
under a uniform performance model needs careful consideration of how events of
interest are observed and how cross-context parallel execution information is
linked. This paper relates our experience in extending the TAU performance
system to a parallel Java environment based on mpiJava. We describe the
complexities of the instrumentation model used, how performance measurements
are made, and the overhead incurred. A parallel Java application simulating the
game of Life is used to show the performance system's capabilities.
- Sameer Shende and Allen D. Malony
- Department of Computer and Information Science University of
Oregon, Eugene, Oregon
- email: sameer@cs.uoregon.edu
- Submitted August 24 2001
- Original PDF: C561shende/shende.pdf
- C562: High-Performance Java Codes for Computational Fluid
Dynamics
- Abstract:The computational science community is reluctant
to write large-scale computationally-intensive applications in Java due to
concerns over Javas poor performance, despite the claimed software
engineering advantages of its object-oriented features. Naive Java
implementations of numerical algorithms can perform poorly compared to
corresponding Fortran or C implementations. To achieve high performance, Java
applications must be designed with good performance as a primary goal. This
paper presents the object-oriented design and implementation of two real-world
applications from the field of Computational Fluid Dynamics (CFD): a
finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and
an unstructured mesh adaptation algorithm (2D TAG, from NASA Ames Research
Center). This work builds on our previous experience with the design of
high-performance numerical libraries in Java. We examine the performance of the
applications using the currently available Java infrastructure and show that
the Java version of the flow solver LAURA performs almost within a factor of 2
of the original procedural version. Our Java version of the mesh adaptation
algorithm 2D TAG performs within a factor of 1.5 of its original procedural
version on certain platforms. Our results demonstrate that object-oriented
software design principles are not necessarily inimical to high performance.
- Christopher J. Riley, Siddhartha Chatterjee, Rupak Biswas
- Blue Ridge Numerics, Inc. Charlottesville, VA 22901, IBM T. J.
Watson Research Center Yorktown Heights, NY 10598, NAS Systems Division, MS
T27A-1 NASA Ames Research Center Moffett Field, CA 94035
- email: chris@brni.com
- Submitted August 30 2001
- Original PDF: C562riley/c562riley.pdf
- C563: Automatic Translation of Fortran to JVM Bytecode
- Abstract:This paper reports on the design of a Fortran -
to-Java translator whose target language is the in- struction set of the Java
Virtual Machine. The goal of the translator is to generate Java imple-
mentations of legacy Fortran numerical codes in a consistent and reliable
fashion. The benefits of directly generating bytecode are twofold. First, it
provides a much more straightforward and efficient mechanism for translating
Fortran GOTO state- ments. Second, it provides a framework for pursu- ing
various compiler optimizations, which could be beneficial not only to our
project, but to the Java community as a whole.
- Keith Seymour, Jack Dongarra
- Department of Computer Science University of Tennessee,
Knoxville Knoxville, TN 37996
- email: seymour@cs.utk.edu
- Submitted August 31 2001
- Original PDF: C563seymour/f2jreport.pdf
- C564: Performance evaluation of Java against C and Fortran
for Scientific Applications
- Abstract: Increasing interest is being shown in the use
of Java for scientific applications. The Java Grande benchmark suite was
designed with such applications primarily in mind. The perceived lack of
performance of Java still deters many potential users, despite recent advances
in just-in-time (JIT) and adaptive compilers. There are however few benchmark
results available comparing Java to more traditional lan- guages such as C and
Fortran. To address this issue, a subset of the Java Grande Benchmarks have
been re-written in C and Fortran allowing direct performance comparisons
between the three languages. The performance of a range of Java execution
environments, C and Fortran compilers have been tested across a number of
platforms using the suite. These demonstrate that on some platforms (notably
Intel Pentium) the performance gap is now quite small.
- J. M. Bull, L. A. Smith, C. Ball, L. Pottage, R. Freeman
- Edinburgh Parallel Computing Centre, James Clerk Maxwell
Building, The Kings Buildings, The University of Edinburgh, Mayfield
Road, Edinburgh EH9 3JZ, Scotland, U.K.
- email: epcc-javagrande@epcc.ed.ac.uk
- Submitted August 30 2001
- Original PDF: C564bull/jgflangcomp_ccpe.pdf
- C565: Sapphire: Copying GC Without Stopping the World
- Abstract: Many concurrent garbage collection (GC)
algorithms have been devised, but few have been implemented and evaluated,
particularly for the Java programming language. Sapphire is an algorithm we
have devised for concurrent copying GC. Sapphire stresses minimizing the amount
of time any given application thread may need to block to support the
collector. In particular, Sapphire is intended to work well in the presence of
a large number of application threads, on small- to medium-scale shared memory
multiprocessors. A specific problem that Sapphire addresses is not stopping all
threads while thread stacks are adjusted to account for copied objects (in GC
parlance, the "flip" to the new copies). Sapphire extends previous algorithms,
and is most closely related to replicating copying collection, a GC technique
in which application threads observe and update primarily the old copies of
objects. The key innovations of Sapphire are: (1) the ability to "flip" one
thread at a time (changing the threads view from the old copies of
objects to the new copies), as opposed to needing to stop all threads and flip
them at the same time; and (2) avoiding a read barrier.
- Richard L. Hudson, J. Eliot B. Moss
- Intel Corporation 2200 Mision College Blvd. Santa Clara, CA
95052-8119, Dept. of Computer Science Univ. of Massachusetts Amherst, MA
01003-4610
- email: moss@kiwi.cs.umass.edu
- Submitted August 30 2001
- Original PDF: C565moss/c565ccpe-submit.pdf
- C566: Spar: a set of extensions to Java for scientific
computation
- Abstract: In this paper we present a set of language
extensions that improve the expressiveness and performance of Java for
scientific computation. First of all, the language extensions allow the
manipulation of multi-dimensional arrays to be expressed more naturally, and to
be implemented more efficiently. Furthermore, data-parallel programming is
supported, allowing efficient parallelization of a large class of operations on
arrays. We also provide language extensions to construct specialized array
representations, such as symmetric, block, and sparse matrices. These
extensions are: tuples, parameterized types, array subscript overloading, and
the inline modifier. These extensions are not only useful to construct special
array representations, but are also useful in their own right. In particular
tuples are a useful addi- tion to the language. Finally, we add complex numbers
as a primitive type to the language. We evaluate our language extensions using
performance results. We also compare relevant code fragments of our extended
language with standard Java implementations and language extensions proposed by
others.
- C. van Reeuwijk, F. Kuijlman, and H.J. Sips
- Delft University of Technology
- email: C.vanReeuwijk@its.tudelft.nl
- Submitted August 31 2001
- Original PDF: C566vanreeuwijk/reeuwijk.pdf
- C567: Runtime Optimizations for a Java DSM
Implementation
- Abstract: Jackal is a fine-grained distributed shared
memory implementation of the Java programming language. Jackal implements
Javas memory model and allows multithreaded Java programs to run
unmodified on distributed-memory systems. This paper focuses on Jackals
runtime system, which implements a multiple-writer, home-based consistency
protocol. Proto-col actions are triggered by software access checks that
Jackals compiler inserts before object and array references. To reduce
access-check overhead, the compiler exploits source-level information and
performs extensive static analysis to optimize, lift, and remove access checks.
We describe optimizations for Jackals run-time system, which mainly
consist of discovering opportunities to dispense with flushing of cached data.
We give performance results for different runtime optimizations, and compare
their impact with the impact of one compiler optimization. We find that our
runtime optimizations are necessary for good Jackal performance, but only in
conjunction with the Jackal compiler optimizations. As a yardstick, we compare
the performance of Java applications run on Jackal with the performance of
equivalent applications that use a fast implementation of Javas Remote
Method Invocation (RMI) instead of shared memory.
- R. Veldema, R.F.H. Hofman, R.A.F. Bhoedjang, H.E. Bal
- Department of Computer Science Vrije Universiteit Amsterdam, The
Netherlands, Department of Computer Science, Cornell University, Ithaca, NY,
USA
- email: rveldema@cs.vu.nl
- Submitted August 31 2001
- Original PDF: C567veldema/c567paper.pdf
- C568: Supporting Multidimensional Arrays in Java
- Abstract: The lack of direct support for multidimensional
arrays in Java TM has been recognized as a major deficiency in the
languages applicability to numerical computing. It has been shown that,
when augmented with multidimensional arrays, Java can achieve very
high-performance for numerical computing through the use of compiler techniques
and efficient implementations of aggregate array operations. Three approaches
have been discussed in the literature for extending Java with support for
multidimensional arrays: class libraries that implement these structures;
relying on the JVM to recognize those arrays of arrays that are being used to
simulate multidimensional arrays; and extending the Java language with new
syntactic constructs for multidimensional arrays and directly compiling those
constructs to byte-code. This paper presents a balanced presentation of the
pros and cons of each technique in the areas of functionality, language and
virtual machine impact, implementation effort, and effect on performance. We
show that the best choice depends on the relative importance attached to the
different metrics, and thereby provide a common ground for a rational
discussion of the technique that is in the best interests of the Java community
at large, and the growing community of numerical Java programmers.
- Jose E. Moreira, Samuel P. Midkiff, Manish Gupta
- IBM T. J. Watson Research Center Yorktown Heights, NY
10598-0218
- email: jmoreira@us.ibm.com
- Submitted August 31 2001
- Original PDF: C568moreira/c568paper.pdf
- C569: CartaBlanca- A Pure-Java, Component-based Systems
Simulation Tool for Coupled Nonlinear Physics on Unstructured Grids -An
Update
- Abstract: This paper describes a component-based
nonlinear physical system simulation prototyping package written entirely in
Java using object-oriented design. The package provides scientists and
engineers a "developer-friendly" software environment for large-scale
computational algorithm and physical model development. The software design
centers on the Jacobian-Free Newton-Krylov solution method surrounding a
finite-volume treatment of conservation equations. This enables a clean
component-like implementation. We first provide motivation for the development
of the software and then discuss software structure. Discussion includes a
description of the use of Java's built-in thread facility that enables
parallel, shared-memory computations on a wide variety of unstructured grids
with triangular, quadrilateral, tetrahedral and hexahedral elements. We also
discuss the use of Java's inheritance mechanism in the construction of a
hierarchy of physics systems objects and linear and nonlinear solver objects
that simplify development and foster software re-use. We also provide a brief
review of the Jacobian-Free Newton-Krylov nonlinear system solution method and
discuss how it fits into our design. Following this, we show results from
example calculations and then discuss plans including the extension of the
software to distributed memory computer systems
- W. B. VanderHeyden, E. D. Dendy, N. T. Padial-Collins
- Los Alamos National Laboratory, Theoretical Division and Los
Alamos Computer Science Institute Los Alamos, NM 87545
- email: wbv@lanl.gov
- Submitted August 31 2001
- Original PDF: C569vanderheyden/c569cartablanca.pdf
(larger PDF or word available)
- C570: CCJ: Object-based Message Passing and Collective
Communication in Java
- Abstract: CCJ is a communication library that adds
MPI-like message passing and collective operations to Java. Rather than trying
to adhere to the precise MPI syntax, CCJ aims at a clean integration of
communication into Javas object-oriented framework. For example, CCJ uses
thread groups to support Javas multithreading model and it allows any
data structure (not just arrays) to be communicated. CCJ is implemented
entirely in Java, on top of RMI, so it can be used with any Java virtual
machine. The paper discusses three parallel Java applications that use
collective communication. It compares the performance (on top of a Myrinet
cluster) of CCJ, RMI and mpiJava versions of these applications, and also
compares the code complexity of the CCJ and RMI versions. The results show that
the CCJ versions are significantly simpler than the RMI versions and obtain a
good performance. A detailed performance comparison between CCJ and mpiJava is
given using the Java Grande Forum MPJ benchmark suite.
- Arnold Nelisse, Jason Maassen, Thilo Kielmann, Henri E. Bal
- Division of Mathematics and Computer Science, Vrije
Universiteit, Amsterdam, The Netherlands
- email: kielmann@cs.vu.nl
- Submitted August 31 2001
- Original PDF:C570bal/c570cpe01.pdf
- C571: Providing Soft Real-time QoS Guarantees for Java
Threads
- Abstract:The Java platform has many characteristics that
make it very desirable for integrated continuous media processing.
Unfortunately, it lacks the necessary CPU resource management facility to
support quality of service guarantees for soft real-time multimedia tasks. In
this paper, we present our new Java Virtual Machine, Q-JVM, which brings CPU
resource management to the Java platform. Q-JVM is based on Suns JVM
version 1.1.5. It implements an enhanced version of the MTR-LS algorithm in its
thread scheduler. Combined with admission control that could be implemented in
an application-level resource manager, it is able to support QoS parameters
such as fairness, bandwidth partitioning and delay bound guarantees, as well as
the cumulative service guarantee. Our test results show that Q-JVM is backward
compatible with the standard JVM from Sun, has low scheduling overhead, and is
able to provide QoS guarantees as specified.
- James C. Pang, Gholamali C. Shoja, and Eric G. Manning
- Department of Computer Science University of Victoria, Victoria,
BC, Canada, V8W 3P6
- email: jcpang@redback.com
- Submitted August 31 2001
- Original PDF: C571pang/c571C_and_C_draft.pdf
- C572: Supporting Dynamic Parallel Object Arrays
- Abstract: We present efficient support for generalized
arrays of parallel data driven objects. Array elements are regular C++ objects,
and are scattered across the parallel machine. An individual element is
addressed by its "index", which can be an arbitrary object rather than a simple
integer. For example, an array index can be a series of numbers, supporting
multidimensional sparse arrays; a bit vector, supporting collections of
quadtree nodes; or a string. Methods can be invoked on any individual array
element from any processor, and the elements can participate in reductions and
broadcasts. Individual elements can be created or deleted dynamically at any
time. Most importantly, the elements can migrate from processor to processor at
any time. The paper discusses support for message delivery and collective
operations in face of such dynamic behavior. The migration capabilities of
array elements have proven extremely useful, for example, in implementing
flexible load balancing strategies and for exploiting workstation clusters
adaptively.
- Orion S. Lawlor and Laxmikant V. Kalé
- Computer Science Department, Univ. of Illinois at
Urbana-Champaign, 1304 W. Springfield, 3315 DCL Urbana, IL 61801-2987
- email: olawlor@students.uiuc.edu
- Submitted August 31 2001
- Original PDF: C572lawlor/c572lawlor_arrays.pdf
- C573: Platform Independent Dynamic Java Virtual Machine
Analysis: the Java Grande Forum Benchmark Suite
- Abstract: In this paper we present a platform independent
analysis of the dynamic profiles of Java programs when executing on the Java
Virtual Machine. The Java programs selected are taken from the Java Grande
Forum benchmark suite, and five different Java-to-bytecode compilers are
analysed. The results presented describe the dynamic instruction usage
frequencies, as well as the sizes of the local variable, parameter and operand
stacks during execution on the JVM. These results, presenting a picture of the
actual (rather than presumed) behaviour of the JVM, have implications both for
the coverage aspects of the Java Grande benchmark suites, for the performance
of the Java-to-bytecode compilers, and for the design of the JVM.
- David Gregg, James Power, John Waldron
- Department of Computer Science, Trinity College, Dublin 2,
Ireland; Dept of Computer Science, National University of Ireland, Maynooth,
Co. Kildare, Ireland; Department of Computer Science, Trinity College, Dublin
2, Ireland.
- email: John.Waldron@cs.tcd.ie
- Submitted August 31 2001
- Original PDF: C573waldron/c573jour1.pdf
- C574: Java Virtual Machine Support for Object
Serialization
- Abstract: Distributed computing has become increasingly
popular in the high performance community. Java's Remote Method Invocation
(RMI) provides a simple, yet powerful method for implementing parallel
algorithms in Java. The performance of RMI has been less than adequate,
however, and object serialization is often identified as a major performance
inhibitor. We believe that object serialization is best performed in the Java
Virtual Machine (JVM), where information regarding object layout and hardware
communication resources are readily available. We implement a subset of Java's
object serialization protocol in native code, using the Java Native Interface
(JNI) and JVM internals. Experiments show that our approach is up to eight
times faster than Java's original object serialization protocol for array
objects. Also for linked data structures, our approach obtains a moderate
speedup and better scalability. Evaluation of our object serialization
implementation in an RMI framework indicates that a higher throughput can be
obtained. Parallel applications, written using RMI, obtain better speedups and
scalability when this more efficient object serialization is used.
- Fabian Breg, Constantine D. Polychronopoulos
- University of Illinois, Urbana-Champaign
- email: breg@csrd.uiuc.edu
- Submitted August 31 2001
- Original PDF: C574breg/c574paper.pdf