Java Grande/ISCOPE 2001 Special Issue of Concurrency and
Computation: Practice and Experience
Call for papers: Special Journal
Issue for Java Grande/ISCOPE 2001
- C559: Kava: A Java dialect with a uniform object model for
lightweight classes
- Abstract: Object-oriented programming languages have
always distinguished between primitive and user-defined
data types, and in the case of languages like C++ and Java, the primitives are
not even treated as objects, further fragmenting the programming model. The
distinction is especially problematic when a particular programming community
requires primitive-level support for a new data type, as for complex,
intervals, fixed-point numbers, and so on. We present Kava, a design for a
backward-compatible version of Java that solves the problem of programmable
lightweight objects in a much more aggressive and uniform manner than previous
proposals. In Kava, there are no primitive types; instead, object-oriented
programming is provided down to the level of single bits, and types such as int
can be explicitly programmed within the language. While the language maintains
a uniform object reference semantics, efficiency is obtained by making heavy
use of unboxing and semantic expansion. We describe Kava as a dialect of the
Java language, show how it can be used to define various primitive types,
describe how it can be translated into Java, and compare it to other approaches
to lightweight objects.
- David F. Bacon
- IBM T.J. Watson Research Center P.O. Box 704, Yorktown Heights,
NY 10598, U.S.A.
- email: dfb@watson.ibm.com
- Submitted July 31 2001
- Original PDF: C559bacon/c559Bacon02Kava.pdf
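The boxing overhead that C559's Kava design targets can be illustrated in plain Java (this is an illustrative sketch, not Kava syntax): a user-defined Complex type must heap-allocate a fresh object for every arithmetic result, which is exactly the cost Kava's unboxing and semantic expansion aim to eliminate for lightweight value types.

```java
// Illustrative sketch in plain Java, NOT Kava: a user-defined complex
// number. Each operation allocates a new object, unlike a primitive.
public class Complex {
    public final double re, im;
    public Complex(double re, double im) { this.re = re; this.im = im; }
    public Complex add(Complex o) { return new Complex(re + o.re, im + o.im); }
    public Complex mul(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }
    public static void main(String[] args) {
        Complex a = new Complex(1, 2), b = new Complex(3, -1);
        Complex s = a.add(b);                   // allocates a third object
        System.out.println(s.re + " " + s.im);  // 4.0 1.0
    }
}
```

In Kava, a type like this could be declared as a lightweight class and compiled down to unboxed fields, so the allocation per operation disappears.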
- C560:
- Abstract: Finding bugs due to race conditions in
multi-threaded programs is difficult, mainly because there are many possible
interleavings, any of which may contain a fault. In this work we present a
methodology for testing multi-threaded programs which has minimal impact on the
user and is likely to find interleaving bugs. Our method reruns existing tests
in order to detect synchronization faults. We found that a single test executed
a number of times in a controlled environment may be as effective in finding
synchronization faults as many different tests. This saves a great deal of
resources, since tests are very expensive to write and maintain. We observed that
simply re-running tests, without ensuring in some way that the interleaving
will change, yields almost no benefits. We implemented the methodology in our
test generation tool -- ConTest. ConTest combines the replay algorithm, which
is essential for debugging, with our interleaving test generation heuristics.
ConTest also contains an instrumentation engine, a coverage analyzer, and a
race detector (not finished yet) that enhance bug detection capabilities. The
greatest advantage of ConTest, besides finding bugs of course, is its minimal
effect on the user. When ConTest is combined into the test harness, the user
may not even be aware that ConTest is being used.
- Orit Edelstein, Eitan Farchi, Evgeny Goldin, Yarden Nir, Gil
Ratsaby, and Shmuel Ur
- IBM Research Laboratory in Haifa
- email: ur@il.ibm.com
- Submitted August 16 2001
- Original PDF: C560ur/c560conTestExp.pdf
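The core idea behind C560's interleaving heuristics can be sketched in a few lines of plain Java (a toy illustration, not ConTest's actual implementation): instrumentation inserts "noise" calls at shared-memory accesses, so re-running the same test explores different schedules and surfaces races that plain reruns almost never hit.

```java
import java.util.Random;

// Toy sketch of interleaving noise, NOT ConTest's real engine:
// a noise() call is inserted before each shared access so that
// repeated runs of the same test take different interleavings.
public class NoiseDemo {
    static final Random rnd = new Random();
    static int counter = 0;   // shared, deliberately unsynchronized

    // The kind of call an instrumentation engine would insert.
    static void noise() {
        if (rnd.nextInt(4) == 0) Thread.yield();
        if (rnd.nextInt(16) == 0) {
            try { Thread.sleep(1); } catch (InterruptedException e) { }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < 1000; i++) {
                noise();     // perturb the schedule
                counter++;   // racy read-modify-write
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start(); t1.join(); t2.join();
        // With noise, lost updates (counter < 2000) surface far more often.
        System.out.println("counter = " + counter);
    }
}
```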
- C561: Integration and Application of TAU in Parallel Java
Environments
- Abstract: Parallel Java environments present challenging
problems for performance tools because of Java's rich language system and its
multi-level execution platform combined with the integration of native-code
application libraries and parallel runtime software. In addition to the desire
to provide robust performance measurement and analysis capabilities for the
Java language itself, the coupling of different software execution contexts
under a uniform performance model needs careful consideration of how events of
interest are observed and how cross-context parallel execution information is
linked. This paper relates our experience in extending the TAU performance
system to a parallel Java environment based on mpiJava. We describe the
complexities of the instrumentation model used, how performance measurements
are made, and the overhead incurred. A parallel Java application simulating the
game of Life is used to show the performance system's capabilities.
- Sameer Shende and Allen D. Malony
- Department of Computer and Information Science University of
Oregon, Eugene, Oregon
- email: sameer@cs.uoregon.edu
- Submitted August 24 2001
- Original PDF: C561shende/shende.pdf
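The measurement model C561 builds on, timed enter/exit events per routine, can be sketched in a few lines (illustrative only; this is not the TAU API, whose instrumentation spans Java, JNI, and MPI layers):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of enter/exit profiling events, NOT the TAU API:
// each instrumented routine reports entry and exit, and the
// profiler accumulates inclusive time per routine name.
public class TinyProfiler {
    static final Map<String, Long> totals = new HashMap<>();
    static final Map<String, Long> starts = new HashMap<>();

    static void enter(String name) { starts.put(name, System.nanoTime()); }
    static void exit(String name) {
        long dt = System.nanoTime() - starts.get(name);
        totals.merge(name, dt, Long::sum);  // accumulate inclusive time
    }

    static void work() {
        enter("work");
        double s = 0;
        for (int i = 1; i < 100000; i++) s += 1.0 / i;
        exit("work");
    }

    public static void main(String[] args) {
        work();
        work();
        System.out.println("work: " + totals.get("work") + " ns over 2 calls");
    }
}
```

A real system must additionally merge such per-thread event streams across JVM threads and MPI processes, which is the cross-context linking problem the paper discusses.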
- C562: High-Performance Java Codes for Computational Fluid
Dynamics
- Abstract: The computational science community is reluctant
to write large-scale computationally-intensive applications in Java due to
concerns over Java's poor performance, despite the claimed software
engineering advantages of its object-oriented features. Naive Java
implementations of numerical algorithms can perform poorly compared to
corresponding Fortran or C implementations. To achieve high performance, Java
applications must be designed with good performance as a primary goal. This
paper presents the object-oriented design and implementation of two real-world
applications from the field of Computational Fluid Dynamics (CFD): a
finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and
an unstructured mesh adaptation algorithm (2D TAG, from NASA Ames Research
Center). This work builds on our previous experience with the design of
high-performance numerical libraries in Java. We examine the performance of the
applications using the currently available Java infrastructure and show that
the Java version of the flow solver LAURA performs almost within a factor of 2
of the original procedural version. Our Java version of the mesh adaptation
algorithm 2D TAG performs within a factor of 1.5 of its original procedural
version on certain platforms. Our results demonstrate that object-oriented
software design principles are not necessarily inimical to high performance.
- Christopher J. Riley, Siddhartha Chatterjee, Rupak Biswas
- Blue Ridge Numerics, Inc. Charlottesville, VA 22901, IBM T. J.
Watson Research Center Yorktown Heights, NY 10598, NAS Systems Division, MS
T27A-1 NASA Ames Research Center Moffett Field, CA 94035
- email: chris@brni.com
- Submitted August 30 2001
- Original PDF: C562riley/c562riley.pdf
- C563: Automatic Translation of Fortran to JVM Bytecode
- Abstract: This paper reports on the design of a
Fortran-to-Java translator whose target language is the instruction set of the
Java Virtual Machine. The goal of the translator is to generate Java
implementations of legacy Fortran numerical codes in a consistent and reliable
fashion. The benefits of directly generating bytecode are twofold. First, it
provides a much more straightforward and efficient mechanism for translating
Fortran GOTO statements. Second, it provides a framework for pursuing
various compiler optimizations, which could be beneficial not only to our
project, but to the Java community as a whole.
- Keith Seymour, Jack Dongarra
- Department of Computer Science University of Tennessee,
Knoxville Knoxville, TN 37996
- email: seymour@cs.utk.edu
- Submitted August 31 2001
- Original PDF: C563seymour/f2jreport.pdf
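Why direct bytecode generation simplifies GOTO translation (as C563 argues): Java source has no goto statement, so a source-level translator must simulate labels with a state-machine pattern like the sketch below, whereas the JVM's own goto instruction maps Fortran jumps one-to-one. This hypothetical example computes 1+2+...+n the way a Fortran `10 CONTINUE ... IF (I.LE.N) GOTO 10` loop would.

```java
// Sketch of simulating Fortran GOTO in Java source via a
// loop-plus-switch state machine -- the workaround that emitting
// JVM "goto" instructions directly avoids.
public class GotoSim {
    public static int sumTo(int n) {
        int i = 1, sum = 0;
        int label = 10;                        // simulated program counter
        loop:
        while (true) {
            switch (label) {
                case 10:                       // 10 CONTINUE
                    sum += i;
                    i++;
                    label = (i <= n) ? 10 : 20; // IF (I.LE.N) GOTO 10
                    break;
                case 20:                       // 20 ... exit the loop
                    break loop;
            }
        }
        return sum;
    }
    public static void main(String[] args) {
        System.out.println(sumTo(10));  // 55
    }
}
```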
- C564: Performance evaluation of Java against C and Fortran
for Scientific Applications
- Abstract: Increasing interest is being shown in the use
of Java for scientific applications. The Java Grande benchmark suite was
designed with such applications primarily in mind. The perceived lack of
performance of Java still deters many potential users, despite recent advances
in just-in-time (JIT) and adaptive compilers. There are, however, few benchmark
results available comparing Java to more traditional languages such as C and
Fortran. To address this issue, a subset of the Java Grande Benchmarks has
been re-written in C and Fortran, allowing direct performance comparisons
between the three languages. The performance of a range of Java execution
environments and of C and Fortran compilers has been tested across a number of
platforms using the suite. These results demonstrate that on some platforms
(notably Intel Pentium) the performance gap is now quite small.
- J. M. Bull, L. A. Smith, C. Ball, L. Pottage, R. Freeman
- Edinburgh Parallel Computing Centre, James Clerk Maxwell
Building, The King's Buildings, The University of Edinburgh, Mayfield
Road, Edinburgh EH9 3JZ, Scotland, U.K.
- email: epcc-javagrande@epcc.ed.ac.uk
- Submitted August 30 2001
- Original PDF: C564bull/jgflangcomp_ccpe.pdf
- C565: Sapphire: Copying GC Without Stopping the World
- Abstract: Many concurrent garbage collection (GC)
algorithms have been devised, but few have been implemented and evaluated,
particularly for the Java programming language. Sapphire is an algorithm we
have devised for concurrent copying GC. Sapphire stresses minimizing the amount
of time any given application thread may need to block to support the
collector. In particular, Sapphire is intended to work well in the presence of
a large number of application threads, on small- to medium-scale shared memory
multiprocessors. A specific problem that Sapphire addresses is not stopping all
threads while thread stacks are adjusted to account for copied objects (in GC
parlance, the "flip" to the new copies). Sapphire extends previous algorithms,
and is most closely related to replicating copying collection, a GC technique
in which application threads observe and update primarily the old copies of
objects. The key innovations of Sapphire are: (1) the ability to "flip" one
thread at a time (changing the thread's view from the old copies of
objects to the new copies), as opposed to needing to stop all threads and flip
them at the same time; and (2) avoiding a read barrier.
- Richard L. Hudson, J. Eliot B. Moss
- Intel Corporation 2200 Mission College Blvd. Santa Clara, CA
95052-8119, Dept. of Computer Science Univ. of Massachusetts Amherst, MA
01003-4610
- email: moss@kiwi.cs.umass.edu
- Submitted August 30 2001
- Original PDF: C565moss/c565ccpe-submit.pdf
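The replicating-copying idea underlying C565 can be shown schematically (this is a single-threaded toy, not Sapphire's real barrier, which must also handle concurrency and the per-thread flip): while the collector is copying an object, mutator writes go through a write barrier that updates both the old copy and, if present, the new replica, so threads can keep using old copies and be flipped one at a time.

```java
// Schematic sketch of a replicating-GC write barrier, NOT Sapphire's
// actual implementation: stores update the old copy and mirror into
// the replica the collector is building.
public class ReplicatingBarrier {
    static class Obj {
        int[] fields;
        Obj replica;                    // set by the collector mid-copy
        Obj(int n) { fields = new int[n]; }
    }

    // All mutator stores are routed through this barrier.
    static void write(Obj o, int slot, int value) {
        o.fields[slot] = value;         // old copy stays current
        if (o.replica != null)          // keep the new copy in sync
            o.replica.fields[slot] = value;
    }

    public static void main(String[] args) {
        Obj old = new Obj(4);
        write(old, 0, 7);               // before copying starts
        old.replica = new Obj(4);       // collector begins replication
        System.arraycopy(old.fields, 0, old.replica.fields, 0, 4);
        write(old, 1, 9);               // during copying: both updated
        System.out.println(old.replica.fields[1]);  // 9
    }
}
```

Because reads still go straight to the old copy, no read barrier is needed, which is the paper's second key point.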
- C566: Spar: a set of extensions to Java for scientific
computation
- Abstract: In this paper we present a set of language
extensions that improve the expressiveness and performance of Java for
scientific computation. First of all, the language extensions allow the
manipulation of multi-dimensional arrays to be expressed more naturally, and to
be implemented more efficiently. Furthermore, data-parallel programming is
supported, allowing efficient parallelization of a large class of operations on
arrays. We also provide language extensions to construct specialized array
representations, such as symmetric, block, and sparse matrices. These
extensions are: tuples, parameterized types, array subscript overloading, and
the inline modifier. These extensions are not only useful to construct special
array representations, but are also useful in their own right. In particular,
tuples are a useful addition to the language. Finally, we add complex numbers
as a primitive type to the language. We evaluate our language extensions using
performance results. We also compare relevant code fragments of our extended
language with standard Java implementations and language extensions proposed by
others.
- C. van Reeuwijk, F. Kuijlman, and H.J. Sips
- Delft University of Technology
- email: C.vanReeuwijk@its.tudelft.nl
- Submitted August 31 2001
- Original PDF: C566vanreeuwijk/reeuwijk.pdf
- C567: Runtime Optimizations for a Java DSM
Implementation
- Abstract: Jackal is a fine-grained distributed shared
memory implementation of the Java programming language. Jackal implements
Java's memory model and allows multithreaded Java programs to run
unmodified on distributed-memory systems. This paper focuses on Jackal's
runtime system, which implements a multiple-writer, home-based consistency
protocol. Protocol actions are triggered by software access checks that
Jackal's compiler inserts before object and array references. To reduce
access-check overhead, the compiler exploits source-level information and
performs extensive static analysis to optimize, lift, and remove access checks.
We describe optimizations for Jackal's run-time system, which mainly
consist of discovering opportunities to dispense with flushing of cached data.
We give performance results for different runtime optimizations, and compare
their impact with the impact of one compiler optimization. We find that our
runtime optimizations are necessary for good Jackal performance, but only in
conjunction with the Jackal compiler optimizations. As a yardstick, we compare
the performance of Java applications run on Jackal with the performance of
equivalent applications that use a fast implementation of Java's Remote
Method Invocation (RMI) instead of shared memory.
- R. Veldema, R.F.H. Hofman, R.A.F. Bhoedjang, H.E. Bal
- Department of Computer Science Vrije Universiteit Amsterdam, The
Netherlands, Department of Computer Science, Cornell University, Ithaca, NY,
USA
- email: rveldema@cs.vu.nl
- Submitted August 31 2001
- Original PDF: C567veldema/c567paper.pdf
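The access checks at the center of C567 can be sketched at a very high level (illustrative only; Jackal's real checks operate on cached memory regions with a home-based protocol, not a Java Set): before each object or array use, the inserted check verifies that a valid local copy exists and fetches one from the object's home node otherwise.

```java
import java.util.HashSet;
import java.util.Set;

// Highly simplified sketch of compiler-inserted DSM access checks,
// NOT Jackal's implementation: a miss triggers a (simulated) fetch
// of the object from its home node.
public class AccessCheckDemo {
    static final Set<Object> validLocalCopies = new HashSet<>();
    static int fetches = 0;             // counts simulated network fetches

    static void accessCheck(Object o) {
        if (!validLocalCopies.contains(o)) {
            fetches++;                  // fetch from home node (simulated)
            validLocalCopies.add(o);
        }
    }

    public static void main(String[] args) {
        Object a = new Object();
        accessCheck(a);  // first use: miss, fetch
        accessCheck(a);  // cached hit: this is the redundant check the
                         // compiler's static analysis lifts or removes
        System.out.println("fetches = " + fetches);  // 1
    }
}
```

The runtime optimizations in the paper then concern when cached copies like these must be flushed back to their homes.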
- C568: Supporting Multidimensional Arrays in Java
- Abstract: The lack of direct support for multidimensional
arrays in Java has been recognized as a major deficiency in the
language's applicability to numerical computing. It has been shown that,
when augmented with multidimensional arrays, Java can achieve very
high-performance for numerical computing through the use of compiler techniques
and efficient implementations of aggregate array operations. Three approaches
have been discussed in the literature for extending Java with support for
multidimensional arrays: class libraries that implement these structures;
relying on the JVM to recognize those arrays of arrays that are being used to
simulate multidimensional arrays; and extending the Java language with new
syntactic constructs for multidimensional arrays and directly compiling those
constructs to byte-code. This paper presents a balanced presentation of the
pros and cons of each technique in the areas of functionality, language and
virtual machine impact, implementation effort, and effect on performance. We
show that the best choice depends on the relative importance attached to the
different metrics, and thereby provide a common ground for a rational
discussion of the technique that is in the best interests of the Java community
at large, and the growing community of numerical Java programmers.
- Jose E. Moreira, Samuel P. Midkiff, Manish Gupta
- IBM T. J. Watson Research Center Yorktown Heights, NY
10598-0218
- email: jmoreira@us.ibm.com
- Submitted August 31 2001
- Original PDF: C568moreira/c568paper.pdf
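The first of the three approaches C568 compares, a class library for multidimensional arrays, can be sketched minimally: a true rectangular 2-D array stored in one contiguous 1-D array with explicit index arithmetic, in contrast to Java's native `double[][]`, whose rows may be ragged and scattered in memory. (The class name and API below are hypothetical, not from any of the libraries the paper evaluates.)

```java
// Minimal sketch of the class-library approach to multidimensional
// arrays: contiguous storage plus index arithmetic, guaranteeing a
// rectangular shape that compilers can exploit.
public class DoubleMatrix {
    private final double[] data;        // one contiguous block
    public final int rows, cols;

    public DoubleMatrix(int rows, int cols) {
        this.rows = rows; this.cols = cols;
        this.data = new double[rows * cols];
    }
    public double get(int i, int j) { return data[i * cols + j]; }
    public void set(int i, int j, double v) { data[i * cols + j] = v; }

    public static void main(String[] args) {
        DoubleMatrix m = new DoubleMatrix(3, 4);
        m.set(2, 1, 5.0);
        System.out.println(m.get(2, 1));  // 5.0
    }
}
```

The paper's trade-off discussion is visible even here: the library needs no JVM or language changes, but every element access pays for a method call unless the JIT inlines it, which is why the other two approaches exist.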
- C569: CartaBlanca - A Pure-Java, Component-based Systems
Simulation Tool for Coupled Nonlinear Physics on Unstructured Grids - An
Update
- Abstract: This paper describes a component-based
nonlinear physical system simulation prototyping package written entirely in
Java using object-oriented design. The package provides scientists and
engineers a "developer-friendly" software environment for large-scale
computational algorithm and physical model development. The software design
centers on the Jacobian-Free Newton-Krylov solution method surrounding a
finite-volume treatment of conservation equations. This enables a clean
component-like implementation. We first provide motivation for the development
of the software and then discuss software structure. Discussion includes a
description of the use of Java's built-in thread facility that enables
parallel, shared-memory computations on a wide variety of unstructured grids
with triangular, quadrilateral, tetrahedral and hexahedral elements. We also
discuss the use of Java's inheritance mechanism in the construction of a
hierarchy of physics systems objects and linear and nonlinear solver objects
that simplify development and foster software re-use. We also provide a brief
review of the Jacobian-Free Newton-Krylov nonlinear system solution method and
discuss how it fits into our design. Following this, we show results from
example calculations and then discuss plans including the extension of the
software to distributed memory computer systems.
- W. B. VanderHeyden, E. D. Dendy, N. T. Padial-Collins
- Los Alamos National Laboratory, Theoretical Division and Los
Alamos Computer Science Institute Los Alamos, NM 87545
- email: wbv@lanl.gov
- Submitted August 31 2001
- Original PDF: C569vanderheyden/c569cartablanca.pdf
(larger PDF or Word available)
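The matrix-free idea at the heart of C569's Jacobian-Free Newton-Krylov method can be shown in a small numeric sketch: a Krylov solver only needs Jacobian-vector products, and these are approximated by a finite difference of the residual F, so the Jacobian matrix is never formed or stored. (The residual function below is a made-up two-equation example, not one of CartaBlanca's physics systems.)

```java
// Numeric sketch of the matrix-free Jacobian-vector product used in
// Jacobian-Free Newton-Krylov: J(u)*v ~= (F(u + eps*v) - F(u)) / eps.
public class JfnkSketch {
    // Hypothetical nonlinear residual: F(u) = (u0*u0 - 2, u1 - 1).
    static double[] F(double[] u) {
        return new double[] { u[0] * u[0] - 2.0, u[1] - 1.0 };
    }

    // Finite-difference approximation of the Jacobian-vector product.
    static double[] jacVec(double[] u, double[] v, double eps) {
        double[] up = new double[u.length];
        for (int i = 0; i < u.length; i++) up[i] = u[i] + eps * v[i];
        double[] fu = F(u), fup = F(up), jv = new double[u.length];
        for (int i = 0; i < u.length; i++) jv[i] = (fup[i] - fu[i]) / eps;
        return jv;
    }

    public static void main(String[] args) {
        double[] u = { 1.0, 0.0 }, v = { 1.0, 1.0 };
        double[] jv = jacVec(u, v, 1e-7);
        // Exact J(u)*v = (2*u0*v0, v1) = (2.0, 1.0)
        System.out.printf("%.4f %.4f%n", jv[0], jv[1]);
    }
}
```

This is what makes the component-like design possible: a physics component only has to supply F, and any Krylov solver object can be composed with it.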
- C570: CCJ: Object-based Message Passing and Collective
Communication in Java
- Abstract: CCJ is a communication library that adds
MPI-like message passing and collective operations to Java. Rather than trying
to adhere to the precise MPI syntax, CCJ aims at a clean integration of
communication into Java's object-oriented framework. For example, CCJ uses
thread groups to support Java's multithreading model and it allows any
data structure (not just arrays) to be communicated. CCJ is implemented
entirely in Java, on top of RMI, so it can be used with any Java virtual
machine. The paper discusses three parallel Java applications that use
collective communication. It compares the performance (on top of a Myrinet
cluster) of CCJ, RMI and mpiJava versions of these applications, and also
compares the code complexity of the CCJ and RMI versions. The results show that
the CCJ versions are significantly simpler than the RMI versions and obtain a
good performance. A detailed performance comparison between CCJ and mpiJava is
given using the Java Grande Forum MPJ benchmark suite.
- Arnold Nelisse, Jason Maassen, Thilo Kielmann, Henri E. Bal
- Division of Mathematics and Computer Science, Vrije
Universiteit, Amsterdam, The Netherlands
- email: kielmann@cs.vu.nl
- Submitted August 31 2001
- Original PDF: C570bal/c570cpe01.pdf
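The flavor of C570's group-based collectives can be conveyed with a shared-memory toy (CCJ itself runs over RMI across JVMs and its API differs; this sketch only shows the semantics): every member of a thread group contributes a value, and an all-reduce returns the combined result to each caller once all members have arrived.

```java
// Shared-memory toy sketch of an MPI-style allReduce over a thread
// group, NOT the CCJ API: members rendezvous, the last arrival
// completes the reduction, and every member gets the sum.
public class MiniAllReduce {
    private final int members;
    private int arrived = 0;
    private double sum = 0, result = 0;

    public MiniAllReduce(int members) { this.members = members; }

    public synchronized double allReduce(double value) throws InterruptedException {
        sum += value;
        arrived++;
        if (arrived == members) {       // last member completes the round
            result = sum;
            notifyAll();
        } else {
            while (arrived < members) wait();
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        MiniAllReduce group = new MiniAllReduce(3);
        Runnable member = () -> {
            try {
                System.out.println("sum = " + group.allReduce(1.0));
            } catch (InterruptedException e) { }
        };
        Thread[] ts = { new Thread(member), new Thread(member), new Thread(member) };
        for (Thread t : ts) t.start();
        for (Thread t : ts) t.join();   // each member prints sum = 3.0
    }
}
```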
- C571: Providing Soft Real-time QoS Guarantees for Java
Threads
- Abstract: The Java platform has many characteristics that
make it very desirable for integrated continuous media processing.
Unfortunately, it lacks the necessary CPU resource management facility to
support quality of service guarantees for soft real-time multimedia tasks. In
this paper, we present our new Java Virtual Machine, Q-JVM, which brings CPU
resource management to the Java platform. Q-JVM is based on Sun's JVM
version 1.1.5. It implements an enhanced version of the MTR-LS algorithm in its
thread scheduler. Combined with admission control that could be implemented in
an application-level resource manager, it is able to support QoS parameters
such as fairness, bandwidth partitioning and delay bound guarantees, as well as
the cumulative service guarantee. Our test results show that Q-JVM is backward
compatible with the standard JVM from Sun, has low scheduling overhead, and is
able to provide QoS guarantees as specified.
- James C. Pang, Gholamali C. Shoja, and Eric G. Manning
- Department of Computer Science University of Victoria, Victoria,
BC, Canada, V8W 3P6
- email: jcpang@redback.com
- Submitted August 31 2001
- Original PDF: C571pang/c571C_and_C_draft.pdf
- C572: Supporting Dynamic Parallel Object Arrays
- Abstract: We present efficient support for generalized
arrays of parallel data driven objects. Array elements are regular C++ objects,
and are scattered across the parallel machine. An individual element is
addressed by its "index", which can be an arbitrary object rather than a simple
integer. For example, an array index can be a series of numbers, supporting
multidimensional sparse arrays; a bit vector, supporting collections of
quadtree nodes; or a string. Methods can be invoked on any individual array
element from any processor, and the elements can participate in reductions and
broadcasts. Individual elements can be created or deleted dynamically at any
time. Most importantly, the elements can migrate from processor to processor at
any time. The paper discusses support for message delivery and collective
operations in the face of such dynamic behavior. The migration capabilities of
array elements have proven extremely useful, for example, in implementing
flexible load balancing strategies and for exploiting workstation clusters
adaptively.
- Orion S. Lawlor and Laxmikant V. Kalé
- Computer Science Department, Univ. of Illinois at
Urbana-Champaign, 1304 W. Springfield, 3315 DCL Urbana, IL 61801-2987
- email: olawlor@students.uiuc.edu
- Submitted August 31 2001
- Original PDF: C572lawlor/c572lawlor_arrays.pdf
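C572's central abstraction, an array addressed by arbitrary index objects, can be sketched in a single process (the real system is C++, distributes elements across processors, and supports migration, none of which this shows): any object with proper equality semantics can serve as an index, which is what enables sparse multidimensional arrays and quadtree-node collections.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Single-process sketch of a "generalized array" keyed by arbitrary
// index objects, NOT the paper's distributed C++ implementation.
public class ObjectArray<E> {
    private final Map<Object, E> elements = new HashMap<>();

    public void insert(Object index, E element) { elements.put(index, element); }
    public E get(Object index) { return elements.get(index); }
    public int size() { return elements.size(); }

    public static void main(String[] args) {
        ObjectArray<String> a = new ObjectArray<>();
        // A multidimensional sparse index: a sequence of numbers.
        a.insert(Arrays.asList(3, 0, 7), "cell");
        // A string index works just as well.
        a.insert("root", "named element");
        System.out.println(a.get(Arrays.asList(3, 0, 7)));  // cell
    }
}
```

In the distributed setting, the index additionally determines (via hashing) which processor currently hosts the element, and that mapping must stay correct as elements migrate, which is the hard part the paper addresses.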
- C573: Platform Independent Dynamic Java Virtual Machine
Analysis: the Java Grande Forum Benchmark Suite
- Abstract: In this paper we present a platform independent
analysis of the dynamic profiles of Java programs when executing on the Java
Virtual Machine. The Java programs selected are taken from the Java Grande
Forum benchmark suite, and five different Java-to-bytecode compilers are
analysed. The results presented describe the dynamic instruction usage
frequencies, as well as the sizes of the local variable, parameter and operand
stacks during execution on the JVM. These results, presenting a picture of the
actual (rather than presumed) behaviour of the JVM, have implications for
the coverage aspects of the Java Grande benchmark suites, for the performance
of the Java-to-bytecode compilers, and for the design of the JVM.
- David Gregg, James Power, John Waldron
- Department of Computer Science, Trinity College, Dublin 2,
Ireland; Dept of Computer Science, National University of Ireland, Maynooth,
Co. Kildare, Ireland; Department of Computer Science, Trinity College, Dublin
2, Ireland.
- email: John.Waldron@cs.tcd.ie
- Submitted August 31 2001
- Original PDF: C573waldron/c573jour1.pdf
- C574: Java Virtual Machine Support for Object
Serialization
- Abstract: Distributed computing has become increasingly
popular in the high performance community. Java's Remote Method Invocation
(RMI) provides a simple, yet powerful method for implementing parallel
algorithms in Java. The performance of RMI has been less than adequate,
however, and object serialization is often identified as a major performance
inhibitor. We believe that object serialization is best performed in the Java
Virtual Machine (JVM), where information regarding object layout and hardware
communication resources are readily available. We implement a subset of Java's
object serialization protocol in native code, using the Java Native Interface
(JNI) and JVM internals. Experiments show that our approach is up to eight
times faster than Java's original object serialization protocol for array
objects. For linked data structures, our approach also obtains a moderate
speedup and better scalability. Evaluation of our object serialization
implementation in an RMI framework indicates that a higher throughput can be
obtained. Parallel applications, written using RMI, obtain better speedups and
scalability when this more efficient object serialization is used.
- Fabian Breg, Constantine D. Polychronopoulos
- University of Illinois, Urbana-Champaign
- email: breg@csrd.uiuc.edu
- Submitted August 31 2001
- Original PDF: C574breg/c574paper.pdf
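Why array serialization is such a profitable target for C574's optimization can be seen even at user level (the paper's implementation works inside the JVM via JNI, which this sketch cannot show): the general ObjectOutputStream protocol carries stream headers and class metadata, while a hand-rolled writer for a `double[]` emits just a length and the raw values.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

// User-level sketch contrasting Java's general serialization protocol
// with a specialized raw encoding for double[]; the paper does the
// specialized path inside the JVM with direct access to object layout.
public class SerializeDemo {
    static byte[] defaultSerialize(double[] a) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(a);         // headers + class descriptor + data
        }
        return bos.toByteArray();
    }

    static byte[] rawSerialize(double[] a) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(a.length);         // just a length...
        for (double d : a) dos.writeDouble(d);  // ...and the raw values
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        double[] a = new double[1000];
        System.out.println("default: " + defaultSerialize(a).length + " bytes");
        System.out.println("raw:     " + rawSerialize(a).length + " bytes");
    }
}
```

Inside the JVM, the specialized path can go further still, copying the array body in one block instead of element by element, which is where the reported speedups for arrays come from.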