Concurrency and Computation: Practice and Experience

Published Papers for 2003

2003 Volume 15 Articles

Article ID: CPE708

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A stochastic load balancing algorithm for i-Computing
Volume ID 15
Issue ID 1
Date Jan 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.708
Article ID CPE708
Author Name(s) Yuk Yin Wong 1, Kwong Sak Leung 2, Kin Hong Lee 3
Author Email(s) clevin.wong@cityu.edu.hk1
Affiliation(s) Computing Services Centre, City University of Hong Kong, Tat Chee Avenue, Kln., Hong Kong 1 Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, N.T., Hong Kong 2 3
Keyword(s) Internet computing, stochastic load balancing, distributed systems, Java computing,
Abstract
This paper presents a stochastic dynamic load balancing algorithm for Internet computing, which is a new type of distributed computing involving heterogeneous workstations from different organizations on the Internet. To reflect a practical environment, we assume the system comprises heterogeneous, untrusted and non-dedicated workstations connected by a non-dedicated network. Our algorithm uses the product of the average processing time and the queue length of system jobs as the load index. Dynamic communication delay is included in the execution cost calculation. The transfer policy and the location policy are combined in a stochastic algorithm. State information exchange is done via information feedback and mutual updating. Simulations demonstrate that our algorithm outperforms conventional approaches over a wide range of system parameters. These results are reconfirmed by empirical experiments after implementing the algorithms on the Distributed Java Machine global virtual machine. Copyright © 2003 John Wiley & Sons, Ltd.
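
The load index and stochastic location policy described above can be illustrated with a short sketch. This is a minimal illustration under assumed names (NodeState and StochasticLocator are not from the paper), and it omits the communication-delay term and the information-exchange machinery the algorithm also uses:

    // Hypothetical sketch: load index = average processing time x queue length,
    // and a target node is chosen with probability weighted by inverse load.
    import java.util.List;
    import java.util.Random;

    final class NodeState {
        final String host;
        final double avgProcessingTime; // seconds per job, as measured
        final int queueLength;          // jobs currently queued
        NodeState(String host, double avgProcessingTime, int queueLength) {
            this.host = host;
            this.avgProcessingTime = avgProcessingTime;
            this.queueLength = queueLength;
        }
        double loadIndex() {            // +1 keeps the index positive for idle nodes
            return avgProcessingTime * (queueLength + 1);
        }
    }

    final class StochasticLocator {
        private final Random rng = new Random();
        // Pick a target node with probability proportional to 1 / loadIndex.
        String pickTarget(List<NodeState> nodes) {
            double total = 0.0;
            for (NodeState n : nodes) total += 1.0 / n.loadIndex();
            double r = rng.nextDouble() * total, acc = 0.0;
            for (NodeState n : nodes) {
                acc += 1.0 / n.loadIndex();
                if (r <= acc) return n.host;
            }
            return nodes.get(nodes.size() - 1).host;
        }
    }

Weighting candidates by the inverse of their load index keeps the selection stochastic while still favouring lightly loaded workstations.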

Article ID: CPE706

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A comparison of concurrent programming and cooperative multithreading
Volume ID 15
Issue ID 1
Date Jan 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.706
Article ID CPE706
Author Name(s) Ronald A. Olsson 1, Tiejun Li 2, Eugene F. Fodor 3, Aaron W. Keen 4
Author Email(s) olsson@cs.ucdavis.edu1
Affiliation(s) Department of Computer Science, University of California, Davis, CA 95616, U.S.A. 1 2 3 4
Keyword(s) cooperative multithreading, concurrent programming, parallel and distributed programming languages, synchronization mechanisms, synchronization optimization,
Abstract
This paper presents a comparison of the cooperative multithreading model with the general concurrent programming model. It focuses on the execution time performance of a range of standard concurrent programming applications. The overall results are mixed. In some cases, programs written in the cooperative multithreading model outperform those written in the general concurrent programming model. The contributions of this paper are twofold. First, it presents a thorough analysis of the performance of applications in the different models, i.e. it explains the criteria that determine when a program in one model will outperform an equivalent program in the other. Second, it examines the tradeoffs in writing programs in the different programming styles. In some cases, better performance comes at the cost of more complicated code. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE704

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Virtual Laboratory: a toolset to enable distributed molecular modelling for drug design on the World Wide Grid
Volume ID 15
Issue ID 1
Date Jan 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.704
Article ID CPE704
Author Name(s) Rajkumar Buyya 1, Kim Branson 2, Jon Giddy 3, David A. Abramson 4
Author Email(s) rajkumar@buyya.com1
Affiliation(s) Grid Computing and Distributed Systems Laboratory, Department of Computer Science and Software Engin 1Structural Biology, Walter and Eliza Hall Institute, Royal Parade, Parkville, Melbourne, Australia 2Welsh Science Centre, Department of Computer Science, Cardiff University, Cardiff, U.K. 3School of Computer Science and Software Engineering, Monash University, Caulfield Campus, Melbourne 4
Keyword(s) Grid computing, drug design, application scheduling, grid economy,
Abstract
Computational Grids are emerging as a new paradigm for sharing and aggregation of geographically distributed resources for solving large-scale compute- and data-intensive problems in science, engineering and commerce. However, application development, resource management and scheduling in these environments is a complex undertaking. In this paper, we illustrate the development of a Virtual Laboratory environment by leveraging existing Grid technologies to enable molecular modelling for drug design on geographically distributed resources. It involves screening millions of compounds in the chemical database (CDB) against a protein target to identify those with potential use for drug design. We have used the Nimrod/G parameter specification language to transform the existing molecular docking application into a parameter sweep application for executing on distributed systems. We have developed new tools for enabling access to ligand records (molecules) in the CDB from remote resources. The Nimrod/G resource broker, along with the molecule CDB data broker, is used for scheduling and on-demand processing of docking jobs on the World Wide Grid (WWG) resources. The results demonstrate the ease of use and power of the Nimrod/G and virtual laboratory tools for grid computing. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE709

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Object oriented distributed computing based on remote class reference
Volume ID 15
Issue ID 1
Date Jan 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.709
Article ID CPE709
Author Name(s) David W. Walker 1, Yan Huang 2, Omer F. Rana 3
Author Email(s) David.W.Walker@cs.cf.ac.uk1
Affiliation(s) Department of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, U.K 1 2 3
Keyword(s) Java, RMI, distributed computing, remote class reference,
Abstract
Java RMI, Jini and CORBA provide effective mechanisms for implementing a distributed computing system. Recently many numerical libraries have been developed that take advantage of Java as an object-oriented and portable language. The widely used client-server method limits the extent to which the benefits of the object-oriented approach can be exploited because of the difficulties arising when a remote object is the argument or return value of a remote or local method. In this paper this problem is solved by introducing a data object that stores the data structure of the remote object and related access methods. By using this data object, the client can easily instantiate a remote object, and use it as the argument or return value of either a local or remote method. Copyright © 2003 John Wiley & Sons, Ltd.
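
The situation the paper addresses can be pictured with a minimal Java RMI sketch (interface and class names are illustrative only): a remote object is used as the argument and return value of another remote method, which is exactly where the plain client-server style becomes awkward.

    // Hypothetical RMI sketch: a remote object (Matrix) crosses the wire as an
    // argument and another comes back as the result of a remote call.
    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.server.UnicastRemoteObject;

    interface Matrix extends Remote {
        double get(int i, int j) throws RemoteException;
        void set(int i, int j, double v) throws RemoteException;
    }

    interface Solver extends Remote {
        // Remote object reference as argument and as return value.
        Matrix invert(Matrix input) throws RemoteException;
    }

    final class MatrixImpl extends UnicastRemoteObject implements Matrix {
        private final double[][] data;
        MatrixImpl(int n) throws RemoteException { data = new double[n][n]; }
        public double get(int i, int j) { return data[i][j]; }
        public void set(int i, int j, double v) { data[i][j] = v; }
    }

Every element access on such an argument becomes a remote call; the data object proposed in the paper is meant to avoid exactly this cost while keeping the object-oriented interface.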

Article ID: CPE713

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Modular specification of frame properties in JML
Volume ID 15
Issue ID 2
Date Feb 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.713
Article ID CPE713
Author Name(s) Gary T. Leavens 1, Arnd Poetzsch-Heffter 2, Peter Müller 3
Author Email(s) leavens@cs.iastate.edu1
Affiliation(s) Department of Computer Science, Iowa State University, 229 Atanasoff Hall, Ames, IA 50011-1040, U.S. 1Universität Kaiserslautern, 67653 Kaiserslautern, Germany 2Deutsche Bank AG, Frankfurt, Germany 3
Keyword(s) formal specification, frame problem, depends-clause, modifies-clause, JML, alias control, ownership, Java,
Abstract
We present a modular specification technique for frame properties. The technique uses modifies clauses and abstract fields with declared dependencies. Modularity is guaranteed by a programming model that enforces data abstraction by preventing representation and argument exposure, a semantics of modifies clauses that uses a notion of 'relevant location', and by modularity rules for dependencies. For concreteness, we adapt this technique to the Java Modeling Language, JML. Copyright © 2003 John Wiley & Sons, Ltd.
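
For readers unfamiliar with JML, the following fragment sketches the style of specification involved (the class and its fields are invented for illustration, not taken from the paper): an assignable clause states the frame property, and a model field with a represents clause and an "in" clause plays the role of an abstract location with a declared dependency.

    // Illustrative JML-annotated class; all names are assumed.
    public class Account {
        //@ public model int total;            // abstract location clients specify against

        private int cents; //@ in total;       // declared dependency: cents is in total's datagroup
        //@ private represents total = cents;

        //@ requires amount >= 0;
        //@ assignable total;                  // frame property: only 'total' may be modified
        //@ ensures total == \old(total) + amount;
        public void deposit(int amount) {
            cents += amount;                   // allowed because cents depends on total
        }

        //@ assignable \nothing;               // an observer: modifies no locations
        //@ ensures \result == total;
        public /*@ pure @*/ int balance() {
            return cents;
        }
    }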

Article ID: CPE714

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Improving the official specification of Java bytecode verification
Volume ID 15
Issue ID 2
Date Feb 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.714
Article ID CPE714
Author Name(s) Alessandro Coglio1
Author Email(s) coglio@kestrel.edu1
Affiliation(s) Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304, U.S.A. 1
Keyword(s) Java, bytecode verification, specification,
Abstract
Bytecode verification is the main mechanism to ensure type safety in the Java Virtual Machine. Inadequacies in its official specification may lead to incorrect implementations where security can be broken and/or certain legal programs are rejected. This paper provides a comprehensive analysis of the specification, along with concrete suggestions for improvement. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE711

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Clustering revealed in high-resolution simulations and visualization of multi-resolution features in fluid-particle models
Volume ID 15
Issue ID 2
Date Feb 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.711
Article ID CPE711
Author Name(s) David A. Yuen 1, Krzysztof Boryczko 2, Witold Dzwinel 3
Author Email(s) davey@krissy.msi.umn.edu1
Affiliation(s) Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455-1227, U.S. 1 AGH Institute of Computer Science, al. Mickiewicza 30, 30-059, Kraków, Poland 2 3
Keyword(s) large-scale data sets, visualization, feature extraction, parallel clustering, dissipative particle dynamics, fluid particle model,
Abstract
Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles consisting of between 10^6 and 10^9 particles, which cover 10^5-10^6 time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, for one to look at a large data set at full resolution at all times is not possible and, in fact, not necessary. We have developed an agglomerative clustering technique, based on the concept of a mutual nearest neighbor (MNN). This procedure can be easily adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations, using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept extraction procedure, we may employ this clustering procedure in many other fields such as data mining, earthquake events and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd.
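
The mutual nearest neighbor criterion at the core of the clustering can be sketched as follows. This is a naive test with assumed names, not the paper's parallel algorithm, which avoids brute-force neighbor searches:

    // Particles i and j are mutual nearest neighbors (for a given k) if each
    // lies among the other's k nearest particles.
    final class MnnTest {
        // positions[p] = {x, y, z} of particle p
        static boolean mutualNearestNeighbors(double[][] positions, int i, int j, int k) {
            return rankOfNeighbor(positions, i, j) < k
                && rankOfNeighbor(positions, j, i) < k;
        }

        // How many particles are strictly closer to 'from' than 'to' is.
        private static int rankOfNeighbor(double[][] positions, int from, int to) {
            double d = dist2(positions[from], positions[to]);
            int closer = 0;
            for (int p = 0; p < positions.length; p++) {
                if (p != from && p != to && dist2(positions[from], positions[p]) < d) closer++;
            }
            return closer;
        }

        private static double dist2(double[] a, double[] b) {
            double s = 0.0;
            for (int d = 0; d < a.length; d++) { double t = a[d] - b[d]; s += t * t; }
            return s;
        }
    }

Agglomerative MNN clustering repeatedly merges such mutual-neighbor pairs (or the clusters containing them), which is what makes the procedure usable at several resolution levels.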

Article ID: CPE705

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title MPI-CHECK: a tool for checking Fortran 90 MPI programs
Volume ID 15
Issue ID 2
Date Feb 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.705
Article ID CPE705
Author Name(s) Glenn Luecke 1, Hua Chen 2, James Coyle 3, Jim Hoekstra 4, Yan Zou 5, Marina Kraeva 6
Author Email(s) grl@iastate.edu1
Affiliation(s) High Performance Computing Group, Iowa State University, Ames, IA 50011, U.S.A. 1 2 3 4 5 6
Keyword(s) MPI, tool, bounds checking, argument checking, data type checking,
Abstract
MPI is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time and run-time checking of MPI programs. MPI-CHECK automatically detects the following problems in the use of MPI routines: (i) mismatch in argument type, kind, rank or number; (ii) messages which exceed the bounds of the source/destination array; (iii) negative message lengths; (iv) illegal MPI calls before MPI_INIT or after MPI_FINALIZE; (v) inconsistencies between the declared type of a message and its associated DATATYPE argument; and (vi) actual arguments which violate the INTENT attribute. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE653

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Kava: a Java dialect with a uniform object model for lightweight classes
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.653
Article ID CPE653
Author Name(s) David F. Bacon1
Author Email(s) dfb@watson.ibm.com1
Affiliation(s) IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, U.S.A 1
Keyword(s) Java, lightweight classes, object inlining, object models, unboxing,
Abstract
Object-oriented programming languages have always distinguished between 'primitive' and 'user-defined' data types, and in the case of languages like C++ and Java the primitives are not even treated as objects, further fragmenting the programming model. The distinction is especially problematic when a particular programming community requires primitive-level support for a new data type, as for complex, intervals, fixed-point numbers, and so on. We present Kava, a design for a backward-compatible version of Java that solves the problem of programmable lightweight objects in a much more aggressive and uniform manner than previous proposals. In Kava, there are no primitive types; instead, object-oriented programming is provided down to the level of single bits, and types such as int can be explicitly programmed within the language. While the language maintains a uniform object reference semantics, efficiency is obtained by making heavy use of unboxing and semantic expansion. We describe Kava as a dialect of the Java language, show how it can be used to define various primitive types, describe how it can be translated into Java, and compare it to other approaches to lightweight objects. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE654

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Framework for testing multi-threaded Java programs
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.654
Article ID CPE654
Author Name(s) Shmuel Ur 1, Orit Edelstein 2, Eitan Farchi 3, Evgeny Goldin 4, Gil Ratsaby 5
Author Email(s) UR@il.ibm.com1
Affiliation(s) Correspondence to Shmuel Ur, IBM Research Laboratory in Haifa 1 2 3 4 5
Keyword(s) .,
Abstract
Finding bugs due to race conditions in multi-threaded programs is difficult, mainly because there are many possible interleavings, any of which may contain a fault. In this work we present a methodology for testing multi-threaded programs which has minimal impact on the user and is likely to find interleaving bugs. Our method reruns existing tests in order to detect synchronization faults. We find that a single test executed a number of times in a controlled environment may be as effective in finding synchronization faults as many different tests. A great deal of resources are saved since tests are very expensive to write and maintain. We observe that simply rerunning tests, without ensuring in some way that the interleaving will change, yields almost no benefits. We implement the methodology in our test generation tool - ConTest. ConTest combines the replay algorithm, which is essential for debugging, with our interleaving test generation heuristics. ConTest also contains an instrumentation engine, a coverage analyzer, and a race detector (not finished yet) that enhance bug detection capabilities. The greatest advantage of ConTest, besides finding bugs of course, is its minimal effect on the user. When ConTest is combined into the test harness, the user may not even be aware that ConTest is being used. Copyright © 2003 John Wiley & Sons, Ltd.
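
The interleaving-perturbation idea can be sketched as below. This is not ConTest's actual instrumentation API, only an illustration of the kind of "noise" an instrumentation engine might inject at synchronization points; recording the random seed and decisions is one way such a tool can also support replay.

    // Illustrative noise injector, called by instrumented code just before a
    // synchronized block, a shared-field access, or a wait/notify.
    import java.util.Random;

    final class InterleavingNoise {
        // A fixed seed would make a particular interleaving reproducible.
        private static final Random RNG = new Random();

        static void perturb() {
            int choice = RNG.nextInt(3);
            if (choice == 0) {
                Thread.yield();                        // give another thread a turn
            } else if (choice == 1) {
                try { Thread.sleep(RNG.nextInt(5)); }  // short random delay
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
            // choice == 2: no perturbation, so some runs stay close to the original timing
        }
    }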

Article ID: CPE655

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Integration and application of TAU in parallel Java environments
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.655
Article ID CPE655
Author Name(s) Sameer Shende 1, Allen D. Malony 2
Author Email(s) sameer@cs.uoregon.edu1
Affiliation(s) Department of Computer and Information Science, 1202 University of Oregon, Eugene, OR 97403, U.S.A. 1Department of Computer and Information Science, 1202 University of Oregon, Eugene, OR 97403, U.S.A. 2
Keyword(s) parallel, Java, performance, tools,
Abstract
Parallel Java environments present challenging problems for performance tools because of Java's rich language system and its multi-level execution platform combined with the integration of native-code application libraries and parallel run-time software. In addition to the desire to provide robust performance measurement and analysis capabilities for the Java language itself, the coupling of different software execution contexts under a uniform performance model needs careful consideration of how events of interest are observed and how cross-context parallel execution information is linked. This paper relates our experience in extending the TAU (Tuning and Analysis Utilities) performance system to a parallel Java environment based on mpiJava. We describe the complexities of the instrumentation model used, how performance measurements are made, and the overhead incurred. A parallel Java application simulating the game of Life is used to show the performance system's capabilities. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE666

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Platform independent dynamic Java virtual machine analysis: the Java Grande Forum benchmark suite
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.666
Article ID CPE666
Author Name(s) John Waldron 1, James Power 2, David Gregg 3
Author Email(s) John.Waldron@cs.tcd.ie1
Affiliation(s) Department of Computer Science, Trinity College, Dublin 2, Ireland 1Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland 2 3
Keyword(s) Java Virtual Machine, Java Grande, instruction frequency, method profiling,
Abstract
In this paper we present a platform independent analysis of the dynamic profiles of Java programs when executing on the Java Virtual Machine. The Java programs selected are taken from the Java Grande Forum benchmark suite and five different Java-to-bytecode compilers are analysed. The results presented describe the dynamic instruction usage frequencies, as well as the sizes of the local variable, parameter and operand stacks during execution on the JVM. These results, presenting a picture of the actual (rather than presumed) behaviour of the JVM, have implications both for the coverage aspects of the Java Grande benchmark suites, for the performance of the Java-to-bytecode compilers and for the design of the JVM. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE656

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title High-performance Java codes for computational fluid dynamics
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.656
Article ID CPE656
Author Name(s) Christopher J. Riley 1, Siddhartha Chatterjee 2, Rupak Biswas 3
Author Email(s) chris.riley@cfdesign.com1
Affiliation(s) Blue Ridge Numerics, Inc., Charlottesville, VA 22901, U.S.A 1IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A. 2NASA Advanced Supercomputing Division, MS T27A-1, NASA Ames Research Center, Moffett Field, CA 94035 3
Keyword(s) scientific computing, object-oriented, benchmarks,
Abstract
The computational science community is reluctant to write large-scale computationally-intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of computational fluid dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center) and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE667

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Java Virtual Machine support for object serialization
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.667
Article ID CPE667
Author Name(s) Fabian Breg 1, Constantine D. Polychronopoulos 2
Author Email(s) breg@csrd.uiuc.edu1
Affiliation(s) University of Illinois, Coordinated Science Laboratory, room 444, 1308 W. Main Street, Urbana, IL 61 1 2
Keyword(s) Java, object serialization,
Abstract
Distributed computing has become increasingly popular in the high-performance community. Java's remote method invocation (RMI) provides a simple, yet powerful method for implementing parallel algorithms. The performance of RMI has been less than adequate, however, and object serialization is often identified as a major performance inhibitor. We believe that object serialization is best performed in the Java Virtual Machine (JVM), where information regarding object layout and hardware communication resources are readily available. We implement a subset of Java's object serialization protocol in native code, using the Java Native Interface (JNI) and JVM internals. Experiments show that our approach is up to eight times faster than Java's original object serialization protocol for array objects. Also, for linked data structures our approach obtains a moderate speedup and better scalability. Evaluation of our object serialization implementation in an RMI framework indicates that a higher throughput can be obtained. Parallel applications, written using RMI, obtain better speedups and scalability when this more efficient object serialization is used. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE657

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Automatic translation of Fortran to JVM bytecode
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.657
Article ID CPE657
Author Name(s) Keith Seymour 1, Jack Dongarra 2
Author Email(s) seymour@cs.utk.edu1
Affiliation(s) Department of Computer Science, University of Tennessee, Knoxville, Knoxville, TN 37996, U.S.A. 1 2
Keyword(s) Fortran, Java, JVM, bytecode, numerical libraries,
Abstract
This paper reports on the design of a Fortran-to-Java translator whose target language is the instruction set of the Java Virtual Machine. The goal of the translator is to generate Java implementations of legacy Fortran numerical codes in a consistent and reliable fashion. The benefits of directly generating bytecode are twofold. First, compared with generating Java source code, it provides a much more straightforward and efficient mechanism for translating Fortran GOTO statements. Second, it provides a framework for pursuing various compiler optimizations, which could be beneficial not only to our project, but to the Java community as a whole. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE658

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Benchmarking Java against C and Fortran for scientific applications
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.658
Article ID CPE658
Author Name(s) J. M. Bull 1, L. A. Smith 2, L. Pottage 3, R. Freeman 4
Author Email(s) markb@epcc.ed.ac.uk1
Affiliation(s) Edinburgh Parallel Computing Centre, James Clerk Maxwell Building, The King's Buildings, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, U.K. 1 2 3 4
Keyword(s) Java, C, performance, benchmarking, scientific applications,
Abstract
Increasing interest is being shown in the use of Java for scientific applications. The Java Grande benchmark suite was designed with such applications primarily in mind. The perceived lack of performance of Java still deters many potential users, despite recent advances in just-in-time and adaptive compilers. There are, however, few benchmark results available comparing Java to more traditional languages such as C and Fortran. To address this issue, a subset of the Java Grande benchmarks has been re-written in C and Fortran, allowing direct performance comparisons between the three languages. The performance of a range of Java execution environments and C and Fortran compilers has been tested across a number of platforms using the suite. These demonstrate that on some platforms (notably Intel Pentium) the performance gap is now quite small. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE660

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Run-time optimizations for a Java DSM implementation
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.660
Article ID CPE660
Author Name(s) R. Veldema 1, R. F. H. Hofman 2, R. A. F. Bhoedjang 3, H. E. Bal 4
Author Email(s) rveldema@cs.vu.nl1
Affiliation(s) Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands 1 2 Department of Computer Science, Cornell University, Ithaca, NY, USA 3 4
Keyword(s) Java, DSM, compiler, runtime, protocols,
Abstract
Jackal is a fine-grained distributed shared memory implementation of the Java programming language. Jackal implements Java's memory model and allows multithreaded Java programs to run unmodified on distributed-memory systems.
This paper focuses on Jackal's run-time system, which implements a multiple-writer, home-based consistency protocol. Protocol actions are triggered by software access checks that Jackal's compiler inserts before object and array references. To reduce access-check overhead, the compiler exploits source-level information and performs extensive static analysis to optimize, lift, and remove access checks. We describe optimizations for Jackal's run-time system, which mainly consists of discovering opportunities to dispense with flushing of cached data. We give performance results for different run-time optimizations, and compare their impact with the impact of one compiler optimization. We find that our run-time optimizations are necessary for good Jackal performance, but only in conjunction with the Jackal compiler optimizations described by Veldema et al. As a yardstick, we compare the performance of Java applications run on Jackal with the performance of equivalent applications that use a fast implementation of Java's Remote Method Invocation instead of shared memory. Copyright © 2003 John Wiley & Sons, Ltd.
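
The access checks mentioned above might look roughly as follows when written by hand. Jackal's checks are emitted by its compiler and operate inside the runtime system; the classes below are assumed for illustration only.

    // Conceptual software access check inserted before an object or array use.
    final class SharedObject {
        volatile int status;     // access rights this node currently holds
        double[] payload;        // the replicated data
    }

    final class DsmRuntime {
        static final int READABLE = 1;
        static final int WRITABLE = 2;

        // A real DSM compiler emits this inline and optimizes many checks away.
        static void checkAccess(SharedObject o, int neededMode) {
            if ((o.status & neededMode) == 0) {
                fetchOrUpgrade(o, neededMode);
            }
        }

        private static void fetchOrUpgrade(SharedObject o, int mode) {
            // Protocol action: fetch the data from its home node or request
            // write rights (multiple-writer, home-based protocol).
            o.status |= mode;    // simplified; the real protocol is not a single store
        }
    }

    // What compiled code conceptually does for "x = o.payload[i]":
    //   DsmRuntime.checkAccess(o, DsmRuntime.READABLE);
    //   double x = o.payload[i];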

Article ID: CPE661

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Supporting multidimensional arrays in Java
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.661
Article ID CPE661
Author Name(s) José E. Moreira 1, Samuel P. Midkiff 2, Manish Gupta 3
Author Email(s) jmoreira@us.ibm.com1
Affiliation(s) IBM T. J. Watson Research Center, Yorktown Heights, NY 10598-0218, U.S.A. 1 2 3
Keyword(s) Java, high-performance computing, compiler optimizations, multidimensional arrays,
Abstract
The lack of direct support for multidimensional arrays in JavaTM has been recognized as a major deficiency in the language's applicability to numerical computing. It has been shown that, when augmented with multidimensional arrays, Java can achieve very high-performance for numerical computing through the use of compiler techniques and efficient implementations of aggregate array operations. Three approaches have been discussed in the literature for extending Java with support for multidimensional arrays: class libraries that implement these structures; extending the Java language with new syntactic constructs for multidimensional arrays that are directly translated to bytecode; and relying on the Java Virtual Machine to recognize those arrays of arrays that are being used to simulate multidimensional arrays. This paper presents a balanced presentation of the pros and cons of each technique in the areas of functionality, language and virtual machine impact, implementation effort, and effect on performance. We show that the best choice depends on the relative importance attached to the different metrics, and thereby provide a common ground for a rational discussion and comparison of the techniques. Copyright © 2003 John Wiley & Sons, Ltd.
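
The class-library approach, the first of the three options above, can be sketched as a flat array with explicit index arithmetic; the names below are assumed and do not correspond to any particular array package.

    // Dense 2-D array stored in one contiguous 1-D Java array (row-major),
    // avoiding Java's arrays-of-arrays.
    final class DoubleMatrix2D {
        private final double[] data;
        private final int rows, cols;

        DoubleMatrix2D(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[rows * cols];
        }

        double get(int i, int j)           { return data[i * cols + j]; }
        void   set(int i, int j, double v) { data[i * cols + j] = v; }

        // A contiguous layout gives stride-one access along rows and lets a
        // compiler hoist bounds and null checks out of inner loops.
        double rowSum(int i) {
            double s = 0.0;
            int base = i * cols;
            for (int j = 0; j < cols; j++) s += data[base + j];
            return s;
        }
    }

The other two approaches trade this library-level bookkeeping for language or virtual machine support, which is exactly the comparison the paper sets out.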

Article ID: CPE662

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title CartaBlanca - a pure-Java, component-based systems simulation tool for coupled nonlinear physics on unstructured grids - an update
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.662
Article ID CPE662
Author Name(s) W. B. VanderHeyden 1, E. D. Dendy 2, N. T. Padial-Collins 3
Author Email(s) wbv@lanl.gov1
Affiliation(s) Los Alamos National Laboratory, Theoretical Division and Los Alamos Computer Science Institute, Los Alamos, NM 87545, U.S.A. 1 2 3
Keyword(s) Java, high performance computing, object oriented design, finite volume, nonlinear solver, development environment,
Abstract
This paper describes a component-based nonlinear physical system simulation prototyping package written entirely in Java using object-oriented design. The package provides scientists and engineers with a 'developer-friendly' software environment for large-scale computational algorithm and physical model development. The software design centers on the Jacobian-free Newton-Krylov solution method surrounding a finite-volume treatment of conservation equations. This enables a clean component-like implementation. We first provide motivation for the development of the software and then discuss software structure. The discussion includes a description of the use of Java's built-in thread facility that enables parallel, shared-memory computations on a wide variety of unstructured grids with triangular, quadrilateral, tetrahedral and hexahedral elements. We also discuss the use of Java's inheritance mechanism in the construction of a hierarchy of physics systems objects and linear and nonlinear solver objects that simplify development and foster software re-use. We provide a brief review of the Jacobian-free Newton-Krylov nonlinear system solution method and discuss how it fits into our design. Following this, we show results from example calculations and then discuss plans including the extension of the software to distributed-memory computer systems. Copyright © 2003 John Wiley & Sons, Ltd.
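
The Jacobian-free Newton-Krylov approach rests on approximating the Jacobian-vector product by a finite difference of the residual, J(u)v ≈ (F(u + εv) − F(u))/ε, so the Jacobian never has to be formed. Below is a minimal sketch, with interface and class names assumed rather than taken from CartaBlanca.

    // Jacobian-free matrix-vector product fed to a Krylov method (e.g. GMRES)
    // inside each Newton step.
    interface Residual {
        double[] evaluate(double[] u);   // F(u), the discretized conservation equations
    }

    final class JacobianFreeProduct {
        private final Residual f;
        private final double eps;

        JacobianFreeProduct(Residual f, double eps) { this.f = f; this.eps = eps; }

        // Approximates J(u) * v  ~=  (F(u + eps*v) - F(u)) / eps
        double[] apply(double[] u, double[] fu, double[] v) {
            double[] perturbed = new double[u.length];
            for (int i = 0; i < u.length; i++) perturbed[i] = u[i] + eps * v[i];
            double[] fPerturbed = f.evaluate(perturbed);
            double[] jv = new double[u.length];
            for (int i = 0; i < u.length; i++) jv[i] = (fPerturbed[i] - fu[i]) / eps;
            return jv;
        }
    }

Because the solver only ever needs F evaluations, the physics components can be swapped behind the residual interface, which is what enables the clean component-like implementation described above.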

Article ID: CPE663

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Providing soft real-time quality of service guarantees for Java threads
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.663
Article ID CPE663
Author Name(s) James C. Pang 1, Gholamali C. Shoja 2, Eric G. Manning 3
Author Email(s) jcpang@redback.com1
Affiliation(s) Department of Computer Science, University of Victoria, Victoria, BC, Canada V8W 3P6 1 2 Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada V8W 3P6 3
Keyword(s) Java Virtual Machine, CPU resource management, QoS, real-time scheduling, move-to-rear list scheduling,
Abstract
The Java platform has many characteristics that make it very desirable for integrated continuous media processing. Unfortunately, it lacks the necessary CPU resource management facilities to support quality of service (QoS) guarantees for soft real-time multimedia tasks. In this paper, we present our new Java Virtual Machine, Q-JVM, which brings CPU resource management to the Java platform. Q-JVM is based on Sun's JVM version 1.1.5. It implements an enhanced version of the MTR-LS algorithm in its thread scheduler. Combined with admission control that could be implemented in an application-level resource manager, it is able to support QoS parameters such as fairness, bandwidth partitioning and delay bound guarantees, as well as the cumulative service guarantee. Our test results show that Q-JVM is backward compatible with the standard JVM from Sun, has low scheduling overhead, and is able to provide QoS guarantees as specified. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE664

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title CCJ: object-based message passing and collective communication in Java
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.664
Article ID CPE664
Author Name(s) Jason Maassen 1, Arnold Nelisse 2, Thilo Kielmann 3, Henri E. Bal 4
Author Email(s) jason@cs.vu.nl1
Affiliation(s) Division of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands 1 2 3 4
Keyword(s) parallel programming, collective communication,
Abstract
CCJ is a communication library that adds MPI-like message passing and collective operations to Java. Rather than trying to adhere to the precise MPI syntax, CCJ aims at a clean integration of communication into Java's object-oriented framework. For example, CCJ uses thread groups to support Java's multithreading model and it allows any data structure (not just arrays) to be communicated. CCJ is implemented entirely in Java, on top of RMI, so it can be used with any Java virtual machine. The paper discusses three parallel Java applications that use collective communication. It compares the performance (on top of a Myrinet cluster) of CCJ, RMI and mpiJava versions of these applications and also compares their code complexity. A detailed performance comparison between CCJ and mpiJava is given using the Java Grande Forum MPJ benchmark suite. The results show that neither CCJ's object-oriented design nor its implementation on top of RMI impose a performance penalty on applications compared to their mpiJava counterparts. The source of CCJ is available from our Web site http://www.cs.vu.nl/manta. Copyright © 2003 John Wiley & Sons, Ltd.
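
The object-based collective style described above might be used roughly as follows. The class and method names here are assumed for illustration; CCJ's real API and signatures may differ.

    // Hypothetical usage sketch: an MPI-like allreduce expressed over Java
    // objects within a thread group.
    import java.io.Serializable;

    interface Reducer<T extends Serializable> {
        T combine(T a, T b);
    }

    final class PartialSum implements Serializable {
        final double value;
        PartialSum(double value) { this.value = value; }
    }

    // Stand-in for the collective call; a real library would implement it on
    // top of RMI, as CCJ does.
    interface GroupStub {
        <T extends Serializable> T allReduce(T local, Reducer<T> op);
    }

    final class Worker implements Runnable {
        private final GroupStub group;
        private final double local;

        Worker(GroupStub group, double local) { this.group = group; this.local = local; }

        public void run() {
            // Every member contributes an object; all members receive the reduction.
            PartialSum total = group.allReduce(new PartialSum(local),
                    (a, b) -> new PartialSum(a.value + b.value));
            System.out.println("global sum = " + total.value);
        }
    }

The point of the design is visible even in this sketch: the unit of communication is an arbitrary serializable object and the unit of participation is a thread group, rather than MPI's flat arrays and ranks.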

Article ID: CPE665

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Supporting dynamic parallel object arrays
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.665
Article ID CPE665
Author Name(s) Orion S. Lawlor 1, Laxmikant V. Kalé 2
Author Email(s) olawlor@acm.org1
Affiliation(s) Computer Science Department, University of Illinois at Urbana-Champaign, 1304 W. Springfield, 3315 DCL, Urbana, IL 61801-2987, U.S.A. 1 2
Keyword(s) parallel runtime, object migration, parallel hashtable,
Abstract
We present efficient support for generalized arrays of parallel data driven objects. Array elements are regular C++ objects, and are scattered across the parallel machine. An individual element is addressed by its 'index', which can be an arbitrary object rather than a simple integer. For example, an array index can be a series of numbers, supporting multidimensional sparse arrays; a bit vector, supporting collections of quadtree nodes; or a string. Methods can be invoked on any individual array element from any processor, and the elements can participate in reductions and broadcasts. Individual elements can be created or deleted dynamically at any time. Most importantly, the elements can migrate from processor to processor at any time. We discuss support for message delivery and collective operations in the face of such dynamic behavior. The migration capabilities of array elements have proven extremely useful, for example, in implementing flexible load balancing strategies and for exploiting workstation clusters adaptively. We present the design, an implementation, and performance results. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE712

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Sapphire: copying garbage collection without stopping the world
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.712
Article ID CPE712
Author Name(s) Eliot B. Moss 1, Richard L. Hudson 2
Author Email(s) moss@cs.umass.edu1
Affiliation(s) Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610, U.S.A. 1Intel Corporation, 2200 Mission College Boulevard, Santa Clara, CA 95052-8119, U.S.A. 2
Keyword(s) garbage collection, copying garbage collection, concurrent garbage collection, replicating garbage collection, Java garbage collection,
Abstract
The growing use in concurrent systems of languages that require garbage collection (GC), such as Java, is raising practical interest in concurrent GC. Sapphire is a new algorithm for concurrent copying GC for Java. It stresses minimizing the amount of time any given application thread may need to block to support the collector. In particular, Sapphire is intended to work well in the presence of a large number of application threads, on small- to medium-scale shared memory multiprocessors.

Sapphire extends previous concurrent copying algorithms, and is most closely related to replicating copying collection, a GC technique in which application threads observe and update primarily the old copies of objects. The key innovations of Sapphire are: (1) the ability to 'flip' one thread at a time (changing the thread's view from the old copies of objects to the new copies), as opposed to needing to stop all threads and flip them at the same time; (2) exploiting Java semantics and assuming any data races occur on volatile fields, to avoid a barrier on reads of non-volatile fields; and (3) working in concert with virtually any underlying (non-concurrent) copying collection algorithm. Copyright © 2003 John Wiley & Sons, Ltd
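
A replicating write barrier of the kind Sapphire builds on can be rendered conceptually in plain Java as below; the real barrier is implemented inside the virtual machine and is considerably more involved.

    // While an object is being copied, a store made through the old copy is
    // also applied to the new copy, so a thread can later be "flipped" to the
    // new copy without a global pause.
    final class ReplicatedObject {
        Object[] fields;              // the copy this thread currently uses
        ReplicatedObject forward;     // the new (to-space) copy, null if not yet copied
    }

    final class WriteBarrier {
        static void storeField(ReplicatedObject obj, int slot, Object value) {
            obj.fields[slot] = value;             // update the copy in use
            ReplicatedObject copy = obj.forward;
            if (copy != null) {
                copy.fields[slot] = value;        // keep the replica consistent
            }
            // Reads of non-volatile fields need no barrier under Sapphire's
            // data-race assumptions, which is the point of innovation (2) above.
        }
    }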

Article ID: CPE735

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorials
Article Title Editorial: ACM 2001 Java Grande-ISCOPE (JGI2001) Conference
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.735
Article ID CPE735
Author Name(s) Geoffrey C. Fox1
Author Email(s) gcf@indiana.edu1
Affiliation(s) 1
Keyword(s) .,
Abstract
No abstract

Article ID: CPE659

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Spar: a set of extensions to Java for scientific computation
Volume ID 15
Issue ID 3-5
Date March 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.659
Article ID CPE659
Author Name(s) C. van Reeuwijk 1, F. Kuijlman 2, H. J. Sips 3
Author Email(s) C.vanReeuwijk@its.tudelft.nl1
Affiliation(s) Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands 1 2 3
Keyword(s) Java, multi-dimensional array, tuple, parameterized type,
Abstract
[Abstract rendered as an inline image in the source listing; text not available.]

Article ID: CPE715

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Multi-wavelength image space: another Grid-enabled science
Volume ID 15
Issue ID 6
Date May 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.715
Article ID CPE715
Author Name(s) Roy Williams 1, Bruce Berriman 2, Ewa Deelman 3, John Good 4, Joseph Jacob 5, Carl Kesselman 6, Carol Lonsdale 7, Seb Oliver 8, Thomas A. Prince 9
Author Email(s) roy@cacr.caltech.edu1
Affiliation(s) Center for Advanced Computing Research, California Institute of Technology 158-79, Pasadena, CA 91125, U.S.A. 12Infrared Processing and Analysis Center, California Institute of Technology 100-22, Pasadena, CA 91125, U.S.A. 2Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Key, CA 90292, U.S.A. 3 4 Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, U.S.A 53Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Key, CA 90292, U.S.A. 6 7 Astronomy Centre, University of Sussex, Falmer, Brighton BN1 9R, U.K. 8 9
Keyword(s) astronomy, mosaicking, computational grid, montage,
Abstract
We describe how the Grid enables new research possibilities in astronomy through multi-wavelength images. To see sky images in the same pixel space, they must be projected to that space, a computer-intensive process. There is thus a virtual data space induced that is defined by an image and the applied projection. This virtual data can be created and replicated with Planners and Replica catalog technology developed under the GriPhyN project. We plan to deploy our system (MONTAGE) on the U.S. Teragrid. Grid computing is also needed for ingesting data - computing background correction on each image - which forms a separate virtual data space. Multi-wavelength images can be used for pushing source detection and statistics by an order of magnitude from current techniques; for optimization of multi-wavelength image registration for detection and characterization of extended sources; and for detection of new classes of essentially multi-wavelength astronomical phenomena. The paper discusses both the Grid architecture and the scientific goals. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE716

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title CORBA request portable interceptors: analysis and applications
Volume ID 15
Issue ID 6
Date May 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.716
Article ID CPE716
Author Name(s) C. Marchetti 1, R. Baldoni 2, L. Verde 3
Author Email(s) marchet@dis.uniroma1.it1
Affiliation(s) Dipartimento di Informatica e Sistemistica, Università di Roma La Sapienza, Via Salaria 113, 00198, Roma, Italy 1 2 3
Keyword(s) interceptors, CORBA, fault-tolerant CORBA, middleware, performance analysis, distributed object computing,
Abstract
Interceptors are an emerging middleware technology enabling the addition of specific network-oriented capabilities to distributed applications. By exploiting interceptors, developers can register code within interception points, extending the basic middleware mechanisms with specific functionality, e.g. authentication, flow control, caching, etc. Notably, these extensions can be achieved without modifying either the application or the middleware code.

In this paper we report the results of our experiences with CORBA request portable interceptors. In particular, we point out (i) the basic mechanisms implementable by these interceptors, i.e. request redirection and piggybacking and (ii) we analyze their limitations. We then propose a proxy-based technique to overcome the interceptors' limitations. Successively, we present a performance analysis carried out on three Java-CORBA platforms currently implementing the portable interceptors specification. Finally, we conclude our work with a case study in which portable interceptors are used to implement the fault-tolerant CORBA client invocation semantic without impacting on the client application code and on the CORBA ORB. We also release fragments of Java code for implementing the described techniques. Copyright © 2003 John Wiley & Sons, Ltd.
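
A minimal client-side portable interceptor exercising the two basic mechanisms named above (piggybacking via a service context and redirection via ForwardRequest) might look as follows; the context id, the backup object reference and the one-shot forwarding policy are assumptions for illustration.

    import org.omg.CORBA.LocalObject;
    import org.omg.IOP.ServiceContext;
    import org.omg.PortableInterceptor.ClientRequestInfo;
    import org.omg.PortableInterceptor.ClientRequestInterceptor;
    import org.omg.PortableInterceptor.ForwardRequest;

    public class RedirectingInterceptor extends LocalObject
            implements ClientRequestInterceptor {

        private static final int CONTEXT_ID = 0x46540000;   // assumed private context id
        private final org.omg.CORBA.Object backupReplica;   // assumed alternate target
        private volatile boolean redirectPending = true;    // toy policy: forward once

        public RedirectingInterceptor(org.omg.CORBA.Object backupReplica) {
            this.backupReplica = backupReplica;
        }

        public void send_request(ClientRequestInfo ri) throws ForwardRequest {
            // Piggyback a small payload on the outgoing request.
            ri.add_request_service_context(
                    new ServiceContext(CONTEXT_ID, "seq=42".getBytes()), false);
            // Redirect the invocation to another replica without touching the
            // client code: the ORB transparently reissues the request.
            if (redirectPending && backupReplica != null) {
                redirectPending = false;
                throw new ForwardRequest(backupReplica);
            }
        }

        public void send_poll(ClientRequestInfo ri) { }
        public void receive_reply(ClientRequestInfo ri) { }
        public void receive_exception(ClientRequestInfo ri) throws ForwardRequest { }
        public void receive_other(ClientRequestInfo ri) throws ForwardRequest { }

        public String name()  { return "RedirectingInterceptor"; }
        public void destroy() { }
    }

In a real deployment the interceptor is registered through an ORBInitializer named in an ORB property, and the redirection decision would be driven by failure detection or replication policy rather than a simple flag.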

Article ID: CPE717

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title An adaptive parallel genetic algorithm system for i-Computing environment
Volume ID 15
Issue ID 6
Date May 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.717
Article ID CPE717
Author Name(s) Yuk-Yin Wong 1, Kin-Hong Lee 2, Kwong-Sak Leung 3
Author Email(s) clevin.wong@cityu.edu.hk1
Affiliation(s) Computing Services Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong 1Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, N.T., Hong Kong 2 3
Keyword(s) Internet computing, distributed computing, parallel genetic algorithm, adaptive genetic algorithm,
Abstract
Many real-world optimization problems in the scientific and engineering fields can be solved by genetic algorithms (GAs) but it still requires a long execution time for complex problems. At the same time, there are many under-utilized workstations on the Internet. In this paper, we present a self-adaptive parallel GA system named APGAIN, which utilizes the spare power of the heterogeneous workstations on the Internet to solve complex optimization problems. In order to maintain a balance between exploitation and exploration, we have devised a novel probabilistic rule-driven adaptive model (PRDAM) to adapt the GA parameters automatically. APGAIN is implemented on an Internet Computing system called DJM. In the implementation, we discover that DJM's original load balancing strategy is insufficient. Hence the strategy is extended with the job migration capability. The performance of the system is evaluated by solving the traveling salesman problem with data from a public database. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE718

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A scalable HPF implementation of a finite-volume computational electromagnetics application on a CRAY T3E parallel system
Volume ID 15
Issue ID 6
Date May 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.718
Article ID CPE718
Author Name(s) Yi Pan 1, Joseph J. S. Shang 2, Minyi Guo 3
Author Email(s) pan@cs.gsu.edu1
Affiliation(s) Department of Computer Science, Georgia State University, Atlanta, GA 30303, U.S.A. 1 Air Vehicle Directorate, Air Force Research Laboratory, WPAFB, OH 45433-7521, U.S.A. 2 Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima, 965-8580 Japan 3
Keyword(s) parallel computers, execution time, efficiency, scalability, loop parallelization, Cray T3E, high performance Fortran,
Abstract
The time-dependent Maxwell equations are one of the most important approaches to describing dynamic or wide-band frequency electromagnetic phenomena. A sequential finite-volume, characteristic-based procedure for solving the time-dependent, three-dimensional Maxwell equations has been successfully implemented in Fortran before. Due to its need for a large memory space and high demand on CPU time, it is impossible to test the code for a large array. Hence, it is essential to implement the code on a parallel computing system. In this paper, we discuss an efficient and scalable parallelization of the sequential Fortran time-dependent Maxwell equations solver using High Performance Fortran (HPF). The background to the project, the theory behind the efficiency being achieved, the parallelization methodologies employed and the experimental results obtained on the Cray T3E massively parallel computing system will be described in detail. Experimental runs show that the execution time is reduced drastically through parallel computing. The code is scalable up to 98 processors on the Cray T3E and has a performance similar to that of an MPI implementation. Based on the experimentation carried out in this research, we believe that a high-level parallel programming language such as HPF is a fast, viable and economical approach to parallelizing many existing sequential codes which exhibit a lot of parallelism. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE720

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Analysis of a prototype intelligent network interface
Volume ID 15
Issue ID 7-8
Date June 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.720
Article ID CPE720
Author Name(s) Keith D. Underwood 1, Walter B. Ligon III 2, Ron R. Sass 3
Author Email(s) keithu@parl.clemson.edu1
Affiliation(s) Parallel Architecture Research Laboratory, Holcombe Department of Electrical and Computer Engineering, Clemson University, 105 Riggs Hall, Clemson, SC 29634-0915, U.S.A. 1 2 3
Keyword(s) intelligent network interface, cluster computing, Beowulf, programmable NIC,
Abstract
With a focus on commodity PC systems, Beowulf clusters traditionally lack the cutting edge network architectures, memory subsystems, and processor technologies found in their more expensive supercomputer counterparts. Many users find that what Beowulf clusters lack in technology, they more than make up for with their significant cost advantage. In this paper, an architectural extension that adds reconfigurable computing to the network interface of Beowulf clusters is proposed. The proposed extension, called an intelligent network interface card (or INIC), enhances both the network and processor capabilities of the cluster, which has a significant impact on the performance of a crucial class of applications. Furthermore, for some applications, the proposed extension partially compensates for weaknesses in the PC memory subsystem. A prototype of the proposed architecture was constructed and analyzed. In addition, two applications, the 2D Fast Fourier Transform (2D-FFT) and integer sorting, which benefit from the resulting architecture, are discussed and analyzed on the proposed architecture. Specifically, results indicate that the 2D-FFT is performed 15-50% faster when using the prototype INIC rather than the comparable Gigabit Ethernet. Early simulation results also indicate that integer sorting will receive a 21-32% performance boost from the prototype. The significant improvements seen with a relatively limited prototype lead to the conclusion that cluster network interfaces enhanced with reconfigurable computing could significantly improve the Beowulf architecture. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE721

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Design and implementation of a user-level Sockets layer over Virtual Interface Architecture
Volume ID 15
Issue ID 7-8
Date June 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.721
Article ID CPE721
Author Name(s) Jin-Soo Kim 1, Kangho Kim 2, Sung-In Jung 3, Soonhoi Ha 4
Author Email(s) jinsoo@cs.kaist.ac.kr1
Affiliation(s) Division of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, South Korea 1Computer Systems Research Department, Electronics and Telecommunications Research Institute, Daejeon 305-350, South Korea 2 3 School of Computer Science and Engineering, Seoul National University, Seoul 151-742, South Korea 4
Keyword(s) Virtual Interface Architecture, Sockets, user-level communication architecture, cluster computing,
Abstract
The Virtual Interface Architecture (VIA) is an industry-standard user-level communication architecture for system area networks. The VIA provides a protected, directly-accessible interface to the network hardware, removing the operating system from the critical communication path. In this paper, we design and implement a user-level Sockets layer over VIA, named SOVIA (Sockets Over VIA). Our objective is to use the SOVIA layer to accelerate the existing Sockets-based applications with a reasonable effort and to provide a portable and high-performance communication library based on VIA to application developers. SOVIA realizes comparable performance to native VIA, showing a minimum one-way latency of 10.5 μs and a peak bandwidth of 814 Mbps on Giganet's cLAN. We have shown the functional compatibility with the existing Sockets API by porting File Transfer Protocol (FTP) and Remote Procedure Call (RPC) applications over the SOVIA layer. Compared to Giganet's LAN Emulation (LANE) driver, which emulates TCP/IP inside the kernel, SOVIA easily doubles the file transfer bandwidth in FTP and reduces the latency of calling an empty remote procedure by 77% in RPC applications. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE722

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title NPACI Rocks: tools and techniques for easily deploying manageable Linux clusters
Volume ID 15
Issue ID 7-8
Date June 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.722
Article ID CPE722
Author Name(s) Greg Bruno 1, Philip M. Papadopoulos 2, Mason J. Katz 3
Author Email(s) bruno@sdsc.edu1
Affiliation(s) San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093-0505, U.S.A. 1 2 3
Keyword(s) clusters, Beowulf, cluster management, scalable installation, kickstart,
Abstract
High-performance computing clusters (commodity hardware with low-latency, high-bandwidth interconnects) based on Linux are rapidly becoming the dominant computing platform for a wide range of scientific disciplines. Yet, straightforward software installation, maintenance, and health monitoring for large-scale clusters has been a consistent and nagging problem for non-cluster experts. The NPACI Rocks distribution takes a fresh perspective on management and installation of clusters to dramatically simplify software version tracking and cluster integration. NPACI Rocks incorporates the latest Red Hat distribution (including security patches) with additional cluster-specific software. Using the identical software tools used to create the base distribution, users can customize and localize Rocks for their site. Strong adherence to widely-used (de facto) tools allows Rocks to move with the rapid pace of Linux development. Version 2.2.1 of the toolkit is available for download and installation. Over 100 Rocks clusters have been built by non-cluster experts at multiple institutions (residing in various countries) providing a peak aggregate of 2 TFLOPS of clustered computing. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE723

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title p-Jigsaw: a cluster-based Web server with cooperative caching support
Volume ID 15
Issue ID 7-8
Date June 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.723
Article ID CPE723
Author Name(s) Ge Chen 1, Cho-Li Wang 2, Francis C. M. Lau 3
Author Email(s) gechen@csis.hku.hk1
Affiliation(s) Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong 1 2 3
Keyword(s) Web object cache, cooperative caching, Global Object Space, hot object, replacement algorithms, cluster, the World Wide Web,
Abstract
Clustering provides a viable approach to building scalable Web systems with increased computing power and abundant storage space for data and contents. In this paper, we present a pure-Java-based parallel Web server system,p-Jigsaw, which operates on a cluster and uses the technique of cooperative caching to achieve high performance. We introduce the design of an in-memory cache layer, called Global Object Space (GOS), for dynamic caching of frequently requested Web objects. The GOS provides a unified view of cluster-wide memory resources for achieving location-transparent Web object accesses. The GOS relies on cooperative caching to minimize disk accesses. A requested Web object can be fetched from a server node's local cache or a peer node's local cache, with the disk serving only as the last resort. A prototype system based on the W3C Jigsaw server has been implemented on a 16-node PC cluster. Three cluster-aware cache replacement algorithms were tested and evaluated. The benchmark results show good speedups with a real-life access log, proving that cooperative caching can have significant positive impacts on the performance of cluster-based parallel Web servers. Copyright © 2003 John Wiley & Sons, Ltd.
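
The cooperative lookup order behind such a Global Object Space can be sketched as follows; the class names are illustrative and not p-Jigsaw's actual implementation.

    // Lookup order: local in-memory cache, then peer nodes' caches, then disk.
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class CooperativeCache {
        private final Map<String, byte[]> localCache = new ConcurrentHashMap<>();
        private final List<PeerNode> peers;
        private final DiskStore disk;

        CooperativeCache(List<PeerNode> peers, DiskStore disk) {
            this.peers = peers;
            this.disk = disk;
        }

        byte[] get(String url) {
            byte[] obj = localCache.get(url);          // 1. local hit
            if (obj != null) return obj;
            for (PeerNode peer : peers) {              // 2. remote (peer) hit
                obj = peer.lookup(url);
                if (obj != null) { localCache.put(url, obj); return obj; }
            }
            obj = disk.load(url);                      // 3. last resort: disk
            if (obj != null) localCache.put(url, obj); // hot objects stay in memory,
            return obj;                                // subject to a replacement policy
        }

        interface PeerNode  { byte[] lookup(String url); }
        interface DiskStore { byte[] load(String url); }
    }

The cluster-aware replacement algorithms evaluated in the paper decide which objects stay in the local caches so that the peer-lookup step keeps paying off.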

Article ID: CPE724

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Clusterfile: a flexible physical layout parallel file system
Volume ID 15
Issue ID 7-8
Date June 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.724
Article ID CPE724
Author Name(s) Florin Isaila1Walter F. Tichy2
Author Email(s) florin@ira.uka.de1
Affiliation(s) Fakultät für Informatik, Universität Karlsruhe Postfach 6980, 76128 Karlsruhe, Germany 1 2
Keyword(s) parallel file system, parallel I/O, physical and logical partitioning, non-contiguous I/O, data redistribution, mapping functions,
Abstract
This paper presents Clusterfile, a parallel file system that provides parallel file access on a cluster of computers. We introduce a file partitioning model that has been used in the design of Clusterfile. The model uses a data representation that is optimized for multidimensional array partitioning while allowing arbitrary partitions. The paper shows how the file model can be employed for file partitioning into both physical subfiles and logical views. We also present how the conversion between two partitions of the same file is implemented using a general memory redistribution algorithm. We show how we use the algorithm to optimize non-contiguous read and write operations. The experimental results include performance comparisons with the Parallel Virtual File System (PVFS) and an MPI-IO implementation for PVFS. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE725

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Using multirail networks in high-performance clusters
Volume ID 15
Issue ID 7-8
Date June 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.725
Article ID CPE725
Author Name(s) Fabrizio Petrini1Salvador Coll2Eitan Frachtenberg3Adolfy Hoisie4Leonid Gurvits5
Author Email(s) fabrizio@lanl.gov1
Affiliation(s) CCS-3 Modeling, Algorithms and Informatics Group, Computer and Computational Sciences Division, Los Alamos National Laboratory, Los Alamos, CA, U.S.A. 1 2 3 4 5
Keyword(s) communication protocols, high-performance interconnection networks, performance evaluation, routing, communication libraries, parallel architectures,
Abstract
Using multiple independent networks (also known as rails) is an emerging technique which is being used to overcome bandwidth limitations and enhance fault tolerance of current high-performance parallel computers. In this paper, we present and analyze various algorithms to allocate multiple communication rails, including static and dynamic allocation schemes. An analytical lower bound on the number of rails required for static rail allocation is shown. We also present an extensive experimental comparison of the behavior of various algorithms in terms of bandwidth and latency. We show that striping messages over multiple rails can substantially reduce network latency, depending on average message size, network load and allocation scheme. The methods compared include a static rail allocation, a basic round-robin rail allocation, a local-dynamic allocation based on local knowledge and a dynamic rail allocation that reserves both communication endpoints of a message before sending it. The last method is shown to perform better than the others at higher loads: up to 49% better than local-knowledge allocation and 37% better than the round-robin allocation. This allocation scheme also shows lower latency and it saturates at higher loads (for long enough messages). Most importantly, this proposed allocation scheme scales well with the number of rails and message size. In addition we propose a hybrid algorithm that combines the benefits of the local-dynamic allocation for short messages with those of the dynamic algorithm for large messages. Copyright © 2003 John Wiley & Sons, Ltd.
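Two of the allocation policies compared above are simple enough to sketch generically. The C fragment below is an illustration under assumed data structures, not the authors' code: it contrasts plain round-robin rail selection with a local-dynamic choice based on per-rail counts of locally outstanding messages.

    #include <stdio.h>

    #define NRAILS 4

    static int next_rail = 0;          /* round-robin cursor */
    static int in_flight[NRAILS];      /* outstanding messages per rail (local knowledge) */

    /* Basic round-robin allocation: ignore load, cycle through the rails. */
    static int alloc_round_robin(void) {
        int r = next_rail;
        next_rail = (next_rail + 1) % NRAILS;
        return r;
    }

    /* Local-dynamic allocation: pick the rail with the fewest locally known
       outstanding messages. */
    static int alloc_local_dynamic(void) {
        int best = 0;
        for (int r = 1; r < NRAILS; r++)
            if (in_flight[r] < in_flight[best]) best = r;
        return best;
    }

    int main(void) {
        for (int i = 0; i < 8; i++) {
            int r = alloc_local_dynamic();
            in_flight[r]++;            /* message posted on rail r */
            printf("msg %d -> rail %d (round robin would give %d)\n",
                   i, r, alloc_round_robin());
        }
        return 0;
    }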

Article ID: CPE750

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Editorial
Volume ID 15
Issue ID 7-8
Date Jan 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.750
Article ID CPE750
Author Name(s) Mark Baker1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract
No abstract

Article ID: CPE719

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Comparing the performance of MPICH with Cray's MPI and with SGI's MPI
Volume ID 15
Issue ID 9
Date Aug 10 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.719
Article ID CPE719
Author Name(s) Glenn R. Luecke1Marina Kraeva2Lili Ju3
Author Email(s) grl@iastate.edu1
Affiliation(s) High Performance Computing Group, Iowa State University, Ames, Iowa 50011-2251, U.S.A. 1 2 3
Keyword(s) MPI, MPICH, performance, Cray T3E-900, SGI Origin 3000,
Abstract
The purpose of this paper is to compare the performance of MPICH with the vendor Message Passing Interface (MPI) on a Cray T3E-900 and an SGI Origin 3000. Seven basic communication tests, which include basic point-to-point and collective MPI communication routines, were chosen to represent commonly used communication patterns. Cray's MPI performed better (and sometimes significantly better) than Mississippi State University's (MSU's) MPICH for small and medium messages. They both performed about the same for large messages; however, for three tests MSU's MPICH was about 20% faster than Cray's MPI. SGI's MPI performed and scaled better (and sometimes significantly better) than MPICH for all messages, except for the scatter test, where MPICH outperformed SGI's MPI for 1 kbyte messages. The poor scalability of MPICH on the Origin 3000 suggests there may be scalability problems with MPICH. Copyright © 2003 John Wiley & Sons, Ltd.
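To give a flavour of what the point-to-point tests measure, the fragment below is a generic MPI ping-pong timing loop in C. It is not the benchmark code used in the paper; the message size and repetition count are arbitrary choices for illustration.

    #include <mpi.h>
    #include <stdio.h>
    #include <string.h>

    #define REPS   1000
    #define NBYTES 1024            /* message size; arbitrary for this sketch */

    int main(int argc, char **argv) {
        char buf[NBYTES];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(buf, 0, NBYTES);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("average round trip: %g us\n", (t1 - t0) / REPS * 1e6);

        MPI_Finalize();
        return 0;
    }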

Article ID: CPE728

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The LINPACK Benchmark: past, present and future
Volume ID 15
Issue ID 9
Date Aug 10 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.728
Article ID CPE728
Author Name(s) Jack J. Dongarra1Piotr Luszczek2Antoine Petitet3
Author Email(s) dongarra@cs.utk.edu1
Affiliation(s) University of Tennessee, Department of Computer Science, Knoxville, TN 37996-3450, U.S.A. 1 2 Sun Microsystems, Inc., Paris, France 3
Keyword(s) benchmarking, BLAS, high-performance computing, HPL, linear algebra, LINPACK, TOP500,
Abstract
This paper describes the LINPACK Benchmark and some of its variations commonly used to assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the TOP500 and the HPL codes are presented. The latter is frequently used to obtain results for TOP500 submissions. Information is also given on how to interpret the results of the benchmark and how the results fit into the performance evaluation process. Copyright © 2003 John Wiley & Sons, Ltd.
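For reference, the benchmark solves a dense linear system Ax = b by LU factorization with partial pivoting, and the reported rate is computed from the conventional operation count for the factorization and solve rather than the operations actually performed:

    \text{rate (flop/s)} = \frac{\tfrac{2}{3}\,n^{3} + 2\,n^{2}}{t},

where n is the problem order and t is the time to factor and solve. TOP500 entries report the best such rate achieved (Rmax) alongside the theoretical peak of the machine (Rpeak).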

Article ID: CPE732

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Enhancing OODB semantics to support browsing in an OODB vocabulary representation
Volume ID 15
Issue ID 9
Date Aug 10 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.732
Article ID CPE732
Author Name(s) James Geller1Li-min Liu2Yehoshua Perl3
Author Email(s) geller@la.njit.edu1
Affiliation(s) Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A. 11Department of Applied Mathematics, Chung Yuan Christian University, 22, Pu-Zen, Pu-Chung Li, Chung-Li, Taiwan, Republic of China 2 3
Keyword(s) object-oriented databases, object-oriented models, object-oriented systems, knowledge representation, database models, vocabulary systems, object-oriented hierarchical relationships, object-oriented semantic relationships,
Abstract
In previous work, we have modeled a vocabulary given as a semantic network by an object-oriented database (OODB). The OODB schema thus obtained provides a compact abstract view of the vocabulary. This enables the fast traversal of the vocabulary by a user. In the semantic network vocabulary, the IS-A relationships express the specialization hierarchy. In our OODB modeling of the vocabulary, the SUBCLASS relationship expresses the specialization hierarchy of the classes and supports the inheritance of their properties. A typical IS-A path in the vocabulary has a corresponding shorter SUBCLASS path in the OODB schema.

In this paper we expose several cases where the SUBCLASS hierarchy fails to fully correspond to the IS-A hierarchy of the vocabulary. In these cases there exist traversal paths in the semantic network for which there are no corresponding traversal paths in the OODB schema. The reason for this failure is the existence of some IS-A relationships between concepts of two classes, which are not connected by a SUBCLASS relationship. This phenomenon weakens the accuracy of our modeling.

To rectify the situation we introduce a new OODB semantic relationship IS-A to represent the existence of IS-A relationships between concepts of a pair of classes which are not connected via a SUBCLASS relationship. The resulting schema contains both SUBCLASS relationships and IS-A relationships which completely model the IS-A hierarchy of the vocabulary. We define a mixed-class level traversal path to contain either SUBCLASS or IS-A relationships. Consequently, each traversal path in the semantic network has a corresponding mixed traversal path in the OODB schema. Hence the introduction of the semantic OODB IS-A relationship improves the modeling of semantic network vocabularies by OODBs. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE729

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Implementation of the EARTH programming model on SMP clusters: a multi-threaded language and runtime system
Volume ID 15
Issue ID 9
Date Aug 10 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.729
Article ID CPE729
Author Name(s) G. Tremblay1C. J. Morrone2J. N. Amaral3G. R. Gao4
Author Email(s) amaral@cs.ualberta.ca3
Affiliation(s) Département d'informatique, Université du Québec à Montréal, Montréal, QC, Canada 1Computer Architecture and Parallel Systems Laboratory (CAPSL), Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, U.S.A. 2Department of Computing Science, University of Alberta, Edmonton, AB, Canada 3 4
Keyword(s) multi-threading, cluster computing, parallel programming language,
Abstract
This paper describes the design and implementation of an Efficient Architecture for Running THreads (EARTH) runtime system for a multi-processor/multi-node cluster. The EARTH model was designed to support the efficient execution of parallel (multi-threaded) programs with irregular fine-grain parallelism using off-the-shelf computers. Implementing an EARTH runtime system requires an explicitly threaded runtime system. For portability, we built this runtime system on top of Pthreads under Linux and used sockets for inter-node communication. Moreover, in order to make the best use of the resources available on a cluster of symmetric multi-processors (SMP), this implementation enables the overlapping of communication and computation. We used Threaded-C, a language designed to implement the programming model supported by the EARTH architecture. This language allows the expression of various levels of parallelism and provides the primitives needed to manage the required communication and synchronization. The Threaded-C programming language supports irregular fine-grain parallelism through a two-level hierarchy of threads and fibers. It also provides various synchronization and communication constructs that reflect the nature of EARTH's fibers (non-preemptive execution with data-driven scheduling) as well as the extensive use of split-phase transactions on EARTH to execute long-latency operations. Copyright © 2003 John Wiley & Sons, Ltd.
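The data-driven scheduling of fibers mentioned above can be pictured with a small sketch: a fiber waits on a synchronization counter and becomes runnable when the last expected signal or datum arrives and the counter reaches zero. The C below is a generic illustration of that idea only; it is not Threaded-C syntax, and the names are invented rather than taken from the EARTH runtime interface.

    #include <stdio.h>

    /* A fiber is a non-preemptive unit of work, enabled when its sync count hits 0. */
    struct sync_slot {
        int count;                  /* outstanding signals/data still expected */
        void (*fiber)(void *);      /* code to run once enabled */
        void *arg;
    };

    /* Called when a split-phase operation completes or a signal arrives. */
    static void sync_signal(struct sync_slot *s) {
        if (--s->count == 0)
            s->fiber(s->arg);       /* a real runtime would enqueue it on a ready queue */
    }

    static void print_fiber(void *arg) {
        printf("fiber runs: %s\n", (const char *)arg);
    }

    int main(void) {
        struct sync_slot s = { 2, print_fiber, "both inputs have arrived" };
        sync_signal(&s);            /* first input delivered: count 2 -> 1 */
        sync_signal(&s);            /* second input delivered: count 1 -> 0, fiber fires */
        return 0;
    }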

Article ID: CPE736

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Editorial
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.736
Article ID CPE736
Author Name(s) Rizos Sakellariou1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract
No abstract

Article ID: CPE737

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Scalable causal message logging for wide-area environments
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.737
Article ID CPE737
Author Name(s) Keith Marzullo1Karan Bhatia2Lorenzo Alvisi3
Author Email(s) marzullo@cs.ucsd.edu1
Affiliation(s) Department of Computer Science and Engineering, University of California, La Jolla, CA 92093-0114, U.S.A. 1NPACI, UCSD, MC 0505, 9500 Gilman Drive, La Jolla, CA 92093-0505, U.S.A. 2Department of Computer Sciences, The University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233, U.S.A. 3
Keyword(s) grid computing, wide-area computing, fault tolerance, causal message logging,
Abstract
Wide-area systems are gaining in popularity as an infrastructure for running scientific applications. From a fault tolerance perspective, these environments are challenging because of their scale and their variability. Causal message logging protocols have attractive properties that make them suitable for these environments. They spread fault tolerance information around the system, providing high availability. This information can also be used to replicate objects that are otherwise inaccessible because of network partitions.

However, current causal message logging protocols do not scale to thousands or millions of processes. We describe the Hierarchical Causal Message Logging Protocol (HCML), which uses a hierarchy of shared logging sites, or proxies, to significantly reduce the space requirements as compared with existing protocols. These proxies also act as caches for fault tolerance information and reduce the overall message overhead by as much as 50%. HCML also leverages differences in bandwidth between processes, which reduces overall message latency by as much as 97%. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE738

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Achieving portable and efficient parallel CORBA objects
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.738
Article ID CPE738
Author Name(s) Christian Pérez1Alexandre Denis2Thierry Priol3
Author Email(s) Christian.Perez@irisa.fr1
Affiliation(s) IRISA/IFSIC, Campus de Beaulieu, 35042 Rennes Cedex, France 1IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France 2 3
Keyword(s) CORBA, Grid computing, MPI, code coupling, high-performance network,
Abstract
With the availability of Computational Grids, new kinds of applications are emerging. They raise the problem of how to program them on such computing systems. In this paper, we advocate a programming model based on a combination of parallel and distributed programming models. Compared to previous approaches, this work aims at bringing single program multiple data (SPMD) programming into CORBA in a portable way. For example, we want to interconnect two parallel codes by CORBA without modifying either CORBA or the parallel communication API. We show that such an approach does not entail any loss of performance compared to previous approaches that required modification to the CORBA standard. Moreover, using an ORB that is able to exploit high-performance networks, we show that portable parallel CORBA objects can efficiently make use of such networks. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE739

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title HA-PSLS: a highly available parallel single-level store system
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.739
Article ID CPE739
Author Name(s) Christine Morin1Anne-Marie Kermarrec2
Author Email(s) cmorin@irisa.fr1
Affiliation(s) IRISA/INRIA, Campus universitaire de Beaulieu, 35042 Rennes Cedex, France 1Microsoft Research, 7 J J Thomson Avenue, Cambridge CB3 0FB, U.K 2
Keyword(s) parallel single-level store, high-availability, fault tolerance, checkpointing, replication, integration, parallel file systems, shared virtual memory,
Abstract
Parallel single-level store (PSLS) systems integrate a shared virtual memory and a parallel file system. They provide programmers with a global address space including both memory and file data. PSLS systems implemented in a cluster thus represent a natural support for long-running parallel applications, combining both the natural shared memory programming model and a large and efficient file system.

However, the need to tolerate failures in such a system increases with the size of applications. In this paper we present a highly-available parallel single level store system (HA-PSLS), which smoothly integrates a backward error recovery high-availability mechanism into a PSLS system. Our system is able to tolerate multiple transient failures, a single permanent failure, and power cut failures affecting the whole cluster, without requiring any specialized hardware. For this purpose, HA-PSLS relies on a high degree of integration (and reusability) of high-availability and standard features. A prototype integrating our high-availability support has been implemented and we show some performance results in the paper. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE740

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel application of a novel domain decomposition preconditioner for the adaptive finite-element solution of three-dimensional convection-dominated PDEs
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.740
Article ID CPE740
Author Name(s) P. K. Jimack1S. A. Nadeem2
Author Email(s) pkj@comp.leeds.ac.uk1
Affiliation(s) Computational PDEs Unit, School of Computing, University of Leeds, Leeds LS2 9JT, U.K. 1 2
Keyword(s) domain decomposition, additive Schwarz, weakly overlapping, convection-diffusion,
Abstract
We describe and analyse the parallel implementation of a novel domain decomposition preconditioner for the fast iterative solution of linear systems of algebraic equations arising from the discretization of elliptic partial differential equations (PDEs) in three dimensions. In previous theoretical work, this preconditioner has been proved to be optimal for symmetric positive-definite (SPD) linear systems. In this paper, we provide details of our three-dimensional parallel implementation and demonstrate that the technique may be generalized to the solution of non-symmetric algebraic systems, such as those arising when convection-diffusion problems are discretized using either Galerkin or stabilized finite-element methods (FEMs). Furthermore, we illustrate the potential of the preconditioner for use within an adaptive finite-element framework by successfully solving convection-dominated problems on locally, rather than globally, refined meshes. Copyright © 2003 John Wiley & Sons, Ltd.
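As background for the keywords, a generic one-level additive Schwarz preconditioner over p (possibly weakly overlapping) subdomains has the form below, where R_i restricts a global vector to subdomain i; the paper's preconditioner is a particular weakly overlapping member of this family rather than exactly this formula.

    M_{\mathrm{AS}}^{-1} \;=\; \sum_{i=1}^{p} R_i^{T} A_i^{-1} R_i, \qquad A_i \;=\; R_i A R_i^{T}.

Each application of the preconditioner thus amounts to independent subdomain solves, which is what makes the approach attractive for parallel implementation.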

Article ID: CPE741

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A distributed object infrastructure for interaction and steering
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.741
Article ID CPE741
Author Name(s) Manish Parashar1Rajeev Muralidhar2
Author Email(s) parashar@caip.rutgers.edu1
Affiliation(s) The Applied Software Systems Laboratory, Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854, U.S.A. 1 2
Keyword(s) distributed and dynamic objects, computational interaction and steering, sensors and actuators, computational collaboratory, scientific applications, parallel and distributed computing,
Abstract
This paper presents the design, implementation and experimental evaluation of DIOS (Distributed Interactive Object Substrate), an interactive object infrastructure to enable the runtime monitoring, interaction and computational steering of parallel and distributed applications. DIOS enables application objects (data structures, algorithms) to be enhanced with sensors and actuators so that they can be interrogated and controlled. Application objects may be distributed (spanning many processors) and dynamic (be created, deleted, changed or migrated). Furthermore, DIOS provides a control network that interconnects the interactive objects in a parallel/distributed application and enables external discovery, interrogation, monitoring and manipulation of these objects at runtime. DIOS is currently being used to enable interactive visualization, monitoring and steering of a wide range of scientific applications, including oil reservoir, compressible turbulence and numerical relativity simulations. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE742

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Load redundancy elimination on executable code
Volume ID 15
Issue ID 10
Date Aug 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.742
Article ID CPE742
Author Name(s) Manel Fernández1Roger Espasa2Saumya Debray3
Author Email(s) mfernand@ac.upc.es1
Affiliation(s) Computer Architecture Department, Universitat Politècnica de Catalunya, Barcelona, Spain 1 2 Department of Computer Science, University of Arizona, Tucson, AZ 08034, U.S.A. 3
Keyword(s) profile-guided optimizations, redundancy elimination, link-time optimizations,
Abstract
Optimizations performed at link time or applied directly to final program executables have received increased attention in recent years. This paper discusses the discovery and elimination of redundant load operations in the context of a link-time optimizer, an optimization that we call Load Redundancy Elimination (LRE). Our experiments show that between 50% and 75% of a program's memory references can be considered redundant because they access memory locations that were referenced fewer than 200-400 instructions earlier. We then present three profile-based LRE algorithms targeted at optimizing away these redundancies. Our results show that between 5% and 30% of the redundancy detected can indeed be eliminated, which translates into program speedups of around 8%. We also test our algorithm assuming different cache latencies, and show that, if latencies continue to grow, load redundancy elimination will become more important. Copyright © 2003 John Wiley & Sons, Ltd.
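The flavour of the redundancy involved is easy to show at the source level. In the hypothetical C fragment below, the second read of *p reloads a value already held in a register; a link-time LRE pass would rewrite the corresponding load instruction in the executable to reuse that register, something the compiler may not have been able to prove safe across separately compiled units or uncertain aliasing.

    #include <stdio.h>

    /* Before LRE: two loads of the same location with no intervening store. */
    int sum_twice_loads(const int *p, int k) {
        int a = *p;          /* load #1 */
        int b = *p + k;      /* load #2: redundant, the value is still live */
        return a + b;
    }

    /* What the optimized executable effectively computes: one load, value reused. */
    int sum_one_load(const int *p, int k) {
        int t = *p;          /* single load */
        return t + (t + k);  /* second use reads the register, not memory */
    }

    int main(void) {
        int x = 21;
        printf("%d %d\n", sum_twice_loads(&x, 1), sum_one_load(&x, 1));
        return 0;
    }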

Article ID: CPE778

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title SCALEA: a performance analysis tool for parallel programs
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.778
Article ID CPE778
Author Name(s) Hong-Linh Truong1Thomas Fahringer2
Author Email(s) truong@par.univie.ac.at1
Affiliation(s) Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 1 2
Keyword(s) performance analysis, instrumentation, performance overheads, visualization, parallel programs,
Abstract
Many existing performance analysis tools lack the flexibility to control instrumentation and performance measurement for code regions and performance metrics of interest. Performance analysis is commonly restricted to single experiments.

In this paper we present SCALEA, a performance instrumentation, measurement, analysis, and visualization tool for parallel programs that supports post-mortem performance analysis. SCALEA currently focuses on performance analysis for OpenMP, MPI, HPF, and mixed parallel programs. It computes a variety of performance metrics based on a novel classification of overhead. SCALEA also supports multi-experiment performance analysis that allows one to compare and evaluate the performance outcomes of several experiments. A highly flexible instrumentation and measurement system is provided, which can be controlled by command-line options and program directives. SCALEA can be interfaced with external tools through the provision of a full Fortran90 OpenMP/MPI/HPF frontend that allows one to instrument an abstract syntax tree at a very high level with C-function calls and to generate source code. A graphical user interface is provided to view a large variety of performance metrics at the level of arbitrary code regions, threads, processes, and computational nodes for single- and multi-experiment performance analysis. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE779

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Deep Start: a hybrid strategy for automated performance problem searches
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.779
Article ID CPE779
Author Name(s) Philip C. Roth1Barton P. Miller2
Author Email(s) pcroth@cs.wisc.edu1
Affiliation(s) Computer Sciences Department, University of Wisconsin-Madison, Madison, WI 53706, U.S.A. 1 2
Keyword(s) automated performance diagnosis, search, sampling,
Abstract
To attack the problem of scalability of performance diagnosis tools with respect to application code size, we have developed the Deep Start search strategy - a new technique that uses stack sampling to augment an automated search for application performance problems. Our hybrid approach locates performance problems more quickly and finds performance problems hidden from a more straightforward search strategy. The Deep Start strategy uses stack samples collected as a by-product of normal search instrumentation to select deep starters, functions that are likely to be application bottlenecks. With priorities and careful control of the search refinement, our strategy gives preference to experiments on the deep starters and their callees. This approach enables the Deep Start strategy to find application bottlenecks more efficiently and more effectively than a more straightforward search strategy. We implemented the Deep Start search strategy in the Performance Consultant, Paradyn's automated bottleneck detection component. In our tests, Deep Start found half of our test applications' known bottlenecks between 32% and 59% faster than the Performance Consultant's current search strategy, and finished finding bottlenecks between 10% and 61% faster. In addition to improving the search time, Deep Start often found more bottlenecks than the call graph search strategy. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE781

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A self-stabilizing token-based k-out-of-ℓ exclusion algorithm
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.781
Article ID CPE781
Author Name(s) A. K. Datta1R. Hadid2V. Villain3
Author Email(s) datta@cs.unlv.edu1
Affiliation(s) School of Computer Science, University of Nevada Las Vegas, Las Vegas, NV 89154-4019, U.S.A 1LaRIA, Université de Picardie Jules Verne, 33 rue Saint Leu, 80039 Amiens Cedex, France 2 3
Keyword(s) fault-tolerance, k-out-of-ℓ exclusion, ℓ-exclusion, mutual exclusion, self-stabilization,
Abstract
In this paper, we present the first self-stabilizing solution to the k-out-of-ℓ exclusion problem on a ring. The k-out-of-ℓ exclusion problem is a generalization of the well-known mutual exclusion problem: there are ℓ units of the shared resources, any process can request k units of the shared resources, and no resource unit can be allocated to more than one process at one time. The space requirement of the proposed algorithm is independent of ℓ for all processors except a special processor, called Root. The stabilization time is only 5n, where n is the size of the ring. Copyright © 2003 John Wiley & Sons, Ltd.
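The safety requirement of k-out-of-ℓ exclusion (never allocate more than ℓ units in total, with each request asking for at most k of them) is the same as that of a counting allocator. The shared-memory sketch below illustrates only that invariant; it is not the self-stabilizing token-based ring protocol of the paper, and the names are invented for illustration.

    #include <pthread.h>

    #define L_UNITS 8                 /* total number of shared resource units (the "ℓ") */

    static int available = L_UNITS;
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

    /* Block until k units (k <= L_UNITS) can be granted without exceeding L_UNITS. */
    void acquire(int k) {
        pthread_mutex_lock(&m);
        while (available < k)
            pthread_cond_wait(&c, &m);
        available -= k;               /* invariant: allocated units never exceed L_UNITS */
        pthread_mutex_unlock(&m);
    }

    void release(int k) {
        pthread_mutex_lock(&m);
        available += k;
        pthread_cond_broadcast(&c);
        pthread_mutex_unlock(&m);
    }

    int main(void) {
        acquire(3);                   /* a process requests k = 3 of the L_UNITS units */
        release(3);
        return 0;
    }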

Article ID: CPE782

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Fairness in systems based on multiparty interactions
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.782
Article ID CPE782
Author Name(s) David Ruiz1Rafael Corchuelo2Miguel Toro3
Author Email(s) druiz@tdg.lsi.us.es1
Affiliation(s) ETSI Informįtica, Avda. de la Reina Mercedes s/n, Sevilla E-41012, Spain 1 2 3
Keyword(s) concurrent programs, multiparty interactions, fairness, fair finiteness, conspiracies,
Abstract
In the context of the Multiparty Interaction Model, fairness is used to ensure that an interaction that is enabled sufficiently often in a concurrent program will eventually be selected for execution. Unfortunately, this notion does not take conspiracies into account, i.e. situations in which an interaction never becomes enabled because of an unfortunate interleaving of independent actions; furthermore, eventual execution is usually too weak for practical purposes, since this concept can only be used in the context of infinite executions. In this article, we present a new fairness notion, k-conspiracy-free fairness, that improves on others because it takes finite executions into account, alleviates conspiracies that are not inherent to a program, and allows k to be set a priori to control how well it addresses the above-mentioned problems. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE783

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Weak communication in single-hop radio networks: adjusting algorithms to industrial standards
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.783
Article ID CPE783
Author Name(s) Tomasz Jurdziński1Mirosław Kutyłowski2Jan Zatopiański3
Author Email(s) mirekk@im.pwr.wroc.pl1
Affiliation(s) Institute of Computer Science, Wrocław University, Wrocław, Poland 1 2 3
Keyword(s) radio network, initialization, leader election, power consumption,
Abstract
Quite often algorithms designed for no-collision-detection radio networks use a hidden form of collision detection: it is assumed that a station can simultaneously send and listen. If it cannot hear its own message, apparently the message has been scrambled by another station sending at the same time.

Industrial standard IEEE 802.11 says that a station can either send or listen to a radio channel at a given time, but not both. In order to relate the industrial standard and theoretical algorithms we consider a weak radio network model with no collision detection in which a station cannot simultaneously send and receive signals. Otherwise we talk about a strong model.

In this paper we consider a measure called energy cost (or 'power consumption') which is equal to the maximum over all stations of the number of steps in which the station is sending or listening.

We show that computational power of weak and strong single-hop radio networks differ substantially in the deterministic case: deterministic leader election requires energy cost in the weak model and can be solved by a practical algorithm with energy cost in the strong model. By contrast, we present a very efficient randomized simulation of strong radio networks by weak ones, with preprocessing that requires steps and has energy cost . Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE784

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance analysis and qualitative results of an efficient parallel stochastic simulator for a marine host-parasite system
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.784
Article ID CPE784
Author Name(s) G. Latu1M. Langlais2J. Roman3P. Silan4
Author Email(s) latu@labri.fr1
Affiliation(s) LaBRI, UMR CNRS 5800, Université Bordeaux 1 et ENSEIRB, 351, cours de la Libération, 33405 Talence, France 1 3 MAB, UMR CNRS 5466, Université Bordeaux 2, 146 Léo Saignat, 33076 Bordeaux Cedex, France 2 UMR CNRS 5000, Université Montpellier II, Station Méditerranéenne de l'Environnement Littoral, 1 Quai de la Daurade, 34200 Sète, France 4
Keyword(s) parallel algorithmics, numerical simulation, stochastic simulation, host-parasite system, Monogenea, fish,
Abstract
We are interested in a host-parasite system, i.e. the sea bass-Diplectanum aequans system. A discrete mathematical model is used to describe the dynamics of both populations. Our goal is notably to validate the model in the context of aquaculture. A deterministic numerical simulator and, recently, a stochastic simulator were developed to study this biological system. Parallelization is required because the execution times are too long. The Monte Carlo algorithm of the stochastic simulator and its three levels of parallelism are described. Analysis and performances, up to 256 processors, of a hybrid MPI/OpenMP code are then presented for a cluster of symmetric multi-processor (SMP) nodes. Qualitative results are given for the host-macroparasite system simulation. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE777

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Euro-Par 2002
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.777
Article ID CPE777
Author Name(s) Rainer Feldman1Burkhard Monien2
Author Email(s)
Affiliation(s) 1 2
Keyword(s)
Abstract
No abstract

Article ID: CPE785

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel iterative methods for Navier-Stokes equations and application to eigenvalue computation
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.785
Article ID CPE785
Author Name(s) Ivan G. Graham1Alastair Spence2Eero Vainikko3
Author Email(s) i.g.graham@bath.ac.uk1
Affiliation(s) Department of Mathematical Sciences, University of Bath, U.K. 1 2 Institute of Technology, University of Tartu, Estonia 3
Keyword(s) parallel iterative methods, domain decomposition, eigenvalues, incompressible Navier-Stokes equations,
Abstract
We describe the construction of parallel iterative solvers for finite-element approximations of the Navier–Stokes equations on unstructured grids using domain decomposition methods. The iterative method used is FGMRES, preconditioned by a parallel adaptation of a block preconditioner recently proposed by Kay et al. The parallelization is achieved by adapting the technology of our domain decomposition solver DOUG (previously used for scalar problems) to block-systems. The iterative solver is applied to shifted linear systems that arise in eigenvalue calculations. To illustrate the performance of the solver, we compare several strategies both theoretically and practically for the calculation of the eigenvalues of large sparse non-symmetric matrices arising in the assessment of the stability of flow past a cylinder. Copyright © 2003 John Wiley & Sons, Ltd.
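For orientation, the shifted linear systems referred to above are the kind that appear in shift-invert eigenvalue iterations for the generalized problem arising from the linearized, discretized Navier–Stokes equations. Schematically, in generic notation not taken from the paper, each outer iteration requires a solve of the form

    (A - \sigma M)\,y = M v, \qquad \text{for the eigenproblem } A x = \lambda M x,

with the shift σ placed near the eigenvalues of interest; it is these solves that are handled by the preconditioned FGMRES method described in the abstract.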

Article ID: CPE786

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Dynamic query scheduling in parallel data warehouses
Volume ID 15
Issue ID 11-12
Date Sep 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.786
Article ID CPE786
Author Name(s) Holger Märtens1Erhard Rahm2Thomas Stöhr3
Author Email(s) h.maertens@fh-wolfenbuettel.de1
Affiliation(s) University of Applied Sciences Braunschweig/Wolfenbüttel, Germany 1University of Leipzig, Institute of Computer Science, Augustusplatz 10-11, D-04109 Leipzig, Germany 2 3
Keyword(s) parallel database systems, query optimization, load balancing, data warehousing, Shared Disk architecture, disk contention,
Abstract
Parallel processing is a key to high performance in very large data warehouse applications that execute complex analytical queries on huge amounts of data. Although parallel database systems (PDBSs) have been studied extensively in the past decades, the specifics of load balancing in parallel data warehouses have not been addressed in detail. In this study, we investigate how the load balancing potential of a Shared Disk (SD) architecture can be utilized for data warehouse applications. We propose an integrated scheduling strategy that simultaneously considers both processors and disks, regarding not only the total workload on each resource but also the distribution of load over time. We evaluate the performance of the new method in a comprehensive simulation study and compare it to several other approaches. The analysis incorporates skew aspects and considers typical data warehouse features such as star schemas with large fact tables and bitmap indices. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE730

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Resource management in open Linda systems
Volume ID 15
Issue ID 13
Date Nov 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.730
Article ID CPE730
Author Name(s) Ronaldo Menezes1
Author Email(s) rmenezes@cs.fit.edu1
Affiliation(s) Florida Institute of Technology, Department of Computer Sciences, 150 West University Boulevard, Melbourne, FL 32901, U.S.A. 1
Keyword(s) Linda, garbage collection, open systems, coordination systems,
Abstract
Coordination systems, in particular Linda, have established themselves as important tools for the development of applications to open systems such as the Internet.

This paper shows how to tackle a forgotten but crucial problem in open coordination systems: memory management. As with any system intended for wide use, and because memory is a finite resource, coordination systems must address the problem of memory exhaustion. This paper first explores the orthogonality between coordination and computation in order to make it clear that the problem of memory exhaustion in coordination systems cannot be solved by garbage collection schemes implemented at the computation-language level; a garbage collection scheme must exist in the coordination environment as well.

Following the explanation on orthogonality, the paper will focus on describing a garbage collection scheme for the Linda family of coordination systems. It is expected that the solution in Linda can be adapted to other coordination systems as long as they are based on tuple space communication. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE726

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A parallel data assimilation model for oceanographic observations
Volume ID 15
Issue ID 13
Date Nov 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.726
Article ID CPE726
Author Name(s) Aad J. van der Steen1Fons van Hees2Peter Jan van Leeuwen3
Author Email(s) a.vandersteen@phys.uu.nl1
Affiliation(s) High Performance Computing Group, Utrecht University, P.O. Box 80195, 3508 TD Utrecht, The Netherlands 1 2 Institute for Marine and Atmospheric Research Utrecht, Utrecht University, P.O. Box 80005, 3508 TA Utrecht, The Netherlands 3
Keyword(s) stochastic model, data assimilation, MPI/OpenMP implementation,
Abstract
In this paper we describe the development of a program that aims at achieving the optimal integration of observed data in an oceanographic model describing the water transport phenomena in the Agulhas area at the tip of South Africa. Two parallel implementations, MPI and OpenMP, are described and experiments with respect to speed and scalability on a Compaq AlphaServer SC and an SGI Origin3000 are reported. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE727

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Coordinating components in middleware systems
Volume ID 15
Issue ID 13
Date Nov 1 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.727
Article ID CPE727
Author Name(s) Matthias Radestock1Susan Eisenbach2
Author Email(s) sue@doc.ic.ac.uk2
Affiliation(s) LShift, Burbage House, 83-85 Curtain Road, London EC2A 3BS, U.K. 1Department of Computing, Imperial College London, London SW7 2BZ, U.K. 2
Keyword(s) coordination, adaptive open distributed systems, traps, message interception,
Abstract
Configuration coordination

Article ID: CPE731

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A set-oriented method definition language for object databases and its semantics
Volume ID 15
Issue ID 14
Date Dec 10 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.731
Article ID CPE731
Author Name(s) Giovanna Guerrini1Elisa Bertino2Isabella Merlo3
Author Email(s) guerrini@disi.unige.it1
Affiliation(s) Dipartimento di Informatica e Scienze dell'Informazione, Università degli Studi di Genova, Via Dodecaneso 35 - I16146 Genova, Italy 1Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Via Comelico 39 - I20135 Milano, Italy 2 3
Keyword(s) object-oriented database systems, rule-based languages, database programming languages,
Abstract
In this paper we propose a set-oriented rule-based method definition language for object-oriented databases. Most existing object-oriented database systems exploit a general-purpose imperative object-oriented programming language as the method definition language. Because methods are written in a general-purpose imperative language, it is difficult to analyze their properties and to optimize them. Optimization is important when dealing with a large number of objects, as in databases. We therefore believe that the use of an ad hoc, set-oriented language can offer some advantages, at least at the specification level. In particular, such a language can offer an appropriate framework to reason about method properties.

In this paper, besides defining a set-oriented rule-based language for method definition, we formally define its semantics, addressing the problems of inconsistency and non-determinism in set-oriented updates. Moreover, we characterize some relevant properties of methods, such as conflicts among method specifications in sibling classes and behavioral refinement in subclasses. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE788

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Optimizing operating system performance for CC-NUMA architectures
Volume ID 15
Issue ID 14
Date Dec 10 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.788
Article ID CPE788
Author Name(s) Moon Seok Chang1
Author Email(s) cmoon@us.ibm.com1
Affiliation(s) ETRI, 161 Gajeong-Dong, Yuseong-Gu, Daejon 305-350, Korea 1
Keyword(s) CC-NUMA, UNIX operating system, remote memory access, data striping, localization, load balancing,
Abstract
The CC-NUMA (cache-coherent non-uniform memory access) architecture is an attractive solution to scalable servers. The performance of a CC-NUMA system heavily depends on the number of accesses to remote memory through an interconnection network. To reduce the number of remote accesses, an operating system needs to exploit the potential locality of the architecture. This paper describes the design and implementation of a UNIX-based operating system supporting the CC-NUMA architecture. The operating system implements various enhancements by revising kernel algorithms and data structures. This paper also analyzes the performance of the enhanced operating system by running commercial benchmarks on a real CC-NUMA system. The performance analysis shows that the operating system can achieve improved performance and scalability for CC-NUMA by implementing kernel data striping, localization and load balancing. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE733

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Frameworks for incorporating semantic relationships into object-oriented database systems
Volume ID 15
Issue ID 15
Date Dec 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.733
Article ID CPE733
Author Name(s) Michael Halper1Li-min Liu2James Geller3Yehoshua Perl4
Author Email(s) mhalper@kean.edu1
Affiliation(s) Department of Mathematics and Computer Science, Kean University, Union, NJ 07083, U.S.A. 12Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan, Republic of China 2Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A. 3 4
Keyword(s) semantic relationship, object-oriented database, object-oriented modeling, part-whole relationship, metaclass,
Abstract
A semantic relationship is a data modeling construct that connects a pair of classes or categories and has inherent constraints and other functionalities that precisely reflect the characteristics of the specific relationship in an application domain. Examples of semantic relationships include part-whole, ownership, materialization and role-of. Such relationships are important in the construction of information models for advanced applications, whether one is employing traditional data-modeling techniques, knowledge-representation languages or object-oriented modeling methodologies. This paper focuses on the issue of providing built-in support for such constructs in the context of object-oriented database (OODB) systems. Most of the popular object-oriented modeling approaches include some semantic relationships in their repertoire of data-modeling primitives. However, commercial OODB systems, which are frequently used as implementation vehicles, tend not to do the same. We will present two frameworks by which a semantic relationship can be incorporated into an existing OODB system. The first only requires that the OODB system support manifest type with respect to its instances. The second assumes that the OODB system has a special kind of metaclass facility. The two frameworks are compared and contrasted. In order to ground our work in existing systems, we show the addition of a part-whole semantic relationship both to the ONTOS DB/Explorer OODB system and the VODAK Model Language. Copyright © 2003 John Wiley & Sons, Ltd.

Article ID: CPE743

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance comparison of checkpoint and recovery protocols
Volume ID 15
Issue ID 15
Date Dec 25 2003
DOI(URI) http://dx.doi.org/10.1002/cpe.743
Article ID CPE743
Author Name(s) Himadri Sekhar Paul1Arobinda Gupta2R. Badrinath3
Author Email(s) hpaul@cse.iitkgp.ernet.in1
Affiliation(s) Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, India 1 2 3
Keyword(s) checkpoint and recovery, fault tolerance, distributed systems,
Abstract
Checkpoint and rollback recovery is a well-known technique for providing fault tolerance to long-running distributed applications. Performance of a checkpoint and recovery protocol depends on the characteristics of the application and the system on which it runs. However, given an application and system environment, there is no easy way to identify which checkpoint and recovery protocol will be most suitable for it. Conventional approaches require implementing the application with all the protocols under consideration, running them on the desired system, and comparing their performances. This process can be very tedious and time consuming. This paper first presents the design and implementation of a simulation environment, distributed process simulation or dPSIM, which enables easy implementation and evaluation of checkpoint and recovery protocols. The tool enables the protocols to be simulated under a wide variety of application, system, and network characteristics. The paper then presents performance evaluation of five checkpoint and recovery protocols. These protocols are implemented and executed in dPSIM under different simulated application, system, and network characteristics. Copyright © 2003 John Wiley & Sons, Ltd.