WILEY Journal Home Page
Papers under review through 2004
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A stochastic load balancing algorithm for Computing | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.708 | ||||||||
Article ID | CPE708 | ||||||||
Author Name(s) | Yuk Yin Wong1Kwong Sak Leung2Kin Hong Lee3 | ||||||||
Author Email(s) | clevin.wong@cityu.edu.hk1 | ||||||||
Affiliation(s) | Computing Services Centre, City University of Hong Kong, Tat Chee Avenue, Kln., Hong Kong 1>Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, N.T., Hong 2 3 | ||||||||
Keyword(s) | Internet computing, stochastic load balancing, distributed systems, Java computing, | ||||||||
Abstract |
|||||||||
This paper presents a stochastic dynamic load balancing algorithm for Internet computing, which is a new type of distributed computing involving heterogeneous workstations from different organizations on the Internet. To realize the practical environment, we assume the system to be comprised of heterogeneous,untrusted and non dedicated workstations connected by a non dedicated network. Our algorithm uses the product of the average processing time and the queue length of system jobs as the load index. Dynamic communication delay is included in the execution cost calculation. The transfer policy and the location policy are combined in a stochastic algorithm. State information exchange is done via information feedback and mutual updating. Simulations demonstrate that our algorithm outperforms conventional approaches over a wide range of system parameters. These results are reconfirmed by empirical experiments after we have implemented the algorithms on the Distributed Java Machine global virtual machine. Copyright 2003 John Wiley Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A comparison of concurrent programming and cooperative multithreading | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.706 | ||||||||
Article ID | CPE706 | ||||||||
Author Name(s) | Ronald A Olsson1Tiejun Li2Eugene F. Fodor3Aaron W. Keen4 | ||||||||
Author Email(s) | olsson@cs.ucdavis.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, University of California, Davis, CA 95616, U.S.A. 1 2 3 4 | ||||||||
Keyword(s) | cooperative multithreading, concurrent programming, parallel and distributed programming languages, synchronization mechanisms, synchronization optimization, | ||||||||
Abstract |
|||||||||
This paper presents a comparison of the cooperative multithreading model with the general concurrent programming model.It focuses on the execution time performance of a range of standard concurrent programming applications. The overall results are mixed. In some cases, programs written in the cooperative multithreading model outperform those written in the general concurrent programming model. The contributions of this paper are twofold. First, it presents a thorough analysis of the performances of applications in the different models, i.e. to explain the criteria that determine when a program in one model will outperform an equivalent program in the other. Second, it examines the tradeoffs in writing programs in the different programming styles. In some cases, better performance comes at the cost of more complicated code. Copyright 2003 John Wiley Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | The Virtual Laboratory: a toolset to enable distributed molecular modelling for drug design on the World Wide Grid | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.704 | ||||||||
Article ID | CPE704 | ||||||||
Author Name(s) | Rajkumar Buyya1Kim Branson2Jon Giddy3DavidA Abramson4 | ||||||||
Author Email(s) | rajkumar@buyya.com1 | ||||||||
Affiliation(s) | Grid Computing and Distributed Systems Laboratory, Department of Computer Science and Software Engin 1Structural Biology, Walter and Eliza Hall Institute, Royal Parade, Parkville, Melbourne, Australia 2Welsh Science Centre, Department of Computer Science, Cardiff University, Cardiff, U.K. 3School of Computer Science and Software Engineering, Monash University, Caulfield Campus, Melbourne 4 | ||||||||
Keyword(s) | Grid computing, drug design, application scheduling, grid economy, | ||||||||
Abstract |
|||||||||
Computational Grids are emerging as a new paradigm for sharing and aggregation of geographically distributed resources for solving large scale compute and data intensive problems in science, engineering and commerce. However, application development, resource management and scheduling in these environments is a complex undertaking. In this paper, we illustrate the development of a Virtual Laboratory environment by leveraging existing Grid technologies to enable molecular modelling for drug design on geographically distributed resources. It involves screening millions of compounds in the chemical database CDB against a protein target to identify those with potential use for drug design. We have used the Nimrod G parameter specification language to transform the existing molecular docking application into a parameter sweep application for executing on distributed systems. We have developed new tools for enabling access to ligand records ;molecules in the CDB from remote resources. The Nimrod G resource broker along with molecule CDB data broker is used for scheduling and on demand processing of docking jobs on the World Wide Grid ;WWG ; resources. The results demonstrate the ease of use and power of the Nimrod G and virtual laboratory tools for grid computing. Copyright 2003 John Wiley Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Object oriented distributed computing based on remote class reference | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.708 | ||||||||
Article ID | CPE709 | ||||||||
Author Name(s) | David W. Walker1Yan Huang2Omer F. Rana3 | ||||||||
Author Email(s) | David.W.Walker@cs.cf.ac.uk1 | ||||||||
Affiliation(s) | Department of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, U.K 1 2 3 | ||||||||
Keyword(s) | Java, RMI, distributed computing, remote class reference, | ||||||||
Abstract |
|||||||||
Java RMI, Jini and CORBA provide effective mechanisms for implementing a distributed computing system. Recently many numeral libraries have been developed that take advantage of Java as an object oriented and portable language.The widely used client server method limits the extent to which the benefits of the object oriented approach can be exploited because of the difficulties arising when a remote object is the argument or return value of a remote or local method. In this paper this problem is solved by introducing a data object that stores the data structure of the remote object and related access methods. By using this data object, the client can easily instantiate a remote object, and use it as the argument or return value of either a local or remote method. Copyright 2003 John Wiley Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Modular specification of frame properties in JML | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 2 | ||||||||
Date | Feb 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.713 | ||||||||
Article ID | CPE713 | ||||||||
Author Name(s) | Gary T. Leavens1Arnd Poetzsch-Heffter2Peter Müller3 | ||||||||
Author Email(s) | leavens@cs.iastate.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, Iowa State University, 229 Atanasoff Hall, Ames, IA 50011-1040, U.S. 1Universität Kaiserslautern, 67653 Kaiserslautern, Germany 2Deutsche Bank AG, Frankfurt, Germany 3 | ||||||||
Keyword(s) | formal specification, frame problem, depends-clause, modifies-clause, JML, alias control, ownership, Java, | ||||||||
Abstract |
|||||||||
We present a modular specification technique for frame properties. The technique uses modifies clauses and abstract fields with declared dependencies. Modularity is guaranteed by a programming model that enforces data abstraction by preventing representation and argument exposure, a semantics of modifies clauses that uses a notion of 'relevant location', and by modularity rules for dependencies. For concreteness, we adapt this technique to the Java Modeling Language, JML. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Improving the official specification of Java bytecode verification | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 2 | ||||||||
Date | Feb 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.714 | ||||||||
Article ID | CPE714 | ||||||||
Author Name(s) | Alessandro Coglio1 | ||||||||
Author Email(s) | coglio@kestrel.edu1 | ||||||||
Affiliation(s) | Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304, U.S.A. 1 | ||||||||
Keyword(s) | Java, bytecode verification, specification, | ||||||||
Abstract |
|||||||||
Bytecode verification is the main mechanism to ensure type safety in the Java Virtual Machine. Inadequacies in its official specification may lead to incorrect implementations where security can be broken and/or certain legal programs are rejected. This paper provides a comprehensive analysis of the specification, along with concrete suggestions for improvement. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Articles | ||||||||
Article Title | Clustering revealed in high-resolution simulations and visualization of multi-resolution features in fluid-particle models | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 2 | ||||||||
Date | Feb 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.711 | ||||||||
Article ID | CPE711 | ||||||||
Author Name(s) | David A. Yuen1Krzysztof Boryczko2Witold Dzwinel3 | ||||||||
Author Email(s) | davey@krissy.msi.umn.edu1 | ||||||||
Affiliation(s) | Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455-1227, U.S. 11AGH Institute of Computer Science, al. Mickiewicza 30, 30-059, Kraków, Poland 2 3 | ||||||||
Keyword(s) | large-scale data sets, visualization, feature extraction, parallel clustering, dissipative particle dynamics, fluid particle model, | ||||||||
Abstract |
|||||||||
Simulating natural phenomena at greater accuracy results in an explosive growth of data. Large-scale simulations with particles currently involve ensembles consisting of between 106 and 109 particles, which cover 105-106 time steps. Thus, the data files produced in a single run can reach from tens of gigabytes to hundreds of terabytes. This data bank allows one to reconstruct the spatio-temporal evolution of both the particle system as a whole and each particle separately. Realistically, for one to look at a large data set at full resolution at all times is not possible and, in fact, not necessary. We have developed an agglomerative clustering technique, based on the concept of a mutual nearest neighbor (MNN). This procedure can be easily adapted for efficient visualization of extremely large data sets from simulations with particles at various resolution levels. We present the parallel algorithm for MNN clustering and its timings on the IBM SP and SGI/Origin 3800 multiprocessor systems for up to 16 million fluid particles. The high efficiency obtained is mainly due to the similarity in the algorithmic structure of MNN clustering and particle methods. We show various examples drawn from MNN applications in visualization and analysis of the order of a few hundred gigabytes of data from discrete particle simulations, using dissipative particle dynamics and fluid particle models. Because data clustering is the first step in this concept extraction procedure, we may employ this clustering procedure to many other fields such as data mining, earthquake events and stellar populations in nebula clusters. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | MPI-CHECK: a tool for checking Fortran 90 MPI programs | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 2 | ||||||||
Date | Feb 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.705 | ||||||||
Article ID | CPE705 | ||||||||
Author Name(s) | Glenn Luecke1Hua Chen2James Coyle3Jim Hoekstra4Yan Zou5Marina Kraeva6 | ||||||||
Author Email(s) | grl@iastate.edu1 | ||||||||
Affiliation(s) | High Performance Computing Group, Iowa State University, Ames, IA 50011, U.S.A. 1 2 3 4 5 6 | ||||||||
Keyword(s) | MPI, tool, bounds checking, argument checking, data type checking, | ||||||||
Abstract |
|||||||||
MPI is commonly used to write parallel programs for distributed memory parallel computers. MPI-CHECK is a tool developed to aid in the debugging of MPI programs that are written in free or fixed format Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time and run-time checking of MPI programs. MPI-CHECK automatically detects the following problems in the use of MPI routines: (i) mismatch in argument type, kind, rank or number; (ii) messages which exceed the bounds of the source/destination array; (iii) negative message lengths; (iv) illegal MPI calls before MPI_INIT or after MPI_FINALIZE; (v) inconsistencies between the declared type of a message and its associated DATATYPE argument; and (vi) actual arguments which violate the INTENT attribute. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Kava: a Java dialect with a uniform object model for lightweight classes | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.653 | ||||||||
Article ID | CPE653 | ||||||||
Author Name(s) | David F. Bacon1 | ||||||||
Author Email(s) | dfb@watson.ibm.com1 | ||||||||
Affiliation(s) | IBM T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, U.S.A 1 | ||||||||
Keyword(s) | Java, lightweight classes, object inlining, object models, unboxing, | ||||||||
Abstract |
|||||||||
Object-oriented programming languages have always distinguished between 'primitive' and 'user-defined' data types, and in the case of languages like C++ and Java the primitives are not even treated as objects, further fragmenting the programming model. The distinction is especially problematic when a particular programming community requires primitive-level support for a new data type, as for complex, intervals, fixed-point numbers, and so on. We present Kava, a design for a backward-compatible version of Java that solves the problem of programmable lightweight objects in a much more aggressive and uniform manner than previous proposals. In Kava, there are no primitive types; instead, object-oriented programming is provided down to the level of single bits, and types such as int can be explicitly programmed within the language. While the language maintains a uniform object reference semantics, efficiency is obtained by making heavy use of unboxing and semantic expansion. We describe Kava as a dialect of the Java language, show how it can be used to define various primitive types, describe how it can be translated into Java, and compare it to other approaches to lightweight objects. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Framework for testing multi-threaded Java programs | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.654 | ||||||||
Article ID | CPE654 | ||||||||
Author Name(s) | Shmuel Ur1Orit Edelstein2Eitan Farchi3Evgeny Goldin4Gil Ratsaby5 | ||||||||
Author Email(s) | UR@il.ibm.com1 Correspondence to Shmuel Ur, IBM Research Laboratory in Haifa5 | ||||||||
Affiliation(s) | Correspondence to Shmuel Ur, IBM Research Laboratory in Haifa 1 2 3 4 5 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
Finding bugs due to race conditions in multi-threaded programs is difficult, mainly because there are many possible interleavings, any of which may contain a fault. In this work we present a methodology for testing multi-threaded programs which has minimal impact on the user and is likely to find interleaving bugs. Our method reruns existing tests in order to detect synchronization faults. We find that a single test executed a number of times in a controlled environment may be as effective in finding synchronization faults as many different tests. A great deal of resources are saved since tests are very expensive to write and maintain. We observe that simply rerunning tests, without ensuring in some way that the interleaving will change, yields almost no benefits. We implement the methodology in our test generation tool - ConTest. ConTest combines the replay algorithm, which is essential for debugging, with our interleaving test generation heuristics. ConTest also contains an instrumentation engine, a coverage analyzer, and a race detector (not finished yet) that enhance bug detection capabilities. The greatest advantage of ConTest, besides finding bugs of course, is its minimal effect on the user. When ConTest is combined into the test harness, the user may not even be aware that ConTest is being used. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Integration and application of TAU in parallel Java environments | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.655 | ||||||||
Article ID | CPE655 | ||||||||
Author Name(s) | Sameer Shende1Allen D. Malony2 | ||||||||
Author Email(s) | sameer@cs.uoregon.edu1 | ||||||||
Affiliation(s) | Department of Computer and Information Science, 1202 University of Oregon, Eugene, OR 97403, U.S.A. 1Department of Computer and Information Science, 1202 University of Oregon, Eugene, OR 97403, U.S.A. 2 | ||||||||
Keyword(s) | parallel, Java, performance, tools, | ||||||||
Abstract |
|||||||||
Parallel Java environments present challenging problems for performance tools because of Java's rich language system and its multi-level execution platform combined with the integration of native-code application libraries and parallel run-time software. In addition to the desire to provide robust performance measurement and analysis capabilities for the Java language itself, the coupling of different software execution contexts under a uniform performance model needs careful consideration of how events of interest are observed and how cross-context parallel execution information is linked. This paper relates our experience in extending the TAU (Tuning and Analysis Utilities) performance system to a parallel Java environment based on mpiJava. We describe the complexities of the instrumentation model used, how performance measurements are made, and the overhead incurred. A parallel Java application simulating the game of Life is used to show the performance system's capabilities. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Platform independent dynamic Java virtual machine analysis: the Java Grande Forum benchmark suite | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.666 | ||||||||
Article ID | CPE666 | ||||||||
Author Name(s) | John Waldron1James Power2David Gregg3 | ||||||||
Author Email(s) | John.Waldron@cs.tcd.ie1 | ||||||||
Affiliation(s) | Department of Computer Science, Trinity College, Dublin 2, Ireland 1Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland 2 3 | ||||||||
Keyword(s) | Java Virtual Machine, Java Grande, instruction frequency, method profiling, | ||||||||
Abstract |
|||||||||
In this paper we present a platform independent analysis of the dynamic profiles of Java programs when executing on the Java Virtual Machine. The Java programs selected are taken from the Java Grande Forum benchmark suite and five different Java-to-bytecode compilers are analysed. The results presented describe the dynamic instruction usage frequencies, as well as the sizes of the local variable, parameter and operand stacks during execution on the JVM. These results, presenting a picture of the actual (rather than presumed) behaviour of the JVM, have implications both for the coverage aspects of the Java Grande benchmark suites, for the performance of the Java-to-bytecode compilers and for the design of the JVM. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | High-performance Java codes for computational fluid dynamics | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.656 | ||||||||
Article ID | CPE656 | ||||||||
Author Name(s) | Christopher J. Riley1Siddhartha Chatterjee2Rupak Biswas3 | ||||||||
Author Email(s) | chris.riley@cfdesign.com1 | ||||||||
Affiliation(s) | Blue Ridge Numerics, Inc., Charlottesville, VA 22901, U.S.A 1IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, U.S.A. 2NASA Advanced Supercomputing Division, MS T27A-1, NASA Ames Research Center, Moffett Field, CA 94035 3 | ||||||||
Keyword(s) | scientific computing, object-oriented, benchmarks, | ||||||||
Abstract |
|||||||||
The computational science community is reluctant to write large-scale computationally-intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of computational fluid dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center) and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Java Virtual Machine support for object serialization | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.667 | ||||||||
Article ID | CPE667 | ||||||||
Author Name(s) | Fabian Breg1Constantine D. Polychronopoulos2 | ||||||||
Author Email(s) | breg@csrd.uiuc.edu1 | ||||||||
Affiliation(s) | University of Illinois, Coordinated Science Laboratory, room 444, 1308 W. Main Street, Urbana, IL 61 1 2 | ||||||||
Keyword(s) | Java, object serialization, | ||||||||
Abstract |
|||||||||
Distributed computing has become increasingly popular in the high-performance community. Java's remote method invocation (RMI) provides a simple, yet powerful method for implementing parallel algorithms. The performance of RMI has been less than adequate, however, and object serialization is often identified as a major performance inhibitor. We believe that object serialization is best performed in the Java Virtual Machine (JVM), where information regarding object layout and hardware communication resources are readily available. We implement a subset of Java's object serialization protocol in native code, using the Java Native Interface (JNI) and JVM internals. Experiments show that our approach is up to eight times faster than Java's original object serialization protocol for array objects. Also, for linked data structures our approach obtains a moderate speedup and better scalability. Evaluation of our object serialization implementation in an RMI framework indicates that a higher throughput can be obtained. Parallel applications, written using RMI, obtain better speedups and scalability when this more efficient object serialization is used. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Automatic translation of Fortran to JVM bytecode | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.657 | ||||||||
Article ID | CPE657 | ||||||||
Author Name(s) | Keith Seymour1Jack Dongarra2 | ||||||||
Author Email(s) | seymour@cs.utk.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, University of Tennessee, Knoxville, Knoxville, TN 37996, U.S.A. 1 2 | ||||||||
Keyword(s) | Fortran, Java, JVM, bytecode, numerical libraries, | ||||||||
Abstract |
|||||||||
This paper reports on the design of a Fortran-to-Java translator whose target language is the instruction set of the Java Virtual Machine. The goal of the translator is to generate Java implementations of legacy Fortran numerical codes in a consistent and reliable fashion. The benefits of directly generating bytecode are twofold. First, compared with generating Java source code, it provides a much more straightforward and efficient mechanism for translating Fortran GOTO statements. Second, it provides a framework for pursuing various compiler optimizations, which could be beneficial not only to our project, but to the Java community as a whole. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Benchmarking Java against C and Fortran for scientific applications | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.658 | ||||||||
Article ID | CPE568 | ||||||||
Author Name(s) | J. M. Bull1L. A. Smith2L. Pottage3R. Freeman4 | ||||||||
Author Email(s) | markb@epcc.ed.ac.uk1 | ||||||||
Affiliation(s) | Edinburgh Parallel Computer Centre, James Clerk Maxwell Building, The King's Buildings, The University of Edinburgh, Mayfield Road, Edinburgh EH9 3JZ, U.K. 1 2 3 4 | ||||||||
Keyword(s) | Java, C, performance, benchmarking, scientific applications, | ||||||||
Abstract |
|||||||||
Increasing interest is being shown in the use of Java for scientific applications. The Java Grande benchmark suite was designed with such applications primarily in mind. The perceived lack of performance of Java still deters many potential users, despite recent advances in just-in-time and adaptive compilers. There are, however, few benchmark results available comparing Java to more traditional languages such as C and Fortran. To address this issue, a subset of the Java Grande benchmarks has been re-written in C and Fortran allowing direct performance comparisons between the three languages. The performance of a range of Java execution environments, C and Fortran compilers have been tested across a number of platforms using the suite. These demonstrate that on some platforms (notable Intel Pentium) the performance gap is now quite small. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Run-time optimizations for a Java DSM implementation | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.660 | ||||||||
Article ID | CPE660 | ||||||||
Author Name(s) | R. Veldema1R. F. H. Hofman2R. A. F. Bhoedjang3H. E. Bal4 | ||||||||
Author Email(s) | rveldema@cs.vu.nl1 | ||||||||
Affiliation(s) | 1Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands 1 2 Department of Computer Science, Cornell University, Ithaca, NY, USA 3 4 | ||||||||
Keyword(s) | Java, DSM, compiler, runtime, protocols, | ||||||||
Abstract |
|||||||||
Jackal is a fine-grained distributed shared memory implementation of the Java programming language. Jackal implements Java's memory model and allows multithreaded Java programs to run unmodified on distributed-memory systems. This paper focuses on Jackal's run-time system, which implements a multiple-writer, home-based consistency protocol. Protocol actions are triggered by software access checks that Jackal's compiler inserts before object and array references. To reduce access-check overhead, the compiler exploits source-level information and performs extensive static analysis to optimize, lift, and remove access checks. We describe optimizations for Jackal's run-time system, which mainly consists of discovering opportunities to dispense with flushing of cached data. We give performance results for different run-time optimizations, and compare their impact with the impact of one compiler optimization. We find that our run-time optimizations are necessary for good Jackal performance, but only in conjunction with the Jackal compiler optimizations described by Veldema et al. As a yardstick, we compare the performance of Java applications run on Jackal with the performance of equivalent applications that use a fast implementation of Java's Remote Method Invocation instead of shared memory. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Supporting multidimensional arrays in Java | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.661 | ||||||||
Article ID | CPE661 | ||||||||
Author Name(s) | José E. Moreira1Samuel P. Midkiff2Manish Gupta3 | ||||||||
Author Email(s) | jmoreira@us.ibm.com1 | ||||||||
Affiliation(s) | IBM T. J. Watson Research Center, Yorktown Heights, NY 10598-0218, U.S.A. 1 2 3 | ||||||||
Keyword(s) | Java, high-performance computing, compiler optimizations, multidimensional arrays, | ||||||||
Abstract |
|||||||||
The lack of direct support for multidimensional arrays in JavaTM has been recognized as a major deficiency in the language's applicability to numerical computing. It has been shown that, when augmented with multidimensional arrays, Java can achieve very high-performance for numerical computing through the use of compiler techniques and efficient implementations of aggregate array operations. Three approaches have been discussed in the literature for extending Java with support for multidimensional arrays: class libraries that implement these structures; extending the Java language with new syntactic constructs for multidimensional arrays that are directly translated to bytecode; and relying on the Java Virtual Machine to recognize those arrays of arrays that are being used to simulate multidimensional arrays. This paper presents a balanced presentation of the pros and cons of each technique in the areas of functionality, language and virtual machine impact, implementation effort, and effect on performance. We show that the best choice depends on the relative importance attached to the different metrics, and thereby provide a common ground for a rational discussion and comparison of the techniques. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | CartaBlanca - a pure-Java, component-based systems simulation tool for coupled nonlinear physics on unstructured grids - an update | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.662 | ||||||||
Article ID | CPE662 | ||||||||
Author Name(s) | W. B. VanderHeyden1E. D. Dendy2N. T. Padial-Collins3 | ||||||||
Author Email(s) | wbv@lanl.gov1 | ||||||||
Affiliation(s) | Los Alamos National Laboratory, Theoretical Division and Los Alamos Computer Science Institute, Los Alamos, NM 87545, U.S.A. 1 2 3 | ||||||||
Keyword(s) | Java, high performance computing, object oriented design, finite volume, nonlinear solver, development environment, | ||||||||
Abstract |
|||||||||
This paper describes a component-based nonlinear physical system simulation prototyping package written entirely in Java using object-oriented design. The package provides scientists and engineers with a 'developer-friendly' software environment for large-scale computational algorithm and physical model development. The software design centers on the Jacobian-free Newton-Krylov solution method surrounding a finite-volume treatment of conservation equations. This enables a clean component-like implementation. We first provide motivation for the development of the software and then discuss software structure. The discussion includes a description of the use of Java's built-in thread facility that enables parallel, shared-memory computations on a wide variety of unstructured grids with triangular, quadrilateral, tetrahedral and hexahedral elements. We also discuss the use of Java's inheritance mechanism in the construction of a hierarchy of physics systems objects and linear and nonlinear solver objects that simplify development and foster software re-use. We provide a brief review of the Jacobian-free Newton-Krylov nonlinear system solution method and discuss how it fits into our design. Following this, we show results from example calculations and then discuss plans including the extension of the software to distributed-memory computer systems. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Providing soft real-time quality of service guarantees for Java threads | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.663 | ||||||||
Article ID | CPE663 | ||||||||
Author Name(s) | James C. Pang1Gholamali C. Shoja2Eric G. Manning3 | ||||||||
Author Email(s) | jcpang@redback.com1 | ||||||||
Affiliation(s) | Department of Computer Science, University of Victoria, Victoria, BC, Canada V8W 3P6 1 2 Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada V8W 3P6 3 | ||||||||
Keyword(s) | Java Virtual Machine, CPU resource management, QoS, real-time scheduling, move-to-rear list scheduling, | ||||||||
Abstract |
|||||||||
The Java platform has many characteristics that make it very desirable for integrated continuous media processing. Unfortunately, it lacks the necessary CPU resource management facilities to support quality of service (QoS) guarantees for soft real-time multimedia tasks. In this paper, we present our new Java Virtual Machine, Q-JVM, which brings CPU resource management to the Java platform. Q-JVM is based on Sun's JVM version 1.1.5. It implements an enhanced version of the MTR-LS algorithm in its thread scheduler. Combined with admission control that could be implemented in an application-level resource manager, it is able to support QoS parameters such as fairness, bandwidth partitioning and delay bound guarantees, as well as the cumulative service guarantee. Our test results show that Q-JVM is backward compatible with the standard JVM from Sun, has low scheduling overhead, and is able to provide QoS guarantees as specified. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | CCJ: object-based message passing and collective communication in Java | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.664 | ||||||||
Article ID | CPE664 | ||||||||
Author Name(s) | Jason Maassen1Arnold Nelisse2Thilo Kielmann3Henri E. Bal4 | ||||||||
Author Email(s) | jason@cs.vu.nl1 | ||||||||
Affiliation(s) | Division of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands 1 2 3 4 | ||||||||
Keyword(s) | parallel programming, collective communication, | ||||||||
Abstract |
|||||||||
CCJ is a communication library that adds MPI-like message passing and collective operations to Java. Rather than trying to adhere to the precise MPI syntax, CCJ aims at a clean integration of communication into Java's object-oriented framework. For example, CCJ uses thread groups to support Java's multithreading model and it allows any data structure (not just arrays) to be communicated. CCJ is implemented entirely in Java, on top of RMI, so it can be used with any Java virtual machine. The paper discusses three parallel Java applications that use collective communication. It compares the performance (on top of a Myrinet cluster) of CCJ, RMI and mpiJava versions of these applications and also compares their code complexity. A detailed performance comparison between CCJ and mpiJava is given using the Java Grande Forum MPJ benchmark suite. The results show that neither CCJ's object-oriented design nor its implementation on top of RMI impose a performance penalty on applications compared to their mpiJava counterparts. The source of CCJ is available from our Web site http://www.cs.vu.nl/manta. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Supporting dynamic parallel object arrays | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.665 | ||||||||
Article ID | CPE665 | ||||||||
Author Name(s) | Orion S. Lawlor1Laxmikant V. Kalé2 | ||||||||
Author Email(s) | olawlor@acm.org1 | ||||||||
Affiliation(s) | Computer Science Department, University of Illinois at Urbana-Champaign, 1304 W. Springfield, 3315 DCL, Urbana, IL 61801-2987, U.S.A. 1 2 | ||||||||
Keyword(s) | parallel runtime, object migration, parallel hashtable, | ||||||||
Abstract |
|||||||||
We present efficient support for generalized arrays of parallel data driven objects. Array elements are regular C++ objects, and are scattered across the parallel machine. An individual element is addressed by its 'index', which can be an arbitrary object rather than a simple integer. For example, an array index can be a series of numbers, supporting multidimensional sparse arrays; a bit vector, supporting collections of quadtree nodes; or a string. Methods can be invoked on any individual array element from any processor, and the elements can participate in reductions and broadcasts. Individual elements can be created or deleted dynamically at any time. Most importantly, the elements can migrate from processor to processor at any time. We discuss support for message delivery and collective operations in the face of such dynamic behavior. The migration capabilities of array elements have proven extremely useful, for example, in implementing flexible load balancing strategies and for exploiting workstation clusters adaptively. We present the design, an implementation, and performance results. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Sapphire: copying garbage collection without stopping the world | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.712 | ||||||||
Article ID | CPE712 | ||||||||
Author Name(s) | Eliot B. Moss1Richard L. Hudson2 | ||||||||
Author Email(s) | moss@cs.umass.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610, U.S.A. 1Intel Corporation, 2200 Mission College Boulevard, Santa Clara, CA 95052-8119, U.S.A. 2 | ||||||||
Keyword(s) | garbage collection, copying garbage collection, concurrent garbage collection, replicating garbage collection, Java garbage collection, | ||||||||
Abstract |
|||||||||
The growing use in concurrent systems of languages that require garbage collection (GC), such as Java, is raising practical interest in concurrent GC. Sapphire is a new algorithm for concurrent copying GC for Java. It stresses minimizing the amount of time any given application thread may need to block to support the collector. In particular, Sapphire is intended to work well in the presence of a large number of application threads, on small- to medium-scale shared memory multiprocessors. Sapphire extends previous concurrent copying algorithms, and is most closely related to replicating copying collection, a GC technique in which application threads observe and update primarily the old copies of objects. The key innovations of Sapphire are: (1) the ability to 'flip' one thread at a time (changing the thread's view from the old copies of objects to the new copies), as opposed to needing to stop all threads and flip them at the same time; (2) exploiting Java semantics and assuming any data races occur on volatile fields, to avoid a barrier on reads of non-volatile fields; and (3) working in concert with virtually any underlying (non-concurrent) copying collection algorithm. Copyright © 2003 John Wiley & Sons, Ltd |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Editorials | ||||||||
Article Title | Editorial: ACM 2001 Java Grande-ISCOPE (JGI2001) Conference | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | March 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.735 | ||||||||
Article ID | CPE735 | ||||||||
Author Name(s) | Geoffrey C. Fox1 | ||||||||
Author Email(s) | gcf@indiana.edu1 | ||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Spar: a set of extensions to Java for scientific computation | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 3-5 | ||||||||
Date | 03 00 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.659 | ||||||||
Article ID | CPE659 | ||||||||
Author Name(s) | C. van Reeuwijk1F. Kuijlman2H. J. Sips3 | ||||||||
Author Email(s) | C.vanReeuwijk@its.tudelft.nl1 | ||||||||
Affiliation(s) | Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands 1 2 3 | ||||||||
Keyword(s) | Java, multi-dimensional array, tuple, parameterized type, | ||||||||
Abstract |
|||||||||
inline | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Multi-wavelength image space: another Grid-enabled science | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 6 | ||||||||
Date | May 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.715 | ||||||||
Article ID | CPE715 | ||||||||
Author Name(s) | Roy Williams1Bruce Berriman2Ewa Deelman3John Good4Joseph Jacob5Carl Kesselman6Carol Lonsdale7Seb Oliver8Thomas A. Prince9 | ||||||||
Author Email(s) | roy@cacr.caltech.edu1 | ||||||||
Affiliation(s) | Center for Advanced Computing Research, California Institute of Technology 158-79, Pasadena, CA 91125, U.S.A. 12Infrared Processing and Analysis Center, California Institute of Technology 100-22, Pasadena, CA 91125, U.S.A. 2Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Key, CA 90292, U.S.A. 3 4 Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109, U.S.A 53Information Sciences Institute, University of Southern California, 4676 Admiralty Way, Marina del Key, CA 90292, U.S.A. 6 7 Astronomy Centre, University of Sussex, Falmer, Brighton BN1 9R, U.K. 8 9 | ||||||||
Keyword(s) | astronomy, mosaicking, computational grid, montage, | ||||||||
Abstract |
|||||||||
We describe how the Grid enables new research possibilities in astronomy through multi-wavelength images. To see sky images in the same pixel space, they must be projected to that space, a computer-intensive process. There is thus a virtual data space induced that is defined by an image and the applied projection. This virtual data can be created and replicated with Planners and Replica catalog technology developed under the GriPhyN project. We plan to deploy our system (MONTAGE) on the U.S. Teragrid. Grid computing is also needed for ingesting data - computing background correction on each image - which forms a separate virtual data space. Multi-wavelength images can be used for pushing source detection and statistics by an order of magnitude from current techniques; for optimization of multi-wavelength image registration for detection and characterization of extended sources; and for detection of new classes of essentially multi-wavelength astronomical phenomena. The paper discusses both the Grid architecture and the scientific goals. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | CORBA request portable interceptors: analysis and applications | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 6 | ||||||||
Date | May 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.716 | ||||||||
Article ID | CPE716 | ||||||||
Author Name(s) | C. Marchetti1R. Baldoni2L. Verde3 | ||||||||
Author Email(s) | marchet@dis.uniroma1.it1 | ||||||||
Affiliation(s) | Dipartimento di Informatica e Sistemistica, Universitą di Roma La Sapienza, Via Salaria 113, 00198, Roma, Italy 1 2 3 | ||||||||
Keyword(s) | interceptors, CORBA, fault-tolerant CORBA, middleware, performance analysis, distributed object computing, | ||||||||
Abstract |
|||||||||
Interceptors are an emerging middleware technology enabling the addition of specific network-oriented capabilities to distributed applications. By exploiting interceptors, developers can register code within interception points, extending the basic middleware mechanisms with specific functionality, e.g. authentication, flow control, caching, etc. Notably, these extensions can be achieved without modifying either the application or the middleware code. In this paper we report the results of our experiences with CORBA request portable interceptors. In particular, we point out (i) the basic mechanisms implementable by these interceptors, i.e. request redirection and piggybacking and (ii) we analyze their limitations. We then propose a proxy-based technique to overcome the interceptors' limitations. Successively, we present a performance analysis carried out on three Java-CORBA platforms currently implementing the portable interceptors specification. Finally, we conclude our work with a case study in which portable interceptors are used to implement the fault-tolerant CORBA client invocation semantic without impacting on the client application code and on the CORBA ORB. We also release fragments of Java code for implementing the described techniques. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | An adaptive parallel genetic algorithm system for i-Computing environment | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 6 | ||||||||
Date | May 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.717 | ||||||||
Article ID | CPE717 | ||||||||
Author Name(s) | Yuk-Yin Wong1Kin-Hong Lee2Kwong-Sak Leung3 | ||||||||
Author Email(s) | clevin.wong@cityu.edu.hk1 | ||||||||
Affiliation(s) | Computing Services Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong 1Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, N.T., Hong Kong 2 3 | ||||||||
Keyword(s) | Internet computing, distributed computing, parallel genetic algorithm, adaptive genetic algorithm, | ||||||||
Abstract |
|||||||||
Many real-world optimization problems in the scientific and engineering fields can be solved by genetic algorithms (GAs) but it still requires a long execution time for complex problems. At the same time, there are many under-utilized workstations on the Internet. In this paper, we present a self-adaptive parallel GA system named APGAIN, which utilizes the spare power of the heterogeneous workstations on the Internet to solve complex optimization problems. In order to maintain a balance between exploitation and exploration, we have devised a novel probabilistic rule-driven adaptive model (PRDAM) to adapt the GA parameters automatically. APGAIN is implemented on an Internet Computing system called DJM. In the implementation, we discover that DJM's original load balancing strategy is insufficient. Hence the strategy is extended with the job migration capability. The performance of the system is evaluated by solving the traveling salesman problem with data from a public database. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A scalable HPF implementation of a finite-volume computational electromagnetics application on a CRAY T3E parallel system | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 6 | ||||||||
Date | May 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.718 | ||||||||
Article ID | CPE718 | ||||||||
Author Name(s) | Yi Pan1Joseph J. S. Shang2Minyi Guo3 | ||||||||
Author Email(s) | pan@cs.gsu.edu1 | ||||||||
Affiliation(s) | 1Department of Computer Science, Georgia State University, Atlanta, GA 30303, U.S.A 1Air Vehicle Directorate, Air Force Research Laboratory, WPAFB, OH 45433-7521, U.S.A 2Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima, 965-8580 Japan 3 | ||||||||
Keyword(s) | parallel computers, execution time, efficiency, scalability, loop parallelization, Cray T3E, high performance Fortran, | ||||||||
Abstract |
|||||||||
The time-dependent Maxwell equations are one of the most important approaches to describing dynamic or wide-band frequency electromagnetic phenomena. A sequential finite-volume, characteristic-based procedure for solving the time-dependent, three-dimensional Maxwell equations has been successfully implemented in Fortran before. Due to its need for a large memory space and high demand on CPU time, it is impossible to test the code for a large array. Hence, it is essential to implement the code on a parallel computing system. In this paper, we discuss an efficient and scalable parallelization of the sequential Fortran time-dependent Maxwell equations solver using High Performance Fortran (HPF). The background to the project, the theory behind the efficiency being achieved, the parallelization methodologies employed and the experimental results obtained on the Cray T3E massively parallel computing system will be described in detail. Experimental runs show that the execution time is reduced drastically through parallel computing. The code is scalable up to 98 processors on the Cray T3E and has a performance similar to that of an MPI implementation. Based on the experimentation carried out in this research, we believe that a high-level parallel programming language such as HPF is a fast, viable and economical approach to parallelizing many existing sequential codes which exhibit a lot of parallelism. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Analysis of a prototype intelligent network interface | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | June 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.720 | ||||||||
Article ID | CPE720 | ||||||||
Author Name(s) | Keith D. Underwood1Walter B. Ligon III2Ron R. Sass3 | ||||||||
Author Email(s) | keithu@parl.clemson.edu1 | ||||||||
Affiliation(s) | Parallel Architecture Research Laboratory, Holcombe Department of Electrical and Computer Engineering, Clemson University, 105 Riggs Hall, Clemson, SC 29634-0915, U.S.A. 1 2 3 | ||||||||
Keyword(s) | intelligent network interface, cluster computing, Beowulf, programmable NIC, | ||||||||
Abstract |
|||||||||
With a focus on commodity PC systems, Beowulf clusters traditionally lack the cutting edge network architectures, memory subsystems, and processor technologies found in their more expensive supercomputer counterparts. Many users find that what Beowulf clusters lack in technology, they more than make up for with their significant cost advantage. In this paper, an architectural extension that adds reconfigurable computing to the network interface of Beowulf clusters is proposed. The proposed extension, called an intelligent network interface card (or INIC), enhances both the network and processor capabilities of the cluster, which has a significant impact on the performance of a crucial class of applications. Furthermore, for some applications, the proposed extension partially compensates for weaknesses in the PC memory subsystem. A prototype of the proposed architecture was constructed and analyzed. In addition, two applications, the 2D Fast Fourier Transform (2D-FFT) and integer sorting, which benefit from the resulting architecture, are discussed and analyzed on the proposed architecture. Specifically, results indicate that the 2D-FFT is performed 15-50% faster when using the prototype INIC rather than the comparable Gigabit Ethernet. Early simulation results also indicate that integer sorting will receive a 21-32% performance boost from the prototype. The significant improvements seen with a relatively limited prototype lead to the conclusion that cluster network interfaces enhanced with reconfigurable computing could significantly improve the Beowulf architecture. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Design and implementation of a user-level Sockets layer over Virtual Interface Architecture | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | June 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.721 | ||||||||
Article ID | CPE721 | ||||||||
Author Name(s) | Jin-Soo Kim1Kangho Kim2Sung-In Jung3Soonhoi Ha4 | ||||||||
Author Email(s) | jinsoo@cs.kaist.ac.kr1 | ||||||||
Affiliation(s) | Division of Computer Science, Korea Advanced Institute of Science and Technology, Daejeon 305-701, South Korea 1Computer Systems Research Department, Electronics and Telecommunications Research Institute, Daejeon 305-350, South Korea 2 3 School of Computer Science and Engineering, Seoul National University, Seoul 151-742, South Korea 4 | ||||||||
Keyword(s) | Virtual Interface Architecture, Sockets, user-level communication architecture, cluster computing, | ||||||||
Abstract |
|||||||||
The Virtual Interface Architecture (VIA) is an industry standard user-level communication architecture for system area networks. The VIA provides a protected, directly-accessible interface to a network hardware, removing the operating system from the critical communication path. In this paper, we design and implement a user-level Sockets layer over VIA, named SOVIA (Sockets Over VIA). Our objective is to use the SOVIA layer to accelerate the existing Sockets-based applications with a reasonable effort and to provide a portable and high-performance communication library based on VIA to application developers. SOVIA realizes comparable performance to native VIA, showing a minimum one-way latency of 10.5 |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | NPACI Rocks: tools and techniques for easily deploying manageable Linux clusters | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | June 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.722 | ||||||||
Article ID | CPE722 | ||||||||
Author Name(s) | Greg Bruno1Philip M. Papadopoulos2Mason J. Katz3 | ||||||||
Author Email(s) | bruno@sdsc.edu1 | ||||||||
Affiliation(s) | San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093-0505, U.S.A. 1 2 3 | ||||||||
Keyword(s) | clusters, Beowulf, cluster management, scalable installation, kickstart, | ||||||||
Abstract |
|||||||||
High-performance computing clusters (commodity hardware with low-latency, high-bandwidth interconnects) based on Linux are rapidly becoming the dominant computing platform for a wide range of scientific disciplines. Yet, straightforward software installation, maintenance, and health monitoring for large-scale clusters has been a consistent and nagging problem for non-cluster experts. The NPACI Rocks distribution takes a fresh perspective on management and installation of clusters to dramatically simplify software version tracking and cluster integration. NPACI Rocks incorporates the latest Red Hat distribution (including security patches) with additional cluster-specific software. Using the identical software tools used to create the base distribution, users can customize and localize Rocks for their site. Strong adherence to widely-used (de facto) tools allows Rocks to move with the rapid pace of Linux development. Version 2.2.1 of the toolkit is available for download and installation. Over 100 Rocks clusters have been built by non-cluster experts at multiple institutions (residing in various countries) providing a peak aggregate of 2 TFLOPS of clustered computing. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | p-Jigsaw: a cluster-based Web server with cooperative caching support | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | Jan 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.723 | ||||||||
Article ID | CPE723 | ||||||||
Author Name(s) | Ge Chen1Cho-Li Wang2Francis C. M. Lau3 | ||||||||
Author Email(s) | gechen@csis.hku.hk1 | ||||||||
Affiliation(s) | Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong 1 2 3 | ||||||||
Keyword(s) | Web object cache, cooperative caching, Global Object Space, hot object, replacement algorithms, cluster, the World Wide Web, | ||||||||
Abstract |
|||||||||
Clustering provides a viable approach to building scalable Web systems with increased computing power and abundant storage space for data and contents. In this paper, we present a pure-Java-based parallel Web server system,p-Jigsaw, which operates on a cluster and uses the technique of cooperative caching to achieve high performance. We introduce the design of an in-memory cache layer, called Global Object Space (GOS), for dynamic caching of frequently requested Web objects. The GOS provides a unified view of cluster-wide memory resources for achieving location-transparent Web object accesses. The GOS relies on cooperative caching to minimize disk accesses. A requested Web object can be fetched from a server node's local cache or a peer node's local cache, with the disk serving only as the last resort. A prototype system based on the W3C Jigsaw server has been implemented on a 16-node PC cluster. Three cluster-aware cache replacement algorithms were tested and evaluated. The benchmark results show good speedups with a real-life access log, proving that cooperative caching can have significant positive impacts on the performance of cluster-based parallel Web servers. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Clusterfile: a flexible physical layout parallel file system | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | June 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.724 | ||||||||
Article ID | CPE724 | ||||||||
Author Name(s) | Florin Isaila1Walter F. Tichy2 | ||||||||
Author Email(s) | florin@ira.uka.de1 | ||||||||
Affiliation(s) | Fakultät für Informatik, Universität Karlsruhe Postfach 6980, 76128 Karlsruhe, Germany 1 2 | ||||||||
Keyword(s) | parallel file system, parallel I/O, physical and logical partitioning, non-contiguous I/O, data redistribution, mapping functions, | ||||||||
Abstract |
|||||||||
This paper presents Clusterfile, a parallel file system that provides parallel file access on a cluster of computers. We introduce a file partitioning model that has been used in the design of Clusterfile. The model uses a data representation that is optimized for multidimensional array partitioning while allowing arbitrary partitions. The paper shows how the file model can be employed for file partitioning into both physical subfiles and logical views. We also present how the conversion between two partitions of the same file is implemented using a general memory redistribution algorithm. We show how we use the algorithm to optimize non-contiguous read and write operations. The experimental results include performance comparisons with the Parallel Virtual File System (PVFS) and an MPI-IO implementation for PVFS. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Using multirail networks in high-performance clusters | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | June 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.725 | ||||||||
Article ID | CPE725 | ||||||||
Author Name(s) | Fabrizio Petrini1Salvador Coll2Eitan Frachtenberg3Adolfy Hoisie4Leonid Gurvits5 | ||||||||
Author Email(s) | fabrizio@lanl.gov1 | ||||||||
Affiliation(s) | CCS-3 Modeling, Algorithms and Informatics Group, Computer and Computational Sciences Division, Los Alamos National Laboratory, Los Alamos, CA, U.S.A. 1 2 3 4 5 | ||||||||
Keyword(s) | communication protocols, high-performance interconnection networks, performance evaluation, routing, communication libraries, parallel architectures, | ||||||||
Abstract |
|||||||||
Using multiple independent networks (also known as rails) is an emerging technique which is being used to overcome bandwidth limitations and enhance fault tolerance of current high-performance parallel computers. In this paper, we present and analyze various algorithms to allocate multiple communication rails, including static and dynamic allocation schemes. An analytical lower bound on the number of rails required for static rail allocation is shown. We also present an extensive experimental comparison of the behavior of various algorithms in terms of bandwidth and latency. We show that striping messages over multiple rails can substantially reduce network latency, depending on average message size, network load and allocation scheme. The methods compared include a static rail allocation, a basic round-robin rail allocation, a local-dynamic allocation based on local knowledge and a dynamic rail allocation that reserves both communication endpoints of a message before sending it. The last method is shown to perform better than the others at higher loads: up to 49% better than local-knowledge allocation and 37% better than the round-robin allocation. This allocation scheme also shows lower latency and it saturates at higher loads (for long enough messages). Most importantly, this proposed allocation scheme scales well with the number of rails and message size. In addition we propose a hybrid algorithm that combines the benefits of the local-dynamic allocation for short messages with those of the dynamic algorithm for large messages. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Editorial | ||||||||
Article Title | Editorial | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 7-8 | ||||||||
Date | Jan 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.750 | ||||||||
Article ID | CPE750 | ||||||||
Author Name(s) | Mark Baker1 | ||||||||
Author Email(s) | |||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Comparing the performance of MPICH with Cray's MPI and with SGI's MPI | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 9 | ||||||||
Date | Aug 10 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.719 | ||||||||
Article ID | CPE719 | ||||||||
Author Name(s) | Glenn R. Luecke1Marina Kraeva2Lili Ju3 | ||||||||
Author Email(s) | grl@iastate.edu1 | ||||||||
Affiliation(s) | High Performance Computing Group, Iowa State University, Ames, Iowa 50011-2251, U.S.A. 1 2 3 | ||||||||
Keyword(s) | MPI, MPICH, performance, Cray T3E-900, SGI Origin 3000, | ||||||||
Abstract |
|||||||||
The purpose of this paper is to compare the performance of MPICH with the vendor Message Passing Interface (MPI) on a Cray T3E-900 and an SGI Origin 3000. Seven basic communication tests which include basic point-to-point and collective MPI communication routines were chosen to represent commonly-used communication patterns. Cray's MPI performed better (and sometimes significantly better) than Mississippi State University's (MSU's) MPICH for small and medium messages. They both performed about the same for large messages, however for three tests MSU's MPICH was about 20% faster than Cray's MPI. SGI's MPI performed and scaled better (and sometimes significantly better) than MPICH for all messages, except for the scatter test where MPICH outperformed SGI's MPI for 1 kbyte messages. The poor scalability of MPICH on the Origin 3000 suggests there may be scalability problems with MPICH. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | The LINPACK Benchmark: past, present and future | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 9 | ||||||||
Date | Aug 10 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.728 | ||||||||
Article ID | CPE728 | ||||||||
Author Name(s) | Jack J. Dongarra1Piotr Luszczek2Antoine 3 | ||||||||
Author Email(s) | dongarra@cs.utk.edu1 Petitet2 | ||||||||
Affiliation(s) | 1University of Tennessee, Department of Computer Science, Knoxville, TN 37996-3450, U.S.A. 1 2 Sun Microsystems, Inc., Paris, France 3 | ||||||||
Keyword(s) | benchmarking, BLAS, high-performance computing, HPL, linear algebra, LINPACK, TOP500, | ||||||||
Abstract |
|||||||||
This paper describes the LINPACK Benchmark and some of its variations commonly used to assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the TOP500 and the HPL codes are presented. The latter is frequently used to obtained results for TOP500 submissions. Information is also given on how to interpret the results of the benchmark and how the results fit into the performance evaluation process. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Enhancing OODB semantics to support browsing in an OODB vocabulary representation | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 9 | ||||||||
Date | Aug 10 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.732 | ||||||||
Article ID | CPE732 | ||||||||
Author Name(s) | James Geller1Li-min Liu2Yehoshua Perl3 | ||||||||
Author Email(s) | geller@la.njit.edu1 | ||||||||
Affiliation(s) | Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A. 11Department of Applied Mathematics, Chung Yuan Christian University, 22, Pu-Zen, Pu-Chung Li, Chung-Li, Taiwan, Republic of China 2 3 | ||||||||
Keyword(s) | object-oriented databases, object-oriented models, object-oriented systems, knowledge representation, database models, vocabulary systems, object-oriented hierarchical relationships, object-oriented semantic relationships, | ||||||||
Abstract |
|||||||||
In previous work, we have modeled a vocabulary given as a semantic network by an object-oriented database (OODB). The OODB schema thus obtained provides a compact abstract view of the vocabulary. This enables the fast traversal of the vocabulary by a user. In the semantic network vocabulary, the IS-A relationships express the specialization hierarchy. In our OODB modeling of the vocabulary, the SUBCLASS relationship expresses the specialization hierarchy of the classes and supports the inheritance of their properties. A typical IS-A path in the vocabulary has a corresponding shorter SUBCLASS path in the OODB schema. In this paper we expose several cases where the SUBCLASS hierarchy fails to fully correspond to the IS-A hierarchy of the vocabulary. In these cases there exist traversal paths in the semantic network for which there are no corresponding traversal paths in the OODB schema. The reason for this failure is the existence of some IS-A relationships between concepts of two classes, which are not connected by a SUBCLASS relationship. This phenomenon weakens the accuracy of our modeling. To rectify the situation we introduce a new OODB semantic relationship IS-A to represent the existence of IS-A relationships between concepts of a pair of classes which are not connected via a SUBCLASS relationship. The resulting schema contains both SUBCLASS relationships and IS-A relationships which completely model the IS-A hierarchy of the vocabulary. We define a mixed-class level traversal path to contain either SUBCLASS or IS-A relationships. Consequently, each traversal path in the semantic network has a corresponding mixed traversal path in the OODB schema. Hence the introduction of the semantic OODB IS-A relationship improves the modeling of semantic network vocabularies by OODBs. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Implementation of the EARTH programming model on SMP clusters: a multi-threaded language and runtime system | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.729 | ||||||||
Article ID | CPE729 | ||||||||
Author Name(s) | G. Tremblay1C. J. Morrone2J. N. Amaral3G. R. Gao4 | ||||||||
Author Email(s) | amaral@cs.ualberta.ca3 | ||||||||
Affiliation(s) | Département d`informatique, Université du Québecà Montréal, Montréal, QC, Canada 1Computer Architecture and Parallel Systems Laboratory (CAPSL), Department of Electrical and Computer Engineering, University of Delaware, Newark, DE, U.S.A. 2Department of Computing Science, University of Alberta, Edmonton, AB, Canada 3 4 | ||||||||
Keyword(s) | multi-threading, cluster computing, parallel programming language, | ||||||||
Abstract |
|||||||||
This paper describes the design and implementation of an Efficient Architecture for Running THreads (EARTH) runtime system for a multi-processor/multi-node cluster. The (EARTH) model was designed to support the efficient execution of parallel (multi-threaded) programs with irregular fine-grain parallelism using off-the-shelf computers. Implementing an EARTH runtime system requires an explicitly threaded runtime system. For portability, we built this runtime system on top of Pthreads under Linux and used sockets for inter-node communication. Moreover, in order to make the best use of the resources available on a cluster of symmetric multi-processors (SMP), this implementation enables the overlapping of communication and computation. We used Threaded-C, a language designed to implement the programming model supported by the EARTH architecture. This language allows the expression of various levels of parallelism and provides the primitives needed to manage the required communication and synchronization. The Threaded-C programming language supports irregular fine-grain parallelism through a two-level hierarchy of threads and fibers. It also provides various synchronization and communication constructs that reflect the nature of EARTH`s fibers-non-preemptive execution with data-driven scheduling-as well as the extensive use of split-phase transactions on EARTH to execute long-latency operations. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Editorial | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.736 | ||||||||
Article ID | CPE736 | ||||||||
Author Name(s) | Rizos Sakellariou1 | ||||||||
Author Email(s) | .1 | ||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Scalable causal message logging for wide-area environments | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.737 | ||||||||
Article ID | CPE737 | ||||||||
Author Name(s) | Keith Marzullo1Karan Bhatia2Lorenzo Alvisi3 | ||||||||
Author Email(s) | marzullo@cs.ucsd.edu1 | ||||||||
Affiliation(s) | Department of Computer Science and Engineering, University of California, La Jolla, CA 92093-0114, U.S.A. 1NPACI, UCSD, MC 0505, 9500 Gilman Drive, La Jolla, CA 92093-0505, U.S.A. 2Department of Computer Sciences, The University of Texas at Austin, 1 University Station C0500, Austin, TX 78712-0233, U.S.A. 3 | ||||||||
Keyword(s) | grid computing, wide-area computing, fault tolerance, causal message logging, | ||||||||
Abstract |
|||||||||
Wide-area systems are gaining in popularity as an infrastructure for running scientific applications. From a fault tolerance perspective, these environments are challenging because of their scale and their variability. Causal message logging protocols have attractive properties that make them suitable for these environments. They spread fault tolerance information around in the system providing high availability. This information can also be used to replicate objects that are otherwise inaccessible because of network partitions. However, current causal message logging protocols do not scale to thousands or millions of processes. We describe the Hierarchical Causal Message Logging Protocol (HCML) that uses a hierarchy of shared logging sites, or proxies, to significantly reduce the space requirements as compared with existing protocols. These proxies also act as caches for fault tolerance information and reduce the overall message overhead by as much as 50%. HCML also leverages differences in bandwidth between processes that reduces overall message latency by as much as 97%. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Achieving portable and efficient parallel CORBA objects | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.738 | ||||||||
Article ID | CPE738 | ||||||||
Author Name(s) | Christian Pérez1Alexandre Denis2Thierry Priol3 | ||||||||
Author Email(s) | Christian.Perez@irisa.fr1 | ||||||||
Affiliation(s) | IRISA/IFSIC, Campus de Beaulieu, 35042 Rennes Cedex, France 1IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France 2 3 | ||||||||
Keyword(s) | CORBA, Grid computing, MPI, code coupling, high-performance network, | ||||||||
Abstract |
|||||||||
With the availability of Computational Grids, new kinds of applications are emerging. They raise the problem of how to program them on such computing systems. In this paper, we advocate a programming model based on a combination of parallel and distributed programming models. Compared to previous approaches, this work aims at bringing single program multiple data (SPMD) programming into CORBA in a portable way. For example, we want to interconnect two parallel codes by CORBA without modifying either CORBA or the parallel communication API. We show that such an approach does not entail any loss of performance compared to previous approaches that required modification to the CORBA standard. Moreover, using an ORB that is able to exploit high-performance networks, we show that portable parallel CORBA objects can efficiently make use of such networks. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | HA-PSLS: a highly available parallel single-level store system | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.739 | ||||||||
Article ID | CPE739 | ||||||||
Author Name(s) | Christine Morin1Anne-Marie Kermarrec2 | ||||||||
Author Email(s) | cmorin@irisa.fr1 | ||||||||
Affiliation(s) | IRISA/INRIA, Campus universitaire de Beaulieu, 35042 Rennes Cedex, France 1Microsoft Research, 7 J J Thomson Avenue, Cambridge CB3 0FB, U.K 2 | ||||||||
Keyword(s) | parallel single-level store, high-availability, fault tolerance, checkpointing, replication, integration, parallel file systems, shared virtual memory, | ||||||||
Abstract |
|||||||||
Parallel single-level store (PSLS) systems integrate a shared virtual memory and a parallel file system. They provide programmers with a global address space including both memory and file data. PSLS systems implemented in a cluster thus represent a natural support for long-running parallel applications, combining both the natural shared memory programming model and a large and efficient file system. However, the need to tolerate failures in such a system increases with the size of applications. In this paper we present a highly-available parallel single level store system (HA-PSLS), which smoothly integrates a backward error recovery high-availability mechanism into a PSLS system. Our system is able to tolerate multiple transient failures, a single permanent failure, and power cut failures affecting the whole cluster, without requiring any specialized hardware. For this purpose, HA-PSLS relies on a high degree of integration (and reusability) of high-availability and standard features. A prototype integrating our high-availability support has been implemented and we show some performance results in the paper. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel application of a novel domain decomposition preconditioner for the adaptive finite-element solution of three-dimensional convection-dominated PDEs | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.740 | ||||||||
Article ID | CPE740 | ||||||||
Author Name(s) | P. K. Jimack1S. A. Nadeem2 | ||||||||
Author Email(s) | pkj@comp.leeds.ac.uk1 | ||||||||
Affiliation(s) | Computational PDEs Unit, School of Computing, University of Leeds, Leeds LS2 9JT, U.K. 1 2 | ||||||||
Keyword(s) | domain decomposition, additive Schwarz, weakly overlapping, convection-diffusion, | ||||||||
Abstract |
|||||||||
We describe and analyse the parallel implementation of a novel domain decomposition preconditioner for the fast iterative solution of linear systems of algebraic equations arising from the discretization of elliptic partial differential equations (PDEs) in three dimensions. In previous theoretical work, this preconditioner has been proved to be optimal for symmetric positive-definite (SPD) linear systems. In this paper, we provide details of our three-dimensional parallel implementation and demonstrate that the technique may be generalized to the solution of non-symmetric algebraic systems, such as those arising when convection-diffusion problems are discretized using either Galerkin or stabilized finite-element methods (FEMs). Furthermore, we illustrate the potential of the preconditioner for use within an adaptive finite-element framework by successfully solving convection-dominated problems on locally, rather than globally, refined meshes. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A distributed object infrastructure for interaction and steering | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.741 | ||||||||
Article ID | CPE741 | ||||||||
Author Name(s) | Manish Parashar1Rajeev Muralidhar2 | ||||||||
Author Email(s) | parashar@caip.rutgers.edu1 | ||||||||
Affiliation(s) | The Applied Software Systems Laboratory, Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854, U.S.A. 1 2 | ||||||||
Keyword(s) | distributed and dynamic objects, computational interaction and steering, sensors and actuators, computational collaboratory, scientific applications, parallel and distributed computing, | ||||||||
Abstract |
|||||||||
This paper presents the design, implementation and experimental evaluation of DIOS (Distributed Interactive Object Substrate), an interactive object infrastructure to enable the runtime monitoring, interaction and computational steering of parallel and distributed applications. DIOS enables application objects (data structures, algorithms) to be enhanced with sensors and actuators so that they can be interrogated and controlled. Application objects may be distributed (spanning many processors) and dynamic (be created, deleted, changed or migrated). Furthermore, DIOS provides a control network that interconnects the interactive objects in a parallel/distributed application and enables external discovery, interrogation, monitoring and manipulation of these objects at runtime. DIOS is currently being used to enable interactive visualization, monitoring and steering of a wide range of scientific applications, including oil reservoir, compressible turbulence and numerical relativity simulations. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Load redundancy elimination on executable code | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 10 | ||||||||
Date | Aug 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.742 | ||||||||
Article ID | CPE742 | ||||||||
Author Name(s) | Manel Fernįndez1Roger Espasa2Saumya Debray3 | ||||||||
Author Email(s) | mfernand@ac.upc.es1 | ||||||||
Affiliation(s) | Computer Architecture Department, Universitat Politčcnica de Catalunya, Barcelona, Spain 1 2 Department of Computer Science, University of Arizona, Tucson, AZ 08034, U.S.A. 3 | ||||||||
Keyword(s) | profile-guided optimizations, redundancy elimination, link-time optimizations, | ||||||||
Abstract |
|||||||||
Optimizations performed at link time or directly applied to final program executables have received increased attention in recent years. This paper discusses the discovery and elimination of redundant load operations in the context of a link-time optimizer, an optimization that we call Load Redundancy Elimination (LRE). Our experiments show that between 50% and 75% of a program's memory references can be considered redundant because they are accessing memory locations that have been referenced less than 200-400 instructions away. We then present three profile-based LRE algorithms targeted at optimizing away these redundancies. Our results show that between 5% and 30% of the redundancy detected can indeed be eliminated, which translates into program speedups of around 8%. We also test our algorithm assuming different cache latencies, and show that, if latencies continue to grow, the load redundancy elimination will become more important. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | SCALEA: a performance analysis tool for parallel programs | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.778 | ||||||||
Article ID | CPE778 | ||||||||
Author Name(s) | Hong-Linh Truong1Thomas Fahringer2 | ||||||||
Author Email(s) | truong@par.univie.ac.at1 | ||||||||
Affiliation(s) | Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 1 2 | ||||||||
Keyword(s) | performance analysis, instrumentation, performance overheads, visualization, parallel programs, | ||||||||
Abstract |
|||||||||
Many existing performance analysis tools lack the flexibility to control instrumentation and performance measurement for code regions and performance metrics of interest. Performance analysis is commonly restricted to single experiments. In this paper we present SCALEA, which is a performance instrumentation, measurement, analysis, and visualization tool for parallel programs that supports post-mortem performance analysis. SCALEA currently focuses on performance analysis for OpenMP, MPI, HPF, and mixed parallel programs. It computes a variety of performance metrics based on a novel classification of overhead. SCALEA also supports multi-experiment performance analysis that allows one to compare and to evaluate the performance outcome of several experiments. A highly flexible instrumentation and measurement system is provided which can be controlled by command-line options and program directives. SCALEA can be interfaced by external tools through the provision of a full Fortran90 OpenMP/MPI/HPF frontend that allows one to instrument an abstract syntax tree at a very high-level with C-function calls and to generate source code. A graphical user interface is provided to view a large variety of performance metrics at the level of arbitrary code regions, threads, processes, and computational nodes for single- and multi-experiment performance analysis. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Deep Start: a hybrid strategy for automated performance problem searches | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.779 | ||||||||
Article ID | CPE779 | ||||||||
Author Name(s) | Philip C. Roth1Barton P. Miller2 | ||||||||
Author Email(s) | pcroth@cs.wisc.edu1 | ||||||||
Affiliation(s) | Computer Sciences Department, University of Wisconsin-Madison, Madison, WI 53706, U.S.A. 1 2 | ||||||||
Keyword(s) | automated performance diagnosis, search, sampling, | ||||||||
Abstract |
|||||||||
To attack the problem of scalability of performance diagnosis tools with respect to application code size, we have developed the Deep Start search strategy - a new technique that uses stack sampling to augment an automated search for application performance problems. Our hybrid approach locates performance problems more quickly and finds performance problems hidden from a more straightforward search strategy. The Deep Start strategy uses stack samples collected as a by-product of normal search instrumentation to select deep starters, functions that are likely to be application bottlenecks. With priorities and careful control of the search refinement, our strategy gives preference to experiments on the deep starters and their callees. This approach enables the Deep Start strategy to find application bottlenecks more efficiently and more effectively than a more straightforward search strategy. We implemented the Deep Start search strategy in the Performance Consultant, Paradyn's automated bottleneck detection component. In our tests, Deep Start found half of our test applications' known bottlenecks between 32% and 59% faster than the Performance Consultant's current search strategy, and finished finding bottlenecks between 10% and 61% faster. In addition to improving the search time, Deep Start often found more bottlenecks than the call graph search strategy. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A self-stabilizing token-based k-out-of-![]() |
||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.781 | ||||||||
Article ID | CPE781 | ||||||||
Author Name(s) | A. K. Datta1R. Hadid2V. Villain3 | ||||||||
Author Email(s) | datta@cs.unlv.edu1 | ||||||||
Affiliation(s) | School of Computer Science, University of Nevada Las Vegas, Las Vegas, NV 89154-4019, U.S.A 1LaRIA, Université de Picardie Jules Verne, 33 rue Saint Leu, 80039 Amiens Cedex, France 2 3 | ||||||||
Keyword(s) |
fault-tolerance, |
||||||||
Abstract |
|||||||||
In this paper, we present the first self-stabilizing solution to the k-out-of-![]() ![]() ![]() ![]() |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Articles | ||||||||
Article Title | Fairness in systems based on multiparty interactions | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.782 | ||||||||
Article ID | CPE782 | ||||||||
Author Name(s) | David Ruiz1Rafael Corchuelo2Miguel Toro3 | ||||||||
Author Email(s) | druiz@tdg.lsi.us.es1 | ||||||||
Affiliation(s) | ETSI Informįtica, Avda. de la Reina Mercedes s/n, Sevilla E-41012, Spain 1 2 3 | ||||||||
Keyword(s) | concurrent programs, multiparty interactions, fairness, fair finiteness, conspiracies, | ||||||||
Abstract |
|||||||||
In the context of the Multiparty Interaction Model, fairness is used to insure that an interaction that is enabled sufficiently often in a concurrent program will eventually be selected for execution. Unfortunately, this notion does not take conspiracies into account, i.e. situations in which an interaction never becomes enabled because of an unfortunate interleaving of independent actions; furthermore, eventual execution is usually too weak for practical purposes since this concept can only be used in the context of infinite executions. In this article, we present a new fairness notion, k-conspiracy-free fairness, that improves on others because it takes finite executions into account, alleviates conspiracies that are not inherent to a program, and k may be set a priori to control its goodness to address the above-mentioned problems. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Weak communication in single-hop radio networks: adjusting algorithms to industrial standards | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.783 | ||||||||
Article ID | CPE783 | ||||||||
Author Name(s) |
Tomasz Jurdzi![]() ![]() ![]() ![]() |
||||||||
Author Email(s) | mirekk@im.pwr.wroc.pl1 | ||||||||
Affiliation(s) |
Institute of Computer
Science, Wroc![]() ![]() |
||||||||
Keyword(s) | radio network, initialization, leader election, power consumption, | ||||||||
Abstract |
|||||||||
Quite often algorithms designed for no-collision-detection radio networks use a hidden form of collision detection: it is assumed that a station can simultaneously send and listen. If it cannot hear its own message, apparently the message has been scrambled by another station sending at the same time. Industrial standard IEEE 802.11 says that a station can either send or listen to a radio channel at a given time, but not both. In order to relate the industrial standard and theoretical algorithms we consider a weak radio network model with no collision detection in which a station cannot simultaneously send and receive signals. Otherwise we talk about a strong model. In this paper we consider a measure called energy cost (or 'power consumption') which is equal to the maximum over all stations of the number of steps in which the station is sending or listening. We show that computational power of weak and strong single-hop radio networks differ substantially in the deterministic case: deterministic leader election requires |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Performance analysis and qualitative results of an efficient parallel stochastic simulator for a marine host-parasite system | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.784 | ||||||||
Article ID | CPE784 | ||||||||
Author Name(s) | G. Latu1M. Langlais2J. Roman3P. Silan4 | ||||||||
Author Email(s) | latu@labri.fr1 LaBRI, UMR CNRS 5800, Université Bordeaux 1 et ENSEIRB, 351, cours de la Libération, 33405 Talence, France3 | ||||||||
Affiliation(s) | LaBRI, UMR CNRS 5800, Université Bordeaux 1 et ENSEIRB, 351, cours de la Libération, 33405 Talence, France 1MAB, UMR CNRS 5466, Université Bordeaux 2, 146 Léo Saignat, 33076 Bordeaux Cedex, France 2 3UMR CNRS 5000, Université Montpellier II, Station Méditerranéenne de lÉnvironnement Littoral, 1 Quai de la Daurade, 34200 Sčte, France 4 | ||||||||
Keyword(s) | parallel algorithmics, numerical simulation, stochastic simulation, host-parasite system, Monogenea, fish, | ||||||||
Abstract |
|||||||||
We are interested in a host-parasite system, i.e. the sea bass-Diplectanum aequans system. A discrete mathematical model is used to describe the dynamics of both populations. Our goal is notably to validate the model in the context of aquaculture. A deterministic numerical simulator and, recently, a stochastic simulator were developed to study this biological system. Parallelization is required because the execution times are too long. The Monte Carlo algorithm of the stochastic simulator and its three levels of parallelism are described. Analysis and performances, up to 256 processors, of a hybrid MPI/OpenMP code are then presented for a cluster of symmetric multi-processor (SMP) nodes. Qualitative results are given for the host-macroparasite system simulation. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Editorial | ||||||||
Article Title | Special Issue: Euro-Par 2002 | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | Sep 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.777 | ||||||||
Article ID | CPE777 | ||||||||
Author Name(s) | Rainer Feldman1Burkhard Monien2 | ||||||||
Author Email(s) | |||||||||
Affiliation(s) | 1 2 | ||||||||
Keyword(s) | |||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel iterative methods for Navier-Stokes equations and application to eigenvalue computation | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | 09 00 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.785 | ||||||||
Article ID | CPE785 | ||||||||
Author Name(s) | Ivan G. Graham1Alastair Spence2Eero Vainikko3 | ||||||||
Author Email(s) | i.g.graham@bath.ac.uk1 | ||||||||
Affiliation(s) | Department of Mathematical Sciences, University of Bath, U.K. 1 2 Institute of Technology, University of Tartu, Estonia 3 | ||||||||
Keyword(s) | parallel iterative methods, domain decomposition, eigenvalues, incompressible Navier-Stokes equations, | ||||||||
Abstract |
|||||||||
We describe the construction of parallel iterative solvers for finite-element approximations of the Navier–Stokes equations on unstructured grids using domain decomposition methods. The iterative method used is FGMRES, preconditioned by a parallel adaptation of a block preconditioner recently proposed by Kay <I>et al.</I> The parallelization is achieved by adapting the technology of our domain decomposition solver <MONO>DOUG</MONO> (previously used for scalar problems) to block-systems. The iterative solver is applied to shifted linear systems that arise in eigenvalue calculations. To illustrate the performance of the solver, we compare several strategies both theoretically and practically for the calculation of the eigenvalues of large sparse non-symmetric matrices arising in the assessment of the stability of flow past a cylinder. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Dynamic query scheduling in parallel data warehouses | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 11-12 | ||||||||
Date | 09 00 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.786 | ||||||||
Article ID | CPE786 | ||||||||
Author Name(s) | Holger Märtens1Erhard Rahm2Thomas Stöhr3 | ||||||||
Author Email(s) | h.maertens@fh-wolfenbuettel.de1 | ||||||||
Affiliation(s) | University of Applied Sciences Braunschweig/Wolfenbüttel, Germany 1University of Leipzig, Institute of Computer Science, Augustusplatz 10-11, D-04109 Leipzig, Germany 2 3 | ||||||||
Keyword(s) | parallel database systems, query optimization, load balancing, data warehousing, Shared Disk architecture, disk contention, | ||||||||
Abstract |
|||||||||
Parallel processing is a key to high performance in very large data warehouse applications that execute complex analytical queries on huge amounts of data. Although parallel database systems (PDBSs) have been studied extensively in the past decades, the specifics of load balancing in parallel data warehouses have not been addressed in detail. In this study, we investigate how the load balancing potential of a Shared Disk (SD) architecture can be utilized for data warehouse applications. We propose an integrated scheduling strategy that simultaneously considers both processors and disks, regarding not only the total workload on each resource but also the distribution of load over time. We evaluate the performance of the new method in a comprehensive simulation study and compare it to several other approaches. The analysis incorporates skew aspects and considers typical data warehouse features such as star schemas with large fact tables and bitmap indices. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Resource management in open Linda systems | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 13 | ||||||||
Date | Nov 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.730 | ||||||||
Article ID | CPE730 | ||||||||
Author Name(s) | Ronaldo Menezes1 | ||||||||
Author Email(s) | rmenezes@cs.fit.edu1 | ||||||||
Affiliation(s) | Florida Institute of Technology, Department of Computer Sciences, 150 West University Boulevard, Melbourne, FL 32901, U.S.A. 1 | ||||||||
Keyword(s) | Linda, garbage collection, open systems, coordination systems, | ||||||||
Abstract |
|||||||||
Coordination systems, in particular Linda, have established themselves as important tools for the development of applications to open systems such as the Internet. This paper shows how to tackle a forgotten, but crucial problem in open coordination systems: memory management. As with any system which intends to be of wide use and because memory is a finite resource, coordination systems must address the problems of memory exhaustion. This paper first explores the orthogonality between coordination and computation in order to make it clear that the problem of memory exhaustion in coordination systems cannot be solved using garbage collection schemes implemented at the computation language - a garbage collection scheme must exist in the coordination environment as well. Following the explanation on orthogonality, the paper will focus on describing a garbage collection scheme for the Linda family of coordination systems. It is expected that the solution in Linda can be adapted to other coordination systems as long as they are based on tuple space communication. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A parallel data assimilation model for oceanographic observations | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 13 | ||||||||
Date | Nov 1 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.726 | ||||||||
Article ID | CPE726 | ||||||||
Author Name(s) | Aad J. van der Steen1Fons van Hees2Peter Jan van Leeuwen3 | ||||||||
Author Email(s) | a.vandersteen@phys.uu.nl1 | ||||||||
Affiliation(s) | High Performance Computing Group, Utrecht University, P.O. Box 80195, 3508 TD Utrecht, The Netherlands 11High Performance Computing Group, Utrecht University, P.O. Box 80195, 3508 TD Utrecht, The Netherlands 2Institute for Marine and Atmospheric Research Utrecht, Utrecht University, P.O. Box 80005, 3508 TA Utrecht, The Netherlands 3 | ||||||||
Keyword(s) | stochastic model, data assimilation, MPI/OpenMP implementation, | ||||||||
Abstract |
|||||||||
In this paper we describe the development of a program that aims at achieving the optimal integration of observed data in an oceanographic model describing the water transport phenomena in the Agulhas area at the tip of South Africa. Two parallel implementations, MPI and OpenMP, are described and experiments with respect to speed and scalability on a Compaq AlphaServer SC and an SGI Origin3000 are reported. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Coordinating components in middleware systems | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.727 | ||||||||
Article ID | CPE727 | ||||||||
Author Name(s) | Matthias Radestock1Susan Eisenbach2 | ||||||||
Author Email(s) | sue@doc.ic.ac.uk2 | ||||||||
Affiliation(s) | LShift, Burbage House, 83-85 Curtain Road, London EC2A 3BS, U.K. 1Department of Computing, Imperial College London, London SW7 2BZ, U.K. 2 | ||||||||
Keyword(s) | coordination, adaptive open distributed systems, traps, message interception, | ||||||||
Abstract |
|||||||||
Configuration coordination | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A set-oriented method definition language for object databases and its semantics | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 14 | ||||||||
Date | Dec 10 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.731 | ||||||||
Article ID | CPE731 | ||||||||
Author Name(s) | Giovanna Guerrini1Elisa Bertino2Isabella Merlo3 | ||||||||
Author Email(s) | guerrini@disi.unige.it1 | ||||||||
Affiliation(s) | Dipartimento di Informatica e Scienze dell'Informazione, Universitą degli Studi di Genova, Via Dodecaneso 35 - I16146 Genova, Italy 1Dipartimento di Scienze dell'Informazione, Universitą degli Studi di Milano, Via Comelico 39 - I20135 Milano, Italy 2 3 | ||||||||
Keyword(s) | object-oriented database systems, rule-based languages, database programming languages, | ||||||||
Abstract |
|||||||||
In this paper we propose a set-oriented rule-based method definition language for object-oriented databases. Most existing object-oriented database systems exploit a general-purpose imperative object-oriented programming language as the method definition language. Because methods are written in a general-purpose imperative language, it is difficult to analyze their properties and to optimize them. Optimization is important when dealing with a large amount of objects as in databases. We therefore believe that the use of an ad hoc, set-oriented language can offer some advantages, at least at the specification level. In particular, such a language can offer an appropriate framework to reason about method properties. In this paper, besides defining a set-oriented rule-based language for method definition, we formally define its semantics, addressing the problems of inconsistency and non-determinism in set-oriented updates. Moreover, we characterize some relevant properties of methods, such as conflicts among method specifications in sibling classes and behavioral refinement in subclasses. Copyright © 2003 John Wiley & Sons, Ltd. |
|||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Optimizing operating system performance for CC-NUMA architectures | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 14 | ||||||||
Date | 12 10 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.788 | ||||||||
Article ID | CPE788 | ||||||||
Author Name(s) | Moon Seok Chang1 | ||||||||
Author Email(s) | cmoon@us.ibm.com1 | ||||||||
Affiliation(s) | ETRI, 161 Gajeong-Dong, Yuseong-Gu, Daejon 305-350, Korea 1 | ||||||||
Keyword(s) | CC-NUMA, UNIX operating system, remote memory access, data striping, localization, load balancing, | ||||||||
Abstract |
|||||||||
The CC-NUMA (cache-coherent non-uniform memory access) architecture is an attractive solution to scalable servers. The performance of a CC-NUMA system heavily depends on the number of accesses to remote memory through an interconnection network. To reduce the number of remote accesses, an operating system needs to exploit the potential locality of the architecture. This paper describes the design and implementation of a UNIX-based operating system supporting the CC-NUMA architecture. The operating system implements various enhancements by revising kernel algorithms and data structures. This paper also analyzes the performance of the enhanced operating system by running commercial benchmarks on a real CC-NUMA system. The performance analysis shows that the operating system can achieve improved performance and scalability for CC-NUMA by implementing kernel data striping, localization and load balancing. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Frameworks for incorporating semantic relationships into object-oriented database systems | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 15 | ||||||||
Date | Dec 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.733 | ||||||||
Article ID | CPE733 | ||||||||
Author Name(s) | Michael Halper1Li-min Liu2James Geller3Yehoshua Perl4 | ||||||||
Author Email(s) | mhalper@kean.edu1 | ||||||||
Affiliation(s) | Department of Mathematics and Computer Science, Kean University, Union, NJ 07083, U.S.A. 12Department of Applied Mathematics, Chung Yuan Christian University, Chung-Li, Taiwan, Republic of China 2Computer Science Department, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A. 3 4 | ||||||||
Keyword(s) | semantic relationship, object-oriented database, object-oriented modeling, part-whole relationship, metaclass, | ||||||||
Abstract |
|||||||||
A semantic relationship is a data modeling construct that connects a pair of classes or categories and has inherent constraints and other functionalities that precisely reflect the characteristics of the specific relationship in an application domain. Examples of semantic relationships include part-whole, ownership, materialization and role-of. Such relationships are important in the construction of information models for advanced applications, whether one is employing traditional data-modeling techniques, knowledge-representation languages or object-oriented modeling methodologies. This paper focuses on the issue of providing built-in support for such constructs in the context of object-oriented database (OODB) systems. Most of the popular object-oriented modeling approaches include some semantic relationships in their repertoire of data-modeling primitives. However, commercial OODB systems, which are frequently used as implementation vehicles, tend not to do the same. We will present two frameworks by which a semantic relationship can be incorporated into an existing OODB system. The first only requires that the OODB system support manifest type with respect to its instances. The second assumes that the OODB system has a special kind of metaclass facility. The two frameworks are compared and contrasted. In order to ground our work in existing systems, we show the addition of a part-whole semantic relationship both to the ONTOS DB/Explorer OODB system and the VODAK Model Language. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Performance comparison of checkpoint and recovery protocols | ||||||||
Volume ID | 15 | ||||||||
Issue ID | 15 | ||||||||
Date | Dec 25 2003 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.743 | ||||||||
Article ID | CPE743 | ||||||||
Author Name(s) | Himadri Sekhar Paul1Arobinda Gupta2R. Badrinath3 | ||||||||
Author Email(s) | hpaul@cse.iitkgp.ernet.in1 | ||||||||
Affiliation(s) | Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur 721302, India 1 2 3 | ||||||||
Keyword(s) | checkpoint and recovery, fault tolerance, distributed systems, | ||||||||
Abstract |
|||||||||
Checkpoint and rollback recovery is a well-known technique for providing fault tolerance to long-running distributed applications. Performance of a checkpoint and recovery protocol depends on the characteristics of the application and the system on which it runs. However, given an application and system environment, there is no easy way to identify which checkpoint and recovery protocol will be most suitable for it. Conventional approaches require implementing the application with all the protocols under consideration, running them on the desired system, and comparing their performances. This process can be very tedious and time consuming. This paper first presents the design and implementation of a simulation environment, distributed process simulation or dPSIM, which enables easy implementation and evaluation of checkpoint and recovery protocols. The tool enables the protocols to be simulated under a wide variety of application, system, and network characteristics. The paper then presents performance evaluation of five checkpoint and recovery protocols. These protocols are implemented and executed in dPSIM under different simulated application, system, and network characteristics. Copyright © 2003 John Wiley & Sons, Ltd. | |||||||||