Concurrency and Computation: Practice and Experience

Published Papers for 2004


2004 Volume 16 Articles

Article ID: CPE745

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A comparison of task pools for dynamic load balancing of irregular algorithms
Volume ID 16
Issue ID 1
Date Jan 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.745
Article ID CPE745
Author Name(s) Matthias Korch1Thomas Rauber2
Author Email(s) matthias.korch@uni-bayreuth.de1
Affiliation(s) Faculty of Mathematics, Physics and Computer Science, University of Bayreuth, Bayreuth, Germany 1 2
Keyword(s) task pools, dynamic task scheduling, irregular algorithms, hierarchical radiosity, volume rendering, performance evaluation, threads,
Abstract
Since a static work distribution does not allow for satisfactory speed-ups of parallel irregular algorithms, there is a need for a dynamic distribution of work and data that can be adapted to the runtime behavior of the algorithm. Task pools are data structures which can distribute tasks dynamically to different processors where each task specifies computations to be performed and provides the data for these computations. This paper discusses the characteristics of task-based algorithms and describes the implementation of selected types of task pools for shared-memory multiprocessors. Several task pools have been implemented in C with POSIX threads and in Java. The task pools differ in the data structures to store the tasks, the mechanism to achieve load balance, and the memory manager used to store the tasks. Runtime experiments have been performed on three different shared-memory systems using a synthetic algorithm, the hierarchical radiosity method, and a volume rendering algorithm. Copyright © 2004 John Wiley & Sons, Ltd.
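The central data structure the paper compares can be pictured with a minimal sketch. Python is used here purely for brevity (the paper's implementations are in C with POSIX threads and in Java): one shared queue guarded by a lock, idle workers fetching tasks, and tasks dynamically spawning further tasks, as irregular algorithms require.

```python
import threading
from collections import deque

class CentralTaskPool:
    """Minimal central task pool: one shared queue protected by a lock.
    Illustrative sketch only; the paper's pools are in C/Pthreads and Java."""
    def __init__(self):
        self._tasks = deque()
        self._lock = threading.Lock()
        self._in_flight = 0               # tasks currently executing

    def put(self, func, *args):
        with self._lock:
            self._tasks.append((func, args))

    def run(self, num_workers):
        def worker():
            while True:
                with self._lock:
                    if self._tasks:
                        func, args = self._tasks.popleft()
                        self._in_flight += 1
                    elif self._in_flight == 0:
                        return            # no queued or running tasks: done
                    else:
                        func = None       # running tasks may spawn more: retry
                if func is not None:
                    func(self, *args)     # a task may enqueue child tasks
                    with self._lock:
                        self._in_flight -= 1
        threads = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

# usage: tasks that create work at runtime, e.g. summing a range by splitting
results = []
res_lock = threading.Lock()

def task(pool, lo, hi):
    if hi - lo <= 4:
        with res_lock:
            results.append(sum(range(lo, hi)))
    else:
        mid = (lo + hi) // 2
        pool.put(task, lo, mid)           # irregular: new tasks appear dynamically
        pool.put(task, mid, hi)

pool = CentralTaskPool()
pool.put(task, 0, 100)
pool.run(num_workers=4)
print(sum(results))  # 4950 == sum(range(100))
```

The paper's variants differ precisely in what replaces this single queue and lock (distributed queues, task stealing, and custom memory managers).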

Article ID: CPE746

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Security Maintenance Mediation: a technology for preventing unintended security breaches
Volume ID 16
Issue ID 1
Date Jan 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.746
Article ID CPE746
Author Name(s) Roger (Buzz) King1
Author Email(s) roger@cs.colorado.edu1
Affiliation(s) Department of Computer Science, University of Colorado, Boulder, CO 80309, U.S.A. 1
Keyword(s) data security, security policy, data mediation, collaborative systems, Web portals,
Abstract
Web-resident information is becoming ‘smarter’, in the sense that emerging technology will support the annotation of it with ontological terms, which will be used to locate and reuse information. This will pose a great security risk in the form of unintended breaches (as distinct from deliberate invasions). Because Web-resident information will be far more readily available and relevant, inadvertent releases of secure information can cause it to become diffusely spread across the Internet. Then, as this information is iteratively transformed and integrated with other information, it will become irretrievable and potentially used in a myriad of unpredictable ways. The problem is that ontological annotations, while making information more understandable in its original form, do not provide a means for easily capturing the complex semantics of information that has been transformed via abstraction, aggregation, and integration. This demands the development of a semantically rich way of specifying ‘views’ of Web information, to which security controls can be attached. Also needed is a way for users of secure information to easily and voluntarily blend, and thereby propagate, security controls as information is transformed. Information mediators designed by collaborative teams of experts are proposed as the vehicle for wrapping information, so that at each step of reuse, high-level views and their corresponding integrity controls can be made easily accessible to trusted users, who will then be able to ensure their proper maintenance. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE747

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel operation of CartaBlanca on shared and distributed memory computers
Volume ID 16
Issue ID 1
Date Jan 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.747
Article ID CPE747
Author Name(s) N. T. Padial-Collins1W. B. VanderHeyden2D. Z. Zhang3E. D. Dendy4D. Livescu5
Author Email(s) nelylanl@lanl.gov1
Affiliation(s) Los Alamos National Laboratory Theoretical Division and Los Alamos Computer Science Institute, Los Alamos, NM 87545, U.S.A. 1 2 3 4 5
Keyword(s) Java, scientific computing, parallel computation,
Abstract
We describe the parallel performance of the pure Java CartaBlanca code on heat transfer and multiphase fluid flow problems. CartaBlanca is designed for parallel computations on partitioned unstructured meshes. It uses Java's thread facility to manage computations on each of the mesh partitions. Inter-partition communications are handled by two compact objects for node-by-node communication along partition boundaries and for global reduction calculations across the entire mesh. For distributed calculations, the JavaParty package from the University of Karlsruhe is demonstrated to work with CartaBlanca. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE749

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Performance
Article Title Performance and scalability of MPI on PC clusters
Volume ID 16
Issue ID 1
Date Jan 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.749
Article ID CPE749
Author Name(s) Glenn R. Luecke1Marina Kraeva2Jing Yuan3Silvia Spanoyannis4
Author Email(s) grl@iastate.edu1
Affiliation(s) 291 Durham Center, Iowa State University, Ames, IA 50011, U.S.A. 1 2 3 4
Keyword(s) PC clusters, MPI, MPI performance,
Abstract
The purpose of this paper is to compare the communication performance and scalability of MPI communication routines on a Windows Cluster, a Linux Cluster, a Cray T3E-600, and an SGI Origin 2000. All tests in this paper were run using various numbers of processors and two message sizes. In spite of the fact that the Cray T3E-600 is about 7 years old, it performed best of all machines for most of the tests. The Linux Cluster with the Myrinet interconnect and Myricom's MPI performed and scaled quite well and, in most cases, performed better than the Origin 2000, and in some cases better than the T3E. The Windows Cluster using the Giganet Full Interconnect and MPI/Pro's MPI performed and scaled poorly for small messages compared with all of the other machines. Copyright © 2004 John Wiley & Sons, Ltd.
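The basic measurement behind comparisons like these is a ping-pong test: one rank sends a message, a second rank echoes it back, and half the average round-trip time estimates one-way latency. A stand-in sketch using two queues in place of MPI_Send/MPI_Recv (Python here only for illustration; the benchmarks themselves run over MPI):

```python
import threading
import queue
import time

def ping_pong(msg, reps=200):
    """Measure mean round-trip time between two 'ranks' joined by two queues.
    A stand-in for an MPI ping-pong test (MPI_Send/MPI_Recv between rank 0
    and rank 1); real benchmarks vary the message size as the paper does."""
    to_peer, from_peer = queue.Queue(), queue.Queue()

    def echo_rank():                      # plays the role of rank 1
        for _ in range(reps):
            from_peer.put(to_peer.get())  # receive, then send straight back

    t = threading.Thread(target=echo_rank)
    t.start()
    start = time.perf_counter()
    for _ in range(reps):                 # rank 0: send, then wait for echo
        to_peer.put(msg)
        received = from_peer.get()
    elapsed = time.perf_counter() - start
    t.join()
    return received, elapsed / reps / 2   # echoed message, est. one-way latency

msg, latency = ping_pong(b"x" * 8)
print(len(msg), latency > 0)
```

Scalability tests extend the same idea to collective operations (broadcast, reduce, all-to-all) over growing process counts.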

Article ID: CPE770

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Group-SPMD programming with orthogonal processor groups
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.770
Article ID CPE770
Author Name(s) Thomas Rauber1Robert Reilein2Gudula Rünger3
Author Email(s) rauber@uni-bayreuth.de1
Affiliation(s) Department of Mathematics and Physics, University of Bayreuth, 95440 Bayreuth, Germany 1Department of Computer Science, Chemnitz University of Technology, 09111 Chemnitz, Germany 2 3
Keyword(s) communication library, group-SPMD, orthogonal processor groups, distributed memory,
Abstract
Many programs for message-passing machines can benefit from an implementation in a group-SPMD programming model due to the potential to reduce communication overhead and to increase scalability. In this paper, we consider group-SPMD programs exploiting different orthogonal processor partitions in one program. For each program this is a fixed set of predefined processor partitions given by the parallel hyperplanes of a two- or multi-dimensional virtual processor organization. We introduce a library built on top of MPI to support the programming with those orthogonal processor groups. The parallel programming model is appropriate for applications with a multi-dimensional task grid and task dependencies mainly aligned in the dimensions of the task grid. The library can be used to specify the appropriate processor partitions, which are then created by the library, and to define the mapping of tasks to the processor hyperplanes. Examples from numerical analysis illustrate the programming style and show that the runtime on distributed memory machines can be considerably reduced by using the library. Copyright © 2004 John Wiley & Sons, Ltd.
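The orthogonal partitions can be pictured as the 'colors' one would pass to MPI_Comm_split for each hyperplane of a two-dimensional virtual processor grid: processes sharing a row index form one set of groups, and processes sharing a column index form the orthogonal set. A pure-Python sketch of that mapping (function name hypothetical, for illustration only):

```python
def orthogonal_groups(num_procs, rows, cols):
    """Partition ranks 0..num_procs-1, laid out row-major on a rows x cols
    virtual grid, into row groups and the orthogonal column groups -- the
    'color' argument one would hand to MPI_Comm_split for each hyperplane."""
    assert num_procs == rows * cols
    row_groups = {r: [] for r in range(rows)}
    col_groups = {c: [] for c in range(cols)}
    for rank in range(num_procs):
        r, c = divmod(rank, cols)         # position on the virtual grid
        row_groups[r].append(rank)        # hyperplane with fixed row index
        col_groups[c].append(rank)        # orthogonal hyperplane, fixed column
    return row_groups, col_groups

rows_g, cols_g = orthogonal_groups(6, 2, 3)
print(rows_g)  # {0: [0, 1, 2], 1: [3, 4, 5]}
print(cols_g)  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```

Every rank belongs to exactly one group in each family, which is what lets a group-SPMD program alternate between row-wise and column-wise collective phases.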

Article ID: CPE768

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Managing distributed shared arrays in a bulk-synchronous parallel programming environment
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.768
Article ID CPE768
Author Name(s) Christoph W. Kessler1
Author Email(s) chrke@ida.liu.se1
Affiliation(s) Programming Environments Laboratory (PELAB), Department of Computer Science, Linköping University, S-581 83 Linköping, Sweden 1
Keyword(s) NestStep, BSP model, bulk-synchronous parallelism, parallel programming language, distributed shared array, runtime scheduling of communication,
Abstract
NestStep is a parallel programming language for the BSP (bulk-synchronous parallel) programming model. In this article we describe the concept of distributed shared arrays in NestStep and its implementation on top of MPI. In particular, we present a novel method for runtime scheduling of irregular, direct remote accesses to sections of distributed shared arrays. Our method, which is fully parallelized, uses conventional two-sided message passing and thus avoids the overhead of a standard implementation of direct remote memory access based on one-sided communication. The main prerequisite is that the given program is structured in a BSP-compliant way. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE767

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Compiling data-parallel programs for clusters of SMPs
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.767
Article ID CPE767
Author Name(s) Siegfried Benkner1Thomas Brandes2
Author Email(s) sigi@ieee.org1
Affiliation(s) Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 1SCAI-Fraunhofer Institute for Algorithms and Scientific Computing, Schloß Birlinghoven, D-53754 St. Augustin, Germany 2
Keyword(s) SMP clusters, parallel programming, HPF, OpenMP, MPI, hybrid parallelization,
Abstract
Clusters of shared-memory multiprocessors (SMPs) have become the most promising parallel computing platforms for scientific computing. However, SMP clusters significantly increase the complexity of user application development when using the low-level application programming interfaces MPI and OpenMP, forcing users to deal with both distributed-memory and shared-memory parallelization details. In this paper we present extensions of High Performance Fortran (HPF) for SMP clusters which enable the compiler to adopt a hybrid parallelization strategy, efficiently combining distributed-memory with shared-memory parallelism. By means of a small set of new language features, the hierarchical structure of SMP clusters may be specified. This information is utilized by the compiler to derive inter-node data mappings for controlling distributed-memory parallelization across the nodes of a cluster and intra-node data mappings for extracting shared-memory parallelism within nodes. Additional mechanisms are proposed for specifying inter- and intra-node data mappings explicitly, for controlling specific shared-memory parallelization issues and for integrating OpenMP routines in HPF applications. The proposed features have been realized within the ADAPTOR and VFC compilers. The parallelization strategy for clusters of SMPs adopted by these compilers is discussed as well as a hybrid-parallel execution model based on a combination of MPI and OpenMP. Experimental results indicate the effectiveness of the proposed features. Copyright © 2004 John Wiley & Sons, Ltd.
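The two-level mapping such a compiler derives can be sketched as plain index arithmetic: an inter-node distribution assigns contiguous blocks of an array to cluster nodes, and an intra-node mapping subdivides each node's block among its threads. This is an illustrative sketch of the idea only, not the proposed HPF language extensions:

```python
def hybrid_mapping(n, nodes, threads_per_node):
    """Two-level block distribution of array indices 0..n-1: first across
    cluster nodes (distributed-memory level, MPI), then each node's block
    across its threads (shared-memory level, OpenMP).
    Returns {(node, thread): range}."""
    mapping = {}
    node_block = (n + nodes - 1) // nodes             # ceil-divide over nodes
    for node in range(nodes):
        lo = min(node * node_block, n)
        hi = min(lo + node_block, n)
        local = hi - lo
        thr_block = (local + threads_per_node - 1) // threads_per_node
        for t in range(threads_per_node):
            tlo = min(lo + t * thr_block, hi)
            thi = min(tlo + thr_block, hi)
            mapping[(node, t)] = range(tlo, thi)      # this thread's indices
    return mapping

m = hybrid_mapping(16, nodes=2, threads_per_node=2)
print(m[(0, 0)], m[(1, 1)])  # range(0, 4) range(12, 16)
```

The compiler's job, as the abstract describes, is to derive both levels automatically from the cluster's hierarchical structure, generating MPI communication between node blocks and OpenMP parallelism within them.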

Article ID: CPE771

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A compiler for multiple memory models
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.771
Article ID CPE771
Author Name(s) S. P. Midkiff1J. Lee2D. A. Padua3
Author Email(s) smidkiff@purdue.edu1
Affiliation(s) School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47905-2305, U.S.A. 1School of Computer Science and Engineering, Seoul National University, Seoul, Korea 2Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61801, U.S.A. 3
Keyword(s) memory models, Java, explicit parallelism, compilers, program optimization, program analysis,
Abstract
The design of consistency models for both hardware and software is a difficult task. For a programming language, it is particularly difficult because the target audience for a high-level programming language is much wider than the target audience for a machine language, making usability a more important criterion. Exacerbating this problem is the reality that the programming language community has little experience designing programming language consistency models, and therefore each new attempt is very much a voyage into uncharted territory. A concrete example of the difficulties of the task is the current Java Memory Model. Although designed to be easy to use by Java programmers, it is poorly understood and at least one common idiom (the ‘double check idiom’) to exploit the model is unsafe. In this paper, we describe the design of an optimizing Java compiler that will accept either as input or as an interface implementation a consistency model for the code to be compiled. The compiler will use Shasha and Snir's delay set analysis, and our CSSA program representation to provide a canonical representation for the effects of different consistency models on optimizations and analysis. The compiler will serve as a testbed to prototype new memory models, and to measure the effects of different memory models on program performance. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE772

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Space-time mapping and tiling: a helpful combination
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.772
Article ID CPE772
Author Name(s) Martin Griebl1Peter Faber2Christian Lengauer3
Author Email(s) griebl@fmi.uni-passau.de1
Affiliation(s) Department of Mathematics and Informatics, University of Passau, D-94030 Passau, Germany 1 2 3
Keyword(s) automatic parallelization, loop transformation, tiling, space-time mapping, polyhedron model, loop optimization,
Abstract
Tiling is a well-known technique for sequential compiler optimization, as well as for automatic program parallelization. However, in the context of parallelization, tiling should not be considered as a stand-alone technique, but should be applied after a dedicated parallelization phase, in our case after space-time mapping. We show how tiling can benefit from space-time mapping, and we derive an algorithm for computing tiles which can minimize the number of communication startups, taking the number of physically available processors into account. We also show how the use of a simple cost model reduces real execution time. Copyright © 2004 John Wiley & Sons, Ltd.
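Tiling itself is simple to state: a loop nest over an iteration space is rewritten to iterate over blocks, so each block's data fits in cache and, after space-time mapping, each tile can become a single unit of communication. A minimal sketch on a 2D nest (matrix transpose; Python for illustration only):

```python
def transpose_tiled(a, tile=32):
    """Blocked (tiled) transpose of a list-of-lists matrix. Each (ii, jj)
    pair names one tile; the inner loops sweep only that block -- the unit
    of locality (or, after space-time mapping, of one communication startup)."""
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for ii in range(0, n, tile):                     # loop over tiles
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):   # loops within one tile
                for j in range(jj, min(jj + tile, m)):
                    out[j][i] = a[i][j]
    return out

a = [[i * 5 + j for j in range(5)] for i in range(4)]   # 4x5 matrix
assert transpose_tiled(a, tile=2) == [list(col) for col in zip(*a)]
print("tiled result matches untiled transpose")
```

The paper's contribution is choosing the tile shape and size after the space-time mapping has fixed which loop dimensions are space (processors) and which are time.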

Article ID: CPE773

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The effect of cache models on iterative compilation for combined tiling and unrolling
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.773
Article ID CPE773
Author Name(s) P. M. W. Knijnenburg1T. Kisuki2K. Gallivan3M. F. P. O'Boyle4
Author Email(s) peterk@liacs.nl1
Affiliation(s) LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands 1 2 Department of Computer Science, Florida State University, U.S.A. 3Institute for Computing Systems Architecture, Edinburgh University, U.K. 4
Keyword(s) compiler optimization, phase ordering problem, cache models,
Abstract
Iterative compilation, where we search for the best program transformation based on profile feedback information, has been highly successful in determining good program optimizations, typically outperforming all static approaches. However, this benefit has come at a price, namely, a large number of executions of the target program, which in many cases is unacceptable. This paper is concerned with reducing the number of program executions needed by iterative compilation by incorporating static models. We examine how static models may be incorporated into a purely iterative approach by developing several characterized heuristics. We empirically explore the tradeoff between reducing the number of executions required and the ultimate performance achieved for differing parameters or slack factors. We focus on tiling and unrolling as transformations and cache models as a test case for the feasibility of this approach. We show that a highly accurate model, used as a filter to profiling and appropriately parameterized, can reduce the number of executions by 50%. We also show that using a simple model to rank transformations and profiling only those with the highest ranking can reduce the number of executions even further, by up to 70%, in cases where there is only a limited number of profiles available. We conclude that a production compiler might perform best using the last approach, gaining the benefit of iterative compilation at a much reduced cost. Copyright © 2004 John Wiley & Sons, Ltd.
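The model-as-filter strategy can be sketched independently of any particular compiler: enumerate candidate (tile, unroll) pairs, rank them with a cheap static cost model, and spend real executions only on the top-ranked fraction. The cost model and "profiling" function below are synthetic stand-ins, not the paper's cache models:

```python
def iterative_search(candidates, model_cost, profile, keep_fraction=0.3):
    """Rank all candidate transformations with a static model, then profile
    (i.e. actually execute) only the best-ranked fraction -- the strategy
    for cutting the number of runs. Returns the best candidate found and
    the number of executions saved versus pure iterative compilation."""
    ranked = sorted(candidates, key=model_cost)
    k = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:k]                     # only these are ever executed
    best = min(shortlist, key=profile)
    return best, len(candidates) - k           # executions saved

# synthetic example: true optimum at tile=32, unroll=4; the model is an
# imperfect but order-preserving proxy for real execution time
candidates = [(t, u) for t in (8, 16, 32, 64) for u in (1, 2, 4, 8)]
true_time = lambda c: (c[0] - 32) ** 2 + 3 * (c[1] - 4) ** 2
model = lambda c: true_time(c) + 5
best, saved = iterative_search(candidates, model, true_time)
print(best, saved)  # (32, 4) 12
```

The quality of the result degrades gracefully with model accuracy: a filter that merely preserves the rough ranking keeps the true optimum inside the profiled shortlist.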

Article ID: CPE774

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A fast and accurate method for determining a lower bound on execution time
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.774
Article ID CPE774
Author Name(s) G. Fursin1M. F. P. O'Boyle2O. Temam3G. Watts4
Author Email(s) g.fursin@ed.ac.uk1
Affiliation(s) ICSA, School of Informatics, University of Edinburgh, Edinburgh EH9 3JZ, U.K. 1 2 Laboratoire de Recherche en Informatique, Bâtiment 490, Université Paris Sud, 91405 Orsay Cedex, France 3 4
Keyword(s) memory performance, optimization tool, memory latency analysis,
Abstract
In performance critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler-based optimizations to improve memory performance are limited and programmers frequently resort to manual optimization techniques. However, this process is tedious and time-consuming. Furthermore, as the potential benefit from optimization is unknown there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e. when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with that of a simulator with an ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE775

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel object-oriented framework optimization
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.775
Article ID CPE775
Author Name(s) Daniel J. Quinlan1Markus Schordan2Brian Miller3Markus Kowarschik4
Author Email(s) dquinlan@llnl.gov1
Affiliation(s) Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 1 2 3 System Simulation Group, Department of Computer Science, University of Erlangen-Nuremberg, Germany 4
Keyword(s) telescoping languages, AST restructuring tools, semantics-based transformations, object-oriented optimizations,
Abstract
Sophisticated parallel languages are difficult to develop; most parallel distributed memory scientific applications are developed using a serial language, expressing parallelism through third party libraries (e.g. MPI). As a result, frameworks and libraries are often used to encapsulate significant complexities. We define a novel approach to optimize the use of libraries within applications. The resulting tool, named ROSE, leverages the additional semantics provided by library-defined abstractions enabling library specific optimization of application codes. It is a common perception that performance is inversely proportional to the level of abstraction. Our work shows that this is not the case if the additional semantics can be leveraged. We show how ROSE can be used to leverage the semantics within the compile-time optimization. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE766

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorials
Article Title Special Issue: Compilers for Parallel Computers (CPC 2001)
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.766
Article ID CPE766
Author Name(s) . .1
Author Email(s)
Affiliation(s) 1
Keyword(s) .,
Abstract
No abstract

Article ID: CPE769

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Data partitioning-based parallel irregular reductions
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.769
Article ID CPE769
Author Name(s) Eladio Gutiérrez1Oscar Plata2Emilio L. Zapata3
Author Email(s) eladio@ac.uma.es1
Affiliation(s) Department of Computer Architecture, University of Malaga, E-29071 Malaga, Spain 1 2 3
Keyword(s) irregular reductions, data locality, workload balancing, shared-memory multiprocessor, NUMA machines,
Abstract
Different parallelization methods for irregular reductions on shared memory multiprocessors have been proposed in the literature in recent years. We have classified all these methods and analyzed them in terms of a set of properties: data locality, memory overhead, exploited parallelism, and workload balancing. In this paper we propose several techniques to increase the amount of exploited parallelism and to introduce load balancing into an important class of these methods. Regarding parallelism, the proposed solution is based on the partial expansion of the reduction array. Load balancing is discussed in terms of two techniques. The first technique is a generic one, as it deals with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance which occurs whenever a large number of write operations are concentrated on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for a particular method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods. Copyright © 2004 John Wiley & Sons, Ltd.
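Array expansion, the mechanism underlying the parallelism technique discussed here, gives each thread a private copy of the reduction array so that irregular writes need no synchronization, at the price of memory overhead and a final combine step. A threaded sketch of the basic (full-expansion) form, with Python standing in for the shared-memory kernels:

```python
import threading

def irregular_reduction(n_bins, indices, values, num_threads=4):
    """Irregular reduction via array expansion: each thread accumulates into
    a private copy of the reduction array (no locks on the hot path), then
    the copies are combined. A sketch of the basic technique; the paper's
    contribution is a partial-expansion variant with better memory overhead."""
    private = [[0.0] * n_bins for _ in range(num_threads)]

    def worker(tid):
        mine = private[tid]
        for k in range(tid, len(indices), num_threads):  # cyclic work split
            mine[indices[k]] += values[k]                # irregular write

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # combine step: sum the private copies into the final reduction array
    return [sum(copy[b] for copy in private) for b in range(n_bins)]

idx = [0, 3, 3, 1, 0, 2, 3, 1]
val = [1.0] * len(idx)
print(irregular_reduction(4, idx, val))  # [2.0, 2.0, 1.0, 3.0]
```

The load-imbalance case the abstract mentions arises when `indices` concentrates most writes in a few bins; the cyclic split above balances iterations but not necessarily memory traffic.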

Article ID: CPE776

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title SWARP: a retargetable preprocessor for multimedia instructions
Volume ID 16
Issue ID 2-3
Date Feb 1 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.776
Article ID CPE776
Author Name(s) Gilles Pokam1Stéphane Bihan2Julien Simonnet3François Bodin4
Author Email(s) gilles.pokam@irisa.fr1
Affiliation(s) IRISA, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France 1 2 3 4
Keyword(s) multimedia instructions, compiler optimizations, embedded processors,
Abstract
In this paper, we propose SWARP, a retargetable preprocessor for exploiting multimedia instructions. The system mixes loop distribution, unrolling and pattern matching to exploit complex multimedia instructions. Contrary to all available systems, it can be extended at the user level. Using a TriMedia processor, we show that our system achieves good code quality on a set of frequently used loop kernels for multimedia applications. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE751

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A comparison of concurrent programming and cooperative multithreading under load balancing applications
Volume ID 16
Issue ID 4
Date Apr 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.751
Article ID CPE751
Author Name(s) Justin T. Maris1Aaron W. Keen2Takashi Ishihara3Ronald A. Olsson4
Author Email(s) olsson@cs.ucdavis.edu4
Affiliation(s) Department of Computer Science, University of California, Davis, CA 95616, U.S.A. 1 2 3 4
Keyword(s) cooperative multithreading, concurrent programming, load balancing, parallel and distributed programming languages, synchronization mechanisms,
Abstract
Two models of thread execution are the general concurrent programming execution model (CP) and the cooperative multithreading execution model (CM). CP provides nondeterministic thread execution where context switches occur arbitrarily. CM provides threads that execute one at a time until they explicitly choose to yield the processor. This paper focuses on a classic application to reveal the advantages and disadvantages of load balancing during thread execution under CP and CM styles; results from a second classic application were similar. These applications are programmed in two different languages (SR and Dynamic C) on different hardware (standard PCs and embedded system controllers). An SR-like run-time system, DesCaRTeS, was developed to provide interprocess communication for the Dynamic C implementations. This paper compares load balancing and non-load balancing implementations; it also compares CP and CM style implementations. The results show that in cases of very high or very low workloads, load balancing slightly hindered performance; and in cases of moderate workload, both SR and Dynamic C implementations of load balancing generally performed well. Further, for these applications, CM style programs outperform CP style programs in some cases, but the opposite occurs in some other cases. This paper also discusses qualitative tradeoffs between CM style programming and CP style programming for these applications. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE752

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title OpenMP-oriented applications for distributed shared memory architectures
Volume ID 16
Issue ID 4
Date Apr 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.752
Article ID CPE752
Author Name(s) Ami Marowka1Zhenying Liu2Barbara Chapman3
Author Email(s) amimar2@yahoo.com1
Affiliation(s) Department of Computer Science, The University of Houston, Houston, TX 77204-3010, U.S.A. 1 2 3
Keyword(s) OpenMP, data locality, NAS Parallel Benchmarks, programming model,
Abstract
The rapid rise of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model-of-choice for large scale high-performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc-NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy-inside-copy-back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automatic and implicit approaches. Moreover, the CC approach improves on the OpenMP-SPMD style of programming that makes the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse-grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd.
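The copy-inside-copy-back pattern can be shown in miniature: each thread copies its slice of a shared array into private storage (which, on a cc-NUMA machine, places the data in memory local to that thread), computes on the private copy, and writes the slice back. A sketch of the pattern only; the paper expresses it through OpenMP's privatization mechanism:

```python
import threading

def cc_scale(shared, factor, num_threads=4):
    """Copy-inside-copy-back: each thread copies its slice into a private
    buffer (on cc-NUMA, thread-local memory), works on the copy, then
    copies the result back into the shared array. Illustrative sketch."""
    n = len(shared)
    chunk = (n + num_threads - 1) // num_threads

    def worker(tid):
        lo, hi = tid * chunk, min((tid + 1) * chunk, n)
        private = shared[lo:hi]                    # copy in: private copy
        private = [x * factor for x in private]    # compute on the local copy
        shared[lo:hi] = private                    # copy back (disjoint slices)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

data = list(range(10))
cc_scale(data, 3)
print(data)  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

The abstract's caveat that "a fast copy mechanism is essential" corresponds to the copy-in and copy-back steps here, whose cost must be amortized by the locality gained during the compute phase.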

Article ID: CPE763

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Proceedings of the 10th International Symposium on High Performance Distributed Computing (HPDC-10)
Volume ID 16
Issue ID 4
Date Apr 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.763
Article ID CPE763
Author Name(s) Anand Natrajan1Michael Crowley2Nancy Wilkins-Diehr3Marty A. Humphrey4Anthony D. Fox5Andrew S. Grimshaw6Charles L. Brooks7
Author Email(s) anand@virginia.edu1
Affiliation(s) Department of Computer Science, University of Virginia, 151 Engineer's Way, Charlottesville, VA 22904, U.S.A. 1Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, U.S.A. 2San Diego Supercomputing Center, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, U.S.A. 3 4 5 6 7
Keyword(s) Legion, computational Grid, CHARMM, protein folding,
Abstract
One benefit of a computational Grid is the ability to run high-performance applications over distributed resources simply and securely. We demonstrated this benefit with an experiment in which we studied the protein-folding process with the CHARMM molecular simulation package over a Grid managed by Legion, a Grid operating system. High-performance applications can take advantage of Grid resources if the Grid operating system provides both low-level functionality as well as high-level services. We describe the nature of services provided by Legion for high-performance applications. Our experiences indicate that human factors continue to play a crucial role in the configuration of Grid resources, underlying resources can be problematic, Grid services must tolerate underlying problems or inform the user, and high-level services must continue to evolve to meet user requirements. Our experiment not only helped a scientist perform an important study, but also showed the viability of an integrated approach such as Legion's for managing a Grid. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE748

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines
Volume ID 16
Issue ID 4
Date March 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.748
Article ID CPE748
Author Name(s) Sotirios G Ziavras1Xiaofang Wang2
Author Email(s) ziavras@njit.edu1 ziavras@njit.edu2
Affiliation(s) Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A. 1 2
Keyword(s) FPGA, LU factorization, matrix inversion, parallel processing, hardware design, SOPC/SOC,
Abstract
Configurable computing, where hardware resources are configured appropriately to match specific hardware designs, has recently demonstrated its ability to significantly improve performance for a wide range of computation-intensive applications. With steady advances in silicon technology, as predicted by Moore's Law, Field-Programmable Gate Array (FPGA) technologies have enabled the implementation of System-on-a-Programmable-Chip (SOPC or SOC) computing platforms, which, in turn, have given a significant boost to the field of configurable computing. It is possible to implement various specialized parallel machines in a single silicon chip. In this paper, we describe our design and implementation of a parallel machine on an SOPC development board, using multiple instances of a soft IP configurable processor; we use this machine for LU factorization. LU factorization is widely used in engineering and science to solve efficiently large systems of linear equations. Our implementation facilitates the efficient solution of linear equations at a cost much lower than that of supercomputers and networks of workstations. The intricacies of our FPGA-based design are presented along with tradeoff choices made for the purpose of illustration. Performance results prove the viability of our approach. Copyright © 2004 John Wiley & Sons, Ltd.
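For reference, the computation mapped onto the FPGA fabric is ordinary dense LU factorization; a straightforward software version (unpivoted Doolittle, shown in Python for brevity) makes visible the k-then-i-then-j dependence structure that any parallel or hardware design must respect:

```python
def lu_factor(a):
    """Unpivoted Doolittle LU factorization: returns (L, U) with A = L*U and
    L unit lower triangular. Column k must be eliminated before column k+1,
    which is the dependence a parallel/FPGA design has to schedule around.
    Real solvers add pivoting for numerical stability."""
    n = len(a)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]             # U starts as a working copy of A
    for k in range(n):                    # eliminate below the k-th pivot
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]   # multiplier (fails if pivot is 0)
            for j in range(k, n):         # update the trailing row segment
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

A = [[4.0, 3.0], [6.0, 3.0]]
L, U = lu_factor(A)
print(L)  # [[1.0, 0.0], [1.5, 1.0]]
print(U)  # [[4.0, 3.0], [0.0, -1.5]]
```

The inner i-loop iterations for a fixed k are independent, which is the parallelism a multi-processor SOPC design can exploit; sparse variants like the paper's additionally skip structurally zero entries.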

Article ID: CPE819

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title An approach for quality of service adaptation in service-oriented Grids
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.819
Article ID CPE819
Author Name(s) Rashid Al-Ali1Abdelhakim Hafid2Omer Rana3David Walker4
Author Email(s) rashid@cs.cardiff.ac.uk1
Affiliation(s) Department of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, Wales, U.K. 1Telcordia Technologies, Inc., 331 Newman Springs Road, Red Bank, NJ 07701, U.S.A. 2 3 4
Keyword(s) service-oriented grids, quality of service (QoS), QoS adaptation,
Abstract
Some applications utilizing Grid computing infrastructure require the simultaneous allocation of resources, such as compute servers, networks, memory, disk storage and other specialized resources. Collaborative working and visualization is one example of such applications. In this context, quality of service (QoS) is related to Grid services, and not just to the network connecting these services. With the emerging interest in service-oriented Grids, resources may be advertised and traded as services based on a service level agreement (SLA). Such an SLA must include both general and technical specifications, including pricing policy and properties of the resources required to execute the service, to ensure QoS requirements are satisfied. An approach for QoS adaptation is presented to enable the dynamic adjustment of an application's behavior based on changes in the pre-defined SLA. The approach is particularly useful if workload or network traffic changes in unpredictable ways during an active session. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE818

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Middleware for Grid Computing
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.818
Article ID CPE818
Author Name(s) . .1
Author Email(s) .1
Affiliation(s) 1
Keyword(s) .,
Abstract
No abstract

Article ID: CPE820

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Grids of agents for computer and telecommunication network management
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.820
Article ID CPE820
Author Name(s) M. D. Assunção1F. L. Koch2C. B. Westphall3
Author Email(s) assuncao@lrg.ufsc.br1
Affiliation(s) Network and Management Laboratory, Federal University of Santa Catarina, Florianópolis, SC, 88049-970, P.O. 476, Brazil 1Intelligent Systems Group, University of Utrecht, The Netherlands 2 3
Keyword(s) Grid of agents, Grid computing, network management,
Abstract
The centralized approach to computer and telecommunication network management has shown scalability problems as the amount and diversity of managed equipment grow. Moreover, the increasing complexity of the services offered through the networks adds extra workload to the management station. The amount of data that must be handled and processed by a single administration point can lead to a situation where there is not enough processing and storage power to carry out an efficient job. In this work we present an alternative approach that creates a highly distributed computing environment through the use of Grids of autonomous agents to analyze large amounts of data, reducing processing costs by optimizing load distribution and resource utilization. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE822

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Applying the reflective middleware approach in Grid computing
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.822
Article ID CPE822
Author Name(s) Geoff Coulson1Gordon Blair2Nikos Parlavantzas3Wai Kit Yeung4Wei Cai5
Author Email(s) geoff@comp.lancs.ac.uk1
Affiliation(s) Distributed Multimedia Research Group, Computing Department, Lancaster University, Lancaster LA1 4YR, U.K. 1 2 3 4 5
Keyword(s) Grid computing, reflective middleware, component-based middleware, quality of service,
Abstract
Significant progress has been made in the design and development of object-based distributed systems and, more recently, in component-based reflective middleware platforms, i.e. middleware platforms that, through reflection, can be flexibly configured and adapted/reconfigured at runtime, especially in terms of non-functional properties like quality of service (QoS). In this paper we consider how ideas and principles from such platforms can usefully be applied in a Grid context, specifically an Open Grid Services Architecture (OGSA) context. To this end, we identify weaknesses in OGSA in the areas of generic service provision, scalability/performance engineering and, especially, lack of support for QoS specification and realization, and suggest how these can be alleviated. We believe that such areas will become increasingly important as sophisticated e-Science applications start to exploit the potential of OGSA's service-based architecture. We also report on our current research on developing a binding-type framework, a resource management framework, and a set of performance optimizations for an ongoing OGSA implementation. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE826

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance evaluation of market-based resource allocation for Grid computing
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.826
Article ID CPE826
Author Name(s) J. Gomoluch1M. Schroeder2
Author Email(s) ck655@soi.city.ac.uk1
Affiliation(s) Department of Computing, City University, Northampton Square, London EC1V 0HB, U.K. 1 2
Keyword(s) Grid computing, market-based resource allocation, simulation, performance evaluation,
Abstract
Resource allocation is an important aspect of Grid computing. Over the past few years, various systems have been developed which use market mechanisms to allocate resources. However, the performance of such policies has not been sufficiently studied. In this paper, we investigate under which circumstances market-based resource allocation by continuous double auctions and by the proportional share protocol, respectively, outperforms a conventional round-robin approach. We develop a model for clients, servers and the market, and present simulation results. Factors which are studied include the amount of load in the system, the number of resources, different degrees of resource heterogeneity, and communication delays. Copyright © 2004 John Wiley & Sons, Ltd.
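One of the two market mechanisms the abstract names, the continuous double auction, can be sketched in a few lines. This is our own minimal illustration (the matching rule and names are assumptions, not taken from the paper): buy bids and sell asks arrive continuously, and whenever the best bid meets or exceeds the best ask a trade clears, here at the ask price.

```python
import heapq

class DoubleAuction:
    """Minimal continuous double auction: match best bid against best ask."""

    def __init__(self):
        self._bids = []   # max-heap of buy orders, via negated price
        self._asks = []   # min-heap of sell orders

    def submit(self, side, price, client):
        """Add an order and return any trades it triggers as (buyer, seller, price)."""
        if side == "buy":
            heapq.heappush(self._bids, (-price, client))
        else:
            heapq.heappush(self._asks, (price, client))
        return self._match()

    def _match(self):
        trades = []
        # clear while the highest bid covers the lowest ask
        while self._bids and self._asks and -self._bids[0][0] >= self._asks[0][0]:
            _, buyer = heapq.heappop(self._bids)
            ask, seller = heapq.heappop(self._asks)
            trades.append((buyer, seller, ask))
        return trades
```

For instance, after a server offers capacity at price 5.0, a client bidding 6.0 is matched immediately and the trade clears at 5.0.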

Article ID: CPE821

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title EasyGrid: towards a framework for the automatic Grid enabling of legacy MPI applications
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.821
Article ID CPE821
Author Name(s) Vinod E. F Rebello1
Author Email(s) vinod@ic.uff.br1
Affiliation(s) Instituto de Computação, Universidade Federal Fluminense, Bloco E, 3º Andar, Rua Passo da Pátria 156, São Domingos, Niterói, CEP 24210-240 1
Keyword(s) Grid computing, system-aware applications, Grid middleware, MPI programs, task scheduling,
Abstract
One of the goals of the Grid is to aggregate collections of shared, heterogeneous, and distributed resources to provide computational power to parallel applications. However, designing applications capable of exploiting this potential with ease remains a challenge. This paper outlines the EasyGrid methodology for the efficient and robust execution of (legacy) MPI programs across distributed computing clusters. The principal objective of this work is to identify the application-oriented middleware necessary for, as well as to develop a framework to automatically generate, system-aware applications capable of executing in dynamic, unstable, distributed environments such as computational Grids. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE823

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Global multimedia collaboration system
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.823
Article ID CPE823
Author Name(s) Wenjun Wu1Geoffrey Fox2Ahmet Uyar3Hasan Bulut4Shrideep Pallickara5
Author Email(s) wewu@indiana.edu1
Affiliation(s) Community Grids Laboratory, Indiana University, Indiana University Research Park, 501 North Morton Street, Suite 222, Bloomington, IN 47404, U.S.A. 1 2 3 4 5
Keyword(s) collaboration, Web services, XGSP, Admire, NaradaBrokering,
Abstract
In order to build an integrated collaboration system over heterogeneous collaboration technologies, we propose a global multimedia collaboration system (Global-MMCS) based on the XGSP A/V Web-Services framework. This system can integrate multiple A/V services, and support various collaboration clients and communities. The prototype is now being developed and deployed across many universities in the U.S.A. and China. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE824

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title InteGrade: object-oriented Grid middleware leveraging the idle computing power of desktop machines
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.824
Article ID CPE824
Author Name(s) Fabio Kon1Andrei Goldchleger2Alfredo Goldman3Marcelo Finger4Germano Capistrano Bezerra5
Author Email(s) kon@ime.usp.br1
Affiliation(s) Department of Computer Science, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, SP, Brazil 1 2 3 4 5
Keyword(s) Grid computing, distributed object systems, CORBA, usage pattern analysis,
Abstract
Grid computing technology improves the computing experiences at organizations by effectively integrating distributed computing resources. However, just a small fraction of currently available Grid infrastructures focuses on reutilization of existing commodity computing resources. This paper introduces InteGrade, a novel object-oriented middleware Grid infrastructure that focuses on leveraging the idle computing power of shared desktop machines. Its features include support for a wide range of parallel applications and mechanisms to assure that the owners of shared resources do not perceive any loss in the quality of service. A prototype implementation is under construction and the current version is available for download. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE825

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A model for parallel job scheduling on dynamical computer Grids
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.825
Article ID CPE825
Author Name(s) Alfredo Goldman1Carlos Queiroz2
Author Email(s) gold@ime.usp.br1
Affiliation(s) Department of Computer Science, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, SP, Brazil 1 2
Keyword(s) Grid computing, multi-site execution, scheduling of parallel applications,
Abstract
This work presents a model that allows the execution of parallel applications in a Grid environment. Our main focus is on how to share the idle cycles of clusters and computers to execute real parallel applications. We present a new model which introduces the notions of locality and adaptability. The locality is used for job allocation, and for job migration. The adaptability provides a simple mechanism to allow clusters to join or leave a Grid. We also propose the middleware architecture to implement the model, and provide some simulation results to show the main characteristics of the model. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE827

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Fine-grained authorization for job execution in the Grid: design and implementation
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.827
Article ID CPE827
Author Name(s) K. Keahey1V. Welch2S. Lang3B. Liu4S. Meder 5
Author Email(s) keahey@mcs.anl.gov1
Affiliation(s) Mathematics and Computer Science Division, Argonne National Laboratory, Building 221, 9700 South Cass Avenue, Argonne, IL 60439, U.S.A. 1University of Chicago, 5801 South Ellis Avenue, Chicago, IL, U.S.A. 2 3 University of Houston, 4800 Calhoun Road, Houston, TX, U.S.A. 4 5
Keyword(s) Grids, authorization, policy enforcement, resource management,
Abstract
In this paper, we describe our work on enabling fine-grained authorization for resource usage and management. We address the need of virtual organizations to enforce their own policies in addition to those of the resource owners, with regard to both resource consumption and job management. To implement this design, we propose changes and extensions to the Globus Toolkit's version 2 resource management mechanism. We describe the prototype and policy language that we have designed to express fine-grained policies and present an analysis of our solution. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE828

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A resource management framework for interactive Grids
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.828
Article ID CPE828
Author Name(s) Raj Kumar1Vanish Talwar2Sujoy Basu3
Author Email(s) raj.kumar@hp.com1
Affiliation(s) Hewlett-Packard Labs, 1501 Page Mill Road, MS 1181, Palo Alto, CA 94304, U.S.A. 1 2 3
Keyword(s) Grid computing, interactive sessions, resource management, agents,
Abstract
Traditional use of Grid computing systems has been for batch jobs in scientific and academic computing. We envision the next generation of Grid computing systems supporting graphical interactive sessions. In this paper, we propose a resource management framework for supporting graphical interactive sessions in a Grid computing system. We describe the high-level architectural resource management framework distributed among the submission node, central scheduler node, and the execution node. We then describe in detail the resource management framework on the execution node. The description of the resource management framework on the scheduler node is kept at a high level in this paper. The framework on execution nodes consists of Resource Management Agents, an Admission Control system and an Application Predictor system. The agents on the execution node are Startup Agents, Sensor Agents, Monitoring Agents, Aggregator Agents, Enforcement Agents and Registration Agents. The Session Admission Control system is responsible for determining if a new application session can be admitted to the execution node. An Application Predictor system is responsible for predicting the resource utilization behavior of applications based on data obtained from the Resource Management Agents. The proposed framework allows for the implementation of a scalable and extensible middleware for interactive Grid resource management. It supports fine-grained performance guarantees specified in service level agreements and brings forth some important and novel contributions to enable graphical interactive sessions on Grids. Copyright © 2004 John Wiley & Sons, Ltd.
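The per-node admission decision the abstract describes can be reduced to a capacity check against predicted session demand. The following is a hypothetical sketch (class and parameter names are ours, not from the paper): a session is admitted only if its predicted CPU and memory use fit within the node's remaining capacity, which is then reserved.

```python
class AdmissionControl:
    """Toy session admission control for one execution node."""

    def __init__(self, cpu_capacity, mem_capacity_mb):
        self.cpu_free = cpu_capacity        # fraction of CPU still unreserved
        self.mem_free = mem_capacity_mb     # MB of memory still unreserved

    def admit(self, predicted_cpu, predicted_mem_mb):
        """Admit the session and reserve capacity, or reject it."""
        if predicted_cpu <= self.cpu_free and predicted_mem_mb <= self.mem_free:
            self.cpu_free -= predicted_cpu
            self.mem_free -= predicted_mem_mb
            return True
        return False
```

In a full framework, the predicted values would come from the Application Predictor system rather than being supplied by the caller.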

Article ID: CPE829

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title GridSphere: a portal framework for building collaborations
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.829
Article ID CPE829
Author Name(s) Oliver Wehrens1Jason Novotny2Michael Russell3
Author Email(s) wehrens@aei.mpg.de1
Affiliation(s) Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, Am Mühlenberg 1, D-14476 Golm, Germany 1 2 3
Keyword(s) Grid computing, portals, portlets,
Abstract
Grid-enabled portals are becoming increasingly popular as a platform for providing access to Grid services and resources. Unfortunately, much of the work done in portal development has led to vertically layered solutions that work for a particular project but are difficult to extend or reuse for other projects. The GridSphere portal framework seeks to address these limitations by providing a framework that will offer external developers a model for easily adding new functionality and hence increasing community collaboration. The GridLab portal will serve as an initial prototype to showcase the GridSphere framework and provide access to services being developed within the GridLab project. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE830

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title CoDIMS: an adaptable middleware system for scientific visualization in Grids
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.830
Article ID CPE830
Author Name(s) G. Giraldi1F. Porto2J. C. Oliveira3R. Silva4B. Schulze5
Author Email(s) gilson@lncc.br1
Affiliation(s) National Laboratory for Scientific Computing, Department of Computer Science, Av. Getulio Vargas 333, 25651-075, Petrópolis, RJ, Brazil 1Department of Systems Engineering, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil 2 3 4 5
Keyword(s) Grid computing, middleware, scientific visualization, particle tracing,
Abstract
In this paper we propose a middleware infrastructure adapted for supporting scientific visualization applications over a Grid environment. We instantiate a middleware system from CoDIMS, which is an environment for the generation of configurable data integration middleware systems. The CoDIMS adaptive architecture is based on the integration of special components managed by a control module that executes users' workflows. We exemplify our proposal with a middleware system generated for computing particles' trajectories within the constraints imposed by a Grid environment. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE831

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title An asynchronous middleware for Grid resource monitoring
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.831
Article ID CPE831
Author Name(s) V. Quéma1R. Lachaize2E. Cecchet3
Author Email(s) vivien.quema@inrialpes.fr1
Affiliation(s) INRIA Rhône-Alpes, 655 avenue de l'Europe, 38334 Saint-Ismier Cedex, France 1Sardes Project LSR-IMAG - INRIA, 655, avenue de l'Europe, 38334 Saint-Ismier Cedex, France 2 3
Keyword(s) Grid computing, resource monitoring, asynchronous middleware, components,
Abstract
Resource management in a Grid computing environment raises several technical issues. The monitoring infrastructure must be scalable, flexible, configurable and adaptable to support thousands of devices in a highly dynamic environment where operational conditions are constantly changing. We propose to address these challenges by combining asynchronous communications with reflective component-based technologies. We introduce DREAM (Dynamic REflective Asynchronous Middleware), a Java component-based message-oriented middleware. Asynchronous communications are used to achieve the scalability and flexibility objectives, whereas the reflective component technology provides the complementary configurability and adaptability features. We argue that this infrastructure is a viable approach to building a resource monitoring infrastructure for Grid computing. Moreover, DREAM makes the monitoring logic accessible from J2EE application servers. This allows the monitoring information to be presented as a Grid service and to be integrated within the Open Grid Services Architecture. Copyright © 2004 John Wiley & Sons, Ltd.
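The asynchronous, message-oriented style the abstract advocates decouples monitored resources from the monitor: producers publish readings without blocking on the consumer. This is our own minimal illustration of that pattern (it does not use DREAM's API, which is Java-based):

```python
import queue
import threading

events = queue.Queue()  # the asynchronous channel between sensors and monitor

def sensor(name, readings):
    """A monitored resource: publish readings fire-and-forget."""
    for value in readings:
        events.put((name, value))

def aggregate(expected):
    """The monitor: drain a known number of events, keep the latest per resource."""
    latest = {}
    for _ in range(expected):
        name, value = events.get()
        latest[name] = value
    return latest

# a sensor thread publishes two CPU-load readings, then exits
t = threading.Thread(target=sensor, args=("cpu", [0.3, 0.7]))
t.start()
t.join()
```

The sensor never waits for the aggregator; the queue absorbs bursts, which is the property that makes the asynchronous design scale to many monitored devices.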

Article ID: CPE832

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Grid computing and active services
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.832
Article ID CPE832
Author Name(s) B. Schulze1E. R. Madeira2
Author Email(s) schulze@lncc.br1
Affiliation(s) National Laboratory for Scientific Computing LNCC, Department of Computer Science, Av. Getúlio Vargas 333, 25651-070 Petrópolis, RJ, Brazil 1Institute of Computing/State University of Campinas - UNICAMP, P.O. Box 6176, 13083-970 Campinas, SP, Brazil 2
Keyword(s) middleware, Grid computing, agents, Web services,
Abstract
Recent activities in Grid computing point in the direction of an open service architecture merging computational infrastructure with Grid services and Web services. We also see advantages in adding facilities offered by distributed object computing environments, with their interoperable references and services. The use of open service architectures introduces the possibility of composing structures of nested services. In this paper we discuss architectural aspects of middleware for Grid computing based on an infrastructure of distributed clusters and/or services, and an access portal as a demonstration project in progress. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE833

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Programming and coordinating Grid environments and applications
Volume ID 16
Issue ID 5
Date April 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.833
Article ID CPE833
Author Name(s) Cristina Ururahy1Noemi Rodriguez2
Author Email(s) ururahy@inf.puc-rio.br1
Affiliation(s) Computer Science Department, PUC-Rio, Rua Marquês de São Vicente 225, 22453-900 Rio de Janeiro, RJ, Brazil 1Computer Science Department, PUC-Rio, Rua Marquês de São Vicente 225, 22453-900 Rio de Janeiro, RJ, Brazil 2
Keyword(s) distributed systems, parallel applications, Grid computing, interpreted languages, event-driven communication, middleware, coordination, dynamic adaptation,
Abstract
The heterogeneous and dynamic nature of Grid environments places new demands on models and paradigms for parallel programming. In this work we discuss how ALua, a programming system based on a dual programming language model, can help the programmer to develop applications for this environment, monitoring the state of resources and controlling the application so that it adapts to changes in this state. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE762

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel program debugging by specification
Volume ID 16
Issue ID 6
Date May 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.762
Article ID CPE762
Author Name(s) Simon Huband1Chris McDonald2
Author Email(s) huey@csse.uwa.edu.au1
Affiliation(s) School of Computer Science and Software Engineering, The University of Western Australia, Crawley, Western Australia 6009, Australia 1 2
Keyword(s) Message Passing Interface (MPI), debugging, process topologies, conformance testing,
Abstract
Most message passing parallel programs employ logical process topologies with regular characteristics to support their computation. Since process topologies define the relationship between processes, they present an excellent opportunity for debugging. The primary benefit is that process behaviours can be correlated, allowing expected behaviour to be abstracted and identified, and undesirable behaviour reported. However, topology support is inadequate in most message passing parallel programming environments, including the popular Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM). Programmers are forced to implement topology support themselves, increasing the possibility of introducing errors. This paper proposes a trace- and topology-based approach to parallel program debugging, driven by four distinct types of specifications. Trace specifications allow trace data from a variety of sources and message passing libraries to be interpreted in an abstract manner, and topology specifications address the lack of explicit topology knowledge, whilst also facilitating the construction of user-consistent views of the debugging activity. Loop specifications express topology-consistent patterns of expected trace events, allowing conformance testing of associated trace data, and error specifications specify undesirable event interactions, including mismatched message sizes and mismatched communication pairs. Both loop and error specifications are simplified by having knowledge of the actual topologies being debugged. The proposed debugging framework enables a wealth of potential debugging views and techniques. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE764

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A parallel Broyden approach to the Toeplitz inverse eigenproblem
Volume ID 16
Issue ID 6
Date May 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.764
Article ID CPE764
Author Name(s) Jesús Peinado1Antonio M. Vidal2
Author Email(s) jpeinado@dsic.upv.es1
Affiliation(s) Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia 46071, Spain 1 2
Keyword(s) inverse eigenvalue problem, real symmetric Toeplitz matrices, parallel algorithms, Broyden method, SCALAPACK,
Abstract
In this work we present a portable sequential and a portable parallel algorithm for solving the inverse eigenproblem for real symmetric Toeplitz matrices. Both algorithms are based on Broyden's method for solving nonlinear systems. We reduced the computational cost for some problem sizes and, furthermore, managed to reduce the spatial cost considerably, compared in both cases with parallel algorithms previously proposed by other authors and by us, although quasi-Newton methods (such as Broyden's) do not always reach convergence in all the test cases. We have implemented the parallel algorithm using the parallel numerical linear algebra library SCALAPACK, based on the MPI environment. Experimental results have been obtained using two different architectures: a shared-memory multiprocessor, the SGI PowerChallenge, and a cluster of Pentium II PCs connected through a Myrinet network. The algorithms obtained are scalable in all cases. Copyright © 2004 John Wiley & Sons, Ltd.
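Broyden's method, on which both of the paper's algorithms rest, avoids recomputing the Jacobian by updating an approximate inverse Jacobian from successive steps. The sketch below is a generic "good Broyden" iteration for F(x) = 0 in pure Python (our own illustration, unrelated to the authors' SCALAPACK implementation), using the Sherman-Morrison rank-one update of the inverse:

```python
def broyden(F, x0, tol=1e-10, max_iter=50):
    """Solve F(x) = 0 by the 'good Broyden' quasi-Newton method."""
    n = len(x0)
    x = list(x0)
    # start from the identity as the approximate inverse Jacobian H
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    fx = F(x)
    for _ in range(max_iter):
        # quasi-Newton step: dx = -H @ F(x)
        dx = [-sum(H[i][k] * fx[k] for k in range(n)) for i in range(n)]
        x_new = [x[i] + dx[i] for i in range(n)]
        f_new = F(x_new)
        if max(abs(v) for v in f_new) < tol:
            return x_new
        df = [f_new[i] - fx[i] for i in range(n)]
        # Sherman-Morrison rank-one update:
        # H += (dx - H df) (dx^T H) / (dx^T H df)
        Hdf = [sum(H[i][k] * df[k] for k in range(n)) for i in range(n)]
        dxH = [sum(dx[k] * H[k][j] for k in range(n)) for j in range(n)]
        denom = sum(dx[k] * Hdf[k] for k in range(n))
        for i in range(n):
            for j in range(n):
                H[i][j] += (dx[i] - Hdf[i]) * dxH[j] / denom
        x, fx = x_new, f_new
    return x  # may not have converged, as the abstract cautions
```

In one dimension this reduces to the secant method; for example, solving x² = 2 from x₀ = 1.5 converges to √2 in a handful of iterations.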

Article ID: CPE765

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title User transparency: a fully sequential programming model for efficient data parallel image processing
Volume ID 16
Issue ID 6
Date May 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.765
Article ID CPE765
Author Name(s) F. J. Seinstra1D. Koelma2
Author Email(s) fjseins@science.uva.nl1
Affiliation(s) Intelligent Sensory Information Systems, Faculty of Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands 1 2
Keyword(s) data parallel image processing, software architecture design, performance evaluation,
Abstract
Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high-performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high-performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD-style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well-known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi-baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE797

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Formal Techniques for Java-like Programs
Volume ID 16
Issue ID 7
Date June 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.797
Article ID CPE797
Author Name(s) . .1
Author Email(s)
Affiliation(s) 1
Keyword(s) .,
Abstract
No abstract

Article ID: CPE798

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Simple verification technique for complex Java bytecode subroutines
Volume ID 16
Issue ID 7
Date June 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.798
Article ID CPE798
Author Name(s) Alessandro Coglio1
Author Email(s) coglio@kestrel.edu1
Affiliation(s) Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304, U.S.A. 1
Keyword(s) Java, subroutines, bytecode verification,
Abstract
Java is normally compiled to bytecode, which is verified and then executed by the Java Virtual Machine. Bytecode produced via compilation must pass verification. The main cause of complexity for bytecode verification is subroutines, used by compilers to generate more compact code. The techniques to verify subroutines proposed in the literature reject certain programs produced by mundane compilers, are difficult to realize within an implementation of the Java Virtual Machine or are relatively complicated. This paper presents a novel technique which is very simple to understand, implement and prove sound. It is also very powerful: the set of accepted programs has a simple characterization which most likely includes all the code produced by current compilers and which enables future compilers to make more extensive use of subroutines. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE800

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Analysing the Java package/access concepts in Isabelle/HOL
Volume ID 16
Issue ID 7
Date June 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.800
Article ID CPE800
Author Name(s) Norbert Schirmer1
Author Email(s) schirmer@in.tum.de1
Affiliation(s) Technische Universität München, Department of Informatics, D-85748 Garching, Germany 1
Keyword(s) Java packages, Java access modifiers, theorem proving, type safety,
Abstract
Java access modifiers and packages provide a mechanism to restrict access to members and types, as an additional means of information hiding beyond the purely object-oriented concept of classes. In this paper we clarify the semantics of access modifiers and packages by adding them to our formal model of Java in the theorem prover Isabelle/HOL. We analyse which properties we can rely on at runtime, provided that the program has passed the static accessibility tests. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE801

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Transposing F to C#: expressivity of parametric polymorphism in an object-oriented language
Volume ID 16
Issue ID 7
Date June 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.801
Article ID CPE801
Author Name(s) Andrew Kennedy1Don Syme2
Author Email(s) akenn@microsoft.com1
Affiliation(s) Microsoft Research, 7 J. J. Thomson Avenue, Cambridge CB3 0FB, U.K. 1 2
Keyword(s) polymorphism, Java, C#, System F, closure conversion,
Abstract
We present a type-preserving translation of System F (the polymorphic lambda calculus) into a forthcoming revision of the C# programming language supporting parameterized classes and polymorphic methods. The forthcoming revision of Java in JDK 1.5 also makes a suitable target. We formalize the translation using a subset of C# similar to Featherweight Java. We prove that the translation is fully type-preserving and that it preserves behaviour via the novel use of environment-style semantics for System F. We observe that whilst parameterized classes alone are sufficient to encode the parameterized datatypes and let-polymorphism of languages such as ML and Haskell, it is the presence of dynamic dispatch for polymorphic methods that supports the encoding of the first-class polymorphism found in System F and recent extensions to ML and Haskell. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE799

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Checking ownership and confinement
Volume ID 16
Issue ID 7
Date 06 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.799
Article ID CPE799
Author Name(s) James Noble1Alex Potanin2Robert Biddle3
Author Email(s) kjx@mcs.vuw.ac.nz1
Affiliation(s) School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand 1 2 3
Keyword(s) aliasing, uniqueness, ownership, confinement, object-oriented programming,
Abstract
A number of proposals to manage aliasing in Java-like programming languages have been advanced over the last five years. It is not clear how practical these proposals are, that is, how well they relate to the kinds of programs currently written in Java-like languages. To address this problem, we analysed heap snapshots from a corpus of Java programs. Our results indicate that object-oriented programs do in fact exhibit symptoms of encapsulation in practice, and that proposed models of uniqueness, ownership, and confinement can usefully describe the aliasing structures of object-oriented programs. Understanding the kinds of aliasing present in programs should help us to design formalisms to make explicit the kinds of aliasing implicit in object-oriented programs. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE789

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A parallel algorithm for static slicing of concurrent programs
Volume ID 16
Issue ID 8
Date 07 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.789
Article ID CPE789
Author Name(s) D. Goswami1R. Mall2
Author Email(s) dgoswami@iitg.ernet.in1
Affiliation(s) Department of Computer Science and Engineering, IIT Guwahati, India 1Department of Computer Science and Engineering, IIT Kharagpur, India 2
Keyword(s) program slicing, static slicing, process network, program representation, parallel algorithm, concurrent program,
Abstract
Slicing of concurrent programs is a compute-intensive task. To speed up the slicing process, we have developed a parallel algorithm. For this purpose we used the concurrent control flow graph (CCFG) as the intermediate representation. We used a network of communicating processes to develop our parallel algorithm. We have implemented our parallel algorithm and the experimental results appear promising. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE791

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms
Volume ID 16
Issue ID 8
Date 07 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.791
Article ID CPE791
Author Name(s) F. Desprez1F. Suter2
Author Email(s) frederic.desprez@ens-lyon.fr1
Affiliation(s) LIP, UMR CNRS, ENS Lyon, INRIA 5668, 46 allée d'Italie, F-69364 Lyon Cedex 07, France 1GRAIL, UCSD, 9500 Gilman Drive, MC 0114, La Jolla, CA 92093-0114, U.S.A. 2
Keyword(s) mixed-parallelism, Grid computing, Strassen, Winograd, ScaLAPACK,
Abstract
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)-Server(s) model, where data can already be distributed on the platform. For each of those algorithms, we propose two mixed-parallel implementations. The former follows the phases of the original algorithms while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about heterogeneous and recursive versions of our algorithms. Copyright © 2004 John Wiley & Sons, Ltd.
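For reference, the algorithmic core being parallelized is Strassen's seven-multiplication scheme; a minimal sequential Java sketch of the 2×2 base case (class name invented, the mixed-parallel versions of the paper operate on distributed blocks instead of scalars):

```java
// Strassen's scheme: C = A * B with 7 multiplications instead of 8.
class Strassen2x2 {
    // returns {c11, c12, c21, c22}
    static double[] multiply(double a11, double a12, double a21, double a22,
                             double b11, double b12, double b21, double b22) {
        double m1 = (a11 + a22) * (b11 + b22);
        double m2 = (a21 + a22) * b11;
        double m3 = a11 * (b12 - b22);
        double m4 = a22 * (b21 - b11);
        double m5 = (a11 + a12) * b22;
        double m6 = (a21 - a11) * (b11 + b12);
        double m7 = (a12 - a22) * (b21 + b22);
        return new double[] { m1 + m4 - m5 + m7, m3 + m5,
                              m2 + m4,           m1 - m2 + m3 + m6 };
    }
}
```

The seven independent products m1..m7 are exactly the task-parallel phase the paper schedules, while each product is itself a (recursive or data-parallel) matrix multiplication.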

Article ID: CPE793

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Data structures in Java for matrix computations
Volume ID 16
Issue ID 8
Date 07 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.793
Article ID CPE793
Author Name(s) Geir Gundersen1Trond Steihaug2
Author Email(s) trond.steihaug@ii.uib.no2
Affiliation(s) Department of Informatics, University of Bergen, Norway 1 2
Keyword(s) Java, linear algebra, data structures, sparse matrices,
Abstract
In this paper we show how to utilize Java's native arrays for matrix computations. The disadvantages of Java arrays used as a 2D array for dense matrix computation are discussed and ways to improve the performance are examined. We show how to create efficient dynamic data structures for sparse matrix computations using Java's native arrays. This data structure is unique for Java and shown to be more dynamic and efficient than the traditional storage schemes for large sparse matrices. Numerical testing indicates that this new data structure, called Java Sparse Array, is competitive with the traditional Compressed Row Storage scheme on matrix computation routines. Java gives increased flexibility without losing efficiency. Compared with other object-oriented data structures Java Sparse Array is shown to have the same flexibility. Copyright © 2004 John Wiley & Sons, Ltd.
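The idea behind the Java Sparse Array layout can be sketched as follows (a minimal illustration with invented names, not the authors' code): each row owns its own pair of native arrays, so one row can be grown or replaced without touching the rest of the matrix, unlike the three monolithic arrays of Compressed Row Storage.

```java
// Minimal Java Sparse Array sketch: an array of rows, each row a pair
// of native arrays (column indices and values of its nonzeros).
class SparseMatrix {
    final int n;             // number of rows
    final int[][] index;     // index[i]: column indices of row i's nonzeros
    final double[][] value;  // value[i]: the corresponding values

    SparseMatrix(int n, int[][] index, double[][] value) {
        this.n = n; this.index = index; this.value = value;
    }

    // y = A * x, touching only the stored nonzeros
    double[] multiply(double[] x) {
        double[] y = new double[n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < index[i].length; k++)
                y[i] += value[i][k] * x[index[i][k]];
        return y;
    }
}
```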

Article ID: CPE787

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Design and evaluation of a TOP100 Linux Super Cluster system
Volume ID 16
Issue ID 8
Date 07 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.787
Article ID CPE787
Author Name(s) Erik Elmroth1Niklas Edmundsson2Bo Kågström3Markus Mårtensson4Mats Nylén5Åke Sandgren6Mattias Wadenstein7
Author Email(s) elmroth@cs.umu.se1
Affiliation(s) High Performance Computing Center North (HPC2N), Umeå University, SE-901 87 Umeå, Sweden 1 2 3 4 5 6 7
Keyword(s) Linux cluster, performance, benchmark, STREAM, Pallas, HPC2N, HP Linpack, NAS, AMD, SCI network, MPI, TOP500,
Abstract
The High Performance Computing Center North (HPC2N) Super Cluster is a truly self-made high-performance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a high-bandwidth, low-latency SCI network. This contribution describes the hardware selected for the system, the work needed to build it, important software issues and an extensive performance analysis. The performance is evaluated using a number of state-of-the-art benchmarks and software, including STREAM, Pallas MPI, the Atlas DGEMM, High-Performance Linpack and NAS Parallel benchmarks. Using these benchmarks we first determine the raw memory bandwidth and network characteristics; the practical peak performance of a single CPU, a single dual-node and the complete 240-processor system; and investigate the parallel performance for non-optimized dusty-deck Fortran applications. In summary, this $500 000 system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture. According to the TOP500 list of June 2002, this cluster was the 94th fastest computer in the world. It is now fully operational and stable as the main computing facility at HPC2N. The system's utilization figures exceed 90%, i.e. all 240 processors are on average utilized over 90% of the time, 24 hours a day, seven days a week. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE810

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Real-time primer design for DNA chips
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.810
Article ID CPE810
Author Name(s) H. Simmler1H. Singpiel2R. Männer3
Author Email(s) harald.simmler@acconovis.com1
Affiliation(s) Acconovis GmbH, Lindenhofstr. 42-44, 68163 Mannheim, Germany 1 2 University of Mannheim, B6, 23-29, 68131 Mannheim, Germany 3
Keyword(s) bioinformatics, primer design, microarray, DNA chips, FPGA processor, parallel architecture,
Abstract
The design of PCR or DNA chip experiments is a time-consuming process where bioinformatics is extensively used. The selection of the primers, which are immobilized on the DNA chip, requires a complex algorithm. Based on several parameters an optimized set of primers is automatically determined for a given gene sequence. This paper describes a parallel architecture which performs the optimization of the primer selection on a hardware accelerator. In contrast to the pure software approach, the parallel architecture gains a speedup of factor 500 using a PCI-based hardware accelerator. This approach allows an optimization of a specified primer set in real time. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE811

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Development and implementation of a parallel algorithm for the fast design of oligonucleotide probe sets for diagnostic DNA microarrays
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.811
Article ID CPE811
Author Name(s) H. Meier1A. Krause2M. Kräutner3A. Bode4
Author Email(s) meierh@in.tum.de1
Affiliation(s) Lehrstuhl fuer Rechnertechnik und Rechnerorganisation, Parallelrechnerarchitektur (LRR-TUM), Institut fuer Informatik, Technische Universitaet Muenchen, Boltzmannstrasse 3, 85748 Garching, Germany 1 2 3 4
Keyword(s) DNA microarray, bioinformatic algorithm, molecular sequence analysis, parallel and distributed computing, oligonucleotide probe, combinatorial, DNA chip,
Abstract
We describe an accurate method for the automatic parallel generation of oligonucleotide probe sets for DNA microarrays. This approach includes a component for high-performance specificity evaluation of designed probes in large data sets. The three main algorithmic components of the method, namely probe preselection, hybridization prediction and probe selection are explained in detail. We introduce new combinatorial techniques for the efficient selection of probe sets of high differentiation capability even from sequence databases of conserved homologous genes. These techniques include the automatic generation of group specific probes as well as the design of excluding probes. A basic prototype has been implemented including a shared memory parallelization. Test runs have been performed on a multiprocessor personal computer with subsets of a small subunit ribosomal ribonucleic acid database, containing very conserved sequence data. The applicability of our program is pointed out by designing a set of oligonucleotide probes that shall allow a comprehensive parallel identification and differentiation of several groups of extremophilic prokaryotes by DNA microarray. The probe set is accessible via the Internet. On applying the parallel version on a dual processor system an efficiency of 80% was achieved. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE812

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A hybrid self-organizing maps and particle swarm optimization approach
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.812
Article ID CPE812
Author Name(s) Xiang Xiao1Ernst R. Dow2Russell Eberhart3Zina Ben Miled4Robert J. Oppelt5
Author Email(s) zmiled@iupui.edu4
Affiliation(s) Indiana University Purdue University Indianapolis, 723 West Michigan St., SL 160C, Indianapolis, IN 46202-5132, U.S.A. 1 2 3 4 5
Keyword(s) gene clustering, self-organizing maps, particle swarm optimization,
Abstract
Gene clustering, the process of grouping related genes in the same cluster, is at the foundation of different genomic studies that aim at analyzing the function of genes. Microarray technologies have made it possible to measure gene expression levels for thousands of genes simultaneously. For knowledge to be extracted from the datasets generated by these technologies, the datasets have to be presented to a scientist in a meaningful way. Gene clustering methods serve this purpose. In this paper, a hybrid clustering approach that is based on self-organizing maps and particle swarm optimization is proposed. In the proposed algorithm, the rate of convergence is improved by adding a conscience factor to the self-organizing maps algorithm. The robustness of the result is measured by using a resampling technique. The algorithm is implemented on a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd.
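The particle swarm half of the hybrid follows the canonical PSO update; a minimal Java sketch (names invented; the SOM component and the conscience factor of the paper are not shown):

```java
import java.util.Random;

// Canonical particle swarm optimization step for one particle:
// w is the inertia weight, c1/c2 the cognitive and social coefficients.
class Particle {
    static void step(double[] x, double[] v,
                     double[] pBest, double[] gBest,
                     double w, double c1, double c2, Random rnd) {
        for (int d = 0; d < x.length; d++) {
            v[d] = w * v[d]
                 + c1 * rnd.nextDouble() * (pBest[d] - x[d])   // pull toward own best
                 + c2 * rnd.nextDouble() * (gBest[d] - x[d]);  // pull toward swarm best
            x[d] += v[d];                                      // move the particle
        }
    }
}
```

In the gene-clustering setting, each particle position encodes a set of cluster prototypes and the fitness is a clustering quality measure over the expression profiles.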

Article ID: CPE813

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A specialized hardware device for the protein similarity search
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.813
Article ID CPE813
Author Name(s) A. Marongiu1P. Palazzari2V. Rosato3
Author Email(s) palazzari@casaccia.enea.it2
Affiliation(s) ENEA, Computing and Modeling Unit, C. R. Casaccia, Via Anguillarese 301, SP111 - 00060 S. Maria di Galeria, Rome, Italy 1 2 3
Keyword(s) biological sequence analysis, FPGA, systolic architecture, System of Affine Recurrence Equations,
Abstract
This work presents the architecture of PROSIDIS, a special purpose processor designed to search for the occurrence of substrings similar to a given 'template string' within a proteome. Similarity is determined by means of a weighting matrix which measures the degree of similarity between any two amino acids. The paper recalls the basis of the PHG (Parallel Hardware Generator) tool, developed in the framework of the HADES project (Hardware Design for Scientific Applications). Through PHG it is possible to design, in a nearly automatic way, a parallel hardware which efficiently implements an algorithm described through recurrence equations; PHG is able to generate a synthesizable VHDL which represents a parallel system derived from the System of Affine Recurrence Equations (SARE) describing the problem. In this work we illustrate the advantages derived from designing a special purpose processor to face the protein similarity discovery problem. Some preliminary results are given, reporting the time spent by several conventional computing architectures and by the PROSIDIS processor hosted by a personal computer to solve the same protein analysis problem. The results show that PROSIDIS, implemented on a Xilinx XV1000 FPGA, gives speedup figures ranging from 5.6 up to 55.6. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE814

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallelization of IBD computation for determining genetic disease maps
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.814
Article ID CPE814
Author Name(s) Nouhad J. Rizk1
Author Email(s) nrizk@ndu.edu.lb1
Affiliation(s) Computer Science Department, Notre Dame University, P.O. Box 72, Zouk-Mosbeh, Lebanon 1
Keyword(s) genetic linkage analysis, Identity by Descent, parallel computing,
Abstract
A number of software packages are available for the construction of comprehensive human genetic maps. In this paper we parallelize the widely used package Genehunter. We restrict our attention to only one function of the package, namely the computations of Identity By Descent (IBD) genes of a family. We use a master-slave model with the Message Passing Interface parallel environment. Our tests are done on two different architectures: a network of workstations and a shared memory multiprocessor. A new and efficient strategy to classify the parallelization of genetic linkage analysis programs results from our experiments. The classification is based on values of parameters which affect the complexity of the computation. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE815

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Development of distributed bioinformatics applications with GMP
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.815
Article ID CPE815
Author Name(s) Bertil Schmidt1Lin Feng2Amey Laud3Yusdi Santoso4
Author Email(s) asbschmidt@ntu.edu.sg1
Affiliation(s) School of Computer Engineering, Nanyang Technological University, Singapore 639798 1 2 Helixense Pte Ltd, 73 Science Park Drive, 02-05 Science Park 1, Singapore 118254 3 4
Keyword(s) message passing interfaces, parallel architectures for biological applications, computational genomics, bioinformatics systems development,
Abstract
We present the design and use of gMP as a tool for developing distributed bioinformatics applications. gMP is a purely Java-based interface that adds MPI-like message-passing and collective communication to the genomics Research Network Architecture (gRNA). The gRNA is a highly programmable, modular environment specifically designed to invigorate the development of genome-centric tools for life science research. We demonstrate the development of a distributed application to detect regulatory elements using correlation with gene expression data. Its implementation with gMP leads to significant runtime savings on our distributed gRNA system. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE816

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel divide and conquer approach for the protein threading problem
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.816
Article ID CPE816
Author Name(s) Nicola Yanev1Rumen Andonov2
Author Email(s) randonov@univ-valenciennes.fr2
Affiliation(s) University of Sofia, 5 J. Bouchier str., 1126 Sofia, Bulgaria 1LAMIH/ROI, University of Valenciennes, Le Mont Houy, 59313 Valenciennes Cedex 9, France 2
Keyword(s) protein threading problem formulations, network-flow optimization, integer programming,
Abstract
For the Protein Threading Problem (PTP) we propose a network-flow like formulation. We also show its equivalence with a generalization of the shortest path problem on a graph with a very particular structure. The underlying Mixed Integer Programming (MIP) model proves to be very appropriate for the PTP: large real-life instances have been solved much faster by using only the MIP solver CPLEX instead of a known special-purpose branch and bound algorithm. The properties of the MIP model permit a decomposition of the main problem into a large number of subproblems (tasks). We show that a branch and cut strategy can be efficiently applied for solving these tasks in a parallel manner, which leads to a significant reduction in the total running time. Computational experiments with large problem instances are presented. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE817

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The AxML program family for maximum likelihood-based phylogenetic tree inference
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.817
Article ID CPE817
Author Name(s) Alexandros P. Stamatakis1Thomas Ludwig2Harald Meier3
Author Email(s) stamatak@cs.tum.edu1
Affiliation(s) Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstr. 3, D-85748 Garching b. München, Germany 1Ruprecht-Karls-Universität Heidelberg, Institut für Informatik, Im Neuenheimer Feld 348, D-69120 Heidelberg, Germany 2 3
Keyword(s) phylogenetic trees, maximum likelihood, parallel and distributed computing,
Abstract
Inference of phylogenetic (evolutionary) trees comprising hundreds or thousands of organisms based on the maximum likelihood criterion is a computationally extremely intensive task. This paper describes the evolution of the AxML program family which provides novel algorithmic as well as technical solutions for the maximum likelihood-based inference of huge phylogenetic trees. Algorithmic optimizations and a new tree building algorithm yield runtime improvements of a factor greater than 4 compared with fastDNAml and parallel fastDNAml, returning equally good trees at the same time. Various parallel, distributed, and Grid-based implementations of AxML give the program the capability to acquire the large amount of required computational resources for the inference of huge high-quality trees. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE808

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Sequence alignment on the Cray MTA-2
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.808
Article ID CPE808
Author Name(s) Shahid H. Bokhari1Jon R. Sauer2
Author Email(s) shb@acm.org1
Affiliation(s) Department of Electrical Engineering, University of Engineering & Technology, Lahore 54890, Pakistan 1 Eagle Research & Development, 1898 S. Flatiron Court, Suite 128, Boulder, CO 80301, U.S.A. 2
Keyword(s) bioinformatics, sequence alignment, DNA, exact match, approximate matching, dynamic programming, multithreading, full/empty bits, parallel processing, MTA-2,
Abstract
Several variants of standard algorithms for DNA sequence alignment have been implemented on the Cray Multithreaded Architecture-2 (MTA-2). We describe the architecture of the MTA-2 and discuss how its hardware and software enable efficient implementation of parallel algorithms with little or no regard for issues of partitioning, mapping or scheduling. We describe how we ported variants of the naive algorithm for exact alignment and the dynamic programming algorithm for approximate alignment to the MTA-2 and provide detailed performance measurements. It is shown that, for the dynamic programming algorithm, the use of the MTA's Full/Empty synchronization bits leads to almost perfect speedup for large problems on one to eight processors. These results illustrate the versatility of the MTA's architecture and demonstrate its potential for providing a high-productivity platform for parallel processing. Copyright © 2004 John Wiley & Sons, Ltd.
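The dynamic programming algorithm referred to is the classic quadratic alignment recurrence; a sequential Java sketch for edit distance (the MTA version synchronizes the table's row dependences with full/empty bits, which plain Java cannot express, so this shows only the recurrence itself):

```java
// Classic O(m*n) dynamic programming for edit distance, kept to two
// rolling rows. cur[j] depends on prev[j-1], prev[j] and cur[j-1] --
// exactly the dependences the MTA parallelizes with full/empty bits.
class EditDistance {
    static int distance(String s, String t) {
        int m = s.length(), n = t.length();
        int[] prev = new int[n + 1], cur = new int[n + 1];
        for (int j = 0; j <= n; j++) prev[j] = j;       // base row
        for (int i = 1; i <= m; i++) {
            cur[0] = i;                                 // base column
            for (int j = 1; j <= n; j++) {
                int sub = prev[j - 1] + (s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;    // roll the rows
        }
        return prev[n];
    }
}
```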

Article ID: CPE809

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Using hybrid alignment for iterative sequence database searches
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.809
Article ID CPE809
Author Name(s) Ralf Bundschuh1Yuheng Li2Mario Lauria3
Author Email(s) bundschuh@mps.ohio-state.edu1
Affiliation(s) Department of Physics, The Ohio State University, 174 West 18th Ave., Columbus, OH 43210, U.S.A. 1Department of Computer and Information Science, The Ohio State University, 2015 Neil Ave. #395, Columbus, OH 43210, U.S.A. 2 3
Keyword(s) sequence alignment, hybrid algorithm, length correction, PSI-BLAST, MPI,
Abstract
Progressive sequence model refinement by means of iterative searches is an effective technique for high sensitivity database searches and is currently employed in popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. Here, we demonstrate that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. In addition, we evaluate the accuracy of two proposed approaches to edge effect correction in short sequence alignment statistics that turns out to be one of the crucial issues in developing a hybrid-alignment based version of PSI-BLAST. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE865

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: High Performance Computational Biology
Volume ID 16
Issue ID 9
Date 08 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.865
Article ID CPE865
Author Name(s) David A. Bader1Srinivas Aluru2
Author Email(s)
Affiliation(s) 1 2
Keyword(s)
Abstract
No abstract

Article ID: CPE792

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A PC cluster system employing IEEE 1394
Volume ID 16
Issue ID 10
Date 08 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.792
Article ID CPE792
Author Name(s) Kazuki Hyoudou1Ryota Ozaki2Yasuichi Nakayama3
Author Email(s) hyoudo-k@igo.cs.uec.ac.jp1
Affiliation(s) Department of Computer Science, The University of Electro-Communications, Chofu, Tokyo 182-8585, Japan 1 2 3
Keyword(s) cluster computing, IEEE 1394, communication libraries, performance evaluation,
Abstract
In this paper, we describe the design and evaluation of a PC cluster system in which IEEE 1394 is applied. Networks for parallel cluster computing require low latency and high bandwidth. It is also important that the networks be commercially available at low cost. Few network devices satisfy all of the above requirements. However, the IEEE 1394 standard provides a good compromise for fulfilling these requirements. We have used IEEE 1394 devices, which support a 400 Mbps data transfer rate, to connect the nodes of a PC cluster system which we have designed and implemented. We have implemented two communication libraries. One is a fast communication library called CF for IEEE 1394. The other is an MPI layer library on top of the CF library. Experimental results show that CF achieves a 17.2 microsecond round-trip time. On application benchmarks, the system was considerably faster than TCP/IP over Fast Ethernet. Even though the system was constructed at very low cost, it provides good performance. Using the IEEE 1394 standard is thus a good solution for low-cost cluster systems. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE794

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Linda implementations in Java for concurrent systems
Volume ID 16
Issue ID 10
Date 08 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.794
Article ID CPE794
Author Name(s) G. C. Wells1A. G. Chalmers2P. G. Clayton3
Author Email(s) g.wells@ru.ac.za1
Affiliation(s) Department of Computer Science, Rhodes University, Grahamstown 6140, South Africa 1Department of Computer Science, University of Bristol, Merchant Venturers Building, Bristol BS8 1UB, U.K. 2 3
Keyword(s) Linda, Java, associative matching,
Abstract
This paper surveys a number of the implementations of Linda that are available in Java. It provides some discussion of their strengths and weaknesses, and presents the results from benchmarking experiments using a network of commodity workstations. Some extensions to the original Linda programming model are also presented and discussed, together with examples of their application to parallel processing problems. Copyright © 2004 John Wiley & Sons, Ltd.
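For readers unfamiliar with the model, the Linda core that such Java systems implement can be sketched in a few lines (a toy, unoptimized version with invented names; real implementations add rd(), eval(), typed matching and distribution):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy Linda tuple space: out() deposits a tuple; in() removes the first
// tuple that associatively matches a template, blocking until one exists.
// A null field in the template acts as a wildcard (a "formal" in Linda).
class TupleSpace {
    private final List<Object[]> tuples = new ArrayList<>();

    public synchronized void out(Object... tuple) {
        tuples.add(tuple);
        notifyAll();                      // wake any blocked in() calls
    }

    public synchronized Object[] in(Object... template) {
        while (true) {
            for (Iterator<Object[]> it = tuples.iterator(); it.hasNext(); ) {
                Object[] t = it.next();
                if (matches(template, t)) { it.remove(); return t; }
            }
            try { wait(); }               // no match yet: block
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    private static boolean matches(Object[] template, Object[] t) {
        if (template.length != t.length) return false;
        for (int i = 0; i < template.length; i++)
            if (template[i] != null && !template[i].equals(t[i])) return false;
        return true;
    }
}
```

The linear scan in in() is exactly the associative-matching cost that the surveyed implementations optimize in different ways.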

Article ID: CPE795

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The J2EE ECperf benchmark results: transient trophies or technology treasures?
Volume ID 16
Issue ID 10
Date 08 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.795
Article ID CPE795
Author Name(s) Paul Brebner1Jeffrey Gosper2
Author Email(s) paul.brebner@csiro.au1
Affiliation(s) CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, Australia 1 2
Keyword(s) J2EE, ECperf, Enterprise Java Beans (EJB), benchmarks, performance, scalability, throughput estimation,
Abstract
ECperf, the widely recognized industry standard J2EE benchmark, has attracted a large number of results submissions and their subsequent publications. However, ECperf places little restriction on the hardware platforms, operating systems and databases utilized in the benchmarking process. This, combined with the existence of only two primary metrics, makes it difficult to accurately compare the performance of the Application Server products themselves. By mining the full-disclosure archives for trends and correlations, we have discovered that J2EE technology is very scalable both in a scale-up and scale-out manner. Other observed trends include a linear correlation between middle-tier total processing power and throughput, as well as between J2EE Application Server license costs and throughput. However, the results clearly indicate that there is an increasing cost per user with increasing capacity systems and scale-up is proportionately more expensive than scale-out. Finally, the correlation between middle-tier processing power and throughput, combined with results obtained from a different 'lighter-weight' benchmark, facilitates an estimate of throughput for different types of J2EE applications. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE796

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Performance
Article Title The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600
Volume ID 16
Issue ID 10
Date 08 25 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.796
Article ID CPE796
Author Name(s) Glenn R. Luecke1Silvia Spanoyannis2Marina Kraeva3
Author Email(s) grl@iastate.edu1
Affiliation(s) Iowa State University, Ames, IA 50011-2251, U.S.A. 1 2 3
Keyword(s) parallel computers, MPI-2 one-sided routines, SHMEM routines, SGI Origin 2000, Cray T3E-600, MPI-2 library, SHMEM library,
Abstract
This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns with low contention (accessing distant messages, a circular right shift, a binary tree broadcast) to communication patterns with high contention (a 'naive' broadcast and an all-to-all). For all the tests and for small message sizes, the SHMEM implementation significantly outperformed the MPI-2 implementation for both the SGI Origin 2000 and Cray T3E-600. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE804

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Augmenting LZ-77 with authentication and integrity assurance capabilities
Volume ID 16
Issue ID 11
Date 09 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.804
Article ID CPE804
Author Name(s) Mikhail J. Atallah1Stefano Lonardi2
Author Email(s) stelo@cs.ucr.edu2
Affiliation(s) CERIAS and Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, U.S.A. 1Department of Computer Science and Engineering, University of California, Riverside, CA 92521, U.S.A. 2
Keyword(s) authentication, integrity, data compression, LZ-77, fragile watermark,
Abstract
The formidable dissemination capability allowed by the current network technology makes it increasingly important to devise new methods to ensure authenticity and integrity. Nowadays it is common practice to distribute documents in compressed form. In this paper, we propose a simple variation on the classic LZ-77 algorithm that allows one to hide, within the compressed document, enough information to warrant its authenticity and integrity. The design is based on the unpredictability of a certain class of pseudo-random number generators, in such a way that the hidden data cannot be retrieved in a reasonable amount of time by an attacker (unless the secret bit-string key is known). Since it can still be decompressed by the original LZ-77 algorithm, the embedding is completely 'transparent' and backward-compatible, making it possible to deploy it without disrupting service. Experiments show that the degradation in compression due to the embedding is almost negligible. Copyright © 2004 John Wiley & Sons, Ltd.
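The embedding idea can be sketched as follows (a toy Java version with invented names; the actual scheme drives the choice with a keyed pseudo-random generator rather than a plain bit iterator): whenever the longest match occurs at more than one offset, the choice among the tied offsets is invisible to a standard LZ-77 decompressor, so each tie can carry one hidden bit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy LZ-77 emitting (offset, length, nextChar) triples. When several
// offsets give an equally long match, the selected offset encodes one
// hidden bit; a standard LZ-77 decoder still recovers the exact text.
class Lz77Embed {
    static List<int[]> compress(String s, Iterator<Integer> hiddenBits) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int bestLen = 0;
            List<Integer> tiedOffsets = new ArrayList<>();
            for (int j = 0; j < i; j++) {               // scan the window
                int len = 0;
                while (i + len < s.length() && j + len < i
                        && s.charAt(j + len) == s.charAt(i + len))
                    len++;
                if (len > bestLen) { bestLen = len; tiedOffsets.clear(); }
                if (len == bestLen && len > 0) tiedOffsets.add(i - j);
            }
            int offset = tiedOffsets.isEmpty() ? 0 : tiedOffsets.get(0);
            if (tiedOffsets.size() > 1 && hiddenBits.hasNext() && hiddenBits.next() == 1)
                offset = tiedOffsets.get(tiedOffsets.size() - 1);  // embed a 1 bit
            int next = i + bestLen < s.length() ? s.charAt(i + bestLen) : 0;
            out.add(new int[] { offset, bestLen, next });
            i += bestLen + 1;
        }
        return out;
    }

    static String decompress(List<int[]> triples) {      // plain LZ-77 decode
        StringBuilder sb = new StringBuilder();
        for (int[] t : triples) {
            int start = sb.length() - t[0];
            for (int k = 0; k < t[1]; k++) sb.append(sb.charAt(start + k));
            if (t[2] != 0) sb.append((char) t[2]);
        }
        return sb.toString();
    }
}
```

Whatever bit stream is embedded, decompression is unchanged, which is the backward compatibility the abstract refers to.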

Article ID: CPE806

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Access-controlled resource discovery in pervasive networks
Volume ID 16
Issue ID 11
Date 09 00 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.806
Article ID CPE806
Author Name(s) Sanjay Raman1Dwaine Clarke2Matt Burnside3Srinivas Devadas4Ronald Rivest5
Author Email(s) sraman@abp.lcs.mit.edu1
Affiliation(s) Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. 1 2 3 4 5
Keyword(s) access control, access control list, pervasive computing, intentional naming network, public key infrastructure,
Abstract
Networks of the future will be characterized by a variety of computational devices that display a level of dynamism not seen in traditional wired networks. Because of the dynamic nature of these networks, resource discovery is one of the fundamental problems that must be solved. While resource discovery systems are not a novel concept, securing these systems in an efficient and scalable way is challenging. This paper describes the design and implementation of an architecture for access-controlled resource discovery. This system achieves this goal by integrating access control with the Intentional Naming System (INS), a resource discovery and service location system. The integration is scalable, efficient, and fits well within a proxy-based security framework designed for dynamic networks. We provide performance experiments that show how our solution outperforms existing schemes. The result is a system that provides secure, access-controlled resource discovery that can scale to large numbers of resources and users. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE803

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Computer Security
Volume ID 16
Issue ID 11
Date Sep 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.803
Article ID CPE803
Author Name(s) Giampaolo Bella1Ronaldo Menezes2
Author Email(s)
Affiliation(s) 1 2
Keyword(s)
Abstract
No abstract

Article ID: CPE805

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Identification and authentication of integrated circuits
Volume ID 16
Issue ID 11
Date Sep 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.805
Article ID CPE805
Author Name(s) Blaise Gassend1Daihyun Lim2Dwaine Clarke3Srinivas Devadas4Marten van Dijk5
Author Email(s) blaise@gassend.com1
Affiliation(s) Research Lab for Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. 1 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. 2 3 4 Prof. Holstlaan 4, Eindhoven, The Netherlands 5
Keyword(s) authentication, identification, physical random function, physical security, smartcard, tamper resistance, unclonability,
Abstract
This paper describes a technique to reliably and securely identify individual integrated circuits (ICs) based on the precise measurement of circuit delays and a simple challenge-response protocol. This technique could be used to produce key-cards that are more difficult to clone than ones involving digital keys on the IC. We consider potential avenues of attack against our system, and present candidate implementations. Experiments on Field Programmable Gate Arrays show that the technique is viable, but that our current implementations may require some strengthening before they can be considered secure. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE807

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A role-based infrastructure management system: design and implementation
Volume ID 16
Issue ID 11
Date Sep 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.807
Article ID CPE807
Author Name(s) Dongwan Shin1Gail-Joon Ahn2Sangrae Cho3Seunghun Jin4
Author Email(s) doshin@uncc.edu1
Affiliation(s) Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223, U.S.A. 1 2 Department of Information Security System, Electronics and Telecommunications Research Institute, Taejon, 305-350, South Korea 3 4
Keyword(s) role-based access control, role management, role engineering, role administration, authorization infrastructure,
Abstract
Over the last decade there has been a tremendous advance in the theory and practice of role-based access control (RBAC). One of the most significant aspects of RBAC can be viewed from its management of permissions on the basis of roles rather than individual users. Consequently, it reduces administrative costs and potential errors. The management of roles in various RBAC implementations, however, tends to be conducted on an ad hoc basis, closely coupled with a certain context of system environments. This paper discusses the development of a system whose purpose is to help manage a valid set of roles with assigned users and permissions for role-based authorization infrastructures. We have designed and implemented the system, called RolePartner. This system enables role administrators to build and configure various components of a RBAC model so as to embody organizational access control policies which can be separated from different enforcement mechanisms. Hence the system helps make it possible to lay a foundation for role-based authorization infrastructures. Three methodological constituents are introduced for our purposes, together with the design and implementation issues. The system has a role-centric view for easily managing constrained and hierarchical roles as well as assigned users and permissions. An LDAP-accessible directory service was used for a role database. We show that the system can be seamlessly integrated with an existing privilege-based authorization infrastructure. Copyright © 2004 John Wiley & Sons, Ltd.
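The permission model the abstract describes can be condensed into a toy sketch (a minimal illustration of RBAC with hierarchical roles, not the RolePartner system; all names are invented): permissions attach to roles, users acquire roles, and a senior role inherits the permissions of the junior roles below it.

```python
class Rbac:
    """Toy role-based access control with a role hierarchy."""
    def __init__(self):
        self.user_roles = {}   # user -> set of directly assigned roles
        self.role_perms = {}   # role -> set of permissions granted to it
        self.junior = {}       # senior role -> set of junior roles it inherits

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def grant(self, role, perm):
        self.role_perms.setdefault(role, set()).add(perm)

    def inherit(self, senior, junior_role):
        self.junior.setdefault(senior, set()).add(junior_role)

    def _closure(self, roles):
        """All roles reachable downward through the hierarchy."""
        seen, stack = set(), list(roles)
        while stack:
            r = stack.pop()
            if r not in seen:
                seen.add(r)
                stack.extend(self.junior.get(r, ()))
        return seen

    def check(self, user, perm):
        """A user holds a permission if any inherited role grants it."""
        roles = self._closure(self.user_roles.get(user, set()))
        return any(perm in self.role_perms.get(r, ()) for r in roles)
```

Managing `user_roles`, `role_perms` and `junior` as data, separate from the `check` enforcement point, is the separation of policy from enforcement mechanism that the abstract emphasizes.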

Article ID: CPE834

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Performance
Article Title Parallel bandwidth characteristics calculations for thin avalanche photodiodes on a SGI Origin 2000 supercomputer
Volume ID 16
Issue ID 12
Date Oct 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.834
Article ID CPE834
Author Name(s) Yi Pan1Constantinos S. Ierotheou2Majeed M. Hayat3
Author Email(s) pan@cs.gsu.edu1
Affiliation(s) Department of Computer Science, Georgia State University, Atlanta, GA 30303, U.S.A. 1Parallel Processing Research Group, University of Greenwich, London SE10 9LS, U.K. 2Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM 87131-1356, U.S.A. 3
Keyword(s) numerical modeling, optimization, MPI, OpenMP, efficiency, speedup, scalability, SGI Origin 2000,
Abstract
No abstract

Article ID: CPE866

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Finding stale-value errors in concurrent programs
Volume ID 16
Issue ID 12
Date Oct 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.866
Article ID CPE866
Author Name(s) Michael Burrows1K. Rustan M. Leino2
Author Email(s) leino@microsoft.com2
Affiliation(s) Google Inc., Bayshore Parkway, Mountain View, CA 94043, U.S.A. 1Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A. 2
Keyword(s) static checking, data races, race conditions, concurrency errors, error checking, stale values,
Abstract
Concurrent programs can suffer from many types of errors, not just the well-studied problems of deadlocks and simple race conditions on variables. This paper addresses a kind of race condition that arises from reading a variable whose value is possibly out of date. The paper introduces a simple technique for detecting such stale values, and reports on the encouraging experience with a compile-time checker that uses the technique. Copyright © 2004 John Wiley & Sons, Ltd.
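The error pattern in question can be shown in a minimal sketch (an illustration of the bug class, not taken from the paper; the `Account` example is invented): a value is read under a lock, the lock is released, and the captured value is then used in a later critical section, by which time it may be stale.

```python
import threading

class Account:
    def __init__(self):
        self.lock = threading.Lock()
        self.balance = 100

    def withdraw_racy(self, amount):
        with self.lock:
            bal = self.balance           # value captured under the lock
        # lock released: another thread may change balance here,
        # so 'bal' is potentially stale from this point on
        if bal >= amount:
            with self.lock:
                self.balance = bal - amount   # writes back the stale value

    def withdraw_safe(self, amount):
        # read, check and write inside one critical section
        with self.lock:
            if self.balance >= amount:
                self.balance -= amount
```

Two concurrent `withdraw_racy` calls can both observe the old balance and overwrite each other's update; this is exactly the kind of out-of-date read a compile-time checker of the sort described can flag, even though no low-level data race on `balance` exists.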

Article ID: CPE802

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Efficient parallel implementations of near Delaunay triangulation with High Performance Fortran
Volume ID 16
Issue ID 12
Date Oct 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.802
Article ID CPE802
Author Name(s) Min-Bin Chen1Tyng-Ruey Chuang2Jan-Jan Wu3
Author Email(s) cmb@mail.ckitc.edu.tw1
Affiliation(s) Department of Management Information Systems, Chung Kuo Institute of Technology, Mucha, Taipei 116, Taiwan 1Institute of Information Science, Academia Sinica, Nankang, Taipei 115, Taiwan 2 3
Keyword(s) Delaunay triangulation, parallel computation,
Abstract
Unstructured mesh generation exposes highly irregular computation patterns, which poses a challenge for implementing triangulation algorithms on parallel machines. This paper reports on an efficient parallel implementation of near Delaunay triangulation with High Performance Fortran (HPF). Our algorithm exploits embarrassing parallelism by performing sub-block triangulation and boundary merge independently at the same time. The sub-block triangulation is a divide-and-conquer Delaunay algorithm known for its sequential efficiency, and the boundary triangulation is an incremental construction algorithm with low overhead. Compared with prior work, our parallelization method is both simple and efficient. In the paper, we also describe a solution to the collinear-points problem that usually arises in large data sets. Our experience with the HPF implementation shows that, with careful control of the data distribution, we are able to parallelize the program using HPF's standard directives and extrinsic procedures. Experimental results on several parallel platforms, including an IBM SP2 and a DEC Alpha farm, show that a parallel efficiency of 42-86% can be achieved on an eight-node distributed-memory system. We also compare the efficiency of the HPF implementation with that of a similarly hand-coded MPI implementation. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE903

Publisher John Wiley Sons, Ltd. Chichester, UK
Category Research Article
Article Title An order-based algorithm for implementing multiparty synchronization
Volume ID 16
Issue ID 12
Date Oct 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.903
Article ID CPE903
Author Name(s) José A. Pérez1Rafael Corchuelo2Miguel Toro3
Author Email(s) jperez@lsi.us.es1
Affiliation(s) The Distributed Group, University of Seville, Spain 1 2 3
Keyword(s) multiparty synchronization, coordination, concurrency control,
Abstract
Multiparty interactions are a powerful mechanism for coordinating several entities that need to cooperate in order to achieve a common goal. In this paper, we present an algorithm for implementing them that improves on previous results in that it does not require the whole set of entities or interactions to be known at compile- or run-time, and it can deal with both terminating and non-terminating systems. We also present a comprehensive simulation analysis that shows how sensitive to changes our algorithm is, and compare the results with well-known proposals by other authors. This study shows that our algorithm performs comparably to other proposals in which the set of entities and interactions is known beforehand, but outperforms them in some clearly identified situations. In addition, these results show that our algorithm can be combined with a technique called synchrony loosening without any loss of efficiency. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE754

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Supporting Bulk Synchronous Parallelism with a high-bandwidth optical interconnect
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.754
Article ID CPE754
Author Name(s) I. Gourlay1P. M. Dew2K. Djemame3J. F. Snowdon4G. Russell5
Author Email(s) iain@comp.leeds.ac.uk1
Affiliation(s) Informatics Research Institute, University of Leeds, Leeds LS2 9JT, U.K. 1 2 3 Physics Department, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, U.K. 4 5
Keyword(s) parallel, BSP, optical interconnect, sorting,
Abstract
The list of applications requiring high-performance computing resources is constantly growing. The cost of inter-processor communication is critical in determining the performance of massively parallel computing systems for many of these applications. This paper considers the feasibility of a commodity processor-based system which uses a free-space optical interconnect. A novel architecture, based on this technology, is presented. Analytical and simulation results based on an implementation of BSP (Bulk Synchronous Parallelism) are presented, indicating that a significant performance enhancement, over architectures using conventional interconnect technology, is possible. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE755

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Towards a more realistic comparative analysis of multicomputer networks
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.755
Article ID CPE755
Author Name(s) H. Sarbazi-Azad1M. Ould-Khaoua2L. M. Mackenzie3
Author Email(s) azad@sharif.edu1
Affiliation(s) Department of Computer Engineering, Sharif University of Technology, and School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran 1Department of Computing Science, University of Glasgow, Glasgow, U.K. 2 3
Keyword(s) interconnection networks, k-ary n-cubes, torus, hypercube, wormhole routing, fully-adaptive routing, hotspot, performance comparison,
Abstract
No abstract

Article ID: CPE756

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title RF-MVTC: an efficient risk-free multiversion concurrency control algorithm
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.756
Article ID CPE756
Author Name(s) Azzedine Boukerche1Terry Tuck2
Author Email(s) boukerche@site.uottawa.ca1
Affiliation(s) SITE, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5 1Department of Computer Science, University of North of Texas, Denton, TX 76203, U.S.A. 2
Keyword(s) concurrency, risk-free multiversion, databases, optimistic distributed simulations, rollbacks, cluster of workstations, multiversion timestamp ordering,
Abstract
In this paper, we focus on the temporary return of data values that are incorrect for the given transactional semantics, which could have catastrophic effects similar to those in parallel and discrete-event simulation systems. In many applications using online transaction processing environments, for instance, it is best to delay the response to a transaction's read request until it is either known or unlikely that a write message from an older update transaction will make the response incorrect. Examples of such applications are those where aberrant behavior is too costly, and those in which precommitted data is visible to some reactive entity. In light of the avoidance of risk in this approach, we propose a risk-free multiversion temporally correct concurrency-control algorithm. We discuss the algorithm and its implementation, and report on the performance results of simulation models using a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd.
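The multiversion idea underlying such schemes can be sketched minimally (a generic multiversion-timestamp register, not the RF-MVTC algorithm itself; the names and the `oldest_possible_write` bound are invented for illustration): each write creates a new version tagged with its timestamp, a read at time `ts` returns the latest version no newer than `ts`, and a risk-free read additionally refuses to answer while an older write could still arrive.

```python
import bisect

class MVRegister:
    """Versions kept sorted by write timestamp; reads never block writers."""
    def __init__(self, initial):
        self.ts = [0]          # write timestamps, ascending
        self.vals = [initial]  # value written at the matching timestamp

    def write(self, ts, value):
        i = bisect.bisect_left(self.ts, ts)
        self.ts.insert(i, ts)
        self.vals.insert(i, value)

    def read(self, ts):
        """Latest version with timestamp <= ts."""
        i = bisect.bisect_right(self.ts, ts) - 1
        return self.vals[i]

    def read_risk_free(self, ts, oldest_possible_write):
        """Answer only when no write older than ts can still occur;
        a real system would block here instead of raising."""
        if oldest_possible_write <= ts:
            raise RuntimeError("read must wait: an older write may still arrive")
        return self.read(ts)
```

The risk-free check is what avoids returning a temporarily incorrect value that a late-arriving older write would invalidate, trading response latency for the absence of rollbacks.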

Article ID: CPE757

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title On the scalability of many-to-many reliable multicast sessions
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.757
Article ID CPE757
Author Name(s) W. Y. Yoon1D. Lee2H. Y. Youn3
Author Email(s) wyyoon@sait.samsung.co.kr1
Affiliation(s) Samsung Advanced Institute of Technology, South Korea 1School of Engineering, Information and Communications University, South Korea 2School of Information and Communication Engineering, Sungkyunkwan University, South Korea 3
Keyword(s) reliable multicast, many-to-many session, exposure, implosion,
Abstract
Even though tree-based reliable multicast protocols are known to be most scalable for one-to-many sessions, there is still an open question as to whether these protocols are also scalable for many-to-many sessions. In this paper, we analyze and compare two promising multicast protocols - the receiver-initiated protocol with NACK suppression and the tree-based protocol - using a new spatial loss model. The proposed model considers the correlation of packet loss events for more realistic analysis unlike the previous work. The analysis results show that the tree-based protocol achieves much higher throughput than the receiver-initiated protocol for a many-to-many session as the number of participants in the session becomes larger. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE758

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance evaluation of a random-walk-based algorithm for embedding dynamically evolving trees in hypercubic networks
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.758
Article ID CPE758
Author Name(s) Keqin Li1
Author Email(s) li@mcs.newpaltz.edu1
Affiliation(s) Department of Computer Science, State University of New York, New Paltz, NY 12561-2499, U.S.A. 1
Keyword(s) asymmetric network, cube-connected cycles, de Bruijn graph, hypercube, hypercubic network, performance evaluation, random tree, randomized tree embedding, recurrence relation, shuffle-exchange network, symmetric network, wrapped butterfly,
Abstract
In many tree-structured parallel computations, the size and structure of a tree that represents a parallel computation is unpredictable at compile-time; the tree evolves gradually during the course of the computation. When such a computation is performed on a static network, the dynamic tree embedding problem is to distribute the tree nodes to the processors of the network such that all the processors receive roughly the same amount of load and communicating nodes are assigned to neighboring processors. Furthermore, when a new tree node is generated, it should be immediately assigned to a processor for execution without any information on the further evolution of the tree; and load distribution is performed by all processors in a totally distributed fashion. We study the problem of embedding dynamically evolving trees in hypercubic networks, including shuffle-exchange, de Bruijn, cube-connected cycles, wrapped butterfly, and hypercube networks. The performance of a random-walk-based randomized tree embedding algorithm is evaluated. Several random tree models are considered. We develop recurrence relations for analyzing the performance of embedding of complete-tree-based random trees and randomized complete trees, and linear systems of equations for reproduction trees. We present more efficient recurrence relations and linear systems of equations for symmetric networks. We also present extensive numerical data on the performance ratio and make a number of interesting observations about randomized tree embedding in the five hypercubic networks. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE753

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A performance study of job management systems
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.753
Article ID CPE753
Author Name(s) Tarek El-Ghazawi1Kris Gaj2Nikitas Alexandridis3Frederic Vroman4Nguyen Nguyen5Jacek R. Radzikowski6Preeyapong Samipagdi7Suboh A. Suboh8
Author Email(s) tarek@seas.gwu.edu1
Affiliation(s) ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A 1ECE Department, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A. 2ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A. 3 4 5 6 7 8
Keyword(s) Job Management Systems, queueing systems, performance evaluation, benchmarking, Grid computing,
Abstract
Job Management Systems (JMSs) efficiently schedule and monitor jobs in parallel and distributed computing environments. They are therefore critical for improving the utilization of expensive resources in high-performance computing systems and centers, and an important component of Grid software infrastructure. With many JMSs available commercially and in the public domain, it is difficult to choose an optimum JMS for a given computing environment. In this paper, we present the results of the first empirical study of JMSs reported in the literature. Four commonly used systems, LSF, PBS Pro, Sun Grid Engine/CODINE, and Condor, were considered. The study has revealed important strengths and weaknesses of these JMSs under different operational conditions. For example, LSF was shown to exhibit excellent throughput for a wide range of job types and submission rates. In contrast, CODINE appeared to outperform the other systems in terms of the average turn-around time for small jobs, and PBS appeared to excel in terms of turn-around time for relatively large jobs. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE761

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Systems Performance Evaluation
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.761
Article ID CPE761
Author Name(s)
Author Email(s)
Affiliation(s)
Keyword(s)
Abstract
No abstract

Article ID: CPE760

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title One-to-all broadcasting scheme for arbitrary static interconnection networks with bursty background traffic
Volume ID 16
Issue ID 13
Date Nov 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.760
Article ID CPE760
Author Name(s) D. D. Kouvatsos1M. Mkwawa2I. Awan3
Author Email(s) d.d.kouvatsos@scm.brad.ac.uk1
Affiliation(s) Performance Modelling and Engineering Research Group, Department of Computing, University of Bradford, Bradford BD7 3BD, U.K. 1 2 3
Keyword(s) broadcasting, graph decomposition, interconnection networks, bursty traffic, queueing network models, maximum entropy principle,
Abstract
A novel one-to-all broadcasting scheme is proposed for arbitrary static interconnection networks with bursty background traffic having head-of-line priorities, complete buffer sharing and repetitive service blocking with random destination. The scheme facilitates inter-process communication amongst parallel computers and grid computing networks as well as asynchronous transfer mode space division architectures and it is applicable to a single-port mode of message passing communication. The scheme utilizes a queue-by-queue decomposition algorithm for arbitrary open queueing network models, based on the principle of maximum entropy, in conjunction with an information theoretic decomposition criterion and graph theoretic concepts. Evidence based on empirical studies indicates the suitability of the scheme for achieving an optimal one-to-all broadcasting cost. Copyright © 2004 John Wiley & Sons, Ltd.

Article ID: CPE868

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A cache-efficient implementation of the lattice Boltzmann method for the two-dimensional diffusion equation
Volume ID 16
Issue ID 14
Date Dec 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.868
Article ID CPE868
Author Name(s) A. C. Velivelli1K. M. Bryden2
Author Email(s) kmbryden@iastate.edu2
Affiliation(s) Department of Mechanical Engineering, Iowa State University, Ames, IA 50011, U.S.A. 1 2
Keyword(s) cache optimization, lattice Boltzmann method, two-dimensional diffusion equation,
Abstract
$T_t = \mu(T_{xx} + T_{yy})$
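The paper's subject, a lattice Boltzmann solver for the two-dimensional diffusion equation $T_t = \mu(T_{xx} + T_{yy})$, can be sketched minimally (a toy D2Q5 scheme with equal weights and periodic boundaries, not the authors' cache-optimized implementation; the diffusivity-relaxation relation is the standard one for this stencil and is assumed here):

```python
def lbm_diffusion_step(f, tau):
    """One collide-and-stream step of a D2Q5 lattice Boltzmann scheme for
    T_t = mu * (T_xx + T_yy). f[i][y][x] holds population i; T is the
    zeroth moment sum_i f_i. Periodic boundaries."""
    ny, nx = len(f[0]), len(f[0][0])
    T = [[sum(f[i][y][x] for i in range(5)) for x in range(nx)]
         for y in range(ny)]
    w = [0.2] * 5                                  # equal weights
    vel = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    # collision: relax each population toward its equilibrium w_i * T
    g = [[[f[i][y][x] + (w[i] * T[y][x] - f[i][y][x]) / tau
           for x in range(nx)] for y in range(ny)] for i in range(5)]
    # streaming: shift each population along its lattice velocity
    return [[[g[i][(y - vy) % ny][(x - vx) % nx]
              for x in range(nx)] for y in range(ny)]
            for i, (vx, vy) in enumerate(vel)]
```

The collision step conserves the total of `T` exactly (the weights sum to one), which is the invariant a cache-blocked reordering of the loops must also preserve; cache-efficient variants of this method fuse the collide and stream loops and block them over the grid.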

Article ID: CPE869

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Simulation of resource synchronization in a dynamic real-time distributed computing environment
Volume ID 16
Issue ID 14
Date Dec 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.869
Article ID CPE869
Author Name(s) Chen Zhang1David Cordes2
Author Email(s) czhang@bryant.edu1
Affiliation(s) Department of Computer Information Systems, Bryant College, Smithfield, RI 02917-1284, U.S.A. 1Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A. 2
Keyword(s) synchronization, scheduling, distributed systems, real-time systems, embedded systems,
Abstract
Today, more and more distributed computer applications are being modeled and constructed using real-time principles and concepts. In 1989, the Object Management Group (OMG) formed a Real-Time Special Interest Group (RT SIG) with the goal of extending the Common Object Request Broker Architecture (CORBA) standard to include real-time specifications. This group's most recent efforts have focused on the requirements of dynamic distributed real-time systems. One open problem in this area is resource access synchronization for tasks employing dynamic priority scheduling. This paper presents two resource synchronization protocols that the authors have developed which meet the requirements of dynamic distributed real-time systems as specified by Dynamic Scheduling Real-Time CORBA (DSRT CORBA). The proposed protocols can be applied to both Earliest Deadline First (EDF) and Least Laxity First (LLF) dynamic scheduling algorithms, allow distributed nested critical sections, and avoid unnecessary runtime overhead. In order to evaluate the performance of the proposed protocols, we analyzed each protocol's schedulability. Since the schedulability of the system is affected by numerous system configuration parameters, we have designed simulation experiments to isolate and illustrate the impact of each individual system parameter. Simulation experiments show the proposed protocols have better performance than one would realize by applying a schema that utilizes dynamic priority ceiling update. Copyright © 2004 John Wiley & Sons, Ltd.
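The two dynamic scheduling policies the protocols target can be stated in a few lines (textbook definitions, not the paper's protocols; the task dictionaries are an invented representation): EDF runs the ready task with the smallest absolute deadline, while LLF runs the one with the least laxity, i.e. deadline minus current time minus remaining execution time.

```python
def edf_pick(tasks, now=0):
    """Earliest Deadline First: the ready task with the smallest deadline."""
    return min(tasks, key=lambda t: t["deadline"])

def llf_pick(tasks, now=0):
    """Least Laxity First: laxity = deadline - now - remaining work;
    the task that can least afford to wait runs next."""
    return min(tasks, key=lambda t: t["deadline"] - now - t["remaining"])
```

Note that the two policies can disagree on the same task set, which is why a synchronization protocol usable under both, as the abstract requires, cannot rely on a fixed priority order.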

Article ID: CPE870

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Semantic Link Network Builder and Intelligent Semantic Browser
Volume ID 16
Issue ID 14
Date Dec 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.870
Article ID CPE870
Author Name(s) H. Zhuge1R. Jia2J. Liu3
Author Email(s) zhuge@ict.ac.cn1
Affiliation(s) China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Graduate School, Chinese Academy of Sciences, Beijing, People's Republic of China 1 2 3
Keyword(s) algorithm, browser, matching, reasoning, semantic link, Semantic Web,
Abstract
Semantic Link Network (SLN) is a semantic Web model using semantic links-the natural extension of the hyperlink-based Web. The SLN-Builder is a software tool that enables definition, modification and verification of, as well as access to the SLN. The SLN-Builder can convert a SLN definition into XML descriptions for cross-platform information exchange. The Intelligent Semantic Browser is used to visualize the SLN and carry out two types of reasoning in browsing time: small granularity reasoning by chaining semantic links and large granularity reasoning by matching semantic views of SLN. With the help of the reasoning mechanism, the browser can recommend the content that is semantically relevant to the current browsing content, and it enables users to foresee the end-side content of a semantic link chain. This paper presents the design and implementation of the SLN-Builder and the intelligent semantic browser as well as key algorithms. Copyright © 2004 John Wiley & Sons, Ltd.
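Small-granularity reasoning by chaining semantic links can be sketched as a fixed-point derivation (a generic illustration, not the paper's algorithms; the rule table mapping a pair of link types to a derived type is an invented example):

```python
def chain_links(links, rules):
    """Derive new semantic links by chaining: if a -r-> b and b -s-> c and
    the rule table maps (r, s) to t, then a -t-> c. Iterate to a fixed point."""
    derived = set(links)
    changed = True
    while changed:
        changed = False
        for (a, r, b) in list(derived):
            for (b2, s, c) in list(derived):
                if b == b2 and (r, s) in rules:
                    new = (a, rules[(r, s)], c)
                    if new not in derived:
                        derived.add(new)
                        changed = True
    return derived
```

Following a chain to its fixed point is also what lets a browser report the content at the end of a semantic link chain before the user navigates there.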

Article ID: CPE867

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Resource Space Grid: model, method and platform
Volume ID 16
Issue ID 14
Date Dec 10 2004
DOI(URI) http://dx.doi.org/10.1002/cpe.867
Article ID CPE867
Author Name(s) Hai Zhuge1
Author Email(s) zhuge@ict.ac.cn1
Affiliation(s) China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, People's Republic of China 1
Keyword(s) resource sharing, query language, Semantic Grid, Semantic Web, Web,
Abstract
A Resource Space Grid is a virtual Grid that aims at effectively sharing, using and managing versatile resources across the Internet. The kernel of the Resource Space Grid includes a Resource Space Model (RSM) and a uniform Resource Using Mechanism (RUM). This paper presents the Resource Space Grid's core scientific issues and methodology, architecture, model and theory, design criteria and method, and practice. A normal form theory is proposed to normalize the resource space - a coordinate system for uniformly specifying and organizing resources. The RUM provides not only the end-users with an operable resource browser to operate resources using the built-in Resource Operation Language (ROL), but also the application developers with the ROL-based programming environment. The prototype platform based on the proposed model and method has been implemented and used for sharing and managing resources in distributed research teams. Operations on Resource Spaces can constitute the virtual communities of Resource Space Grids - a platform independent resource sharing environment. Copyright © 2004 John Wiley & Sons, Ltd.