WILEY Journal Home Page
Papers under review through 2004
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A comparison of task pools for dynamic load balancing of irregular algorithms | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.745 | ||||||||
Article ID | CPE745 | ||||||||
Author Name(s) | Matthias Korch1Thomas Rauber2 | ||||||||
Author Email(s) | matthias.korch@uni-bayreuth.de1 | ||||||||
Affiliation(s) | Faculty of Mathematics, Physics and Computer Science, University of Bayreuth, Bayreuth, Germany 1 2 | ||||||||
Keyword(s) | task pools, dynamic task scheduling, irregular algorithms, hierarchical radiosity, volume rendering, performance evaluation, threads, | ||||||||
Abstract |
|||||||||
Since a static work distribution does not allow for satisfactory speed-ups of parallel irregular algorithms, there is a need for a dynamic distribution of work and data that can be adapted to the runtime behavior of the algorithm. Task pools are data structures which can distribute tasks dynamically to different processors where each task specifies computations to be performed and provides the data for these computations. This paper discusses the characteristics of task-based algorithms and describes the implementation of selected types of task pools for shared-memory multiprocessors. Several task pools have been implemented in C with POSIX threads and in Java. The task pools differ in the data structures to store the tasks, the mechanism to achieve load balance, and the memory manager used to store the tasks. Runtime experiments have been performed on three different shared-memory systems using a synthetic algorithm, the hierarchical radiosity method, and a volume rendering algorithm. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Security Maintenance Mediation: a technology for preventing unintended security breaches | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.746 | ||||||||
Article ID | CPE746 | ||||||||
Author Name(s) | Roger (Buzz) King1 | ||||||||
Author Email(s) | roger@cs.colorado.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, University of Colorado, Boulder, CO 80309, U.S.A. 1 | ||||||||
Keyword(s) | data security, security policy, data mediation, collaborative systems, Web portals, | ||||||||
Abstract |
|||||||||
Web-resident information is becoming ‘smarter’, in the sense that emerging technology will support the annotation of it with ontological terms, which will be used to locate and reuse information. This will pose a great security risk in the form of unintended breaches (as distinct from deliberate invasions). Web-resident information will be far more readily available and relevant, thus causing inadvertent releases of secure information to potentially cause it to be diffusely spread across the Internet. Then as this information is iteratively transformed and integrated with other information, it will become irretrievable and potentially used in a myriad of unpredictable ways. The problem is that ontological annotations, while making information more understandable in its original form, do not provide a means for easily capturing the complex semantics of information that has been transformed via abstraction, aggregation, and integration. This demands the development of a semantically rich way of specifying ‘views’ of Web information, to which security controls can be attached. Also needed is a way for users of secure information to easily and voluntarily blend-and thereby propagate-security controls as information is transformed. Information mediators designed by collaborative teams of experts are proposed as the vehicle for wrapping information, so that at each step of reuse, high-level views and their corresponding integrity controls can be made easily accessible to trusted users who will then be able to ensure their proper maintenance. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel operation of CartaBlanca on shared and distributed memory computers | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.747 | ||||||||
Article ID | CPE747 | ||||||||
Author Name(s) | N. T. Padial-Collins1W. B. VanderHeyden2D. Z. Zhang3E. D. Dendy4D. Livescu5 | ||||||||
Author Email(s) | nelylanl@lanl.gov1 | ||||||||
Affiliation(s) | Los Alamos National Laboratory Theoretical Division and Los Alamos Computer Science Institute, Los Alamos, NM 87545, U.S.A. 1 2 3 4 5 | ||||||||
Keyword(s) | Java, scientific computing, parallel computation, | ||||||||
Abstract |
|||||||||
We describe the parallel performance of the pure Java CartaBlanca code on heat transfer and multiphase fluid flow problems. CartaBlanca is designed for parallel computations on partitioned unstructured meshes. It uses Java"s thread facility to manage computations on each of the mesh partitions. Inter-partition communications are handled by two compact objects for node-by-node communication along partition boundaries and for global reduction calculations across the entire mesh. For distributed calculations, the JavaParty package from the University of Karlsruhe is demonstrated to work with CartaBlanca. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Performance | ||||||||
Article Title | Performance and scalability of MPI on PC clusters | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 1 | ||||||||
Date | Jan 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.749 | ||||||||
Article ID | CPE749 | ||||||||
Author Name(s) | Glenn R. Luecke1Marina Kraeva2Jing Yuan3Silvia Spanoyannis4 | ||||||||
Author Email(s) | grl@iastate.edu1 | ||||||||
Affiliation(s) | 291 Durham Center, Iowa State University, Ames, IA 50011, U.S.A. 1 2 3 4 | ||||||||
Keyword(s) | PC clusters, MPI, MPI performance, | ||||||||
Abstract |
|||||||||
The purpose of this paper is to compare the communication performance and scalability of MPI communication routines on a Windows Cluster, a Linux Cluster, a Cray T3E-600, and an SGI Origin 2000. All tests in this paper were run using various numbers of processors and two message sizes. In spite of the fact that the Cray T3E-600 is about 7 years old, it performed best of all machines for most of the tests. The Linux Cluster with the Myrinet interconnect and Myricom"s MPI performed and scaled quite well and, in most cases, performed better than the Origin 2000, and in some cases better than the T3E. The Windows Cluster using the Giganet Full Interconnect and MPI/Pro"s MPI performed and scaled poorly for small messages compared with all of the other machines. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Group-SPMD programming with orthogonal processor groups | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.770 | ||||||||
Article ID | CPE770 | ||||||||
Author Name(s) | Thomas Rauber1Robert Reilein2Gudula Rünger3 | ||||||||
Author Email(s) | rauber@uni-bayreuth.de1 | ||||||||
Affiliation(s) | Department of Mathematics and Physics, University of Bayreuth, 95440 Bayreuth, Germany 1Department of Computer Science, Chemnitz University of Technology, 09111 Chemnitz, Germany 2 3 | ||||||||
Keyword(s) | communication library, group-SPMD, orthogonal processor groups, distributed memory, | ||||||||
Abstract |
|||||||||
Many programs for message-passing machines can benefit from an implementation in a group-SPMD programming model due to the potential to reduce communication overhead and to increase scalability. In this paper, we consider group-SPMD programs exploiting different orthogonal processor partitions in one program. For each program this is a fixed set of predefined processor partitions given by the parallel hyperplanes of a two- or multi-dimensional virtual processor organization. We introduce a library built on top of MPI to support the programming with those orthogonal processor groups. The parallel programming model is appropriate for applications with a multi-dimensional task grid and task dependencies mainly aligned in the dimensions of the task grid. The library can be used to specify the appropriate processor partitions, which are then created by the library, and to define the mapping of tasks to the processor hyperplanes. Examples from numerical analysis illustrate the programming style and show that the runtime on distributed memory machines can be considerably reduced by using the library. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Managing distributed shared arrays in a bulk-synchronous parallel programming environment | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.768 | ||||||||
Article ID | CPE768 | ||||||||
Author Name(s) | Christoph W. Kessler1 | ||||||||
Author Email(s) | chrke@ida.liu.se1 | ||||||||
Affiliation(s) | Programming Environments Laboratory (PELAB), Department of Computer Science, Linkμping University, S-581 83 Linkμping, Sweden 1 | ||||||||
Keyword(s) | NestStep, BSP model, bulk-synchronous parallelism, parallel programming language, distributed shared array, runtime scheduling of communication, | ||||||||
Abstract |
|||||||||
NestStep is a parallel programming language for the BSP (bulk-hronous parallel) programming model. In this article we describe the concept of distributed shared arrays in NestStep and its implementation on top of MPI. In particular, we present a novel method for runtime scheduling of irregular, direct remote accesses to sections of distributed shared arrays. Our method, which is fully parallelized, uses conventional two-sided message passing and thus avoids the overhead of a standard implementation of direct remote memory access based on one-sided communication. The main prerequisite is that the given program is structured in a BSP-compliant way. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Compiling data-parallel programs for clusters of SMPs | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.767 | ||||||||
Article ID | CPE767 | ||||||||
Author Name(s) | Siegfried Benkner1Thomas Brandes2 | ||||||||
Author Email(s) | sigi@ieee.org1 | ||||||||
Affiliation(s) | Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 1SCAI-Fraunhofer Institute for Algorithms and Scientific Computing, Schloß Birlinghoven, D-53754 St. Augustin, Germany 2 | ||||||||
Keyword(s) | SMP clusters, parallel programming, HPF, OpenMP, MPI, hybrid parallelization, | ||||||||
Abstract |
|||||||||
Clusters of shared-memory multiprocessors (SMPs) have become the most promising parallel computing platforms for scientific computing. However, SMP clusters significantly increase the complexity of user application development when using the low-level application programming interfaces MPI and OpenMP, forcing users to deal with both distributed-memory and shared-memory parallelization details. In this paper we present extensions of High Performance Fortran (HPF) for SMP clusters which enable the compiler to adopt a hybrid parallelization strategy, efficiently combining distributed-memory with shared-memory parallelism. By means of a small set of new language features, the hierarchical structure of SMP clusters may be specified. This information is utilized by the compiler to derive inter-node data mappings for controlling distributed-memory parallelization across the nodes of a cluster and intra-node data mappings for extracting shared-memory parallelism within nodes. Additional mechanisms are proposed for specifying inter- and intra-node data mappings explicitly, for controlling specific shared-memory parallelization issues and for integrating OpenMP routines in HPF applications. The proposed features have been realized within the ADAPTOR and VFC compilers. The parallelization strategy for clusters of SMPs adopted by these compilers is discussed as well as a hybrid-parallel execution model based on a combination of MPI and OpenMP. Experimental results indicate the effectiveness of the proposed features. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A compiler for multiple memory models | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.771 | ||||||||
Article ID | CPE771 | ||||||||
Author Name(s) | S. P. Midkiff1J. Lee2D. A. Padua3 | ||||||||
Author Email(s) | smidkiff@purdue.edu1 | ||||||||
Affiliation(s) | School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47905-2305, U.S.A. 1School of Computer Science and Engineering, Seoul National University, Seoul, Korea 2Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61801, U.S.A. 3 | ||||||||
Keyword(s) | memory models, Java, explicit parallelism, compilers, program optimization, program analysis, | ||||||||
Abstract |
|||||||||
The design of consistency models for both hardware and software is a difficult task. For a programming language, it is particularly difficult because the target audience for a high-level programming language is much wider than the target audience for a machine language, making usability a more important criterion. Exacerbating this problem is the reality that the programming language community has little experience designing programming language consistency models, and therefore each new attempt is very much a voyage into uncharted territory. A concrete example of the difficulties of the task is the current Java Memory Model. Although designed to be easy to use by Java programmers, it is poorly understood and at least one common idiom (the ‘double check idiom’) to exploit the model is unsafe. In this paper, we describe the design of an optimizing Java compiler that will accept either as input or as an interface implementation a consistency model for the code to be compiled. The compiler will use Shasha and Snir's delay set analysis, and our CSSA program representation to provide a canonical representation for the effects of different consistency models on optimizations and analysis. The compiler will serve as a testbed to prototype new memory models, and to measure the effects of different memory models on program performance. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Space-time mapping and tiling: a helpful combination | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.772 | ||||||||
Article ID | CPE772 | ||||||||
Author Name(s) | Martin Griebl1Peter Faber2Christian Lengauer3 | ||||||||
Author Email(s) | griebl@fmi.uni-passau.de1 | ||||||||
Affiliation(s) | Department of Mathematics and Informatics, University of Passau, D-94030 Passau, Germany 1 2 3 | ||||||||
Keyword(s) | automatic parallelization, loop transformation, tiling, space-time mapping, polyhedron model, loop optimization, | ||||||||
Abstract |
|||||||||
Tiling is a well-known technique for sequential compiler optimization, as well as for automatic program parallelization. However, in the context of parallelization, tiling should not be considered as a stand-alone technique,but should be applied after a dedicated parallelization phase, in our case after space-time mapping.We show how tiling can benefit from space-time mapping, and we derive an algorithm for computing tiles which can minimize the number of communication startups, taking the number of physically available processors into account. We also present how the use of a simple cost model reduces real execution time. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | The effect of cache models on iterative compilation for combined tiling and unrolling | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.773 | ||||||||
Article ID | CPE773 | ||||||||
Author Name(s) | P. M. W. Knijnenburg1T. Kisuki2K. Gallivan3M. F. P. O'Boyle4 | ||||||||
Author Email(s) | peterk@liacs.nl1 | ||||||||
Affiliation(s) | LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands 1 2 Department of Computer Science, Florida State University, U.S.A. 3Institute for Computing Systems Architecture, Edinburgh University, U.K. 4 | ||||||||
Keyword(s) | compiler optimization, phase ordering problem, cache models, | ||||||||
Abstract |
|||||||||
Iterative compilation, where we search for the best program transformation based on profile feedback information, has been highly successful in determining good program optimizations, typically outperforming all static approaches. However, this benefit has come at a price, namely, a large number of executions of the target program which in many cases is unacceptable. This paper is concerned with reducing the number of program executions needed by iterative compilation by incorporating static models. We examine how static models may be incorporated into a purely iterative approach by developing several characterized heuristics. We empirically explore the tradeoff between reducing the number of executions required and the ultimate performance achieved for differing parameters or slack factors. We focus on tiling and unrolling as transformations and cache models as a test case for the feasibility of this approach. We show that a highly accurate model, used as a filter to profiling and appropriately parameterized, can reduce the number of executions by 50%. We also show that using a simple model to rank transformations and profiling, only those with the highest ranking can reduce the number of executions even further up to 70%, in cases where there is only a limited number of profiles available. We conclude that a production compiler might perform best using the last approach, gaining the benefit of iterative compilation at a much reduced cost. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A fast and accurate method for determining a lower bound on execution time | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.774 | ||||||||
Article ID | CPE774 | ||||||||
Author Name(s) | G. Fursin1M. F. P. O'Boyle2O. Temam3G. Watts4 | ||||||||
Author Email(s) | g.fursin@ed.ac.uk1 | ||||||||
Affiliation(s) | ICSA, School of Informatics, University of Edinburgh, Edinburgh EH9 3JZ, U.K. 1 2 Laboratoire de Recherche en Informatique, Bâtiment 490, Université Paris Sud, 91405 Orsay Cedex, France 3 4 | ||||||||
Keyword(s) | memory performance, optimization tool, memory latency analysis, | ||||||||
Abstract |
|||||||||
In performance critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler-based optimizations to improve memory performance are limited and programmers frequently resort to manual optimization techniques. However, this process is tedious and time-consuming. Furthermore, as the potential benefit from optimization is unknown there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e. when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with that of a simulator with an ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel object-oriented framework optimization | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | 02 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.775 | ||||||||
Article ID | CPE775 | ||||||||
Author Name(s) | Daniel J. Quinlan1Markus Schordan2Brian Miller3Markus Kowarschik4 | ||||||||
Author Email(s) | dquinlan@llnl.gov1 | ||||||||
Affiliation(s) | Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 1 2 3 System Simulation Group, Department of Computer Science, University of Erlangen-Nuremberg, Germany 4 | ||||||||
Keyword(s) | telescoping languages, AST restructuring tools, semantics-based transformations, object-oriented optimizations, | ||||||||
Abstract |
|||||||||
Sophisticated parallel languages are difficult to develop; most parallel distributed memory scientific applications are developed using a serial language, expressing parallelism through third party libraries (e.g. MPI). As a result, frameworks and libraries are often used to encapsulate significant complexities. We define a novel approach to optimize the use of libraries within applications. The resulting tool, named ROSE, leverages the additional semantics provided by library-defined abstractions enabling library specific optimization of application codes. It is a common perception that performance is inversely proportional to the level of abstraction. Our work shows that this is not the case if the additional semantics can be leveraged. We show how ROSE can be used to leverage the semantics within the compile-time optimization. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Editorials | ||||||||
Article Title | Special Issue: Compilers for Parallel Computers (CPC 2001) | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | Feb 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.766 | ||||||||
Article ID | CPE766 | ||||||||
Author Name(s) | . .1 | ||||||||
Author Email(s) | |||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Data partitioning‐based parallel irregular reductions | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | Feb 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.769 | ||||||||
Article ID | CPE769 | ||||||||
Author Name(s) | Eladio Guti?rrez1Oscar Plata2Emilio L. Zapata3 | ||||||||
Author Email(s) | eladio@ac.uma.es1 | ||||||||
Affiliation(s) | Department of Computer Architecture, University of Malaga, E-29071 Malaga, Spain 1 2 3 | ||||||||
Keyword(s) | irregular reductions, data locality, workload balancing, shared-memory multiprocessor, NUMA machines, | ||||||||
Abstract |
|||||||||
Different parallelization methods for irregular reductions on shared memory multiprocessors have been proposed in the literature in recent years. We have classified all these methods and analyzed them in terms of a set of properties: data locality, memory overhead, exploited parallelism, and workload balancing. In this paper we propose several techniques to increase the amount of exploited parallelism and to introduce load balancing into an important class of these methods. Regarding parallelism, the proposed solution is based on the partial expansion of the reduction array. Load balancing is discussed in terms of two techniques. The first technique is a generic one, as it deals with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance which occurs whenever a large number of write operations are concentrated on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for a particular method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods. Copyright @ 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | <SC>SWARP</SC>: a retargetable preprocessor for multimedia instructions | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 2-3 | ||||||||
Date | Feb 1 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.776 | ||||||||
Article ID | CPE776 | ||||||||
Author Name(s) | Gilles Pokam1St?phane Bihan2Julien Simonnet3Fran?ois Bodin4 | ||||||||
Author Email(s) | gilles.pokam@irisa.fr1 | ||||||||
Affiliation(s) | IRISA, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France 1 2 3 4 | ||||||||
Keyword(s) | multimedia instructions, compiler optimizations, embedded processors, | ||||||||
Abstract |
|||||||||
In this paper, we propose SWARP, a retargetable preprocessor for exploiting multimedia instructions. The system mixes loop distribution, unrolling and pattern matching to exploit complex multimedia instructions. Contrary to all available systems, it can be extended at the user level. Using with a TriMedia processor we show that our system achieves important good code quality with a set of frequently used loop kernels for multimedia applications. Copyright @ 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A comparison of concurrent programming and cooperative multithreading under load balancing applications | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 4 | ||||||||
Date | 04 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.751 | ||||||||
Article ID | CPE751 | ||||||||
Author Name(s) | Justin T. Maris1Aaron W. Keen2Takashi Ishihara3Ronald A. Olsson4 | ||||||||
Author Email(s) | olsson@cs.ucdavis.edu4 | ||||||||
Affiliation(s) | Department of Computer Science, University of California, Davis, CA 95616, U.S.A. 1 2 3 4 | ||||||||
Keyword(s) | cooperative multithreading, concurrent programming, load balancing, parallel and distributed programming languages, synchronization mechanisms, | ||||||||
Abstract |
|||||||||
Two models of thread execution are the general concurrent programming execution model (CP) and the cooperative multithreading execution model (CM). CP provides nondeterministic thread execution where context switches occur arbitrarily. CM provides threads that execute one at a time until they explicitly choose to yield the processor. This paper focuses on a classic application to reveal the advantages and disadvantages of load balancing during thread execution under CP and CM styles; results from a second classic application were similar. These applications are programmed in two different languages (SR and Dynamic C) on different hardware (standard PCs and embedded system controllers). An SR-like run-time system, DesCaRTeS, was developed to provide interprocess communication for the Dynamic C implementations. This paper compares load balancing and non-load balancing implementations; it also compares CP and CM style implementations. The results show that in cases of very high or very low workloads, load balancing slightly hindered performance; and in cases of moderate workload, both SR and Dynamic C implementations of load balancing generally performed well. Further, for these applications, CM style programs outperform CP style programs in some cases, but the opposite occurs in some other cases. This paper also discusses qualitative tradeoffs between CM style programming and CP style programming for these applications. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | OpenMP-oriented applications for distributed shared memory architectures | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 4 | ||||||||
Date | 04 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.752 | ||||||||
Article ID | CPE752 | ||||||||
Author Name(s) | Ami Marowka1Zhenying Liu2Barbara Chapman3 | ||||||||
Author Email(s) | amimar2@yahoo.com1 | ||||||||
Affiliation(s) | Department of Computer Science, The University of Houston, Houston, TX 77204-3010, U.S.A. 1 2 3 | ||||||||
Keyword(s) | OpenMP, data locality, NAS Parallel Benchmarks, programming model, | ||||||||
Abstract |
|||||||||
The rapid rise of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model-of-choice for large scale high-performance parallel computing in the coming decade. The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc-NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance. This paper presents a simple software programming approach called copy-inside-copy-back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automat and implicit approaches. Moreover, the CC approach improves on the OpenMP-SPMD style of programming that makes the development process of an OpenMP application more structured and simpler. The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse-grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Proceedings of the 10th International Symposium on High Performance Distributed Computing (HPDC-10) | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 4 | ||||||||
Date | 04 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.763 | ||||||||
Article ID | CPE763 | ||||||||
Author Name(s) | Anand Natrajan1Michael Crowley2Nancy Wilkins-Diehr3Marty A. Humphrey4Anthony D. Fox5Andrew S. Grimshaw6Charles L. Brooks7 | ||||||||
Author Email(s) | anand@virginia.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, University of Virginia, 151 Engineer's Way, Charlottesville, VA 22904, U.S.A. 1Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, U.S.A. 2San Diego Supercomputing Center, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, U.S.A. 3 4 5 6 7 | ||||||||
Keyword(s) | Legion, computational Grid, CHARMM, protein folding, | ||||||||
Abstract |
|||||||||
One benefit of a computational Grid is the ability to run high-performance applications over distributed resources simply and securely. We demonstrated this benefit with an experiment in which we studied the protein-folding process with the CHARMM molecular simulation package over a Grid managed by Legion, a Grid operating system. High-performance applications can take advantage of Grid resources if the Grid operating system provides both low-level functionality as well as high-level services. We describe the nature of services provided by Legion for high-performance applications. Our experiences indicate that human factors continue to play a crucial role in the configuration of Grid resources, underlying resources can be problematic, Grid services must tolerate underlying problems or inform the user, and high-level services must continue to evolve to meet user requirements. Our experiment not only helped a scientist perform an important study, but also showed the viability of an integrated approach such as Legion's for managing a Grid. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 4 | ||||||||
Date | March 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.748 | ||||||||
Article ID | CPE748 | ||||||||
Author Name(s) | Sotirios G Ziavras1Xiaofang Wang2 | ||||||||
Author Email(s) | ziavras@njit.edu1 ziavras@njit.edu2 | ||||||||
Affiliation(s) | Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A. 1 2 | ||||||||
Keyword(s) | FPGA, LU factorization, matrix inversion, parallel processing, hardware design, SOPC/SOC, | ||||||||
Abstract |
|||||||||
Configurable computing, where hardware resources are configured appropriately to match specific hardware designs, has recently demonstrated its ability to significantly improve performance for a wide range of computation-intensive applications. With steady advances in silicon technology, as predicted by Moore's Law, Field-Programmable Gate Array (FPGA) technologies have enabled the implementation of System-on-a-Programmable-Chip (SOPC or SOC) computing platforms, which, in turn, have given a significant boost to the field of configurable computing. It is possible to implement various specialized parallel machines in a single silicon chip. In this paper, we describe our design and implementation of a parallel machine on an SOPC development board, using multiple instances of a soft IP configurable processor; we use this machine for LU factorization. LU factorization is widely used in engineering and science to solve efficiently large systems of linear equations. Our implementation facilitates the efficient solution of linear equations at a cost much lower than that of supercomputers and networks of workstations. The intricacies of our FPGA-based design are presented along with tradeoff choices made for the purpose of illustration. Performance results prove the viability of our approach. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | An approach for quality of service adaptation in service-oriented Grids | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.819 | ||||||||
Article ID | CPE819 | ||||||||
Author Name(s) | Rashid Al-Ali1Abdelhakim Hafid2Omer Rana3David Walker4 | ||||||||
Author Email(s) | rashid@cs.cardiff.ac.uk1 | ||||||||
Affiliation(s) | Department of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, Wales, U.K. 1Telcordia Technologies, Inc., 331 Newman Springs Road, Red Bank, NJ 07701, U.S.A. 2 3 4 | ||||||||
Keyword(s) | service-oriented grids, quality of service (QoS), QoS adaptation, | ||||||||
Abstract |
|||||||||
Some applications utilizing Grid computing infrastructure require the simultaneous allocation of resources, such as compute servers, networks, memory, disk storage and other specialized resources. Collaborative working and visualization is one example of such applications. In this context, quality of service (QoS) is related to Grid services, and not just to the network connecting these services. With the emerging interest in service-oriented Grids, resources may be advertised and traded as services based on a service level agreement (SLA). Such a SLA must include both general and technical specifications, including pricing policy and properties of the resources required to execute the service, to ensure QoS requirements are satisfied. An approach for QoS adaptation is presented to enable the dynamic adjustment of behavior of an application based on changes in the pre-defined SLA. The approach is particularly useful if workload or network traffic changes in unpredictable ways during an active session. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Editorial | ||||||||
Article Title | Special Issue: Middleware for Grid Computing | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.818 | ||||||||
Article ID | CPE818 | ||||||||
Author Name(s) | . .1 | ||||||||
Author Email(s) | .1 | ||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Grids of agents for computer and telecommunication network management | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.820 | ||||||||
Article ID | CPE820 | ||||||||
Author Name(s) | M. D. Assunção1F. L. Koch2C. B. Westphall3 | ||||||||
Author Email(s) | assuncao@lrg.ufsc.br1 | ||||||||
Affiliation(s) | Network and Management Laboratory, Federal University of Santa Catarina, Florianópolis, SC, 88049-970, P.O. 476, Brazil 1Intelligent Systems Group, University of Utrecht, The Netherlands 2 3 | ||||||||
Keyword(s) | Grid of agents, Grid computing, network management, | ||||||||
Abstract |
|||||||||
The centralized system approach for computer and telecommunication network management has been presenting scalability problems along with the growth in the amount and diversity of managed equipment. Moreover, the increase in complexity of the services being offered through the networks also contributes to adding extra workload to the management station. The amount of data that must be handled and processed by only one administration point could lead to a situation where there is not enough processing and storage power to carry out an efficient job. In this work we present an alternative approach by creating a highly distributed computing environment through the use of Grids of autonomous agents to analyze large amounts of data, which reduce the processing costs by optimizing the load distribution and resource utilization. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Applying the reflective middleware approach in Grid computing | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.822 | ||||||||
Article ID | CPE822 | ||||||||
Author Name(s) | Geoff Coulson1Gordon Blair2Nikos Parlavantzas3Wai Kit Yeung4Wei Cai5 | ||||||||
Author Email(s) | geoff@comp.lancs.ac.uk1 | ||||||||
Affiliation(s) | Distributed Multimedia Research Group, Computing Department, Lancaster University, Lancaster LA1 4YR, U.K. 1 2 3 4 5 | ||||||||
Keyword(s) | Grid computing, reflective middleware, component-based middleware, quality of service, | ||||||||
Abstract |
|||||||||
Significant progress has been made in the design and development of object-based distributed systems and, more recently, in component-based reflective middleware platforms-i.e. middleware platforms that, through reflection, can be flexibly configured, and runtime adapted/reconfigured-especially in terms of non-functional properties like quality of service (QoS). In this paper we consider how ideas and principles from such platforms can usefully be applied in a Grid context, specifically, an Open Grid Services Architecture (OGSA) context. To this end, we identify weaknesses in OGSA in the areas of generic service provision, scalability/performance engineering, and, especially, lack of support for QoS specification and realization, and suggest how these can be alleviated. We believe that such areas will become increasingly important as sophisticated e-Science applications start to exploit the potential of OGSA's service-based architecture. We also report on our current research on developing a binding-type framework, a resource management framework, and a set of performance optimizations for an ongoing OGSA implementation we are working on. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Performance evaluation of market-based resource allocation for Grid computing | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.826 | ||||||||
Article ID | CPE826 | ||||||||
Author Name(s) | J. Gomoluch1M. Schroeder2 | ||||||||
Author Email(s) | ck655@soi.city.ac.uk1 | ||||||||
Affiliation(s) | Department of Computing, City University, Northampton Square, London EC1V 0HB, U.K. 1 2 | ||||||||
Keyword(s) | Grid computing, market-based resource allocation, simulation, performance evaluation, | ||||||||
Abstract |
|||||||||
Resource allocation is an important aspect of Grid computing. Over the past few years, various systems have been developed which use market mechanisms to allocate resources. However, the performance of such policies has not been sufficiently studied. In this paper, we investigate under which circumstances market-based resource allocation by continuous double auctions and by the proportional share protocol, respectively, outperforms a conventional round-robin approach. We develop a model for clients, servers and the market, and present simulation results. Factors which are studied include the amount of load in the system, the number of resources, different degrees of resource heterogeneity, and communication delays. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | EasyGrid: towards a framework for the automatic Grid enabling of legacy MPI applications | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.821 | ||||||||
Article ID | CPE821 | ||||||||
Author Name(s) | Vinod E. F Rebello1 | ||||||||
Author Email(s) | vinod@ic.uff.br1 | ||||||||
Affiliation(s) | Instituto de Computa??o, Universidade Federal Fluminense, Bloco E, 3? Andar, Rua Passo da P?tria 156, S?o Domingos, Niter?i, CEP 24210-240 1 | ||||||||
Keyword(s) | Grid computing, system-aware applications, Grid middleware, MPI programs, task scheduling, | ||||||||
Abstract |
|||||||||
One of the goals of the Grid is to aggregate collections of shared, heterogeneous, and distributed resources to provide computational power to parallel applications. However, designing applications capable of exploiting this potential with ease remains a challenge. This paper outlines the EasyGrid methodology for the efficient and robust execution of (legacy) MPI programs across distributed computing clusters. The principal objective of this work is to identify the application-oriented middleware necessary for, as well as to develop a framework to automatically generate, system-aware applications capable of executing in dynamic, unstable, distributed environments such as computational Grids. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Global multimedia collaboration system | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.823 | ||||||||
Article ID | CPE823 | ||||||||
Author Name(s) | Wenjun Wu1Geoffrey Fox2Ahmet Uyar3Hasan Bulut4Shrideep Pallickara5 | ||||||||
Author Email(s) | wewu@indiana.edu1 | ||||||||
Affiliation(s) | Community Grids Laboratory, Indiana University, Indiana University Research Park, 501 North Morton Street, Suite 222, Bloomington, IN 47404, U.S.A. 1 2 3 4 5 | ||||||||
Keyword(s) | collaboration, Web services, XGSP, Admire, NaradaBrokering, | ||||||||
Abstract |
|||||||||
In order to build an integrated collaboration system over heterogeneous collaboration technologies, we propose a global multimedia collaboration system (Global-MMCS) based on XGSP A/V Web-Services framework. This system can integrate multiple A/V services, and support various collaboration clients and communities. Now the prototype is being developed and deployed across many universities in U.S.A. and China. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | InteGrade: object-oriented Grid middleware leveraging the idle computing power of desktop machines | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.824 | ||||||||
Article ID | CPE824 | ||||||||
Author Name(s) | Fabio Kon1Andrei Goldchleger2Alfredo Goldman3Marcelo Finger4Germano Capistrano Bezerra5 | ||||||||
Author Email(s) | kon@ime.usp.br1 | ||||||||
Affiliation(s) | Department of Computer Science, University of S?o Paulo, Rua do Mat?o 1010, 05508-090 S?o Paulo, SP, Brazil 1 2 3 4 5 | ||||||||
Keyword(s) | Grid computing, distributed object systems, CORBA, usage pattern analysis, | ||||||||
Abstract |
|||||||||
Grid computing technology improves the computing experiences at organizations by effectively integrating distributed computing resources. However, just a small fraction of currently available Grid infrastructures focuses on reutilization of existing commodity computing resources. This paper introduces InteGrade, a novel object-oriented middleware Grid infrastructure that focuses on leveraging the idle computing power of shared desktop machines. Its features include support for a wide range of parallel applications and mechanisms to assure that the owners of shared resources do not perceive any loss in the quality of service. A prototype implementation is under construction and the current version is available for download. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A model for parallel job scheduling on dynamical computer Grids | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.825 | ||||||||
Article ID | CPE825 | ||||||||
Author Name(s) | Alfredo Goldman1Carlos Queiroz2 | ||||||||
Author Email(s) | gold@ime.usp.br1 | ||||||||
Affiliation(s) | Department of Computer Science, University of S?o Paulo, Rua do Matão 1010, 05508-090 São Paulo, SP, Brazil 1 2 | ||||||||
Keyword(s) | Grid computing, multi-site execution, scheduling of parallel applications, | ||||||||
Abstract |
|||||||||
This work presents a model that allows the execution of parallel applications in a Grid environment. Our main focus is on how to share the idle cycles of clusters and computers to execute real parallel applications. We present a new model which introduces the notions of locality and adaptability. The locality is used for job allocation, and for job migration. The adaptability provides a simple mechanism to allow clusters to join or leave a Grid. We also propose the middleware architecture to implement the model, and provide some simulation results to show the main characteristics of the model. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Fine-grained authorization for job execution in the Grid: design and implementation | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.827 | ||||||||
Article ID | CPE827 | ||||||||
Author Name(s) | K. Keahey1V. Welch2S. Lang3B. Liu4S. Meder 5 | ||||||||
Author Email(s) | keahey@mcs.anl.gov1 | ||||||||
Affiliation(s) | Mathematics and Computer Science Division, Argonne National Laboratory, Building 221, 9700 South Cass Avenue, Argonne, IL 60439, U.S.A. 1University of Chicago, 5801 South Ellis Avenue, Chicago, IL, U.S.A. 2 3 University of Houston, 4800 Calhoun Road, Houston, TX, U.S.A. 4 5 | ||||||||
Keyword(s) | Grids, authorization, policy enforcement, resource management, | ||||||||
Abstract |
|||||||||
In this paper, we describe our work on enabling fine-grained authorization for resource usage and management. We address the need of virtual organizations to enforce their own polices in addition to those of the resource owners, with regards to both resource consumption and job management. To implement this design, we propose changes and extensions to the Globus Toolkit's version 2 resource management mechanism. We describe the prototype and policy language that we have designed to express fine-grained policies and present an analysis of our solution. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A resource management framework for interactive Grids | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.828 | ||||||||
Article ID | CPE828 | ||||||||
Author Name(s) | Raj Kumar1Vanish Talwar2Sujoy Basu3 | ||||||||
Author Email(s) | raj.kumar@hp.com1 | ||||||||
Affiliation(s) | Hewlett-Packard Labs, 1501 Page Mill Road, MS 1181, Palo Alto, CA 94304, U.S.A. 1 2 3 | ||||||||
Keyword(s) | Grid computing, interactive sessions, resource management, agents, | ||||||||
Abstract |
|||||||||
Traditional use of Grid computing systems has been for batch jobs in the scientific and academic computing. We envision the next generation Grid computing systems to support graphical interactive sessions. In this paper, we propose a resource management framework for supporting graphical interactive sessions in a Grid computing system. We describe the high-level architectural resource management framework distributed among the submission node, central scheduler node, and the execution node. We then describe in detail the resource management framework on the execution node. The description of the resource management framework on the scheduler node is kept at a high level in this paper. The framework on execution nodes consists of Resource Management Agents, an Admission Control system and Application Predictor system. The agents on the execution node are Startup Agents, Sensor Agents, Monitoring Agents, Aggregator Agents, Enforcement Agents and Registration Agents. The Session Admission Control system is responsible for determining if a new application session can be admitted to the execution node. An Application Predictor system is responsible for predicting the resource utilization behavior of applications based on data obtained from the Resource Management Agents. The proposed framework allows for implementation of a scalable and extensible middleware for interactive Grid resource management. It supports fine-grained performance guarantees specified in service level agreements and brings forth some important and novel contributions to enable graphical interactive sessions on Grids. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | GridSphere: a portal framework for building collaborations | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.829 | ||||||||
Article ID | CPE829 | ||||||||
Author Name(s) | Oliver Wehrens1Jason Novotny2Michael Russell3 | ||||||||
Author Email(s) | wehrens@aei.mpg.de1 | ||||||||
Affiliation(s) | Max-Planck-Institut fur Gravitationsphysik, Albert-Einstein-Institut, Am Mhlenberg 1, D-14476 Golm, Germany 1 2 3 | ||||||||
Keyword(s) | Grid computing, portals, portlets, | ||||||||
Abstract |
|||||||||
Grid-enabled portals are becoming increasingly popular as a platform for providing access to Grid services and resources. Unfortunately, much of the work done in portal development has led to vertically layered solutions that work for a particular project but are difficult to extend or reuse for other projects. The GridSphere portal framework seeks to address these limitations by providing a framework that will offer external developers a model for easily adding new functionality and hence increasing community collaboration. The GridLab portal will serve as an initial prototype to showcase the GridSphere framework and provide access to services being developed within the GridLab project. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | CoDIMS: an adaptable middleware system for scientific visualization in Grids | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.830 | ||||||||
Article ID | CPE830 | ||||||||
Author Name(s) | G. Giraldi1F. Porto2J. C. Oliveira3R. Silva4B. Schulze5 | ||||||||
Author Email(s) | gilson@lncc.br1 | ||||||||
Affiliation(s) | National Laboratory for Scientific Computing, Department of Computer Science, Av. Getulio Vargas 333, 25651-075, Petrópolis, RJ, Brazil 1Department of Systems Engineering, Military Institute of Engineering, Rio de Janeiro, RJ, Brazil 2 3 4 5 | ||||||||
Keyword(s) | Grid computing, middleware, scientific visualization, particle tracing, | ||||||||
Abstract |
|||||||||
In this paper we propose a middleware infrastructure adapted for supporting scientific visualization applications over a Grid environment. We instantiate a middleware system from CoDIMS, which is an environment for the generation of configurable data integration middleware systems. CoDIMS adaptive architecture is based on the integration of special components managed by a control module that executes users workflows. We exemplify our proposal with a middleware system generated for computing particles' trajectories within the constraints imposed by a Grid environment. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | An asynchronous middleware for Grid resource monitoring | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.831 | ||||||||
Article ID | CPE831 | ||||||||
Author Name(s) | V. Quéma1R. Lachaize2E. Cecchet3 | ||||||||
Author Email(s) | vivien.quema@inrialpes.fr1 | ||||||||
Affiliation(s) | INRIA Rhône-Alpes, 655 avenue de l'Europe, 38334 Saint-Ismier Cedex, France 1Sardes Project LSR-IMAG - INRIA, 655, avenue de l'Europe, 38334 Saint-Ismier Cedex, France 2 3 | ||||||||
Keyword(s) | Grid computing, resource monitoring, asynchronous middleware, components, | ||||||||
Abstract |
|||||||||
Resource management in a Grid computing environment raises several technical issues. The monitoring infrastructure must be scalable, flexible, configurable and adaptable to support thousands of devices in a highly dynamic environment where operational conditions are constantly changing. We propose to address these challenges by combining asynchronous communications with reflective component-based technologies. We introduce DREAM(Dynamic REflective Asynchronous Middleware), a Java component-based message oriented middleware. Asynchronous communications are used to achieve the scalability and flexibility objectives, whereas the reflective component technology provides the complementary configurability and adaptability features. We argue that this infrastructure is a viable approach to build a resource monitoring infrastructure for Grid computing. Moreover, DREAM makes the monitoring logic accessible from J2EE application servers. This allows the monitoring information to be presented as a Grid service and to integrate within the Open Grid Software Architecture. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Grid computing and active services | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.832 | ||||||||
Article ID | CPE832 | ||||||||
Author Name(s) | B. Schulze1E. R. Madeira2 | ||||||||
Author Email(s) | schulze@lncc.br1 | ||||||||
Affiliation(s) | National Laboratory for Scientific Computing LNCC, Department of Computer Science, Av. Getúlio Vargas 333, 25651-070 Petrópolis, RJ, Brazil 1Institute of Computing/State University of Campinas - UNICAMP, P.O. Box 6176, 13083-970 Campinas, SP, Brazil 2 | ||||||||
Keyword(s) | middleware, Grid computing, agents, Web services, | ||||||||
Abstract |
|||||||||
Recent activities in Grid computing point in the direction of an open service architecture merging computational infrastructure with Grid services and Web services. We also see advantages in adding facilities offered by distributed object computing environments with their interoperable references and services. The use of open service architectures introduces the possibility of composing structures of nested services. In this paper we discuss architectural aspects of middleware for Grid computing based on an infrastructure of distributed clusters and/or also services, and an access portal as a demonstration project in progress. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Programming and coordinating Grid environments and applications | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 5 | ||||||||
Date | 04 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.833 | ||||||||
Article ID | CPE833 | ||||||||
Author Name(s) | Cristina Ururahy1Noemi Rodriguez2 | ||||||||
Author Email(s) | ururahy@inf.puc-rio.br1 | ||||||||
Affiliation(s) | Computer Science Department, PUC-Rio, Rua Marquês de São Vicente 225, 22453-900 Rio de Janeiro, RJ, Brazil 1Computer Science Department, PUC-Rio, Rua Marqu?s de S?o Vicente 225, 22453-900 Rio de Janeiro, RJ, Brazil 2 | ||||||||
Keyword(s) | distributed systems, parallel applications, Grid computing, interpreted languages, event-driven communication, middleware, coordination, dynamic adaptation, | ||||||||
Abstract |
|||||||||
The heterogeneous and dynamic nature of Grid environments place new demands on models and paradigms for parallel programming. In this work we discuss how ALua, a programming system based on a dual programming language model, can help the programmer to develop applications for this environment, monitoring the state of resources and controlling the application so that it adapts to changes in this state. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel program debugging by specification | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 6 | ||||||||
Date | 05 0 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.762 | ||||||||
Article ID | CPE762 | ||||||||
Author Name(s) | Simon Huband1Chris McDonald2 | ||||||||
Author Email(s) | huey@csse.uwa.edu.au1 | ||||||||
Affiliation(s) | School of Computer Science and Software Engineering, The University of Western Australia, Crawley, Western Australia 6009, Australia 1 2 | ||||||||
Keyword(s) | Message Passing Interface (MPI), debugging, process topologies, conformance testing, | ||||||||
Abstract |
|||||||||
Most message passing parallel programs employ logical process topologies with regular characteristics to support their computation. Since process topologies define the relationship between processes, they present an excellent opportunity for debugging. The primary benefit is that process behaviours can be correlated, allowing expected behaviour to be abstracted and identified, and undesirable behaviour reported. However, topology support is inadequate in most message passing parallel programming environments, including the popular Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM). Programmers are forced to implement topology support themselves, increasing the possibility of introducing errors. This paper proposes a trace- and topology-based approach to parallel program debugging, driven by four distinct types of specifications. Trace specifications allow trace data from a variety of sources and message passing libraries to be interpreted in an abstract manner, and topology specifications address the lack of explicit topology knowledge, whilst also facilitating the construction of user-consistent views of the debugging activity. Loop specifications express topology-consistent patterns of expected trace events, allowing conformance testing of associated trace data, and error specifications specify undesirable event interactions, including mismatched message sizes and mismatched communication pairs. Both loop and error specifications are simplified by having knowledge of the actual topologies being debugged. The proposed debugging framework enables a wealth of potential debugging views and techniques. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A parallel Broyden approach to the Toeplitz inverse eigenproblem | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 6 | ||||||||
Date | 05 0 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.764 | ||||||||
Article ID | CPE764 | ||||||||
Author Name(s) | Jesús Peinado1Antonio M. Vidal2 | ||||||||
Author Email(s) | jpeinado@dsic.upv.es1 | ||||||||
Affiliation(s) | Departamento de Sistemas Informéticos y Computación, Universidad Politécnica Valencia, Valencia 46071, Spain 1 2 | ||||||||
Keyword(s) | inverse eigenvalue problem, real symmetric Toeplitz matrices, parallel algorithms, Broyden method, SCALAPACK, | ||||||||
Abstract |
|||||||||
In this work we show a portable sequential and a portable parallel algorithm for solving the inverse eigenproblem for real symmetric Toeplitz matrices. Both algorithms are based on Broyden's method for solving nonlinear systems. We reduced the computational cost for some problem sizes, and furthermore we managed to reduce spatial cost considerably, compared in both cases with parallel algorithms proposed by other authors and by us, although sometimes quasi-Newton methods (as Broyden) do not reach convergence in all the test cases. We have implemented the parallel algorithm using the parallel numerical linear algebra library SCALAPACK based on the MPI environment. Experimental results have been obtained using two different architectures: a shared memory multiprocessor, the SGI PowerChallenge, and a cluster of Pentium II PCs connected through a myrinet network. The algorithms obtained are scalable in all the cases. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | User transparency: a fully sequential programming model for efficient data parallel image processing | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 6 | ||||||||
Date | 05 0 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.765 | ||||||||
Article ID | CPE765 | ||||||||
Author Name(s) | F. J. Seinstra1D. Koelma2 | ||||||||
Author Email(s) | fjseins@science.uva.nl1 | ||||||||
Affiliation(s) | Intelligent Sensory Information Systems, Faculty of Science, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands 1 2 | ||||||||
Keyword(s) | data parallel image processing, software architecture design, performance evaluation, | ||||||||
Abstract |
|||||||||
Although many image processing applications are ideally suited for parallel implementation, most researchers in imaging do not benefit from high-performance computing on a daily basis. Essentially, this is due to the fact that no parallelization tools exist that truly match the image processing researcher's frame of reference. As it is unrealistic to expect imaging researchers to become experts in parallel computing, tools must be provided to allow them to develop high-performance applications in a highly familiar manner. In an attempt to provide such a tool, we have designed a software architecture that allows transparent (i.e. sequential) implementation of data parallel imaging applications for execution on homogeneous distributed memory MIMD-style multicomputers. This paper presents an extensive overview of the design rationale behind the software architecture, and gives an assessment of the architecture's effectiveness in providing significant performance gains. In particular, we describe the implementation and automatic parallelization of three well-known example applications that contain many fundamental imaging operations: (1) template matching; (2) multi-baseline stereo vision; and (3) line detection. Based on experimental results we conclude that our software architecture constitutes a powerful and user-friendly tool for obtaining high performance in many important image processing research areas. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Editorial | ||||||||
Article Title | Special Issue: Formal Techniques for Java-like Programs | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 7 | ||||||||
Date | 06 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.797 | ||||||||
Article ID | CPE797 | ||||||||
Author Name(s) | . .1 | ||||||||
Author Email(s) | |||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Simple verification technique for complex Java bytecode subroutines | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 7 | ||||||||
Date | 06 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.798 | ||||||||
Article ID | CPE798 | ||||||||
Author Name(s) | Alessandro Coglio1 | ||||||||
Author Email(s) | coglio@kestrel.edu1 | ||||||||
Affiliation(s) | Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304, U.S.A. 1 | ||||||||
Keyword(s) | Java, subroutines, bytecode verification, | ||||||||
Abstract |
|||||||||
Java is normally compiled to bytecode, which is verified and then executed by the Java Virtual Machine. Bytecode produced via compilation must pass verification. The main cause of complexity for bytecode verification is subroutines, used by compilers to generate more compact code. The techniques to verify subroutines proposed in the literature reject certain programs produced by mundane compilers, are difficult to realize within an implementation of the Java Virtual Machine or are relatively complicated. This paper presents a novel technique which is very simple to understand, implement and prove sound. It is also very powerful: the set of accepted programs has a simple characterization which most likely includes all the code produced by current compilers and which enables future compilers to make more extensive use of subroutines. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Analysing the Java package/access concepts in Isabelle/HOL | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 7 | ||||||||
Date | 06 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.800 | ||||||||
Article ID | CPE800 | ||||||||
Author Name(s) | Norbert Schirmer1 | ||||||||
Author Email(s) | schirmer@in.tum.de1 | ||||||||
Affiliation(s) | Technical University of München, Department of Informatics, D-85748 Garching, Germany 1 | ||||||||
Keyword(s) | Java packages, Java access modifiers, theorem proving, type safety, | ||||||||
Abstract |
|||||||||
Java access modifiers and packages provide a mechanism to restrict access to members and types, as an additional means of information hiding beyond the purely object-oriented concept of classes. In this paper we clarify the semantics of access modifiers and packages by adding them to our formal model of Java in the theorem prover Isabelle/HOL. We analyse which properties we can rely on at runtime, provided that the program has passed the static accessibility tests. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Transposing F to C![]() |
||||||||
Volume ID | 16 | ||||||||
Issue ID | 7 | ||||||||
Date | 06 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.801 | ||||||||
Article ID | CPE801 | ||||||||
Author Name(s) | Andrew Kennedy1Don Syme2 | ||||||||
Author Email(s) | akenn@microsoft.com1 | ||||||||
Affiliation(s) | Microsoft Research, 7 J. J. Thomson Avenue, Cambridge CB3 OFB, U.K. 1 2 | ||||||||
Keyword(s) |
polymorphism, Java, C![]() |
||||||||
Abstract |
|||||||||
We present a type-preserving translation of System F (the polymorphic lambda calculus) into a forthcoming revision of the C![]() ![]() |
|||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Checking ownership and confinement | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 7 | ||||||||
Date | 06 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.799 | ||||||||
Article ID | CPE799 | ||||||||
Author Name(s) | James Noble1Alex Potanin2Robert Biddle3 | ||||||||
Author Email(s) | kjx@mcs.vuw.ac.nz1 | ||||||||
Affiliation(s) | School of Mathematical and Computing Sciences, Victoria University of Wellington, P.O. Box 600, Wellington, New Zealand 1 2 3 | ||||||||
Keyword(s) | aliasing, uniqueness, ownership, confinement, object-oriented programming, | ||||||||
Abstract |
|||||||||
A number of proposals to manage aliasing in Java-like programming languages have been advanced over the last five years. It is not clear how practical these proposals are, that is, how well they relate to the kinds of programs currently written in Java-like languages. To address this problem, we analysed heap snapshots from a corpus of Java programs. Our results indicate that object-oriented programs do in fact exhibit symptoms of encapsulation in practice, and that proposed models of uniqueness, ownership, and confinement can usefully describe the aliasing structures of object-oriented programs. Understanding the kinds of aliasing present in programs should help us to design formalisms to make explicit the kinds of aliasing implicit in object-oriented programs. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A parallel algorithm for static slicing of concurrent programs | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 8 | ||||||||
Date | 07 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.789 | ||||||||
Article ID | CPE789 | ||||||||
Author Name(s) | D. Goswami1R. Mall2 | ||||||||
Author Email(s) | dgoswami@iitg.ernet.in1 | ||||||||
Affiliation(s) | Department of Computer Science and Engineering, IIT Guwahati, India 1Department of Computer Science and Engineering, IIT Kharagpur, India 2 | ||||||||
Keyword(s) | program slicing, static slicing, process network, program representation, parallel algorithm, concurrent program, | ||||||||
Abstract |
|||||||||
Slicing of concurrent programs is a compute-intensive task. To speed up the slicing process, we have developed a parallel algorithm. For this purpose we used the concurrent control flow graph (CCFG) as the intermediate representation. We used a network of communicating processes to develop our parallel algorithm. We have implemented our parallel algorithm and the experimental results appear promising. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 8 | ||||||||
Date | 07 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.791 | ||||||||
Article ID | CPE791 | ||||||||
Author Name(s) | F. Desprez1F. Suter2 | ||||||||
Author Email(s) | frederic.desprez@ens-lyon.fr1 | ||||||||
Affiliation(s) | LIP, UMR CNRS, ENS Lyon, INRIA 5668, 46 allée d'Italie, F-69364 Lyon Cedex 07, France 1GRAIL, UCSD, 9500 Gilman Drive, MC 0114, La Jolla, CA 92093-0114, U.S.A. 2 | ||||||||
Keyword(s) | mixed-parallelism, Grid computing, Strassen, Winograd, ScaLAPACK, | ||||||||
Abstract |
|||||||||
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)-Server(s) model, where data can already be distributed on the platform. For each of those algorithms, we propose two mixed-parallel implementations. The former follows the phases of the original algorithms while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about heterogeneous and recursive versions of our algorithms. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Data structures in Java for matrix computations | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 8 | ||||||||
Date | 07 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.793 | ||||||||
Article ID | CPE793 | ||||||||
Author Name(s) | Geir Gundersen1Trond Steihaug2 | ||||||||
Author Email(s) | trond.steihaug@ii.uib.no2 | ||||||||
Affiliation(s) | Department of Informatics, University of Bergen, Norway 1 2 | ||||||||
Keyword(s) | Java, linear algebra, data structures, sparse matrices, | ||||||||
Abstract |
|||||||||
In this paper we show how to utilize Java's native arrays for matrix computations. The disadvantages of Java arrays used as a 2D array for dense matrix computation are discussed and ways to improve the performance are examined. We show how to create efficient dynamic data structures for sparse matrix computations using Java's native arrays. This data structure is unique for Java and shown to be more dynamic and efficient than the traditional storage schemes for large sparse matrices. Numerical testing indicates that this new data structure, called Java Sparse Array, is competitive with the traditional Compressed Row Storage scheme on matrix computation routines. Java gives increased flexibility without losing efficiency. Compared with other object-oriented data structures Java Sparse Array is shown to have the same flexibility. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Design and evaluation of a TOP100 Linux Super Cluster system | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 8 | ||||||||
Date | 07 0 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.787 | ||||||||
Article ID | CPE787 | ||||||||
Author Name(s) | Erik Elmroth1Niklas Edmundsson2Bo K?gstr?m3Markus M?rtensson4Mats Nyl?n5?ke Sandgren6Mattias Wadenstein7 | ||||||||
Author Email(s) | elmroth@cs.umu.se1 | ||||||||
Affiliation(s) | High Performance Computing Center North (HPC2N), Ume? University, SE-901 87 Ume?, Sweden 1 2 3 4 5 6 7 | ||||||||
Keyword(s) | Linux cluster, performance, benchmark, STREAM, Pallas, HPC2N, HP Linpack, NAS, AMD, SCI network, MPI, TOP500, | ||||||||
Abstract |
|||||||||
The High Performance Computing Center North (HPC2N) Super Cluster is a truly self-made high-performance Linux cluster with 240 AMD processors in 120 dual nodes, interconnected with a high-bandwidth, low-latency SCI network. This contribution describes the hardware selected for the system, the work needed to build it, important software issues and an extensive performance analysis. The performance is evaluated using a number of state-of-the-art benchmarks and software, including STREAM, Pallas MPI, the Atlas DGEMM, High-Performance Linpack and NAS Parallel benchmarks. Using these benchmarks we first determine the raw memory bandwidth and network characteristics; the practical peak performance of a single CPU, a single dual-node and the complete 240-processor system; and investigate the parallel performance for non-optimized dusty-deck Fortran applications. In summary, this $500 000 system is extremely cost-effective and shows the performance one would expect of a large-scale supercomputing system with distributed memory architecture. According to the TOP500 list of June 2002, this cluster was the 94th fastest computer in the world. It is now fully operational and stable as the main computing facility at HPC2N. The system's utilization figures exceed 90%, i.e. all 240 processors are on average utilized over 90% of the time, 24 hours a day, seven days a week. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Real-time primer design for DNA chips | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.810 | ||||||||
Article ID | CPE810 | ||||||||
Author Name(s) | H. Simmler1H. Singpiel2R. Männer3 | ||||||||
Author Email(s) | harald.simmler@acconovis.com1 | ||||||||
Affiliation(s) | Acconovis GmbH, Lindenhofstr. 42-44, 68163 Mannheim, Germany 1 2 University of Mannheim, B6, 23-29, 68131 Mannheim, Germany 3 | ||||||||
Keyword(s) | bioinformatics, primer design, microarray, DNA chips, FPGA processor, parallel architecture, | ||||||||
Abstract |
|||||||||
The design of PCR or DNA chip experiments is a time-consuming process where bioinformatics is extensively used. The selection of the primers, which are immobilized on the DNA chip, requires a complex algorithm. Based on several parameters an optimized set of primers is automatically determined for a given gene sequence. This paper describes a parallel architecture which performs the optimization of the primer selection on a hardware accelerator. In contrast to the pure software approach, the parallel architecture gains a speedup of factor 500 using a PCI-based hardware accelerator. This approach allows an optimization of a specified primer set in real time. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Development and implementation of a parallel algorithm for the fast design of oligonucleotide probe sets for diagnostic DNA microarrays | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.811 | ||||||||
Article ID | CPE811 | ||||||||
Author Name(s) | H. Meier1A. Krause2M. Kräutner3A. Bode4 | ||||||||
Author Email(s) | meierh@in.tum.de1 | ||||||||
Affiliation(s) | Lehrstuhl fuer Rechnertechnik und Rechnerorganisation, Parallelrechnerarchitektur (LRR-TUM), Institut fuer Informatik, Technische Universitaet Muenchen, Boltzmannstrasse 3, 85748 Garching, Germany 1 2 3 4 | ||||||||
Keyword(s) | DNA microarray, bioinformatic algorithm, molecular sequence analysis, parallel and distributed computing, oligonucleotide probe, combinatorial, DNA chip, | ||||||||
Abstract |
|||||||||
We describe an accurate method for the automatic parallel generation of oligonucleotide probe sets for DNA microarrays. This approach includes a component for high-performance specificity evaluation of designed probes in large data sets. The three main algorithmic components of the method, namely probe preselection, hybridization prediction and probe selection are explained in detail. We introduce new combinatorial techniques for the efficient selection of probe sets of high differentiation capability even from sequence databases of conserved homologous genes. These techniques include the automatic generation of group specific probes as well as the design of excluding probes. A basic prototype has been implemented including a shared memory parallelization. Test runs have been performed on a multiprocessor personal computer with subsets of a small subunit ribosomal ribonucleic acid database, containing very conserved sequence data. The applicability of our program is pointed out by designing a set of oligonucleotide probes that shall allow a comprehensive parallel identification and differentiation of several groups of extremophilic prokaryotes by DNA microarray. The probe set is accessible via the Internet. On applying the parallel version on a dual processor system an efficiency of 80% was achieved. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A hybrid self-organizing maps and particle swarm optimization approach | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.812 | ||||||||
Article ID | CPE812 | ||||||||
Author Name(s) | Xiang Xiao1Ernst R. Dow2Russell Eberhart3Zina Ben Miled4Robert J. Oppelt5 | ||||||||
Author Email(s) | zmiled@iupui.edu4 | ||||||||
Affiliation(s) | Indiana University Purdue University Indianapolis, 723 West Michigan St., SL 160C, Indianapolis, IN 46202-5132, U.S.A. 1 2 3 4 5 | ||||||||
Keyword(s) | gene clustering, self-organizing maps, particle swarm optimization, | ||||||||
Abstract |
|||||||||
Gene clustering, the process of grouping related genes in the same cluster, is at the foundation of different genomic studies that aim at analyzing the function of genes. Microarray technologies have made it possible to measure gene expression levels for thousands of genes simultaneously. For knowledge to be extracted from the datasets generated by these technologies, the datasets have to be presented to a scientist in a meaningful way. Gene clustering methods serve this purpose. In this paper, a hybrid clustering approach that is based on self-organizing maps and particle swarm optimization is proposed. In the proposed algorithm, the rate of convergence is improved by adding a conscience factor to the self-organizing maps algorithm. The robustness of the result is measured by using a resampling technique. The algorithm is implemented on a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A specialized hardware device for the protein similarity search | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.813 | ||||||||
Article ID | CPE813 | ||||||||
Author Name(s) | A. Marongiu1P. Palazzari2V. Rosato3 | ||||||||
Author Email(s) | palazzari@casaccia.enea.it2 | ||||||||
Affiliation(s) | ENEA, Computing and Modeling Unit, C. R. Casaccia, Via Anguillarese 301, SP111 - 00060 S. Maria di Galeria, Rome, Italy 1 2 3 | ||||||||
Keyword(s) | biological sequence analysis, FPGA, systolic architecture, System of Affine Recurrence Equations, | ||||||||
Abstract |
|||||||||
This work presents the architecture of PROSIDIS, a special purpose processor designed to search for the occurrence of substrings similar to a given ‘template string’ within a proteome. Similarity is determined by means of a weighting matrix which measures the degree of similarity between any two amino acids. The paper recalls the basis of the PHG (Parallel Hardware Generator) tool, developed in the framework of the HADES project (Hardware Design for Scientific Applications). Through PHG it is possible to design, in a nearly automatic way, a parallel hardware which efficiently implements an algorithm described through recurrence equations; PHG is able to generate a synthesizable VHDL which represents a parallel system derived from the System of Affine Recurrence Equations (SARE) describing the problem. In this work we illustrate the advantages derived from designing a special purpose processor to face the protein similarity discovery problem. Some preliminary results are given, reporting the time spent by several conventional computing architectures and by the PROSIDIS processor hosted by a personal computer to solve the same protein analysis problem. The results show that PROSIDIS, implemented on a Xilinx XV1000 FPGA, gives speedup figures ranging from 5.6 up to 55.6. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallelization of IBD computation for determining genetic disease maps | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.814 | ||||||||
Article ID | CPE814 | ||||||||
Author Name(s) | Nouhad J. Rizk1 | ||||||||
Author Email(s) | nrizk@ndu.edu.lb1 | ||||||||
Affiliation(s) | Computer Science Department, Notre Dame University, P.O. Box 72, Zouk-Mosbeh, Lebanon 1 | ||||||||
Keyword(s) | genetic linkage analysis, Identity by Descent, parallel computing, | ||||||||
Abstract |
|||||||||
A number of software packages are available for the construction of comprehensive human genetic maps. In this paper we parallelize the widely used package Genehunter. We restrict our attention to only one function of the package, namely the computations of Identity By Descent (IBD) genes of a family. We use a master-slave model with the Message Passing Interface parallel environment. Our tests are done on two different architectures: a network of workstations and a shared memory multiprocessor. A new and efficient strategy to classify the parallelization of genetic linkage analysis programs results from our experiments. The classification is based on values of parameters which affect the complexity of the computation. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Development of distributed bioinformatics applications with GMP | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.815 | ||||||||
Article ID | CPE815 | ||||||||
Author Name(s) | Bertil Schmidt1Lin Feng2Amey Laud3Yusdi Santoso4 | ||||||||
Author Email(s) | asbschmidt@ntu.edu.sg1 | ||||||||
Affiliation(s) | School of Computer Engineering, Nanyang Technological University, Singapore 639798 1 2 Helixense Pte Ltd, 73 Science Park Drive, 02-05 Science Park 1, Singapore 118254 3 4 | ||||||||
Keyword(s) | message passing interfaces, parallel architectures for biological applications, computational genomics, bioinformatics systems development, | ||||||||
Abstract |
|||||||||
We present the design and use of gMP as a tool for developing distributed bioinformatics applications. gMP is a purely Java-based interface that adds MPI-like message-passing and collective communication to the genomics Research Network Architecture (gRNA). The gRNA is a highly programmable, modular environment specifically designed to invigorate the development of genome-centric tools for life science research. We demonstrate the development of a distributed application to detect regulatory elements using correlation with gene expression data. Its implementation with gMP leads to significant runtime savings on our distributed gRNA system. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Parallel divide and conquer approach for the protein threading problem | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.816 | ||||||||
Article ID | CPE816 | ||||||||
Author Name(s) | Nicola Yanev1Rumen Andonov2 | ||||||||
Author Email(s) | randonov@univ-valenciennes.fr2 | ||||||||
Affiliation(s) | University of Sofia, 5 J. Bouchier str., 1126 Sofia, Bulgaria 1LAMIH/ROI, University of Valenciennes, Le Mont Houy, 59313 Valenciennes Cedex 9, France 2 | ||||||||
Keyword(s) | protein threading problem formulations, network-flow optimization, integer programming, | ||||||||
Abstract |
|||||||||
For the Protein Threading Problem (PTP) we propose a network-flow like formulation. We also show its equivalence with a generalization of the shortest path problem on a graph with a very particular structure. The underlying Mixed Integer Programming (MIP) model proves to be very appropriate for the PTP-large real-life instances have been solved much faster by using only the MIP solver CPLEX instead of a known special-purpose branch and bound algorithm. The properties of the MIP model permit a decomposition of the main problem into a large number of subproblems (tasks). We show that a branch and cut strategy can be efficiently applied for solving these tasks in a parallel manner, which leads to a significant reduction in the total running time. Computational experiments with large problem instances are presented. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | The AxML program family for maximum likelihood-based phylogenetic tree inference | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.817 | ||||||||
Article ID | CPE817 | ||||||||
Author Name(s) | Alexandros P. Stamatakis1Thomas Ludwig2Harald Meier3 | ||||||||
Author Email(s) | stamatak@cs.tum.edu1 | ||||||||
Affiliation(s) | Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstr. 3, D-85748 Garching b. München, Germany 1Ruprecht-Karls-Universität Heidelberg, Institut für Informatik, Im Neuenheimer Feld 348, D-69120 Heidelberg, Germany 2 3 | ||||||||
Keyword(s) | phylogenetic trees, maximum likelihood, parallel and distributed computing, | ||||||||
Abstract |
|||||||||
Inference of phylogenetic (evolutionary) trees comprising hundreds or thousands of organisms based on the maximum likelihood criterion is a computationally extremely intensive task. This paper describes the evolution of the AxML program family which provides novel algorithmic as well as technical solutions for the maximum likelihood-based inference of huge phylogenetic trees. Algorithmic optimizations and a new tree building algorithm yield runtime improvements of a factor greater than 4 compared with fastDNAml and parallel fastDNAml, returning equally good trees at the same time. Various parallel, distributed, and Grid-based implementations of AxML give the program the capability to acquire the large amount of required computational resources for the inference of huge high-quality trees. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Sequence alignment on the Cray MTA-2 | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.808 | ||||||||
Article ID | CPE808 | ||||||||
Author Name(s) | Shahid H. Bokhari1Jon R Sauer2 | ||||||||
Author Email(s) | shb@acm.org1 | ||||||||
Affiliation(s) | Department of Electrical Engineering, University of Engineering&Technology, Lahore 54890, Pakistan 12Eagle Research&Development, 1898 S. Flatiron Court, Suite 128, Boulder, CO 80301, U.S.A 2 | ||||||||
Keyword(s) | bioinformatics, sequence alignment, DNA, exact match, approximate matching, dynamic programming, multithreading, full/empty bits, parallel processing, MTA-2, | ||||||||
Abstract |
|||||||||
Several variants of standard algorithms for DNA sequence alignment have been implemented on the Cray Multithreaded Architecture-2 (MTA-2). We describe the architecture of the MTA-2 and discuss how its hardware and software enable efficient implementation of parallel algorithms with little or no regard for issues of partitioning, mapping or scheduling. We describe how we ported variants of the naive algorithm for exact alignment and the dynamic programming algorithm for approximate alignment to the MTA-2 and provide detailed performance measurements. It is shown that, for the dynamic programming algorithm, the use of the MTA's Full/Empty synchronization bits leads to almost perfect speedup for large problems on one to eight processors. These results illustrate the versatility of the MTA's architecture and demonstrate its potential for providing a high-productivity platform for parallel processing. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Using hybrid alignment for iterative sequence database searches | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.809 | ||||||||
Article ID | CPE809 | ||||||||
Author Name(s) | Ralf Bundschuh1Yuheng Li2Mario Lauria3 | ||||||||
Author Email(s) | bundschuh@mps.ohio-state.edu1 | ||||||||
Affiliation(s) | Department of Physics, The Ohio State University, 174 West 18th Ave., Columbus, OH 43210, U.S.A 1Department of Computer and Information Science, The Ohio State University, 2015 Neil Ave. #395, Columbus, OH 43210, U.S.A. 2 3 | ||||||||
Keyword(s) | sequence alignment, hybrid algorithm, length correction, PSI-BLAST, MPI, | ||||||||
Abstract |
|||||||||
Progressive sequence model refinement by means of iterative searches is an effective technique for high sensitivity database searches and is currently employed in popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. Here, we demonstrate that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. In addition, we evaluate the accuracy of two proposed approaches to edge effect correction in short sequence alignment statistics that turns out to be one of the crucial issues in developing a hybrid-alignment based version of PSI-BLAST. Copyright ? 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Editorial | ||||||||
Article Title | Special Issue: High Performance Computational Biology | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 9 | ||||||||
Date | 08 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.865 | ||||||||
Article ID | CPE865 | ||||||||
Author Name(s) | David A. Bader1Srinivas Aluru2 | ||||||||
Author Email(s) | |||||||||
Affiliation(s) | 1 2 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A PC cluster system employing IEEE 1394 | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 10 | ||||||||
Date | 08 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.792 | ||||||||
Article ID | CPE792 | ||||||||
Author Name(s) | Kazuki Hyoudou1Ryota Ozaki2Yasuichi Nakayama3 | ||||||||
Author Email(s) | hyoudo-k@igo.cs.uec.ac.jp1 | ||||||||
Affiliation(s) | Department of Computer Science, The University of Electro-Communications, Chofu, Tokyo 182-8585, Japan 1 2 3 | ||||||||
Keyword(s) | cluster computing, IEEE 1394, communication libraries, performance evaluation, | ||||||||
Abstract |
|||||||||
In this paper, we describe the design and evaluation of a PC cluster system in which IEEE 1394 is applied. Networks for parallel cluster computing require low latency and high bandwidth. It is also important that the networks be commercially available at low cost. Few network devices satisfy all of the above requirements. However, the IEEE 1394 standard provides a good compromise for fulfilling these requirements. We have used IEEE 1394 devices, which support a 400 Mbps data transfer rate, to connect the nodes of a PC cluster system which we have designed and implemented. We have implemented two communication libraries. One is a fast communication library called CF for IEEE 1394. The other is a MPI layer library on the CF library. Experimental results show that CF achieves a 17.2 microsecond round-trip time. On application benchmarks, the system was considerably faster than TCP/IP over Fast Ethernet. Even though the system was constructed at very low cost, it provides good performance. Using the IEEE 1394 standard is thus a good solution for low-cost cluster systems. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Linda implementations in Java for concurrent systems | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 10 | ||||||||
Date | 08 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.794 | ||||||||
Article ID | CPE794 | ||||||||
Author Name(s) | G. C. Wells1A. G. Chalmers2P. G. Clayton3 | ||||||||
Author Email(s) | g.wells@ru.ac.za1 | ||||||||
Affiliation(s) | Department of Computer Science, Rhodes University, Grahamstown 6140, South Africa 1Department of Computer Science, University of Bristol, Merchant Venturers Building, Bristol BS8 1UB, U.K. 2 3 | ||||||||
Keyword(s) | Linda, Java, associative matching, | ||||||||
Abstract |
|||||||||
This paper surveys a number of the implementations of Linda that are available in Java. It provides some discussion of their strengths and weaknesses, and presents the results from benchmarking experiments using a network of commodity workstations. Some extensions to the original Linda programming model are also presented and discussed, together with examples of their application to parallel processing problems. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | The J2EE ECperf benchmark results: transient trophies or technology treasures? | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 10 | ||||||||
Date | 08 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.795 | ||||||||
Article ID | CPE795 | ||||||||
Author Name(s) | Paul Brebner1Jeffrey Gosper2 | ||||||||
Author Email(s) | paul.brebner@csiro.au1 | ||||||||
Affiliation(s) | CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, Australia 1 2 | ||||||||
Keyword(s) | J2EE, ECperf, Enterprise Java Beans (EJB), benchmarks, performance, scalability, throughput estimation, | ||||||||
Abstract |
|||||||||
ECperf, the widely recognized industry standard J2EE benchmark, has attracted a large number of results submissions and their subsequent publications. However, ECperf places little restriction on the hardware platforms, operating systems and databases utilized in the benchmarking process. This, combined with the existence of only two primary metrics, makes it difficult to accurately compare the performance of the Application Server products themselves. By mining the full-disclosure archives for trends and correlations, we have discovered that J2EE technology is very scalable both in a scale-up and scale-out manner. Other observed trends include a linear correlation between middle-tier total processing power and throughput, as well as between J2EE Application Server license costs and throughput. However, the results clearly indicate that there is an increasing cost per user with increasing capacity systems and scale-up is proportionately more expensive than scale-out. Finally, the correlation between middle-tier processing power and throughput, combined with results obtained from a different ‘lighter-weight’ benchmark, facilitates an estimate of throughput for different types of J2EE applications. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Performance | ||||||||
Article Title | The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600 | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 10 | ||||||||
Date | 08 25 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.796 | ||||||||
Article ID | CPE796 | ||||||||
Author Name(s) | Glenn R. Luecke1Silvia Spanoyannis2Marina Kraeva3 | ||||||||
Author Email(s) | grl@iastate.edu1 | ||||||||
Affiliation(s) | Iowa State University, Ames, IA 50011-2251, U.S.A. 1 2 3 | ||||||||
Keyword(s) | parallel computers, MPI-2 one-sided routines, SHMEM routines, SGI Origin 2000, Cray T3E-600, MPI-2 library, SHMEM library, | ||||||||
Abstract |
|||||||||
This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns with low contention (accessing distant messages, a circular right shift, a binary tree broadcast) to communication patterns with high contention (a ‘naive’ broadcast and an all-to-all). For all the tests and for small message sizes, the SHMEM implementation significantly outperformed the MPI-2 implementation for both the SGI Origin 2000 and Cray T3E-600. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Augmenting LZ-77 with authentication and integrity assurance capabilities | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 11 | ||||||||
Date | 09 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.804 | ||||||||
Article ID | CPE804 | ||||||||
Author Name(s) | Mikhail J. Atallah1Stefano Lonardi2 | ||||||||
Author Email(s) | stelo@cs.ucr.edu2 | ||||||||
Affiliation(s) | CERIAS and Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, U.S.A. 1Department of Computer Science and Engineering, University of California, Riverside, CA 92521, U.S.A. 2 | ||||||||
Keyword(s) | authentication, integrity, data compression, LZ-77, fragile watermark, | ||||||||
Abstract |
|||||||||
The formidable dissemination capability allowed by the current network technology makes it increasingly important to devise new methods to ensure authenticity and integrity. Nowadays it is common practice to distribute documents in compressed form. In this paper, we propose a simple variation on the classic LZ-77 algorithm that allows one to hide, within the compressed document, enough information to warrant its authenticity and integrity. The design is based on the unpredictability of a certain class of pseudo-random number generators, in such a way that the hidden data cannot be retrieved in a reasonable amount of time by an attacker (unless the secret bit-string key is known). Since it can still be decompressed by the original LZ-77 algorithm, the embedding is completely ‘transparent’ and backward-compatible, making it possible to deploy it without disrupting service. Experiments show that the degradation in compression due to the embedding is almost negligible. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Access-controlled resource discovery in pervasive networks | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 11 | ||||||||
Date | 09 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.806 | ||||||||
Article ID | CPE806 | ||||||||
Author Name(s) | Sanjay Raman1Dwaine Clarke2Matt Burnside3Srinivas Devadas4Ronald Rivest5 | ||||||||
Author Email(s) | sraman@abp.lcs.mit.edu1 | ||||||||
Affiliation(s) | Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. 1 2 3 4 5 | ||||||||
Keyword(s) | access control, access control list, pervasive computing, intentional naming network, public key infrastructure, | ||||||||
Abstract |
|||||||||
Networks of the future will be characterized by a variety of computational devices that display a level of dynamism not seen in traditional wired networks. Because of the dynamic nature of these networks, resource discovery is one of the fundamental problems that must be solved. While resource discovery systems are not a novel concept, securing these systems in an efficient and scalable way is challenging. This paper describes the design and implementation of an architecture for access-controlled resource discovery. This system achieves this goal by integrating access control with the Intentional Naming System (INS), a resource discovery and service location system. The integration is scalable, efficient, and fits well within a proxy-based security framework designed for dynamic networks. We provide performance experiments that show how our solution outperforms existing schemes. The result is a system that provides secure, access-controlled resource discovery that can scale to large numbers of resources and users. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Editorial | ||||||||
Article Title | Special Issue: Computer Security | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 11 | ||||||||
Date | 09 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.803 | ||||||||
Article ID | CPE803 | ||||||||
Author Name(s) | Giampaoli Bella1Ronaldo Menezes2 | ||||||||
Author Email(s) | |||||||||
Affiliation(s) | 1 2 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Identification and authentication of integrated circuits | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 11 | ||||||||
Date | 09 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.805 | ||||||||
Article ID | CPE805 | ||||||||
Author Name(s) | Blaise Gassend1Daihyun Lim2Dwaine Clarke3Srinivas Devadas4Marten van Dijk5 | ||||||||
Author Email(s) | blaise@gassend.com1 | ||||||||
Affiliation(s) | 1Research Lab for Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A 1Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A. 2 3 4 Prof Holstlaan 4, Eindhoven, The Netherlands 5 | ||||||||
Keyword(s) | authentication, identification, physical random function, physical security, smartcard, tamper resistance, unclonability, | ||||||||
Abstract |
|||||||||
This paper describes a technique to reliably and securely identify individual integrated circuits (ICs) based on the precise measurement of circuit delays and a simple challenge-response protocol. This technique could be used to produce key-cards that are more difficult to clone than ones involving digital keys on the IC. We consider potential venues of attack against our system, and present candidate implementations. Experiments on Field Programmable Gate Arrays show that the technique is viable, but that our current implementations could require some strengthening before it can be considered as secure. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A role-based infrastructure management system: design and implementation | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 11 | ||||||||
Date | 09 0 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.807 | ||||||||
Article ID | CPE807 | ||||||||
Author Name(s) | Dongwan Shin1Gail-Joon Ahn2Sangrae Cho3Seunghun Jin4 | ||||||||
Author Email(s) | doshin@uncc.edu1 | ||||||||
Affiliation(s) | Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223, U.S.A. 1 2 Department of Information Security System, Electronics and Telecommunications Research Institute, Taejon, 305-350, South Korea 3 4 | ||||||||
Keyword(s) | role-based access control, role management, role engineering, role administration, authorization infrastructure, | ||||||||
Abstract |
|||||||||
Over the last decade there has been a tremendous advance in the theory and practice of role-based access control (RBAC). One of the most significant aspects of RBAC can be viewed from its management of permissions on the basis of roles rather than individual users. Consequently, it reduces administrative costs and potential errors. The management of roles in various RBAC implementations, however, tends to be conducted on an ad hoc basis, closely coupled with a certain context of system environments. This paper discusses the development of a system whose purpose is to help manage a valid set of roles with assigned users and permissions for role-based authorization infrastructures. We have designed and implemented the system, called RolePartner. This system enables role administrators to build and configure various components of a RBAC model so as to embody organizational access control policies which can be separated from different enforcement mechanisms. Hence the system helps make it possible to lay a foundation for role-based authorization infrastructures. Three methodological constituents are introduced for our purposes, together with the design and implementation issues. The system has a role-centric view for easily managing constrained and hierarchical roles as well as assigned users and permissions. An LDAP-accessible directory service was used for a role database. We show that the system can be seamlessly integrated with an existing privilege-based authorization infrastructure. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Performance | ||||||||
Article Title | Parallel bandwidth characteristics calculations for thin avalanche photodiodes on a SGI Origin 2000 supercomputer | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 12 | ||||||||
Date | 10 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.834 | ||||||||
Article ID | CPE834 | ||||||||
Author Name(s) | Yi Pan1Constantinos S. Ierotheou2Majeed M. Hayat3 | ||||||||
Author Email(s) | pan@cs.gsu.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, Georgia State University, Atlanta, GA 30303, U.S.A. 1Parallel Processing Research Group, University of Greenwich, London SE10 9LS, U.K. 2Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM 87131-1356, U.S.A. 3 | ||||||||
Keyword(s) | numerical modeling, optimization, MPI, OpenMP, efficiency, speedup, scalability, SGI Origin 2000, | ||||||||
Abstract |
|||||||||
p i n | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Finding stale-value errors in concurrent programs | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 12 | ||||||||
Date | 10 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.866 | ||||||||
Article ID | CPE866 | ||||||||
Author Name(s) | Michael Burrows1K. Rustan M. Leino2 | ||||||||
Author Email(s) | leino@microsoft.com2 | ||||||||
Affiliation(s) | Google Inc., Bayshore Parkway, Mountain View, CA 94043, U.S.A. 1Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A. 2 | ||||||||
Keyword(s) | static checking, data races, race conditions, concurrency errors, error checking, stale values, | ||||||||
Abstract |
|||||||||
Concurrent programs can suffer from many types of errors, not just the well-studied problems of deadlocks and simple race conditions on variables. This paper addresses a kind of race condition that arises from reading a variable whose value is possibly out of date. The paper introduces a simple technique for detecting such stale values, and reports on the encouraging experience with a compile-time checker that uses the technique. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Efficient parallel implementations of near Delaunay triangulation with High Performance Fortran | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 12 | ||||||||
Date | 10 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.802 | ||||||||
Article ID | CPE802 | ||||||||
Author Name(s) | Min-Bin Chen1Tyng-Ruey Chuang2Jan-Jan Wu3 | ||||||||
Author Email(s) | cmb@mail.ckitc.edu.tw1 | ||||||||
Affiliation(s) | Department of Management Information Systems, Chung Kuo Institute of Technology, Mucha, Taipei 116, Taiwan 1Institute of Information Science, Academia Sinica, Nankang, Taipei 115, Taiwan 2 3 | ||||||||
Keyword(s) | Delaunay triangulation, parallel computation, | ||||||||
Abstract |
|||||||||
Unstructured mesh generation exposes highly irregular computation patterns, which imposes a challenge in implementing triangulation algorithms on parallel machines. This paper reports on an efficient parallel implementation of near Delaunay triangulation with High Performance Fortran (HPF). Our algorithm exploits embarrassing parallelism by performing sub-block triangulation and boundary merge independently at the same time. The sub-block triangulation is a divide & conquer Delaunay algorithm known for its sequential efficiency, and the boundary triangulation is an incremental construction algorithm with low overhead. Compared with prior work, our parallelization method is both simple and efficient. In the paper, we also describe a solution to the collinear points problem that usually arises in large data sets. Our experiences with the HPF implementation show that with careful control of the data distribution, we are able to parallelize the program using HPF's standard directives and extrinsic procedures. Experimental results on several parallel platforms, including an IBM SP2 and a DEC Alpha farm, show that a parallel efficiency of 42-86% can be achieved for an eight-node distributed memory system. We also compare efficiency of the HPF implementation with that of a similarly hand-coded MPI implementation. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | An order-based algorithm for implementing multiparty synchronization | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 12 | ||||||||
Date | 10 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.903 | ||||||||
Article ID | CPE903 | ||||||||
Author Name(s) | Jos? A. P?rez1Rafael Corchuelo2Miguel Toro3 | ||||||||
Author Email(s) | jperez@lsi.us.es1 | ||||||||
Affiliation(s) | The Distributed Group, University of Seville, Spain 1 2 3 | ||||||||
Keyword(s) | multiparty synchronization, coordination, concurrency control, | ||||||||
Abstract |
|||||||||
Multiparty interactions are a powerful mechanism for coordinating several entities that need to cooperate in order to achieve a common goal. In this paper, we present an algorithm for implementing them that improves on previous results in that it does not require the whole set of entities or interactions to be known at compile- or run-time, and it can deal with both terminating and non-terminating systems. We also present a comprehensive simulation analysis that shows how sensitive to changes our algorithm is, and compare the results with well-known proposals by other authors. This study proves that our algorithm still performs comparably to other proposals in which the set of entities and interactions is known beforehand, but outperforms them in some situations that are clearly identified. In addition, these results prove that our algorithm can be combined with a technique called synchrony loosening without having an effect on efficiency. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Supporting Bulk Synchronous Parallelism with a high-bandwidth optical interconnect | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.754 | ||||||||
Article ID | CPE754 | ||||||||
Author Name(s) | I. Gourlay1P. M. Dew2K. Djemame3J. F. Snowdon4G. Russell5 | ||||||||
Author Email(s) | iain@comp.leeds.ac.uk1 | ||||||||
Affiliation(s) | Informatics Research Institute, University of Leeds, Leeds LS2 9JT, U.K. 1 2 3 Physics Department, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, U.K. 4 5 | ||||||||
Keyword(s) | parallel, BSP, optical interconnect, sorting, | ||||||||
Abstract |
|||||||||
The list of applications requiring high-performance computing resources is constantly growing. The cost of inter-processor communication is critical in determining the performance of massively parallel computing systems for many of these applications. This paper considers the feasibility of a commodity processor-based system which uses a free-space optical interconnect. A novel architecture, based on this technology, is presented. Analytical and simulation results based on an implementation of BSP (Bulk Synchronous Parallelism) are presented, indicating that a significant performance enhancement, over architectures using conventional interconnect technology, is possible. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Towards a more realistic comparative analysis of multicomputer networks | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.755 | ||||||||
Article ID | CPE755 | ||||||||
Author Name(s) | H. Sarbazi-Azad1M. Ould-Khaoua2L. M. Mackenzie3 | ||||||||
Author Email(s) | azad@sharif.edu1 | ||||||||
Affiliation(s) | Department of Computer Engineering, Sharif University of Technology, and School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran 1Department of Computing Science, University of Glasgow, Glasgow, U.K. 2 3 | ||||||||
Keyword(s) | interconnection networks, k n, torus, hypercube, wormhole routing, fully-adaptive routing, hotspot, performance comparison, | ||||||||
Abstract |
|||||||||
k n | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | RF-MVTC: an efficient risk-free multiversion concurrency control algorithm | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.756 | ||||||||
Article ID | CPE756 | ||||||||
Author Name(s) | Azzedine Boukerche1Terry Tuck2 | ||||||||
Author Email(s) | boukerche@site.uottawa.ca1 | ||||||||
Affiliation(s) | SITE, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5 1Department of Computer Science, University of North of Texas, Denton, TX 76203, U.S.A. 2 | ||||||||
Keyword(s) | concurrency, risk-free multiversion, databases, optimistic distributed simulations, rollbacks, cluster of workstations, multiversion timestamp ordering, | ||||||||
Abstract |
|||||||||
In this paper, we focus on the temporary return of data values that are incorrect for given transactional semantics that could have catastrophic effects similar to those in parallel and discrete event simulation systems. In many applications using online transactions processing environments, for instance, it is best to delay the response to a transaction's read request until it is either known or unlikely that a write message from an older update transaction will not make the response incorrect. Examples of such applications are those where aberrant behavior is too costly, and those in which precommitted data is visible to some reactive entity. In light of the avoidance of risk in this approach, we propose a risk-free multiversion temporally correct concurrency-control algorithm. We discuss the algorithm, its implementation and report on the performance results of simulation models using a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | On the scalability of many-to-many reliable multicast sessions | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.757 | ||||||||
Article ID | CPE759 | ||||||||
Author Name(s) | W. Y. Yoon1D. Lee2H. Y. Youn3 | ||||||||
Author Email(s) | wyyoon@sait.samsung.co.kr1 | ||||||||
Affiliation(s) | Samsung Advanced Institute of Technology, South Korea 1School of Engineering, Information and Communications University, South Korea 2School of Information and Communication Engineering, Sungkyunkwan University, South Korea 3 | ||||||||
Keyword(s) | reliable multicast, many-to-many session, exposure, implosion, | ||||||||
Abstract |
|||||||||
Even though tree-based reliable multicast protocols are known to be most scalable for one-to-many sessions, there is still an open question as to whether these protocols are also scalable for many-to-many sessions. In this paper, we analyze and compare two promising multicast protocols - the receiver-initiated protocol with NACK suppression and the tree-based protocol - using a new spatial loss model. The proposed model considers the correlation of packet loss events for more realistic analysis unlike the previous work. The analysis results show that the tree-based protocol achieves much higher throughput than the receiver-initiated protocol for a many-to-many session as the number of participants in the session becomes larger. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Performance evaluation of a random-walk-based algorithm for embedding dynamically evolving trees in hypercubic networks | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.758 | ||||||||
Article ID | CPE758 | ||||||||
Author Name(s) | Keqin Li1 | ||||||||
Author Email(s) | li@mcs.newpaltz.edu1 | ||||||||
Affiliation(s) | Department of Computer Science, State University of New York, New Paltz, NY 12561-2499, U.S.A. 1 | ||||||||
Keyword(s) | asymmetric network, cube-connected cycles, de Bruijn graph, hypercube, hypercubic network, performance evaluation, random tree, randomized tree embedding, recurrence relation, shuffle-exchange network, symmetric network, wrapped butterfly, | ||||||||
Abstract |
|||||||||
In many tree-structured parallel computations, the size and structure of a tree that represents a parallel computation is unpredictable at compile-time; the tree evolves gradually during the course of the computation. When such a computation is performed on a static network, the dynamic tree embedding problem is to distribute the tree nodes to the processors of the network such that all the processors receive roughly the same amount of load and that communicating nodes are assigned to neighboring processors. Furthermore, when a new tree node is generated, it should be immediately assigned to a processor for execution without any information on the further evolving of the tree; and load distribution is performed by all processors in a totally distributed fashion. We study the problem of embedding dynamically evolving trees in hypercubic networks, including shuffle-exchange, de Bruijn, cube-connected cycles, wrapped butterfly, and hypercube networks. The performance of a random-walk-based randomized tree embedding algorithm is evaluated. Several random tree models are considered. We develop recurrence relations for analyzing the performance of embedding of complete-tree-based random trees and randomized complete trees, and linear systems of equations for reproduction trees. We present more efficient recurrence relations and linear systems of equations for symmetric networks. We also demonstrate extensive numerical data of the performance ratio and make a number of interesting observations of randomized tree embedding in the five hypercubic networks. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A performance study of job management systems | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.753 | ||||||||
Article ID | CPE753 | ||||||||
Author Name(s) | Tarek El-Ghazawi1Kris Gaj2Nikitas Alexandridis3Frederic Vroman4Nguyen Nguyen5Jacek R. Radzikowsk6Preeyapong Samipagdi7Suboh A. Suboh8 | ||||||||
Author Email(s) | tarek@seas.gwu.edu1 | ||||||||
Affiliation(s) | ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A 1ECE Department, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A. 2ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A. 3 4 5 6 7 8 | ||||||||
Keyword(s) | Job Management Systems, queueing systems, performance evaluation, benchmarking, Grid computing, | ||||||||
Abstract |
|||||||||
Job Management Systems (JMSs) efficiently schedule and monitor jobs in parallel and distributed computing environments. Therefore, they are critical for improving the utilization of expensive resources in high-performance computing systems and centers, and an important component of Grid software infrastructure. With many JMSs available commercially and in the public domain, it is difficult to choose an optimum JMS for a given computing environment. In this paper, we present the results of the first empirical study of JMSs reported in the literature. Four commonly used systems, LSF, PBS Pro, Sun Grid Engine/CODINE, and Condor were considered. The study has revealed important strengths and weaknesses of these JMSs under different operational conditions. For example, LSF was shown to exhibit excellent throughput for a wide range of job types and submission rates. Alternatively, CODINE appeared to outperform other systems in terms of the average turn-around time for small jobs, and PBS appeared to excel in terms of turn-around time for relatively larger jobs. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | On the scalability of many-to-many reliable multicast sessions | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 0 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.759 | ||||||||
Article ID | CPE759 | ||||||||
Author Name(s) | W. Y. Yoon1D. Lee2H. Y. Youn3 | ||||||||
Author Email(s) | wyyoon@sait.samsung.co.kr1 | ||||||||
Affiliation(s) | Samsung Advanced Institute of Technology, South Korea 1School of Engineering, Information and Communications University, South Korea 2School of Information and Communication Engineering, Sungkyunkwan University, South Korea 3 | ||||||||
Keyword(s) | reliable multicast, many-to-many session, exposure, implosion, | ||||||||
Abstract |
|||||||||
Even though tree-based reliable multicast protocols are known to be most scalable for one-to-many sessions, there is still an open question as to whether these protocols are also scalable for many-to-many sessions. In this paper, we analyze and compare two promising multicast protocols - the receiver-initiated protocol with NACK suppression and the tree-based protocol - using a new spatial loss model. The proposed model considers the correlation of packet loss events for more realistic analysis unlike the previous work. The analysis results show that the tree-based protocol achieves much higher throughput than the receiver-initiated protocol for a many-to-many session as the number of participants in the session becomes larger. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Systems Performance Evaluation | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.761 | ||||||||
Article ID | CPE761 | ||||||||
Author Name(s) | . .1 | ||||||||
Author Email(s) | .1 | ||||||||
Affiliation(s) | 1 | ||||||||
Keyword(s) | ., | ||||||||
Abstract |
|||||||||
No abstract | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | One-to-all broadcasting scheme for arbitrary static interconnection networks with bursty background traffic | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 13 | ||||||||
Date | 11 00 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.760 | ||||||||
Article ID | CPE760 | ||||||||
Author Name(s) | D. D. Kouvatsos1M. Mkwawa2I. Awan3 | ||||||||
Author Email(s) | d.d.kouvatsos@scm.brad.ac.uk1 | ||||||||
Affiliation(s) | Performance Modelling and Engineering Research Group, Department of Computing, University of Bradford, Bradford BD7 3BD, U.K. 1 2 3 | ||||||||
Keyword(s) | broadcasting, graph decomposition, interconnection networks, bursty traffic, queueing network models, maximum entropy principle, | ||||||||
Abstract |
|||||||||
A novel one-to-all broadcasting scheme is proposed for arbitrary static interconnection networks with bursty background traffic having head-of-line priorities, complete buffer sharing and repetitive service blocking with random destination. The scheme facilitates inter-process communication amongst parallel computers and grid computing networks as well as asynchronous transfer mode space division architectures and it is applicable to a single-port mode of message passing communication. The scheme utilizes a queue-by-queue decomposition algorithm for arbitrary open queueing network models, based on the principle of maximum entropy, in conjunction with an information theoretic decomposition criterion and graph theoretic concepts. Evidence based on empirical studies indicates the suitability of the scheme for achieving an optimal one-to-all broadcasting cost. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | A cache-efficient implementation of the lattice Boltzmann method for the two-dimensional diffusion equation | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 14 | ||||||||
Date | 12 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.868 | ||||||||
Article ID | CPE868 | ||||||||
Author Name(s) | A. C. Velivelli1K. M. Bryden2 | ||||||||
Author Email(s) | kmbryden@iastate.edu2 | ||||||||
Affiliation(s) | Department of Mechanical Engineering, Iowa State University, Ames, IA 50011, U.S.A. 1 2 | ||||||||
Keyword(s) | cache optimization, lattice Boltzmann method, two-dimensional diffusion equation, | ||||||||
Abstract |
|||||||||
$T_t=mu(T_{xx}T_{yy})$ | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Simulation of resource synchronization in a dynamic real-time distributed computing environment | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 14 | ||||||||
Date | 12 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.869 | ||||||||
Article ID | CPE869 | ||||||||
Author Name(s) | Chen Zhang1David Cordes2 | ||||||||
Author Email(s) | czhang@bryant.edu1 | ||||||||
Affiliation(s) | Department of Computer Information Systems, Bryant College, Smithfield, RI 02917-1284, U.S.A. 1Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A. 2 | ||||||||
Keyword(s) | synchronization, scheduling, distributed systems, real-time systems, embedded systems, | ||||||||
Abstract |
|||||||||
Today, more and more distributed computer applications are being modeled and constructed using real-time principles and concepts. In 1989, the Object Management Group (OMG) formed a Real-Time Special Interest Group (RT SIG) with the goal of extending the Common Object Request Broker Architecture (CORBA) standard to include real-time specifications. This group's most recent efforts have focused on the requirements of dynamic distributed real-time systems. One open problem in this area is resource access synchronization for tasks employing dynamic priority scheduling. This paper presents two resource synchronization protocols that the authors have developed which meet the requirements of dynamic distributed real-time systems as specified by Dynamic Scheduling Real-Time CORBA (DSRT CORBA). The proposed protocols can be applied to both Earliest Deadline First (EDF) and Least Laxity First (LLF) dynamic scheduling algorithms, allow distributed nested critical sections, and avoid unnecessary runtime overhead. In order to evaluate the performance of the proposed protocols, we analyzed each protocol's schedulability. Since the schedulability of the system is affected by numerous system configuration parameters, we have designed simulation experiments to isolate and illustrate the impact of each individual system parameter. Simulation experiments show the proposed protocols have better performance than one would realize by applying a schema that utilizes dynamic priority ceiling update. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley & Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Semantic Link Network Builder and Intelligent Semantic Browser | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 14 | ||||||||
Date | 12 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.870 | ||||||||
Article ID | CPE870 | ||||||||
Author Name(s) | H. Zhuge1R. Jia2J. Liu3 | ||||||||
Author Email(s) | zhuge@ict.ac.cn1 | ||||||||
Affiliation(s) | China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Graduate School, Chinese Academy of Sciences, Beijing, People's Republic of China 1 2 3 | ||||||||
Keyword(s) | algorithm, browser, matching, reasoning, semantic link, Semantic Web, | ||||||||
Abstract |
|||||||||
Semantic Link Network (SLN) is a semantic Web model using semantic links-the natural extension of the hyperlink-based Web. The SLN-Builder is a software tool that enables definition, modification and verification of, as well as access to the SLN. The SLN-Builder can convert a SLN definition into XML descriptions for cross-platform information exchange. The Intelligent Semantic Browser is used to visualize the SLN and carry out two types of reasoning in browsing time: small granularity reasoning by chaining semantic links and large granularity reasoning by matching semantic views of SLN. With the help of the reasoning mechanism, the browser can recommend the content that is semantically relevant to the current browsing content, and it enables users to foresee the end-side content of a semantic link chain. This paper presents the design and implementation of the SLN-Builder and the intelligent semantic browser as well as key algorithms. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||
Publisher | John Wiley Sons, Ltd. Chichester, UK | ||||||||
Category | Research Article | ||||||||
Article Title | Resource Space Grid: model, method and platform | ||||||||
Volume ID | 16 | ||||||||
Issue ID | 14 | ||||||||
Date | 12 10 2004 | ||||||||
DOI(URI) | http://dx.doi.org/10.1002/cpe.867 | ||||||||
Article ID | CPE867 | ||||||||
Author Name(s) | Hai Zhuge1 | ||||||||
Author Email(s) | zhuge@ict.ac.cn1 | ||||||||
Affiliation(s) | China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, People's Republic of China 1 | ||||||||
Keyword(s) | resource sharing, query language, Semantic Grid, Semantic Web, Web, | ||||||||
Abstract |
|||||||||
A Resource Space Grid is a virtual Grid that aims at effectively sharing, using and managing versatile resources across the Internet. The kernel of the Resource Space Grid includes a Resource Space Model (RSM) and a uniform Resource Using Mechanism (RUM). This paper presents the Resource Space Grid's core scientific issues and methodology, architecture, model and theory, design criteria and method, and practice. A normal form theory is proposed to normalize the resource space - a coordinate system for uniformly specifying and organizing resources. The RUM provides not only the end-users with an operable resource browser to operate resources using the built-in Resource Operation Language (ROL), but also the application developers with the ROL-based programming environment. The prototype platform based on the proposed model and method has been implemented and used for sharing and managing resources in distributed research teams. Operations on Resource Spaces can constitute the virtual communities of Resource Space Grids - a platform independent resource sharing environment. Copyright © 2004 John Wiley & Sons, Ltd. | |||||||||