Concurrency and Computation: Practice and Experience
Published Papers for 2004
Concurrency and Computation: Practice and Experience, Volume 16 (2004)
CPE745:A comparison of task pools for dynamic load balancing of irregular algorithms
- Author(s):Matthias Korch,Thomas Rauber
- Faculty of Mathematics, Physics and Computer Science, University of Bayreuth, Bayreuth, Germany,
- http://dx.doi.org/10.1002/cpe.745
- Abstract:
Since a static work distribution does not allow for satisfactory speed-ups of parallel irregular algorithms, there is a need for a dynamic distribution of work and data that can be adapted to the runtime behavior of the algorithm. Task pools are data structures which can distribute tasks dynamically to different processors where each task specifies computations to be performed and provides the data for these computations. This paper discusses the characteristics of task-based algorithms and describes the implementation of selected types of task pools for shared-memory multiprocessors. Several task pools have been implemented in C with POSIX threads and in Java. The task pools differ in the data structures to store the tasks, the mechanism to achieve load balance, and the memory manager used to store the tasks. Runtime experiments have been performed on three different shared-memory systems using a synthetic algorithm, the hierarchical radiosity method, and a volume rendering algorithm. Copyright © 2004 John Wiley & Sons, Ltd.
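The central data structure of this paper can be illustrated with a toy version. The following is a generic Java sketch of a single shared task queue with dynamic task creation, not the authors' implementation (their pools also vary the storage structure, stealing strategy, and memory manager):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// A minimal central task pool: workers repeatedly take tasks from one shared
// queue, and a running task may submit new subtasks (dynamic load balancing).
public class TaskPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final AtomicInteger pending = new AtomicInteger(0); // submitted, not yet finished

    public void put(Runnable task) {
        pending.incrementAndGet();
        queue.add(task);
    }

    private void workerLoop() {
        // Terminate only when every submitted task (including subtasks) has run.
        while (pending.get() > 0) {
            Runnable t = queue.poll();
            if (t == null) { Thread.yield(); continue; }
            t.run();                     // may call put() and grow the pool
            pending.decrementAndGet();   // decrement after run, so subtasks keep workers alive
        }
    }

    public void run(int nThreads) throws InterruptedException {
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(this::workerLoop);
            workers[i].start();
        }
        for (Thread w : workers) w.join();
    }

    public static void main(String[] args) throws InterruptedException {
        TaskPool pool = new TaskPool();
        AtomicInteger sum = new AtomicInteger(0);
        // Irregular work: task i spawns i subtasks, so the load is data dependent.
        for (int i = 1; i <= 4; i++) {
            final int n = i;
            pool.put(() -> { for (int j = 0; j < n; j++) pool.put(sum::incrementAndGet); });
        }
        pool.run(4);
        System.out.println(sum.get()); // 10
    }
}
```

A single central queue like this becomes a contention bottleneck as processor counts grow, which is exactly the trade-off (central versus distributed pools with task stealing) the paper measures.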
CPE746:Security Maintenance Mediation: a technology for preventing unintended security breaches
- Author(s):Roger (Buzz) King
- Department of Computer Science, University of Colorado, Boulder, CO 80309, U.S.A.
- http://dx.doi.org/10.1002/cpe.746
- Abstract:
Web-resident information is becoming ‘smarter’, in the sense that emerging technology will support the annotation of it with ontological terms, which will be used to locate and reuse information. This will pose a great security risk in the form of unintended breaches (as distinct from deliberate invasions). Because Web-resident information will be far more readily available and relevant, inadvertent releases of secure information can cause it to be diffusely spread across the Internet. Then, as this information is iteratively transformed and integrated with other information, it will become irretrievable and potentially used in a myriad of unpredictable ways. The problem is that ontological annotations, while making information more understandable in its original form, do not provide a means for easily capturing the complex semantics of information that has been transformed via abstraction, aggregation, and integration. This demands the development of a semantically rich way of specifying ‘views’ of Web information, to which security controls can be attached. Also needed is a way for users of secure information to easily and voluntarily blend, and thereby propagate, security controls as information is transformed. Information mediators designed by collaborative teams of experts are proposed as the vehicle for wrapping information, so that at each step of reuse, high-level views and their corresponding integrity controls can be made easily accessible to trusted users, who will then be able to ensure their proper maintenance. Copyright © 2004 John Wiley & Sons, Ltd.
CPE747:Parallel operation of CartaBlanca on shared and distributed memory computers
- Author(s):N. T. Padial-Collins,W. B. VanderHeyden,D. Z. Zhang,E. D. Dendy,D. Livescu
- Los Alamos National Laboratory Theoretical Division and Los Alamos Computer Science Institute, Los Alamos, NM 87545, U.S.A.,
- http://dx.doi.org/10.1002/cpe.747
- Abstract:
We describe the parallel performance of the pure Java CartaBlanca code on heat transfer and multiphase fluid flow problems. CartaBlanca is designed for parallel computations on partitioned unstructured meshes. It uses Java's thread facility to manage computations on each of the mesh partitions. Inter-partition communications are handled by two compact objects for node-by-node communication along partition boundaries and for global reduction calculations across the entire mesh. For distributed calculations, the JavaParty package from the University of Karlsruhe is demonstrated to work with CartaBlanca. Copyright © 2004 John Wiley & Sons, Ltd.
CPE749:Performance and scalability of MPI on PC clusters
- Author(s):Glenn R. Luecke,Marina Kraeva,Jing Yuan,Silvia Spanoyannis
- 291 Durham Center, Iowa State University, Ames, IA 50011, U.S.A.,
- http://dx.doi.org/10.1002/cpe.749
- Abstract:
The purpose of this paper is to compare the communication performance and scalability of MPI communication routines on a Windows Cluster, a Linux Cluster, a Cray T3E-600, and an SGI Origin 2000. All tests in this paper were run using various numbers of processors and two message sizes. In spite of the fact that the Cray T3E-600 is about 7 years old, it performed best of all machines for most of the tests. The Linux Cluster with the Myrinet interconnect and Myricom's MPI performed and scaled quite well and, in most cases, performed better than the Origin 2000, and in some cases better than the T3E. The Windows Cluster using the Giganet Full Interconnect and MPI/Pro's MPI performed and scaled poorly for small messages compared with all of the other machines. Copyright © 2004 John Wiley & Sons, Ltd.
CPE770:Group-SPMD programming with orthogonal processor groups
- Author(s):Thomas Rauber,Robert Reilein,Gudula Rünger
- Department of Mathematics and Physics, University of Bayreuth, 95440 Bayreuth, Germany,Department of Computer Science, Chemnitz University of Technology, 09111 Chemnitz, Germany,
- http://dx.doi.org/10.1002/cpe.770
- Abstract:
Many programs for message-passing machines can benefit from an implementation in a group-SPMD programming model due to the potential to reduce communication overhead and to increase scalability. In this paper, we consider group-SPMD programs exploiting different orthogonal processor partitions in one program. For each program this is a fixed set of predefined processor partitions given by the parallel hyperplanes of a two- or multi-dimensional virtual processor organization. We introduce a library built on top of MPI to support the programming with those orthogonal processor groups. The parallel programming model is appropriate for applications with a multi-dimensional task grid and task dependencies mainly aligned in the dimensions of the task grid. The library can be used to specify the appropriate processor partitions, which are then created by the library, and to define the mapping of tasks to the processor hyperplanes. Examples from numerical analysis illustrate the programming style and show that the runtime on distributed memory machines can be considerably reduced by using the library. Copyright © 2004 John Wiley & Sons, Ltd.
CPE768:Managing distributed shared arrays in a bulk-synchronous parallel programming environment
- Author(s):Christoph W. Kessler
- Programming Environments Laboratory (PELAB), Department of Computer Science, Linköping University, S-581 83 Linköping, Sweden
- http://dx.doi.org/10.1002/cpe.768
- Abstract:
NestStep is a parallel programming language for the BSP (bulk-synchronous parallel) programming model. In this article we describe the concept of distributed shared arrays in NestStep and its implementation on top of MPI. In particular, we present a novel method for runtime scheduling of irregular, direct remote accesses to sections of distributed shared arrays. Our method, which is fully parallelized, uses conventional two-sided message passing and thus avoids the overhead of a standard implementation of direct remote memory access based on one-sided communication. The main prerequisite is that the given program is structured in a BSP-compliant way. Copyright © 2004 John Wiley & Sons, Ltd.
CPE767:Compiling data-parallel programs for clusters of SMPs
- Author(s):Siegfried Benkner,Thomas Brandes
- Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria,SCAI-Fraunhofer Institute for Algorithms and Scientific Computing, Schloß Birlinghoven, D-53754 St. Augustin, Germany
- http://dx.doi.org/10.1002/cpe.767
- Abstract:
Clusters of shared-memory multiprocessors (SMPs) have become the most promising parallel computing platforms for scientific computing. However, SMP clusters significantly increase the complexity of user application development when using the low-level application programming interfaces MPI and OpenMP, forcing users to deal with both distributed-memory and shared-memory parallelization details. In this paper we present extensions of High Performance Fortran (HPF) for SMP clusters which enable the compiler to adopt a hybrid parallelization strategy, efficiently combining distributed-memory with shared-memory parallelism. By means of a small set of new language features, the hierarchical structure of SMP clusters may be specified. This information is utilized by the compiler to derive inter-node data mappings for controlling distributed-memory parallelization across the nodes of a cluster and intra-node data mappings for extracting shared-memory parallelism within nodes. Additional mechanisms are proposed for specifying inter- and intra-node data mappings explicitly, for controlling specific shared-memory parallelization issues and for integrating OpenMP routines in HPF applications. The proposed features have been realized within the ADAPTOR and VFC compilers. The parallelization strategy for clusters of SMPs adopted by these compilers is discussed as well as a hybrid-parallel execution model based on a combination of MPI and OpenMP. Experimental results indicate the effectiveness of the proposed features. Copyright © 2004 John Wiley & Sons, Ltd.
CPE771:A compiler for multiple memory models
- Author(s):S. P. Midkiff,J. Lee,D. A. Padua
- School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47905-2305, U.S.A.,School of Computer Science and Engineering, Seoul National University, Seoul, Korea,Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61801, U.S.A.
- http://dx.doi.org/10.1002/cpe.771
- Abstract:
The design of consistency models for both hardware and software is a difficult task. For a programming language, it is particularly difficult because the target audience for a high-level programming language is much wider than the target audience for a machine language, making usability a more important criterion. Exacerbating this problem is the reality that the programming language community has little experience designing programming language consistency models, and therefore each new attempt is very much a voyage into uncharted territory. A concrete example of the difficulties of the task is the current Java Memory Model. Although designed to be easy to use by Java programmers, it is poorly understood and at least one common idiom (the ‘double check idiom’) to exploit the model is unsafe. In this paper, we describe the design of an optimizing Java compiler that will accept either as input or as an interface implementation a consistency model for the code to be compiled. The compiler will use Shasha and Snir's delay set analysis, and our CSSA program representation to provide a canonical representation for the effects of different consistency models on optimizations and analysis. The compiler will serve as a testbed to prototype new memory models, and to measure the effects of different memory models on program performance. Copyright © 2004 John Wiley & Sons, Ltd.
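The ‘double check idiom’ this abstract calls unsafe is worth spelling out. Below is the classic form of the pattern (class and field names are illustrative, not from the paper). Under the original Java Memory Model, the unsynchronized fast-path read may observe a non-null reference to an object whose constructor writes are not yet visible; the flaw is a concurrency issue and cannot be observed single-threaded:

```java
// Classic double-checked locking, as commonly written against the original
// Java Memory Model. The fast path reads `instance` without synchronization.
public class Singleton {
    private static Singleton instance;   // broken pre-JSR-133: field is not volatile
    private int value;

    private Singleton() { value = 42; }

    public static Singleton getInstance() {
        if (instance == null) {                  // first (unsynchronized) check
            synchronized (Singleton.class) {
                if (instance == null) {          // second (synchronized) check
                    instance = new Singleton();  // reference may be published before
                }                                // the constructor's writes are visible
            }
        }
        return instance;
    }

    public int value() { return value; }
}
```

The revised model later adopted for Java 5 (JSR-133) repairs the idiom when the field is declared `volatile`; at the time of this paper no such guarantee existed, which is exactly the usability gap the authors' compiler testbed targets.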
CPE772:Space-time mapping and tiling: a helpful combination
- Author(s):Martin Griebl,Peter Faber,Christian Lengauer
- Department of Mathematics and Informatics, University of Passau, D-94030 Passau, Germany,
- http://dx.doi.org/10.1002/cpe.772
- Abstract:
Tiling is a well-known technique for sequential compiler optimization, as well as for automatic program parallelization. However, in the context of parallelization, tiling should not be considered a stand-alone technique, but should be applied after a dedicated parallelization phase, in our case after space-time mapping. We show how tiling can benefit from space-time mapping, and we derive an algorithm for computing tiles which can minimize the number of communication startups, taking the number of physically available processors into account. We also show how the use of a simple cost model reduces real execution time. Copyright © 2004 John Wiley & Sons, Ltd.
CPE773:The effect of cache models on iterative compilation for combined tiling and unrolling
- Author(s):P. M. W. Knijnenburg,T. Kisuki,K. Gallivan,M. F. P. O'Boyle
- LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands,Department of Computer Science, Florida State University, U.S.A.,Institute for Computing Systems Architecture, Edinburgh University, U.K.
- http://dx.doi.org/10.1002/cpe.773
- Abstract:
Iterative compilation, where we search for the best program transformation based on profile feedback information, has been highly successful in determining good program optimizations, typically outperforming all static approaches. However, this benefit has come at a price, namely, a large number of executions of the target program which in many cases is unacceptable. This paper is concerned with reducing the number of program executions needed by iterative compilation by incorporating static models. We examine how static models may be incorporated into a purely iterative approach by developing several characterized heuristics. We empirically explore the tradeoff between reducing the number of executions required and the ultimate performance achieved for differing parameters or slack factors. We focus on tiling and unrolling as transformations and cache models as a test case for the feasibility of this approach. We show that a highly accurate model, used as a filter to profiling and appropriately parameterized, can reduce the number of executions by 50%. We also show that using a simple model to rank transformations and profiling, only those with the highest ranking can reduce the number of executions even further up to 70%, in cases where there is only a limited number of profiles available. We conclude that a production compiler might perform best using the last approach, gaining the benefit of iterative compilation at a much reduced cost. Copyright © 2004 John Wiley & Sons, Ltd.
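The second heuristic described above (rank candidate transformations with a static model, then profile only the top-ranked ones) can be sketched generically. The cost functions below are invented stand-ins for a real cache model and real profile runs, chosen so the model is deliberately imperfect:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class ModelFilteredSearch {
    // Rank candidates with a cheap static model, then profile only the top
    // fraction; return the candidate with the best measured time.
    static <C> C search(List<C> candidates, ToDoubleFunction<C> model,
                        ToDoubleFunction<C> profile, double fraction) {
        List<C> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble(model));   // cheap: zero executions
        int k = Math.max(1, (int) (fraction * ranked.size()));
        C best = null;
        double bestTime = Double.MAX_VALUE;
        for (C c : ranked.subList(0, k)) {                // expensive: k executions
            double t = profile.applyAsDouble(c);
            if (t < bestTime) { bestTime = t; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy search space of tile sizes. The "model" predicts 64 is best, the
        // "measured" time is minimized at 48; profiling the top half still finds 48.
        List<Integer> tiles = Arrays.asList(8, 16, 32, 48, 64, 128, 256);
        Integer best = search(tiles,
                t -> Math.abs(t - 64),    // static cache-model score (assumed)
                t -> Math.abs(t - 48),    // profiled runtime (assumed)
                0.5);
        System.out.println(best); // 48
    }
}
```

The `fraction` parameter plays the role of the paper's slack factor: smaller values cut executions further but risk filtering out the true optimum when the model misranks it.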
CPE774:A fast and accurate method for determining a lower bound on execution time
- Author(s):G. Fursin,M. F. P. O'Boyle,O. Temam,G. Watts
- ICSA, School of Informatics, University of Edinburgh, Edinburgh EH9 3JZ, U.K.,Laboratoire de Recherche en Informatique, Bâtiment 490, Université Paris Sud, 91405 Orsay Cedex, France,
- http://dx.doi.org/10.1002/cpe.774
- Abstract:
In performance critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler-based optimizations to improve memory performance are limited and programmers frequently resort to manual optimization techniques. However, this process is tedious and time-consuming. Furthermore, as the potential benefit from optimization is unknown there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e. when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with that of a simulator with an ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters. Copyright © 2004 John Wiley & Sons, Ltd.
CPE775:Parallel object-oriented framework optimization
- Author(s):Daniel J. Quinlan,Markus Schordan,Brian Miller,Markus Kowarschik
- Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A.,System Simulation Group, Department of Computer Science, University of Erlangen-Nuremberg, Germany
- http://dx.doi.org/10.1002/cpe.775
- Abstract:
Sophisticated parallel languages are difficult to develop; most parallel distributed memory scientific applications are developed using a serial language, expressing parallelism through third party libraries (e.g. MPI). As a result, frameworks and libraries are often used to encapsulate significant complexities. We define a novel approach to optimize the use of libraries within applications. The resulting tool, named ROSE, leverages the additional semantics provided by library-defined abstractions enabling library specific optimization of application codes. It is a common perception that performance is inversely proportional to the level of abstraction. Our work shows that this is not the case if the additional semantics can be leveraged. We show how ROSE can be used to leverage the semantics within the compile-time optimization. Copyright © 2004 John Wiley & Sons, Ltd.
CPE766:Special Issue: Compilers for Parallel Computers (CPC 2001)
- http://dx.doi.org/10.1002/cpe.766
- Abstract: No abstract
CPE769:Data partitioning-based parallel irregular reductions
- Author(s):Eladio Gutiérrez,Oscar Plata,Emilio L. Zapata
- Department of Computer Architecture, University of Malaga, E-29071 Malaga, Spain,
- http://dx.doi.org/10.1002/cpe.769
- Abstract:
Different parallelization methods for irregular reductions on shared-memory multiprocessors have been proposed in the literature in recent years. We have classified all these methods and analyzed them in terms of a set of properties: data locality, memory overhead, exploited parallelism, and workload balancing. In this paper we propose several techniques to increase the amount of exploited parallelism and to introduce load balancing into an important class of these methods. Regarding parallelism, the proposed solution is based on the partial expansion of the reduction array. Load balancing is discussed in terms of two techniques. The first technique is a generic one, as it deals with any kind of load imbalance present in the problem domain. The second technique handles a special case of load imbalance which occurs whenever a large number of write operations are concentrated on small regions of the reduction arrays. Efficient implementations of the proposed optimizing solutions for a particular method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods. Copyright © 2004 John Wiley & Sons, Ltd.
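The reduction-array expansion this abstract builds on can be sketched as follows. This is an illustrative Java version using full expansion (one private copy of the reduction array per thread, combined after the parallel phase, so the irregular writes need no synchronization); the paper's contribution is partial expansion, which limits the memory overhead of this scheme:

```java
public class IrregularReduction {
    // Computes hist[idx[i]] += val[i] in parallel. Each thread writes only its
    // own private copy of the reduction array; copies are combined at the end.
    static double[] reduce(int bins, int[] idx, double[] val, int nThreads)
            throws InterruptedException {
        double[][] priv = new double[nThreads][bins];   // expanded reduction array
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int tid = t;
            ts[t] = new Thread(() -> {
                // Cyclic distribution of iterations; the subscript array idx may
                // point anywhere, but writes land in this thread's private copy.
                for (int i = tid; i < idx.length; i += nThreads)
                    priv[tid][idx[i]] += val[i];
            });
            ts[t].start();
        }
        for (Thread th : ts) th.join();
        // Combine the private copies into the final reduction array.
        double[] out = new double[bins];
        for (double[] p : priv)
            for (int b = 0; b < bins; b++) out[b] += p[b];
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] idx = {0, 2, 2, 1, 0, 2};       // irregular subscripts
        double[] val = {1, 1, 1, 1, 1, 1};
        double[] h = reduce(3, idx, val, 2);
        System.out.println(h[0] + " " + h[1] + " " + h[2]); // 2.0 1.0 3.0
    }
}
```

The memory overhead here grows with the thread count times the array size, and the combine step is serial over all copies; both costs are what motivates partial rather than full expansion.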
CPE776:SWARP: a retargetable preprocessor for multimedia instructions
- Author(s):Gilles Pokam,Stéphane Bihan,Julien Simonnet,François Bodin
- IRISA, Campus Universitaire de Beaulieu, 35042 Rennes Cedex, France,
- http://dx.doi.org/10.1002/cpe.776
- Abstract:
In this paper, we propose SWARP, a retargetable preprocessor for exploiting multimedia instructions. The system mixes loop distribution, unrolling and pattern matching to exploit complex multimedia instructions. Contrary to all available systems, it can be extended at the user level. Using a TriMedia processor, we show that our system achieves good code quality with a set of frequently used loop kernels for multimedia applications. Copyright © 2004 John Wiley & Sons, Ltd.
CPE751:A comparison of concurrent programming and cooperative multithreading under load balancing applications
- Author(s):Justin T. Maris,Aaron W. Keen,Takashi Ishihara,Ronald A. Olsson
- Department of Computer Science, University of California, Davis, CA 95616, U.S.A.,
- http://dx.doi.org/10.1002/cpe.751
- Abstract:
Two models of thread execution are the general concurrent programming execution model (CP) and the cooperative multithreading execution model (CM). CP provides nondeterministic thread execution where context switches occur arbitrarily. CM provides threads that execute one at a time until they explicitly choose to yield the processor. This paper focuses on a classic application to reveal the advantages and disadvantages of load balancing during thread execution under CP and CM styles; results from a second classic application were similar. These applications are programmed in two different languages (SR and Dynamic C) on different hardware (standard PCs and embedded system controllers). An SR-like run-time system, DesCaRTeS, was developed to provide interprocess communication for the Dynamic C implementations. This paper compares load balancing and non-load balancing implementations; it also compares CP and CM style implementations. The results show that in cases of very high or very low workloads, load balancing slightly hindered performance; and in cases of moderate workload, both SR and Dynamic C implementations of load balancing generally performed well. Further, for these applications, CM style programs outperform CP style programs in some cases, but the opposite occurs in some other cases. This paper also discusses qualitative tradeoffs between CM style programming and CP style programming for these applications. Copyright © 2004 John Wiley & Sons, Ltd.
CPE752:OpenMP-oriented applications for distributed shared memory architectures
- Author(s):Ami Marowka,Zhenying Liu,Barbara Chapman
- Department of Computer Science, The University of Houston, Houston, TX 77204-3010, U.S.A.,
- http://dx.doi.org/10.1002/cpe.752
- Abstract:
The rapid rise of OpenMP as the preferred parallel programming paradigm for small-to-medium scale parallelism could slow unless OpenMP can show capabilities for becoming the model-of-choice for large scale high-performance parallel computing in the coming decade.
The main stumbling block for the adaptation of OpenMP to distributed shared memory (DSM) machines, which are based on architectures like cc-NUMA, stems from the lack of capabilities for data placement among processors and threads for achieving data locality. The absence of such a mechanism causes remote memory accesses and inefficient cache memory use, both of which lead to poor performance.
This paper presents a simple software programming approach called copy-inside-copy-back (CC) that exploits the data privatization mechanism of OpenMP for data placement and replacement. This technique enables one to distribute data manually without taking away control and flexibility from the programmer and is thus an alternative to the automatic and implicit approaches. Moreover, the CC approach improves on the OpenMP-SPMD style of programming, making the development process of an OpenMP application more structured and simpler.
The CC technique was tested and analyzed using the NAS Parallel Benchmarks on SGI Origin 2000 multiprocessor machines. This study shows that OpenMP improves performance of coarse-grained parallelism, although a fast copy mechanism is essential. Copyright © 2004 John Wiley & Sons, Ltd.
CPE763:Proceedings of the 10th International Symposium on High Performance Distributed Computing (HPDC-10)
- Author(s):Anand Natrajan,Michael Crowley,Nancy Wilkins-Diehr,Marty A. Humphrey,Anthony D. Fox,Andrew S. Grimshaw,Charles L. Brooks
- Department of Computer Science, University of Virginia, 151 Engineer's Way, Charlottesville, VA 22904, U.S.A.,Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, U.S.A.,San Diego Supercomputing Center, University of California at San Diego, 9500 Gilman Drive, La Jolla, CA 92093, U.S.A.,
- http://dx.doi.org/10.1002/cpe.763
- Abstract:
One benefit of a computational Grid is the ability to run high-performance applications over distributed resources simply and securely. We demonstrated this benefit with an experiment in which we studied the protein-folding process with the CHARMM molecular simulation package over a Grid managed by Legion, a Grid operating system. High-performance applications can take advantage of Grid resources if the Grid operating system provides both low-level functionality as well as high-level services. We describe the nature of services provided by Legion for high-performance applications. Our experiences indicate that human factors continue to play a crucial role in the configuration of Grid resources, underlying resources can be problematic, Grid services must tolerate underlying problems or inform the user, and high-level services must continue to evolve to meet user requirements. Our experiment not only helped a scientist perform an important study, but also showed the viability of an integrated approach such as Legion's for managing a Grid. Copyright © 2004 John Wiley & Sons, Ltd.
CPE748:Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines
- Author(s):Sotirios G. Ziavras,Xiaofang Wang
- Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, U.S.A.,
- http://dx.doi.org/10.1002/cpe.748
- Abstract:
Configurable computing, where hardware resources are configured appropriately to match specific hardware designs, has recently demonstrated its ability to significantly improve performance for a wide range of computation-intensive applications. With steady advances in silicon technology, as predicted by Moore's Law, Field-Programmable Gate Array (FPGA) technologies have enabled the implementation of System-on-a-Programmable-Chip (SOPC or SOC) computing platforms, which, in turn, have given a significant boost to the field of configurable computing. It is possible to implement various specialized parallel machines in a single silicon chip. In this paper, we describe our design and implementation of a parallel machine on an SOPC development board, using multiple instances of a soft IP configurable processor; we use this machine for LU factorization. LU factorization is widely used in engineering and science to solve efficiently large systems of linear equations. Our implementation facilitates the efficient solution of linear equations at a cost much lower than that of supercomputers and networks of workstations. The intricacies of our FPGA-based design are presented along with tradeoff choices made for the purpose of illustration. Performance results prove the viability of our approach. Copyright © 2004 John Wiley & Sons, Ltd.
CPE819:An approach for quality of service adaptation in service-oriented Grids
- Author(s):Rashid Al-Ali,Abdelhakim Hafid,Omer Rana,David Walker
- Department of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, Wales, U.K.,Telcordia Technologies, Inc., 331 Newman Springs Road, Red Bank, NJ 07701, U.S.A.,
- http://dx.doi.org/10.1002/cpe.819
- Abstract:
Some applications utilizing Grid computing infrastructure require the simultaneous allocation of resources, such as compute servers, networks, memory, disk storage and other specialized resources. Collaborative working and visualization is one example of such applications. In this context, quality of service (QoS) is related to Grid services, and not just to the network connecting these services. With the emerging interest in service-oriented Grids, resources may be advertised and traded as services based on a service level agreement (SLA). Such a SLA must include both general and technical specifications, including pricing policy and properties of the resources required to execute the service, to ensure QoS requirements are satisfied. An approach for QoS adaptation is presented to enable the dynamic adjustment of behavior of an application based on changes in the pre-defined SLA. The approach is particularly useful if workload or network traffic changes in unpredictable ways during an active session. Copyright © 2004 John Wiley & Sons, Ltd.
CPE818:Special Issue: Middleware for Grid Computing
- http://dx.doi.org/10.1002/cpe.818
- Abstract: No abstract
CPE820:Grids of agents for computer and telecommunication network management
- Author(s):M. D. Assunção,F. L. Koch,C. B. Westphall
- Network and Management Laboratory, Federal University of Santa Catarina, Florianópolis, SC, 88049-970, P.O. 476, Brazil,Intelligent Systems Group, University of Utrecht, The Netherlands,
- http://dx.doi.org/10.1002/cpe.820
- Abstract:
The centralized system approach for computer and telecommunication network management has been presenting scalability problems along with the growth in the amount and diversity of managed equipment. Moreover, the increase in complexity of the services being offered through the networks also contributes to adding extra workload to the management station. The amount of data that must be handled and processed by only one administration point could lead to a situation where there is not enough processing and storage power to carry out an efficient job. In this work we present an alternative approach by creating a highly distributed computing environment through the use of Grids of autonomous agents to analyze large amounts of data, which reduce the processing costs by optimizing the load distribution and resource utilization. Copyright © 2004 John Wiley & Sons, Ltd.
CPE822:Applying the reflective middleware approach in Grid computing
- Author(s):Geoff Coulson,Gordon Blair,Nikos Parlavantzas,Wai Kit Yeung,Wei Cai
- Distributed Multimedia Research Group, Computing Department, Lancaster University, Lancaster LA1 4YR, U.K.,
- http://dx.doi.org/10.1002/cpe.822
- Abstract:
Significant progress has been made in the design and development of object-based distributed systems and, more recently, in component-based reflective middleware platforms-i.e. middleware platforms that, through reflection, can be flexibly configured, and runtime adapted/reconfigured-especially in terms of non-functional properties like quality of service (QoS). In this paper we consider how ideas and principles from such platforms can usefully be applied in a Grid context, specifically, an Open Grid Services Architecture (OGSA) context. To this end, we identify weaknesses in OGSA in the areas of generic service provision, scalability/performance engineering, and, especially, lack of support for QoS specification and realization, and suggest how these can be alleviated. We believe that such areas will become increasingly important as sophisticated e-Science applications start to exploit the potential of OGSA's service-based architecture. We also report on our current research on developing a binding-type framework, a resource management framework, and a set of performance optimizations for an ongoing OGSA implementation we are working on. Copyright © 2004 John Wiley & Sons, Ltd.
CPE826:Performance evaluation of market-based resource allocation for Grid computing
- Author(s):J. Gomoluch,M. Schroeder
- Department of Computing, City University, Northampton Square, London EC1V 0HB, U.K.,
- http://dx.doi.org/10.1002/cpe.826
- light wiley document
- wiley 2004 full documents
- Abstract:
Resource allocation is an important aspect of Grid computing. Over the past few years, various systems have been developed which use market mechanisms to allocate resources. However, the performance of such policies has not been sufficiently studied. In this paper, we investigate under which circumstances market-based resource allocation by continuous double auctions and by the proportional share protocol, respectively, outperforms a conventional round-robin approach. We develop a model for clients, servers and the market, and present simulation results. Factors which are studied include the amount of load in the system, the number of resources, different degrees of resource heterogeneity, and communication delays. Copyright © 2004 John Wiley & Sons, Ltd.
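The two simplest policies compared in the paper can be illustrated with a minimal sketch (all names are illustrative, not the authors' code): round-robin ignores bids and load entirely, while a proportional-share protocol divides a server's capacity among clients according to their bids.

```java
// Sketch of two baseline allocation policies: round-robin assignment
// versus a proportional-share split of server capacity among bidding
// clients. Hypothetical names, not taken from the paper.
public class AllocationSketch {

    // Round-robin: job i is sent to server (i mod n), ignoring load and price.
    public static int[] roundRobin(int jobs, int servers) {
        int[] assignment = new int[jobs];
        for (int i = 0; i < jobs; i++) {
            assignment[i] = i % servers;
        }
        return assignment;
    }

    // Proportional share: each client receives a fraction of the server's
    // capacity equal to its bid divided by the sum of all bids.
    public static double[] proportionalShare(double capacity, double[] bids) {
        double total = 0.0;
        for (double b : bids) total += b;
        double[] shares = new double[bids.length];
        for (int i = 0; i < bids.length; i++) {
            shares[i] = capacity * bids[i] / total;
        }
        return shares;
    }
}
```

Under proportional share, a client bidding three times as much as another receives three times the capacity; round-robin gives every job an equal turn regardless of price or load.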
CPE821:EasyGrid: towards a framework for the automatic Grid enabling of legacy MPI applications
- Author(s):Vinod E. F. Rebello
- Instituto de Computação, Universidade Federal Fluminense, Bloco E, 3º Andar, Rua Passo da Pátria 156, São Domingos, Niterói, CEP 24210-240, Brazil,
- http://dx.doi.org/10.1002/cpe.821
- light wiley document
- wiley 2004 full documents
- Abstract:One of the goals of the Grid is to
aggregate collections of shared, heterogeneous, and
distributed resources to provide computational power to
parallel applications. However, designing applications
capable of exploiting this potential with ease remains a
challenge. This paper outlines the EasyGrid methodology for
the efficient and robust execution of (legacy) MPI programs
across distributed computing clusters. The principal
objective of this work is to identify the
application-oriented middleware necessary for, as well as to
develop a framework to automatically generate, system-aware
applications capable of executing in dynamic, unstable,
distributed environments such as computational Grids.
Copyright © 2004 John Wiley & Sons, Ltd.
CPE823:Global multimedia collaboration system
- Author(s):Wenjun Wu,Geoffrey Fox,Ahmet Uyar,Hasan Bulut,Shrideep Pallickara
- Community Grids
Laboratory, Indiana University, Indiana University
Research Park, 501 North Morton Street, Suite 222,
Bloomington, IN 47404, U.S.A.,
- http://dx.doi.org/10.1002/cpe.823
- light wiley document
- wiley 2004 full documents
- Abstract:In order to build an integrated collaboration system over heterogeneous collaboration technologies, we propose a global multimedia collaboration system (Global-MMCS) based on the XGSP A/V Web-Services framework. This system can integrate multiple A/V services and support various collaboration clients and communities. A prototype is currently being developed and deployed across many universities in the U.S.A. and China. Copyright © 2004 John Wiley & Sons, Ltd.
CPE824:InteGrade: object-oriented Grid middleware
leveraging the idle computing power of desktop machines
- Author(s):Fabio Kon,Andrei Goldchleger,Alfredo Goldman,Marcelo Finger,Germano Capistrano Bezerra
- Department of Computer Science, University of São Paulo, Rua do Matão 1010, 05508-090 São Paulo, SP, Brazil,
- http://dx.doi.org/10.1002/cpe.824
- light wiley document
- wiley 2004 full documents
- Abstract:Grid computing technology improves the computing experience of organizations by effectively integrating distributed computing resources. However, only a small fraction of currently available Grid infrastructures focuses on the reuse of existing commodity computing resources. This paper introduces InteGrade, a novel object-oriented middleware Grid infrastructure that focuses on leveraging the idle computing power of shared desktop machines. Its features include support for a wide range of parallel applications and mechanisms to ensure that the owners of shared resources do not perceive any loss in quality of service. A prototype implementation is under construction and the current version is available for download. Copyright © 2004 John Wiley & Sons, Ltd.
CPE825:A model for parallel job scheduling on dynamical
computer Grids
- Author(s):Alfredo Goldman,Carlos Queiroz
- Department of Computer
Science, University of S?o Paulo, Rua do
Matão 1010, 05508-090 São
Paulo, SP, Brazil,
- http://dx.doi.org/10.1002/cpe.825
- light wiley document
- wiley 2004 full documents
- Abstract:This work presents a model that allows the execution of parallel applications in a Grid environment. Our main focus is on how to share the idle cycles of clusters and computers to execute real parallel applications. We present a new model which introduces the notions of locality and adaptability. Locality is used for job allocation and for job migration. Adaptability provides a simple mechanism that allows clusters to join or leave a Grid. We also propose a middleware architecture to implement the model, and provide some simulation results to show its main characteristics. Copyright © 2004 John Wiley & Sons, Ltd.
CPE827:Fine-grained authorization for job execution in
the Grid: design and implementation
- Author(s):K. Keahey,V. Welch,S. Lang,B. Liu,S. Meder
- Mathematics and
Computer Science Division, Argonne National
Laboratory, Building 221, 9700 South Cass Avenue,
Argonne, IL 60439, U.S.A.,University of Chicago,
5801 South Ellis Avenue, Chicago, IL, U.S.A.,University of Houston,
4800 Calhoun Road, Houston, TX, U.S.A.,
- http://dx.doi.org/10.1002/cpe.827
- light wiley document
- wiley 2004 full documents
- Abstract:In this paper, we describe our work on
enabling fine-grained authorization for resource usage and
management. We address the need of virtual organizations to
enforce their own policies in addition to those of the
resource owners, with regard to both resource consumption
and job management. To implement this design, we propose
changes and extensions to the Globus Toolkit's version
2 resource management mechanism. We describe the prototype
and policy language that we have designed to express
fine-grained policies and present an analysis of our
solution. Copyright © 2004 John Wiley & Sons, Ltd.
CPE828:A resource management framework for interactive Grids
- Author(s):Raj Kumar,Vanish Talwar,Sujoy Basu
- Hewlett-Packard Labs,
1501 Page Mill Road, MS 1181, Palo Alto, CA 94304, U.S.A.,
- http://dx.doi.org/10.1002/cpe.828
- light wiley document
- wiley 2004 full documents
- Abstract:Traditionally, Grid computing systems have been used for batch jobs in scientific and academic computing. We envision next-generation Grid computing systems supporting graphical interactive sessions.
In this paper, we propose a resource management framework
for supporting graphical interactive sessions in a Grid
computing system. We describe the high-level architectural
resource management framework distributed among the
submission node, central scheduler node, and the execution
node. We then describe in detail the resource management
framework on the execution node. The description of the
resource management framework on the scheduler node is kept
at a high level in this paper. The framework on execution
nodes consists of Resource Management Agents, an Admission
Control system and Application Predictor system. The agents
on the execution node are Startup Agents, Sensor Agents,
Monitoring Agents, Aggregator Agents, Enforcement Agents and
Registration Agents. The Session Admission Control system is
responsible for determining if a new application session can
be admitted to the execution node. An Application Predictor
system is responsible for predicting the resource
utilization behavior of applications based on data obtained
from the Resource Management Agents. The proposed framework
allows for implementation of a scalable and extensible
middleware for interactive Grid resource management. It
supports fine-grained performance guarantees specified in
service level agreements and brings forth some important and
novel contributions to enable graphical interactive sessions
on Grids. Copyright © 2004 John Wiley & Sons, Ltd.
CPE829:GridSphere: a portal framework for building collaborations
- Author(s):Oliver Wehrens,Jason Novotny,Michael Russell
- Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, Am Mühlenberg 1, D-14476 Golm, Germany,
- http://dx.doi.org/10.1002/cpe.829
- light wiley document
- wiley 2004 full documents
- Abstract:Grid-enabled portals are becoming
increasingly popular as a platform for providing access to
Grid services and resources. Unfortunately, much of the work
done in portal development has led to vertically layered
solutions that work for a particular project but are
difficult to extend or reuse for other projects. The
GridSphere portal framework seeks to address these
limitations by providing a framework that will offer
external developers a model for easily adding new
functionality and hence increasing community collaboration.
The GridLab portal will serve as an initial prototype to
showcase the GridSphere framework and provide access to
services being developed within the GridLab project.
Copyright © 2004 John Wiley & Sons, Ltd.
CPE830:CoDIMS: an adaptable middleware system for
scientific visualization in Grids
- Author(s):G. Giraldi,F. Porto,J. C. Oliveira,R. Silva,B. Schulze
- National Laboratory for
Scientific Computing, Department of Computer
Science, Av. Getulio Vargas 333, 25651-075,
Petrópolis, RJ, Brazil,Department of Systems
Engineering, Military Institute of Engineering, Rio
de Janeiro, RJ, Brazil,
- http://dx.doi.org/10.1002/cpe.830
- light wiley document
- wiley 2004 full documents
- Abstract:In this paper we propose a middleware
infrastructure adapted for supporting scientific
visualization applications over a Grid environment. We
instantiate a middleware system from CoDIMS, which is an
environment for the generation of configurable data
integration middleware systems. The CoDIMS adaptive architecture is based on the integration of special components managed by a control module that executes users' workflows. We exemplify
our proposal with a middleware system generated for
computing particles' trajectories within the
constraints imposed by a Grid environment. Copyright
© 2004 John Wiley & Sons, Ltd.
CPE831:An asynchronous middleware for Grid resource monitoring
- Author(s):V. Quéma,R. Lachaize,E. Cecchet
- INRIA
Rhône-Alpes, 655 avenue de l'Europe,
38334 Saint-Ismier Cedex, France,Sardes Project LSR-IMAG
- INRIA, 655, avenue de l'Europe, 38334
Saint-Ismier Cedex, France,
- http://dx.doi.org/10.1002/cpe.831
- light wiley document
- wiley 2004 full documents
- Abstract:Resource management in a Grid computing
environment raises several technical issues. The monitoring
infrastructure must be scalable, flexible, configurable and
adaptable to support thousands of devices in a highly
dynamic environment where operational conditions are
constantly changing. We propose to address these
challenges by combining asynchronous communications with
reflective component-based technologies. We introduce
DREAM (Dynamic REflective Asynchronous Middleware), a Java component-based message-oriented middleware. Asynchronous communications are
used to achieve the scalability and flexibility objectives,
whereas the reflective component technology provides the
complementary configurability and adaptability features.
We argue that this infrastructure is a viable approach to
build a resource monitoring infrastructure for Grid
computing. Moreover, DREAM makes the monitoring logic accessible from
J2EE application servers. This allows the monitoring
information to be presented as a Grid service and to
integrate within the Open Grid Software Architecture.
Copyright © 2004 John Wiley & Sons, Ltd.
CPE832:Grid computing and active services
- Author(s):B. Schulze,E. R. Madeira
- National Laboratory for
Scientific Computing LNCC, Department of Computer
Science, Av. Getúlio Vargas 333,
25651-070 Petrópolis, RJ, Brazil,Institute of
Computing/State University of Campinas - UNICAMP,
P.O. Box 6176, 13083-970 Campinas, SP, Brazil
http://dx.doi.org/10.1002/cpe.832
light wiley document
wiley 2004 full documents
Abstract:Recent activities in Grid computing
point in the direction of an open service architecture
merging computational infrastructure with Grid services and
Web services. We also see advantages in adding facilities
offered by distributed object computing environments with
their interoperable references and services. The use of open
service architectures introduces the possibility of
composing structures of nested services. In this paper we
discuss architectural aspects of middleware for Grid
computing based on an infrastructure of distributed clusters and/or services, with an access portal as a demonstration project in progress. Copyright © 2004
John Wiley & Sons, Ltd.
CPE833:Programming and coordinating Grid environments and applications
- Author(s):Cristina Ururahy,Noemi Rodriguez
- Computer Science Department, PUC-Rio, Rua Marquês de São Vicente 225, 22453-900 Rio de Janeiro, RJ, Brazil
http://dx.doi.org/10.1002/cpe.833
light wiley document
wiley 2004 full documents
Abstract:The heterogeneous and dynamic nature of
Grid environments places new demands on models and paradigms
for parallel programming. In this work we discuss how ALua,
a programming system based on a dual programming language
model, can help the programmer to develop applications for
this environment, monitoring the state of resources and
controlling the application so that it adapts to changes in
this state. Copyright © 2004 John Wiley & Sons, Ltd.
CPE762:Parallel program debugging by specification
- Author(s):Simon Huband,Chris McDonald
- School of Computer
Science and Software Engineering, The University of
Western Australia, Crawley, Western Australia 6009, Australia,
- http://dx.doi.org/10.1002/cpe.762
- light wiley document
- wiley 2004 full documents
- Abstract:Most message passing parallel programs
employ logical process topologies with regular
characteristics to support their computation. Since process
topologies define the relationship between processes, they
present an excellent opportunity for debugging. The primary
benefit is that process behaviours can be correlated,
allowing expected behaviour to be abstracted and identified,
and undesirable behaviour reported. However, topology
support is inadequate in most message passing parallel
programming environments, including the popular Message
Passing Interface (MPI) and the Parallel Virtual Machine
(PVM). Programmers are forced to implement topology support
themselves, increasing the possibility of introducing
errors. This paper proposes a trace- and topology-based
approach to parallel program debugging, driven by four
distinct types of specifications. Trace specifications allow
trace data from a variety of sources and message passing
libraries to be interpreted in an abstract manner, and
topology specifications address the lack of explicit
topology knowledge, whilst also facilitating the
construction of user-consistent views of the debugging
activity. Loop specifications express topology-consistent
patterns of expected trace events, allowing conformance
testing of associated trace data, and error specifications
specify undesirable event interactions, including mismatched
message sizes and mismatched communication pairs. Both loop
and error specifications are simplified by having knowledge
of the actual topologies being debugged. The proposed
debugging framework enables a wealth of potential debugging
views and techniques. Copyright © 2004 John Wiley
& Sons, Ltd.
CPE764:A parallel Broyden approach to the Toeplitz
inverse eigenproblem
- Author(s):Jesús Peinado,Antonio M. Vidal
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia 46071, Spain,
- http://dx.doi.org/10.1002/cpe.764
- light wiley document
- wiley 2004 full documents
- Abstract:In this work we present a portable sequential and a portable parallel algorithm for solving the inverse eigenproblem for real symmetric Toeplitz matrices. Both algorithms are based on Broyden's method for solving nonlinear systems. We reduced the computational cost for some problem sizes and, furthermore, managed to reduce the spatial cost considerably, compared in both cases with parallel algorithms proposed by other authors and by us, although quasi-Newton methods (such as Broyden's) do not always reach convergence in all the test cases. We have implemented the parallel algorithm using the parallel numerical linear algebra library ScaLAPACK based on the MPI environment. Experimental results have been obtained using two different architectures: a shared memory multiprocessor, the SGI PowerChallenge, and a cluster of Pentium II PCs connected through a Myrinet network. The algorithms obtained are scalable in all the cases. Copyright © 2004 John Wiley & Sons, Ltd.
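For readers unfamiliar with the underlying method: Broyden's "good" method maintains an approximation to the (inverse) Jacobian and updates it from successive residuals, avoiding the cost of re-forming derivatives at every step. The following self-contained sketch (not the authors' ScaLAPACK code) applies the Sherman-Morrison form of the update and is demonstrated on a small linear system, where Broyden's method is known to terminate in finitely many steps.

```java
public class BroydenSketch {
    // Solve F(x) = 0 with Broyden's "good" method, maintaining an
    // approximation H to the inverse Jacobian via the Sherman-Morrison
    // update  H <- H + ((s - H y) s^T H) / (s^T H y).
    public interface VectorFunction { double[] apply(double[] x); }

    public static double[] solve(VectorFunction F, double[] x0,
                                 int maxIter, double tol) {
        int n = x0.length;
        double[] x = x0.clone();
        double[][] H = new double[n][n];
        for (int i = 0; i < n; i++) H[i][i] = 1.0;   // H0 = identity
        double[] f = F.apply(x);
        for (int k = 0; k < maxIter && norm(f) > tol; k++) {
            double[] s = matVec(H, f);               // quasi-Newton step
            for (int i = 0; i < n; i++) { s[i] = -s[i]; x[i] += s[i]; }
            double[] fNew = F.apply(x);
            double[] y = new double[n];
            for (int i = 0; i < n; i++) y[i] = fNew[i] - f[i];
            double[] Hy = matVec(H, y);
            double[] u = vecMat(s, H);               // u = s^T H
            double denom = dot(s, Hy);
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    H[i][j] += (s[i] - Hy[i]) * u[j] / denom;
            f = fNew;
        }
        return x;
    }

    static double[] matVec(double[][] A, double[] v) {
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++) r[i] += A[i][j] * v[j];
        return r;
    }

    static double[] vecMat(double[] v, double[][] A) {
        double[] r = new double[v.length];
        for (int j = 0; j < v.length; j++)
            for (int i = 0; i < v.length; i++) r[j] += v[i] * A[i][j];
        return r;
    }

    static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    static double norm(double[] v) { return Math.sqrt(dot(v, v)); }
}
```

As the abstract notes, convergence is not guaranteed for every nonlinear starting point; the sketch omits the line searches or restarts a robust implementation would add.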
CPE765:User transparency: a fully sequential programming
model for efficient data parallel image processing
- Author(s):F. J. Seinstra,D. Koelma
- Intelligent Sensory
Information Systems, Faculty of Science, University
of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands,
- http://dx.doi.org/10.1002/cpe.765
- light wiley document
- wiley 2004 full documents
- Abstract:Although many image processing
applications are ideally suited for parallel implementation,
most researchers in imaging do not benefit from
high-performance computing on a daily basis. Essentially,
this is due to the fact that no parallelization tools exist
that truly match the image processing researcher's
frame of reference. As it is unrealistic to expect imaging
researchers to become experts in parallel computing, tools
must be provided to allow them to develop high-performance
applications in a highly familiar manner. In an attempt to
provide such a tool, we have designed a software
architecture that allows transparent (i.e. sequential)
implementation of data parallel imaging applications for
execution on homogeneous distributed memory MIMD-style
multicomputers. This paper presents an extensive overview of
the design rationale behind the software architecture, and
gives an assessment of the architecture's effectiveness
in providing significant performance gains. In particular,
we describe the implementation and automatic parallelization
of three well-known example applications that contain many
fundamental imaging operations: (1) template matching; (2)
multi-baseline stereo vision; and (3) line detection. Based
on experimental results we conclude that our software
architecture constitutes a powerful and user-friendly tool
for obtaining high performance in many important image
processing research areas. Copyright © 2004 John
Wiley & Sons, Ltd.
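The first example application can be made concrete with a naive sequential template matcher: it slides the template over the image and keeps the offset minimizing the sum of squared differences. It is exactly this kind of regular loop nest that such an architecture would parallelize transparently over image partitions. (A hypothetical sketch, not the authors' library.)

```java
public class TemplateMatchSketch {
    // Return {row, col} of the top-left corner where the template best
    // matches the image under the sum-of-squared-differences criterion.
    public static int[] match(int[][] image, int[][] tmpl) {
        int th = tmpl.length, tw = tmpl[0].length;
        int best = Integer.MAX_VALUE;
        int[] pos = {0, 0};
        for (int r = 0; r + th <= image.length; r++) {
            for (int c = 0; c + tw <= image[0].length; c++) {
                int ssd = 0;
                for (int i = 0; i < th; i++)
                    for (int j = 0; j < tw; j++) {
                        int d = image[r + i][c + j] - tmpl[i][j];
                        ssd += d * d;
                    }
                if (ssd < best) { best = ssd; pos = new int[]{r, c}; }
            }
        }
        return pos;
    }
}
```

Each candidate offset is independent of the others, which is what makes a data-parallel decomposition of the outer loops straightforward.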
CPE797:Special Issue: Formal Techniques for Java-like Programs
http://dx.doi.org/10.1002/cpe.797
light wiley document
wiley 2004 full documents
Abstract:
No abstract
CPE798:Simple verification technique for complex Java bytecode subroutines
- Author(s):Alessandro Coglio
- Kestrel Institute, 3260 Hillview Avenue, Palo Alto, CA 94304, U.S.A.
http://dx.doi.org/10.1002/cpe.798
light wiley document
wiley 2004 full documents
Abstract:
Java is normally compiled to bytecode, which is verified and then executed by the Java Virtual Machine. Bytecode produced via compilation must pass verification. The main cause of complexity for bytecode verification is subroutines, used by compilers to generate more compact code. The techniques to verify subroutines proposed in the literature reject certain programs produced by mundane compilers, are difficult to realize within an implementation of the Java Virtual Machine or are relatively complicated. This paper presents a novel technique which is very simple to understand, implement and prove sound. It is also very powerful: the set of accepted programs has a simple characterization which most likely includes all the code produced by current compilers and which enables future compilers to make more extensive use of subroutines. Copyright © 2004 John Wiley & Sons, Ltd.
CPE800:Analysing the Java package/access concepts in Isabelle/HOL
- Author(s):Norbert Schirmer
- Technische Universität München, Department of Informatics, D-85748 Garching, Germany
http://dx.doi.org/10.1002/cpe.800
light wiley document
wiley 2004 full documents
Abstract:
Java access modifiers and packages provide a mechanism to restrict access to members and types, as an additional means of information hiding beyond the purely object-oriented concept of classes. In this paper we clarify the semantics of access modifiers and packages by adding them to our formal model of Java in the theorem prover Isabelle/HOL. We analyse which properties we can rely on at runtime, provided that the program has passed the static accessibility tests. Copyright © 2004 John Wiley & Sons, Ltd.
CPE801:Transposing F to C#: expressivity of parametric polymorphism in an object-oriented language
- Author(s):Andrew Kennedy,Don Syme
- Microsoft Research, 7 J. J. Thomson Avenue, Cambridge CB3 0FB, U.K.,
- http://dx.doi.org/10.1002/cpe.801
- light wiley document
- wiley 2004 full documents
- Abstract:We present a type-preserving translation of System F (the polymorphic lambda calculus) into a forthcoming revision of the C# programming language supporting parameterized classes and polymorphic methods. The forthcoming revision of Java in JDK 1.5 also makes a suitable target. We formalize the translation using a subset of C# similar to Featherweight Java. We prove that the translation is fully type-preserving and that it preserves behaviour via the novel use of environment-style semantics for System F. We observe that whilst parameterized classes alone are sufficient to encode the parameterized datatypes and let-polymorphism of languages such as ML and Haskell, it is the presence of dynamic dispatch for polymorphic methods that supports the encoding of the first-class polymorphism found in System F and recent extensions to ML and Haskell. Copyright © 2004 John Wiley & Sons, Ltd.
CPE799:Checking ownership and confinement
- Author(s):James Noble,Alex Potanin,Robert Biddle
- School of Mathematical
and Computing Sciences, Victoria University of
Wellington, P.O. Box 600, Wellington, New Zealand,
- http://dx.doi.org/10.1002/cpe.799
- light wiley document
- wiley 2004 full documents
- Abstract:A number of proposals to manage
aliasing in Java-like programming languages have been
advanced over the last five years. It is not clear how
practical these proposals are, that is, how well they relate
to the kinds of programs currently written in Java-like
languages. To address this problem, we analysed heap
snapshots from a corpus of Java programs. Our results
indicate that object-oriented programs do in fact exhibit
symptoms of encapsulation in practice, and that proposed
models of uniqueness, ownership, and confinement can
usefully describe the aliasing structures of object-oriented
programs. Understanding the kinds of aliasing present in
programs should help us to design formalisms to make
explicit the kinds of aliasing implicit in object-oriented
programs. Copyright © 2004 John Wiley & Sons, Ltd.
CPE789:A parallel algorithm for static slicing of concurrent programs
- Author(s):D. Goswami,R. Mall
- Department of Computer Science and Engineering, IIT Guwahati, India,Department of Computer Science and Engineering, IIT Kharagpur, India
http://dx.doi.org/10.1002/cpe.789
light wiley document
wiley 2004 full documents
Abstract:
Slicing of concurrent programs is a compute-intensive task. To speed up the slicing process, we have developed a parallel algorithm. For this purpose we used the concurrent control flow graph (CCFG) as the intermediate representation. We used a network of communicating processes to develop our parallel algorithm. We have implemented our parallel algorithm and the experimental results appear promising. Copyright © 2004 John Wiley & Sons, Ltd.
CPE791:Impact of mixed-parallelism on parallel implementations of the Strassen and Winograd matrix multiplication algorithms
- Author(s):F. Desprez,F. Suter
- LIP, UMR CNRS, ENS Lyon, INRIA 5668, 46 allée d'Italie, F-69364 Lyon Cedex 07, France,GRAIL, UCSD, 9500 Gilman Drive, MC 0114, La Jolla, CA 92093-0114, U.S.A.
http://dx.doi.org/10.1002/cpe.791
light wiley document
wiley 2004 full documents
Abstract:
In this paper we study the impact of the simultaneous exploitation of data- and task-parallelism, so-called mixed-parallelism, on the Strassen and Winograd matrix multiplication algorithms. This work takes place in the context of Grid computing and, in particular, in the Client-Agent(s)-Server(s) model, where data can already be distributed on the platform. For each of those algorithms, we propose two mixed-parallel implementations. The former follows the phases of the original algorithms while the latter has been designed as the result of a list scheduling algorithm. We give a theoretical comparison, in terms of memory usage and execution time, between our algorithms and classical data-parallel implementations. This analysis is corroborated by experiments. Finally, we give some hints about heterogeneous and recursive versions of our algorithms. Copyright © 2004 John Wiley & Sons, Ltd.
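For reference, one recursion step of Strassen's algorithm replaces the eight block products of the classical method with seven, at the cost of extra block additions; the seven products are natural units for task-parallel scheduling while each product is itself data-parallel. The following sequential sketch (power-of-two sizes only, not the authors' mixed-parallel code) shows that structure.

```java
public class StrassenSketch {
    // Multiply two n x n matrices (n a power of two) with Strassen's
    // seven-product recursion; small blocks fall back to the naive loop.
    public static double[][] multiply(double[][] A, double[][] B) {
        int n = A.length;
        if (n <= 2) return naive(A, B);
        int h = n / 2;
        double[][] a11 = sub(A,0,0,h), a12 = sub(A,0,h,h),
                   a21 = sub(A,h,0,h), a22 = sub(A,h,h,h);
        double[][] b11 = sub(B,0,0,h), b12 = sub(B,0,h,h),
                   b21 = sub(B,h,0,h), b22 = sub(B,h,h,h);
        // The seven Strassen products (independent, hence schedulable as tasks).
        double[][] m1 = multiply(add(a11,a22), add(b11,b22));
        double[][] m2 = multiply(add(a21,a22), b11);
        double[][] m3 = multiply(a11, minus(b12,b22));
        double[][] m4 = multiply(a22, minus(b21,b11));
        double[][] m5 = multiply(add(a11,a12), b22);
        double[][] m6 = multiply(minus(a21,a11), add(b11,b12));
        double[][] m7 = multiply(minus(a12,a22), add(b21,b22));
        double[][] C = new double[n][n];
        paste(C,0,0, add(minus(add(m1,m4), m5), m7));   // C11 = M1+M4-M5+M7
        paste(C,0,h, add(m3,m5));                       // C12 = M3+M5
        paste(C,h,0, add(m2,m4));                       // C21 = M2+M4
        paste(C,h,h, add(minus(add(m1,m3), m2), m6));   // C22 = M1-M2+M3+M6
        return C;
    }

    public static double[][] naive(double[][] A, double[][] B) {
        int n = A.length;
        double[][] C = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++) C[i][j] += A[i][k] * B[k][j];
        return C;
    }
    static double[][] sub(double[][] M, int r, int c, int h) {
        double[][] S = new double[h][h];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < h; j++) S[i][j] = M[r+i][c+j];
        return S;
    }
    static void paste(double[][] M, int r, int c, double[][] S) {
        for (int i = 0; i < S.length; i++)
            for (int j = 0; j < S.length; j++) M[r+i][c+j] = S[i][j];
    }
    static double[][] add(double[][] X, double[][] Y) {
        int n = X.length; double[][] Z = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) Z[i][j] = X[i][j] + Y[i][j];
        return Z;
    }
    static double[][] minus(double[][] X, double[][] Y) {
        int n = X.length; double[][] Z = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) Z[i][j] = X[i][j] - Y[i][j];
        return Z;
    }
}
```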
CPE793:Data structures in Java for matrix computations
- Author(s):Geir Gundersen,Trond Steihaug
- Department of Informatics, University of Bergen, Norway,
- http://dx.doi.org/10.1002/cpe.793
- light wiley document
- wiley 2004 full documents
- Abstract:
In this paper we show how to utilize Java's native arrays for matrix computations. The disadvantages of Java arrays used as a 2D array for dense matrix computation are discussed and ways to improve the performance are examined. We show how to create efficient dynamic data structures for sparse matrix computations using Java's native arrays. This data structure is unique for Java and shown to be more dynamic and efficient than the traditional storage schemes for large sparse matrices. Numerical testing indicates that this new data structure, called Java Sparse Array, is competitive with the traditional Compressed Row Storage scheme on matrix computation routines. Java gives increased flexibility without losing efficiency. Compared with other object-oriented data structures Java Sparse Array is shown to have the same flexibility. Copyright © 2004 John Wiley & Sons, Ltd.
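The idea behind the Java Sparse Array layout can be sketched with Java's jagged native arrays: each row carries its own index and value arrays, so a single row can be replaced or grown independently, unlike the three flat arrays of Compressed Row Storage. (A simplified illustration, not the authors' implementation.)

```java
public class JavaSparseArraySketch {
    // One pair of arrays per row: columnIndex[i][k] is the column of the
    // k-th nonzero in row i, and value[i][k] its value. Each row is an
    // independent Java object, so rows can be updated without repacking
    // the whole matrix, in contrast to Compressed Row Storage.
    final int[][] columnIndex;
    final double[][] value;

    public JavaSparseArraySketch(int[][] columnIndex, double[][] value) {
        this.columnIndex = columnIndex;
        this.value = value;
    }

    // Sparse matrix-vector product y = A x.
    public double[] multiply(double[] x) {
        double[] y = new double[columnIndex.length];
        for (int i = 0; i < columnIndex.length; i++)
            for (int k = 0; k < columnIndex[i].length; k++)
                y[i] += value[i][k] * x[columnIndex[i][k]];
        return y;
    }
}
```

Because each row is its own object, inserting a nonzero touches only that row's two arrays rather than shifting the global storage, which is the dynamism the abstract refers to.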
CPE787:Design and evaluation of a TOP100 Linux Super
Cluster system
- Author(s):Erik Elmroth,Niklas Edmundsson,Bo Kågström,Markus Mårtensson,Mats Nylén,Åke Sandgren,Mattias Wadenstein
- High Performance Computing Center North (HPC2N), Umeå University, SE-901 87 Umeå, Sweden,
- http://dx.doi.org/10.1002/cpe.787
- light wiley document
- wiley 2004 full documents
- Abstract:The High Performance Computing Center
North (HPC2N) Super Cluster is a truly self-made
high-performance Linux cluster with 240 AMD processors in
120 dual nodes, interconnected with a high-bandwidth,
low-latency SCI network. This contribution describes the
hardware selected for the system, the work needed to build
it, important software issues and an extensive performance
analysis. The performance is evaluated using a number of
state-of-the-art benchmarks and software, including STREAM,
Pallas MPI, the ATLAS DGEMM, High-Performance Linpack and
NAS Parallel benchmarks. Using these benchmarks we first
determine the raw memory bandwidth and network
characteristics; the practical peak performance of a single
CPU, a single dual-node and the complete 240-processor
system; and investigate the parallel performance for
non-optimized dusty-deck Fortran applications. In summary,
this $500 000 system is extremely cost-effective and shows
the performance one would expect of a large-scale
supercomputing system with distributed memory architecture.
According to the TOP500 list of June 2002, this cluster was
the 94th fastest computer in the world. It is now fully
operational and stable as the main computing facility at
HPC2N. The system's utilization figures exceed 90%,
i.e. all 240 processors are on average utilized over 90% of
the time, 24 hours a day, seven days a week. Copyright © 2004 John Wiley & Sons, Ltd.
CPE810:Real-time primer design for DNA chips
- Author(s):H. Simmler,H. Singpiel,R. Männer
- Acconovis GmbH, Lindenhofstr. 42-44, 68163 Mannheim, Germany,University of Mannheim, B6, 23-29, 68131 Mannheim, Germany
http://dx.doi.org/10.1002/cpe.810
light wiley document
wiley 2004 full documents
Abstract:
The design of PCR or DNA chip experiments is a time-consuming process where bioinformatics is extensively used. The selection of the primers, which are immobilized on the DNA chip, requires a complex algorithm. Based on several parameters an optimized set of primers is automatically determined for a given gene sequence.
This paper describes a parallel architecture which performs the optimization of the primer selection on a hardware accelerator. In contrast to the pure software approach, the parallel architecture achieves a speedup by a factor of 500 using a PCI-based hardware accelerator. This approach allows optimization of a specified primer set in real time. Copyright © 2004 John Wiley & Sons, Ltd.
CPE811:Development and implementation of a parallel algorithm for the fast design of oligonucleotide probe sets for diagnostic DNA microarrays
- Author(s):H. Meier,A. Krause,M. Kräutner,A. Bode
- Lehrstuhl für Rechnertechnik und Rechnerorganisation, Parallelrechnerarchitektur (LRR-TUM), Institut für Informatik, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany,
- http://dx.doi.org/10.1002/cpe.811
- light wiley document
- wiley 2004 full documents
- Abstract:
We describe an accurate method for the automatic parallel generation of oligonucleotide probe sets for DNA microarrays. This approach includes a component for high-performance specificity evaluation of designed probes in large data sets. The three main algorithmic components of the method, namely probe preselection, hybridization prediction and probe selection are explained in detail. We introduce new combinatorial techniques for the efficient selection of probe sets of high differentiation capability even from sequence databases of conserved homologous genes. These techniques include the automatic generation of group specific probes as well as the design of excluding probes. A basic prototype has been implemented including a shared memory parallelization. Test runs have been performed on a multiprocessor personal computer with subsets of a small subunit ribosomal ribonucleic acid database, containing very conserved sequence data. The applicability of our program is pointed out by designing a set of oligonucleotide probes that shall allow a comprehensive parallel identification and differentiation of several groups of extremophilic prokaryotes by DNA microarray. The probe set is accessible via the Internet. On applying the parallel version on a dual processor system an efficiency of 80% was achieved. Copyright © 2004 John Wiley & Sons, Ltd.
CPE812:A hybrid self-organizing maps and particle swarm optimization approach
- Author(s):Xiang Xiao,Ernst R. Dow,Russell Eberhart,Zina Ben Miled,Robert J. Oppelt
- Indiana University Purdue University Indianapolis, 723 West Michigan St., SL 160C, Indianapolis, IN 46202-5132, U.S.A.,
- http://dx.doi.org/10.1002/cpe.812
- light wiley document
- wiley 2004 full documents
- Abstract:
Gene clustering, the process of grouping related genes in the same cluster, is at the foundation of different genomic studies that aim at analyzing the function of genes. Microarray technologies have made it possible to measure gene expression levels for thousands of genes simultaneously. For knowledge to be extracted from the datasets generated by these technologies, the datasets have to be presented to a scientist in a meaningful way. Gene clustering methods serve this purpose. In this paper, a hybrid clustering approach that is based on self-organizing maps and particle swarm optimization is proposed. In the proposed algorithm, the rate of convergence is improved by adding a conscience factor to the self-organizing maps algorithm. The robustness of the result is measured by using a resampling technique. The algorithm is implemented on a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd.
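The conscience factor mentioned above can be sketched as follows: each map unit tracks how often it wins, and a bias proportional to the gap between the fair win rate 1/N and the unit's observed rate is subtracted from its distance score, handicapping units that win too often so more of the map participates. (A hedged sketch of a DeSieno-style conscience mechanism, not the authors' code.)

```java
public class ConscienceSketch {
    // Pick the winning unit for input x: smallest squared distance minus
    // a conscience bias C * (1/N - winFrequency[i]). Units that win more
    // often than 1/N of the time receive a negative bias (a handicap).
    public static int winner(double[][] weights, double[] x,
                             double[] winFrequency, double C) {
        int n = weights.length;
        int best = 0;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = weights[i][j] - x[j];
                d += diff * diff;
            }
            double score = d - C * (1.0 / n - winFrequency[i]);
            if (score < bestScore) { bestScore = score; best = i; }
        }
        return best;
    }
}
```

With C = 0 this reduces to the standard SOM winner selection; increasing C spreads wins across under-used units, which is the convergence-rate improvement the abstract describes.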
CPE813:A specialized hardware device for the protein similarity search
- Author(s):A. Marongiu,P. Palazzari,V. Rosato
- ENEA, Computing and Modeling Unit, C. R. Casaccia, Via Anguillarese 301, SP111 - 00060 S. Maria di Galeria, Rome, Italy,
- http://dx.doi.org/10.1002/cpe.813
- light wiley document
- wiley 2004 full documents
- Abstract:
This work presents the architecture of PROSIDIS, a special purpose processor designed to search for the occurrence of substrings similar to a given ‘template string’ within a proteome. Similarity is determined by means of a weighting matrix which measures the degree of similarity between any two amino acids.
The paper recalls the basis of the PHG (Parallel Hardware Generator) tool, developed in the framework of the HADES project (Hardware Design for Scientific Applications). Through PHG it is possible to design, in a nearly automatic way, parallel hardware which efficiently implements an algorithm described through recurrence equations; PHG is able to generate synthesizable VHDL representing a parallel system derived from the System of Affine Recurrence Equations (SARE) describing the problem.
In this work we illustrate the advantages of designing a special-purpose processor to address the protein similarity discovery problem. Some preliminary results are given, reporting the time spent by several conventional computing architectures and by the PROSIDIS processor, hosted by a personal computer, to solve the same protein analysis problem. The results show that PROSIDIS, implemented on a Xilinx XV1000 FPGA, achieves speedups ranging from 5.6 to 55.6. Copyright © 2004 John Wiley & Sons, Ltd.
CPE814:Parallelization of IBD computation for determining genetic disease maps
- Author(s):Nouhad J. Rizk
- Computer Science Department, Notre Dame University, P.O. Box 72, Zouk-Mosbeh, Lebanon
http://dx.doi.org/10.1002/cpe.814
light wiley document
wiley 2004 full documents
Abstract:
A number of software packages are available for the construction of comprehensive human genetic maps. In this paper we parallelize the widely used package Genehunter. We restrict our attention to one function of the package, namely the computation of Identity By Descent (IBD) genes of a family. We use a master-slave model with the Message Passing Interface parallel environment. Our tests are done on two different architectures: a network of workstations and a shared-memory multiprocessor. Our experiments yield a new and efficient strategy for classifying the parallelization of genetic linkage analysis programs; the classification is based on the values of parameters which affect the complexity of the computation. Copyright © 2004 John Wiley & Sons, Ltd.
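The master-slave structure the authors build with MPI can be illustrated generically: a master fills a queue of independent per-family tasks, and slaves pull work until the queue is empty. The sketch below uses Python threads and queues as a stand-in for MPI ranks; the names and the toy workload are hypothetical, not the paper's code:

```python
import queue
import threading

def master_slave(tasks, worker_fn, n_workers=4):
    """Master fills a task queue; each slave repeatedly pulls a task,
    computes, and pushes (task, result) until no work remains. This
    mirrors the master-slave pattern the paper implements with MPI."""
    task_q, result_q = queue.Queue(), queue.Queue()
    for t in tasks:
        task_q.put(t)

    def slave():
        while True:
            try:
                t = task_q.get_nowait()     # next unit of work, if any
            except queue.Empty:
                return                      # no work left: slave terminates
            result_q.put((t, worker_fn(t)))

    workers = [threading.Thread(target=slave) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    results = {}
    while not result_q.empty():             # master collects all results
        t, r = result_q.get_nowait()
        results[t] = r
    return results

# Usage: a toy per-task workload (squaring stands in for IBD computation).
results = master_slave(range(8), lambda t: t * t)
```

Pulling tasks from a shared queue gives the dynamic load balance that makes this pattern attractive when per-family IBD costs vary widely.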
CPE815:Development of distributed bioinformatics applications with GMP
- Author(s):Bertil Schmidt,Lin Feng,Amey Laud,Yusdi Santoso
- School of Computer Engineering, Nanyang Technological University, Singapore 639798,Helixense Pte Ltd, 73 Science Park Drive, 02-05 Science Park 1, Singapore 118254,
- http://dx.doi.org/10.1002/cpe.815
- light wiley document
- wiley 2004 full documents
- Abstract:
We present the design and use of gMP as a tool for developing distributed bioinformatics applications. gMP is a purely Java-based interface that adds MPI-like message-passing and collective communication to the genomics Research Network Architecture (gRNA). The gRNA is a highly programmable, modular environment specifically designed to invigorate the development of genome-centric tools for life science research. We demonstrate the development of a distributed application to detect regulatory elements using correlation with gene expression data. Its implementation with gMP leads to significant runtime savings on our distributed gRNA system. Copyright © 2004 John Wiley & Sons, Ltd.
CPE816:Parallel divide and conquer approach for the protein threading problem
- Author(s):Nicola Yanev,Rumen Andonov
- University of Sofia, 5 J. Bouchier str., 1126 Sofia, Bulgaria,LAMIH/ROI, University of Valenciennes, Le Mont Houy, 59313 Valenciennes Cedex 9, France
http://dx.doi.org/10.1002/cpe.816
light wiley document
wiley 2004 full documents
Abstract:
For the Protein Threading Problem (PTP) we propose a network-flow-like formulation. We also show its equivalence with a generalization of the shortest path problem on a graph with a very particular structure. The underlying Mixed Integer Programming (MIP) model proves to be very appropriate for the PTP: large real-life instances have been solved much faster by using only the MIP solver CPLEX than by a known special-purpose branch-and-bound algorithm. The properties of the MIP model permit a decomposition of the main problem into a large number of subproblems (tasks). We show that a branch-and-cut strategy can be efficiently applied for solving these tasks in a parallel manner, which leads to a significant reduction in the total running time. Computational experiments with large problem instances are presented. Copyright © 2004 John Wiley & Sons, Ltd.
CPE817:The AxML program family for maximum likelihood-based phylogenetic tree inference
- Author(s):Alexandros P. Stamatakis,Thomas Ludwig,Harald Meier
- Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstr. 3, D-85748 Garching b. München, Germany,Ruprecht-Karls-Universität Heidelberg, Institut für Informatik, Im Neuenheimer Feld 348, D-69120 Heidelberg, Germany,
- http://dx.doi.org/10.1002/cpe.817
- light wiley document
- wiley 2004 full documents
- Abstract:
Inference of phylogenetic (evolutionary) trees comprising hundreds or thousands of organisms based on the maximum likelihood criterion is a computationally extremely intensive task. This paper describes the evolution of the AxML program family which provides novel algorithmic as well as technical solutions for the maximum likelihood-based inference of huge phylogenetic trees. Algorithmic optimizations and a new tree building algorithm yield runtime improvements of a factor greater than 4 compared with fastDNAml and parallel fastDNAml, returning equally good trees at the same time. Various parallel, distributed, and Grid-based implementations of AxML give the program the capability to acquire the large amount of required computational resources for the inference of huge high-quality trees. Copyright © 2004 John Wiley & Sons, Ltd.
CPE808:Sequence alignment on the Cray MTA-2
- Author(s):Shahid H. Bokhari,Jon R. Sauer
- Department of Electrical Engineering, University of Engineering & Technology, Lahore 54890, Pakistan,Eagle Research & Development, 1898 S. Flatiron Court, Suite 128, Boulder, CO 80301, U.S.A.
http://dx.doi.org/10.1002/cpe.808
light wiley document
wiley 2004 full documents
Abstract:
Several variants of standard algorithms for DNA sequence alignment have been implemented on the Cray Multithreaded Architecture-2 (MTA-2). We describe the architecture of the MTA-2 and discuss how its hardware and software enable efficient implementation of parallel algorithms with little or no regard for issues of partitioning, mapping or scheduling. We describe how we ported variants of the naive algorithm for exact alignment and the dynamic programming algorithm for approximate alignment to the MTA-2 and provide detailed performance measurements. It is shown that, for the dynamic programming algorithm, the use of the MTA's Full/Empty synchronization bits leads to almost perfect speedup for large problems on one to eight processors. These results illustrate the versatility of the MTA's architecture and demonstrate its potential for providing a high-productivity platform for parallel processing. Copyright © 2004 John Wiley & Sons, Ltd.
CPE809:Using hybrid alignment for iterative sequence database searches
- Author(s):Ralf Bundschuh,Yuheng Li,Mario Lauria
- Department of Physics, The Ohio State University, 174 West 18th Ave., Columbus, OH 43210, U.S.A.,Department of Computer and Information Science, The Ohio State University, 2015 Neil Ave. #395, Columbus, OH 43210, U.S.A.,
- http://dx.doi.org/10.1002/cpe.809
- light wiley document
- wiley 2004 full documents
- Abstract:
Progressive sequence model refinement by means of iterative searches is an effective technique for high-sensitivity database searches and is currently employed in popular tools such as PSI-BLAST and SAM. Recently, a novel alignment algorithm has been proposed that offers features expected to improve the sensitivity of such iterative approaches, specifically a well-characterized theory of its statistics even in the presence of position-specific gap costs. Here, we demonstrate that the new hybrid alignment algorithm is ready to be used as the alignment core of PSI-BLAST. In addition, we evaluate the accuracy of two proposed approaches to edge-effect correction in short sequence alignment statistics, which turns out to be one of the crucial issues in developing a hybrid-alignment-based version of PSI-BLAST. Copyright © 2004 John Wiley & Sons, Ltd.
CPE865:Special Issue: High Performance Computational Biology
CPE792:A PC cluster system employing IEEE 1394
- Author(s):Kazuki Hyoudou,Ryota Ozaki,Yasuichi Nakayama
- Department of Computer Science, The University of Electro-Communications, Chofu, Tokyo 182-8585, Japan,
- http://dx.doi.org/10.1002/cpe.792
- light wiley document
- wiley 2004 full documents
- Abstract:
In this paper, we describe the design and evaluation of a PC cluster system in which IEEE 1394 is applied. Networks for parallel cluster computing require low latency and high bandwidth. It is also important that the networks be commercially available at low cost. Few network devices satisfy all of the above requirements. However, the IEEE 1394 standard provides a good compromise for fulfilling these requirements. We have used IEEE 1394 devices, which support a 400 Mbps data transfer rate, to connect the nodes of a PC cluster system which we have designed and implemented. We have implemented two communication libraries. One is a fast communication library called CF for IEEE 1394. The other is an MPI layer library on top of the CF library. Experimental results show that CF achieves a 17.2 microsecond round-trip time. On application benchmarks, the system was considerably faster than TCP/IP over Fast Ethernet. Even though the system was constructed at very low cost, it provides good performance. Using the IEEE 1394 standard is thus a good solution for low-cost cluster systems. Copyright © 2004 John Wiley & Sons, Ltd.
CPE794:Linda implementations in Java for concurrent systems
- Author(s):G. C. Wells,A. G. Chalmers,P. G. Clayton
- Department of Computer Science, Rhodes University, Grahamstown 6140, South Africa,Department of Computer Science, University of Bristol, Merchant Venturers Building, Bristol BS8 1UB, U.K.,
- http://dx.doi.org/10.1002/cpe.794
- light wiley document
- wiley 2004 full documents
- Abstract:
This paper surveys a number of the implementations of Linda that are available in Java. It provides some discussion of their strengths and weaknesses, and presents the results from benchmarking experiments using a network of commodity workstations. Some extensions to the original Linda programming model are also presented and discussed, together with examples of their application to parallel processing problems. Copyright © 2004 John Wiley & Sons, Ltd.
CPE795:The J2EE ECperf benchmark results: transient trophies or technology treasures?
- Author(s):Paul Brebner,Jeffrey Gosper
- CSIRO Mathematical and Information Sciences, GPO Box 664, Canberra, Australia,
- http://dx.doi.org/10.1002/cpe.795
- light wiley document
- wiley 2004 full documents
- Abstract:
ECperf, the widely recognized industry standard J2EE benchmark, has attracted a large number of results submissions and their subsequent publications. However, ECperf places little restriction on the hardware platforms, operating systems and databases utilized in the benchmarking process. This, combined with the existence of only two primary metrics, makes it difficult to accurately compare the performance of the Application Server products themselves. By mining the full-disclosure archives for trends and correlations, we have discovered that J2EE technology is very scalable both in a scale-up and scale-out manner. Other observed trends include a linear correlation between middle-tier total processing power and throughput, as well as between J2EE Application Server license costs and throughput. However, the results clearly indicate that there is an increasing cost per user with increasing capacity systems and scale-up is proportionately more expensive than scale-out. Finally, the correlation between middle-tier processing power and throughput, combined with results obtained from a different ‘lighter-weight’ benchmark, facilitates an estimate of throughput for different types of J2EE applications. Copyright © 2004 John Wiley & Sons, Ltd.
CPE796:The performance and scalability of SHMEM and MPI-2 one-sided routines on a SGI Origin 2000 and a Cray T3E-600
- Author(s):Glenn R. Luecke,Silvia Spanoyannis,Marina Kraeva
- Iowa State University, Ames, IA 50011-2251, U.S.A.,
- http://dx.doi.org/10.1002/cpe.796
- light wiley document
- wiley 2004 full documents
- Abstract:
This paper compares the performance and scalability of SHMEM and MPI-2 one-sided routines on different communication patterns for a SGI Origin 2000 and a Cray T3E-600. The communication tests were chosen to represent commonly used communication patterns with low contention (accessing distant messages, a circular right shift, a binary tree broadcast) to communication patterns with high contention (a ‘naive’ broadcast and an all-to-all). For all the tests and for small message sizes, the SHMEM implementation significantly outperformed the MPI-2 implementation for both the SGI Origin 2000 and Cray T3E-600. Copyright © 2004 John Wiley & Sons, Ltd.
CPE804:Augmenting LZ-77 with authentication and integrity assurance capabilities
- Author(s):Mikhail J. Atallah,Stefano Lonardi
- CERIAS and Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, U.S.A.,Department of Computer Science and Engineering, University of California, Riverside, CA 92521, U.S.A.
http://dx.doi.org/10.1002/cpe.804
light wiley document
wiley 2004 full documents
Abstract:
The formidable dissemination capability allowed by the current network technology makes it increasingly important to devise new methods to ensure authenticity and integrity. Nowadays it is common practice to distribute documents in compressed form. In this paper, we propose a simple variation on the classic LZ-77 algorithm that allows one to hide, within the compressed document, enough information to warrant its authenticity and integrity. The design is based on the unpredictability of a certain class of pseudo-random number generators, in such a way that the hidden data cannot be retrieved in a reasonable amount of time by an attacker (unless the secret bit-string key is known). Since it can still be decompressed by the original LZ-77 algorithm, the embedding is completely ‘transparent’ and backward-compatible, making it possible to deploy it without disrupting service. Experiments show that the degradation in compression due to the embedding is almost negligible. Copyright © 2004 John Wiley & Sons, Ltd.
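The core idea can be illustrated with a toy LZ-77 coder: whenever several equally long longest matches are available, the choice among them is redundant for the decoder, so it can carry hidden bits. In the actual scheme this choice is driven by a keyed pseudo-random generator; the unkeyed sketch below (all names hypothetical) only shows why a standard decoder is unaffected:

```python
def lz77_embed(text, bits, window=64):
    """Toy LZ-77 coder. When several equally long longest matches exist,
    the choice among them is invisible to the decoder, so it can carry
    one hidden bit (here: bit 0 -> first candidate, bit 1 -> second).
    The real scheme keys this choice with a secret PRNG."""
    tokens, i, bit_pos = [], 0, 0
    while i < len(text):
        start = max(0, i - window)
        best_len, cands = 0, []
        for j in range(start, i):               # collect all longest matches
            k = 0
            while i + k < len(text) and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, cands = k, [j]
            elif k == best_len and k > 0:
                cands.append(j)
        if best_len >= 2:
            j = cands[0]
            if len(cands) > 1 and bit_pos < len(bits):
                # free choice among candidates: embed the next hidden bit
                j = cands[1] if bits[bit_pos] else cands[0]
                bit_pos += 1
            tokens.append((i - j, best_len))    # (offset, length) pair
            i += best_len
        else:
            tokens.append(text[i])              # literal character
            i += 1
    return tokens

def lz77_decode(tokens):
    """Standard LZ-77 decoding: oblivious to which match was chosen,
    so the embedding is transparent and backward-compatible."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            offset, length = t
            for _ in range(length):             # byte-wise copy handles overlap
                out.append(out[-offset])
        else:
            out.append(t)
    return "".join(out)
```

Since every candidate reproduces the same substring, any choice decompresses identically; only a party who can re-derive the keyed choices can read the hidden bits back out.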
CPE806:Access-controlled resource discovery in pervasive networks
- Author(s):Sanjay Raman,Dwaine Clarke,Matt Burnside,Srinivas Devadas,Ronald Rivest
- Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.,
- http://dx.doi.org/10.1002/cpe.806
- light wiley document
- wiley 2004 full documents
- Abstract:
Networks of the future will be characterized by a variety of computational devices that display a level of dynamism not seen in traditional wired networks. Because of the dynamic nature of these networks, resource discovery is one of the fundamental problems that must be solved. While resource discovery systems are not a novel concept, securing these systems in an efficient and scalable way is challenging. This paper describes the design and implementation of an architecture for access-controlled resource discovery. This system achieves this goal by integrating access control with the Intentional Naming System (INS), a resource discovery and service location system. The integration is scalable, efficient, and fits well within a proxy-based security framework designed for dynamic networks. We provide performance experiments that show how our solution outperforms existing schemes. The result is a system that provides secure, access-controlled resource discovery that can scale to large numbers of resources and users. Copyright © 2004 John Wiley & Sons, Ltd.
CPE803:Special Issue: Computer Security
CPE805:Identification and authentication of integrated circuits
- Author(s):Blaise Gassend,Daihyun Lim,Dwaine Clarke,Srinivas Devadas,Marten van Dijk
- Research Lab for Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.,Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.,Prof Holstlaan 4, Eindhoven, The Netherlands
http://dx.doi.org/10.1002/cpe.805
light wiley document
wiley 2004 full documents
Abstract:
This paper describes a technique to reliably and securely identify individual integrated circuits (ICs) based on the precise measurement of circuit delays and a simple challenge-response protocol. This technique could be used to produce key-cards that are more difficult to clone than ones involving digital keys on the IC. We consider potential avenues of attack against our system, and present candidate implementations. Experiments on Field Programmable Gate Arrays show that the technique is viable, but that our current implementations could require some strengthening before they can be considered secure. Copyright © 2004 John Wiley & Sons, Ltd.
CPE807:A role-based infrastructure management system: design and implementation
- Author(s):Dongwan Shin,Gail-Joon Ahn,Sangrae Cho,Seunghun Jin
- Department of Software and Information Systems, University of North Carolina at Charlotte, Charlotte, NC 28223, U.S.A.,Department of Information Security System, Electronics and Telecommunications Research Institute, Taejon, 305-350, South Korea,
- http://dx.doi.org/10.1002/cpe.807
- light wiley document
- wiley 2004 full documents
- Abstract:
Over the last decade there has been a tremendous advance in the theory and practice of role-based access control (RBAC). One of the most significant aspects of RBAC is its management of permissions on the basis of roles rather than individual users; consequently, it reduces administrative costs and potential errors. The management of roles in various RBAC implementations, however, tends to be conducted on an ad hoc basis, closely coupled with a certain context of system environments. This paper discusses the development of a system whose purpose is to help manage a valid set of roles with assigned users and permissions for role-based authorization infrastructures. We have designed and implemented the system, called RolePartner. This system enables role administrators to build and configure various components of an RBAC model so as to embody organizational access control policies which can be separated from different enforcement mechanisms. Hence the system helps lay a foundation for role-based authorization infrastructures. Three methodological constituents are introduced for our purposes, together with the design and implementation issues. The system has a role-centric view for easily managing constrained and hierarchical roles as well as assigned users and permissions. An LDAP-accessible directory service was used for the role database. We show that the system can be seamlessly integrated with an existing privilege-based authorization infrastructure. Copyright © 2004 John Wiley & Sons, Ltd.
CPE834:Parallel bandwidth characteristics calculations for thin avalanche photodiodes on a SGI Origin 2000 supercomputer
- Author(s):Yi Pan,Constantinos S. Ierotheou,Majeed M. Hayat
- Department of Computer Science, Georgia State University, Atlanta, GA 30303, U.S.A.,Parallel Processing Research Group, University of Greenwich, London SE10 9LS, U.K.,Department of Electrical and Computer Engineering, The University of New Mexico, Albuquerque, NM 87131-1356, U.S.A.
http://dx.doi.org/10.1002/cpe.834
light wiley document
wiley 2004 full documents
Abstract:
CPE866:Finding stale-value errors in concurrent programs
- Author(s):Michael Burrows,K. Rustan M. Leino
- Google Inc., Bayshore Parkway, Mountain View, CA 94043, U.S.A.,Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
http://dx.doi.org/10.1002/cpe.866
light wiley document
wiley 2004 full documents
Abstract:
Concurrent programs can suffer from many types of errors, not just the well-studied problems of deadlocks and simple race conditions on variables. This paper addresses a kind of race condition that arises from reading a variable whose value is possibly out of date. The paper introduces a simple technique for detecting such stale values, and reports on the encouraging experience with a compile-time checker that uses the technique. Copyright © 2004 John Wiley & Sons, Ltd.
CPE802:Efficient parallel implementations of near Delaunay triangulation with High Performance Fortran
- Author(s):Min-Bin Chen,Tyng-Ruey Chuang,Jan-Jan Wu
- Department of Management Information Systems, Chung Kuo Institute of Technology, Mucha, Taipei 116, Taiwan,Institute of Information Science, Academia Sinica, Nankang, Taipei 115, Taiwan,
- http://dx.doi.org/10.1002/cpe.802
- light wiley document
- wiley 2004 full documents
- Abstract:
Unstructured mesh generation exposes highly irregular computation patterns, which imposes a challenge in implementing triangulation algorithms on parallel machines. This paper reports on an efficient parallel implementation of near Delaunay triangulation with High Performance Fortran (HPF). Our algorithm exploits embarrassing parallelism by performing sub-block triangulation and boundary merge independently at the same time. The sub-block triangulation is a divide-and-conquer Delaunay algorithm known for its sequential efficiency, and the boundary triangulation is an incremental construction algorithm with low overhead. Compared with prior work, our parallelization method is both simple and efficient. In the paper, we also describe a solution to the collinear points problem that usually arises in large data sets. Our experiences with the HPF implementation show that with careful control of the data distribution, we are able to parallelize the program using HPF's standard directives and extrinsic procedures. Experimental results on several parallel platforms, including an IBM SP2 and a DEC Alpha farm, show that a parallel efficiency of 42-86% can be achieved for an eight-node distributed memory system. We also compare the efficiency of the HPF implementation with that of a similarly hand-coded MPI implementation. Copyright © 2004 John Wiley & Sons, Ltd.
CPE903:An order-based algorithm for implementing multiparty synchronization
- Author(s):José A. Pérez,Rafael Corchuelo,Miguel Toro
- The Distributed Group, University of Seville, Spain,
- http://dx.doi.org/10.1002/cpe.903
- light wiley document
- wiley 2004 full documents
- Abstract:
Multiparty interactions are a powerful mechanism for coordinating several entities that need to cooperate in order to achieve a common goal. In this paper, we present an algorithm for implementing them that improves on previous results in that it does not require the whole set of entities or interactions to be known at compile- or run-time, and it can deal with both terminating and non-terminating systems. We also present a comprehensive simulation analysis that shows how sensitive to changes our algorithm is, and compare the results with well-known proposals by other authors. This study shows that our algorithm performs comparably to other proposals in which the set of entities and interactions is known beforehand, but outperforms them in some clearly identified situations. In addition, these results show that our algorithm can be combined with a technique called synchrony loosening without any effect on efficiency. Copyright © 2004 John Wiley & Sons, Ltd.
CPE754:Supporting Bulk Synchronous Parallelism with a high-bandwidth optical interconnect
- Author(s):I. Gourlay,P. M. Dew,K. Djemame,J. F. Snowdon,G. Russell
- Informatics Research Institute, University of Leeds, Leeds LS2 9JT, U.K.,Physics Department, Heriot-Watt University, Riccarton, Edinburgh EH14 4AS, U.K.,
- http://dx.doi.org/10.1002/cpe.754
- light wiley document
- wiley 2004 full documents
- Abstract:
The list of applications requiring high-performance computing resources is constantly growing. The cost of inter-processor communication is critical in determining the performance of massively parallel computing systems for many of these applications. This paper considers the feasibility of a commodity processor-based system which uses a free-space optical interconnect. A novel architecture, based on this technology, is presented. Analytical and simulation results based on an implementation of BSP (Bulk Synchronous Parallelism) are presented, indicating that a significant performance enhancement, over architectures using conventional interconnect technology, is possible. Copyright © 2004 John Wiley & Sons, Ltd.
CPE755:Towards a more realistic comparative analysis of multicomputer networks
- Author(s):H. Sarbazi-Azad,M. Ould-Khaoua,L. M. Mackenzie
- Department of Computer Engineering, Sharif University of Technology, and School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran,Department of Computing Science, University of Glasgow, Glasgow, U.K.,
- http://dx.doi.org/10.1002/cpe.755
- light wiley document
- wiley 2004 full documents
- Abstract:
CPE756:RF-MVTC: an efficient risk-free multiversion concurrency control algorithm
- Author(s):Azzedine Boukerche,Terry Tuck
- SITE, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5,Department of Computer Science, University of North of Texas, Denton, TX 76203, U.S.A.
http://dx.doi.org/10.1002/cpe.756
light wiley document
wiley 2004 full documents
Abstract:
In this paper, we focus on the temporary return of data values that are incorrect under the given transactional semantics, which could have catastrophic effects similar to those in parallel and discrete-event simulation systems. In many applications using online transaction processing environments, for instance, it is best to delay the response to a transaction's read request until it is either known or unlikely that a write message from an older update transaction will make the response incorrect. Examples of such applications are those where aberrant behavior is too costly, and those in which precommitted data is visible to some reactive entity. In light of the avoidance of risk in this approach, we propose a risk-free multiversion temporally correct concurrency-control algorithm. We discuss the algorithm and its implementation, and report on the performance results of simulation models using a cluster of workstations. Copyright © 2004 John Wiley & Sons, Ltd.
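The multiversion idea behind such algorithms can be sketched briefly: every write creates a new timestamped version, and a read at timestamp `ts` is served from the latest version at or before `ts`, but only once no older update can still arrive. The sketch below is a simplified illustration under that reading, with `safe_ts` modeling the watermark; it is not the authors' RF-MVTC algorithm:

```python
import bisect

class MultiversionStore:
    """Minimal multiversion store: each write adds a timestamped
    version; a read at timestamp `ts` returns the value of the latest
    version written at or before `ts`, and is refused (caller must
    wait) while an older update could still invalidate the answer."""

    def __init__(self):
        self.ts_list = {}   # key -> sorted write timestamps
        self.values = {}    # key -> values aligned with ts_list
        self.safe_ts = 0    # no future write may carry a timestamp <= safe_ts

    def write(self, key, ts, value):
        i = bisect.bisect_left(self.ts_list.setdefault(key, []), ts)
        self.ts_list[key].insert(i, ts)
        self.values.setdefault(key, []).insert(i, value)

    def advance_safe_ts(self, ts):
        self.safe_ts = max(self.safe_ts, ts)

    def read(self, key, ts):
        if ts > self.safe_ts:
            return None     # risky: an older write may still arrive; wait
        i = bisect.bisect_right(self.ts_list.get(key, []), ts)
        return self.values[key][i - 1] if i else None
```

Delaying the read until `safe_ts` passes `ts` is what makes the scheme risk-free: the response can never be invalidated retroactively by a late write from an older transaction.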
CPE758:Performance evaluation of a random-walk-based algorithm for embedding dynamically evolving trees in hypercubic networks
- Author(s):Keqin Li
- Department of Computer Science, State University of New York, New Paltz, NY 12561-2499, U.S.A.
http://dx.doi.org/10.1002/cpe.758
light wiley document
wiley 2004 full documents
Abstract:
In many tree-structured parallel computations, the size and structure of a tree that represents a parallel computation is unpredictable at compile-time; the tree evolves gradually during the course of the computation. When such a computation is performed on a static network, the dynamic tree embedding problem is to distribute the tree nodes to the processors of the network such that all the processors receive roughly the same amount of load and that communicating nodes are assigned to neighboring processors. Furthermore, when a new tree node is generated, it should be immediately assigned to a processor for execution without any information on the further evolving of the tree; and load distribution is performed by all processors in a totally distributed fashion.
We study the problem of embedding dynamically evolving trees in hypercubic networks, including shuffle-exchange, de Bruijn, cube-connected cycles, wrapped butterfly, and hypercube networks. The performance of a random-walk-based randomized tree embedding algorithm is evaluated. Several random tree models are considered. We develop recurrence relations for analyzing the performance of embedding of complete-tree-based random trees and randomized complete trees, and linear systems of equations for reproduction trees. We present more efficient recurrence relations and linear systems of equations for symmetric networks. We also demonstrate extensive numerical data of the performance ratio and make a number of interesting observations of randomized tree embedding in the five hypercubic networks. Copyright © 2004 John Wiley & Sons, Ltd.
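For the hypercube case, the random-walk placement policy being evaluated can be sketched directly: each newly spawned tree node starts at its parent's processor and takes a few random steps, each flipping one address bit (one hypercube edge). The sketch below then computes the max/average load ratio used as the performance measure; the parameters are illustrative, not the paper's tree models:

```python
import random

def random_walk_embed(n_dim=4, n_nodes=200, walk_len=3, branching=2, seed=0):
    """Random-walk embedding of a dynamically evolving tree on a
    hypercube with 2**n_dim processors: every newly spawned child is
    placed by taking `walk_len` random steps from its parent's
    processor, each step flipping one address bit (a hypercube edge).
    Returns the per-processor load."""
    rng = random.Random(seed)
    load = [0] * (1 << n_dim)
    load[0] = 1                 # root node starts on processor 0
    frontier = [0]              # processors of tree nodes not yet expanded
    placed = 1
    while frontier and placed < n_nodes:
        proc = frontier.pop(rng.randrange(len(frontier)))   # grow a random leaf
        for _ in range(branching):
            if placed >= n_nodes:
                break
            p = proc
            for _ in range(walk_len):
                p ^= 1 << rng.randrange(n_dim)              # step to a neighbour
            load[p] += 1
            frontier.append(p)
            placed += 1
    return load

# Performance ratio of the embedding: maximum load over average load.
load = random_walk_embed()
ratio = max(load) / (sum(load) / len(load))
```

Placement is fully distributed: a node is assigned as soon as it is spawned, using only its parent's address, which matches the online setting described above.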
CPE753:A performance study of job management systems
- Author(s):Tarek El-Ghazawi,Kris Gaj,Nikitas Alexandridis,Frederic Vroman,Nguyen Nguyen,Jacek R. Radzikowsk,Preeyapong Samipagdi,Suboh A. Suboh
- ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A.,ECE Department, George Mason University, 4400 University Drive, Fairfax, VA 22030, U.S.A.,ECE Department, The George Washington University, 801 22nd Street NW, Washington, DC 20052, U.S.A.,
- http://dx.doi.org/10.1002/cpe.753
- light wiley document
- wiley 2004 full documents
- Abstract:
Job Management Systems (JMSs) efficiently schedule and monitor jobs in parallel and distributed computing environments. Therefore, they are critical for improving the utilization of expensive resources in high-performance computing systems and centers, and an important component of Grid software infrastructure. With many JMSs available commercially and in the public domain, it is difficult to choose an optimum JMS for a given computing environment. In this paper, we present the results of the first empirical study of JMSs reported in the literature. Four commonly used systems, LSF, PBS Pro, Sun Grid Engine/CODINE, and Condor, were considered. The study has revealed important strengths and weaknesses of these JMSs under different operational conditions. For example, LSF was shown to exhibit excellent throughput for a wide range of job types and submission rates. Alternatively, CODINE appeared to outperform other systems in terms of the average turn-around time for small jobs, and PBS appeared to excel in terms of turn-around time for relatively larger jobs. Copyright © 2004 John Wiley & Sons, Ltd.
CPE759:On the scalability of many-to-many reliable multicast sessions
- Author(s):W. Y. Yoon,D. Lee,H. Y. Youn
- Samsung Advanced Institute of Technology, South Korea,School of Engineering, Information and Communications University, South Korea,School of Information and Communication Engineering, Sungkyunkwan University, South Korea
- http://dx.doi.org/10.1002/cpe.759
Abstract:
Even though tree-based reliable multicast protocols are known to be the most scalable for one-to-many sessions, it remains an open question whether these protocols are also scalable for many-to-many sessions. In this paper, we analyze and compare two promising multicast protocols, the receiver-initiated protocol with NACK suppression and the tree-based protocol, using a new spatial loss model. Unlike previous work, the proposed model considers the correlation of packet loss events for a more realistic analysis. The analysis results show that the tree-based protocol achieves much higher throughput than the receiver-initiated protocol for a many-to-many session as the number of participants in the session grows. Copyright © 2004 John Wiley & Sons, Ltd.
CPE761:Systems Performance Evaluation
- http://dx.doi.org/10.1002/cpe.761
- Abstract: No abstract
CPE760:One-to-all broadcasting scheme for arbitrary static interconnection networks with bursty background traffic
- Author(s):D. D. Kouvatsos,M. Mkwawa,I. Awan
- Performance Modelling and Engineering Research Group, Department of Computing, University of Bradford, Bradford BD7 3BD, U.K.
- http://dx.doi.org/10.1002/cpe.760
- Abstract:
A novel one-to-all broadcasting scheme is proposed for arbitrary static interconnection networks with bursty background traffic having head-of-line priorities, complete buffer sharing and repetitive service blocking with random destination. The scheme facilitates inter-process communication amongst parallel computers and grid computing networks as well as asynchronous transfer mode space division architectures, and it is applicable to a single-port mode of message-passing communication. The scheme utilizes a queue-by-queue decomposition algorithm for arbitrary open queueing network models, based on the principle of maximum entropy, in conjunction with an information theoretic decomposition criterion and graph theoretic concepts. Evidence based on empirical studies indicates the suitability of the scheme for achieving an optimal one-to-all broadcasting cost. Copyright © 2004 John Wiley & Sons, Ltd.
CPE868:A cache-efficient implementation of the lattice Boltzmann method for the two-dimensional diffusion equation
CPE869:Simulation of resource synchronization in a dynamic real-time distributed computing environment
- Author(s):Chen Zhang,David Cordes
- Department of Computer Information Systems, Bryant College, Smithfield, RI 02917-1284, U.S.A.,Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A.
http://dx.doi.org/10.1002/cpe.869
Abstract:
Today, more and more distributed computer applications are being modeled and constructed using real-time principles and concepts. In 1989, the Object Management Group (OMG) formed a Real-Time Special Interest Group (RT SIG) with the goal of extending the Common Object Request Broker Architecture (CORBA) standard to include real-time specifications. This group's most recent efforts have focused on the requirements of dynamic distributed real-time systems.
One open problem in this area is resource access synchronization for tasks employing dynamic priority scheduling. This paper presents two resource synchronization protocols that the authors have developed which meet the requirements of dynamic distributed real-time systems as specified by Dynamic Scheduling Real-Time CORBA (DSRT CORBA). The proposed protocols can be applied to both Earliest Deadline First (EDF) and Least Laxity First (LLF) dynamic scheduling algorithms, allow distributed nested critical sections, and avoid unnecessary runtime overhead. In order to evaluate the performance of the proposed protocols, we analyzed each protocol's schedulability. Since the schedulability of the system is affected by numerous system configuration parameters, we have designed simulation experiments to isolate and illustrate the impact of each individual system parameter. Simulation experiments show the proposed protocols have better performance than one would realize by applying a schema that utilizes dynamic priority ceiling update. Copyright © 2004 John Wiley & Sons, Ltd.
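The difference between the two dynamic scheduling policies named above can be sketched in a few lines. This is a minimal illustration with hypothetical task fields, not the authors' simulation code: EDF ranks tasks by absolute deadline alone, while LLF ranks them by laxity, the slack remaining after accounting for the unfinished execution time.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    deadline: float   # absolute deadline
    remaining: float  # remaining execution time

def edf_pick(tasks, now):
    """Earliest Deadline First: highest priority = smallest absolute deadline."""
    return min(tasks, key=lambda t: t.deadline)

def llf_pick(tasks, now):
    """Least Laxity First: highest priority = smallest laxity,
    where laxity = deadline - now - remaining execution time."""
    return min(tasks, key=lambda t: t.deadline - now - t.remaining)

tasks = [Task("A", deadline=10.0, remaining=2.0),
         Task("B", deadline=12.0, remaining=9.0)]
```

With this task set at time 0, EDF selects A (deadline 10 < 12), whereas LLF selects B (laxity 12 - 0 - 9 = 3, versus 10 - 0 - 2 = 8 for A), which is why a synchronization protocol that works under one policy cannot be assumed to work under the other.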
CPE870:Semantic Link Network Builder and Intelligent Semantic Browser
- Author(s):H. Zhuge,R. Jia,J. Liu
- China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Graduate School, Chinese Academy of Sciences, Beijing, People's Republic of China,
- http://dx.doi.org/10.1002/cpe.870
- Abstract:
Semantic Link Network (SLN) is a semantic Web model based on semantic links, the natural extension of the hyperlinks of the current Web. The SLN-Builder is a software tool that enables definition, modification and verification of, as well as access to, an SLN. The SLN-Builder can convert an SLN definition into XML descriptions for cross-platform information exchange. The Intelligent Semantic Browser is used to visualize the SLN and carries out two types of reasoning at browsing time: small-granularity reasoning by chaining semantic links and large-granularity reasoning by matching semantic views of the SLN. With the help of the reasoning mechanism, the browser can recommend content that is semantically relevant to the content currently being browsed, and it enables users to foresee the content at the far end of a semantic link chain. This paper presents the design and implementation of the SLN-Builder and the intelligent semantic browser as well as key algorithms. Copyright © 2004 John Wiley & Sons, Ltd.
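The "small granularity reasoning by chaining semantic links" mentioned in this abstract can be pictured as composing relations along a path via a rule table. The sketch below is a toy illustration under that reading; the relation names and composition rules are hypothetical, not the SLN-Builder's own vocabulary.

```python
# Toy composition rules for chaining semantic links: the pair
# (relation on hop i, relation on hop i+1) maps to the derived relation.
# All names are illustrative.
RULES = {
    ("partOf", "partOf"): "partOf",            # part-of is transitive
    ("subtypeOf", "subtypeOf"): "subtypeOf",   # subtype is transitive
    ("instanceOf", "subtypeOf"): "instanceOf", # instance of a subtype
}

def chain(links):
    """Derive the relation implied between the two ends of a link chain,
    or None if the rule table cannot compose it."""
    result = links[0]
    for nxt in links[1:]:
        result = RULES.get((result, nxt))
        if result is None:
            return None
    return result
```

Under these toy rules, `chain(["instanceOf", "subtypeOf", "subtypeOf"])` derives `"instanceOf"`, which is the kind of inference that lets a browser "foresee" the content at the far end of a chain before the user follows it.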
CPE867:Resource Space Grid: model, method and platform
- Author(s):Hai Zhuge
- China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, People's Republic of China
- http://dx.doi.org/10.1002/cpe.867
- Abstract:
A Resource Space Grid is a virtual Grid that aims at effectively sharing, using and managing versatile resources across the Internet. The kernel of the Resource Space Grid includes a Resource Space Model (RSM) and a uniform Resource Using Mechanism (RUM). This paper presents the Resource Space Grid's core scientific issues and methodology, architecture, model and theory, design criteria and method, and practice. A normal form theory is proposed to normalize the resource space, a coordinate system for uniformly specifying and organizing resources. The RUM provides not only end-users with an operable resource browser for manipulating resources through the built-in Resource Operation Language (ROL), but also application developers with an ROL-based programming environment. A prototype platform based on the proposed model and method has been implemented and used for sharing and managing resources in distributed research teams. Operations on Resource Spaces can constitute the virtual communities of Resource Space Grids, a platform-independent resource-sharing environment. Copyright © 2004 John Wiley & Sons, Ltd.