Article ID: CPE871

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Autonomic oil reservoir optimization on the Grid
Volume ID 17
Issue ID 1
Date 01 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.871
Article ID CPE871
Author Name(s) Manish Parashar1Vincent Matossian2Viraj Bhat3Małgorzata Peszyńska4Mrinal Sen5Paul Stoffa6Mary F. Wheeler7
Author Email(s) parashar@caip.rutgers.edu1
Affiliation(s) The Applied Software Systems Laboratory, Rutgers University, Piscataway, NJ 08855, U.S.A. 1 2 3 Department of Mathematics, Oregon State University, Corvallis, OR 97331, U.S.A. 4Institute for Geophysics, University of Texas at Austin, Austin, TX 78759, U.S.A. 5 6 Center for Subsurface Modeling, University of Texas at Austin, Austin, TX 78712, U.S.A. 7
Keyword(s) autonomic computing, oil reservoir optimization, peer-to-peer, Grid services,
Abstract
The emerging Grid infrastructure and its support for seamless and secure interactions is enabling a new generation of autonomic applications where the application components, Grid services, resources, and data interact as peers to manage, adapt and optimize themselves and the overall application. In this paper we describe the design, development and operation of a prototype of such an application that uses peer-to-peer interactions between distributed services and data on the Grid to enable the autonomic optimization of an oil reservoir. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE877

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A role for Pareto optimality in mining performance data
Volume ID 17
Issue ID 1
Date 01 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.877
Article ID CPE877
Author Name(s) Joël M. Malard1
Author Email(s) jm.malard@pnl.gov1
Affiliation(s) Pacific Northwest National Laboratory, Battelle Boulevard, P.O. Box 999, Richland, WA 99352, U.S.A. 1
Keyword(s) Pareto efficiency, dendrogram, multiobjective optimization, software performance, hardware events,
Abstract
Improvements in performance modeling and identification of computational regimes within software libraries are a critical first step in developing software libraries that are truly agile with respect to the application as well as to the hardware. It is shown here that Pareto ranking, a concept from multi-objective optimization, can be an effective tool for mining large performance datasets. The approach is illustrated using software performance data gathered using both the public-domain LAPACK library and an asynchronous communication library based on the IBM LAPI active message library. Copyright © 2005 John Wiley & Sons, Ltd.
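The Pareto-ranking idea sketched in the abstract can be made concrete with a few lines of code. The following is an illustrative sketch, not the paper's implementation: each performance sample is a vector of metrics (the metric names in the example are hypothetical), lower values are better, and a sample is rank-0 (Pareto-optimal) if no other sample dominates it.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Pareto ranking for mining performance data (lower is better).
public class ParetoRank {
    // a dominates b if a is <= b in every metric and strictly < in at least one.
    static boolean dominates(double[] a, double[] b) {
        boolean strictlyBetter = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    // Indices of the non-dominated (rank-0) samples: the Pareto front.
    static List<Integer> paretoFront(double[][] samples) {
        List<Integer> front = new ArrayList<>();
        for (int i = 0; i < samples.length; i++) {
            boolean dominated = false;
            for (int j = 0; j < samples.length; j++) {
                if (j != i && dominates(samples[j], samples[i])) {
                    dominated = true;
                    break;
                }
            }
            if (!dominated) front.add(i);
        }
        return front;
    }

    public static void main(String[] args) {
        // Hypothetical data; columns: runtime (s), cache misses (millions).
        double[][] samples = {
            {1.0, 9.0},   // fast but cache-hungry: non-dominated
            {5.0, 2.0},   // slow but cache-friendly: non-dominated
            {6.0, 9.5},   // worse than the first in both metrics: dominated
        };
        System.out.println(paretoFront(samples)); // [0, 1]
    }
}
```

Repeatedly removing the current front and re-ranking the remainder yields the full Pareto ranking used for clustering large datasets.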

Article ID: CPE883

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Solving the block-Toeplitz least-squares problem in parallel
Volume ID 17
Issue ID 1
Date 01 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.883
Article ID CPE883
Author Name(s) P. Alonso1J. M. Badía2A. M. Vidal3
Author Email(s) palonso@dsic.upv.es1
Affiliation(s) Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46071 Valencia, Spain 1Departamento de Ingeniería y Ciencia de los Computadores, Universidad Jaume I, 12080 Castellón, Spain 2 3
Keyword(s) least-squares problem, block-Toeplitz matrices, Toeplitz-block matrices, displacement structure, Generalized Schur Algorithm, parallel algorithms, clusters of personal computers,
Abstract
In this paper we present two versions of a parallel algorithm to solve the block-Toeplitz least-squares problem on distributed-memory architectures. We derive a parallel algorithm based on the seminormal equations arising from the triangular decomposition of the product TᵀT. Our parallel algorithm exploits the displacement structure of the Toeplitz-like matrices using the Generalized Schur Algorithm to obtain the solution in O(mn) flops instead of the O(mn²) flops required by algorithms for non-structured matrices. The strong regularity of this product of matrices and an appropriate computation of the hyperbolic rotations improve the stability of the algorithms. We have reduced the communication cost of previous versions, and have also reduced the memory access cost by appropriately arranging the elements of the matrices. Furthermore, the second version of the algorithm has a very low spatial cost, because it does not store the triangular factor of the decomposition. The experimental results show a good scalability of the parallel algorithm on two different clusters of personal computers. Copyright © 2005 John Wiley & Sons, Ltd.
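The seminormal-equations step mentioned in the abstract can be illustrated with a small dense sketch (not the paper's algorithm): compute the triangular factor R with TᵀT = RᵀR, then solve RᵀRx = Tᵀb by one forward and one backward substitution. The paper's contribution is obtaining R in O(mn) flops via the displacement structure and the Generalized Schur Algorithm, and doing so in parallel; the dense Cholesky used here only shows the seminormal-equations solve itself.

```java
// Dense sketch of a least-squares solve via the seminormal equations
// R^T R x = T^T b, where T^T T = R^T R (Cholesky, R upper triangular).
public class SeminormalLS {
    // C = T^T * T (n x n), given T (m x n).
    static double[][] ata(double[][] t) {
        int m = t.length, n = t[0].length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < m; k++)
                    c[i][j] += t[k][i] * t[k][j];
        return c;
    }

    // d = T^T * b.
    static double[] atb(double[][] t, double[] b) {
        int m = t.length, n = t[0].length;
        double[] d = new double[n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < m; k++)
                d[i] += t[k][i] * b[k];
        return d;
    }

    // Cholesky factorization C = R^T R with R upper triangular.
    static double[][] cholesky(double[][] c) {
        int n = c.length;
        double[][] r = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                double s = c[i][j];
                for (int k = 0; k < i; k++) s -= r[k][i] * r[k][j];
                if (i == j) r[i][i] = Math.sqrt(s);
                else r[i][j] = s / r[i][i];
            }
        }
        return r;
    }

    // Solve min ||T x - b||: forward solve R^T y = T^T b, then back solve R x = y.
    static double[] solve(double[][] t, double[] b) {
        double[][] r = cholesky(ata(t));
        double[] d = atb(t, b);
        int n = r.length;
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double s = d[i];
            for (int k = 0; k < i; k++) s -= r[k][i] * y[k];
            y[i] = s / r[i][i];
        }
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = y[i];
            for (int k = i + 1; k < n; k++) s -= r[i][k] * x[k];
            x[i] = s / r[i][i];
        }
        return x;
    }

    public static void main(String[] args) {
        // b = T * [1, 2], so the exact least-squares solution is x = [1, 2].
        double[][] t = {{1, 0}, {0, 1}, {1, 1}};
        double[] x = solve(t, new double[]{1, 2, 3});
        System.out.printf("x = [%.3f, %.3f]%n", x[0], x[1]);
    }
}
```

Squaring the condition number via TᵀT is what makes the strong-regularity and hyperbolic-rotation refinements discussed in the abstract matter for stability.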

Article ID: CPE884

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance evaluation of the SX-6 vector architecture for scientific computations
Volume ID 17
Issue ID 1
Date 01 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.884
Article ID CPE884
Author Name(s) Leonid Oliker1Andrew Canning2Jonathan Carter3John Shalf4David Skinner5Stéphane Ethier6Rupak Biswas7Jahed Djomehri8Rob Van der Wijngaart9
Author Email(s) rbiswas@nas.nasa.gov7
Affiliation(s) CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A. 1 2 3 4 5 Princeton Plasma Physics Laboratory, Princeton University, Princeton, NJ 08453, U.S.A. 6NAS Division, NASA Ames Research Center, Moffett Field, CA 94035, U.S.A. 7 8 9
Keyword(s) microbenchmarks, NAS Parallel Benchmarks, scientific applications, vectorization, superscalar performance,
Abstract
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor, and compares it against the cache-based IBM Power3 and Power4 superscalar architectures, across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall results demonstrate that the SX-6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache-based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE922

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Special Issue: Grid Performance
Volume ID 17
Issue ID 2-4
Date 01 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.922
Article ID CPE922
Author Name(s) John Gurd1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract

Article ID: CPE923

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance engineering in data Grids
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.923
Article ID CPE923
Author Name(s) Erwin Laure1Heinz Stockinger2Kurt Stockinger3
Author Email(s) erwin.laure@cern.ch1
Affiliation(s) CERN, European Organization for Nuclear Research, Geneva, Switzerland 1 2 3
Keyword(s) data Grid, performance, scheduling, replica management,
Abstract
The vision of Grid computing is to facilitate worldwide resource sharing among distributed collaborations. With the help of numerous national and international Grid projects, this vision is becoming reality and Grid systems are attracting an ever increasing user base. However, Grids are still quite complex software systems whose efficient use is a difficult and error-prone task. In this paper we present performance engineering techniques that aim to facilitate an efficient use of Grid systems, in particular systems that deal with the management of large-scale data sets in the tera- and petabyte range (also referred to as data Grids). These techniques are applicable at different layers of a Grid architecture and we discuss the tools required at each of these layers to implement them. Having discussed important performance engineering techniques, we investigate how major Grid projects deal with performance issues particularly related to data Grids and how they implement the techniques presented. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE924

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance of a possible Grid message infrastructure
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.924
Article ID CPE924
Author Name(s) Shrideep Pallickara1Geoffrey Fox2Ahmet Uyar3Hongbin Liu4Xi Rao5David Walker6Beytullah Yildiz7
Author Email(s) gcf@indiana.edu2
Affiliation(s) Community Grid Laboratory, Indiana University, Lindley Hall 215, Bloomington, IN 47405, U.S.A. 1 2 3 4 5 Department of Computer Science, Cardiff University, U.K. 6 7
Keyword(s) middleware, distributed messaging, P2P systems, Grid systems, audio/video conferencing,
Abstract
In this paper we present the results pertaining to the NaradaBrokering middleware infrastructure. NaradaBrokering is designed to run on a large network of cooperating broker nodes. NaradaBrokering capabilities include, among other things, support for a wide variety of transport protocols, Java Message Service compliance, support for routing JXTA interactions, support for audio/video conferencing applications and, finally, support for multiple constraint specification formats such as XPath, SQL and regular expression queries. This paper demonstrates the suitability of NaradaBrokering to a wide variety of applications and scenarios. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE925

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance-based middleware for Grid computing
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.925
Article ID CPE925
Author Name(s) G. R. Nudd1S. A. Jarvis2
Author Email(s) stephen.jarvis@warwick.ac.uk2
Affiliation(s) High Performance Systems Group, Department of Computer Science, University of Warwick, Coventry CV4 7AL, U.K. 1 2
Keyword(s) Grid computing, middleware, performance prediction, scheduling, quality of service,
Abstract
This paper describes a stateful service-oriented middleware infrastructure for the management of scientific tasks running on multi-domain heterogeneous distributed architectures. Allocating scientific workload across multiple administrative boundaries is a key issue in Grid computing and as a result a number of supporting services including match-making, scheduling and staging have been developed. Each of these services allows the scientist to utilize the available resources, although a sustainable level of service in such shared environments cannot always be guaranteed. A performance-based middleware infrastructure is described in which prediction data for each scientific task are calculated, stored and published through a Globus-based performance information service. Distributing these data allows additional performance-based middleware services to be built, two of which are described in this paper: an intra-domain predictive co-scheduler and a multi-domain workload steering system. These additional facilities significantly improve the ability of the system to meet task deadlines, as well as enhancing inter-domain load-balancing and system-wide resource utilization. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE926

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Performance control of scientific coupled models in Grid environments
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.926
Article ID CPE926
Author Name(s) Christopher Armstrong1Rupert W. Ford2John R. Gurd3Mikel Luján4Kenneth R. Mayes5Graham D. Riley6
Author Email(s) griley@cs.man.ac.uk6
Affiliation(s) Centre for Novel Computing, University of Manchester, Oxford Road, Manchester M13 9PL, U.K. 1 2 3 4 5 6
Keyword(s) Grid computing, performance control, component-oriented programming, coupled models, distributed computing,
Abstract
In recent years, there has been increasing interest in the development of computer simulations of complex biological systems, and of multi-physics and multi-scale physical phenomena. Applications have been developed that involve the coupling together of separate executable models of individual systems, where these models may have been developed in isolation. A lightweight yet general solution is required to problems of linking coupled models, and of handling the incompatibilities between interacting models that arise from their diverse origins and natures. Many such models require high-performance computers to provide acceptable execution times, and there is increasing interest in utilizing Grid technologies. However, Grid applications need the ability to cope with heterogeneous and dynamically changing execution environments, particularly where run-time changes can affect application performance. A general coupling framework (GCF) is described that allows the construction of flexible coupled models. This approach results in a component-based implementation of a coupled model application. A semi-formal presentation of GCF is given. Components under GCF are separately deployable and coupled by simple data flows, making them appropriate structures for dynamic execution platforms such as the Grid. The design and initial implementation of a performance control system (PERCO) is reviewed. PERCO acts by redeploying components, and is thus appropriate for controlling GCF coupled model applications. Redeployment decisions in PERCO require performance prediction capabilities. A proof-of-concept performance prediction algorithm is presented, based on the descriptions of GCF and PERCO. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE928

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The role of performance engineering techniques in the context of the Grid
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.928
Article ID CPE928
Author Name(s) A. J. G. Hey1J. Papay2M. Surridge3
Author Email(s) jp@ecs.soton.ac.uk2
Affiliation(s) EPSRC, Polaris House, North Star Avenue, Swindon SN2 1ET, U.K. 1ECS, University of Southampton, Southampton SO17 1BJ, U.K. 2IT Innovation Centre, 2 Venture Road, Chilworth Science Park, Southampton SO16 7NP, U.K. 3
Keyword(s) performance engineering, Grid, scheduling, benchmarking, meta-applications, neuro-fuzzy models, financial transaction processing,
Abstract
Performance engineering can be described as a collection of techniques and methodologies whose aim is to provide reliable prediction, measurement and validation of the performance of applications on a variety of computing platforms. This paper reviews techniques for performance estimation and performance engineering developed at the University of Southampton and presents application case studies in task scheduling for engineering meta-applications, and capacity engineering for a financial transaction processing system. These show that it is important to describe performance in terms of a resource model, and that the choice of models may have to trade accuracy for utility in addressing the operational issues. We then present work from the on-going EU funded Grid project GRIA, and show how lessons learned from the earlier work have been applied to support a viable business model for Grid service delivery to a specified quality of service level. The key in this case is to accept the limitations of performance estimation methods, and design business models that take these limitations into account rather than attempting to provide hard guarantees over performance. We conclude by identifying some of the key lessons learned in the course of our work over many years and suggest possible directions for future investigations. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE929

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title ASKALON: a tool set for cluster and Grid computing
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.929
Article ID CPE929
Author Name(s) Thomas Fahringer1Alexandru Jugravu2Sabri Pllana3Radu Prodan4Clovis Seragiotto5Hong-Linh Truong6
Author Email(s) thomas.fahringer@uibk.ac.at1
Affiliation(s) Institute for Computer Science, University of Innsbruck, Technikerstrasse 25/7, A-6020 Innsbruck, Austria 1Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 2 3 4 5 6
Keyword(s) cluster computing, Grid computing, parallel and distributed applications, performance prediction, measurement and analysis, bottleneck detection, experiment management,
Abstract
Performance engineering of parallel and distributed applications is a complex task that iterates through various phases, ranging from modeling and prediction, to performance measurement, experiment management, data collection, and bottleneck analysis. There is no evidence so far that all of these phases should/can be integrated into a single monolithic tool. Moreover, the emergence of computational Grids as a common single wide-area platform for high-performance computing raises the idea to provide tools as interacting Grid services that share resources, support interoperability among different users and tools, and, most importantly, provide omnipresent services over the Grid. We have developed the ASKALON tool set to support performance-oriented development of parallel and distributed (Grid) applications. ASKALON comprises four tools, coherently integrated into a service-oriented architecture. SCALEA is a performance instrumentation, measurement, and analysis tool of parallel and distributed applications. ZENTURIO is a general purpose experiment management tool with advanced support for multi-experiment performance analysis and parameter studies. AKSUM provides semi-automatic high-level performance bottleneck detection through a special-purpose performance property specification language. The PerformanceProphet enables the user to model and predict the performance of parallel applications at the early stages of development. In this paper we describe the overall architecture of the ASKALON tool set and outline the basic functionality of the four constituent tools. The structure of each tool is based on the composition and sharing of remote Grid services, thus enabling tool interoperability. In addition, a data repository allows the tools to share the common application performance and output data that have been derived by the individual tools. A service repository is used to store common portable Grid service implementations. 
A general-purpose Factory service is employed to create service instances on arbitrary remote Grid sites. Discovering and dynamically binding to existing remote services is achieved through registry services. The ASKALON visualization diagrams support both online and post-mortem visualization of performance and output data. We demonstrate the usefulness and effectiveness of ASKALON by applying the tools to real-world applications. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE930

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Automatic performance analysis tools for the Grid
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.930
Article ID CPE930
Author Name(s) M. Gerndt1
Author Email(s) m.gerndt@in.tum.de1
Affiliation(s) Institut für Informatik X, Technische Universität München, 85748 Garching, Germany 1
Keyword(s) performance analysis, Grid computing, parallel/distributed systems,
Abstract
Applications on Grids require scalable and online performance analysis tools. The execution environment of such applications includes a large number of processors. In addition, some of the resources such as the network will be shared with other applications. This requires applications to adapt dynamically to resource changes. The article presents the requirements of Grid application classes for performance analysis tools. It introduces a new analysis environment currently being developed for teraflop computers within the Peridot project at Technische Universität München. This environment applies a distributed automatic performance analysis approach. It is based on a formal specification of performance properties in the APART specification language. It uses a hierarchy of analysis agents that obtain performance data from a configurable monitoring system. This scalable design allows performance analysis for large Grid applications. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE938

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Distributed computing in practice: the Condor experience
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.938
Article ID CPE938
Author Name(s) Douglas Thain1Todd Tannenbaum2Miron Livny3
Author Email(s) thain@cs.wisc.edu1
Affiliation(s) Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, U.S.A. 1 2 3
Keyword(s) Condor, Grid, history, community, planning, scheduling, split execution,
Abstract
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational Grid. In this paper, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE939

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The design and implementation of Grid database services in OGSA-DAI
Volume ID 17
Issue ID 2-4
Date 02 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.939
Article ID CPE939
Author Name(s) Mario Antonioletti1Malcolm Atkinson2Rob Baxter3Andrew Borley4Neil P. Chue Hong5Brian Collins6Neil Hardman7Alastair C. Hume8Alan Knox9Mike Jackson10Amy Krause11Simon Laws12James Magowan13Norman W. Paton14Dave Pearson15Tom Sugden16Paul Watson17Martin Westhead18
Author Email(s) paul.watson@newcastle.ac.uk17
Affiliation(s) EPCC, University of Edinburgh, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ, U.K. 1National e-Science Centre, 15 South College Street, Edinburgh EH8 9AA, U.K. 2 3 IBM United Kingdom Ltd, Hursley Park, Winchester SO21 2JN, U.K. 4 5 6 7 8 9 10 11 12 13 Department of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, U.K. 14Oracle UK Ltd, Thames Valley Park, Reading RG6 1RA, U.K. 15 16 School of Computing Science, University of Newcastle, Newcastle-upon-Tyne NE1 7RU, U.K. 17 18
Keyword(s) Grid, databases, e-Science, Web Services,
Abstract
http://www.ogsadai.org.uk

Article ID: CPE835

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Adding tuples to Java: a study in lightweight data structures
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.835
Article ID CPE835
Author Name(s) C. van Reeuwijk1H. J. Sips2
Author Email(s) c.vanreeuwijk@cs.tudelft.nl1
Affiliation(s) Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands 1 2
Keyword(s) tuple, Java, lightweight data structures,
Abstract
lightweight tuples

Article ID: CPE836

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Object combining: a new aggressive optimization for object intensive programs
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.836
Article ID CPE836
Author Name(s) Ronald Veldema1Ceriel J. H. Jacobs2Rutger F. H. Hofman3Henri E. Bal4
Author Email(s) veldema@cs.fau.de1
Affiliation(s) Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands 1 2 3 4
Keyword(s) Java, garbage collection, object management,
Abstract
Object combining tries to put objects together that have roughly the same lifetimes in order to reduce strain on the memory manager and to reduce the number of pointer indirections during a program's execution. Object combining works by appending the fields of one object to another, allowing allocation and freeing of multiple objects with a single heap (de)allocation. Unlike object inlining, which will only optimize objects where one has a (unique) pointer to another, our optimization also works if there is no such relation. Object inlining also directly replaces the pointer by the inlined object's fields. Object combining leaves the pointer in place to allow more combining. Elimination of the pointer accesses is implemented in a separate compiler optimization pass. Unlike previous object inlining systems, reference field overwrites are allowed and handled, resulting in much more aggressive optimization. Our object combining heuristics also allow unrelated objects to be combined, for example, those allocated inside a loop; recursive data structures (linked lists, trees) can be allocated several at a time and objects that are always used together can be combined. As Java explicitly permits code to be loaded at runtime and allows the new code to contribute to a running computation, we do not require a closed-world assumption to enable these optimizations (although adopting one does increase performance). The main focus of object combining in this paper is on reducing object (de)allocation overhead, by reducing both garbage collection work and the number of object allocations. Reduction of memory management overhead causes execution time to be reduced by up to 35%. Indirection removal further reduces execution time by up to 6%. Copyright © 2005 John Wiley & Sons, Ltd.
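A minimal hand-written sketch shows the effect of the transformation on a data structure. The Node/Payload classes below are hypothetical, not from the paper, and in the paper a compiler pass performs this rewriting automatically; the sketch only illustrates the before/after shapes.

```java
// Sketch of object combining: the payload's fields are appended to its
// container, so one heap allocation replaces two and a later compiler
// pass can eliminate the pointer indirection on every access.
public class ObjectCombining {
    // Before: a list node and its payload are separate allocations,
    // and reading the payload costs a pointer indirection.
    static class Payload { int value; }
    static class Node {
        Payload data = new Payload();   // second heap allocation
        Node next;
    }

    // After combining: the payload's field lives inside the node, so a
    // single allocation serves both objects.
    static class CombinedNode {
        int dataValue;                  // field appended from Payload
        CombinedNode next;
    }

    public static void main(String[] args) {
        Node before = new Node();                 // two allocations
        before.data.value = 42;
        CombinedNode after = new CombinedNode();  // one allocation
        after.dataValue = 42;
        System.out.println(before.data.value == after.dataValue); // true
    }
}
```

The abstract's point about keeping the reference in place (unlike object inlining) means the real transformation would retain a `data`-like reference into the combined object so that reference overwrites remain legal; the sketch omits that detail.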

Article ID: CPE837

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title UMM: an operational memory model specification framework with integrated model checking capability
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.837
Article ID CPE837
Author Name(s) Yue Yang1Ganesh Gopalakrishnan2Gary Lindstrom3
Author Email(s) yyang@cs.utah.edu1
Affiliation(s) School of Computing, University of Utah, Salt Lake City, UT 84112, U.S.A. 1 2 3
Keyword(s) memory model, operational specification, Java thread, formal verification,
Abstract
Given the complicated nature of modern shared memory systems, it is vital to have a systematic approach to specifying and analyzing memory consistency requirements. In this paper, we present the UMM specification framework, which integrates two key features to support memory model verification: (i) it employs a simple and generic memory abstraction that can capture a large collection of memory models as guarded commands with a uniform notation, and (ii) it provides built-in model checking capability to enable formal reasoning about thread behaviors. Using this framework, memory models can be specified in a parameterized style - designers can simply redefine a few bypassing rules and visibility ordering rules to obtain an executable specification of another memory model. We formalize several classical memory models, including Sequential Consistency, Coherence, and PRAM, to illustrate the general techniques of applying this framework. We then provide an alternative specification of the Java memory model, based on a proposal from Manson and Pugh, and demonstrate how to analyze Java thread semantics using model checking. We also compare our operational specification style with axiomatic specification styles and explore a mechanism that converts a memory model definition from one style to the other. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE843

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Ravenscar-Java: a high-integrity profile for real-time Java
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.843
Article ID CPE843
Author Name(s) Jagun Kwon1Andy Wellings2Steve King3
Author Email(s) jagun@cs.york.ac.uk1
Affiliation(s) Real-Time Systems Research Group, Department of Computer Science, University of York, Heslington, York YO10 5DD, U.K. 1 2 3
Keyword(s) high-integrity systems, real-time Java, profile,
Abstract
For many, Java is the antithesis of a high-integrity programming language. Its combination of object-oriented programming features, its automatic garbage collection, and its poor support for real-time multi-threading are all seen as particular impediments. The Real-Time Specification for Java has introduced many new features that help in the real-time domain. However, the expressive power of these features means that very complex programming models can be created, necessitating complexity in the supporting real-time virtual machine. Consequently, Java, with the real-time extensions as they stand, seems too complex for confident use in high-integrity systems. This paper presents a Java profile for the development of software-intensive high-integrity real-time systems. This restricted programming model removes language features with high overheads and complex semantics, on which it is hard to perform timing and functional analyses. The profile fits within the J2ME framework and is consistent with well-known guidelines for high-integrity software development, such as those defined by the U.S. Nuclear Regulatory Commission. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE845

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Improving the memory management performance of RTSJ
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.845
Article ID CPE845
Author Name(s) M. Teresa Higuera-Toledano1Valerie Issarny2
Author Email(s) mthiguer@dacya.ucm.es1
Affiliation(s) Universidad Complutense de Madrid, Ciudad Universitaria, Madrid 28040, Spain 1INRIA-Rocquencourt, Domaine de Voluceau, BP 105, 78153 Le Chesnay Cedex, France 2
Keyword(s) Java, real-time, embedded systems, garbage collection, memory regions, Java microprocessor, write barriers, performance,
Abstract
From a real-time perspective, the garbage collector (GC) introduces unpredictable pauses that are not tolerated by real-time tasks. Real-time collectors eliminate this problem but introduce a high overhead. Another approach is to use memory regions (MRs) within which allocation and deallocation is customized. This facility is supported by the memory model of the Real-Time Specification for Java (RTSJ). RTSJ imposes strict access and assignment rules to avoid both dangling inter-region references and GC-induced delays to critical tasks. A dynamic check solution can incur high overhead, which can be reduced by taking advantage of hardware features. This paper provides an in-depth analytical investigation of the overhead introduced by dynamic assignment checks in RTSJ, describing and analysing several solutions to reduce the introduced overhead. Copyright © 2005 John Wiley & Sons, Ltd.
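The assignment rule whose checking cost the paper analyzes can be illustrated in a few lines. This is a hedged sketch with hypothetical Region/Obj classes, not the RTSJ API: a reference field of an object in one region may only be made to point at an object whose region lives at least as long, otherwise the reference would dangle when the more deeply nested region is freed.

```java
// Sketch of an RTSJ-style dynamic assignment check: regions form a nesting
// stack, outer regions (smaller depth) outlive inner ones, and a store into
// a reference field must not point from an outer-region object to an
// inner-region object.
public class RegionCheck {
    static class Region {
        final int depth;              // nesting depth; outer regions are smaller
        Region(int depth) { this.depth = depth; }
    }
    static class Obj {
        final Region region;
        Obj ref;                      // reference field guarded by the check
        Obj(Region r) { region = r; }
    }

    // The write barrier: conceptually run before every reference-field store.
    static void checkedAssign(Obj target, Obj value) {
        if (value != null && value.region.depth > target.region.depth)
            throw new IllegalStateException(
                "assignment would create a dangling inter-region reference");
        target.ref = value;
    }

    public static void main(String[] args) {
        Region outer = new Region(0), inner = new Region(1);
        Obj a = new Obj(inner), b = new Obj(outer);
        checkedAssign(a, b);          // inner object -> outer object: allowed
        try {
            checkedAssign(b, a);      // outer object -> inner object: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Since this comparison must conceptually run on every reference store, its per-store cost is exactly the overhead the paper proposes to reduce, for instance with hardware write-barrier support.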

Article ID: CPE847

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Elimination of Java array bounds checks in the presence of indirection
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.847
Article ID CPE847
Author Name(s) Mikel Luján1John R. Gurd2T. L. Freeman3José Miguel4
Author Email(s) jagun@cs.york.ac.uk1
Affiliation(s) Centre for Novel Computing, University of Manchester, Oxford Road, Manchester M13 9PL, U.K. 1 2 3 Departamento de Arquitectura y Tecnología de Computadores, UPV-EHU, P.O. Box 649, 20080 Donostia-San Sebastián, Spain 4
Keyword(s) array bounds check, Java, array indirection,
Abstract
The Java language specification states that every access to an array needs to be within the bounds of that array, i.e. between 0 and array length - 1. Different techniques for different programming languages have been proposed to eliminate explicit bounds checks. Some of these techniques are implemented in off-the-shelf Java Virtual Machines (JVMs). The underlying principle of these techniques is that bounds checks can be removed when a JVM/compiler has enough information to guarantee that a sequence of accesses (e.g. inside a for-loop) is safe (within the bounds). Most of the techniques for the elimination of array bounds checks have been developed for programming languages that do not support multi-threading and/or enable dynamic class loading. These two characteristics make most of these techniques unsuitable for Java. Techniques developed specifically for Java have not addressed the elimination of array bounds checks in the presence of indirection; that is, when the index is stored in another array (indirection array). With the objective of optimizing applications with array indirection, this paper proposes and evaluates three implementation strategies, each implemented as a Java class. The classes provide the functionality of Java arrays of type int so that objects of the classes can be used instead of indirection arrays. Each strategy enables JVMs, when examining only one of these classes at a time, to obtain enough information to remove array bounds checks. Copyright © 2005 John Wiley & Sons, Ltd.
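The idea of wrapping the indirection array in a class can be sketched as follows; the class name and its exact invariant are illustrative assumptions, not the paper's three actual strategies. The single mutator enforces that every stored value lies in [0, bound), so a JVM examining only this class can prove that accesses of the form data[ind.get(i)] are in bounds.

```java
// Hypothetical indirection-array wrapper (illustrative, not the paper's classes).
// Invariant: every stored value v satisfies 0 <= v < bound.
final class BoundedIntArray {
    private final int[] values;
    private final int bound;           // exclusive upper bound on stored values

    BoundedIntArray(int length, int bound) {
        this.values = new int[length]; // zero-initialised: trivially within [0, bound)
        this.bound = bound;
    }

    // The only mutator: rejects values that would break the invariant.
    void set(int i, int v) {
        if (v < 0 || v >= bound) throw new IndexOutOfBoundsException("value " + v);
        values[i] = v;
    }

    int get(int i) { return values[i]; }
    int length()   { return values.length; }
}
```

A loop such as `for (int i = 0; i < ind.length(); i++) sum += data[ind.get(i)];`, with `ind` constructed with `bound == data.length`, then needs no bounds check on the indirect access.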

Article ID: CPE848

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Run-time evaluation of opportunities for object inlining in Java
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.848
Article ID CPE848
Author Name(s) Ondrej Lhoták1Laurie Hendren2
Author Email(s) olhotak@sable.mcgill.ca1
Affiliation(s) Sable Research Group, School of Computer Science, McGill University, Montreal, Canada 1 2
Keyword(s) object inlining, Java, compilers, optimization,
Abstract
Object-oriented languages, such as Java, encourage the use of many small objects linked together by field references, instead of a few monolithic structures. While this practice is beneficial from a program design perspective, it can slow down program execution by incurring many pointer indirections. One solution to this problem is object inlining: when the compiler can safely do so, it fuses small objects together, thus removing the reads/writes to the removed field, saving the memory needed to store the field and object header, and reducing the number of object allocations. The objective of this paper is to measure the potential for object inlining by studying the run-time behaviour of a comprehensive set of Java programs. We study the traces of program executions in order to determine which fields behave like inlinable fields. Since we are using dynamic information instead of a static analysis, our results give an upper bound on what could be achieved via a static compiler-based approach. Our experimental results measure the potential improvements attainable with object inlining, including reductions in the numbers of field reads and writes, and reduced memory usage. Our study shows that some Java programs can benefit significantly from object inlining, with close to a 10% speedup. Somewhat to our surprise, our study found one case, the db benchmark, where the most important inlinable field was the result of unusual program design, and fixing this small flaw led to both better performance and clearer program design. However, the opportunities for object inlining are highly dependent on the individual program being considered, and are in many cases very limited. Furthermore, fields that are inlinable also have properties that make them potential candidates for other optimizations such as removing redundant memory accesses. The memory savings possible through object inlining are moderate. Copyright © 2005 John Wiley & Sons, Ltd.
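The transformation being measured can be shown with a minimal hand-applied sketch; the class names are illustrative, and the paper's study concerns a compiler doing this automatically rather than a programmer doing it by hand.

```java
// Before inlining: each Particle holds a separate Point object, costing one
// pointer indirection per field access plus a Point allocation and header.
final class Point { double x, y; }

final class Particle {
    final Point pos = new Point();
    double distFromOrigin() { return Math.sqrt(pos.x * pos.x + pos.y * pos.y); }
}

// After inlining: the Point's fields are fused into the container, removing
// the pos indirection, the extra allocation, and the extra object header.
final class InlinedParticle {
    double posX, posY;
    double distFromOrigin() { return Math.sqrt(posX * posX + posY * posY); }
}
```

The field `pos` here is "inlinable" in the paper's sense only if, at run time, it always refers to the same Point for the lifetime of its Particle; the study's traces identify which fields actually behave that way.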

Article ID: CPE849

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Jeeg: temporal constraints for the synchronization of concurrent objects
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.849
Article ID CPE849
Author Name(s) Giuseppe Milicia1Vladimiro Sassone2
Author Email(s) milicia@brics.dk1
Affiliation(s) Basic Research in Computer Science (BRICS), University of Aarhus, IT-parken Aabogade 34, DK-8200 Aarhus, Denmark 1Informatics, University of Sussex, Brighton BN1 9QH, U.K. 2
Keyword(s) Java, inheritance anomaly, temporal logic,
Abstract
We introduce Jeeg, a dialect of Java based on a declarative replacement of the synchronization mechanisms of Java that results in a complete decoupling of the ‘business’ and the ‘synchronization’ code of classes. Synchronization constraints in Jeeg are expressed in a linear temporal logic, which allows one to effectively limit the occurrence of the inheritance anomaly that commonly affects concurrent object-oriented languages. Jeeg is inspired by the current trend in aspect-oriented languages. In a Jeeg program the sequential and concurrent aspects of object behaviors are decoupled: specified separately by the programmer, these are then woven together by the Jeeg compiler. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE850

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Compiling almost-whole Java programs
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.850
Article ID CPE850
Author Name(s) Zoran Budimlic1Ken Kennedy2
Author Email(s) zoran@cs.rice.edu1
Affiliation(s) Center for High Performance Software Research, Rice University, Houston, TX 77005, U.S.A. 1 2
Keyword(s) whole-program compilation, almost-whole-program compilation, Java, optimization,
Abstract
This paper presents a strategy, called almost-whole-program compilation, for extending the benefits of whole-program optimization to large collections of Java components that are packaged as a group after the development phase. This strategy has been implemented in a framework that uses Java visibility and scoping rules to transform a collection of classes into a package that is amenable to whole-program optimizations, without precluding extensions to the optimized and compiled code. Thus, it enables the Java developer to balance performance against flexibility of the program after the development phase, without compromising the design process. The transformation is shown to incur only modest performance penalties, which are more than compensated for by the interprocedural optimizations it enables. The paper concludes with experimental results showing the benefits that can be achieved using this approach. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE851

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Recurrence analysis for effective array prefetching in Java
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.851
Article ID CPE851
Author Name(s) Brendon Cahoon1Kathryn S. McKinley2
Author Email(s) mckinley@cs.utexas.edu2
Affiliation(s) Department of Computer Sciences, University of Texas at Austin, Austin, TX 78757, U.S.A. 1 2
Keyword(s) memory optimization, array prefetching, static analysis,
Abstract
Java is an attractive choice for numerical, as well as other, algorithms due to the software engineering benefits of object-oriented programming. Because numerical programs often use large arrays that do not fit in the cache, they suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching arrays and linked structures in Java. Our previous work used data-flow analysis to discover linked data structure accesses. We generalize our prior approach to identify loop induction variables as well, which we call recurrence analysis. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all of our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, and we show that prefetching reduces execution time by a geometric mean of 17%. Because our analysis is much simpler and quicker than previous techniques, it is suitable for including in a just-in-time compiler. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. We further show that the additional loop transformations and careful scheduling of prefetches from previous work are not always necessary for modern architectures and Java programs. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE852

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title The Open Runtime Platform: a flexible high-performance managed runtime environment
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.852
Article ID CPE852
Author Name(s) Michal Cierniak1Marsha Eng2Neal Glew3Brian Lewis4James Stichnoth5
Author Email(s) michaljc@microsoft.com1
Affiliation(s) Microsoft Corporation, 1 Microsoft Way, Redmond, WA 98052, U.S.A. 1Microprocessor Technology Laboratory, Intel Corporation, 2200 Mission College Boulevard, Santa Clara, CA, U.S.A. 2 3 4 5
Keyword(s) MRTE, Java, CLI, virtual machine, interface design,
Abstract
The Open Runtime Platform (ORP) is a high-performance managed runtime environment (MRTE) that features exact generational garbage collection, fast thread synchronization, and multiple coexisting just-in-time compilers (JITs). ORP was designed for flexibility in order to support experiments in dynamic compilation, garbage collection, synchronization, and other technologies. It can be built to run either Java or Common Language Infrastructure (CLI) applications, to run under the Windows or Linux operating systems, and to run on the IA-32 or Itanium processor family (IPF) architectures. Achieving high performance in a MRTE presents many challenges, particularly when flexibility is a major goal. First, to enable the use of different garbage collectors and JITs, each component must be isolated from the rest of the environment through a well-defined software interface. Without careful attention, this isolation could easily harm performance. Second, MRTEs have correctness and safety requirements that traditional languages such as C++ lack. These requirements, including null pointer checks, array bounds checks, and type checks, impose additional runtime overhead. Finally, the dynamic nature of MRTEs makes some traditional compiler optimizations, such as devirtualization of method calls, more difficult to implement or more limited in applicability. To get full performance, JITs and the core virtual machine (VM) must cooperate to reduce or eliminate (where possible) these MRTE-specific overheads. In this paper, we describe the structure of ORP in detail, paying particular attention to how it supports flexibility while preserving high performance. We describe the interfaces between the garbage collector, the JIT, and the core VM; how these interfaces enable multiple garbage collectors and JITs without sacrificing performance; and how they allow the JIT and the core VM to reduce or eliminate MRTE-specific performance issues. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE853

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Immutability specification and its applications
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.853
Article ID CPE853
Author Name(s) Igor Pechtchanski1Vivek Sarkar2
Author Email(s) igor@watson.ibm.com1
Affiliation(s) IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, U.S.A. 1 2
Keyword(s) immutability, annotations, optimization,
Abstract
A location is said to be immutable if its value and the values of selected locations reachable from it are guaranteed to remain unchanged during a specified time interval. We introduce a framework for immutability specification, and discuss its application to code optimization. Compared with a final declaration, an immutability assertion in our framework can express a richer set of immutability properties along three dimensions - lifetime, reachability and context. We present a framework for processing and verifying immutability annotations in Java, as well as extending optimizations so as to exploit immutability information. Preliminary experimental results show that a significant number (61%) of read accesses could potentially be classified as immutable in our framework. Further, use of immutability information yields substantial reductions (33-99%) in the number of dynamic read accesses, and also measurable speedups in the range of 5-10% for certain benchmark programs. Copyright © 2005 John Wiley & Sons, Ltd.
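A sketch of the difference between `final` and a richer immutability assertion follows; the `@Immutable` annotation and its `reach` element are hypothetical illustrations loosely modelled on the reachability dimension, not the framework's actual annotation syntax.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: asserts the field AND everything reachable
// from it stays unchanged (here: for the lifetime of the enclosing object).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Immutable { String reach() default "deep"; }

final class Config {
    // 'final' alone only freezes the reference; it says nothing about
    // limits[i], which this annotation additionally promises is fixed.
    @Immutable(reach = "deep")
    final int[] limits;

    Config(int[] limits) { this.limits = limits.clone(); }

    // An optimizer trusting the assertion could hoist limits[0] out of
    // loops, since no iteration can observe a different value.
    int firstLimit() { return limits[0]; }
}
```

This is the kind of redundant-read elimination the reported 33-99% reductions in dynamic read accesses come from.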

Article ID: CPE858

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Experience in integrating Java with C# and .NET
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.858
Article ID CPE858
Author Name(s) Judith Bishop1R. Nigel Horspool2Basil Worrall3
Author Email(s) jbishop@cs.up.ac.za1
Affiliation(s) Computer Science Department, University of Pretoria, Pretoria 0002, South Africa 1Computer Science Department, University of Victoria, Victoria, BC, Canada V8W 3P6 2 3
Keyword(s) integration, Java, C#, XML, native code, Web services, GUI,
Abstract
Java programmers cannot help but be aware of the advent of C#, the .NET network environment, and a host of new supporting technologies, such as Web services. Before taking the big step of moving all development to a new environment, programmers will want to know what advantages C# offers as a language over Java, and whether the new and interesting features of C# and .NET can be incorporated into existing Java software. This paper surveys the advantages of C# and then presents and evaluates experience with connecting it to Java in a variety of ways. The first way provides evidence that Java can be linked to C# at the native code level, albeit through C++ wrappers. The second is a means for retaining the useful applet feature of Java in the server-side architecture of Web services written in C#. The third is by providing a common XML-based class for the development of graphical user interfaces (GUIs), which can be incorporated into Java or C#. An added advantage of this system, called Views, is that it can run independently of the resource-intensive development environment that would otherwise be needed for using C#. A major advantage of the methods described in this paper is that in all cases the Java program is not affected by the fact that it is interfacing with C#. The paper concludes that there are many common shared technologies that bring Java and C# close together, and that innovative ways of using others can open up opportunities not hitherto imagined. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE933

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: ACM 2002 Java Grande-ISCOPE Conference
Volume ID 17
Issue ID 5-6
Date 04 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.933
Article ID CPE933
Author Name(s) Geoffrey Fox1
Author Email(s) gcf@indiana.edu1
Affiliation(s) 1
Keyword(s) .,
Abstract
No abstract

Article ID: CPE838

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Support and optimization of Java RMI over a Bluetooth environment
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.838
Article ID CPE838
Author Name(s) Pu-Chen Wei1Chung-Hsin Chen2Cheng-Wei Chen3Jenq-Kuen Lee4
Author Email(s) jklee@pllab.cs.nthu.edu.tw1
Affiliation(s) Department of Computer Science, National Tsing Hua University, HsinChu 30055, Taiwan 1 2 3 4
Keyword(s) distributed object-oriented computing, wireless computing, Java RMI, Bluetooth architectures, collaborative computing,
Abstract
Distributed object-oriented platforms are increasingly important over wireless environments for providing frameworks for collaborative computations and for managing a large pool of distributed resources. Due to limited bandwidths and heterogeneous architectures of wireless devices, studies are needed into supporting object-oriented frameworks over heterogeneous wireless environments and optimizing system performance. Our research works towards efficiently supporting object-oriented environments over heterogeneous wireless environments. In this paper, we report the issues and our research results related to the efficient support of Java RMI over a Bluetooth environment. We first implement support for Java RMI over Bluetooth protocol stacks, by incorporating a set of Bluetooth protocol stack layers that we developed (which we call JavaBT) and by supporting the L2CAP layer with sockets that support the RMI socket. In addition, we model the cost of the access patterns of Java RMI communications. This cost model is used to guide the formation and optimization of the scatternets of a Java RMI Bluetooth environment. In our approach, we employ the well-known BTCP algorithm to obtain initial configurations for the number of piconets. Using the communication-access cost as a criterion, we then employ a spectral-bisection method to cluster the nodes in a piconet and then use a bipartite matching scheme to form the scatternet. Experimental results with the prototypes of Java RMI support over a Bluetooth environment show that our scatternet-formation algorithm incorporating an access-cost model can further optimize the performance of such a system. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE839

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title CSFS: a Java enabled network file storage system
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.839
Article ID CPE839
Author Name(s) Han Hua1Dai Yafei2Li Xiaoming3
Author Email(s) hh@net.pku.edu.cn1
Affiliation(s) Computer Networks and Distributed Systems Laboratory, Department of Computer Science and Technology, Peking University, Beijing 100871, People's Republic of China 1 2 3
Keyword(s) distributed file system, network file storage, namespace,
Abstract
The CSFS (cryptographic storage file system) is a network file storage system that is suitable for small-to-medium sized networks such as a campus network. In a CSFS, many distributed file servers are organized in a star-like architecture. File names are stored in a name server and file data are stored in the distributed file servers. A CSFS has good scalability and is able to accommodate hundreds of file servers. We implemented a CSFS in pure Java and tested the system with a benchmark. The test results show that CSFS delivers acceptable performance for general file operations. Its file upload and download performance is comparable to that of FTP. It can support more than 450 concurrent online users. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE840

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title JavaSymphony: a new programming paradigm to control and synchronize locality, parallelism and load balancing for parallel and distributed computing
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.840
Article ID CPE840
Author Name(s) Thomas Fahringer1Alexandru Jugravu2
Author Email(s) thomas.fahringer@uibk.ac.at1
Affiliation(s) Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria 1 2
Keyword(s) distributed Java, parallel and concurrent computing, locality, load balancing, parallelism,
Abstract
There has been an increasing research interest in extending the use of Java towards performance-oriented programming for distributed and concurrent applications. JavaSymphony is a Java-based programming paradigm that allows the programmer to control parallelism, load balancing, and locality at a high level of abstraction. Objects can be explicitly distributed and migrated based on a high-level API to static/dynamic system parameters and dynamic virtual distributed architectures, which impose a virtual hierarchy on a distributed system of physical computing nodes. In this paper we describe various extensions to the original JavaSymphony API, which include a generalization of virtual architectures that can be used to specify and to request arbitrary heterogeneous distributed and concurrent architectures inside of a JavaSymphony program. The number of threads that execute an object's methods can be controlled dynamically through single- and multi-threaded objects. Conventional Java objects can be dynamically converted to JavaSymphony objects. A lock/unlock mechanism has been introduced to avoid inconsistent modifications of objects or virtual architectures. A sophisticated event mechanism for asynchronous communication, coordination, and interaction is provided. Several synchronization constructs including distributed barrier synchronization and synchronization for asynchronous method invocations have been included. Several experiments are presented to demonstrate the effectiveness and efficiency of JavaSymphony. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE841

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Enterprise JavaBeans caching in clustered environments
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.841
Article ID CPE841
Author Name(s) Avraham Leff1James T. Rayfield2
Author Email(s) avraham@us.ibm.com1
Affiliation(s) IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, U.S.A. 1 2
Keyword(s) Enterprise JavaBeans (EJBs), EJB caching, performance analysis, transactional caching, database offload,
Abstract
We present the design of an Enterprise JavaBeans (EJBs) caching architecture, and show that EJB caching can greatly improve application throughput in clustered environments. Throughput is improved because data serving is offloaded from the database server to the cache-enabled application server. EJB caching is successful because it exploits the low-latency connections between the database server and application servers that exist in a clustered environment. An important feature of our architecture is that the caching function is transparent to applications that use it. The cache-enabled application server uses the same (EJB) programming model, and the same transactional semantics, as provided by non-caching architectures. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE842

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Object-oriented programming in peer-to-peer systems
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.842
Article ID CPE842
Author Name(s) Patrick Th. Eugster1Sebastien Baehni2
Author Email(s) patrick.eugster@epfl.ch1
Affiliation(s) Distributed Programming Laboratory, Swiss Federal Institute of Technology, Lausanne, Switzerland 1 2
Keyword(s) borrow/lend, peer-to-peer, abstraction, type, Java,
Abstract
Leveraged by the success of applications aiming at the free sharing of data in the Internet, the paradigm of peer-to-peer (P2P) computing has had substantial consideration devoted to it recently. This paper presents a high-level abstraction for remote object interaction in a P2P environment, called borrow/lend (BL). We present the principles underlying our BL abstraction, and illustrate how this abstraction can be used to program P2P applications in Java. We contrast our abstraction with established abstractions for distributed programming such as the remote method invocation or the tuple space, illustrating how the BL abstraction, obviously influenced by such previous abstractions, unifies flavors of these, but also how it captures the constraints specific to P2P environments. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE844

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Features from functional programming for a C++ skeleton library
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.844
Article ID CPE844
Author Name(s) J. Striegnitz1H. Kuchen2
Author Email(s) j.striegnitz@fz-juelich.de1
Affiliation(s) Research Center Jülich, Central Institute for Applied Mathematics, 52425 Jülich, Germany 1University of Münster, Department of Information Systems, Leonardo Campus 3, 48149 Münster, Germany 2
Keyword(s) algorithmic skeletons, C++,
Abstract
Message passing based on libraries such as MPI is typically used to program parallel machines with distributed memory. This is efficient, but error prone. Algorithmic skeletons are intended to simplify parallel programming by increasing expressive power. The idea is to offer typical parallel programming patterns as polymorphic higher-order functions which are efficiently implemented in parallel. The present paper describes how C++ templates and operator overloading can be used in order to provide the main features needed for algorithmic skeletons, namely higher-order functions, partial applications and parametric polymorphism. Experimental results based on a draft implementation of our C++ skeleton library show that higher expressive power can be gained without a significant performance penalty. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE846

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A method-level comparison of the Java Grande and SPEC JVM98 benchmark suites
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.846
Article ID CPE846
Author Name(s) David Gregg1James Power2John Waldron3
Author Email(s) james.power@may.ie2
Affiliation(s) Department of Computer Science, Trinity College, Dublin 2, Ireland 1Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland 2 3
Keyword(s) benchmark suites, Java Virtual Machine, dynamic profiling,
Abstract
In this paper we seek to provide a foundation for the study of the level of use of object-oriented techniques in Java programs in general, and scientific applications in particular. Specifically, we investigate the profiles of Java programs from a number of perspectives, including the use of class library methods, the size of methods called, the mode of invoke instruction used and the polymorphicity of call sites. We also present a categorization of the nature of small methods used in Java programs. We compare the Java Grande and SPEC JVM98 benchmark suites, and note a significant difference in the nature and composition of these suites, with the programs from the Java Grande suite demonstrating a less object-oriented approach. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE854

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title JOPI: a Java object-passing interface
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.854
Article ID CPE854
Author Name(s) Jameela Al-Jaroodi1Nader Mohamed2Hong Jiang3David Swanson4
Author Email(s) nmohamed@stevens.edu2
Affiliation(s) Department of Electrical and Computer Engineering, Stevens Institute of Technology, Burchard 216, Castle Point on Hudson, Hoboken, NJ 07030, U.S.A. 1 2 Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115, U.S.A. 3 4
Keyword(s) Java, heterogeneous systems, cluster, parallel programming, object-passing, object-oriented systems,
Abstract
Recently there has been an increasing interest in developing parallel programming capabilities in Java to harness the vast resources available in clusters, grids and heterogeneous networked systems. In this paper, we introduce a Java object-passing interface (JOPI) library. JOPI provides Java programmers with the necessary functionality to write object-passing parallel programs in distributed heterogeneous systems. JOPI provides a Message Passing Interface (MPI)-like interface that can be used to exchange objects among processes. In addition to the well-known benefits of the object-oriented development model, using objects to exchange information in JOPI is advantageous because it facilitates passing complex structures and enables the programmer to isolate the problem space from the parallelization problem. The run-time environment for JOPI is portable, efficient and provides the necessary functionality to deploy and execute parallel Java programs. Experiments were conducted on a cluster system and a collection of heterogeneous platforms to measure JOPI's performance and compare it with MPI. The results show good performance gains using JOPI. Copyright © 2005 John Wiley & Sons, Ltd.
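The object-passing idea, as opposed to MPI's flat message buffers, can be sketched in plain Java; the `ObjectChannel` class below is a hypothetical illustration using standard Java serialization and does not reproduce JOPI's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch of passing whole objects between processes: the
// sender flattens an arbitrary serializable object graph to bytes and
// the receiver rebuilds it, complex structure included.
final class ObjectChannel {
    static byte[] send(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);     // entire graph, not a flat primitive buffer
        }
        return bytes.toByteArray();
    }

    static Object receive(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

In a real system the byte array would travel over a socket between processes; here the round trip is kept in one process for clarity.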

Article ID: CPE855

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Advanced eager scheduling for Java-based adaptive parallel computing
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.855
Article ID CPE855
Author Name(s) Michael O. Neary1Peter Cappello2
Author Email(s) cappello@cs.ucsb.edu2
Affiliation(s) Department of Computer Science, University of California, Santa Barbara, CA 93106, U.S.A. 1 2
Keyword(s) concurrent programming, branch-and-bound, eager scheduling, fault tolerance, Grid computing, parallel computing,
Abstract
Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of their application can be cast as a master-worker or branch-and-bound computation, Javelin 3 frees application developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. The paper describes a fault-tolerant task scheduler and its performance analysis. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves host load-balancing in the presence of tasks whose non-uniform computational load is evident only at execution time. Speedup measurements are presented of actual performance on up to 1000 hosts. We analyze the expected performance degradation due to unresponsive hosts, and measure actual performance degradation due to unresponsive hosts. Copyright © 2005 John Wiley & Sons, Ltd.
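The core of eager scheduling is that an uncompleted task may be handed out again before any failure is detected, so work lost on an unresponsive host is eventually redone elsewhere. The sequential sketch below is a hypothetical illustration of that idea only, not Javelin 3's scheduler (which additionally integrates work stealing and dynamic task decomposition).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal eager-scheduling sketch (illustrative names): a task stays
// eligible for reassignment until some host reports its result.
final class EagerScheduler {
    private final Map<Integer, Integer> results = new HashMap<>();
    private final List<Integer> tasks;

    EagerScheduler(List<Integer> tasks) { this.tasks = tasks; }

    // Hand out some not-yet-completed task; null once everything is done.
    // The same task may be issued to several hosts concurrently.
    Integer nextTask() {
        for (int t : tasks)
            if (!results.containsKey(t)) return t;
        return null;
    }

    // First result for a task wins; duplicates from slow hosts are ignored.
    void complete(int task, int result) { results.putIfAbsent(task, result); }

    boolean done() { return results.size() == tasks.size(); }
}
```

The redundancy costs some duplicated work but provides fault tolerance without explicit failure detection, which is the trade-off the paper's degradation analysis quantifies.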

Article ID: CPE856

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Implementation and performance of a particle-in-cell code written in Java
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.856
Article ID CPE856
Author Name(s) G. Lapenta1S. Markidis2W. B. VanderHeyden3Z. Budimlic4
Author Email(s) lapenta@lanl.gov1
Affiliation(s) Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A. 1 2 3 4
Keyword(s) PIC method, object-oriented programming, high-performance computing, Java,
Abstract
Plasma simulation is an important example of a high-performance computing application where computer science issues are of great relevance. In a plasma, each particle, electron or ion, interacts with the external fields and with other particles in ways that can be readily and effectively emulated using object-oriented programming. However, the great cost of plasma simulations has traditionally discouraged object-oriented implementations due to their perceived inferior performance compared with classic procedural FORTRAN or C. In the present paper, we revisit this issue. We have developed a Java particle-in-cell code for plasma simulation, called Parsek. The paper considers different choices for the object orientation and tests their performance. We find that coarse-grained object orientation is faster and practically immune from any degradation compared with a standard procedural implementation (with static classes). A fine-grained object orientation loses about 50% in performance, a penalty that can be almost completely eliminated using advanced Java compilation techniques. The Java code Parsek also provides an interesting realistic high-performance computing application with which to compare the performance of Java against FORTRAN. We have conducted a series of tests considering various Java and FORTRAN implementations, different computer architectures, and different Java Virtual Machines and FORTRAN compilers. The conclusion is that with Parsek, object-oriented Java can reach CPU performance broadly comparable with procedural FORTRAN. This conclusion is remarkable and in agreement with the most recent benchmarks, but is at variance with widely held misconceptions about the alleged slowness of Java. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE857

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title On the conditions necessary for removing abstraction penalties in OOLALA
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.857
Article ID CPE857
Author Name(s) Mikel Luján1T. L. Freeman2John R. Gurd3
Author Email(s) mlujan@cs.man.ac.uk1
Affiliation(s) Centre for Novel Computing, University of Manchester, Oxford Road, Manchester M13 9PL, U.K. 1 2 3
Keyword(s) object-oriented programming, numerical linear algebra, Java, abstraction penalty,
Abstract
OOLALA is an object-oriented linear algebra library designed to reduce the effort of software development and maintenance. In contrast with traditional (Fortran-based) libraries, it provides two high abstraction levels that significantly reduce the number of implementations necessary for particular linear algebra operations. Initial performance evaluations of a Java implementation of OOLALA show that the two high abstraction levels are not competitive with the low abstraction level of traditional libraries. These initial performance results motivate the present contribution: the characterization of a set of storage formats (data structures) and matrix properties (special features) for which implementations at the two high abstraction levels can be transformed into implementations at the low (more efficient) abstraction level. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE859

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Collective communication for the HPJava programming language
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.859
Article ID CPE859
Author Name(s) Han-ku Lee1Sang Boem Lim2Bryan Carpenter3Geoffrey Fox4
Author Email(s) hlee@konkuk.ac.kr1
Affiliation(s) School of Internet and Multimedia Engineering, Konkuk University, Seoul 143-701, Korea 1Pervasive Technology Labs, Indiana University, Bloomington, IN 47404, U.S.A. 2 3 4
Keyword(s) HPspmd programming model, HPJava, Adlib, Java,
Abstract
This paper addresses the functionality and implementation of an HPJava version of the Adlib collective communication library for data parallel programming. We begin by illustrating typical use of the library, through an example multigrid application. Then we describe implementation issues for the high-level library. At a software engineering level, we illustrate how the primitives of the HPJava language assist in writing library methods whose implementation can be largely independent of the distribution format of the argument arrays. We also describe a low-level API called mpjdev, which handles the basic communication underlying the Adlib implementation. Finally, we present some benchmark results and conclusions. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE860

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Ibis: a flexible and efficient Java-based Grid programming environment
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.860
Article ID CPE860
Author Name(s) Rob V. van Nieuwpoort1Jason Maassen2Gosia Wrzesinska3Rutger F. H Hofman4Ceriel J. H. Jacobs5Thilo Kielmann6Henri E. Bal7
Author Email(s) rob@cs.vu.nl1
Affiliation(s) aculty of Sciences, Department of Computer Science, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands 1 2 3 4 5 6 7
Keyword(s) Grid computing, Java, portability, performance,
Abstract
In computational Grids, performance-hungry applications need to simultaneously tap the computational power of multiple, dynamically available sites. The crux of designing Grid programming environments stems exactly from the dynamic availability of compute cycles: Grid programming environments (a) need to be portable to run on as many sites as possible, (b) need to be flexible to cope with different network protocols and dynamically changing groups of compute nodes, and (c) need to provide efficient (local) communication that enables high-performance computing in the first place. Existing programming environments are either portable (Java), flexible (Jini, Java Remote Method Invocation (RMI)), or highly efficient (Message Passing Interface); no system combines all three properties necessary for Grid computing. In this paper, we present Ibis, a new programming environment that combines Java's 'run everywhere' portability both with flexible treatment of dynamically available networks and processor pools, and with highly efficient, object-based communication. Ibis can transfer Java objects very efficiently by combining streaming object serialization with a zero-copy protocol. Using RMI as a simple test case, we show that Ibis outperforms existing RMI implementations, achieving up to nine times higher throughputs with trees of objects. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE861

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Using MPI with C# and the Common Language Infrastructure
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.861
Article ID CPE861
Author Name(s) Jeremiah Willcock1Andrew Lumsdaine2Arch Robison3
Author Email(s) lums@osl.iu.edu2
Affiliation(s) Indiana University Open Systems Laboratory, Bloomington, IN 47405, U.S.A. 1 2 KAI Software Laboratory, Intel Corporation, 1906 Fox Drive, Champaign, IL 61820, U.S.A. 3
Keyword(s) Common Language Infrastructure (CLI), .NET, C#, Message Passing Interface (MPI),
Abstract
We describe two different libraries for using the Message Passing Interface (MPI) with the C# programming language and the Common Language Infrastructure (CLI). The first library provides C# bindings that closely match the original MPI library specification. The second library presents a fully object-oriented interface to MPI and exploits modern language features of C#. The interfaces described here use the P/Invoke feature of the CLI to dispatch to a native implementation of MPI, such as LAM/MPI or MPICH. Performance results using the Shared Source CLI demonstrate only a small performance overhead. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE862

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title inAspect: interfacing Java and VSIPL applications
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.862
Article ID CPE862
Author Name(s) Torey Alford1Vijay P. Shah2Anthony Skjellum3Nicholas H. Younan4
Author Email(s) tony@mpi-softtech.com1
Affiliation(s) MPI Software Technology, Inc., 101 S. Lafayette St., Suite 33, Starkville, MS 39759, U.S.A. 1Department of Electrical and Computer Engineering, Mississippi State University, P.O. Box 9571, Mississippi State, MS 39762, U.S.A. 2 3 Department of Electrical and Computer Engineering, Mississippi State University, P.O. Box 9571, Mississippi State, MS 39762, U.S.A 4
Keyword(s) high-performance signal processing, embedded Java, fixed-point arithmetic, Java Native Interface, VSIPL, CORDIC algorithms,
Abstract
In this paper, we discuss the origin, design, performance, and directions of the inAspect high-performance signal- and image-processing package for Java. The Vector Signal and Image Processing Library (VSIPL) community provides a standardized application programmer interface (API) for high-performance signal and image processing plus linear algebra, with a C emphasis and an object-based design framework. Java programmers need high-performance and/or portable APIs for this broad base of functionality as well. inAspect addresses PDAs, embedded Java boards, workstations, and servers, with emphasis on embedded systems at present. Efforts include supporting integer precisions and utilizing coordinate rotation digital computer (CORDIC) algorithms, both aimed at added relevance for limited-performance environments such as present-day PDAs. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE863

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Towards enabling peer-to-peer Grids
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.863
Article ID CPE863
Author Name(s) Shrideep Pallickara1Geoffrey Fox2Xi Rao3
Author Email(s) spallick@indiana.edu1
Affiliation(s) Community Grid Computing Laboratory, Indiana University, Suite 224, 501 North Morton Street, IN 47404, U.S.A. 1 2 3
Keyword(s) event distribution systems, middleware, P2P systems, JXTA, Grid computing,
Abstract
In this paper we propose a peer-to-peer (P2P) Grid comprising resources such as relatively static clients, high-end machines and a dynamic collection of multiple P2P subsystems. We investigate the architecture of the messaging and event service that will support such a hybrid environment. We have designed NaradaBrokering, a distributed publish-subscribe system for XML-specified messages, which provides support for centralized, distributed and P2P (via JXTA) interactions. Here we investigate and present our strategy for the integration of JXTA into NaradaBrokering. The resultant system naturally scales with multiple Peer Groups linked by NaradaBrokering. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE864

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Generic programming for high-performance scientific applications
Volume ID 17
Issue ID 7-8
Date 06 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.864
Article ID CPE864
Author Name(s) Lie-Quan Lee1Andrew Lumsdaine2
Author Email(s) lums@osl.iu.edu2
Affiliation(s) Open Systems Laboratory, Pervasive Technology Laboratories, Indiana University, Bloomington, IN 47405, U.S.A. 1 2
Keyword(s) C++, generic programming, high-performance computing, iterative solvers, Krylov subspace, message passing, templates,
Abstract
We present case studies that apply generic programming to the development of high-performance parallel code for solving two archetypal partial differential equations (PDEs). We examine the overall structure of the example scientific codes and consider their generic implementation. With a generic approach it is a straightforward matter to reuse software components from different sources; implementations with components from the Iterative Template Library (ITL), the Matrix Template Library (MTL), Blitz++, A++/P++, and Fortran BLAS are presented. Our newly developed Generic Message Passing library is used for communication. We compare the generic implementations with equivalent implementations developed with alternative libraries and languages and discuss performance as well as software engineering issues. Copyright © 2005 John Wiley & Sons, Ltd.
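The generic-programming idea behind CPE864 can be suggested by a small sketch. This is not code from the paper (which uses C++ templates with ITL/MTL/Blitz++); Python duck typing merely stands in for templates here, and all names (`cg`, `mv`) are illustrative. The solver is written only against an abstract matrix-vector product, so any matrix representation supplying one can be plugged in:

```python
# Sketch only: a conjugate-gradient solver written against an abstract
# matvec callable, so dense, sparse, or distributed matrix types are all
# interchangeable -- the flavor of algorithm/data-structure decoupling that
# the generic libraries in the paper provide via C++ templates.

def cg(matvec, b, x0, tol=1e-10, max_iter=200):
    """Solve A x = b for symmetric positive-definite A, given only matvec."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(x))]   # residual r = b - A x
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:                      # converged
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

# One possible "matrix type": a dense 2x2 SPD matrix exposed as a closure.
A = [[4.0, 1.0], [1.0, 3.0]]
mv = lambda v: [sum(a * vi for a, vi in zip(row, v)) for row in A]
x = cg(mv, b=[1.0, 2.0], x0=[0.0, 0.0])             # exact answer (1/11, 7/11)
```

Swapping in a different storage format only means supplying a different `matvec`; the solver itself is untouched, which is the reuse property the abstract claims for the generic components.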

Article ID: CPE885

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Concurrent urban legends
Volume ID 17
Issue ID 9
Date 08 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.885
Article ID CPE885
Author Name(s) Peter A. Buhr1Ashif S. Harji2
Author Email(s) pabuhr@uwaterloo.ca1
Affiliation(s) University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1 1 2
Keyword(s) concurrent, parallel, coroutine, synchronization, mutual exclusion, Dekker, Peterson, inheritance anomaly, signalling, spurious wakeup,
Abstract
This discussion addresses a number of urban legends about concurrency in an attempt to separate myth from fact. These legends are as follows: (1) concurrent = parallel; (2) coroutining = concurrency; (3) synchronization = mutual exclusion; (4) Dekker < Peterson; (5) concurrency = library; (6) inheritance anomaly = major concurrency problem; (7) signalling = hints; (8) spurious wakeup = efficiency. Identifying and understanding the fundamental concepts underlying concurrency is essential to the field. Equally important is not to confuse sequential and concurrent concepts. Finally, approaches based solely on efficiency are insufficient to justify a weak or difficult-to-use concurrent concept or construct. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE886

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Pipelines on heterogeneous systems: models and tools
Volume ID 17
Issue ID 9
Date 08 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.886
Article ID CPE886
Author Name(s) F. Almeida1D. Gonzalez2M. Moreno3C. Rodriguez4
Author Email(s) falmeida@ull.es1
Affiliation(s) Dpto. Estad?stica, I. O. y Computaci?n, Edificio F?sicas/Matem?ticas, Universidad de La Laguna, La Laguna, Tenerife, Spain 1 2 3 4
Keyword(s) heterogeneous systems, pipelines, analytical models,
Abstract
We study the performance of pipeline algorithms in heterogeneous networks. The concept of heterogeneity is not restricted to differences in the computational power of the nodes, but also refers to the network capabilities. We develop a skeleton tool that supports efficient block-cyclic mapping of pipelines onto heterogeneous systems, including pipelines with a number of stages much larger than the number of physical processors available. We derive an analytical formula that allows us to predict the performance of pipelines on heterogeneous systems and, based on this complexity formula, propose numerical strategies to solve the optimal mapping problem. The computational results confirm the accuracy of the predictions and the effectiveness of the approach. Copyright © 2005 John Wiley & Sons, Ltd.
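Purely as an illustration (the paper's own heterogeneous-network formula is not reproduced here), the classical makespan expression for a linear pipeline with unbounded buffers, T = sum(t_i) + (m - 1) * max(t_i) for m items and per-item stage costs t_i, can be checked against a direct simulation of the stage recurrence; all names below are hypothetical:

```python
# Toy model, not the analytical model derived in CPE886: completion times of
# m items flowing through a linear pipeline where stage i costs t[i] per item,
# assuming unbounded buffers between stages.

def pipeline_makespan(t, m):
    """Simulate f[i][j] = max(f[i-1][j], f[i][j-1]) + t[i]."""
    n = len(t)
    f = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            prev_stage = f[i - 1][j] if i > 0 else 0.0  # item j leaves stage i-1
            prev_item = f[i][j - 1] if j > 0 else 0.0   # stage i finishes item j-1
            f[i][j] = max(prev_stage, prev_item) + t[i]
    return f[n - 1][m - 1]

def closed_form(t, m):
    # fill time for the first item, then the slowest stage paces the rest
    return sum(t) + (m - 1) * max(t)

t, m = [2.0, 5.0, 1.0], 4
T = pipeline_makespan(t, m)                 # matches closed_form(t, m) == 23.0
```

Formulas of this shape are what make optimal-mapping searches cheap: the bottleneck term max(t_i) is what a block-cyclic assignment onto heterogeneous processors tries to minimize.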

Article ID: CPE901

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Distributed computing with Triana on the Grid
Volume ID 17
Issue ID 9
Date 08 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.901
Article ID CPE901
Author Name(s) Ian Taylor1Ian Wang2Matthew Shields3Shalil Majithia4
Author Email(s) i.j.taylor@cs.cardiff.ac.uk1
Affiliation(s) School of Computer Science, Cardiff University, P.O. Box 916, Cardiff CF24 3XF, U.K. 1School of Physics, Astronomy and Computer Science, Queens Buildings, Cardiff University, 5 The Parade, Cardiff CF24 3YB, U.K. 2 3 4
Keyword(s) Triana, distributed systems, Grid, workflow, peer-to-peer,
Abstract
In this paper, we describe Triana, a distributed problem-solving environment that makes use of the Grid to enable a user to compose applications from a set of components, select resources on which the composed application can be distributed, and then execute the application on those resources. We describe Triana's current pluggable architecture, which can support many different modes of operation through the use of flexible writers for many popular Web service choreography languages. We further show that the Triana architecture is middleware-independent through the use of the Grid Application Toolkit (GAT) API, and demonstrate this through the use of a GAT binding to JXTA. We describe how other bindings being developed for Grid infrastructures, such as OGSA, can be seamlessly integrated within the current prototype by using the switching capability of the GAT. Finally, we outline an experiment we conducted using this prototype and discuss its current status. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE890

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: The High Performance Architectural Challenge: Mass Market versus Proprietary Components? This article is a U.S. Government work and is in the public domain in the U.S.A.
Volume ID 17
Issue ID 10
Date 08 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.890
Article ID CPE890
Author Name(s) David B. Nelson1
Author Email(s) a@a.com1
Affiliation(s) National Coordination Office for Information Technology Research and Development, Arlington, VA 22230, U.S.A. 1
Keyword(s)
Abstract
No abstract

Article ID: CPE891

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title This article is a U.S. Government work and is in the public domain in the U.S.A.
Volume ID 17
Issue ID 10
Date 08 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.891
Article ID CPE891
Author Name(s) Darren J. Kerbyson1Adolfy Hoisie2Harvey J. Wasserman3
Author Email(s) djk@lanl.gov1
Affiliation(s) Performance and Architectures Laboratory (PAL), Modeling, Algorithms and Informatics Group, CCS-3, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A. 1 2 3
Keyword(s) performance modeling, performance analysis, performance prediction, large-scale systems, application analysis, extreme-scale parallel systems,
Abstract
This work gives a detailed analysis of the relative performance of the recently installed Earth Simulator and the next top four systems in the Top500 list using predictive performance models. The Earth Simulator uses vector processing nodes interconnected using a single-stage, cross-bar network, whereas the next top four systems are built using commodity based superscalar microprocessors and interconnection networks. The performance that can be achieved results from an interplay of system characteristics, application requirements and scalability behavior. Detailed performance models are used here to predict the performance of two codes representative of the ASCI workload, namely SAGE and Sweep3D. The performance models encapsulate fully the behavior of these codes and have been previously validated on many large-scale systems. One result of this analysis is to size systems, built from the same nodes and networks as those in the top five, that will have the same performance as the Earth Simulator. In particular, the largest ASCI machine, ASCI Q, is expected to achieve a similar performance to the Earth Simulator on the representative workload. Published in 2005 by John Wiley & Sons, Ltd.

Article ID: CPE892

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title This article is a U.S. Government work and is in the public domain in the U.S.A.
Volume ID 17
Issue ID 10
Date 08 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.892
Article ID CPE892
Author Name(s) Jeffrey S. Vetter1Bronis R. de Supinski2Lynn Kissel3John May4Sheila Vaidya5
Author Email(s) vetter@computer.org1
Affiliation(s) Oak Ridge National Laboratory, Oak Ridge, TN 37922, U.S.A. 1Lawrence Livermore National Laboratory, Livermore, CA 94551, U.S.A. 2 3 4 5
Keyword(s) high-performance computing, performance evaluation, computer architecture, parallel and distributed computing,
Abstract
Comparisons of high-performance computers based on their peak floating point performance are common but seldom useful when comparing performance on real workloads. Factors that influence sustained performance extend beyond a system's floating-point units, and real applications exercise machines in complex and diverse ways. Even when it is possible to compare systems based on their performance, other considerations affect which machine is best for a given organization. These include the cost, the facilities requirements (power, floorspace, etc.), the programming model, the existing code base, and so on. This paper describes some of the important measures for evaluating high-performance computers. We present data for many of these metrics based on our experience at Lawrence Livermore National Laboratory (LLNL), and we compare them with published information on the Earth Simulator. We argue that evaluating systems involves far more than comparing benchmarks and acquisition costs. We show that evaluating systems often involves complex choices among a variety of factors that influence the value of a supercomputer to an organization, and that the high-end computing community should view cost/performance comparisons of different architectures with skepticism. Published in 2005 by John Wiley & Sons, Ltd.

Article ID: CPE893

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Architectural specification for massively parallel computers: an experience and measurement-based approach
Volume ID 17
Issue ID 10
Date 08 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.893
Article ID CPE893
Author Name(s) Ron Brightwell1William Camp2Benjamin Cole3Erik DeBenedictis4Robert Leland5James Tomkins6Arthur B. Maccabe7
Author Email(s) rbbrigh@sandia.gov1
Affiliation(s) Sandia National Laboratories, Scalable Computer Systems, P.O. Box 5800, Albuquerque, NM 87185-1110, U.S.A. 1 2 3 4 5 6 7
Keyword(s) massively parallel computing, supercomputing, commodity processors, vector processors, Amdahl, shared memory, distributed memory,
Abstract
In this paper, we describe the hardware and software architecture of the Red Storm system developed at Sandia National Laboratories. We discuss the evolution of this architecture and provide reasons for the different choices that have been made. We contrast our approach of leveraging high-volume, mass-market commodity processors to that taken for the Earth Simulator. We present a comparison of benchmarks and application performance that support our approach. We also project the performance of Red Storm and the Earth Simulator. This projection indicates that the Red Storm architecture is a much more cost-effective approach to massively parallel computing. Published in 2005 by John Wiley & Sons, Ltd.

Article ID: CPE894

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Practical performance portability in the Parallel Ocean Program (POP). This article is a U.S. Government work and is in the public domain in the U.S.A.
Volume ID 17
Issue ID 10
Date 08 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.894
Article ID CPE894
Author Name(s) P. W. Jones1P. H. Worley2Y. Yoshida3J. B. White III4J. Levesque5
Author Email(s) pwjones@lanl.gov1
Affiliation(s) Theoretical Division, Los Alamos National Laboratory, T-3, MS B216, Los Alamos, NM 87545-1663, U.S.A. 1Computer Science and Mathematics Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831-6367, U.S.A. 2Central Research Institute of Electric Power Industry, 1646 Abiko Abiko-shi Chiba, 270-1194, Japan 3 4 Cray Inc., 411 First Avenue South, Suite 600, Seattle, WA 98104-2860, U.S.A. 5
Keyword(s) ocean models, portability, performance, vector, scalar,
Abstract
The design of the Parallel Ocean Program (POP) is described with an emphasis on portability. Performance of POP is presented on a wide variety of computational architectures, including vector architectures and commodity clusters. Analysis of POP performance across machines is used to characterize performance and identify improvements while maintaining portability. A new design of the POP model, including a cache blocking and land point elimination scheme, is described with some preliminary performance results. Published in 2005 by John Wiley & Sons, Ltd.

Article ID: CPE895

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title HPCx: towards capability computing
Volume ID 17
Issue ID 10
Date 08 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.895
Article ID CPE895
Author Name(s) Mike Ashworth1Ian J. Bush2Martyn F. Guest3Andrew G. Sunderland4Stephen Booth5Joachim Hein6Lorna Smith7Kevin Stratford8Alessandro Curioni9
Author Email(s) m.ashworth@dl.ac.uk1
Affiliation(s) Computational Science and Engineering Department, CLRC Daresbury Laboratory, Daresbury, Warrington, Cheshire WA4 4AD, U.K. 1 2 3 4 Edinburgh Parallel Computing Centre, University of Edinburgh, JCMB, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, U.K. 5 6 7 8 IBM Research, Zurich Research Laboratory, 8803 Riuschlikon, Switzerland 9
Keyword(s) high-performance computing, capability computing, scalable algorithms, computational science, computational engineering,
Abstract
We introduce HPCx, the U.K.'s new National HPC Service, which aims to deliver a world-class service for capability computing to the U.K. scientific community. HPCx is targeting an environment that will both result in world-leading science and address the challenges involved in scaling existing codes to the capability levels required. Close working relationships with scientific consortia and user groups throughout the research process will be a central feature of the service. A significant number of key user applications have already been ported to the system. We present initial benchmark results from this process and discuss the optimization of the codes and the performance levels achieved on HPCx in comparison with other systems. We find a range of performance with some algorithms scaling far better than others. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE896

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: High-Performance Computing in Geosciences
Volume ID 17
Issue ID 11
Date 09 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.896
Article ID CPE896
Author Name(s)
Author Email(s)
Affiliation(s)
Keyword(s)
Abstract
No abstract

Article ID: CPE897

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Full waveform seismic inversion using a distributed system of computers
Volume ID 17
Issue ID 11
Date 09 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.897
Article ID CPE897
Author Name(s) Mrinal K. Sen1Indrajit G. Roy2Carlos Torres-Verdin3
Author Email(s) mrinal@ig.utexas.edu1
Affiliation(s) Institute for Geophysics, John A. and Katherine G. Jackson School of Geosciences, The University of Texas at Austin, Austin, TX 78759, U.S.A. 1 2 Department of Petroleum and Geo-systems Engineering, The University of Texas at Austin, Austin, TX 78759, U.S.A. 3
Keyword(s) seismic waveform, inversion, adaptive regularization, distributed systems,
Abstract
The aim of seismic waveform inversion is to estimate the elastic properties of the Earth's subsurface layers from recordings of seismic waveform data. This is usually accomplished by using constrained optimization often based on very simplistic assumptions. Full waveform inversion uses a more accurate wave propagation model but is extremely difficult to use for routine analysis and interpretation. This is because computational difficulties arise due to: (1) strong nonlinearity of the inverse problem; (2) extreme ill-posedness; and (3) large dimensions of data and model spaces. We show that some of these difficulties can be overcome by using: (1) an improved forward problem solver and efficient technique to generate sensitivity matrix; (2) an iteration adaptive regularized truncated Gauss-Newton technique; (3) an efficient technique for matrix-matrix and matrix-vector multiplication; and (4) a parallel programming implementation with a distributed system of processors. We use a message-passing interface in the parallel programming environment. We present inversion results for synthetic and field data, and a performance analysis of our parallel implementation. Copyright © 2005 John Wiley & Sons, Ltd.
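A minimal sketch, not the authors' solver, of the kind of regularized Gauss-Newton loop item (2) of the abstract refers to, applied to a one-parameter toy model y = exp(-k t). All names (`fit`, `lam`, `k0`) are illustrative; the damping parameter is adapted per iteration, halved after an accepted step and doubled after a rejected one:

```python
import math

# Toy iteration-adaptive regularized Gauss-Newton fit of y = exp(-k t).
# This stands in for the truncated Gauss-Newton scheme of the paper only in
# spirit; the real problem has huge model/data spaces and runs in parallel.

def fit(ts, ys, k0, lam=1.0, iters=100):
    def cost(k):
        return sum((y - math.exp(-k * t)) ** 2 for t, y in zip(ts, ys))
    k, c = k0, cost(k0)
    for _ in range(iters):
        J = [t * math.exp(-k * t) for t in ts]              # d r_i / d k
        r = [y - math.exp(-k * t) for t, y in zip(ts, ys)]  # residuals
        g = sum(Ji * ri for Ji, ri in zip(J, r))            # J^T r
        H = sum(Ji * Ji for Ji in J)                        # J^T J (scalar here)
        k_try = k - g / (H + lam)                           # damped normal equations
        c_try = cost(k_try)
        if c_try < c:
            k, c, lam = k_try, c_try, lam * 0.5             # accept, relax damping
        else:
            lam *= 2.0                                      # reject, strengthen damping
    return k

ts = [0.5, 1.0, 2.0, 3.0]
ys = [math.exp(-0.7 * t) for t in ts]   # synthetic noise-free data, k_true = 0.7
k_est = fit(ts, ys, k0=0.1)
```

The adaptive damping is what copes with the ill-posedness the abstract mentions: far from the solution the step is heavily regularized, and near it the iteration approaches an undamped Gauss-Newton step.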

Article ID: CPE898

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A simulation and data analysis system for large-scale, data-driven oil reservoir simulation studies
Volume ID 17
Issue ID 11
Date 09 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.898
Article ID CPE898
Author Name(s) Tahsin Kurc1Umit Catalyurek2Xi Zhang3Joel Saltz4Ryan Martino5Mary Wheeler6Małgorzata Peszyńska7Alan Sussman8Christian Hansen9Mrinal Sen10Roustam Seifoullaev11Paul Stoffa12Carlos Torres-Verdin13Manish Parashar14
Author Email(s) kurc@bmi.osu.edu1
Affiliation(s) Biomedical Informatics Department, The Ohio State University, 3184 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, U.S.A. 1 2 3 4 Center for Subsurface Modeling, Institute for Computational Engineering and Sciences, The University of Texas at Austin, 201 East 24th Street, ACE 5.324, Campus Mail C0200, Austin, TX 78712, U.S.A. 5 6 Department of Mathematics, Kidder Hall 368, Oregon State University, Corvallis, OR 97331-4605, U.S.A. 7Department of Computer Science, University of Maryland, A.V. Williams Building, College Park, MD 20742, U.S.A. 8 9 Institute for Geophysics, The University of Texas at Austin, 4412 Spicewood Springs Road, Building 600, Austin, TX 78758-8500, U.S.A. 10 11 12 Department of Petroleum and Geosystems Engineering, The University of Texas at Austin, 1 University Station, CO300, Austin, TX 78712-0228, U.S.A. 13Department of Electrical and Computer Engineering, Rutgers, State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854-8058, U.S.A. 14
Keyword(s) oil reservoir simulation, Grid computing, data intensive computing, data management,
Abstract
The main goal of oil reservoir management is to provide more efficient, cost-effective and environmentally safer production of oil from reservoirs. Numerical simulations can aid in the design and implementation of optimal production strategies. However, traditional simulation-based approaches to optimizing reservoir management are rapidly overwhelmed by data volume when large numbers of realizations are sought using detailed geologic descriptions. In this paper, we describe a software architecture to facilitate large-scale simulation studies, involving ensembles of long-running simulations and analysis of vast volumes of output data. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE899

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Enabling interactive and collaborative oil reservoir simulations on the Grid
Volume ID 17
Issue ID 11
Date 09 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.899
Article ID CPE899
Author Name(s) Manish Parashar1Rajeev Muralidhar2Wonsuck Lee3Dorian Arnold4Jack Dongarra5Mary Wheeler6
Author Email(s) parashar@caip.rutgers.edu1
Affiliation(s) The Applied Software Systems Laboratory, Rutgers University, Piscataway, NJ 08854, U.S.A. 1 2 Computing Science Research, Bell Laboratories, Murray Hill, NJ 07974, U.S.A. 3Computer Science Department, University of Wisconsin, Madison, WI 53706, U.S.A. 4Computer Science Department, University of Tennessee at Knoxville, Knoxville, TN 37996, U.S.A. 5Center for Subsurface Modeling, University of Texas at Austin, Austin, TX 78712, U.S.A. 6
Keyword(s) oil reservoir simulation, Grid-based computational collaboratory, interactive monitoring and steering, IPARS, Discover, NetSolve,
Abstract
Grid-enabled infrastructures and problem-solving environments can significantly increase the scale, cost-effectiveness and utility of scientific simulations, enabling highly accurate simulations that provide in-depth insight into complex phenomena. This paper presents a prototype of such an environment, i.e. an interactive and collaborative problem-solving environment for the formulation, development, deployment and management of oil reservoir and environmental flow simulations in computational Grid environments. The project builds on three independent research efforts: (1) the IPARS oil reservoir and environmental flow simulation framework; (2) the NetSolve Grid engine; and (3) the Discover Grid-based computational collaboratory. Its primary objective is to demonstrate the advantages of an integrated simulation infrastructure towards effectively supporting scientific investigation on the Grid, and to investigate the components and capabilities of such an infrastructure. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE900

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Large-scale density-driven flow simulations using parallel unstructured Grid adaptation and local multigrid methods
Volume ID 17
Issue ID 11
Date 09 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.900
Article ID CPE900
Author Name(s) Stefan Lang1Gabriel Wittum2
Author Email(s) stefan.lang@iwr.uni-heidelberg.de1
Affiliation(s) Technical Simulation, Interdisciplinary Center for Scientific Computing (IWR), University of Heidelberg, INF 368 II, 69120 Heidelberg, Germany 1 2
Keyword(s) density-driven flow, parallel computation, multigrid methods, local Grid adaption, dynamic load migration and balancing,
Abstract

Article ID: CPE878

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Foundations of Middleware Technologies
Volume ID 17
Issue ID 12
Date 10 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.878
Article ID CPE878
Author Name(s) Priya Narasimhan1
Author Email(s)
Affiliation(s) Carnegie Mellon University 1
Keyword(s)
Abstract
No abstract

Article ID: CPE879

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title On the modelling of publish/subscribe communication systems
Volume ID 17
Issue ID 12
Date 10 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.879
Article ID CPE879
Author Name(s) R. Baldoni1R. Beraldi2S. Tucci Piergiovanni3A. Virgillito4
Author Email(s) baldoni@dis.uniroma1.it1
Affiliation(s) Dipartimento di Informatica e Sistemistica, Università di Roma La Sapienza, Via Salaria 113, 00198 Roma, Italy 1 2 3 4
Keyword(s) publish/subscribe, event-based middleware,
Abstract
This paper presents a formal framework of a distributed computation based on a publish/subscribe system. The framework abstracts the system through two delays, namely the subscription/unsubscription delay and the diffusion delay. This abstraction allows one to model concurrent execution of publication and subscription operations without waiting for the stability of the system state and to define a Liveness property which gives the conditions for the presence of a notification event in the global history of the system. This formal framework allows us to analytically define a measure of the effectiveness of a publish/subscribe system, which reflects the percentage of notifications guaranteed by the system to subscribers. A simulation study confirms the validity of the analytical measurements. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE880

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Distributed filter processes
Volume ID 17
Issue ID 12
Date 10 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.880
Article ID CPE880
Author Name(s) Rushikesh K. Joshi1
Author Email(s) rkj@cse.iitb.ac.in1
Affiliation(s) Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai 400 076, India 1
Keyword(s) distributed processes, dynamic evolution, filter processes, cooperative filters, transparency,
Abstract
A programming model called distributed filter processes for transparent filtering in distributed systems is presented. Filter processes are modeled as extensions to distributed processes. Messages can be captured transparently by intermediate filter processes. The programming abstractions are presented with applications to the evolution of programs. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE881

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Benchmarking message-oriented middleware: TIB/RV versus SonicMQ
Volume ID 17
Issue ID 12
Date 10 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.881
Article ID CPE881
Author Name(s) Piyush Maheshwari1Michael Pang2
Author Email(s) piyush@cse.unsw.edu.au1
Affiliation(s) School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia 1SAP Australia Pty. Ltd, 168 Walker Street, North Sydney, NSW 2061, Australia 2
Keyword(s) middleware, message-oriented middleware, benchmarking, performance evaluation, publish/subscribe messaging, point-to-point messaging,
Abstract
Message-oriented middleware (MOM) has become a vital part of complex application integration projects. MOM is used to pass data and workflow in the form of messages between different enterprise applications. The performance of integrated applications depends greatly on how effectively the MOM performs. This paper presents a benchmark comparison between two well-known industry MOMs: TIBCO Rendezvous (TIB/RV) and SonicMQ. Although the two MOMs are very similar in certain respects, their native implementations and architectures are very different. We provide an unbiased benchmark reference for the middleware selection process. The primary objective of our work is to evaluate and compare the MOMs by testing their effectiveness in the delivery of messages in publish/subscribe and point-to-point message domains, their program stability and their system resource utilization. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE882

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title MEAD: support for Real-Time Fault-Tolerant CORBA
Volume ID 17
Issue ID 12
Date 10 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.882
Article ID CPE882
Author Name(s) P. Narasimhan1T. A. Dumitra2A. M. Paulos3S. M. Pertet4C. F. Reverte5J. G. Slember6D. Srivastava7
Author Email(s) priya@cs.cmu.edu1
Affiliation(s) Electrical and Computer Engineering Department, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, U.S.A. 1 2 3 4 5 6 7
Keyword(s) real-time, fault tolerance, trade-offs, non-determinism, CORBA, predictability, recovery,
Abstract
The OMG's Real-Time CORBA (RT-CORBA) and Fault-Tolerant CORBA (FT-CORBA) specifications make it possible for today's CORBA implementations to exhibit either real-time behavior or fault tolerance in isolation. While real-time operation requires a priori knowledge of the system's temporal behavior, fault tolerance necessarily deals with faults that occur unexpectedly, and with possibly unpredictable fault-recovery times. The MEAD (Middleware for Embedded Adaptive Dependability) system attempts to identify and to reconcile the conflicts between real-time and fault tolerance, in a resource-aware manner, for distributed CORBA applications. MEAD supports transparent yet tunable fault tolerance in real time, proactive dependability, and resource-aware system adaptation to crash, communication and timing faults with bounded fault detection and fault recovery. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE889

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Lightweight monitoring of MPI programs in real time
Volume ID 17
Issue ID 13
Date 11 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.889
Article ID CPE889
Author Name(s) German Florez1Zhen Liu2Susan M. Bridges3Anthony Skjellum4Rayford B. Vaughn5
Author Email(s) gf24@cse.msstate.edu1
Affiliation(s) Center for Computer Security Research and High Performance Computer Laboratory, Department of Computer Science and Engineering, Mississippi State University, MS 39762-9637, U.S.A. 1 2 3 4 5
Keyword(s) anomaly detection, message-passing interface, cluster monitoring, interposition library, loadable kernel module, hidden Markov model, artificial neural networks,
Abstract
Current technologies allow efficient data collection by several sensors to determine an overall evaluation of the status of a cluster. However, no previous work of which we are aware analyzes the behavior of the parallel programs themselves in real time. In this paper, we perform a comparison of different artificial intelligence techniques that can be used to implement a lightweight monitoring and analysis system for parallel applications on a cluster of Linux workstations. We study the accuracy and performance of deterministic and stochastic algorithms when we observe the flow of both library-function and operating-system calls of parallel programs written with C and MPI. We demonstrate that monitoring of MPI programs can be achieved with high accuracy and in some cases with a false-positive rate near 0% in real time, and we show that the added computational load on each node is small. As an example, the monitoring of function calls using a hidden Markov model generates less than 5% overhead. The proposed system is able to automatically detect deviations of a process from its expected behavior in any node of the cluster, and thus it can be used as an anomaly detector, for performance monitoring to complement other systems or as a debugging tool. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE902

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title A static mapping heuristics to map parallel applications to heterogeneous computing systems
Volume ID 17
Issue ID 13
Date 11 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.902
Article ID CPE902
Author Name(s) Ranieri Baraglia1Renato Ferrini2Pierluigi Ritrovato3
Author Email(s) ranieri.baraglia@cnuce.cnr.it1
Affiliation(s) High Performance Computing Laboratory, Information Science and Technologies Institute (ISTI), Italian National Research Council (CNR), Via G. Moruzzi, 1, 56126 Pisa, Italy 1 2 Centro di Ricerca in Matematica Pura ed Applicata, Università degli Studi di Salerno, Fisciano (SA), Italy 3
Keyword(s) parallel processing, static mapping, task mapping, performance evaluation,
Abstract
In order to minimize the execution time of a parallel application running on a heterogeneous distributed computing system, an appropriate mapping scheme is needed to allocate the application tasks to the processors. The general problem of mapping tasks to machines is a well-known NP-hard problem and several heuristics have been proposed to approximate its optimal solution. In this paper we propose a static graph-based mapping algorithm, called Heterogeneous Multi-phase Mapping (HMM), which permits suboptimal mapping of a parallel application onto a heterogeneous distributed computing system by using a local search technique together with a tabu search meta-heuristic. HMM allocates parallel tasks by exploiting the information embedded in the parallelism forms used to implement an application, and by considering an affinity parameter that identifies which machine in the heterogeneous computing system is most suitable to execute a task. We compare HMM with some leading techniques and with an exhaustive mapping algorithm. We also give an example of mapping of two real applications using HMM. Experimental results show that HMM performs well, demonstrating the applicability of our approach. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE906

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title GridBLAST: a Globus-based high-throughput implementation of BLAST in a Grid computing framework
Volume ID 17
Issue ID 13
Date 11 00 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.906
Article ID CPE906
Author Name(s) Arun Krishnan1
Author Email(s) arun@bii.a-star.edu.sg1
Affiliation(s) BioInformatics Institute, 30 Biopolis Street,#07-01, Singapore 138671 1
Keyword(s) Grid application, high-throughput BLAST,
Abstract
Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues, as well as job scheduling. Existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid-enabled, high-throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid-enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a 'mini-Grid' testbed and the results presented here show that for large problem sizes, a distributed, Grid-enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE948

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Editorial
Article Title Special Issue: Third IEEE International Workshop on High Performance Computational Biology (HiCOMB 2004)
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.948
Article ID CPE948
Author Name(s) Dan Marinescu1
Author Email(s)
Affiliation(s) 1
Keyword(s)
Abstract
No abstract

Article ID: CPE949

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Study of a highly accurate and fast protein-ligand docking method based on molecular dynamics
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.949
Article ID CPE949
Author Name(s) C. L. Brooks III1M. Taufer2M. Crowley3D. J. Price4A. A. Chien5
Author Email(s) brooks@scripps.edu1
Affiliation(s) Department of Molecular Biology (TPC6), The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, U.S.A. 1 2 3 4 5
Keyword(s) force-field-based methods, docking accuracy, desktop Grid computing,
Abstract
Few methods use molecular dynamics simulations in concert with atomically detailed force fields to perform protein-ligand docking calculations because they are considered too time-consuming, despite their accuracy. In this paper we present a docking algorithm based on molecular dynamics which has a highly flexible computational granularity. We compare its accuracy and the time it requires with well-known, commonly used docking methods such as AutoDock, DOCK, FlexX, ICM, and GOLD. We show that our algorithm is accurate, fast and, because of its flexibility, applicable even to loosely coupled distributed systems such as desktop Grids for docking. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE950

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel protein folding with STAPL
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.950
Article ID CPE950
Author Name(s) Nancy M. Amato1Shawna Thomas2Gabriel Tanase3Lucia K. Dale4Jose M. Moreira5Lawrence Rauchwerger6
Author Email(s) amato@cs.tamu.edu1
Affiliation(s) Parasol Laboratory, Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, U.S.A. 1 2 3 University of the South, Sewanee, TN 37383-1000, U.S.A. 4IBM TJ Watson Research Center, Yorktown Heights, NY 10598-0218, U.S.A. 5 6
Keyword(s) protein folding, motion planning, parallel libraries, C++,
Abstract
The protein-folding problem is a study of how a protein dynamically folds to its so-called native state: an energetically stable, three-dimensional conformation. Understanding this process is of great practical importance since some devastating diseases such as Alzheimer's and bovine spongiform encephalopathy (Mad Cow) are associated with the misfolding of proteins. We have developed a new computational technique for studying protein folding that is based on probabilistic roadmap methods for motion planning. Our technique yields an approximate map of a protein's potential energy landscape that contains thousands of feasible folding pathways. We have validated our method against known experimental results. Other simulation techniques, such as molecular dynamics or Monte Carlo methods, require many orders of magnitude more time to produce a single, partial trajectory. In this paper we report on our experiences parallelizing our method using STAPL (Standard Template Adaptive Parallel Library), which is being developed in the Parasol Lab at Texas A&M. An efficient parallel version will enable us to study larger proteins with increased accuracy. We demonstrate how STAPL enables portable efficiency across multiple platforms, ranging from small Linux clusters to massively parallel machines such as IBM's BlueGene/L, without user code modification. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE951

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Dynamic programming for LR-PCR segmentation of bacterium genomes
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.951
Article ID CPE951
Author Name(s) Rumen Andonov1Dominique Lavenier2Philippe Veber3Nicola Yanev4
Author Email(s) randonov@irisa.fr1
Affiliation(s) IRISA, Campus de Beaulieu, 35042 Rennes Cedex, France 1 2 3 University of Sofia, 5 J. Bouchier str., 1126 Sofia, Bulgaria 4
Keyword(s) dynamic programming, LR-PCR, optimization, genomics,
Abstract
Bacterium genome plasticity can efficiently be studied by the long-range polymerase chain reaction (LR-PCR) technique: genomes of different strains are split into hundreds of short segments which, after LR-PCR amplification, are used to sketch profiles. The segments have to: (1) cover the entire genome; (2) overlap each other; and (3) be of nearly identical size. This paper addresses the problem of finding a list of segments satisfying these constraints as much as possible. Two algorithms based on a dynamic programming approach are presented. They differ in the optimization criteria used for measuring the quality of the covering. The first considers the maximal deviation of the segment lengths relative to an ideal length. The second automatically finds a segment length that minimizes the maximal deviation. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE952

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Parallel RNA secondary structure prediction using stochastic context-free grammars
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.952
Article ID CPE952
Author Name(s) Tong Liu1Bertil Schmidt2
Author Email(s) asbschmidt@ntu.edu.sg1
Affiliation(s) School of Computer Engineering, Nanyang Technological University, Singapore 639798 1 2
Keyword(s) RNA secondary structure, stochastic context-free grammars, parallel processing,
Abstract
With the growing number of known RNA genes, efficient and accurate computational analysis of RNA sequences is becoming increasingly important. Stochastic context-free grammars (SCFGs) are a popular tool for modeling RNA secondary structures. However, algorithms for aligning an RNA sequence to an SCFG are highly compute-intensive. This has so far limited applications of SCFGs to relatively small problem sizes. In this paper we present the design of a parallel RNA sequence-structure alignment algorithm. Its implementation on parallel systems leads to significant runtime savings. This makes it possible to compute sequence-structure alignments of even the largest RNAs, such as small subunit and large subunit ribosomal RNAs, in reasonable time. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE953

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Peptide identification via constrained multi-objective optimization: Pareto-based genetic algorithms
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.953
Article ID CPE953
Author Name(s) J. M. Malard1A. Heredia-Langner2W. R. Cannon3R. Mooney4D. J. Baxter5
Author Email(s) jm.malard@pnl.gov1
Affiliation(s) Pacific Northwest National Laboratory, Richland, WA, U.S.A. 1 2 3 4 5
Keyword(s) data-intensive computation, genetic algorithms, multiobjective optimization, peptide identification, tandem mass spectrometry,
Abstract
Automatic peptide identification from collision-induced dissociation tandem mass spectrometry data using optimization techniques is made difficult by large plateaus in the fitness landscapes of scoring functions, by the fuzzy nature of constraints from noisy data and by the existence of diverse but equally justifiable probabilistic models of peak matching. Here, two different scoring functions are combined into a parallel multi-objective optimization framework. It is shown how multi-objective optimization can be used to empirically test for independence between distinct scoring functions. The loss of selection pressure during the evolution of a population of putative peptide sequences by a Pareto-driven genetic algorithm is addressed by alternating between two definitions of fitness according to a numerical threshold. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE954

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title RAxML-II: a program for sequential, parallel and distributed inference of large phylogenetic trees
Volume ID 17
Issue ID 14
Date 12 10 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.954
Article ID CPE954
Author Name(s) Alexandros Stamatakis1Thomas Ludwig2Harald Meier3
Author Email(s) stamatak@cs.tum.edu1
Affiliation(s) Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstrasse 3, D-85748 Garching b. München, Germany 1Ruprecht-Karls-Universität Heidelberg, Institut für Informatik, Im Neuenheimer Feld 348, D-69120 Heidelberg, Germany 2 3
Keyword(s) phylogenetic trees, maximum likelihood, parallel and distributed computing,
Abstract
Inference of phylogenetic trees comprising hundreds or even thousands of organisms based on the maximum likelihood method is computationally intensive. We present simple heuristics which yield accurate trees for synthetic as well as real data and significantly reduce execution time. These heuristics have been implemented in a sequential, parallel, and distributed program called RAxML-II, which is freely available as open source code. We compare the performance of the sequential program with PHYML and MrBayes which, to the best of our knowledge, are currently the fastest and most accurate programs for phylogenetic tree inference based on statistical methods. Experiments are conducted using 50 synthetic 100-taxon alignments as well as nine real-world alignments comprising 101 up to 1000 sequences. RAxML-II outperforms MrBayes for real-world data both in terms of speed and final likelihood values. Furthermore, for real data RAxML-II requires less time (by a factor of 2-8) than PHYML to reach PHYML's final likelihood values and yields better final trees due to its more exhaustive search strategy. For synthetic data MrBayes is slightly more accurate than RAxML-II and PHYML but significantly slower. The non-deterministic parallel program shows good speedup values and has been used to infer a 10 000-taxon tree comprising organisms from the domains Eukarya, Bacteria, and Archaea. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE887

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Loosely coordinated coscheduling in the context of other approaches for dynamic job scheduling: a survey
Volume ID 17
Issue ID 15
Date 12 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.887
Article ID CPE887
Author Name(s) A. C. Sodan1
Author Email(s) acsodan@cs.uwindsor.ca1
Affiliation(s) University of Windsor, Computer Science, 401 Sunset Avenue, Windsor, Ontario, Canada N9B 3P4 1
Keyword(s) job scheduling, coscheduling, clusters, resource allocation, dynamic adaptation, computational Grids,
Abstract
Loosely coordinated (implicit/dynamic) coscheduling is a time-sharing approach that originates from network-of-workstations environments with mixed parallel/serial workloads and limited software support. It is meant to be an easy-to-implement and scalable approach. Considering that the percentage of clusters in parallel computing is increasing and easily portable software is needed, loosely coordinated coscheduling becomes an attractive approach for dedicated machines. Loose coordination offers attractive features as a dynamic approach. Static approaches for local job scheduling assign resources exclusively and non-preemptively; such approaches still fall short of desirable resource utilization and average response times. Conversely, approaches for dynamic scheduling of jobs can preempt resources and/or adapt their allocation, and typically provide better resource utilization and response times. Existing dynamic approaches are full preemption with checkpointing, dynamic adaptation of node/CPU allocation, and time sharing via gang or loosely coordinated coscheduling. This survey presents and compares the different approaches, focusing particularly on the less well-explored loosely coordinated time sharing. The discussion concentrates on implementation problems, in terms of modification of standard operating systems, the runtime system and the communication libraries. Copyright © 2005 John Wiley & Sons, Ltd.

Article ID: CPE888

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Neuroscience instrumentation and distributed analysis of brain activity data: a case for eScience on global Grids
Volume ID 17
Issue ID 15
Date 12 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.888
Article ID CPE888
Author Name(s) Rajkumar Buyya1Susumu Date2Yuko Mizuno-Matsumoto3Srikumar Venugopal4David Abramson5
Author Email(s) raj@cs.mu.oz.au1
Affiliation(s) Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Australia 1Graduate School of Information Science and Technology, Department of Bioinformatics Engineering, Osaka University, Japan 2Department of Information Systems Engineering, Graduate School of Osaka University, Japan 3 4 School of Computer Science and Software Engineering, Monash University, Melbourne, Australia 5
Keyword(s) Grid computing, eScience, neuroscience, Gridbus middleware, Nimrod-G, virtual instrumentation,
Abstract

Article ID: CPE918

Publisher John Wiley & Sons, Ltd. Chichester, UK
Category Research Article
Article Title Middleware benchmarking: approaches, results, experiences
Volume ID 17
Issue ID 15
Date 12 25 2005
DOI(URI) http://dx.doi.org/10.1002/cpe.918
Article ID CPE918
Author Name(s) Paul Brebner1Emmanuel Cecchet2Julie Marguerite3Petr Tůma4Octavian Ciuhandu5Bruno Dufour6Lieven Eeckhout7Stéphane Frénot8Arvind S. Krishna9John Murphy10Clark Verbrugge11
Author Email(s) paul.brebner@csiro.au1
Affiliation(s) CSIRO ICT Centre, G.P.O. Box 664, Canberra, ACT 2601, Australia 1INRIA Rhône-Alpes/ObjectWeb, 655 avenue de l'Europe, 38330 Montbonnot, St. Martin, France 2INRIA Rhône-Alpes/ObjectWeb, 655 avenue de l'Europe, 38330 Montbonnot, St. Martin, France 3Department of Software Engineering, Charles University, Malostranské náměstí 25, Praha 1, Czech Republic 4Performance Engineering Laboratory, Dublin City University, Dublin 9, Ireland 5School of Computer Science, McGill University, Montréal, Québec, Canada H3A 2A7 6Department of Electronics and Information Systems (ELIS), Ghent University, St. Pietersnieuwstraat 41, B9000 Gent, Belgium 7INRIA Arès, INSA-CITI Bat. Léonard de Vinci, 69661 Villeurbanne Cedex, France 8Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235, U.S.A. 9 10 11
Keyword(s) middleware benchmarking, middleware performance, middleware scalability, middleware evaluation, middleware benchmark design, benchmarking guidelines, benchmarking measurement, benchmarking metrics, OOPSLA 2003 workshop,
Abstract
This report summarizes the results of the Workshop on Middleware Benchmarking held during OOPSLA 2003. The goal of the workshop was to help advance the current practice of gathering performance characteristics of middleware implementations through benchmarking. The participants of the workshop focused on identifying requirements of and obstacles to middleware benchmarking, and on forming a position on the related issues. Selected requirements and obstacles are presented, together with guidelines to adhere to when benchmarking, open issues of current practice, and perspectives on further research. Copyright © 2005 John Wiley & Sons, Ltd.