CPE871:Autonomic oil reservoir optimization on the Grid
- Author(s):Manish Parashar,Vincent Matossian,Viraj Bhat ,Magorzata Peszyska ,Mrinal Sen ,Paul Stoffa ,Mary F. Wheeler
- The Applied Software
Systems Laboratory, Rutgers University, Piscataway,
NJ 08855, U.S.A.,Department of
Mathematics, Oregon State University, Corvallis, OR
97331, U.S.A.,Institute for
Geophysics, University of Texas at Austin, Austin,
TX 78759, U.S.A.,Center for Subsurface
Modeling, University of Texas at Austin, Austin, TX
78712, U.S.A.
http://dx.doi.org/10.1002/cpe.871
CCPE Title/Author Summary
CCPE Metadata
Abstract:The emerging Grid infrastructure and
its support for seamless and secure interactions is enabling
a new generation of autonomic applications where the
application components, Grid services, resources, and data
interact as peers to manage, adapt and optimize themselves
and the overall application. In this paper we describe the
design, development and operation of a prototype of such an
application that uses peer-to-peer interactions between
distributed services and data on the Grid to enable the
autonomic optimization of an oil reservoir. Copyright
© 2005 John Wiley & Sons, Ltd.
CPE877:A role for Pareto optimality in mining performance data
- Author(s):Joël M. Malard
- Pacific Northwest National Laboratory, Battelle Boulevard, P.O. Box 999, Richland, WA 99352, U.S.A.
http://dx.doi.org/10.1002/cpe.877
CCPE Title/Author Summary
CCPE Metadata
Abstract:
Improvements in performance modeling and identification of computational regimes within software libraries is a critical first step in developing software libraries that are truly agile with respect to the application as well as to the hardware. It is shown here that Pareto ranking, a concept from multi-objective optimization, can be an effective tool for mining large performance datasets. The approach is illustrated using software performance data gathered using both the public domain LAPACK library and an asynchronous communication library based on IBM LAPI active message library. Copyright © 2005 John Wiley & Sons, Ltd.
CPE883:Solving the block-Toeplitz least-squares problem in parallel
- Author(s):P. Alonso,J. M. Badía,A. M. Vidal
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46071 Valencia, Spain,Departamento de Ingenierá y Ciencia de los Computadores, Universidad Jaume I, 12080 Castellón, Spain,
- http://dx.doi.org/10.1002/cpe.883
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
In this paper we present two versions of a parallel algorithm to solve the block-Toeplitz least-squares problem on distributed-memory architectures.
We derive a parallel algorithm based on the seminormal equations arising from the triangular decomposition of the product
TTT. Our parallel algorithm exploits the displacement structure of the Toeplitz-like matrices
using the Generalized Schur Algorithm to obtain the solution in O(mn)
flops instead of O(mn2) flops of the algorithms for non-structured matrices.
The strong regularity of the previous product of matrices and an appropriate computation of the hyperbolic
rotations improve the stability of the algorithms. We have reduced the communication cost of previous versions,
and have also reduced the memory access cost by appropriately arranging the elements of the matrices.
Furthermore, the second version of the algorithm has a very low spatial cost, because it does not store
the triangular factor of the decomposition. The experimental results show a good scalability of the parallel
algorithm on two different clusters of personal computers. Copyright © 2005 John Wiley & Sons, Ltd.
CPE884:Performance evaluation of the SX-6 vector architecture for scientific computations
- Author(s):Leonid Oliker,Andrew Canning,Jonathan Carter,John Shalf,David Skinner,Stéphane Ethier,Rupak Biswas,Jahed Djomehri,Rob Van der Wijngaart
- CRD/NERSC, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, U.S.A.,Princeton Plasma Physics Laboratory, Princeton University, Princeton, NJ 08453, U.S.A.,NAS Division, NASA Ames Research Center, Moffett Field, CA 94035, U.S.A.,
- http://dx.doi.org/10.1002/cpe.884
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor, and compares it against the cache-based IBM Power3 and Power4 superscalar architectures, across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall results demonstrate that the SX-6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache-based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively. Copyright © 2005 John Wiley & Sons, Ltd.
CPE922:Special Issue: Grid Performance
http://dx.doi.org/10.1002/cpe.922
CCPE Title/Author Summary
CCPE Metadata
Abstract:.
CPE923:Performance engineering in data Grids
- Author(s):Erwin Laure,Heinz Stockinger,Kurt Stockinger
- CERN, European Organization for Nuclear Research, Geneva, Switzerland,
- http://dx.doi.org/10.1002/cpe.923
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
The vision of Grid computing is to facilitate worldwide resource sharing among distributed collaborations. With the help of numerous national and international Grid projects, this vision is becoming reality and Grid systems are attracting an ever increasing user base. However, Grids are still quite complex software systems whose efficient use is a difficult and error-prone task. In this paper we present performance engineering techniques that aim to facilitate an efficient use of Grid systems, in particular systems that deal with the management of large-scale data sets in the tera- and petabyte range (also referred to as data Grids). These techniques are applicable at different layers of a Grid architecture and we discuss the tools required at each of these layers to implement them. Having discussed important performance engineering techniques, we investigate how major Grid projects deal with performance issues particularly related to data Grids and how they implement the techniques presented. Copyright © 2005 John Wiley & Sons, Ltd.
CPE924:Performance of a possible Grid message infrastructure
- Author(s):Shrideep Pallickara,Geoffrey Fox,Ahmet Uyar,Hongbin Liu,Xi Rao,David Walker,Beytullah Yildiz
- Community Grid Laboratory, Indiana University, Lindley Hall 215, Bloomington, IN 47405, U.S.A.,Department of Computer Science, Cardiff University, U.K.,
- http://dx.doi.org/10.1002/cpe.924
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
In this paper we present the results pertaining to the NaradaBrokering middleware infrastructure. NaradaBrokering is designed to run on a large network of cooperating broker nodes. NaradaBrokering capabilities include, among other things, support for a wide variety of transport protocols, Java Message Service compliance, support for routing JXTA interactions, support for audio/video conferencing applications and, finally, support for multiple constraint specification formats such as XPath, SQL and regular expression queries. This paper demonstrates the suitability of NaradaBrokering to a wide variety of applications and scenarios. Copyright © 2005 John Wiley & Sons, Ltd.
CPE925:Performance-based middleware for Grid computing
- Author(s):G. R. Nudd,S. A. Jarvis
- High Performance Systems Group, Department of Computer Science, University of Warwick, Coventry CV4 7AL, U.K.,
- http://dx.doi.org/10.1002/cpe.925
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
This paper describes a stateful service-oriented middleware infrastructure for the management of scientific tasks running on multi-domain heterogeneous distributed architectures. Allocating scientific workload across multiple administrative boundaries is a key issue in Grid computing and as a result a number of supporting services including match-making, scheduling and staging have been developed. Each of these services allows the scientist to utilize the available resources, although a sustainable level of service in such shared environments cannot always be guaranteed. A performance-based middleware infrastructure is described in which prediction data for each scientific task are calculated, stored and published through a Globus-based performance information service. Distributing these data allows additional performance-based middleware services to be built, two of which are described in this paper: an intra-domain predictive co-scheduler and a multi-domain workload steering system. These additional facilities significantly improve the ability of the system to meet task deadlines, as well as enhancing inter-domain load-balancing and system-wide resource utilization. Copyright © 2005 John Wiley & Sons, Ltd.
CPE926:Performance control of scientific coupled models in Grid environments
- Author(s):Christopher Armstrong,Rupert W. Ford,John R. Gurd,Mikel Luján,Kenneth R. Mayes,Graham D. Riley
- Centre for Novel Computing, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.,
- http://dx.doi.org/10.1002/cpe.926
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
In recent years, there has been increasing interest in the development of computer simulations of complex biological systems, and of multi-physics and multi-scale physical phenomena. Applications have been developed that involve the coupling together of separate executable models of individual systems, where these models may have been developed in isolation. A lightweight yet general solution is required to problems of linking coupled models, and of handling the incompatibilities between interacting models that arise from their diverse origins and natures. Many such models require high-performance computers to provide acceptable execution times, and there is increasing interest in utilizing Grid technologies. However, Grid applications need the ability to cope with heterogeneous and dynamically changing execution environments, particularly where run-time changes can affect application performance. A general coupling framework (GCF) is described that allows the construction of flexible coupled models. This approach results in a component-based implementation of a coupled model application. A semi-formal presentation of GCF is given. Components under GCF are separately deployable and coupled by simple data flows, making them appropriate structures for dynamic execution platforms such as the Grid. The design and initial implementation of a performance control system (PERCO) is reviewed. PERCO acts by redeploying components, and is thus appropriate for controlling GCF coupled model applications. Redeployment decisions in PERCO require performance prediction capabilities. A proof-of-concept performance prediction algorithm is presented, based on the descriptions of GCF and PERCO. Copyright © 2005 John Wiley & Sons, Ltd.
CPE928:The role of performance engineering techniques in the context of the Grid
- Author(s):A. J. G. Hey,J. Papay,M. Surridge
- EPSRC, Polaris House, North Star Avenue, Swindon SN2 1ET, U.K.,ECS, University of Southampton, Southampton SO17 1BJ, U.K.,IT Innovation Centre, 2 Venture Road, Chilworth Science Park, Southampton SO16 7NP, U.K.
http://dx.doi.org/10.1002/cpe.928
CCPE Title/Author Summary
CCPE Metadata
Abstract:
Performance engineering can be described as a collection of techniques and methodologies whose aim is to provide reliable prediction, measurement and validation of the performance of applications on a variety of computing platforms. This paper reviews techniques for performance estimation and performance engineering developed at the University of Southampton and presents application case studies in task scheduling for engineering meta-applications, and capacity engineering for a financial transaction processing system. These show that it is important to describe performance in terms of a resource model, and that the choice of models may have to trade accuracy for utility in addressing the operational issues. We then present work from the on-going EU funded Grid project GRIA, and show how lessons learned from the earlier work have been applied to support a viable business model for Grid service delivery to a specified quality of service level. The key in this case is to accept the limitations of performance estimation methods, and design business models that take these limitations into account rather than attempting to provide hard guarantees over performance. We conclude by identifying some of the key lessons learned in the course of our work over many years and suggest possible directions for future investigations. Copyright © 2005 John Wiley & Sons, Ltd.
CPE929:ASKALON: a tool set for cluster and Grid computing
- Author(s):Thomas Fahringer,Alexandru Jugravu,Sabri Pllana,Radu Prodan,Clovis Seragiotto,Hong-Linh Truong
- Institute for Computer Science, University of Innsbruck, Technikerstrasse 25/7, A-6020 Innsbruck, Austria,Institute for Software Science, University of Vienna, Liechtensteinstrasse 22, A-1090 Vienna, Austria,
- http://dx.doi.org/10.1002/cpe.929
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Performance engineering of parallel and distributed applications is a complex task that iterates through various phases, ranging from modeling and prediction, to performance measurement, experiment management, data collection, and bottleneck analysis. There is no evidence so far that all of these phases should/can be integrated into a single monolithic tool. Moreover, the emergence of computational Grids as a common single wide-area platform for high-performance computing raises the idea to provide tools as interacting Grid services that share resources, support interoperability among different users and tools, and, most importantly, provide omnipresent services over the Grid. We have developed the ASKALON tool set to support performance-oriented development of parallel and distributed (Grid) applications. ASKALON comprises four tools, coherently integrated into a service-oriented architecture. SCALEA is a performance instrumentation, measurement, and analysis tool of parallel and distributed applications. ZENTURIO is a general purpose experiment management tool with advanced support for multi-experiment performance analysis and parameter studies. AKSUM provides semi-automatic high-level performance bottleneck detection through a special-purpose performance property specification language. The PerformanceProphet enables the user to model and predict the performance of parallel applications at the early stages of development. In this paper we describe the overall architecture of the ASKALON tool set and outline the basic functionality of the four constituent tools. The structure of each tool is based on the composition and sharing of remote Grid services, thus enabling tool interoperability. In addition, a data repository allows the tools to share the common application performance and output data that have been derived by the individual tools. A service repository is used to store common portable Grid service implementations. A general-purpose Factory service is employed to create service instances on arbitrary remote Grid sites. Discovering and dynamically binding to existing remote services is achieved through registry services. The ASKALON visualization diagrams support both online and post-mortem visualization of performance and output data. We demonstrate the usefulness and effectiveness of ASKALON by applying the tools to real-world applications. Copyright © 2005 John Wiley & Sons, Ltd.
CPE930:Automatic performance analysis tools for the Grid
- Author(s):M. Gerndt
- Institut für Informatik X, Technische Universität München, 85748 Garching, Germany
http://dx.doi.org/10.1002/cpe.930
CCPE Title/Author Summary
CCPE Metadata
Abstract:
Applications on Grids require scalable and online performance analysis tools. The execution environment of such applications includes a large number of processors. In addition, some of the resources such as the network will be shared with other applications. This requires applications to adapt dynamically to resource changes. The article presents the requirements of Grid application classes for performance analysis tools. It introduces a new analysis environment currently being developed for teraflop computers within the Peridot project at Technische Universität München. This environment applies a distributed automatic performance analysis approach. It is based on a formal specification of performance properties in the APART specification language. It uses a hierarchy of analysis agents that obtain performance data from a configurable monitoring system. This scalable design allows performance analysis for large Grid applications. Copyright © 2005 John Wiley & Sons, Ltd.
CPE938:Distributed computing in practice: the Condor experience
- Author(s):Douglas Thain,Todd Tannenbaum,Miron Livny
- Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, U.S.A.,
- http://dx.doi.org/10.1002/cpe.938
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational Grid. In this paper, we provide the history and philosophy of the Condor project and describe how it has interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the Condor system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course travelled by research ideas as they grow into production systems. Copyright © 2005 John Wiley & Sons, Ltd.
CPE939:The design and implementation of Grid database services in OGSA-DAI
- Author(s):Mario Antonioletti,Malcolm Atkinson,Rob Baxter,Andrew Borley,Neil P. Chue Hong,Brian Collins,Neil Hardman,Alastair C. Hume,Alan Knox,Mike Jackson,Amy Krause,Simon Laws,James Magowan,Norman W. Paton,Dave Pearson,Tom Sugden,Paul Watson,Martin Westhead
- EPCC, University of Edinburgh, James Clerk Maxwell Building, Mayfield Road, Edinburgh EH9 3JZ, U.K.,National e-Science Centre, 15 South College Street, Edinburgh EH8 9AA, U.K.,IBM United Kingdom Ltd, Hursley Park, Winchester SO21 2JN, U.K.,Department of Computer Science, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.,Oracle UK Ltd, Thames Valley Park, Reading RG6 1RA, U.K.,School of Computing Science, University of Newcastle, Newcastle-upon-Tyne NE1 7RU, U.K.,
- http://dx.doi.org/10.1002/cpe.939
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
http://www.ogsadai.org.uk
CPE835:Adding tuples to Java: a study in lightweight data structures
CPE836:Object combining: a new aggressive optimization
for object intensive programs
- Author(s):Ronald Veldema,Ceriel J. H. Jacobs,Rutger F. H. Hofman,Henri E. Bal
- Department of
Mathematics and Computer Science, Vrije
Universiteit, Amsterdam, The Netherlands,
- http://dx.doi.org/10.1002/cpe.836
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Object combining tries to put objects
together that have roughly the same life times in order to
reduce strain on the memory manager and to reduce the number
of pointer indirections during a program's execution.
Object combining works by appending the fields of one object
to another, allowing allocation and freeing of multiple
objects with a single heap (de)allocation. Unlike object
inlining, which will only optimize objects where one has a
(unique) pointer to another, our optimization also works if
there is no such relation. Object inlining also directly
replaces the pointer by the inlined object's fields.
Object combining leaves the pointer in place to allow more
combining. Elimination of the pointer accesses is
implemented in a separate compiler optimization pass. Unlike
previous object inlining systems, reference field overwrites
are allowed and handled, resulting in much more aggressive
optimization. Our object combining heuristics also allow
unrelated objects to be combined, for example, those
allocated inside a loop; recursive data structures (linked
lists, trees) can be allocated several at a time and objects
that are always used together can be combined. As Java
explicitly permits code to be loaded at runtime and allows
the new code to contribute to a running computation, we do
not require a closed-world assumption to enable these
optimizations (but it will increase performance). The main
focus of object combining in this paper is on reducing
object (de)allocation overhead, by reducing both garbage
collection work and the number of object allocations.
Reduction of memory management overhead causes execution
time to be reduced by up to 35%. Indirection removal further
reduces execution time by up to 6%. Copyright ©
2005 John Wiley & Sons, Ltd.
CPE837:UMM: an operational memory model specification
framework with integrated model checking capability
- Author(s):Yue Yang,Ganesh Gopalakrishnan,Gary Lindstrom
- School of Computing,
University of Utah, Salt Lake City, UT 84112, U.S.A.,
- http://dx.doi.org/10.1002/cpe.837
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Given the complicated nature of modern
shared memory systems, it is vital to have a systematic
approach to specifying and analyzing memory consistency
requirements. In this paper, we present the UMM
specification framework, which integrates two key features
to support memory model verification: (i) it employs a
simple and generic memory abstraction that can capture a
large collection of memory models as guarded commands with a
uniform notation, and (ii) it provides built-in model
checking capability to enable formal reasoning about thread
behaviors. Using this framework, memory models can be
specified in a parameterized style - designers can simply
redefine a few bypassing rules and visibility ordering rules
to obtain an executable specification of another memory
model. We formalize several classical memory models,
including Sequential Consistency, Coherence, and PRAM, to
illustrate the general techniques of applying this
framework. We then provide an alternative specification of
the Java memory model, based on a proposal from Manson and
Pugh, and demonstrate how to analyze Java thread semantics
using model checking. We also compare our operational
specification style with axiomatic specification styles and
explore a mechanism that converts a memory model definition
from one style to the other.Copyright © 2005
John Wiley & Sons, Ltd.
CPE843:Ravenscar-Java: a high-integrity profile for
real-time Java
- Author(s):Jagun Kwon,Andy Wellings,Steve King
- Real-Time Systems
Research Group, Department of Computer Science,
University of York, Heslington, York YO10 5DD, U.K.,
- http://dx.doi.org/10.1002/cpe.843
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:For many, Java is the antithesis of a
high-integrity programming language. Its combination of
object-oriented programming features, its automatic garbage
collection, and its poor support for real-time
multi-threading are all seen as particular impediments. The
Real-Time Specification for Java has introduced many new
features that help in the real-time domain. However, the
expressive power of these features means that very complex
programming models can be created, necessitating complexity
in the supporting real-time virtual machine. Consequently,
Java, with the real-time extensions as they stand, seems too
complex for confident use in high-integrity systems. This
paper presents a Java profile for the development of
software-intensive high-integrity real-time systems. This
restricted programming model removes language features with
high overheads and complex semantics, on which it is hard to
perform timing and functional analyses. The profile fits
within the J2ME framework and is consistent with well-known
guidelines for high-integrity software development, such as
those defined by the U.S. Nuclear Regulatory
Commission.Copyright © 2005 John Wiley &
Sons, Ltd.
CPE845:Improving the memory management performance of RTSJ
- Author(s):M. Teresa Higuera-Toledano,Valerie Issarny
- Universidad Complutense
de Madrid, Ciudad Universitaria, Madrid 28040, Spain,INRIA-Rocquencourt,
Domaine de Voluceau, BP 105, 78153 Le Chesnay Cedex, France
http://dx.doi.org/10.1002/cpe.845
CCPE Title/Author Summary
CCPE Metadata
Abstract:From a real-time perspective, the
garbage collector (GC) introduces unpredictable pauses that
are not tolerated by real-time tasks. Real-time collectors
eliminate this problem but introduce a high overhead.
Another approach is to use memory regions (MRs) within which
allocation and deallocation is customized. This facility is
supported by the memory model of the Real-Time Specification
for Java (RTSJ). RTSJ imposes strict access and assignment
rules to avoid both the dangling inter-region references and
the delays of critical tasks of the GC. A dynamic check
solution can incur high overhead, which can be reduced by
taking advantage of hardware features. This paper provides
an in-depth analytical investigation of the overhead
introduced by dynamic assignments checks in RTSJ, describing
and analysing several solutions to reduce the introduced
overhead. Copyright ? 2005 John Wiley © Sons, Ltd.
CPE847:Elimination of Java array bounds checks in the
presence of indirection
- Author(s):Mikel Luj?n,John R. Gurd,T. L. Freeman,Jos? Miguel
- Centre for Novel
Computing, University of Manchester, Oxford Road,
Manchester M13 9PL, U.K.,Departamento de
Arquitectura y Tecnolog?a de Computadores, UPV-EHU,
P.O. Box 649, 20080 Donostia-San Sebastian, Spain
http://dx.doi.org/10.1002/cpe.847
CCPE Title/Author Summary
CCPE Metadata
Abstract:The Java language specification states
that every access to an array needs to be within the bounds
of that array, i.e. between 0 and array length - 1.
Different techniques for different programming languages
have been proposed to eliminate explicit bounds checks. Some
of these techniques are implemented in off-the-shelf Java
Virtual Machines (JVMs). The underlying principle of these
techniques is that bounds checks can be removed when a
JVM/compiler has enough information to guarantee that a
sequence of accesses (e.g. inside a for-loop) is safe
(within the bounds). Most of the techniques for the
elimination of array bounds checks have been developed for
programming languages that do not support multi-threading
and/or enable dynamic class loading. These two
characteristics make most of these techniques unsuitable for
Java. Techniques developed specifically for Java have not
addressed the elimination of array bounds checks in the
presence of indirection; that is, when the index is stored
in another array (indirection array). With the objective of
optimizing applications with array indirection, this paper
proposes and evaluates three implementation strategies, each
implemented as a Java class. The classes provide the
functionality of Java arrays of type int so that objects of
the classes can be used instead of indirection arrays. Each
strategy enables JVMs, when examining only one of these
classes at a time, to obtain enough information to remove
array bounds checks.Copyright © 2005 John Wiley
& Sons, Ltd.
CPE848:Run-time evaluation of opportunities for object
inlining in Java
- Author(s):Ondrej Lhot?k,Laurie Hendren
- Sable Research Group,
School of Computer Science, McGill University,
Montreal, Canada,
- http://dx.doi.org/10.1002/cpe.848
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Object-oriented languages, such as
Java, encourage the use of many small objects linked
together by field references, instead of a few monolithic
structures. While this practice is beneficial from a program
design perspective, it can slow down program execution by
incurring many pointer indirections. One solution to this
problem is object inlining: when the compiler can safely do
so, it fuses small objects together, thus removing the
reads/writes to the removed field, saving the memory needed
to store the field and object header, and reducing the
number of object allocations. The objective of this paper is
to measure the potential for object inlining by studying the
run-time behaviour of a comprehensive set of Java programs.
We study the traces of program executions in order to
determine which fields behave like inlinable fields. Since
we are using dynamic information instead of a static
analysis, our results give an upper bound on what could be
achieved via a static compiler-based approach. Our
experimental results measure the potential improvements
attainable with object inlining, including reductions in the
numbers of field reads and writes, and reduced memory usage.
Our study shows that some Java programs can benefit
significantly from object inlining, with close to a 10%
speedup. Somewhat to our surprise, our study found one case,
the db benchmark, where the most important inlinable field
was the result of unusual program design, and fixing this
small flaw led to both better performance and clearer
program design. However, the opportunities for object
inlining are highly dependent on the individual program
being considered, and are in many cases very limited.
Furthermore, fields that are inlinable also have properties
that make them potential candidates for other optimizations
such as removing redundant memory accesses. The memory
savings possible through object inlining are moderate.
Copyright © 2005 John Wiley & Sons, Ltd.
CPE849:Jeeg: temporal constraints for the synchronization of concurrent objects
- Author(s):Giuseppe Milicia,Vladimiro Sassone
- Basic Research in Computer Science (BRICS), University of Aarhus, IT-parken Aabogade 34, DK-8200 Aarhus, Denmark,Informatics, University of Sussex, Brighton BN1 9QH, U.K.
http://dx.doi.org/10.1002/cpe.849
CCPE Title/Author Summary
CCPE Metadata
Abstract:
We introduce Jeeg, a dialect of Java based on a declarative replacement of the synchronization mechanisms of Java that results in a complete decoupling of the ‘business’ and the ‘synchronization’ code of classes. Synchronization constraints in Jeeg are expressed in a linear temporal logic, which allows one to effectively limit the occurrence of the inheritance anomaly that commonly affects concurrent object-oriented languages. Jeeg is inspired by the current trend in aspect-oriented languages. In a Jeeg program the sequential and concurrent aspects of object behaviors are decoupled: specified separately by the programmer, these are then weaved together by the Jeeg compiler. Copyright © 2005 John Wiley & Sons, Ltd.
CPE850:Compiling almost-whole Java programs
- Author(s):Zoran Budimlic,Ken Kennedy
- Center for High
Performance Software Research, Rice University,
Houston, TX 77005, U.S.A.,
- http://dx.doi.org/10.1002/cpe.850
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:This paper presents a strategy, called
almost-whole-program compilation, for extending the benefits
of whole-program optimization to large collections of Java
components that are packaged as a group after the
development phase. This strategy has been implemented in a
framework that uses Java visibility and scoping rules to
transform a collection of classes into a package that is
amenable to whole-program optimizations, without precluding
extensions to the optimized and compiled code. Thus, it
enables the Java developer to balance performance against
flexibility of the program after the development phase,
without compromising the design process. The transformation
is shown to incur only modest performance penalties, which
are more than compensated for by the interprocedural
optimizations it enables. The paper concludes with
experimental results showing the benefits that can be
achieved using this approach. Copyright © 2005 John
Wiley & Sons, Ltd.
CPE851:Recurrence analysis for effective array prefetching in Java
- Author(s):Brendon Cahoon,Kathryn S. McKinley
- Department of Computer Sciences, University of Texas at Austin, Austin, TX 78757, U.S.A.,
- http://dx.doi.org/10.1002/cpe.851
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Java is an attractive choice for numerical, as well as other, algorithms due to the software engineering benefits of object-oriented programming. Because numerical programs often use large arrays that do not fit in the cache, they suffer from poor memory performance. To hide memory latency, we describe a new unified compile-time analysis for software prefetching arrays and linked structures in Java. Our previous work used data-flow analysis to discover linked data structure accesses. We generalize our prior approach to identify loop induction variables as well, which we call recurrence analysis. Our algorithm schedules prefetches for all array references that contain induction variables. We evaluate our technique using a simulator of an out-of-order superscalar processor running a set of array-based Java programs. Across all of our programs, prefetching reduces execution time by a geometric mean of 23%, and the largest improvement is 58%. We also evaluate prefetching on a PowerPC processor, and we show that prefetching reduces execution time by a geometric mean of 17%. Because our analysis is much simpler and quicker than previous techniques, it is suitable for including in a just-in-time compiler. Traditional software prefetching algorithms for C and Fortran use locality analysis and sophisticated loop transformations. We further show that the additional loop transformations and careful scheduling of prefetches from previous work are not always necessary for modern architectures and Java programs. Copyright © 2005 John Wiley & Sons, Ltd.
CPE852:The Open Runtime Platform: a flexible high-performance managed runtime environment An earlier version of this paper was published online 1
- Author(s):Michal Cierniak,Marsha Eng,Neal Glew,Brian Lewis,James Stichnoth
- Microsoft Corporation, 1 Microsoft Way, Redmond, WA 98052, U.S.A.,Microprocessor Technology Laboratory, Intel Corporation, 2200 Mission College Boulevard, Santa Clara, CA, U.S.A.,
- http://dx.doi.org/10.1002/cpe.852
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
The Open Runtime Platform (ORP) is a high-performance managed runtime environment (MRTE) that features exact generational garbage collection, fast thread synchronization, and multiple coexisting just-in-time compilers (JITs). ORP was designed for flexibility in order to support experiments in dynamic compilation, garbage collection, synchronization, and other technologies. It can be built to run either Java or Common Language Infrastructure (CLI) applications, to run under the Windows or Linux operating systems, and to run on the IA-32 or Itanium processor family (IPF) architectures. Achieving high performance in a MRTE presents many challenges, particularly when flexibility is a major goal. First, to enable the use of different garbage collectors and JITs, each component must be isolated from the rest of the environment through a well-defined software interface. Without careful attention, this isolation could easily harm performance. Second, MRTEs have correctness and safety requirements that traditional languages such as C++ lack. These requirements, including null pointer checks, array bounds checks, and type checks, impose additional runtime overhead. Finally, the dynamic nature of MRTEs makes some traditional compiler optimizations, such as devirtualization of method calls, more difficult to implement or more limited in applicability. To get full performance, JITs and the core virtual machine (VM) must cooperate to reduce or eliminate (where possible) these MRTE-specific overheads. In this paper, we describe the structure of ORP in detail, paying particular attention to how it supports flexibility while preserving high performance. We describe the interfaces between the garbage collector, the JIT, and the core VM; how these interfaces enable multiple garbage collectors and JITs without sacrificing performance; and how they allow the JIT and the core VM to reduce or eliminate MRTE-specific performance issues. Copyright © 2005 John Wiley & Sons, Ltd.
CPE853:Immutability specification and its applications
- Author(s):Igor Pechtchanski ,Vivek Sarkar
- BM T. J. Watson
Research Center, P.O. Box 704, Yorktown Heights, NY
10598, U.S.A,
- http://dx.doi.org/10.1002/cpe.853
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:A location is said to be immutable if
its value and the values of selected locations reachable
from it are guaranteed to remain unchanged during a
specified time interval. We introduce a framework for
immutability specification, and discuss its application to
code optimization. Compared with a final declaration, an
immutability assertion in our framework can express a richer
set of immutability properties along three dimensions -
lifetime, reachability and context. We present a framework
for processing and verifying immutability annotations in
Java, as well as extending optimizations so as to exploit
immutability information. Preliminary experimental results
show that a significant number (61%) of read accesses could
potentially be classified as immutable in our framework.
Further, use of immutability information yields substantial
reductions (33-99%) in the number of dynamic read accesses,
and also measurable speedups in the range of 5-10% for
certain benchmark programs.Copyright © 2005 John
Wiley & Sons, Ltd.
CPE858:Experience in integrating Java with C# and .NET
- Author(s):Judith Bishop,R. Nigel Horspool,Basil Worrall
- Computer Science Department, University of Pretoria, Pretoria 0002, South Africa,Computer Science Department, University of Victoria, Victoria, BC, Canada V8W 3P6,
- http://dx.doi.org/10.1002/cpe.858
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Java programmers cannot help but be aware of the advent of C#, the .NET network environment, and a host of new supporting technologies, such as Web services. Before taking the big step of moving all development to a new environment, programmers will want to know what are the advantages of C# as a language over Java, and whether the new and interesting features of C# and .NET can be incorporated into existing Java software. This paper surveys the advantages of C# and then presents and evaluates experience with connecting it to Java in a variety of ways. The first way provides evidence that Java can be linked to C# at the native code level, albeit through C++ wrappers. The second is a means for retaining the useful applet feature of Java in the server-side architecture of Web services written in C#. The third is by providing a common XML-based class for the development of graphical user interfaces (GUIs), which can be incorporated into Java or C#. An added advantage of this system, called Views, is that it can run independently of the resource-intensive development environment that would otherwise be needed for using C#. A major advantage of the methods described in this paper is that in all cases the Java program is not affected by the fact that it is interfacing with C#. The paper concludes that there are many common shared technologies that bring Java and C# close together, and that innovative ways of using others can open up opportunities not hitherto imagined. Copyright © 2005 John Wiley & Sons, Ltd.
CPE933:Special Issue: ACM 2002 Java Grande-ISCOPE Conference
http://dx.doi.org/10.1002/cpe.933
CCPE Title/Author Summary
CCPE Metadata
Abstract:No abstract
CPE838:Support and optimization of Java RMI over a
Bluetooth environment
- Author(s):Pu-Chen Wei,Chung-Hsin Chen,Cheng-Wei Chen,Jenq-Kuen Lee
- Department of Computer
Science, National Tsing Hua University, HsinChu
30055, Taiwan,
- http://dx.doi.org/10.1002/cpe.838
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Distributed object-oriented platforms
are increasingly important over wireless environments for
providing frameworks for collaborative computations and for
managing a large pool of distributed resources. Due to
limited bandwidths and heterogeneous architectures of
wireless devices, studies are needed into supporting
object-oriented frameworks over heterogeneous wireless
environments and optimizing system performance. In our
research work, we are working towards efficiently supporting
object-oriented environments over heterogeneous wireless
environments. In this paper, we report the issues and our
research results related to the efficient support of Java
RMI over a Bluetooth environment. In our work, we first
implement support for Java RMI over Bluetooth protocol
stacks, by incorporating a set of protocol stack layers for
Bluetooth developed by us (which we call JavaBT) and by
supporting the L2CAP layer with sockets that support the RMI
socket. In addition, we model the cost for the access
patterns of Java RMI communications. This cost model is used
to guide the formation and optimizations of the scatternets
of a Java RMI Bluetooth environment. In our approach, we
employ the well-known BTCP algorithm to observe initial
configurations for the number of piconets. Using the
communication-access cost as a criterion, we then employ a
spectral-bisection method to cluster the nodes in a piconet
and then use a bipartite matching scheme to form the
scatternet. Experimental results with the prototypes of Java
RMI support over a Bluetooth environment show that our
scatternet-formation algorithm incorporating an access-cost
model can further optimize the performances of such as
system.Copyright © 2005 John Wiley & Sons, Ltd.
CPE839:CSFS: a Java enabled network file storage system
- Author(s):Han Hua,Dai Yafei,Li Xiaoming
- Computer Networks and
Distributed Systems Laboratory, Department of
Computer Science and Technology, Peking University,
Beijing 100871, People's Republic of China,
- http://dx.doi.org/10.1002/cpe.839
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:The CSFS (cryptographic storage file
system) is a network file storage system that is suitable
for a small-to-medium sized networks such as a campus
network. In a CSFS, a lot of distributed file servers are
organized in a star-liked architecture. File names are
stored in a name server and file data are stored in
distributed file servers. A CSFS has good scalability and is
able to accommodate hundreds of file servers. We implemented
a CSFS in pure Java and tested the system with a benchmark.
The test results show that CSFS delivers acceptable
performance for general file operations. Its performance of
file upload and download is as efficient as FTP. It can
support more than 450 concurrent online users.Copyright
© 2005 John Wiley & Sons, Ltd.
CPE840:JavaSymphony: a new programming paradigm to
control and synchronize locality, parallelism and load
balancing for parallel and distributed computing
- Author(s):Thomas Fahringer,Alexandru Jugravu
- Institute for Software
Science, University ofVienna, Liechtensteinstrasse
22, A-1090 Vienna, Austria,
- http://dx.doi.org/10.1002/cpe.840
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:There has been an increasing research
interest in extending the use of Java towards
performance-oriented programming for distributed and
concurrent applications. JavaSymphony is a Java-based
programming paradigm that allows the programmer to control
parallelism, load balancing, and locality at a high level of
abstraction. Objects can be explicitly distributed and
migrated based on a high-level API to static/dynamic system
parameters and dynamic virtual distributed architectures,
which impose a virtual hierarchy on a distributed system of
physical computing nodes. In this paper we describe various
extensions to the original JavaSymphony API, which includes
a generalization of virtual architectures that can be used
to specify and to request arbitrary heterogeneous
distributed and concurrent architectures inside of a
JavaSymphony program. The number of threads that execute an
object's methods can be controlled dynamically through
single- and multi-threaded objects. Conventional Java
objects can be dynamically converted to JavaSymphony
objects. A (un)lock mechanism has been introduced in order
to avoid inconsistent modifications of objects or virtual
architectures. A sophisticated event mechanism for
asynchronous communication, coordination, and interaction is
provided. Several synchronization constructs including
distributed barrier synchronization and synchronization for
asynchronous method invocations have been included. Several
experiments are presented to demonstrate the effectiveness
and efficiency of JavaSymphony.Copyright © 2005
John Wiley & Sons, Ltd.
CPE841:Enterprise JavaBeans caching in clustered environments
- Author(s):Avraham Leff,James T. Rayfield
- IBM T. J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY 10598, U.S.A.,
- http://dx.doi.org/10.1002/cpe.841
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
We present the design of an Enterprise JavaBeans (EJBs) caching architecture, and show that EJB caching can greatly improve application throughput in clustered environments. Throughput is improved because data serving is offloaded from the database server to the cache-enabled application server. EJB caching is successful because it exploits the low-latency connections between the database server and application servers that exist in a clustered environment. An important feature of our architecture is that the caching function is transparent to applications that use it. The cache-enabled application server uses the same (EJB) programming model, and the same transactional semantics, as provided by non-caching architectures. Copyright © 2005 John Wiley & Sons, Ltd.
CPE842:Object-oriented programming in peer-to-peer system
- Author(s):Patrick Th. Eugster,Sebastien Baehni
- Distributed Programming
Laboratory, Swiss Federal Institute of Technology,
Lausanne, Switzerland,
- http://dx.doi.org/10.1002/cpe.842
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Leveraged by the success of
applications aiming at the free sharing of data in the
Internet, the paradigm of peer-to-peer (P2P) computing has
had substantial consideration devoted to it recently. This
paper presents a high-level abstraction for remote object
interaction in a P2P environment, called borrow/lend (BL).
We present the principles underlying our BL abstraction, and
illustrate how this abstraction can be used to program P2P
applications in Java. We contrast our abstraction with
established abstractions for distributed programming such as
the remote method invocation or the tuple space,
illustrating how the BL abstraction, obviously influenced by
such previous abstractions, unifies flavors of these, but
also how it captures the constraints specific to P2P
environments.Copyright © 2005 John Wiley &
Sons, Ltd.
CPE844:Features from functional programming for a C++
skeleton library
- Author(s):J. Striegnitz,H. Kuchen
- Research Center J?lich,
Central Institute for Applied Mathematics, 52425
J?lich, Germany,University of M?nster,
Department of Information Systems, Leonardo Campus
3, 48149 M?nster, Germany
http://dx.doi.org/10.1002/cpe.844
CCPE Title/Author Summary
CCPE Metadata
Abstract:Message passing based on libraries such
as MPI is typically used to program parallel machines with
distributed memory. This is efficient, but error prone.
Algorithmic skeletons are intended to simplify parallel
programming by increasing expressive power. The idea is to
offer typical parallel programming patterns as polymorphic
higher-order functions which are efficiently implemented in
parallel. The present paper describes how C++ templates and
operator overloading can be used in order to provide the
main features needed for algorithmic skeletons, namely
higher-order functions, partial applications and parametric
polymorphism. Experimental results based on a draft
implementation of our C++ skeleton library show that higher
expressive power can be gained without a significant
performance penalty.Copyright © 2005 John Wiley
& Sons, Ltd.
CPE846:A method-level comparison of the Java Grande and SPEC JVM98 benchmark suites
- Author(s):David Gregg,James Power,John Waldron
- Department of Computer Science, Trinity College, Dublin 2, Ireland,Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland,
- http://dx.doi.org/10.1002/cpe.846
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
In this paper we seek to provide a foundation for the study of the level of use of object-oriented techniques in Java programs in general, and scientific applications in particular. Specifically, we investigate the profiles of Java programs from a number of perspectives, including the use of class library methods, the size of methods called, the mode of invoke instruction used and the polymorphicity of call sites. We also present a categorization of the nature of small methods used in Java programs. We compare the Java Grande and SPEC JVM98 benchmark suites, and note a significant difference in the nature and composition of these suites, with the programs from the Java Grande suite demonstrating a less object-oriented approach. Copyright © 2005 John Wiley & Sons, Ltd.
CPE854:JOPI: a Java object-passing interface
- Author(s):Jameela Al-Jaroodi,Nader Mohamed,Hong Jiang,David Swanson
- Department of Electrical and Computer Engineering, Stevens Institute of Technology, Burchard 216, Castle Point on Hudson, Hoboken, NJ 07030, U.S.A.,Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0115, U.S.A.,
- http://dx.doi.org/10.1002/cpe.854
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Recently there has been an increasing interest in developing parallel programming capabilities in Java to harness the vast resources available in clusters, grids and heterogeneous networked systems. In this paper, we introduce a Java object-passing interface (JOPI) library. JOPI provides Java programmers with the necessary functionality to write object-passing parallel programs in distributed heterogeneous systems. JOPI provides a Message Passing Interface (MPI)-like interface that can be used to exchange objects among processes. In addition to the well-known benefits of the object-oriented development model, using objects to exchange information in JOPI is advantageous because it facilitates passing complex structures and enables the programmer to isolate the problem space from the parallelization problem. The run-time environment for JOPI is portable, efficient and provides the necessary functionality to deploy and execute parallel Java programs. Experiments were conducted on a cluster system and a collection of heterogeneous platforms to measure JOPI's performance and compare it with MPI. The results show good performance gains using JOPI. Copyright © 2005 John Wiley & Sons, Ltd.
CPE855:Advanced eager scheduling for Java-based adaptive parallel computing
- Author(s):Michael O. Neary,Peter Cappello
- Department of Computer Science, University of California, Santa Barbara, CA 93106, U.S.A.,
- http://dx.doi.org/10.1002/cpe.855
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Javelin 3 is a software system for developing large-scale, fault-tolerant, adaptively parallel applications. When all or part of their application can be cast as a master-worker or branch-and-bound computation, Javelin 3 frees application developers from concerns about inter-processor communication and fault tolerance among networked hosts, allowing them to focus on the underlying application. The paper describes a fault-tolerant task scheduler and its performance analysis. The task scheduler integrates work stealing with an advanced form of eager scheduling. It enables dynamic task decomposition, which improves host load-balancing in the presence of tasks whose non-uniform computational load is evident only at execution time. Speedup measurements are presented of actual performance on up to 1000 hosts. We analyze the expected performance degradation due to unresponsive hosts, and measure actual performance degradation due to unresponsive hosts. Copyright © 2005 John Wiley & Sons, Ltd.
CPE856:Implementation and performance of a
particle-in-cell code written in Java
- Author(s):G. Lapenta,S. Markidis,W. B. VanderHeyden ,Z. Budimlic
- Theoretical Division,
Los Alamos National Laboratory, Los Alamos, NM
87545, U.S.A.,
- http://dx.doi.org/10.1002/cpe.856
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Plasma simulation is an important
example of a high-performance computing application where
computer science issues are of great relevance. In a plasma,
each particle, electron or ion, interacts with the external
fields and with other particles in ways that can be readily
and effectively emulated using object-oriented programming.
However, the great cost of plasma simulations has
traditionally discouraged object-oriented implementations
due to their perceived inferior performance compared with
classic procedural FORTRAN or C. In the present paper, we
revisit this issue. We have developed a Java
particle-in-cell code for plasma simulation, called Parsek.
The paper considers different choices for the object
orientation and tests their performance. We find that
coarse-grained object orientation is faster and practically
immune from any degradation compared with a standard
procedural implementation (with static classes). The loss in
performance for a fine-grained object orientation is a
factor of about 50%, which can be almost completely
eliminated using advanced Java compilation techniques. The
Java code Parsek also provides an interesting realistic
application of high-performance computing to compare the
performance of Java with FORTRAN. We have conducted a series
of tests considering various Java implementations and
various FORTRAN implementations. We have also considered
different computer architectures and different Java Virtual
Machines and FORTRAN compilers. The conclusion is that with
Parsek, object-oriented Java can reach CPU speed
performances more or less comparable with procedural
FORTRAN. This conclusion is remarkable and it is in
agreement with the most recent benchmarks, but is at
variance with widely held misconceptions about the alleged
slowness of Java.Copyright © 2005 John Wiley &
Sons, Ltd.
CPE857:On the conditions necessary for removing abstraction penalties in OOLALA
- Author(s):Mikel Luján,T. L. Freeman,John R. Gurd
- Centre for Novel Computing, University of Manchester, Oxford Road, Manchester M13 9PL, U.K.,
- http://dx.doi.org/10.1002/cpe.857
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
OOLALA is an object-oriented linear algebra library designed to reduce the effort of software development and maintenance. In contrast with traditional (Fortran-based) libraries, it provides two high abstraction levels that significantly reduce the number of implementations necessary for particular linear algebra operations. Initial performance evaluations of a Java implementation of OOLALA show that the two high abstraction levels are not competitive with the low abstraction level of traditional libraries. These initial performance results motivate the present contribution - the characterization of a set of storage formats (data structures) and matrix properties (special features) for which implementations at the two high abstraction levels can be transformed into implementations at the low (more efficient) abstraction level.Copyright © 2005 John Wiley & Sons, Ltd.
CPE859:Collective communication for the HPJava
programming language
- Author(s):Han-ku Lee,Sang Boem Lim,Bryan Carpenter ,Geoffrey Fox
- School of Internet and
Multimedia Engineering, Konkuk University, Seoul
143-701, Korea,Pervasive Technology
Labs, Indiana University, Bloomington, IN 47404, U.S.A.,
- http://dx.doi.org/10.1002/cpe.859
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:This paper addresses functionality and
implementation of a HPJava version of the Adlib collective
communication library for data parallel programming. We
begin by illustrating typical use of the library, through an
example multigrid application. Then we describe
implementation issues for the high-level library. At a
software engineering level, we illustrate how the primitives
of the HPJava language assist in writing library methods
whose implementation can be largely independent of the
distribution format of the argument arrays. We also describe
a low-level API called mpjdev, which handles basic
communication underlying the Adlib implementation. Finally
we present some benchmark results, and some
conclusions.Copyright © 2005 John Wiley & Sons, Ltd.
CPE860:Ibis: a flexible and efficient Java-based Grid
programming environment
- Author(s):Rob V. van Nieuwpoort,Jason Maassen,Gosia Wrzesinska,Rutger F. H Hofman,Ceriel J. H. Jacobs,Thilo Kielmann,Henri E. Bal
- aculty of Sciences,
Department of Computer Science, Vrije Universiteit,
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands,
- http://dx.doi.org/10.1002/cpe.860
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:In computational Grids,
performance-hungry applications need to simultaneously tap
the computational power of multiple, dynamically available
sites. The crux of designing Grid programming environments
stems exactly from the dynamic availability of compute
cycles: Grid programming environments (a) need to be
portable to run on as many sites as possible, (b) they need
to be flexible to cope with different network protocols and
dynamically changing groups of compute nodes, while (c) they
need to provide efficient (local) communication that enables
high-performance computing in the first place. Existing
programming environments are either portable (Java), or
flexible (Jini, Java Remote Method Invocation or (RMI)), or
they are highly efficient (Message Passing Interface). No
system combines all three properties that are necessary for
Grid computing. In this paper, we present Ibis, a new
programming environment that combines Java's run
everywhere portability both with flexible treatment of
dynamically available networks and processor pools, and with
highly efficient, object-based communication. Ibis can
transfer Java objects very efficiently by combining
streaming object serialization with a zero-copy protocol.
Using RMI as a simple test case, we show that Ibis
outperforms existing RMI implementations, achieving up to
nine times higher throughputs with trees of
objects.Copyright ? 2005 John Wiley & Sons, Ltd.
CPE861:Using MPI with C# and the Common Language Infrastructure
- Author(s):Jeremiah Willcock,Andrew Lumsdaine,Arch Robison
- Indiana University Open Systems Laboratory, Bloomington, IN 47405, U.S.A.,KAI Software Laboratory, Intel Corporation, 1906 Fox Drive, Champaign, IL 61820, U.S.A.
http://dx.doi.org/10.1002/cpe.861
CCPE Title/Author Summary
CCPE Metadata
Abstract:
We describe two different libraries for using the Message Passing Interface (MPI) with the C# programming language and the Common Language Infrastructure (CLI). The first library provides C# bindings that closely match the original MPI library specification. The second library presents a fully object-oriented interface to MPI and exploits modern language features of C#. The interfaces described here use the P/Invoke feature of the CLI to dispatch to a native implementation of MPI, such as LAM/MPI or MPICH. Performance results using the Shared Source CLI demonstrate only a small performance overhead. Copyright © 2005 John Wiley & Sons, Ltd.
CPE862:inAspect: interfacing Java and VSIPL applications
- Author(s):Torey Alford ,Vijay P. Shah ,Anthony Skjellum,Nicholas H. Younan
- MPI Software
Technology, Inc., 101 S. Lafayette St., Suite 33,
Starkville, MS 39759, U.S.A.,Department of
Electrical and Computer Engineering, Mississippi
State University, P.O. Box 9571, Mississippi State,
MS 39762, U.S.A.,Department of
Electrical and Computer Engineering, Mississippi
State University, P.O. Box 9571, Mississippi State,
MS 39762, U.S.A
http://dx.doi.org/10.1002/cpe.862
CCPE Title/Author Summary
CCPE Metadata
Abstract:In this paper, we discuss the origin,
design, performance, and directions of the inAspect
high-performance signal- and image-processing package for
Java. The Vector Signal and Image Processing Library (VSIPL)
community provides a standardized application programmer
interface (API) for high-performance signal and image
processing plus linear algebra with a C emphasis and
object-based design framework. Java programmers need
high-performance and/or portable APIs for this broad base of
functionality as well. inAspect addresses PDAs, embedded
Java boards, workstations, and servers, with emphasis on
embedded systems at present. Efforts include supporting
integer precisions and utilizing coordinate rotation digital
computer (CORDIC) algorithms - both aimed at added relevance
for limited-performance environments, such as present-day
PDAs.Copyright ? 2005 John Wiley & Sons, Ltd.
CPE863:Towards enabling peer-to-peer Grids
- Author(s):Shrideep Pallickara,Geoffrey Fox,Xi Rao
- Community Grid
Computing Laboratory, Indiana University, Suite 224,
501 North Morton Street, IN 47404, U.S.A.,
- http://dx.doi.org/10.1002/cpe.863
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:In this paper we propose a peer-to-peer
(P2P) Grid comprising resources such as relatively static
clients, high-end resources and a dynamic collection of
multiple P2P subsystems. We investigate the architecture of
the messaging and event service that will support such a
hybrid environment. We designed a distributed
publish-subscribe system NaradaBrokering for XML-specified
messages. NaradaBrokering provides support for centralized,
distributed and P2P (via JXTA) interactions. Here we
investigate and present our strategy for the integration of
JXTA into NaradaBrokering. The resultant system naturally
scales with multiple Peer Groups linked by
NaradaBrokering.Copyright ? 2005 John Wiley & Sons, Ltd.
CPE864:Generic programming for high-performance scientific applications
- Author(s):Lie-Quan Lee,Andrew Lumsdaine
- Open Systems Laboratory, Pervasive Technology Laboratories, Indiana University, Bloomington, IN 47405, U.S.A.,
- http://dx.doi.org/10.1002/cpe.864
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
We present case studies that apply generic programming to the development of high-performance parallel code for solving two archetypal partial differential equations (PDEs). We examine the overall structure of the example scientific codes and consider their generic implementation. With a generic approach it is a straightforward matter to reuse software components from different sources; implementations with components from the Iterative Template Library (ITL), the Matrix Template Library (MTL), Blitz++, A++/P++, and Fortran BLAS are presented. Our newly developed Generic Message Passing library is used for communication. We compare the generic implementations with equivalent implementations developed with alternative libraries and languages and discuss performance as well as software engineering issues. Copyright © 2005 John Wiley & Sons, Ltd.
CPE885:Concurrent urban legends
- Author(s):Peter A. Buhr,Ashif S. Harji
- University of Waterloo,
200 University Avenue West, Waterloo, Ontario,
Canada N2L 3G1,
- http://dx.doi.org/10.1002/cpe.885
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:This discussion addresses a number of
urban legends about concurrency in an attempt to separate
the myth from the fact. These legends are as follows: 1
concurrent = parallel; 2 coroutining = concurrency; 3
synchronization = mutual exclusion; 4 Dekker < Peterson;
5 concurrency = library; 6 inheritance anomaly = major
concurrency problem; 7 signalling = hints; 8 spurious
wakeup = efficiency. Identifying and understanding the
fundamental concepts underlying concurrency is essential to
the field. Equally important is not to confuse sequential
and concurrent concepts. Finally, approaches based solely on
efficiency are insufficient to justify a weak or difficult
to use concurrent concept or construct.Copyright ? 2005 John
Wiley & Sons, Ltd.
CPE886:Pipelines on heterogeneous systems: models and tools
- Author(s):F. Almeida,D. Gonzalez,M. Moreno,C. Rodriguez
- Dpto. Estad?stica, I.
O. y Computaci?n, Edificio F?sicas/Matem?ticas,
Universidad de La Laguna, La Laguna, Tenerife, Spain,
- http://dx.doi.org/10.1002/cpe.886
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:We study the performance of pipeline
algorithms in heterogeneous networks. The concept of
heterogeneity is not only restricted to the differences in
computational power of the nodes, but also refers to the
network capabilities. We develop a skeleton tool that allows
us an efficient block-cyclic mapping of pipelines on
heterogeneous systems. The tool supports pipelines with a
number of stages much larger than the number of physical
processors available. We derive an analytical formula that
allows us to predict the performance of pipelines in
heterogeneous systems. According to the analytical
complexity formula, numerical strategies to solve the
optimal mapping problem are proposed. The computational
results prove the accuracy of the predictions and
effectiveness of the approach.Copyright ? 2005 John Wiley
& Sons, Ltd.
CPE901:Distributed computing with Triana on the Grid
- Author(s):Ian Taylor,Ian Wang,Matthew Shields,Shalil Majithia
- School of Computer
Science, Cardiff University, P.O. Box 916, Cardiff
CF24 3XF, U.K.,School of Physics,
Astronomy and Computer Science, Queens Buildings,
Cardiff University, 5 The Parade, Cardiff CF24 3YB, U.K.,
- http://dx.doi.org/10.1002/cpe.901
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:In this paper, we describe Triana, a
distributed problem-solving environment that makes use of
the Grid to enable a user to compose applications from a set
of components, select resources on which the composed
application can be distributed and then execute the
application on those resources. We describe Triana's
current pluggable architecture that can support many
different modes of operation by the use of flexible writers
for many popular Web service choreography languages. We
further show, that the Triana architecture is
middleware-independent through the use of the Grid
Application Toolkit (GAT) API and demonstrate this through
the use of a GAT binding to JXTA. We describe how other
bindings being developed to Grid infrastructures, such as
OGSA, can seamlessly be integrated within the current
prototype by using the switching capability of the GAT.
Finally, we outline an experiment we conducted using this
prototype and discuss its current status.Copyright ? 2005
John Wiley & Sons, Ltd.
CPE890:Special Issue: The High Performance Architectural
Challenge: Mass Market versus Proprietary Components? This
article is a U.S. Government work and is in the public
domain in the U.S.A.
- Author(s):David B. Nelson
- National Coordination
Office for Information Technology Research and
Development, Arlington, VA 22230, U.S.A.
http://dx.doi.org/10.1002/cpe.890
CCPE Title/Author Summary
CCPE Metadata
Abstract:No abstract
CPE891:This article is a U.S. Government work and is in the public domain in the U.S.A.
- Author(s):Darren J. Kerbyson,Adolfy Hoisie,Harvey J. Wasserman
- Performance and Architectures Laboratory (PAL), Modeling, Algorithms and Informatics Group, CCS-3, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A.,
- http://dx.doi.org/10.1002/cpe.891
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
This work gives a detailed analysis of the relative performance of the recently installed Earth Simulator and the next top four systems in the Top500 list using predictive performance models. The Earth Simulator uses vector processing nodes interconnected using a single-stage, cross-bar network, whereas the next top four systems are built using commodity based superscalar microprocessors and interconnection networks. The performance that can be achieved results from an interplay of system characteristics, application requirements and scalability behavior. Detailed performance models are used here to predict the performance of two codes representative of the ASCI workload, namely SAGE and Sweep3D. The performance models encapsulate fully the behavior of these codes and have been previously validated on many large-scale systems. One result of this analysis is to size systems, built from the same nodes and networks as those in the top five, that will have the same performance as the Earth Simulator. In particular, the largest ASCI machine, ASCI Q, is expected to achieve a similar performance to the Earth Simulator on the representative workload. Published in 2005 by John Wiley & Sons, Ltd.
CPE892:This article is a U.S. Government work and is in the public domain in the U.S.A.
- Author(s):Jeffrey S. Vetter,Bronis R. de Supinski,Lynn Kissel,John May,Sheila Vaidya
- Oak Ridge National Laboratory, Oak Ridge, TN 37922, U.S.A.,Lawrence Livermore National Laboratory, Livermore, CA 94551, U.S.A.,
- http://dx.doi.org/10.1002/cpe.892
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Comparisons of high-performance computers based on their peak floating point performance are common but seldom useful when comparing performance on real workloads. Factors that influence sustained performance extend beyond a system's floating-point units, and real applications exercise machines in complex and diverse ways. Even when it is possible to compare systems based on their performance, other considerations affect which machine is best for a given organization. These include the cost, the facilities requirements (power, floorspace, etc.), the programming model, the existing code base, and so on. This paper describes some of the important measures for evaluating high-performance computers. We present data for many of these metrics based on our experience at Lawrence Livermore National Laboratory (LLNL), and we compare them with published information on the Earth Simulator. We argue that evaluating systems involves far more than comparing benchmarks and acquisition costs. We show that evaluating systems often involves complex choices among a variety of factors that influence the value of a supercomputer to an organization, and that the high-end computing community should view cost/performance comparisons of different architectures with skepticism. Published in 2005 by John Wiley & Sons, Ltd.
CPE893:Architectural specification for massively parallel
computers: an experience and measurement-based approach.
- Author(s):Ron Brightwell,William Camp,Benjamin Cole,Erik DeBenedictis,Robert Leland,James Tomkins,Arthur B. Maccabe
- Sandia National
Laboratories, Scalable Computer Systems, P.O. Box
5800, Albuquerque, NM 87185-1110, U.S.A.,
http://dx.doi.org/10.1002/cpe.893
CCPE Title/Author Summary
CCPE Metadata
Abstract:In this paper, we describe the hardware
and software architecture of the Red Storm system developed
at Sandia National Laboratories. We discuss the evolution of
this architecture and provide reasons for the different
choices that have been made. We contrast our approach of
leveraging high-volume, mass-market commodity processors to
that taken for the Earth Simulator. We present a comparison
of benchmarks and application performance that support our
approach. We also project the performance of Red Storm and
the Earth Simulator. This projection indicates that the Red
Storm architecture is a much more cost-effective approach to
massively parallel computing. Published in 2005 by John
Wiley & Sons, Ltd.
CPE894:Practical performance portability in the Parallel
Ocean Program (POP).This article is a U.S. Government work
and is in the public domain in the U.S.A.
- Author(s):P. W. Jones, P. H. Worley,Y. Yoshida,J. B. White III,J. Levesque
- Theoretical Division,
Los Alamos National Laboratory, T-3, MS B216, Los
Alamos, NM 87545-1663, U.S.A.,Computer Science and
Mathematics Division, Oak Ridge National Laboratory,
P.O. Box 2008, Oak Ridge, TN 37831-6367, U.S.A.,Central Research
Institute of Electric Power Industry, 1646 Abiko
Abiko-shi Chiba, 270-1194, Japan,Cray Inc., 411 First
Avenue South, Suite 600, Seattle, WA 98104-2860, U.S.A.
http://dx.doi.org/10.1002/cpe.894
CCPE Title/Author Summary
CCPE Metadata
Abstract:The design of the Parallel Ocean
Program (POP) is described with an emphasis on portability.
Performance of POP is presented on a wide variety of
computational architectures, including vector architectures
and commodity clusters. Analysis of POP performance across
machines is used to characterize performance and identify
improvements while maintaining portability. A new design of
the POP model, including a cache blocking and land point
elimination scheme, is described with some preliminary
performance results. Published in 2005 by John Wiley &
Sons, Ltd.
CPE895:HPCx: towards capability computing
- Author(s):Mike Ashworth,Ian J. Bush,Martyn F. Guest,Andrew G. Sunderland,Stephen Booth,Joachim Hein,Lorna Smith,Kevin Stratford,Alessandro Curioni
- Computational Science and Engineering Department, CLRC Daresbury Laboratory, Daresbury, Warrington, Cheshire WA4 4AD, U.K.,Edinburgh Parallel Computing Centre, University of Edinburgh, JCMB, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, U.K.,IBM Research, Zurich Research Laboratory, 8803 Riuschlikon, Switzerland
http://dx.doi.org/10.1002/cpe.895
CCPE Title/Author Summary
CCPE Metadata
Abstract:
We introduce HPCx-the U.K.'s new National HPC Service-which aims to deliver a world-class service for capability computing to the U.K. scientific community. HPCx is targeting an environment that will both result in world-leading science and address the challenges involved in scaling existing codes to the capability levels required. Close working relationships with scientific consortia and user groups throughout the research process will be a central feature of the service. A significant number of key user applications have already been ported to the system. We present initial benchmark results from this process and discuss the optimization of the codes and the performance levels achieved on HPCx in comparison with other systems. We find a range of performance with some algorithms scaling far better than others. Copyright © 2005 John Wiley & Sons, Ltd.
CPE896:Special Issue: High-Performance Computing in Geosciences
- Author(s):M. Peszynska,M. F. Wheeler
- Department of Mathematics, Oregon State University, Corvallis, OR, U.S.A.,Center for Subsurface Modeling, The University of Texas at Austin, Austin, TX, U.S.A.
http://dx.doi.org/10.1002/cpe.896
CCPE Title/Author Summary
CCPE Metadata
Abstract:
No abstract
CPE897:Full waveform seismic inversion using a
distributed system of computers
- Author(s):Mrinal K. Sen,Indrajit G. Roy,Carlos Torres-Verdin
- Institute for
Geophysics, John A. and Katherine G. Jackson School
of Geosciences, The University of Texas at Austin,
Austin, TX 78759, U.S.A.,Department of
Petroleum and Geo-systems Engineering, The
University of Texas at Austin, Austin, TX 78759, U.S.A.
http://dx.doi.org/10.1002/cpe.897
CCPE Title/Author Summary
CCPE Metadata
Abstract:The aim of seismic waveform inversion
is to estimate the elastic properties of the Earth's
subsurface layers from recordings of seismic waveform data.
This is usually accomplished by using constrained
optimization often based on very simplistic assumptions.
Full waveform inversion uses a more accurate wave
propagation model but is extremely difficult to use for
routine analysis and interpretation. This is because
computational difficulties arise due to: (1) strong
nonlinearity of the inverse problem; (2) extreme
ill-posedness; and (3) large dimensions of data and model
spaces. We show that some of these difficulties can be
overcome by using: (1) an improved forward problem solver
and efficient technique to generate sensitivity matrix; (2)
an iteration adaptive regularized truncated Gauss-Newton
technique; (3) an efficient technique for matrix-matrix and
matrix-vector multiplication; and (4) a parallel programming
implementation with a distributed system of processors. We
use a message-passing interface in the parallel programming
environment. We present inversion results for synthetic and
field data, and a performance analysis of our parallel
implementation. Copyright © 2005 John Wiley &
Sons, Ltd.
CPE898:A simulation and data analysis system for large-scale, data-driven oil reservoir simulation studies
- Author(s):Tahsin Kurc,Umit Catalyurek,Xi Zhang,Joel Saltz,Ryan Martino,Mary Wheeler,Małgorzata Peszyńska,Alan Sussman,Christian Hansen,Mrinal Sen,Roustam Seifoullaev,Paul Stoffa,Carlos Torres-Verdin,Manish Parashar
- Biomedical Informatics Department, The Ohio State University, 3184 Graves Hall, 333 West 10th Avenue, Columbus, OH 43210, U.S.A.,Center for Subsurface Modeling, Institute for Computational Engineering and Sciences, The University of Texas at Austin, 201 East 24th Street, ACE 5.324, Campus Mail C0200, Austin, TX 78712, U.S.A.,Department of Mathematics, Kidder Hall 368, Oregon State University, Corvallis, OR 97331-4605, U.S.A.,Department of Computer Science, University of Maryland, A.V. Williams Building, College Park, MD 20742, U.S.A.,Institute for Geophysics, The University of Texas at Austin, 4412 Spicewood Springs Road, Building 600, Austin, TX 78758-8500, U.S.A.,Department of Petroleum and Geosystems Engineering, The University of Texas at Austin, 1 University Station, CO300, Austin, TX 78712-0228, U.S.A.,Department of Electrical and Computer Engineering, Rutgers, State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854-8058, U.S.A.
http://dx.doi.org/10.1002/cpe.898
CCPE Title/Author Summary
CCPE Metadata
Abstract:
The main goal of oil reservoir management is to provide more efficient, cost-effective and environmentally safer production of oil from reservoirs. Numerical simulations can aid in the design and implementation of optimal production strategies. However, traditional simulation-based approaches to optimizing reservoir management are rapidly overwhelmed by data volume when large numbers of realizations are sought using detailed geologic descriptions. In this paper, we describe a software architecture to facilitate large-scale simulation studies, involving ensembles of long-running simulations and analysis of vast volumes of output data. Copyright © 2005 John Wiley & Sons, Ltd.
CPE899:Enabling interactive and collaborative oil
reservoir simulations on the Grid
- Author(s):Manish Jack 4, Mary Wheeler Parashar,Rajeev Muralidhar,Wonsuck Lee,Dorian Arnold,Jack Dongarra ,Mary Wheeler
- The Applied Software
Systems Laboratory, Rutgers University, Piscataway,
NJ 08854, U.S.A.,Computing Science
Research, Bell Laboratories, Murray Hill, NJ 07974, U.S.A.,Computer Science
Department, University of Wisconsin, Madison, WI
53706, U.S.A.,Computer Science
Department, University of Tennessee at Knoxville,
Knoxville, TN 37996, U.S.A.,Center for Subsurface
Modeling, University of Texas at Austin, Austin, TX
78712, U.S.A.
http://dx.doi.org/10.1002/cpe.899
CCPE Title/Author Summary
CCPE Metadata
Abstract:Grid-enabled infrastructures and
problem-solving environments can significantly increase the
scale, cost-effectiveness and utility of scientific
simulations, enabling highly accurate simulations that
provide in-depth insight into complex phenomena. This paper
presents a prototype of such an environment, i.e. an
interactive and collaborative problem-solving environment
for the formulation, development, deployment and management
of oil reservoir and environmental flow simulations in
computational Grid environments. The project builds on three
independent research efforts: (1) the IPARS oil reservoir
and environmental flow simulation framework; (2) the
NetSolve Grid engine; and (3) the Discover Grid-based
computational collaboratory. Its primary objective is to
demonstrate the advantages of an integrated simulation
infrastructure towards effectively supporting scientific
investigation on the Grid, and to investigate the components
and capabilities of such an infrastructure. Copyright
© 2005 John Wiley & Sons, Ltd.
CPE900:Large-scale density-driven flow simulations using parallel unstructured Grid adaptation and local multigrid methods
CPE878:Special Issue: Foundations of Middleware Technologies
- Author(s):Priya Narasimhan
- Carnegie Mellon University
http://dx.doi.org/10.1002/cpe.878
CCPE Title/Author Summary
CCPE Metadata
Abstract:
No abstract
CPE879:On the modelling of publish/subscribe
communication systems
- Author(s):R. Baldoni,R. Beraldi,S. Tucci Piergiovanni,A. Virgillito
- Dipartimento di
Informatica e Sistemistica, Universit? di Roma La
Sapienza, Via Salaria 113, 00198 Roma, Italy,
- http://dx.doi.org/10.1002/cpe.879
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:This paper presents a formal framework
of a distributed computation based on a publish/subscribe
system. The framework abstracts the system through two
delays, namely the subscription/unsubscription delay and the
diffusion delay. This abstraction allows one to model
concurrent execution of publication and subscription
operations without waiting for the stability of the system
state and to define a Liveness property which gives the
conditions for the presence of a notification event in the
global history of the system. This formal framework allows
us to analytically define a measure of the effectiveness of
a publish/subscribe system, which reflects the percentage of
notifications guaranteed by the system to subscribers. A
simulation study confirms the validity of the analytical
measurements. Copyright © 2005 John Wiley &
Sons, Ltd.
CPE880:Distributed filter processes
- Author(s):Rushikesh K. Joshi
- Department of Computer
Science and Engineering, Indian Institute of
Technology Bombay, Mumbai 400 076, India
http://dx.doi.org/10.1002/cpe.880
CCPE Title/Author Summary
CCPE Metadata
Abstract:A programming model called distributed
filter processes for transparent filtering in distributed
systems is presented. Filter processes are modeled as
extensions to distributed processes. Messages can be
captured transparently by intermediate filter processes. The
programming abstractions are presented with applications to
the evolution of programs.Copyright © 2005 John
Wiley & Sons, Ltd.
CPE881:Benchmarking message-oriented middleware: TIB/RV versus SonicMQ
- Author(s):Piyush Maheshwari,Michael Pang
- School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia,SAP Australia Pty. Ltd, 168 Walker Street, North Sydney, NSW 2061, Australia
http://dx.doi.org/10.1002/cpe.881
CCPE Title/Author Summary
CCPE Metadata
Abstract:
Message-oriented middleware (MOM) has become a vital part of the complex application integration projects. MOM is used to pass data and workflow in the form of messages between different enterprise applications. The performance of integrated applications greatly depends on how effectively the MOM performs. This paper presents a benchmark comparison between two industry well-known MOMs-TIBCO Rendezvous (TIB/RV) and SonicMQ. Although the two MOMs are very similar in certain respects, their native implementation and architecture are very different. We provide an unbiased benchmark reference to the middleware selection process. The primary objective of our work is to evaluate and compare the MOMs by testing their effectiveness in the delivery of messages in publish/subscribe and point-to-point message domains, their program stability and the system resource utilization. Copyright © 2005 John Wiley & Sons, Ltd.
CPE882:MEAD: support for Real-Time Fault-Tolerant CORBA
- Author(s):P. Narasimhan ,T. A Dumitra, A. M. Paulos,S. M. Pertet,C. F. Reverte,J. G Slember,D. Srivastava
- Electrical and
Computer Engineering Department, Carnegie Mellon
University, 5000 Forbes Avenue, Pittsburgh, PA
15213-3890, U.S.A.,
- http://dx.doi.org/10.1002/cpe.882
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:The OMG's Real-Time CORBA
(RT-CORBA) and Fault-Tolerant CORBA (FT-CORBA)
specifications make it possible for today's CORBA
implementations to exhibit either real-time or fault
tolerance in isolation. While real-time requires a priori
knowledge of the system's temporal operation, fault
tolerance necessarily deals with faults that occur
unexpectedly, and with possibly unpredictable fault recovery
times. The MEAD (Middleware for Embedded Adaptive
Dependability) system attempts to identify and to reconcile
the conflicts between real-time and fault tolerance, in a
resource-aware manner, for distributed CORBA applications.
MEAD supports transparent yet tunable fault tolerance in
real-time, proactive dependability, resource-aware system
adaptation to crash, communication and timing faults with
bounded fault detection and fault recovery.Copyright ? 2005
John Wiley & Sons, Ltd.
CPE889:Lightweight monitoring of MPI programs in real time
- Author(s):German Florez ,Zhen Liu,Susan M. Bridges,Anthony Skjellum,Rayford B. Vaughn
- Center for Computer
Security Research and High Performance Computer
Laboratory, Department of Computer Science and
Engineering, Mississippi State University, MS
39762-9637, U.S.A.,
- http://dx.doi.org/10.1002/cpe.889
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Current technologies allow efficient
data collection by several sensors to determine an overall
evaluation of the status of a cluster. However, no previous
work of which we are aware analyzes the behavior of the
parallel programs themselves in real time. In this paper, we
perform a comparison of different artificial intelligence
techniques that can be used to implement a lightweight
monitoring and analysis system for parallel applications on
a cluster of Linux workstations. We study the accuracy and
performance of deterministic and stochastic algorithms when
we observe the flow of both library-function and
operating-system calls of parallel programs written with C
and MPI. We demonstrate that monitoring of MPI programs can
be achieved with high accuracy and in some cases with a
false-positive rate near 0% in real time, and we show that
the added computational load on each node is small. As an
example, the monitoring of function calls using a hidden
Markov model generates less than 5% overhead. The proposed
system is able to automatically detect deviations of a
process from its expected behavior in any node of the
cluster, and thus it can be used as an anomaly detector, for
performance monitoring to complement other systems or as a
debugging tool. Copyright © 2005 John Wiley &
Sons, Ltd.
CPE902:A static mapping heuristics to map parallel applications to heterogeneous computing systems
- Author(s):Ranieri Baraglia,Renato Ferrini,Pierluigi Ritrovato
- High Performance Computing Laboratory, Information Science and Technologies Institute (ISTI), Italian National Research Council (CNR), Via G. Moruzzi, 1, 56126 Pisa, Italy,Centro di Ricerca in Matematica Pura ed Applicata, Universitá degli Studi di Salerno, Fisciano (SA), Italy
http://dx.doi.org/10.1002/cpe.902
CCPE Title/Author Summary
CCPE Metadata
Abstract:
In order to minimize the execution time of a parallel application running on a heterogeneously distributed computing system, an appropriate mapping scheme is needed to allocate the application tasks to the processors. The general problem of mapping tasks to machines is a well-known NP-hard problem and several heuristics have been proposed to approximate its optimal solution. In this paper we propose a static graph-based mapping algorithm, called Heterogeneous Multi-phase Mapping (HMM), which permits suboptimal mapping of a parallel application onto a heterogeneous computing distributed system by using a local search technique together with a tabu search meta-heuristic. HMM allocates parallel tasks by exploiting the information embedded in the parallelism forms used to implement an application, and considering an affinity parameter, that identifies which machine in the heterogeneous computing system is most suitable to execute a task. We compare HMM with some leading techniques and with an exhaustive mapping algorithm. We also give an example of mapping of two real applications using HMM. Experimental results show that HMM performs well demonstrating the applicability of our approach. Copyright © 2005 John Wiley & Sons, Ltd.
CPE906:GridBLAST: a Globus-based high-throughput implementation of BLAST in a Grid computing framework
- Author(s):Arun Krishnan
- BioInformatics Institute, 30 Biopolis Street, #07-01, Singapore 138671
http://dx.doi.org/10.1002/cpe.906
CCPE Title/Author Summary
CCPE Metadata
Abstract:
Improvements in the performance of processors and networks have made it feasible to treat collections of workstations, servers, clusters and supercomputers as integrated computing resources or Grids. However, the very heterogeneity that is the strength of computational and data Grids can also make application development for such an environment extremely difficult. Application development in a Grid computing environment faces significant challenges in the form of problem granularity, latency and bandwidth issues as well as job scheduling. Currently existing Grid technologies limit the development of Grid applications to certain classes, namely, embarrassingly parallel, hierarchical parallelism, work flow and database applications. Of all these classes, embarrassingly parallel applications are the easiest to develop in a Grid computing framework. The work presented here deals with creating a Grid-enabled, high-throughput, standalone version of a bioinformatics application, BLAST, using Globus as the Grid middleware. BLAST is a sequence alignment and search technique that is embarrassingly parallel in nature and thus amenable to adaptation to a Grid environment. A detailed methodology for creating the Grid-enabled application is presented, which can be used as a template for the development of similar applications. The application has been tested on a ‘mini-Grid’ testbed and the results presented here show that for large problem sizes, a distributed, Grid-enabled version can help in significantly reducing execution times. Copyright © 2005 John Wiley & Sons, Ltd.
CPE948:Special Issue: Third IEEE International Workshop
on High Performance Computational Biology (HiCOMB 2004)
http://dx.doi.org/10.1002/cpe.948
CCPE Title/Author Summary
CCPE Metadata
Abstract:No abstract
CPE949:Study of a highly accurate and fast protein-ligand
docking method based on molecular dynamics
- Author(s):C. L. Brooks III,M. Taufer,M. Crowley,D. J. Price,A. A. Chien
- Department of
Molecular Biology (TPC6), The Scripps Research
Institute, 10550 North Torrey Pines Road, La Jolla,
CA 92037, U.S.A.,
- http://dx.doi.org/10.1002/cpe.949
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Few methods use molecular dynamics
simulations in concert with atomically detailed force fields
to perform protein-ligand docking calculations because they
are considered too time demanding, despite their accuracy.
In this paper we present a docking algorithm based on
molecular dynamics which has a highly flexible computational
granularity. We compare the accuracy and the time required
with well-known, commonly used docking methods such as
AutoDock, DOCK, FlexX, ICM, and GOLD. We show that our
algorithm is accurate, fast and, because of its flexibility,
applicable even to loosely coupled distributed systems such
as desktop Grids for docking. Copyright © 2005 John
Wiley & Sons, Ltd.
CPE950:Parallel protein folding with STAPL
- Author(s):Nancy M. Amato,Shawna Thomas ,Gabriel Tanase ,Lucia K. Dale,Jose M. Moreira,Lawrence Rauchwerger
- Parasol Laboratory,
Department of Computer Science, Texas A&M
University, College Station, TX 77843-3112, U.S.A.,Parasol Laboratory,
Department of Computer Science, Texas A&M
University, College Station, TX 77843-3112, U.S.A,University of the
South, Sewanee, TN 37383-1000, U.S.A.,IBM TJ Watson Research
Center, Yorktown Heights, NY 10598-0218, U.S.A.,
- http://dx.doi.org/10.1002/cpe.950
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:The protein-folding problem is a study
of how a protein dynamically folds to its so-called native
state - an energetically stable, three-dimensional
conformation. Understanding this process is of great
practical importance since some devastating diseases such as
Alzheimer's and bovine spongiform encephalopathy (Mad
Cow) are associated with the misfolding of proteins. We have
developed a new computational technique for studying protein
folding that is based on probabilistic roadmap methods for
motion planning. Our technique yields an approximate map of
a protein's potential energy landscape that contains
thousands of feasible folding pathways. We have validated
our method against known experimental results. Other
simulation techniques, such as molecular dynamics or Monte
Carlo methods, require many orders of magnitude more time to
produce a single, partial trajectory. In this paper we
report on our experiences parallelizing our method using
STAPL (Standard Template Adaptive Parallel Library) that is
being developed in the Parasol Lab at Texas A&M. An
efficient parallel version will enable us to study larger
proteins with increased accuracy. We demonstrate how STAPL
enables portable efficiency across multiple platforms,
ranging from small Linux clusters to massively parallel
machines such as IBM's BlueGene/L, without user code
modification.Copyright © 2005 John Wiley &
Sons, Ltd.
CPE951:Dynamic programming for LR-PCR segmentation of
bacterium genomes
- Author(s):Rumen Andonov,Dominique Lavenier,Philippe Veber,Lucia K. Dale,Nicola Yanev
- IRISA, Campis de
Beaulieu, 35042 Rennes Cedex, France,University of the
South, Sewanee, TN 37383-1000, U.S.A.,University of Sofia, 5
J. Bouchier str., 1126 Sofia, Bulgaria
http://dx.doi.org/10.1002/cpe.951
CCPE Title/Author Summary
CCPE Metadata
Abstract:Bacterium genome plasticity can
efficiently be studied by the long-range polymerase chain
reaction (LR-PCR) technique: genomes of different strains
are split into hundreds of short segments which, after
LR-PCR amplification, are used to sketch profiles. The
segments have to: (1) cover the entire genome; (2) overlap
each other; and (3) be of nearly identical size. This paper
addresses the problem of finding a list of segments
satisfying these constraints as much as possible. Two
algorithms based on dynamic programming approach are
presented. They differ on the optimization criteria for
measuring the quality of the covering. The first considers
the maximal deviation of the segment lengths relatively to
an ideal length. The second automatically finds a segment
length that minimizes the maximal deviation. .Copyright
© 2005 John Wiley & Sons, Ltd.
CPE952:Parallel RNA secondary structure prediction using
stochastic context-free grammars
- Author(s):Tong Liu, Bertil Schmidt
- School of Computer
Engineering, Nanyang Technological University,
Singapore 639798,
- http://dx.doi.org/10.1002/cpe.952
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:With the growing number of known RNA
genes efficient and accurate computational analysis of RNA
sequences is becoming increasingly important. Stochastic
context-free grammars (SCFGs) are used as a popular tool to
model RNA secondary structures. However, algorithms for
aligning a RNA sequence to a SCFG are highly
compute-intensive. This has so far limited applications of
SCFGs to relatively small problem sizes. In this paper we
present the design of a parallel RNA sequence-structure
alignment algorithm. Its implementation on parallel systems
leads to significant runtime savings. This makes it possible
to compute sequence-structure alignments of even the largest
RNAs such as small subunit ribosomal rRNAs and long subunit
ribosomal rRNAs in reasonable time. .Copyright ©
2005 John Wiley & Sons, Ltd.
CPE953:Peptide identification via constrained
multi-objective optimization: Pareto-based genetic algorithms
- Author(s):J. M. Malard,A. Heredia-Langner,W. R. Cannon,R. Mooney, D. J. Baxter
- Pacific Northwest
National Laboratory, Richland, WA, U.S.A.,
- http://dx.doi.org/10.1002/cpe.953
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:Automatic peptide identification from
collision-induced dissociation tandem mass spectrometry data
using optimization techniques is made difficult by large
plateaus in the fitness landscapes of scoring functions, by
the fuzzy nature of constraints from noisy data and by the
existence of diverse but equally justifiable probabilistic
models of peak matching. Here, two different scoring
functions are combined into a parallel multi-objective
optimization framework. It is shown how multi-objective
optimization can be used to empirically test for
independence between distinct scoring functions. The loss of
selection pressure during the evolution of a population of
putative peptide sequences by a Pareto-driven genetic
algorithm is addressed by alternating between two
definitions of fitness according to a numerical threshold.
Copyright © 2005 John Wiley & Sons, Ltd.
CPE954:RAxML-II: a program for sequential, parallel and distributed inference of large phylogenetic trees
- Author(s):Alexandros Stamatakis,Thomas Ludwig,Harald Meier
- Technische Universität München, Lehrstuhl für Rechnertechnik und Rechnerorganisation/I10, Boltzmannstrasse 3, D-85748 Garching b. München, Germany,Ruprecht-Karls-Universität Heidelberg, Institut für Informatik, Im Neuenheimer Feld 348, D-69120 Heidelberg, Germany,
- http://dx.doi.org/10.1002/cpe.954
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
Inference of phylogenetic trees comprising hundreds or even thousands of organisms based on the maximum likelihood method is computationally intensive. We present simple heuristics which yield accurate trees for synthetic as well as real data and significantly reduce execution time. Those heuristics have been implemented in a sequential, parallel, and distributed program called RAxML-II, which is freely available as open source code. We compare the performance of the sequential program with PHYML and MrBayes which-to the best of our knowledge-are currently the fastest and most accurate programs for phylogenetic tree inference based on statistical methods. Experiments are conducted using 50 synthetic 100 taxon alignments as well as nine real-world alignments comprising 101 up to 1000 sequences. RAxML-II outperforms MrBayes for real-world data both in terms of speed and final likelihood values. Furthermore, for real data RAxML-II requires less time (a factor of 2-8) than PHYML to reach PHYML's final likelihood values and yields better final trees due to its more exhaustive search strategy. For synthetic data MrBayes is slightly more accurate than RAxML-II and PHYML but significantly slower. The non-deterministic parallel program shows good speedup values and has been used to infer a 10 000-taxon tree comprising organisms from the domains Eukarya, Bacteria, and Archaea. Copyright © 2005 John Wiley & Sons, Ltd.
CPE887:Loosely coordinated coscheduling in the context of other approaches for dynamic job scheduling: a survey
- Author(s):A. C. Sodan
- University of Windsor, Computer Science, 401 Sunset Avenue, Windsor, Ontario, Canada N9B 3P4
http://dx.doi.org/10.1002/cpe.887
CCPE Title/Author Summary
CCPE Metadata
Abstract:
Loosely coordinated (implicit/dynamic) coscheduling is a time-sharing approach that originates from network of workstations environments of mixed parallel/serial workloads and limitedsoftware support. It is meant to be an easy-to-implement and scalable approach. Considering that the percentage of clusters in parallel computing is increasing and easily portable software is needed, loosely coordinated coscheduling becomes an attractive approach for dedicated machines. Loose coordination offers attractive features as a dynamic approach. Static approaches for local job scheduling assign resources exclusively and non-preemptively. Such approaches still remain beyond the desirable resource utilization and average response times. Conversely, approaches for dynamic scheduling of jobs can preempt resources and/or adapt their allocation. They typically provide better resource utilization and response times. Existing dynamic approaches are full preemption with checkpointing, dynamic adaptation of node/CPU allocation, and time sharing via gang or loosely coordinated coscheduling. This survey presents and compares the different approaches, while particularly focusing on the less well-explored loosely coordinated time sharing. The discussion particularly focuses on the implementation problems, in terms of modification of standard operating systems, the runtime system and the communication libraries. Copyright © 2005 John Wiley & Sons, Ltd.
CPE888:Neuroscience instrumentation and distributed analysis of brain activity data: a case for eScience on global Grids
- Author(s):Rajkumar Buyya,Susumu Date,Yuko Mizuno-Matsumoto,Srikumar Venugopal,David Abramson
- Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Australia,Graduate School of Information Science and Technology, Department of Bioinformatics Engineering, Osaka University, Japan,Department of Information Systems Engineering, Graduate School of Osaka University, Japan,School of Computer Science and Software Engineering, Monash University, Melbourne, Australia
http://dx.doi.org/10.1002/cpe.888
CCPE Title/Author Summary
CCPE Metadata
Abstract:
distribution
need
CPE918:http://nenya.ms.mff.cuni.cz/projects/corba/oopsla-workshop-03/
- Author(s):Paul Brebner,Emmanuel Cecchet,Julie Marguerite,Petr Tůma,Octavian Ciuhandu,Bruno Dufour,Lieven Eeckhout,Stéphane Frénot,Arvind S. Krishna,John Murphy,Clark Verbrugge
- CSIRO ICT Centre, G.P.O. Box 664, Canberra, ACT 2601, Australia,INRIA Rhône-Alpes/ObjectWeb, 655 avenue de l'Europe, 38330 Montbonnot, St. Martin, France,INRIA Rhône-Alpes/ObjectWeb, 655 avenue de l'Europe, 38330 Montbonnot, St. Martin, France,Department of Software Engineering, Charles University, Malostranské náměstí 25, Praha 1, Czech Republic,Performance Engineering Laboratory, Dublin City University, Dublin 9, Ireland,School of Computer Science, McGill University, Montréal, Québec, Canada H3A 2A7,Department of Electronics and Information Systems (ELIS), Ghent University, St. Pietersnieuwstraat 41, B9000 Gent, Belgium,INRIA Arès, INSA-CITI Bat. Léonard de Vinci, 69661 Villeurbanne Cedex, France,Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235, U.S.A.,
- http://dx.doi.org/10.1002/cpe.918
- CCPE Title/Author Summary
- CCPE Metadata
- Abstract:
The report summarizes the results of the Workshop on Middleware Benchmarking held during OOPSLA 2003. The goal of the workshop was to help advance the current practice of gathering performance characteristics of middleware implementations through benchmarking. The participants of the workshop have focused on identifying requirements of and obstacles to middleware benchmarking and forming a position on the related issues. Selected requirements and obstacles are presented, together with guidelines to adhere to when benchmarking, open issues of current practice, and perspectives on further research. Copyright © 2005 John Wiley & Sons, Ltd.