Concurrency and Computation: Practice and Experience
Here are instructions for the
Grid Computing Environment Special Issue (CLOSED), due mid-July 2001, and here
are the submitted papers.
Here are instructions for the Java Grande/ISCOPE
Special Issue (CLOSED), due August 31, 2001, and here are its papers.
This homepage supports referees.
- Standard Referee Form
- Journal Vision
- WILEY Journal Home Page
- Editorial Board
- Useful CiteSeer Citation Finder from NEC Research
Below you will find the active abstracts.
Please send email to Geoffrey Fox,
gcf@indiana.edu, if you wish to referee any article.
We will send you a link to the full text online.
Articles under Consideration for Journal
- C467: VGDS: A Distributed Data Structure Framework for
Scientific Computation
- Abstract: This paper gives an overview of the VGDS
(Virtual Global Data Structure) project. The VGDS effort focuses on developing
an integrated, distributed environment that allows fast prototyping of a
diverse set of simulation problems in irregular scientific and engineering
domains, focusing on computations with irregular and adaptive structures. The
framework defines two base libraries, unstructured mesh and adaptive tree, which
capture the major data structures involved in irregular scientific computation. The
framework defines multiple layers of class libraries which work together to
provide data-parallel representations to application developers while
encapsulating parallel implementation details in the lower layers of the framework.
The layered approach enables easy extension of the base libraries to a variety
of application-specific data structures. Experimental results on a network of
workstations are reported.
- Pangfeng Liu, Jan-Jan Wu
- mailto:wuj@iis.sinica.edu.tw, mailto:pangfeng@cs.ccu.edu.tw
- Submitted April 21, 2000
- Comments To Authors October 1 2000
- C468: cJVM: A Cluster JVM Architecture for Single System
Image
- Abstract: cJVM is a Java Virtual Machine (JVM) which
provides a single system image of a traditional JVM while executing in a
distributed fashion on the nodes of a cluster. cJVM virtualizes the cluster,
supporting any pure Java application without requiring that applications be
tailored specifically for it. The aim of cJVM is to obtain improved scalability
for a class of Java Server Applications by distributing the application's work
among the cluster's computing resources. cJVM is based on a novel object model
which distinguishes between an application's view of an object (e.g., every
object is a unique data structure) and its implementation (e.g., objects may
have consistent replications on different nodes). This enables us to exploit
knowledge on the usage of individual objects to improve performance (e.g.,
using object replications to increase locality of access to objects).
We have completed a prototype which runs pure Java
applications on a cluster of NT workstations connected via a fast Myrinet
switch. The prototype provides a single system image to applications,
distributing the application's threads and objects over the cluster. We have
used cJVM to run, without change, a real Java server application containing
over 10K lines of code, achieving high scalability for it on a cluster. We also
achieved linear speedup for another application with a large number of
independent threads. This paper discusses cJVM's architecture and
implementation. It focuses on achieving a single system image for a traditional
JVM on a cluster, and briefly describes how we aim to obtain scalability.
- Yariv Aridor, Michael Factor and Avi Teperman
- mailto:teperman@il.ibm.com
- Submitted May 1, 2000
- Comments To Authors October 1 2000; Accepted November 23
2000
- C474: Parallel solution of rotating flows in cavities
- Abstract: In this paper, we investigate the parallel
solution of rotating internal flow problems, using the Navier-Stokes
equations as proposed in [16] and [15]. A Runge-Kutta time-stepping scheme was
applied to the equations, and both sequential and message-passing
implementations were developed, the latter using MPI; these were tested on an
SGI Origin200 distributed, globally shared-memory parallel computer. The results
show that our approach to parallelizing the sequential implementation requires
little effort whilst providing good results, even for medium-sized problems, on
this particular computer.
- Rudnei Dias da Cunha and Alvaro Luiz de Bortoli
- mailto:rudnei@mat.ufrgs.br
- Submitted May 14, 2000
- Comments To Authors October 1 2000 and Accepted October 9
2000
- C475: Analysis and Measurement of the Effect of Kernel
Locks in SMP Systems
- Abstract: This paper reports the use of case studies to
evaluate the performance degradation caused by the kernel-level lock. We define
the lock ratio as the ratio of the execution time spent in critical sections to
the total execution time of a parallel program. The kernel-level lock ratio
determines how effectively programs work on Symmetric MultiProcessor systems. We
have measured the lock ratios and the performance of three types of parallel
programs on SMP systems with Linux 2.0: matrix multiplication, parallel make,
and WWW server programs. Experimental results show that the higher the lock
ratio of a parallel program, the worse its performance becomes.
Keywords: SMP Systems, Operating Systems, Parallel Programs, Performance
Evaluation, Kernel Lock
- Akihiro Kaieda and Yasuichi Nakayama; Atsuhiro Tanaka, Takashi
Horikawa, Toshiyasu Kurasugi and Issei Kino
- mailto:yasu@cs.uec.ac.jp
- Submitted May 24, 2000
- Accepted October 1 2000
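The lock ratio the abstract above defines is a simple quotient of timings; the sketch below illustrates it (the timing numbers and function name are ours, purely illustrative, not from the paper):

```python
def lock_ratio(critical_section_time, total_time):
    """Ratio of execution time spent in kernel-level critical sections to
    total execution time; higher values predict worse SMP performance."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return critical_section_time / total_time

# Illustrative: a parallel make spending 1.2 s of an 8.0 s run under the lock.
print(f"lock ratio = {lock_ratio(1.2, 8.0):.2%}")  # lock ratio = 15.00%
```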
- C476: Effective Multicast Programming in Large Scale
Distributed Systems: The DACE Approach
- Abstract: Many distributed applications have a strong
requirement for efficient dissemination of large amounts of information to
widely spread consumers in large networks. These include applications in
e-commerce and telecommunication. Publish/subscribe is considered one of the
most important interaction styles with which to model communication at large
scale. Producers publish information for a topic and consumers subscribe to the
topics they wish to be informed of. The decoupling of producers and consumers in
time and in space makes the publish/subscribe paradigm very attractive for
large-scale distribution, especially in environments like the Internet. This
paper describes the architecture and implementation of DACE (Distributed
Asynchronous Computing Environment), a framework for publish/subscribe
communication based on an object-oriented programming abstraction in the form
of Distributed Asynchronous Collections (DACs). DACs capture the different
variations of publish/subscribe without blurring their respective advantages.
The architecture we present is tolerant of network partitions and crash
failures. The underlying model is based on the notion of Topic Membership: a
weak membership for the parties involved in a topic. We present how Topic
Membership enables the realization of robust and efficient reliable multicast
at large scale. The protocol ensures that, inside a topic, even a subscriber
that is temporarily partitioned away eventually receives a published message.
- Romain Boichat, Patrick Th. Eugster, Rachid Guerraoui, Joe
Sventek
- Swiss Federal Institute of Technology, Lausanne and Agilent
Laboratories Scotland, Edinburgh
- Email: Patrick.Eugster@lsesun6.epfl.ch
- Submitted July 17, 2000
- Comments to Author November 23 2000
- Accepted February 3 2001
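The decoupling of producers and consumers that the abstract emphasizes can be illustrated with a minimal topic-based publish/subscribe sketch (the class and method names are ours, not DACE's API, and DACs generalize this basic interaction considerably):

```python
from collections import defaultdict

class TopicBus:
    """Minimal topic-based publish/subscribe: a publisher never learns who
    the subscribers are -- producers and consumers are decoupled in space."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of this topic, and only this topic.
        for callback in self._subscribers[topic]:
            callback(message)

received = []
bus = TopicBus()
bus.subscribe("quotes", received.append)
bus.publish("quotes", "IBM 112.5")
bus.publish("news", "not seen by the quotes subscriber")
print(received)  # ['IBM 112.5']
```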
- C486: Parallel Versions of Stone's Strongly Implicit
(SIP) Algorithm
- Abstract: In this paper, we describe various methods of
deriving a parallel version of Stone's Strongly Implicit Procedure (SIP)
for solving sparse linear equations arising from finite difference
approximation to partial differential equations (PDEs). Sequential
versions of this algorithm have been very successful in solving semi-conductor,
heat conduction and flow simulation problems and an efficient parallel version
would enable much larger simulations to be run. An initial investigation of
various parallelising strategies was undertaken using a version of High
Performance Fortran (HPF) and the best methods were reprogrammed using the MPI
message passing libraries for increased efficiency. Early attempts concentrated
on developing a parallel version of the characteristic wavefront computation
pattern of the existing sequential SIP code. However, a red-black ordering of
grid points, similar to that used in parallel versions of the Gauss-Seidel
algorithm, is shown to be far more efficient. The results of both the wavefront
and red-black MPI-based algorithms are reported for various problem sizes and
numbers of processors on a sixteen-node IBM SP2.
- J.S. Reeve, A.D. Scurr and J.H. Merlin
- Department of Electronics and Computer Science, University of
Southampton
- Email:jsr@ecs.soton.ac.uk
- Submitted August 24, 2000
- Comments to Authors February 4 2001, Accepted March 12 2001
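The red-black ordering mentioned in the abstract colours grid points like a checkerboard so that every point of one colour depends only on points of the other colour; a minimal sketch for a 2D Laplace/Poisson problem (this shows only the ordering idea, not the SIP iteration itself):

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -laplacian(u) = f on a square
    grid with spacing h and fixed boundary values.  Each half-sweep updates
    only one colour, so all its updates are independent and could run in
    parallel -- unlike SIP's sequential wavefront."""
    n = len(u)
    for colour in (0, 1):                      # 0 = red, 1 = black
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == colour:
                    u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                                      + u[i][j-1] + u[i][j+1]
                                      + h * h * f[i][j])
    return u

# Laplace's equation (f = 0) with boundary value 1: the interior relaxes to 1.
n = 5
u = [[1.0 if i in (0, n-1) or j in (0, n-1) else 0.0 for j in range(n)]
     for i in range(n)]
zero = [[0.0] * n for _ in range(n)]
for _ in range(50):
    red_black_sweep(u, zero, 1.0)
print(round(u[2][2], 6))  # 1.0
```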
- C487: Real-time Multi-spectral Image Fusion
- Abstract: This paper describes a novel real-time
multi-spectral imaging capability for surveillance applications. The capability
combines a new high-performance multi-spectral camera system with a distributed
algorithm that computes a spectral-screening Principal Component Transform
(PCT). The camera system uses a novel filter wheel design together with a
high-bandwidth CCD camera to allow image cubes to be delivered at 110 frames
per second with spectral resolution between 400 and 1000 nm. The filters used
in a particular application are selected to highlight a particular object based
on its spectral signature. The distributed algorithm allows image streams from
a dispersed collection of cameras to be disseminated, viewed, and interpreted
by a distributed group of analysts in real-time. It operates on networks of
commercial-off-the-shelf multiprocessors connected with high-performance (e.g.
gigabit) networking, taking advantage of multi-threading where appropriate. The
algorithm uses a concurrent formulation of the PCT to de-correlate and compress
a multi-spectral image cube. Spectral screening is used to give features that
occur infrequently (e.g. mechanized vehicles in a forest) equal importance to
those that occur frequently (e.g. trees in the forest). A human-centered
color-mapping scheme is used to maximize the impact of spectral contrast on the
human visual system. To demonstrate the efficacy of the multi-spectral system,
plant-life scenes with both real and artificial foliage are used. These scenes
demonstrate the system's ability to distinguish elements of a scene, based on
spectral contrast, that cannot be distinguished with the naked eye. The
capability is evaluated in terms of visual performance, scalability, and
real-time throughput. Our previous work on predictive analytical modeling is
extended to answer practical design questions such as "For a specified
cost, what system should be constructed and what performance will it
attain?"
- Tiranee Achalakul, Stephen Taylor
- Department of Computer Science, Syracuse University
- Email:tachalak@syr.edu
- Submitted September 14, 2000
- Comments to Authors March 13 2001
- C488: Efficient Communication Using Message Prediction for
Clusters of Multiprocessors
- Abstract: With the increasing uniprocessor and SMP
computation power available today, interprocessor communication has become an
important factor limiting the performance of clusters of workstations. Many
factors, including communication hardware overhead, communication software
overhead, and user-environment overhead (multithreading, multiuser), affect
the performance of the communication subsystems in such systems. A significant
portion of the software communication overhead is due to message
copying. Ideally, it is desirable to have a true zero-copy protocol, where the
message is moved directly from the send buffer in its user space to the receive
buffer at the destination without any intermediate buffering. However, because
message-passing applications at the send side do not know the final
receive buffer addresses, early-arriving messages have to be buffered in a
temporary area. In this paper, we show that there is message-reception
communication locality in message-passing applications. We have utilized this
communication locality and devised different message predictors for the receiver
sides of communications. In essence, these message predictors can be efficiently
used to drain the network and cache incoming messages even if the
corresponding receive calls have not yet been posted. The performance of these
predictors, in terms of hit ratio, on some parallel applications is quite
promising and suggests that prediction has the potential to eliminate most of
the remaining message copies. We also show that the proposed predictors are not
sensitive to the starting message-reception call, and that they perform
better than (or at least as well as) our previously proposed predictors in [3].
- Ahmad Afsahi, Nikitas J. Dimopoulos
- Queen's University and University of Victoria, Canada
- Email:ahmad@ee.queensu.ca
- Submitted September 14, 2000
- Comments to Authors March 13 2001, Accepted November 9 2001
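Message-reception locality can be measured with even the simplest predictor. The sketch below scores a "repeat-the-last-tag" guess over a trace of message tags; the paper's predictors are more elaborate, and this only illustrates the hit-ratio metric (names and trace are ours):

```python
def last_tag_hit_ratio(tags):
    """Fraction of messages whose tag was correctly predicted by guessing
    that it repeats the tag of the previous message -- a crude measure of
    message-reception communication locality."""
    if len(tags) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(tags, tags[1:]) if prev == cur)
    return hits / (len(tags) - 1)

# A bursty trace: the same tag tends to arrive several times in a row.
trace = [3, 3, 3, 7, 7, 3]
print(last_tag_hit_ratio(trace))  # 0.6
```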
- C489: Implementing a Dynamic Processor Allocation Policy
for Multiprogrammed Parallel Applications in the Solaris TM Operating
System
- Abstract: Parallel applications typically do not perform
well in a multiprogrammed environment that uses time-sharing to allocate
processor resources to the applications' parallel threads. Coscheduling related
parallel threads, or statically partitioning the system, often can reduce the
applications' execution times, but at the expense of reducing overall
system utilization. To address this problem, there has been increasing interest
in dynamically allocating processors to applications based on their resource
demands and the dynamically varying system load. The Loop-Level Process Control
(LLPC) policy [16] dynamically adjusts the number of threads an application is
allowed to execute based on the application's available parallelism and the
overall system load. This study demonstrates the feasibility of incorporating
the LLPC strategy into an existing commercial operating system and
parallelizing compiler, and provides further evidence of the performance
improvement that is possible using this dynamic allocation strategy. In this
implementation, applications are automatically parallelized and enhanced with
the appropriate LLPC hooks so that each application interacts with the modified
version of the Solaris operating system. The parallelism of the applications
is then adjusted dynamically and automatically when they are executed in a
multiprogrammed environment, so that all applications obtain a fair share of
the total processing resources.
- Kelvin K. Yue and David J. Lilja
- Sun Microsystems and University of Minnesota
- Email:lilja@ece.umn.edu
- Submitted October 10, 2000, Accepted February 10 2001
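The flavour of such a dynamic allocation rule can be sketched as follows; the fair-share formula here is our illustrative assumption, not the LLPC policy published in the paper:

```python
def thread_allowance(available_parallelism, total_cpus, active_jobs):
    """Sketch of a dynamic processor-allocation rule in the spirit of LLPC:
    an application may use its fair share of the processors, capped by its
    own available loop-level parallelism, but never fewer than one thread."""
    fair_share = max(total_cpus // max(active_jobs, 1), 1)
    return min(max(available_parallelism, 1), fair_share)

# Two jobs on an 8-CPU machine: each may run up to 4 threads,
# unless its own available parallelism is smaller.
print(thread_allowance(16, 8, 2))  # 4
print(thread_allowance(3, 8, 2))   # 3
```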
- C490: Lesser Bear: a Light-weight Process Library for
SMP Computers
- Abstract: We have designed and implemented a light-weight
process (thread) library called Lesser Bear for SMP
computers. Lesser Bear has high portability and thread-level
parallelism. Creating UNIX processes as virtual processors and a memory-mapped
file as a huge shared-memory space enables Lesser Bear to execute threads in
parallel. Lesser Bear requires exclusive operation between peer virtual
processors, and treats the shared-memory space as a critical section for
synchronization of threads. Therefore, the thread functions of the previous
Lesser Bear are serialized. In this paper, we present a scheduling mechanism to
execute thread functions in parallel. In the design of the proposed mechanism,
we divide the entire shared-memory space into partial spaces for virtual
processors, and prepare two queues (a Protect Queue and a Waiver Queue) for
each partial space. We adopt an algorithm in which lock operations are not
necessary for enqueueing. This algorithm allows us to propose a scheduling
mechanism that reduces the scheduling overhead. The mechanism is applied to
Lesser Bear and evaluated by experimental results.
- Hisashi Oguma and Yasuichi Nakayama
- The University of Electro-Communications, Tokyo
- Email:oguma-h@igo.cs.uec.ac.jp
- Submitted October 24, 2000
- Comments to Authors March 13 2001, Accepted April 23 2001
- C492: PoLAPACK: Parallel Factorization Routines with
Algorithmic Blocking
- Abstract: LU, QR, and Cholesky factorizations are the most
widely used methods for solving dense linear systems of equations, and have
been extensively studied and implemented on vector and parallel computers. Most
of these factorization routines are implemented with block-partitioned
algorithms in order to perform matrix-matrix operations, that is, to obtain the
highest performance by maximizing reuse of data in the upper levels of memory,
such as cache. Since parallel computers have different ratios of computation to
communication performance, the optimal computational block sizes differ from
one machine to another in generating the maximum performance of an algorithm.
Therefore, the data matrix should be distributed with the machine-specific
optimal block size before the computation. Too small or too large a block size
makes getting good performance on a machine nearly impossible. In such a case,
obtaining better performance may require a complete redistribution of the data
matrix. In this paper, we present parallel LU, QR, and Cholesky factorization
routines with "algorithmic blocking" on a 2-dimensional block-cyclic data
distribution. With algorithmic blocking, it is possible to obtain near-optimal
performance irrespective of the physical block size. The routines are
implemented on the Intel Paragon and the SGI/Cray T3E and compared with the
corresponding ScaLAPACK factorization routines.
- Jaeyoung Choi
- School of Computing Soongsil University 1-1, Sangdo-Dong,
Dongjak-Ku Seoul 156-743, KOREA
- Email:choi@comp.soongsil.ac.kr
- Submitted November 14, 2000
- Comments to Authors February 4 2001, Accepted March 12 2001
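The block-cyclic distribution the abstract refers to maps blocks of the matrix to processes in round-robin fashion; a minimal 1D sketch (ScaLAPACK's 2D distribution applies this rule independently to row and column indices; the function name is ours):

```python
def block_cyclic_owner(i, block_size, num_procs):
    """Owner process of global index i under a 1D block-cyclic
    distribution: consecutive blocks of `block_size` indices are dealt
    out to processes round-robin."""
    return (i // block_size) % num_procs

# Global indices 0..7, block size 2, 2 processes: blocks alternate 0,1,0,1.
print([block_cyclic_owner(i, 2, 2) for i in range(8)])  # [0, 0, 1, 1, 0, 0, 1, 1]
```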
- C493: Space-Time Tradeoffs for Parallel 3D Reconstruction
Algorithms for Virus Structure Determination
- Abstract: The 3D electron density determination of viruses,
from experimental data provided by electron microscopy, is a data-intensive
computation that requires the use of clusters of PCs or parallel computers. In
this paper we report on three parallel algorithms used for 3D reconstruction
of asymmetric objects from their 2D projections. We discuss their
computational, communication, I/O, and space requirements and present some
performance data. The algorithms are general and can be used for the 3D
reconstruction of asymmetric objects in applications other than structural
biology.
- Dan C. Marinescu, Yongchang Ji, and Robert E. Lynch
- Department of Computer Sciences Purdue University West Lafayette,
IN 47907
- Email:[dcm, ji,
rel]@cs.purdue.edu
- Submitted November 17, 2000
- Comments to Authors March 13 2001, Accepted April 8 2001
- C494: A New Parallel Domain Decomposition Method for the
Adaptive Finite Element Solution of Elliptic Partial Differential Equations
- Abstract: We present a new domain decomposition algorithm
for the parallel finite element solution of elliptic partial differential
equations. As with most parallel domain decomposition methods, each processor is
assigned one or more subdomains, and an iteration is devised which allows the
processors to solve their own subproblem(s) concurrently. The novel feature of
this algorithm, however, is that each of these subproblems is defined over the
entire domain -- although the vast majority of the degrees of freedom for each
subproblem are associated with a single subdomain (owned by the corresponding
processor). This ensures that a global mechanism is contained within each of
the subproblems tackled and so no separate coarse grid solve is required in
order to achieve rapid convergence of the overall iteration. Furthermore, by
following the paradigm introduced in [5], it is demonstrated that this domain
decomposition solver may be coupled easily with a conventional mesh refinement
code, thus allowing the accuracy, reliability and efficiency of mesh adaptivity
to be utilized in a well load-balanced manner. Finally, numerical evidence is
presented which suggests that this technique has significant potential, both in
terms of the rapid convergence properties and the efficiency of the parallel
implementation.
- Randolph E. Bank and Peter K. Jimack
- Dept. of Mathematics, University of California at San Diego, La
Jolla, CA 92093, USA; School of Computer Studies, University of Leeds, Leeds
LS2 9JT, UK.
- Email: pkj@comp.leeds.ac.uk
- Received 19 May 2000, Accepted December 7 2000
- C495: Development and Performance Analysis of Real-World
Applications for Distributed and Parallel Architectures
- Abstract: Several large real-world applications have been
developed for distributed and parallel architectures. We examine two different
program-development approaches. First, the use of a high-level programming
paradigm reduces the time to create a parallel program dramatically, but
sometimes at the cost of reduced performance; a source-to-source compiler
has been employed to automatically compile programs written in a
high-level programming paradigm into message-passing codes. Second,
manual program development using a low-level programming paradigm
such as message passing enables the programmer to fully exploit a given
architecture, at the cost of a time-consuming and error-prone effort.
Performance tools play a central role in supporting the performance-oriented
development of applications for distributed and parallel architectures. SCALA,
a portable instrumentation, measurement, and post-execution performance
analysis system for distributed and parallel programs, has been used to
analyse and guide the application development by selectively instrumenting
and measuring the code versions, by comparing performance information from
several program executions, by computing a variety of important performance
metrics, by detecting performance bottlenecks, and by relating performance
information back to the input program. We present several experiments with
SCALA applied to real-world applications. These experiments were conducted on a
NEC Cenju-4 distributed-memory machine and a cluster of heterogeneous
workstations and networks.
- T. Fahringer, P. Blaha, A. Hossinger, J. Luitz, E. Mehofer, H.
Moritsch, B. Scholz
- Institute for Software Science, University of Vienna
Liechtensteinstrasse 22, A-1092, Vienna, Austria. Also Vienna University of
Technology and Dept. of Business administration at University of Vienna
- Email: tf@par.univie.ac.at
- Received August 1 1999, Revised November 20 2000, Accepted
December 18 2000
- C496: Open Consensus
- Abstract: This paper presents the abstraction of open
consensus and argues for its use as an effective component for building
reliable agreement protocols in practical asynchronous systems where processes
and links can crash and recover. The specification of open consensus has a
decoupled, on-demand, and re-entrant behaviour that makes its use very
efficient, especially in terms of forced logs, which are known to be major
sources of overhead in distributed systems. We illustrate the use of open
consensus as a basic building block in developing a modular, yet efficient,
total-order broadcast protocol. Finally, we describe our Java implementation of
the open consensus abstraction and support our efficiency claims with
practical performance measures.
- Romain Boichat, Svend Frølund, Rachid Guerraoui
- Swiss Federal Institute of Technology, CH 1015, Lausanne and
Hewlett-Packard Laboratories, 1501 Page Mill Rd, Palo Alto
- Email: Romain.Boichat@epfl.ch
- Received January 17 2001, Comments to Authors May 8 2001,
Accepted May 19 2001
- C497: Java for High-Performance Computing
- Abstract: There has been an increasing research interest in
extending the use of Java towards high-performance demanding applications such
as scalable web servers, multimedia applications, and large-scale scientific
applications. However, given the low performance of current Java
implementations, these application domains pose new challenges to both the
application designer and systems developer. In this paper we describe and
classify several important proposals and environments that tackle Java's
performance bottlenecks in order to make the language an effective option for
high-performance computing. We further survey the most significant performance
issues while exposing the potential benefits and limitations of current
solutions in such a way that a framework for future research efforts can be
established. We show that most of the proposed solutions can be classified
according to some combination of three basic parameters: the model adopted
for inter-process communication, language extensions, and the implementation
strategy. In addition, we examine other relevant issues, such as
interoperability, portability, and garbage collection.
- M. Lobosco, C. Amorim, and O. Loques
- Instituto de Computação Universidade Federal
Fluminense Niterói, Rio de Janeiro, Brazil
- Email: loques@ic.uff.br
- Received January 15 2001, Comments to authors May 28 2001,
Accepted July 24 2001
- C498: Simulating asynchronous hardware on multiprocessor
platforms: the case of AMULET1
- Abstract: Synchronous VLSI design is approaching a critical
point, with clock distribution becoming an increasingly costly and complicated
issue and power consumption rapidly emerging as a major concern. Hence,
recently, there has been a resurgence of interest in asynchronous digital
design techniques which promise to liberate digital design from the inherent
problems of synchronous systems. This activity has revealed a need for
modelling and simulation techniques suitable for the asynchronous design style.
The concurrent process algebra Communicating Sequential Processes (CSP) and its
executable counterpart, occam, are increasingly advocated as particularly
suitable for this purpose. This paper focuses on issues related to the
execution of CSP/occam models of asynchronous hardware on multiprocessor
machines, including I/O, monitoring and debugging, partitioning, mapping and
load balancing. These issues are addressed in the context of occarm, an occam
simulation model of the AMULET1 asynchronous microprocessor; however, the
solutions devised are more general and may be applied to other systems
too.
- Georgios K. Theodoropoulos
- School of Computer Science, The University of Birmingham,
Birmingham B15 2TT, U.K.
- Email: gkt@cs.bham.ac.uk
- Received Oct 18 2000, Accepted February 22 2001
- C499: Data Movement and Control Substrate for Parallel
Adaptive Applications
- Abstract: In this paper, we present the Data Movement and
Control Substrate (DMCS), which implements one-sided, low-latency communication
primitives for parallel adaptive and irregular computations. DMCS is built on
top of low-level, vendor-specific communication subsystems such as LAPI for IBM
SP machines, and on widely available libraries like MPI for clusters of
workstations and PCs. DMCS adds a small overhead to the communication
operations provided by the lower communication system. In return, DMCS provides
a flexible and easy-to-understand application program interface for one-sided
communication operations, including put-ops and remote procedure calls.
Furthermore, DMCS is designed so that it can be easily ported and maintained by
non-experts.
- Jeffrey Dobbelaere, Kevin Barker, Nikos Chrisochoides, Demian
Naves, and Keshav Pingali
- Computer Science Department College of William & Mary
Williamsburg, VA 23185.
- Computer Science Department Cornell University Ithaca, NY
14853-3801
- Email: nikos@plato.cs.wm.edu
- Received 11 March 2001, Comments to Authors July 24 2001,
Accepted October 28 2001
- C500: (ACES Special Issue) Parallel visualization of
gigabyte datasets in GeoFEM
- Abstract: Parallel visualization of large datasets in
GeoFEM is described. Our visualization subsystem supports concurrent
visualization with computation, and outputs a simplified small graphic
primitive set, which can be displayed by a basic viewer on clients. This
subsystem provides many kinds of parallel visualization algorithms for
users to visualize their data, from scalar and vector fields to tensors.
Scalar-field topology analysis, flow-field topology analysis, and
semi-automatic parameter design are
employed to improve the quality of visualization results. We also present a
simplification algorithm to reduce the number of output graphic primitives,
accounting for both shape attribute and color attribute. The experimental
results show the effectiveness of our subsystem.
- Issei Fujishiro, Yuriko Takeshima, Li Chen, Hiroko Nakamura, and
Yasuko Suzuki
- Ochanomizu University, Tokyo, Japan
- Research Organization for Information Science & Technology,
Tokyo, Japan
- State Key Lab. of CAD&CG, Zhejiang University, Hangzhou,
310027, P.R. China.
- Email: chen@tokyo.rist.or.jp
- Received 26 February 2001, Comments to Authors August 5 2001,
Accepted November 9 2001
- C501: (ACES Special Issue) Parallelization of a Large-Scale
Computational Earthquake Simulation Program
- Abstract: Here we detail both the methods and preliminary
results of first efforts to parallelize two General Earthquake Model
(GEM)-related codes: 1) a relatively simple data mining procedure based on a
Genetic Algorithm; and 2) the Virtual California simulation of GEM. These
preliminary results, using a simple, heterogeneous processor system, existing
freeware, and with an extremely low cost of both manpower and hardware dollars,
motivate us to more ambitious work with considerably larger-scale computer
earthquake simulations of southern California. The GEM computational problem,
which is essentially a Monte Carlo simulation, is well suited to optimization
on parallel computers, and we outline how we are proceeding in implementing
this new software architecture.
- K.F. Tiampo , J.B. Rundle , S. Gross and S. McGinnis
- Colorado Center for Chaos & Complexity, CIRES, University of
Colorado, Boulder, CO, 80309, USA
- Email: kristy@caldera.colorado.edu
- Received 23 February 2001, Comments to Authors August 5 2001,
Accepted November 3 2001
- C502: (ACES Special Issue) Parallel Iterative Solvers for
Unstructured Grids using Directive/MPI Hybrid Programming Model for GeoFEM
Platform on SMP Cluster Architectures
- Abstract: In this study, an efficient parallel iterative
method for unstructured grids has been developed for SMP (shared-memory
symmetric multiprocessor) cluster architectures on the GeoFEM platform. A
3-level hybrid parallel programming model has been applied: 1. message passing
for inter-SMP-node communication, 2. loop directives for intra-SMP-node
parallelization, and 3. vectorization for each PE (processing element). Simple
3D elastic linear problems with more than 10^8 DOFs have been solved by 3x3
block ICCG(0) with additive Schwarz domain decomposition and PDJDS/CM-RCM
(Parallel Descending-order Jagged Diagonal Storage / Cyclic Multicolor-Reverse
Cuthill-McKee) re-ordering on 16 SMP nodes of a Hitachi SR8000, and 20 GFLOPS
performance has been obtained. The PDJDS/CM-RCM reordering method provides
excellent vector and parallel performance within SMP nodes. This re-ordering is
essential for parallelization of the forward/backward substitution in IC/ILU
factorization with global data dependency. The developed method was also tested
on a NEC SX-4 and attained 969 MFLOPS (48.5% of peak performance) using a
single processor. The additive Schwarz domain decomposition method provided
robustness to GeoFEM's parallel iterative solvers with localized
preconditioning.
- Kengo Nakajima and Hiroshi Okuda
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Department of Quantum Engineering and Systems Science, The
University of Tokyo, Tokyo, Japan
- Email: nakajima@tokyo.rist.or.jp
- Received 27 February 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
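The Cuthill-McKee family of reorderings used above can be illustrated with a minimal sketch. The snippet below implements plain Reverse Cuthill-McKee on a sparsity graph; the paper's CM-RCM scheme additionally applies cyclic multicoloring for parallelism, which is not shown, and all function names here are ours.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Reverse Cuthill-McKee ordering of a symmetric sparsity graph.

    adj: dict mapping node -> set of neighbour nodes.
    Returns a permutation (list of nodes) that tends to reduce bandwidth.
    """
    visited = set()
    order = []
    # Handle each connected component, starting from a minimum-degree node.
    for start in sorted(adj, key=lambda n: len(adj[n])):
        if start in visited:
            continue
        queue = deque([start])
        visited.add(start)
        while queue:
            node = queue.popleft()
            order.append(node)
            # Visit unvisited neighbours in order of increasing degree.
            for nbr in sorted(adj[node] - visited, key=lambda n: len(adj[n])):
                visited.add(nbr)
                queue.append(nbr)
    order.reverse()  # the "reverse" step of RCM
    return order

# A path graph 0-2-4-3-1 with scrambled labels: RCM recovers a banded order.
adj = {0: {2}, 2: {0, 4}, 4: {2, 3}, 3: {4, 1}, 1: {3}}
order = reverse_cuthill_mckee(adj)
```

After reordering, every edge connects positions that are adjacent in `order`, i.e. the reordered matrix is tridiagonal.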
- C503: (ACES Special Issue)Finite Element Modeling of
Multibody Contact and Its Application to Active Faults
- Abstract: Earthquakes have been recognized as resulting
from a stick-slip frictional instability along faults between deformable
rocks. An arbitrarily shaped contact element strategy, named the node-to-point
contact element strategy, is proposed to handle frictional contact between
deformable bodies with stick and finite frictional slip, and is extended to
simulate active faults in the crust with a more general nonlinear friction
law. Also introduced is an efficient contact search algorithm for contact
problems among multiple small- and finite-deformation bodies. Moreover, the
efficiency of a parallel sparse solver for the nonlinear frictional contact
problem is investigated. Finally, a model of plate movement in the northeast
zone of Japan under gravitation is analyzed with different friction behaviors
as an example.
- H.L. Xing and A. Makinouchi
- Materials Fabrication Lab, The Institute of Physical and Chemical
Research (RIKEN) 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Email: xing@postman.riken.go.jp
- Received 28 February 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
- C504: (ACES Special Issue) Effective Adaptation Technique
for Hexahedral Mesh
- Abstract: This paper describes simple and effective
adaptive techniques for the finite element method. The proposed refinement
method is based on a modified octree division, which can be applied to an
arbitrary hexahedral unstructured mesh. Hex-R is the name of our
implementation; it refines the mesh in cooperation with a finite element
viewer, GPPView. The refined mesh retains the topology of the original coarse
mesh, so the proposed method can easily be used for multigrid generation.
Finally, the construction of an adaptive method using Hex-R is described.
- Yoshitaka Wada
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Email: wada@tokyo.rist.or.jp
- Received March 2 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
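The octree division underlying such refinement can be sketched as a recursive 1-to-8 split of axis-aligned hexahedral cells. This is an illustrative toy (the paper's modified octree works on arbitrary unstructured hexahedral meshes while preserving parent topology); the function names and the refinement flag are our assumptions.

```python
def subdivide_hex(cell):
    """Split an axis-aligned hexahedral cell into 8 children (one octree step).

    cell: ((x0, y0, z0), (x1, y1, z1)) -- two opposite corners.
    Returns the list of 8 child cells in the same representation.
    """
    (x0, y0, z0), (x1, y1, z1) = cell
    xs = (x0, (x0 + x1) / 2, x1)
    ys = (y0, (y0 + y1) / 2, y1)
    zs = (z0, (z0 + z1) / 2, z1)
    return [((xs[i], ys[j], zs[k]), (xs[i + 1], ys[j + 1], zs[k + 1]))
            for i in (0, 1) for j in (0, 1) for k in (0, 1)]

def refine(cells, needs_refinement, depth):
    """Recursively refine cells flagged by the predicate, `depth` times."""
    for _ in range(depth):
        next_cells = []
        for c in cells:
            next_cells.extend(subdivide_hex(c) if needs_refinement(c) else [c])
        cells = next_cells
    return cells

# Refine (twice) every cell touching the origin corner of the unit cube.
root = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
flag = lambda c: c[0] == (0.0, 0.0, 0.0)
mesh = refine([root], flag, 2)
```

Each level keeps the unrefined siblings intact, so the parent-child hierarchy needed for multigrid transfer operators is available for free.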
- C505: (ACES Special Issue)Thermal Convection Analysis in a
Rotating Shell by a Parallel FEM - Development of a Thermal-Hydraulic Subsystem
of GeoFEM
- Abstract: We carry out a numerical simulation of thermally
driven convection in a rotating spherical shell modeled on the Earth's outer
core, using the Thermal-Hydraulic subsystem of GeoFEM, which provides a
parallel FEM platform. The simulation aims at understanding the origin of the
geomagnetic field and the dynamics of the fluid in the Earth's outer core. We
solve the three-dimensional, time-dependent evolution of a Boussinesq fluid in
a rotating spherical shell under the effects of self-gravity and the Coriolis
force. Tri-linear hexahedral elements are used for the spatial discretization,
and the 2nd-order Adams-Bashforth scheme is applied for time integration of the
temperature and velocity. To satisfy mass conservation, a parallel iterative
solver of GeoFEM is applied to solve for the pressure and the correction of the
velocity field. To verify our simulation code, the results are compared with
the same analysis performed with a spectral method. The overall convection
pattern is almost the same in both cases: three pairs of convection columns are
formed, and these columns propagate westward in a quasi-steady state. However,
the kinetic energy averaged over the shell is about 93% of that obtained with
the spectral method, and the drift frequency of the columns in the GeoFEM case
is larger.
- Hiroaki Matsui and Hiroshi Okuda
- Department of Research for Computational Earth Science, Research
Organization for Information Science & Technology (RIST), Tokyo
- Department of Quantum Engineering and System Science, The
University of Tokyo, Tokyo, Japan
- Email: matsui@tokyo.rist.or.jp
- Received March 3 2001, Comments to Authors August 5 2001,
Accepted October 26 2001
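The 2nd-order Adams-Bashforth time integration mentioned above has a compact generic form. The sketch below applies it to a scalar test equation; it illustrates the scheme itself, not the GeoFEM solver, and the function name is ours.

```python
import math

def ab2_integrate(f, y0, dt, steps):
    """Advance y' = f(y) with the 2nd-order Adams-Bashforth scheme.

    AB2: y_{n+1} = y_n + dt * (3/2 * f(y_n) - 1/2 * f(y_{n-1})).
    The first step is bootstrapped with forward Euler.
    """
    y = y0
    f_prev = f(y)
    y = y + dt * f_prev               # Euler start-up step
    for _ in range(steps - 1):
        f_curr = f(y)
        y = y + dt * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
    return y

# Exponential decay y' = -y, y(0) = 1; exact solution is exp(-t).
y_end = ab2_integrate(lambda y: -y, 1.0, 0.001, 1000)
```

Being explicit and needing only one extra stored right-hand side per field, AB2 is a common choice for the advection terms of time-dependent flow solvers.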
- C506: (ACES Special Issue) Performance Optimization of
GeoFEM on Various Computer Architecture
- Abstract: Research carried out by the GeoFEM team suggests
that GeoFEM can achieve good parallel performance on various computer
architectures. In this work we focus on single-processor performance, and
investigate a common data structure and coding style that allow GeoFEM to run
at high performance across architectures.
Test coding for the solver part of the structure and fluid
analysis codes: the data structure and coding style were evaluated on scalar,
vector, and pseudo-vector architectures. A new data structure and a new direct
access scheme were introduced into the fluid analysis solver. As a result, 21%
of peak performance was obtained on a pseudo-vector architecture, scalar
execution matched the pseudo-vector execution performance, and 25% of peak
performance was obtained on a vector architecture. A new direct access scheme
was also introduced into the structure code solver, yielding 27% of peak
performance on a pseudo-vector architecture and 34% of peak performance on a
vector architecture.
Test code for the matrix assembly part of the structure analysis
code: coding to remove dependencies was completed and performance was evaluated
on vector and scalar machines. 736.8 MFlops was obtained for the matrix
assembly process on the SX-4, 900.7 MFlops for the whole test code on the SX-4,
and 2.06 GFlops on the VPP5000 (peak: 9.6 GFlops). 124 MFlops was obtained for
the matrix assembly process on an Alpha system (Alpha 21164, 533 MHz).
- Kazuo Minami
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Email: minami@tokyo.rist.or.jp
- Received March 5 2001, Comments to Authors August 5 2001,
Accepted October 27 2001
- C507: (ACES Special Issue)Parallel Multilevel Iterative
Linear Solvers with Unstructured Adaptive Grids for Simulations in Earth
Science
- Abstract: Preconditioned iterative solvers based on IC
(Incomplete Cholesky) or ILU (Incomplete LU) factorization are among the most
powerful choices for large-scale scientific computation. In these methods,
however, the number of iterations until convergence increases as the problem
size grows. In multigrid solvers the rate of convergence is independent of
problem size, so the number of iterations remains fairly constant; multigrid is
also a good preconditioner for Krylov iterative solvers. In this study, a
multigrid-preconditioned conjugate gradient iterative method (MGCG) for
parallel computers has been developed and applied to the Poisson equation in
the region between dual spherical surfaces, on semi-unstructured prismatic
grids generated adaptively. Moreover, the procedure has been applied to grids
with local refinement. Computational results on a Hitachi SR2201 using up to
128 processors show good scalability of the developed method.
- Kengo Nakajima
- Department of Computational Earth Sciences, Research Organization
for Information Science and Technology (RIST), Tokyo, Japan
- Email: nakajima@tokyo.rist.or.jp
- Received 19 March 2001, Comments to Authors August 5 2001,
Accepted October 27 2001
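The MGCG idea — a multigrid cycle used as the preconditioner inside conjugate gradients — can be sketched on a 1-D Poisson model problem. This is an illustrative serial two-grid version (weighted-Jacobi smoothing, full-weighting restriction, linear interpolation, direct coarse solve), not the paper's parallel adaptive-grid implementation; all names are ours.

```python
import math

def amul(v):
    """y = A v for the 1-D Poisson matrix A = tridiag(-1, 2, -1)."""
    n = len(v)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0)
                       - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def jacobi(v, b, sweeps=2, w=2.0 / 3.0):
    """Weighted-Jacobi smoothing sweeps on A v = b (diag(A) = 2)."""
    for _ in range(sweeps):
        r = [bi - yi for bi, yi in zip(b, amul(v))]
        v = [vi + w * ri / 2.0 for vi, ri in zip(v, r)]
    return v

def restrict(r):
    """Full-weighting restriction: fine size 2*nc + 1 -> coarse size nc."""
    nc = (len(r) - 1) // 2
    return [(r[2*j] + 2.0 * r[2*j + 1] + r[2*j + 2]) / 4.0 for j in range(nc)]

def prolong(e, n):
    """Linear interpolation from coarse size nc back to fine size n."""
    v = [0.0] * n
    for j, ej in enumerate(e):
        v[2*j + 1] += ej
        v[2*j] += 0.5 * ej
        v[2*j + 2] += 0.5 * ej
    return v

def coarse_solve(rc):
    """Thomas solve of the Galerkin coarse operator (1/4)*tridiag(-1, 2, -1)."""
    b = [4.0 * x for x in rc]           # absorb the 1/4 scaling
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -0.5, b[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + cp[i - 1]
        cp[i] = -1.0 / denom
        dp[i] = (b[i] + dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def two_grid(r):
    """One symmetric two-grid cycle, used as the preconditioner z = M^{-1} r."""
    n = len(r)
    v = jacobi([0.0] * n, r)                              # pre-smoothing
    res = [ri - yi for ri, yi in zip(r, amul(v))]
    corr = prolong(coarse_solve(restrict(res)), n)        # coarse correction
    v = [vi + ci for vi, ci in zip(v, corr)]
    return jacobi(v, r)                                   # post-smoothing

def mgcg(b, tol=1e-8, maxit=50):
    """Conjugate gradients preconditioned by the two-grid cycle."""
    x, r = [0.0] * len(b), list(b)
    z = two_grid(r)
    p, rz = list(z), sum(ri * zi for ri, zi in zip(r, z))
    for it in range(1, maxit + 1):
        ap = amul(p)
        alpha = rz / sum(pi * qi for pi, qi in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, ap)]
        if math.sqrt(sum(ri * ri for ri in r)) < tol:
            return x, it
        z = two_grid(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit

x, iters = mgcg([1.0] * 63)   # 63 fine points -> 31 coarse points
```

Because the two-grid cycle removes both smooth and oscillatory error components, the preconditioned iteration count stays small and (unlike IC/ILU-preconditioned CG) essentially independent of the grid size.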
- C508: Comparing Windows NT, Linux, and QNX as the Basis for
Cluster Systems
- Abstract:Clusters use commodity hardware and software
components to provide an environment for parallel processing. A major issue in
the development of a cluster system is the choice of the operating system that
will run on each node. We compare three alternatives: Windows NT, Linux, and
QNX, a real-time microkernel. The comparison is based on expressive
power, performance, and ease-of-use metrics. The result is that none of these
systems has a clear advantage over the others in all the metrics, but that each
has its strong and weak points. Thus any choice of a base system will involve
some technical compromises, but not major ones.
- Avi Kavas, Dror G. Feitelson
- School of Computer Science and Engineering The Hebrew University,
91904 Jerusalem, Israel
- Email: feit@cs.huji.ac.il
- Received 11 February 2001, Comments to authors May 28 2001,
Accepted July 24 2001
- C509: (ACES Special Issue)Parallel Simulation System for
Earthquake Generation: Fault Analysis Modules and Parallel Coupling
Analysis
- Abstract: Solid earth simulations have recently been
developed to address issues such as natural disasters, global environmental
destruction, and the conservation of natural resources. The simulation of solid
earth phenomena involves the analysis of complex structures including strata,
faults, and heterogeneous material properties. Simulation of the generation and
cycle of earthquakes is a particularly important solid earth phenomenon, but it
requires the analysis of complex fault dynamics. GeoFEM (Iizuka et al., 1999)
is a parallel finite element analysis system intended for problems of solid
earth field phenomena. This paper presents recent GeoFEM research on the
simulation of earthquake generation and cycles.
Key words: Solid earth simulations, Generation and
cycle of earthquakes, GeoFEM, Parallel finite element analysis.
- Mikio Iizuka, Daigo Sekita, Hisashi Suito, Mamoru Hyodo, Karzuro
Hirahara, David Place, Peter Mora, Osamu Hazama, and Hiroshi Okuda
- Organization for Information Science & Technology; Mitsubishi
Research Institute, Inc.; Earth and Planetary Sciences, Nagoya University;
QUAKES, Department of Earth Sciences, The University of Queensland; Yokohama
National University; Department of Quantum Engineering and Systems Science, The
University of Tokyo
- Email: iizuka@tokyo.rist.or.jp
- Received 23 April 2001, Comments to Authors August 5 2001,
Accepted October 27 2001
- C514: Parallel Dynamic Scheduling in a Cluster of
Computers
- Abstract: The parallelization of complex planning and
control problems arising in diverse application areas in industrial, service,
and commercial environments not only allows the determination of control
variables in the required time but also improves the performance of the control
procedure as more processors are involved in the execution of the parallel
program. In this paper we describe a scheduling application in a water supply
network to demonstrate the benefits of parallel processing. The procedure we
propose combines dynamic programming with genetic algorithms and time series
prediction in order to solve problems in which decisions are made in stages and
the states and controls belong to a continuous space. Taking into account the
computational complexity of these applications and the time constraints that
are usually imposed, the procedure has been implemented as a parallel program
on a cluster of computers, an inexpensive and widespread platform that can make
parallelism a practical means of tackling complex problems in many different
environments.
- M. Damas, M. Salmeron, J. Ortega, G. Olivares, H. Pomares
- Department of Computer Architecture and Computer Technology,
University of Granada. Facultad de Ciencias. Campus Fuentenueva s/n. E-18071,
Granada, Spain
- Email: mdamas@atc.ugr.es
- Received 13 November 2000, Comments to authors June 2 2001,
Accepted July 24 2001
- C515: A Parallel Algorithm for Static Slicing of Concurrent
Programs
- Abstract:Slicing of concurrent programs is a
compute-intensive task. To speed up the slicing process, we develop a parallel
algorithm. For this purpose we use a Concurrent Control Flow Graph (CCFG) as
the intermediate representation. We use a network of communicating processes to
develop our parallel algorithm. We have implemented our parallel algorithm and
the experimental results appear promising.
- D. Goswami, R. Mall
- Department of Computer Science and Engineering, Indian Institute
of Technology, Kharagpur INDIA
- Email: diganta@cse.iitkgp.ernet.in
- Received 10 May 2001; comments to Authors August 5 2001
- C516: An Analysis of VI Architecture Primitives in Support
of Parallel and Distributed Communication
- Abstract:We present the results of a detailed study of the
Virtual Interface (VI) paradigm as a communication foundation for a distributed
computing environment. Using Active Messages and the Split-C global memory
model, we analyze the inherent costs of using VI primitives to implement these
high-level communication abstractions. We demonstrate a minimum mapping cost
(i.e. the host processing required to map one abstraction to a lower
abstraction) of 5.4 microsec for both Active Messages and Split-C using 4-way
550 MHz Pentium III SMPs and the Myrinet network. We break down this cost into
the use of individual VI primitives supporting flow control, buffer management,
and event processing, and identify the completion queue as the source of the
highest overhead. Bulk transfer performance plateaus at 44 MB/sec for both
implementations due to the addition of fragmentation requirements. Based on
this analysis, we present the implications for the VI successor, Infiniband.
- Andrew Begel, Philip Buonadonna, David E. Culler, David Gay
- University of California, Berkeley
- Email: philipb@cs.berkeley.edu
- Received 9 May 2001; Comments to Authors August 19 2001, Accepted
September 15 2001
- C517: Automatic Determination of Matrix-Blocks
- Abstract:Many sparse matrices have a natural block
structure, for instance arising from the discretisation of a physical domain.
We give an algorithm for finding this block structure from the matrix sparsity
pattern. This algorithm can be used, for instance, in iterative solver
libraries in cases where the user does not or cannot pass this block structure
information to the software. The block structure can then be used as a basis
for domain-decomposition based preconditioners.
- Victor Eijkhout
- Department of Computer Science; University of Tennessee,
Knoxville, TN 37996
- Email: eijkhout@cs.utk.edu
- Received 5 June 2001, Comments to Authors October 28 2001
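One simple way to recover block structure from a sparsity pattern, in the spirit of the abstract, is to group consecutive rows that share the same set of nonzero columns, as happens when several unknowns live on each mesh point. This heuristic is our assumption for illustration, not the paper's algorithm.

```python
def find_row_blocks(pattern):
    """Group consecutive matrix rows that share a sparsity pattern.

    pattern: list of frozensets; pattern[i] = column indices of nonzeros
    in row i.  Returns a list of (start_row, block_size) pairs.
    """
    blocks = []
    i = 0
    while i < len(pattern):
        j = i + 1
        while j < len(pattern) and pattern[j] == pattern[i]:
            j += 1
        blocks.append((i, j - i))
        i = j
    return blocks

# A 6x6 matrix discretized with 2 unknowns per mesh point: rows pair up.
rows = [frozenset({0, 1, 2, 3}), frozenset({0, 1, 2, 3}),
        frozenset({0, 1, 2, 3, 4, 5}), frozenset({0, 1, 2, 3, 4, 5}),
        frozenset({2, 3, 4, 5}), frozenset({2, 3, 4, 5})]
blocks = find_row_blocks(rows)
```

The detected blocks can then be fed to a block-Jacobi or domain-decomposition preconditioner without the user having to describe the physics behind the matrix.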
- C518: A framework for high-performance matrix
multiplication based on hierarchical abstractions, algorithms and optimized
low-level kernels
- Abstract:Despite extensive research, optimal performance
has not easily been available previously for matrix multiplication (especially
for large matrices) on most architectures because of the lack of a structured
approach and the limitations imposed by matrix storage formats. A simple but
effective framework is presented here that lays the foundation for building
high-performance matrix-multiplication codes in a structured, portable and
efficient manner. The resulting codes are validated on three different
representative RISC and CISC architectures on which they significantly
outperform highly optimized libraries such as ATLAS and other competing
methodologies reported in the literature. The main component of the proposed
approach is a hierarchical storage format that efficiently generalizes the
applicability of the memory hierarchy friendly Morton ordering to
arbitrary-sized matrices. The storage format supports polyalgorithms, which are
shown here to be essential for obtaining the best possible performance for a
range of problem sizes. Several algorithmic advances are made in this paper,
including an oscillating iterative algorithm for matrix multiplication and a
variable recursion cutoff criterion for Strassen's algorithm. The authors
expose the need to standardize linear algebra kernel interfaces, distinct from
the BLAS, for writing portable high-performance code. These kernel routines
operate on small blocks that fit in the L1 cache. The performance advantages
of the proposed framework can be effectively delivered to new and existing
applications through the use of object-oriented or compiler-based
approaches.
- Vinod Valsalam and Anthony Skjellum
- High Performance Computing Laboratory, Department of Computer
Science, Mississippi State University, MS 39762.
- Email: tony@hpcl.cs.msstate.edu
- Received June 1 2001, Comments to Authors October 28 2001,
Accepted November 21 2001
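The Morton (Z-order) layout at the heart of the proposed storage format interleaves the bits of the row and column indices, so that small sub-blocks of the matrix are contiguous in memory. A minimal sketch (our names; square power-of-two matrices only, whereas the paper generalizes the ordering to arbitrary sizes):

```python
def morton_index(row, col, bits=16):
    """Interleave the bits of (row, col) to get the Morton (Z-order) index."""
    z = 0
    for b in range(bits):
        z |= ((row >> b) & 1) << (2 * b + 1)   # row bits go to odd positions
        z |= ((col >> b) & 1) << (2 * b)       # col bits go to even positions
    return z

def morton_layout(n):
    """Element traversal order of an n x n matrix stored in Morton order."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda ij: morton_index(*ij))

layout = morton_layout(4)
```

The first four elements of `layout` form the top-left 2x2 block, illustrating why Morton storage keeps cache-sized blocks adjacent at every level of the memory hierarchy.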
- C519: Parallel Implementation of the Fluid Particle Model
for Simulating Complex Fluids in the Mesoscale
- Abstract:Dissipative particle dynamics (DPD) and its
generalization - the fluid particle model (FPM) - represent the fluid
particle approach for simulating fluid-like behavior in the mesoscale.
Unlike particles from molecular dynamics (MD) method, the fluid
particle can be viewed as a droplet consisting of liquid
molecules. In FPM, fluid particles interact by both central and
non-central, short-range forces with conservative, dissipative and Brownian
character. In comparison to MD, the FPM method in 3-D requires two to three
times more memory and three times more communication overhead. The
computational load per step per particle is comparable to MD, because a shorter
interaction range is allowed between fluid particles than between MD atoms. The
classical linked-cells technique and decomposing the computational box into
strips allow for rapid modifications of the code and for implementing non-cubic
computational boxes. We show that the efficiency of the FPM code depends
strongly on the number of particles simulated, the geometry of the box, and the
computer architecture. We give a few examples from long FPM simulations
involving up to 8 million fluid particles and 32 processors. Results from 3-D
FPM simulations of phase separation in a binary fluid and dispersion of a
colloidal slab are presented. The scaling law for symmetric quenching in phase
separation is properly reconstructed. We also show that the microstructure of
the dispersed fluid depends strongly on the contrast between the kinematic
viscosities of this fluid phase and the bulk phase. This FPM code can be
applied to simulating mesoscopic flow dynamics in capillary pipes or critical
flow phenomena in narrow blood vessels.
- Krzysztof Boryczko, Witold Dzwinel, David A.Yuen
- AGH Institute of Computer Science, al. Mickiewicza 30, 30-059,
Kraków, Poland, Minnesota Supercomputer Institute, University of
Minnesota, Minneapolis, Minnesota 55415-1227,USA
- Email: dwitek@msi.umn.edu
- Received June 25 2001, Comments to Authors October 28 2001,
Accepted November 10 2001
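The classical linked-cells technique referred to above bins particles into cells no smaller than the interaction cutoff, so each particle only tests the 27 surrounding cells for partners. A minimal serial sketch with a periodic box follows; the names and the dict-based cell list are our assumptions (production codes, as in the paper, use linked lists and strip decompositions).

```python
from itertools import product

def build_cells(positions, box, cutoff):
    """Bin particles into cells whose edge is at least the cutoff length."""
    ncell = tuple(max(1, int(box[d] // cutoff)) for d in range(3))
    cells = {}
    for idx, p in enumerate(positions):
        key = tuple(min(int(p[d] / box[d] * ncell[d]), ncell[d] - 1)
                    for d in range(3))
        cells.setdefault(key, []).append(idx)
    return ncell, cells

def candidates(i, positions, box, cutoff, ncell, cells):
    """Interaction candidates of particle i: particles in the 27 nearby cells."""
    p = positions[i]
    key = tuple(min(int(p[d] / box[d] * ncell[d]), ncell[d] - 1)
                for d in range(3))
    out, seen = set(), set()
    for off in product((-1, 0, 1), repeat=3):
        nb = tuple((key[d] + off[d]) % ncell[d] for d in range(3))  # periodic
        if nb in seen:
            continue
        seen.add(nb)
        out.update(j for j in cells.get(nb, ()) if j != i)
    return sorted(out)

box, cutoff = (10.0, 10.0, 10.0), 1.0
positions = [(0.5, 0.5, 0.5), (1.2, 0.5, 0.5), (5.0, 5.0, 5.0)]
ncell, cells = build_cells(positions, box, cutoff)
```

This reduces the neighbour search from O(N^2) pair tests to O(N) with a constant set by the local particle density, which is what makes million-particle FPM runs feasible.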
- C520: (ACES Special Issue)Interaction between Seismogenic
Body and Force-Supplying Body and Precursor in Far Field
- Abstract: The seismogenic body and the force-supplying
wall rock are modeled, respectively, by a rock sample and a pressure-transferring
block in fracture tests. Their interaction is used to study precursors in the
far field and ultra-far field. Stress and strain observations are made on both.
The results show that mutation or disturbance precursors of strain appear in
the force-measuring sensor or on the pressure-transferring block before rock
fracture, in addition to the anomalous variations appearing at all observing
points on the sample. The observing points on the pressure-transferring block
are 25-90 cm away from the sample, 3-10 times the fracture size. This suggests
that precursor anomalies can appear at locations far from the seismogenic area.
The mechanism forming far-field precursors is studied in a preliminary way.
- Xu Zhaoyong, Mei Shirong, Yang Runhai, Wang Jinglai, Zhao
Jinming, Wang Bin, Hua Peizhong
- Seismological Bureau of Yunnan Province, Kunming, 650041, P. R.
of China, Center for Analysis and Prediction, CSB, Beijing, 100036, P. R. of
China
- Email: xuzhaoyong@netease.com
- Received Jan 12 2001, Withdrawn August 7 2001
- C521: A Java Extension Framework for Web-based
Simulation
- Abstract: We have designed a framework for web-based
simulation applications with a Java front-end. In this paper, we discuss the
architectural design and related considerations of this framework. We present
details of a multi-layer architecture, key components and their functions, and
the abstraction of features common to web-based simulation into this framework.
The framework is implemented in Java with an object-oriented design. It can
incorporate an arbitrary legacy simulation application from various domains
through configurable and extensible interfaces, and can be used to develop Java
web-based simulation services and applications.
- Tao Tang and Chu R. Wie
- State University of New York at Buffalo, Dept. of Electrical
Engineering, Bonner Hall, Buffalo, NY 14260
- Email: wie@eng.buffalo.edu
- Received 22 July 2001, Comments to Authors October 28 2001
- C522: WebSimMicro: Web-based Simulation of Microelectronic
Devices
- Abstract: We have developed a web-based Java front-end for
microelectronic device simulation applications, using a framework that we
previously proposed for extending legacy simulation applications to the web. We
call the result WebSimMicro: Web-based Simulation of Microelectronic devices.
In this paper, we discuss the architecture of WebSimMicro and the design
approach using the proposed framework. The WebSimMicro project contains a
series of semiconductor device simulation applets, including applets for
metal-oxide-semiconductor transistors and bipolar transistors, which can be
accessed on our website.
- Tao Tang and Chu R. Wie
- State University of New York at Buffalo, Dept. of Electrical
Engineering, Bonner Hall, Buffalo, NY 14260
- Email: wie@eng.buffalo.edu
- Received 22 July 2001, Comments to Authors October 28 2001
- C523: Towards a Common Development Framework for
Distributed Applications
- Abstract: The last decade has witnessed a convergence in
several concerns related to the development of distributed applications,
ranging from concurrency issues to architectural platforms. However, these
concerns are often addressed within the specificities of a particular
application area or a specific requirement such as performance. This paper is a
contribution towards defining a common development framework specifically for
distributed applications. It defines the salient characteristics of such
applications and discusses emerging concepts and techniques from a number of
disciplines, namely concurrency theory, real-time systems, distributed systems,
and parallel processing. It concludes with their possible role in an integrated
development framework suitable for distributed applications.
Author's Notes:
- The paper is more structured as a survey/tutorial than a
traditional research paper.
- Its main value is to broaden parallel processing researchers'
thinking into considering a much wider perspective which encompasses
distributed applications in general.
- It is neither exhaustive nor complete (it cannot be, given the
broad scope), but it has some value in explaining much of the jargon in current
use.
- It is quite long, but structured in a way that makes it easy
to skip sections for readers who are familiar with a particular research
area.
- Fethi A. Rabhi
- School of Information Systems, University of New South Wales,
Sydney 2052, Australia.
- Email: f.rabhi@unsw.edu.au
- Received 23 July 2001, Comments to Authors October 28 2001
- C524: Lesser Bear: a lightweight process library for SMP
computers - scheduling mechanism without a lock operation
- Abstract: We have designed and implemented a lightweight
process (thread) library called Lesser Bear for SMP computers.
Lesser Bear provides thread-level parallelism and high portability. It executes
threads in parallel by creating UNIX processes as virtual processors and a
memory-mapped file as a huge shared-memory space. To schedule threads in
parallel, the shared-memory space is divided into working spaces for each
virtual processor, and the ready queue is distributed among them. However, the
previous version of Lesser Bear sometimes required a lock operation for
dequeueing. We therefore propose a scheduling mechanism that does not require a
lock operation. To achieve this, the divided spaces form a link topology
through the queue, and a lock-free algorithm is used for the queue operations.
This mechanism has been applied to Lesser Bear and evaluated experimentally.
- Hisashi Oguma and Yasuichi Nakayama
- c/o Prof. Y. Nakayama, Department of Computer Science, The
University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585
JAPAN
- oguma-h@igo.cs.uec.ac.jp
- Received 30 July 2001, Comments to Authors October 28 2001,
Accepted November 22 2001
- C525: Object-Oriented Distributed Computing Based on Remote
Class Reference
- Abstract: Java RMI and Jini provide effective mechanisms
for implementing a distributed computing system. Recently, many numerical
libraries have been developed that take advantage of Java as an object-oriented
and portable language. The widely used client-server method limits the extent
to which the benefits of the object-oriented approach can be exploited, because
of the difficulties arising when a remote object is the argument or return
value of a remote or local method. In this paper this problem is solved by
introducing a data object that stores the data structure of the remote object
and its related access methods. Using this data object, the client can easily
instantiate a remote object and use it as the argument or return value of
either a local or remote method.
- Yan Huang, David W. Walker, and Omer F. Rana
- Department of Computer Science, Cardiff University, PO Box 916,
Cardiff, CF24 3XF, United Kingdom
- yan.huang@cs.cf.ac.uk
- Received August 2 2001, Comments to Authors November 11 2001
- C526: A stochastic load balancing algorithm for
i-Computing
- Abstract: This paper presents a stochastic dynamic load
balancing algorithm for Internet computing, a new type of distributed computing
involving heterogeneous workstations from different organizations on the
Internet. To model the practical environment, we assume the system comprises
heterogeneous, untrusted, and non-dedicated workstations connected by a
non-dedicated network. Our algorithm uses the product of the average processing
time and the queue length of system jobs as the load index. Dynamic
communication delay is included in the execution cost calculation. The transfer
policy and the location policy are combined in a stochastic algorithm. State
information is exchanged via information feedback and mutual updating.
Simulations demonstrate that our algorithm outperforms conventional approaches
over a wide range of system parameters. These results are confirmed by
empirical experiments after implementing the algorithms on the DJM global
virtual machine.
- Yuk-Yin WONG, Kwong-Sak LEUNG, Kin-Hong LEE
- Computing Services Centre City University of Hong Kong, Tat Chee
Avenue, KLN., Hong Kong; Department of Computer Science and Engineering Chinese
University of Hong Kong, Shatin, N.T., Hong Kong
- clevin.wong@cityu.edu.hk
- Received August 14 2001
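The load index described above — average processing time times queue length — is easy to state in code. The completion-cost model around it below (adding communication delay and a speed-scaled job cost) is our illustrative assumption, not the paper's exact formula.

```python
def load_index(avg_proc_time, queue_length):
    """Load index from the abstract: average processing time x queue length."""
    return avg_proc_time * queue_length

def pick_host(hosts, job_cost, delays):
    """Choose the host minimising an estimated completion cost.

    hosts:  {name: (avg_proc_time, queue_length)}
    delays: {name: current communication delay to that host}
    Cost = load index + transfer delay + job cost scaled by host speed
    (an assumed model for illustration).
    """
    def cost(name):
        t, q = hosts[name]
        return load_index(t, q) + delays[name] + job_cost * t
    return min(hosts, key=cost)

# A fast idle host beats a fast-but-loaded host and a slow idle one.
hosts = {"fast-idle": (1.0, 0), "fast-busy": (1.0, 8), "slow-idle": (4.0, 0)}
delays = {"fast-idle": 0.5, "fast-busy": 0.1, "slow-idle": 0.2}
best = pick_host(hosts, 2.0, delays)
```

Folding the communication delay into the cost, as the abstract describes, keeps the algorithm from shipping jobs to nominally light hosts behind slow links.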
- C527: Performance Comparison of CORBA and Message Queuing
under Diverse Network Constraints
- Abstract:Middleware provides standardised mechanisms that
can be used to simplify the construction of reliable and flexible distributed
applications over a network. Applying middleware to C3I problems can provide
simple integration of disparate information systems in the military
environment. Often, various middleware technologies are used as the
communication infrastructure and as a practical ease to the network-programming
problem. CORBA and message queuing are undoubtedly two major middleware
paradigms that are fundamentally distinct from one another. In order to exploit
middleware in the military environment, we need to understand the system
characteristics of these middleware technologies. A robust middleware
infrastructure is a necessary underpinning for effective distributed
collaboration across the range of stringent military networks. This paper
presents a performance comparison between these two technologies, under typical
test environments. We attempt to identify problems that arise from different
network conditions in the interaction between client and server. Our
experimental results provide guidance in assessing the feasibility of using
different middleware paradigms in various military computing environments.
- T. Andrew Au
- Defence Science and Technology Organisation, Australia.
- Andrew.Au@dsto.defence.gov.au
- Received October 14 2001(electronic version)
- C528: Managing Application Complexity in the SAMRAI
Object-Oriented Framework
- Abstract: A major challenge facing software libraries for
scientific computing is the ability to provide adequate flexibility to meet
sophisticated, diverse, and evolving application requirements. Object-oriented
design techniques are valuable tools for capturing characteristics of complex
applications in a software architecture. In this paper, we describe prominent
object-oriented features of the SAMRAI software library that have proven useful
in application development. SAMRAI is used in a variety of applications and has
demonstrated a substantial amount of code and design re-use across those
applications. This flexibility and extensibility is illustrated with three
different application codes. We emphasize two important features of our design.
First, we describe the composition of complex numerical algorithms from smaller
components which are usable in different applications. Second, we discuss the
extension of existing framework components to satisfy new application needs.
- Richard D. Hornung, Scott R. Kohn
- Center for Applied Scientific Computing, Lawrence Livermore
National Laboratory
- hornung@llnl.gov
- Received October 16 2001 (electronic version), Accepted for
Special Issue
- C529: Grid Services for Earthquake Science
- Abstract:We describe an information system architecture
for the ACES (Asia-Pacific Cooperation for Earthquake Simulation) community. It
addresses several key features of the field - simulations at multiple scales
that need to be coupled together; real-time and archival observational data,
which needs to be analyzed for patterns and linked to the simulations; a
variety of important algorithms including partial differential equation
solvers, particle dynamics, signal processing and data analysis; a natural
three dimensional space (plus time) setting for both visualization and
observations; and the linkage of the field to real-time events, both as an aid
to crisis management and to scientific discovery. We also address the need to support
education and research for a field whose computational sophistication is
increasing rapidly and spans a broad range. The information system assumes that
all significant data is defined by an XML layer which could be virtual but
whose existence ensures that all data is object-based and can be accessed and
searched in this form. The various capabilities needed by ACES are defined as
Grid Services, which are conformant with emerging standards and implemented
with different levels of fidelity and performance appropriate for the
application. Grid Services can be composed in a hierarchical fashion to address
complex problems. The real-time needs of the field are addressed by high
performance implementation of data transfer and simulation services; further
the environment is linked to real-time collaboration to support interactions
between scientists in geographically distant locations.
- Geoffrey Fox, Sung-Hoon Ko, Marlon Pierce, Ozgur Balsoy, Jake Kim,
Sangmi Lee, Kangseok Kim, Sangyoon Oh, Xi Rao, Mustafa Varank, Hasan Bulut,
Gurhan Gunduz, Xiaohong Qiu, Shrideep Pallickara, Ahmet Uyar, Choonhan Youn
- Florida State University, Indiana University, Syracuse
University
- Marlon.Pierce@wpafb.af.mil
- Received October 3 2001, Comments to Authors October 20 2001,
Accepted October 27 2001
- C530: The Virtual Service Grid: An Architecture for
Delivering High-End Network Services
- Abstract:This paper presents the design of a novel system
architecture, Virtual Service Grid (VSG), for delivering high performance
network services. The VSG is based on the concept of the Virtual Service which
provides location, replication, and fault transparency to clients accessing
remotely deployed high-end services. One of the novel features of the Virtual
Service is the ability to self-scale in response to client demand. The VSG
exploits network and service information to make adaptive dynamic replica
selection, creation, and deletion decisions. We describe the VSG architecture,
middleware, and replica management policies. We have deployed the VSG on a
wide-area Internet testbed to evaluate its performance. The results indicate
that the VSG can deliver efficient performance for a wide range of client
workloads, both in terms of reduced response time and in the utilization of
system resources.
- Jon B. Weissman and Byoung-Dai Lee
- Department of Computer Science and Engineering, University of
Minnesota,
- jon@cs.umn.edu
- Received October 16 2001 (electronic version), Comments to
authors Dec 2001
- C575: Advanced Concurrency Control in Java
- Abstract: Developing concurrent applications is not a
trivial task. As programs grow larger and become more complex, advanced
concurrency control mechanisms are needed to ensure that application
consistency is not compromised. Managing mutual exclusion on a per-object basis
is not sufficient to guarantee isolation of sets of semantically-related
actions. In this paper, we consider "atomic blocks", a simple and lightweight
concurrency control paradigm that enables arbitrary blocks of code to access
multiple shared objects in isolation. We evaluate various strategies for
implementing atomic blocks in Java, in such a way that concurrency control is
transparent to the programmer, isolation is preserved, and concurrency is
maximized. We discuss these concurrency control strategies and evaluate them in
terms of complexity and performance.
- Pascal A. Felber, Michael K. Reiter
- Bell Laboratories 600 Mountain Ave, Murray Hill, NJ 07974
- pascal@research.bell-labs.com
- Received October 1 2001, Comments to authors Dec 2001
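The "atomic blocks" idea described in C575 can be illustrated with a small Java sketch. Everything below (the class names and the global lock-ordering strategy) is a hypothetical illustration of one possible implementation strategy, not the authors' code: the block acquires the locks of every shared object it touches in a fixed global order, so a group of semantically related updates executes in isolation without risking deadlock.

```java
import java.util.Arrays;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: an "atomic block" emulated by locking all shared
// objects involved, in a fixed global order (by rank) to avoid deadlock.
public class AtomicBlocks {
    // Each shared object carries its own lock and a unique rank.
    public static class Shared {
        public final int rank;          // position in the global lock order
        public final ReentrantLock lock = new ReentrantLock();
        public int value;
        public Shared(int rank, int value) { this.rank = rank; this.value = value; }
    }

    // Run `body` with every listed object locked; locks are taken in rank order.
    public static void atomic(Runnable body, Shared... objects) {
        Shared[] ordered = objects.clone();
        Arrays.sort(ordered, (x, y) -> Integer.compare(x.rank, y.rank));
        for (Shared s : ordered) s.lock.lock();
        try {
            body.run();
        } finally {
            for (int i = ordered.length - 1; i >= 0; i--) ordered[i].lock.unlock();
        }
    }

    // Transfer between two "accounts" in isolation, regardless of argument order.
    public static void transfer(Shared from, Shared to, int amount) {
        atomic(() -> { from.value -= amount; to.value += amount; }, from, to);
    }

    public static void main(String[] args) throws InterruptedException {
        Shared a = new Shared(0, 100), b = new Shared(1, 0);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) transfer(b, a, 1); });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(a.value + b.value);  // total is conserved: 100
    }
}
```

Note that per-object `synchronized` alone would not give this guarantee: without covering both objects for the whole block, a reader could observe the debit before the credit.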
- C576: A Quality of Service-based Framework for Creating
Distributed Heterogeneous Software Components
- Abstract: Component-based software development offers a
promising solution for taming the complexity found in today's distributed
applications. Today's and future software systems will certainly require
combining heterogeneous software components that are geographically dispersed.
For the successful deployment of such a software system, it is necessary that
its realization, based on assembling heterogeneous components, not only meets
the functional requirements, but also satisfies the non-functional criteria
such as the desired QoS (quality of service). In this paper, a framework based
on the notions of a meta-component model, a generative domain model and QoS
parameters is described. A formal specification based on Two-level Grammar is
used to represent these notions in a tightly integrated way so that QoS becomes
a part of the generative domain model. A simple case study is described in the
context of this framework.
- Rajeev R. Raje, Mikhail Auguston, Barrett R. Bryant, Andrew M.
Olson, Carol Burt
- IUPUI, Naval Postgraduate School Monterey, University of Alabama
at Birmingham, 2AB Inc. Alabama
- rraje@cs.iupui.edu
- Received October 8 2001
- C577: A comparison of concurrent programming and
cooperative multithreading
- Abstract: This paper presents a comparison of the
cooperative multithreading model with the general concurrent programming model.
It focuses on the execution time performance of a range of standard concurrent
programming applications. The overall results are mixed. In some cases,
programs written in the cooperative multithreading model outperform those
written in the general concurrent programming model. The contributions of this
paper are twofold. First, it presents a thorough analysis of the performance
of applications in the different models, i.e., it explains the criteria that
determine when a program in one model will outperform an equivalent program in
the other. Second, it examines the tradeoffs in writing programs in the
different programming styles. In some cases, better performance comes at the
cost of more complicated code.
- Aaron W. Keen, Takashi Ishihara, Justin T. Maris, Tiejun Li,
Eugene F. Fodor, and Ronald A. Olsson
- Department of Computer Science, University of California, Davis,
CA 95616 USA
- olsson@cs.ucdavis.edu
- Received October 10 2001, Comments to authors Dec 2001
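The cooperative multithreading model that C577 compares against general concurrent programming can be sketched in a few lines of Java. This is an invented miniature (the class and method names are not from the paper, whose systems are far richer): each task runs uninterrupted until it explicitly yields, and a round-robin scheduler resumes tasks in order, so interleavings are deterministic at yield points.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of cooperative multithreading: tasks run until they explicitly
// yield; a round-robin scheduler resumes them. A "task" is modeled as a
// step function that returns true while it still has work left.
public class CoopScheduler {
    public interface Task { boolean step(); }  // one slice of work between yields

    // Run every task to completion; returns the number of context switches.
    public static int runAll(Deque<Task> ready) {
        int contextSwitches = 0;
        while (!ready.isEmpty()) {
            Task t = ready.pollFirst();
            if (t.step()) ready.addLast(t);  // task yielded; reschedule it
            contextSwitches++;
        }
        return contextSwitches;
    }

    public static void main(String[] args) {
        Deque<Task> ready = new ArrayDeque<>();
        StringBuilder trace = new StringBuilder();
        for (char name : new char[] {'A', 'B'}) {
            final int[] left = {2};  // each task yields once, then finishes
            ready.add(() -> { trace.append(name); return --left[0] > 0; });
        }
        runAll(ready);
        System.out.println(trace);  // ABAB: deterministic interleaving at yields
    }
}
```

The performance trade-off the paper analyzes shows up even here: the scheduler needs no locks because nothing is preempted, but a task that never yields starves everyone else.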
- C578: GridSim: A Toolkit for the Modeling and Simulation of
Global Grids
- Abstract: Clusters, Grids, and Peer-to-Peer (P2P) networks
have emerged as popular paradigms for next generation parallel and distributed
computing. They enable aggregation of distributed resources for solving
large-scale problems in science, engineering, and commerce. In Grid and P2P
computing environments, the resources are usually geographically distributed in
multiple administrative domains, managed and owned by different organizations
with different policies, and interconnected by wide-area networks or the
Internet. This introduces a number of resource management and application
scheduling challenges in the domain of security, resource and policy
heterogeneity, fault tolerance, continuously changing resource conditions, and
politics. Resource management and scheduling systems for Grid computing
need to manage resources and application execution according to resource
consumers' or owners' requirements, continuously adapting to
changes in resource availability. The management of resources and the
scheduling of applications in such large-scale distributed systems is a
complex undertaking. To prove the effectiveness of resource brokers and
associated scheduling algorithms, their performance needs to be evaluated
under different scenarios, such as a varying number of resources and users
with different requirements. In a Grid environment, it is difficult, if not
impossible, to perform scheduler performance evaluation in a repeatable and
controllable manner, as resources and users are distributed across multiple
organizations with their own policies. To overcome this limitation, we have proposed and developed a
Java-based discrete-event Grid simulation toolkit called GridSim. The toolkit
supports modeling and simulation of heterogeneous Grid resources (both time and
space-shared), users and application models. It provides primitives for
creation of application tasks, mapping of tasks to resources and their
management. To demonstrate the suitability of the GridSim toolkit, we have
simulated a Nimrod-G-like grid resource broker and evaluated the performance of
deadline- and budget-constrained cost- and time-minimization scheduling
algorithms.
- Manzur Murshed, Rajkumar Buyya, and David Abramson
- Gippsland School of Computing and IT, Monash University,
Gippsland Campus Churchill, Vic 3842, Australia; School of Computer Science and
Software Engineering Monash University, C5.41, Caulfield Campus Melbourne, VIC
3145, Australia
- rajkumar@csse.monash.edu.au
- Received November 6 2001, Comments to authors Dec 2001
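The discrete-event core of a toolkit like GridSim can be sketched as follows; this is an invented miniature, not GridSim's actual API. Events carry timestamps and are processed in time order, and a task of a given length (in MI) submitted to a resource of a given speed (in MIPS) completes length/mips seconds after it starts.

```java
import java.util.PriorityQueue;

// Toy discrete-event simulator in the spirit of GridSim (not its real API):
// events are (time, action) pairs processed in timestamp order.
public class MiniGridSim {
    static final class Event {
        final double time; final Runnable action;
        Event(double time, Runnable action) { this.time = time; this.action = action; }
    }

    private final PriorityQueue<Event> queue =
        new PriorityQueue<>((x, y) -> Double.compare(x.time, y.time));
    double now = 0;  // simulated clock

    void schedule(double time, Runnable action) { queue.add(new Event(time, action)); }

    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time;     // advance the clock to the event's timestamp
            e.action.run();
        }
    }

    // A task of `lengthMI` million instructions on a `mips` resource
    // completes lengthMI/mips simulated seconds after submission.
    public static double finishTime(double submit, double lengthMI, double mips) {
        return submit + lengthMI / mips;
    }

    public static void main(String[] args) {
        MiniGridSim sim = new MiniGridSim();
        double[] done = new double[1];
        // Submit a 500 MI task at t=2 to a 100 MIPS resource.
        sim.schedule(finishTime(2, 500, 100), () -> done[0] = sim.now);
        sim.run();
        System.out.println(done[0]);  // 7.0
    }
}
```

Simulated time jumps from event to event, which is what makes a repeatable evaluation of scheduling policies possible where a live Grid is not.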
- C579: Experience with Memory Management in Open Linda
Systems
- Abstract: Coordination systems, in particular Linda, have
established themselves as important tools for the development of applications
for open systems such as the Internet. This paper shows how to tackle a
forgotten but crucial problem in open coordination systems: memory management.
As with any system which intends to be of wide use, coordination systems have
to address the problems of memory exhaustion as memory is a finite resource.
This paper first explores the separation between coordination and computation
in order to make it clear that the problem of memory exhaustion in coordination
systems cannot be solved using garbage collection schemes implemented in the
computation language: a garbage collection scheme must exist in the
coordination environment as well. As Linda is arguably the most successful
coordination system, this paper will focus on the Linda family of systems. It
is expected that the solution in Linda can be adapted to other coordination
systems.
- Ronaldo Menezes
- Florida Institute of Technology Department of Computer Sciences
150 West University Blvd Melbourne, FL 32901
- rmenezes@cs.fit.edu
- Received November 21 2001
- C580: MPI-CHECK: a Tool for Checking Fortran 90 MPI
Programs
- Abstract: MPI is commonly used to write parallel programs
for distributed memory parallel computers. MPI-CHECK is a tool developed to aid
in the debugging of MPI programs that are written in free or fixed format
Fortran 90 and Fortran 77. MPI-CHECK provides automatic compile-time and
run-time checking of MPI programs. MPI-CHECK automatically detects the
following problems in the use of MPI routines: (i) mismatch in argument type,
kind, rank or number; (ii) messages which exceed the bounds of the
source/destination array; (iii) negative message lengths; (iv) illegal MPI
calls before MPI_INIT or after MPI_FINALIZE; (v) inconsistencies between
the declared type of a message and its associated DATATYPE argument; and (vi)
actual arguments which violate the INTENT attribute.
- Glenn Luecke, Hua Chen, James Coyle, Jim Hoekstra, Marina Kraeva,
Yan Zou
- Iowa State University
- grl@iastate.edu
- Received December 4 2001
- C581: The Design and Implementation of a Chinese Financial
Invoice Recognition System
- Abstract: This paper describes the design and implementation of a financial
invoice recognition system based on the features of Chinese financial
invoices. Using the linear whole-block moving method in each vertical
segment, a new fast algorithm is put forward to detect and rectify slanted
images. To distinguish the different form types (the foundation necessary for
locating the form fields, filtering the form lines, etc.), several
representative form features are discussed and an invoice-type features library
is built using a semi-automatic machine learning method. On the basis of the
recognized invoice type, the real invoice form is registered against the
corresponding blank form according to the invoice-type features, solving the
problem of adhesion between characters and form lines, as well as the problems
of character segmentation and recognition. Based on features of Chinese
financial invoice images, a mutual rectification mechanism founded on the
recognition results of financial Chinese characters and Arabic numerals is put
forward to raise the recognition rate. Finally, the experimental results and conclusions
are presented.
- MING Delie, LIU Jian, TIAN Jinwen
- State Key Laboratory for Image Processing and Intelligent Control
Institute for Pattern Recognition and Artificial Intelligence, Huazhong
University of Science and Technology, Wuhan, Hubei province, 430074, P.R.China
Tel: 86-27-87556301
- mingdelie@263.net
- Received 17 December 2001
- C582: The Virtual Laboratory: Enabling Molecular Modeling
for Drug Design on the World Wide Grid
- Abstract: Computational Grids are emerging as a new
paradigm for sharing and aggregation of geographically distributed resources
for solving large-scale computational and data intensive problems in science,
engineering, and commerce. However, application development, resource
management and scheduling in these environments is a complex undertaking. In
this paper, we illustrate the development of a virtual laboratory environment
by leveraging existing Grid technologies to enable molecular modeling for drug
design on geographically distributed resources. It involves screening millions
of compounds in the chemical database (CDB) against a protein target to
identify those with potential use for drug design. We have used the Nimrod-G
parameter specification language for transforming a molecular docking
application as a parameter sweep application for executing on distributed
systems. We have developed new tools for data management to organize and
provide access to chemical databases as a network service in a scalable and
distributed manner. They enable transparent access to molecule records in the
CDB of interest from remote clients. The Nimrod-G resource broker along with
molecule CDB service and access management infrastructure is used for
scheduling and on-demand processing of docking jobs on the World-Wide Grid
(WWG) resources. The results demonstrate the ease of use and power of the
Nimrod-G and virtual laboratory tools for grid computing.
- Rajkumar Buyya, Kim Branson, Jon Giddy, and David Abramson
- School of Computer Science and Software Engg. Monash University,
Caulfield Campus Melbourne, Australia; Structural Biology, Walter and Eliza
Hall Institute, Royal Parade, Parkville, Melbourne
- rajkumar@csse.monash.edu.au
- Received 19 December 2001
- C583: Automatic Refactoring by Simulation of Multiple
Inheritance in Java
- Abstract: This paper shows a technique that enables the
generation of proxy classes in an automatic manner. The model and
implementation extends production programming to support fast and automatic
prototyping of proxies and interfaces able to simulate multiple inheritance and
refactor legacy systems, even when faced with missing source code. Advantages
of the automatic synthesis of a proxy class include: compile-time type
checking, speed of execution, automatic disambiguation (name space collision
resolution) and ease of maintenance. The approach generates Java source that
does method forwarding and creates interfaces as a means to achieve
polymorphism. Disambiguation can be automatic, semi-automatic or manual. The
forwarding code (i.e., proxy) evolves into an adapter as the delegates change
their specification. This protects client classes from change. The interface
and proxy code are generated automatically via reflection. This type of
simulation of multiple inheritance was previously available only to Java
programmers who performed manual delegation or who made use of dynamic proxies.
The technique has been applied at a major aerospace corporation.
- Douglas Lyon
- Computer Engineering Dept. Fairfield University, Fairfield CT
06430
- lyon@docjava.com
- Received 24 December 2001
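The dynamic-proxy alternative mentioned at the end of the C583 abstract can be shown concretely with the standard java.lang.reflect.Proxy API. In the sketch below (class and interface names invented for illustration), a proxy implementing a combined interface forwards each call to the first delegate whose class declares a compatible method, which is one simple disambiguation rule; the paper instead generates static proxy source, gaining compile-time type checking and speed.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Multiple inheritance by delegation: a dynamic proxy implements several
// interfaces and forwards each call to the first delegate that provides
// a method with a matching name and parameter types.
public class MixinProxy implements InvocationHandler {
    private final Object[] delegates;
    private MixinProxy(Object... delegates) { this.delegates = delegates; }

    @Override
    public Object invoke(Object proxy, Method m, Object[] args) throws Exception {
        for (Object d : delegates) {
            try {
                Method target = d.getClass().getMethod(m.getName(), m.getParameterTypes());
                return target.invoke(d, args);   // forward to this delegate
            } catch (NoSuchMethodException e) {
                // this delegate does not provide the method; try the next one
            }
        }
        throw new UnsupportedOperationException(m.getName());
    }

    @SuppressWarnings("unchecked")
    public static <T> T create(Class<T> iface, Object... delegates) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface }, new MixinProxy(delegates));
    }

    // Two "parent" implementations and one combined interface.
    public interface Named { String name(); }
    public interface Sized { int size(); }
    public interface NamedSized extends Named, Sized {}
    public static class HasName { public String name() { return "proxy"; } }
    public static class HasSize { public int size() { return 42; } }

    public static void main(String[] args) {
        NamedSized obj = create(NamedSized.class, new HasName(), new HasSize());
        System.out.println(obj.name() + " " + obj.size());  // proxy 42
    }
}
```

The first-delegate-wins rule above is where the paper's automatic, semi-automatic, or manual disambiguation would plug in when two delegates declare the same method.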
- C584: Workload Decomposition Strategies for Hierarchical
Distributed-Shared Memory Parallel Systems and their Implementation with
Integration of High Level Parallel Languages
- Abstract: In this paper we address the issue of workload
decomposition in programming hierarchical distributed-shared memory parallel
systems. The workload decomposition we have devised consists of a two-stage
procedure: a higher-level decomposition among the computational nodes, and a
lower-level one among the processors of each computational node. By focussing
on the porting of a case-study PIC application, we have implemented the described
workload decomposition without large programming effort by using and integrating
the high-level languages High Performance Fortran and OpenMP.
- Sergio Briguglio, Beniamino Di Martino, and Gregorio Vlad
- Associazione Euratom-ENEA sulla Fusione, C.R. Frascati, C.P. 65
-1-00044 Frascati, Rome, Italy; Dip. Ingegneria dell'Informazione, Second
University of Naples, Italy
- briguglio@frascati.enea.it
- Received 18 November 2001
- C585: Optimizing Operating System Performance for CC-NUMA
Architecture
- Abstract: CC-NUMA (Cache-Coherent Non-Uniform Memory
Access) architecture is an attractive solution to scalable servers. The
performance of a CC-NUMA system heavily depends on the number of accesses to
remote memory through an interconnection network. To reduce the number of
remote accesses, an operating system needs to exploit the potential locality of
the architecture. This paper describes the design and implementation of a
UNIX-based operating system supporting CC-NUMA architecture. The operating
system implements various enhancements by revising kernel algorithms and data
structures. This paper also analyzes the performance of the enhanced operating
system by running commercial benchmarks on a real CC-NUMA system. The
performance analysis shows that the operating system can achieve better
performance and scalability for CC-NUMA by kernel data striping, localization,
and load balancing.
- Moon-Seok Chang
- IBM, MS 905-7C020, 11400 Burnet Rd., Austin, TX 78758, USA
- cmoon@us.ibm.com
- Received December 15 2001
- C586: Implementation of the EARTH programming model on SMP
clusters: a multi-threaded language and runtime system
- Abstract: We designed and implemented an EARTH (Efficient
Architecture for Running THreads) runtime system for a
multi-processor/multi-node cluster. For portability, we built this runtime
system on top of Pthreads under Linux. This implementation enables the
overlapping of communication and computation on a cluster of Symmetric
Multi-Processors (SMP), and lets the interruptions generated by the arrival of
new data drive the system, rather than relying on network polling. We describe
how our implementation of a multi-threading model on a
multi-processor/multi-node system arranges the execution and the
synchronization activities to make the best use of the resources available, and
how the interaction between the local processing and the network activities are
organized. We also describe the EARTH programming model and its associated
programming language, Threaded-C (release 2.0), used to write programs for this
machine model. This programming model supports irregular fine-grain
parallelism through a two-level hierarchy of threads and fibers. We introduce
mutually exclusive fibers that enable the execution of explicitly threaded
Threaded-C code on a multi-processor shared-memory system. One of our new
synchronization mechanisms, atomic mailboxes, implements a non-deterministic
merge operator.
- G. Tremblay, C.J. Morrone, J.N. Amaral, G.R. Gao
- Dept. d'informatique, Univ. du Quebec a Montreal, Montreal, QC,
Canada; Computer Architecture and Parallel Systems Laboratory (CAPSL), Dept. of
Electrical and Computer Engineering, Univ. of Delaware, Newark, DE, USA; Dept.
of Computing Science, Univ. of Alberta, Edmonton, AB, Canada
- amaral@cs.ualberta.ca
- Received January 5 2002
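The "atomic mailbox" of C586 acts as a non-deterministic merge point. As a rough Java analogue (Threaded-C fibers are approximated here by plain threads, and all names are invented), a blocking queue delivers messages from several producers to a consumer in whatever order they happen to arrive.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of an "atomic mailbox": producers deposit messages atomically and
// a consumer receives them in arrival order, implementing a
// non-deterministic merge of the producers' streams.
public class AtomicMailbox<T> {
    private final BlockingQueue<T> box = new LinkedBlockingQueue<>();

    public void send(T msg) { box.add(msg); }     // atomic deposit

    public T receive() {                          // blocks until a message arrives
        try { return box.take(); }
        catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        AtomicMailbox<Integer> mbox = new AtomicMailbox<>();
        for (int p = 0; p < 3; p++) {
            final int id = p;
            new Thread(() -> mbox.send(id)).start();  // three producer "fibers"
        }
        int sum = 0;
        for (int i = 0; i < 3; i++) sum += mbox.receive();  // merge in arrival order
        System.out.println(sum);  // 3 (0+1+2; arrival order is non-deterministic)
    }
}
```

The point of the merge operator is that the consumer need not know which producer will deliver next; only the multiset of messages is deterministic, not their order.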
- C587: The LINPACK Benchmark: Past, Present, and Future
- Abstract: In this paper we will clear up some of the confusion and mystery
surrounding the LINPACK Benchmark and some of its variations. We will cover
the original LINPACK 100 benchmark, the TOP500, and the HPL program that can
be used for TOP500 measurements. We will examine what is
measured and describe how to interpret the results of the programs'
execution.
- Jack J. Dongarra, Piotr Luszczek and Antoine Petitet
- Innovative Computing Lab, Computer Science Dept, University of
Tennessee, 1122 Volunteer Blvd, Knoxville TN, 37996-3450; Sun Microsystems,
Inc., Paris, France
- dongarra@cs.utk.edu
- Received December 24 2001
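What the LINPACK benchmark measures can be sketched in miniature: solve a dense system Ax = b by Gaussian elimination with partial pivoting, then convert elapsed time into Mflop/s using the benchmark's nominal operation count of 2n³/3 + 2n². The Java below is an illustrative toy, not the HPL code.

```java
// Minimal sketch of the LINPACK kernel: Gaussian elimination with partial
// pivoting, plus the nominal flop count used to report Mflop/s.
public class MiniLinpack {
    public static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int k = 0; k < n; k++) {
            // Partial pivoting: bring the largest |a[i][k]| up to row k.
            int p = k;
            for (int i = k + 1; i < n; i++)
                if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
            double[] tr = a[k]; a[k] = a[p]; a[p] = tr;
            double tb = b[k]; b[k] = b[p]; b[p] = tb;
            // Eliminate column k below the pivot.
            for (int i = k + 1; i < n; i++) {
                double m = a[i][k] / a[k][k];
                for (int j = k; j < n; j++) a[i][j] -= m * a[k][j];
                b[i] -= m * b[k];
            }
        }
        // Back substitution on the upper-triangular system.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }

    // Nominal operation count the benchmark credits, regardless of the
    // algorithm actually used: 2n^3/3 + 2n^2 floating-point operations.
    public static double flops(int n) { return 2.0 * n * n * n / 3.0 + 2.0 * n * n; }

    public static void main(String[] args) {
        double[][] a = {{2, 1}, {1, 3}};
        double[] b = {3, 5};
        double[] x = solve(a, b);
        System.out.println(x[0] + " " + x[1]);  // 0.8 1.4
    }
}
```

Dividing flops(n) by the measured solve time (in microseconds) gives the Mflop/s figure that the TOP500 ranks; HPL differs in using blocked, distributed LU, not in what is counted.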
- C588: Coordinating Components in Middleware Systems
- Abstract: Configuration and coordination are central issues
in the design and implementation of middleware systems and are among the
reasons why building such systems is more difficult and complex than constructing
stand-alone sequential programs. Through configuration, the structure of the
system is established: which elements it contains, where they are located,
and how they are interconnected. Coordination is concerned with the interaction
of the various components: when an interaction takes place, which parties
are involved, what protocols are followed. Its purpose is to coordinate the
behaviour of the various components in a way that meets the overall system
specification. The open and adaptive nature of middleware systems makes the task
of configuration and coordination particularly challenging. We propose a model
that can operate in such an environment and enables the dynamic integration and
coordination of components through observation of their behaviour.
- Matthias Radestock, Susan Eisenbach
- Trans Enterprise Computer Communications, The Lee Valley
Tecnopark, Ashley Road London N17 9LN; Department of Computing, Imperial
College of Science, Technology and Medicine, London SW7 2BZ, UK
- sue@doc.ic.ac.uk
- Received January 8 2002