Concurrency and Computation: Practice and Experience
Here are instructions for the Java Grande
2000 Special Issue. This issue is being reviewed now.
This homepage
supports referees.
Standard Referee Form
Journal Vision
Below you will find active abstracts.
Please send email to Geoffrey Fox,
fox@csit.fsu.edu, if you wish to referee any article.
We will send you a link to the full text online.
Articles under Consideration for Journal
- C467: VGDS: A Distributed Data Structure Framework for Scientific
Computation
- Abstract: This paper gives an overview of the VGDS (Virtual Global
Data Structure) project. The VGDS effort focuses on developing an integrated,
distributed environment that allows fast prototyping of a diverse set of
simulation problems in irregular scientific and engineering domains, focusing
on computations with irregular and adaptive structures. The framework defines
two base libraries, unstructured mesh and adaptive tree, that capture the major
data structures involved in irregular scientific computation. The framework
defines multiple layers of class libraries which work together to provide
data-parallel representations to application developers while encapsulating
parallel implementation details in lower layers of the framework. The layered
approach enables easy extension of the base libraries to a variety of
application-specific data structures. Experimental results on a network of
workstations are reported.
- Pangfeng Liu, Jan-Jan Wu
- mailto:wuj@iis.sinica.edu.tw, mailto:pangfeng@cs.ccu.edu.tw
- Submitted April 21, 2000
- Comments To Authors October 1, 2000
- C468: cJVM: A Cluster JVM Architecture for Single System
Image
- Abstract: cJVM is a Java Virtual Machine (JVM) which provides a
single system image of a traditional JVM while executing in a distributed
fashion on the nodes of a cluster. cJVM virtualizes the cluster, supporting any
pure Java application without requiring that applications be tailored
specifically for it. The aim of cJVM is to obtain improved scalability for a
class of Java Server Applications by distributing the application's work among
the cluster's computing resources. cJVM is based on a novel object model which
distinguishes between an application's view of an object (e.g., every object is
a unique data structure) and its implementation (e.g., objects may have
consistent replications on different nodes). This enables us to exploit
knowledge of the usage of individual objects to improve performance (e.g.,
using object replications to increase locality of access to objects).
We have completed a prototype which runs pure Java
applications on a cluster of NT workstations connected via a Myrinet fast
switch. The prototype provides a single system image to applications,
distributing the application's threads and objects over the cluster. We have
used cJVM to run, without change, a real Java Server Application containing over
10K lines of code and achieved high scalability for it on a cluster. We also achieved
linear speedup for another application with a large number of independent
threads. This paper discusses cJVM's architecture and implementation. It
focuses on achieving a single system image for a traditional JVM on a cluster
while briefly describing how we aim to obtain scalability.
- Yariv Aridor, Michael Factor and Avi Teperman
- mailto:teperman@il.ibm.com
- Submitted May 1, 2000
- Comments To Authors October 1, 2000
- C474: Parallel solution of rotating flows in cavities
- Abstract: In this paper, we investigate the parallel solution of
rotating internal flow problems, using the Navier-Stokes equations as proposed
in [16] and [15]. A Runge-Kutta time-stepping scheme was applied to the
equations and both sequential and message-passing implementations were
developed, the latter using MPI, and were tested on an SGI Origin200
distributed, global shared memory parallel computer. The results show that our
approach to parallelizing the sequential implementation requires little effort
whilst providing good results even for medium-sized problems on this
particular computer.
- Rudnei Dias da Cunha and Alvaro Luiz de Bortoli
- mailto:rudnei@mat.ufrgs.br
- Submitted May 14, 2000
- Comments To Authors October 1, 2000
- C475: Analysis and Measurement of the Effect of Kernel Locks in
SMP Systems
- Abstract: This paper reports the use of case studies to evaluate the
performance degradation caused by the kernel-level lock. We define the lock
ratio as the ratio of the execution time for critical sections to the total
execution time of a parallel program. The kernel-level lock ratio determines
how effectively programs work on Symmetric MultiProcessor (SMP) systems. We have
measured the lock ratios and the performance of three types of parallel
programs on SMP systems with Linux 2.0: matrix multiplication, parallel make,
and WWW server programs. Experimental results show that the higher the lock
ratio of parallel programs, the worse their performance becomes.
Keywords: SMP Systems, Operating Systems, Parallel Programs, Performance
Evaluation, Kernel Lock
- Akihiro Kaieda and Yasuichi Nakayama; Atsuhiro Tanaka, Takashi Horikawa,
Toshiyasu Kurasugi and Issei Kino
- mailto:yasu@cs.uec.ac.jp
- Submitted May 24, 2000
- Comments To Authors October 1, 2000
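The lock ratio defined in the C475 abstract can be written down directly. The sketch below is only an illustration of that definition: the function name, its parameters, and the sample timings are hypothetical and are not taken from the paper.

```python
def lock_ratio(critical_section_time: float, total_time: float) -> float:
    """Fraction of a program's execution time spent inside critical sections.

    Both arguments are assumed to be wall-clock durations in the same unit
    (for example seconds); how they are measured is outside this sketch.
    """
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    if not 0 <= critical_section_time <= total_time:
        raise ValueError("critical_section_time must lie in [0, total_time]")
    return critical_section_time / total_time


# Hypothetical timings: 2 s under the lock out of an 8 s run.
example_ratio = lock_ratio(2.0, 8.0)
```

A value close to 0 means the program rarely holds the lock; a value close to 1 means most of the run is serialized behind it.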
- C476: Effective Multicast Programming in Large Scale Distributed
Systems: The DACE Approach
- Abstract: Many distributed applications have a strong requirement for
efficient dissemination of large amounts of information to widely spread
consumers in large networks. These include applications in e-commerce and
telecommunication. Publish/subscribe is considered one of the most important
interaction styles to model communication at large scale. Producers publish
information for a topic and consumers subscribe to the topics they wish to be
informed of. The decoupling of producers and consumers in time and in space
makes the publish/subscribe paradigm very attractive for large scale
distribution, especially in environments like the Internet. This paper describes
the architecture and implementation of DACE (Distributed Asynchronous Computing
Environment), a framework for publish/subscribe communication based on an
object-oriented programming abstraction in the form of Distributed
Asynchronous Collection (DAC). DACs capture the different variations of
publish/subscribe, without blurring their respective advantages. The
architecture we present is tolerant to network partitions and crash
failures. The underlying model is based on the notion of Topic Membership: a weak
membership for the parties involved in a topic. We present how Topic Membership
enables the realization of a robust and efficient reliable multicast for large
scale. The protocol ensures that, inside a topic, even a subscriber that is
temporarily partitioned away eventually receives a published message.
- Romain Boichat, Patrick Th. Eugster, Rachid Guerraoui, Joe Sventek
- Swiss Federal Institute of Technology, Lausanne and Agilent Laboratories
Scotland, Edinburgh
- Email: Patrick.Eugster@lsesun6.epfl.ch
- Submitted July 17, 2000
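The topic-based publish/subscribe style described in the C476 abstract can be sketched in a few lines. This is a minimal, single-process illustration of the interaction style only. It is not the DACE or DAC API, it has no distribution, membership, or fault tolerance, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class TopicBroker:
    """In-process topic broker: producers publish, consumers subscribe."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks registered for that topic.
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register a consumer callback for one topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver a message to every subscriber of the topic.

        Returns the number of subscribers that received the message.
        """
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

The producer only names a topic and never refers to a consumer, which is the decoupling in space that the abstract mentions; decoupling in time would additionally require the broker to store messages for subscribers that are temporarily unreachable.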
- C486: Parallel Versions of Stone's Strongly Implicit (SIP)
Algorithm
- Abstract: In this paper, we describe various methods of deriving a
parallel version of Stone's Strongly Implicit Procedure (SIP) for solving
sparse linear equations arising from finite difference approximations to partial
differential equations (PDEs). Sequential versions of this algorithm have
been very successful in solving semi-conductor, heat conduction and flow
simulation problems and an efficient parallel version would enable much larger
simulations to be run. An initial investigation of various parallelising
strategies was undertaken using a version of High Performance Fortran (HPF) and
the best methods were reprogrammed using the MPI message passing libraries for
increased efficiency. Early attempts concentrated on developing a parallel
version of the characteristic wavefront computation pattern of the existing
sequential SIP code. However, a red-black ordering of grid points, similar to
that used in parallel versions of the Gauss-Seidel algorithm, is shown to be
far more efficient. The results of both the wavefront and red-black MPI-based
algorithms are reported for various problem sizes and numbers of processors on a
sixteen-node IBM SP2.
- J.S. Reeve, A.D. Scurr and J.H. Merlin
- Department of Electronics and Computer Science, University of Southampton
- Email: jsr@ecs.soton.ac.uk
- Submitted August 24, 2000
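The red-black ordering mentioned in the C486 abstract is easiest to see in the Gauss-Seidel setting the abstract compares it to. The sketch below shows a sequential red-black Gauss-Seidel sweep for the five-point discretization of a Poisson problem. It illustrates the ordering only, not the SIP factorization or the authors' MPI code, and the function name and grid layout are assumptions made for this example.

```python
from typing import List

Grid = List[List[float]]


def red_black_sweep(u: Grid, f: Grid, h: float) -> None:
    """One in-place red-black Gauss-Seidel sweep for laplacian(u) = f.

    u holds the current iterate including fixed boundary values,
    f holds the right-hand side, and h is the uniform grid spacing.
    """
    n_rows, n_cols = len(u), len(u[0])
    # Colour 0 ("red") points have even i + j, colour 1 ("black") odd.
    # Every neighbour of a red point is black and vice versa, so all
    # points of one colour can be updated independently of each other.
    for colour in (0, 1):
        for i in range(1, n_rows - 1):
            for j in range(1, n_cols - 1):
                if (i + j) % 2 == colour:
                    neighbours = (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )
                    u[i][j] = 0.25 * (neighbours - h * h * f[i][j])
```

Because the updates within one colour do not depend on each other, each half-sweep could be split across processors with a boundary exchange between the two colours, whereas a wavefront ordering exposes only one diagonal of points at a time.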
- C487: Real-time Multi-spectral Image Fusion
- Abstract: This paper describes a novel real-time multi-spectral
imaging capability for surveillance applications. The capability combines a new
high-performance multi-spectral camera system with a distributed algorithm that
computes a spectral-screening Principal Component Transform (PCT). The camera
system uses a novel filter wheel design together with a high-bandwidth CCD
camera to allow image cubes to be delivered at 110 frames per second with
spectral resolution between 400 and 1000 nm. The filters used in a particular
application are selected to highlight a particular object based on its spectral
signature. The distributed algorithm allows image streams from a dispersed
collection of cameras to be disseminated, viewed, and interpreted by a
distributed group of analysts in real-time. It operates on networks of
commercial-off-the-shelf multiprocessors connected with high-performance (e.g.
gigabit) networking, taking advantage of multi-threading where appropriate. The
algorithm uses a concurrent formulation of the PCT to de-correlate and compress
a multi-spectral image cube. Spectral screening is used to give features that
occur infrequently (e.g. mechanized vehicles in a forest) equal importance to
those that occur frequently (e.g. trees in the forest). A human-centered
color-mapping scheme is used to maximize the impact of spectral contrast on the
human visual system. To demonstrate the efficacy of the multi-spectral system,
plant-life scenes with both real and artificial foliage are used. These scenes
demonstrate the system's ability to distinguish elements of a scene, based on
spectral contrast, that cannot be distinguished with the naked eye. The
capability is evaluated in terms of visual performance, scalability, and
real-time throughput. Our previous work on predictive analytical modeling is
extended to answer practical design questions such as: For a specified
cost, what system should be constructed and what performance will it
attain?
- Tiranee Achalakul, Stephen Taylor
- Department of Computer Science, Syracuse University
- Email: tachalak@syr.edu
- Submitted September 14, 2000
- C488: Efficient Communication Using Message Prediction for
Cluster of Multiprocessors
- Abstract: With the increasing uniprocessor and SMP computation power
available today, interprocessor communication has become an important factor
that limits the performance of clusters of workstations. Many factors, including
communication hardware overhead, communication software overhead, and the user
environment overhead (multithreading, multiuser), affect the performance of the
communication subsystems in such systems. A significant portion of the software
communication overhead is due to a number of message copies. Ideally, it is
desirable to have a true zero-copy protocol where the message is moved directly
from the send buffer in its user space to the receive buffer in the destination
without any intermediate buffering. However, because message-passing
applications at the send side do not know the final receive buffer
addresses, early-arriving messages have to be buffered in a temporary area. In
this paper, we show that there is a message reception communication locality in
message-passing applications. We have utilized this communication locality and
devised different message predictors at the receiver sides of communications. In
essence, these message predictors can be efficiently used to drain the network
and cache the incoming messages even if the corresponding receive calls have
not been posted yet. The performance of these predictors, in terms of hit
ratio, on some parallel applications is quite promising and suggests that
prediction has the potential to eliminate most of the remaining message copies.
We also show that the proposed predictors are not sensitive to the
starting message reception call, and that they perform better than (or at least
equal to) our previously proposed predictors in [3].
- Ahmad Afsahi, Nikitas J. Dimopoulos
- Queen's University and University of Victoria, Canada
- Email: ahmad@ee.queensu.ca
- Submitted September 14, 2000
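The C488 abstract does not say which predictors the authors devised, so the sketch below uses a deliberately simple stand-in to show what a receiver-side message predictor and its hit ratio could look like. It assumes that each incoming message is identified by its sender rank and predicts that the sender seen after a given sender last time will follow it again. All names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Optional


class LastSuccessorPredictor:
    """Predicts the next sender as the one that last followed the current one."""

    def __init__(self) -> None:
        self._next_after: Dict[Hashable, Hashable] = {}
        self._previous: Optional[Hashable] = None

    def predict(self) -> Optional[Hashable]:
        """Return the predicted next sender, or None if there is no history."""
        return self._next_after.get(self._previous)

    def observe(self, sender: Hashable) -> None:
        """Record the sender of the message that actually arrived."""
        if self._previous is not None:
            self._next_after[self._previous] = sender
        self._previous = sender


def hit_ratio(senders: Iterable[Hashable]) -> float:
    """Fraction of messages whose sender was predicted correctly."""
    predictor = LastSuccessorPredictor()
    hits = 0
    total = 0
    for sender in senders:
        if predictor.predict() == sender:
            hits += 1
        predictor.observe(sender)
        total += 1
    return hits / total if total else 0.0
```

On a strictly periodic reception pattern such a predictor misses only while it warms up, which is the kind of communication locality the abstract refers to; a correct prediction would let the receiver prepare a buffer before the matching receive call is posted.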