Basic Concurrency: Practice and Experience Papers
Standard Referee Form
- C458: PFPC: An Infrastructure for Research on Parallelizing
Compilers
- Abstract: Parallelizing compilers analyze sequential programs, in
particular their loops, to detect hidden parallelism and automatically
restructure sequential programs into parallel subtasks executed on
multiprocessors. This paper describes the design and implementation of an
efficient and precise parallelizing compiler that parallelizes loops and
achieves high speedup rates on multiprocessor systems. The execution
efficiency of a loop can be enhanced if the loop is executed in parallel or
partially in parallel, as a DOALL or DOACROSS loop. This paper also reports
on a practical parallel loop detector (PPD), implemented in PFPC, for finding
the parallelism in loops. The PPD can extract the potential DOALL and
DOACROSS loops in a program by verifying array subscripts. In addition, a new
model using a knowledge-based approach is proposed to exploit more loop
parallelism. The knowledge-based approach integrates existing loop
transformations and loop scheduling algorithms to make good use of their
ability to extract loop parallelism. Two rule-based systems, called KPLT
and IPLS, are then developed using repertory grid analysis and attribute
ordering tables, respectively, to construct the knowledge bases. Finally, a
runtime technique based on the inspector/executor scheme is proposed
for finding available parallelism in loops. Our inspector can determine the
wavefronts of a loop with any complex indirect array indexing pattern by
building a DEF-USE table. The inspector is fully parallel, without any
synchronization. Experimental results show that the new method can handle
complex data dependence patterns that could not be handled by previous
research. As an ultimate goal, a high-performance and portable FORTRAN
parallelizing compiler for shared-memory multiprocessors will be constructed.
We believe that our research may provide more insight into the development of
high-performance parallelizing compilers.
- Chao-Tung Yang, Shian-Shyong Tseng, Yun-Woei Fann, Ting-Ku Tsai, Ming-Huei
Hsieh, Cheng-Tien Wu
- ROCSAT Ground Section, National Space Program Office, Hsinchu, Taiwan 300,
ROC and Dept. of Computer & Information Science, National Chiao Tung
University, Hsinchu, Taiwan 300, ROC
- Submitted November 18, 1999
- mailto:ctyang@nspo.gov.tw or
mailto:sstseng@cis.nctu.edu.tw
- C458ctyang/c458ctyang.pdf
- Submit Info:C458ctyang
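The inspector/executor idea in this abstract can be illustrated with a small sequential sketch. This is a generic, simplified illustration of wavefront computation for a loop with indirect array indexing, not the paper's parallel, synchronization-free inspector; the function and variable names are ours.

```python
def compute_wavefronts(writes, reads):
    """Toy inspector for a loop like `a[w[i]] = f(a[r[i]])` with indirect
    indexing: assign each iteration a wavefront number so that iterations in
    the same wavefront are independent and may run in parallel, while
    successive wavefronts respect flow, anti, and output dependences."""
    last_write_wf = {}  # array element -> wavefront of the last write to it
    last_read_wf = {}   # array element -> latest wavefront that read it
    wavefronts = []
    for w, r in zip(writes, reads):
        wf = 1 + max(
            last_write_wf.get(r, 0),  # flow dependence: read after write
            last_write_wf.get(w, 0),  # output dependence: write after write
            last_read_wf.get(w, 0),   # anti dependence: write after read
        )
        wavefronts.append(wf)
        last_write_wf[w] = wf
        last_read_wf[r] = max(last_read_wf.get(r, 0), wf)
    return wavefronts
```

If every iteration lands in wavefront 1, the loop is a DOALL loop; multiple wavefronts correspond to DOACROSS-style staged execution.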
- C459: A Parallel Viterbi Decoding Algorithm
- Abstract: In this paper we express the Viterbi algorithm as a
matrix-vector reduction in which multiplication is replaced by addition and
addition by minimisation. The resulting algorithm is then readily parallelised
in a form suitable for implementation on a systolic processor array. We
describe the algorithm for BCH codes, which have a task graph with valence
restricted to four inputs and four outputs. The method is also applicable to
convolutional codes, but the complexity of the task graph increases with the
number of input bits for these codes. Results for BCH codes are given for two
general-purpose parallel machines, an IBM SP2 and a Meiko CS2.
- Dr. J.S. Reeve, Concurrent Computation Group, Department of Electronics and
Computer Science, University of Southampton, Southampton SO17 1BJ, UK
- mailto:jsr@ecs.soton.ac.uk
- Submitted November 24, 1999
- http://www.ecs.soton.ac.uk/~jsr
- C459reeve/c459reeve.pdf
- Submit Info: C459reeve
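The matrix-vector reduction described in this abstract can be sketched in a few lines: path metrics are updated by a matrix-vector product in the (min, +) semiring, where multiplication becomes addition and addition becomes minimisation. This is a generic illustration of the semiring formulation, not the paper's systolic-array implementation; the function names are ours.

```python
INF = float("inf")

def min_plus_matvec(B, d):
    """One Viterbi step as a matrix-vector product in the (min, +) semiring:
    B[i][j] is the branch metric for a transition from state j to state i
    (INF if no such transition), and d[j] is the current path metric of j."""
    n = len(d)
    return [min(B[i][j] + d[j] for j in range(n)) for i in range(n)]

def viterbi_path_metrics(branch_metrics, d0):
    """Fold the per-symbol branch-metric matrices over the initial state
    metrics. Each output row of the min-plus product is independent, which
    is what makes the step straightforward to parallelise."""
    d = d0
    for B in branch_metrics:
        d = min_plus_matvec(B, d)
    return d
```

Recovering the surviving path additionally requires recording, per state, the argmin of each minimisation; the sketch above computes only the metrics.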
- C460: Inverse Toeplitz Eigenproblem on Personal Computer
Networks
- Abstract:
- J.M. Badia, A.M. Vidal, Dept. of Computer Science, Univ. Jaume I, 12071
Castellón, Spain
- mailto:jbadia@inf.uji.es
- Submitted September 21, 1999
- No URL
- No electronic version
- Submit Info: C460balia
- C461: Emmerald: A fast matrix-matrix multiply using INTEL SIMD
technology
- Abstract:
- Mr. Douglas Aberdeen, Computer Sciences Laboratory, Research School of
Information Science and Engineering, Australian National University, Canberra,
A.C.T. 2601, Australia
- mailto:daa@csl.anu.edu.au
- Submitted September 16, 1999
- No URL
- No electronic version
- Submit Info: C461aberdeen
- C462: Experiences from Integrating Algorithmic and Systemic Load
Balancing Strategies
- Abstract: Load balancing increases the efficient use of existing
resources for parallel and distributed applications. At a coarse level of
granularity, advances in runtime systems for parallel programs have been
proposed in order to control available resources as efficiently as possible by
utilizing idle resources and using task migration. Simultaneously, at a finer
granularity level, advances in algorithmic strategies for dynamically balancing
computational loads by data redistribution have been proposed in order to
respond to variations in processor performance during the execution of a given
parallel application. Combining strategies from each level of granularity can
result in a system which delivers advantages of both. The resulting integration
is systemic in nature and transfers the responsibility of efficient resource
utilization from the application programmer to the runtime system. This paper
presents the design and implementation of a system that combines an algorithmic
fine-grained data parallel load balancing strategy with a systemic
coarse-grained task-parallel load balancing strategy, and reports on recent
experimental results of running a computationally intensive scientific
application under this integrated system. The experimental results indicate
that a distributed runtime environment which combines both task and data
migration can provide performance advantages with little overhead. The paper
also presents proposals for performance enhancements of the implementation, as
well as directions for future work on effective resource management.
- Ioana Banicescu, Samuel H. Russ, Sheikh Ghafoor, Vijay Velusamy and Mark
Bilderback
- mailto:ioana@cs.msstate.edu
- Submitted December 17, 1999
- URL: http://www.cs.msstate.edu/ioana
- Electronic version:C462banicescu/pc_98.pdf
- Submit Info: C462banicescu
- C464: A Survey of Concurrent Object-Oriented Languages
- Abstract: During the last decade object-oriented programming has
grown from marginal influence into widespread acceptance. During the same
period, progress in hardware and networking has changed the computing
environment from sequential to parallel. Multi-processor workstations and
clusters are now quite common. Innumerable proposals have been made to combine
both developments. The prime objective has always been to provide the
advantages of object-oriented software design at the increased power of
parallel machines. However, combining both concepts has proven to be
notoriously difficult. Depending on the approach, key characteristics of
either the object-oriented paradigm or key performance factors of parallelism
are often sacrificed, resulting in unsatisfactory languages. This survey first
recapitulates well-known characteristics of both the object-oriented paradigm
and parallel programming, and then marks out the design space of possible
combinations by identifying various interdependencies of key concepts. The
design space is then filled with data points: for 111 proposed languages we
provide brief characteristics and feature tables. The feature tables, the
comprehensive bibliography, and web addresses may help in identifying open
questions and preventing reinvention.
- Michael Philippsen
- mailto:phlipp@icsi.berkeley.edu
- Submitted February 14, 2000; Accepted March 2, 2000
- URL: http://wwwipd.ira.uka.de/~phlipp/mypapers/cool-survey.ps.gz
- Electronic version:C464mphillippsensurvey/cool-survey.pdf
- Submit Info: C464mphillippsensurvey
- C465: The Software Architecture of a Distributed Problem-Solving
Environment
- Abstract: This paper describes the functionality and software
architecture of a generic problem-solving environment (PSE) for collaborative
computational science and engineering. A PSE is designed to provide transparent
access to heterogeneous distributed computing resources, and is intended to
enhance research productivity by making it easier to construct, run, and
analyze the results of computer simulations. Although implementation details
are not discussed in depth, the role of software technologies such as CORBA,
Java, and XML is outlined. An XML-based component model is presented. The main
features of a Visual Component Composition Environment for software
development, and of an Intelligent Resource Management System for scheduling
components, are described. Some prototype implementations of PSE sub-systems
are also presented.
- D.W. Walker, M. Li, O.F. Rana, M.S. Shields, and Y. Huang
- mailto:walkerdw@ornl.gov
- Submitted March 1, 2000
- Electronic Version: C465walkermarch00/cpandeinitial.pdf
- Submit Info: C465walkermarch00
- C466: Redistribution strategies for portable parallel FFT: A case
study
- Abstract: The best approach to parallelize multidimensional FFT
algorithms has long been under debate. Distributed transposes are widely used,
but they also vary in communication policies and hence performance. In this
work we analyze the impact of different redistribution strategies on the
performance of parallel FFT, on various machine architectures. We found that
some redistribution strategies were consistently superior, while some others
were unexpectedly inferior. An in-depth investigation into the reasons for this
behavior is included in this work.
- Anshu Dubey and Daniele Tessera
- mailto:dubey@tagore.uchicago.edu
- Submitted March 6, 2000
- Submit Info: C466dubeymarch00
- Note: No Electronic Version
- C467: VGDS: A Distributed Data Structure Framework for Scientific
Computation
- Abstract: This paper gives an overview of the VGDS (Virtual Global
Data Structure) project. The VGDS effort focuses on developing an integrated,
distributed environment that allows fast prototyping of a diverse set of
simulation problems in irregular scientific and engineering domains, focusing
on computations with irregular and adaptive structures. The framework defines
two base libraries, unstructured mesh and adaptive tree, that capture the
major data structures involved in irregular scientific computation. The
framework defines multiple layers of class libraries which work together to
provide data-parallel representations to application developers while
encapsulating parallel implementation details in the lower layers of the
framework. The layered approach enables easy extension of the base libraries
to a variety of application-specific data structures. Experimental results on
a network of workstations are reported.
- Pangfeng Liu, Jan-Jan Wu
- mailto:wuj@iis.sinica.edu.tw or
mailto:pangfeng@cs.ccu.edu.tw
- Submitted April 21, 2000
- Electronic version:C467wuapril00/paper.pdf
- Submit Info: C467wuapril00
- C468: cJVM: A Cluster JVM Architecture for Single System
Image
- Abstract: cJVM is a Java Virtual Machine (JVM) which provides a
single system image of a traditional JVM while executing in a distributed
fashion on the nodes of a cluster. cJVM virtualizes the cluster, supporting any
pure Java application without requiring that applications be tailored
specifically for it. The aim of cJVM is to obtain improved scalability for a
class of Java server applications by distributing the application's work among
the cluster's computing resources. cJVM is based on a novel object model which
distinguishes between an application's view of an object (e.g., every object is
a unique data structure) and its implementation (e.g., objects may have
consistent replications on different nodes). This enables us to exploit
knowledge of the usage of individual objects to improve performance (e.g.,
using object replication to increase locality of access to objects). We have
already completed a prototype which runs pure Java applications on a cluster
of NT workstations connected via a Myrinet fast switch. The prototype provides
a single system image to applications, distributing the application's threads
and objects over the cluster. We have used cJVM to run, without change, a real
Java server application containing over 10K lines of code, and achieved high
scalability for it on a cluster. We also achieved linear speedup for another
application with a large number of independent threads. This paper discusses
cJVM's architecture and implementation. It focuses on achieving a single
system image for a traditional JVM on a cluster, while briefly describing how
we aim to obtain scalability.
- Yariv Aridor, Michael Factor and Avi Teperman
- mailto:teperman@il.ibm.com
- Submitted May 1, 2000
- Electronic version:C468teperman/cjvm.pdf
- Submit Info:C468teperman
Special Issue on Java and Message Passing
C469 to C473 Edited by Mark Baker
- C469: An open Java System for SPMD Programming
- C470: Heterogeneous Parallel Computing using Java and WMPI
- Luis M. Silva, Paulo Martins, João Gabriel Silva
- C471: Java on Networks of Workstations (JavaNOW): A Parallel
Computing Framework Inspired by Linda, Actors, and the Message Passing
Interface (MPI)
- C472: Aspects of Portability and Distributed Execution for
JNI-Wrapped Message Passing Libraries
- Vladimir S. Getov, Paul A. Gray, Vaidy S. Sunderam
- C473: MPJ: MPI-like Message Passing for Java
- Bryan Carpenter, Vladimir Getov, Glenn Judd, Anthony Skjellum and Geoffrey
Fox
- C474: Parallel solution of rotating flows in cavities
- Abstract: In this paper, we investigate the parallel solution of
rotating internal flow problems, using the Navier-Stokes equations as proposed
in [16] and [15]. A Runge-Kutta time-stepping scheme was applied to the
equations, and both sequential and message-passing implementations were
developed, the latter using MPI; both were tested on an SGI Origin200
distributed, global shared-memory parallel computer. The results show that our
approach to parallelizing the sequential implementation requires little effort
whilst providing good results, even for medium-sized problems, on this
particular computer.
- Rudnei Dias da Cunha and Alvaro Luiz de Bortoli
- mailto:rudnei@mat.ufrgs.br
- Submitted May 14, 2000
- Electronic version:C474dacunha/rotflow.pdf
- Submit Info:C474dacunha
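The Runge-Kutta time-stepping named in this abstract can be sketched generically. This is the classical fourth-order scheme for a scalar ODE, shown only to illustrate the method; it is not the paper's Navier-Stokes discretisation, and the function names are ours.

```python
def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y):
    four stage evaluations combined with weights 1/6, 2/6, 2/6, 1/6."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
```

In a flow solver the same stage structure is applied pointwise to the discretised field, which is what makes the scheme attractive for both sequential and message-passing implementations: each stage is a local update plus a boundary exchange.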
- C475: Analysis and Measurement of the Effect of Kernel Locks in
SMP Systems
- Abstract: This paper reports the use of case studies to evaluate the
performance degradation caused by kernel-level locks. We define the lock
ratio as the ratio of the execution time spent in critical sections to the
total execution time of a parallel program. The kernel-level lock ratio
determines how effectively programs work on Symmetric MultiProcessor (SMP)
systems. We have measured the lock ratios and the performance of three types
of parallel programs on SMP systems with Linux 2.0: matrix multiplication,
parallel make, and WWW server programs. Experimental results show that the
higher the lock ratio of a parallel program, the worse its performance
becomes.
- Keywords: SMP Systems, Operating Systems, Parallel Programs, Performance
Evaluation, Kernel Lock
- Akihiro Kaieda and Yasuichi Nakayama; Atsuhiro Tanaka, Takashi Horikawa,
Toshiyasu Kurasugi and Issei Kino
- mailto:yasu@cs.uec.ac.jp
- Submitted May 24, 2000
- Electronic version:C475nakayama/kaieda.pdf
- Submit Info:C475nakayama
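The lock ratio defined in this abstract is straightforward to compute from measured times. The Amdahl-style speedup bound below is our illustrative addition, not from the paper: it treats the locked fraction as fully serialised, which gives an intuition for why a higher lock ratio predicts worse SMP performance.

```python
def lock_ratio(critical_section_time, total_time):
    """Lock ratio as defined in the abstract: fraction of a parallel
    program's total execution time spent inside kernel-level critical
    sections."""
    if total_time <= 0:
        raise ValueError("total execution time must be positive")
    return critical_section_time / total_time

def amdahl_bound(r, p):
    """Illustrative upper bound on speedup on p processors if the locked
    fraction r of the execution serialises and the rest parallelises
    perfectly (a simplification; real lock contention behaves worse)."""
    return 1.0 / (r + (1.0 - r) / p)
```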