Integrating Multiple Programming Paradigms
on Connection Machine CM5 in a Dataflow-based Software Environment
Gang Cheng, Geoffrey C. Fox and Kim Mills
Northeast Parallel Architectures Center
Syracuse University, Syracuse, NY 13244
Abstract
By viewing different parallel programming paradigms as essential
heterogeneous approaches in mapping ``real-world'' problems to parallel
systems, we discuss methodologies in integrating multiple programming
models on a Connection Machine CM5. Using a dataflow-based integration
model built on the visualization software AVS, we demonstrate a simple,
effective and modular way to couple sequential, data-parallel and
explicit message-passing modules into an integrated programming
environment on the CM5.
Introduction
One of the major issues in parallel processing concerns
homogeneity versus heterogeneity at both the software and hardware levels.
In parallel software development, currently there are
two mainstream approaches to deal with this issue.
One is to attempt to hide the heterogeneity of architectures from
the programmer. This approach usually employs
software technology, especially advanced
compiler techniques, to present users with a homogeneous, high level
programming model. This methodology leads to programs that are highly
portable across platforms with different architectures and programming models,
and it greatly simplifies programming on parallel machines;
among its drawbacks are the difficulty of obtaining the most efficient
program for a given application on a particular machine, and the extreme
complexity of the compiler design. Among examples of this approach are the FortranD project
under development at the Northeast Parallel Architectures Center
at Syracuse University and Rice University\cite{fox2}, various automatic tools,
and the efforts of HPF\cite{hpf} and the emerging MPI message passing standard\cite{mpi}.
The other approach acknowledges the heterogeneity at both software and
hardware levels, by extending existing programming languages and
software environments into parallel environments best exploiting
the high performance of underlying architectures.
The advantages of the latter approach are highly efficient programs
and an open software environment flexible enough to support
the integrated heterogeneous processing environment
which we will discuss in this paper.
While the first approach is promising and represents a major
direction in parallel programming language development, we believe
that the future of high performance computing lies in the integration of
existing and emerging
architectures and technologies into a heterogeneous networked computing
environment able to solve general classes of applications with
various requirements in scientific computation, communication,
programming models, hierarchical memory, I/O, data processing and visualization.
Another advantage of this approach lies in adapting existing programs:
only the computationally intensive components need to be rewritten to run on
supercomputers. This is especially important when porting a large existing
application system in which a significant proportion of the sequential code is
``interface or management code'', i.e. code that does not perform any real
scientific computation but simply manages the system and various computing resources,
such as standard I/O, initialization, the graphical user interface, system control
interface, OS or file system interface and networking interface, to name
a few. This heterogeneous approach, together with our view of parallel
software as a mapping process between
two complex systems - problems and parallel computers\cite{fox1}, would
require integration methodologies and tools to facilitate multiple parallel
programming models in a unified environment. System integration is
expected to play a critical role in parallel software development and
applications.
In this paper, we discuss issues in integrating programs written in
different programming paradigms and languages on the Connection Machine
CM5, a MIMD machine. Applying a dataflow-based integration model,
built on the commercially available Application Visualization
System (AVS), to a case study of an atmospheric advection modeling application,
we demonstrate a simple, effective and modular way to seamlessly
couple sequential, data parallel and explicit message-passing parallel
modules into a unified programming environment on the CM5.
\section{Multi-paradigm Parallel Programming Environments}
\subsection{Multi-paradigm Programming in Mapping Problem
Domain to Parallel System}
For this article, we shall consider that a {\em programming paradigm} is a model or
an approach employed in solving a programming problem with a restricted
set of concepts. Programming paradigms either induce or are induced
by a class of languages, the programming environments that these languages
are supported in, and the software engineering discipline one uses to produce
systems within these environments. Typically, each class places different
requirements on its associated programming environment\cite{hail}.
On existing parallel systems, there are two dominant
parallel programming paradigms:
\begin{enumerate}
\item
Data parallel, in which essentially the same operation is
performed on many data elements by many processors simultaneously.
A data parallel program exploits parallelism
in proportion to the quantity of data involved and hence offers the
highest potential for concurrency. Data parallel programs are also
easy to write, understand and debug.
\item
Explicit message passing which allows two or more operations to be
performed simultaneously.
The message passing paradigm gives the programmer maximum flexibility
and control in exploiting domain decomposition and control
decomposition of the application problem, so that maximum
efficiency in utilizing MPP hardware can
be achieved through careful design of explicit message-passing
parallel algorithms. It allows the processing nodes to synchronize as
frequently or infrequently as required for a given application.
A message-passing program is free to arbitrarily distribute
program tasks and data among the nodes as necessary, using
synchronization and data communication between nodes when it
is necessary. This arbitrary load-balancing of tasks and data
makes message-passing particularly useful for applications that
require dynamic allocation of tasks and data.
\end{enumerate}
As with the trade-offs among conventional
programming languages, expressiveness and efficiency are the two major
factors characterizing parallel programming models; the trade-off is even more
pronounced here because the driving force for using
a parallel computer is efficiency, and programming on parallel
machines tends to be more difficult and less portable. Programming
languages for massively
parallel processors must go
beyond conventional languages in expressiveness and cost/performance. Unfortunately,
unlike on sequential machines, and owing to efficiency
considerations on a particular SIMD/MIMD architecture with a
unique interconnection topology,
most parallel systems support only a single programming model, i.e.
either data parallel or message passing.
Given a target machine and an application problem, a decision must
usually be made about which form of parallelism to use early on, often before
the problem has been examined in detail, let alone implemented. This is
unnecessary and should be avoided: in a large application
problem, some components may be well suited to the data parallel paradigm
while others suit message passing. We should exploit the parallelism
present at each level of the computation. There has been a long debate in the parallel
community and industry on the issue of data-parallel
versus message-passing programming, as well as SIMD versus MIMD machines.
As studied in \cite{fox1}, by viewing ``real-world'' problems and parallel computers
both as complex systems, the fundamental issue in solving problems
on a parallel computer becomes whether we can easily and efficiently map
a problem structure to a
particular parallel architecture. A general mapping followed by a computer
simulation typically consists of several
mapping stages (shown in Fig. 1). In this paper, we are interested in the software
environment and programming models used to effectively map a large
application onto a massively parallel system.
\begin{figure}
%[tb]
\centerline{\psfig{figure=map_small.eps}}
%BoundingBox: 50 80 550 720
\end{figure}
Parallel systems are built to solve large scale computationally
intensive applications (e.g., ``grand-challenge'' problems).
The broad range of HPCC applications may embody a rich diversity of
program structures.
It is anticipated that no
one parallel programming model will satisfy all needs. Data parallel,
task parallel, and object parallel have all found applicability to
end user problems\cite{report,kim}. A useful parallel programming environment
for MPPs must incorporate semantic constructs for delineating program
parallelism, identifying locality and specifying data set partitioning, as well
as meeting the normal requirements of conventional software engineering.
It will have to permit user accessibility to a mix of parallelism models,
either by incorporating them into a single schema or by providing means for
a single application to use different languages or different parallelism
models for separate parts of a user problem.
In either case, a parallel software system
will of necessity support interoperability among independently
derived program modules, perhaps even running on separate computers in
a distributed heterogeneous computing environment.
This is especially important for irregular and dynamic problems, and for
multidisciplinary applications, as such applications often
inevitably require the coordination or
composition of multiple programming paradigms, e.g., high-level data parallel and
low-level explicit message passing,
considering the complexity of the problem structures to be expressed and solved by the
software environment. \cite{foster} gives such an example of a multi-model earth system
in which an ocean model and an atmosphere model are loosely coupled to each other.
While the homogeneous approach
emphasizes a virtual programming
model, due largely to portability considerations,
with high-level and low-level software used
in a strictly successive manner in the mapping shown in Fig. 1,
it is most likely that a large application problem would require
programming models of different abstraction levels to be used in a hybrid
fashion. For instance, Fig. 2 gives another
picture of how subtasks of an application task can be mapped to a
parallel system in a single mapping stage by multiple programming models.
A problem task may consist of subtasks requiring a set of
computing services such as parallel computation/communication,
parallel database management, parallel or
sequential I/O, real-time visualization,
sequential event-driven graphical user interface, networking interface, etc.
This mapping process requires not only a ``universal architecture'' in the
parallel computer, able to support multiple programming models efficiently;
it also requires a software environment capable of supporting
task-level computation and
communication and of integrating different programming
paradigms in a single framework. Software integration techniques are clearly
required for interfacing the data and control flows
among the components (modules) of different
programming models in this single mapping process.
The CM5 is one of the few parallel computers today that support both the data
parallel and explicit message
passing programming models. We chose it to evaluate the methodologies
of multi-paradigm integration in this paper. However, we believe that
many of the issues discussed here are also applicable to the programming
integration in a heterogeneous computing environment in which multiple
parallel systems are connected by high-speed LAN and/or WAN networks.
\subsection{Multi-paradigm Programming}
In this section, we first examine integration
techniques of multi-paradigm programming
on a sequential machine and then briefly discuss the
corresponding implementation on a Connection Machine CM5.
On a conventional uniprocessor machine, multi-paradigm integration
can in general be divided
into the following four classes, based on how one
paradigm is interfaced with the others:
\begin{enumerate}
\item
Full integration --- a single language has constructs supporting
multiple programming paradigms.
The transfer of program control and data among
different language components at runtime is implicitly arranged by the
compiler.
The semantics of interfacing the different paradigms
is built into the language specification. Usually a programming language evolves
along this line, starting with a single paradigm and adding new
language components as applications grow and different paradigms are required.
The evolution of C++ from C,
which now integrates the imperative,
modular and object-oriented programming paradigms, can be regarded as an
example of this category. Another example can be found in \cite{cheng3},
in which the functional
and logic programming paradigms are integrated in a unified system.
Among parallel programming languages taking this approach,
Compositional C++ (CC++)\cite{cc++} and FortranM\cite{fortran-m}
are examples that attempt to integrate, in a single language framework,
different paradigms of
parallel programming: data parallel, task parallel and
object parallel;
imperative and declarative programming; shared-memory and
message-based programs.
This approach provides the most flexible and efficient hybrid integration of
multiple models. However, new languages have to be developed,
and complex semantics for interfacing the multiple programming paradigms in
a new language are usually inevitable.
On a MIMD parallel system like the CM5, because the data-parallel and message-passing
paradigms have dramatic implementation and performance differences
in data mapping and processor allocation,
task decomposition and execution synchronization, and inherently require
distinct semantics in both the programming language and the underlying
system architecture, this integration
approach may incur interface overhead and unexpectedly complex semantics,
and thus seems infeasible on a distributed-memory machine.
The current CM programming environment provides some restricted ways to
support this kind of integration, mostly through specialized library routines
which encapsulate optimized implementation details of the functions provided.
As an example, the CM Fortran Utility Library provides convenient
access from CM Fortran
to the capabilities of lower-level CM software. The purpose is
typically to
achieve functionality or performance beyond what is currently
available from the compiler.
CM Fortran programmers can use the library procedures in situations where
one is normally tempted to make explicit calls to lower-level software.
The utility procedures take CM Fortran array names and other CM Fortran data
objects as arguments, so there is no need to convert CM Fortran
objects into the data
types used by the lower-level software. The CM Fortran Utility Library provides
procedures for
inquiries, random number generation, dynamic array allocation, interprocessor and
local data motion, and parallel I/O\cite{cmf}.
\item
Embedded integration --- master-slave style.
The interface is a built-in construct (or convention)
in a master language that allows seamless
invocation of subroutines (functions) written in a slave language.
No global data can be
shared directly by both languages, even though they have the same
memory address space. Data
transfer among modules in the master and slave languages is identical
to that in a single
language and is carried out by passing pointers to the same shared memory,
so there is no data copy cost.
Separate compilation of the different languages is required, and their object
files are finally
linked together. The executable program runs as a single process.
This approach provides a simple, coherent yet efficient way to support
the integration at the language level. Among examples of this approach are
interlanguage calls on most conventional platforms, such as
calling Fortran77 or assembly from C, or
vice versa. This is the common approach on most SIMD and MIMD machines for
supporting the integration of parallel programming in host sequential languages,
such as calling CMFortran, C* or CMMD message passing
primitives from Fortran77 or C on the CM5.
In order to exploit the special high-performance arithmetic
accelerator hardware (the vector units)
in each processing node (PN) of the CM5 system, the current CM5 programming
software provides limited ways to mix the data parallel
paradigm into message-passing programs. Explicit message passing CMMD
programs can be written in the data parallel programming languages CMFortran and C*,
in which case the programs use both the microprocessor and
the vector units of a PN, with
serial arrays stored in the microprocessor's memory and parallel arrays in the VU memory.
Such a program can thus take advantage of the VUs for high-speed floating-point
computations. Parallel arrays can be passed as arguments to CMMD message passing
primitives in much the same way as serial arrays. Within this integration model, data parallel
operations are confined to the node level, while the overall program follows the
explicit message-passing paradigm.
Adopting this approach on a CM5, an ideal integrated program would be as follows.
Executed as a single process on the control processor (CP) of the CM5,
the main program in Fortran77 (or C) would
first act as the host sequential program in the
CMMD host/node programming model, starting a CMMD node program by
enabling the nodes and broadcasting some data to the node program, which is written in
Fortran77, C or even CMFortran (node-level data parallel); this
host program then invokes another subroutine written
in CMFortran (or C*). After control returns to the
main program from the CMFortran subroutine, it collects the results sent
from the CMMD node program and finally ends. Before and after enabling the CMMD node
program or invoking the CMFortran subroutine,
there can be any number of other non-parallel computing
subroutines in the sequential host program.
Technically, however, this approach would pose significant difficulties
for the compiler (linker), since it becomes the compiler's responsibility, as
opposed to that of a third-party daemon or the OS
in the process-based approach described below, to manage the time-sharing
of the CM5's PNs and internal networks between the CMFortran subroutine and
the CMMD node program.
\item
Loosely coupled integration --- process-based integration.
Individual program modules written in the same or different languages
are compiled and linked separately, while their executables run concurrently as
multiple
processes time-sharing a CPU.
Message passing among program processes is
normally via socket-based interprocess communication (IPC), though if two
modules share a host it is common to use shared
memory to optimize the communication.
Runtime process control is coordinated by a supervising
manager (daemon) or the operating system. Currently
most windowing systems and networking applications follow this approach,
for instance the client-server model popular in network programming.
On the host machine of a parallel system, IPC-based systems
provide an open environment in which
parallel programming software migrates naturally into the
conventional software environment, where software such as concurrent
operating systems, networking, graphics, I/O, and
sequential programming languages has been well established for many years.
This process-based approach also opens great
opportunities for constructing a ``metacomputer''
via a networked environment\cite{smarr}.
Depending on the way (protocol) the IPC is facilitated, covering data exchange,
management of buffers and sockets, and process control and synchronization,
integration systems of this kind can differ considerably.
For example, PVM\cite{sund}, a process-based general message-passing system,
provides a set of user interface primitives that may be incorporated into
existing procedural languages. PVM primitives exist for the invocation of
processes, message transmission and reception, broadcasting,
synchronization via barriers, mutual exclusion, and shared memory.
Processes in PVM may be initiated synchronously or asynchronously, and
may be conditioned upon the initialization or termination
of another process, or upon the availability of data values.
The PVM constructs therefore
permit the most appropriate programming paradigm and language to be used for
each individual component of a parallel system.
In our work, we use another process-based system, AVS\cite{avs2}, designed
primarily for visualization applications. In later sections we demonstrate
the advantages of AVS's dataflow IPC model in
facilitating the data-parallel and message-passing programming paradigms on the CM5.
\item
Not-at-all integration --- non-interactive style.
Data passing and sharing among different program modules, usually
stand-alone executables, are carried out at the (UNIX) operating system
level through shell scripts, pipes,
intermediate data files, and other pre- or post-processing techniques.
This is the usual way for an application to use the (parallel)
computing services collectively when its programming tasks do not need to
run interactively.
\end{enumerate}
\subsection{Parallel Programming on Connection Machine CM5}
Detailed descriptions of the CM5 architecture and software system
can be found in \cite{cm5} and related CM5 documents from TMC. In this
section, we briefly outline the existing parallel programming environment
on the CM5.
The CM5 system combines features of SIMD and MIMD designs, integrating
them into a single parallel architecture. This unique feature of the CM5 gives
us the alternative of exploiting both paradigms in the parallel
programming environment.
The CM5 architecture is designed to support especially well the data parallel
model, which implies a relatively synchronous style of parallel programming
featuring a single conceptual thread of control and a global
approach to data layout and I/O:
arrays of data are laid out across all processors, and all
elements are computed upon simultaneously by their respective
processors. Although the conceptual model itself is completely synchronous,
in practice processors may execute asynchronously any operations that are
independent of other processors, such as conditional operations
on locally stored data. The CM5 data parallel compilers
take full responsibility for, and advantage of, this
freedom to exploit the MIMD hardware capabilities of the CM5. Currently,
the CM5 supports the data parallel Fortran (CMFortran), data parallel C (C*) and
data parallel Lisp (*Lisp) programming languages.
The explicit message passing model on the CM5 is an extension of the data
parallel model, provided by a package of macros and runtime primitives called
CMMD that supports
MIMD-style low-level communication operations. Programs that use
CMMD typically operate in SPMD (single program, multiple data) style:
each processing node of the CM5 runs an independent copy of
a single program and manages its own computations and data layout, and
communication between nodes is handled by calls to CMMD message
passing primitives. Message passing
programs can be written in Fortran 77, CMFortran, C, C++, C* and an
assembly language, DPEAC.
As shown in Fig. 3, each processing node (PN) is a general-purpose computer
that can fetch and interpret its own instruction stream, execute arithmetic
and logical instructions, calculate memory addresses, and perform
interprocessor communication. All PNs can perform independent tasks or
collaborate on a single problem. A control processor (CP), similar to
a standard
high-end workstation, acts as a partition manager communicating with the
rest of the CM5 system through the Control Network and Data Network.
CMOST, a parallel timesharing operating
system extended from UNIX,
runs on the CP to make all CM5 resource allocation and swapping
decisions, and handles most system calls for process execution, memory
management, I/O and access to and from local networks.
The Control Network provides tightly coupled communication services.
Optimized for fast response and low latency, its functions include
synchronizing the PNs, broadcasting data to every node, combining
a value from every node to produce a single result, and computing
certain parallel prefix operations. The Data Network provides
loosely coupled communication services. Optimized for high bandwidth,
it provides point-to-point data delivery.
A data parallel or CMMD message passing process running on the CP plus
the nodes is a single process time-sharing the CM5 system
with other processes.
\section {AVS - A Dataflow Based Integration Tool for
Multi-paradigm Programming on CM5}
\subsection{The Application Visualization System}
AVS\cite{avs1} is a widely available
commercial visualization environment based on a dataflow model
for scientific data visualization and process control.
It incorporates visualization, graphics,
visual programming, process management and networking
into a single comprehensive visualization application software and
development environment. We are most interested in its software
integration capability in this article.
The AVS dataflow model is a model of parallel computation in which a flow
network of autonomous processes compute by passing data along the arcs
that interconnect their input/output ports. Each module (process) fires autonomously
as soon as all the inputs it requires have arrived
on its input ports. It then produces values that flow out of
its output port(s) and thus trigger further computations.
AVS flow networks are built from a menu
of modules using a direct-manipulation
visual programming interface.
Both process control and data transfer among processes in AVS are
modular and completely transparent to the programmer.
AVS provides a data-channel abstraction that
transparently handles module connectivity and port type-checking.
The module programmer needs only to define the input and output ports
in AVS's predefined data types, using a set of AVS routines and macros.
Message passing occurs at a high level of data abstraction and
only through the input and output ports. The AVS kernel (manager) executes the flow
network and supervises the actual data transfer, which is
eventually carried out by sockets at a lower level.
\subsection{A General Parallel Programming Environment on the CM5 with AVS}
Although AVS does not provide support for parallel computing within a
module, a specific module may use machine- or language-specific features to
achieve such parallelism, whether data parallel or explicit message passing.
Using the modular process management in AVS and time-sharing CMOST
on the CP and CM5's parallel programming software, we can naturally
combine programs in data-parallel, message passing and sequential
programming languages available on CM5 into a single process-based
software system. An AVS/CM process is a module or a
set of modules written in the same language/paradigm. A typical parallel
process has two distinct portions of code:
\begin{enumerate}
\item
{\em AVS/CM interface part} -- Sequential code in Fortran77, C or C++ that coordinates
the data flow and control flow between the AVS kernel and the CM parallel system.
It consists of code declaring the AVS module's
input/output ports and (graphical) parameters; code packing serial data, received
from other AVS modules or produced by sequential subroutines in the same module,
into parallel data (e.g., transferring serial arrays into data parallel arrays
or decomposing/loading block data onto each processing node), or vice versa;
sequential I/O; or a host program for the CMMD host/node programming model.
\item
{\em CM computation part} -- Data parallel subroutines in CMFortran or C*, or
CMMD node programs in Fortran77, C or C++ to perform main calculations
requiring intensive computation and communication.
\end{enumerate}
Multiple modules can either be compiled and linked individually into separate
executables (processes) or be linked into a single executable.
Modules in different processes must communicate data through the CP under
the supervision of the AVS kernel; modules written in different
paradigms are necessarily in different processes.
If two modules are in the same process on the CM5,
only a pointer needs to be passed; therefore modules written in the same
paradigm can be grouped in a single process, and data transfer among them
incurs no communication (data-movement) cost.
AVS provides a sufficiently high level way to integrate both the
data parallel and message passing programming paradigms into a
single environment on the CM5, so that programmers have the
freedom to choose either parallel programming style to solve
their problem, independently or in combination. Other advantages
of this integration system include:
\begin{enumerate}
\item
Modularity and template orientation --- Programs are constructed by using
explicitly declared communication channels to plug program processes together.
A process can encapsulate common data decomposition, manipulation and internal
communication, along with the programming paradigm inherent in the code.
The modular approach of AVS also lends itself well to software sharing,
module reuse, extensibility and flexibility.
\item
Module reusability --- CM5 data parallel or message passing codes and
algorithms are developed modularly and compiled once, and can then be reused
indefinitely in a plug-and-play mode in a variety of diverse
scientific applications. Because the interfaces between modules are
well defined, modules developed at different computer sites,
by different developers and in different paradigms may be freely shared.
Modules created for one application can readily be used in another
similar application with little or no change.
\item
Flexibility and extensibility for rapid prototyping ---
In the first stage of application development, programmers can concentrate on
task decomposition, parallel algorithm design and performance improvement,
entirely ignoring system integration issues. They may write their
own modules in their favorite programming paradigm and language binding.
The programs are then migrated to the AVS environment simply by
attaching and linking the AVS interface code, while the main parallel
algorithms remain untouched.
\item
Hierarchical parallelism --- At the top is the task-level decomposition of the
problem domain: modules connected in the dataflow programming paradigm
have the potential to run concurrently
or in a pipelined way. At this coarse-grained process level,
functional parallelism can also be achieved, so that the graphical interface,
sequential
and parallel I/O, remote network access and the decomposed task computation
can be carried out concurrently by different components of the CM5 supercomputer.
The second level of the hierarchy is the
hybrid of data parallelism in data parallel
modules and control parallelism in explicit message passing modules.
At the bottom level is the parallelism inherent in pipelined processing
hardware such as the vector units of the CM5.
\item
Safety --- Operations on channels are restricted so as to guarantee
deterministic execution. Channels are strongly typed, so a compiler
can check for correct usage.
\end{enumerate}
This is a general parallel programming environment that allows
sequential alongside parallel programming, both data-parallel and explicit
message passing styles, concurrent and pipelined execution,
and both control and functional parallelism. Running interactively,
it gives the user maximum control
over the optimization of the heterogeneous resources and
capabilities of a high performance computing system.
\subsection {Performance Considerations in Process-based Integration}
The process-based integration system offers many advantages for high performance
computing in terms of multi-paradigm programming, software re-use and modularity.
For such a system to be effective, attention needs to be
given to several key performance issues.
The current coarse-grain dataflow system suffers from
inefficiency in terms of data-movement overhead and memory usage.
First, since scientific data sets are inevitably very large and the data flowing
between any two processes must pass through the sequential pipeline on
the serial host computer, data transfer between the host machine (i.e., the control
processor, or CP, where the AVS kernel runs) and the
parallel processing nodes (PNs) can be very costly. This may cause
a long start-up time
to fill the pipeline, and also delays error
detection when an error occurs in the data set being transferred. Secondly,
because each module buffers its entire data set on its input and output ports and
all the intermediate data must reside in memory, the total memory
requirement on the host can be very large. Continuous swapping of running processes
by the operating system can make this situation even worse.
In addition, there are other limitations in
the current AVS system. It provides very limited support for
dynamic modification
of the flow network, e.g., when the processes and their communication
requirements change with time: processes may be created or
destroyed, and communication patterns may change at run time\cite{tdfl}.
It is also difficult in AVS to perform a round-trip dataflow computation,
i.e., to use the output of a downstream module as input to an upstream module.
We also have to use the CMMD host/node programming model for the AVS/CM module,
thus sacrificing certain advantages of hostless programming.
Performance of these systems on massively parallel machines will in
large part depend upon the systems' capability to support
high-bandwidth, low-latency data transfer between a host machine
and all the node processors, and on how a problem's tasks are decomposed.
The task granularity must be kept coarse-grained, and
the task partitioning should be carefully evaluated to
balance each module's computation
time against its interprocess communication time.
System software is needed to support high-bandwidth parallel I/O and shared
memory segments between processes. For example, the CM/AVS system
under development
at Thinking Machines would greatly reduce the process-level
data-movement overhead in the AVS integration environment\cite{gary,cmavs}.
In CM/AVS, two modules exchange data through CM5 shared-memory
regions if they are in the same CM5 partition, or
directly over the router network via CM-domain sockets
if they are in different partitions. They may communicate with serial
modules on a workstation over the fastest available network
connecting the various components, using parallel sockets.
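As a minimal illustration of why shared-memory segments reduce data-movement overhead, the sketch below uses plain POSIX primitives; it is not the CM/AVS implementation, and the function name {\tt shared\_region\_demo} and the buffer size are assumptions. Two processes exchange a data array in place through a single mapped region, so no copy through a host-mediated pipeline is needed:

```c
/* A minimal sketch of exchanging data through a shared-memory region
   instead of copying it through a pipe, in the spirit of (but not
   using) the CM/AVS mechanism.  mmap with MAP_SHARED|MAP_ANONYMOUS
   gives parent and child one common buffer. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPTS 1024

/* The child fills the shared buffer in place; the parent then reads it
   back with no extra copy.  Returns the last element written
   (NPTS - 1 on success), or -1.0 on failure. */
double shared_region_demo(void) {
    double *buf = mmap(NULL, NPTS * sizeof(double),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return -1.0;

    pid_t pid = fork();
    if (pid < 0) return -1.0;

    if (pid == 0) {                    /* "producer" process */
        for (int i = 0; i < NPTS; i++)
            buf[i] = (double)i;        /* write data in place */
        _exit(0);
    }

    waitpid(pid, NULL, 0);             /* then read in place */
    double last = buf[NPTS - 1];
    munmap(buf, NPTS * sizeof(double));
    return last;
}
```

The data set crosses the process boundary zero times here, versus twice per module hop in a host-buffered pipeline; for the large scientific data sets discussed above this difference dominates the end-to-end cost.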
\section {A Case Study --- Comparison of Numerical Advection Models}
In previous work, we demonstrated the use of AVS in two
real-world applications, a stock option
pricing model\cite{cheng1} and an electromagnetic scattering
simulation\cite{cheng2},
both in the context of scientific visualization and also, more broadly,
as an attractive environment for software integration in high performance
distributed computing. That work motivated us to explore further
AVS's capability for system integration in HPCC applications
such as GIS\cite{cheng4}, four-dimensional data assimilation and environmental
modeling, and in particular the methodology to support multi-paradigm parallel
programming described in this paper. In this section, we use a simple
case study to demonstrate the feasibility of integrating CMFortran and CMMD
modules on a CM5 in the proposed AVS environment. While not itself an HPCC application, it
allows us to evaluate all the technical and system issues involved
in the integration.
\setcounter{figure}{3}
\begin{figure}[tb]
\centerline{\psfig{figure=system.eps}}
\caption{The system configuration of the integrated environment}
\end{figure}
\setcounter{figure}{4}
\begin{figure}[tb]
\centerline{\psfig{figure=screen.ps,width=6.0in}}
%\centerline{\psfig{figure=screen.ps,width=4.4in,height=5.60in}}
%BoundingBox: 36 188 576 612
\caption{A graphical user interface of the integrated system}
\end{figure}
In fluid dynamics, the simple one-dimensional advection equation (1)
with constant positive
velocity serves as a basic model for
the shape-conserving movement of an initial distribution of fluid volume
toward positive $x$.
Since the analytic solution is known in this
simple case, numerical approximations to the advective process
can be critically evaluated with respect to fundamental properties such as
accuracy, stability,
coordinate system, and computer time and memory requirements.
\begin{equation}
\frac{\partial \mu}{\partial t} + u \frac{\partial \mu}{\partial x} = 0
\end{equation}
where $u$ is the fluid velocity and
$\mu$ is the mixing ratio, $\mu \equiv C/\rho$, in which $C$ is the
constituent density and $\rho$ the density of the fluid.
Our case study creates a prototype interactive computing/visualization
environment in which several 1D advection models are purposely implemented
in different programming paradigms on the CM5, and the results can be
compared graphically through a graphical user interface.
We choose from \cite{rood} five numerical advection algorithms with different
diffusion, dispersion and monotonicity properties:
{\em `Donor-cell'}, {\em `Partial donor-cell'}, {\em `van Leer'},
{\em `Hain'} and {\em `Lax-Wendroff'},
the latter three named after the authors of the algorithms.
{\em `Donor-cell'} is implemented in CMFortran and {\em `van Leer'}
in CMMD, while the
remaining models are implemented in Fortran 77. All the ``AVS/CM interface'' code is
written in C.
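For concreteness, the {\em `Donor-cell'} update for equation (1) with constant $u>0$ and periodic boundaries is the first-order upwind scheme $\mu_i^{n+1} = \mu_i^n - c\,(\mu_i^n - \mu_{i-1}^n)$ with Courant number $c = u\Delta t/\Delta x$. The following serial C sketch is illustrative only (the study implements this model in CMFortran, where the spatial loop becomes a data-parallel array shift; the function name {\tt donor\_cell\_step} is our own):

```c
/* Serial reference sketch of the `Donor-cell' (first-order upwind)
   scheme for the 1D advection equation with constant u > 0 and
   periodic boundaries:
       mu_i^{n+1} = mu_i^n - c * (mu_i^n - mu_{i-1}^n),
   where c = u*dt/dx is the Courant number (stable for 0 < c <= 1). */
#include <assert.h>
#include <stddef.h>

/* Advance mu[0..n-1] one time step in place, using work[0..n-1] as
   scratch so the update reads only old values. */
void donor_cell_step(double *mu, double *work, size_t n, double c) {
    for (size_t i = 0; i < n; i++) {
        size_t im1 = (i == 0) ? n - 1 : i - 1;   /* periodic wrap */
        work[i] = mu[i] - c * (mu[i] - mu[im1]);
    }
    for (size_t i = 0; i < n; i++)
        mu[i] = work[i];
}
```

A quick sanity check: with $c = 1$ the scheme translates the profile by exactly one grid point per step, so a unit spike at $i = 0$ moves to $i = 1$, which also makes the strong numerical diffusion of this scheme for $c < 1$ easy to observe experimentally.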
To demonstrate the system's integration capability on heterogeneous
architectures, we purposely run the AVS kernel on an IBM RS/6000,
the {\em `Hain'} module on a DEC5000 and the {\em `Lax-Wendroff'} module on a SUN4,
while the data-parallel {\em `Donor-cell'}, the message-passing {\em `van Leer'}
and the sequential {\em `Partial donor-cell'} modules run
on a 32-node CM5. The system configuration is shown in Fig. 4.
Fig. 5 is a screen dump showing the system parameter control panel (left),
the model output windows (top) and the flow network (bottom). The flow-chart-like
diagram is the process configuration built with the AVS visual programming interface,
the Network Editor, in which {\em `Advection Interface'} is a system control module
for steering model input parameters and {\em `graph viewer'} is an AVS system module
for graphical model output; both run on the same machine
as the AVS kernel. The other modules correspond to the
respective advection models.
This prototype environment has the following features:
\begin{enumerate}
\item
Interactive user control of the modeling.
Before or during a modeling run, the user can graphically choose
advection models and set model parameters, including the execution mode
(single step, continuous, pause, abort, or stop), the display time steps,
and the total number of grid points and time steps.
\item
Graphical output of the model results.
The user can select the display style, such as line, area or bar,
and other graphics parameters such as title and color.
\item
Easy and flexible inclusion of other advection models.
The environment can readily be extended with other existing or new
models featuring varying velocity fields, multiple dimensions
and non-rectangular coordinate systems, or with sequential, data-parallel or
message-passing implementations. All the models can be transparently configured to run
in a distributed computing environment.
\end{enumerate}
\section {Conclusion}
The purpose of this experiment in integrating multi-paradigm programming
is to provide programmers with a rich and attractive programming environment
in which the choice of high- and low-level programming models can be based on
problem structure, as well as on the particular machine architecture,
to facilitate a gradual and eventually efficient mapping from problems
to parallel systems. When multiple programming languages are
required, this environment supports a seamless and modular way
to interface the data and control transfer among processes written in different
languages.
With its visual programming interface, modular program structure,
dataflow-based execution, interactive visualization functionality
and its open system characteristics, process-based integration software such as
AVS provides an excellent framework to
facilitate the integration of the various system components required by large,
multidisciplinary applications, including sophisticated
interactive visualization, database management, heterogeneous networking,
massively parallel processing and real-time decision making, as well as
a useful tool for software development and project planning. We believe
that with the adoption of HPCC technologies in industrial
applications, system integration will play an ever more important
role in parallel software development and system design.
\bigskip
{\it Acknowledgment:} We are grateful to Ricky Rood of NASA/Goddard
who made the original advection codes available to us.
\begin{thebibliography}{99}
\bibitem{avs1}
Advanced Visual Systems Inc. {\em AVS 4.0 Developer's Guide and User's Guide},
May 1992.
\bibitem {cc++}
K. M. Chandy, C. Kesselman, {\em Compositional C++: Compositional Parallel Programming},
Technical Report, California Institute of Technology, 1992.
\bibitem{cheng1}
G. Cheng, K. Mills and G. Fox, {\em An Interactive Visualization Environment for
Financial Modeling on Heterogeneous Computing Systems}, in Proc. of the 6th SIAM
Conference on Parallel Processing for Scientific Computing, R. F. Sincovec, eds.,
SIAM, Norfolk, VA, March 1993.
\bibitem{cheng2}
G. Cheng, Y. Lu, G.C. Fox, K. Mills and T. Haupt, {\em An Interactive Remote
Visualization Environment for an Electromagnetic Scattering Simulation on a High
Performance Computing System}, Technical Report, SCCS-467, to appear in Proc.
of Supercomputing '93, Portland, OR, Nov. 1993.
\bibitem {cheng3}
G. Cheng, and Y. Zhang, {\em A Functional + Logic Programming Language in
Interpretation-Compilation Implementation},
Lisp And Symbolic Computation: An International Journal,
Vol. 5, No. 3, 1992, pp. 133-156.
\bibitem {cheng4}
G. Cheng, C. Faigle, G. C. Fox, W. Furmanski, B. Li, and K. Mills,
{\em Exploring AVS
for HPDC Software Integration: Case Studies Towards Parallel Support for GIS},
Proc. of the 2nd AVS Conference AVS'93, Lake Buena Vista, FL, May 1993.
\bibitem {mpi}
{\em Document for a Standard Message-Passing Interface (draft)}, May 28, 1993.
\bibitem {fortran-m}
I. Foster and K. M. Chandy, {\em Fortran M: A Language for Modular Parallel
Programming}, Preprint MCS-P237-0992, Mathematics
and Computer Science Division, Argonne National Laboratory, Argonne, Ill., 1992.
\bibitem {foster}
I. Foster, {\em Fortran M as a language for building earth system models},
Preprint MCS-P345-0193, Argonne National Laboratory, and Proc. 5th ECMWF Workshop on
Parallel Processing in Meteorology, ECMWF, Reading, U.K., 1992.
\bibitem{fox1}
G. C. Fox, {\em Parallel Computers and Complex Systems}, Complex Systems '92:
From Biology to Computation, Inaugural Australian National Conference on Complex
Systems, December 1992. Editors: Bossomaier, David G. Green. CRPC-TR92266
\bibitem{fox2}
G. C. Fox, S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C-W Tseng,
and M-Y Wu, {\em Fortran D Language Specification}, Syracuse Center for
Computational Science-42c, Rice COMP TR90-141, 37 pps, 1991.
\bibitem{hail}
B. Hailpern, {\em Multi-Paradigm Languages}, IEEE Software, Vol. 3, No. 1 (1986), pp. 54-66.
\bibitem {hpf}
{\em High Performance Fortran Language Specification}, High Performance Fortran
Forum, May, 1993, Version 1.0, 184 pp., Rice University, Houston, Texas.
\bibitem {cmavs}
M. F. Krogh and C. D. Hansen,
{\em Visualization on Massively Parallel Computers using CM/AVS},
Proc. of the 2nd AVS Conference AVS'93, Lake Buena Vista, FL, May 1993.
\bibitem {kim}
K. Mills and G. C. Fox, {\em HPCC Application Development and Technology
Transfer to Industry},
to appear in the Postproceedings of the New Frontier: A Workshop
on Future Directions of Massively Parallel Processing, IEEE Computer Society
Press, Los Alamitos, CA, July 1993.
\bibitem{gary}
G. Oberbrunner, {\em Parallel Networking and Visualization on
the Connection Machine CM-5}, the Symposium on High Performance Distributed
Computing HPDC-1,
September, 1992, pp. 78-84, Syracuse, NY.
\bibitem {rood}
R. B. Rood, {\em Numerical Advection Algorithms and Their Role
in Atmospheric Transport and Chemistry Models}, Reviews of Geophysics,
Vol. 25, No. 1, pp. 71-100, February, 1987.
\bibitem {smarr}
L. L. Smarr and C. E. Catlett, {\em Metacomputing},
Communications of the ACM, Vol. 35, No. 6, June 1992, pp. 45-52.
\bibitem {tdfl}
P. A. Suhler, J. Biswas, K. M. Korner and J. Browne, {\em TDFL: A Task-Level
Dataflow Language}, Journal of Parallel and Distributed Computing, Vol. 9, 1990, pp. 103-115.
\bibitem{sund}
V. Sunderam, {\em PVM: A Framework for Parallel Distributed Computing},
Concurrency: Practice and Experience, 2(4), Dec. 1990.
\bibitem {report}
{\em System Software and Tools for High Performance Computing Environments},
Final report of the Workshop on System Software and Tools for High
Performance Computing Environments, Pasadena, California, April 14-16, 1992. (225p.)
\bibitem{cm5}
Thinking Machines Corporation,
{\em The Connection Machine CM-5 Technical Summary},
Technical Report, Cambridge, MA, October 1991.
\bibitem{cmf}
Thinking Machines Corporation,
{\em CM Fortran Utility Library Reference Manual},
Technical Report, Cambridge, MA, January 1993.
\bibitem {avs2}
C. Upson, T. Faulhaber, Jr., D. Kamins, D. Laidlaw,
D. Schlegel, J. Vroom, R. Gurwitz and A. van Dam,
{\em The Application Visualization System: A Computational Environment
for Scientific Visualization}, IEEE Computer
Graphics and Applications, July, 1989.
\end{thebibliography}
\end{document}