Full HTML for Basic Foilset: Methodology of Computational Science

Given by Geoffrey C. Fox in CPS615 Computational Science, Spring Semester 2000. Foils prepared 13 February 2000.
Summary of Material


We give a simple overview of parallel architectures today with distributed, shared or distributed shared memory
We describe classes of parallel applications illustrating some key features such as load balancing and communication
We describe programming models and how their features match applications

Table of Contents for full HTML of Methodology of Computational Science


1 Methodology of Computational Science
2 Abstract of Methodology of Computational Science Presentation
3 Parallel Computing Methodology in a Nutshell I
4 Parallel Computing Methodology in a Nutshell II
5 Potential in a Vacuum Filled Rectangular Box
6 Basic Sequential Algorithm
7 Update on the Grid
8 Parallelism is Straightforward
9 Communication is Needed
10 What is Parallel Architecture?
11 Parallel Computers -- Classic Overview
12 Distributed Memory Machines
13 Communication on Distributed Memory Architecture
14 Distributed Memory Machines -- Notes
15 Shared-Memory Machines
16 Communication on Shared Memory Architecture
17 Shared-Memory Machines -- Notes
18 Distributed Shared Memory Machines
19 Summary on Communication etc.
20 Communication Must be Reduced
21 Seismic Simulation of Los Angeles Basin
22 Irregular 2D Simulation -- Flow over an Airfoil
23 Heterogeneous Problems
24 Load Balancing Particle Dynamics
25 Reduce Communication
26 Minimize Load Imbalance
27 Parallel Irregular Finite Elements
28 Irregular Decomposition for Crack
29 Further Decomposition Strategies
30 Summary of Parallel Algorithms
31 Data Parallelism in Algorithms
32 Functional Parallelism in Algorithms
33 Pleasingly Parallel Algorithms
34 Parallel Languages
35 Data-Parallel Languages
36 Message-Passing Systems
37 Shared Memory Programming Model
38 Structure(Architecture) of Applications - I
39 Structure(Architecture) of Applications - II
40 Multi Server Model for metaproblems
41 Multi-Server Scenario
42 The 3 Roles of Java
43 Why is Java Worth Looking at?
44 What is Java Grande?
45 Java and Parallelism?
46 "Pure" Java Model For Parallelism
47 Pragmatic Computational Science January 2000 I
48 Pragmatic Computational Science January 2000 II




Foil 1 Methodology of Computational Science
Spring Semester 2000
Geoffrey Fox
Northeast Parallel Architectures Center
Syracuse University
111 College Place
Syracuse NY
gcf@npac.syr.edu
gcf@cs.fsu.edu

Foil 2 Abstract of Methodology of Computational Science Presentation
We give a simple overview of parallel architectures today with distributed, shared or distributed shared memory
We describe classes of parallel applications illustrating some key features such as load balancing and communication
We describe programming models and how their features match applications

Foil 3 Parallel Computing Methodology in a Nutshell I
Find what machine or class of Machines you have available
Examine parallelism seemingly available in your application (algorithm) and decide on mechanism needed to exploit it.
  • What is the decomposition implied by the algorithm?
  • Do we need to devise or adapt a nifty new algorithm?
  • Is there an "automatic" way of implementing it, or must it be done more or less by hand?
Decide on and use a programming model (HPF, MPI, threads, OpenMP) and an explicit realization of it
  • expressivity; support for the chosen machine; robustness

Foil 4 Parallel Computing Methodology in a Nutshell II
Worry about related issues
  • Node (sequential) performance (cache)
  • Input/output (parallel?) and storage (database?)
  • Visualization
  • Use of a problem-solving environment?
Evaluate possible tools
  • Debuggers (program)
  • Performance monitors/debuggers
  • Load balancing / decomposition aids

Foil 5 Potential in a Vacuum Filled Rectangular Box
So imagine the world's simplest problem
Find the electrostatic potential inside a box whose sides are at a given potential
Set up a 16 by 16 grid on which the potential is defined and which must satisfy Laplace's equation
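For reference, the continuum problem behind this foil is Laplace's equation with the potential held fixed on the walls (Dirichlet boundary conditions):

    ∇²φ = ∂²φ/∂x² + ∂²φ/∂y² = 0 inside the box, with φ given on the boundary

The 16 by 16 grid is simply the discretisation of the box on which this equation is solved.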

Foil 6 Basic Sequential Algorithm
Initialize the internal 14 by 14 grid to anything you like and then apply for ever!
φ_New = ( φ_Left + φ_Right + φ_Up + φ_Down ) / 4
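A minimal serial sketch of this update (Jacobi iteration) in C; the grid size matches the foil, but the boundary value and the fixed iteration count are illustrative assumptions rather than anything specified in the original foilset:

    #include <stdio.h>

    #define N 16                      /* full grid, including the fixed boundary */

    int main(void)
    {
        double phi[N][N] = {{0.0}};   /* interior starts at "anything you like" */
        double new_phi[N][N];
        int i, j, iter;

        for (i = 0; i < N; i++)       /* walls held at a given potential, here 1.0 */
            phi[i][0] = phi[i][N-1] = phi[0][i] = phi[N-1][i] = 1.0;

        for (iter = 0; iter < 1000; iter++) {   /* "apply for ever" -- in practice, until converged */
            for (i = 1; i < N-1; i++)
                for (j = 1; j < N-1; j++)
                    new_phi[i][j] = 0.25 * (phi[i-1][j] + phi[i+1][j] +
                                            phi[i][j-1] + phi[i][j+1]);
            for (i = 1; i < N-1; i++)
                for (j = 1; j < N-1; j++)
                    phi[i][j] = new_phi[i][j];
        }
        printf("potential at centre = %f\n", phi[N/2][N/2]);
        return 0;
    }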

Foil 7 Update on the Grid
14 by 14 Internal Grid

Foil 8 Parallelism is Straightforward
If one has 16 processors, then decompose the geometrical area into 16 equal parts
Each processor then updates 9, 12, or 16 grid points independently (see the count below)
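The counts follow from where each 4 by 4 block of the 16 by 16 grid sits, since the outermost ring of points is fixed boundary: the 4 corner blocks each update 3 x 3 = 9 interior points, the 8 edge blocks 3 x 4 = 12, and the 4 interior blocks 4 x 4 = 16, giving 4(9) + 8(12) + 4(16) = 196 = 14 x 14 points in total, as expected.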

Foil 9 Communication is Needed
Updating edge points in any processor requires communication of values from the neighboring processor
For instance, the processor holding the green points requires the red points

Foil 10 What is Parallel Architecture?
A parallel computer is any old collection of processing elements that cooperate to solve large problems fast
  • from a pile of PC's to a shared memory multiprocessor
Some broad issues:
  • Resource Allocation:
    • how large a collection?
    • how powerful are the elements?
    • how much memory?
  • Data access, Communication and Synchronization
    • how do the elements cooperate and communicate?
    • how are data transmitted between processors?
    • what are the abstractions and primitives for cooperation?
  • Performance and Scalability
    • how does it all translate into performance?
    • how does it scale?

Foil 11 Parallel Computers -- Classic Overview
Parallel computers allow several CPUs to contribute to a computation simultaneously.
For our purposes, a parallel computer has three types of parts:
  • Processors
  • Memory modules
  • Communication / synchronization network
Key points:
  • All processors must be busy for peak speed.
  • Local memory is directly connected to each processor.
  • Accessing local memory is much faster than other memory.
  • Synchronization is expensive, but necessary for correctness.
[Figure: colors used in the following pictures]

Foil 12 Distributed Memory Machines
Every processor has a memory others can't access.
Advantages:
  • Relatively easy to design and build
  • Predictable behavior
  • Can be scalable
  • Can hide latency of communication
Disadvantages:
  • Hard to program
  • Program and O/S (and sometimes data) must be replicated

Foil 13 Communication on Distributed Memory Architecture
On distributed memory machines, each chunk of decomposed data resides in a separate memory space -- a processor is typically responsible for storing and processing its own data (the owner-computes rule)
Information needed on the edges for the update must be communicated via explicitly generated messages, as sketched below
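As an illustration (not part of the original foils), here is roughly what those explicit messages look like in C with MPI for a one-dimensional decomposition of the grid by rows: each process swaps its edge rows with its up and down neighbours into extra "halo" rows before updating. The names NLOC and phi and the halo layout are assumptions made for this sketch:

    #include <mpi.h>

    #define N    16     /* global grid width */
    #define NLOC 4      /* rows owned by each process (assumed) */

    /* phi carries two extra "halo" rows (0 and NLOC+1) for the neighbours' edge data */
    void exchange_halo(double phi[NLOC + 2][N], int rank, int nprocs)
    {
        int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* send my top owned row up; receive the upper neighbour's bottom row into halo row 0 */
        MPI_Sendrecv(phi[1],        N, MPI_DOUBLE, up,   0,
                     phi[0],        N, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* send my bottom owned row down; receive the lower neighbour's top row into halo row NLOC+1 */
        MPI_Sendrecv(phi[NLOC],     N, MPI_DOUBLE, down, 1,
                     phi[NLOC + 1], N, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

After the exchange, each process can update all of its owned rows using only local data, exactly as in the sequential sweep.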

Foil 14 Distributed Memory Machines -- Notes
Conceptually, the nCUBE, CM-5, Paragon, SP-2, and Beowulf PC cluster are quite similar.
  • Bandwidth and latency of the interconnects differ
  • The network topology is a two-dimensional torus for the Paragon, a fat tree for the CM-5, a hypercube for the nCUBE, and a multistage switch for the SP-2
To program these machines:
  • Divide the problem to minimize number of messages while retaining parallelism
  • Convert all references to global structures into references to local pieces (explicit messages convert distant to local variables)
  • Optimization: Pack messages together to reduce fixed overhead (almost always needed)
  • Optimization: Carefully schedule messages (usually done by library)

Foil 15 Shared-Memory Machines
All processors access the same memory.
Advantages:
  • Retain sequential programming languages such as Java or Fortran
  • Easy to program (correctly)
  • Can share code and data among processors
Disadvantages:
  • Hard to program (optimally)
  • Not scalable due to bandwidth limitations in bus

Foil 16 Communication on Shared Memory Architecture
On a shared memory machine a CPU is responsible for processing a decomposed chunk of data but not for storing it
The nature of the parallelism is identical to that for distributed memory machines, but communication is implicit -- one "just" accesses memory

Foil 17 Shared-Memory Machines -- Notes
The interconnection network shown here is actually that of the BBN Butterfly, but the C-90 is in the same spirit.
These machines share data by direct access.
  • Potentially conflicting accesses must be protected by synchronization.
  • Simultaneous access to the same memory bank will cause contention, degrading performance.
  • Some access patterns will collide in the network (or bus), causing contention.
  • Many machines have caches at the processors.
  • All these features make it profitable to have each processor concentrate on one area of memory that others access infrequently.

Foil 18 Distributed Shared Memory Machines
Combining the (dis)advantages of shared and distributed memory
Lots of hierarchical designs are appearing.
  • Typically, "shared memory nodes" with 4 to 32 processors
  • Each processor has a local cache
  • Processors within a node access shared memory
  • Nodes can get data from or put data to other nodes' memories

Foil 19 Summary on Communication etc.
Distributed Shared Memory machines have communication features of both distributed (messages) and shared (memory access) architectures
Note that for distributed memory, the programming model must express data location (the HPF DISTRIBUTE directive) and the invocation of messages (MPI syntax)
For shared memory, one needs to express control (OpenMP) or processing parallelism and synchronization -- one must make certain that when a variable is updated, the "correct" version is seen by other processors accessing it and that values living in caches are updated

Foil 20 Communication Must be Reduced
4 by 4 regions in each processor
  • 16 Green (Compute) and 16 Red (Communicate) Points
8 by 8 regions in each processor
  • 64 Green and "just" 32 Red Points
Communication is an edge effect
Give each processor plenty of memory and increase region in each machine
Large Problems Parallelize Best
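To make the edge effect quantitative: with an n by n region of grid points per processor and the four-point update used here, about 4n boundary values must be communicated while n² points are computed, so communication/computation ≈ 4n/n² = 4/n. For n = 4 this is 16/16 = 1; for n = 8 it is 32/64 = 1/2 -- doubling the region size halves the relative communication cost, which is why large problems parallelize best.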

Foil 21 Seismic Simulation of Los Angeles Basin
This is a (sophisticated) wave equation, similar to the Laplace example: divide the Los Angeles basin geometrically and assign a roughly equal number of grid points to each processor

Foil 22 Irregular 2D Simulation -- Flow over an Airfoil
The Laplace grid points become finite element mesh nodal points arranged as triangles filling space
All the action (triangles) is near the wing boundary
Use domain decomposition, but the regions no longer have equal area; instead each has an equal triangle count

Foil 23 Heterogeneous Problems
Simulation of a cosmological cluster (say 10 million stars)
Lots of work per star where stars are very close together (may need a smaller time step)
Little work per star where stars are sparse, as the force changes slowly and can be well approximated by a low-order multipole expansion

Foil 24 Load Balancing Particle Dynamics
Particle dynamics of this type (irregular, with sophisticated force calculations) always needs complicated decompositions
Equal area decompositions, as shown here, lead to load imbalance
[Figure: equal volume decomposition of a universe simulation (galaxy or star or ...) over 16 processors]
If one uses simpler algorithms (full O(N^2) forces) or an FFT, then equal area is best

Foil 25 Reduce Communication
Consider a geometric problem with 4 processors
In the top decomposition, we divide the domain into 4 blocks with all points in a given block contiguous
In the bottom decomposition we give each processor the same amount of work, but divided into 4 separate domains
edge/area(bottom) = 2* edge/area(top)
So minimizing communication implies we keep points in a given processor together
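A quick check of the factor of 2: a processor that owns one contiguous square of area A has edge length 4√A, so edge/area = 4/√A; if it instead owns 4 disjoint squares of area A/4, each has edge 2√A, or 8√A in total, so edge/area = 8/√A -- twice the communication for the same amount of work.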

Foil 26 Minimize Load Imbalance
But this has a flip side. Suppose we are decomposing the seismic wave problem and all the action is near a particular earthquake fault (marked in the figure).
In the top decomposition only the white processor does any work while the other 3 sit idle.
  • Efficiency is 25% due to load imbalance
In the bottom decomposition all the processors do roughly the same work, and so we get good load balance

Foil 27 Parallel Irregular Finite Elements
Here is a cracked plate; calculating stresses with an equal area decomposition leads to terrible results
  • All the work is near the crack

Foil 28 Irregular Decomposition for Crack
Concentrating processors near the crack leads to good workload balance
Each processor gets an equal number of nodal points -- not equal area -- but to minimize communication the nodal points assigned to a particular processor are kept contiguous
This is an NP-complete (exponentially hard) optimization problem, but in practice there are many ways of getting good, though not optimal, decompositions
[Figure: region assigned to 1 processor, and the resulting work load -- not perfect!]

Foil 29 Further Decomposition Strategies
Not all decompositions are quite the same
In defending against missile attacks, you track each missile on a separate node -- geometric again
In playing chess, you decompose chess tree -- an abstract not geometric space
[Figure: computer chess tree -- current position (node in tree), first set of moves, opponent's counter moves, "California gets its independence"]

Foil 30 Summary of Parallel Algorithms
A parallel algorithm is a collection of tasks and a partial ordering between them.
Design goals:
  • Match tasks to the available processors (exploit parallelism).
  • Minimize ordering (avoid unnecessary synchronization points).
  • Recognize ways parallelism can be helped by changing ordering
Sources of parallelism:
  • Data parallelism: updating array elements simultaneously.
  • Functional parallelism: conceptually different tasks which combine to solve the problem. This happens at fine and coarse grain size
    • fine is "internal" such as I/O and computation; coarse is "external" such as separate modules linked together

Foil 31 Data Parallelism in Algorithms
Data-parallel algorithms exploit the parallelism inherent in many large data structures.
  • A problem is an (identical) algorithm applied to multiple points in a data "array"
  • Usually iterate over such "updates"
Features of Data Parallelism
  • Scalable parallelism -- can often get million-way or greater parallelism
  • Hard to express when the "geometry" is irregular or dynamic
Note that data-parallel algorithms can be expressed in ALL programming models (message passing, HPF-like, OpenMP-like)

Foil 32 Functional Parallelism in Algorithms
Functional parallelism exploits the parallelism between the parts of many systems.
  • Many pieces to work on => many independent operations
  • Example: Coarse grain Aeroelasticity (aircraft design)
    • CFD(fluids) and CSM(structures) and others (acoustics, electromagnetics etc.) can be evaluated in parallel
Analysis:
  • Parallelism limited in size -- tens not millions
  • Synchronization is probably straightforward, as the parallelism is natural from the problem and from the usual way of writing the software
  • Web exploits functional parallelism NOT data parallelism
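A minimal sketch (not from the foils) of functional parallelism in C using POSIX threads: two conceptually different modules -- hypothetical stand-ins for the fluids and structures codes -- run concurrently and are joined before their results would be coupled:

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical stand-ins for the CFD (fluids) and CSM (structures) modules */
    static void *run_cfd(void *arg) { (void)arg; printf("fluids step\n");     return NULL; }
    static void *run_csm(void *arg) { (void)arg; printf("structures step\n"); return NULL; }

    int main(void)
    {
        pthread_t cfd, csm;
        pthread_create(&cfd, NULL, run_cfd, NULL);   /* the two modules execute concurrently */
        pthread_create(&csm, NULL, run_csm, NULL);
        pthread_join(cfd, NULL);                     /* synchronize before coupling their results */
        pthread_join(csm, NULL);
        return 0;
    }

Note there are only as many concurrent tasks as there are modules -- tens, not millions -- which is the size limitation mentioned above.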

Foil 33 Pleasingly Parallel Algorithms
Many applications are what is called (essentially) embarrassingly -- or, more kindly, pleasingly -- parallel
These are made up of independent concurrent components
  • Each client independently accesses a Web Server
  • Each roll of a Monte Carlo die (random number) is an independent sample
  • Each stock can be priced separately in a financial portfolio
  • Each transaction in a database is almost independent (a given account is locked but usually different accounts are accessed at same time)
  • Different parts of Seismic data can be processed independently
In contrast points in a finite difference grid (from a differential equation) canNOT be updated independently
Such problems are often formally data-parallel but can be handled much more easily -- like functional parallelism
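A hedged sketch of such a pleasingly parallel computation in C with MPI (an illustration, not part of the foils): every process draws its own independent Monte Carlo samples -- here estimating pi -- and the only communication is a single reduction at the end. The sample count and the simple per-rank seeding are arbitrary choices for the sketch:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        long i, samples = 1000000, hits = 0, total_hits = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        srand(rank + 1);                     /* crude independent random stream per process */
        for (i = 0; i < samples; i++) {      /* each sample is an independent "roll of the die" */
            double x = rand() / (double)RAND_MAX;
            double y = rand() / (double)RAND_MAX;
            if (x * x + y * y <= 1.0) hits++;
        }

        /* the only communication: combine the independent results */
        MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("pi estimate = %f\n", 4.0 * total_hits / (double)(samples * nprocs));

        MPI_Finalize();
        return 0;
    }

Because the samples never interact, adding processes scales the work with essentially no extra communication -- the defining property of a pleasingly parallel problem.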

Foil 34 Parallel Languages
A parallel language provides an executable notation for implementing a parallel algorithm.
Design criteria:
  • How are parallel operations defined?
    • static tasks vs. dynamic tasks vs. implicit operations
  • How is data shared between tasks?
    • explicit communication/synchronization vs. shared memory
  • How is the language implemented?
    • low-overhead runtime systems vs. optimizing compilers
Usually a language reflects a particular style of expressing parallelism.
Data parallel expresses the concept of an identical algorithm acting on different parts of an array
Message parallel expresses the fact that, at a low level, parallelism implies information being passed between different concurrently executing program parts

Foil 35 Data-Parallel Languages
Data-parallel languages provide an abstract, machine-independent model of parallelism.
  • Fine-grain parallel operations, such as element-wise operations on arrays
  • Shared data in large, global arrays with mapping "hints"
  • Implicit synchronization between operations
  • Partially explicit communication from operation definitions
Advantages:
  • Global operations conceptually simple
  • Easy to program (particularly for certain scientific applications)
Disadvantages:
  • Unproven compilers
  • As they express the "problem", they can be inflexible if a new algorithm arises that the language does not express well
Examples: HPF, C*, HPC++
Originated on SIMD machines, where parallel operations proceed in lock-step, but generalized (not so successfully, as the compilers are too hard) to MIMD

Foil 36 Message-Passing Systems
Program is based on typically coarse-grain tasks
Separate address space and a processor number for each task
Data shared by explicit messages
  • Point-to-point and collective communications patterns
Examples: MPI, PVM, Occam for parallel computing
The universal model for distributed computing to link naturally decomposed parts -- e.g. HTTP, RMI, IIOP etc. are all message passing
  • distributed object technology (COM, CORBA) built on functionally concurrent objects sending and receiving messages
Advantages:
  • Close to hardware ALWAYS
  • Can be close to problem as in distributed objects or functional parallelism
Disadvantages:
  • Many low-level details when NOT close to problem.

Foil 37 Shared Memory Programming Model
Experts in Java are familiar with this, as it is built into the Java language through thread primitives
We take "ordinary" languages such as Fortran, C++, Java and add constructs to help compilers divide processing (automatically) into separate threads
  • indicate which DO/for loop instances can be executed in parallel and where there are critical sections with global variables etc.
OpenMP is a recent set of compiler directives supporting this model
This model tends to be inefficient on distributed memory machines, as the optimizations (data layout, communication blocking, etc.) are not natural
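A minimal sketch of this model in C with OpenMP (an illustration, not from the foils): a single directive tells the compiler which loop iterations may run in parallel, while the shared grid is simply accessed directly. The array names and sizes just mirror the earlier Laplace example:

    #define N 16

    /* One Jacobi sweep; each row of the interior can be updated by a different thread */
    void sweep(double phi[N][N], double new_phi[N][N])
    {
        int i, j;
        #pragma omp parallel for private(j)
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++)
                new_phi[i][j] = 0.25 * (phi[i-1][j] + phi[i+1][j] +
                                        phi[i][j-1] + phi[i][j+1]);
    }

On a shared memory machine the compiler and runtime divide the rows among threads automatically; no data layout or explicit messages are specified, which is also why this model is hard to make efficient on distributed memory.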

Foil 38 Structure(Architecture) of Applications - I
Applications are metaproblems with a mix of module (aka coarse grain functional) and data parallelism
Modules are decomposed into parts (data parallelism) and composed hierarchically into full applications. They can be
  • the "10,000" separate programs (e.g. structures, CFD, ...) used in the design of an aircraft
  • the various filters used in a Khoros-based image processing system
  • the ocean-atmosphere components in integrated climate simulation
  • The data-base or file system access of a data-intensive application
  • the objects in a distributed Forces Modeling Event Driven Simulation

Foil 39 Structure(Architecture) of Applications - II
Modules are "natural" message-parallel components of problem and tend to have less stringent latency and bandwidth requirements than those needed to link data-parallel components
  • modules are what HPF needs task parallelism for
  • Often modules are naturally distributed whereas parts of data parallel decomposition may need to be kept on tightly coupled MPP
Assume that the primary goal of a metacomputing system is to add to existing parallel computing environments a higher level supporting module parallelism
  • Now if one takes a large CFD problem and divides into a few components, those "coarse grain data-parallel components" will be supported by computational grid technology
Use Java/distributed object technology for modules -- note that Java is to a growing extent used to write servers for CORBA and COM object systems

Foil 40 Multi Server Model for metaproblems
We have multiple supercomputers in the backend -- one doing a CFD simulation of airflow, another structural analysis -- while in more detail you have linear algebra servers (NetSolve), optimization servers (NEOS), image processing filters (Khoros), databases (NCSA Biology Workbench), and visualization systems (AVS, CAVEs)
  • One runs the 10,000 separate programs used to design a modern aircraft, which must be scheduled and linked .....
All are linked to collaborative information systems in a sea of middle-tier servers (as on the previous foil) to support design, crisis management, and multi-disciplinary research

Foil 41 Multi-Server Scenario
[Figure: multi-server scenario linking a Database, Matrix Solver, Optimization Service, MPPs, a Parallel DB Proxy, NEOS Control Optimization, an Origin 2000 Proxy, a NetSolve Linear Algebra Server, an IBM SP2 Proxy, Gateway Control, Agent-based Choice of Compute Engine, Multidisciplinary Control (WebFlow), and a Data Analysis Server]

Foil 42 The 3 Roles of Java

Foil 43 Why is Java Worth Looking at?
The Java Language has several good design features
  • secure, safe (with respect to bugs), object-oriented, familiar (to C, C++, and even Fortran programmers)
Java has a very good set of libraries covering everything from commerce, multimedia, images to math functions (under development at http://math.nist.gov/javanumerics)
Java has the best available electronic and paper training and support resources
Java is rapidly acquiring the best integrated program development environments
Java is naturally integrated with the network, and its universal virtual machine supports the powerful "write once, run anywhere" model
There is a large and growing trained labor force
Can we exploit this in computational science?

Foil 44 What is Java Grande?
Use of Java for:
High Performance Network Computing
Scientific and Engineering Computation
(Distributed) Modeling and Simulation
Parallel and Distributed Computing
Data Intensive Computing
Communication and Computing Intensive Commercial and Academic Applications
HPCC Computational Grids ........
Very difficult to find a "conventional name" that doesn't get misunderstood by some community!

Foil 45 Java and Parallelism?
The Web integration of Java gives it excellent "network" classes and support for message passing.
Thus "Java plus message passing" form of parallel computing is actually somewhat easier than in Fortran or C.
Coarse grain parallelism very natural in Java
"Data Parallel" languages features are NOT in Java and have to be added (as a translator) of NPAC's HPJava to Java+Messaging just as HPF translates to Fortran plus message passing
Java has built in "threads" and a given Java Program can run multiple threads at a time
  • In Web use, allows one to process Image in one thread, HTML page in another etc.
Can be used to do more general parallel computing but only on shared memory computers
  • JavaVM (standard Java Runtime) does not support distributed memory systems

Foil 46 "Pure" Java Model For Parallelism
Combine threads on a shared memory machine with message passing between distinct distributed memories
"Distributed" or "Virtual" Shared memory does support the JavaVM as hardware gives illusion of shared memory to JavaVM
[Figure: message passing between the distinct distributed memories]

Foil 47 Pragmatic Computational Science January 2000 I
So here is a recipe for developing HPCC (parallel) applications as of January 2000
Use MPI for data parallel distributed memory programs; the alternatives today are HPF, HPC++, or parallelizing compilers
  • Neither HPF nor HPC++ has a clear long-term future for implementations -- the ideas are sound
  • MPI will run on PC clusters as well as customized parallel machines -- parallelizing compilers will not work on distributed memory machines
Use OpenMP or MPI on shared (distributed shared) memory machines
  • If successful (high enough performance), OpenMP is obviously easiest
Pleasingly Parallel problems can use MPI or web/metacomputing technology

Foil 48 Pragmatic Computational Science January 2000 II
We don't emphasize OpenMP in class, as the hard work (aka difficult programming model) of MPI is advantageous for the class: it teaches you parallel computing!
Today Java can be used for client-side applets and in systems middleware, but it is too slow for production scientific code
  • This should change over the next year with better Java compilers -- including "native" compilers which do not target the Java Virtual Machine but go directly to native machine language
Use metacomputers for pleasingly parallel problems and metaproblems -- not for closely knit problems with tight synchronization between parts
Where possible, use web and distributed object technology for "coordination"

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Thu Mar 16 2000