Given by Geoffrey C. Fox at CPS615 Computational Science on Spring Semester 2000. Foils prepared 13 February 2000
Summary of Material
We give a simple overview of parallel architectures today with distributed, shared or distributed shared memory |
We describe classes of parallel applications illustrating some key features such as load balancing and communication |
We describe programming models and how their features match applications |
Spring Semester 2000 |
Geoffrey Fox |
Northeast Parallel Architectures Center |
Syracuse University |
111 College Place |
Syracuse NY |
gcf@npac.syr.edu |
gcf@cs.fsu.edu |
Find what machine or class of Machines you have available |
Examine parallelism seemingly available in your application (algorithm) and decide on mechanism needed to exploit it.
|
Decide on and use programming model (HPF, MPI, Threads, openMP) and explicit realization of it
|
Worry about related issues
|
Evaluate possible tools
|
So imagine the world's simplest problem |
Find the electrostatic potential inside a box whose sides are at a given potential |
Set up a 16 by 16 grid on which the potential is defined and which must satisfy Laplace's Equation |
Initialize the internal 14 by 14 grid to anything you like and then apply forever! |
φ_New = (φ_Left + φ_Right + φ_Up + φ_Down) / 4 |
14 by 14 Internal Grid |
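A minimal sequential sketch of this update in C (the grid size matches the foil; the fixed boundary value of 1.0 and the iteration count are illustrative choices, not from the foils):

#include <stdio.h>

#define N 16                       /* 16 by 16 grid including the fixed boundary */

int main(void) {
    double phi[N][N], phinew[N][N];
    int i, j, iter;

    /* Boundary held at the given potential; interior initialized to anything */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            phi[i][j] = (i == 0 || j == 0 || i == N-1 || j == N-1) ? 1.0 : 0.0;

    /* "Apply forever": in practice iterate until the changes become tiny */
    for (iter = 0; iter < 1000; iter++) {
        for (i = 1; i < N-1; i++)              /* the 14 by 14 internal grid */
            for (j = 1; j < N-1; j++)
                phinew[i][j] = 0.25 * (phi[i-1][j] + phi[i+1][j] +
                                       phi[i][j-1] + phi[i][j+1]);
        for (i = 1; i < N-1; i++)
            for (j = 1; j < N-1; j++)
                phi[i][j] = phinew[i][j];
    }
    printf("phi at center = %f\n", phi[N/2][N/2]);
    return 0;
}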
If one has 16 processors, then decompose geometrical area into 16 equal parts |
Each processor updates 9, 12, or 16 grid points independently |
Updating edge points in any processor requires communication of values from neighboring processor |
For instance, the processor holding green points requires red points |
A parallel computer is any old collection of processing elements that cooperate to solve large problems fast
|
Some broad issues:
|
Parallel computers allow several CPUs to contribute to a computation simultaneously. |
For our purposes, a parallel computer has three types of parts:
|
Key points:
|
Colors Used in Following pictures |
Every processor has a memory others can't access. |
Advantages:
|
Disadvantages:
|
On distributed memory machines, each chunk of decomposed data resides in a separate memory space -- a processor is typically responsible for storing and processing that data (owner-computes rule) |
Information needed on edges for update must be communicated via explicitly generated messages |
Messages |
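A hedged sketch of such explicitly generated messages in C with MPI, assuming for simplicity a one-dimensional strip decomposition with ghost rows (the function name, strip width, and grid size are illustrative, not from the foils):

#include <mpi.h>

#define NCOLS 16
#define NROWS_LOCAL 4                  /* e.g. 16 rows split over 4 processors */

/* Exchange edge rows with the processors above and below.
   phi has two extra "ghost" rows: row 0 and row NROWS_LOCAL+1. */
void exchange_edges(double phi[NROWS_LOCAL + 2][NCOLS], int rank, int size) {
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my first real row up, receive neighbour's row into the bottom ghost row */
    MPI_Sendrecv(phi[1],             NCOLS, MPI_DOUBLE, up,   0,
                 phi[NROWS_LOCAL+1], NCOLS, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* Send my last real row down, receive neighbour's row into the top ghost row */
    MPI_Sendrecv(phi[NROWS_LOCAL],   NCOLS, MPI_DOUBLE, down, 1,
                 phi[0],             NCOLS, MPI_DOUBLE, up,   1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

Each processor then updates only the rows it owns (owner-computes) and reads its neighbours' edge values from the ghost rows filled in by the messages.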
Conceptually, the nCUBE, CM-5, Paragon, SP-2, and Beowulf PC clusters are quite similar.
|
To program these machines:
|
All processors access the same memory. |
Advantages:
|
Disadvantages:
|
On a shared memory machine, a CPU is responsible for processing a decomposed chunk of data but not for storing it |
The nature of the parallelism is identical to that for distributed memory machines, but communication is implicit as one "just" accesses memory |
The interconnection network shown here is actually for the BBN Butterfly, but the C-90 is in the same spirit. |
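On a shared memory machine the same sweep needs no explicit messages: a neighbour's edge values are read directly from the common memory. A minimal sketch with openMP in C (grid size and function name are illustrative, not from the foils):

#include <omp.h>

#define N 16

/* One Jacobi sweep: each thread updates a block of rows; reading the
   neighbouring rows is an ordinary memory access, not a message. */
void sweep(double phi[N][N], double phinew[N][N]) {
    int i, j;
    #pragma omp parallel for private(j)
    for (i = 1; i < N - 1; i++)
        for (j = 1; j < N - 1; j++)
            phinew[i][j] = 0.25 * (phi[i-1][j] + phi[i+1][j] +
                                   phi[i][j-1] + phi[i][j+1]);
}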
These machines share data by direct access.
|
Combining the (dis)advantages of shared and distributed memory |
Lots of hierarchical designs are appearing.
|
Distributed Shared Memory machines have communication features of both distributed (messages) and shared (memory access) architectures |
Note for distributed memory, the programming model must express data location (HPF DISTRIBUTE directive) and the invocation of messages (MPI syntax) |
For shared memory, one needs to express control (openMP) or processing parallelism and synchronization -- one must make certain that when a variable is updated, the "correct" version is used by other processors accessing this variable and that values living in caches are updated |
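A small sketch of that synchronization requirement in openMP terms (illustrative, not from the foils): a barrier separates computing the new values from copying them back, so no processor can pick up a stale or cached value.

#include <omp.h>

#define N 16

void two_phase_sweep(double phi[N][N], double phinew[N][N]) {
    #pragma omp parallel
    {
        int i, j;
        #pragma omp for
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++)
                phinew[i][j] = 0.25 * (phi[i-1][j] + phi[i+1][j] +
                                       phi[i][j-1] + phi[i][j+1]);
        /* The "for" construct already ends with an implicit barrier; the
           explicit barrier below is redundant but makes the requirement
           visible: no thread may copy phinew back into phi until every
           thread has finished computing, otherwise another processor
           (or its cache) could supply a stale value. */
        #pragma omp barrier
        #pragma omp for
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++)
                phi[i][j] = phinew[i][j];
    }
}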
4 by 4 regions in each processor
|
8 by 8 regions in each processor
|
Communication is an edge effect |
Give each processor plenty of memory and increase region in each machine |
Large Problems Parallelize Best |
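A hedged back-of-the-envelope version of this edge effect (the L by L block per processor is an illustrative parameter, not from the foils): the calculation per sweep grows with the area of each processor's block while the communication grows only with its perimeter, so

\[
\frac{T_{\mathrm{comm}}}{T_{\mathrm{calc}}} \;\propto\; \frac{4L}{L^{2}} \;=\; \frac{4}{L},
\]

and giving each processor a bigger region (a larger problem on the same number of processors) drives the communication overhead down. In d dimensions with n grid points per processor the ratio scales as n^{-1/d}.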
This is a (sophisticated) wave equation similar to the Laplace example: you divide Los Angeles geometrically and assign a roughly equal number of grid points to each processor |
The Laplace grid points become finite element mesh nodal points arranged as triangles filling space |
All the action (triangles) is near the wing boundary |
Use domain decomposition, but no longer with equal area; rather, with equal triangle count |
Simulation of a cosmological cluster (say 10 million stars) |
Lots of work per star where stars are very close together (may need a smaller time step) |
Little work per star where the force changes slowly and can be well approximated by a low-order multipole expansion |
Particle dynamics of this type (irregular with sophisticated force calculations) always need complicated decompositions |
Equal area decompositions, as shown here, lead to load imbalance |
Figure: equal volume decomposition for the universe simulation over 16 processors; each point is a galaxy, star, or ... |
If one uses simpler algorithms (full O(N^2) forces) or an FFT, then equal area is best |
Consider a geometric problem with 4 processors |
In top decomposition, we divide domain into 4 blocks with all points in a given block contiguous |
In bottom decomposition we give each processor the same amount of work but divided into 4 separate domains |
edge/area(bottom) = 2* edge/area(top) |
So minimizing communication implies we keep points in a given processor together |
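A sketch of where the factor of 2 can come from, assuming the earlier 16 by 16 grid so that the top decomposition gives each processor one contiguous 8 by 8 block and the bottom decomposition gives it four scattered 4 by 4 blocks, and counting the full perimeter of every block as communicated edge:

\[
\frac{\mathrm{edge}}{\mathrm{area}}(\mathrm{top}) \propto \frac{4 \times 8}{64} = \frac{1}{2},
\qquad
\frac{\mathrm{edge}}{\mathrm{area}}(\mathrm{bottom}) \propto \frac{4 \times (4 \times 4)}{64} = 1,
\]

which reproduces edge/area(bottom) = 2 * edge/area(top): scattering a processor's points doubles the boundary it must communicate across.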
But this has a flip side. Suppose we are decomposing the seismic wave problem and all the action is near a particular earthquake fault (marked in the figure). |
In Top decomposition only the white processor does any work while the other 3 sit idle.
|
In Bottom decomposition all the processors do roughly the same work and so we get good load balance ...... |
Here is a cracked plate and calculating stresses with an equal area decomposition leads to terrible results
|
Concentrating processors near the crack leads to good workload balance |
Equal nodal points -- not equal area -- but to minimize communication, the nodal points assigned to a particular processor are contiguous |
This is an NP-complete (exponentially hard) optimization problem, but in practice there are many ways of getting good, though not exactly optimal, decompositions |
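One standard heuristic -- not named on the foil, offered here only as an example of the "many ways" -- is recursive coordinate bisection: sort the points along one axis, cut at the median so both halves carry equal work, and recurse with the axis flipped. A rough C sketch, assuming the processor count is a power of two:

#include <stdlib.h>

typedef struct { double x, y; int proc; } Point;

static int cmp_x(const void *a, const void *b) {
    double d = ((const Point *)a)->x - ((const Point *)b)->x;
    return (d > 0) - (d < 0);
}
static int cmp_y(const void *a, const void *b) {
    double d = ((const Point *)a)->y - ((const Point *)b)->y;
    return (d > 0) - (d < 0);
}

/* Assign points[0..n-1] to processors proc0 .. proc0+nproc-1 by recursive
   coordinate bisection: sort along the chosen axis, cut at the median so
   both halves have an equal point count (not equal area), then recurse
   with the axis flipped.  nproc is assumed to be a power of two. */
void bisect(Point *points, int n, int proc0, int nproc, int axis) {
    if (nproc == 1) {
        for (int i = 0; i < n; i++) points[i].proc = proc0;
        return;
    }
    qsort(points, n, sizeof(Point), axis == 0 ? cmp_x : cmp_y);
    int half = n / 2;
    bisect(points,        half,     proc0,             nproc / 2, 1 - axis);
    bisect(points + half, n - half, proc0 + nproc / 2, nproc / 2, 1 - axis);
}

Because each cut is at the median, the pieces are load balanced, and because each piece is a contiguous slab of coordinate space, the communicated boundary stays small.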
Figure: region assigned to one processor and its work load -- not perfect! |
Not all decompositions are quite the same |
In defending against missile attacks, you track each missile on a separate node -- geometric again |
In playing chess, you decompose chess tree -- an abstract not geometric space |
Figure: computer chess tree -- the current position (a node in the tree), the first set of moves, the opponent's counter moves, "California gets its independence" |
A parallel algorithm is a collection of tasks and a partial ordering between them. |
Design goals:
|
Sources of parallelism:
|
Data-parallel algorithms exploit the parallelism inherent in many large data structures.
|
Features of Data Parallelism
|
Note data-parallel algorithms can be expressed by ALL programming models (Message Passing, HPF like, openMP like) |
Functional parallelism exploits the parallelism between the parts of many systems.
|
Analysis:
|
Many applications are what is called (essentially) embarrassingly parallel or, more kindly, pleasingly parallel |
These are made up of independent concurrent components
|
In contrast, points in a finite difference grid (from a differential equation) canNOT be updated independently |
Such pleasingly parallel problems are often formally data-parallel but can be handled much more easily -- like functional parallelism |
A parallel language provides an executable notation for implementing a parallel algorithm. |
Design criteria:
|
Usually a language reflects a particular style of expressing parallelism. |
Data parallel expresses concept of identical algorithm on different parts of array |
Message parallel expresses fact that at low level parallelism implies information is passed between different concurrently executing program parts |
Data-parallel languages provide an abstract, machine-independent model of parallelism.
|
Advantages:
|
Disadvantages:
|
Examples: HPF, C*, HPC++ |
Originated on SIMD machines, where parallel operations are in lock-step, but was generalized (not so successfully, as the compilers are too hard) to MIMD |
Program is based on typically coarse-grain tasks |
Separate address space and a processor number for each task |
Data shared by explicit messages
|
Examples: MPI, PVM, Occam for parallel computing |
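A minimal C illustration of these points -- each MPI task runs in its own address space, learns its processor number, and shares data only through an explicit message (the buffer contents and message tag are illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    double edge[16] = {0.0};                 /* data shared only by an explicit message */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* my processor number */
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0 && size > 1)
        MPI_Send(edge, 16, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(edge, 16, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("Task %d of %d holds its own copy of edge\n", rank, size);
    MPI_Finalize();
    return 0;
}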
A universal model for distributed computing to link naturally decomposed parts -- e.g. HTTP, RMI, IIOP, etc. are all message passing
|
Advantages:
|
Disadvantages:
|
Experts in Java are familiar with this, as it is built into the Java language through thread primitives |
We take "ordinary" languages such as Fortran, C++, Java and add constructs to help compilers divide processing (automatically) into separate threads
|
openMP is a recent set of compiler directives supporting this model |
This model tends to be inefficient on distributed memory machines as optimizations (data layout, communication blocking, etc.) are not natural |
Applications are metaproblems with a mix of module (aka coarse grain functional) and data parallelism |
Modules are decomposed into parts (data parallelism) and composed hierarchically into full applications. They can be the
|
Modules are "natural" message-parallel components of problem and tend to have less stringent latency and bandwidth requirements than those needed to link data-parallel components
|
Assume that the primary goal of a metacomputing system is to add to existing parallel computing environments a higher level supporting module parallelism
|
Use Java/Distributed Object Technology for modules -- note that Java is to a growing extent used to write servers for CORBA and COM object systems |
We have multiple supercomputers in the backend -- one doing a CFD simulation of airflow, another structural analysis -- while in more detail you have linear algebra servers (NetSolve); optimization servers (NEOS); image processing filters (Khoros); databases (NCSA Biology Workbench); visualization systems (AVS, CAVEs)
|
All linked to collaborative information systems in a sea of middle tier servers (as on the previous page) to support design, crisis management, and multi-disciplinary research |
Figure: components include a Database, Matrix Solver, Optimization Service, MPP back ends, Parallel DB Proxy, NEOS Control Optimization, Origin 2000 Proxy, NetSolve Linear Alg. Server, IBM SP2 Proxy, Gateway Control, Agent-based Choice of Compute Engine, Multidisciplinary Control (WebFlow), and a Data Analysis Server |
The Java Language has several good design features
|
Java has a very good set of libraries covering everything from commerce, multimedia, images to math functions (under development at http://math.nist.gov/javanumerics) |
Java has the best available electronic and paper training and support resources |
Java is rapidly getting the best integrated program development environments |
Java, naturally integrated with the network and a universal virtual machine, supports the powerful "write once, run anywhere" model |
There is a large and growing trained labor force |
Can we exploit this in computational science? |
Use of Java for: |
High Performance Network Computing |
Scientific and Engineering Computation |
(Distributed) Modeling and Simulation |
Parallel and Distributed Computing |
Data Intensive Computing |
Communication and Computing Intensive Commercial and Academic Applications |
HPCC Computational Grids ........ |
Very difficult to find a "conventional name" that doesn't get misunderstood by some community! |
The Web integration of Java gives it excellent "network" classes and support for message passing. |
Thus "Java plus message passing" form of parallel computing is actually somewhat easier than in Fortran or C. |
Coarse grain parallelism very natural in Java |
"Data Parallel" languages features are NOT in Java and have to be added (as a translator) of NPAC's HPJava to Java+Messaging just as HPF translates to Fortran plus message passing |
Java has built in "threads" and a given Java Program can run multiple threads at a time
|
Can be used to do more general parallel computing but only on shared memory computers
|
Combine threads on a shared memory machine with message passing between distinct distributed memories |
"Distributed" or "Virtual" Shared memory does support the JavaVM as hardware gives illusion of shared memory to JavaVM |
Message Passing |
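The foils frame this combination in terms of Java threads; purely as an illustration of the same structure, here is a hedged C sketch using openMP threads inside each node and MPI messages between nodes (the "work" each thread does is illustrative):

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

/* Hybrid pattern: one MPI task per shared memory machine, several threads
   inside each task.  Threads communicate through shared memory; tasks
   communicate by explicit messages. */
int main(int argc, char **argv) {
    int rank, provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local_sum = 0.0, global_sum = 0.0;

    #pragma omp parallel reduction(+:local_sum)
    {
        /* each thread does a share of this node's work via shared memory */
        local_sum += omp_get_thread_num() + 1.0;   /* illustrative work */
    }

    /* message passing between the distinct distributed memories */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}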
So here is a recipe for developing HPCC (parallel) applications as of January 2000 |
Use MPI for data parallel distributed memory programs, as the alternatives today are HPF, HPC++, or parallelizing compilers
|
Use openMP or MPI on shared (distributed shared) memory machines
|
Pleasingly Parallel problems can use MPI or web/metacomputing technology |
We don't emphasize openMP in class, as the hard work (aka difficult programming model) of MPI is advantageous for the class since it teaches you parallel computing! |
Today Java can be used for client side applets and in systems middleware but it is too slow for production scientific code
|
Use metacomputers for pleasingly parallel and metaproblems -- not for closely knit problems with tight synchronization between parts |
Use where possible web and distributed object technology for "coordination" |