Given by Geoffrey C. Fox and Nancy McCracken for Computational Science for Simulations in Fall Semester 1998. Foils prepared 24 August 98
Fall Semester 1998 |
Geoffrey Fox |
Northeast Parallel Architectures Center |
Syracuse University |
111 College Place |
Syracuse NY |
gcf@npac.syr.edu |
We Introduce Computational Science and Driving Forces
|
We give a simple overview of parallel architectures today with distributed, shared or distributed shared memory |
We describe the growing importance of Java |
We explain pragmatic choices
|
Practical use of leading edge computer science technologies to address "real" applications |
Two tracks at Syracuse |
CPS615/713: Simulation Track |
CPS606/616/640/714: Information Track |
Simulation Track: Large Scale Parallel Computing (low latency, closely coupled components), with topics such as Parallel Algorithms, Fortran90, HPF, MPI, and Interconnection Networks |
Information Track: World Wide Distributed Computing (loosely coupled components), with topics such as Transactions, Security, Compression, PERL, JavaScript, Multimedia, Wide Area Networks, Java, VRML, Collaboration, Integration (middleware), Metacomputing, CORBA, and Databases |
The two forms of Large Scale Computing: scale the computer for the power user, or scale the number of users in proportion to the number of computers |
Internetics technologies span the spectrum from parallel computers and their technologies (HPF, MPI, HPJava) through commodity information systems to distributed computers (HTML, VRML) |
Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 315-443-2163 and Room 3-131 CST |
Backup: Nancy McCracken -- njm@npac.syr.edu, 315-443-4687 and Room 3-234 CST |
Grader: Saleh Elmohamed -- saleh@npac.syr.edu, 315-443-1073 |
NPAC administrative support: Nicole Caza -- ncaza@npac.syr.edu, 315-443-1722 and Room 3-206 CST |
The above can be reached at cps615ad@npac.syr.edu |
Students will be on alias cps615@npac.syr.edu |
Homepage is: http://www.npac.syr.edu/projects/jsufall98 |
Graded on the basis of approximately 8 homework sets, each due on the Thursday of the week after it is handed out (homework is given out on a Tuesday or Thursday) |
There will be one project -- which will start after message passing (MPI) discussed |
Total grade is 70% homework, 30% project |
Languages will be Fortran or C, plus Java -- we will do a survey early on to clarify this |
All homework will be handled through the web and indeed all computer access will be through the VPL or Virtual Programming Laboratory which gives access to compilers, Java visualization etc. through the web |
Status of High Performance Computing and Communications (HPCC) nationally |
What is Computational Science Nationally (and at Syracuse) |
Technology driving forces |
Moore's law and exponentially increasing transistors |
Elementary discussion of Parallel Computing in Society and why it must obviously work in simulation! |
Sequential and Parallel Computer Architectures |
Comparison of Architecture of World Wide Web and Parallel Systems |
Simple base Example -- Laplace's Equation
|
Then we discuss software and algorithms (mini-applications) intermixed |
Programming Models -- SPMD and Message Passing (MPI) with Fortran, C and Java |
Data Parallel (HPF, Fortran90) languages will be discussed but NOT used |
Visualization -- Java Applets |
Other tools such as Collaboration and Performance Measurement will be mentioned |
Parallel computing is multiple processes running on multiple computers, coordinated to solve a single problem! |
Distributed computing can be defined in the same way |
Parallel computing has tight coordination (implying low latency and high bandwidth communication) |
Distributed computing has looser constraints between component processes |
Continuing demands for higher performance |
Physical limits on single processor performance |
Increasing number of transistors in a single chip or on a single board implies parallel architectures |
High costs of internal concurrency |
Result is rise of multiprocessor architectures
|
Increasing importance of DISTRIBUTED computing through the Web and Internet
|
Transistors are getting cheaper and cheaper and it only takes some 0.5 million transistors to make a very high quality CPU
|
Already we build chips with roughly a factor of ten more transistors than this, and the extra transistors are used for "automatic" instruction level parallelism.
|
However getting much more speedup than this requires use of "outer loop" or data parallelism. |
Actually memory bandwidth is an essential problem in any computer as doing more computations per second requires accessing more memory cells per second!
|
Large Scale Simulations in Engineering
|
Large Scale Academic Simulations (Physics, Chemistry, Biology)
|
"Other Critical Real World Applications"
|
HPCC is a maturing field with many organizations installing large scale systems |
These include NSF (academic computations) with PACI activity, DoE (Dept of Energy) with ASCI and DoD (Defense) with Modernization |
There are new applications with new algorithmic challenges
|
However these are not "qualitatively" new concepts |
Software ideas are "sound", but note that simulation is only a few percent of the total computer market and its software is immature
|
[Figure: idealized multicomputer -- several nodes, each containing processor(s) P and memory, typically joined by a bus on a board] |
Multicomputer = nodes + network |
Node = processor(s) + local memory |
Access to local memory is cheap
|
Access to remote memory is expensive
|
Cost of remote memory access/communication (including synchronization)
|
Hence data locality is an important property of good parallel algorithms |
[Figure: communication cost model -- the time T to communicate N words is T = ts + tw*N, where ts is the message startup (latency) time and tw is the time per word] |
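As a rough illustration of this cost model, here is a minimal Java sketch (the parameter names and the latency/bandwidth values are assumptions for illustration, not course measurements): the time to send N words is modeled as a startup latency plus a per-word transfer cost.

    // Sketch of the communication cost model T(N) = ts + tw*N.
    // The ts and tw values below are assumed, illustrative numbers.
    public class CommCost {
        static double commTime(int nWords, double tStartup, double tPerWord) {
            return tStartup + tPerWord * nWords;   // latency term + bandwidth term
        }

        public static void main(String[] args) {
            double ts = 50e-6;    // assumed 50 microsecond message startup time
            double tw = 0.1e-6;   // assumed 0.1 microsecond per word transferred
            for (int n : new int[] {1, 100, 10000}) {
                System.out.println(n + " words: " + commTime(n, ts, tw) + " seconds");
            }
        }
    }

Note how the startup term ts dominates for small messages, which is why parallel algorithms prefer a few large messages to many small ones.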
Data locality implies CPU finds information it needs in cache which stores most recently accessed information |
This means one reuses a given memory reference in many nearby computations e.g. |
A1 = B*C |
A2 = B*D + B*B |
.... Reuses B |
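One standard way to engineer this kind of reuse is loop blocking (tiling); the sketch below is illustrative rather than course material, and the class name, block size, and matrix size are all assumed. Each sub-block of B is reused many times while it is still in cache.

    // Illustrative sketch: loop blocking (tiling) reuses each sub-block of B
    // many times while it is still in cache -- the same idea as reusing B
    // in nearby computations above. Block size 32 is an assumed value.
    public class BlockedMatMul {
        static void multiply(double[][] a, double[][] b, double[][] c, int block) {
            int n = a.length;
            for (int ii = 0; ii < n; ii += block)
                for (int kk = 0; kk < n; kk += block)
                    for (int jj = 0; jj < n; jj += block)
                        // work on one block: the b[k][j] entries touched here
                        // are reused for every i in the ii-block
                        for (int i = ii; i < Math.min(ii + block, n); i++)
                            for (int k = kk; k < Math.min(kk + block, n); k++)
                                for (int j = jj; j < Math.min(jj + block, n); j++)
                                    c[i][j] += a[i][k] * b[k][j];
        }

        public static void main(String[] args) {
            int n = 128;
            double[][] a = new double[n][n], b = new double[n][n], c = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) { a[i][j] = 1.0; b[i][j] = 2.0; }
            multiply(a, b, c, 32);
            System.out.println("c[0][0] = " + c[0][0]);  // expect 2.0 * n = 256.0
        }
    }

The same triple loop without blocking streams through B in a pattern that falls out of cache for large n; blocking trades a little loop overhead for locality.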
[Figure: memory hierarchy -- Processor, Cache, L2 Cache, L3 Cache, Main Memory, Disk; memory capacity increases and memory speed decreases going down the hierarchy (roughly a factor of 100 difference between processor and main memory speed)] |
For both parallel and sequential computers, the cost is accessing remote memories with some form of "communication" |
Data locality addresses this cost in both cases |
The differences are the quantitative size of the effect, and what is done by the user versus what is done automatically |
[Figure: memory hierarchy of a parallel machine -- each processor has its own cache and L2 cache; board-level interconnection networks join processors to L3 cache and main memory (slow); a system-level interconnection network joins the boards (very slow)] |
Major architecture types
|
Idealized Multicomputer Model fits current architectures pretty well |
Parallel computers allow several CPUs to contribute to a computation simultaneously and different architectures take different steps to enable this coordinated action. |
For our purposes, a parallel computer has three types of parts:
|
Some Key points: Easier in Some Architectures/Applications than others!
|
Multiple Instruction/Multiple Data |
Processors with local memory connected by high-speed interconnection network |
Typically high bandwidth, medium latency |
Hardware support for remote memory access |
Model breaks down when topology matters |
Examples: Cray T3E, IBM SP |
Every processor has a memory others can't access. |
Advantages:
|
Disadvantages:
|
Conceptually, the nCUBE, CM-5, Paragon, SP-2, and Beowulf PC cluster are quite similar.
|
To program these machines:
|
Processors access shared memory via bus |
Low latency, high bandwidth |
Bus contention limits scalability |
Search for scalability introduces locality
|
Examples: Cray T90, SGI Power Challenge, Sun |
All processors access the same memory. |
Advantages:
|
Disadvantages:
|
Interconnection network shown here is actually for the BBN Butterfly, but C-90 is in the same spirit. |
These machines share data by direct access.
|
A hybrid of distributed and shared memory |
Small groups of processors share memory; others access across a scalable network |
Low to moderate latency, high bandwidth |
Model simplifies the multilevel hierarchy |
Examples: SGI Origin, HP Exemplar |
Combining the (dis)advantages of shared and distributed memory |
Lots of hierarchical designs are appearing.
|
Workstations connected by network |
Cost effective |
High latency, low to moderate bandwidth |
Often lack integrated software environment |
Multicomputer Model breaks down if connectivity limited |
Example Connectivities: Ethernet, ATM crossbar, Myrinet |
"Ordinary" Network |
Ordinary PC's or |
Workstations |
A parallel algorithm is a collection of tasks and a partial ordering between them. |
Design goals: We will return to this later on
|
Sources of parallelism: We will discuss this now
|
Data-parallel algorithms exploit the parallelism inherent in many large data structures.
|
Analysis:
|
Now we go through a set of semi-realistic examples of parallel computing and show they use various forms of data parallelism |
Seismic Wave Simulation: Regular mesh of grid points |
Cosmology (Universe) Simulation: Irregular collection of stars or galaxies |
Computer Chess: "Data" is now parts of a game tree with major complication of pruning algorithms
|
Defending the Nation: Tracking multiple missiles achieves parallelism from set of missiles |
Finite Element (NASTRAN) structures problem: Irregular collection of nodal (grid) points clustered near a crack |
Functional parallelism exploits the parallelism between the parts of many systems.
|
Analysis:
|
Applications are metaproblems with a mix of module (aka coarse grain functional) and data parallelism |
Modules are decomposed into parts (data parallelism) and composed hierarchically into full applications. They can be the
|
Modules are "natural" message-parallel components of problem and tend to have less stringent latency and bandwidth requirements than those needed to link data-parallel components
|
Assume that the primary goal of a metacomputing system is to add to existing parallel computing environments a higher level supporting module parallelism
|
Use Java/Distributed Object Technology for modules -- note Java is to a growing extent used to write servers for CORBA and COM object systems |
We have multiple supercomputers in the backend -- one doing CFD simulation of airflow, another doing structural analysis -- while in more detail you have linear algebra servers (NetSolve), optimization servers (NEOS), image processing filters (Khoros), databases (NCSA Biology Workbench), and visualization systems (AVS, CAVEs)
|
All linked to collaborative information systems in a sea of middle tier servers (as on the previous page) to support design, crisis management, and multi-disciplinary research |
[Figure: example multi-server architecture -- a Gateway Control layer (agent-based choice of compute engine, multidisciplinary control via WebFlow) links a Database, Matrix Solver, Optimization Service (NEOS control optimization), NetSolve Linear Algebra Server, Parallel DB Proxy, Data Analysis Server, and proxies (Origin 2000 Proxy, IBM SP2 Proxy) for MPP back ends] |
Many applications are what is called (essentially) "embarrassingly parallel" or, more kindly, "pleasingly parallel" |
These are made up of independent concurrent components
|
In contrast, points in a finite difference grid (from a differential equation) canNOT be updated independently |
Pleasingly parallel problems are often formally data-parallel, but they can be handled much more easily -- like functional parallelism |
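A minimal Java sketch of a pleasingly parallel computation (the worker count, task count, and toy workload are assumptions for illustration): each worker processes its own independent tasks, and the only synchronization is the final join.

    // Sketch of a pleasingly parallel computation: each worker handles an
    // independent slice of tasks and never communicates with the others
    // until the final join. Names and the toy workload are illustrative.
    public class PleasinglyParallel {
        public static void main(String[] args) throws InterruptedException {
            final int nTasks = 1000, nWorkers = 4;
            final double[] results = new double[nTasks];
            Thread[] workers = new Thread[nWorkers];
            for (int w = 0; w < nWorkers; w++) {
                final int id = w;
                workers[w] = new Thread(new Runnable() {
                    public void run() {
                        // round-robin assignment of independent tasks
                        for (int t = id; t < nTasks; t += nWorkers)
                            results[t] = Math.sin(t * 0.001);  // toy independent work
                    }
                });
                workers[w].start();
            }
            for (Thread t : workers) t.join();          // only synchronization point
            double sum = 0;
            for (double r : results) sum += r;
            System.out.println("sum = " + sum);
        }
    }

Because the tasks never communicate, the same structure maps equally well onto threads, separate processes, or machines scattered across the Web.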
A parallel language provides an executable notation for implementing a parallel algorithm. |
Design criteria:
|
Usually a language reflects a particular style of expressing parallelism. |
Data parallel expresses concept of identical algorithm on different parts of array |
Message parallel expresses the fact that at a low level parallelism implies information is passed between different concurrently executing program parts |
Data-parallel languages provide an abstract, machine-independent model of parallelism.
|
Advantages:
|
Disadvantages:
|
Examples: HPF, C*, HPC++ |
Originated on SIMD machines where parallel operations are in lock-step, but generalized (not so successfully, as the compilers are too hard) to MIMD |
Program is based on typically coarse-grain tasks |
Separate address space and a processor number for each task |
Data shared by explicit messages
|
Examples: MPI, PVM, Occam for parallel computing |
Universal model for distributed computing to link naturally decomposed parts -- e.g. HTTP, RMI, and IIOP are all message passing
|
Advantages:
|
Disadvantages:
|
A parallel computation is a set of tasks |
Each task has local data, can be connected to other tasks by channels |
A task can
|
A receiving task blocks until data available |
Note ALL coordination is message passing |
Concurrency is enhanced by creating multiple tasks |
Scalability: More tasks than processor nodes |
Locality: Access local data when possible -- minimize message traffic |
A task (with local data and subtasks) is a unit for modular design -- it is an object! |
Mapping to nodes affects performance only |
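A minimal sketch of the task/channel picture in plain Java threads (this illustrates the model, not an MPI binding; the Channel class and the values sent are invented for the example): two tasks are connected by a channel, and the receiving task blocks until data is available.

    // Minimal sketch of the task/channel model: two tasks connected by a
    // channel; the receive() call blocks until a value has been sent.
    public class TaskChannelDemo {
        // A one-slot channel built from wait/notify.
        static class Channel {
            private Double slot = null;
            synchronized void send(double value) {
                slot = Double.valueOf(value);
                notifyAll();
            }
            synchronized double receive() throws InterruptedException {
                while (slot == null) wait();       // receiving task blocks here
                double v = slot.doubleValue();
                slot = null;
                return v;
            }
        }

        public static void main(String[] args) throws InterruptedException {
            final Channel ch = new Channel();
            Thread producer = new Thread(new Runnable() {
                public void run() { ch.send(3.14); }   // local computation, then send
            });
            producer.start();
            System.out.println("received " + ch.receive());  // blocks until sent
            producer.join();
        }
    }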
Partition
|
Communication
|
Agglomeration
|
Mapping
|
Goal: identify opportunities for concurrent execution (define tasks: computation+data) |
Focus on data operated on by algorithm ...
|
... or on the operations performed
|
Identify communication requirements
|
Example: a finite difference computation, where each point must communicate with its neighbors: X_i = (X_(i-1) + 2*X_i + X_(i+1)) / 4 |
Partitioning creates one task per point (i = 2 above) |
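A small Java sketch of this update (array size, boundary values, and iteration count are illustrative assumptions): a Jacobi-style sweep writes into a fresh array, so every interior point can be updated independently -- exactly the data parallelism the partition exploits.

    // Sketch of the 1-D finite difference update x_i = (x_{i-1} + 2*x_i + x_{i+1})/4
    // over an array, using a Jacobi-style sweep so all points can be updated
    // independently. Array size and boundary values are illustrative.
    public class StencilSweep {
        static double[] sweep(double[] x) {
            double[] xNew = x.clone();               // boundaries stay fixed
            for (int i = 1; i < x.length - 1; i++)
                xNew[i] = (x[i - 1] + 2 * x[i] + x[i + 1]) / 4.0;
            return xNew;
        }

        public static void main(String[] args) {
            double[] x = new double[10];
            x[0] = 1.0;                              // boundary condition
            for (int iter = 0; iter < 100; iter++) x = sweep(x);
            System.out.println("x[1] after 100 sweeps = " + x[1]);
        }
    }

In a message passing version each task would own a slice of x and exchange the end points of its slice with its two neighbors before every sweep.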
Once tasks + communication determined, "agglomerate" small tasks into larger tasks |
Motivations
|
Caveats
|
Place tasks on processors, to
|
Techniques:
|
Simulate atmospheric processes
|
Represent atmosphere state by 3-D grid
|
Computation includes
|
Discretize the (continuous) domain by a regular Nx × Ny × Nz grid
|
Approximate derivatives by finite differences
|
Use domain decomposition
|
Finite difference stencil horizontally
|
Radiation calculations vertically
|
Diagnostic sums
|
In horizontal
|
In vertical, clump all points in column
|
Resulting algorithm "reasonably scalable"
|
Technique depends on load distribution |
1) Agglomerate to one task per processor
|
2) Extend (1) to incorporate cyclic mapping
|
3) Use dynamic, local load balancing
|
A parallel computer is any old collection of processing elements that cooperate to solve large problems fast
|
Some broad issues:
|
Role of a computer architect:
|
Parallelism:
|
History: diverse and innovative organizational structures, often tied to novel programming models |
Rapidly maturing under strong technological constraints
|
Technological trends make parallel computing inevitable
|
Need to understand fundamental principles and design tradeoffs, not just taxonomies
|
Application demands: Our insatiable need for computing cycles
|
Technology Trends
|
Architecture Trends
|
Economics |
Current trends:
|
Demand for cycles fuels advances in hardware, and vice-versa
|
Range of performance demands
|
Goal of applications in using parallel machines: Speedup
|
For a fixed problem size (input data set), performance = 1/time
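For concreteness, a tiny Java sketch of this measure (the timings below are made-up illustrative numbers, not benchmark results): speedup on p processors is T(1)/T(p), and efficiency is speedup divided by p.

    // Sketch of the fixed-problem-size speedup measure: performance = 1/time,
    // so speedup on p processors is T(1)/T(p) and efficiency is speedup/p.
    public class Speedup {
        public static void main(String[] args) {
            double t1 = 100.0;                        // assumed 1-processor time (s)
            double[] tp = {100.0, 52.0, 27.0, 15.0};  // assumed times on 1,2,4,8 procs
            int[] procs = {1, 2, 4, 8};
            for (int i = 0; i < procs.length; i++) {
                double speedup = t1 / tp[i];
                System.out.println(procs[i] + " procs: speedup = " + speedup
                        + ", efficiency = " + (speedup / procs[i]));
            }
        }
    }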
|
Results from March 96 |
Parallelism is pervasive |
Small to moderate scale parallelism very important |
Transition to parallel computing has occurred for scientific and engineering computing, but this is 1-2% of the computer market |
The transition is in rapid progress in commercial computing
|
Desktop also uses multithreaded programs, which are a lot like parallel programs (again functional not data parallel) |
Demand for improving throughput on sequential workloads
|
Solid application demand exists and will increase |
The natural building block for multiprocessors is now also about the fastest! |
Microprocessor performance increases 50% - 100% per year |
Transistor count doubles every 3 years |
DRAM size quadruples every 3 years |
Huge investment per generation is carried by huge commodity market |
Note that single-processor performance is plateauing, but that parallelism is a natural way to improve it. |
Sequential or small multiprocessors -- today 2 processor PC's |
Basic advance is decreasing feature size (λ)
|
Die size is growing too
|
Performance > 100x per decade; clock rate 10x, rest of increase is due to transistor count |
How to use more transistors?
|
Fundamental issue is resource distribution, as in uniprocessors |
CPI = Clock Cycles per Instruction |
[Figure: clock frequency and transistor count trends -- clock rate grows roughly 30% per year; transistor count grows much faster, roughly 40% per year (an order of magnitude more contribution over two decades), reaching 100 million transistors on a chip by the early 2000s] |
Divergence between memory capacity and speed more pronounced
|
Larger memories are slower, while processors get faster
|
Parallelism increases effective size of each level of hierarchy, without increasing access time |
Parallelism and locality within memory systems too
|
Disks too: Parallel disks plus caching |
HPCC has developed good research ideas but cannot implement them, as it is solving computing's hardest problem with 1 percent of the funding
|
We have learnt to use commodity hardware either
|
Let us do the same with software and design systems with maximum possible commodity software basis |
The world is building a wonderful distributed computing (information processing) environment using Web (dissemination) and distributed object (CORBA, COM) technologies |
This includes Java, Web-linked databases and the essential standards such as HTML (documents), VRML (3D objects), and JDBC (Java database connectivity).
|
We will "just" add high performance to this commodity distributed infrastructure
|
The alternative strategy starts with HPCC technologies (such as MPI and HPF) and adds links to the commodity world. This approach does not easily track the evolution of commodity systems and so has large maintenance costs |
The bottom of the pyramid has 100 times the dollar value and 1000 times the compute power of the best supercomputer |
Web software MUST be cheaper and better than MPP software, as a factor of 100 more money is invested in it! |
Therefore the natural strategy is to get a parallel computing environment by adding synchronization of parallel algorithms to the loosely coupled Web distributed computing model |
The Java Language has several good design features
|
Java has a very good set of libraries covering everything from commerce, multimedia, images to math functions (under development at http://math.nist.gov/javanumerics) |
Java has the best available electronic and paper training and support resources |
Java is rapidly getting the best integrated program development environments |
Java is naturally integrated with the network, and its universal virtual machine supports the powerful "write once, run anywhere" model |
There is a large and growing trained labor force |
Can we exploit this in computational science? |
Use of Java for: |
High Performance Network Computing |
Scientific and Engineering Computation |
(Distributed) Modeling and Simulation |
Parallel and Distributed Computing |
Data Intensive Computing |
Communication and Computing Intensive Commercial and Academic Applications |
HPCC Computational Grids ........ |
Very difficult to find a "conventional name" that doesn't get misunderstood by some community! |
The Web integration of Java gives it excellent "network" classes and support for message passing. |
Thus "Java plus message passing" form of parallel computing is actually somewhat easier than in Fortran or C. |
Coarse grain parallelism very natural in Java |
"Data Parallel" languages features are NOT in Java and have to be added (as a translator) of NPAC's HPJava to Java+Messaging just as HPF translates to Fortran plus message passing |
Java has built-in "threads" and a given Java program can run multiple threads at a time
|
Can be used to do more general parallel computing but only on shared memory computers
|
Combine threads on a shared memory machine with message passing between distinct distributed memories |
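A minimal Java threads sketch of the shared-memory part of this picture (thread count, array size, and names are assumptions for illustration): each thread sums its own slice of a shared array, and the partial sums are combined after join().

    // Sketch of shared-memory parallelism with built-in Java threads: each
    // thread sums its own slice of a shared array; partial sums are combined
    // after join(). Sizes and thread count are illustrative assumptions.
    public class ThreadedSum {
        public static void main(String[] args) throws InterruptedException {
            final int n = 1000000, nThreads = 4;
            final double[] data = new double[n];
            for (int i = 0; i < n; i++) data[i] = 1.0;

            final double[] partial = new double[nThreads];
            Thread[] threads = new Thread[nThreads];
            for (int t = 0; t < nThreads; t++) {
                final int id = t;
                threads[t] = new Thread(new Runnable() {
                    public void run() {
                        int lo = id * n / nThreads, hi = (id + 1) * n / nThreads;
                        double s = 0;
                        for (int i = lo; i < hi; i++) s += data[i];  // local slice
                        partial[id] = s;     // each thread writes only its own entry
                    }
                });
                threads[t].start();
            }
            double total = 0;
            for (int t = 0; t < nThreads; t++) { threads[t].join(); total += partial[t]; }
            System.out.println("total = " + total);   // expect 1000000.0
        }
    }

To go beyond one shared memory machine, each such multithreaded process would then exchange its partial result with the others by message passing, as described in the bullet above.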
"Distributed" or "Virtual" Shared memory does support the JavaVM as hardware gives illusion of shared memory to JavaVM |
So here is a recipe for developing HPCC (parallel) applications as of August 1998 |
Use MPI for data parallel programs, as the alternatives today are HPF, HPC++, or parallelizing compilers
|
Today Fortran and C are good production languages |
Today Java can be used for client side applets and in systems middleware but too slow for production scientific code
|
Use metacomputers for pleasingly parallel and metaproblems -- not for closely knit problems with tight synchronization between parts |
Use where possible web and distributed object technology for "coordination" |
Pleasingly parallel problems can use either MPI or web/metacomputing technology |