Full HTML for Basic Foilset: Master Foilset for CPS615 Introduction -- Material from Culler and Koelbel

Given by Geoffrey C. Fox and Nancy McCracken for Computational Science for Simulations, Fall Semester 1998. Foils prepared 24 August 1998.

Summary of Material


We Introduce Computational Science and Driving Forces
  • Technology Advances and Commodity Trends
  • Inevitability of Parallelism
  • Integration of Distributed and Parallel Computing
  • Comparison with Internetics
We give a simple overview of parallel architectures today with distributed, shared or distributed shared memory
We describe the growing importance of Java
We explain pragmatic choices
  • MPI with Fortran and C today
  • Java Grande is future?

Table of Contents for full HTML of Master Foilset for CPS615 Introduction -- Material from Culler and Koelbel


1 Framework for Computational Science
2 Abstract of Computational Science Presentation
3 What is Computational Science ?
4 Synergy of Parallel Computing and Web Internetics as Unifying Principle
5 Basic CPS615 Contact Points
6 Course Organization
7 Material Covered in this Course
8 Structure of CPS615 - II
9 What are Parallel and Distributed Computing?
10 Why Parallel Computing?
11 Parallel Computing Technology Rationale
12 Motivating Applications
13 Some Comments on Simulation and HPCC
14 The Multicomputer: an Idealized Parallel Computer
15 Multicomputer Architecture
16 Multicomputer Cost Model
17 Sequential Memory Structure
18 Parallel Computer Memory Structure
19 Real Parallel Computers Architectures
20 Parallel Computers -- Classic Overview
21 Distributed Memory MIMD Multiprocessor
22 Distributed Memory Machines
23 Distributed Memory Machines -- Notes
24 Shared Memory MIMD Multiprocessor
25 Shared-Memory Machines
26 Shared-Memory Machines -- Notes
27 Distributed Shared Memory (DSM)
28 Distributed Shared Memory Machines
29 Workstation Clusters
30 Parallel Algorithms
31 Data Parallelism in Algorithms
32 Some Illustrative Examples of Parallel Applications!
33 Functional Parallelism in Algorithms
34 Structure(Architecture) of Applications - I
35 Structure(Architecture) of Applications - II
36 Multi Server Model for metaproblems
37 Multi-Server Gateway Tier
38 Pleasingly Parallel Algorithms
39 Parallel Languages
40 Data-Parallel Languages
41 Message-Passing Systems
42 A Simple Parallel Programming Model
43 Properties of Programming Model
44 Some Steps in Parallel Programming
45 Partitioning
46 Communication
47 Agglomeration
48 Mapping
49 Example: Atmosphere Model
50 Atmosphere Model: Numerical Methods
51 Atmosphere Model: Partition
52 Atmosphere Model: Communication
53 Atmosphere Model: Agglomeration
54 Atmosphere Model: Mapping
55 What is Parallel Architecture?
56 Why Study Parallel Architecture as a computer scientist?
57 Why Study Architecture Today?
58 Inevitability of Parallel Computing
59 Application Trends
60 TPC-C (database transaction processing)
61 Summary of Application Trends
62 Technology Trends -- CPU's
63 General Technology Trends
64 Technology: A Closer Look
65 Clock Frequency Growth Rate
66 Transistor Count Growth Rate
67 Similar Story for Storage
68 The HPCC Dilemma and its Solution
69 What is Commodity Software
70 The Computing Pyramid
71 Implications of the Computing Pyramid
72 The 3 Roles of Java
73 Why is Java Worth Looking at?
74 What is Java Grande?
75 Java and Parallelism?
76 "Pure" Java Model For Parallelism
77 Pragmatic Computational Science August 1998




Foil 1 Framework for Computational Science

Fall Semester 1998
Geoffrey Fox
Northeast Parallel Architectures Center
Syracuse University
111 College Place
Syracuse NY
gcf@npac.syr.edu

Foil 2 Abstract of Computational Science Presentation

We Introduce Computational Science and Driving Forces
  • Technology Advances and Commodity Trends
  • Inevitability of Parallelism
  • Integration of Distributed and Parallel Computing
  • Comparison with Internetics
We give a simple overview of parallel architectures today with distributed, shared or distributed shared memory
We describe the growing importance of Java
We explain pragmatic choices
  • MPI with Fortran and C today
  • Java Grande is future?

Foil 3 What is Computational Science ?

Practical use of leading edge computer science technologies to address "real" applications
Two tracks at Syracuse
CPS615/713: Simulation Track
CPS606/616/640/714: Information Track
[Diagram: the Simulation Track (CPS615/713) covers Large Scale Parallel Computing with low-latency, closely coupled components -- parallel algorithms, Fortran90, HPF, MPI, interconnection networks; the Information Track (CPS606/616/640/714) covers World Wide Distributed Computing with loosely coupled components -- transactions, security, compression, Perl, JavaScript, multimedia, wide area networks, Java, VRML, collaboration, integration (middleware), metacomputing, CORBA, databases.]

Foil 4 Synergy of Parallel Computing and Web Internetics as Unifying Principle

[Diagram: the two forms of Large Scale Computing -- a Parallel Computer scales computer power for a single power user (HPF, MPI, HPJava), while a Distributed Computer scales the number of users in proportion to the number of computers (HTML, VRML); commodity Internetics technologies span both.]

Foil 5 Basic CPS615 Contact Points

Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 315-443-2163, Room 3-131 CST
Backup: Nancy McCracken -- njm@npac.syr.edu, 315-443-4687, Room 3-234 CST
Grader: Saleh Elmohamed -- saleh@npac.syr.edu, 315-443-1073
NPAC administrative support: Nicole Caza -- ncaza@npac.syr.edu, 315-443-1722, Room 3-206 CST
The above can be reached at cps615ad@npac.syr.edu
Students will be on alias cps615@npac.syr.edu
Homepage is: http://www.npac.syr.edu/projects/jsufall98

Foil 6 Course Organization

Graded on the basis of approximately 8 homework sets, each due the Thursday of the week after it is handed out (homework is given out on a Tuesday or Thursday)
There will be one project -- which will start after message passing (MPI) discussed
Total grade is 70% homework, 30% project
Languages will be Fortran or C, and Java -- we will do a survey early on to clarify this
All homework will be handled through the web and indeed all computer access will be through the VPL or Virtual Programming Laboratory which gives access to compilers, Java visualization etc. through the web

Foil 7 Material Covered in this Course

Status of High Performance Computing and Computation HPCC nationally
What is Computational Science Nationally (and at Syracuse)
Technology driving forces
Moore's law and exponentially increasing transistors
Elementary discussion of Parallel Computing in Society and why it must obviously work in simulation!
Sequential and Parallel Computer Architectures
Comparison of Architecture of World Wide Web and Parallel Systems

Foil 8 Structure of CPS615 - II

Simple base Example -- Laplace's Equation
  • Illustrate parallel computing
  • Illustrate use of Web
Then we discuss software and algorithms (mini-applications) intermixed
Programming Models -- SPMD and Message Passing (MPI) with Fortran, C and Java
Data Parallel (HPF, Fortran90) languages will be discussed but NOT used
Visualization -- Java Applets
Other tools such as Collaboration, Performance Measurement will be mentioned
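Since Laplace's equation is the running example for the course, here is a minimal sequential sketch in C of the Jacobi relaxation that later foils parallelize; the grid size, boundary values and convergence tolerance are illustrative assumptions rather than course requirements:

#include <stdio.h>
#include <math.h>

#define N 64                     /* grid points per side, an illustrative size */

int main(void) {
    static double u[N][N], unew[N][N];   /* static => zero-initialised          */
    int i, j, iter, sweeps = 0;

    for (j = 0; j < N; j++) u[0][j] = unew[0][j] = 1.0;  /* top edge held at 1  */

    for (iter = 0; iter < 1000; iter++) {
        double change = 0.0;
        /* Jacobi sweep: every interior point becomes the average of its 4 neighbours */
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++)
                unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
        for (i = 1; i < N - 1; i++)
            for (j = 1; j < N - 1; j++) {
                change = fmax(change, fabs(unew[i][j] - u[i][j]));
                u[i][j] = unew[i][j];
            }
        sweeps++;
        if (change < 1.0e-6) break;          /* crude convergence test          */
    }
    printf("converged (or gave up) after %d sweeps\n", sweeps);
    return 0;
}

The double loop over interior points is the kernel that the message-passing and data-parallel versions discussed later distribute across processors.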

Foil 9 What are Parallel and Distributed Computing?

It is multiple processes running on multiple computers which are coordinated to solve a single problem!
Distributed Computing can be defined in same way
Parallel computing has tight coordination (implying low latency and high bandwidth communication)
Distributed computing has looser constraints between component processes

Foil 10 Why Parallel Computing?

Continuing demands for higher performance
Physical limits on single processor performance
Increasing number of transistors in a single chip or on a single board implies parallel architectures
High costs of internal concurrency
Result is rise of multiprocessor architectures
  • And number of processors will continue to increase
Increasing importance of DISTRIBUTED computing through the Web and Internet
  • or Intranets which is the Internet internal to organizations
  • Distributed computing is a much larger field

Foil 11 Parallel Computing Technology Rationale

Transistors are getting cheaper and cheaper and it only takes some 0.5 million transistors to make a very high quality CPU
  • Essentially impossible to increase clock speed and so must exploit increasing transistor density, with figure of merit (1/f)^(2-4)
Already we build chips with some factor of ten more transistors than this and this is used for "automatic" instruction level parallelism.
  • This corresponds to parallelism in "innermost loops"
However getting much more speedup than this requires use of "outer loop" or data parallelism.
Actually memory bandwidth is an essential problem in any computer as doing more computations per second requires accessing more memory cells per second!
  • Harder for sequential than parallel computers
  • Data locality is unifying concept!

Foil 12 Motivating Applications

Large Scale Simulations in Engineering
  • Model airflow around an aircraft
  • Study environmental issues -- flow of contaminants
  • Forecast weather
  • Oil Industry: Reservoir Simulation and analysis of Seismic data
Large Scale Academic Simulations (Physics, Chemistry, Biology)
  • Study of Evolution of Universe
  • Study of fundamental particles: Quarks and Gluons
  • Study of protein folding
  • Study of catalysts
"Other Critical Real World Applications"
  • Run optimization and classification algorithms in datamining of Enterprise Information Systems
  • Model Financial Instruments

Foil 13 Some Comments on Simulation and HPCC

HPCC is a maturing field with many organizations installing large scale systems
These include NSF (academic computations) with PACI activity, DoE (Dept of Energy) with ASCI and DoD (Defense) with Modernization
There are new applications with new algorithmic challenges
  • Our work on Black Holes has novel adaptive meshes
  • On earthquake simulation, new "fast multipole" approaches to a problem not tackled this way before
  • On financial modeling, new Monte Carlo methods for complex options
However these are not "qualitatively" new concepts
Software ideas are "sound" but note simulation is only a few percent of total computer market and software immature
  • Use where possible software from other sources (such as the web) which can spend more money on each feature!

Foil 14 The Multicomputer: an Idealized Parallel Computer

[Diagram: the idealized multicomputer -- a collection of nodes, each containing one or more processors P and a local memory (typically joined by a bus on a board), connected to the other nodes by a network.]

Foil 15 Multicomputer Architecture

Multicomputer = nodes + network
Node = processor(s) + local memory
Access to local memory is cheap
  • 10s of cycles; does not involve network
  • Conventional memory reference
Access to remote memory is expensive
  • 100s or 1000s of cycles; uses network
  • Use I/O-like mechanisms (e.g., send/receive)

Foil 16 Multicomputer Cost Model

Cost of remote memory access/communication (including synchronization)
  • T = ts + N tw
  • ts = per-message cost ("latency")
  • tw = per-word cost
  • N = message size in words
Hence data locality is an important property of good parallel algorithms
[Plot: communication time T versus message length N -- a straight line with intercept ts (the latency) and slope tw (the per-word cost).]
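To make the cost model concrete, a small C sketch (with made-up ts and tw values purely for illustration) shows why agglomerating k words into one message, at cost ts + k*tw, beats k one-word messages at cost k*(ts + tw):

#include <stdio.h>

/* T = ts + N*tw : time to send one message of N words */
static double msg_time(double ts, double tw, int n) {
    return ts + n * tw;
}

int main(void) {
    double ts = 100.0;   /* per-message latency, arbitrary units (assumed)  */
    double tw = 1.0;     /* per-word transfer cost (assumed)                */
    int    k  = 1000;    /* number of words to deliver                      */

    double separate = k * msg_time(ts, tw, 1);  /* k one-word messages */
    double packed   = msg_time(ts, tw, k);      /* one k-word message  */

    printf("k one-word messages: %.0f units\n", separate);
    printf("one k-word message : %.0f units\n", packed);
    return 0;
}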

Foil 17 Sequential Memory Structure

Data locality implies CPU finds information it needs in cache which stores most recently accessed information
This means one reuses a given memory reference in many nearby computations e.g.
A1 = B*C
A2 = B*D + B*B
.... Reuses B
[Diagram: the sequential memory hierarchy -- Processor, Cache, L2 Cache, L3 Cache, Main Memory, Disk; memory capacity increases and memory speed decreases as one moves away from the processor (roughly a factor of 100 difference between processor and main memory speed).]
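A small, self-contained C illustration (array size chosen arbitrarily) of why data locality matters even sequentially: traversing a row-major array along rows reuses each cache line, while traversing it down columns does not:

#include <stdio.h>

#define M 1024

static double a[M][M];   /* C stores this row-major: a[i][0..M-1] are contiguous */

int main(void) {
    double sum = 0.0;
    int i, j;

    /* cache-friendly: consecutive iterations touch consecutive memory locations */
    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++)
            sum += a[i][j];

    /* cache-unfriendly: a stride of M*sizeof(double) defeats cache line reuse */
    for (j = 0; j < M; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];

    printf("sum = %f\n", sum);
    return 0;
}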

Foil 18 Parallel Computer Memory Structure

For both parallel and sequential computers, the cost lies in accessing remote memories, which requires some form of "communication"
Data locality addresses this in both cases
The differences are the quantitative size of the effect, and what is done by the user versus what is done automatically
[Diagram: parallel computer memory structure -- several nodes, each with its own Processor, Cache, L2 Cache, L3 Cache and Main Memory, linked by board-level interconnection networks (slow), which are in turn linked by a system-level interconnection network (very slow).]

Foil 19 Real Parallel Computers Architectures

Major architecture types
  • Distributed memory MIMD
  • Shared memory MIMD
  • Distributed shared memory (DSM)
  • Workstation clusters and "metacomputers"
  • An older design SIMD where processors run in lockstep has fallen out of favor
Idealized Multicomputer Model fits current architectures pretty well

Foil 20 Parallel Computers -- Classic Overview

Parallel computers allow several CPUs to contribute to a computation simultaneously and different architectures take different steps to enable this coordinated action.
For our purposes, a parallel computer has three types of parts:
  • Processors
  • Memory modules
  • Communication / synchronization network
Some Key points: Easier in Some Architectures/Applications than others!
  • All processors must be busy for peak speed.
  • Local memory is directly connected to each processor.
  • Accessing local memory is much faster than other memory.
  • Synchronization is expensive, but necessary for correctness.

Foil 21 Distributed Memory MIMD Multiprocessor

Multiple Instruction/Multiple Data
Processors with local memory connected by high-speed interconnection network
Typically high bandwidth, medium latency
Hardware support for remote memory access
Model breaks down when topology matters
Examples: Cray T3E, IBM SP

Foil 22 Distributed Memory Machines

Every processor has a memory others can't access.
Advantages:
  • Relatively easy to design and build
  • Predictable (if modest) behavior
  • Can be scalable
  • Can hide latency of communication
Disadvantages:
  • Hard to program
  • Program and O/S (and sometimes data) must be replicated

Foil 23 Distributed Memory Machines -- Notes

Conceptually, the nCUBE, CM-5, Paragon, SP-2 and Beowulf PC clusters are quite similar.
  • Bandwidth and latency of the interconnects differ
  • The network topology is a two-dimensional torus for the Paragon, a fat tree for the CM-5, a hypercube for the nCUBE and a switch for the SP-2
To program these machines:
  • Divide the problem to minimize number of messages while retaining parallelism
  • Convert all references to global structures into references to local pieces (explicit messages convert distant to local variables)
  • Optimization: Pack messages together to reduce fixed overhead (almost always needed)
  • Optimization: Carefully schedule messages (usually done by library)
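As a hedged C/MPI sketch of the "pack messages together" optimization (the array sizes and the idea of sending two boundary arrays are illustrative, not from the foil), two logical messages are copied into one buffer so the per-message latency ts is paid only once:

#include <mpi.h>
#include <string.h>
#include <stdio.h>

#define NB 64                         /* boundary length per array (assumed) */

int main(int argc, char **argv) {
    double ubound[NB], vbound[NB], buf[2 * NB];
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (i = 0; i < NB; i++) { ubound[i] = i; vbound[i] = -i; }
        /* pack two logical messages into one physical message: pay ts once, not twice */
        memcpy(buf, ubound, NB * sizeof(double));
        memcpy(buf + NB, vbound, NB * sizeof(double));
        MPI_Send(buf, 2 * NB, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 2 * NB, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        memcpy(ubound, buf, NB * sizeof(double));        /* unpack on the other side */
        memcpy(vbound, buf + NB, NB * sizeof(double));
        printf("rank 1 unpacked %d + %d values from a single message\n", NB, NB);
    }
    MPI_Finalize();
    return 0;
}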

Foil 24 Shared Memory MIMD Multiprocessor

Processors access shared memory via bus
Low latency, high bandwidth
Bus contention limits scalability
Search for scalability introduces locality
  • Cache (a form of local memory)
  • Multistage architectures (some memory closer)
Examples: Cray T90, SGI Power Challenge, Sun

Foil 25 Shared-Memory Machines

All processors access the same memory.
Advantages:
  • Retain sequential programming languages such as Java or Fortran
  • Easy to program (correctly)
  • Can share code and data among processors
Disadvantages:
  • Hard to program (optimally)
  • Not scalable due to bandwidth limitations in bus

Foil 26 Shared-Memory Machines -- Notes

The interconnection network shown here is actually for the BBN Butterfly, but the Cray C-90 is in the same spirit.
These machines share data by direct access.
  • Potentially conflicting accesses must be protected by synchronization.
  • Simultaneous access to the same memory bank will cause contention, degrading performance.
  • Some access patterns will collide in the network (or bus), causing contention.
  • Many machines have caches at the processors.
  • All these features make it profitable to have each processor concentrate on one area of memory that others access infrequently.
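To make the synchronization point concrete, here is a small POSIX-threads sketch in C (a hypothetical example, not from the foil, compiled with the pthread library linked in): without the mutex the two threads race on the shared counter and updates can be lost:

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                       /* shared data, directly accessible */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *work(void *arg) {
    int i;
    for (i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);             /* synchronization protects the update */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, work, NULL);
    pthread_create(&t2, NULL, work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expect 2000000)\n", counter);
    return 0;
}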

Foil 27 Distributed Shared Memory (DSM)

A hybrid of distributed and shared memory
Small groups of processors share memory; others access across a scalable network
Low to moderate latency, high bandwidth
Model simplifies the multilevel hierarchy
Examples: SGI Origin, HP Exemplar

Foil 28 Distributed Shared Memory Machines

Combining the (dis)advantages of shared and distributed memory
Lots of hierarchical designs are appearing.
  • Typically, "shared memory nodes" with 4 to 32 processors
  • Each processor has a local cache
  • Processors within a node access shared memory
  • Nodes can get data from or put data to other nodes' memories

Foil 29 Workstation Clusters

Workstations connected by network
Cost effective
High latency, low to moderate bandwidth
Often lack integrated software environment
Multicomputer Model breaks down if connectivity limited
Example Connectivities: Ethernet, ATM crossbar, Myrinet
[Diagram: ordinary PC's or workstations connected by an "ordinary" network.]

Foil 30 Parallel Algorithms

A parallel algorithm is a collection of tasks and a partial ordering between them.
Design goals: We will return to this later on
  • Match tasks to the available processors (exploit parallelism).
  • Minimize ordering (avoid unnecessary synchronization points).
  • Recognize ways parallelism can be helped by changing ordering
Sources of parallelism: We will discuss this now
  • Data parallelism: updating array elements simultaneously.
  • Functional parallelism: conceptually different tasks which combine to solve the problem. This happens at fine and coarse grain size
    • fine is "internal" such as I/O and computation; coarse is "external" such as separate modules linked together

Foil 31 Data Parallelism in Algorithms

Data-parallel algorithms exploit the parallelism inherent in many large data structures.
  • A problem is an (identical) algorithm applied to multiple points in data "array"
  • Usually iterate over such "updates"
  • Example: Red-Black Relaxation
    • All "red" points can be updated in parallel; then all the "black"
Analysis:
  • Scalable parallelism -- can usually get million or more way parallelism
  • Hard to express when "geometry" irregular or dynamic
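A minimal C sketch of one red-black sweep (grid size and boundary values are illustrative); every point of one color reads only points of the other color, so each half-sweep could be done entirely in parallel:

#include <stdio.h>

#define N 128

static double u[N][N];

/* update every interior point whose (i+j) parity equals `color`:
   0 ("red") points read only 1 ("black") points, and vice versa */
static void half_sweep(int color) {
    int i, j;
    for (i = 1; i < N - 1; i++)
        for (j = 1; j < N - 1; j++)
            if ((i + j) % 2 == color)
                u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
}

int main(void) {
    int j, sweep;
    for (j = 0; j < N; j++) u[0][j] = 1.0;     /* simple boundary condition        */
    for (sweep = 0; sweep < 100; sweep++) {
        half_sweep(0);                         /* all red points: independent work */
        half_sweep(1);                         /* then all black points            */
    }
    printf("u[N/2][N/2] = %f\n", u[N/2][N/2]);
    return 0;
}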

Foil 32 Some Illustrative Examples of Parallel Applications!

Now we go through a set of semi-realistic examples of parallel computing and show they use various forms of data parallelism
Seismic Wave Simulation: Regular mesh of grid points
Cosmology (Universe) Simulation: Irregular collection of stars or galaxies
Computer Chess: "Data" is now parts of a game tree with major complication of pruning algorithms
  • "Deep Blue" uses this global parallelism combined with optimal locally parallel evaluation of "score" of a position using special hardware
Defending the Nation: Tracking multiple missiles achieves parallelism from set of missiles
Finite Element (NASTRAN) structures problem: Irregular collection of nodal (grid) points clustered near a crack

Foil 33 Functional Parallelism in Algorithms

Functional parallelism exploits the parallelism between the parts of many systems.
  • Many pieces to work on -> many independent operations
  • Example: Coarse grain Aeroelasticity (aircraft design)
    • CFD(fluids) and CSM(structures) and others (acoustics, electromagnetics etc.) can be evaluated in parallel
Analysis:
  • Parallelism limited in size -- tens not millions
  • Synchronization probably good as parallelism natural from problem and usual way of writing software
  • Web exploits functional parallelism NOT data parallelism

Foil 34 Structure(Architecture) of Applications - I

Applications are metaproblems with a mix of module (aka coarse grain functional) and data parallelism
Modules are decomposed into parts (data parallelism) and composed hierarchically into full applications. They can be the
  • "10,000" separate programs (e.g. structures, CFD, ...) used in design of aircraft
  • the various filters used in Khoros based image processing system
  • the ocean-atmosphere components in integrated climate simulation
  • The data-base or file system access of a data-intensive application
  • the objects in a distributed Forces Modeling Event Driven Simulation

Foil 35 Structure(Architecture) of Applications - II

Modules are "natural" message-parallel components of problem and tend to have less stringent latency and bandwidth requirements than those needed to link data-parallel components
  • modules are what HPF needs task parallelism for
  • Often modules are naturally distributed whereas parts of data parallel decomposition may need to be kept on tightly coupled MPP
Assume that primary goal of metacomputing system is to add to existing parallel computing environments, a higher level supporting module parallelism
  • Now if one takes a large CFD problem and divides into a few components, those "coarse grain data-parallel components" will be supported by computational grid technology
Use Java/Distributed Object Technology for modules -- note Java to growing extent used to write servers for CORBA and COM object systems

Foil 36 Multi Server Model for metaproblems

We have multiple supercomputers in the backend -- one doing CFD simulation of airflow, another structural analysis -- while in more detail you have linear algebra servers (NetSolve); optimization servers (NEOS); image processing filters (Khoros); databases (NCSA Biology Workbench); visualization systems (AVS, CAVEs)
  • One runs 10,000 separate programs to design a modern aircraft which must be scheduled and linked .....
All linked to collaborative information systems in a sea of middle tier servers(as on previous page) to support design, crisis management, multi-disciplinary research

Foil 37 Multi-Server Gateway Tier

[Diagram: the multi-server gateway tier -- a Gateway Control layer (agent-based choice of compute engine, multidisciplinary control via WebFlow) sits between clients and back-end services: a database behind a Parallel DB Proxy, an optimization service behind NEOS Control Optimization, a matrix solver behind a NetSolve Linear Algebra Server, MPPs behind Origin 2000 and IBM SP2 proxies, and a Data Analysis Server.]

Foil 38 Pleasingly Parallel Algorithms

Many applications are what is called embarrassingly parallel -- or, more kindly, pleasingly parallel
These are made up of independent concurrent components
  • Each client independently accesses a Web Server
  • Each roll of a Monte Carlo dice (random number) is an independent sample
  • Each stock can be priced separately in a financial portfolio
  • Each transaction in a database is almost independent (a given account is locked but usually different accounts are accessed at same time)
  • Different parts of Seismic data can be processed independently
In contrast points in a finite difference grid (from a differential equation) canNOT be updated independently
Such problems are often formally data-parallel but can be handled much more easily -- like functional parallelism
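A sketch of one pleasingly parallel computation in C with MPI (sample count and seeds are arbitrary): each process throws independent Monte Carlo samples to estimate pi, and the only communication is a single reduction at the end:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size;
    long i, nsamples = 1000000, hits = 0, total_hits = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    srand(12345 + rank);                       /* independent stream per process */
    for (i = 0; i < nsamples; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0) hits++;      /* each sample is independent work */
    }

    /* the only communication: add up the per-process counts */
    MPI_Reduce(&hits, &total_hits, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %f\n", 4.0 * total_hits / (nsamples * size));
    MPI_Finalize();
    return 0;
}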

Foil 39 Parallel Languages

A parallel language provides an executable notation for implementing a parallel algorithm.
Design criteria:
  • How are parallel operations defined?
    • static tasks vs. dynamic tasks vs. implicit operations
  • How is data shared between tasks?
    • explicit communication/synchronization vs. shared memory
  • How is the language implemented?
    • low-overhead runtime systems vs. optimizing compilers
Usually a language reflects a particular style of expressing parallelism.
Data parallel expresses concept of identical algorithm on different parts of array
Message parallel expresses the fact that at a low level parallelism implies information is passed between different concurrently executing program parts

Foil 40 Data-Parallel Languages

Data-parallel languages provide an abstract, machine-independent model of parallelism.
  • Fine-grain parallel operations, such as element-wise operations on arrays
  • Shared data in large, global arrays with mapping "hints"
  • Implicit synchronization between operations
  • Partially explicit communication from operation definitions
Advantages:
  • Global operations conceptually simple
  • Easy to program (particularly for certain scientific applications)
Disadvantages:
  • Unproven compilers
  • As they express the "problem", they can be inflexible if a new algorithm arises that the language does not express well
Examples: HPF, C*, HPC++
Originated on SIMD machines, where parallel operations are in lock-step, but generalized (not so successfully, as the compilers are too hard) to MIMD

Foil 41 Message-Passing Systems

Program is based on typically coarse-grain tasks
Separate address space and a processor number for each task
Data shared by explicit messages
  • Point-to-point and collective communications patterns
Examples: MPI, PVM, Occam for parallel computing
Universal model for distributed computing to link naturally decomposed parts e.g. HTTP, RMI, IIOP etc. are all message passing
  • distributed object technology (COM, CORBA) built on functionally concurrent objects sending and receiving messages
Advantages:
  • Close to hardware ALWAYS
  • Can be close to problem as in distributed objects or functional parallelism
Disadvantages:
  • Many low-level details when NOT close to problem.

Foil 42 A Simple Parallel Programming Model

A parallel computation is a set of tasks
Each task has local data, can be connected to other tasks by channels
A task can
  • Compute using local data
  • Send to/receive from other tasks
  • Create new tasks, or terminate itself
A receiving task blocks until data available
Note ALL coordination is message passing
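This task-and-channel picture maps directly onto MPI in C; the illustrative two-task exchange below (run with at least two processes) shows the blocking receive -- task 1 waits in MPI_Recv until task 0's message arrives:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 615;                                           /* local data of task 0 */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* "channel" to task 1  */
        MPI_Recv(&value, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("task 0 got reply %d\n", value);
    } else if (rank == 1) {
        /* this receive blocks until task 0's message is available */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        value += 1;                                            /* compute on local data */
        MPI_Send(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}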

Foil 43 Properties of Programming Model

Concurrency is enhanced by creating multiple tasks
Scalability: More tasks than processor nodes
Locality: Access local data when possible -- minimize message traffic
A task (with local data and subtasks) is a unit for modular design -- it is an object!
Mapping to nodes affects performance only

Foil 44 Some Steps in Parallel Programming

Partition
  • Define tasks
Communication
  • Identify requirements
Agglomeration
  • Enhance locality
Mapping
  • Place tasks

Foil 45 Partitioning

Goal: identify opportunities for concurrent execution (define tasks: computation+data)
Focus on data operated on by algorithm ...
  • Then distribute computation appropriately
  • Domain decomposition or Data Parallelism
... or on the operations performed
  • Then distribute data appropriately
  • Functional decomposition

Foil 46 Communication

Identify communication requirements
  • If computation in one task requires data located in another, communication is needed
Example: finite difference computation
Must communicate with each neighbor
X[i] = (X[i-1] + 2*X[i] + X[i+1])/4
[Diagram: the partition creates one task per grid point; the communication pattern for the point i=2 is shown.]
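A hedged C/MPI sketch of this communication pattern in one dimension (block size and iteration count are assumptions): each task keeps two ghost cells, fills them from its neighbours with MPI_Sendrecv, then applies the X[i] update to its own points:

#include <mpi.h>
#include <stdio.h>

#define NLOCAL 100                      /* points owned by each task (assumed) */

int main(int argc, char **argv) {
    double x[NLOCAL + 2], xnew[NLOCAL + 2];   /* x[0], x[NLOCAL+1] are ghost cells */
    int rank, size, left, right, i, step;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (i = 0; i <= NLOCAL + 1; i++) x[i] = rank;   /* some initial data */

    for (step = 0; step < 10; step++) {
        /* fill ghost cells: send first point to the left, receive right ghost, and vice versa */
        MPI_Sendrecv(&x[1], 1, MPI_DOUBLE, left, 0,
                     &x[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&x[NLOCAL], 1, MPI_DOUBLE, right, 1,
                     &x[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* local update: each point is a weighted average of itself and its neighbours */
        for (i = 1; i <= NLOCAL; i++)
            xnew[i] = (x[i - 1] + 2.0 * x[i] + x[i + 1]) / 4.0;
        for (i = 1; i <= NLOCAL; i++) x[i] = xnew[i];
    }
    if (rank == 0) printf("x[1] = %f after %d steps\n", x[1], step);
    MPI_Finalize();
    return 0;
}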

Foil 47 Agglomeration

Once tasks + communication determined, "agglomerate" small tasks into larger tasks
Motivations
  • To reduce communication costs
  • If tasks cannot execute concurrently
  • To reduce software engineering costs
Caveats
  • May involve replicating computation or data

Foil 48 Mapping

Place tasks on processors, to
  • Maximize concurrency and minimize communication (note these two goals can conflict)
Techniques:
  • Regular problems: agglomerate to P tasks
  • Irregular problems: use static load balancing
  • If irregular in time: dynamic load balancing
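For the "agglomerate to P tasks" case on regular problems, a small hypothetical helper in C computes which contiguous block of n items rank p owns, handling the remainder when n does not divide evenly:

#include <stdio.h>

/* block decomposition: items [*lo, *hi) are owned by rank p out of nprocs */
static void block_range(long n, int nprocs, int p, long *lo, long *hi) {
    long base = n / nprocs, rem = n % nprocs;
    /* the first `rem` ranks get one extra item each */
    *lo = p * base + (p < rem ? p : rem);
    *hi = *lo + base + (p < rem ? 1 : 0);
}

int main(void) {
    long lo, hi;
    int p, nprocs = 4;
    for (p = 0; p < nprocs; p++) {
        block_range(10, nprocs, p, &lo, &hi);
        printf("rank %d owns items %ld..%ld\n", p, lo, hi - 1);
    }
    return 0;
}

A cyclic mapping, by contrast, would give rank p every nprocs-th item, which spreads load more evenly when work per item varies systematically.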

Foil 49 Example: Atmosphere Model

Simulate atmospheric processes
  • Conservation of momentum, mass, energy
  • Ideal gas law, hydrostatic approximation
Represent atmosphere state by 3-D grid
  • Periodic in two horizontal dimensions
  • Nx x Ny x Nz grid: e.g., Ny = 50-500, Nx = 2*Ny, Nz = 15-30
Computation includes
  • Atmospheric dynamics: finite difference
  • "Physics" (radiation etc.) in vertical only

Foil 50 Atmosphere Model: Numerical Methods

Discretize the (continuous) domain by a regular Nx x Ny x Nz grid
  • Store p, u, v, T, ... at every grid point
Approximate derivatives by finite differences
  • Leads to stencils in vertical and horizontal

Foil 51 Atmosphere Model: Partition

Use domain decomposition
  • Because model operates on large, regular grid
  • Can decompose in 1, 2, or 3 dimensions
  • 3-D decomposition offers greatest flexibility
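A sketch of how such a regular decomposition is commonly set up with MPI's Cartesian topology routines (the 2-D horizontal decomposition is shown for brevity, though the foil notes 3-D gives the most flexibility; the periodic flags mirror the model's periodic horizontal dimensions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, dims[2] = {0, 0}, periods[2] = {1, 1}, coords[2];
    MPI_Comm grid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Dims_create(size, 2, dims);            /* factor `size` into a 2-D grid    */
    /* both horizontal dimensions periodic, as in the atmosphere model             */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
    MPI_Cart_coords(grid, rank, 2, coords);

    printf("rank %d -> position (%d,%d) in a %d x %d process grid\n",
           rank, coords[0], coords[1], dims[0], dims[1]);

    MPI_Comm_free(&grid);
    MPI_Finalize();
    return 0;
}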

Foil 52 Atmosphere Model: Communication

Finite difference stencil horizontally
  • Local, regular, structured
Radiation calculations vertically
  • Global, regular, structured
Diagnostic sums
  • Global, regular, structured

Foil 53 Atmosphere Model: Agglomeration

In horizontal
  • Clump so that 4 points per task
  • Efficiency: communicate with 4 neighbors only
In vertical, clump all points in column
  • Performance: avoid communication
  • Modularity: Reuse physics modules
Resulting algorithm "reasonably scalable"
  • (Nx*Ny)/4 tasks: with Ny = 50 and Nx = 2*Ny = 100, that is at least 100*50/4 = 1250 tasks

Foil 54 Atmosphere Model: Mapping

Technique depends on load distribution
1) Agglomerate to one task per processor
  • Appropriate if little load imbalance
2) Extend (1) to incorporate cyclic mapping
  • Works well for diurnal cycle imbalance
3) Use dynamic, local load balancing
  • Works well for unpredictable, local imbalances

Foil 55 What is Parallel Architecture?

A parallel computer is any old collection of processing elements that cooperate to solve large problems fast
  • from a pile of PC's to a shared memory multiprocessor
Some broad issues:
  • Resource Allocation:
    • how large a collection?
    • how powerful are the elements?
    • how much memory?
  • Data access, Communication and Synchronization
    • how do the elements cooperate and communicate?
    • how are data transmitted between processors?
    • what are the abstractions and primitives for cooperation?
  • Performance and Scalability
    • how does it all translate into performance?
    • how does it scale?

Foil 56 Why Study Parallel Architecture as a computer scientist?

Role of a computer architect:
  • To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost.
Parallelism:
  • Provides alternative to faster clock for performance
  • Applies at all levels of system design
  • Is a fascinating perspective from which to view architecture
  • Is increasingly central in information processing

Foil 57 Why Study Architecture Today?

History: diverse and innovative organizational structures, often tied to novel programming models
Rapidly maturing under strong technological constraints
  • The "killer microprocessor" is ubiquitous
  • Laptops and supercomputers are fundamentally similar!
  • Technological trends cause diverse approaches to converge
Technological trends make parallel computing inevitable
  • In the mainstream
Need to understand fundamental principles and design tradeoffs, not just taxonomies
  • According to Culler fundamental are: Naming, Ordering, Replication, Communication performance
  • According to Fox: match computer hardware software and problem

Foil 58 Inevitability of Parallel Computing

Application demands: Our insatiable need for computing cycles
  • Scientific computing: CFD, Biology, Chemistry, Physics, ...
  • General-purpose computing: Video, Graphics, CAD, Databases, TP...
Technology Trends
  • Number of transistors on chip growing rapidly
  • Clock rates expected to go up only slowly
Architecture Trends
  • Instruction-level parallelism(ILP) valuable but limited
  • Coarser-level parallelism, as in Multiprocessors, the most viable approach
Economics
Current trends:
  • Today's microprocessors have multiprocessor(MP) support
  • Servers and workstations becoming MP: Sun, SGI, DEC, COMPAQ!...
  • Tomorrow's microprocessors are multiprocessors

Foil 59 Application Trends

Demand for cycles fuels advances in hardware, and vice-versa
  • Cycle drives exponential increase in microprocessor performance (this is largely from low-end)
  • Drives parallel architecture harder: most demanding applications (this is high end driver)
Range of performance demands
  • Need range of system performance with progressively increasing cost
  • Platform pyramid
Goal of applications in using parallel machines: Speedup
  • Speedup (p processors) = Performance (p processors) / Performance (1 processor)
For a fixed problem size (input data set), performance = 1/time
  • Speedup fixed problem (p processors) = Time (1 processor) / Time (p processors)
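In the notation of this foil the definitions can be written compactly; efficiency is added here as the usual companion quantity, although it is not stated on the foil:

\[
\mathrm{Speedup}(p) = \frac{\mathrm{Performance}(p)}{\mathrm{Performance}(1)},
\qquad
\mathrm{Speedup}_{\mathrm{fixed\ problem}}(p) = \frac{T(1)}{T(p)},
\qquad
\mathrm{Efficiency}(p) = \frac{\mathrm{Speedup}(p)}{p}
\]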

Foil 60 TPC-C (database transaction processing)

Results from March 96
Parallelism is pervasive
Small to moderate scale parallelism very important

Foil 61 Summary of Application Trends

Transition to parallel computing has occurred for scientific and engineering computing but this is 1-2% of computer market
In rapid progress in commercial computing
  • Database and transactions as well as financial modeling/oil reservoir simulation
  • Web servers including multi-media and search growing importance
  • Typically functional or pleasingly parallel
  • Usually smaller-scale, but large-scale systems also used
Desktop also uses multithreaded programs, which are a lot like parallel programs (again functional not data parallel)
Demand for improving throughput on sequential workloads
  • Greatest use of small-scale multiprocessors
Solid application demand exists and will increase

Foil 62 Technology Trends -- CPU's

The natural building block for multiprocessors is now also about the fastest!

Foil 63 General Technology Trends

Microprocessor performance increases 50% - 100% per year
Transistor count doubles every 3 years
DRAM size quadruples every 3 years
Huge investment per generation is carried by huge commodity market
Note that single-processor performance is plateauing, but that parallelism is a natural way to improve it.
Sequential or small multiprocessors -- today 2 processor PC's

Foil 64 Technology: A Closer Look

Basic advance is decreasing feature size (λ)
  • Circuits become either faster or lower in power
Die size is growing too
  • Clock rate improves roughly proportionally to the improvement in λ
  • Number of transistors improves like λ^2 (or faster)
Performance > 100x per decade; clock rate 10x, rest of increase is due to transistor count
How to use more transistors?
  • Parallelism in processing
    • multiple operations per cycle reduces CPI
  • Locality in data access
    • avoids latency and reduces CPI
    • also improves processor utilization
  • Both need resources, so tradeoff
Fundamental issue is resource distribution, as in uniprocessors
CPI = Clock Cycles per Instruction

Foil 65 Clock Frequency Growth Rate

[Chart: clock frequency growth rate -- roughly 30% per year.]

Foil 66 Transistor Count Growth Rate

100 million transistors on chip by early 2000's A.D.
Transistor count grows much faster than clock rate
  • About 40% per year -- an order of magnitude more contribution over 2 decades

Foil 67 Similar Story for Storage

Divergence between memory capacity and speed more pronounced
  • Capacity increased by 1000x from 1980-95, speed only 2x
  • Gigabit DRAM by c. 2000, but gap with processor speed much greater
Larger memories are slower, while processors get faster
  • Need to transfer more data in parallel
  • Need deeper cache hierarchies
  • How to organize caches?
Parallelism increases effective size of each level of hierarchy, without increasing access time
Parallelism and locality within memory systems too
  • New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface
  • Buffer caches most recently accessed data
Disks too: Parallel disks plus caching

Foil 68 The HPCC Dilemma and its Solution

HPCC has developed good research ideas but cannot implement them, as it is solving computing's hardest problem with 1 percent of the funding
  • HPCC applications are very complex and use essentially all computer capabilities and also have synchronization and performance constraints from HPCC
We have learnt to use commodity hardware either
  • partially as in Origin 2000/SP2 with consumer CPU's but custom network or
  • fully as in PC cluster with fast ethernet/ATM
Let us do the same with software and design systems with maximum possible commodity software basis

Foil 69 What is Commodity Software

The world is building a wonderful distributed computing (information processing) environment using Web (dissemination) and distributed object (CORBA COM) technologies
This includes Java, Web-linked databases and the essential standards such as HTML (documents), VRML (3D objects), and JDBC (Java database connectivity).
  • The standard interfaces are essential in that they allow modular (component based) software
We will "just" add high performance to this commodity distributed infrastructure
  • Respecting the architecture of the object web should allow us to naturally use improved software as it is produced
The alternative strategy starts with HPCC technologies (such as MPI,HPF) and adds links to commodity world. This approach does not easily track evolution of commodity systems and so has large maintenance costs

Foil 70 The Computing Pyramid

Bottom of Pyramid has 100 times dollar value and 1000 times compute power of best supercomputer

Foil 71 Implications of the Computing Pyramid

Web Software MUST be cheaper and better than MPP software as factor of 100 more money invested!
Therefore natural strategy is to get parallel computing environment by adding synchronization of parallel algorithms to loosely coupled Web distributed computing model

Foil 72 The 3 Roles of Java


Foil 73 Why is Java Worth Looking at?

The Java Language has several good design features
  • secure, safe (with respect to bugs), object-oriented, familiar (to C, C++ and even Fortran programmers)
Java has a very good set of libraries covering everything from commerce, multimedia, images to math functions (under development at http://math.nist.gov/javanumerics)
Java has best available electronic and paper training and support resources
Java is rapidly getting best integrated program development environments
Java naturally integrated with network and universal machine supports powerful "write once-run anywhere" model
There is a large and growing trained labor force
Can we exploit this in computational science?

Foil 74 What is Java Grande?

Use of Java for:
High Performance Network Computing
Scientific and Engineering Computation
(Distributed) Modeling and Simulation
Parallel and Distributed Computing
Data Intensive Computing
Communication and Computing Intensive Commercial and Academic Applications
HPCC Computational Grids ........
Very difficult to find a "conventional name" that doesn't get misunderstood by some community!

Foil 75 Java and Parallelism?

The Web integration of Java gives it excellent "network" classes and support for message passing.
Thus "Java plus message passing" form of parallel computing is actually somewhat easier than in Fortran or C.
Coarse grain parallelism very natural in Java
"Data Parallel" language features are NOT in Java and have to be added: NPAC's HPJava is translated to Java plus messaging, just as HPF translates to Fortran plus message passing
Java has built in "threads" and a given Java Program can run multiple threads at a time
  • In Web use, allows one to process Image in one thread, HTML page in another etc.
Can be used to do more general parallel computing but only on shared memory computers
  • JavaVM (standard Java Runtime) does not support distributed memory systems

Foil 76 "Pure" Java Model For Parallelism

Combine threads on a shared memory machine with message passing between distinct distributed memories
"Distributed" or "Virtual" Shared Memory does support the JavaVM, as the hardware gives the JavaVM the illusion of shared memory

Foil 77 Pragmatic Computational Science August 1998

So here is a recipe for developing HPCC (parallel) applications as of August 1998
Use MPI for data parallel programs, as the alternatives today are HPF, HPC++ or parallelizing compilers
  • Neither HPF nor HPC++ has a clear long term future for implementations -- the ideas are sound
  • MPI will run on PC clusters as well as customized parallel machines -- parallelizing compilers will not work on distributed memory machines
Today Fortran and C are good production languages
Today Java can be used for client side applets and in systems middleware but too slow for production scientific code
  • This should change over next year with better Java compilers -- including "native" compilers which do not translate to Java Virtual Machine but go directly to native machine language
Use metacomputers for pleasingly parallel and metaproblems -- not for closely knit problems with tight synchronization between parts
Use where possible web and distributed object technology for "coordination"
Pleasingly parallel problems can use MPI or web/metacomputing technology

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
