Given by Geoffrey C. Fox for CPS615, Basic Simulation Track for Computational Science, in Fall Semester 1995. Foils prepared 29 August 1995
Summary of Material
Technology Driving Forces for HPCC
Overview of What Computational Science Is and Why
Elementary Discussion of Parallel Computing in the "Real World"
Sequential Computer Architecture
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100
RAM density increases by about a factor of 50 every 8 years
Supercomputers in 1992 have memory sizes around 32 gigabytes (giga = 10^9)
Supercomputers in the year 2000 should have memory sizes around 1.5 terabytes (tera = 10^12)
Computer performance is increasing faster than RAM density
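The projection above follows from simple compounding; a minimal sketch, assuming the slide's figures (32 GB in 1992, a factor of 50 every 8 years):

```python
# Back-of-the-envelope check of the slide's RAM-growth projection.
# All numbers come from the foil; the variable names are illustrative.
base_year, base_mem_gb = 1992, 32    # ~32 GB in 1992 supercomputers
growth, period_years = 50, 8         # ~50x density growth every 8 years

years = 2000 - base_year
projected_gb = base_mem_gb * growth ** (years / period_years)
print(projected_gb)   # 1600.0 GB, i.e. ~1.6 TB -- consistent with "around 1.5 terabytes"
```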
See Chapter 5 of Petaflops Report -- July 95
Different machines
New types of computers
New libraries
Rewritten applications
Totally new fields able to use computers ==> need new educational initiatives: Computational Science
Computational Science will be a nucleus for the phase transition and will accelerate the use of parallel computers in the real world
Each node is a CPU and 6 memory chips -- the CPU chip integrates communication channels with floating-point, integer, and logical CPU functions
32-node CM-5 with, in the foreground, an old CM-2 disk vault
Domain decomposition is simple, but general and extensible to many more nodes
All successful concurrent machines have obtained parallelism from "Data Parallelism" or "Domain Decomposition"
A problem is an algorithm applied to a data set
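The data-parallel idea -- one algorithm, many subdomains -- can be sketched in a few lines. A minimal illustration; the function names (`decompose`, `relax`) and the 4-node example are assumptions, not from the lecture:

```python
# Minimal sketch of domain decomposition: split one data set into equal
# contiguous subdomains and apply the same algorithm to each node's piece.

def decompose(data, n_nodes):
    """Split data into n_nodes contiguous subdomains (assumes even division)."""
    size = len(data) // n_nodes
    return [data[i * size:(i + 1) * size] for i in range(n_nodes)]

def relax(subdomain):
    """Stand-in for whatever algorithm each node applies to its data."""
    return [x * 0.5 for x in subdomain]

domains = decompose(list(range(16)), n_nodes=4)   # 4 subdomains of 4 points each
results = [relax(d) for d in domains]             # identical code on every node
```

On a real machine each element of `results` would be computed by a separate node; here the loop simply plays all nodes in turn.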
The three architectures considered here differ as follows:
MIMD Distributed Memory
MIMD Shared Memory
SIMD Distributed Memory
Two Different Types of Mappings in Physical Spaces
Both are static
Different Types of Mappings -- a very dynamic case without any underlying physical space
c) Computer chess with a dynamic game tree decomposed onto 4 nodes
...and the corresponding poor workload balance
...and excellent workload balance
The fundamental principles behind the use of concurrent computers are identical to those used in society - in fact they are partly why society exists. |
If a problem is too large for one person, one does not hire a SUPERman, but rather puts together a team of ordinary people... |
cf. construction of Hadrian's Wall
Domain Decomposition is Key to Parallelism
Need "large" subdomains: l >> l_overlap
Amdahl's Law, or "too many cooks spoil the broth", says that:
Speedup S is small if efficiency e is small
or, for Hadrian's Wall, equivalently S is small if length l is small
But this is irrelevant, as we do not need parallel processing unless the problem is big!
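The edge-effect argument can be made concrete with a toy model. A sketch assuming overhead proportional to the overlap region at each mason's boundaries (the function names and the efficiency formula e = l / (l + l_overlap) are illustrative simplifications, not from the foil):

```python
# Toy model of Amdahl-style edge effects for Hadrian's Wall:
# each mason lays useful length l but also works the overlap region
# l_overlap at the subdomain edges, which is pure overhead.

def efficiency(l, l_overlap):
    """Useful work divided by total work for one mason."""
    return l / (l + l_overlap)

def speedup(n_masons, l, l_overlap):
    """Speedup S = N * e for N masons."""
    return n_masons * efficiency(l, l_overlap)

print(speedup(100, l=10.0, l_overlap=1.0))   # large subdomain: S ~ 90.9
print(speedup(100, l=1.0, l_overlap=1.0))    # small subdomain: S = 50.0
```

The point of the foil survives the simplification: for a big problem each mason's stretch l dwarfs the overlap, so e is near 1; shrink l toward l_overlap and the speedup collapses.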
"Pipelining" or decomposition by horizontal section is:
|
Hadrian's Wall is one dimensional |
Humans represent a flexible processor node that can be arranged in different ways for different problems |
The lesson for computing is:
Original MIMD machines used a hypercube topology. The hypercube subsumes several other topologies, including all meshes, making it a flexible concurrent computer that can tackle a broad range of problems. Current machines use interconnect structures different from the hypercube but preserve this capability.
Comparing Computer and Hadrian's Wall Cases
The case of Programming a Hypercube
Each node runs software that is similar to sequential code
e.g., Fortran with geometry and boundary-value sections changed
The geometry is irregular, but each brick takes about the same amount of time to lay.
Decomposition of the wall for an irregular geometry involves equalizing the number of bricks per mason, not the length of wall per mason.
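Equalizing bricks rather than length amounts to cutting the wall where the cumulative brick count, not the distance, reaches each mason's share. A minimal greedy sketch (the function name, the segment model, and the example brick counts are all hypothetical):

```python
# Sketch: partition an irregular wall by equal bricks per mason, not
# equal length. Each wall segment has its own brick count; a new mason
# starts once the current mason has reached the target share.

def decompose_by_work(bricks_per_segment, n_masons):
    """Greedy split of wall segments so each mason lays ~equal bricks."""
    target = sum(bricks_per_segment) / n_masons
    assignment, load = [[]], 0
    for seg, bricks in enumerate(bricks_per_segment):
        if load >= target and len(assignment) < n_masons:
            assignment.append([])   # hand the trowel to the next mason
            load = 0
        assignment[-1].append(seg)
        load += bricks
    return assignment

# Irregular wall: 8 segments of varying brick counts, 24 bricks total.
wall = [2, 2, 2, 6, 1, 1, 3, 7]
print(decompose_by_work(wall, 2))   # two masons, 12 bricks each
```

Note the cut falls after segment 3 because of brick counts, not because it is the geometric midpoint -- exactly the foil's point.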
Fundamental entities (bricks, gargoyles) are of different complexity
The best decomposition is dynamic
Inhomogeneous problems run on concurrent computers but require dynamic assignment of work to nodes, and strategies to optimize this
(we use neural networks, simulated annealing, spectral bisection, etc.)
Global Parallelism
Local Parallelism
Local and Global Parallelism Should Both Be Exploited
Disk (input/output) technology is better matched to several modest-power processors than to a single sequential supercomputer
Concurrent computers are natural in databases and transaction analysis
At the finest resolution, the brain is a collection of neurons sending and receiving messages by axons and dendrites
At a coarser resolution:
Society is a collection of brains sending and receiving messages by sight and sound
An ant hill is a collection of ants (smaller brains) sending and receiving messages by chemical signals
Lesson: All Nature's Computers Use Message Passing
With several different architectures
Problems are large -- use domain decomposition
Overheads are edge effects
Topology of processor matches that of domain -- a processor with a rich, flexible node/topology matches most domains
Regular homogeneous problems are easiest, but irregular or inhomogeneous problems can also be handled
Can use local and global parallelism
Can handle concurrent calculation and I/O
Nature always uses message passing, as in parallel computers (at the lowest level)
Three instructions are shown overlapped -- each starting one clock cycle after the last
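The payoff of that overlap is easy to quantify. A sketch of the standard pipeline timing formula (the stage count of 4 is illustrative, not from the foil): with S stages and one instruction issued per clock, N instructions finish in S + N - 1 cycles instead of S * N.

```python
# Sketch of instruction-pipeline timing: overlapped vs. sequential issue.
# Assumes an ideal pipeline (no stalls or hazards).

def pipelined_cycles(n_instructions, n_stages):
    """First instruction takes n_stages cycles; each later one adds 1."""
    return n_stages + n_instructions - 1

def sequential_cycles(n_instructions, n_stages):
    """No overlap: every instruction pays the full pipeline depth."""
    return n_stages * n_instructions

print(pipelined_cycles(3, 4))    # 6 cycles for 3 overlapped instructions
print(sequential_cycles(3, 4))   # 12 cycles without overlap
```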