Full HTML for

Basic foilset Master Set A of Overview Material on Parallel Computing for CPS615

Given by Geoffrey C. Fox at CPS615 Basic Simulation Track for Computational Science in Fall Semester 95. Foils prepared 29 August 1995


Technology Driving Forces for HPCC
Overview of What and Why is Computational Science
  • Needs to be expanded with further remarks on Information track and degree/certificate requirements
Elementary Discussion of Parallel Computing in the "real-world"
  • Hadrian Wall example
Sequential Computer Architecture

Table of Contents for full HTML of Master Set A of Overview Material on Parallel Computing for CPS615

1 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1995
Foilsets A

2 Contents of Foilsets A of CPS615 Computational Science
3 The Technology
Driving Forces for HPCC

4 Effect of Feature Size on Performance
5 Growing Logic Chip Density
6 Trends in Feature and Die Size as a Function of Time
7 Supercomputer Memory Sizes and trends in RAM Density
8 Comparison of Trends in RAM Density and CPU Performance Increases
9 National Roadmap for Semiconductor Technology --1992
10 CMOS Technology and Parallel Processor Chip Projections
11 What and Why is Computational Science ?
12 Parallelism Implies Major Changes which have significant educational Implications
13 Program in Computational Science
Implemented within current academic framework

14 Program in Information Age Computational Science Implemented Within Current Academic Program
15 Elementary Discussion of
Parallel Computing

16 Single nCUBE2 CPU Chip
17 64 Node nCUBE Board
18 CM-5 in NPAC Machine Room
19 Basic METHODOLOGY of Parallel Computing
20 Concurrent Computation as a Mapping Problem -I
21 Concurrent Computation as a Mapping Problem - II
22 Concurrent Computation as a Mapping Problem - III
23 Finite Element Mesh From Nastran
(mesh only shown in upper half)

24 A Simple Equal Area Decomposition
25 Decomposition After Annealing
(one particularly good but nonoptimal decomposition)

26 Parallel Processing and Society
27 Concurrent Construction of a Wall
Using N = 8 Bricklayers
Decomposition by Vertical Sections

28 Quantitative Speed-Up Analysis for Construction of Hadrian's Wall
29 Amdahl's law for Real World Parallel Processing
30 Pipelining --Another Parallel Processing Strategy for Hadrian's Wall
31 Hadrian's Wall Illustrates that the Topology of Processor Must Include Topology of Problem
32 General Speed Up Analysis
33 Comparison of The Complete Problem to the subproblems formed in domain decomposition
34 Hadrian's Wall Illustrating an
Irregular but Homogeneous Problem

35 Some Problems are Inhomogeneous Illustrated by:
An Inhomogeneous Hadrian Wall with Decoration

36 Global and Local Parallelism Illustrated by Hadrian's Wall
37 Parallel I/O Illustrated by
Concurrent Brick Delivery for Hadrian's Wall
Bandwidth of Trucks and Roads
Matches that of Masons

38 Nature's Concurrent Computers
39 Comparison of Concurrent Processing in Society and Computing
40 Sequential Computer Architecture
41 Sequential Computer Architecture
42 Instruction Flow in A Simple Machine Pipeline
43 Examples of Superpipelined (a) and superscalar (b) machine pipelines




HTML version of Basic Foils prepared 29 August 1995

Foil 1 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1995
Foilsets A

From New CPS615 Foils 25 March95 CPS615 Basic Simulation Track for Computational Science -- Fall Semester 95. *
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

Foil 2 Contents of Foilsets A of CPS615 Computational Science

Technology Driving Forces for HPCC
Overview of What and Why is Computational Science
  • Needs to be expanded with further remarks on Information track and degree/certificate requirements
Elementary Discussion of Parallel Computing in the "real-world"
  • Hadrian Wall example
Sequential Computer Architecture

Foil 3 The Technology
Driving Forces for HPCC

Foil 4 Effect of Feature Size on Performance

Foil 5 Growing Logic Chip Density

Foil 6 Trends in Feature and Die Size as a Function of Time

Foil 7 Supercomputer Memory Sizes and trends in RAM Density

RAM density increases by about a factor of 50 in 8 years
Supercomputers in 1992 have memory sizes around 32 gigabytes (giga = 10^9)
Supercomputers in year 2000 should have memory sizes around 1.5 terabytes (tera = 10^12)
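
The projection above can be checked with a few lines of arithmetic. This is a rough sketch only: it assumes the factor of 50 applies uniformly over the 8 years from 1992 to 2000 and uses decimal units. The function and variable names are illustrative and do not come from the foils.

```python
# Check the memory projection: 32 gigabytes in 1992, RAM density x50 in 8 years.
GIGA = 10**9
TERA = 10**12

def project_memory(bytes_now, growth_factor):
    """Scale a memory size by an overall growth factor (assumed uniform)."""
    return bytes_now * growth_factor

mem_1992 = 32 * GIGA
mem_2000 = project_memory(mem_1992, 50)   # 1.6e12 bytes, close to the quoted 1.5 terabytes
annual_factor = 50 ** (1 / 8)             # implied yearly growth, roughly 1.63

print(f"Projected memory in 2000: {mem_2000 / TERA:.1f} terabytes")
print(f"Implied growth per year: x{annual_factor:.2f}")
```

The result, 1.6 terabytes, agrees with the "around 1.5 terabytes" estimate to within the precision of the factor of 50.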

Foil 8 Comparison of Trends in RAM Density and CPU Performance Increases

Computer Performance is increasing faster than RAM density

Foil 9 National Roadmap for Semiconductor Technology --1992

See Chapter 5 of Petaflops Report -- July 95

Foil 10 CMOS Technology and Parallel Processor Chip Projections

See Chapter 5 of Petaflops Report -- July 95

Foil 11 What and Why is Computational Science ?

Foil 12 Parallelism Implies Major Changes which have significant educational Implications

Different machines
New types of computers
New libraries
Rewritten Applications
Totally new fields able to use computers .... ==> Need new educational initiatives
Computational Science will be a nucleus for the phase transition
and accelerate use of parallel computers in the real world

Foil 13 Program in Computational Science
Implemented within current academic framework

Foil 14 Program in Information Age Computational Science Implemented Within Current Academic Program

Foil 15 Elementary Discussion of
Parallel Computing

Foil 16 Single nCUBE2 CPU Chip

Foil 17 64 Node nCUBE Board

Each node is CPU and 6 memory chips -- CPU Chip integrates communication channels with floating, integer and logical CPU functions

Foil 18 CM-5 in NPAC Machine Room

32 node CM-5 and in foreground old CM-2 diskvault

Foil 19 Basic METHODOLOGY of Parallel Computing

Domain decomposition is simple, but general and extensible to many more nodes
All successful concurrent machines with
  • Many nodes
  • High performance (this excludes Dataflow)
Have obtained parallelism from "Data Parallelism" or "Domain Decomposition"
Problem is an algorithm applied to data set
  • and obtains concurrency by acting on data concurrently.
The three architectures considered here differ as follows:
MIMD Distributed Memory
  • Processing and Data Distributed
MIMD Shared Memory
  • Processing Distributed but data shared
SIMD Distributed Memory
  • Synchronous Processing on Distributed Data
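
The idea of one algorithm applied concurrently to pieces of a data set can be sketched as follows. This is a minimal illustration, not code from the course: the helper names `decompose` and `node_work` are made up, the "algorithm" is an arbitrary sum of squares, and a thread pool stands in for the nodes of a parallel machine (in CPython, threads show the structure but not a real speedup for this kind of work).

```python
# Domain decomposition sketch: one algorithm applied to pieces of a data set.
from concurrent.futures import ThreadPoolExecutor

def decompose(data, n_nodes):
    """Split data into n_nodes contiguous subdomains of near-equal size."""
    base, extra = divmod(len(data), n_nodes)
    chunks, start = [], 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

def node_work(subdomain):
    """The algorithm run by every node on its local data (sum of squares)."""
    return sum(x * x for x in subdomain)

data = list(range(1, 101))
chunks = decompose(data, 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(node_work, chunks))   # each node acts on its own data
total = sum(partials)                              # combine the partial results
print([len(c) for c in chunks], total)
```

Each worker runs the same function on its own subdomain, which is the pattern common to the three architectures listed above; they differ in where the data lives and how the workers are synchronized.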

Foil 20 Concurrent Computation as a Mapping Problem -I

2 Different types of Mappings in Physical Spaces
Both are static
  • a) Seismic Migration with domain decomposition on 4 nodes
  • b) Universe simulation with irregular data but static 16 node decomposition
  • but this problem would be best with dynamic irregular decomposition

Foil 21 Concurrent Computation as a Mapping Problem - II

Different types of Mappings -- A very dynamic case without any underlying Physical Space
c) Computer Chess with dynamic game tree decomposed onto 4 nodes

Foil 22 Concurrent Computation as a Mapping Problem - III

Foil 23 Finite Element Mesh From Nastran
(mesh only shown in upper half)

Foil 24 A Simple Equal Area Decomposition

And the corresponding poor workload balance

Foil 25 Decomposition After Annealing
(one particularly good but nonoptimal decomposition)

And excellent workload balance

Foil 26 Parallel Processing and Society

The fundamental principles behind the use of concurrent computers are identical to those used in society - in fact they are partly why society exists.
If a problem is too large for one person, one does not hire a SUPERman, but rather puts together a team of ordinary people...
cf. Construction of Hadrian's Wall

Foil 27 Concurrent Construction of a Wall
Using N = 8 Bricklayers
Decomposition by Vertical Sections

Domain Decomposition is Key to Parallelism
Need "Large" Subdomains l >> l_overlap
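
Why large subdomains matter can be illustrated with a toy model. The model is an assumption made here for illustration and is not taken from the foils: each mason builds a section of length l and loses time proportional to l_overlap coordinating at the section boundaries, so the efficiency is e = l / (l + l_overlap) and the speedup is S = e * N.

```python
# Assumed toy model: each of n masons builds a section of length l and
# loses time proportional to l_overlap at the boundaries of the section.
def efficiency(l, l_overlap):
    """Fraction of a mason's time spent on useful work (assumed model)."""
    return l / (l + l_overlap)

def speedup(n, l, l_overlap):
    """Speedup over a single mason, S = e * n."""
    return n * efficiency(l, l_overlap)

for l in (1, 10, 100):
    print(f"l = {l:3d}  e = {efficiency(l, 1):.3f}  S = {speedup(8, l, 1):.2f}")
```

With N = 8 and l_overlap = 1 in this model, the speedup rises from 4 when l = 1 towards 8 as l grows, which is the sense in which l >> l_overlap is needed.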

Foil 28 Quantitative Speed-Up Analysis for Construction of Hadrian's Wall

Foil 29 Amdahl's law for Real World Parallel Processing

AMDAHL's LAW or
Too many cooks spoil the broth
Says that
Speedup S is small if efficiency e small
or for Hadrian's wall
equivalently S is small if length l small
But this is irrelevant as we do not need parallel processing unless problem big!
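
For comparison, the textbook statement of Amdahl's law can be written in a few lines. Note that this is the standard serial-fraction form, which may not be the exact form used on the original foil; the function name and the sample fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction, n):
    """Textbook Amdahl's law: speedup on n processors when only
    parallel_fraction of the work can be shared among them."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)

for p in (0.5, 0.9, 0.99):
    print(f"parallel fraction {p:.2f}: speedup on 8 nodes = {amdahl_speedup(p, 8):.2f}")
```

The speedup approaches the number of nodes only as the parallel fraction approaches 1. This matches the point made above: big problems tend to have a small non-parallel share, so the limit matters little where parallel processing is actually needed.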

Foil 30 Pipelining --Another Parallel Processing Strategy for Hadrian's Wall

"Pipelining" or decomposition by horizontal section is:
  • In general less effective
  • and leads to less parallelism
  • (N = Number of bricklayers must be < number of layers of bricks)
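
A rough timing model shows why pipelining gives less parallelism. The model is an assumption for illustration only: it takes the limiting case of one mason per layer, every mason lays one brick per time step, and each mason starts one step after the mason on the layer below.

```python
def pipeline_time(layers, width):
    """Time steps to finish the wall with one mason per layer, each
    starting one step after the mason below (assumed one brick per step)."""
    return width + layers - 1

def pipeline_speedup(layers, width):
    """Speedup relative to a single mason laying every brick in turn."""
    sequential = layers * width
    return sequential / pipeline_time(layers, width)

print(pipeline_time(8, 100), round(pipeline_speedup(8, 100), 2))
```

In this model the number of masons can never exceed the number of layers, however long the wall is. Decomposition by vertical sections has no such cap, since a longer wall always supplies more sections.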

Foil 31 Hadrian's Wall Illustrates that the Topology of Processor Must Include Topology of Problem

Hadrian's Wall is one dimensional
Humans represent a flexible processor node that can be arranged in different ways for different problems
The lesson for computing is:
Original MIMD machines used a hypercube topology. The hypercube includes several topologies including all meshes. It is a flexible concurrent computer that can tackle a broad range of problems. Current machines use interconnect structures different from the hypercube but preserve this capability.
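
One standard way to see that a hypercube contains a one dimensional topology such as the wall is the binary-reflected Gray code, which places consecutive positions on directly connected nodes. The sketch below is illustrative; the function names are not from the foils.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)

def ring_in_hypercube(dim):
    """Map positions 0 .. 2**dim - 1 along a 1D wall to hypercube node
    labels so that adjacent positions land on directly connected nodes."""
    return [gray(i) for i in range(2 ** dim)]

def are_neighbors(a, b):
    """Hypercube nodes are linked when their labels differ in exactly one bit."""
    return bin(a ^ b).count("1") == 1

nodes = ring_in_hypercube(3)
print(nodes)
print(all(are_neighbors(nodes[i], nodes[(i + 1) % len(nodes)])
          for i in range(len(nodes))))
```

Applying the same code independently to each coordinate embeds two and three dimensional meshes whose side lengths are powers of two.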

Foil 32 General Speed Up Analysis

Comparing Computer and Hadrian's Wall Cases

Foil 33 Comparison of The Complete Problem to the subproblems formed in domain decomposition

The case of Programming a Hypercube
Each node runs software that is similar to sequential code
e.g., FORTRAN with geometry and boundary value sections changed

Foil 34 Hadrian's Wall Illustrating an
Irregular but Homogeneous Problem

Geometry irregular but each brick takes about the same amount of time to lay.
Decomposition of wall for an irregular geometry involves equalizing number of bricks per mason, not length of wall per mason.
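
Equalizing bricks rather than length can be sketched with a running total over the wall. This is a minimal illustration under assumed inputs: the wall is described as a list of brick counts per column, and the function name and sample heights are made up.

```python
from bisect import bisect_left
from itertools import accumulate

def split_by_bricks(heights, n_masons):
    """Cut a wall, given as bricks per column, into n_masons contiguous
    sections holding roughly equal numbers of bricks."""
    prefix = list(accumulate(heights))        # running brick count
    total = prefix[-1]
    cuts = [0]
    for k in range(1, n_masons):
        target = total * k / n_masons
        # End the section at the first column where the running total reaches the target.
        cut = bisect_left(prefix, target) + 1
        cuts.append(max(cut, cuts[-1]))
    cuts.append(len(heights))
    return [heights[a:b] for a, b in zip(cuts, cuts[1:])]

heights = [1, 1, 1, 1, 4, 4, 2, 2]            # irregular wall, 16 bricks
sections = split_by_bricks(heights, 4)
print([len(s) for s in sections], [sum(s) for s in sections])
```

For this sample wall an equal-length split into four pairs of columns would give the masons 2, 2, 8 and 4 bricks, while the split above gives each mason 4 bricks over sections of unequal length.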

Foil 35 Some Problems are Inhomogeneous Illustrated by:
An Inhomogeneous Hadrian Wall with Decoration

Fundamental entities (bricks, gargoyles) are of different complexity
Best decomposition dynamic
Inhomogeneous problems run on concurrent computers but require dynamic assignment of work to nodes and strategies to optimize this
(we use neural networks, simulated annealing, spectral bisection etc.)
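
Dynamic assignment can be illustrated with a very simple scheme in which each item goes to whichever mason becomes free first. This greedy queue is a stand-in chosen for brevity, not one of the methods named above (neural networks, simulated annealing, spectral bisection), and the relative costs of bricks and gargoyles are assumed values.

```python
import heapq

def dynamic_assign(costs, n_masons):
    """Hand each item to whichever mason becomes free first.
    Returns the total work time accumulated by each mason."""
    free_at = [(0, m) for m in range(n_masons)]   # (time mason is free, mason id)
    heapq.heapify(free_at)
    loads = [0] * n_masons
    for cost in costs:
        t, m = heapq.heappop(free_at)
        loads[m] += cost
        heapq.heappush(free_at, (t + cost, m))
    return loads

BRICK, GARGOYLE = 1, 5                            # assumed relative costs
items = [GARGOYLE, GARGOYLE] + [BRICK] * 6
loads = dynamic_assign(items, 2)
print(loads)
```

Splitting the same eight items statically into two halves by count would give the masons 12 and 4 units of work, while the dynamic scheme gives each of them 8.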

Foil 36 Global and Local Parallelism Illustrated by Hadrian's Wall

Global Parallelism
  • Break up domain
  • Amount of Parallelism proportional to size of problem (and is usually large)
  • Unit is Bricklayer or Computer node
Local Parallelism
  • Do in parallel local operations in the processing of basic entities
    • e.g. for Hadrian's problem, use two hands, one for brick and one for mortar while ...
    • for computer case, do addition at same time as multiplication
  • Local Parallelism is limited but useful
Local and Global Parallelism
Should both be Exploited

Foil 37 Parallel I/O Illustrated by
Concurrent Brick Delivery for Hadrian's Wall
Bandwidth of Trucks and Roads
Matches that of Masons

Disk (input/output) Technology is better matched to several modest power processors than to a single sequential supercomputer
Concurrent Computers natural in databases, transaction analysis

Foil 38 Nature's Concurrent Computers

At the finest resolution, collection of neurons sending and receiving messages by axons and dendrites
At a coarser resolution
Society is a collection of brains sending and receiving messages by sight and sound
Ant Hill is a collection of ants (smaller brains) sending and receiving messages by chemical signals
Lesson: All Nature's Computers Use Message Passing
With several different Architectures

Foil 39 Comparison of Concurrent Processing in Society and Computing

Problems are large - use domain decomposition. Overheads are edge effects
Topology of processor matches that of domain - processor with rich flexible node/topology matches most domains
Regular homogeneous problems easiest but
irregular or
Inhomogeneous
Can use local and global parallelism
Can handle concurrent calculation and I/O
Nature always uses message passing as in parallel computers (at lowest level)

Foil 40 Sequential Computer Architecture

Foil 41 Sequential Computer Architecture

Foil 42 Instruction Flow in A Simple Machine Pipeline

Three Instructions are shown overlapped -- each starting one clock cycle after last
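
The overlap can be drawn as a small timing chart. The sketch below assumes a four stage pipeline with the stage names IF, ID, EX and WB; these names and the stage count are illustrative choices, since the original figure is not reproduced here.

```python
STAGES = ["IF", "ID", "EX", "WB"]     # assumed stage names, not from the foil

def pipeline_chart(n_instructions, stages=STAGES):
    """Return one row per instruction; instruction i enters the pipe at cycle i."""
    n_cycles = n_instructions + len(stages) - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name         # stage s of instruction i runs in cycle i + s
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_chart(3)):
    print(f"instr {i + 1}: " + " ".join(row))
```

Under these assumptions three instructions finish in 6 cycles instead of the 12 needed if each ran to completion before the next one started.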

Foil 43 Examples of Superpipelined (a) and superscalar (b) machine pipelines


© on Tue Oct 7 1997