Given by Geoffrey C. Fox in the Basic Simulation Track for Computational Science (CPS615), Fall Semester 1996. Foils prepared 10 September 1996
Summary of Material
Overview of Course Itself! -- and then introductory material on basic curricula |
Overview of National Program -- The Grand Challenges |
Overview of Technology Trends leading to petaflop performance in year 2007 (hopefully) |
Overview of Syracuse and National programs in computational science |
Parallel Computing in Society |
Why Parallel Computing works |
Simple Overview of Computer Architectures |
General Discussion of Message Passing and Data Parallel Programming Paradigms and a comparison of languages |
Geoffrey Fox |
NPAC |
Room 3-131 CST |
111 College Place |
Syracuse NY 13244-4100 |
Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 315-443-2163, Room 3-131 CST |
Backup: Nancy McCracken -- njm@npac.syr.edu, 315-443-4687, Room 3-234 CST |
NPAC Administrative support: Nora Downey-Easter -- nora@npac.syr.edu, 315-443-1722, Room 3-206 CST |
The CPS615 powers that be can be reached at cps615ad@npac.syr.edu |
CPS615 Students can be reached by mailing cps615@npac.syr.edu |
Homepage will be: |
http://www.npac.syr.edu/projects/cps615fall96 |
See my paper SCCS 736 as an overview of HPCC status |
Grading is based on approximately 8 homeworks, each due the Thursday of the week following the day (Tuesday or Thursday) it is given out |
Plus one modest-sized project at the end of the class -- it must involve "real" running parallel code! |
No finals or written exams |
All material will be placed on World Wide Web(WWW) |
Preference given to work returned on the Web |
Overview of National Scene -- Why is High Performance Computing Important |
What is Computational Science -- The Program at Syracuse |
Basic Technology Situation -- Increasing density of transistors on a chip |
Elementary Discussion of Parallel Computing including use in society |
Computer Architecture -- Parallel and Sequential |
Simple base example -- Laplace's Equation |
This is followed by two sections -- software technologies and applications which are interspersed with each other and "algorithm" modules |
Programming Models -- Message Passing and Data Parallel Computing -- MPI and HPF (Fortran 90) |
Some real applications analysed in detail |
This introduction is followed by a set of "vignettes" discussing problem classes which illustrate parallel programming and parallel algorithms |
Ordinary Differential Equations |
Numerical Integration including adaptive methods |
Floating Point Arithmetic |
Monte Carlo Methods including Random Numbers |
Full Matrix Algebra as in |
Partial Differential Equations implemented as sparse matrix problems (as in Computational Fluid Dynamics) |
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994 |
L=Logic, D=DRAM, A=ASIC, mP = microprocessor |
We can choose technology and architecture separately in designing our high performance system |
Technology is like choosing ants, people, or tanks as the basic units in our society analogy |
In the HPCC arena, we can distinguish current technologies |
Near term technology choices include |
Further term technology choices include |
It will cost $40 Billion for next industry investment in CMOS plants and this huge investment makes it hard for new technologies to "break in" |
Architecture is equivalent to organization or design in the society analogy |
We can distinguish formal and informal parallel computers |
Informal parallel computers are typically "metacomputers" |
Metacomputers are a very important trend; they use similar software and algorithms to conventional MPPs but typically have less optimized parameters |
Formal high performance computers are the classic (basic) object of study and are |
"closely coupled" specially designed collections of compute nodes which have (in principle) been carefully optimized and balanced in the areas of
|
In society, we see a rich set of technologies and architectures |
with several different communication mechanisms offering different trade-offs |
Quantum-Mechanical Computers by Seth Lloyd, Scientific American, Oct 95 |
Chapter 6 of The Feynman Lectures on Computation edited by Tony Hey and Robin Allen, Addison-Wesley, 1996 |
Quantum Computing: Dream or Nightmare? Haroche and Raimond, Physics Today, August 96 page 51 |
Basically any physical system can "compute", as one "just" needs a system whose answers depend on its inputs, and all physical systems have this property |
Thus one can build "superconducting", "DNA", or "quantum" computers exploiting, respectively, superconducting, molecular, or quantum mechanical rules |
For a "new technology" computer to be useful, one needs to be able to
|
Conventional computers are built around bit ( taking values 0 or 1) manipulation |
One can build arbitrarily complex arithmetic if one has some way of implementing NOT and AND |
Quantum Systems naturally represent bits |
Interactions between quantum systems can cause "spin-flips" or state transitions and so implement arithmetic |
Incident photons can "read" state of system and so give I/O capabilities |
Quantum "bits" called qubits have another property as one has not only
|
Lloyd describes how such coherent states provide new types of computing capabilities
|
Superconductors produce wonderful "wires" which transmit picosecond (10^-12 second) pulses at near the speed of light |
Niobium used in constructing such superconducting circuits can be processed by similar fabrication techniques to CMOS |
Josephson Junctions allow picosecond performance switches |
BUT IBM (1969-1983) and Japan (MITI, 1981-90) terminated major efforts in this area |
New ideas have resurrected this concept using RSFQ -- Rapid Single Flux Quantum -- approach |
This naturally gives a bit which is 0 or 1 (or in fact n units!) |
This gives interesting circuits of similar structure to CMOS systems but with a clock speed of order 100-300GHz -- factor of 100 better than CMOS which will asymptote at around 1 GHz (= one nanosecond cycle time) |
At least two major problems: |
The semiconductor industry will invest some $40B in CMOS "plants" and infrastructure |
One cannot build memory to match the CPU speed; current designs have superconducting CPUs (with perhaps 256 Kbytes of superconducting memory per processor) but conventional CMOS main memory |
Superconducting technology also has a bad "name" due to IBM's termination! |
Sequential or von Neumann Architecture |
Vector (Super)computers |
Parallel Computers |
Instructions and data are stored in the same memory, for which there is a single link (the von Neumann Bottleneck) to the CPU, which decodes and executes instructions |
The CPU can have multiple functional units |
The memory access can be enhanced by use of caches made from faster memory to allow greater bandwidth and lower latency |
Fig 1.14 of Aspects of Computational Science, editor Aad van der Steen, published by NCF |
This design enhances performance by noting that many applications calculate "vector-like" operations |
This allows one to address two performance problems |
They are typified by the Cray 1, XMP, YMP and C-90, the CDC-205, the ETA-10, and Japanese supercomputers from NEC, Fujitsu and Hitachi |
A pipeline for vector addition looks like: |
Vector machines pipeline data through the CPU (see the small Fortran sketch at the end of this discussion) |
They are not so popular/relevant as in the past as |
In fact the excellence of, say, the Cray C-90 is due to its very good memory architecture, which allows one to fetch enough operands to sustain the pipeline |
Most workstation class machines have "good" CPUs but can never get enough data from memory to sustain good performance, except for a few cache intensive applications |
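To make the "vector-like" operations above concrete, here is the kind of loop a vector machine pipelines well. This is a generic Fortran 90 illustration with arbitrary array size and scalar, given input vectors B and C; it is not a transcription of any foil.

      REAL :: A(100000), B(100000), C(100000), S
      INTEGER :: I
      S = 2.0
      ! Each iteration is independent, so once the add and multiply pipelines are full
      ! the machine can retire roughly one result per clock -- provided memory can
      ! deliver two operands and accept one result per cycle (the C-90 point above).
      DO I = 1, 100000
         A(I) = B(I) + S * C(I)
      END DO
      ! In Fortran 90 array syntax the same operation is simply:  A = B + S * C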
Very high speed computing systems, Proc. of the IEEE 54, 12, pp. 1901-1909 (1966), and |
Some Computer Organizations and their Effectiveness, IEEE Trans. on Computers C-21, 948-960 (1972) -- both papers by M.J. Flynn |
SISD -- Single Instruction stream, Single Data Stream -- i.e. von Neumann Architecture |
MISD -- Multiple Instruction stream, Single Data Stream -- Not interesting |
SIMD -- Single Instruction stream, Multiple Data Stream |
MIMD -- Multiple Instruction stream and Multiple Data Stream -- dominant parallel system with ~one to ~one match of instruction and data streams. |
Memory Structure of Parallel Machines |
and Heterogeneous mixtures |
Shared (Global): There is a global memory space, accessible by all processors. |
Distributed (Local, Message-Passing): All memory is associated with processors. |
Memory can be accessed directly (analogous to a phone call) as in red lines below or indirectly by message passing (green line below) |
We show two processors in a MIMD machine for distributed (left) or shared(right) memory architectures |
Uniform: All processors take the same time to reach all memory locations. |
Nonuniform (NUMA): Memory access is not uniform, so it takes a different time for a given processor to get data from each memory bank. This is natural for distributed memory machines but also true in most modern shared memory machines |
Most NUMA machines these days have two memory access times |
This simple two level memory access model gets more complicated in proposed 10 year out "petaflop" designs |
SIMD -- lockstep synchronization |
MIMD - Each Processor executes independent instruction streams |
MIMD Synchronization can take several forms |
MIMD Distributed Memory |
MIMD with logically shared memory but usually physically distributed. The latter is sometimes called distributed shared memory. |
A special case of this is a network of workstations (NOW's) or personal computers (metacomputer) |
Issues include: |
SIMD -- Single Instruction Multiple Data -- can have logically distributed or shared memory |
CM-2 -- 64K processors with 1 bit arithmetic -- hypercube network, a broadcast network that can also combine, and a "global or" network |
Maspar, DECmpp -- 16K processors with 4 bit (MP-1) or 32 bit (MP-2) arithmetic, a fast two-dimensional mesh and a slower general switch for communication |
We also have the heterogeneous compound architecture (metacomputer), obtained by an arbitrary combination of MIMD or SIMD, sequential or parallel machines. |
Metacomputers can vary from full collections of several hundred PC's/Settop boxes on the (future) World Wide Web to a CRAY C-90 connected to a CRAY T3D |
This is a critical future architecture which is intrinsically distributed memory, as multi-vendor heterogeneity implies that one cannot have special hardware enhanced shared memory |
Cluster of workstations or PC's |
Heterogeneous MetaComputer System |
One example is an Associative memory - SIMD or MIMD or content addressable memories |
This is an example of a special purpose "signal" processing machine which can in fact be built from "conventional" SIMD or MIMD architectures |
This type of machine is not so popular as most applications are not dominated by computations for which good special purpose devices can be designed |
If only 10% of a problem is, say, "track-finding" or some other special purpose processing, then who cares if you reduce that 10% by a factor of 100? |
N body problems (e.g. Newton's laws for one million stars in a globular cluster) can have successful special purpose devices |
See the GRAPE (GRAvity PipE) machine (Sugimoto et al., Nature 345, page 90, 1990) |
Note GRAPE uses EXACTLY the same parallel algorithm that one finds in the books (e.g. Solving Problems on Concurrent Processors) for N-body problems on classic distributed memory MIMD machines |
GRAPE will execute the classic O(N^2) (parallel) N body algorithm BUT this is not the algorithm used in most such computations |
Rather there is the O(N) or O(N log N) so-called "fast multipole" algorithm, which uses a hierarchical approach |
So special purpose devices cannot usually take advantage of new nifty algorithms! |
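To make the classic O(N^2) algorithm mentioned above concrete, here is a minimal Fortran 90 sketch of the force loop, given positions and masses. The units (G = 1), the softening parameter EPS2, and the data layout are illustrative assumptions for this sketch; it is not GRAPE's hardware pipeline or the course's code.

      INTEGER, PARAMETER :: N = 1000
      REAL :: X(N), Y(N), Z(N), M(N)              ! given positions and masses
      REAL :: FX(N), FY(N), FZ(N)                 ! accumulated force components
      REAL :: DX, DY, DZ, R2, F
      REAL, PARAMETER :: EPS2 = 1.0E-6            ! small softening to avoid division by zero
      INTEGER :: I, J
      FX = 0.0;  FY = 0.0;  FZ = 0.0
      DO I = 1, N                                 ! O(N^2): every particle sees every other one
         DO J = 1, N
            IF (I == J) CYCLE
            DX = X(J) - X(I);  DY = Y(J) - Y(I);  DZ = Z(J) - Z(I)
            R2 = DX*DX + DY*DY + DZ*DZ + EPS2
            F  = M(I) * M(J) / (R2 * SQRT(R2))    ! with G = 1; F*DX etc. are the components
            FX(I) = FX(I) + F*DX
            FY(I) = FY(I) + F*DY
            FZ(I) = FZ(I) + F*DZ
         END DO
      END DO

The fast multipole and tree methods replace the inner loop over all J by a hierarchy of cells, which is exactly the kind of restructuring a hard-wired pipeline cannot absorb.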
Coarse-grain: Task is broken into a handful of pieces, each executed by powerful processors. |
Medium-grain: Tens to few thousands of pieces, typically executed by microprocessors. |
Fine-grain: Thousands to perhaps millions of small pieces, executed by very small, simple processors (several per chip) or through pipelines. |
Note that a machine of given granularity can be used on algorithms of the same or finer granularity |
The last major architectural feature of a parallel machine is the network or design of hardware/software connecting processors and memories together. |
Bus: All processors (and memory) connected to a common bus or busses. |
Switching Network: Processors (and memory) connected to routing switches, as in a telephone system. |
Two dimensional grid, Binary tree, complete interconnect and 4D Hypercube. |
Communication (operating system) software ensures that the system appears fully connected even if the physical connections are partial |
Useful terms include: |
Scalability: Can network be extended to very large systems? Related to wire length (synchronization and driving problems), degree (pinout) |
Fault Tolerance: How easily can system bypass faulty processor, memory, switch, or link? How much of system is lost by fault? |
Blocking: Some communication requests may not get through, due to conflicts caused by other requests. |
Nonblocking: All communication requests succeed. Sometimes just applies as long as no two requests are for same memory cell or processor. |
Latency (delay): Maximal time for nonblocked request to be transmitted. |
Bandwidth: Maximal total rate (MB/sec) of system communication, or subsystem-to-subsystem communication. Sometimes determined by cutsets, which cut all communication between subsystems. Often useful in providing lower bounds on time needed for task. |
Wormhole Routing -- Intermediate switch nodes do not wait for the full message but allow it to pass through in small packets |
From Aspects of Computational Science, Editor Aad van der Steen, published by NCF |
System                  Communication Speed (Mbytes/sec)   Computation Speed (Mflops) |
IBM SP2                            40                           267 |
Intel iPSC860                       2.8                          60 |
Intel Paragon                     200                            75 |
Kendall Square KSR-1               17.1                          40 |
Meiko CS-2                        100                           200 |
Parsytec GC                        20                            25 |
TMC CM-5                           20                           128 |
Cray T3D                          150                           300 |
tcomm = 4 or 8 / (Communication Speed in Mbytes/sec) -- the time in microseconds to communicate one 32 bit or 64 bit word |
tfloat = 1 / (Computation Speed in Mflops) -- the time in microseconds for one floating point operation |
Thus tcomm / tfloat is just 4 X Computation Speed divided by Communication Speed (for 32 bit words) |
tcomm / tfloat is 26.7, 85, 1.5, 9.35, 8, 5, 25.6, 8 for the machines SP2, iPSC860, Paragon, KSR-1, Meiko CS2, Parsytec GC, TMC CM5, and Cray T3D respectively |
Latency makes the situation worse for small messages, and the ratio doubles for the 64 bit arithmetic natural on large problems! |
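As a check on the formula above, applied to the IBM SP2 row of the table (32 bit words, so 4 bytes per word):

$$ \frac{t_{comm}}{t_{float}} = \frac{4/\mathrm{(Communication\ Speed)}}{1/\mathrm{(Computation\ Speed)}} = \frac{4 \times 267}{40} \approx 26.7 $$

which reproduces the first entry in the list above.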
This starts by considering the analytic form for communication overhead and illustrates its stencil dependence in simple local cases -- stressing the relevance of grain size |
The implications for scaling and for generalizing from the Laplace example are covered |
We noted some useful material was missing and this was continued in the next lecture (Sept 10, 96) |
The lecture starts coverage of computer architecture, covering base technologies, with the CMOS discussed in an earlier lecture contrasted to Quantum and Superconducting technology |
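The analytic expression itself is not reproduced in these notes; as a point of reference, the standard form such an analysis takes (stated here as an assumption, not a transcription of the foil) is, for a local stencil in d dimensions with grain size n (points stored per processor),

$$ f_C \;\approx\; \frac{\mathrm{constant}}{n^{1/d}} \cdot \frac{t_{comm}}{t_{float}} $$

so the fractional communication overhead falls as the grain size grows, which is the relevance of grain size stressed above.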
This starts by filling in details of communication overhead in parallel processing for the case where the "range" of interaction is large |
We show two old examples from Caltech that illustrate the correctness of the analytic form |
We return to the discussion of computer architectures, describing |
This continues the computer architecture discussion with |
This continues the discussion of Fortran 90 with a set of overview remarks on each of the key new capabilities of this language |
We also comment on the value of Fortran90/HPF in a world that will switch to Java |
We digress to discuss a general theory of problem architectures, as this explains such things as tradeoffs |
And the types of applications each software model is designed to address |
(Note: server errors at the start confuse the audio) |
This quickly completes the discussion of problem architecture, but rather than continuing the qualitative discussion of HPF applications in the notes |
we jumped to a discussion of the HPF language, describing |
the basic approach to parallelism with the "owner-computes" rule |
and the types of new constructs, with |
TEMPLATE, ALIGN and PROCESSORS described |
The lecture started with a description of the Web based Programming Laboratory developed by Kivanc Dincer |
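As a minimal sketch of the constructs named above; the processor arrangement P, template T, array names and sizes are arbitrary choices for illustration, and DISTRIBUTE is included to complete the picture even though distribution is treated in a later lecture:

      REAL :: A(1000), B(1000)
!HPF$ PROCESSORS P(4)                 ! an abstract arrangement of 4 processors
!HPF$ TEMPLATE T(1000)                ! an abstract index space to line data up against
!HPF$ ALIGN A(I) WITH T(I)            ! A(I) and B(I) always live on the same processor
!HPF$ ALIGN B(I) WITH T(I)
!HPF$ DISTRIBUTE T(BLOCK) ONTO P      ! T, and hence A and B, split into 4 contiguous blocks
      A = A + B                       ! owner-computes: each processor updates only the block it owns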
Loosely Synchronous, Synchronous or Asynchronous classify problems by their "control" or "synchronization" structure |
However there is an important class of problems where this does not matter, as the synchronization overhead -- even in the difficult asynchronous case -- is irrelevant |
This is when the overhead is small or zero. These are "embarrassingly parallel" problems, where each component of the decomposed problem is essentially independent |
Examples are: |
We go through the 2D Laplace's Equation with both HPF and MPI for Simple Jacobi Iteration |
HPF and Fortran90 are reviewed followed by MPI |
We also discuss the structure of problems as these determine why and when certain software approaches are appropriate |
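A minimal Fortran 90 / HPF sketch of the Jacobi iteration for the 2D Laplace equation in the data parallel style; the 256 x 256 interior, the fixed 1000 sweeps and the absence of a convergence test are illustrative simplifications, not the course's actual code:

      REAL, DIMENSION(0:257,0:257) :: U, UNEW     ! interior 1:256 plus fixed boundary values
!HPF$ DISTRIBUTE U(BLOCK,BLOCK)
!HPF$ ALIGN UNEW(I,J) WITH U(I,J)
      INTEGER :: ITER
      DO ITER = 1, 1000
         ! Jacobi sweep: each interior point becomes the average of its four neighbours
         UNEW(1:256,1:256) = 0.25 * ( U(0:255,1:256) + U(2:257,1:256)    &
                                    + U(1:256,0:255) + U(1:256,2:257) )
         U(1:256,1:256) = UNEW(1:256,1:256)
      END DO

In the message passing version the same update is written with each processor owning a block of U and explicitly exchanging the one-deep "ghost" boundary rows and columns with its neighbours before every sweep.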
This continues the discussion of HPF in the area of distribution and ALIGN statements. |
The discussion of ALIGN should be improved, as the audio makes dubious statements about "broadcasting" information. |
The distribution discussion includes a reasonable description of BLOCK and CYCLIC and when you should use them. |
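A small illustration of the two distributions; the array names and sizes are made up for this sketch, with 4 processors assumed:

      REAL :: XB(16), XC(16)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE XB(BLOCK)  ONTO P    ! contiguous pieces: P(1) owns XB(1:4), P(2) owns XB(5:8), ...
!HPF$ DISTRIBUTE XC(CYCLIC) ONTO P    ! round robin: P(1) owns XC(1), XC(5), XC(9), XC(13), etc.

BLOCK keeps neighbouring elements together and so minimizes communication for local (stencil) operations like Jacobi; CYCLIC spreads work evenly when the active part of an array shrinks or varies, as in LU factorization.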
This covers two topics: |
Monte Carlo Integration for large scale Problems using Experimental and Theoretical high energy physics as an example |
This includes accept-reject methods, uniform weighting and parallel algorithms |
Then we complete the HPF discussion with the embarrassingly parallel DO INDEPENDENT, discussed in the Monte Carlo case |
And HPF2 Changes |
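A hedged sketch of the kind of embarrassingly parallel loop DO INDEPENDENT is meant for; the integrand (whose average over (0,1) is pi), the sample count, and the data layout are assumptions for illustration only:

      INTEGER, PARAMETER :: N = 1000000
      INTEGER :: I
      REAL :: X(N), F(N), ESTIMATE
!HPF$ DISTRIBUTE X(BLOCK)
!HPF$ ALIGN F(I) WITH X(I)
      CALL RANDOM_NUMBER(X)                  ! N independent samples in (0,1)
!HPF$ INDEPENDENT
      DO I = 1, N                            ! iterations touch disjoint data: embarrassingly parallel
         F(I) = 4.0 / (1.0 + X(I)*X(I))
      END DO
      ESTIMATE = SUM(F) / N                  ! the Monte Carlo estimate (here, of pi)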
This starts by finishing the simple overview of statistics |
Covering Gaussian Random Numbers, Numerical Generation of Random Numbers both sequentially and in parallel |
Then we describe the central limit theorem which underlies the Monte Carlo method |
Then it returns to Numerical Integration with the first part of the discussion of Monte Carlo Integration |
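As background (a standard result rather than a transcription of the foils): for N independent uniform samples $x_i$ of an integrand $f$ on the unit volume,

$$ I \;\approx\; \frac{1}{N}\sum_{i=1}^{N} f(x_i), \qquad \mathrm{error} \;\sim\; \frac{\sigma_f}{\sqrt{N}} $$

where $\sigma_f$ is the standard deviation of $f$. The $1/\sqrt{N}$ falloff is exactly what the central limit theorem guarantees, and it is independent of dimension, which is why Monte Carlo is attractive for the high dimensional integrals of particle physics.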
This finishes the last part of the N body and ODE discussion, focusing on the pipeline data parallel algorithm |
Note several foils were changed after the presentation, so the discussion is a little disconnected from the foils at times |
We start Numerical Integration with a basic discussion of Newton-Cotes formulae (including Trapezoidal and Simpson's rule) |
We illustrate them pictorially |
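For reference, the two Newton-Cotes formulae named above in their standard composite form, for N equal intervals of width $h = (b-a)/N$ on $[a,b]$ with $f_i = f(a + ih)$:

$$ \mathrm{Trapezoidal:}\quad \int_a^b f\,dx \;\approx\; h\left[\tfrac{1}{2}f_0 + f_1 + \dots + f_{N-1} + \tfrac{1}{2}f_N\right] $$
$$ \mathrm{Simpson\ (N\ even):}\quad \int_a^b f\,dx \;\approx\; \frac{h}{3}\left[f_0 + 4f_1 + 2f_2 + 4f_3 + \dots + 2f_{N-2} + 4f_{N-1} + f_N\right] $$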
This completes the MPI general discussion with the basic message passing, collective communication and some advanced features |
It then returns to the Laplace Example foilset to show how MPI can be used here |
This starts the basic module on Partial Differential Equation solvers with |
Introduction to Classification of Equations |
Basic Discretization |
Derivation of Sparse Matrix Formulation |
Analogies of Iteration with Artificial Time |
Start of Explicit Matrix Formulation for Simple Cases |
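As a reminder of the basic discretization step listed above (the standard 5 point form for the 2D Laplace equation on a uniform grid of spacing h):

$$ \nabla^2 u \approx \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{i,j}}{h^2} = 0
   \;\;\Longrightarrow\;\; u_{i,j} = \tfrac{1}{4}\left(u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}\right) $$

Writing this relation for every interior grid point is what produces the sparse matrix formulation, and repeatedly applying the right hand side (Jacobi iteration) is the "artificial time" view of the solution process.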
Tom Haupt on Fortran 90 / HPF |
This discusses the solution of systems of ordinary differential equations (ODEs) in the context of O(N^2) particle dynamics problems |
We start with motivation, with a brief discussion of relevant problems |
We go through the basic theory of ODEs including Euler, Runge-Kutta, Predictor-Corrector and Multistep Methods |
We begin the discussion of solving N body problems using the classic Connection Machine algorithm -- elegant but inefficient |
Note -- some foils were expanded to two after the talk and the second one is given without audio in the script |
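For $dy/dt = f(t,y)$ with step h, the two simplest schemes named above take the standard forms (background, not a transcript of the foils):

$$ \mathrm{Euler:}\quad y_{n+1} = y_n + h\,f(t_n, y_n) $$
$$ \mathrm{RK4:}\quad k_1 = f(t_n, y_n),\;\; k_2 = f(t_n + \tfrac{h}{2}, y_n + \tfrac{h}{2}k_1),\;\; k_3 = f(t_n + \tfrac{h}{2}, y_n + \tfrac{h}{2}k_2),\;\; k_4 = f(t_n + h, y_n + h k_3) $$
$$ y_{n+1} = y_n + \tfrac{h}{6}\left(k_1 + 2k_2 + 2k_3 + k_4\right) $$

Euler is first order and cheap; fourth order Runge-Kutta costs four evaluations of f per step but is far more accurate, which matters when each evaluation is the O(N^2) force calculation of the N body problem.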
This started with a description of the current Web set-up of CPS615 and other foilsets |
Then we started the foilset describing Physical Simulations and the various approaches -- Continuum Physics, Monte Carlo, Quantum Dynamics, and Computational Fluid Dynamics |
For CFD we do enough to discuss why viscosity and High Reynolds numbers are critical in air and similar media |
We discuss computation and communication needs of CFD compared to Laplace equation |
This covers essentially all of the finite element method and its solution using the conjugate gradient method |
Only the simple 2D Laplace equation using triangular nodes is discussed |
We stress the variational method as an optimization method and use this analogy to motivate conjugate gradient as an improved steepest descent approach |
We discuss parallel computing issues for both finite element and conjugate gradient |
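For reference, one iteration of conjugate gradient for $Ax = b$ in its standard form (the lecture's notation may differ), starting from $r_0 = b - Ax_0$, $p_0 = r_0$:

$$ \alpha_k = \frac{r_k^T r_k}{p_k^T A p_k},\quad x_{k+1} = x_k + \alpha_k p_k,\quad r_{k+1} = r_k - \alpha_k A p_k,\quad
   \beta_k = \frac{r_{k+1}^T r_{k+1}}{r_k^T r_k},\quad p_{k+1} = r_{k+1} + \beta_k p_k $$

The parallel work per iteration is one sparse matrix-vector product, two dot products (global reductions), and a few vector updates, which is why the finite element and Laplace parallelization strategies carry over directly.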
This lecture covers two distinct areas. |
Firstly, a short discussion of Linear Programming -- what types of problems it is used for, what the equations look like, and basic issues in the difficult use of parallel processing |
Then we give an abbreviated discussion of Full Matrix algorithms covering |