Full HTML for Basic foilset Master Set of Foils for 1996 Session of CPS615

Given by Geoffrey C. Fox at Basic Simulation Track for Computational Science CPS615 on Fall Semester 96. Foils prepared 10 Sept 1996
Outside Index Summary of Material Secs 80


Overview of Course Itself! -- and then introductory material on basic curricula
Overview of National Program -- The Grand Challenges
Overview of Technology Trends leading to petaflop performance in year 2007 (hopefully)
Overview of Syracuse and National programs in computational science
Parallel Computing in Society
Why Parallel Computing works
Simple Overview of Computer Architectures
  • SIMD, MIMD, Distributed (shared memory) Systems ... PIM ... Quantum Computing
General Discussion of Message Passing and Data Parallel Programming Paradigms and a comparison of languages

Table of Contents for full HTML of Master Set of Foils for 1996 Session of CPS615

1 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Introduction to Driving Technology and HPCC
Current Status and Futures

2 Abstract of The Current Status and Futures of HPCC
3 Basic Course CPS615 Contact Points
4 Course Organization
5 Basic Structure of Complete CPS615 Base Course on Computational Science Simulation Track -- I
6 Basic Structure of Complete CPS615 Base Course on Computational Science Simulation Track -- II
7 Basic Structure of Complete CPS615 Base Course on Computational Science Simulation Track -- III
8 Three Major Markets -- Logic, ASIC, DRAM
9 Chip and Package Characteristics
10 Fabrication Characteristics
11 Electrical Design and Test Metrics
12 Technologies for High Performance Computers
13 Architectures for High Performance Computers - I
14 Architectures for High Performance Computers - II
15 There is no Best Machine!
16 Quantum Computing - I
17 Quantum Computing - II
18 Quantum Computing - III
19 Superconducting Technology -- Past
20 Superconducting Technology -- Present
21 Superconducting Technology -- Problems
22 Architecture Classes of High Performance Computers
23 von Neumann Architecture in a Nutshell
24 Illustration of Importance of Cache
25 Vector Supercomputers in a Nutshell - I
26 Vector Supercomputing in a picture
27 Vector Supercomputers in a Nutshell - II
28 Flynn's Classification of HPC Systems
29 Parallel Computer Architecture Memory Structure
30 Comparison of Memory Access Strategies
31 Types of Parallel Memory Architectures -- Physical Characteristics
32 Diagrams of Shared and Distributed Memories
33 Parallel Computer Architecture Control Structure
34 Some Major Hardware Architectures - MIMD
35 MIMD Distributed Memory Architecture
36 Some Major Hardware Architectures - SIMD
37 SIMD (Single Instruction Multiple Data) Architecture
38 Some Major Hardware Architectures - Mixed
39 Some MetaComputer Systems
40 Comments on Special Purpose Devices
41 The GRAPE N-Body Machine
42 Why isn't GRAPE a Perfect Solution?
43 Granularity of Parallel Components - I
44 Granularity of Parallel Components - II
45 Classes of Communication Networks
46 Switch and Bus based Architectures
47 Examples of Interconnection Topologies
48 Useful Concepts in Communication Systems
49 Communication Performance of Some MPP's
50 Implication of Hardware Performance
51 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 5 - 1996

52 Abstract of Sept 5 1996 CPS615 Lecture
53 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 10 - 1996

54 Abstract of Sept 10 1996 CPS615 Lecture
55 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 12 - 1996

56 Abstract of Sept 12 1996 CPS615 Lecture
57 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 24 - 1996

58 Abstract of Sept 24 1996 CPS615 Lecture
59 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 26 - 1996

60 Abstract of Sept 26 1996 CPS615 Lecture
61 Embarrassingly Parallel Problem Class
62 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
HPCC Software Technologies
HPF and MPI

63 Abstract of CPS615 HPCC Software Technologies
64 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 1 - 1996

65 Abstract of Oct 1 1996 CPS615 Lecture
66 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 3 - 1996

67 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 10 - 1996

68 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 15 - 1996

69 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 22 - 1996

70 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 24 - 1996

71 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 31 - 1996

72 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 7 - 1996

73 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 8 - 1996

74 Abstract of Oct 24 1996 CPS615 Lecture
75 Abstract of Oct 22 1996 CPS615 Lecture
76 Abstract of Oct 15 1996 CPS615 Lecture
77 Abstract of Nov 7 1996 CPS615 Lecture
78 Abstract of Nov 8 1996 CPS615 Lecture
79 Abstract of Oct 3 1996 CPS615 Lecture
80 Abstract of Oct 10 1996 CPS615 Lecture
81 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 14 - 1996

82 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 26 - 1996

83 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of December 5 - 1996

84 Abstract of Nov 14 1996 CPS615 Lecture
85 Abstract of Nov 26 1996 CPS615 Lecture
86 Abstract of Dec 5 1996 CPS615 Lecture

Outside Index Summary of Material



HTML version of Basic Foils prepared 10 Sept 1996

Foil 1 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Introduction to Driving Technology and HPCC
Current Status and Futures

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 31
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 2 Abstract of The Current Status and Futures of HPCC

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 80
Overview of Course Itself! -- and then introductory material on basic curricula
Overview of National Program -- The Grand Challenges
Overview of Technology Trends leading to petaflop performance in year 2007 (hopefully)
Overview of Syracuse and National programs in computational science
Parallel Computing in Society
Why Parallel Computing works
Simple Overview of Computer Architectures
  • SIMD, MIMD, Distributed (shared memory) Systems ... PIM ... Quantum Computing
General Discussion of Message Passing and Data Parallel Programming Paradigms and a comparison of languages

HTML version of Basic Foils prepared 10 Sept 1996

Foil 3 Basic Course CPS615 Contact Points

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 50
Instructor: Geoffrey Fox gcf@npac.syr.edu 3154432163 Room 3-131 CST
Backup: Nancy McCracken njm@npac.syr.edu 3154434687 Room 3-234 CST
NPAC Administrative support: Nora Downey-Easter nora@npac.syr.edu 3154431722 Room 3-206 CST
The CPS615 "powers that be" above can be reached at cps615ad@npac.syr.edu
CPS615 Students can be reached by mailing cps615@npac.syr.edu
Homepage will be:
http://www.npac.syr.edu/projects/cps615fall96
See my paper SCCS 736 as an overview of HPCC status

HTML version of Basic Foils prepared 10 Sept 1996

Foil 4 Course Organization

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 31
Graded on the basis of approximately 8 homeworks, each due the Thursday of the week following the day it is given out (Tuesday or Thursday)
Plus one modest sized project at the end of class -- must involve "real" running parallel code!
No finals or written exams
All material will be placed on World Wide Web(WWW)
Preference given to work returned on the Web

HTML version of Basic Foils prepared 10 Sept 1996

Foil 5 Basic Structure of Complete CPS615 Base Course on Computational Science Simulation Track -- I

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 48
Overview of National Scene -- Why is High Performance Computing Important
  • Grand Challenges
What is Computational Science -- The Program at Syracuse
Basic Technology Situation -- Increasing density of transistors on a chip
  • Trends to year 2007 using Moore's Law (see UVC Video)
Elementary Discussion of Parallel Computing including use in society
  • why does parallel computing always "work" in principle
Computer Architecture -- Parallel and Sequential
  • Network Interconnections, SIMD v. MIMD, Distributed Shared Memory
  • vectorization contrasted with parallelism

HTML version of Basic Foils prepared 10 Sept 1996

Foil 6 Basic Structure of Complete CPS615 Base Course on Computational Science Simulation Track -- II

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 72
Simple base example -- Laplace's Equation
  • How does parallel computing work
This is followed by two sections -- software technologies and applications which are interspersed with each other and "algorithm" modules
Programming Models -- Message Passing and Data Parallel Computing -- MPI and HPF (Fortran 90)
  • Some remarks on parallel compilers
  • Remarks on use of parallel Java
Some real applications analysed in detail
  • Chemistry, CFD, Earthquake prediction, Statistical Physics

HTML version of Basic Foils prepared 10 Sept 1996

Foil 7 Basic Structure of Complete CPS615 Base Course on Computational Science Simulation Track -- III

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 96
This introduction is followed by a set of "vignettes" discussing problem classes which illustrate parallel programming and parallel algorithms
Ordinary Differential Equations
  • N body Problem by both O(N^2) and "fast multipole" O(N) method
Numerical Integration including adaptive methods
Floating Point Arithmetic
Monte Carlo Methods including Random Numbers
Full Matrix Algebra as in
  • Computational Electromagnetism
  • Computational Chemistry
Partial Differential Equations implemented as sparse matrix problems (as in Computational Fluid Dynamics)
  • Iterative Algorithms from Gauss Seidel to Conjugate Gradient
  • Finite Element Methods

HTML version of Basic Foils prepared 10 Sept 1996

Foil 8 Three Major Markets -- Logic, ASIC, DRAM

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 47
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994
L=Logic, D=DRAM, A=ASIC, mP = microprocessor

HTML version of Basic Foils prepared 10 Sept 1996

Foil 9 Chip and Package Characteristics

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 37
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

HTML version of Basic Foils prepared 10 Sept 1996

Foil 10 Fabrication Characteristics

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 25
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

HTML version of Basic Foils prepared 10 Sept 1996

Foil 11 Electrical Design and Test Metrics

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index Secs 34
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

HTML version of Basic Foils prepared 10 Sept 1996

Foil 12 Technologies for High Performance Computers

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
We can choose technology and architecture separately in designing our high performance system
Technology is like choosing ants, people or tanks as the basic units in our society analogy
  • or less frivolously neurons or brains
In HPCC arena, we can distinguish current technologies
  • COTS (Commercial off-the-shelf) Microprocessors
  • Custom node computer architectures
  • More generally these are all CMOS technologies
Near term technology choices include
  • Gallium Arsenide or Superconducting materials as opposed to Silicon
  • These are faster by a factor of 2 (GaAs) to 300 (Superconducting)
Further term technology choices include
  • DNA (Chemical) or Quantum technologies
It will cost $40 Billion for next industry investment in CMOS plants and this huge investment makes it hard for new technologies to "break in"

HTML version of Basic Foils prepared 10 Sept 1996

Foil 13 Architectures for High Performance Computers - I

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Architecture is equivalent to organization or design in society analogy
  • Different models for society (Capitalism etc.) or different types of groupings in a given society
  • Businesses or Armies are more precisely controlled/organized than a crowd at the State Fair
  • We will generalize this to formal (army) and informal (crowds) organizations
We can distinguish formal and informal parallel computers
Informal parallel computers are typically "metacomputers"
  • i.e. a bunch of computers sitting on a department network

HTML version of Basic Foils prepared 10 Sept 1996

Foil 14 Architectures for High Performance Computers - II

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Metacomputers are a very important trend; they use similar software and algorithms to conventional "MPP's" but typically have less well optimized parameters
  • In particular network latency is higher and bandwidth is lower for an informal HPC
  • Latency is time for zero length communication -- start up time
Formal high performance computers are the classic (basic) object of study and are
"closely coupled" specially designed collections of compute nodes which have (in principle) been carefully optimized and balanced in the areas of
  • Processor (computer) nodes
  • Communication (internal) Network
  • Linkage of Memory and Processors
  • I/O (external network) capabilities
  • Overall Control or Synchronization Structure

HTML version of Basic Foils prepared 10 Sept 1996

Foil 15 There is no Best Machine!

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
In society, we see a rich set of technologies and architectures
  • Ant Hills
  • Brains as bunch of neurons
  • Cities as informal bunch of people
  • Armies as formal collections of people
With several different communication mechanisms with different trade-offs
  • One can walk -- low latency, low bandwidth
  • Go by car -- high latency (especially if can't park), reasonable bandwidth
  • Go by air -- higher latency and bandwidth than car
  • Phone -- High speed at long distance but can only communicate modest material (low capacity)

HTML version of Basic Foils prepared 10 Sept 1996

Foil 16 Quantum Computing - I

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Quantum-Mechanical Computers by Seth Lloyd, Scientific American, Oct 95
Chapter 6 of The Feynman Lectures on Computation edited by Tony Hey and Robin Allen, Addison-Wesley, 1996
Quantum Computing: Dream or Nightmare? Haroche and Raimond, Physics Today, August 96 page 51
Basically any physical system can "compute" as one "just" needs a system that gives answers that depend on inputs and all physical systems have this property
Thus one can build "superconducting" "DNA" or "Quantum" computers exploiting respectively superconducting molecular or quantum mechanical rules

HTML version of Basic Foils prepared 10 Sept 1996

Foil 17 Quantum Computing - II

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
For a "new technology" computer to be useful, one needs to be able to
  • conveniently prepare inputs,
  • conveniently program,
  • reliably produce answer (quicker than other techniques), and
  • conveniently read out answer
Conventional computers are built around bit ( taking values 0 or 1) manipulation
One can build arbitrarily complex arithmetic if one has some way of implementing NOT and AND (see the sketch below)
Quantum Systems naturally represent bits
  • A spin (of say an electron or proton) is either up or down
  • A hydrogen atom is either in lowest or (first) excited state etc.
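
To make the NOT/AND claim above concrete, here is a minimal sketch (Python, added purely for illustration and not part of the original foils) that builds OR, XOR and a one-bit half adder from those two gates alone; chaining half adders gives arbitrary-precision addition.

# Illustrative only: everything below is derived from NOT and AND
def NOT(a): return 1 - a
def AND(a, b): return a & b

# De Morgan: OR from NOT and AND
def OR(a, b): return NOT(AND(NOT(a), NOT(b)))

# XOR = (a OR b) AND NOT(a AND b)
def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))

# One-bit half adder: the building block of binary arithmetic
def half_adder(a, b):
    return XOR(a, b), AND(a, b)   # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))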

HTML version of Basic Foils prepared 10 Sept 1996

Foil 18 Quantum Computing - III

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Interactions between quantum systems can cause "spin-flips" or state transitions and so implement arithmetic
Incident photons can "read" state of system and so give I/O capabilities
Quantum "bits" called qubits have another property as one has not only
  • State |0> and state |1> but also
  • Coherent states such as .7071*(|0> + |1>) which are equally in either state
Lloyd describes how such coherent states provide new types of computing capabilities
  • Natural random number as measuring state of qubit gives answer 0 or 1 randomly with equal probability
  • As Feynman suggests, qubit based computers are natural for large scale simulation of quantum physical systems -- this is "just" analog computing

HTML version of Basic Foils prepared 10 Sept 1996

Foil 19 Superconducting Technology -- Past

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Superconductors produce wonderful "wires" which transmit picosecond (10^-12 seconds) pulses at near speed of light
  • Superconducting is lower power and faster than diffusive electron transmission in CMOS
  • At about 0.35 micron chip feature size, CMOS transmission time changes from domination by transmission (distance) issues to resistive (diffusive) effects
Niobium used in constructing such superconducting circuits can be processed by similar fabrication techniques to CMOS
Josephson Junctions allow picosecond performance switches
BUT IBM (1969-1983) and Japan (MITI 1981-90) terminated major efforts in this area

HTML version of Basic Foils prepared 10 Sept 1996

Foil 20 Superconducting Technology -- Present

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
New ideas have resurrected this concept using RSFQ -- Rapid Single Flux Quantum -- approach
This naturally gives a bit which is 0 or 1 (or in fact n units!)
This gives interesting circuits of similar structure to CMOS systems but with a clock speed of order 100-300GHz -- factor of 100 better than CMOS which will asymptote at around 1 GHz (= one nanosecond cycle time)

HTML version of Basic Foils prepared 10 Sept 1996

Foil 21 Superconducting Technology -- Problems

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
At least two major problems:
Semiconductor industry will invest some $40B in CMOS "plants" and infrastructure
  • Currently perhaps $100M a year going into superconducting circuit area!
  • How do we "bootstrap" superconducting industry?
Cannot build memory to match CPU speed and current designs have superconducting CPU's (with perhaps 256 Kbytes superconducting memory per processor) but conventional CMOS memory
  • So compared with current computers one would have a thousand times faster CPU, a cache a factor of four smaller (though running at CPU speed), and basic memory of the same speed as now
  • Can such machines perform well -- need new algorithms?
  • Can one design new superconducting memories?
Superconducting technology also has a bad "name" due to IBM termination!

HTML version of Basic Foils prepared 10 Sept 1996

Foil 22 Architecture Classes of High Performance Computers

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Sequential or von Neumann Architecture
Vector (Super)computers
Parallel Computers
  • with various architectures classified by Flynn's methodology (this is incomplete as only discusses control or synchronization structure )
  • SISD
  • MISD
  • MIMD
  • SIMD
  • Metacomputers

HTML version of Basic Foils prepared 10 Sept 1996

Foil 23 von Neumann Architecture in a Nutshell

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Instructions and data are stored in the same memory, for which there is a single link (the von Neumann Bottleneck) to the CPU which decodes and executes instructions
The CPU can have multiple functional units
The memory access can be enhanced by use of caches made from faster memory to allow greater bandwidth and lower latency

HTML version of Basic Foils prepared 10 Sept 1996

Foil 24 Illustration of Importance of Cache

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Fig 1.14 of Aspects of Computational Science
Editor Aad van der Steen
published by NCF

HTML version of Basic Foils prepared 10 Sept 1996

Foil 25 Vector Supercomputers in a Nutshell - I

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This design enhances performance by noting that many applications calculate "vector-like" operations
  • Such as c(i)=a(i)+b(i) for i=1...N and N quite large
This allows one to address two performance problems
  • Latency in accessing memory (e.g. could take 10-20 clock cycles between requesting a particular memory location and delivery of result to CPU)
  • A complex operation , e.g. a floating point operation, can take a few machine cycles to complete
They are typified by the Cray 1, XMP, YMP, C-90, CDC-205, ETA-10 and Japanese supercomputers from NEC, Fujitsu and Hitachi
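
As a concrete picture of the "vector-like" operation c(i)=a(i)+b(i) above, here is a minimal sketch (Python/NumPy, used only for illustration; the course itself works in Fortran) contrasting an element-by-element loop with the whole-array form that a vector pipeline, or a Fortran 90 array assignment, exploits.

import numpy as np

N = 100_000
a = np.random.rand(N)
b = np.random.rand(N)

# Scalar style: one addition issued per loop iteration
c_loop = np.empty(N)
for i in range(N):
    c_loop[i] = a[i] + b[i]

# Vector style: a single whole-array operation, the form a pipeline streams through
c_vec = a + b

assert np.allclose(c_loop, c_vec)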

HTML version of Basic Foils prepared 10 Sept 1996

Foil 26 Vector Supercomputing in a picture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
A pipeline for vector addition looks like:
  • From Aspects of Computational Science -- Editor Aad van der Steen published by NCF

HTML version of Basic Foils prepared 10 Sept 1996

Foil 27 Vector Supercomputers in a Nutshell - II

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Vector machines pipeline data through the CPU
They are not so popular/relevant as in the past as
  • Improved CPU architecture needs fewer cycles than before for each (complex) operation (e.g. 4 now, not ~100 as in the past)
  • The 8 MHz 8087 of the Cosmic Cube took 160 to 400 clock cycles to do a full floating point operation in 1983
  • Applications need more flexible pipelines which allow different operations to be executed on consecutive operands as they stream through the CPU
  • Modern RISC processors (super scalar) can support such complex pipelines as they have far more logic than CPU's of the past
In fact the excellence of, say, the Cray C-90 is due to its very good memory architecture, which allows one to fetch enough operands to sustain the pipeline.
Most workstation class machines have "good" CPU's but can never get enough data from memory to sustain good performance except for a few cache-intensive applications

HTML version of Basic Foils prepared 10 Sept 1996

Foil 28 Flynn's Classification of HPC Systems

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Very High Speed Computing Systems, Proc. of the IEEE 54, 12, pp. 1901-1909 (1966), and
Some Computer Organizations and Their Effectiveness, IEEE Trans. on Computers C-21, 948-960 (1972) -- both papers by M.J. Flynn
SISD -- Single Instruction stream, Single Data Stream -- i.e. von Neumann Architecture
MISD -- Multiple Instruction stream, Single Data Stream -- Not interesting
SIMD -- Single Instruction stream, Multiple Data Stream
MIMD -- Multiple Instruction stream and Multiple Data Stream -- the dominant parallel system, with an approximately one-to-one match of instruction and data streams.

HTML version of Basic Foils prepared 10 Sept 1996

Foil 29 Parallel Computer Architecture Memory Structure

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Memory Structure of Parallel Machines
  • Distributed
  • Shared
  • Cached
and Heterogeneous mixtures
Shared (Global): There is a global memory space, accessible by all processors.
  • Processors may also have some local memory.
  • Algorithms may use global data structures efficiently.
  • However "distributed memory" algorithms may still be important as memory is NUMA (Nonuniform access times)
Distributed (Local, Message-Passing): All memory is associated with processors.
  • To retrieve information from another processor's memory a message must be sent there.
  • Algorithms should use distributed data structures.

HTML version of Basic Foils prepared 10 Sept 1996

Foil 30 Comparison of Memory Access Strategies

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Memory can be accessed directly (analogous to a phone call) as in red lines below or indirectly by message passing (green line below)
We show two processors in a MIMD machine for distributed (left) or shared(right) memory architectures

HTML version of Basic Foils prepared 10 Sept 1996

Foil 31 Types of Parallel Memory Architectures -- Physical Characteristics

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Uniform: All processors take the same time to reach all memory locations.
Nonuniform (NUMA): Memory access is not uniform so that it takes a different time to get data by a given processor from each memory bank. This is natural for distributed memory machines but also true in most modern shared memory machines
  • DASH (Hennessy at Stanford) is the best known example of such a virtual shared memory machine which is logically shared but physically distributed.
  • ALEWIFE from MIT is a similar project
  • TERA (from Burton Smith) is Uniform memory access and logically shared memory machine
Most NUMA machines these days have two memory access times
  • Local memory (divided into registers, caches etc.) and
  • Nonlocal memory with little or no difference in access time for different nonlocal memories
This simple two level memory access model gets more complicated in proposed 10 year out "petaflop" designs

HTML version of Basic Foils prepared 10 Sept 1996

Foil 32 Diagrams of Shared and Distributed Memories

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index

HTML version of Basic Foils prepared 10 Sept 1996

Foil 33 Parallel Computer Architecture Control Structure

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
SIMD -- lockstep synchronization
  • Each processor executes same instruction stream
MIMD - Each Processor executes independent instruction streams
MIMD Synchronization can take several forms
  • Simplest: program controlled message passing
  • "Flags" (barriers,semaphores) in memory - typical shared memory construct as in locks seen in Java Threads
  • Special hardware - as in cache and its coherency (coordination between nodes)

HTML version of Basic Foils prepared 10 Sept 1996

Foil 34 Some Major Hardware Architectures - MIMD

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
MIMD Distributed Memory
  • This is now best illustrated by a collection of computers on a network (i.e. a metacomputer)
MIMD with logically shared memory but usually physically distributed. The latter is sometimes called distributed shared memory.
  • In near future, ALL formal (closely coupled) MPP's will be distributed shared memory
  • Note all computers (e.g. current MIMD distributed memory IBM SP2) allow any node to get at any memory but this is done indirectly -- you send a message
  • In future "closely-coupled" machines, there will be built in hardware supporting the function that any node can directly address all memory of the system
  • This distributed shared memory architecture is currently of great interest to (a major challenge for) parallel compilers

HTML version of Basic Foils prepared 10 Sept 1996

Foil 35 MIMD Distributed Memory Architecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
A special case of this is a network of workstations (NOW's) or personal computers (metacomputer)
Issues include:
  • Node - CPU, Memory
  • Network - Bandwidth, Memory
  • Hardware Enhanced Access to distributed Memory

HTML version of Basic Foils prepared 10 Sept 1996

Foil 36 Some Major Hardware Architectures - SIMD

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
SIMD -- Single Instruction Multiple Data -- can have logically distributed or shared memory
  • Examples are CM-1,2 from Thinking Machines
  • and AMT DAP and Maspar which are currently focussed entirely on accelerating parts of database indexing
  • This architecture is of decreasing interest as it has reduced functionality without significant cost advantage compared to MIMD machines
  • Cost of synchronization in MIMD machines is not high!
  • The main interest of SIMD is flexible bit arithmetic as the processors are "small", but as transistor densities get higher this also becomes less interesting since full function 64 bit CPU's only use a small fraction of the silicon of a modern computer

HTML version of Basic Foils prepared 10 Sept 1996

Foil 37 SIMD (Single Instruction Multiple Data) Architecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
CM2 - 64K processors with 1 bit arithmetic - hypercube network, plus a broadcast network which can also combine, and a "global or" network
Maspar, DECmpp - 16 K processors with 4 bit (MP-1), 32 bit (MP-2) arithmetic, fast two-dimensional mesh and slower general switch for communication

HTML version of Basic Foils prepared 10 Sept 1996

Foil 38 Some Major Hardware Architectures - Mixed

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Also have heterogeneous compound architecture (metacomputer) gotten by arbitrary combination of MIMD or SIMD, Sequential or Parallel machines.
Metacomputers can vary from full collections of several hundred PC's/Settop boxes on the (future) World Wide Web to a CRAY C-90 connected to a CRAY T3D
This is a critical future architecture which is intrinsically distributed memory as multi-vendor heterogeneity implies that one cannot have special hardware enhanced shared memory
  • note that this can be a MIMD collection of SIMD machines if have a set of Maspar's on a network
  • One can think of human brain as a SIMD machine and then a group of people is such a MIMD collection of SIMD processors

HTML version of Basic Foils prepared 10 Sept 1996

Foil 39 Some MetaComputer Systems

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Cluster of workstations or PC's
Heterogeneous MetaComputer System

HTML version of Basic Foils prepared 10 Sept 1996

Foil 40 Comments on Special Purpose Devices

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
One example is an Associative memory - SIMD or MIMD or content addressable memories
This is an example of a special purpose "signal" processing machine which can in fact be built from "conventional" SIMD or "MIMD" architectures
This type of machine is not so popular as most applications are not dominated by computations for which good special purpose devices can be designed
If only 10% of a problem is say "track-finding" or some special purpose processing, then who cares if you reduce that 10% by a factor of 100
  • You have only sped up the system by a factor of about 1.1, not by 100!
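
The arithmetic behind that factor of 1.1 is just Amdahl's law; a minimal sketch (Python, illustrative) of the calculation:

# Overall speedup when a fraction f of the work is accelerated by a factor s
def overall_speedup(f, s):
    return 1.0 / ((1.0 - f) + f / s)

# 10% of the problem (the special purpose part) made 100 times faster
print(overall_speedup(0.10, 100.0))   # ~1.11 -- the system as a whole barely improves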

HTML version of Basic Foils prepared 10 Sept 1996

Foil 41 The GRAPE N-Body Machine

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
N body problems (e.g. Newton's laws for one million stars in a globular cluster) can have successful special purpose devices
See GRAPE (GRAvity PipE) machine (Sugimoto et al. Nature 345 page 90,1990)
  • Essential reason is that such problems need much less memory per floating point unit than most problems
  • Globular Cluster: 10^6 computations per datum stored
  • Finite Element Iteration: A few computations per datum stored
  • Rule of thumb is that one needs one gigabyte of memory per gigaflop of computation in general problems and this general design puts most cost into memory not into CPU.
Note GRAPE uses EXACTLY the same parallel algorithm that one finds in the books (e.g. Solving Problems on Concurrent Processors) for N-body problems on classic distributed memory MIMD machines
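
For reference, a minimal sketch (Python/NumPy, illustrative; the softening parameter eps is an assumption added to avoid division by zero) of the direct O(N^2) force calculation that GRAPE pipelines in hardware.

import numpy as np

def nbody_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct O(N^2) gravitational accelerations -- the algorithm GRAPE hard-wires."""
    N = len(mass)
    acc = np.zeros_like(pos)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = pos[j] - pos[i]
            r2 = np.dot(d, d) + eps ** 2          # softened squared distance
            acc[i] += G * mass[j] * d / r2 ** 1.5
    return acc

pos = np.random.rand(100, 3)                      # 100 "stars", just for illustration
mass = np.ones(100)
print(nbody_accelerations(pos, mass)[0])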

HTML version of Basic Foils prepared 10 Sept 1996

Foil 42 Why isn't GRAPE a Perfect Solution?

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
GRAPE will execute the classic O(N^2) (parallel) N body algorithm BUT this is not the algorithm used in most such computations
Rather there is the O(N) or O(N logN) so called "fast multipole" algorithm which uses a hierarchical approach
  • On one million stars, fast multipole is a factor of 100-1000 faster than GRAPE algorithm
  • fast multipole works in most but not all N-body problems (in globular clusters, extreme heterogeneity makes the direct O(N^2) method most attractive)
So special purpose devices cannot usually take advantage of new nifty algorithms!

HTML version of Basic Foils prepared 10 Sept 1996

Foil 43 Granularity of Parallel Components - I

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Coarse-grain: Task is broken into a handful of pieces, each executed by powerful processors.
  • Pieces, processors may be heterogeneous.
  • Computation/Communication ratio very high -- Typical of Networked Metacomputing
Medium-grain: Tens to few thousands of pieces, typically executed by microprocessors.
  • Processors typically run the same code.(SPMD Style)
  • Computation/communication ratio often hundreds or more.
  • Typical of MIMD Parallel Systems such as SP2, CM5, Paragon, T3D

HTML version of Basic Foils prepared 10 Sept 1996

Foil 44 Granularity of Parallel Components - II

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Fine-grain: Thousands to perhaps millions of small pieces, executed by very small, simple processors (several per chip) or through pipelines.
  • Processors often have instructions broadcasted to them.
  • Computation/Communication ratio often near unity.
  • Typical of SIMD but seen in a few MIMD systems such as Kogge's Execube, Dally's J Machine or the commercial Myrinet (Seitz)
  • This is going to be very important in future petaflop architectures as the dense chips of year 2003 onwards favor this Processor in Memory Architecture
  • So many "transistors" in future chips that "small processors" of the "future" will be similar to today's high end microprocessors
  • As chips get denser, not realistic to put processors and memories on separate chips as granularities become too big
Note that a machine of given granularity can be used on algorithms of the same or finer granularity

HTML version of Basic Foils prepared 10 Sept 1996

Foil 45 Classes of Communication Networks

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
The last major architectural feature of a parallel machine is the network or design of hardware/software connecting processors and memories together.
Bus: All processors (and memory) connected to a common bus or busses.
  • Memory access fairly uniform, but not very scalable due to contention
  • Bus machines can be NUMA if memory consists of directly accessed local memory as well as memory banks accessed by Bus. The Bus accessed memories can be local memories on other processors
Switching Network: Processors (and memory) connected to routing switches like in telephone system.
  • Switches might have queues and "combining logic", which improve functionality but increase latency.
  • Switch settings may be determined by message headers or preset by controller.
  • Connections can be packet-switched (messages no longer than some fixed size) or circuit-switched (connection remains as long as needed)
  • Usually NUMA, blocking, often scalable and upgradable

HTML version of Basic Foils prepared 10 Sept 1996

Foil 46 Switch and Bus based Architectures

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Switch
Bus

HTML version of Basic Foils prepared 10 Sept 1996

Foil 47 Examples of Interconnection Topologies

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Two dimensional grid, Binary tree, complete interconnect and 4D Hypercube.
Communication (operating system) software ensures that the system appears fully connected even if the physical connections are partial

HTML version of Basic Foils prepared 10 Sept 1996

Foil 48 Useful Concepts in Communication Systems

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Useful terms include:
Scalability: Can network be extended to very large systems? Related to wire length (synchronization and driving problems), degree (pinout)
Fault Tolerance: How easily can system bypass faulty processor, memory, switch, or link? How much of system is lost by fault?
Blocking: Some communication requests may not get through, due to conflicts caused by other requests.
Nonblocking: All communication requests succeed. Sometimes just applies as long as no two requests are for same memory cell or processor.
Latency (delay): Maximal time for nonblocked request to be transmitted.
Bandwidth: Maximal total rate (MB/sec) of system communication, or subsystem-to-subsystem communication. Sometimes determined by cutsets, which cut all communication between subsystems. Often useful in providing lower bounds on time needed for task.
Wormhole Routing -- Intermediate switch nodes do not wait for the full message but allow it to pass through in small packets

HTML version of Basic Foils prepared 10 Sept 1996

Foil 49 Communication Performance of Some MPP's

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
From Aspects of Computational Science, Editor Aad van der Steen, published by NCF
System                 Communication Speed       Computation Speed
                       Mbytes/sec (per link)     Mflops/sec (per node)
IBM SP2                        40                       267
Intel iPSC860                   2.8                      60
Intel Paragon                 200                        75
Kendall Square KSR-1           17.1                      40
Meiko CS-2                    100                       200
Parsytec GC                    20                        25
TMC CM-5                       20                       128
Cray T3D                      150                       300

HTML version of Basic Foils prepared 10 Sept 1996

Foil 50 Implication of Hardware Performance

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
tcomm = 4 or 8 / (Communication Speed in Mbytes/sec)
  • as there are 4 or 8 bytes in a floating point word
tfloat = 1 / (Computation Speed in Mflops/sec)
Thus tcomm / tfloat is just 4 X Computation Speed divided by Communication Speed
tcomm / tfloat is 26.7, 85, 1.5, 9.35, 8, 5, 25.6, 8 for the machines SP2, iPSC860, Paragon, KSR-1, Meiko CS2, Parsytec GC, TMC CM5, and Cray T3D respectively
Latency makes the situation worse for small messages, and the ratio doubles for the 64-bit arithmetic natural on large problems!
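
A small sketch (Python, added for illustration) that reproduces the tcomm/tfloat ratios above directly from the table on the previous foil.

# (Communication speed in Mbytes/sec per link, computation speed in Mflops/sec per node)
machines = {
    "IBM SP2":       (40.0, 267.0),
    "Intel iPSC860": (2.8,  60.0),
    "Intel Paragon": (200.0, 75.0),
    "KSR-1":         (17.1, 40.0),
    "Meiko CS-2":    (100.0, 200.0),
    "Parsytec GC":   (20.0, 25.0),
    "TMC CM-5":      (20.0, 128.0),
    "Cray T3D":      (150.0, 300.0),
}

for name, (comm, flops) in machines.items():
    tcomm = 4.0 / comm        # microseconds to move one 4-byte word
    tfloat = 1.0 / flops      # microseconds for one floating point operation
    print(f"{name:14s} tcomm/tfloat = {tcomm / tfloat:5.1f}")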

HTML version of Basic Foils prepared 10 Sept 1996

Foil 51 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 5 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 52 Abstract of Sept 5 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This starts by considering the analytic form for communication overhead and illustrates its stencil dependence in simple local cases -- stressing relevance of grain size
The implication for scaling and generalizing from Laplace example is covered
  • We covered scaled speedup (fixed grain size) as well as fixed problem size
We noted some useful material was missing and this was continued in next lecture (Sept 10,96)
The lecture then starts coverage of computer architecture, covering base technologies: CMOS (covered in an earlier lecture) contrasted with Quantum and Superconducting technology
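
As a reminder of the analytic form referred to at the start of this abstract, a minimal sketch (Python; the constant c and the exact expression are assumptions in the spirit of the lecture, not a quotation from it): for grain size n and geometric dimension d the communication overhead behaves like f ~ c * (tcomm/tfloat) / n^(1/d), and the efficiency is roughly 1/(1+f).

def comm_overhead(n, d, tcomm_over_tfloat, c=1.0):
    """Stencil-style overhead f ~ c * (tcomm/tfloat) / n**(1/d); c depends on the stencil (assumed = 1 here)."""
    return c * tcomm_over_tfloat / n ** (1.0 / d)

def efficiency(n, d, tcomm_over_tfloat, c=1.0):
    return 1.0 / (1.0 + comm_overhead(n, d, tcomm_over_tfloat, c))

# A 2D Laplace-like problem with tcomm/tfloat = 8, at two different grain sizes
for n in (100, 10_000):
    print(n, "points per processor -> efficiency", round(efficiency(n, 2, 8.0), 3))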

HTML version of Basic Foils prepared 10 Sept 1996

Foil 53 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 10 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 54 Abstract of Sept 10 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This starts by filling in details of communication overhead in parallel processing for case where "range" of interaction is large
We show two old examples from Caltech which illustrate the correctness of the analytic form
We return to discussion of computer architectures describing
  • Vector Supercomputers
  • General Relevance of data locality and pipelining
  • Flynn's classification (MIMD,SIMD etc.)
  • Memory Structures
  • Initial issues in MIMD and SIMD discussion

HTML version of Basic Foils prepared 10 Sept 1996

Foil 55 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 12 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 56 Abstract of Sept 12 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This continues the computer architecture discussion with
  • MIMD and SIMD with distributed shared memory
  • MetaComputers
  • Special Purpose Architectures
  • Granularity, with technological changes forcing larger grain sizes
  • Overview of Communication Networks with
    • Switches versus topologies versus buses
    • Typical values in today's machines

HTML version of Basic Foils prepared 10 Sept 1996

Foil 57 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 24 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 58 Abstract of Sept 24 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This continues the discussion of Fortran 90 with a set of overview remarks on each of the key new capabilities of this language
We also comment on value of Fortran90/HPF in a world that will switch to Java
We digress to discuss a general theory of problem architectures as this explains such things as the tradeoffs between
  • HPCC v Software Engineering
  • HPF versus MPI
And the types of applications each software model is designed to address
(Note server errors at the start which confuse the audio)

HTML version of Basic Foils prepared 10 Sept 1996

Foil 59 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of September 26 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 60 Abstract of Sept 26 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This quickly completes the discussion of problem architecture but, rather than continuing the qualitative discussion of HPF applications in the notes,
jumps to a discussion of the HPF language describing
the basic approach to parallelism with the "owner-computes" rule
and the types of new constructs, with
TEMPLATE, ALIGN and PROCESSORS described
The lecture started with a description of the Web based Programming Laboratory developed by Kivanc Dincer

HTML version of Basic Foils prepared 10 Sept 1996

Foil 61 Embarrassingly Parallel Problem Class

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Loosely Synchronous, Synchronous or Asynchronous classify problems by their "control" or "synchronization" structure
However there is an important class of problems where this does not matter as the synchronization overhead -- even if in the difficult asynchronous case -- is irrelevant
This is when the overhead is small or zero. These are "embarrassingly parallel" problems where each component of the decomposed problem is essentially independent
Examples are:
  • Financial Modelling where each component is calculating some expected value for a particular (possibly Monte Carlo) set of assumptions about the future
  • OLTP (Online Transaction Processing) where each component is a separate checking of a credit card transaction against the account data
  • Graphics rendering where each component is the calculation of the color of a particular pixel.
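
A minimal sketch of the embarrassingly parallel pattern (Python's multiprocessing used as a stand-in for a real parallel machine; the toy payoff function is invented for illustration): each Monte Carlo scenario is evaluated completely independently, so the only communication is the final gather of results.

from multiprocessing import Pool
import random

def expected_value(seed, samples=10_000):
    """One independent component: a toy Monte Carlo valuation under one set of assumptions."""
    rng = random.Random(seed)
    return sum(max(rng.gauss(100.0, 10.0) - 95.0, 0.0) for _ in range(samples)) / samples

if __name__ == "__main__":
    scenarios = range(32)                                 # 32 independent pieces of work
    with Pool() as pool:
        results = pool.map(expected_value, scenarios)     # no synchronization until this gather
    print(sum(results) / len(results))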

HTML version of Basic Foils prepared 10 Sept 1996

Foil 62 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
HPCC Software Technologies
HPF and MPI

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 63 Abstract of CPS615 HPCC Software Technologies

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
We go through the 2D Laplace's Equation with both HPF and MPI for Simple Jacobi Iteration
HPF and Fortran90 are reviewed followed by MPI
We also discuss the structure of problems as these determine why and when certain software approaches are appropriate
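
For orientation, a minimal sequential sketch (Python/NumPy, illustrative; the lectures themselves express the same update in HPF and MPI) of the Jacobi iteration for the 2D Laplace equation.

import numpy as np

def jacobi_laplace(u, iterations=100):
    """Each sweep replaces every interior point by the average of its four neighbours."""
    for _ in range(iterations):
        new = u.copy()
        new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                  u[1:-1, :-2] + u[1:-1, 2:])
        u = new
    return u

u = np.zeros((64, 64))
u[0, :] = 1.0                     # fixed boundary values along one edge
print(jacobi_laplace(u)[32, 32])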

HTML version of Basic Foils prepared 10 Sept 1996

Foil 64 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 1 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 65 Abstract of Oct 1 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This continues the discussion of HPF in the area of distribution and ALIGN statements.
The discussion of ALIGN should be improved as audio makes dubious statements about "broadcasting" information.
The distribution discussion includes a reasonable description of block and cyclic distributions and when you should use them.
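
A minimal sketch (Python, illustrative) of the difference between BLOCK and CYCLIC distribution of an index range over P processors, which is the choice discussed in this lecture.

def block_owner(i, n, p):
    """BLOCK: contiguous chunks of about n/p indices each -- good for nearest-neighbour stencils."""
    size = -(-n // p)             # ceiling division
    return i // size

def cyclic_owner(i, n, p):
    """CYCLIC: indices dealt out round-robin -- good for balancing irregular work."""
    return i % p

n, p = 16, 4
print([block_owner(i, n, p) for i in range(n)])    # [0,0,0,0, 1,1,1,1, 2,2,2,2, 3,3,3,3]
print([cyclic_owner(i, n, p) for i in range(n)])   # [0,1,2,3, 0,1,2,3, ...]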

HTML version of Basic Foils prepared 10 Sept 1996

Foil 66 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 3 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 67 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 10 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 68 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 15 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 69 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 22 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 70 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 24 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 71 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 31 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 72 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 7 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 73 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 8 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 74 Abstract of Oct 24 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This covers two topics:
Monte Carlo Integration for large scale Problems using Experimental and Theoretical high energy physics as an example
This includes accept-reject methods, uniform weighting and parallel algorithms
Then we complete the HPF discussion with the embarrassingly parallel DO INDEPENDENT construct discussed in the Monte Carlo case
And HPF2 Changes
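
A minimal sketch of the accept-reject idea mentioned above (Python; the triangular target density is invented purely for illustration): sample uniformly under a flat envelope and keep the points that fall under the curve. Since every accepted sample is generated independently, the sampling loop is exactly the embarrassingly parallel DO INDEPENDENT case.

import random

def accept_reject(f, fmax, a, b, n):
    """Draw n samples from the density proportional to f on [a, b] using a flat envelope of height fmax."""
    samples = []
    while len(samples) < n:
        x = random.uniform(a, b)
        if random.uniform(0.0, fmax) <= f(x):    # accept with probability f(x)/fmax
            samples.append(x)
    return samples

f = lambda x: 1.0 - 2.0 * abs(x - 0.5)           # toy triangular target on [0, 1]
xs = accept_reject(f, 1.0, 0.0, 1.0, 10_000)
print(sum(xs) / len(xs))                         # should be close to 0.5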

HTML version of Basic Foils prepared 10 Sept 1996

Foil 75 Abstract of Oct 22 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This starts by finishing the simple overview of statistics
Covering Gaussian Random Numbers, Numerical Generation of Random Numbers both sequentially and in parallel
Then we describe the central limit theorem which underlies Monte Carlo method
Then it returns to Numerical Integration with the first part of discussion of Monte Carlo Integration
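
A minimal sketch (Python, illustrative) of two of the ingredients named above: Gaussian random numbers via the Box-Muller transform, and the central limit theorem behavior that makes the Monte Carlo error shrink like 1/sqrt(N).

import math, random

def box_muller():
    """Two independent standard Gaussians from two uniform random numbers."""
    u1 = 1.0 - random.random()                   # keep u1 in (0, 1] so the log is defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

# Central limit theorem: means of N uniforms cluster around 0.5 with spread ~ 1/sqrt(12 N)
N = 1000
means = [sum(random.random() for _ in range(N)) / N for _ in range(2000)]
spread = math.sqrt(sum((m - 0.5) ** 2 for m in means) / len(means))
print(box_muller(), spread, 1.0 / math.sqrt(12 * N))   # the last two numbers should be comparable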

HTML version of Basic Foils prepared 10 Sept 1996

Foil 76 Abstract of Oct 15 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This finishes the last part of the N body and ODE discussion, focusing on the pipeline data parallel algorithm
Note several foils were changed after presentation and so discussion is a little disconnected from foils at times
We start Numerical Integration with a basic discussion of Newton-Cotes formulae (including Trapezoidal and Simpson's rule)
We illustrate them pictorially
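
A minimal sketch (Python, illustrative) of the two Newton-Cotes formulae named above, applied to a test integral with a known answer.

import math

def trapezoidal(f, a, b, n):
    """Composite trapezoidal rule with n panels."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def simpson(f, a, b, n):
    """Composite Simpson's rule; n must be even (weights 1,4,2,...,4,1 times h/3)."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3.0

# Integral of sin(x) over [0, pi] is exactly 2; Simpson converges much faster
print(trapezoidal(math.sin, 0.0, math.pi, 64), simpson(math.sin, 0.0, math.pi, 64))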

HTML version of Basic Foils prepared 10 Sept 1996

Foil 77 Abstract of Nov 7 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This completes the MPI general discussion with the basic message passing, collective communication and some advanced features
It then returns to Laplace Example foilset to show how MPI can be used here
  • We have previously used this for HPF and performance analysis
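
A minimal sketch of the point-to-point and collective calls referred to above, written with the mpi4py Python bindings as an assumption (the lectures themselves use MPI from Fortran/C). Each process owns a strip of the Laplace grid, exchanges ghost rows with its neighbours, and an Allreduce forms the global residual.

from mpi4py import MPI            # assumption: mpi4py is installed; run under mpirun
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.zeros((12, 64))        # 10 owned rows plus two ghost rows (rows 0 and -1)
if rank == 0:
    local[1, :] = 1.0             # fixed boundary values along the top of the global grid

for _ in range(50):
    # Point-to-point: exchange ghost rows with neighbours (Sendrecv avoids deadlock)
    if rank > 0:
        comm.Sendrecv(local[1, :], dest=rank - 1, recvbuf=local[0, :], source=rank - 1)
    if rank < size - 1:
        comm.Sendrecv(local[-2, :], dest=rank + 1, recvbuf=local[-1, :], source=rank + 1)
    new = local.copy()
    new[1:-1, 1:-1] = 0.25 * (local[:-2, 1:-1] + local[2:, 1:-1] +
                              local[1:-1, :-2] + local[1:-1, 2:])
    if rank == 0:
        new[1, :] = 1.0           # re-impose the boundary condition
    # Collective: maximum change across all processes
    local_res = np.array([np.abs(new - local).max()])
    global_res = np.zeros(1)
    comm.Allreduce(local_res, global_res, op=MPI.MAX)
    local = new

if rank == 0:
    print("residual", float(global_res[0]))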

HTML version of Basic Foils prepared 10 Sept 1996

Foil 78 Abstract of Nov 8 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This starts basic module on Partial Differential Solvers with
Introduction to Classification of Equations
Basic Discretization
Derivation of Sparse Matrix Formulation
Analogies of Iteration with Artificial Time
Start of Explicit Matrix Formulation for Simple Cases
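
A minimal sketch (Python/NumPy, illustrative; a dense array is used only for clarity, where a real code would use sparse storage) of the matrix that the standard 5-point discretization of Laplace's equation produces on an n x n interior grid.

import numpy as np

def laplace_5point(n):
    """Matrix of the 5-point stencil for -Laplacian on an n x n interior grid with Dirichlet boundaries."""
    N = n * n
    A = np.zeros((N, N))
    for i in range(n):
        for j in range(n):
            k = i * n + j                      # natural row-by-row ordering of grid points
            A[k, k] = 4.0
            if i > 0:     A[k, k - n] = -1.0   # neighbour above
            if i < n - 1: A[k, k + n] = -1.0   # neighbour below
            if j > 0:     A[k, k - 1] = -1.0   # neighbour to the left
            if j < n - 1: A[k, k + 1] = -1.0   # neighbour to the right
    return A

A = laplace_5point(4)
print(A.shape, int((A != 0).sum()), "nonzeros out of", A.size)   # already very sparse at n = 4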

HTML version of Basic Foils prepared 10 Sept 1996

Foil 79 Abstract of Oct 3 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Tom Haupt on Fortran 90 / HPF

HTML version of Basic Foils prepared 10 Sept 1996

Foil 80 Abstract of Oct 10 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This discusses solution of systems of ordinary differential equations (ODE) in the context of N squared particle dynamics problems
We start with motivation with brief discussion of relevant problems
We go through basic theory of ODE's including Euler, Runge-Kutta, Predictor-Corrector and Multistep Methods
We begin the discussion of solving N body problems using classic Connection Machine elegant but inefficient algorithm
Note -- Some foils expanded to two after talk and second one is given without audio in script
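
A minimal sketch (Python, illustrative) contrasting the Euler and classical fourth-order Runge-Kutta methods named above on the test equation dy/dt = -y, whose exact solution is exp(-t).

import math

def euler_step(f, t, y, h):
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

f = lambda t, y: -y
h, steps = 0.1, 10
ye = yr = 1.0
for i in range(steps):
    ye = euler_step(f, i * h, ye, h)
    yr = rk4_step(f, i * h, yr, h)
print(ye, yr, math.exp(-1.0))      # RK4 is far closer to the exact value after 10 steps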

HTML version of Basic Foils prepared 10 Sept 1996

Foil 81 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 14 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 82 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of November 26 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 83 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of December 5 - 1996

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Basic Foils prepared 10 Sept 1996

Foil 84 Abstract of Nov 14 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This started with a description of current Web set-up of CPS615 and other foilsets
Then we started the foilset describing Physical Simulations and the various approaches -- Continuum Physics, Monte Carlo, Quantum Dynamics, and Computational Fluid Dynamics
For CFD we do enough to discuss why viscosity and High Reynolds numbers are critical in air and similar media
We discuss computation and communication needs of CFD compared to Laplace equation

HTML version of Basic Foils prepared 10 Sept 1996

Foil 85 Abstract of Nov 26 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This covers essentially all the finite element method and its solution using the conjugate gradient method
Only the simple 2D Laplace equation using triangular nodes is discussed
We stress the variational method as an optimization method and use this analogy to motivate conjugate gradient as an improved steepest descent approach
We discuss parallel computing issues for both finite element and conjugate gradient
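
A minimal sketch (Python/NumPy, illustrative) of the conjugate gradient iteration discussed above for a symmetric positive definite system Ax = b; each iteration needs one matrix-vector product plus a few dot products, which is what makes it attractive for parallel finite element computation.

import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=200):
    x = np.zeros_like(b)
    r = b - A @ x                    # residual = negative gradient of the variational functional
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p                   # the one matrix-vector product per iteration
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p    # new search direction, conjugate to the earlier ones
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))   # the two answers should agree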

HTML version of Basic Foils prepared 10 Sept 1996

Foil 86 Abstract of Dec 5 1996 CPS615 Lecture

From New CPS615Master Foils-- 26 August 96 Basic Simulation Track for Computational Science CPS615 -- Fall Semester 96. *
Full HTML Index
This lecture covers two distinct areas.
Firstly a short discussion of Linear Programming -- what type of problems it is used for, what the equations look like, and basic issues in the difficult use of parallel processing
Then we give an abbreviated discussion of Full Matrix algorithms covering
  • The types of applications that use them
  • Matrix Multiplication including Cannon's algorithm in detail
  • Use of MPI primitives including communicator groups
  • Performance Analysis
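
A minimal serial simulation (Python/NumPy, illustrative; a real implementation would place one block per MPI process) of the Cannon matrix multiplication algorithm mentioned above: the matrices are split into a q x q grid of blocks, skewed once, and then each of q steps does a local block multiply followed by shifting A blocks left and B blocks up.

import numpy as np

def cannon_matmul(A, B, q):
    """Serial simulation of Cannon's algorithm on a q x q logical processor grid (q must divide n)."""
    n = A.shape[0]
    s = n // q
    blk = lambda M, i, j: M[i*s:(i+1)*s, j*s:(j+1)*s].copy()
    # Initial skew: row i of A blocks shifted left by i, column j of B blocks shifted up by j
    Ab = [[blk(A, i, (i + j) % q) for j in range(q)] for i in range(q)]
    Bb = [[blk(B, (i + j) % q, j) for j in range(q)] for i in range(q)]
    Cb = [[np.zeros((s, s)) for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]                            # local multiply on "processor" (i, j)
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]    # shift A blocks left
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]    # shift B blocks up
    return np.block(Cb)

A, B = np.random.rand(8, 8), np.random.rand(8, 8)
print(np.allclose(cannon_matmul(A, B, 4), A @ B))   # True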

© on Tue Oct 7 1997