Given by Geoffrey C. Fox for CPS615, Basic Simulation Track for Computational Science, in Fall Semester 1995. Foils prepared 29 August 1995
Summary of Material
Technology Driving Forces for HPCC
Overview of What Computational Science Is and Why
Elementary Discussion of Parallel Computing in the "Real World"
Sequential Computer Architecture
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100
RAM density increases by about a factor of 50 every 8 years
Supercomputers in 1992 have memory sizes around 32 gigabytes (giga = 10^9)
Supercomputers in the year 2000 should have memory sizes around 1.5 terabytes (tera = 10^12)
Computer performance is increasing faster than RAM density
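The projection above follows from simple compounding; a minimal sketch, assuming the slide's figures (32 GB in 1992, a factor of 50 every 8 years):

```python
# Back-of-the-envelope check of the slide's RAM-growth projection.
# All numbers come from the foil; the variable names are illustrative.
base_year, base_mem_gb = 1992, 32    # ~32 GB in 1992 supercomputers
growth, period_years = 50, 8         # ~50x density growth every 8 years

years = 2000 - base_year
projected_gb = base_mem_gb * growth ** (years / period_years)
print(projected_gb)   # 1600.0 GB, i.e. ~1.6 TB -- consistent with "around 1.5 terabytes"
```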
See Chapter 5 of Petaflops Report -- July 95
Different machines
New types of computers
New libraries
Rewritten applications
Totally new fields able to use computers ==> need new educational initiatives: Computational Science
Computational Science will be a nucleus for the phase transition and will accelerate the use of parallel computers in the real world
Each node is a CPU and 6 memory chips -- the CPU chip integrates communication channels with floating-point, integer, and logical CPU functions
32-node CM-5 with, in the foreground, an old CM-2 disk vault
Domain decomposition is simple, but general and extensible to many more nodes
All successful concurrent machines have obtained parallelism from "Data Parallelism" or "Domain Decomposition"
A problem is an algorithm applied to a data set
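The data-parallel idea -- one algorithm, many subdomains -- can be sketched in a few lines. A minimal illustration; the function names (`decompose`, `relax`) and the 4-node example are assumptions, not from the lecture:

```python
# Minimal sketch of domain decomposition: split one data set into equal
# contiguous subdomains and apply the same algorithm to each node's piece.

def decompose(data, n_nodes):
    """Split data into n_nodes contiguous subdomains (assumes even division)."""
    size = len(data) // n_nodes
    return [data[i * size:(i + 1) * size] for i in range(n_nodes)]

def relax(subdomain):
    """Stand-in for whatever algorithm each node applies to its data."""
    return [x * 0.5 for x in subdomain]

domains = decompose(list(range(16)), n_nodes=4)   # 4 subdomains of 4 points each
results = [relax(d) for d in domains]             # identical code on every node
```

On a real machine each element of `results` would be computed by a separate node; here the loop simply plays all nodes in turn.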
The three architectures considered here differ as follows:
MIMD Distributed Memory
MIMD Shared Memory
SIMD Distributed Memory
Two Different Types of Mappings in Physical Spaces
Both are static
Different Types of Mappings -- a very dynamic case without any underlying physical space
c) Computer chess with a dynamic game tree decomposed onto 4 nodes
...and the corresponding poor workload balance
...and excellent workload balance
The fundamental principles behind the use of concurrent computers are identical to those used in society - in fact they are partly why society exists. |
If a problem is too large for one person, one does not hire a SUPERman, but rather puts together a team of ordinary people... |
cf. construction of Hadrian's Wall
Domain Decomposition is Key to Parallelism
Need "large" subdomains: l >> l_overlap
Amdahl's Law, or "too many cooks spoil the broth", says that:
Speedup S is small if efficiency e is small
or, for Hadrian's Wall, equivalently S is small if length l is small
But this is irrelevant, as we do not need parallel processing unless the problem is big!
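The edge-effect argument can be made concrete with a toy model. A sketch assuming overhead proportional to the overlap region at each mason's boundaries (the function names and the efficiency formula e = l / (l + l_overlap) are illustrative simplifications, not from the foil):

```python
# Toy model of Amdahl-style edge effects for Hadrian's Wall:
# each mason lays useful length l but also works the overlap region
# l_overlap at the subdomain edges, which is pure overhead.

def efficiency(l, l_overlap):
    """Useful work divided by total work for one mason."""
    return l / (l + l_overlap)

def speedup(n_masons, l, l_overlap):
    """Speedup S = N * e for N masons."""
    return n_masons * efficiency(l, l_overlap)

print(speedup(100, l=10.0, l_overlap=1.0))   # large subdomain: S ~ 90.9
print(speedup(100, l=1.0, l_overlap=1.0))    # small subdomain: S = 50.0
```

The point of the foil survives the simplification: for a big problem each mason's stretch l dwarfs the overlap, so e is near 1; shrink l toward l_overlap and the speedup collapses.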
"Pipelining" or decomposition by horizontal section is:
|
Hadrian's Wall is one dimensional |
Humans represent a flexible processor node that can be arranged in different ways for different problems |
The lesson for computing is:
Original MIMD machines used a hypercube topology. The hypercube subsumes several other topologies, including all meshes, making it a flexible concurrent computer that can tackle a broad range of problems. Current machines use interconnect structures different from the hypercube but preserve this capability.
Comparing Computer and Hadrian's Wall Cases
The case of Programming a Hypercube
Each node runs software that is similar to sequential code
e.g., Fortran with geometry and boundary-value sections changed
The geometry is irregular, but each brick takes about the same amount of time to lay.
Decomposition of the wall for an irregular geometry involves equalizing the number of bricks per mason, not the length of wall per mason.
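Equalizing bricks rather than length amounts to cutting the wall where the cumulative brick count, not the distance, reaches each mason's share. A minimal greedy sketch (the function name, the segment model, and the example brick counts are all hypothetical):

```python
# Sketch: partition an irregular wall by equal bricks per mason, not
# equal length. Each wall segment has its own brick count; a new mason
# starts once the current mason has reached the target share.

def decompose_by_work(bricks_per_segment, n_masons):
    """Greedy split of wall segments so each mason lays ~equal bricks."""
    target = sum(bricks_per_segment) / n_masons
    assignment, load = [[]], 0
    for seg, bricks in enumerate(bricks_per_segment):
        if load >= target and len(assignment) < n_masons:
            assignment.append([])   # hand the trowel to the next mason
            load = 0
        assignment[-1].append(seg)
        load += bricks
    return assignment

# Irregular wall: 8 segments of varying brick counts, 24 bricks total.
wall = [2, 2, 2, 6, 1, 1, 3, 7]
print(decompose_by_work(wall, 2))   # two masons, 12 bricks each
```

Note the cut falls after segment 3 because of brick counts, not because it is the geometric midpoint -- exactly the foil's point.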
Fundamental entities (bricks, gargoyles) are of different complexity
The best decomposition is dynamic
Inhomogeneous problems run on concurrent computers but require dynamic assignment of work to nodes, and strategies to optimize this
(we use neural networks, simulated annealing, spectral bisection, etc.)
Global Parallelism
Local Parallelism
Local and Global Parallelism Should Both Be Exploited
Disk (input/output) technology is better matched to several modest-power processors than to a single sequential supercomputer
Concurrent computers are natural in databases and transaction analysis
At the finest resolution, the brain is a collection of neurons sending and receiving messages by axons and dendrites
At a coarser resolution:
Society is a collection of brains sending and receiving messages by sight and sound
An ant hill is a collection of ants (smaller brains) sending and receiving messages by chemical signals
Lesson: All Nature's Computers Use Message Passing
With several different architectures
Problems are large -- use domain decomposition
Overheads are edge effects
Topology of processor matches that of domain -- a processor with a rich, flexible node/topology matches most domains
Regular homogeneous problems are easiest, but irregular or inhomogeneous problems can also be handled
Can use local and global parallelism
Can handle concurrent calculation and I/O
Nature always uses message passing, as in parallel computers (at the lowest level)
Three instructions are shown overlapped -- each starting one clock cycle after the last
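The payoff of that overlap is easy to quantify. A sketch of the standard pipeline timing formula (the stage count of 4 is illustrative, not from the foil): with S stages and one instruction issued per clock, N instructions finish in S + N - 1 cycles instead of S * N.

```python
# Sketch of instruction-pipeline timing: overlapped vs. sequential issue.
# Assumes an ideal pipeline (no stalls or hazards).

def pipelined_cycles(n_instructions, n_stages):
    """First instruction takes n_stages cycles; each later one adds 1."""
    return n_stages + n_instructions - 1

def sequential_cycles(n_instructions, n_stages):
    """No overlap: every instruction pays the full pipeline depth."""
    return n_stages * n_instructions

print(pipelined_cycles(3, 4))    # 6 cycles for 3 overlapped instructions
print(sequential_cycles(3, 4))   # 12 cycles without overlap
```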