Given by Geoffrey C. Fox at CPSP713 Case Studies in Computational Science in Spring Semester 1996. Foils prepared 15 March 1996
Summary of Material
Parallel Computers for the Simulation of Complex Systems
Complex Systems for the theory of Computer and Network Architecture
Complex Systems for new Computational Methodologies
gcf@nova.npac.syr.edu
http://www.npac.syr.edu
315.443.2163
Can one use the 1 ---> 10 TERAFLOP machines which will be available in the middle to late 1990's?
What software do you need?
What about virtual reality and distributed metacomputers?
When it has enough headroom (e.g. a factor of ten better "performance"/"cost-performance" than existing technology)
nCUBE-1 and CM-2 were comparable, or perhaps a factor of 2 to 4 better, in cost-performance than conventional supercomputers
DECmpp, IBM SP1, Intel Paragon, CM-5 and nCUBE-2 are (perhaps) a factor of ten better than competitive sequential (vector) machines
$2.9 billion over 5 years
The Grand Challenges
Nearly all have industrial payoff, but technology transfer is NOT funded by HPCCI
Hardware trends imply that all computers (PCs ---> supercomputers) will be parallel by the year 2000
Software is the challenge and could prevent or delay the hardware trend that points to parallelism as the mainline computer architecture
Data Parallelism
High Performance
Code Tuning
Validation
The underlying intuition behind the HPF computational model is that the distribution of the data determines the distribution of the computation. The main mechanism for achieving parallelism in HPF is its data partition annotation. Distribution of control is derived from the data distribution using the owner-computes rule: each assignment is executed by the processor(s) that own the assigned variable. (However, a compiler may relax the owner-computes rule in order to improve performance.)
An HPF directive has the form of a comment line:
!HPF$ hpf-directive
A goal of the design is that HPF directives should be consistent with Fortran 90 syntax in the following strong sense: if HPF were to be adopted as a part of a future Fortran standard, the only change necessary to convert an HPF program would be to remove the comment character and directive prefix from each directive. |
There is a clear separation between directives that serve as specification statements and directives that serve as executable statements (in the sense of the Fortran standard). Specification statements are carried out on entry to a program unit; after these, executable statements are carried out.
The model of the allocation of data objects to processor memories is a two-level mapping of data objects to abstract processors. Data objects (typically array elements) are first aligned with an abstract indexing space called a template; a template is then distributed onto a rectilinear arrangement of abstract processors. The implementation then uses the same number, or perhaps some smaller number, of physical processors to implement these abstract processors. This mapping of abstract processors to physical processors is implementation dependent.
!HPF$ TEMPLATE B (N, N)
!HPF$ ALIGN X3 (I, J) WITH B (I, J-1)
!HPF$ ALIGN X4 (I, J) WITH B (I-1, J+2)
!HPF$ TEMPLATE A (N, N), B (N, N), C (N, N)
!HPF$ DISTRIBUTE A (BLOCK, *)
!HPF$ DISTRIBUTE B (*, BLOCK)
!HPF$ DISTRIBUTE C (BLOCK, BLOCK)
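As a minimal sketch of how such directives combine with the owner-computes rule (the arrays X and Y, the template T, and the assignment are illustrative assumptions, not taken from the original foils):

      REAL X (N, N), Y (N, N)
!HPF$ TEMPLATE T (N, N)
!HPF$ ALIGN X (I, J) WITH T (I, J)
!HPF$ ALIGN Y (I, J) WITH T (I, J)
!HPF$ DISTRIBUTE T (BLOCK, BLOCK)
      X = X + Y

Each processor executes the assignment only for the block of X it owns; because Y is aligned identically with T, the update involves no communication.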
The final draft of the HPF proposal is available via anonymous ftp from
E-mail servers
HPFF mailing list
Alpha version: demonstrated at SUPERCOMPUTING '92 in Minneapolis, MN
Now running on nCUBE2, iPSC/860 and a network of SUN workstations
Beta release: June 1993
Alliance with industry to develop a commercial compiler for HPF, optimized for specific architectures
Next target: heterogeneous networks
High Performance Fortran is only suitable for data parallel problems whose data structure is an array.
But complex systems can be complicated!
For instance, Image Processing, Data Fusion, and Multidisciplinary Design have several modules.
Si: Satellite with image processing, Kalman filter, and stereo match
FCi: Fire Control platform with loosely synchronous target-weapon pairing algorithm
E: Overall asynchronous expert system
The analogy between metaproblems and metacomputers
Can use teraflop computers to make such "games" more realistic.
Hierarchy of Complexity
Virtual Class, e.g. weather or cortex model
Virtual Classroom
Virtual Laboratory: realistic SIMCITY; combines several classes
Virtual School ....
Experience virtual classes ---> school with "Virtual Reality"
Complex System Simulation requires
Possible Implementation
Basic Features
Highlights
Flaws
Forms of parallelism in the AVS model:
NPAC Approach
modern parallel systems (CM-5, nCUBE2, DECmpp, Intel Touchstone)
High Performance Fortran and Concurrent C++ software environment (compilers, interpreters, runtime support)
parallel Oracle + SGI front-ends
expert system (RT/AI from Intellisys, Inc.)
clusters of DEC/IBM/SUN workstations on the Ultra-Net / FCS / HIPPI / ATM (?) based gigabit/sec LAN
visualization front-ends (AVS/Explorer, Khoros/apE)
multimedia front-ends
VR front-ends (BOOM, Convolvotron, DataGloves, PROvision and Sense8 systems, new consumer VR products)
MOVIE as integration/infrastructure software
Use Complex Systems to describe
Problems
Computers
Networks
And their Interaction
A hierarchy of mapping problems
We would like to optimize the overall mapping
Both problems and computers are
Collections of homogeneous or heterogeneous basic entities with some sort of static or dynamic connection between them
Modeling involves mapping one "complex problem" into another
Simulation or parallel computing involves the map: Complex Problem ---> Complex Computer
What parameters of these two underlying complex systems control the effectiveness of the map?
Execution time - main focus of HPCC community?
User happiness - main focus of software engineering community
2 Different types of Mappings in Physical Spaces
Both are static
Different types of Mappings -- a very dynamic case without any underlying Physical Space
c) Computer Chess with dynamic game tree decomposed onto 4 nodes
All successful parallel computers with:
This is just the mapping of the data ("spatial domain") of the complex problem onto the spatial structure of the complex computer, whether the architecture is SIMD or MIMD, distributed or shared memory
Space is either
or at a different resolution
Time is either
or at a different resolution (on CISC computers)
(The correspondence implied by the theory of complex systems)
Problems have a data domain - "SPACE"
and the computation involves work done on the elements of this domain; label this work by "TIME"
Examples
e.g. on a sequential computer, all aspects of the problem are mapped onto "TIME" in the computer
Seismograph: time dependence of an earthquake mapped onto the spatial extent of the recording paper.
High Performance Fortran
Space (elements of arrays A, B) in the problem is mapped into space (i.e. arrays preserved) in the Virtual Problem
---> Compiler can recognize data parallelism
---> Good parallel performance on all machines
Fortran77: DO 1 I = 1, 10000
Space in the problem is mapped into time (successive iterations of the DO loop) in the Virtual Problem
---> Compiler has to "undo" this to recognize parallelism, and generally fails
---> Good languages preserve problem architecture
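A minimal sketch of the contrast (the loop body A(I) = B(I) is an assumed illustration, not from the original foils):

HPF - space preserved, data parallelism explicit:
      A (1:10000) = B (1:10000)
Fortran77 - space serialized into time:
      DO 1 I = 1, 10000
         A (I) = B (I)
    1 CONTINUE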
d = geometric dimension for local problems
d ~ 2 for computer chips (Rent's Rule)
d = 1 for ALL long range force problems - whatever their geometric dimension
Homogeneous Multicomputer
Purely Spatial Decomposition
Both distributed and hierarchical memory need DATA LOCALITY in memory references
Hierarchical memory ----> "Time"
Distributed memory ----> "Space"
All overheads proportional to 1/[space-time volume]^(1/d)
with space and time decomposition (cf. BLAS-3 for matrices); see the sketch below
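A minimal sketch of such blocking in the BLAS-3 spirit (the routine name, the block size NB, and the loop structure are illustrative assumptions): each NB x NB block is reused many times while it sits in fast memory, so references are local both in "space" (the block) and in "time" (its reuse).

subroutine blkmm(a, b, c, n, nb)
  ! Sketch: blocked matrix multiply C = C + A*B for data locality.
  implicit none
  integer, intent(in) :: n, nb
  real, intent(in)    :: a(n,n), b(n,n)
  real, intent(inout) :: c(n,n)
  integer :: i0, j0, k0, i, j, k
  do j0 = 1, n, nb                        ! loop over blocks ("space")
     do k0 = 1, n, nb
        do i0 = 1, n, nb
           do j = j0, min(j0+nb-1, n)     ! work inside one block
              do k = k0, min(k0+nb-1, n)  ! stays in fast memory
                 do i = i0, min(i0+nb-1, n)    ! ("time" locality)
                    c(i,j) = c(i,j) + a(i,k) * b(k,j)
                 end do
              end do
           end do
        end do
     end do
  end do
end subroutine blkmm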
High Performance Fortran expresses data locality ==> will improve performance even on sequential machines
Onto 8 Nodes
Problem is to find MAP in
DISTRIBUTION (MAP) in FortranD/HPF (High Performance Fortran)
A user (compiler) generated data decomposition
User (compiler) supplies:
What is to be distributed
"Graph" - algorithmic connection between entities to be distributed
These entities are:
Label data points (members of the problem spatial domain) by i
Each i ---> Particle moving in a space with discrete points (number of points in space = number of nodes in machine) and with topology defined by the interconnection of the nodes
Let the problem be defined by parameters:
Need ground state of H
from Chrisochoides |
The "Particles" and "springs"/forces between them including effect of the cost of redistribution |
In a dynamic load balancing problem (time dependent system), simulated annealing will be performed concurrently on the same computer ensemble used to simulate system. |
Suppose "optimal" annealing strategy used. There will be a minimum temperature TS to which it will be possible to "cool" system before time spent annealing is longer than gain due to improved load balancing. |
Note TS is independent of speed of computer - it is a property of system being simulated |
Block Scattered and Domain Decomposition Distributions Illustrated
Note the scattered position is independent of time
Note domain decomposition can end up worse than block scattered after averaging
The particle-process analogy does not apply to:
Asynchronous problems
Loosely synchronous problems where fundamental entities move between nodes on a microscopic time scale (such problems can be called microscopically dynamic)
Example - Global sums as used in
where I and J are distributed over the nodes of the hypercube
Originally we used the analogy
Now generalize this to
In this new formalism our loosely synchronous, microscopically static problems are represented by
Particles p1 and p2 interact with an attractive "communication force", where this force is carried by messages (packets) which are the quanta of the force
So loosely synchronous, microscopically static problems can be described by strings that vary slowly with time, and it is sufficient to slice, every now and then, at a fixed time and consider them as particles
The true problem is to minimize the length in the time direction of the string of maximum length. This can be treated directly in "full string dynamics", but here we will follow the particle analogy and replace this minimax problem by a least squares formulation
Again replace exact constraints (e.g. a node can only do one or a few things at a time) by least squares constraints
New terms in the energy (in field theory language) corresponding to the creation or annihilation of strings
One uses as dynamical variables the complete strings {xp(t)} or, probably better, specifies them with a neural formalism inspired by Hopfield and Tank's traveling salesperson work:
Sp(x,t) = 0 if p is not at x at time t
Sp(x,t) = 1 if p is at x at time t
with constraint Σ Sp(x,t) = 1 (one string in field)
The associated dynamics is that of interacting strings, with the usual repulsive and attractive forces plus creation and annihilation.
Cellular Automata
Physical Computation
Examples of Physical Computation
Physical (pertaining to Nature) optimization and associated field
These can be compared with
Work of Nashat Mansour
A system is specified by a string of (binary) digits and evolves with a set of transitions generated by Genetic Operators
Parallel algorithms for
Large problems (the "real world") require multiscale algorithms, just as multigrid is faster than Gauss-Seidel for large P.D.E.'s (take --> log Nnode, d=2,3)
This compares parallel Genetic (PGA), Simulated Annealing (PSA), Neural Network (PNN) and a wonderful heuristic (PRSB - recursive spectral bisection) algorithms
This comes from the work of Nashat Mansour
What is the shape of the objective function?
Do you need the
Local minima in "physics" problems are closely correlated with the true minimum
"Computer Science" grand challenges in optimization might look different
Minimize E = E(parameters y)
y can be continuous or discrete, or a mix
Introduce a fake temperature T and set β = 1/T
Probability of state y = exp(-βE)/Z, where Z = Σy exp(-βE(y)) is the partition function
As β → ∞, the minimum E (ground state) dominates
Find y(T) by minimizing the Free Energy
F = E - TS = -(1/β) log Z
Annealing tracks global minima by initializing the search at temperature T with the minima found at temperature T + dT
Movement at a given temperature
Global minimum moving with decreasing temperature
Simulated annealing: find y(T) by Monte Carlo as a mean over configurations at that temperature (see the sketch after this list)
Neural networks: y is discrete. Find y by mean field approximation
Elastic net: y is discrete, but use improved mean field including some or all constraints
Deterministic annealing: leave important y out of the sum Σ. Find them by simple iterative optimization
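A minimal sketch of simulated annealing (an assumed illustration, not from the original foils: the toy energy E = -Σ si si+1 of a one-dimensional Ising chain and the geometric cooling schedule):

program anneal
  ! Sketch: Metropolis simulated annealing for a toy 1-D Ising chain.
  implicit none
  integer, parameter :: n = 100
  integer :: s(n), i, step, itemp
  real :: t, de, r
  t = 10.0                             ! start hot
  call random_seed()
  do i = 1, n                          ! random +/-1 starting state y
     call random_number(r)
     s(i) = merge(1, -1, r > 0.5)
  end do
  do itemp = 1, 50                     ! track the minimum as T falls
     do step = 1, 1000                 ! Monte Carlo moves at fixed T
        call random_number(r)
        i = 2 + int(r * (n - 2))       ! pick an interior site
        de = 2.0 * s(i) * (s(i-1) + s(i+1))   ! energy change of a flip
        call random_number(r)
        if (de <= 0.0 .or. r < exp(-de / t)) s(i) = -s(i)   ! Metropolis
     end do
     t = 0.95 * t                      ! cool: beta = 1/T increases
  end do
  print *, 'final energy =', -sum(s(1:n-1) * s(2:n))
end program anneal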
General, Travelling Salesman, Quadratic Assignment
Scheduling
Track Finding
Robot Path Planning, Navigation
Character Recognition
Image Analysis
Clustering, Vector Quantization (coding), Electronic Packaging
A new method, Simulated Tempering, by Marinari (Rome, Syracuse) and Parisi (Rome)
Conventional Monte Carlo methods do not work well for the Random Field Ising Model (RFIM)
Spins si live on a 3D lattice
si = +/-1 to be determined
Fixed magnetic field
hi = |h| ri
|h| = 1
ri = +/-1 with random probability 1/2
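For reference, the standard RFIM energy these definitions enter (supplied here for clarity, with nearest-neighbour coupling J = 1 assumed; it is not spelled out on the foils):
E = - Σ<ij> si sj - Σi hi si, where the first sum runs over nearest-neighbour pairs of the lattice.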
Add β to the list of dynamical variables
β chosen out of {βm} with probability given by
β1 < β2 < β3 < β4 < β5 (increasing)
Now the system can move up in temperature and jump barriers
Don't have to decide on an a priori annealing schedule, i.e. the list of βm and the computer time to be spent at each βm.
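A minimal sketch of the tempering move on the temperature index (an assumed illustration following Marinari and Parisi: the joint weight exp(-βm E + gm) with tunable constants gm, and all names below, are supplied here, not taken from the foils):

subroutine temper_step(m, e, nbeta, beta, g)
  ! Sketch: one Metropolis update of the inverse-temperature index m,
  ! with beta a dynamical variable of weight exp(-beta(m)*E + g(m)).
  implicit none
  integer, intent(inout) :: m                ! current index into {beta_m}
  real, intent(in)       :: e                ! current energy E(y)
  integer, intent(in)    :: nbeta
  real, intent(in)       :: beta(nbeta), g(nbeta)  ! g: tunable weights
  integer :: mnew
  real :: r
  call random_number(r)
  mnew = m + merge(1, -1, r > 0.5)           ! propose a neighbouring beta
  if (mnew < 1 .or. mnew > nbeta) return
  call random_number(r)
  ! Metropolis accept: the system may move up in temperature and so
  ! jump barriers before cooling again.
  if (r < exp(-(beta(mnew) - beta(m)) * e + g(mnew) - g(m))) m = mnew
end subroutine temper_step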