Full HTML for basic foilset: Overview of HPC for Physical Sciences

Given by Geoffrey C. Fox at the Mardi Gras Conference on Concurrent Computing in Physical Sciences on February 18, 1993. Foils prepared October 22, 1997.

Summary of Material


This talk overviews the status of HPCC architectures
Software approaches, focussing on HPF, together with an application analysis
Problem architectures, load balancing, and the complex system approach

Table of Contents for full HTML of Overview of HPC for Physical Sciences


1 Overview of High Performance Computing for Physical Sciences
2 Performance of Computers as a Function of Time
3 Performance of QCD Machines
4 When will Parallel Computing Take Over ?
5 Parallelism Implies
6 Program in Computational Science Implemented within current academic framework
7 We have learnt that Parallel Computing Works !
8 METHODOLOGY
9 Sideways View of Concurrent Computing Mappings
10 Sample uses of Concurrent Computers
- 4 Nodes

11 Parallel Computing Trends to Year 2000
12 39 National Grand Challenges
13 39 National Grand Challenges (II)
14 39 National Grand Challenges (III)
15 39 National Grand Challenges (IV)
16 Most Grand Challenges are
17 ACTION Motivation
18 Applications of Interest to Industry IA
19 Applications of Interest to Industry IB
20 Applications of Interest to Industry IC
21 Pluses and Minuses of Parallel Computing
22 P C E T E C H - A Result of
23 The Life History of an HPC Idea
24 Core Enabling Software Technologies
25 Core Enabling Algorithms and Components
26 Portable Scalable Languages
27 Problem Architectures
28 Software Problem Architecture Interactions
29 Why build on existing languages - especially Fortran / C
30 Fortran( C ) Plus Message Passing
31 Fortran( C ) Plus Message Passing
32 Data Parallel Fortran
33 Data Parallel Fortran ( C, C++, LISP, ADA )
34 The Origins of High Performance Fortran
35 Strategy for HPF
36 Model for Scalable Languages
37 Dynamically Triangulated Random Surfaces
38 Update Strategies for DTRS
39 N=36 Wireframe DTRS
40 N=144 Lambda =1.5 Colored DTRS
41 N=144 Lambda=1 Colored DTRS
42 Computational Issues and Code Performance Optimization
43 Sequential Computer Architecture
44 Performance of Random Surface Code
45 Parallel Algorithms
46 Parallel Grid Generation
47 What is Fortran 90?
48 HPF Language Features - 1
49 HPF Language Features - 2
50 FORTRAN-90D
The First Implementation of HPF
(NPAC, Syracuse University)
Current Status

51 Fortran90D Performance on Gaussian Elimination
52 Gaussian Elimination
Example of Fortran90D Source Code

53 Fortran90D Interface
54 HPF Directives
55 Data Alignment
56 Data Distribution
57 FORALL
58 DO INDEPENDENT
59 HPF Library
60 Intrinsic Library
61 HPF Library
62 Fortran 90 Local Routine Intrinsics
63 Does HPF need a different type of compiler ?
64 HPF Interpreter
65 Benchmarking Set ( Fortran90D and Fortran 77 )
66 HPF/FORTRAN-90D Benchmark Suite
67 The final draft of the HPF Language Specification is version 1.0 - DRAFT, dated January 25, 1993
68 Request for Public Comment on High Performance Fortran
69 Application Structure Versus Fortran90D Features
70 What applications does HPF support?
71 Current HPF can do I .....
72 Current HPF can do II.....
73 HPF can express using FORALL
74 Classic Irregular Mesh Application
75 HPF Can Express
76 Simulation of Spin Models of Magnets
77 New Monte Carlo Algorithms
78 Magnetization for Random Field Ising Model -- Metropolis
79 Magnetization for Random Field Ising Model -- Simulated Annealing
80 Wolff cluster (bands shown in yellow) for 3 state Potts model at Tc
81 Swendsen-Wang clusters (boundaries shown in black) for 3 state Potts model at Tc
82 Cluster Algorithm versus Metropolis
83 Parallel Algorithms
84 Parallel Cluster Algorithms
85 MIMD algorithm on nCUBE-1
86 SIMD algorithms on CM-2
87 Autocorrelation Plots for Cluster Algorithms
88 HPF probably cannot express well
89 I do not know how to express - let alone optimize -
90 Late breaking results
91 Large N-Body Calculations (Quinn, Salmon, Warren)
92 Hierarchical Decomposition
93 Ncube Speed Up of Barnes Hut Algorithm
94 17x10^6 Body Simulation
Diameter 250 Mpc
Quinn, Salmon, Warren, Zurek

95 The largest "galaxy" halo (137,000 bodies) from the 8.8M body simulation
(Warren, Fullagar, Quinn, Zurek)

96 8M bodies - 10 Mpc diameter
Final state with ~700 resolved "galaxies"
(Warren, Quinn, Zurek)

97 Smooth Particle Hydro simulation of the collision of two "main sequence stars." 137,000 bodies (Warren,Davies)
98 Expressing Problem Structure in Language
99 The map of Problem ---> Computer is performed in two or more stages
100 From Problem to Machine Space as a Mapping
101 Different Granularities of Decomposition I
102 Different Granularities of Decomposition II
103 Compiler/Interpreter Tradeoffs at Different Levels of Granularity
104 The Mapping of Heterogeneous MetaProblems onto Heterogeneous MetaComputer Systems
105 Mapping of Irregular Grid
106 Finding General Maps for FortranD/HPF
107 Three Physical Optimization Methods for Allocating Data to Multicomputer Nodes
108 Finite Element Mesh
109 Graph Contraction
110 Some Results of Load Balancing Studies I
111 Some Results of Load Balancing Studies II
112 Physics Analogy for Load Balancing
113 Complex System SHLSoft governed by Hamiltonian = Execution Time
114 PHYSICS ANALOGY FOR STATIC AND DYNAMIC LOAD BALANCING
115 Definition of Temperature for a Complex System
116 Particle dynamics problem on a four node system
117 Energy Structure in Physics Analogy with Multiple Minima
118 Phase Transitions in Physical model -- Scattered versus Domain Decompositions




HTML version of Basic Foils prepared October 22 1997

Foil 1 Overview of High Performance Computing for Physical Sciences

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Geoffrey C. Fox
gcf@nova.npac.syr.edu
315.443.2163
Mardi Gras Conference
Concurrent Computing in the Physical Sciences
Louisiana State University
Baton Rouge, LA
February 18, 1993

HTML version of Basic Foils prepared October 22 1997

Foil 2 Performance of Computers as a Function of Time

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 3 Performance of QCD Machines

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 4 When will Parallel Computing Take Over ?

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
This is a phase transition
Needs headroom (Carver Mead) which is large (factor of 10 ?) due to large new software investment
nCUBE-1 and CM-2 were comparable in cost performance to conventional supercomputers
  • Enough to show that "Parallel Computing Works"
  • Not enough to take over!
Intel Paragon, CM-5, DECmpp (MP-2), IBM SP-1 have enough headroom?

HTML version of Basic Foils prepared October 22 1997

Foil 5 Parallelism Implies

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Different machines
New types of computers
New libraries
Rewritten Applications
Totally new fields able to use computers .... ==> Need new educational initiatives: Computational Science
Will be a nucleus for the phase transition
and accelerate use of parallel computers in the real world

HTML version of Basic Foils prepared October 22 1997

Foil 6 Program in Computational Science Implemented within current academic framework

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 7 We have learnt that Parallel Computing Works !

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Data Parallelism - universal form of scaling parallelism
Functional Parallelism - Important but typically modest speedup. - Critical in multidisciplinary applications.
On any machine architecture
  • Distributed Memory MIMD
  • Distributed Memory SIMD
  • Shared Memory - this affects programming model
This affects generality: SIMD ~ 50% of academic problems but < 50% of commercial

HTML version of Basic Foils prepared October 22 1997

Foil 8 METHODOLOGY

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Domain decomposition is simple, but general and extensible to many more nodes
All successful concurrent machines with
  • Many nodes
  • High performance (this excludes Dataflow)
have obtained parallelism from "Data Parallelism" or "Domain Decomposition"
Problem is an algorithm applied to a data set. Obtain concurrency by acting on the data concurrently. The three architectures considered here differ as follows:
  • MIMD Distributed Memory - Processing and Data Distributed
  • MIMD Shared Memory - Processing Distributed
  • SIMD Distributed Memory - Synchronous Processing on Distributed Data

HTML version of Basic Foils prepared October 22 1997

Foil 9 Sideways View of Concurrent Computing Mappings

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 10 Sample uses of Concurrent Computers
- 4 Nodes

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 11 Parallel Computing Trends to Year 2000

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Hardware trends imply that all computers (PC's ---> Supercomputers) will be parallel by the year 2000
  • Up to now parallel computers have come from small start-up companies (except the Intel Supercomputer Division)
  • Recently Cray, Convex (HP), Digital, IBM have announced massively parallel computing initiatives
  • Several architectures, but only one - the distributed memory MIMD multicomputer - is known to scale from one to very many processors
Software is the challenge and could prevent/delay the hardware trend that suggests parallelism will be a mainline computer architecture
  • first we must get systems software correct
  • then applications software

HTML version of Basic Foils prepared October 22 1997

Foil 12 39 National Grand Challenges

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 13 39 National Grand Challenges (II)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 14 39 National Grand Challenges (III)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 15 39 National Grand Challenges (IV)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 16 Most Grand Challenges are

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Partial Differential Equations
Particle Dynamics and Multidisciplinary Integration
Image Processing
Some:
Visualization
Artificial Intelligence
Not Much:
Network Simulation
Economic (and other complex system) modeling
Scheduling
Manufacturing
Education
Entertainment
Information Processing
  • BMC3IS (Command & Control in military war)
  • Decision Support in global economic war

HTML version of Basic Foils prepared October 22 1997

Foil 17 ACTION Motivation

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Industrial and Government Problems
  • Are a factor of 10 to 100 larger (up to 10^6 lines)
    • Have a longer lifetime
    • Less accessible expertise
  • The parallel hardware must be much better (factor of >ten "headroom" in Carver Mead's model) than sequential competition. This will happen in the next two years.
  • Must have better software to port and maintain large parallel codes -- can only convert once!
    • Are software issues the same for academic and industrial production codes?
    • Probably not? So we need to integrate computer science (software) research and development with experience from industrial codes.

HTML version of Basic Foils prepared October 22 1997

Foil 18 Applications of Interest to Industry IA

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 19 Applications of Interest to Industry IB

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 20 Applications of Interest to Industry IC

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 21 Pluses and Minuses of Parallel Computing

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
We know parallel versions for basic algorithms currently being used in most kernels
  • There are some counter-examples:
    • 1) Parallelize incomplete LU factorization for irregular sparse matrices
    • 2) Develop a new neural network to predict and hence avoid hurricanes, or avoid computer failures ....
    • 3) Optimal preconditioning for 3D partial differential equations is not known in many applications --- to be expected, as there is little experience
  • Can't use a computer of performance 10P without substantial experience on a computer with performance P

HTML version of Basic Foils prepared October 22 1997

Foil 22 P C E T E C H - A Result of

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
The study of implementation of ACTION
  • What can a small enterprise like ACTION offer much bigger industries?
Study of role of applications in CRPC
  • What can a relatively small group of computer scientists (CRPC) offer the larger national HPCCI ?
Study of relation of CRPC and NSF Supercomputer Centers which are again much larger
  • ==> Need to concentrate in areas where CRPC (ACTION)'s unique capabilities have high value
P C E T E C H includes and expands technology development components of CRPC and ACTION

HTML version of Basic Foils prepared October 22 1997

Foil 23 The Life History of an HPC Idea

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 24 Core Enabling Software Technologies

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
PVM, Express, Linda
Einstein (CSW)
ISIS (Cornell)
High Performance Fortran Compiler
High Performance C, C++ Compiler
HPF Extensions - PARTI
Parallel / Distributed Computing Runtime Tools
ADIFOR
AVS and Extensions
MOVIE (Syracuse)
High Performance Fortran Interpreter
Televirtuality / Virtual Reality OS
Parallel Database - Oracle 7.0
OO Database
  • Integration Methodology for above software
  • Heterogeneous Network Target

HTML version of Basic Foils prepared October 22 1997

Foil 25 Core Enabling Algorithms and Components

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Mesh Generation
SCALAPACK
Sparse Matrix Solvers - Templates and libraries (Direct and Iterative)
Particle Dynamics Kernels - Templates and Libraries ( O(N^2) to fast multipole)
Optimization Methodology and Templates
  • Linear programming
  • Non-linear programming
Scheduling (neural-net, parallel) Templates

HTML version of Basic Foils prepared October 22 1997

Foil 26 Portable Scalable Languages

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
We may have to rewrite our code, but we want the resulting program to run on "All" current and future (anticipated) machines
No compelling new languages: e.g. OCCAM does not solve enough problems to warrant adoption
Adapt existing languages: Fortran, C++, ADA, C, LISP, PROLOG, BASIC .......... COBOL ?
Can map onto individual sequential or parallel computers or networks of these (distributed computing)

HTML version of Basic Foils prepared October 22 1997

Foil 27 Problem Architectures

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 28 Software Problem Architecture Interactions

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 29 Why build on existing languages - especially Fortran / C

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Experience of users
Good sequential and node compilers exist
Migration of existing codes
Hard to build a new language, as one needs all the features/libraries of Fortran / C implemented well. This is why one uses Fortran 90 and not APL
Fortran and C are good (OK) for a class of applications

HTML version of Basic Foils prepared October 22 1997

Foil 30 Fortran( C ) Plus Message Passing

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Natural MIMD model of programming
Used in vast majority of current successful MIMD programming
Each node has a program which controls and calculates its data (owner-computes "rule"):
    Do i run over local data
      CALL READ (for non-local data)
      CALCULATE ALL i's DATA
In simplest form one program per node of computer
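To make the pattern above concrete, here is a minimal SPMD sketch of the owner-computes rule for a one-dimensional array split across nodes. MPI is used purely as a stand-in for the Express/PICL/PVM style message passing of the period (MPI postdates this talk); the array size and the neighbour exchange are illustrative only.
    program node_update
      ! Minimal sketch: each node owns 100 points of a periodic 1-d array,
      ! reads the non-local boundary data from its neighbours, then updates
      ! only the data it owns (the owner-computes "rule").
      use mpi
      implicit none
      integer :: ierr, rank, nproc, left, right, i
      real :: u(0:101), unew(1:100)       ! owned points 1..100 plus two ghost points
      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
      u = real(rank)
      left  = mod(rank - 1 + nproc, nproc)
      right = mod(rank + 1, nproc)
      ! "CALL READ (for non-local data)": exchange boundary values with neighbours
      call MPI_Sendrecv(u(100), 1, MPI_REAL, right, 0, u(0),   1, MPI_REAL, left,  0, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
      call MPI_Sendrecv(u(1),   1, MPI_REAL, left,  1, u(101), 1, MPI_REAL, right, 1, &
                        MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
      ! "CALCULATE ALL i's DATA": each node computes only the points it owns
      do i = 1, 100
         unew(i) = 0.5 * (u(i-1) + u(i+1))
      end do
      u(1:100) = unew
      call MPI_Finalize(ierr)
    end program node_update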

HTML version of Basic Foils prepared October 22 1997

Foil 31 Fortran( C ) Plus Message Passing

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Advantages
  • Portable to MIMD distributed and shared memory machines - should scale to future machines with some difficulties (overlapping communication with itself and calculation)
  • Available now
  • Will be industry standards - Express, PICL, Linda now
  • "All Problems" can be written in it
Disadvantages
  • Can be "hard-work" for user inserting communication calls
  • Optimizations not portable
  • Only MIMD

HTML version of Basic Foils prepared October 22 1997

Foil 32 Data Parallel Fortran

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
e.g. Program represented as a sequence of functions operating on arrays
    DIMENSION A(100,100), B(100,100), C(100,100)
    A = B + C
  • A = B + Shift in x, y (C)
  • FORALL i, j : A(i,j) = B(i,j) + C(i-1,j) + ...... in APL or Fortran 90 - like syntax, or with explicit loops
All data owned by a particular processor
Owner-computes rule: for an expression A(i,j) = .........., the processor holding A(i,j) calculates the expression, after communicating any needed data if the r.h.s. contains data not owned by that processor.
Similar concepts in C, C++, LISP, ADA .......
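A minimal sketch of the array style described above, in standard Fortran 90/95 only (the array names and sizes are the illustrative ones from the foil); these whole-array statements are exactly what a data parallel compiler distributes across processors under the owner-computes rule.
    program data_parallel_demo
      implicit none
      real, dimension(100,100) :: A, B, C
      integer :: i, j
      B = 1.0
      C = 2.0
      ! Whole-array expression: conceptually every element is computed at once
      A = B + C
      ! Shifted combination written once with CSHIFT (circular shift) ...
      A = B + cshift(C, shift=-1, dim=1)
      ! ... or with FORALL over explicit indices (here only on interior rows)
      forall (i = 2:100, j = 1:100) A(i,j) = B(i,j) + C(i-1,j)
      print *, 'A(50,50) =', A(50,50)
    end program data_parallel_demo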

HTML version of Basic Foils prepared October 22 1997

Foil 33 Data Parallel Fortran ( C, C++, LISP, ADA )

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Advantages
  • Very portable and scalable to SIMD and MIMD
  • Should be able to eventually handle ~all synchronous and loosely synchronous problems including ones that only run well on MIMD
  • Industry Standard will be adopted
    • "High Performance Fortran"
    • hope for consensus ~December 1992
    • include CMFortran, FortranD .....
Disadvantages
  • Need to wait for good compiler
  • Not all problems can be expressed in it

HTML version of Basic Foils prepared October 22 1997

Foil 34 The Origins of High Performance Fortran

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 35 Strategy for HPF

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Program in an SPMD data parallel style
  • User writes code with (conceptually) a single thread of control and globally accessible data
Program is annotated with assertions giving information about desired data locality and/or distribution
Compiler generates code implementing data and work distribution from the user-written SPMD application
  • For MIMD - implementation is multi-threaded message passing with local data; generated synchronization and send/receive code
  • For SIMD - implementation is parallel code with communication optimized by compiler placement of data
  • For vectors - implementation is vectorized code optimized for the vector units
  • For RISC - implementation is pipelined superscalar code with compiler generated cache management

HTML version of Basic Foils prepared October 22 1997

Foil 36 Model for Scalable Languages

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 37 Dynamically Triangulated Random Surfaces

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
String theories are candidates for a TOE (Theory of Everything). They provide a plausible quantum theory of gravity.
Basic objects are tiny one dimensional "strings" (rather than points). Dynamics described by a two dimensional space-time world-sheet (rather than world line).
Calculations involve integrating over all possible world-sheets, i.e. all 2-d surfaces embedded in some higher dimensional space (3-d, 4-d, 26-d, ....)
Done numerically by a Monte Carlo integration over different triangulations of the surface, produced dynamically using the Metropolis algorithm.
Current research centers on finding good discretizations of the continuum theories, and trying to find critical points with divergent correlation lengths, so the details of the discrete lattice structure become irrelevant and we can extract continuum results (c.f. lattice QCD)
Interesting critical points are phase transitions between a smooth phase and a crumpled phase of the surface. The order and critical exponents of this crumpling transition are still uncertain.
We have done the largest simulations so far on 2-d systems, but these are only 576 node meshes. Critical slowing down is a major problem - need better algorithms.
The same techniques are used to model real fluctuating surfaces, e.g. cell membranes, lipid bilayers, phase boundaries.

HTML version of Basic Foils prepared October 22 1997

Foil 38 Update Strategies for DTRS

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 39 N=36 Wireframe DTRS

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 40 N=144 Lambda =1.5 Colored DTRS

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 41 N=144 Lambda=1 Colored DTRS

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 42 Computational Issues and Code Performance Optimization

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
The program for random surfaces is very irregular, with many linked lists and pointer trees to be traversed. The original code performed very poorly. Completely rewriting and simplifying the code gained a substantial speed-up. It still performed poorly on some RISC processors such as the Intel i860.
Optimal use of cache memory is vital for processors such as the i860, which have a small cache memory and for which the time to access external memory is much greater than the time to perform a floating point operation.
Due to the dynamic nature of the code, the neighbors of a site which are used for the update of that site are constantly changing. This means we cannot make good use of the cache unless we keep a new array which constantly readjusts pointers to neighbors, so that data for neighboring sites is kept in neighboring regions of memory.
Other optimizations on i860 and use of non-IEEE arithmetic gave us a substantial improvement, but still nowhere near 80 Mflops peak performance!
Program performs much better on more recent processors (IBM RS/6000, Hewlett-Packard).

HTML version of Basic Foils prepared October 22 1997

Foil 43 Sequential Computer Architecture

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 44 Performance of Random Surface Code

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 45 Parallel Algorithms

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Independent Parallelism
  • Have so far used independent parallelism on MIMD machines and workstation networks
  • Due to strong critical slowing down this can only be used for small meshes, since producing an independent mesh for each processor takes so long. Need to parallelize over a single simulation.
Parallel Random Surfaces
  • Load Balancing - Poor load balancing on SIMD machines, since the number of links per node can range from 3 to 20.
  • Dynamic Domain Decomposition - The dynamic mesh means that neighboring points will change during the simulation and eventually be on distant processors, requiring substantial non-local communication. Need dynamic load balancing to preserve local communication (c.f. the dynamic adjustment of arrays to keep neighboring points in fast cache memory on sequential machines).
  • Detailed Balance - The most difficult problem is to preserve detailed balance. A site cannot be updated at the same time as the other sites on which its value depends. For a regular 2-d grid, use a black/red checkerboard update. For a dynamic, irregular grid, identifying independent sites is a time-consuming problem which must be redone before every iteration.
  • Irregular Adaptive Grids for PDEs - This problem has much in common with parallel algorithms for solving PDEs on irregular, adaptive grids (e.g. CFD)
  • Implementation - Currently developing parallel code for CM-5 and Intel Delta

HTML version of Basic Foils prepared October 22 1997

Foil 46 Parallel Grid Generation

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 47 What is Fortran 90?

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
FORTRAN 77 is entirely contained within Fortran 90
Language goal is to "modernize Fortran, so that it may continue its long history as a scientific and engineering programming language"
Major language additions include:
  • array operations
  • numeric computation facilities
  • parameterized intrinsic data types
  • modularization of data and procedures
  • pointer facilities
  • many more intrinsic functions
A significantly larger language than FORTRAN 77
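A short sketch touching several of the additions listed above - array operations, modules, pointers and new intrinsics - in standard Fortran 90; the module and function names are invented for illustration.
    module stats_mod
      implicit none
    contains
      function rms(x) result(r)
        real, intent(in) :: x(:)
        real :: r
        r = sqrt(sum(x*x) / real(size(x)))   ! array operation plus new intrinsics
      end function rms
    end module stats_mod

    program f90_demo
      use stats_mod                          ! modularization of data and procedures
      implicit none
      real, target  :: samples(1000)
      real, pointer :: view(:)               ! pointer facility
      call random_number(samples)
      view => samples(1:500)                 ! pointer association, no copy made
      print *, 'rms of first half:', rms(view)
    end program f90_demo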

HTML version of Basic Foils prepared October 22 1997

Foil 48 HPF Language Features - 1

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Data Distribution Directives
  • locality of reference
  • TEMPLATE directive
  • ALIGN and REALIGN directives
  • DISTRIBUTE and REDISTRIBUTE directives
  • DYNAMIC directive
  • PROCESSORS and processor VIEW directives
Parallel Statements
  • FORALL statement and construct, PURE procedures
  • INDEPENDENT directive
Extended Intrinsic Functions and Standard Library
  • system inquiry functions
  • computational functions
  • distribution inquiry functions

HTML version of Basic Foils prepared October 22 1997

Foil 49 HPF Language Features - 2

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
EXTRINSIC Procedures
  • escape from global name space model
  • EXTRINSIC directive
  • Fortran 90 binding for EXTRINSIC interface
Parallel I/O Statements
  • none
Changes in Sequence and Storage Association
  • restrictions
  • SEQUENCE directive
In addition, a subset of HPF suitable for earlier implementation is defined

HTML version of Basic Foils prepared October 22 1997

Foil 50 FORTRAN-90D
The First Implementation of HPF
(NPAC, Syracuse University)
Current Status

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Alpha version: demonstrated at SUPERCOMPUTING '92 in Minneapolis, MN
  • one- and two-dimensional BLOCK, CYCLIC and BLOCK-CYCLIC distributions
  • FORALL statement
  • full run-time support for regular communications
  • PARTI (ICASE) primitives for irregular communications
  • intrinsic library
Now running on nCUBE2, iPSC/860 and network of SUN workstations
beta release: June 1993
alliance with industry to develop commercial compiler for HPF, optimized for specific architectures
next target: heterogeneous networks

HTML version of Basic Foils prepared October 22 1997

Foil 51 Fortran90D Performance on Gaussian Elimination

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 52 Gaussian Elimination
Example of Fortran90D Source Code

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 53 Fortran90D Interface

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 54 HPF Directives

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Directives take the syntactic form of a structured Fortran comment
A program that contains directives is expected to generate the same results whether directives are processed or not, assuming that the directives are used correctly
HPF directives are consistent with Fortran 90
  • if HPF were to be adopted as part of a future Fortran standard, only the comment character and the directive prefix would have to be removed from HPF directives
  • directives follow either free source form or fixed source form

HTML version of Basic Foils prepared October 22 1997

Foil 55 Data Alignment

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Specifies lowest-level abstract alignment of array elements
TEMPLATE is an abstract, very fine-grain virtual processor space
Template labels physically independent objects, e.g. gluons, fluid particles
Align degrees of freedom, e.g. gluon components, to template members
Use for machine-independent alignment between structures
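A minimal sketch of TEMPLATE and ALIGN in HPF directive syntax; since directives are structured comments, a plain Fortran 90 compiler simply ignores them. The names u, f and t are illustrative.
    program align_demo
      implicit none
      real :: u(1000), f(1000)
    !HPF$ TEMPLATE t(1000)             ! abstract, fine-grain index space
    !HPF$ ALIGN u(i) WITH t(i)         ! u(i) and f(i) are mapped to the same
    !HPF$ ALIGN f(i) WITH t(i)         ! template element, hence the same processor
      u = 1.0
      f = 2.0
      print *, 'aligned dot product =', sum(u*f)   ! needs no communication
    end program align_demo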

HTML version of Basic Foils prepared October 22 1997

Foil 56 Data Distribution

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Specifies coarse-grain partitioning of data
PROCESSORS is a coarse-grain processor grid, presumably in 1-1 correspondence with physical processors
Use for machine-dependent partitioning of data structures
Compiler will normally use "owner-computes rule"
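Continuing the alignment sketch above, a hypothetical DISTRIBUTE of the template onto an explicit PROCESSORS grid; BLOCK gives each processor a contiguous chunk, while CYCLIC would deal elements out round-robin. Again, the directives are comments to a plain Fortran compiler.
    program distribute_demo
      implicit none
      real :: u(1000), f(1000)
    !HPF$ TEMPLATE t(1000)
    !HPF$ ALIGN u(i) WITH t(i)
    !HPF$ ALIGN f(i) WITH t(i)
    !HPF$ PROCESSORS p(4)              ! coarse-grain grid, ideally 1-1 with physical processors
    !HPF$ DISTRIBUTE t(BLOCK) ONTO p   ! elements 1-250 on p(1), 251-500 on p(2), ...
      u = 1.0
      f = 2.0
      u = u + f        ! owner-computes: each processor updates the block of u it owns
      print *, sum(u)
    end program distribute_demo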

HTML version of Basic Foils prepared October 22 1997

Foil 57 FORALL

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Elemental array assignment statement (Similar to array expressions)
FORALL is not a general parallel construct!
Functions without side effects may be called from FORALL
Other functions may be allowed as well
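A minimal FORALL sketch (standard Fortran 95 syntax, also valid HPF): an elemental array assignment whose right-hand side uses index expressions that plain array syntax cannot state as directly. Array names and sizes are illustrative.
    program forall_demo
      implicit none
      real :: a(100,100), b(100,100)
      integer :: i, j
      b = 1.0
      ! Index-dependent assignment
      forall (i = 1:100, j = 1:100) a(i,j) = real(i + j)
      ! Nearest-neighbour average on the interior, evaluated as one elemental assignment
      forall (i = 2:99, j = 2:99) a(i,j) = 0.25 * (b(i-1,j) + b(i+1,j) + b(i,j-1) + b(i,j+1))
      print *, a(50,50)
    end program forall_demo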

HTML version of Basic Foils prepared October 22 1997

Foil 58 DO INDEPENDENT

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
An assertion that the iterations of the DO may be executed in any order or concurrently
If the assertion is false, the compiler is free to take any action it deems necessary
No restrictions on the loop body
No way to perform explicit synchronization
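A hypothetical sketch of the INDEPENDENT assertion: the permutation guarantees that no two iterations touch the same element of y, which is exactly the property the directive asserts and which the compiler does not verify.
    program independent_demo
      implicit none
      integer :: i, perm(1000)
      real :: x(1000), y(1000)
      call random_number(x)
      perm = (/ (1001 - i, i = 1, 1000) /)   ! a permutation, in general known only at run time
    !HPF$ INDEPENDENT
      do i = 1, 1000
         y(perm(i)) = 2.0 * x(i)             ! distinct perm(i) => iterations never collide
      end do
      print *, y(1)
    end program independent_demo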

HTML version of Basic Foils prepared October 22 1997

Foil 59 HPF Library

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Much of HPF's Parallel "Power" is buried in its library
Personal note: Fortran 90D motivated by similarities between Caltech message passing "collective" communication and F90 Intrinsics
18 Non-elemental (true array) Fortran 90 Intrinsics e.g. CSHIFT, SUM
7 new HPF Intrinsics e.g. Alignment, Template, Distribution inquiries
4 new HPF Reduction (combine) Functions
11 new HPF Combine-Scatter Functions
22 new HPF Parallel Prefix Functions
2 new HPF Parallel Sorting Functions
3 other new HPF Parallel Functions
7 HPF Local Intrinsics
74 library routines (==> ~1000 routines when written in clumsy F77)
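A small sketch of the "collective" flavour of the non-elemental Fortran 90 intrinsics mentioned above (SUM, MAXVAL, CSHIFT); under HPF such whole-array calls are the natural entry points into the parallel library. The array and its size are illustrative.
    program library_demo
      implicit none
      real :: phi(64,64)
      call random_number(phi)
      print *, 'global sum        =', sum(phi)
      print *, 'global maximum    =', maxval(phi)
      ! CSHIFT pairs each element with its neighbour one row up (circularly)
      print *, 'neighbour product =', sum(cshift(phi, 1, dim=1) * phi) / size(phi)
    end program library_demo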

HTML version of Basic Foils prepared October 22 1997

Foil 60 Intrinsic Library

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 61 HPF Library

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 62 Fortran 90 Local Routine Intrinsics

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 63 Does HPF need a different type of compiler ?

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
HPF makes much greater use of runtime optimizations than the traditional sequential or parallel compiler
Fortran 77:
      DO 1 I=1,N
    1 A(I) = FUNC(B(I), B(I +/- 1), C(I), C(I +/- 1), .....)
Traditionally the access to these elements is arranged in the COMPILER
HPF:
      A = FUNC(B,C)
Invoke an ALREADY OPTIMIZED parallel implementation of FUNC
In many cases, the HPF compiler need not know anything about parallelization except interfaces, data layout etc.
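A concrete instance of the contrast above, with illustrative names: the Fortran 77 loop spells out element-by-element access that the compiler itself must analyse, while the Fortran 90/HPF array statement hands the whole operation to an already optimized runtime routine.
    program runtime_demo
      implicit none
      integer, parameter :: n = 1000
      real :: a(n), b(n), c(n)
      integer :: i
      call random_number(b)
      call random_number(c)
      ! Fortran 77 style: the compiler must arrange access to the elements
      do i = 2, n - 1
         a(i) = b(i-1) + b(i+1) + c(i)
      end do
      ! Fortran 90 / HPF style: one array statement, parallelized inside the runtime
      a(2:n-1) = b(1:n-2) + b(3:n) + c(2:n-1)
      print *, a(n/2)
    end program runtime_demo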

HTML version of Basic Foils prepared October 22 1997

Foil 64 HPF Interpreter

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Runtime optimizations are acceptable (and perhaps best) when the "grain size" (i.e. total number of elements in an array divided by Nproc) is large ==> the "startup" of runtime procedure calls is small ==> can use an interpreter rather efficiently, cf. APL and *LISP on the CM-2
There is only ONE data parallelism ==> the same runtime library can support C, C++, ADA, APL and an HPF Interpreter

HTML version of Basic Foils prepared October 22 1997

Foil 65 Benchmarking Set ( Fortran90D and Fortran 77 )

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 66 HPF/FORTRAN-90D Benchmark Suite

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 67 The final draft of the HPF Language Specification is version 1.0 - DRAFT, dated January 25, 1993

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Anonymous FTP:
  • minerva.npac.syr.edu (directory HPFF)
  • titan.cs.rice.edu (directory public/HPFF/draft)
  • think.com (directory public/HPFF)
  • ftp.gmd.de (directory hpf-europe)
  • theory.tc.cornell.edu (directory pub)
Electronic Mail from the Softlib server at Rice University. This can be accessed in two ways:
  • A. Send electronic mail to softlib@cs.rice.edu with "send hpf-v10.ps" in the message body. The report is sent as a PostScript file.
  • B. Send electronic mail to softlib@cs.rice.edu with "send hpf-v10.tarZ" in the message body. The report is sent as a uuencoded compressed tar file containing the LaTeX source.
Hardcopy: Send requests to Theresa Chatman, CITI/CRPC, Box 1892, Rice University, Houston, TX 77251. There is a charge of $50.00 for this report to cover copying and mailing costs.

HTML version of Basic Foils prepared October 22 1997

Foil 68 Request for Public Comment on High Performance Fortran

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
HPFF encourages reviewers to submit comments as soon as possible, with a deadline of March 1, 1993 for consideration.
Send comments to hpff-comments@cs.rice.edu.
To facilitate the processing of comments we request that separate comment messages be submitted for each chapter of the document and that the chapter be clearly identified in the "Subject:" line of the message. Comments about general overall impressions of the HPF document should be labeled as Chapter 1. All comments on the language specification become the property of Rice University.
If email access is impossible for comment responses, hardcopy may be sent to: HPF Comments, c/o Theresa Chatman, CITI/CRPC, Box 1892, Rice University, Houston, TX 77251. HPFF plans to process the feedback received at a meeting in March. Best efforts will be made to reply to comments submitted.

HTML version of Basic Foils prepared October 22 1997

Foil 69 Application Structure Versus Fortran90D Features

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 70 What applications does HPF support?

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
A somewhat related issue: what problems are suitable for HPF and make good use of MIMD architectures, i.e. run well on MIMD but poorly on SIMD?
Classification of Problems (Fox, 1988)

HTML version of Basic Foils prepared October 22 1997

Foil 71 Current HPF can do I .....

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 72 Current HPF can do II.....

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Embarrassingly Parallel Problems: the basic entities can be very complicated but they can be calculated independently. Use the INDEPENDENT directive in HPF
Search databases for given text ( ELIXIR for DECmpp ) (can be SIMD)
Analyse separately 10^7 events recorded from Synchrotron (SSC) collisions. ( definitely MIMD "farm" )
Calculate matrix elements for (chemistry) Hamiltonian (may need MIMD)
  • MOPAC, GAUSSIAN, Reaction scattering rate work within C3P at Caltech is
    • "calculate matrix elements"
    • "Multiply/Solve/Find eigenvalues of matrix"
Quantum Monte Carlo
  • Independent "walkers"
  • Parallel Histogram (database) storage

HTML version of Basic Foils prepared October 22 1997

Foil 73 HPF can express using FORALL

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Particle Dynamics with cutoff forces
(Vij = 0 for | ri - rj | > d) and irregular spatial structure - see the sketch after this list
CHARMM, AMBER ..............
Unstructured finite element meshes
  • Where data structure involves array of pointers. However need new distribution primitives to produce good performance. It is known how to optimize such unstructured problems for both static and adaptive cases
Performance of SIMD or Autonomous SIMD (Maspar) versus MIMD unclear for these applications.
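A hypothetical sketch of a cutoff pair interaction written with a masked FORALL, in the spirit of the first item above (a toy 1-d potential, not taken from CHARMM or AMBER; the cutoff d and system size are illustrative).
    program cutoff_demo
      implicit none
      integer, parameter :: n = 200
      real, parameter :: d = 0.1              ! cutoff distance
      real :: x(n), v(n,n)
      integer :: i, j
      call random_number(x)
      v = 0.0
      ! The mask expresses Vij = 0 for | ri - rj | > d: only close pairs are computed
      forall (i = 1:n, j = 1:n, i /= j .and. abs(x(i) - x(j)) < d)
         v(i,j) = (d - abs(x(i) - x(j)))**2
      end forall
      print *, 'total pair energy =', 0.5 * sum(v)
    end program cutoff_demo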

HTML version of Basic Foils prepared October 22 1997

Foil 74 Classic Irregular Mesh Application

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 75 HPF Can Express

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 76 Simulation of Spin Models of Magnets

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Simple models reproduce quantitative behavior of real magnetic materials with complex interactions
Fundamental to the development of the theory of phase transitions and critical phenomena, which has very wide applicability.
Computational algorithms and techniques can be applied to other fields, e.g.
  • Monte Carlo simulation of lattice models of quantum field theory and quantum gravity
  • simulation of polymers, proteins, cell membranes
  • simulated annealing and other "physical" methods for global optimization
  • neural networks
  • condensed matter problems, e.g. superconductors
Since the models are simple and exact results are known in some cases (e.g. 2-d Ising Model), these are used as a testing ground for new computational techniques, e.g.
  • Monte Carlo Algorithms
  • measurements and data analysis
  • random number generators

HTML version of Basic Foils prepared October 22 1997

Foil 77 New Monte Carlo Algorithms

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
The Metropolis Algorithm
  • The major problem in Monte Carlo simulation is critical slowing down. Standard algorithms (e.g. Metropolis) do single-site, local updates. The number of iterations needed to change the large-scale structure and produce a new uncorrelated configuration diverges as a power of the lattice size at a critical point. (A minimal single-site update is sketched after this list.)
Cluster Algorithm
  • New multi-spin, non-local algorithms (Swendsen-Wang, Wolff) rapidly change large-scale structure by identifying clusters of sites to be updated at once, greatly reducing critical slowing down.
  • Currently only applicable to a limited class of models
  • Ongoing research includes
    • Extensions to frustrated spin models (e.g. spin glasses) where critical slowing down is extreme
    • Precise measurements of autocorrelations and dynamic critical exponents to help understand dynamics of new algorithms
    • Application of new algorithms to simulation of spin models, e.g. O(3) model, fully frustrated Ising model
    • Parallel cluster algorithms
  • Simulated Tempering
    • New Method of making small changes in temperature while keeping system in equilibrium. Applications include:
    • Allowing tunneling between states at first order phase transitions (e.g. random field Ising model)
    • Global optimization (a la simulated annealing)
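A minimal sketch of the single-site Metropolis update described above, for the 2-d Ising model with periodic boundaries; the lattice size, temperature and sweep count are illustrative only.
    program metropolis_demo
      implicit none
      integer, parameter :: l = 64
      real, parameter :: temp = 2.27          ! near the 2-d Ising critical temperature
      integer :: s(l,l), i, j, sweep, nbsum
      real :: r, de
      s = 1
      do sweep = 1, 100
         do j = 1, l
            do i = 1, l
               ! sum of the four neighbouring spins, with periodic wrap-around
               nbsum = s(modulo(i,l)+1,j) + s(modulo(i-2,l)+1,j) &
                     + s(i,modulo(j,l)+1) + s(i,modulo(j-2,l)+1)
               de = 2.0 * s(i,j) * nbsum      ! energy change if spin (i,j) is flipped
               call random_number(r)
               if (de <= 0.0 .or. r < exp(-de/temp)) s(i,j) = -s(i,j)
            end do
         end do
      end do
      print *, 'magnetization per site =', real(sum(s)) / real(l*l)
    end program metropolis_demo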

HTML version of Basic Foils prepared October 22 1997

Foil 78 Magnetization for Random Field Ising Model -- Metropolis

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 79 Magnetization for Random Field Ising Model -- Simulated Annealing

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 80 Wolff cluster (bands shown in yellow) for 3 state Potts model at Tc

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 81 Swendsen-Wang clusters (boundaries shown in black) for 3 state Potts model at Tc

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 82 Cluster Algorithm versus Metropolis

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 83 Parallel Algorithms

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Metropolis Algorithm
  • Regular, local algorithm
  • Local communication, perfect load balance using standard domain decomposition
  • Efficiently parallelizable on SIMD and MIMD machines
Swendsen-Wang Algorithm
  • Irregular, non-local, multiple cluster algorithm
  • Main computational task is to identify clusters - connected component labeling, also used in image processing
  • Clusters are highly irregular (fractal) in shape, and of all sizes (usually including one large cluster covering much of the lattice)
  • Efficient parallel component labeling is a very difficult problem - poor load balancing, lots of non-local communication
Wolff Algorithm
  • Irregular, non-local, single cluster algorithm
  • Standard domain decomposition gives very poor load balance and very limited parallelism
  • Traditional simple, regular, local algorithms parallelize efficiently
  • Newer irregular, adaptive, non-local algorithms are better sequentially but often more difficult to parallelize

HTML version of Basic Foils prepared October 22 1997

Foil 84 Parallel Cluster Algorithms

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Parallel Component Labeling
  • Simple SIMD component labeling algorithms are very inefficient, taking order L iterations to label an L^2 lattice
  • These algorithms can be used effectively on MIMD machines, however. Use fast sequential algorithms to label sub-lattices, and simple SIMD algorithm to match clusters across processors. Takes order square root of P iterations for an array of P processors. Good efficiencies if L>>P.
  • On SIMD machines (CM-2, Maspar) multi-grid algorithms using regular (hypercube) communication, and multi-scale algorithms using general non-local communication, do labeling in order log(L) iterations. Inefficient due to poor load balancing and irregular clusters.
Independent Parallelism
  • On coarse grain MIMD machines or workstation networks can parallelize over independent simulations with different random number streams. This is the only way to parallelize the Wolff algorithm efficiently.
  • On massively parallel machines, can use a combination of domain decomposition and independent parallelism for Swendsen-Wang algorithm, e.g. run 8 independent simulations of 64 nodes each on 512 node nCUBE.

HTML version of Basic Foils prepared October 22 1997

Foil 85 MIMD algorithm on nCUBE-1

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 86 SIMD algorithms on CM-2

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 87 Autocorrelation Plots for Cluster Algorithms

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 88 HPF probably cannot express well

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Multiple Phase problems such as
  • Particle in the cell
  • Unstructured Multigrid
Need to express "alignment" between phases (different levels of multigrid, particles and grid for PIC) and then optimize distribution and redistribution between phases.
Irregular problems of this type are only suitable for MIMD.
Very irregular cases with mismatches between phases perform poorly even on MIMD.

HTML version of Basic Foils prepared October 22 1997

Foil 89 I do not know how to express - let alone optimize -

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 90 Late breaking results

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Michael Warren (with John Salmon) has reformulated the Barnes-Hut parallel algorithm
New parallel algorithm appears suitable for incorporation into extended HPF compiler as a generalization of PARTI methods used to parallelize irregular grids

HTML version of Basic Foils prepared October 22 1997

Foil 91 Large N-Body Calculations (Quinn, Salmon, Warren)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Clustering algorithm (Appel, Barnes-Hut, Greengard)
Can replace an M-body force calculation by one using the center of mass of the cluster
  • Naive calculation: 0.5 N^2 t(2-particle), where t(2-particle) is the time for one two-body force evaluation
  • Clustering: (20-50) N log N t(2-particle)
  • (best when N > 1000)
Can do O(10,000) particles with the O(N^2) algorithm
One to two orders of magnitude larger problem possible with clustering algorithm
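For scale, a minimal sketch of the naive 0.5 N^2 pairwise force sum that the clustering (Barnes-Hut / fast multipole) algorithms above replace with an N log N tree walk; the masses, softening and units are illustrative.
    program nbody_direct
      implicit none
      integer, parameter :: n = 1000
      real :: pos(3,n), mass(n), acc(3,n), dr(3), r2
      integer :: i, j
      call random_number(pos)
      mass = 1.0 / real(n)
      acc = 0.0
      do i = 1, n
         do j = 1, n                          ! every pair is visited: O(N^2) work
            if (i == j) cycle
            dr = pos(:,j) - pos(:,i)
            r2 = sum(dr*dr) + 1.0e-4          ! small softening to avoid singularities
            acc(:,i) = acc(:,i) + mass(j) * dr / (r2 * sqrt(r2))
         end do
      end do
      print *, 'acceleration of body 1:', acc(:,1)
    end program nbody_direct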

HTML version of Basic Foils prepared October 22 1997

Foil 92 Hierarchical Decomposition

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 93 Ncube Speed Up of Barnes Hut Algorithm

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 94 17x10^6 Body Simulation
Diameter 250 Mpc
Quinn, Salmon, Warren, Zurek

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 95 The largest "galaxy" halo (137,000 bodies) from the 8.8M body simulation
(Warren, Fullagar, Quinn, Zurek)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 96 8M bodies - 10 Mpc diameter
Final state with ~700 resolved "galaxies"
(Warren, Quinn, Zurek)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 97 Smooth Particle Hydro simulation of the collision of two "main sequence stars." 137,000 bodies (Warren,Davies)

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 98 Expressing Problem Structure in Language

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Key idea behind data-parallel languages - and perhaps all good languages
The language expresses problem and not the machine architecture
Need different approach to express functional parallelism
Use"object-oriented" feature when problems has natural objects
Do not use "object-oriented" features when objects are artifacts of target machine
Need data and functional (object) parallel paradigms in many problems - especially multidisciplinary

HTML version of Basic Foils prepared October 22 1997

Foil 99 The map of Problem ---> Computer is performed in two or more stages

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 100 From Problem to Machine Space as a Mapping

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 101 Different Granularities of Decomposition I

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
We can divide problems and machines into interconnected modules - and we do this at different granularities
We can consider two limiting cases. (I) High performance, low latency links:
Machine
  • Classic MPP such as Intel Paragon
Problem
  • QCD: Components are gluon fields. Links are local.
Software model
  • High Performance Fortran
  • Currently only regular structure
  • Eventually all such tightly coupled loosely synchronous problems
  • Fortran + Message passing

HTML version of Basic Foils prepared October 22 1997

Foil 102 Different Granularities of Decomposition II

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 103 Compiler/Interpreter Tradeoffs at Different Levels of Granularity

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Can divide a problem into "highly optimized" v. "flexible" at different levels
HPF Compiler: the component is a full HPF program. Glue full programs together on a bus.
HPF Interpreter: the component is a single MATLAB/APL/HPF array statement. Glue data parallel coarse grain array statement "function" calls together.

HTML version of Basic Foils prepared October 22 1997

Foil 104 The Mapping of Heterogeneous MetaProblems onto Heterogeneous MetaComputer Systems

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 105 Mapping of Irregular Grid

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 106 Finding General Maps for FortranD/HPF

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 107 Three Physical Optimization Methods for Allocating Data to Multicomputer Nodes

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index
Three methods using physical analogies:
1. Simulated Annealing Algorithm (SAA): Formulate the optimization as finding the ground state of a physical system. Use Monte Carlo to equilibrate and reduce the temperature.
2. Bold Neural Network (BNN): A mean field approximation to the same analogy as in SAA, to find quick minima. Related to "Eigenvalue Bisection" methods.
3. Genetic Algorithm (GA): Maximize the fitness (minimize the cost function) of an evolving population of structures (candidate solutions).
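A minimal sketch of method 1, simulated annealing of a data-to-processor map. The cost function (a quadratic load-imbalance term plus a toy 1-d "communication" term) and the cooling schedule are illustrative choices, not the Hamiltonian used in the studies reported here.
    program anneal_demo
      implicit none
      integer, parameter :: nsite = 256, nproc = 4
      integer :: map(nsite), site, oldp, newp, step, p
      real :: t, r, de
      ! start from a random assignment of data "sites" to processors
      do site = 1, nsite
         call random_number(r)
         map(site) = int(r * nproc) + 1
      end do
      t = 2.0                                                  ! initial "temperature"
      do step = 1, 50000
         call random_number(r);  site = int(r * nsite) + 1     ! pick a site ...
         call random_number(r);  newp = int(r * nproc) + 1     ! ... and a trial processor
         oldp = map(site)
         de = cost_change(site, oldp, newp)
         call random_number(r)
         if (de <= 0.0 .or. r < exp(-de / t)) map(site) = newp ! Metropolis acceptance
         t = t * 0.9999                                        ! geometric cooling schedule
      end do
      print *, 'final load per processor:', (count(map == p), p = 1, nproc)
    contains
      real function cost_change(site, oldp, newp)
        ! toy Hamiltonian: quadratic load-imbalance term plus a "communication"
        ! term counting cut links between site and its +/-1 neighbours on a chain
        integer, intent(in) :: site, oldp, newp
        integer :: lo, ln, i, cutold, cutnew
        if (newp == oldp) then
           cost_change = 0.0
           return
        end if
        lo = count(map == oldp)
        ln = count(map == newp)
        cutold = 0
        cutnew = 0
        do i = max(1, site-1), min(nsite, site+1)
           if (i == site) cycle
           if (map(i) /= oldp) cutold = cutold + 1
           if (map(i) /= newp) cutnew = cutnew + 1
        end do
        cost_change = real((ln+1)**2 + (lo-1)**2 - ln**2 - lo**2) + 0.5 * real(cutnew - cutold)
      end function cost_change
    end program anneal_demo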

HTML version of Basic Foils prepared October 22 1997

Foil 108 Finite Element Mesh

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 109 Graph Contraction

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 110 Some Results of Load Balancing Studies I

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 111 Some Results of Load Balancing Studies II

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 112 Physics Analogy for Load Balancing

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 113 Complex System SHLSoft governed by Hamiltonian = Execution Time

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 114 PHYSICS ANALOGY FOR STATIC AND DYNAMIC LOAD BALANCING

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 115 Definition of Temperature for a Complex System

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 116 Particle dynamics problem on a four node system

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 117 Energy Structure in Physics Analogy with Multiple Minima

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

HTML version of Basic Foils prepared October 22 1997

Foil 118 Phase Transitions in Physical model -- Scattered versus Domain Decompositions

From GCF Talk U - Lewis/NASA 3/9/92 Mardi Gras Conference on Concurrent Computing in Physical Sciences -- February 18, 1993. *
Full HTML Index

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
