Given by Geoffrey C. Fox at Mardi Gras Conference on Concurrent Computing in Physical Sciences on February 18, 1993. Foils prepared October 22 1997
Summary of Material
This overviews the status of HPCC architectures |
Software approaches, focusing on HPF, and an Application Analysis |
Problem Architectures, Load Balancing, and the Complex System Approach |
Geoffrey C. Fox |
gcf@nova.npac.syr.edu |
315.443.2163 |
Mardi Gras Conference |
Concurrent Computing in the Physical Sciences |
Louisiana State University |
Baton Rouge, LA |
February 18, 1993 |
This is a phase transition |
Needs headroom (Carver Mead), which is large (a factor of 10?) due to the large new software investment |
nCUBE-1 and CM-2 were comparable in cost performance to conventional supercomputers
|
Do the Intel Paragon, CM-5, DECmpp (MP-2), and IBM SP-1 have enough headroom? |
Different machines |
New types of computers |
New libraries |
Rewritten Applications |
Totally new fields able to use computers .... ==> Need new educational initiatives: Computational Science |
Will be a nucleus for the phase transition |
and accelerate use of parallel computers in the real world |
Data Parallelism - universal form of scalable parallelism |
Functional Parallelism - Important but typically modest speedup. - Critical in multidisciplinary applications. |
On any machine architecture
|
Domain decomposition is simple, but general and extensible to many more nodes |
All successful concurrent machines with
|
Problem is an algorithm applied to a data set. Obtain concurrency by acting on the data concurrently. The three architectures considered here differ as follows:
|
Hardware trends imply that all computers (PC's ---> Supercomputers) will be parallel by the year 2000
|
Software is the challenge and could prevent/delay the hardware trend that suggests parallelism will be a mainline computer architecture
|
Partial Differential Equations |
Particle Dynamics and Multidisciplinary Integration |
Image Processing |
Some: |
Visualization |
Artificial Intelligence |
Not Much: |
Network Simulation |
Economic (and other complex system) modeling |
Scheduling |
Manufacturing |
Education |
Entertainment |
Information Processing
|
Industrial and Government Problems
|
We know parallel versions of the basic algorithms currently being used in most kernels
|
The study of implementation of ACTION
|
Study of role of applications in CRPC
|
Study of relation of CRPC and NSF Supercomputer Centers which are again much larger
|
PCETECH includes and expands the technology development components of CRPC and ACTION |
PVM, Express, Linda |
Einstein (CSW) |
ISIS (Cornell) |
High Performance Fortran Compiler |
High Performance C, C++ Compiler |
HPF Extensions - PARTI |
Parallel / Distributed Computing Runtime Tools |
ADIFOR |
AVS and Extensions |
MOVIE (Syracuse) |
High Performance Fortran Interpreter |
Televirtuality / Virtual Reality OS |
Parallel Database - Oracle 7.0 |
OO Database
|
Mesh Generation |
SCALAPACK |
Sparse Matrix Solvers - Templates and libraries (Direct and Iterative) |
Particle Dynamics Kernels - Templates and Libraries (O(N^2) to fast multipole) |
Optimization Methodology and Templates
|
Scheduling (neural-net, parallel) Templates |
We may have to rewrite our code, but we want the resulting program to run on "All" current and future (anticipated) machines |
No compelling new languages: e.g. OCCAM does not solve enough problems to warrant adoption. Adapt existing languages: Fortran, C++, ADA, C, LISP, PROLOG, BASIC .......... COBOL? |
Can map onto individual sequential or parallel computers or networks of these (distributed computing) |
Experience of users |
Good sequential and node compilers exist |
Migration of existing codes |
Hard to build a new language, as one needs all the features/libraries of Fortran/C implemented well. This is why one uses Fortran 90 and not APL |
Fortran and C are good (OK) for a class of applications |
Natural MIMD model of programming |
Used in vast majority of current successful MIMD programming |
Each node has a program which controls and calculates its own data (owner-computes "rule"):
      Do i running over local data
         CALL READ (for non-local data)
         CALCULATE all i's DATA |
In simplest form one program per node of computer |
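A minimal sketch of such a node program (ours, for illustration; EXCHANGE_GHOSTS is a hypothetical routine standing in for the send/receive calls of whatever message-passing library - PVM, Express, etc. - is actually used):
      SUBROUTINE NODE_UPDATE(U, UNEW, NLOCAL)
C     One node's piece of a data-parallel update, message-passing style
      INTEGER NLOCAL, I
      REAL U(0:NLOCAL+1), UNEW(NLOCAL)
C     Read the non-local boundary data this node needs (hypothetical call)
      CALL EXCHANGE_GHOSTS(U, NLOCAL)
C     Owner-computes: loop only over locally owned points
      DO 10 I = 1, NLOCAL
         UNEW(I) = 0.5 * (U(I-1) + U(I+1))
   10 CONTINUE
      RETURN
      END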
Advantages
|
Disadvantages
|
e.g. Program represented as sequence of functions operating on arrays |
      DIMENSION A(100,100), B(100,100), C(100,100)
      A = B + C
|
All data owned by a particular processor |
Owner-computes rule: for an expression A(i,j) = .........., the processor holding A(i,j) calculates the expression, after communicating any needed data if the r.h.s. contains data not owned by that processor. |
Similar concepts in C, C++, LISP, ADA ....... |
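A small illustration of the owner-computes rule (a sketch of ours, not from the foil): each element of A is computed by the processor that owns it, fetching the shifted elements of C from a neighbor processor whenever they are stored elsewhere.
      REAL A(100,100), B(100,100), C(100,100)
!     The shifted reference to C may require neighbor communication;
!     each element of A is still computed by its owning processor
      A(:, 2:100) = B(:, 2:100) + C(:, 1:99)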
Advantages
|
Disadvantages
|
Program in an SPMD data parallel style
|
Program is annotated with assertions giving information about desired data locality and/or distribution |
Compiler generates code implementing data and work distribution from the user-written SPMD application
|
String theories are candidates for a TOE (Theory of Everything). They provide a plausible quantum theory of gravity. |
Basic objects are tiny one dimensional "strings" (rather than points). Dynamics described by a two dimensional space-time world-sheet (rather than world line). |
Calculations involve integrating over all possible world-sheets, i.e. all 2-d surfaces embedded in some higher dimensional space (3-d, 4-d, 26-d, ....) |
Done numerically by a Monte Carlo integration over different triangulations of the surface, produced dynamically using the Metropolis algorithm. |
Current research centers on finding good discretizations of the continuum theories, and trying to find critical points with divergent correlation lengths, so the details of the discrete lattice structure become irrelevant and we can extract continuum results (c.f. lattice QCD) |
Interesting critical points are phase transitions between a smooth phase and a crumpled phase of the surface. The order and critical exponents of this crumpling transition are still uncertain. |
We have done the largest simulations so far on 2-d systems, but these are only 576 node meshes. Critical slowing down is a major problem - need better algorithms. |
The same techniques are used to model real fluctuating surfaces, e.g. cell membranes, lipid bilayers, phase boundaries. |
The program for random surfaces is very irregular, with many linked lists and pointer trees to be traversed. The original code performed very poorly. Completely rewrote and simplified the code and gained a substantial speed-up. It still performed poorly on some RISC processors such as the Intel i860. |
Optimal use of cache memory is vital for processors such as the i860, which have a small cache memory, and the time to access external memory is much greater than the time to perform a floating point operation |
Due to the dynamic nature of the code, the neighbors of a site which are used for the update of that site are constantly changing. This means we cannot make good use of the cache unless we keep a new array which constantly readjusts pointers to neighbors, so that data for neighboring sites is kept in neighboring regions of memory. |
Other optimizations on i860 and use of non-IEEE arithmetic gave us a substantial improvement, but still nowhere near 80 Mflops peak performance! |
Program performs much better on more recent processors (IBM RS/6000, Hewlett-Packard). |
Independent Parallelism
|
Parallel Random Surfaces
|
FORTRAN 77 entirely contained within Fortran 90 |
Language goal is to "modernize Fortran, so that it may continue its long history as a scientific and engineering programming language" |
Major language additions include:
|
A significantly larger language than FORTRAN 77 |
Data Distribution Directives
|
Parallel Statements
|
Extended Intrinsic Functions and Standard Library
|
EXTRINSIC Procedures
|
Parallel I/O Statements
|
Changes in Sequence and Storage Association
|
In addition, a subset of HPF suitable for earlier implementation is defined |
Alpha version: demonstrated at SUPERCOMPUTING '92 in Minneapolis, MN
|
Now running on nCUBE2, iPSC/860 and network of SUN workstations |
beta release: June 1993 |
alliance with industry to develop commercial compiler for HPF, optimized for specific architectures |
next target: heterogeneous networks |
Syntactic form of a structured Fortran environment |
A program that contains directives is expected to generate the same results whether directives are processed or not, assuming that the directives are used correctly |
HPF directives are consistent with Fortran 90
|
Specifies lowest-level abstract alignment of array elements |
TEMPLATE is an abstract, very fine-grain virtual processor space. Template elements label physically independent objects, e.g. gluons, fluid particles. Align degrees of freedom, e.g. gluon components, to template members |
Use for machine-independent alignment between structures |
Specifies coarse-grain partitioning of data |
PROCESSORS is a coarse-grain processor grid, presumably in 1-1 correspondence with physical processors |
Use for machine-dependent partitioning of data structures |
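A minimal sketch of how these directives fit together (array names, sizes and the BLOCK distribution are our choices, not from the foil):
      REAL U(1000), F(1000)
!HPF$ TEMPLATE T(1000)                 ! fine-grain abstract index space
!HPF$ ALIGN U(I) WITH T(I)             ! machine-independent alignment
!HPF$ ALIGN F(I) WITH T(I)
!HPF$ PROCESSORS P(16)                 ! coarse-grain (physical) processor grid
!HPF$ DISTRIBUTE T(BLOCK) ONTO P       ! machine-dependent partitioning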
Compiler will normally use "owner-computes rule" |
Elemental array assignment statement (Similar to array expressions) |
FORALL is not a general parallel construct! |
Functions without side effects may be called from FORALL |
Other functions may be allowed as well |
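A small FORALL example (ours; A, B and N are assumed to be declared and distributed elsewhere):
!     Elemental assignment: index combinations may be evaluated
!     in any order or concurrently
      FORALL (I = 2:N-1, J = 2:N-1)
         A(I,J) = 0.25 * (B(I-1,J) + B(I+1,J) + B(I,J-1) + B(I,J+1))
      END FORALL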
An assertion that the iterations of the DO may be executed in any order or concurrently |
If the assertion is false, the compiler is free to take any action it deems necessary |
No restrictions on the loop body |
No way to perform explicit synchronization |
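A short sketch (EVENT, RESULT and ANALYSE are hypothetical names of ours): the directive asserts that the iterations do not interfere, so the compiler may execute them in any order or concurrently.
!HPF$ INDEPENDENT
      DO I = 1, NEVENT
!        Each event is analysed with no data shared between iterations
         CALL ANALYSE(EVENT(I), RESULT(I))
      END DO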
Much of HPF's Parallel "Power" is buried in its library |
Personal note: Fortran 90D motivated by similarities between Caltech message passing "collective" communication and F90 Intrinsics |
18 Non-elemental (true array) Fortran 90 Intrinsics e.g. CSHIFT, SUM |
7 new HPF Intrinsics e.g. Alignment, Template, Distribution inquiries |
4 new HPF Reduction (combine) Functions |
11 new HPF Combine-Scatter Functions |
22 new HPF Parallel Prefix Functions |
2 new HPF Parallel Sorting Functions |
3 other new HPF Parallel Functions |
7 HPF Local Intrinsics |
74 library routines (==> ~1000 routines when written in clumsy F77) |
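For instance (a sketch of ours), a periodic nearest-neighbor update and a global reduction can be written entirely with Fortran 90 array intrinsics, which the parallel runtime library then implements:
!     CSHIFT does the "collective" nearest-neighbor communication,
!     SUM is a global reduction
      UNEW = CSHIFT(U, 1, DIM=1) + CSHIFT(U, -1, DIM=1)
      UNEW = UNEW + CSHIFT(U, 1, DIM=2) + CSHIFT(U, -1, DIM=2)
      UNEW = 0.25 * UNEW
      TOTAL = SUM(UNEW)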
HPF makes much greater use of runtime optimizations than the traditional sequential or parallel compiler |
Fortran 77:
      DO 1 I = 1, N
    1 A(I) = FUNC(B(I), B(I +/- 1), C(I), C(I +/- 1), .....)
Traditionally, access to these elements is arranged in the COMPILER |
HPF:
      A = FUNC(B, C)
Invoke the ALREADY OPTIMIZED parallel implementation of FUNC. In many cases, the HPF compiler need not know anything about parallelization except interfaces, data layout, etc. |
Runtime optimizations are acceptable (and perhaps best) as the "grain size" (i.e. total number of elements in the array divided by Nproc) is large ==> "Startup" of Runtime Procedure calls is small ==> Can use an interpreter rather efficiently, cf. APL and *LISP on the CM-2 |
There is only ONE data parallelism ==> The same Runtime library can support C, C++, ADA, APL, and an HPF Interpreter |
Anonymous FTP:
minerva.npac.syr.edu (directory HPFF)
titan.cs.rice.edu (directory public/HPFF/draft)
think.com (directory public/HPFF)
ftp.gmd.de (directory hpf-europe)
theory.tc.cornell.edu (directory pub) |
Electronic Mail from the Softlib server at Rice University. This can be accessed in two ways:
A. Send electronic mail to softlib@cs.rice.edu with "send hpf-v10.ps" in the message body. The report is sent as a PostScript file.
B. Send electronic mail to softlib@cs.rice.edu with "send hpf-v10.tarZ" in the message body. The report is sent as a uuencoded compressed tar file containing LaTeX source |
Hardcopy: Send requests to Theresa Chatman, CITI/CRPC, Box 1892, Rice University, Houston, TX 77251. There is a charge of $50.00 for this report to cover copying and mailing costs. |
HPFF encourages reviewers to submit comments as soon as possible, with a deadline of March 1, 1993 for consideration. |
Send comments to hpff-comments@cs.rice.edu. |
To facilitate the processing of comments we request that separate comment messages be submitted for each chapter of the document and that the chapter be clearly identified in the "Subject:" line of the message. Comments about general overall impressions of the HPF document should be labeled as Chapter 1. All comments on the language specification become the property of Rice University. |
If email access is impossible for comment responses, hardcopy may be sent to: HPF Comments, c/o Theresa Chatman, CITI/CRPC, Box 1892, Rice University, Houston, TX 77251. HPFF plans to process the feedback received at a meeting in March. Best efforts will be made to reply to comments submitted. |
A somewhat related issue: what problems are suitable for HPF and make good use of MIMD architectures? i.e. run well on MIMD but poorly on SIMD |
Classification of Problems (Fox, 1988) |
Embarrassingly Parallel Problems: Basic entities can be very complicated, but they can be calculated independently. Use the INDEPENDENT directive in HPF |
Search databases for given text ( ELIXIR for DECmpp ) (can be SIMD) |
Analyse separately 10^7 events recorded from Synchrotron (SSC) collisions. (definitely MIMD "farm") |
Calculate matrix elements for (chemistry) Hamiltonian (may need MIMD)
|
Quantum Monte Carlo
|
Particle Dynamics with cutoff forces |
(Vij = 0 for |ri - rj| > d) and irregular spatial structure |
CHARMM, AMBER .............. |
Unstructured finite element meshes
|
Performance of SIMD or Autonomous SIMD (Maspar) versus MIMD unclear for these applications. |
Simple models reproduce quantitative behavior of real magnetic materials with complex interactions |
Fundamental to the development of the theory of phase transitions and critical phenomena, which has very wide applicability. |
Computational algorithms and techniques can be applied to other fields, e.g.
|
Since the models are simple and exact results are known in some cases (e.g. 2-d Ising Model), these are used as a testing ground for new computational techniques, e.g.
|
The Metropolis Algorithm
|
Cluster Algorithm
|
Metropolis Algorithm (see sketch below) |
|
Swendsen-Wang Algorithm
|
Wolff Algorithm
|
Parallel Component Labeling
|
Independent Parallelism
|
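The foil bodies for the algorithms above are not reproduced here; as a hedged illustration of the Metropolis algorithm, a single spin update for the 2-d Ising model (periodic boundaries; SPIN, L and BETA assumed declared elsewhere) might look like:
!     Flip spin (I,J) with the Metropolis acceptance probability
      IP = MOD(I, L) + 1
      IM = MOD(I - 2 + L, L) + 1
      JP = MOD(J, L) + 1
      JM = MOD(J - 2 + L, L) + 1
      NBRSUM = SPIN(IP,J) + SPIN(IM,J) + SPIN(I,JP) + SPIN(I,JM)
      DELTAE = 2.0 * SPIN(I,J) * NBRSUM      ! energy change if flipped
      CALL RANDOM_NUMBER(R)
      IF (DELTAE <= 0.0 .OR. R < EXP(-BETA * DELTAE)) SPIN(I,J) = -SPIN(I,J)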
Multiple Phase problems such as
|
Need to express "alignment" between phases (different levels of multigrid, particles and grid for PIC) and then optimize distribution and redistribution between phases. |
Irregular problems of this type are only suitable for MIMD. |
Very irregular cases with mismatches between phases perform poorly even on MIMD. |
Michael Warren (with John Salmon) has reformulated Barnes-Hut parallel algorithm |
New parallel algorithm appears suitable for incorporation into extended HPF compiler as a generalization of PARTI methods used to parallelize irregular grids |
Clustering algorithm (Appel, Barnes-Hut, Greengard) |
Can replace M body force calculation by one using center of mass of cluster
|
Can do O(10,000) particles with O(N^2) algorithm |
One to two orders of magnitude larger problem possible with clustering algorithm |
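A hedged sketch of the standard Barnes-Hut acceptance test behind this idea (variable names are ours): a cluster (cell) of size S at distance D from particle I is replaced by its center of mass when S/D is smaller than an opening angle THETA; otherwise the cell is opened and its children examined.
!     Decide whether the whole cell can be treated as one point mass
      DX = XCM - X(I)
      DY = YCM - Y(I)
      DZ = ZCM - Z(I)
      D  = SQRT(DX*DX + DY*DY + DZ*DZ)
      IF (S / D < THETA) THEN
!        One interaction with the cluster's center of mass
         FMAG = G * CELLMASS * PMASS(I) / D**2
      ELSE
!        Open the cell: recurse on its children / individual particles
      END IF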
Key idea behind data-parallel languages - and perhaps all good languages |
The language expresses problem and not the machine architecture |
Need different approach to express functional parallelism |
Use "object-oriented" features when the problem has natural objects |
Do not use "object-oriented" features when objects are artifacts of target machine |
Need data and functional (object) parallel paradigms in many problems - especially multidisciplinary |
We can divide problems and machines into interconnected modules - and we do this at different granularities. We can consider two limiting cases:
(I) High performance, Low Latency links
Machine:
o Classic MPP such as Intel Paragon
Problem:
o QCD: Components are gluon fields. Links are local.
Software model:
o High Performance Fortran
o Currently only regular structure
o Eventually all such tightly coupled loosely synchronous problems
o Fortran + Message passing |
Can divide a problem into "highly optimized" v. "flexible" at different levels |
HPF Compiler: the component is a full HPF program. Glue full programs together on a bus |
HPF Interpreter: the component is a single MATLAB/APL/HPF array statement. Glue data-parallel coarse-grain array-statement "function" calls together |
Three methods using physical analogies:
1. Simulated Annealing Algorithm (SAA): Formulate optimization as finding the ground state of a physical system. Use Monte Carlo to equilibrate and reduce the temperature.
2. Bold Neural Network (BNN): Mean field approximation to the same analogy as in SAA to find quick minima. Related to "Eigenvalue Bisection" methods.
3. Genetic Algorithm (GA): Maximize fitness (minimize the cost function) of an evolving population of structures (candidate solutions). |
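A minimal sketch of the SAA inner loop applied to load balancing, under assumptions of ours (MAP assigns graph nodes to processors; PROPOSE_MOVE and COST_CHANGE are hypothetical routines returning a candidate move and its change in load-imbalance-plus-communication cost):
      INTEGER STEP, NODE, NEWPROC
      T = T0
      DO STEP = 1, NSTEPS
!        Propose moving one graph node to another processor
         CALL PROPOSE_MOVE(MAP, NODE, NEWPROC)
         DELTA = COST_CHANGE(MAP, NODE, NEWPROC)
         CALL RANDOM_NUMBER(R)
!        Metropolis acceptance at temperature T
         IF (DELTA <= 0.0 .OR. R < EXP(-DELTA / T)) MAP(NODE) = NEWPROC
!        Slowly reduce the temperature towards the ground state
         T = 0.999 * T
      END DO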