Element updates require nearest neighbors
-
CYCLIC ⇒ communicate entire array
-
BLOCK ⇒ least communication volume
-
(BLOCK,*) ⇒ move u(i-1,j), u(i+1,j) ∀ j
-
(BLOCK,BLOCK) ⇒ move u(ILOW-1,j), u(IHIGH+1,j) ∀ j, u(i,JLOW-1), u(i,JHIGH+1) ∀ i
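A rough count, assuming an nx x ny interior and P processors: with (BLOCK,*) each interior processor exchanges its two boundary rows, about 2*ny elements in 2 messages per iteration; with (BLOCK,BLOCK) on a sqrt(P) x sqrt(P) processor grid it exchanges about 2*(nx+ny)/sqrt(P) elements, but in 4 messages: less volume, more messages, which is the trade-off behind the bottom line below.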
|
Convergence test requires a whole-array reduction
-
Any distribution ⇒ static, structured communication
|
The bottom line
-
(BLOCK,*) on high-latency machines or small problem sizes
-
(BLOCK,BLOCK) on low-latency machines
|
REAL u(0:nx,0:ny), unew(0:nx,0:ny), f(0:nx,0:ny)
|
!HPF$ DISTRIBUTE u(BLOCK,*)
|
!HPF$ ALIGN (:,:) WITH u(:,:) :: unew, f
|
dx = 1.0/nx; dy = 1.0/ny; err = tol * 1e6
|
FORALL ( i=0:nx, j=0:ny )
-
f(i,j) = -2*(dx*i)**2+2*dx*i-2*(dy*j)**2+2*dy*j
|
END FORALL
|
u = 0.0; unew = 0.0
|
DO WHILE (err > tol)
-
FORALL ( i=1:nx-1, j=1:ny-1 ) &
-
unew(i,j) = (u(i-1,j)+u(i+1,j)+u(i,j-1)+ &
-
u(i,j+1)+f(i,j))/4
-
err = MAXVAL( ABS(unew-u) )
-
u = unew
|
END DO
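To try the (BLOCK,BLOCK) mapping recommended for low-latency machines, only the distribution directive changes; unew and f follow automatically through the ALIGN:
!HPF$ DISTRIBUTE u(BLOCK,BLOCK)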
|
Pivot selection requires a 1-D reduction
-
Distribute rows ⇒ parallel, with communication
-
Distribute columns ⇒ sequential, but no communication
|
Element updates require the old value and elements from the pivot row and column
-
Distribute rows ⇒ parallel, but broadcast the pivot row
-
Distribute columns ⇒ parallel, but broadcast the pivot column
|
Each stage works on a smaller contiguous region of the array
-
BLOCK ⇒ processors drop out of the computation
-
CYCLIC ⇒ work stays (fairly) evenly distributed until the end
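For example, with n = 1000 on 4 processors: at stage k = 750, a BLOCK row distribution leaves three of the four processors idle (the active rows 751:1000 all live on the last processor), while CYCLIC still gives each processor about 62 of the 250 active rows.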
|
The bottom line
-
(*,CYCLIC) if broadcast > pivoting one column
-
(CYCLIC,*) if broadcast < one column, synchronous comm.
-
(CYCLIC,CYCLIC) if broadcast < one col., overlapped comm.
|
REAL a(n,n), tmp(n)
|
!HPF$ DISTRIBUTE a(CYCLIC,CYCLIC)
|
!HPF$ ALIGN tmp(i) WITH a(*,i)
|
DO k = 1, n-1
-
! Select the pivot
-
ipivot = MAXLOC( ABS(a(k:n,k)), DIM=1 ) + k - 1
-
! Swap the rows
-
tmp(k:n) = a(ipivot,k:n)
-
a(ipivot,k:n) = a(k,k:n)
-
a(k,k:n) = tmp(k:n)
-
! Update the submatrix
-
FORALL ( i=k+1:n, j=k+1:n ) &
-
& a(i,j) = a(i,j) - a(i,k)/tmp(k)*tmp(j)
|
END DO
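Note that the pivot-row broadcast discussed above never appears explicitly: since tmp(i) is aligned with a(*,i), tmp is replicated along the first processor dimension, so the assignment tmp(k:n) = a(ipivot,k:n) is where a compiler would typically insert the broadcast of the pivot row.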
|
REAL u(0:n,0:n), r(0:n,0:n), p(0:n,0:n)
|
REAL q(0:n,0:n), f(0:n,0:n)
|
!HPF$ DISTRIBUTE u(BLOCK,*)
|
!HPF$ ALIGN (:,:) WITH u(:,:) :: r, p, q, f
|
INTERFACE
-
SUBROUTINE a_times_vector( x, y )
-
REAL, INTENT(IN) :: x(:,:)
-
REAL, INTENT(OUT) :: y(:,:)
-
!HPF$ DISTRIBUTE x *(BLOCK,*)
-
!HPF$ ALIGN y(:,:) WITH *x(:,:)
|
END SUBROUTINE a_times_vector
|
END INTERFACE
|
u = 0.0
|
r = f
|
err = MAXVAL( ABS(r(1:n-1,1:n-1)) )
|
i = 0; rho = 0
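Given this interface, the distributed arrays can be passed directly. A minimal sketch of the matrix-vector step inside the CG iteration, assuming a_times_vector applies the operator to p and returns the product in q:
CALL a_times_vector( p, q )
The descriptive mappings (the leading * in the dummy's DISTRIBUTE and ALIGN directives) assert that the actual arguments already arrive in this layout, so no remapping is needed at the call site.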
|
Each relaxation sweep uses all of the node values computed in the previous sweep, plus the edge array
-
No parallelism at this level
-
(Does this sound familiar?)
|
Instead, use the data-parallel edge and node updates
-
flux(i)=(x(iedge(i,1))-x(iedge(i,2)))/2
-
Independent because edge_val ≠ node_val
-
x(iedge(i,1)) = x(iedge(i,1)) - flux(i);
-
x(iedge(i,2)) = x(iedge(i,2)) + flux(i)
-
Not independent because sometimes iedge(i1,1) = iedge(i2,2) for i1 ≠ i2
-
Fortunately, HPF provides the SUM_SCATTER function
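SUM_SCATTER(array, base, indx) returns base with array(i) added into element indx(i), summing every contribution that lands on the same element. For example, with base = (/10., 20./), array = (/1., 2., 3./), and indx = (/1, 1, 2/), the result is (/13., 23./), so repeated endpoints are combined correctly.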
|
Communication needed in both stages
-
Between edges and nodes to compute flux
-
Edge-node and node-node to compute x
-
All communication is static, local with respect to grid, but unstructured with respect to array indices
|
Warning: Your compiler may do things differently!
|
Computing edge values requires edge list and node values
-
Distribute edges ⇒ parallel, no communication for edges
-
Replicate edges ⇒ sequential or broadcast edge values
-
Distribute nodes ⇒ move "shared" endpoints
-
Replicate nodes ⇒ no movement for endpoints
|
Updating node values requires edge list, edge values, and node values
-
Distribute edges ⇒ parallel, no communication for edges
-
Replicate edges ⇒ sequential, no communication for edges
-
Distribute nodes ⇒ move "shared" endpoints
-
Replicate nodes ⇒ move all endpoints
|
The bottom line, part I
-
Always distribute edges
-
Distribute nodes unless the problem is very small
|
Warning: Your compiler may do things differently!
|
Computation is static, homogeneous, and over full array (with respect to the edges)
|
Accesses to the node array are "nearest neighbor" in the mesh
-
This is not reflected in the index order!
-
∴ This does not favor either BLOCK or CYCLIC
|
To minimize communication, edge and node distributions must fit the mesh topology
-
HPF's regular distributions are not ideal for this
-
HPF 2.0 indirect distributions may be better, but require careful construction
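A sketch of the HPF 2.0 form, assuming a user-computed array map_node(nnode) whose i-th entry names the processor that should own node i:
INTEGER map_node(nnode)
!HPF$ DISTRIBUTE x(INDIRECT(map_node))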
|
The bottom line, part II
-
No silver bullet
-
Order the nodes and edges to bring "close" entities together, then use BLOCK
|
USE HPF_LIBRARY
|
REAL x(nnode), flux(nedge)
|
INTEGER iedge(nedge,2)
|
INTEGER permute_node(nnode), permute_edge(nedge)
|
!HPF$ DISTRIBUTE x(BLOCK)
|
!HPF$ DISTRIBUTE flux(BLOCK)
|
!HPF$ ALIGN iedge(i,*) WITH flux(i)
|
!HPF$ ALIGN permute_edge(i) WITH flux(i)
|
!HPF$ ALIGN permute_node(i) WITH x(i)
|
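! Renumber the nodes so that mesh neighbors get nearby indices
! (renumber_nodes is assumed to compute such a permutation, e.g. a
! bandwidth-reducing ordering), then apply it to x and to the edge list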
CALL renumber_nodes( iedge, permute_node )
|
x( permute_node(:) ) = x
|
FORALL (i=1:nedge) iedge(i,:) = permute_node(iedge(i,:))
|
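! Sort the edges by their first endpoint so that each edge is stored near the nodes it connects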
permute_edge = GRADE_UP( iedge(:,1) )
|
FORALL (i=1:nedge) iedge(i,:) = iedge(permute_edge(i),:)
|
err = tol * 1e6
|
DO WHILE (err > tol)
-
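! One sweep: compute a flux for every edge, then scatter the contributions back to the two endpoint nodes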
flux=(x(iedge(1:nedge,1))-x(iedge(1:nedge,2)))/2
-
x=SUM_SCATTER(-flux(1:nedge),x,iedge(1:nedge,1))
-
x=SUM_SCATTER( flux(1:nedge),x,iedge(1:nedge,2))
-
err = SUM( flux*flux ) / nedge
|
END DO
|