Full HTML for Basic Foilset: DoD HPF Training -- 5. Parallel Programming in HPF

Given by Chuck Koelbel (Rice University) at DoD Training and Others, 1995-98. Foils prepared August 7, 1998.


Fifth in a series by Chuck Koelbel on HPF
Overall discussion followed by examples of increasing complexity:
  • Jacobi Iteration
  • Gaussian Elimination
  • Conjugate Gradient
  • Irregular Mesh

Table of Contents for full HTML of DoD HPF Training -- 5. Parallel Programming in HPF


1 "High Performance Fortran in Practice" tutorial 5. Parallel Programming in HPF

2 Outline
3 Designing Parallel Algorithms
4 Designing Parallel Programs in HPF
5 Jacobi Iteration: The Algorithm
6 Jacobi Iteration: Equations and Pictures
7 Jacobi Iteration: Partitioning
8 Jacobi Iteration: Communication
9 Jacobi Iteration: Agglomeration
10 Jacobi Iteration: HPF Program
11 Jacobi Iteration: Mapping
12 Gaussian Elimination: The Algorithm
13 Gaussian Elimination: Pictures
14 Gaussian Elimination: Partitioning
15 Gaussian Elimination: Communication
16 Gaussian Elimination: Agglomeration
17 Gaussian Elimination: HPF Program
18 Gaussian Elimination: Mapping
19 Conjugate Gradient: The Algorithm
20 Conjugate Gradient: Equations and Pictures
21 Conjugate Gradient: More Equations
22 Conjugate Gradient: Partitioning
23 Conjugate Gradient: Communication
24 Conjugate Gradient: Agglomeration
25 Conjugate Gradient: HPF Program
26 Conjugate Gradient: Mapping
27 Irregular Mesh Relaxation: The Algorithm
28 Irregular Mesh: Sequential Program
29 Irregular Mesh: Partitioning & Communication
30 Irregular Mesh: Agglomeration I
31 Irregular Mesh: Agglomeration II
32 Irregular Mesh: Pictures
33 Irregular Mesh: Bad Data Distribution
34 Irregular Mesh: Good Data Distribution
35 Irregular Mesh: HPF Program
36 Irregular Mesh: Mapping

Foil 1 "High Performance Fortran in Practice" tutorial 5. Parallel Programming in HPF
Presented at SIAM Parallel Processing San Francisco, Feb. 14, 1995
Completely Revised for Supercomputing '95
Presented at Supercomputing '95, San Diego, December 4, 1995
Presented at University of Tennessee (short form), Knoxville, March 21, 1996
Presented at Metacenter Regional Alliances, Cornell, May 6, 1996
Presented at Summer of HPF Workshop, Vienna, July 1, 1996
Presented at Institute for Mathematics & its Applications, Minneapolis, September 11-13, 1996
Presented at Corps of Engineers Waterways Experiments Station, Vicksburg, MS, October 30-November 1, 1996
Presented at Supercomputing '96, Pittsburgh, PA, November 17, 1996
Presented at NAVO, Stennis Space Center, MS, Feb 13, 1997
Presented at HPF Users Group (short version), Santa Fe, NM, February 23, 1997
Presented at ASC, Wright-Patterson Air Force Base, OH, March 5, 1997
Parts presented at SC'97, November 17, 1997
Parts presented (slideshow mode) at SC'97, November 15-21, 1997
Presented at DOD HPC Users Group, June 1, 1998


Foil 2 Outline

1. Introduction to Data-Parallelism
2. Fortran 90/95 Features
3. HPF Parallel Features
4. HPF Data Mapping Features
5. Parallel Programming in HPF ** Contents of This Presentation
6. HPF Version 2.0

Foil 3 Designing Parallel Algorithms

Partition
  • Divide problem into tasks to expose potential parallelism
Communicate
  • Determine amount and pattern of data to share between tasks
Agglomerate
  • Combine tasks to increase granularity and reduce communication
Map
  • Assign agglomerated tasks to physical processors

Foil 4 Designing Parallel Programs in HPF

Partition
  • Divide problem into tasks
  • In HPF, look for array update operators
    • Elemental computations, linear algebra
Communicate
  • Determine amount and pattern of communication
  • In HPF, consider which elements are combined
    • Topological neighbors, data reductions
Agglomerate
  • Combine tasks
  • In HPF, choose the data mapping
    • BLOCK for index locality, CYCLIC for load balance
Map
  • Assign agglomerated tasks to physical processors
  • In HPF, trust the compiler
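Read concretely, the agglomeration choice above is expressed as a DISTRIBUTE directive. A minimal sketch of the two basic options (the array names grid and work are invented here for illustration, not taken from the foils):

   REAL grid(1000,1000), work(1000)
   !HPF$ DISTRIBUTE grid(BLOCK,BLOCK)   ! BLOCK: contiguous index ranges stay together (index locality)
   !HPF$ DISTRIBUTE work(CYCLIC)        ! CYCLIC: indices dealt out round-robin (load balance)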

Foil 5 Jacobi Iteration: The Algorithm

The Problem
  • Given a partial differential equation & boundary conditions
  • Find the solution
The Approach
  • Divide (continuous) space into a (discrete) grid
  • Guess a solution on the grid
  • Update the solution at every grid point
  • Repeat update until solution doesn't change
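In stencil form, the update at each interior grid point averages the four neighboring values and the source term -- the same expression used in the HPF program on Foil 10:

   unew(i,j) = ( u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) + f(i,j) ) / 4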

Foil 6 Jacobi Iteration: Equations and Pictures


Foil 7 Jacobi Iteration: Partitioning

At the most abstract level, Jacobi iteration is a sequential process.
However, each step in the process is itself composed of many smaller operations.
Conclusion: Jacobi iteration has plenty of data parallelism

Foil 8 Jacobi Iteration: Communication

Element updates:
  • Each requires 4 values from previous update step
  • Static, local communication
    • Generally, the second-fastest kind (after no communication)
Convergence test:
  • Uses all values from latest update step
  • Static, global communication
    • Reduction operation
    • Efficient methods known, encapsulated in libraries

Foil 9 Jacobi Iteration: Agglomeration

Element updates require nearest neighbors
  • CYCLIC ⇒ communicate entire array
  • BLOCK ⇒ least communication volume
  • (BLOCK,*) ⇒ move u(i-1,j), u(i+1,j) ∀ j
  • (BLOCK,BLOCK) ⇒ move u(ILOW-1,j), u(IHIGH+1,j) ∀ j, u(i,JLOW-1), u(i,JHIGH+1) ∀ i
Convergence test requires a whole-array reduction
  • Any distribution ⇒ static, structured communication
The bottom line
  • (BLOCK,*) on high-latency machines or small problem sizes
  • (BLOCK,BLOCK) on low-latency machines
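In the program on the next foil, switching between these two agglomerations is a one-directive change; an illustrative sketch of the two alternatives (only one would be used):

   !HPF$ DISTRIBUTE u(BLOCK,*)       ! 1-D row blocks: shift communication in one dimension only
   !HPF$ DISTRIBUTE u(BLOCK,BLOCK)   ! 2-D blocks: less volume, but messages in both dimensions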

Foil 10 Jacobi Iteration: HPF Program

REAL u(0:nx,0:ny), unew(0:nx,0:ny), f(0:nx,0:ny)
!HPF$ DISTRIBUTE u(BLOCK,*)
!HPF$ ALIGN (:,:) WITH u(:,:) :: unew, f
dx = 1.0/nx; dy = 1.0/ny; err = tol * 1e6
! Set up the source term on the grid
FORALL ( i=0:nx, j=0:ny )
   f(i,j) = -2*(dx*i)**2+2*dx*i-2*(dy*j)**2+2*dy*j
END FORALL
u = 0.0; unew = 0.0
DO WHILE (err > tol)
   ! Five-point update of all interior points
   FORALL ( i=1:nx-1, j=1:ny-1 ) &
      unew(i,j) = (u(i-1,j)+u(i+1,j)+u(i,j-1)+ &
                   u(i,j+1)+f(i,j))/4
   ! Convergence test: global max reduction
   err = MAXVAL( ABS(unew-u) )
   u = unew
END DO

Foil 11 Jacobi Iteration: Mapping

This program is a piece of cake for the compiler.
  • Allocate portion of array on each processor based on DISTRIBUTE
  • Apply owner-computes rule analytically based on left-hand side
  • Detect shift communication from dependence analysis of subscripts or pattern matching
  • Recognize MAXVAL intrinsic as reduction communication
  • Place all communication directly outside parallel construct where it occurs
  • Number processors so that shifts do not cause contention

Foil 12 Gaussian Elimination: The Algorithm

The Problem
  • Given N linear equations in N unknowns x1, ..., xN
  • Find values of all xi to satisfy the equations
The Approach
  • Use Eq. 1 to eliminate x1 from Eq. 2, 3, ..., N
  • Use Eq. 2 to eliminate x2 from Eq. 3, ..., N
  • Eq. N only involves xN ⇒ Solve it!
  • Work backwards to find xN-1, ..., x1
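The elimination step is the same row update that appears later in the HPF program on Foil 17: at stage k, every element below and to the right of the pivot is updated as

   a(i,j) = a(i,j) - a(i,k)/a(k,k) * a(k,j)    for i, j = k+1, ..., N

where row k is the (possibly swapped-in) pivot row.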

Foil 13 Gaussian Elimination: Pictures


Foil 14 Gaussian Elimination: Partitioning

At the most abstract level, Gaussian elimination is a sequential process.
  • Need all elements in column k to find pivot row
  • Need all elements in column k and pivot row to perform pivoting
However, each step in the process is itself composed of many smaller operations.
  • Perform all element updates independently
Conclusion: Gaussian elimination has plenty of data parallelism
  • Until the last few stages, anyway

Foil 15 Gaussian Elimination: Communication

Pivot search:
  • Reduction along column
  • Static, global communication
Element updates:
  • Each requires itself, elements from pivot column & row
  • Static, global communication
    • Broadcast

Foil 16 Gaussian Elimination: Agglomeration

Pivot selection requires a 1-D reduction
  • Distribute rows ⇒ parallel, with communication
  • Distribute columns ⇒ sequential, but no communication
Element updates require the old value and elements from the pivot row and column
  • Distribute rows ⇒ parallel, but broadcast the pivot row
  • Distribute columns ⇒ parallel, but broadcast the pivot column
Each stage works on a smaller contiguous region of the array
  • BLOCK ⇒ processors drop out of the computation
  • CYCLIC ⇒ work stays (fairly) evenly distributed until the end
The bottom line
  • (*,CYCLIC) if broadcast > pivoting one column
  • (CYCLIC,*) if broadcast < one column, synchronous comm.
  • (CYCLIC,CYCLIC) if broadcast < one col., overlapped comm.
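Each bottom-line choice is a single directive on the coefficient matrix; the program on the next foil uses the last one. Sketch of the three alternatives:

   !HPF$ DISTRIBUTE a(*,CYCLIC)        ! whole columns, dealt cyclically across processors
   !HPF$ DISTRIBUTE a(CYCLIC,*)        ! whole rows, dealt cyclically across processors
   !HPF$ DISTRIBUTE a(CYCLIC,CYCLIC)   ! 2-D cyclic, as in the program that follows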

Foil 17 Gaussian Elimination: HPF Program

REAL a(n,n), tmp(n)
!HPF$ DISTRIBUTE a(CYCLIC,CYCLIC)
!HPF$ ALIGN tmp(i) WITH a(*,i)
DO k = 1, n-1
   ! Select the pivot (DIM=1 makes MAXLOC return a scalar offset)
   ipivot = MAXLOC( ABS(a(k:n,k)), DIM=1 ) + k - 1
   ! Swap the rows
   tmp(k:n) = a(ipivot,k:n)
   a(ipivot,k:n) = a(k,k:n)
   a(k,k:n) = tmp(k:n)
   ! Update the submatrix
   FORALL ( i=k+1:n, j=k+1:n ) &
      a(i,j) = a(i,j) - a(i,k)/tmp(k)*tmp(j)
END DO

Foil 18 Gaussian Elimination: Mapping

This program is harder for the compiler.
  • Allocate portion of array on each processor based on DISTRIBUTE
  • Apply owner-computes rule analytically based on left-hand side
  • Detect communication from dependence analysis & intrinsics
  • Here, it really pays to transform the program!
    • Reorder computation to always precompute the next pivot column
    • Rearrange communication to pipeline the series of updates
    • Do broadcasts asynchronously
    • Net result: 2× speedup
  • Use standard numbering for processors

Foil 19 Conjugate Gradient: The Algorithm

The Problem
  • Given a partial differential equation & boundary conditions
  • Find the solution
The Approach
  • Divide (continuous) space into a (discrete) grid
  • Guess a solution on the grid
  • Estimate how the solution should change
  • Move in that direction
  • Repeat estimate and move until solution doesn't change

Foil 20 Conjugate Gradient: Equations and Pictures


Foil 21 Conjugate Gradient: More Equations

u = <initial guess>
r = f - A * u
δ = max( |r| )
i = 0; ρ = 0
WHILE (δ > ε) DO
   i = i + 1; ρold = ρ
   ρ = r × r
   IF (i = 1) THEN p = r ELSE p = r + (ρ/ρold) p
   q = A * p
   α = ρ / (p × q)
   u = u + α p
   r = r - α q
   δ = max( |r| )
END WHILE
(Here × denotes the vector dot product and A * v the matrix-vector product.)

Foil 22 Conjugate Gradient: Partitioning

At the most abstract level, conjugate gradient has minimal parallelism
  • u, r, α can be updated independently
The real potential parallelism is in the matrix and vector operations, however.
  • r × r is a reduction of size N
  • u + α p is a vector update of size N
  • A * p is a (sparse) matrix-vector multiply, in this case of size O(N)
    • It looks a lot like the operator in Jacobi
Conclusions:
  • Stick to the matrix/vector operators
  • Task parallelism (and pipelining) may improve some

Foil 23 Conjugate Gradient: Communication

Convergence test:
  • Global reduction
Dot products:
  • Global reductions
Vector updates:
  • Elementwise scaling and addition of two vectors
  • No communication if vectors are aligned
Matrix-vector multiplies:
  • Depends on the matrix (PDE operator) in the problem
  • For the model problem, equivalent to Jacobi iteration grid update
  • Static, local communication

Foil 24 Conjugate Gradient: Agglomeration

Dot products and convergence test always require global communication
  • No reason to pick one DISTRIBUTE over another
Vector updates require no communication
  • Really no reason to choose a particular DISTRIBUTE
Matrix-vector multiply does care where its data come from
  • In this case, same advantages/disadvantages as Jacobi iteration
The bottom line
  • (BLOCK,*) on high-latency machines or small problem sizes
  • (BLOCK,BLOCK) on low-latency machines

Foil 25 Conjugate Gradient: HPF Program

REAL u(0:n,0:n), r(0:n,0:n), p(0:n,0:n)
REAL q(0:n,0:n), f(0:n,0:n)
!HPF$ DISTRIBUTE u(BLOCK,*)
!HPF$ ALIGN (:,:) WITH u(:,:) :: r, p, q, f
INTERFACE
   SUBROUTINE a_times_vector( x, y )
      REAL, INTENT(IN) :: x(:,:)
      REAL, INTENT(OUT) :: y(:,:)
      !HPF$ DISTRIBUTE x *(BLOCK,*)
      !HPF$ ALIGN y(:,:) WITH *x(:,:)
   END SUBROUTINE a_times_vector
END INTERFACE
u = 0.0
r = f
err = MAXVAL( ABS(r(1:n-1,1:n-1)) )
i = 0; rho = 0
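The foil stops after the setup; the iteration loop itself is not shown. A minimal sketch of the missing loop, following the algorithm on Foil 21 and calling the a_times_vector interface declared above (the names rho_old and alpha are assumed here, not taken from the foils):

DO WHILE (err > tol)
   i = i + 1; rho_old = rho
   rho = SUM( r*r )                      ! dot product: global reduction
   IF (i == 1) THEN
      p = r
   ELSE
      p = r + (rho/rho_old)*p            ! vector update: no communication
   END IF
   CALL a_times_vector( p, q )           ! (sparse) matrix-vector product
   alpha = rho / SUM( p*q )              ! dot product: global reduction
   u = u + alpha*p
   r = r - alpha*q
   err = MAXVAL( ABS(r(1:n-1,1:n-1)) )   ! convergence test: global max reduction
END DO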

Foil 26 Conjugate Gradient: Mapping

This program looks like a more complicated version of Jacobi iteration to the compiler.
  • Allocate arrays based on DISTRIBUTE
  • Apply owner-computes rule
  • Detect communication from dependence analysis and intrinsics
  • Useful optimizations include aggregating communication, overlapping communication with computation
  • All of this becomes more interesting if the program encapsulates operations in a subroutine
    • The programmer must trade off efficiency for flexibility and maintainability

Foil 27 Irregular Mesh Relaxation: The Algorithm

The Problem
  • Given an irregular mesh of values
  • Update each value using its neighbors in the mesh
The Approach
  • Store the mesh as a list of edges
  • Process all edges in parallel
    • Compute contribution of edge
    • Add to one endpoint, subtract from the other

Foil 28 Irregular Mesh: Sequential Program

REAL x(nnode), flux(nedge)
INTEGER iedge(nedge,2)
err = tol * 1e6
DO WHILE (err > tol)
   err = 0
   ! Compute the flux along each edge and accumulate the error
   DO i = 1, nedge
      flux(i) = (x(iedge(i,1))-x(iedge(i,2))) / 2
      err = err + flux(i)*flux(i)
   END DO
   ! Apply each flux to its two endpoints
   DO i = 1, nedge
      x(iedge(i,1)) = x(iedge(i,1)) - flux(i)
      x(iedge(i,2)) = x(iedge(i,2)) + flux(i)
   END DO
   err = err / nedge
END DO

Foil 29 Irregular Mesh: Partitioning & Communication

Each iteration of the relaxation uses all the data computed in the previous step, and the edge array
  • No parallelism at this level
  • (Does this sound familiar?)
Instead, use the data-parallel edge and node updates
  • flux(i)=(x(iedge(i,1))-x(iedge(i,2)))/2
    • Independent because edge_val ≠ node_val (flux and x never overlap)
  • x(iedge(i,1)) = x(iedge(i,1)) - flux(i);
  • x(iedge(i,2)) = x(iedge(i,2)) + flux(i)
    • Not independent because sometimes iedge(i1,1) = iedge(i2,2) for two distinct edges i1 and i2
    • Fortunately, HPF provides the SUM_SCATTER function (see the sketch after this list)
Communication needed in both stages
  • Between edges and nodes to compute flux
  • Edge-node and node-node to compute x
  • All communication is static, local with respect to grid, but unstructured with respect to array indices
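SUM_SCATTER takes an array of contributions, a base array, and an index array, and returns the base with each contribution added into the indexed position, combining contributions that land on the same node -- exactly how the HPF program on Foil 35 uses it. A tiny sketch with made-up values (the array names here are hypothetical):

   USE HPF_LIBRARY
   REAL x(3), flux(2)
   INTEGER endpt(2)
   x     = (/ 10.0, 20.0, 30.0 /)
   flux  = (/ 1.0, 2.0 /)
   endpt = (/ 1, 1 /)                  ! both edges contribute to node 1
   x = SUM_SCATTER( flux, x, endpt )   ! x is now (/ 13.0, 20.0, 30.0 /)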

Foil 30 Irregular Mesh: Agglomeration I

Warning: Your compiler may do things differently!
Computing edge values requires edge list and node values
  • Distribute edges ⇒ parallel, no communication for edges
  • Replicate edges ⇒ sequential or broadcast edge values
  • Distribute nodes ⇒ move "shared" endpoints
  • Replicate nodes ⇒ no movement for endpoints
Updating node values requires edge list, edge values, and node values
  • Distribute edges ⇒ parallel, no communication for edges
  • Replicate edges ⇒ sequential, no communication for edges
  • Distribute nodes ⇒ move "shared" endpoints
  • Replicate nodes ⇒ move all endpoints
The bottom line, part I
  • Always distribute edges
  • Distribute nodes unless the problem is very small

Foil 31 Irregular Mesh: Agglomeration II

Warning: Your compiler may do things differently!
Computation is static, homogeneous, and over full array (with respect to the edges)
  • No load balancing issues
Accesses to node array are "nearest neighbor" in the mesh
  • This is not reflected in the index order!
  • Therefore this does not favor either BLOCK or CYCLIC
To minimize communication, edge and node distributions must fit the mesh topology
  • HPF's regular distributions are not ideal for this
  • HPF 2.0 indirect distributions may be better, but require careful construction
The bottom line, part II
  • No silver bullet
  • Order the nodes and edges to bring "close" entities together, then use BLOCK

Foil 32 Irregular Mesh: Pictures


Foil 33 Irregular Mesh: Bad Data Distribution


Foil 34 Irregular Mesh: Good Data Distribution


Foil 35 Irregular Mesh: HPF Program

USE HPF_LIBRARY
REAL x(nnode), flux(nedge)
INTEGER iedge(nedge,2)
INTEGER permute_node(nnode), permute_edge(nedge)
!HPF$ DISTRIBUTE x(BLOCK)
!HPF$ DISTRIBUTE flux(BLOCK)
!HPF$ ALIGN iedge(i,*) WITH flux(i)
!HPF$ ALIGN permute_edge(i) WITH flux(i)
!HPF$ ALIGN permute_node(i) WITH x(i)
CALL renumber_nodes( iedge, permute_node )
x( permute_node(:) ) = x
FORALL (i=1:nedge) iedge(i,:) = permute_node(iedge(i,:))
permute_edge = GRADE_UP( iedge(:,1) )
FORALL (i=1:nedge) iedge(i,:) = iedge(permute_edge(i),:)
err = tol * 1e6
DO WHILE (err > tol)
   flux = (x(iedge(1:nedge,1))-x(iedge(1:nedge,2)))/2
   x = SUM_SCATTER( -flux(1:nedge), x, iedge(1:nedge,1) )
   x = SUM_SCATTER(  flux(1:nedge), x, iedge(1:nedge,2) )
   err = SUM( flux*flux ) / nedge
END DO

Foil 36 Irregular Mesh: Mapping

This program is going to be challenging
  • Indexing of arrays will be difficult
  • Owner-computes rule difficult to apply
  • Key technique: inspector-executor communication
    • First time the code is executed, generate a table of required communication at run time (inspector)
    • Problem: How big does that table get?
    • Problem: How do you efficiently distribute that table to all processors?
    • Use this table to manage unstructured communication until the communication pattern changes (executor)
    • Problem: How do you know the pattern has changed?
  • Commercial compilers are attacking these problems, but there is a long way to go

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
