Basic foilset: DoD HPF Training -- 3. Parallel Constructs in HPF

Given by Chuck Koelbel (Rice University) at DoD Training and Others, 1995-98. Foils prepared August 7, 1998.


Third in Chuck Koelbel's HPF Presentations
FORALL, PURE, INDEPENDENT
Library -- intrinsics, EXTRINSIC procedures, HPF_LOCAL

Table of Contents for full HTML of DoD HPF Training -- 3. Parallel Constructs in HPF


1 "High Performance Fortran in Practice" tutorial -- 3. HPF Parallel Features

2 Outline
3 Data-Parallel Statements
4 B. FORALL statements, etc.
The Single-Statement FORALL

5 The Multi-Statement FORALL
6 An Example of FORALL
7 An Example of DO
8 An Example of Nested FORALLs
9 Why Use FORALL?
10 Determinate Behavior of FORALL
11 Implementation of FORALL
12 Implementation of FORALL (cont.)
13 D. PURE functions
PURE Functions

14 PURE Functions in Pictures
15 PURE Functions and FORALL
16 Why Use PURE Functions?
17 PURE for Mandelbrot Sets
18 Avoiding the PURE Function in Mandelbrot
19 A. INDEPENDENT directives
The INDEPENDENT Directive

20 An Example of INDEPENDENT
21 Another Example of FORALL
With INDEPENDENT

22 An Example of
Nested INDEPENDENT FORALLs

23 The INDEPENDENT Directive:
More Details

24 Examples of Correct
INDEPENDENT Assertions

25 Examples of Incorrect
INDEPENDENT Assertions

26 Example of Data-Dependent
INDEPENDENT Assertion

27 Why Use INDEPENDENT?
28 Implementation of INDEPENDENT
29 In Summary:
DO, FORALL and INDEPENDENT

30 Hints for Using
Data Parallel Statements

31 C. Library, intrinsic, and EXTRINSIC functions
The HPF Library and New Intrinsics

32 Examples of HPF Library
33 Why Use the HPF Library?
34 Typical Uses of HPF Library
35 Implementation of the HPF 1 Library
36 EXTRINSIC Procedures
37 EXTRINSIC(HPF_LOCAL)
38 Example of HPF_LOCAL
EXTRINSIC(F77_LOCAL)

39 Example of F77_LOCAL
(From NAS FT Benchmark)

40 Example of F77_LOCAL
(From NAS FT Benchmark) II

41 Example of F77_LOCAL
(From NAS FT Benchmark) III

42 Why Use EXTRINSIC Procedures?




HTML version of Basic Foils prepared August 7 98

Foil 1 "High Performance Fortran in Practice" tutorial -- 3. HPF Parallel Features
Presented at SIAM Parallel Processing San Francisco, Feb. 14, 1995
Presented at Supercomputer '95, Mannheim, Germany, June 27, 1995
Presented at Supercomputing '95, San Diego, December 4, 1995
Presented at University of Tennessee (short form), Knoxville, March 21, 1996
Presented at Metacenter Regional Affiliates, Cornell, May 6, 1996
Presented at Summer of HPF Workshop, Vienna, July 1, 1996
Presented at Institute for Mathematics & its Applications, Minneapolis, September 11-13, 1996
Presented at Corps of Engineers Waterways Experiments Station, Vicksburg, MS, October 30-November 1, 1996
Presented at Supercomputing '96, Pittsburgh, PA, November 17, 1996
Presented at NAVO, Stennis Space Center, MS, Feb 13, 1997
Presented at HPF Users Group (short version), Santa Fe, NM, February 23, 1997
Presented at ASC, Wright-Patterson Air Force Base, OH, March 5, 1997
Parts presented at SC'97, November 17, 1997
Parts presented (slideshow mode) at SC '97, November 15-21, 1997
Presented at DOD HPC Users Group, June 1, 1998

From DoD HPF Training -- 3. Parallel Constructs in HPF DoD Training and Others -- 1995-98. *

Foil 2 Outline

1. Introduction to Data-Parallelism
2. Fortran 90/95 Features
3. HPF Parallel Features ** This Presentation
4. HPF Data Mapping Features
5. Parallel Programming in HPF
6. HPF Version 2.0

Foil 3 Data-Parallel Statements

Data parallelism emphasizes having many fine-grain operations, such as computations on every element of an array
HPF has several ways to exploit data parallelism:
  • Array expressions: Taken from Fortran 90
  • FORALL: Tightly-coupled parallel execution based on the structure of an index space
    • PURE: Procedures without side effects that may be called in FORALL
  • INDEPENDENT: Assertion that iterations do not interfere with each other
  • HPF library and intrinsics: Extended from Fortran 90

Foil 4 B. FORALL statements, etc.
The Single-Statement FORALL

Syntax:
  • FORALL ( index-spec-list [,mask-expr] ) forall-assignment
  • index-spec is int-variable = triplet-spec
  • forall-assignment is ordinary assignment or pointer assignment
Semantics:
  • Equivalent to array assignment in Fortran 90
  • For each value of indices, check the mask
  • Compute right-hand sides for unmasked values
  • Make assignments to left-hand sides for unmasked values
  • Multiple assignments to the same location are not standard-conforming (i.e. are undefined)
Note: FORALL is not a general-purpose parallel loop!
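
As a minimal sketch of the single-statement form, the masked FORALL below takes reciprocals of the nonzero elements only. The array names and values are illustrative assumptions, not taken from the foils. The code is standard Fortran 95, so it should compile with an ordinary Fortran compiler as well as an HPF one.

```fortran
program forall_single
  implicit none
  integer, parameter :: n = 5
  real :: a(n), b(n)
  integer :: i

  a = (/ 4.0, 0.0, 2.0, 0.0, 1.0 /)   ! illustrative data
  b = -1.0

  ! The mask a(i) /= 0.0 is checked for every index value first;
  ! masked-out elements of b keep their old value (-1.0).
  forall (i = 1:n, a(i) /= 0.0) b(i) = 1.0 / a(i)

  ! The same operation as a Fortran 90 masked array assignment:
  !   where (a /= 0.0) b = 1.0 / a

  print '(5f6.2)', b   ! b is now [0.25, -1.00, 0.50, -1.00, 1.00]
end program forall_single
```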

Foil 5 The Multi-Statement FORALL

Syntax:
  • FORALL ( index-spec-list [, mask] )
    • forall-body-list
  • END FORALL
  • forall-body can be a forall-assignment, FORALL, or WHERE
Semantics:
  • Multi-statement FORALL is shorthand for a series of single-statement FORALLs
  • Multi-statement FORALLs can be nested to produce more complex iteration spaces
  • Each bottom-level assignment statement is completed before the next one starts
Note: FORALL is not a general-purpose parallel loop!

Foil 6 An Example of FORALL

Initially,
  • a = [0, 1, 2, 3, 4]
  • b = [0, 10, 20, 30, 40]
  • c = [-1, -1, -1, -1, -1]
FORALL ( i = 2:4 )
    a(i) = a(i-1) + a(i+1)
    c(i) = b(i) * a(i+1)
END FORALL
Afterwards,
  • a = [0, 2, 4, 6, 4]
  • b = [0, 10, 20, 30, 40]
  • c = [-1, 40, 120, 120, -1]

Foil 7 An Example of DO

Initially,
  • a = [0, 1, 2, 3, 4]
  • b = [0, 10, 20, 30, 40]
  • c = [-1, -1, -1, -1, -1]
DO i = 2, 4
    a(i) = a(i-1) + a(i+1)
    c(i) = b(i) * a(i+1)
END DO
Afterwards,
  • a = [0, 2, 5, 9, 4]
  • b = [0, 10, 20, 30, 40]
  • c = [-1, 20, 60, 120, -1]
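
The contrast between the FORALL and DO versions of this example can be checked with a short program. This is a sketch in standard Fortran 95; the program name is an assumption, and b is taken as [0, 10, 20, 30, 40] in both runs, which is the value that reproduces the "Afterwards" results stated on the two foils.

```fortran
program forall_vs_do
  implicit none
  integer :: a(5), b(5), c(5), i

  b = (/ 0, 10, 20, 30, 40 /)

  ! FORALL: each statement finishes for all i before the next starts,
  ! and every right-hand side uses values from before that statement.
  a = (/ 0, 1, 2, 3, 4 /); c = -1
  forall (i = 2:4)
    a(i) = a(i-1) + a(i+1)
    c(i) = b(i) * a(i+1)
  end forall
  print '(a,5i5)', 'FORALL a:', a   ! 0   2   4   6   4
  print '(a,5i5)', 'FORALL c:', c   ! -1  40 120 120  -1

  ! DO: iterations run in order, so a(i-1) is already updated.
  a = (/ 0, 1, 2, 3, 4 /); c = -1
  do i = 2, 4
    a(i) = a(i-1) + a(i+1)
    c(i) = b(i) * a(i+1)
  end do
  print '(a,5i5)', 'DO     a:', a   ! 0   2   5   9   4
  print '(a,5i5)', 'DO     c:', c   ! -1  20  60 120  -1
end program forall_vs_do
```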

Foil 8 An Example of Nested FORALLs

FORALL ( i = 1:3 )
    a(i) = b(i)
    FORALL ( j = 1:i )
        c(i,j) = d(i,j)
    END FORALL
END FORALL
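
To see which elements the nested construct touches, the sketch below runs it on small arrays. The data values and the program name are illustrative assumptions. The inner FORALL covers a triangular index space (j from 1 to i), so only the lower triangle of c is assigned.

```fortran
program forall_nested
  implicit none
  integer :: a(3), b(3), c(3,3), d(3,3)
  integer :: i, j

  b = (/ 10, 20, 30 /)
  d = reshape( (/ (i, i = 1, 9) /), (/ 3, 3 /) )   ! column-major fill 1..9
  a = 0
  c = 0

  forall (i = 1:3)
    a(i) = b(i)
    forall (j = 1:i)
      c(i,j) = d(i,j)
    end forall
  end forall

  ! Equivalent pair of single-statement FORALLs:
  !   forall (i = 1:3) a(i) = b(i)
  !   forall (i = 1:3, j = 1:3, j <= i) c(i,j) = d(i,j)

  print '(3i4)', a          ! 10 20 30
  do i = 1, 3
    print '(3i4)', c(i,:)   ! rows: 1 0 0 / 2 5 0 / 3 6 9
  end do
end program forall_nested
```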

Foil 9 Why Use FORALL?

Assignments to array sections
    FORALL ( i = 1:4, j = 2:4 ) a(i,j) = a(i,j-1)
    FORALL ( i = 1:4 ) a(i,i) = a(i,i) * scale
    FORALL ( i = 1:4 )
        FORALL ( j = i:4 ) a(i,j) = a(i,j) / a(i,i)
    END FORALL
    FORALL ( i = 1:4 )
        FORALL ( j = ilo(i):ihi(i) ) x(j) = x(j)*y(i)
    END FORALL
Calculating based on a subscript
    FORALL ( i = 0:n, j = 0:n ) a(i,j) = SQRT(1.0*(i*i+j*j))/n

Foil 10 Determinate Behavior of FORALL

Consider the statement:
  • FORALL ( i = 1:n ) a(ix(i)) = a(i)
If ix has no repeated values (e.g. ix is a permutation), this is well-defined
  • Note that a(i) is always the "old" value, not the new one computed elsewhere in the FORALL
If ix has repeated values (e.g. ix(i)=i/2), this is not defined by HPF
  • The compiler may take any action it feels appropriate
  • Assigning one of the possible values is appropriate
  • Reporting an error is appropriate
  • Assigning a random number is appropriate
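
The well-defined case can be illustrated with a small permutation. In this sketch (array contents and names are assumptions), the FORALL reads every a(i) before storing anything, whereas the sequential DO picks up values it has already overwritten.

```fortran
program forall_permute
  implicit none
  integer, parameter :: n = 4
  integer :: a(n), a_do(n), ix(n), i

  ix   = (/ 2, 3, 4, 1 /)          ! a permutation: no repeated values
  a    = (/ 10, 20, 30, 40 /)
  a_do = a

  ! FORALL: all right-hand sides use the "old" a.
  forall (i = 1:n) a(ix(i)) = a(i)

  ! DO: each iteration sees the stores made by earlier iterations.
  do i = 1, n
    a_do(ix(i)) = a_do(i)
  end do

  print '(a,4i4)', 'FORALL:', a       ! 40 10 20 30
  print '(a,4i4)', 'DO:    ', a_do    ! 10 10 10 10
end program forall_permute
```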

Foil 11 Implementation of FORALL

Use owner-computes rule to partition computation
  • See implementation of DISTRIBUTE
Semantics of the construct allow parallelism
  • All rows of dependence diagram can execute in parallel
Dependence analysis can further limit synchronization
  • If RHS and LHS do not access the same element, no synchronization is needed
Data need only be moved at synchronization points
  • Before each RHS and before each LHS
  • These may be combined with previous/following statement

Foil 12 Implementation of FORALL (cont.)

Data movement becomes much more difficult on distributed memory machines if the FORALL calls a function
  • There is no race condition
  • But the data may not be on the processor where it is needed
  • And there is no synchronization point where it can be exchanged
For this reason, many implementations serialize some cases of parallel constructs
  • Usually there is a compiler override switch to parallelize at least some cases
  • Best advice: Don't use global data in functions

Foil 13 D. PURE functions
PURE Functions

Syntax:
  • PURE [type-spec] FUNCTION function-name( [ dummy-arg-name-list ] )
PURE functions have no side effects
Syntactic constraints:
  • Global variables and dummy arguments cannot be used in any context that may cause the variable to become defined
    • Left hand side of assignment
    • DO index, ASSIGN, ALLOCATE
    • Actual argument with INTENT(OUT)
    • Targets of pointer assignments (due to later use of pointers)
    • Full list of restrictions is too long to fit on this slide!
  • No external I/O or file operations
  • Only inherited distribution/alignment of dummies and locals
Intrinsic functions are PURE

Foil 14 PURE Functions in Pictures

COMMON X
REAL Y, Z
Z = FCN( Y )

Foil 15 PURE Functions and FORALL

PURE functions are the only ones that can be invoked from a FORALL
Safe to do this because they have no side effects
Useful to do this because there are things you cannot do (directly) in a FORALL
  • Conditionals and iteration
    • E.g., Do point-wise iteration this way
  • Local variables
    • E.g., Handle temporaries this way

Foil 16 Why Use PURE Functions?

Elemental functions
  • Intrinsics
  • Equations of state, etc.
  • Note: A row can be an element!
Pointwise iteration
  • Mandelbrot sets
  • Pointwise Newton iterations
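
A pointwise Newton iteration might look like the sketch below. The function newton_sqrt, its starting guess, and the fixed iteration count are assumptions made for illustration; the foils do not give this code. The iteration and the local variables live inside the PURE function, which is what lets it be called from a FORALL.

```fortran
module newton_mod
  implicit none
contains
  ! Newton iteration for sqrt(s), s > 0. PURE: arguments are INTENT(IN),
  ! only locals are modified, and there is no I/O.
  pure function newton_sqrt(s, niter) result(x)
    real, intent(in)    :: s
    integer, intent(in) :: niter
    real :: x
    integer :: k
    x = max(s, 1.0)               ! hypothetical starting guess
    do k = 1, niter
      x = 0.5 * (x + s / x)
    end do
  end function newton_sqrt
end module newton_mod

program pure_newton
  use newton_mod
  implicit none
  integer, parameter :: n = 4
  real :: s(n), r(n)
  integer :: i

  s = (/ 1.0, 4.0, 9.0, 16.0 /)
  ! Each index point runs its own iteration, independently of the others.
  forall (i = 1:n) r(i) = newton_sqrt(s(i), 20)
  print '(4f8.4)', r              ! approximately 1, 2, 3, 4
end program pure_newton
```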

Foil 17 PURE for Mandelbrot Sets

! The caller (explicit interface not shown)
FORALL ( i=1:n, j=1:m )
    k(i,j) = mandelbrot( CMPLX((i-1)*1.0/(n-1), &
                               (j-1)*1.0/(m-1)), 1000 )
END FORALL

! The callee
PURE INTEGER FUNCTION mandelbrot(x, itol)
    COMPLEX, INTENT(IN) :: x
    INTEGER, INTENT(IN) :: itol
    COMPLEX xtmp
    INTEGER k
    k = 0
    xtmp = -x
    DO WHILE (ABS(xtmp)<2.0 .AND. k<itol)
        xtmp = xtmp*xtmp - x
        k = k + 1
    END DO
    mandelbrot = k
END FUNCTION mandelbrot

Foil 18 Avoiding the PURE Function in Mandelbrot

k0 = 0
FORALL ( i=1:n, j=1:m )
    x(i,j) = CMPLX((i-1)*1.0/(n-1),(j-1)*1.0/(m-1))
    k(i,j) = 0
    xtmp(i,j) = -x(i,j)
    mask(i,j) = .TRUE.
END FORALL
DO WHILE (ANY(mask(1:n,1:m)) .AND. k0<1000)
    FORALL ( i=1:n, j=1:m, mask(i,j))
        xtmp(i,j) = xtmp(i,j)*xtmp(i,j)-x(i,j)
        k(i,j) = k(i,j) + 1
        mask(i,j) = ABS(xtmp(i,j))<2.0
    END FORALL
    k0 = k0 + 1
END DO

Foil 19 A. INDEPENDENT directives
The INDEPENDENT Directive

Syntax:
  • !HPF$ INDEPENDENT [ , NEW( variable-list ) ]
Semantics:
  • INDEPENDENT is an assertion that no iteration affects any other iteration in any way
  • NEW variables are treated as if they were allocated anew for each iteration (DO only)
  • Applied to a DO: states that there are no loop carried dependences (except for NEW variables)
  • Applied to a FORALL: states that no index point assigns to any location that another uses
  • If the assertion is false, the program is not standard-conforming (i.e. results are not defined)
Note: INDEPENDENT is not a general parallel loop!
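
A minimal sketch of the directive with a NEW clause follows; the variable names and data are assumptions. Because !HPF$ lines are comments to a non-HPF compiler, the program also runs (sequentially) under an ordinary Fortran compiler. The scalar t is written in every iteration, so the assertion is only true because t is declared NEW.

```fortran
program independent_new
  implicit none
  integer, parameter :: n = 6
  real :: a(n), b(n), t
  integer :: i

  b = (/ (real(i), i = 1, n) /)

  ! Each iteration writes only a(i) and its own private copy of t,
  ! and reads only b(i), so no iteration affects any other.
!HPF$ INDEPENDENT, NEW(t)
  do i = 1, n
    t = 2.0 * b(i)
    a(i) = t * t
  end do

  print '(6f7.1)', a   ! 4.0 16.0 36.0 64.0 100.0 144.0
end program independent_new
```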

Foil 20 An Example of INDEPENDENT

Initially,
  • a = [0, 2, 4, 6, 1, 3, 5, 7]
  • b = [6, 5, 4, 3, 2, 3, 4, 5]
  • c = [-1,-1,-1,-1,-1,-1,-1,-1]
!HPF$ INDEPENDENT
DO j = 1, 3
    a(j) = a(b(j))
    c(a(j)) = a(j)*b(a(j))
END DO
Afterwards,
  • a = [3, 1, 6, 6, 1, 3, 5, 7]
  • b = [6, 5, 4, 3, 2, 3, 4, 5]
  • c = [6,-1,12,-1,-1,18,-1,-1]

Foil 21 Another Example of FORALL
With INDEPENDENT

Initially,
  • a = [0, 2, 4, 6, 1, 3, 5, 7]
  • b = [6, 5, 4, 3, 2, 3, 4, 5]
  • c = [-1,-1,-1,-1,-1,-1,-1,-1]
!HPF$ INDEPENDENT
FORALL ( j = 1:3 )
    a(j) = a(b(j))
    c(a(j)) = a(j)*b(a(j))
END FORALL
Afterwards,
  • a = [3, 1, 6, 6, 1, 3, 5, 7]
  • b = [6, 5, 4, 3, 2, 3, 4, 5]
  • c = [6,-1,12,-1,-1,18,-1,-1]

Foil 22 An Example of
Nested INDEPENDENT FORALLs

!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
    a(i) = b(i)
    FORALL ( j = 1:i )
        c(i,j) = d(i,j)
    END FORALL
END FORALL

Foil 23 The INDEPENDENT Directive:
More Details

The Fundamental Rule:
  • If one iteration writes to an object, others cannot read or write it
Things that write to an object:
  • Assignment, ASSIGN, DO index,
    • To the object itself
    • To an aggregate that contains it
    • Through a pointer to it
  • Input/Output statements
    • Write the file pointer (except INQUIRE)
    • READ assigns to its input list
Things that read an object
  • Uses in expressions (as you expect)
  • Input/Output statements
    • Read the file pointer (always)

Foil 24 Examples of Correct
INDEPENDENT Assertions

Always true
    !HPF$ INDEPENDENT
    FORALL (i=2:n-1) a(i)=b(i-1)+b(i)+b(i+1)

    !HPF$ INDEPENDENT, NEW(j)
    DO k = 2, m-1, 2
        !HPF$ INDEPENDENT, NEW(vl,vr)
        DO j = 2, n-1, 2
            vr = x(j,k) - x(j-1,k)
            vl = x(j+1,k) - x(j,k)
            x(j,k) = x(j,k) + 0.5*(vr-vl)
        END DO
    END DO
Some compilers will catch these on their own; some won't

Foil 25 Examples of Incorrect
INDEPENDENT Assertions

INDEPENDENT does not handle reductions
    !HPF$ INDEPENDENT
    DO i = 1, n
        x = x + a(i)*a(i)
    END DO
INDEPENDENT does not know about higher-level correctness
    DO WHILE (err > err_tol)
        !HPF$ INDEPENDENT
        DO i = 2, n-1
            b(i) = a(i)
            a(i) = 0.5 * (a(i-1) + a(i+1))
            b(i) = ABS(b(i) - a(i))
        END DO
        err = MAXVAL(b(2:n-1))
    END DO
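
One common way around the reduction case is to use an array intrinsic instead of asserting INDEPENDENT on the accumulating loop. The sketch below is a suggestion under that assumption, not something the foils prescribe; the data values are illustrative.

```fortran
program reduction_fix
  implicit none
  integer, parameter :: n = 4
  real :: a(n), x

  a = (/ 1.0, 2.0, 3.0, 4.0 /)
  x = 0.0

  ! Every iteration of "x = x + a(i)*a(i)" reads and writes x, so the loop
  ! is not INDEPENDENT. The SUM intrinsic expresses the same reduction and
  ! leaves the choice of parallel algorithm to the compiler.
  x = x + sum(a * a)          ! equivalently: x + dot_product(a, a)

  print '(f6.1)', x           ! 30.0
end program reduction_fix
```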

Foil 26 Example of Data-Dependent
INDEPENDENT Assertion

Sometimes true
    !HPF$ INDEPENDENT, NEW(j, n1)
    DO i = 1, nblue
        n1 = iblue(i)
        DO j = ibegin(n1), ibegin(n1+1)-1
            x(n1) = x(n1) + y(j)*x(ired(j))
        END DO
    END DO
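
Whether the assertion holds depends on the contents of iblue, ibegin and ired. The sketch below is a hypothetical debugging aid (the function assertion_holds and the test data are assumptions, not part of HPF or the foils). It checks a sufficient condition: no node appears twice in iblue, and no ired(j) read by the loop names a node that the loop writes. The check is conservative, since it also rejects an iteration that reads its own node through ired.

```fortran
module independence_check
  implicit none
contains
  ! .true. when no iteration of the blue/red loop can interfere with another.
  logical function assertion_holds(iblue, ibegin, ired, nx)
    integer, intent(in) :: nx                          ! number of nodes in x
    integer, intent(in) :: iblue(:), ibegin(:), ired(:)
    logical :: written(nx)
    integer :: i, j, n1

    written = .false.
    assertion_holds = .true.

    ! Pass 1: every written node x(n1) must belong to exactly one iteration.
    do i = 1, size(iblue)
      n1 = iblue(i)
      if (written(n1)) assertion_holds = .false.
      written(n1) = .true.
    end do

    ! Pass 2: no iteration may read a node that some iteration writes.
    do i = 1, size(iblue)
      n1 = iblue(i)
      do j = ibegin(n1), ibegin(n1+1) - 1
        if (written(ired(j))) assertion_holds = .false.
      end do
    end do
  end function assertion_holds
end module independence_check

program check_demo
  use independence_check
  implicit none
  integer :: iblue(2), ibegin(5), ired(4)

  iblue  = (/ 1, 3 /)             ! blue nodes 1 and 3; red nodes 2 and 4
  ibegin = (/ 1, 3, 3, 5, 5 /)    ! node 1 owns j = 1..2, node 3 owns j = 3..4
  ired   = (/ 2, 4, 2, 4 /)
  print *, assertion_holds(iblue, ibegin, ired, 4)   ! T

  ired(2) = 3                     ! node 1 now reads blue node 3
  print *, assertion_holds(iblue, ibegin, ired, 4)   ! F
end program check_demo
```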

Foil 27 Why Use INDEPENDENT?

Yet another way to do (some) array assignments
    !HPF$ INDEPENDENT
    DO i = 1, n
        a(i) = b(i)
    END DO
Express application-dependent information
    ! "colors" don't interfere with each other
    !HPF$ INDEPENDENT, NEW(ix, i1,i2,f12)
    DO i = 1, ncolor
        DO ix = color_beg(i), color_end(i)
            i1 = icolor(ix,1)
            i2 = icolor(ix,2)
            f12 = w(i1)-w(i2)
            x(i1) = x(i1) + f12
            x(i2) = x(i2) - f12
        END DO
    END DO

Foil 28 Implementation of INDEPENDENT

Virtually identical to FORALL
Use owner-computes rule to partition computation
  • See implementation of DISTRIBUTE
Semantics of the construct allow parallelism
  • All iterations can execute in parallel
Data need only be moved at synchronization points
  • Before and after the loop
No further dependence analysis needed
Warnings about functions still apply
  • Possibly worse, due to more pathological cases

Foil 29 In Summary:
DO, FORALL and INDEPENDENT


Foil 30 Hints for Using
Data Parallel Statements

Use FORALL to extend array operations
  • More shapes
  • User-defined elemental functions
  • Better than array syntax?
Beware of hidden overheads
  • Intra-statement overheads
  • Inter-statement synchronizations
  • Complex pure functions
Assert INDEPENDENT only when it is true
  • This has implications for debuggers and programming environments!

Foil 31 C. Library, intrinsic, and EXTRINSIC functions
The HPF Library and New Intrinsics

Extended intrinsics: MAXLOC, MINLOC
One elemental intrinsic: ILEN
System inquiry intrinsics: NUMBER_OF_PROCESSORS
New reduction functions: IALL
Combining-scatter functions: SUM_SCATTER
Prefix reduction functions: SUM_PREFIX
Sorting functions: GRADE_UP
Bit manipulation functions: POPCNT
Data distribution inquiry subroutines: HPF_DISTRIBUTION

Foil 32 Examples of HPF Library

Many implement data-parallel operations that are not elemental

Foil 33 Why Use the HPF Library?

If you need the functions
  • Round out the set of reductions
    • All built-in associative, commutative functions
  • Partial reductions (prefix reductions) for all reductions
  • Sorting
Functions were chosen for the library because
  • They are useful
  • They are data-parallel
  • They are hard to implement efficiently by hand

Foil 34 Typical Uses of HPF Library

Accumulations through indirection arrays
    x = SUM_SCATTER( flux, x, nbr(1,1:n) )
    x = SUM_SCATTER( -flux, x, nbr(2,1:n) )
    ! Equivalent to the following
    DO i = 1, n
        x(nbr(1,i)) = x(nbr(1,i)) + flux(i)
        x(nbr(2,i)) = x(nbr(2,i)) - flux(i)
    END DO
Manipulating array-based sparse structures
    inum(1:n) = MAX( iend(1:n)-ibeg(1:n)+1, 0 )
    ! Exclusive prefix sum: segment i starts right after segments 1..i-1
    ibeg_new(1:n) = SUM_PREFIX(inum(1:n), EXCLUSIVE=.TRUE.) + 1
    iend_new(1:n) = ibeg_new(1:n)+inum(1:n)-1
    ! Moving the data left as exercise for reader
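
For readers without an HPF compiler, the segment-packing computation can be sketched serially in standard Fortran. The loop below stands in for an exclusive prefix sum (each new segment begins one past the total length of the segments before it); the data values and the variable running are illustrative assumptions.

```fortran
program sparse_pack
  implicit none
  integer, parameter :: n = 4
  integer :: ibeg(n), iend(n), inum(n), ibeg_new(n), iend_new(n)
  integer :: i, running

  ibeg = (/ 1, 5, 9, 12 /)
  iend = (/ 3, 4, 10, 12 /)        ! second segment is empty (iend < ibeg)

  inum = max(iend - ibeg + 1, 0)   ! segment lengths

  ! Serial stand-in for the exclusive prefix sum of inum.
  running = 0
  do i = 1, n
    ibeg_new(i) = running + 1
    running = running + inum(i)
  end do
  iend_new = ibeg_new + inum - 1

  print '(a,4i4)', 'inum:    ', inum       ! 3 0 2 1
  print '(a,4i4)', 'ibeg_new:', ibeg_new   ! 1 4 4 6
  print '(a,4i4)', 'iend_new:', iend_new   ! 3 3 5 6
end program sparse_pack
```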

Foil 35 Implementation of the HPF 1 Library

Functions were added to the library because
  • They are useful
  • They can be implemented in parallel on all (known) machines
  • The optimal implementation is complex or machine-dependent
Therefore, HPF puts the onus on the vendors to provide the best possible implementation
  • Can be implemented by subroutine library or in-line expansion
  • Should use the best known algorithms for reduction, prefix operations, etc.

Foil 36 EXTRINSIC Procedures

Syntax:
  • EXTRINSIC ( extrinsic-kind-keyword )
  • Goes in FUNCTION or SUBROUTINE header (like PURE)
An escape mechanism for allowing HPF to call other languages and paradigms
On the caller (HPF) side:
  • There must be an explicit interface with the EXTRINSIC prefix
  • Remapping occurs (if needed) to meet distribution specifications
  • System synchronizes all processors before the call
  • System calls the "local" routine on every processor
On the callee (non-HPF) side:
  • INTENT(IN) and INTENT(OUT) must be obeyed
  • If variables are replicated, callee must make them consistent before return
  • Processors can access their own section of distributed arrays

Foil 37 EXTRINSIC(HPF_LOCAL)

HPF_LOCAL is an SPMD language that meshes well with "global" HPF
  • It can do things HPF can't
  • Global HPF can do things it can't
On the caller (global HPF) side
    EXTRINSIC(HPF_LOCAL) REAL FUNCTION foo(x)
    REAL x(:)
    !HPF$ DISTRIBUTE x(BLOCK)
On the callee (local HPF) side
    EXTRINSIC(HPF_LOCAL) REAL FUNCTION foo(x)
    REAL x(:)
Inside foo, SIZE(x) gives the size of the local section of x
HPF_LOCAL_LIBRARY provides GLOBAL_SIZE(x) to get the global size of x

Foil 38 Example of HPF_LOCAL
EXTRINSIC(F77_LOCAL)

HPF_LOCAL requires a Fortran 90 interface to the local subroutine with assumed-shape arrays
  • Many libraries do not use assumed-shape arrays
  • Thus was born the need for F77_LOCAL
On the caller (HPF) side:
    EXTRINSIC(F77_LOCAL) REAL FUNCTION foo(x)
    REAL x(:)
    !HPF$ DISTRIBUTE x(BLOCK)
On the callee side:
  • Implicit interface (FORTRAN 77 doesn't have explicit interfaces)

Foil 39 Example of F77_LOCAL
(From NAS FT Benchmark)

!
! 3D FFT subroutine used by the PESSL implementation.
!
  subroutine fft (n1, n2, n3, isign, scale, x, y)
  use blacs
  use types
  implicit none
!
! Arguments
!
  integer, intent(in) :: n1, n2, n3, isign
  real(R8), intent(in) :: scale
!
  complex(R8), dimension(:,:,:), intent(in) :: x
!hpf$ template gridx(n1,n2,n3)
!hpf$ distribute(*,*,block) :: gridx
!hpf$ align(:,:,:) with *gridx :: x
!
  complex(R8), dimension(:,:,:), intent(out) :: y
!hpf$ template gridy(n3,n2,n1)
!hpf$ distribute(*,*,block) :: gridy
!hpf$ align(:,:,:) with *gridy :: y

Foil 40 Example of F77_LOCAL
(From NAS FT Benchmark) II

!
! Interfaces
!
  interface
    extrinsic (f77_local) subroutine &
      & pdcft3 (x, y, n1, n2, n3, isign, scale, ictxt, ip)
    use types
    implicit none
    integer, intent(in) :: n1, n2, n3, isign, ictxt
    real(R8), intent(in) :: scale
    integer, dimension(:), intent(in) :: ip
    complex(R8), dimension(:,:,:), intent(in) :: x
!hpf$ template tempx(n1,n2,n3)
!hpf$ distribute(*,*,block) :: tempx
!hpf$ align(:,:,:) with *tempx :: x
    complex(R8), dimension(:,:,:), intent(out) :: y
!hpf$ template tempy(n3,n2,n1)
!hpf$ distribute(*,*,block) :: tempy
!hpf$ align(:,:,:) with *tempy :: y
    end subroutine pdcft3
  end interface
!
  ip = 0
  call pdcft3 (x, y, n1, n2, n3, isign, scale, ictxt, ip)
!

Foil 41 Example of F77_LOCAL
(From NAS FT Benchmark) III

!
  call blacs_get (0, 0, val)
  ictxt = val(1)
!
! Calculate the USERMAP for the BLACS grid, and set up the grid. The
! processor ID assignment is done by the same algorithm used by PGHPF,
! and may be different for other HPF implementations.
!
  nprow = size (usermap, 1)
  npcol = size (usermap, 2)
  do j = 1, npcol
    do i = 1, nprow
      usermap(i,j) = (j - 1) * nprow + (i - 1)
    end do
  end do
  call blacs_gridmap (ictxt, usermap, nprow, nprow, npcol)
!
  ip = 0
  call pdcft3 (x, y, n1, n2, n3, isign, scale, ictxt, ip)
!
  call blacs_gridexit (ictxt)
!

Foil 42 Why Use EXTRINSIC Procedures?

Access to other programming languages and paradigms
  • HPF_SERIAL: Sequential execution
    • For example, interfacing with graphics libraries
  • HPF_LOCAL: Single Program Multiple Data (SPMD) paradigm
  • HPF_CRAFT: Cray native programming language (HPF 2.0)
Expressiveness
  • Some things are just hard to write in HPF
  • Some things are impossible to do in HPF
    • Nondeterminism
Efficiency
  • Because of interface requirements, this works best for superlinear operations (e.g. ScaLAPACK)

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Sun Aug 9 1998