Given by Geoffrey C. Fox (Tomasz Haupt) at CPS615 Basic Simulation Track for Computational Science in Fall Semester 95. Foils prepared 16 Sept 1995
Summary of Material
what is HPF, what we need it for, where it came from |
why it is called "High Performance"? |
what are HPF compiler directives |
data mapping in HPF |
parallel statements and constructs in HPF |
subset HPF |
Fortran 90D |
Tom Haupt |
NPAC |
111 College Place |
Syracuse University |
Syracuse NY 13244 |
To express data parallelism |
The new HPF language features fall into four categories with respect to Fortran 90
|
Data parallel programming
|
Top performance on MIMD and SIMD computers with non-uniform memory access costs
|
Code tuning for various architectures |
Parallelism in HPF is expressed explicitly
|
Compiler may choose not to exploit information about parallelism |
Compiler may detect parallelism in sequential code |
There is tradeoff between parallelism and communication |
Programmer defines the data mapping |
The underlying assumptions are that: |
An operation on two or more data objects is likely to be carried out much faster if they all reside in the same processor, |
and that it may be possible to carry out many such operations concurrently if they can be performed on different processors |
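The tradeoff between these two assumptions can be illustrated with a small model. The sketch below is Python, not HPF; it assumes a nearest-neighbour update a(i) = a(i-1) + a(i+1), assumes each update is done by the processor that owns a(i), and uses hypothetical sizes (16 elements, 4 processors). It counts how many operands must be fetched from another processor under a BLOCK and under a CYCLIC mapping.

```python
from math import ceil

def block_owner(i, n, p):
    """0-based processor owning 1-based element i of n under BLOCK."""
    return (i - 1) // ceil(n / p)

def cyclic_owner(i, n, p):
    """0-based processor owning 1-based element i under CYCLIC."""
    return (i - 1) % p

def remote_refs(owner, n, p):
    """Count operands of a(i) = a(i-1) + a(i+1) that live on another processor."""
    count = 0
    for i in range(2, n):                 # interior elements 2..n-1
        for j in (i - 1, i + 1):          # the two neighbours read by element i
            if owner(j, n, p) != owner(i, n, p):
                count += 1
    return count

N, NP = 16, 4                             # hypothetical problem and machine size
block_remote = remote_refs(block_owner, N, NP)
cyclic_remote = remote_refs(cyclic_owner, N, NP)
print("BLOCK :", block_remote, "remote operands")
print("CYCLIC:", cyclic_remote, "remote operands")
```

Both mappings spread the work evenly over the four processors, but in this model BLOCK keeps most neighbours local while CYCLIC makes every neighbour remote.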
The directives are structured comments that suggest implementation strategies or assert facts about a program to the compiler |
They may affect the efficiency of the computation performed, but do not change the value computed by the program |
As in Fortran 90 statements, there are both:
|
HPF directives are consistent with Fortran 90 syntax except for the directive prefix:
|
Two forms of the directives are allowed
|
A template is an abstract space of indexed positions (an "array of nothings") |
In CMFortran terminology, a template is a set of virtual processors -- one per data point |
A template is declared by the TEMPLATE directive that specifies:
|
Examples:
|
Abstract processors always form a rectilinear grid in 1 or more dimensions |
They are abstract coarse grain collections of data-points |
The processor arrangement is defined by the PROCESSORS directive that specifies:
|
Examples:
|
!HPF$ PROCESSORS P(4) |
!HPF$ TEMPLATE X(40) |
!HPF$ ALIGN WITH X :: A, B, C |
!HPF$ DISTRIBUTE X(BLOCK)
|
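As a rough illustration of what the four directives above ask for, the following Python sketch (a model, not HPF) computes which abstract processor would hold each position of template X under BLOCK. It assumes the template is mapped onto P(4), which the directives shown do not state explicitly, and that A, B and C each have 40 elements.

```python
from math import ceil

N, NP = 40, 4          # TEMPLATE X(40), PROCESSORS P(4)

def block_owner(i, n=N, p=NP):
    """1-based processor that owns 1-based template position i under BLOCK."""
    return (i - 1) // ceil(n / p) + 1

# Collect the first and last template position held by each processor.
ranges = {}
for i in range(1, N + 1):
    q = block_owner(i)
    lo, hi = ranges.get(q, (i, i))
    ranges[q] = (min(lo, i), max(hi, i))

for q in sorted(ranges):
    print(f"P({q}) holds X({ranges[q][0]}:{ranges[q][1]})")
```

Because A, B and C are all aligned with X, A(i), B(i) and C(i) land on the same processor for every i in this model.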
Syntax of Align:
|
Note : denotes all values of array index |
Examples of array indices:
|
Use of : examples:
|
Ranks of the alignee and the align-target may be different |
Examples:
|
... or other way round
|
while ... |
!HPF$ ALIGN A(:) WITH TEMPL(:,i) |
HPF allows for more general alignments such as:
|
!HPF$ TEMPLATE T(12,12) |
!HPF$ ALIGN A(:,J) WITH T(:,J+1) |
!HPF$ ALIGN B(I,J) WITH T(I+4,J+4) |
But nobody is clear if they are useful! |
Each align-dummy variable is considered to range over all valid index values for the corresponding dimension of the alignee. An align-subscript is evaluated for any specific combination of values for the align-dummy variables simply by evaluating each align-subscript as an expression. The resulting subscript values must be legitimate subscripts for the align-target |
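The rule above can be checked mechanically. The Python sketch below (a model, not HPF) takes the alignment B(I,J) WITH T(I+4,J+4) and the template T(12,12) from the example; the shapes tried for B are hypothetical, since the slides do not declare B.

```python
def align_in_bounds(alignee_shape, target_shape, subscript):
    """True if every alignee index maps to a valid 1-based target subscript."""
    rows, cols = alignee_shape
    for i in range(1, rows + 1):          # align-dummy I ranges over dimension 1
        for j in range(1, cols + 1):      # align-dummy J ranges over dimension 2
            ti, tj = subscript(i, j)
            if not (1 <= ti <= target_shape[0] and 1 <= tj <= target_shape[1]):
                return False
    return True

def shift(i, j):
    """ALIGN B(I,J) WITH T(I+4,J+4)."""
    return (i + 4, j + 4)

ok_8 = align_in_bounds((8, 8), (12, 12), shift)   # B(8,8): largest subscript is 12
ok_9 = align_in_bounds((9, 9), (12, 12), shift)   # B(9,9): subscript 13 is out of range
print(ok_8, ok_9)
```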
These examples have non-unit stride as perhaps in "red-black" Iterative Solver algorithms: |
Syntax: |
!HPF$ DISTRIBUTE distributee (dist-format) |
[ONTO dist-target] |
Allowed forms of dist-format:
|
Examples:
|
!HPF$ PROCESSORS P(4)
|
!HPF$ TEMPLATE T(16) |
!HPF$ ALIGN A(:) WITH T(:) |
*HPF$ PROCESSORS SQUARE(2,2) |
*HPF$ TEMPLATE T(4,4) |
*HPF$ ALIGN A(:,:) WITH T(:,:) |
*HPF$ DISTRIBUTE T(BLOCK,CYCLIC) ONTO SQUARE |
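To see what DISTRIBUTE T(BLOCK,CYCLIC) ONTO SQUARE means for the 4 x 4 template above, the following Python sketch (a model, not HPF) computes the owning element of SQUARE(2,2) for every template position. The helper names are illustrative only.

```python
from math import ceil

def block(i, n, p):
    """1-based processor coordinate for index i of n under BLOCK."""
    return (i - 1) // ceil(n / p) + 1

def cyclic(i, n, p):
    """1-based processor coordinate for index i under CYCLIC."""
    return (i - 1) % p + 1

# owner[i-1][j-1] is the (row, column) coordinate in SQUARE that holds T(i,j):
# dimension 1 is BLOCK over 2 processors, dimension 2 is CYCLIC over 2 processors.
owner = [[(block(i, 4, 2), cyclic(j, 4, 2)) for j in range(1, 5)]
         for i in range(1, 5)]

for row in owner:
    print(row)
```

Rows 1 and 2 go to the first processor row and rows 3 and 4 to the second, while columns alternate between the two processor columns, so each processor holds four elements.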
CHPF$ PROCESSORS Q(4) |
CHPF$ TEMPLATE FRED(16,16) |
CHPF$ ALIGN A(:,:) WITH FRED(:,:) |
CHPF$ ALIGN B(I,J) WITH FRED(I+2,J+2) |
CHPF$ DISTRIBUTE FRED(BLOCK,*) |
This example illustrates remapping from one to two dimensional decomposition
|
!HPF$ PROCESSORS P(64) |
!HPF$ PROCESSORS Q(8,8) |
!HPF$ DYNAMIC :: A,B |
!HPF$ ALIGN B(:) WITH A(:,*) |
!HPF$ DISTRIBUTE A(*,BLOCK) ONTO P
|
!HPF$ REALIGN B(:) WITH A(*,:)
|
!HPF$ REDISTRIBUTE A(CYCLIC,CYCLIC) ONTO Q
|
!HPF$ PROCESSORS Q(64) |
!HPF$ ALIGN B(I) WITH A(I+N) |
!HPF$ DISTRIBUTE A(BLOCK(M)) |
!HPF$ DISTRIBUTE(BLOCK), DYNAMIC :: P
|
!HPF$ REDISTRIBUTE P(CYCLIC)
|
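A REDISTRIBUTE from BLOCK to CYCLIC, as in the last example above, implies data movement at run time. The Python sketch below (a model, not HPF) counts how many elements of a one-dimensional array change owner. The array length 16 and the 4 processors are hypothetical, and the model assumes the same processor arrangement before and after.

```python
from math import ceil

def block(i, n, p):
    """0-based owner of 1-based element i of n under BLOCK."""
    return (i - 1) // ceil(n / p)

def cyclic(i, n, p):
    """0-based owner of 1-based element i under CYCLIC."""
    return (i - 1) % p

N, NP = 16, 4     # hypothetical array length and processor count
moved = [i for i in range(1, N + 1) if block(i, N, NP) != cyclic(i, N, NP)]
stayed = [i for i in range(1, N + 1) if block(i, N, NP) == cyclic(i, N, NP)]
print(len(moved), "elements move;", stayed, "stay in place")
```

In this small case three quarters of the array has to be communicated, which is why dynamic remapping is best used sparingly.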
The scope of a mapping directive is a single (sub)program unit |
A template is not a first-class Fortran 90 object: it cannot be passed as a subprogram argument |
There are 3 typical cases:
|
(not a comprehensive discussion; just an example) |
PROCESSORS |
TEMPLATE |
DYNAMIC |
INHERIT |
ALIGN |
DISTRIBUTE |
REALIGN |
REDISTRIBUTE |
An operation on two or more data objects is likely to be carried out much faster if they all reside in the same processor
|
it may be possible to carry out many such operations concurrently if they can be performed on different processors
|
Parallel Statements
|
Parallel Constructs
|
Intrinsic functions and the HPF library |
Extrinsic functions |
This is as in CMFortran and MasPar MPFortran, for example:
|
Semantics of WHERE statement:
|
FORALL is a very important extension to Fortran 90; it defines one class of parallel DO loop |
It relaxes the restriction that operands of the right-hand-side expressions must be conformable with the left-hand-side array |
It may be masked with a scalar logical expression (an extension of WHERE) |
A FORALL statement may call user-defined (PURE) functions on the elements of an array, simulating Fortran 90 elemental function invocation (albeit with a different syntax) |
FORALL (i=1:100,k=1:100) a(i,k) = b(i,k)   ! same as A = B |
FORALL (i=2:100:2) a(i) = a(i-1)   ! same as A(2:100:2) = A(1:99:2) |
FORALL (i=1:100) a(i) = i   ! same as A = [1..100] |
FORALL (i=1:100, j=1:100) a(i, j) = i+j |
FORALL (i=1:100) a(i,i) = b(i) |
FORALL (i=1:100, j=1:100) a(i,j) = b(j,i) |
FORALL (i=1:100) a(i, 1:100) = b(1:100, i) |
FORALL (i=1:100, j=1:100, y(i,j).NE.0) x(i,j) = REAL(i+j)/y(i,j) |
FORALL (i=1:100) a(i,ix(i)) = x(i) |
FORALL (i=1:9) x(i) = SUM(x(1:10:i)) |
FORALL (i=1:100) a(i) = myfunction(a(i+1)) |
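What separates FORALL from an ordinary DO loop is that every right-hand side is evaluated before any assignment takes place. The Python sketch below (a model, not HPF) contrasts the two for the statement x(i) = x(i-1); the five-element array is just an illustration.

```python
def do_loop(x):
    """Sequential DO i = 2, n: x(i) = x(i-1); each iteration sees earlier updates."""
    x = list(x)
    for i in range(1, len(x)):
        x[i] = x[i - 1]
    return x

def forall(x):
    """FORALL (i=2:n) x(i) = x(i-1): evaluate all right-hand sides, then assign."""
    rhs = [x[i - 1] for i in range(1, len(x))]   # every RHS uses the old values
    return [x[0]] + rhs                          # assignments happen afterwards

x = [1, 2, 3, 4, 5]
do_result = do_loop(x)
forall_result = forall(x)
print("DO loop:", do_result)
print("FORALL :", forall_result)
```

The DO loop propagates the first element through the whole array, while FORALL shifts the old values by one position, exactly as the array assignment x(2:n) = x(1:n-1) would.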
Similar to Fortran 90 array assignments and WHERE |
Consider example: |
This is an exception from the global name space with replicated variables |
There is a fundamental difference in semantics between IF...ELSE and WHERE...ELSEWHERE constructs |
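One way to picture this difference: IF...ELSE tests a scalar condition and executes exactly one branch, whereas WHERE...ELSEWHERE evaluates an array mask once and then executes both blocks, each on its own subset of elements. The Python sketch below is a model of that behaviour, not HPF; the function and argument names are hypothetical.

```python
def where_elsewhere(a, mask_fn, then_fn, else_fn):
    """Model of: WHERE (mask) a = then(a)  ELSEWHERE a = else(a)  END WHERE."""
    mask = [mask_fn(v) for v in a]      # mask is evaluated once, up front
    out = list(a)
    for k, v in enumerate(a):           # WHERE block: only elements with mask true
        if mask[k]:
            out[k] = then_fn(v)
    for k, v in enumerate(a):           # ELSEWHERE block: only elements with mask false
        if not mask[k]:
            out[k] = else_fn(v)
    return out

a = [4.0, 0.0, -2.0, 0.0]
# Take the reciprocal where a is non-zero, and store -1.0 everywhere else.
result = where_elsewhere(a, lambda v: v != 0.0, lambda v: 1.0 / v, lambda v: -1.0)
print(result)
```

Both blocks contribute to the result in a single pass over the construct, which a scalar IF...ELSE can never do.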
elemental
|
transformational and inquiry functions
|
new array reduction functions
|
array combining scatter functions
|
array prefix and suffix functions
|
array sorting functions
|
bit manipulation functions
|
mapping inquiry subroutines
|
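Two of the categories above, the prefix functions and the combining scatter functions, can be sketched briefly. The Python below is a simplified model of the HPF library routines SUM_PREFIX and SUM_SCATTER for one-dimensional arrays only; the optional DIM, MASK, SEGMENT and EXCLUSIVE arguments are deliberately left out.

```python
from itertools import accumulate

def sum_prefix(array):
    """Inclusive running sum: element k of the result is array(1) + ... + array(k)."""
    return list(accumulate(array))

def sum_scatter(array, base, indx):
    """Add each array(k) into position indx(k) of a copy of base (1-based indices)."""
    result = list(base)
    for value, target in zip(array, indx):
        result[target - 1] += value      # colliding targets are combined by summation
    return result

prefix = sum_prefix([1, 2, 3, 4])
scattered = sum_scatter([10, 20, 30, 40], [0, 0, 0], [1, 3, 1, 2])
print(prefix)
print(scattered)
```

In the scatter call, elements 1 and 3 both target position 1, so their values are summed there rather than one overwriting the other.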
An extrinsic function is a function written in a language other than HPF including
|
HPF defines interface and invocation sequence |
Allows one to get efficient parallel code where the HPF language or compiler is inadequate |
Array assignments, WHERE and FORALL |
Block of array assignments: FORALL, INDEPENDENT FORALL, INDEPENDENT DO, WHERE...ELSEWHERE |
Intrinsic functions and the HPF library |
Extrinsic functions |
Subset HPF tries to make it easier to build initial compilers! |
Fortran 90 Features in Subset HPF
|
HPF features not in Subset HPF
|
mapping directives
|
array assignments
|
intrinsics
|