HPFtutorial INTRODUCTION TO HIGH PERFORMANCE FORTRAN -- June 1995 Tom Haupt NPAC 111 College Place Syracuse University Syracuse NY 13244 Outline of HPF Presentation what is HPF, what we need it for, where it came from why it is called "High Performance"? what are HPF compiler directives data mapping in HPF parallel statements and constructs in HPF subset HPF Fortran 90D HPF is an extension of Fortran 90 To express data parallelism The new HPF language features fall into four categories wrt Fortran 90 compiler directives new language features library routines language restrictions A bit of HPF history ... Goals and Scope of HPF Data parallel programming single threaded global name space loosely synchronous parallel computation Top performance on MIMD and SIMD computers with non-uniform memory access costs preserve efficiency for different machines with comparable number of processors Code tuning for various architectures Parallelism in HPF Parallelism in HPF is expressed explicitly Fortran 90 array expressions and assignments (including WHERE) Array intrinsics FORALL statement and construct INDEPENDENT assertion on DO loops Compiler may choose not to exploit information about parallelism Compiler may detect parallelism in sequential code What gives high performance in HPF There is tradeoff between parallelism and communication Programmer defines the data mapping Underlaying assumptions are that: An operation on two or more data object is likely to be carried out much faster if they all reside in the same processor, and that it may be possible to carry out many such operations concurrently if they can be performed on different processors Compiler directives used in HPF The directives are structured comments that suggest implementation strategies or assert facts about a program to the compiler They may affect the efficiency of the computation performed, but do not change the value computed by the program As in Fortran 90 statements, there are both: declarative directives executable directives Syntax of HPF Directives HPF directives are consistent with Fortran 90 syntax except for the directive prefix: !HPF$ CHPF$ *HPF$ Two forms of the directives are allowed Specification statements, such as !HPF DISTRIBUTE MYTEMPLATE(BLOCK) ONTO P Equivalent Attributed form, such as, !HPF DISTRIBUTE (BLOCK) ONTO P :: MYTEMPLATE Staged Data Mapping in HPF Template in HPF A template is an abstract space of indexed positions (an "array of nothings") In CMFortran terminology, Template is set of Virtual Processors -- one per data point A template is declared by the TEMPLATE directive that specifies: name of the template the rank (i.e., number of dimensions) the extent in each dimension Examples: CHPF$ TEMPLATE T(1000) !HPF$ TEMPLATE FRED(N, 2*N) *HPF$ TEMPLATE, DIMENSION(5,100,50) :: MINE, YOURS Abstract Processors in HPF Abstract processors always form a rectilinear grid in 1 or more dimensions They are abstract coarse grain collections of data-points The processor arrangement is defined by the PROCESSORS directive that specifies: name of the processor arrangement the rank (i.e., number of dimensions) the extend in each dimension Examples: !HPF$ PROCESSORS P(N) *HPF$ PROCESSORS BIZARRO(1972:1997,-20:17) CHPF$ SCALARPROC Example of Template and Processors REAL, ARRAY(40) :: A, B, C !HPF$ PROCESSORS P(4) !HPF$ TEMPLATE X(40) !HPF$ ALIGN WITH X :: A, B, C !HPF$ DISTRIBUTE X(BLOCK) ... C = A + B ... Align Directive in HPF Syntax of Align: !HPF ALIGN alignee WITH align-target where -- note [..] implies optional component alignee: alignee [(align-source-list)] align target: align-target[(align-subscript-list)] Alternatively *HPF ALIGN (align-source-list) WITH align-target :: alignee Examples of Align Directive Note : denotes all values of array index Examples of array indices: CHPF$ ALIGN A(i) WITH B(i) *HPF$ ALIGN (i,j) WITH TEMPL(i,j) :: A, B Use of : examples: !HPF$ ALIGN A(:) WITH B(:) CHPF$ align (:,:) WITH TEMPL(:,:) :: A, B Changing Rank in Align Directive Ranks of the alignee and the align-target may be different Examples: !HPF$ ALIGN A(:,j) WITH B(:) CHPF$ ALIGN A(:,*) WITH B(:) Replication in Align Directive ... or other way round !HPF$ ALIGN A(:) WITH TEMPL(:,*) while ... !HPF$ ALIGN A(:) WITH TEMPL(:,i) General Alignments in HPF HPF allows for more general alignments such as: REAL, ARRAY(5,8) :: A,B !HPF$ TEMPLATE T(12,12) !HPF$ ALIGN A(:,J) WITH T(:,J+1) !HPF$ ALIGN B(I,J) WITH T(I+4,J+4) But nobody is clear if they are useful! Formal Definition of Align Directive Each align-dummy variable is considered to range over all valid index values for the corresponding dimension of the alignee. An align-subscript is evaluated for any specific combination of values for the align-dummy variables simply by evaluating each align-subscript as a expression. Their resulting subscript values must be legitimate subscripts for the align-target More obscure Complicated Examples of Align Directive These examples have non-unit stride as perhaps in "red-black" Iterative Solver algorithms: Distribution Directive in HPF Syntax: !HPF$ DISTRIBUTE distributee (dist-format) [ONTO dist-target] Allowed forms of dist-format: * BLOCK CYCLIC BLOCK(int-expr) CYCLIC(int-expr) Examples: CHPF$ DISTRIBUTE TEMP(BLOCK,CYCLIC) !HPF$ DISTRIBUTE FRED(BLOCK(10)) ONTO P *HPF$ DISTRIBUTE (BLOCK,*) :: MYTEMPLATE Basic Examples of Distribute Directive !HPF$ PROCESSORS P(4) REAL, ARRAY(16) :: A !HPF$ TEMPLATE T(16) !HPF$ ALIGN A(:) WITH T(:) Two Dimensional Example of Distribute Directive ARRAY, REAL(4,4) :: A *HPF PROCESSORS SQUARE(2,2) *HPF TEMPLATE T(4,4) *HPF ALIGN A(:,:) WITH T(:,:) *HPF DISTRIBUTE T(BLOCK,CYCLIC)ONTO SQUARE Example of Distribute Directive with Complex Alignment REAL, ARRAY(12,16) :: A REAL, ARRAY(8,10) :: B CHPF$ PROCESSORS Q(4) CHPF$ TEMPLATE FRED(16,16) CHPF$ ALIGN A(:,:) WITH FRED(:,:) CHPF$ ALIGN B(I,J) WITH FRED(I+2,J+2) CHPF$ DISTRIBUTE FRED(BLOCK,*) Advanced Mapping Directives -- ReDistribution and ReAlign This example illustrates remapping from one to two dimensional decomposition REAL, ARRAY(64,64) :: A REAL, ARRAY(64) :: B !HPF$ PROCESSORS P(64) !HPF$ PROCESSORS Q(8,8) !HPF$ DYNAMIC :: A,B !HPF$ ALIGN B(:) WITH A(:,*) !HPF$ DISTRIBUTE A(*,BLOCK)ONTO P ... !HPF$ REALIGN B(:) WITH A(*,:) ... !HPF$ REDISTRIBUTE A(CYCLIC,CYCLIC) ONTO Q ... Advanced Mapping Directives -- Allocatable arrays and pointers SUBROUTINE SUB(N,M) REAL, ALLOCATABLE, DIMENSION(:) :: A,B REAL, POINTER :: P(:) !HPF$ PROCESSORS Q(64) !HPF$ ALIGN B(I) WITH A(I+N) !HPF$ DISTRIBUTE A(BLOCK(M)) !HPF$ DISTRIBUTE(BLOCK), DYNAMIC :: P ... ALLOCATE(A(128)) ALLOCATE(B(64)) ALLOCATE(P(1024)) ... !HPF$ REDISTRIBUTE P(CYCLIC) ... RETURN END Subprograms in HPF Scope of a mapping directives is a single (sub)program unit A template is not a first-class Fortran 90 object: it cannot be passed as a subprogram argument Passing Distributed Arrays as Subprogram Arguments in HPF There are 3 typical cases: we want to generate a known mapping modified by the callee assertions + inheritance we are going to accept any mapping of the actual argument inheritance we want to force a particular mapping explicit interface Inherit Distribution Directive in HPF (not a comprehensive discussion; just an example) Summary of Mapping Directives in HPF PROCESSORS TEMPLATE DYNAMICS INHERIT ALIGN DISTRIBUTE REALIGN REDISTRIBUTE Fundamental Parallelism Assumption in HPF An operation on two or more data object is likely to be carried out much faster if they all reside in the same processor i.e. minimize communication it may be possible to carry out many such operations concurrently if they can be performed on different processors data parallelism Parallel statements and Constructs in HPF Parallel Statements Fortran 90 array assignments masked array assignments (WHERE) FORALL statement Parallel Constructs FORALL construct INDEPENDENT DO WHERE and WHERE...ELSEWHERE construct Intrinsic functions and the HPF library Extrinsic functions Parallelism in Fortran 90 array assignments This is as in CMFortran and Maspar MPFortran with example: WHERE (masked array assignment) in HPF This is as in CMFortran and Maspar MPFortran with example: WHERE (A .GT. 0) A = A - 100 Semantics of WHERE statement: 1. evaluate mask (in parallel) and store as a temporary T1 2. for each i that T1(i)=.TRUE. compute T2(i)=A(i) - 100 3. for each i that T1(i)=.TRUE. assign A(i)=T2(1) FORALL Statement in HPF A very important extension to Fortran 90 and defines one class of parallel DO loop It relaxes the restriction that operands of the rhs expressions must be conformable with the lhs array It may be masked with a scalar logical expression (extension to WHERE) A FORALL statement may call user-defined (PURE) functions on the elements of an array, simulating Fortran 90 elemental function invocation (albeit with a different syntax) Examples of FORALL statements in HPF FORALL (i=1:100,k=1:100) a(i,k) = b(i,k) A = B FORALL (i=2:100:2) a(i) = a(i-1) A(2:100:2) = A(1:99:2) FORALL (i=1:100) a(i) = i A = [1..100] FORALL (i=1:100, j=1:100) a(i, j) = i+j FORALL (i=1,100) a(i,i) = b(i) FORALL (i=1,100,j=1:100) a(i,j) = b(j,i) FORALL (i=1,100) a(i, 1:100) = b(1:100, i) FORALL (i=1:100, j=1:100, y(i,j).NE.0) x(i,j) = REAL(i+j)/y(i,j) FORALL (i=1,100) a(i,ix(i)) = x(i) FORALL (i=1,9) x(i) = SUM(x(1:10:i)) FORALL (i= 1,100) a(i) = myfunction(a(i+1)) Semantics of the FORALL statement in HPF Similar to Fortran 90 array assignments and WHERE Consider example: HPF FORALL construct Pictorially !HPF$ INDEPENDENT FORALL Pictorially !HPF$ INDEPENDENT DO Pictorially !HPF$ INDEPENDENT, NEW Variable This is an exception from the global name space with replicated variables WHERE...ELSEWHERE / IF...ELSE constructs in HPF There is a fundamental difference in semantics between IF...ELSE and WHERE...ELSEWHERE constructs Intrinsic functions in HPF elemental examples: A = SIN(X) FORALL (i=1:100:2) A(i) = EXP(A(i)) transformational and inquiry functions Fortran 90 SUM, PRODUCT, ANY, DOTPROD, EOSHIFT, MAXVAL, ... HPF system inquiry functions: NUMBER_OF_PROCESSORS, PROCESSORS_SHAPE extensions of MAXLOC and MINLOC, ILEN HPF library HPF library functions new array reduction functions IALL, IANY, IPARITY and PARITY (IAND, IOR, IEOR, and .NEQV.) array combining scatter functions XXX_SCATTER array prefix and suffix functions XXX_PREFIX, XXX_SUFFIX array sorting functions GRADE_DOWN, GRADE_UP bit manipulation functions LEADZ, POPCNT, POPPAR mapping inquiry subroutines HPF_ALIGNMENT, HPF_TEMPLATE, HPF_DISTRIBUTION HPF Intrinsic EXAMPLE: SUM Extrinsics in HPF An extrinsic function is a function written in a language other than HPF including a single-thead-of-control language a multi-thread-of-control language any programming language targeted to a single processor SPMD) with message passing such as MPI Fortran 77, Fortran 90, C, C++, Ada, HPF HPF defines interface and invocation sequence Allows one to get efficient parallel code where HPF language or compiler inadequate Summary: how to express parallelism in HPF Array assignments, WHERE and FORALL Block of array assignments: FORALL, INDEPENDENT FORALL, INDEPENDENT DO, WHERE...ELSEWHERE Intrinsic and the HPF library functions Extrinsic functions Definition of Official High Performance Fortran Subset Try to make it easier to build initial compilers! Fortran 90 Features in Subset HPF All FORTRAN 77 except for storage and sequence association Arithmetic and logical array features array sections, array constructors, arithmetic and logical operations on whole arrays and array sections, array assignment, masked array assignments, array-valued external functions, automatic arrays, ALLOCATABLE arrays, POINTERS, TARGET, assumed shape arrays intrinsic functions INTERFACE blocks HPF features not in Subset HPF DYNAMIC, REALIGN, REDISTRIBUTE and INHERIT PURE and EXTRINSIC function attribute FORALL construct HPF library The Original Syracuse Project --FORTRAN 90D mapping directives PROCESSORS TEMPLATE (mandatory) ALIGN DISTRIBUTE(block) and DISTRIBUTE(block,block) array assignments Fortran 90 array assignments WHERE statement FORALL statement intrinsics Fortran 90 elemental intrinsics Fortran 90 reductions (SUM, PRODUCT, ANY, ALL, COUNT, MAXVAL, MAXLOC, MINVAL, MINLOC) EOSHIFT, CSHIFT, DOTPROD, TRANSPOSE