Given by Geoffrey C. Fox at Delivered Lectures of CPS615 Basic Simulation Track for Computational Science on 26 September 1996. Foils prepared 29 December 1996.
Summary of Material
This quickly completes the discussion of problem architecture, but rather than continuing the qualitative discussion of HPF applications in the notes, we jumped to a discussion of the HPF language, describing:
The basic approach to parallelism with the "owner-computes" rule
The types of new constructs, with TEMPLATE, ALIGN and PROCESSORS described
The lecture started with a description of the Web-based Programming Laboratory developed by Kivanc Dincer
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100
The architecture of the "virtual problem" determines the nature of the language
Key idea behind data-parallel languages, and perhaps all good languages:
The language expresses the problem and not the machine architecture
A different approach is needed to express functional parallelism
Use "object-oriented" features when the problem has natural objects
Do not use "object-oriented" features when the objects are artifacts of the target machine
Both data-parallel and functional (object-parallel) paradigms are needed in many problems, especially multidisciplinary ones
See NPAC's High Performance Fortran Applications Resource
Parallelism in HPF is expressed explicitly
The compiler may choose not to exploit information about parallelism
The compiler may detect parallelism in sequential code
A = B, or more interestingly:
      WHERE( B > 0. )
         A = B
      ELSEWHERE
         A = 0.
      END WHERE
can be written
      DO I = n1, n2
         DO J = m1, m2
            IF( B(I,J) > 0. ) THEN
               A(I,J) = B(I,J)
            ELSE
               A(I,J) = 0.
            END IF
         END DO
      END DO
Now a good HPF compiler will recognize that these DO loops can be parallelized and give the same answer for the Fortran 90 and Fortran 77 forms, but often the detection of parallelism is not so clear
Note FORALL is guaranteed to be parallelizable, as by definition it has no side effects.
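The same computation can also be written with FORALL; a minimal sketch, assuming A and B are conformable two-dimensional REAL arrays (MERGE is the standard intrinsic that picks B(I,J) where the mask is true and 0. elsewhere):
      FORALL ( I = n1:n2, J = m1:m2 )
         A(I,J) = MERGE( B(I,J), 0., B(I,J) > 0. )
      END FORALL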
All of Fortran90
New constructs FORALL and INDEPENDENT enhancing DO loops
Data Alignment and Distribution Assertions
Miscellaneous Support Operations, but
NO parallel Input/Output
Little support for irregular computations
Little support for any form of non-mainstream data-parallelism
Extrinsics as the supported link to explicit message passing (see the sketch below)
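As an illustration of the last point, a minimal sketch of an extrinsic interface (the subroutine name and argument are hypothetical; EXTRINSIC(HPF_LOCAL) is the extrinsic kind defined by the HPF standard):
      INTERFACE
         EXTRINSIC(HPF_LOCAL) SUBROUTINE LOCAL_SOLVE( A )
            REAL, INTENT(INOUT) :: A(:)
         END SUBROUTINE LOCAL_SOLVE
      END INTERFACE
Inside LOCAL_SOLVE each processor sees only its local section of A and may call an explicit message-passing library directly.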
There is a tradeoff between parallelism and communication
The programmer defines the data mapping and the compiler uses this to assign processing
The underlying assumptions are that:
An operation on two or more data objects is likely to be carried out much faster if they all reside in the same processor,
And that it may be possible to carry out many such operations concurrently if they can be performed on different processors
This is embodied in the "owner computes" rule: the processor that owns the left-hand side of an assignment performs the computation
The owner-computes algorithm is usually good and often best
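A minimal sketch of owner computes in action (the array names and sizes are illustrative, not from the lecture):
      REAL A(16), B(16)
!HPF$ PROCESSORS P(4)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P
!HPF$ ALIGN B(:) WITH A(:)
      A = B + 1.0
A(1:4) and B(1:4) reside on P(1), so P(1) computes those four sums; the other processors handle their own blocks concurrently, and no communication is needed because B is aligned with A.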
The directives are structured comments that suggest implementation strategies or assert facts about a program to the compiler
They may affect the efficiency of the computation performed, but do not change the value computed by the program
As with Fortran 90 statements, there are both declarative directives (such as TEMPLATE, ALIGN and DISTRIBUTE) and executable directives (such as REDISTRIBUTE and REALIGN)
It must generate Fortran77(90) + Message Passing code, or possibly map HPF code onto parallel machine code in one pass
Traditional dataflow and dependency analysis is especially critical in the Fortran77 parts of the code
It must use the data mapping assertions to decide what is stored where and organize the computation accordingly
The code must be transformed to respect this owner-computes model
It must typically use the "Loosely Synchronous" model with alternating communicate and compute phases, with the compiler generating all the needed communication
We need an excellent run-time library which the compiler invokes for parallel intrinsics etc.
HPF directives are consistent with Fortran 90 syntax except for the special prefix for a directive: !HPF$ (CHPF$ and *HPF$ may also be used in fixed source form)
Two forms of the directives are allowed: the statement form and the attributed form using ::
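As a minimal sketch of the two forms (the array name A is illustrative):
!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ DISTRIBUTE (BLOCK) :: A
Both lines specify the same BLOCK distribution; the attributed form lets one directive apply to a list of arrays.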
Data Mapping in HPF is all you need to do to get parallelism, as long as you use the explicit array syntax such as A = B + C
The Owner Computes rule implies that specifying the location of variables specifies (optimally or not) the parallel execution!
The new HPF-2 ON HOME directive is an exception to this rule, as it specifies where a particular statement is to be executed
(RE)DISTRIBUTE tells you where data is to be placed
(RE)ALIGN tells you how different data structures are to be placed relative to each other
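A minimal sketch of the executable form (the array name and the remapping are illustrative; note an array must be declared DYNAMIC before it can be remapped):
      REAL A(100)
!HPF$ DYNAMIC A
!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ REDISTRIBUTE A(CYCLIC)
REDISTRIBUTE takes effect at the point it is reached in execution, remapping A from BLOCK to CYCLIC at run time.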
A template is an abstract space of indexed positions (an "array of nothings")
In CMFortran terminology, a Template is a set of Virtual Processors, one per data point
A template is declared by the TEMPLATE directive, which specifies its name, its rank (number of dimensions) and its extent in each dimension
Examples:
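(Illustrative declarations; the names and sizes are assumptions, not from the original foil:)
!HPF$ TEMPLATE T(1000)
!HPF$ TEMPLATE FRED(100, 100)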
Abstract processors always form a rectilinear grid in 1 or more dimensions
They are abstract coarse-grain collections of data-points
The processor arrangement is defined by the PROCESSORS directive, which specifies its name, its rank (number of dimensions) and its extent in each dimension
Examples:
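(Illustrative arrangements; the names and sizes are assumptions:)
!HPF$ PROCESSORS P(32)
!HPF$ PROCESSORS Q(4, 8)
A fuller example combining PROCESSORS, TEMPLATE, ALIGN and DISTRIBUTE: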
!HPF$ PROCESSORS P(4)
!HPF$ TEMPLATE X(40)
!HPF$ ALIGN WITH X :: A, B, C
!HPF$ DISTRIBUTE X(BLOCK)
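Here A, B and C (assumed declared as 40-element arrays) are aligned elementwise with template X, and X is divided into four contiguous blocks of 10; elements 1-10 of each array therefore live on P(1), elements 11-20 on P(2), and so on.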
Syntax of ALIGN:
!HPF$ ALIGN alignee WITH align-target
Alternatively, in the attributed form:
*HPF$ ALIGN (align-source-list) WITH align-target :: alignee
Note a colon (:) in a directive denotes all values of an array index
Examples of array indices:
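(Illustrative alignments; the array and template names are assumptions:)
!HPF$ ALIGN A(I) WITH T(I)
!HPF$ ALIGN B(I, J) WITH T2(J, I)
!HPF$ ALIGN C(I) WITH T(2*I - 1)
Here the dummy index I names corresponding positions: B is stored transposed against T2, and C is spread over the odd positions of T.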
Use of : examples:
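(Illustrative; the names are assumptions:)
!HPF$ ALIGN D(:) WITH T(:)
!HPF$ ALIGN E(:, *) WITH T(:)
The first aligns D(I) with T(I) for every I; in the second the * collapses E's second dimension, so every E(I, J) follows T(I).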