Full HTML for

Scripted foilset HPCC Software Technologies Fall 96 -- Overview and HPF

Given by Geoffrey C. Fox at Delivered Lectures of CPS615 Basic Simulation Track for Computational Science on 26 September 96. Foils prepared 26 September 1996
Outside Index Summary of Material


We go through the 2D Laplace's Equation with both HPF and MPI for Simple Jacobi Iteration
HPF and Fortran90 are reviewed followed by MPI
We also discuss the structure of problems as these determine why and when certain software approaches are appropriate

Table of Contents for full HTML of HPCC Software Technologies Fall 96 -- Overview and HPF

Denote Foils where Image Critical
Denote Foils where Image has important information
Denote Foils where HTML is sufficient

1 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
HPCC Software Technologies
HPF and MPI

2 Abstract of CPS615 HPCC Software Technologies
3 Parallel Computing
Algorithms and
Software --
Laplace Example

4 The Solution of Laplace's Equation
5 Discretized Form of Laplace'e Equation on a Parallel Processor
6 Basic Structure of Domain to be Updated in Parallel Version
7 Sequential and Introduction to Parallel Coding for the
Laplace Example

8 SEQUENTIAL LAPLACE PROGRAMMING
JACOBI ITERATION IN ONE DIMENSION
(constant in y direction)

9 SEQUENTIAL LAPLACE PROGRAMMING
JACOBI ITERATION IN TWO DIMENSIONS

10 Approaches to Parallel Programming
11 SPMD or SCMD
Single Program (code) Multiple Data

12 Data Parallel
Programming for
Laplace Example

13 Parallel Laplace Programming
Data Parallel for Jacobi Iteration in One Dimension

14 Notes on HPF Implementation of Lapace Solver
15 HPF is an extension of Fortran 90
16 Why is Fortran90 Easier than Fortran77
17 Important Features of Fortran90
18 Introduction to Fortran90 Arrays - I
19 Introduction to Fortran90 Arrays - II
20 Fortran90 Arrays and Memory Allocation
21 More on Fortran90 Arrays and Subroutines
22 Typical Use of Array and Intrinsic Operations
23 Derived Type in Fortran90
24 Examples of POINTER's in Fortran90
25 MODULEs in Fortran90
26 MODULEs INTERFACES and Overloaded Operators in Fortran90
27 Outline of HPF Discussion
28 Information on HPF and HPF Forum (HPFF)
29 Possible Programming Models
30 Data Parallel Programming Model
31 Problem Architectures
32 5 Categories of Problems
33 Example of Basic Problem Architectures Regular Synchronous Problem Class
34 Example of Basic Problem Architectures Irregular Loosely Synchronous
35 Example of Basic Problem Architectures The difficult Asynchronous Class
36 Embarassingly Parallel Problem Class
37 Example of Basic Problem Architectures for MetaProblem Class
38 Software Bus Structure of USMADE
39 Computing as a Mapping Problem
40 Complex Systems to give a Theory of Computing
41 Parallel Computing is "just" an optimization problem, even if we can't agree on what to optimize
42 Complex System Representation of Levels in Computer Software
43 The map of Problem ---> Computer is performed in two or more statges
44 The Mapping of Space of Problem Architectures onto Space of Machine Architectures
45 What determines when Parallelism is Clear ?
46 Evaluation of High Performance Fortran What applications need what features of HPF and its extensions ?
47 What Issues should High Performance Fortran (HPF) Address!
48 Goal of High Performance Fortran
49 Any Complete Programming Environment Must Handle
50 HIGH PERFORMANCE FORTRAN COMPILERS
51 What type of compiler is HPF ?
52 The High Performance Fortran Library
53 HPF Intrinsic Library
54 High Performance Fortran Library -- I
55 High Performance Fortran Library -- II
56 Imprecise Mapping of Problem Classes into Runtime and Language Terms
57 General Applicability of HPF, HPF++, HPC++
58 Importance of HPF, HPC++ to Users
59 What about other languages ?
60 What applications does HPF support? If not - what extensions are needed?
61 5 Categories of Problems
62 Comparison of 3 different Programming Models
63 Parallelism in HPF
64 Fortran77 is part of Fortran90
65 HPF Features
66 What gives high performance in HPF
67 Compiler directives used in HPF
68 What does an HPF Compiler do?
69 Syntax of HPF Directives
70 Data Mapping in HPF
71 Staged Data Mapping in HPF
72 Template in HPF
73 Abstract Processors in HPF
74 Example of Template and Processors
75 Align Directive in HPF
76 Examples of Align Directive
77 Changing Rank in Align Directive
78 Replication in Align Directive
79 General Alignments in HPF
80 Formal Definition of Align Directive
81 More obscure Complicated Examples of Align Directive
82 Distribution Directive in HPF
83 Basic Examples of Distribute Directive
84 Two Dimensional Example of Distribute Directive
85 The Two Basic Distributions in HPF
86 The Example of Matrix Inversion
87 Example of Graphics Rendering
88 Example of Distribute Directive with Complex Alignment
89 Dynamic Data Mapping
90 Advanced Mapping Directives -- ReDistribution and ReAlign
91 Advanced Mapping Directives -- Allocatable arrays and pointers
92 Subprograms in HPF
93 Passing Distributed Arrays as Subprogram Arguments in HPF
94 Mapping Options for Dummy (Subroutine) Arguments
95 Inherit Distribution Directive in HPF
96 Summary of Mapping Directives in HPF
97 Fundamental Parallelism Assumption in HPF
98 Parallel statements and Constructs in HPF
99 Parallelism in Fortran 90 array assignments
100 WHERE (masked array assignment) in HPF
101 WHERE...ELSEWHERE / IF...ELSE constructs in HPF
102 Intrinsic functions in HPF
103 HPF library functions
104 SUM, SUM_PREFIX and SUM_SCATTER defined
105 HPF Intrinsic EXAMPLE: SUM
106 FORALL Statement in HPF
107 Examples of FORALL statements in HPF
108 Semantics of the FORALL statement in HPF
109 Vector Indices in FORALL's
110 Multiple Statement FORALL's
111 HPF FORALL construct Pictorially
112 PURE Functions in HPF
113 Example of PURE Function from Chuck Koelbel
114 The INDEPENDENT Assertion in HPF
115 !HPF$ INDEPENDENT FORALL Pictorially
116 !HPF$ INDEPENDENT DO Pictorially
117 !HPF$ INDEPENDENT, NEW Variable
118 Extrinsics in HPF
119 High Performance Fortran HPF2 Changes
120 ON HOME for Computation Placement
121 Reductions in INDEPENDENT DO Loops
122 Spawning Tasks in HPF
123 New Data Mapping Features in HPF 2.0 - I
124 New Data Mapping Features in HPF 2.0 - II

Outside Index Summary of Material



HTML version of Scripted Foils prepared 26 September 1996

Foil 1 CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
HPCC Software Technologies
HPF and MPI

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Scripted Foils prepared 26 September 1996

Foil 2 Abstract of CPS615 HPCC Software Technologies

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
We go through the 2D Laplace's Equation with both HPF and MPI for Simple Jacobi Iteration
HPF and Fortran90 are reviewed followed by MPI
We also discuss the structure of problems as these determine why and when certain software approaches are appropriate

HTML version of Scripted Foils prepared 26 September 1996

Foil 3 Parallel Computing
Algorithms and
Software --
Laplace Example

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 4 The Solution of Laplace's Equation

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Solve Laplace's Equation:
on a rectangular domain, with specified boundary conditions.
Use simple iterative Gauss-Seidel algorithm:
f new = ( f left + f right + f up + f down ) / 4

HTML version of Scripted Foils prepared 26 September 1996

Foil 5 Discretized Form of Laplace'e Equation on a Parallel Processor

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
256 Grid Points
16-Node Concurrent Processor
16 Grid Points in each Processor
f is unknown
f is known
  • Update Stencil

HTML version of Scripted Foils prepared 26 September 1996

Foil 6 Basic Structure of Domain to be Updated in Parallel Version

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
X denotes f values to be communicated

HTML version of Scripted Foils prepared 26 September 1996

Foil 7 Sequential and Introduction to Parallel Coding for the
Laplace Example

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 8 SEQUENTIAL LAPLACE PROGRAMMING
JACOBI ITERATION IN ONE DIMENSION
(constant in y direction)

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
BEGIN TEST=0
    • DO 1 I=2, NTOT1
    • TEMP = 0.5 * (PHIOLD(I-1) + PHIOLD(I+1))
    • TEST = AMAX1 (TEST, ABS (TEMP-PHIOLD(I)))
1 PHINEW (I) = TEMP
    • DO 2 I=2, NTOT1
2 PHIOLD (I) = PHINEW(I)
    • IF (TEST > CONVG) GO TO BEGIN
    • ELSE STOP

HTML version of Scripted Foils prepared 26 September 1996

Foil 9 SEQUENTIAL LAPLACE PROGRAMMING
JACOBI ITERATION IN TWO DIMENSIONS

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
BEGIN TEST = 0
    • DO 1 I=2, NTOT1
    • DO 1 J=2, NTOT1
    • TEMP=0.25 * (PHIOLD(I, J + 1) + PHIOLD (I, J-1) + PHIOLD (I+1,J) +
    • PHIOLD (I-1,J))
    • TEST = AMAX1(TEST,ABS(TEMP-PHIOLD(I,J)))
  • 1 PHINEW (I,J) = TEMP
    • DO 2 I=2, NTOT1
    • DO 2 J=2, NTOT1
  • 2 PHIOLD (I,J) = PHINEW (I,J)
    • IF (TEST>CONVG) GO TO BEGIN
    • ELSE STOP

HTML version of Scripted Foils prepared 26 September 1996

Foil 10 Approaches to Parallel Programming

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Data Parallel typified by CMFortran and its generalization - High Performance Fortran which we have discussed separately
Typical Data Parallel Fortran Statements are full array statements
  • B=A1 + A2
  • B=EOSHIFT(A,-1)
  • Function operations on arrays representing full data domain
Message Passing typified by later discussion of Laplace Example which specifies specific machine actions i.e. send a message between nodes whereas data parallel model is at higher level as it (tries) to specify a problem feature
Note: We are always using "data parallelism" at problem level whether software is "message passing" or "data parallel"
Data parallel software is translated by a compiler into "machine language" which is typically message passing

HTML version of Scripted Foils prepared 26 September 1996

Foil 11 SPMD or SCMD
Single Program (code) Multiple Data

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
A particular way of using MIMD machines - DOMINANT successful use so far.
Each processor runs same program but in general executes different instructions at different times.
Will later see corresponds to "loosely synchronous problems".
Style of current program example -- note although each doing roughly same thing -- i.e. updating grid points -- each node is NOT at same point in update at each clock cycle

HTML version of Scripted Foils prepared 26 September 1996

Foil 12 Data Parallel
Programming for
Laplace Example

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 13 Parallel Laplace Programming
Data Parallel for Jacobi Iteration in One Dimension

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF$ TEMPLATE WORLD(NTOT) (1)
!HPF$ DISTRIBUTE WORLD(BLOCK) (2)
    • DIMENSION PHINEW (NTOT), PHIOLD(NTOT)
!HPF$ ALIGN PHINEW WITH WORLD (3)
!HPF$ ALIGN PHIOLD WITH WORLD (3)
    • NTOT1 = NTOT-1
    • NTOT2 = NTOT-2
BEGIN PHINEW (2:NTOT1) =0.5* (EOSHIFT (PHIOLD,1) + EOSHIFT (PHIOLD, -1)) (4)
    • TEST = MAXVAL (ABS(PHINEW(2:NTOT1)-PHIOLD(2:NTOT1))) (5)
    • PHIOLD=PHINEW
    • IF (TEST>CONVG) GO TO BEGIN
    • ELSE STOP

HTML version of Scripted Foils prepared 26 September 1996

Foil 14 Notes on HPF Implementation of Lapace Solver

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
(1) DefInes a data world of NTOT entries
(2) Breaks up world into equal parts in each processor decomposed in one dimension
(3) Instructs that PHINEW AND PHIOLD are aligned exactly with world i.e. PHINEW(I) and PHIOLD(I) are stored in the I'th position of world
(4) EOSHIFT is "End-Off" SHIFT. The calls shift PHIOLD by one both to left and right. The indices take care of WORLD (1 and NTOT) being boundary values
  • EOSHIFT (and corresponding circular shift (CSHIFT) are "standard" Fortran90 array manipulation routines. They are not specific to parallel computing. We can write this statement (4) as a direct array operation.
BEGIN PHINEW(2:NTOT1) = 0.5 * (PHIOLD (1:NTOT2) + PHIOLD (3:NTOT))
(5) Forms TEST as maximum change in absolute value of j at any world location. MAXVAL is a Fortran90 intrinsic function.
NOTE: Any subscript n:m is an "array section" or set of indices

HTML version of Scripted Foils prepared 26 September 1996

Foil 15 HPF is an extension of Fortran 90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
To express data parallelism and so hide machine dependent features of parallel programming from the user
We use Fortran90 as base language both because it is
  • more advanced than Fortran 77 with object oriented features and
  • The array syntax allows an elegant explictly parallel expression of some operations
Use of Fortran90 is a Problem because
  • It is a complex language which it is difficult to build compilers for
  • Not many people use Fortran90 and maybe they will all switch to Java before Fortran90 gets in common practice
  • So perhaps it is "too little too late" and we should focus on supporting the past (Fortran77) and the "correct" future ( (HP)Java) and not an irrelevant middle solution .....

HTML version of Scripted Foils prepared 26 September 1996

Foil 16 Why is Fortran90 Easier than Fortran77

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
1 A(I)=B(I) is obviously parallel
Fortran90 Array Notation
  • A=B expresses this parallelism naturally in way compiler can easily detect in a deterministic way
  • Do 1 I=1,N
  • J=I
  • K=I
1 A(J)=B(K) is not so obviously parallel
Needs a difficult to define (in general case) algorithm (especially if IF statements, i.e. conditionals, defining I,J) to decide on existence and implementation of parallelism
Use of Fortran77 has "thrown away" natural parallelism at language level even though "run-time" restores as creates explicit values for variables such as J and K which are only known by analysis at compile time.

HTML version of Scripted Foils prepared 26 September 1996

Foil 17 Important Features of Fortran90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Arrays are very well supported with memory allocation and set of intrinsics, better passing to procedures etc.
  • This is key capability for HPF
Derived Types allow general object structure (without inheritance) in F90
  • Pointers
  • This area NOT well supported by HPF Compilers as of Summer 1996
Modules replace COMMON INCLUDE etc.
Procedures (functions,subroutines) allow better interfaces, recursion, optional parameters etc.
Better Syntax with free form, more loop control etc.

HTML version of Scripted Foils prepared 26 September 1996

Foil 18 Introduction to Fortran90 Arrays - I

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Arrays are "true" objects in Fortran90 and are stored as values of elements plus a data descriptor!
There are operations on full arrays which have natural parallel implementations seen in HPF
New set of array intrinsic (built-in) functions for elements, reductions (process array into a single value), tranformational (reshaping)
  • Note these functions are NOT sufficient for real problems and must have FORALL to define new parallel Array functions
  • FORALL is in HPF but not F90 -- Expected in next round of standard (Fortran95)
  • FORALL( I=0:nx)
    • A(i) = (A(I+1)+A(I-1))/(I+1)
  • END FORALL

HTML version of Scripted Foils prepared 26 September 1996

Foil 19 Introduction to Fortran90 Arrays - II

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Extract sections (subarrays) of arrays as u(lb:ub:step)
  • lb is Lower Bound
  • ub is Upper Bound
  • step (defaults to 1) is step
masked (conditional) Array operations using WHERE .... ELSEWHERE
Can still do Fortran77 array element operations (DO Loops) but of course this might not be interpretable for efficient parallelism by HPF compiler
Note Fortran90 designed for science and engineering with originally special concern for vector supercomputers but Cray supports F77 better than F90(!)

HTML version of Scripted Foils prepared 26 September 1996

Foil 20 Fortran90 Arrays and Memory Allocation

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
ALLOCATABLE Arrays can be defined at runtime with variable sizing
  • REAL, ALLOCATABLE :: u(:,:) , f(:,:)
  • ALLOCATE ( u(0:nx,0:ny) , f(1:27,0:ny) )
One can define POINTER and TARGET attributes which can be used like REAL, DIMENSION etc.
  • => operator allows one to set a POINTER to "point to" a TARGET
Arguments of a subroutine need NOT define array dimensions in subroutine as these as passed by calling program in data descriptor
Local arrays are created on stack and bounds maybe non constant and evaluated at procedure entry

HTML version of Scripted Foils prepared 26 September 1996

Foil 21 More on Fortran90 Arrays and Subroutines

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
One passes "assumed-shape" arrays from calling to callee routines using INTERFACE syntax
INTERFACE
  • SUBROUTINE residual (r,u,f)
    • REAL r(:,:) , u(:,:) , F(:,:)
  • END SUBROUTINE
END INTERFACE is called by
call residual (r,u,f) or
call residual ( r(0:nx:2, 0:ny:2) , u(0:nx:2, 0:ny:2) , f(0:nx:2, 0:ny:2) )
where latter example just processes every other element of arrays

HTML version of Scripted Foils prepared 26 September 1996

Foil 22 Typical Use of Array and Intrinsic Operations

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
REAL u(0:nx,0:ny), A(100,100) , fact , avg
u= fact * (u -avg) Scales and translates all elements of u
avg = .25*( CSHIFT(u,1,1) + CSHIFT(u,-1,1) + CSHIFT(u,1,2) + CSHIFT(u,-1,2)
calculates of average of 4 array elements surrounding each point. Note third argument in CSHIFT is label for axis (1=x 2=y)
SQRT( A(1:100) ) calculates a new array containing 100 square roots
SUM(A) is a reduction operator sumimg all elements of array A as a scalar
SIZE(A,1) is an Array Query Intrinsic giving size of A in the first dimension and is particularly useful for "assumed-shape" arrays passed into subroutines

HTML version of Scripted Foils prepared 26 September 1996

Foil 23 Derived Type in Fortran90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
TYPE PERSON
  • CHARACTER(LEN=10) NAME
  • REAL AGE
  • INTEGER ID
END TYPE PERSON
TYPE(PERSON) YOU,ME
The Identification number of YOU would be accessed as YOU%ID as an ordinary integer
One can define global operators so that YOU+ME could be defined
One can use name of derived type as a constructor
YOU = PERSON ('Pamela Fox', 12, 3)

HTML version of Scripted Foils prepared 26 September 1996

Foil 24 Examples of POINTER's in Fortran90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
One can define a linked list as:
TYPE ENTRY
  • REAL VALUE
  • INTEGER INDEX
  • TYPE(ENTRY), POINTER :: NEXT
END TYPE ENTRY
ALLOCATE Creates dynamically elements in a linked list
CURRENT = ENTRY( NEW_VALUE, NEW_INDEX, FIRST)
FIRST => CURRENT
adds a new entry at start of linked list and renames it with POINTER FIRST

HTML version of Scripted Foils prepared 26 September 1996

Foil 25 MODULEs in Fortran90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
General Syntax is:
MODULE name
  • Specify it!
CONTAINS This is optional
  • module subprograms
END MODULE name
MODULE IllustratingCommonBlock
  • INTEGER DIMENSION(52) :: CARDS
END MODULE IllustratingCommonBlock
replaces COMMON construct and can be used as
USE IllustratingCommonBlock

HTML version of Scripted Foils prepared 26 September 1996

Foil 26 MODULEs INTERFACES and Overloaded Operators in Fortran90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
MODULE INTERVAL_ARITHMETIC
  • TYPE INTERVAL
    • REAL LOWER, UPPER
  • END TYPE INTERVAL
  • INTERFACE OPERATOR(+) define overloaded + operator
    • MODULE PROCEDURE ADD_INTERVALS
  • END INTERFACE
CONTAINS
  • FUNCTION ADD_INTERVALS(A,B)
    • TYPE(INTERVAL) ADD_INTERVALS, A, B
    • ADD_INTERVALS%LOWER = A%LOWER + B%LOWER
    • ADD_INTERVALS%UPPER = A%UPPER + B%UPPER
  • END FUNCTION ADD_INTERVALS(A,B)
END MODULE INTERVAL_ARITHMETIC

HTML version of Scripted Foils prepared 26 September 1996

Foil 27 Outline of HPF Discussion

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
What is HPF, what we need it for, where it came from
How does HPF Get its Parallelism
Why it is called "High Performance"?
What are HPF compiler directives
Data mapping in HPF
  • TEMPLATE PROCESSORS DISTRIBUTE ALIGN
Parallel statements and constructs in HPF
  • Array statements, WHERE/ELSEWHERE, Intrinsics, FORALL, PURE, INDEPENDENT
Latest Discussions -- HPF-2
  • ON HOME, TASKING, Dynamic Data Mapping, Reductions in INDEPENDENT DO loops

HTML version of Scripted Foils prepared 26 September 1996

Foil 28 Information on HPF and HPF Forum (HPFF)

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Rice has taken lead in HPF Forum which is a much faster mechanism of getting agreement than formal 10 year process which Fortran90 suffered
World Wide Web at rice and Vienna
Mailing List is majordomo@cs.rice.edu and choose list (hpff, hpff-interpret, hpff-core) you wish to subscribe to
Anonymous FTP to titan.cs.rice.edu and look at
  • public/HPFF/README for latest list of files

HTML version of Scripted Foils prepared 26 September 1996

Foil 29 Possible Programming Models

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Explicit Message Passing as in PVM or MPI
User breaks program into parts and the parts send messages between them to implement communication necessary for synchronization and integration of parts into solution of a single program
This matches hardware but is not particularly natural for problem and can be machine dependent
Object Oriented programming is like message passing but now objects and not programs communicate
  • Very good when objects are natural from the problem and represent functional parallelism
  • However in data parallel problems tackled with object oriented approach, one must break problem up into a number of objects that depends on number of processors and so reflects machine and not problem

HTML version of Scripted Foils prepared 26 September 1996

Foil 30 Data Parallel Programming Model

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Data Parallelism is higher level than either message passing or object models (if objects used to break up data to respect computer)
It provides a Shared Memory Programming Model which can be executed on SIMD or MIMD computers, distributed or shared memory computers
Note it specifies problem not machine structure
It in principle provides the most attractive machine independent model for programmers as it reflects problem and not computer
Its disadvantage is that hard to build compilers especially for the most interesting new algorithms which are dynamic and irregular!

HTML version of Scripted Foils prepared 26 September 1996

Foil 31 Problem Architectures

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing Works for general discussion of problem architecture
Software Maps
Problem ---> Machine
Often software designed around machine architecture
Rather software should be designed around problem architecture
Only able to persuade application scientist to parallelize

HTML version of Scripted Foils prepared 26 September 1996

Foil 32 5 Categories of Problems

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Synchronous: Data Parallel Tightly coupled and software needs to exploit features of problem structure to get good performance. Comparatively easy as different data elements are essentially identical.
Loosely Synchronous:
  • As above but data elements are not identical. Still parallelizes due to macroscopic time synchronization.
Asynchronous:
  • Functional (or data) parallelism that is irregular in space and time. Often loosely coupled and so need not worry about optimal decompositions to minimize communication. Hard to parallelize (massively) unless ....
Embarrassingly parallel:
  • Essentially independent execution of disconnected components. (can involve reductions)
Metaproblems
  • Asynchronous collection of (loosely) synchronous components where these programs themselves can be parallelized

HTML version of Scripted Foils prepared 26 September 1996

Foil 33 Example of Basic Problem Architectures Regular Synchronous Problem Class

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing Works for QCD and Synchronous problems
eg. Quantum chromodynamics calculations (calculate proton mass)
Regular grid in 4D space-time - domain decomposition
Update algorithm identical at each point
Time evolution very simple
The world looked at microscopically in terms of a set of similar fundamental quantities
Monte Carlo - subtlety that cannot update concurrently points linked in Hamiltonian

HTML version of Scripted Foils prepared 26 September 1996

Foil 34 Example of Basic Problem Architectures Irregular Loosely Synchronous

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing Works for Loosely Synchronous problems
The world looked at macroscopically in terms of interactions between irregular inhomogeneous objects evolved as a time synchronized simulation
eg. circuit simulation for computer or biological applications
Linked sets of neurons (components) of different types
Time taken to model one time step of type i depends on nature of node and interconnect between them

HTML version of Scripted Foils prepared 26 September 1996

Foil 35 Example of Basic Problem Architectures The difficult Asynchronous Class

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing Works Asynchronous problems (including Computer Chess)
The world looked at macroscopically in terms of interactions between irregular inhomogeneous objects evolved as an event driven simulation
eg. Battle of Hastings

HTML version of Scripted Foils prepared 26 September 1996

Foil 36 Embarassingly Parallel Problem Class

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Loosely Synchronous, Synchronous or Asynchronous classify problems by their "control" or "synchronization" structure
However there is an important class of problems where this does not matter as the synchronization overhead -- even if in the difficult asynchronous case -- is irrelevant
This is when overhead small or zero. These are "embarassingly parallel problems" where each component of decomposed problem is essentially independent
Examples are:
  • Financial Modelling where each component is calculating some expected value for a particular (possibly Monte Carlo) set of assumptions about the future
  • OLTP (Online Transaction Processing) where each component is a separate checking of a credit card transaction against the account data
  • Graphics rendering where each component is the calculation of the color of a particular pixel.

HTML version of Scripted Foils prepared 26 September 1996

Foil 37 Example of Basic Problem Architectures for MetaProblem Class

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing Works for Metaproblems including Command and Control
Metaproblems are of growing importance in general HPCC community
One important Example is:
Manufacturing and Design including Multidisciplinary Optimization which combines many fields to look at total product design, manufacturability etc.
  • Geometry
  • Grid Generation
  • Fluid Flow ----> Performance
  • Acoustics
  • Structural Analysis
  • Optimization Module
  • Visualization
Also link in concurrent engineering
  • Design - Manufacturing - Marketing - Service

HTML version of Scripted Foils prepared 26 September 1996

Foil 38 Software Bus Structure of USMADE

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
USMADE is US Multidisciplinary Analysis and Design Environment Project of MADIC Industrial Consortium

HTML version of Scripted Foils prepared 26 September 1996

Foil 39 Computing as a Mapping Problem

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
A hierarchy of mapping problems
We would like to optimize the overall mapping

HTML version of Scripted Foils prepared 26 September 1996

Foil 40 Complex Systems to give a Theory of Computing

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Both problems and computer are
Collections of homogeneous or heterogeneous basic entities with some sort of static or dynamic connection between them
Modeling involves mapping one "complex problem" into another
Simulation or parallel computing involves map
    • Complex problem ---> Complex Computer
What parameters of these two underlying complex systems control effectiveness of map?

HTML version of Scripted Foils prepared 26 September 1996

Foil 41 Parallel Computing is "just" an optimization problem, even if we can't agree on what to optimize

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Execution time
  • main focus of HPCC community?
User happiness
  • main focus of software engineering community

HTML version of Scripted Foils prepared 26 September 1996

Foil 42 Complex System Representation of Levels in Computer Software

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing Works for problem architecture

HTML version of Scripted Foils prepared 26 September 1996

Foil 43 The map of Problem ---> Computer is performed in two or more statges

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Architecture of "virtual problem" determines nature of language

HTML version of Scripted Foils prepared 26 September 1996

Foil 44 The Mapping of Space of Problem Architectures onto Space of Machine Architectures

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 45 What determines when Parallelism is Clear ?

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Key idea behind data-parallel languages - and perhaps all good languages
The language expresses problem and not the machine architecture
Need different approach to express functional parallelism
Use"object-oriented" feature when problems has natural objects
Do not use "object-oriented" features when objects are artifacts of target machine
Need data and functional (object) parallel paradigms in many problems - especially multidisciplinary

HTML version of Scripted Foils prepared 26 September 1996

Foil 46 Evaluation of High Performance Fortran What applications need what features of HPF and its extensions ?

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See NPAC's High Performance Fortran Applications Resource

HTML version of Scripted Foils prepared 26 September 1996

Foil 47 What Issues should High Performance Fortran (HPF) Address!

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Should not solve ALL Problems
Goal: Express in a high level portable scalable fashion those aspects of problems one would like to use Fortran for
  • Need not solve all issues Roughly express all data parallelism and needed task parallelism
  • Integrate other programming paradigms to address remaining issues (see PCRC - Parallel Compiler Runtime Consortium)
Lessons from study of
  • CMFORTRAN experience
  • Some initial HPF experience
  • Fortran + Message Passing Experience
Particle Dynamics and Parallel Differential Equation Solving have been studied in detail (for HPF). Other fields less completely understood.

HTML version of Scripted Foils prepared 26 September 1996

Foil 48 Goal of High Performance Fortran

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Express in a high level portable scalable fashion those aspects of problems one would like to use Fortran for
  • Need not solve all issues Roughly express all data parallelism and needed task parallelism
  • Integrate other programming paradigms to address remaining issues (probably C++ can be better used for "difficult" cases. This implies integrated multilanguage support.)
Lessons from study of
  • CMFORTRAN experience
  • Some initial HPF experience
  • Fortran + Message Passing Experience
Particle Dynamics and Parallel Differential Equation Solving have been studied in detail (for HPF).
  • Image Processing quite well understood due to ARPA iWARP activity (CMU and elsewhere)
  • Other fields less completely understood.

HTML version of Scripted Foils prepared 26 September 1996

Foil 49 Any Complete Programming Environment Must Handle

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Data Parallelism
  • Completely handled in extended HPF?
Task Parallelism
  • Partially handled in (extended) HPF? e.g.
    • Embarrassingly Parallel Task Parallelism OK
    • Event Driven Simulations not possible
Metaproblems (task parallelism where each component data parallel)
  • Should we handle directly by; say, adding Fortran-M Syntax to HPF
  • or embed HPF in HPC++, Fortran-M (viewed as "outside" HPF), AVS ....
  • Metaproblems are discussed in Parallel Computing Works

HTML version of Scripted Foils prepared 26 September 1996

Foil 50 HIGH PERFORMANCE FORTRAN COMPILERS

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See HPF Forum or National Software Exchange
List or NPAC List for educational resources
HPF is designed so that parallelism is essentially explicit
This is embodied in 2 classes of operations:
  • INTRINSICS - 74 library routines
    • these would be coded in most efficient fashion possible for target machine - whether global address or message passing, eg., ALL, ANY, CSHIFT, SUM, ...
    • represent a set of primitives higher level than message passing which implement functionality needed in most (any) parallel processing systems
  • PARALLEL STATEMENTS - FORALL / array assignments, INDEPENDENT
    • here compile time analysis leads to efficient parallel implementation
    • HPF will again invoke a set of runtime library routines
    • need most efficient, not most convenient routines

HTML version of Scripted Foils prepared 26 September 1996

Foil 51 What type of compiler is HPF ?

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
HPF makes much greater use of runtime optimizations than the traditional sequential or parallel compiler
Fortran 77: Do 1 I=1,N 1 A(I) = FUNC(B(I), B(I +/- 1), C(I), C(I +/- 1), .....)
Traditionally arrange access to these elements in COMPILER
HPF: A=FUNC(B,C)
Invoke ALREADY OPTIMIZED parallel implementation of FUNC In many cases, HPF compiler need not know anything about parallelization except interfaces, data layout etc.

HTML version of Scripted Foils prepared 26 September 1996

Foil 52 The High Performance Fortran Library

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Much of HPF's Parallel "Power" is buried in its library
Personal note: Fortran 90D motivated by similarities between Caltech message passing "collective" communication and F90 Intrinsics
18 Non-elemental (true array) Fortran 90 Intrinsics e.g. CSHIFT, SUM
7 new HPF Intrinsics e.g. Alignment, Template, Distribution inquiries
4 new HPF Reduction (combine) Functions
11 new HPF Combine-Scatter Functions
22 new HPF Parallel Prefix Functions
2 new HPF Parallel Sorting Functions
3 other new HPF Parallel Functions
7 HPF Local Intrinsics
74 library routines (==> ~1000 routines when written in clumsy F77)

HTML version of Scripted Foils prepared 26 September 1996

Foil 53 HPF Intrinsic Library

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Non-Elemental Fortran 90 Intrinsics
  • 1. ALL(MASK,DIM)
  • 2. ANY(MASK,DIM)
  • 3. COUNT(MASK,DIM)
  • 4. CSHIFT(ARRAY,SHIFT,DIM)
  • 5. DOT_PRODUCT(VECTOR_A,VECTOR_B)
  • 6. EOSHIFT(ARRAY,SHIFT,BOUNDARY,DIM)
  • 7. MATMULMATRIX_A, MATRIX_B
  • 8. MAXLOC(ARRAY,MASK)
  • 9. MAXVAL(ARRAY,DIM,MASK)
  • 10. MINLOC(ARRAY,MASK)
  • 11. MINVAL(ARRAY,DIM,MASK)
  • 12. PACK(ARRAY,MASK,VECTOR)
  • 13. PRODUCT(ARRAY,DIM,MASK)
  • 14. RESHAPE(SOURCE,SHAPE, PAD,ORDER)
  • 15. SPREAD(SOURCE,DIM,NCOPIES)
  • 16. SUM(ARRAY,DIM,MASK)
  • 17. TRANSPOSE(MATRIX)
  • 18. UNPACK(VECTOR,MASK,FIELD)
HPF Intrinsics
  • 1. NUMBER_OF_PROCESSORS(DIM)
  • 2. PROCESSORS_SHAPE()
  • 3. Extensions of MAXLOC and MINLOC
  • 4. Integer length ILEN
  • 5. Alignment Inquiry Intrinsic Subroutine HPF_ALIGNMENT
  • 6. Template Inquiry Intrinsic Subroutine HPF_TEMPLATE
  • 7. Distribution Inquiry Intrinsic Subroutine HPF_DISTRIBUTION

HTML version of Scripted Foils prepared 26 September 1996

Foil 54 High Performance Fortran Library -- I

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
New Reduction Functions
  • 1. IALL
  • 2. IANY
  • 4. PARITY
    • 3. IPARITY
Combining- Scatter Functions
  • 2. COUNT_SCATTER
  • 3. PRODUCT_SCATTTER
  • 1. SUM_SCATTER
  • 4. MAXVAL_SCATTER
  • 5. MINVAL_SCATTER
  • 6. IALL_SCATTER
  • 7. IANY_SCATTER
  • 8. IPARITY_SCATTER
  • 9. ALL_SCATTER
  • 10. ANY_SCATTER
  • 11. PARITY_SCATTER
Sorting Functions
  • 1. GRADE_UP
  • 2. GRADE_ DOWN

HTML version of Scripted Foils prepared 26 September 1996

Foil 55 High Performance Fortran Library -- II

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Parallel Prefix Functions
  • 1. SUM_PREFIX
  • 2. COUNT_PREFIX
  • 3. PRODUCT_PREFIX
  • 4. MAXVAL_PREFIX
  • 5. MINVAL_PREFIX
  • 6. IALL_PREFIX
  • 7. IANY_PREFIX
  • 8. IPARITY_PREFIX
  • 9. ALL_PREFIX
  • 10. ANY_PREFIX
  • 11. PARITY_PREFIX
  • 12. SUM_SUFFIX
  • 13. COUNT_SUFFIX
  • 14. PRODUCT_SUFFIX
  • 15. MAXVAL_SUFFIX
  • 16. MINVAL_SUFFIX
  • 17. IALL_SUFFIX
  • 18. IANY_SUFFIX
  • 19. IPARITY_SUFFIX
  • 20. ALL_SUFFIX
  • 21. ANY_SUFFIX
  • 22. PARITY_SUFFIX
Other Functions
  • 1. POPCNT
  • 2. POPPAR
  • 3. LEADZ

HTML version of Scripted Foils prepared 26 September 1996

Foil 56 Imprecise Mapping of Problem Classes into Runtime and Language Terms

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See Parallel Computing works General Discussion
STATIC
  • Synchronous and Embarassingly Parallel Problems - current HPF
ADAPTIVE
  • Loosely Synchronous but not Synchronous - future capabilities of High Performance Fortran (HPF+) but can be supported well in message passing
ASYNCHRONOUS
  • Asynchronous
INTEGRATION
  • Metaproblems
  • AVS works well but also can be integrated into languages such as HPC++, Fortran-M

HTML version of Scripted Foils prepared 26 September 1996

Foil 57 General Applicability of HPF, HPF++, HPC++

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 58 Importance of HPF, HPC++ to Users

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Standardized syntax lowers risk in application development
  • Can "guarantee" that code will run on future parallel machines with reasonable performance i.e. HPF, HPC++ are scalable standards
HPF, HPC++ benchmarks will enable easier comparison between different machines
  • Future procurements should (could) require HPF and HPC++?
Allows vendors to concentrate their scarce software resources
  • "All" parallel machines should offer HPCC Technologies
    • HPF
    • HPC++
    • MPI (Message Passing)
    • OSF etc.

HTML version of Scripted Foils prepared 26 September 1996

Foil 59 What about other languages ?

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Parallelism is a feature of problems
It can be expressed in different languages with trade-offs in performance, flexibility and convenience
Fortran 77 Þ HPF
Fortran 90 Þ HPF
C++ Þ HPC++
  • Limitations (focus) of Fortran allow higher performance implementations
HPC++ more naturally handles complex data structures e.g. data parallel collections which are not the staple array data structure of Fortran
Must have common runtime support
  • Explored by PCRC - Parallel Compiler Runtime Consortium

HTML version of Scripted Foils prepared 26 September 1996

Foil 60 What applications does HPF support? If not - what extensions are needed?

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
The following foils are expanded in HPF Applications Resource at NPAC
Somewhat related issue ?
  • What problems are suitable for HPF and make good use of MIMD architectures ?
  • i.e. run well on MIMD but poorly on SIMD
Classification of Problems (Fox, 1988)

HTML version of Scripted Foils prepared 26 September 1996

Foil 61 5 Categories of Problems

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
See overview discussion in Parallel Computing Works
Synchronous: Data Parallel Tightly coupled and software needs to exploit features of problem structure to get good performance. Comparatively easy as different data elements are essentially identical.
Loosely Synchronous:
  • As above but data elements are not identical. Still parallelizes due to macroscopic time synchronization.
Asynchronous:
  • Functional (or data) parallelism that is irregular in space and time. Often loosely coupled and so need not worry about optimal decompositions to minimize communication. Hard to parallelize (massively) unless ....
Embarrassingly parallel:
  • Essentially independent execution of disconnected components. (can involve reductions)
Metaproblems
  • Asynchronous collection of (loosely) synchronous components where these programs themselves can be parallelized

HTML version of Scripted Foils prepared 26 September 1996

Foil 62 Comparison of 3 different Programming Models

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 63 Parallelism in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Parallelism in HPF is expressed explicitly
  • Fortran 90 array expressions and assignments (including WHERE)
  • HPF Library and Array intrinsics
  • FORALL statement and construct
    • PURE labels procedures that can be used in FORALL as they have no "side-effects"
  • INDEPENDENT assertion on DO loops
Compiler may choose not to exploit information about parallelism
Compiler may detect parallelism in sequential code

HTML version of Scripted Foils prepared 26 September 1996

Foil 64 Fortran77 is part of Fortran90

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
A=B or more interestingly
WHERE( B > 0. ) A = B
ELSEWHERE A=0.
END WHERE
can be written
DO I = n1,n2
DO J = m1,m2
IF(B(I,J) >0.) THEN A(I,J) = B(I,J)
ELSE A(i,J) = 0.
END IF
END DO
END DO
Now a good HPF compiler will recognize the DO loops can be parallelized and give the same answer for Fortran90 and Fortran77 forms but often the detection of parallelism is not clear
Note FORALL is guaranteed to be parallelizeable as by definition no side effects.

HTML version of Scripted Foils prepared 26 September 1996

Foil 65 HPF Features

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
All of Fortran90
New instructions FORALL and INDEPENDENT enhancing DO loops
Data Alignment and Distribution Assertions
Miscellaneous Support Operations but
NO parallel Input/Output
Little Support for Irregular Computations
Little Support for any form of non mainstream data-parallelism
Extrinsics as supporting links with explicit message-passing

HTML version of Scripted Foils prepared 26 September 1996

Foil 66 What gives high performance in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
There is tradeoff between parallelism and communication
Programmer defines the data mapping and compiler uses this to assign processing
Underlying assumptions are that:
An operation on two or more data object is likely to be carried out much faster if they all reside in the same processor,
And that it may be possible to carry out many such operations concurrently if they can be performed on different processors
This is embodied in "owner computes" rule -- namely that in for instance
  • A(i,j)= .....
  • One brings everything on right hand side to process "owning" A(i,j) and performs computation in this processor
Owner computes algorithm is usually good and often best

HTML version of Scripted Foils prepared 26 September 1996

Foil 67 Compiler directives used in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
The directives are structured comments that suggest implementation strategies or assert facts about a program to the compiler
They may affect the efficiency of the computation performed, but do not change the value computed by the program
As in Fortran 90 statements, there are both:
  • declarative directives
  • executable directives

HTML version of Scripted Foils prepared 26 September 1996

Foil 68 What does an HPF Compiler do?

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
It must generate Fortran77(90) + Message Passing code or possibly in one pass map HPF code onto parallel machine code
Traditional dataflow and dependency analysis is especially critical in Fortran77 parts of code
It must use data mapping assertions to decide what is stored where and so organize computation
Code must be transformed to respect this owner-computes model
It must typically use "Loosely Synchronous" model with communicate-compute phases and then compiler generates all the communication needed
  • Due to latency issues compiler must minimize communication needed and maximize size of packets sent
We need an excellent run-time library which the compiler invokes with parallel Intrinsics etc.

HTML version of Scripted Foils prepared 26 September 1996

Foil 69 Syntax of HPF Directives

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
HPF directives are consistent with Fortran 90 syntax except for the special prefix for directive:
  • !HPF$ (only version allowed with free format)
  • CHPF$
  • *HPF$
Two forms of the directives are allowed
  • Specification statements, such as
  • !HPF$ DISTRIBUTE MYTEMPLATE(BLOCK) ONTO P
  • Equivalent Attributed form, such as,
  • !HPF$ DISTRIBUTE (BLOCK) ONTO P :: MYTEMPLATE

HTML version of Scripted Foils prepared 26 September 1996

Foil 70 Data Mapping in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Data Mapping in HPF is all you need to do to get parallelism as long as you use the explicit array type syntax such as A=B+C
The Owner Computes rule implies that specifying location of variables specifies (optimally or not) parallel execution!
The new HPF-2 ON HOME directive is exception to this rule as specifies where a particular statement is to be executed
(RE)DISTRIBUTE tells you where data is to be placed
(RE)ALIGN tells you how different data structures are to be placed relative to each other

HTML version of Scripted Foils prepared 26 September 1996

Foil 71 Staged Data Mapping in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 72 Template in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
A Template is an abstract space of indexed positions (an "array of nothings")
In CMFortran terminology, Template is set of Virtual Processors -- one per data point
  • Template typically specifies precisely the full natural data parallelism that is natural for problem and HPF maps this problem parallelism onto particular machine
A template is declared by the TEMPLATE directive that specifies:
  • name of the template
  • the rank (i.e., number of dimensions)
  • the extent in each dimension
Examples:
  • CHPF$ TEMPLATE T(1000)
  • !HPF$ TEMPLATE FRED(N, 2*N)
  • *HPF$ TEMPLATE, DIMENSION(5,100,50) :: MINE, YOURS

HTML version of Scripted Foils prepared 26 September 1996

Foil 73 Abstract Processors in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Abstract processors always form a rectilinear grid in 1 or more dimensions
They are abstract coarse grain collections of data-points
  • Remember efficiency says we must "block" communication into large chunks -- processors give us a general target for this
The processor arrangement is defined by the PROCESSORS directive that specifies:
  • name of the processor arrangement
  • the rank (i.e., number of dimensions)
  • the extend in each dimension
Examples:
  • !HPF$ PROCESSORS P(N)
  • *HPF$ PROCESSORS BIZARRO(1972:1997,-20:17)
  • CHPF$ PROCESSORS SCALARPROC (by default sequential)

HTML version of Scripted Foils prepared 26 September 1996

Foil 74 Example of Template and Processors

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF$ PROCESSORS P(4)
!HPF$ TEMPLATE X(40)
!HPF$ ALIGN WITH X :: A, B, C
!HPF$ DISTRIBUTE X(BLOCK)
  • ...
  • C = A + B
  • ...

HTML version of Scripted Foils prepared 26 September 1996

Foil 75 Align Directive in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Syntax of Align:
!HPF ALIGN alignee WITH align-target
  • where -- note [..] implies optional component
  • alignee: alignee [(align-source-list)]
  • align target: align-target[(align-subscript-list)]
Alternatively
*HPF ALIGN (align-source-list) WITH align-target :: alignee

HTML version of Scripted Foils prepared 26 September 1996

Foil 76 Examples of Align Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Note a colon(:) in directive denotes all values of array index
Examples of array indices:
  • CHPF$ ALIGN A(i) WITH B(i)
  • *HPF$ ALIGN (i,j) WITH TEMPL(i,j) :: A, B
Use of : examples:
  • !HPF$ ALIGN A(:) WITH B(:)
  • CHPF$ align (:,:) WITH TEMPL(:,:) :: A, B

HTML version of Scripted Foils prepared 26 September 1996

Foil 77 Changing Rank in Align Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Ranks of the alignee and the align-target may be different
Examples:
  • !HPF$ ALIGN A(:,j) WITH B(:)
  • CHPF$ ALIGN A(:,*) WITH B(:)

HTML version of Scripted Foils prepared 26 September 1996

Foil 78 Replication in Align Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
... or other way round
  • !HPF$ ALIGN A(:) WITH TEMPL(:,*)
while this only puts A on some parts of template...
!HPF$ ALIGN A(:) WITH TEMPL(:,i)

HTML version of Scripted Foils prepared 26 September 1996

Foil 79 General Alignments in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
HPF allows for more general alignments such as:
  • REAL, DIMENSION(5,8) :: A,B
!HPF$ TEMPLATE T(12,12)
!HPF$ ALIGN A(:,J) WITH T(:,J+1)
!HPF$ ALIGN B(I,J) WITH T(I+4,J+4)
Useful for simple numerical shifts as in example but not useful
in general case of arbitary
index values allowed by
ALIGN syntax

HTML version of Scripted Foils prepared 26 September 1996

Foil 80 Formal Definition of Align Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Each align-dummy variable is considered to range over all valid index values for the corresponding dimension of the alignee. An align-subscript is evaluated for any specific combination of values for the align-dummy variables simply by evaluating each align-subscript as a expression. Their resulting subscript values must be legitimate subscripts for the align-target

HTML version of Scripted Foils prepared 26 September 1996

Foil 81 More obscure Complicated Examples of Align Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
These examples have non-unit stride as perhaps in "red-black" Iterative Solver algorithms:

HTML version of Scripted Foils prepared 26 September 1996

Foil 82 Distribution Directive in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Syntax:
!HPF$ DISTRIBUTE distributee (dist-format)
[ONTO dist-target]
Allowed forms of dist-format:
  • * -- Implies no distribution in this index
  • BLOCK -- Critical to minimize communication
  • CYCLIC -- Critical for load balancing
  • BLOCK(int-expr) -- Not Obviously useful!
  • CYCLIC(int-expr) -- Very useful
Examples:
  • CHPF$ DISTRIBUTE TEMP(BLOCK,CYCLIC)
  • !HPF$ DISTRIBUTE FRED(BLOCK(10)) ONTO P
  • *HPF$ DISTRIBUTE (BLOCK,*) :: MYTEMPLATE

HTML version of Scripted Foils prepared 26 September 1996

Foil 83 Basic Examples of Distribute Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF$ PROCESSORS P(4)
  • REAL, DIMENSION(16) :: A
!HPF$ TEMPLATE T(16)
!HPF$ ALIGN A(:) WITH T(:)

HTML version of Scripted Foils prepared 26 September 1996

Foil 84 Two Dimensional Example of Distribute Directive

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
*HPF PROCESSORS SQUARE(2,2)
*HPF TEMPLATE T(4,4)
*HPF ALIGN A(:,:) WITH T(:,:)
*HPF DISTRIBUTE T(BLOCK,CYCLIC)ONTO SQUARE

HTML version of Scripted Foils prepared 26 September 1996

Foil 85 The Two Basic Distributions in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
We used BLOCK in the Laplace equation example and so this is appropriate distribution for "local" or geometric type problems
CYCLIC is called scattered in our early work (or is a special case of scattered which is perhaps random distribution of objects on processors) is appropriate in cases where "load-balancing" is more important than locality
  • Simplest examples are matrix inversion and graphics rendering problems
  • In solving equations (we will do later) Ax=b , there is no "nearest neighbor" structure between rows and columns, but rather one eliminates rows and columns and cyclic distribution ensures work remains balanced
  • In calculating pixels, work depends on complexity of picture at that pixel and so best to distribute pixels cyclically (or randomly) to processors.

HTML version of Scripted Foils prepared 26 September 1996

Foil 86 The Example of Matrix Inversion

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Matrix Inversion set up on two processors after
0 2 and 4 rows/columns eliminated
Note BLOCK decomposition leads to all work being on one processor at end even if starts off balanced

HTML version of Scripted Foils prepared 26 September 1996

Foil 87 Example of Graphics Rendering

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Here we show a 16 by 16 array of pixels with either CYCLIC or 8 by 8 two dimensional BLOCK,BLOCK

HTML version of Scripted Foils prepared 26 September 1996

Foil 88 Example of Distribute Directive with Complex Alignment

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
CHPF$ PROCESSORS Q(4)
CHPF$ TEMPLATE FRED(16,16)
CHPF$ ALIGN A(:,:) WITH FRED(:,:)
CHPF$ ALIGN B(I,J) WITH FRED(I+2,J+2)
CHPF$ DISTRIBUTE FRED(BLOCK,*)

HTML version of Scripted Foils prepared 26 September 1996

Foil 89 Dynamic Data Mapping

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
One data mapping is often not appropriate for an entire program
  • Often one has phases in which different distributions are needed in different phases
  • e.g. in 2D FFT, one typically finds FFT of F(I,J) by first distributing so for each J all I (x values) are in same processor and then transform so that for each I all J are in same processor
  • This ensures no communication in FFT phases which is important as typically in distributed one dimensional FFT there is substantial overhead
ALLOCATABLE arrays can change size
REALIGN and REDISTRIBUTE are executable DISTRIBUTE and ALIGN commands but are only to be used if one declares arrays on which they act DYNAMIC
Naturally DYNAMIC arrays can be initialized by ALIGN or DISTRIBUTE statements

HTML version of Scripted Foils prepared 26 September 1996

Foil 90 Advanced Mapping Directives -- ReDistribution and ReAlign

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
This example illustrates remapping from one to two dimensional decomposition for A and changing B from alignment with columns to alignment with rows
  • REAL, DIMENSION(64,64) :: A
  • REAL, DIMENSION(64) :: B
!HPF$ PROCESSORS P(64)
!HPF$ PROCESSORS Q(8,8)
!HPF$ DYNAMIC :: A,B
!HPF$ ALIGN B(:) WITH A(:,*)
!HPF$ DISTRIBUTE A(*,BLOCK)ONTO P
  • ...
!HPF$ REALIGN B(:) WITH A(*,:)
  • ...
!HPF$ REDISTRIBUTE A(CYCLIC,CYCLIC) ONTO Q
  • ...

HTML version of Scripted Foils prepared 26 September 1996

Foil 91 Advanced Mapping Directives -- Allocatable arrays and pointers

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF$ PROCESSORS Q(64)
!HPF$ ALIGN B(I) WITH A(I+N)
!HPF$ DISTRIBUTE A(BLOCK(M))
!HPF$ DISTRIBUTE(BLOCK), DYNAMIC :: P
  • ...
  • ALLOCATE(A(128))
  • ALLOCATE(B(64))
  • ALLOCATE(P(1024))
  • ...
!HPF$ REDISTRIBUTE P(CYCLIC)
  • ...
  • RETURN
  • END

HTML version of Scripted Foils prepared 26 September 1996

Foil 92 Subprograms in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Scope of any mapping directives is a single (sub)program unit
A template or distribution is not a first-class Fortran 90 object:
It cannot be passed as a subprogram argument and this creates significant complication!
HPF Compiler will typically pass an extra argument which is effectively an array-descriptor telling subroutine about distribution of passed arrays
One can use array query intrinsics to find out what is going on but of course compiler does this implicitly

HTML version of Scripted Foils prepared 26 September 1996

Foil 93 Passing Distributed Arrays as Subprogram Arguments in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
There are three typical cases:
Subroutine requires data to use a particular mapping determined by subroutine
  • Arguments must be remapped
Subroutine can use any mapping so actual argument should be passed and used with current mapping
  • Here we have two cases depending on whether programmer knows or not (and tells subroutine) what incoming distribution is
Sometimes we need to remap due to array sections being passed
Any remappings must be undone on return from subroutine

HTML version of Scripted Foils prepared 26 September 1996

Foil 94 Mapping Options for Dummy (Subroutine) Arguments

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
DISTRIBUTE
  • use * instead of dist-format or ONTO clause indicates that incoming distribution is acceptable i.e. leave data in place
  • * before dist-format or ONTO clause indicates that data should stay in place and asserts that distribution is what you claim
ALIGN
  • * instead of or before target has similar meanings to DISTRIBUTE
INHERIT
  • A new attribute allowing references back to the original full array and used when sections of array are passed

HTML version of Scripted Foils prepared 26 September 1996

Foil 95 Inherit Distribution Directive in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
(not a comprehensive discussion; just an example)

HTML version of Scripted Foils prepared 26 September 1996

Foil 96 Summary of Mapping Directives in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
PROCESSORS
TEMPLATE
ALIGN
DISTRIBUTE
INHERIT
DYNAMIC
REALIGN
REDISTRIBUTE

HTML version of Scripted Foils prepared 26 September 1996

Foil 97 Fundamental Parallelism Assumption in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
An operation on two or more data object is likely to be carried out much faster if they all reside in the same processor
  • i.e. minimize communication
it may be possible to carry out many such operations concurrently if they can be performed on different processors
  • data parallelism

HTML version of Scripted Foils prepared 26 September 1996

Foil 98 Parallel statements and Constructs in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Parallel Statements
  • Fortran 90 array assignments
  • masked array assignments (WHERE)
  • FORALL statement
Parallel Constructs
  • WHERE and WHERE...ELSEWHERE construct
  • FORALL construct
  • INDEPENDENT DO
Intrinsic functions and the HPF library
Extrinsic functions

HTML version of Scripted Foils prepared 26 September 1996

Foil 99 Parallelism in Fortran 90 array assignments

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
This is as in CMFortran and Maspar MPFortran with example:

HTML version of Scripted Foils prepared 26 September 1996

Foil 100 WHERE (masked array assignment) in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
This is as in CMFortran and Maspar MPFortran with example:
  • WHERE (A .GT. 0) A = A - 100
Semantics of WHERE statement:
    • 1. evaluate mask (in parallel) and store as a temporary T1
    • 2. for each i that T1(i)=.TRUE. compute T2(i)=A(i) - 100
    • 3. for each i that T1(i)=.TRUE. assign A(i)=T2(1)

HTML version of Scripted Foils prepared 26 September 1996

Foil 101 WHERE...ELSEWHERE / IF...ELSE constructs in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
There is a fundamental difference in semantics between IF...ELSE and WHERE...ELSEWHERE constructs

HTML version of Scripted Foils prepared 26 September 1996

Foil 102 Intrinsic functions in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
elemental
  • examples:
    • A = SIN(X)
    • FORALL (i=1:100:2) A(i) = EXP(A(i))
transformational and inquiry functions
  • Fortran 90
    • SUM, PRODUCT, ANY, DOTPROD, EOSHIFT, MAXVAL, ...
  • HPF
    • system inquiry functions:
    • NUMBER_OF_PROCESSORS,
    • PROCESSORS_SHAPE
    • extensions of MAXLOC and MINLOC, ILEN
  • HPF library

HTML version of Scripted Foils prepared 26 September 1996

Foil 103 HPF library functions

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
new array reduction functions
  • IALL, IANY, IPARITY and PARITY
  • (IAND, IOR, IEOR, and .NEQV.)
array combining scatter functions
  • XXX_SCATTER
array prefix and suffix functions
  • XXX_PREFIX, XXX_SUFFIX
array sorting functions
  • GRADE_DOWN, GRADE_UP
bit manipulation functions
  • LEADZ, POPCNT, POPPAR
mapping inquiry subroutines
  • HPF_ALIGNMENT, HPF_TEMPLATE, HPF_DISTRIBUTION

HTML version of Scripted Foils prepared 26 September 1996

Foil 104 SUM, SUM_PREFIX and SUM_SCATTER defined

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
X=SUM(A) sums all elements of A and places result in scalar X
Y = SUM_PREFIX(A) sets array Y of same size as A so that Y(i) has the sum of all A(j) for 1 <= j <= i
Y = SUM_SCATTER(A,B, IND) sets array element Y(i) as the sum of array element B(i) plus those elements of A(j) where IND(j) = I
  • X = SUM_SCATTER( flux, X, INDEX) is equivalent to
  • FORALL (i=1:N)
    • X(INDEX(i)) = X(INDEX(i)) + flux(i)
  • END FORALL assuming INDEX(i) just permutes numbers 1 to N and has no repeated values

HTML version of Scripted Foils prepared 26 September 1996

Foil 105 HPF Intrinsic EXAMPLE: SUM

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 106 FORALL Statement in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
A very important extension to Fortran 90 and defines one class of parallel DO loop
FORALL will be a language feature of Fortran95
It relaxes the restriction that operands of the rhs expressions must be conformable with the lhs array
It may be masked with a scalar logical expression (extension of WHERE construct)
A FORALL statement may call user-defined (PURE) functions on the elements of an array, simulating Fortran 90 elemental function invocation (albeit with a different syntax)
FORALL( index-spec-list [,mask-expr] ) forall assignment
  • where forall-assignment is conventional single Fortran90 statement

HTML version of Scripted Foils prepared 26 September 1996

Foil 107 Examples of FORALL statements in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
FORALL (i=1:100,k=1:100) a(i,k) = b(i,k) A = B
FORALL (i=2:100:2) a(i) = a(i-1) A(2:100:2) = A(1:99:2)
FORALL (i=1:100) a(i) = i A = [1..100]
FORALL (i=1:100, j=1:100) a(i, j) = i+j
FORALL (i=1,100) a(i,i) = b(i)
FORALL (i=1,100,j=1:100) a(i,j) = b(j,i)
FORALL (i=1,100) a(i, 1:100) = b(1:100, i)
FORALL (i=1:100, j=1:100, y(i,j).NE.0) x(i,j) = REAL(i+j)/y(i,j)
FORALL (i=1,100) a(i,ix(i)) = x(i)
FORALL (i=1,9) x(i) = SUM(x(1:10:i))
FORALL (i= 1,100) a(i) = myfunction(a(i+1))

HTML version of Scripted Foils prepared 26 September 1996

Foil 108 Semantics of the FORALL statement in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Similar to Fortran 90 array assignments and WHERE
Consider example:

HTML version of Scripted Foils prepared 26 September 1996

Foil 109 Vector Indices in FORALL's

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Consider FORALL( i=1:n ) a(ix(i)) = a(i)
is allowed in HPF but will only give sensible reproducible results if ix(i) is a true permutation of 1...n
  • but if ix(i) has repeated values, then the result is undefined
Of course we always use "old" value" of a(i) on rhs so that if ix(i)= i+1 and a(0) defined, then result is
  • a(i)/new = a(i-1)/old
  • and not result of recursion
  • a(1)/new =a(0)/old
  • a(2)/new = a(1)/new just calculated = a(0)/old

HTML version of Scripted Foils prepared 26 September 1996

Foil 110 Multiple Statement FORALL's

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
FORALL( index-spec-list [,mask-expr] )
forall-body
END FORALL
where forall-body can be a list of forall-assignment statements, FORALL or WHERE statements
So Multi-Statement FORALL's support nesting of FORALL's but
is in general Shorthand for a sequence of single statement FORALL's with by definition each statement completed before next one begins
The multi-statement FORALL is likely to be more efficient than several single statement ones as latter have synchronization overhead on each statement

HTML version of Scripted Foils prepared 26 September 1996

Foil 111 HPF FORALL construct Pictorially

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 112 PURE Functions in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
PURE functions have no side effects
  • ALL Intrinsics are PURE
DO loops can call any functions and parallelism unclear as function call can destroy parallelism
  • DO I=1,1000
    • A(I) = FUNC(A(I-1),X)
  • END DO
If FUNC alters A(I-1) or in fact A(any index except I), then this loop cannot be easily parallelized
FORALL statements can only call PURE functions and these must NOT define any global (e.g. any element of A in example) or dummy (A(i-1) or X) variable
  • PURE functions can only INHERIT distribution and alignment statements

HTML version of Scripted Foils prepared 26 September 1996

Foil 113 Example of PURE Function from Chuck Koelbel

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
FORALL( i=1:n, j=1:m )
  • k(i,j) = mandelbrot ( CMPLX((i-1)*1.0/(n-1), (j-1)*1.0/(m-1)), 1000)
END FORALL
This can call the PURE function mandelbrot which is essentially a generalized intrinsic
PURE INTEGER FUNCTION mandelbrot (x,itol)
  • COMPLEX, INTENT(IN) :: x
  • INTEGER, INTENT(IN) :: itol
  • COMPLEX xtmp
  • INTEGER k
    • k=0
    • xtmp = -x
    • DO WHILE( ABS(xtmp) < 2. .AND. k < itol )
    • xtmp = xtmp*xtmp - x
    • k = k + 1
    • END DO
    • mandelbrot = k
END FUNCTION mandelbrot

HTML version of Scripted Foils prepared 26 September 1996

Foil 114 The INDEPENDENT Assertion in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF$ INDEPENDENT [ ,NEW (variable-list) ]
INDEPENDENT asserts that no iteration affects any other in any way
It implements the "embarassingly parallel" problem class we discussed under structure of problems
Note rest of HPF tackles mainly the synchronous problem class with some loosely synchronous capability
  • HPF2 has "tasking" for metaproblem class and some extensions for further irregular loosely synchronous problems
NEW variables are defined to have fresh instantiations for each iteration as is typically needed for embarassingly parallel problems where in fact essentially all variables in a loop would be NEW
Note INDEPENDENT can be applied to FORALL and asserts that no index point assigns to any location that another iteration index value uses
  • This reduces copying needed in FORALL by COMPILER
HPF2 (see later) has extra feature of allowing REDUCTION (accumulated) variables in INDEPENDENT DO loops

HTML version of Scripted Foils prepared 26 September 1996

Foil 115 !HPF$ INDEPENDENT FORALL Pictorially

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 116 !HPF$ INDEPENDENT DO Pictorially

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index

HTML version of Scripted Foils prepared 26 September 1996

Foil 117 !HPF$ INDEPENDENT, NEW Variable

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
This is an exception from the conventional HPF picture of a global name space with either distributed or replicated variables

HTML version of Scripted Foils prepared 26 September 1996

Foil 118 Extrinsics in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
An extrinsic function is a function written in a language other than HPF including most naturally any node programming language (e.g. Fortran77) targeted to a single processor SPMD) with message passing such as MPI
HPF defines (Fortran90) interface and invocation characteristics
  • This defines what is input and output via INTENT and these rules must be obeyed by callee (non HPF side)
  • If variables are implicitly replicated, the callee must make them consistent before reurn i.e. callee must respect HPF model of parallelism
  • HPF will execute any remapping commands and hands callee the "right" part of any distributed arrays
  • All processors are synchronized before call to EXTRINSIC function
Allows one to get efficient parallel code where HPF language or compiler inadequate
  • Analogous to calling assembly language from Fortran, Native classes from Java, C from PERL etc.

HTML version of Scripted Foils prepared 26 September 1996

Foil 119 High Performance Fortran HPF2 Changes

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
The original HPF 1.0 omitted some key capabilities which were known to be important but syntax and functionality was unclear in 1993
  • Further experience has shown that HPF compilers have proven to be difficult to write!
The HPF Forum met in 1995-96 and has approved a set of Extensions and Simplifications of HPF
The concept of a base HPF 2.0 and Approved Extensions has been agreed
Note approved extensions (which presumably vendors need not implement) include critical capability for dynamic irregular problems
DYNAMIC REALIGN and REDISTRIBUTE are no longer in base language and are just "appproved extensions"
Suprisingly no parallel I/O capabilities were approved!

HTML version of Scripted Foils prepared 26 September 1996

Foil 120 ON HOME for Computation Placement

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF$ INDEPENDENT
DO i = 1 , n
  • !HPF$ ON HOME( ix(i) )
  • x(i) = y(ix(i)) - y(iy(i))
END DO
This modifies the owner computes rule by specifying that computation will not be performed on processor owning left hand side
ON HOME is an approved extension

HTML version of Scripted Foils prepared 26 September 1996

Foil 121 Reductions in INDEPENDENT DO Loops

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Many applications of INDEPENDENT DO loops do require reductions as they are typically calculating independently quantities but storing results as parts of various averages
e.g. in High Enegry Physics Data Analysis, each measured event can be computed via an INDEPENDENT DO but one wishes to find a particular observable (histogram, scatterplot) which is averaged over each event
Financial modelling is similar
x = 0
!HPF$ INDEPENDENT, NEW(xinc), REDUCTION(x)
do i = 1 , N
  • call sub(i, xinc)
  • x = x + xinc
END DO
xinc is a separate new variable each iteration but result is accumulated into global x

HTML version of Scripted Foils prepared 26 September 1996

Foil 122 Spawning Tasks in HPF

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Task Parallelism is sort of supported in HPF but not clear to me that this is a great idea as better to keep sophisticated task parallelism outside HPF which is really only designed to support data parallelism
!HPF$ TASKING
  • !HPF$ ON HOME(p(1:8))
  • CALL foo(x,y)
  • !HPF$ ON HOME(p(9:16))
  • CALL bar(z)
!HPF$ END
This extends SPMD model with foo running on eight and bar on another processors
Note foo and bar are expected to contain data parallel statements which distribute execution using conventional HPF over 8 processors

HTML version of Scripted Foils prepared 26 September 1996

Foil 123 New Data Mapping Features in HPF 2.0 - I

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
!HPF DISTRIBUTE x( BLOCK(SHADOW = 1 ) )
  • is designed to help compiler by specifying that it should set up a guard ring each side of BLOCK in each processor. In example Guard ring has extent 1
  • If x has dimension 100 and 4 processors, we allocate storage for
  • x(1..26) in Processor 1
  • x(24..51) in Processor 2
  • x(49..76) in Processor 3
  • x(74..100) in Processor 4
!HPF DISTRIBUTE x( BLOCK( /26,24,24,26/ ) )
  • is designed to specify general BLOCK distribution with in above example
  • x(1..26) in Processor 1
  • x(26..50) in Processor 2
  • x(51..76) in Processor 3
  • x(77..100) in Processor 4
!HPF DISTRIBUTE x( INDIRECT(map_array) )
  • is designed to allow an arbitary user array to specify location of x(i)
  • the processor number in map_array(i) would be typically be calculated by your favorite load balancing routine!

HTML version of Scripted Foils prepared 26 September 1996

Foil 124 New Data Mapping Features in HPF 2.0 - II

From HPCC Software Technologies Fall 96 -- Overview and HPF Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 26 September 96. *
Full HTML Index
Distribution is now allowed to Processor Subsets with typical Syntax:
  • !HPF$ PROCESSORS procs(1:np)
  • !HPF$ PROCESSORS b(BLOCK) ONTO procs(1:np/2-1)
Distribution is allowed for Derived Types but can only be done at ONE level
  • TYPE bunch_of_meshes
    • REAL A1(100,100,100), A2(100,100,100)
    • !HPF$ DISTRIBUTE (BLOCK,CYCLIC,*) :: A1,A2
  • END TYPE
  • Another Example shows distribution outside TYPE definition
  • TYPE tree with each node having one hundred children
    • TYPE(tree) , POINTER, DIMENSION(100) :: children
    • !HPF$ DYNAMIC children
    • REAL value
  • END TYPE
  • TYPE(tree) actualtree
  • ALLOCATE ( actualtree%children) allocates POINTERs to children
  • !HPF$ REDISTRIBUTE t%children(BLOCK) distributes (for first time) these pointers

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Sun Feb 22 1998