Full HTML for Basic Foilset: DoD HPF Training -- 1. Introduction to HPF

Given by Chuck Koelbel (Rice University) at DoD Training and other venues, 1995-98. Foils prepared August 7, 1998
Summary of Material


First in a series by Chuck Koelbel on HPF
HPF and its performance
Types of Parallel Computers
Types of Applications
Data Parallelism and Message Passing
Why Use Compilers

Table of Contents for full HTML of DoD HPF Training -- 1. Introduction to HPF


1 "High Performance Fortran in Practice" tutorial 1. Introduction

2 Support
3 Thanks to
4 High Performance Fortran Background
5 HPF Features
6 Performance of HPF
7 Recent Performance Results -- Princeton Ocean Model
8 Recent Performance Results -- NAS Parallel Benchmarks
9 For More Information
10 I. Intro. to Data-Parallelism -- Outline

11 Parallel Machines
12 Distributed Memory Machines
13 Shared-Memory Machines
14 Distributed Shared Memory Machines
15 Parallel Algorithms
16 Data Parallelism in Algorithms
17 Functional Parallelism in Algorithms
18 Parallel Languages
19 Data-Parallel Languages
20 Data-Parallel Languages, cont.
21 Message-Passing Systems
22 How HPF Is Implemented
23 What Compilation Means for Programmers




HTML version of Basic Foils prepared August 7, 1998

Foil 1 "High Performance Fortran in Practice" tutorial 1. Introduction
Presented at SIAM Parallel Processing San Francisco, Feb. 14, 1995
Presented at Supercomputer '95, Mannheim, Germany June 27, 1995
Presented at Supercomputing '95, San Diego, December 4, 1995
Presented at University of Tennessee (short form), Knoxville, March 21, 1996
Presented at Metacenter Regional Alliances, Cornell, May 6, 1996
Presented at Summer of HPF Workshop, Vienna, July 1, 1996
Presented at Institute for Mathematics & its Applications, Minneapolis, September 11-13, 1996
Presented at Corps of Engineers Waterways Experiments Station, Vicksburg, MS, October 30-November 1, 1996
Presented at Supercomputing '96, Pittsburgh, PA, November 17, 1996
Presented at NAVO, Stennis Space Center, MS, Feb 13, 1997
Presented at HPF Users Group (short version), Santa Fe, NM, February 23, 1997
Presented at ASC, Wright-Patterson Air Force Base, OH, March 5, 1997
Parts presented at SC'97, November 17, 1997
Parts presented (slideshow mode) at SC'97, November 15-21, 1997
Presented at DOD HPC Users Group, June 1, 1998

From DoD HPF Training -- 1. Introduction to HPF, DoD Training and Others, 1995-98.
Charles Koelbel

Foil 2 Support
Supported in part by
  • National Science Foundation
  • DoD HPCMP CEWES MSRC, Prime Contract DAHC94-96-C-0002
  • DoD HPCMP ASC MSRC, Prime Contract DAHC94-96-C-0003
As always
  • Views, opinions, and/or findings contained in this report are those of the author and should not be construed as an official Department of Defense position, policy, or decision unless so designated by other official documentation.

Foil 3 Thanks to
The High Performance Fortran Forum
The D System Group (Rice University)
The Fortran 90D Group (Syracuse University)
Ken Kennedy (Rice)
David Loveman (DEC)
Piyush Mehrotra (ICASE)
Rob Schreiber (HP Labs)
Guy Steele (Sun)
Mary Zosel (Livermore)

Foil 4 High Performance Fortran Background
Defined by the High Performance Fortran Forum (HPFF) as a portable language for data-parallel computation
History:
  • Proposed at Supercomputing '91 and HPFF Kickoff Meeting
  • Meetings every 6 weeks during 1992 in Dallas
  • Final draft of HPF, version 1.0 in June 1993
  • New meetings in 1994 and 1995 to make corrections, define further requirements
Influences:
  • Industry: CM Fortran, C*, MasPar Fortran, DEC
  • Academia: ADAPT, Fortran D, Kali, Vienna Fortran, CRPC
  • Government: Applications experience, $$$

Foil 5 HPF Features
All of Fortran 90
FORALL and INDEPENDENT
Data Alignment and Distribution
Miscellaneous Support Operations
But:
  • No parallel I/O
  • No explicit message-passing
  • Little support for irregular computations
  • Little support for non-data parallelism
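As a sketch of how these features fit together (the array names and sizes below are invented for illustration, not taken from the tutorial), the core HPF additions look like this:

```fortran
! Sketch only: array names and sizes are invented for illustration.
REAL, DIMENSION(1000,1000) :: A, B

! Data alignment and distribution directives
!HPF$ DISTRIBUTE A(BLOCK,BLOCK)
!HPF$ ALIGN B(I,J) WITH A(I,J)

! FORALL: element-wise, data-parallel assignment
! (Fortran 90 array syntax such as A = A + B is also parallel)
FORALL (I = 2:999, J = 2:999)
   A(I,J) = 0.25 * (B(I-1,J) + B(I+1,J) + B(I,J-1) + B(I,J+1))
END FORALL

! INDEPENDENT: asserts that the loop's iterations may run in parallel
!HPF$ INDEPENDENT
DO I = 1, 1000
   B(I,I) = 2.0 * A(I,I)
END DO
```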

Foil 6 Performance of HPF
Performance is highly dependent on the compiler and on the nature of the code
  • Best performance is for data-parallel computations
  • Tie "natural" parallelism in the problem to distributed array dimensions
Commercial compilers are now competitive with MPI for regular problems
  • Subscripts like A(L+1,K-1)
Research continues on irregular problems and task parallelism
  • Subscripts like A(INDX(I1,2))
  • Tree searches with branch-and-bound cutoffs

Foil 7 Recent Performance Results -- Princeton Ocean Model
A full application for ocean modeling
  • Same source code on Cray T3E, SGI Origin 2000, and Compaq Pro 8000
  • 87×57×19 grid with (*,BLOCK,BLOCK) distribution
  • Portland Group PGHPF compiler, versions 2.2 and 2.3
  • Contact: Steve Piacsek (piacsek@nrlssc.navy.mil)

Foil 8 Recent Performance Results -- NAS Parallel Benchmarks
Well-known compact application benchmarks from NASA
  • Modeling CFD applications
  • Class A (small) and B (large) input data
  • BT, EP, SP benchmarks written purely in HPF
  • FT benchmark calls EXTRINSIC FFT

Foil 9 For More Information
World Wide Web
These slides
Mailing Lists:
  • Write to: majordomo@cs.rice.edu
  • In message body: subscribe <list-name> <your-id>
  • Lists: hpff, hpff-interpret, hpff-core
Anonymous FTP:
  • Connect to titan.cs.rice.edu
  • Full draft in public/HPFF/draft
  • See public/HPFF/README for latest file list

Foil 10 I. Intro. to Data-Parallelism -- Outline
1. Introduction to Data-Parallelism
2. Fortran 90/95 Features
3. HPF Parallel Features
4. HPF Data Mapping Features
5. Parallel Programming in HPF
6. HPF Version 2.0

Foil 11 Parallel Machines
Parallel computers allow several CPUs to contribute to a computation simultaneously.
For our purposes, a parallel computer has three types of parts:
  • Processors
  • Memory modules
  • Communication / synchronization network
Key points:
  • All processors must be busy for peak speed.
  • Local memory is directly connected to each processor.
  • Accessing local memory is much faster than other memory.
  • Synchronization is expensive, but necessary for correctness.

Foil 12 Distributed Memory Machines
Every processor has a memory others can't access.
Advantages:
  • Can be scalable
  • Can hide latency of communication
Disadvantages:
  • Hard to program
  • Program and O/S (and sometimes data) must be replicated

Foil 13 Shared-Memory Machines
All processors access the same memory.
Advantages:
  • Easy to program (correctly)
  • Can share code and data among processors
Disadvantages:
  • Hard to program (optimally)
  • Not scalable due to bandwidth limitations

Foil 14 Distributed Shared Memory Machines
Combining the advantages of shared and distributed memory.
Lots of hierarchical designs are appearing.
  • Typically, "nodes" with 4 to 32 processors (see below)
  • Each processor has a local cache
  • Processors within a node access shared memory
  • Nodes can get data from or put data to other nodes' memories

Foil 15 Parallel Algorithms
A parallel algorithm is a collection of tasks and a partial ordering between them.
Design goals:
  • Match tasks to the available processors (exploit parallelism).
  • Minimize ordering (avoid unnecessary synchronizations).
  • Utilize remaining ordering (exploit locality).
Sources of parallelism:
  • Data parallelism: updating array elements simultaneously.
  • Functional parallelism: conceptually different tasks which combine to solve the problem.
  • Speculative parallelism: temporarily ignore partial ordering, repair later if this causes problems.
  • This list is not exhaustive

Foil 16 Data Parallelism in Algorithms
Data-parallel algorithms exploit the parallelism inherent in many large data structures.
  • Many elements in structure -> many update operations
  • Example: Red-Black Relaxation
    • All red points can be updated in parallel
Analysis:
  • Scalable parallelism
  • Synchronization and locality may be suspect
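The red-black sweep above maps directly onto HPF's masked FORALL (a hedged sketch: the grid size N and the distribution are assumed here, not taken from the foil). All points selected by a mask are updated in parallel:

```fortran
! Red-black relaxation sketch; N, U, and the distribution are assumed.
INTEGER, PARAMETER :: N = 512
REAL :: U(N,N)
!HPF$ DISTRIBUTE U(BLOCK,BLOCK)

! Sweep 1: all red points (I+J even) update in parallel
FORALL (I = 2:N-1, J = 2:N-1, MOD(I+J,2) == 0)
   U(I,J) = 0.25 * (U(I-1,J) + U(I+1,J) + U(I,J-1) + U(I,J+1))
END FORALL

! Sweep 2: all black points (I+J odd), using the new red values
FORALL (I = 2:N-1, J = 2:N-1, MOD(I+J,2) == 1)
   U(I,J) = 0.25 * (U(I-1,J) + U(I+1,J) + U(I,J-1) + U(I,J+1))
END FORALL
```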

Foil 17 Functional Parallelism in Algorithms
Functional parallelism exploits the parallelism between the parts of many systems.
  • Many pieces to work on -> many independent operations
  • Example: Aeroelasticity (wing design)
    • CFD and CSM (and others, if need be) can be evaluated in parallel
Analysis:
  • Parallelism limited
  • Synchronization probably good

Foil 18 Parallel Languages
A parallel language provides an executable notation for implementing a parallel algorithm.
Design criteria:
  • How are parallel operations defined?
    • static tasks vs. dynamic tasks vs. implicit operations
  • How is data shared between tasks?
    • explicit communication/synchronization vs. shared memory
  • How is the language implemented?
    • low-overhead runtime systems vs. optimizing compilers
Usually a language reflects a particular type of parallelism.

Foil 19 Data-Parallel Languages
Data-parallel languages provide an abstract, machine-independent model of parallelism.
  • Fine-grain parallel operations, such as element-wise operations on arrays
  • Shared data in large, global arrays with mapping "hints"
  • Implicit synchronization between operations
  • Partially explicit communication from operation definitions
Advantages:
  • Global operations conceptually simple
  • Easy to program (particularly for certain scientific applications)
Disadvantages:
  • Unproven compilers

Foil 20 Data-Parallel Languages, cont.
Abstractions like data parallelism split the work between the programmer and the compiler.
Programmer's task: Solve the problem in this model.
  • Concentrate on high-level structure and concepts
  • Aggregate operations on large data structures
  • Data in global arrays with mapping information
Compiler's task: Map conceptual (massive) parallelism to physical (finite) machine.
  • Fill in the grungy details
  • Guided by user "hints" like mapping information
  • Optimizing computation and communication

Foil 21 Message-Passing Systems
Program is based on relatively coarse-grain tasks
Separate address space and a processor number for each task
Data shared by explicit messages
  • Point-to-point and collective communications patterns
Examples: MPI, PVM, Occam
Advantages:
  • Close to hardware.
Disadvantages:
  • Many low-level details.
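For contrast with the data-parallel style, a minimal exchange in MPI's Fortran binding might look like the sketch below (not taken from the tutorial; note how buffers, ranks, tags, and matching sends and receives are all the programmer's responsibility):

```fortran
PROGRAM EXCHANGE
! Minimal point-to-point message sketch using MPI's Fortran binding.
INCLUDE 'mpif.h'
INTEGER :: RANK, IERR, STATUS(MPI_STATUS_SIZE)
REAL :: BUF(100)

CALL MPI_INIT(IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
IF (RANK == 0) THEN
   BUF = 1.0
   ! Explicit send: buffer, count, type, destination, tag
   CALL MPI_SEND(BUF, 100, MPI_REAL, 1, 0, MPI_COMM_WORLD, IERR)
ELSE IF (RANK == 1) THEN
   ! Matching receive, with a status object to inspect
   CALL MPI_RECV(BUF, 100, MPI_REAL, 0, 0, MPI_COMM_WORLD, STATUS, IERR)
END IF
CALL MPI_FINALIZE(IERR)
END PROGRAM EXCHANGE
```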

Foil 22 How HPF Is Implemented
Analysis
  • Traditional dataflow and dependence analysis
  • Data mapping analysis
Computation Partitioning
  • Use data mapping info to create locality
  • Transform code to enhance locality
Communication Introduction
  • Communicate (move data) if data mapping and computation partitioning don't agree
  • Lots of optimization to minimize/package communication
Code Generation
  • The really ugly details
  • Lots of optimization possible here, too

Foil 23 What Compilation Means for Programmers
Help analysis with assertions
  • ALIGN and DISTRIBUTE
  • INDEPENDENT
Distribute array dimensions that exhibit parallelism
  • Conflicts require complex compilers, REDISTRIBUTE, or new algorithms
Consider communications patterns
  • BLOCK generally good for local stencils and fully-filled arrays
  • CYCLIC and CYCLIC(K) generally good for load balancing and triangular loops
Don't hide what you are doing
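The distribution advice above might be applied as in this sketch (array names, sizes, and loop bodies are invented for illustration):

```fortran
! Sketch: choosing distributions to match communication patterns.
REAL, DIMENSION(1000,1000) :: A, T
!HPF$ DISTRIBUTE A(*,BLOCK)    ! BLOCK: neighbors stay local for stencils
!HPF$ DISTRIBUTE T(*,CYCLIC)   ! CYCLIC: balances triangular workloads

! Local stencil on a fully-filled array: BLOCK keeps most accesses local
FORALL (I = 2:999, J = 2:999)
   A(I,J) = 0.25 * (A(I-1,J) + A(I+1,J) + A(I,J-1) + A(I,J+1))
END FORALL

! Triangular loop: column J does work proportional to J, so CYCLIC
! spreads the load evenly across processors
!HPF$ INDEPENDENT
DO J = 2, 1000
   T(1:J-1,J) = T(1:J-1,J) / T(J,J)
END DO
```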

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
