Given by Geoffrey C. Fox at PAWS (Mandalay Beach) and PetaSoft (Bodega Bay) on April 23 and June 17-19, 1996. Foils prepared August 4, 1996
Outside Index
Summary of Material
Summary of the Application Working Group headed by Fox at the April 1996 Mandalay Beach PAWS Meeting
|
Summary of the PetaSoft Working Group headed by Fox and Chien from the June 1996 Bodega Bay Meeting
|
Based on eight presentations, we could make a clear assessment only on GRAPE. |
None of the designs is yet defined in sufficient quantitative detail to verify suitability on algorithms and applications of interest. |
Due to the greatly different applications proposed by the various designs, it is difficult to compare the systems. |
There are some omissions in the proposed point designs -- some other possible designs are worth studying. |
Aspects of the rapid march to Pflop/s systems need to be decoupled from long-term R&D in architectures, software and high-level programming models. |
Designs need to expose the low-level execution model to the programmer. |
Access to memory is the key design consideration for Pflop/s systems. |
Enhanced message passing facilities are needed, including support for latency management, visualization and performance tools.
|
Research is needed in novel algorithms that explore a wider range of the latency/bandwidth/memory space. |
Define or speculate on novel applications that may be enabled by Pflops technology. |
Engage several broader application communities to identify and quantify the merits of the various Pflop/s designs. |
Need to develop a "common" set of applications with which to drive and evaluate various hardware and software approaches
|
Superconducting distributed memory
|
Distributed shared memory
|
A more aggressive PIM design
|
Latency tolerant algorithms will be essential
|
Hardware mechanisms for hiding and/or tolerating latency. |
Language facilities (such as programmer-directed prefetching) for managing latency; see the sketch after this list. |
Need a "hierarchy little language" to describe latency structure |
Software tools, such as visualization facilities, for measuring and managing latency. |
Performance studies of distributed SMPs to study latency issues. |
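As an illustration of programmer-directed prefetching, here is a minimal Fortran sketch; the prefetch() call and the prefetch distance PFDIST are hypothetical (there is no such standard intrinsic), and the irregular access through idx is chosen to show why the programmer, not the compiler, must supply the addresses.
do i = 1, n
   if (i + PFDIST <= n) call prefetch(x(idx(i+PFDIST)))   ! hypothetical call: request the operand PFDIST iterations ahead
   y(i) = y(i) + a * x(idx(i))                            ! irregular gather whose memory latency is now overlapped with computation
enddo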
On some original MPPs (e.g. hypercubes) one worried about the nearest-neighbor structure of algorithms, so that the geometric structure of computers and problems could be matched. |
This became unnecessary, and all that matters on most machines today is data locality
|
The year-2005 machines seem different: machine geometry again appears relevant, especially in PIM designs, as physical transit time becomes important when clock speeds increase. |
Today we see the standard trade-off between managing a (shared) memory hierarchy and managing distributed memory
|
Most people believe that it is easier to manage hierarchy (in an "automatic" fashion as in a compiler) than it is to manage distributed memory |
PIM gives us a classic distributed memory system as the base design, whereas most other proposals (including the superconducting design) give a shared memory as the base design, with hierarchy as the key issue.
|
It is difficult to compare the various point designs, because the application targets of each individual proposal are so different.
|
Research is thus needed to analyze the suitability of these systems for a handful of common grand challenge problems. |
Result: a set of quantitative requirements for various classes of systems. |
We made a start on this with a set of four "Petaflop kernels" |
Observation on Parallelism |
Consider what level of parallelism will be required for the proposed superconducting system: |
It will have 1024 processors, each with a clock frequency that is some 4000 times faster than the operational speed of DRAM main memory. |
For main memory intensive applications, at least 4000 threads, such as a 4000-long vector of data, must be streaming through each CPU. |
This means that the minimum level of parallelism required is 1024 x 4000 = 4 million. |
Conclusion: One of the key advantages of a superconducting system will not be met unless large amounts of superconducting memory is included in the system, enough so that the fetches to main memory are relatively infrequent. |
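Written as a formula (simply restating the arithmetic above), the minimum concurrency is the processor count times the CPU-to-DRAM speed ratio:
minimum parallelism = N_processors x (CPU clock / DRAM speed) = 1024 x 4000, or about 4 x 10^6 operations in flight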
1. What is the Estimated cost of your system -- use the SIA estimate that semiconductor hardware will be 25 times cheaper per transistor in the 2006 time frame than it is in 1996. You may take $50 per Mbyte for 1996 memory and $5 per Mflop/s for 1996 sustained RISC processor performance (which roughly scales with transistor count), or else otherwise justify your estimate; a worked illustration of this scaling follows this list of questions. |
2. What are your Programming and Execution models
|
3. What are your Hardware specifications -- peak performance rates of base processors, sizes and structures of the various levels of the memory hierarchy (including caches), latency and bandwidth of communication links between each level, etc. |
4. What are your Synchronization mechanisms; hardware fault tolerance facilities. |
5. What are your Mass storage, I/O and visualization facilities. |
6. Comment on support for extended precision (i.e., 128-bit and/or higher precision integer and floating-point arithmetic) -- hardware or software. |
7. Which of the ten key federal grand challenge applications will the proposed system address (aeronautical design, climate modeling, weather forecasting, electromagnetic cross-sections, nuclear stockpile stewardship, astrophysical modeling, drug design, material science, cryptology, molecular biology).
|
8. Discuss your reliance on novel hardware and/or software technology, to be developed by the commercial sector.
|
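As an illustration of the cost scaling in question 1 (the 30 Tbyte memory size is an assumed figure for the example only, not part of the question): dividing the 1996 prices by 25 gives roughly $2 per Mbyte of memory and $0.20 per sustained Mflop/s in 2006. A system with 1 Pflop/s sustained (10^9 Mflop/s) and 30 Tbytes (3 x 10^7 Mbytes) of memory would then be estimated at about 10^9 x $0.20 + 3 x 10^7 x $2, i.e. roughly $200M + $60M = $260M.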
Quantitatively assess the performance of the proposed system on the following five computational problems, based on design parameters and execution model. Justify the analysis. |
a) Regular unit stride SAXPY Loop |
b) Large Stride Vector Fetch and Store |
c) Irregular Gather and Scatter |
d) Nearest Neighbor (local) Communication Algorithm |
e) Representation and Processing of Tree Structured Data |
In the first three kernels (a) (b) (c), n = 10^8; arrays are presumed appropriately dimensioned. |
a). SAXPY loop |
do i = 1, n
   y(i) = a * x(i) + y(i)   ! the standard SAXPY body
enddo |
b). Large-stride vector fetch and store |
do i = 1, n
   y((i-1)*lstride + 1) = x((i-1)*lstride + 1)   ! representative body (assumed): both the fetch and the store use a large stride lstride
enddo |
! This loop initializes the array idx with |
! pseudo-random numbers between 1 and n=10^8. |
do i = 1, n
   call random_number(r)                    ! r: a real scratch variable in [0,1)
   idx(i) = min(n, 1 + int(r * real(n)))    ! representative initialization (assumed form): pseudo-random index in [1, n]
enddo |
! Indexed loop: |
do i = 1, n
   y(i) = x(idx(i))   ! representative indexed gather; the corresponding scatter would be x(idx(i)) = y(i)
enddo |
This is a simple 5-deep nested loop over three spatial and two component indices, with nearest-neighbor spatial structure and "full" matrices for the components |
This problem is defined with two sets of Parameters |
1)Large Grid but not many components: |
nc = 5; nx = 1000; ny = 1000; nz = 1000 |
2)Smaller Grid but more components as if many species: Repeat the above calculation with the parameters |
nc = 150; nx = 100; ny = 100; nz = 100 |
do k = 1, nz |
do j = 1, ny |
do i = 1, nx |
do m = 1, nc |
do mp = 1, nc |
a(i,j,k,m) = u(mp,m) * (b(i+1,j,k,mp) + b(i-1,j,k,mp) + & |
b(i,j+1,k,mp) + b(i,j-1,k,mp) + b(i,j,k+1,mp) + & |
b(i,j,k-1,mp)) + a(i,j,k,m) |
enddo |
enddo |
enddo |
enddo |
enddo |
Consider a tree structure, such as |
        o
       / \
      o   o
     / \
    o   o
         \
          o
except with 10^4 nodes and an arbitrary structure, with one random integer at each node.
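A minimal Fortran sketch of one way to hold and process such a tree follows; the first-child/next-sibling representation, the explicit-stack traversal and the summing operation are all assumptions made for illustration, since the kernel prescribes neither a representation nor an operation (the tree is assumed to have been built already).
! Representation: first-child / next-sibling links allow arbitrary fan-out; node 1 is the root.
integer, parameter :: nnode = 10**4
integer :: val(nnode)                    ! the random integer stored at each node
integer :: child(nnode), sibling(nnode)  ! 0 means no child / no further sibling
integer :: stack(nnode), top, node, total
! Processing: depth-first traversal with an explicit stack, accumulating the node values.
total = 0
top = 1
stack(1) = 1
do while (top > 0)
   node = stack(top)
   top = top - 1
   total = total + val(node)
   if (sibling(node) > 0) then
      top = top + 1
      stack(top) = sibling(node)
   end if
   if (child(node) > 0) then
      top = top + 1
      stack(top) = child(node)
   end if
enddo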
|
The joint meeting of Application and Software Groups listed issues
|
Two classes of Problem
|
Need to prefetch irregular data |
Move regular and irregular data between levels of memory |
The Compiler User Interface needs hints and directives |
These need a "Hierarchy Little Language" to express memory hierarchy at all levels
|
Need (to investigate) Threads and Parallel Compilers for latency tolerance |
Need Latency Tolerant BLAS and higher level Capabilities
|
Performance should be monitored (with no software overhead) in hardware
|
Resource Management in presence of complex memory hierarchy |
Need to study separately the issues for
|
As always in year 2005 need latency-tolerant Operating Systems! |
Unclear message on need for Protection |
Need support for multithreading and sophisticated scheduling |
Scaling of Concurrent I/O ?
|
Message Passing
|
Need to be able to restart separate parts of the (large) hardware independently |
Need to support performance evaluation with different levels of memory hierarchy exposed |
Need Storage management and garbage collection
|
Need good job and resource management tools |
Checkpoint/restart only needed at natural synchronization points |
June 17-18 1996 |
Geoffrey Fox, Chair |
Andrew Chien, Vice-Chair |
Define a "clean" model for machine architecture
|
Define a low level "Program Execution Model" (PEM) which allows one to describe movement of information and computation in the machine
|
On top of low level PEM, one can build an hierarchical (layered) software model
|
One can program at each layer of the software and augment it by "escaping" to a lower level to improve performance
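As a small illustration of this layering (a sketch only: the routine names and the loop structure are invented; EXTRINSIC(HPF_LOCAL) is the HPF mechanism for dropping from the data-parallel layer to per-processor code):
real :: x(100000)
!HPF$ DISTRIBUTE x(BLOCK)
call relax_in_hpf(x)             ! program at the high (HPF) layer for most of the code
call fast_boundary_exchange(x)   ! "escape" to a lower layer for the performance-critical part

extrinsic(hpf_local) subroutine fast_boundary_exchange(x)
   real, dimension(:) :: x       ! inside, each processor sees only its local section
   ! ... explicit message passing and cache-aware code would go here ...
end subroutine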
|
MPI represents the structure of machines (such as the original Caltech Hypercube) with just two levels of memory |
The PEM, in contrast, addresses the intra-processor memory hierarchy as well as inter-processor data movement |
[Figure: 5 memory hierarchy levels in each processor, including the Level 1 and Level 2 caches, with data movement between the levels] |
One needs distributed and shared memory constructs in the PEM |
One should look at extending HPF directives to refer to memory hierarchy; a hypothetical sketch follows this list |
It is interesting to look at adding directives to high level software systems such as those based on objects |
One needs (performance) predictability in lowest level PEM
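As a hypothetical sketch of such a directive extension (the PLACE/LEVEL syntax below is invented for illustration and is not part of HPF; only the DISTRIBUTE line is real HPF):
real :: a(1000, 1000)
!HPF$ DISTRIBUTE a(BLOCK, BLOCK)     ! standard HPF: placement across processors (the inter-processor level)
!HPF$ PLACE a(:, 1:64) IN LEVEL(2)   ! invented extension: ask that this working set be kept in Level-2 (cache-sized) memory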
|
One needs layered software tools to match layered execution software
|
It is possible that support of existing software (teraApps) may not be the emphasis
(un)reliability of software which cannot have the extensive testing of a system deployed on many machines |
(un)portability of software which is targeted at these special machines |
Need for resilience to faults which are inevitable in such large machines |
Hardware implications of our software and machine models |
Software should be delivered at the same time as hardware! |
I/O |
Concurrency and Scaling |
June 19 1996 |
Geoffrey Fox, Chair |
Andrew Chien, Vice-Chair |
William Carlson, CCS, wwc@super.org |
Andrew Chien, UIUC, achien@cs.uiuc.edu |
Geoffrey Fox, Syracuse Univ, gcf@npac.syr.edu |
G.R. Gao, Delaware Univ., gao@cs.mcgill.ca |
Edwin Sha, Notre Dame, esha@cse.nd.edu |
Lennart Johnsson, johnsson@cs.uh.edu |
Carl Kesselman, Caltech, carl@compbio.caltech.edu |
Piyush Mehrotra, ICASE, pm@icase.edu |
David Padua, UIUC, padua@uiuc.edu |
Gail Pieper, pieper@mcs.anl.gov |
John Salmon, Caltech, johns@cacr.caltech.edu |
Vince Schuster, PGI, vinces@pgroup.com |
1)Deep Memory Hierarchies present New Challenges to High performance Implementation of programs
|
2)There are two dimensions of memory hierarchy management
|
3)One needs a machine "mode" which supports a predictable and controllable memory system, leading to communication and computation with the same characteristics
|
4)One needs a low level software layer which provides direct control of the machine (memory hierarchy etc.) by a user program
|
5)One needs a layered (hierarchical) software model which supports an efficient use of multiple levels of abstraction in a single program.
|
6)One needs a set of software tools which match the layered software (programming model)
|
This is not really a simple stack but a set of complex relations between layers with many interfaces and modules |
Interfaces are critical (for composition across layers)
|
The five layers of the software stack, from highest to lowest level: |
Application Specific Problem Solving Environment |
Coarse Grain Coordination Layer e.g. AVS |
Massively Parallel Modules -- such as DAGH HPF F77 C HPC++ |
Fortran or C plus generic message passing (get,put) and generic memory hierarchy and locality control |
Assembly Language plus specific (to architecture) data movement, shared memory and cache control |
The higher layers offer abstractions nearer to the application domain and are meant to enable the next 10000 users; the lower layers expose increasing machine detail, control and management and serve the first 100 pioneer users. |
7)One needs hardware systems which can be used by system software developers to support software prototyping using emulation and simulation of proposed petacomputers. This will allow the scaling of the software to be evaluated and will reduce risk
|
8)One needs a benchmark set of applications which can be used to evaluate the candidate petacomputer architectures and their software
|
9)One needs to analyse carefully both the benchmark and other petaApplications to derive requirements on the performance and capacity of essential subsystems (I/O, communication, computation, including inter-layer issues in the hierarchy) of a petacomputer |
10)The PetaComputer should be designed by a cross-disciplinary team including software, application and hardware expertise |
11)For the initial "pioneer" users, portability and tool/systems software robustness will be less important concerns than performance |
12)The broad set of users may require efficient support of current (by then legacy) programming interfaces such as MPI, HPF and HPC++ |
13)The petaComputer offers an opportunity to explore new software models unconstrained by legacy interfaces and codes |
14)Fault Resilience is essential in a large scale petaComputer |
1)Explore issues in design of petaComputer machine models which will support the controllable hierarchical memory systems in a range of important architectures
|
2)Explore techniques for control of memory hierarchy for petaComputer architectures
|
3)Explore issues in designing layered software architectures -- particularly efficient mapping and efficient interfaces to lower levels
|
4)Establish and use testbeds which support study "at scale" of system software with realistic applications |
5)Explore the opportunity to design new software models unconstrained by legacy interfaces |
6)Establish model (benchmark) applications for petaflop machines |
Global Address Space
|
Cheap Synchronization with, for instance, Tag Bits |
Exposed Mechanisms for keeping the processor utilized during long latency operations
|
To be continued |