Basic Foilset: Summary of Working Groups at PAWS and PetaSoft Meetings

Given by Geoffrey C. Fox at PAWS (Mandalay Beach) and PetaSoft (Bodega Bay) on April 23 and June 17-19, 1996. Foils prepared August 4 1996.


Summary of the Application Working Group headed by Fox at the April 96 Mandalay Beach PAWS Meeting
  • PAWS meeting focussed on evaluation of NSF Point Architecture studies on a year 2007 100 Teraflop machine
Summary of the PetaSoft Working Group headed by Fox and Chien from the June 96 Bodega Bay Meeting
  • PetaSoft Meeting focussed on Software Issues for PetaFlop Machines

Table of Contents for full HTML of Summary of Working Groups at PAWS and PetaSoft Meetings

1 Initial Findings of the Algorithm/Application Working Group -- PAW 96 April 23 1996
2 Overall Suggestions -- I
3 Overall Suggestions - II
4 Other Suggested Point Designs
5 Latency Research Is Needed
6 Geometric Structure of Problems and Computers
7 Memory Hierarchy versus Distribution
8 Needed Algorithm/Application Evaluations
9 Observation on Parallelism -- Burton Smith
10 Some Questions from the Application Group to the Point Designs - I
11 Some Questions from the Application Group to the Point Designs - II
12 Some Questions from the Application Group to the Point Designs - III
13 The Five PetaFlop Kernels - I
14 PetaFlop Kernel: a), b) Unit and Large Stride Vector Fetch and Store
15 PetaFlop Kernel: c) Irregular gather/scatter
16 PetaFlop Kernel: d) 3-D Jacobi kernel, typical of many 3-D physical modeling codes
17 PetaFlop Kernel: d) 3-D Jacobi kernel -- Actual Code
18 PetaFlop Kernel: e) Processing of tree-structured data
19 Application Oriented Software Issues -- April 24, 1996
20 Language Related Issues
21 Library and Tool Issues
22 Operating System Issues - I
23 Operating System Issues - II
24 "Initial" Findings of the "Implementation" Subgroup at PetaSoft 96
25 Initial Thoughts I
26 The MPI Program Execution Model
27 The PetaSoft Program Execution Model
28 Initial Thoughts II
29 Further Topics which were not discussed in detail
30 "Final" Findings and Recommendations of the "Implementation" Subgroup
31 Members of Implementation Group
32 Findings 1) and 2) -- Memory Hierarchy
33 Findings 3) and 4) -- Using Memory Hierarchy
34 Findings 5) and 6) -- Layered Software
35 The Layered Software Model
36 Some Examples of a Layered Software System
37 Finding 7) Testbeds
38 Findings 8) and 9) Applications
39 Findings 10) to 14) General Points
40 Recommendations 1) to 3) Memory and Software Hierarchy
41 Recommendations 4) to 6)
42 Requested Capabilities in Hardware Architecture

Foil 1 Initial Findings of the Algorithm/Application Working Group -- PAW 96 April 23 1996

Based on eight presentations, we could make a clear assessment only on GRAPE.
None of the designs is yet defined in sufficient quantitative detail to verify suitability on algorithms and applications of interest.
Due to the greatly different applications proposed by the various designs, it is difficult to compare the systems.
There are some omissions in the proposed point designs -- some other possible designs are worth studying.

Foil 2 Overall Suggestions -- I

Aspects of the rapid march to Pflop/s systems need to be decoupled from long-term R&D in architectures, software and high-level programming models.
Designs need to expose the low-level execution model to the programmer.
Access to memory is the key design consideration for Pflop/s systems.
Enhanced message passing facilities are needed, including support for latency management, visualization and performance tools.
  • Expect that initial Petaflop applications will be built using "brute force" and not use "seamless" high level tools
  • These brute-force experiments will guide high level tool design

Foil 3 Overall Suggestions - II

Research is needed in novel algorithms that explore a wider range of the latency/bandwidth/memory space.
Define or speculate on novel applications that may be enabled by Pflops technology.
Engage several broader application communities to identify and quantify the merits of the various Pflop/s designs.
Need to develop a "common" set of applications with which to drive and evaluate various hardware and software approaches
  • We made a start with five Petaflop kernels

Foil 4 Other Suggested Point Designs

Superconducting distributed memory
  • A conventional distributed memory design where each node consists of a superconducting CPU, superconducting memory and conventional DRAM memory.
Distributed shared memory
  • How far can the emerging distributed shared memory concept be extended? Can a Pflop/s system be constructed from one or a number of DSM systems?
A more aggressive PIM design
  • Essentially a general purpose GRAPE where we use half the silicon for as many "optimal" (250,000 transistors according to Kogge) CPU's as we can

Foil 5 Latency Research Is Needed

Latency tolerant algorithms will be essential
  • Today the most extreme latency issues are seen in networks of workstations and other such metacomputers.
Hardware mechanisms for hiding and/or tolerating latency.
Language facilities (such as programmer-directed prefetching) for managing latency.
Need a "hierarchy little language" to describe latency structure
Software tools, such as visualization facilities, for measuring and managing latency.
Performance studies of distributed SMPs to study latency issues.

Foil 6 Geometric Structure of Problems and Computers

On some original MPP's (e.g. hypercubes) one worried about the nearest neighbor structure of algorithms so that the geometric structure of computers and problems could be matched.
This became unnecessary, and all that matters on most machines today is data locality
  • e.g. on machines like the IBM SP-2, on-processor and off-processor memory access times are very different but there is no difference between the different processors once you go off CPU
The year 2005 machines seem different, as machine geometry again seems relevant, especially in PIM designs where physical transit time becomes important as clock speeds increase.

Foil 7 Memory Hierarchy versus Distribution

Today we see the standard trade-off between
  • Memory Hierarchy as in a Shared Memory SGI Power Challenge
  • Distributed Memory as in IBM SP-2
  • And indeed a merging of these ideas with Distributed Shared Memory
Most people believe that it is easier to manage hierarchy (in an "automatic" fashion as in a compiler) than it is to manage distributed memory
PIM gives us a classic distributed memory system as the base design, whereas most other proposals (including Superconducting) give a shared memory as the base design, with hierarchy as the issue.
  • Will increased memory hierarchy make its management so much harder that the trade-off will change?
  • Or does management of year 2005 distributed memory not get much harder than for the systems we are familiar with today?

Foil 8 Needed Algorithm/Application Evaluations

It is difficult to compare the various point designs, because the application targets of each individual proposal are so different.
  • Also these each have rather modest funding and can not be expected to satisfy all our requirements
Research is thus needed to analyze the suitability of these systems for a handful of common grand challenge problems.
Result: a set of quantitative requirements for various classes of systems.
We made a start on this with a set of five "Petaflop kernels"

Foil 9 Observation on Parallelism -- Burton Smith

Consider what level of parallelism will be required for the proposed superconducting system:
It will have 1024 processors, each with a clock frequency that is some 4000 times faster than the operational speed of DRAM main memory.
For main memory intensive applications, at least 4000 threads, such as a 4000-long vector of data, must be streaming through each CPU.
This means that the minimum level of parallelism required is 1024 x 4000 = 4 million.
Conclusion: One of the key advantages of a superconducting system will not be met unless large amounts of superconducting memory are included in the system, enough so that fetches to main memory are relatively infrequent.

Foil 10 Some Questions from the Application Group to the Point Designs - I

1. What is the Estimated cost of your system -- use the SIA estimate that semiconductor hardware will be 25 times cheaper per transistor in the 2006 time frame than it is in 1996. You may take $50 per Mbyte for 1996 memory and $5 per Mflop/s for 1996 sustained RISC processor performance (which roughly scales with transistor count), or else otherwise justify your estimate.
2. What are your Programming and Execution models
  • e.g. what are the targeted programming languages and what features of the machine will be exposed to the user.

Foil 11 Some Questions from the Application Group to the Point Designs - II

3. What are your Hardware specifications -- peak performance rates of base processors, sizes and structures of the various levels of the memory hierarchy (including caches), latency and bandwidth of communication links between each level, etc.
4. What are your Synchronization mechanisms; hardware fault tolerance facilities.
5. What are your Mass storage, I/O and visualization facilities.
6. Comment on support for extended precision (i.e., 128-bit and/or higher precision integer and floating-point arithmetic) -- hardware or software.
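One common software route to such precision -- our illustration, not prescribed by the foils -- is "double-double" arithmetic, representing a 128-bit value as an unevaluated sum of two 64-bit doubles. A minimal sketch of the addition operation, using the standard two-sum trick (it assumes IEEE arithmetic and a compiler that does not reassociate floating-point expressions):

subroutine dd_add(ahi, alo, bhi, blo, chi, clo)
  implicit none
  double precision, intent(in)  :: ahi, alo, bhi, blo
  double precision, intent(out) :: chi, clo
  double precision :: s, v, e
  s = ahi + bhi                      ! leading-order sum
  v = s - ahi
  e = (ahi - (s - v)) + (bhi - v)    ! exact rounding error of s
  e = e + alo + blo                  ! fold in the low-order words
  chi = s + e                        ! renormalize into (chi, clo)
  clo = e - (chi - s)
end subroutine dd_add

Multiplication needs a similar exact-product step (Dekker splitting or a fused multiply-add); hardware support would remove the roughly order-of-magnitude software overhead of this approach.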

Foil 12 Some Questions from the Application Group to the Point Designs - III

7. Which of the ten key federal grand challenge applications will the proposed system address (aeronautical design, climate modeling, weather forecasting, electromagnetic cross-sections, nuclear stockpile stewardship, astrophysical modeling, drug design, material science, cryptology, molecular biology).
  • How does this relate to the particular applications your point design is using in its study
  • What are the key features of your design which affect its utility in these application categories, and how will the point design study evaluate this
8. Discuss your reliance on novel hardware and/or software technology, to be developed by the commercial sector.
  • Are you evolutionary or revolutionary!

Foil 13 The Five PetaFlop Kernels - I

Quantitatively assess the performance of the proposed system on the following five computational problems, based on design parameters and execution model. Justify the analysis.
a) Regular unit stride SAXPY Loop
b) Large Stride Vector Fetch and Store
c) Irregular Gather and Scatter
d) Nearest Neighbor (local) Communication Algorithm
e) Representation and Processing of Tree Structured Data

Foil 14 PetaFlop Kernel: a), b) Unit and Large Stride Vector Fetch and Store

In the first three kernels (a) (b) (c), n = 10^8; arrays are presumed appropriately dimensioned.
a) SAXPY loop:

do i = 1, n
  c(i) = 3.141592653589793 * a(i) + b(i)
enddo

b) Large-stride vector fetch and store:

do i = 1, n
  b(131*i) = a(131313*i)
enddo

Foil 15 PetaFlop Kernel: c) Irregular gather/scatter

! This loop initializes the array idx with
! pseudo-random numbers between 1 and n = 10^8.
do i = 1, n
  idx(i) = 1 + mod (13131313*i, n)
enddo

! Indexed loop:
do i = 1, n
  b(idx(i)) = a(i)
enddo
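(Note: with n = 10^8 the product 13131313*i reaches about 1.3x10^15, so the index computation presumably assumes 64-bit integer arithmetic.)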

Foil 16 PetaFlop Kernel: d) 3-D Jacobi kernel, typical of many 3-D physical modeling codes

This is a simple 5-deep nested loop over three spatial and two component indices, with nearest neighbor spatial structure and "full" matrices for the components
This problem is defined with two sets of parameters
1) Large grid but not many components:
nc = 5; nx = 1000; ny = 1000; nz = 1000
2) Smaller grid but more components, as if many species. Repeat the above calculation with the parameters
nc = 150; nx = 100; ny = 100; nz = 100
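For scale (our count, not from the foils): the inner statement performs 7 floating point operations (5 adds to combine the six neighbor terms, one multiply by u(mp,m) and one accumulate), so a full sweep costs 7*nc^2*nx*ny*nz flops -- about 1.75x10^11 for parameter set 1) and about 1.6x10^11 for set 2). The two cases are comparable in arithmetic but stress the memory hierarchy very differently: set 1) streams 10^9-point grids while set 2) reuses a 150x150 component matrix at each grid point.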

Foil 17 PetaFlop Kernel: d) 3-D Jacobi kernel -- Actual Code

do k = 1, nz
  do j = 1, ny
    do i = 1, nx
      do m = 1, nc
        do mp = 1, nc
          a(i,j,k,m) = u(mp,m) * (b(i+1,j,k,mp) + b(i-1,j,k,mp) + &
                       b(i,j+1,k,mp) + b(i,j-1,k,mp) + b(i,j,k+1,mp) + &
                       b(i,j,k-1,mp)) + a(i,j,k,m)
        enddo
      enddo
    enddo
  enddo
enddo

Foil 18 PetaFlop Kernel: e) Processing of tree-structured data

Consider a tree structure, such as
      o
     / \
    o   o
   / \
  o   o
       \
        o
except with 10^4 nodes and an arbitrary structure, with one random integer at each node.
  • Is this tree a subtree of a similar tree of size 10^9 nodes?
  • Find the path to the node of the subtree in the large tree.
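The foils leave the data representation open. As one illustration (the names val, left, right and the index-0 convention for a missing child are invented here), a Fortran 90 test of whether the small tree occurs at a given node of the big tree might look like:

module tree_match
  implicit none
contains
  ! True if the small tree rooted at node s occurs, node for node, at
  ! big-tree node b (extra children in the big tree are allowed).
  recursive function match(s, b, sval, sleft, sright, bval, bleft, bright) result(ok)
    integer, intent(in) :: s, b
    integer, intent(in) :: sval(:), sleft(:), sright(:)
    integer, intent(in) :: bval(:), bleft(:), bright(:)
    logical :: ok
    if (s == 0) then                    ! empty small subtree matches anything
      ok = .true.
    else if (b == 0) then               ! big tree has no node here
      ok = .false.
    else if (sval(s) /= bval(b)) then   ! values must agree
      ok = .false.
    else                                ! recurse on both children
      ok = match(sleft(s), bleft(b), sval, sleft, sright, bval, bleft, bright) .and. &
           match(sright(s), bright(b), sval, sleft, sright, bval, bleft, bright)
    end if
  end function match
end module tree_match

A full kernel would apply match at each of the 10^9 candidate roots and record the path to a successful one. The logic is trivial; the point of the kernel is that every step is a dependent, irregular fetch into a 10^9-node structure -- exactly the latency behavior raised on Foil 5.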

Foil 19 Application Oriented Software Issues -- April 24, 1996

The joint meeting of Application and Software Groups listed issues
  • Language Issues including Hierarchy Little Language
  • Library and Tool Issues
  • Operating System Issues
Two classes of Problem
  • Classic SPMD are in some sense the hardest as they have most restrictive latency and bandwidth requirements
  • Multidisciplinary applications are a loosely coupled mix of SPMD programs
    • Loose coupling stresses functionality but not performance of the software

Foil 20 Language Related Issues

Need to prefetch irregular data
Move regular and irregular data between levels of memory
The Compiler User Interface needs hints and directives
These need a "Hierarchy Little Language" to express memory hierarchy at all levels
  • This will allow one to get portability by expressing each architecture as a parameterization of its memory structure, with such things as cache size etc.
  • Compare the register declaration in C; SCM and LCM on the CDC 7600; DISTRIBUTE in HPF
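No syntax was fixed at the meetings; purely as an illustration, in the spirit of HPF's !HPF$ sentinel, such directives might look as follows (the !HLL$ sentinel and the PLACE, PREFETCH, LEVEL and AHEAD keywords are all invented here):

subroutine update_hll(n, a, b)
  implicit none
  integer, intent(in) :: n
  real, intent(inout) :: a(n)
  real, intent(in)    :: b(n)
  integer :: i
!HLL$ PLACE a : LEVEL(2)        ! hypothetical: pin a in level-2 memory
!HLL$ PREFETCH b : AHEAD(64)    ! hypothetical: fetch b 64 iterations early
  do i = 1, n
    a(i) = a(i) + b(i)
  enddo
end subroutine update_hll

Because the directives are comments, the code stays legal Fortran everywhere; a hierarchy-aware compiler would interpret them against each architecture's memory parameterization, which is the portability mechanism described above.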

Foil 21 Library and Tool Issues

Need (to investigate) Threads and Parallel Compilers for latency tolerance
Need Latency Tolerant BLAS and higher level Capabilities
  • FFT, Linear Algebra, Adaptive Mesh, Collective Data Movement
Performance should be monitored (with no software overhead) in hardware
  • Need to incorporate the "Hierarchy Little Language":
  • "Pablo"-like software gather and visualization
Resource Management in presence of complex memory hierarchy

Foil 22 Operating System Issues - I

Need to study separately the issues for
  • "Micromemory" -- as little as 1 megabyte of memory per processor in PIM
  • General Hierarchical and Heterogeneous (distributed shared memory) Systems
As always in year 2005 need latency-tolerant Operating Systems!
Unclear message on need for Protection
Need support for multithreading and sophisticated scheduling
Scaling of Concurrent I/O?
  • Need to understand this at the teraflop performance level first!
  • Can one use COMA (Cache Only Memory) ideas for I/O?

Foil 23 Operating System Issues - II

Message Passing
  • Clearly need buffer management, flow control, protocols etc.
  • But what are the appropriate mechanisms -- need to revisit the comparison of active messages versus MPI etc.
Need to be able to restart separate parts of the (large) hardware independently
Need to support performance evaluation with different levels of memory hierarchy exposed
Need Storage management and garbage collection
  • PIM micromemory different from hierarchical shared memory
Need good job and resource management tools
Checkpoint/restart only needed at natural synchronization points

Foil 24 "Initial" Findings of the "Implementation" Subgroup at PetaSoft 96

June 17-18 1996
Geoffrey Fox, Chair
Andrew Chien, Vice-Chair

Foil 25 Initial Thoughts I

Define a "clean" model for machine architecture
  • Memory hierarchy including caches and geometrical (distributed) effects
Define a low level "Program Execution Model" (PEM) which allows one to describe movement of information and computation in the machine
  • This can be thought of as "MPI"/assembly language of the machine
On top of low level PEM, one can build an hierarchical (layered) software model
  • At the top of this layered software model, one finds objects or Problem Solving Environments (PSE's)
  • At an intermediate level there is Parallel C C++ or Fortran
One can program at each layer of the software and augment it by "escaping" to a lower level to improve performance
  • Directives (HPF assertions) and explicit insertion of lower level code (HPF extrinsics) are possible
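A schematic example of this escape mechanism using real HPF constructs (array name and size invented, extrinsic interface abridged): the main program stays at the high data-parallel layer, while local_scale drops one layer and runs as ordinary Fortran on each processor's local block.

program layers
  real :: x(1000000)
!HPF$ distribute x(block)
  interface
    extrinsic(hpf_local) subroutine local_scale(x, alpha)
      real, dimension(:), intent(inout) :: x
!HPF$   distribute x(block)
      real, intent(in) :: alpha
    end subroutine local_scale
  end interface
  x = 1.0                    ! high-level, global data-parallel statement
  call local_scale(x, 2.0)   ! escape one layer: runs per node on local data
end program layers

The body of local_scale is ordinary Fortran over the node-local block and could itself escape further, e.g. to assembly or explicit cache control, giving the layered picture of Foils 35 and 36.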

Foil 26 The MPI Program Execution Model

MPI represents the structure of machines (such as the original Caltech Hypercube) with just two levels of memory
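As a concrete reminder of that two-level model, here is a minimal MPI Fortran sketch (standard MPI-1 calls; the ring shift is purely illustrative): each process computes freely on its own local memory, and the only way to touch another processor's memory is an explicit message.

program two_level
  implicit none
  include 'mpif.h'
  integer :: rank, nproc, ierr, i, status(MPI_STATUS_SIZE)
  real :: local(1000), halo(1000)
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  ! Level 1: ordinary loads and stores in this processor's own memory
  do i = 1, 1000
    local(i) = real(rank)
  enddo
  ! Level 2: remote memory is reached only by an explicit message --
  ! here each process passes its array one step around a ring
  call MPI_SENDRECV(local, 1000, MPI_REAL, mod(rank+1, nproc), 0, &
                    halo, 1000, MPI_REAL, mod(rank+nproc-1, nproc), 0, &
                    MPI_COMM_WORLD, status, ierr)
  call MPI_FINALIZE(ierr)
end program two_level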

Foil 27 The PetaSoft Program Execution Model

This addresses the intra-processor memory hierarchy as well as inter-processor data movement
(Figure: schematic of the PetaSoft execution model, showing 5 memory hierarchy levels in each processor, including level 1 and level 2 caches, with explicit data movement between levels and between processors.)

Foil 28 Initial Thoughts II

One needs distributed and shared memory constructs in the PEM
One should look at extending HPF directives to refer to memory hierarchy
It is interesting to look at adding directives to high level software systems such as those based on objects
One needs (performance) predictability in lowest level PEM
  • User control must be possible for any significant caches
  • Note that as one goes to higher layers in the software model, usability increases and predictability decreases
One needs layered software tools to match layered execution software
  • Performance Monitoring
  • Load Balancing -- this should be under user control -- i.e. in the runtime and not the O/S
  • Debugging
It is possible that support of existing software (teraApps) may not be an emphasis

Foil 29 Further Topics which were not discussed in detail

(un)reliability of software which cannot have the extensive testing of a system deployed on many machines
(un)portability of software which is targeted at these special machines
Need for resilience to faults which are inevitable in such large machines
Hardware implications of our software and machine models
Software should be delivered at the same time as hardware!
I/O
Concurrency and Scaling

Foil 30 "Final" Findings and Recommendations of the "Implementation" Subgroup

June 19 1996
Geoffrey Fox, Chair
Andrew Chien, Vice-Chair

Foil 31 Members of Implementation Group

William Carlson, CCS, wwc@super.org
Andrew Chien, UIUC, achien@cs.uiuc.edu
Geoffrey Fox, Syracuse Univ, gcf@npac.syr.edu
G.R. Gao, Delaware Univ., gao@cs.mcgill.ca
Edwin Sha, Notre Dame, esha@cse.nd.edu
Lennart Johnsson, johnsson@cs.uh.edu
Carl Kesselman, Caltech, carl@compbio.caltech.edu
Piyush Mehrotra, ICASE, pm@icase.edu
David Padua, UIUC, padua@uiuc.edu
Gail Pieper, pieper@mcs.anl.gov
John Salmon, Caltech, johns@cacr.caltech.edu
Vince Schuster, PGI, vinces@pgroup.com

Foil 32 Findings 1) and 2) -- Memory Hierarchy

1) Deep Memory Hierarchies present New Challenges to High performance Implementation of programs
  • Latency
  • Bandwidth
  • Capacity
2) There are two dimensions of memory hierarchy management
  • Geometric or Global Structure
  • Local (cache) hierarchies seen from a thread- or processor-centric view

Foil 33 Findings 3) and 4) -- Using Memory Hierarchy

3) One needs a machine "mode" which supports a predictable and controllable memory system, leading to communication and computation with the same characteristics
  • Allow Compiler optimization
  • Allow Programmer control and optimization
  • For instance high performance would often need full program control of caches
4) One needs a low level software layer which provides direct control of the machine (memory hierarchy etc.) by a user program
  • This is for initial users and program tuning

Foil 34 Findings 5) and 6) -- Layered Software

5) One needs a layered (hierarchical) software model which supports an efficient use of multiple levels of abstraction in a single program.
  • Higher levels of the programming model hide extraneous complexity
  • The highest layers are application dependent Problem Solving Environments and the lower levels are machine dependent
  • Lower levels can be accessed for additional performance
  • e.g. HPF Extrinsics, gcc asm, MATLAB Fortran routines, native classes in Java
6) One needs a set of software tools which match the layered software (programming model)
  • Debuggers, Performance and load balancing tools

Foil 35 The Layered Software Model

This is not really a simple stack but a set of complex relations between layers with many interfaces and modules
Interfaces are critical (for composition across layers)
  • Enable control and performance for application scientists
  • Decouple CS system issues and allow exploration and innovation
(Figure: the layered software stack -- higher level abstractions, nearer to the application domain, enable the next 10000 users, while increasing machine detail, control and management serves the first 100 pioneer users.)

Foil 36 Some Examples of a Layered Software System

The layers, from high level to low:
  • Application Specific Problem Solving Environment
  • Coarse Grain Coordination Layer e.g. AVS
  • Massively Parallel Modules -- such as DAGH HPF F77 C HPC++
  • Fortran or C plus generic message passing (get, put) and generic memory hierarchy and locality control
  • Assembly Language plus specific (to architecture) data movement, shared memory and cache control
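The "generic message passing (get, put)" layer was given no API at the meeting. Purely as a hypothetical sketch, in the spirit of Cray's one-sided SHMEM get/put (generic_get and generic_put are invented names, and remote addressing details are omitted), a node-local kernel at this layer might stage remote data explicitly:

subroutine stage_and_add(n, pe, local_c)
  implicit none
  integer, intent(in) :: n, pe
  real, intent(inout) :: local_c(n)
  real :: buf(n)     ! scratch buffer in fast local memory
  integer :: i
  ! hypothetical one-sided fetch of n reals from processor pe;
  ! the remote processor takes no action
  call generic_get(buf, n, pe)
  do i = 1, n
    local_c(i) = local_c(i) + buf(i)
  enddo
  ! hypothetical one-sided store of the result back to processor pe
  call generic_put(local_c, n, pe)
end subroutine stage_and_add

The essential property of this layer is that data movement is explicit and one-sided, which is what makes it a plausible compilation target for the memory hierarchy control of the layers above.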

Foil 37 Finding 7) Testbeds

7) One needs hardware systems which can be used by system software developers to support software prototyping, using emulation and simulation of proposed petacomputers. This will evaluate scaling of the software and reduce risk
  • These systems must be available early so that working software is delivered at the same time as the deployed hardware
  • This prototyping machine must allow one to change all parts of the operating system, and so it will be hard to share the machine during prototyping

Foil 38 Findings 8) and 9) Applications

8) One needs a benchmark set of applications which can be used to evaluate the candidate petacomputer architectures and their software
  • These applications should capture key features (size, adaptivity etc.) of proposed petaApplications
9) One needs to analyse carefully both the benchmark and other petaApplications to derive requirements on the performance and capacity of essential subsystems (I/O, communication, computation including interlayer issues in the hierarchy) of a petacomputer

Foil 39 Findings 10) to 14) General Points

10) The petacomputer should be designed by a cross-disciplinary team including software, application and hardware expertise
11) For the initial "pioneer" users, portability and tool/systems software robustness will be less important concerns than performance
12) The broad set of users may require efficient support of current (by then legacy) programming interfaces such as MPI, HPF and HPC++
13) The petacomputer offers an opportunity to explore new software models unconstrained by legacy interfaces and codes
14) Fault Resilience is essential in a large scale petacomputer

Foil 40 Recommendations 1) to 3) Memory and Software Hierarchy

1) Explore issues in the design of petacomputer machine models which will support controllable hierarchical memory systems in a range of important architectures
  • Research and development in the areas of findings 3) and 4)
2) Explore techniques for control of the memory hierarchy for petacomputer architectures
  • Use testbeds
3) Explore issues in designing layered software architectures -- particularly efficient mapping and efficient interfaces to lower levels
  • Use the context of petaflop applications and machines
  • e.g. HPF is a possible layer while HPF Extrinsics is an interface to a lower (MPI) layer

Foil 41 Recommendations 4) to 6)

4) Establish and use testbeds which support study "at scale" of system software with realistic applications
5) Explore the opportunity to design new software models unconstrained by legacy interfaces
6) Establish model (benchmark) applications for petaflop machines

Foil 42 Requested Capabilities in Hardware Architecture

Global Address Space
  • Can name all levels of hierarchy -- accessible from external subsystem
  • Cache coherence is not required
  • Varied granularity of access required
Cheap Synchronization with, for instance, Tag Bits
Exposed Mechanisms for keeping the processor utilized during long latency operations
  • Prefetching
  • Fast context switching
  • Thread support
To be continued

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
