Full HTML for Scripted Foilset "Variety of Foils Used Starting January 97"

Given by Geoffrey C. Fox in 1997. Foils prepared 26 January 97
Outside Index Summary of Material


This uses material from Paul Smith and Peter Kogge as well as Fox
We describe the "National PetaFlop Study(s)" and what you can expect with or without a specific initiative
We discuss traditional, Processor in Memory, Superconducting, Special Purpose architectures as well as future Quantum Computers!
We survey possible applications, new needs and opportunities for software as well as the technologies and designs for new machines one can expect in the year 2007!
We review findings of studies and structure of a possible initiative

Table of Contents for full HTML of Variety of Foils Used Starting January 97

1 Remarks on Petaflop Technology and National Program
See: http://www.npac.syr.edu/users/gcf/petaflopjan97

2 Abstract of PetaFlop Presentation Jan 97
3 NSTC/Committee on Computing, Information & Communication Meeting PetaFLOPS Workshops Results September 19, 1996 Dr. Paul H. Smith Department of Energy
4 Contents
5 I. Workshop series... background: PetaFLOPS workshops.
6 I. Workshop series... background: Community & Sponsoring Agencies
7 I. Workshop series... background: Workshops Purposes
8 I. Workshop series... background: Coordinating Chairs
9 I. Workshop series... background: Coordinating Chairs (continued)
10 Peak Supercomputer Performance
11 Some Important Trends -- COTS is King!
12 Comments on COTS for Hardware
13 Returning to Today - I
14 Returning to Today - II
15 Overall Remarks on the March to PetaFlops - I
16 Overall Remarks on the March to PetaFlops - II
17 II. Major Findings & Recommendations: Findings.
18 II. Major Findings & Recommendations: Findings.
19 II. Major Findings & Recommendations: Recommendations
20 II. Major Findings & Recommendations: Recommendations
21 PP Presentation
22 III. Key drivers for advanced computational capabilities beyond HPCC. Why PetaFLOPS??
23 III. Key drivers: The State of the Art
24 III. Key drivers: The Need for PetaFLOPS Computing
25 10 Possible PetaFlop Applications
26 Petaflop Performance for Flow in Porous Media?
27 Target Flow in Porous Media Problem (Glimm - Petaflop Workshop)
28 NASA's Projection of Memory and Computational Requirements up to Petaflops for Aerospace Applications
29 Bodega Bay: Primary Memory
30 Bodega Bay: Secondary Memory
31 Bodega Bay: Aggregate I/O
32 III. Key drivers: Technological Limitations
33 Chip Density Projections to year 2013
34 Clock Speed and I/O Speed in megabytes/sec per pin through year 2013
35 Supercomputer Architectures in Years 2005-2010 -- I
36 Supercomputer Architectures in Years 2005-2010 -- II
37 Supercomputer Architectures in Years 2005-2010 -- III
38 Performance Per Transistor
39 Comparison of Supercomputer Architectures
40 Three Major Markets -- Logic,ASIC,DRAM
41 Chip and Package Characteristics
42 Fabrication Characteristics
43 Electrical Design and Test Metrics
44 Technologies for High Performance Computers
45 Architectures for High Performance Computers - I
46 Architectures for High Performance Computers - II
47 There is no Best Machine!
48 Quantum Computing - I
49 Quantum Computing - II
50 Quantum Computing - III
51 Superconducting Technology -- Past
52 Superconducting Technology -- Present
53 Superconducting Technology -- Problems
54 IV. Architecture Point Designs & SW Design Studies: Point Design Study:
55 IV. Architecture Point Designs & SW Design Studies: 1996 Point Design Awards
56 The 8 NSF Point Designs
57 Architecture of MORPH NSF Petaflop Point Study
58 Architecture of I-ACOMA Petaflop Point Study
59 Some MetaComputer Systems
60 The GRAPE N-Body Machine
61 GRAPE architecture in NSF Petaflop Point Study
62 GRAPE Processing Unit in NSF Petaflop Point Study
63 Why isn't GRAPE a Perfect Solution?
64 Current PIM Chips
65 New "Strawman" PIM Processing Node Macro
66 "Strawman" Chip Floorplan
67 SIA-Based PIM Chip Projections
68 Superconducting Architecture in NSF Petaflop Point Study
69 IV. Architecture Point Designs & SW Design Studies: Architectural Framework
70 IV. Architecture Point Designs & SW Design Studies: Key SW Development Areas
71 IV. Architecture Point Designs & SW Design Studies: Software Implementation Strategy
72 Suggested Software Strategy for JNAC (aka Petaflops) Initiative
73 Some Key Observations on PetaSoft Software
74 Time for a Software Revolution?
75 Architectural Framework from PetaSoft Meeting
76 Hierarchy from Application to Complex Computer
77 The Current HPCC Program Execution Model (PEM) illustrated by MPI/HPF
78 The PetaSoft Program Execution Model
79 Some Examples of a Layered Software System
80 Features of JNAC Software Implementation Strategy
81 Role of The Architecture Review Board
82 The Five Key JNAC Software Development Areas
83 Examples of Machine Specific Software
84 Examples of Operating System Services I
85 Examples of Operating System Services II
86 General Philosophy from PetaSoft Meeting
87 Features of the Layered Software Model
88 PetaSoft Findings 1) and 2) -- Memory Hierarchy
89 PetaSoft Findings 3) and 4) -- Using Memory Hierarchy
90 PetaSoft Findings 5) and 6) -- Layered Software
91 PetaSoft Recommendations 1) to 3) Memory and Software Hierarchy
92 III. Key drivers: Summary
93 V. A National program concept: Basis
94 V. A National program concept: Scope & Strategy
95 V. A National program concept: Technology Projection Model
96 V. A National program concept: Structure & Flow
97 V. A National program concept: Technology Model
98 V. A National program concept: Research Projects - Technology
99 V. A National program concept: Research Projects - Architecture
100 V. A National program concept: Research Projects - System Software
101 V. A National program concept: Research Projects - Applications & Algorithms
102 V. A National focused program concept: Early Program Milestones
103 Now we follow with A Comparison of JNAC and HPCC
104 Comparison of HPCC and JNAC - I
105 Comparison of HPCC and JNAC - II
106 Comparison of HPCC and JNAC - III
107 VI. Future actions necessary to mold an R&D program: The Message
108 VI. Future actions necessary to mold an R&D program: Near Term Recommendation
109 VI. Future actions necessary to mold an R&D program: Next Steps
110 VI. Future actions necessary to mold an R&D program: PetaFLOPS Algorithms Workshop (PAL'97)
111 VI. Future actions : Next Steps Integrate into Federal R&D Planning

Foil 1 Remarks on Petaflop Technology and National Program
See: http://www.npac.syr.edu/users/gcf/petaflopjan97

Geoffrey Fox
Syracuse University
111 College Place
Syracuse
New York 13244-4100

Foil 2 Abstract of PetaFlop Presentation Jan 97

This uses material from Paul Smith and Peter Kogge as well as Fox
We describe the "National PetaFlop Study(s)" and what you can expect with or without a specific initiative
We discuss traditional, Processor in Memory, Superconducting, Special Purpose architectures as well as future Quantum Computers!
We survey possible applications, new needs and opportunities for software as well as the technologies and designs for new machines one can expect in the year 2007!
We review findings of studies and structure of a possible initiative

Foil 3 NSTC/Committee on Computing, Information & Communication Meeting PetaFLOPS Workshops Results September 19, 1996 Dr. Paul H. Smith Department of Energy


Foil 4 Contents

I. Workshop series... background.
II. Major findings & recommendations from the PetaFLOPS workshops.
III. Key drivers for advanced computational capabilities beyond HPCC.
IV. PetaFLOPS Architecture Point Designs & SW Design Studies.
V. A National program concept.
VI. Future actions to mold an R&D program.

Foil 5 I. Workshop series... background: PetaFLOPS workshops.

PetaFLOPS I
  • January 1994 in Pasadena, CA.
  • 60+ experts.
  • Set tone for subsequent Workshops.
PetaFLOPS Bodega Bay Summer Study
  • August 1995
PetaFLOPS Architecture Workshop, PAWS'96
  • April 1996
PetaSOFT'96
  • June 1996

Foil 6 I. Workshop series... background: Community & Sponsoring Agencies

Sponsoring Agencies: NASA, NSF, DOE, DARPA, NSA, BMDO
Community: Private sector, Academic, Federal, National Laboratories

Foil 7 I. Workshop series... background: Workshops Purposes

To identify immediate & future applications.
To provide standard base (PetaFLOPS I) to measure advances in PetaFLOPS R&D.
To identify critical enabling technologies.
To assist technology directors to plan for future programs beyond HPCC.

Foil 8 I. Workshop series... background: Coordinating Chairs

Dr. Paul H. Smith ..................................................General
Special Assistant, Advanced Computing Technology
U.S. Department of Energy
Dr. David Bailey ...................................................Algorithms
NASA/Ames Research Center
Dr. Ian Foster .....................................................Software
Division of Mathematics and Computer Science
Argonne National Laboratory
Prof. Geoffrey Fox .................................................Architecture
Departments of Physics & Computer Science, Syracuse University
Prof. Peter Kogge ..................................................Architecture
McCourtney Professor of Computer Science & Engineering
University of Notre Dame

Foil 9 I. Workshop series... background: Coordinating Chairs (continued)

Prof. Sidney Karin .................................................General
Director for Advanced Computational Science & Engineering
University of California, San Diego
Dr. Paul Messina ...................................................PetaFLOPS-I
Director, Center for Advanced Computing
California Institute of Technology
Dr. Thomas Sterling ................................................Architecture
Senior Scientist
Jet Propulsion Laboratory
Dr. Rick Stevens ...................................................Applications
Director, Mathematics & Computer Science Division
Argonne National Laboratory
Dr. John Van Rosendale .............................................Point Design
Division of Advanced Scientific Computing
National Science Foundation

Foil 10 Peak Supercomputer Performance

For "Conventional" MPP/Distributed Shared Memory Architectures
Now(1996) Peak is 0.1 to 0.2 Teraflops in Production Centers
  • Note both SGI and IBM are changing architectures:
  • IBM Distributed Memory to Distributed Shared Memory
  • SGI Shared Memory to Distributed Shared Memory
In 1999, one will see production 1 Teraflop systems
In 2003, one will see production 10 Teraflop Systems
In 2007, one will see production 50-100 Teraflop Systems
Memory is Roughly 0.25 to 1 Terabyte per 1 Teraflop
If you are lucky/work hard: Realized performance is 30% of Peak

Foil 11 Some Important Trends -- COTS is King!

Everybody now believes in COTS -- Commercial Off-The-Shelf technology -- one must use commercial building blocks for any specialized system, whether it be a DoD weapons program or a high end Supercomputer
  • These are both Niche Markets!
COTS for hardware can be applied to a greater or lesser extent
  • Gordon Bell's SNAP system says we will only have ATM networks of PC's running WindowsNT
  • SGI, HP and IBM will take commodity processor nodes but link them with custom switches (with different versions of distributed shared memory support)
COTS for Software is less common but I expect it to become much more common
  • HPF producing HTTP not MPI, with Java visualization, is an example of Software COTS

Foil 12 Comments on COTS for Hardware

Currently MPP's have COTS processors and specialized networks but this could reverse
  • Pervasive ATM will indeed lead to COTS Networks BUT
  • Current microprocessors are roughly near optimal in terms of megaflops per square millimeter of silicon BUT
  • As (explicit) parallelism is shunned by modern microprocessors, silicon is used for wasteful speculative execution, with the expectation that future systems will move to 8-way functional parallelism.
Thus one estimates that 250,000 transistors (excluding on chip cache) is optimal for performance per square mm of silicon
  • A modern microprocessor is around ten times this size
Again simplicity is optimal but this requires parallelism
A contrary trend is that memory dominates the use of silicon, and so performance per square mm of silicon is often not relevant

Foil 13 Returning to Today - I

Tightly Coupled MPP's (SP2, Paragon, CM5 etc.) were distributed memory but at least at the low end they are becoming hardware assisted shared memory
  • Unclear how well compilers will support this in a scaling fashion -- we will see how SGI/Cray systems based on ideas pioneered at Stanford fare!
Note this is an example of COTS at work -- SGI/Sun/.. Symmetric Multiprocessors (Power Challenge from SGI) are attractive as the bus will support up to 16 processors in an elegant shared memory software world.
  • Previously such systems were not powerful enough to be interesting
Clustering such SGI Power Challenge like systems produces a powerful but difficult to program (as both distributed and shared memory) heterogeneous system
Meanwhile Tera Computer will offer a true Uniform Memory Access shared memory using ingenious multi threaded software/hardware to hide latency
  • Unclear if this is competitive in cost/performance with the (scruffy) COTS approach

Foil 14 Returning to Today - II

Trend I -- Hardware Assisted Tightly Coupled Shared Memory MPP's are replacing pure distributed memory systems
Trend II -- The World Wide Web and increasing power of individual workstations is making geographically distributed heterogeneous distributed memory systems more attractive
Trend III -- To confuse the issue, the technology trends in next ten years suggest yet different architecture(s) such as PIM
With conflicting technology/architecture trends one had better use Scalable Portable Software, BUT one must address the latency agenda, which isn't clearly portable!

Foil 15 Overall Remarks on the March to PetaFlops - I

I find the study interesting not only in its results but also in its methodology of several intense workshops combined with general discussions at national conferences
Exotic technologies such as "DNA Computing" and Quantum Computing do not seem relevant on this timescale
Note clock speeds will NOT improve much in the future but density of chips will continue to improve at roughly the current exponential rate over the next 10-20 years
Superconducting technology is currently seriously limited by the lack of a memory technology that matches its factor of 100-1000 faster CPU processing
The current project views software as perhaps the hardest problem

Foil 16 Overall Remarks on the March to PetaFlops - II

All proposed designs have VERY deep memory hierarchies which are a challenge to algorithms, compilers and even communication subsystems
The major need for high-end performance computers comes from government (both civilian and military) applications
  • DoE ASCI (study of aging of nuclear weapons) and Weather/Climate prediction are two examples
Government must develop systems using commercial suppliers but NOT rely on traditional industry applications as motivation
So currently the Petaflop initiative is thought of as an applied development project whereas HPCC was mainly a research endeavour

Foil 17 II. Major Findings & Recommendations: Findings.

PetaFLOPS possible; accelerate goals to 10 years.
Many important application drivers exist.
Memory dominant implementation factor.
Cost, power & efficiency dominate.
Innovation critical, new technology necessary.
Layered SW architecture mandatory.
Opportunities for immediate SW effort.

Foil 18 II. Major Findings & Recommendations: Findings.

New technology means paradigm shift.
  • Superconductivity technology is example.
Memory bandwidth.
Latency.
Software important.
Closer relationship between architecture and programming is needed.
Role of algorithms must improve.

Foil 19 II. Major Findings & Recommendations: Recommendations

Conduct point design studies.
  • in hardware and software.
  • of promising architectures.
Develop engineering prototypes.
  • Multiple technology track demonstrations
Start SW now, independent of HW.
Develop layered software architecture for scalability and code reuse
Explore algorithms for special purpose & reconfigurable structures.

Foil 20 II. Major Findings & Recommendations: Recommendations

Support & accelerate R&D in paradigm shift technologies:
  • Superconductor RSFQ
  • Holographic photo-refractive storage
  • Optical guided and free-space interconnect
  • New semiconductor materials.
Perform detailed applications studies at scale.
Develop petaFLOPS scale latency management.

Foil 21 PP Presentation

Nation's experts participated:
  • Academia
  • Private Sector
  • Federal Government
Strong need for computing at the high end.
PetaFLOPS levels of performance are feasible.
Preliminary set of goals for the next decade formulated with a PetaFLOPS system as the end product.
II. Major Findings & Recommendations: Workshops Summaries.

Foil 22 III. Key drivers for advanced computational capabilities beyond HPCC. Why PetaFLOPS??

There are compelling applications that need that level of performance.
PetaFLOPS levels of performance are feasible, but substantial research is needed.
Private sector is not going to do it alone.

Foil 23 III. Key drivers: The State of the Art

TeraFLOPS machine architecture in hand.
Programming still is explicit message passing.
TeraFLOPS applications are coarse grain
Latency management not showstopper for TeraFLOPS.
Operating systems and tools provide relatively little support for the users
Parallelism has to be managed explicitly

Foil 24 III. Key drivers: The Need for PetaFLOPS Computing

Applications that require petaFLOPS can already be identified
  • (DOE) Nuclear weapons stewardship
  • (NSA) Cryptology and digital signal processing
  • (NASA and NSF) Satellite data assimilation and climate modeling
The need for ever greater computing power will remain.
PetaFLOPS systems are right step for the next decade

Foil 25 10 Possible PetaFlop Applications

Nuclear Weapons Stewardship (ASCI)
Cryptology and Digital Signal Processing
Satellite Data Analysis
Climate and Environmental Modeling
3-D Protein Molecule Reconstruction
Real-Time Medical Imaging
Severe Storm Forecasting
Design of Advanced Aircraft
DNA Sequence Matching
Molecular Simulations for nanotechnology
Large Scale Economic Modelling
Intelligent Planetary Spacecraft

Foil 26 Petaflop Performance for Flow in Porous Media?

Why does one need a petaflop (10^15 operations per second) computer?
These are problems where quite viscous (oil, pollutants) liquids percolate through the ground
Very sensitive to details of material
Most important problems are already solved at some level, but most solutions are insufficient and need improvement in various respects:
  • under resolution of solution details, averaging of local variations and under representation of physical details
  • rapid solutions to allow efficient exploration of system parameters
  • robust and automated solution, to allow integration of results in high level decision, design and control functions
  • inverse problems (history match) to reconstruct missing data require multiple solutions of the direct problem

Foil 27 Target Flow in Porous Media Problem (Glimm - Petaflop Workshop)

Oil Reservoir Simulation
Geological variation occurs down to the pore size of rock - almost 10^-6 metres - model this (statistically)
Want to calculate flow between wells which are about 400 metres apart
10^3 x 10^3 x 10^2 = 10^8 grid elements
30 species
10^4 time steps
300 separate cases need to be considered
3x10^9 words of memory per case
10^12 words total if all cases considered in parallel
10^19 floating point operations
3 hours on a petaflop computer
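
As a sanity check, the quoted totals are mutually consistent; a minimal sketch in Python, with every constant taken from the foil above:

    # Rough check of the porous media estimate; all numbers from the foil.
    cases = 300
    words_per_case = 3e9                  # 3x10^9 words of memory per case
    total_words = cases * words_per_case  # ~10^12 words if run in parallel
    total_flops = 1e19                    # quoted floating point operations
    hours = total_flops / 1e15 / 3600     # on a 10^15 op/s machine
    print(f"memory ~ {total_words:.0e} words, runtime ~ {hours:.1f} hours")
    # -> memory ~ 9e+11 words, runtime ~ 2.8 hours ("3 hours on a petaflop")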

Foil 28 NASA's Projection of Memory and Computational Requirements up to Petaflops for Aerospace Applications


Foil 29 Bodega Bay: Primary Memory


Foil 30 Bodega Bay: Secondary Memory


Foil 31 Bodega Bay: Aggregate I/O


Foil 32 III. Key drivers: Technological Limitations

Semiconductor component technology
  • feature size, other issues will finally put us in a regime in which Moore's law no longer holds
Architecture
  • levels of parallelism must be used
  • memory hierarchy due to increased processor speeds
System software
  • latency management
  • efficient handling of
    • millions of concurrent threads
    • thousands of I/O devices

Foil 33 Chip Density Projections to year 2013

Extrapolated from SIA Projections to year 2007 -- See Chapter 6 of Petaflops Report -- July 94

Foil 34 Clock Speed and I/O Speed in megabytes/sec per pin through year 2013

Extrapolated from SIA Projections to year 2007 -- See Chapter 6 of Petaflops Report -- July 94

Foil 35 Supercomputer Architectures in Years 2005-2010 -- I

Conventional (Distributed Shared Memory) Silicon
  • Clock Speed 1 GHz
  • 4 eight way parallel Complex RISC nodes per chip
  • 4000 Processing chips gives over 100 tera(fl)ops
  • 8000 2 Gigabyte DRAMs gives 16 Terabytes Memory
Note Memory per Flop is much less than one to one
Natural scaling says time steps decrease at the same rate as spatial intervals, and so memory needed goes like (FLOPS in Gigaflops)**0.75
  • If One Gigaflop requires One Gigabyte of memory (Or is it one Teraflop that needs one Terabyte?)
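
A minimal sketch of this scaling rule, assuming (as the foil does) that one Gigaflop pairs with one Gigabyte: for a 3D problem whose time step shrinks with the mesh spacing, compute grows like n^4 while memory grows like n^3, so memory ~ compute^(3/4):

    # Memory needed under the foil's rule: Gigabytes ~ (Gigaflops)**0.75
    def memory_gbytes(gigaflops):
        return gigaflops ** 0.75

    for gflops in (1.0, 1e3, 1e6):    # 1 Gigaflop, 1 Teraflop, 1 Petaflop
        print(f"{gflops:9.0e} Gflops -> ~{memory_gbytes(gflops):.0e} GB")
    # A Petaflop needs only ~3e4 GB (~32 TB), the same order as the 16
    # Terabyte design above: much less than one byte per flop.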

Foil 36 Supercomputer Architectures in Years 2005-2010 -- II

Superconducting Technology is promising but can it compete with silicon juggernaut?
Should be able to build a simple 200 GHz Superconducting CPU with modest superconducting caches (around 32 Kilobytes)
Must use same DRAM technology as for silicon CPU ?
So tremendous challenge to build latency tolerant algorithms (as over a factor of 100 difference in CPU and memory speed) but advantage of factor 30-100 less parallelism needed

Foil 37 Supercomputer Architectures in Years 2005-2010 -- III

Processor in Memory (PIM) Architecture is a follow on to the J machine (MIT), Execube (IBM -- Peter Kogge) and Mosaic (Seitz)
  • More Interesting in 2007 as processors will be "real" and have a nontrivial amount of memory
  • Naturally fetch a complete row (column) of memory at each access - perhaps 1024 bits
One could take in year 2007 each two gigabyte memory chip and alternatively build it as a mosaic of
  • One Gigabyte of Memory
  • 1000 250,000-transistor simple CPU's running at 1 Gigaflop each, each with one megabyte of on chip memory
12000 chips (the same amount of Silicon as in the first design but perhaps more power) gives:
  • 12 Terabytes of Memory
  • 12 Petaflops performance
  • This design "extrapolates" specialized DSP's, the GRAPE (specialized teraflop N body machine) etc. to a "somewhat specialized" system with a general CPU but a special memory poor architecture with a particular 2/3D layout
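
The arithmetic behind those totals, as a one-line check (all numbers from the foil):

    # PIM mosaic totals quoted on the foil.
    chips = 12000
    memory_tb = chips * 1 / 1000          # one Gigabyte of memory per chip
    petaflops = chips * 1000 * 1 / 1e6    # 1000 CPU's per chip at 1 Gigaflop
    print(memory_tb, "Terabytes,", petaflops, "Petaflops")   # 12.0, 12.0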

Foil 38 Performance Per Transistor

Performance data from uP vendors
Transistor count excludes on-chip caches
Performance normalized by clock rate
Conclusion: Simplest is best! (250K Transistor CPU)
[Charts: Normalized SPECINTs and SPECFLTs plotted against millions of CPU transistors]
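
A sketch of the normalization the foil describes -- benchmark rating divided by clock rate, per million CPU transistors, caches excluded. The two entries are invented illustrative values, not vendor data:

    # Performance per transistor, normalized by clock rate (illustrative).
    cpus = [
        # (name, millions of transistors excl. cache, SPECint, clock MHz)
        ("simple 250K-transistor core", 0.25, 4.0, 200),
        ("complex speculative core",    2.5, 10.0, 200),
    ]
    for name, mtrans, specint, mhz in cpus:
        print(name, (specint / mhz) / mtrans)
    # With numbers like these the simple core wins per unit of silicon,
    # which is the foil's "Simplest is best!" conclusion.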

Foil 39 Comparison of Supercomputer Architectures

Fixing 10-20 Terabytes of Memory, we can get:
A 16000 way parallel natural evolution of today's machines with various architectures from distributed shared memory to clustered hierarchy
  • Peak Performance is 150 Teraflops with memory systems like today's but worse with more levels of cache
A 5000 way parallel Superconducting system with 1 Petaflop performance but a terrible imbalance between CPU and memory speeds
A 12 million way parallel PIM system with 12 petaflop performance and a "distributed memory architecture", as off chip access will have serious penalties
There are many hybrid and intermediate choices -- these are extreme examples of "pure" architectures

Foil 40 Three Major Markets -- Logic,ASIC,DRAM

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994
L=Logic, D=DRAM, A=ASIC, mP = microprocessor

Foil 41 Chip and Package Characteristics

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

Foil 42 Fabrication Characteristics

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

Foil 43 Electrical Design and Test Metrics

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

Foil 44 Technologies for High Performance Computers

We can choose technology and architecture separately in designing our high performance system
Technology is like choosing ants, people or tanks as the basic units in our society analogy
  • or, less frivolously, neurons or brains
In the HPCC arena, we can distinguish current technologies
  • COTS (Commercial off-the-shelf) Microprocessors
  • Custom node computer architectures
  • More generally these are all CMOS technologies
Near term technology choices include
  • Gallium Arsenide or Superconducting materials as opposed to Silicon
  • These are faster by a factor of 2 (GaAs) to 300 (Superconducting)
Further term technology choices include
  • DNA (Chemical) or Quantum technologies
It will cost $40 Billion for the industry's next investment in CMOS plants, and this huge investment makes it hard for new technologies to "break in"

Foil 45 Architectures for High Performance Computers - I

Architecture is equivalent to organization or design in society analogy
  • Different models for society (Capitalism etc.) or different types of groupings in a given society
  • Businesses or Armies are more precisely controlled/organized than a crowd at the State Fair
  • We will generalize this to formal (army) and informal (crowds) organizations
We can distinguish formal and informal parallel computers
Informal parallel computers are typically "metacomputers"
  • i.e. a bunch of computers sitting on a department network

Foil 46 Architectures for High Performance Computers - II

Metacomputers are a very important trend which uses similar software and algorithms to conventional "MPP's" but with typically less optimized parameters
  • In particular network latency is higher and bandwidth is lower for an informal HPC
  • Latency is the time for a zero length communication -- the start up time
Formal high performance computers are the classic (basic) object of study and are
"closely coupled" specially designed collections of compute nodes which have (in principle) been carefully optimized and balanced in the areas of
  • Processor (computer) nodes
  • Communication (internal) Network
  • Linkage of Memory and Processors
  • I/O (external network) capabilities
  • Overall Control or Synchronization Structure

Foil 47 There is no Best Machine!

In society, we see a rich set of technologies and architectures
  • Ant Hills
  • Brains as bunch of neurons
  • Cities as informal bunch of people
  • Armies as formal collections of people
With several different communication mechanisms with different trade-offs
  • One can walk -- low latency, low bandwidth
  • Go by car -- high latency (especially if can't park), reasonable bandwidth
  • Go by air -- higher latency and bandwidth than car
  • Phone -- High speed at long distance but can only communicate modest material (low capacity)

Foil 48 Quantum Computing - I

Quantum-Mechanical Computers by Seth Lloyd, Scientific American, Oct 95
Chapter 6 of The Feynman Lectures on Computation edited by Tony Hey and Robin Allen, Addison-Wesley, 1996
Quantum Computing: Dream or Nightmare? Haroche and Raimond, Physics Today, August 96 page 51
Basically any physical system can "compute", as one "just" needs a system that gives answers that depend on inputs, and all physical systems have this property
Thus one can build "superconducting", "DNA" or "Quantum" computers exploiting respectively superconducting, molecular or quantum mechanical rules

Foil 49 Quantum Computing - II

For a "new technology" computer to be useful, one needs to be able to
  • conveniently prepare inputs,
  • conveniently program,
  • reliably produce answers (quicker than other techniques), and
  • conveniently read out answers
Conventional computers are built around bit (taking values 0 or 1) manipulation
One can build arbitrarily complex arithmetic if one has some way of implementing NOT and AND (as the sketch below illustrates)
Quantum Systems naturally represent bits
  • A spin (of say an electron or proton) is either up or down
  • A hydrogen atom is either in its lowest or (first) excited state etc.
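
A minimal illustration (not from the foil) of that universality claim: with NOT and AND one gets OR by De Morgan's law, and then XOR, the sum bit of a one-bit adder:

    # Building logic from NOT and AND alone.
    def NOT(a):    return 1 - a
    def AND(a, b): return a & b
    def OR(a, b):  return NOT(AND(NOT(a), NOT(b)))        # De Morgan
    def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))   # one-bit sum

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "sum", XOR(a, b), "carry", AND(a, b))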

Foil 50 Quantum Computing - III

Interactions between quantum systems can cause "spin-flips" or state transitions and so implement arithmetic
Incident photons can "read" state of system and so give I/O capabilities
Quantum "bits" called qubits have another property as one has not only
  • State |0> and state |1> but also
  • Coherent states such as .7071*(|0> + |1>) which are equally in either state
Lloyd describes how such coherent states provide new types of computing capabilities
  • Natural random number as measuring state of qubit gives answer 0 or 1 randomly with equal probability
  • As Feynman suggests, qubit based computers are natural for large scale simulation of quantum physical systems -- this is "just" analog computing
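
A hedged sketch of the "natural random number" bullet: by the Born rule, measuring the coherent state .7071*(|0> + |1>) gives 0 or 1 with probability |0.7071|^2 = 0.5 each:

    # Simulating repeated measurement of an equal superposition qubit.
    import random

    p0 = 0.7071 ** 2                      # probability of reading |0>
    samples = [0 if random.random() < p0 else 1 for _ in range(10000)]
    print("fraction of 0s:", samples.count(0) / len(samples))   # ~0.5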

Foil 51 Superconducting Technology -- Past

Superconductors produce wonderful "wires" which transmit picosecond (10^-12 second) pulses at near the speed of light
  • Superconducting transmission is lower power and faster than diffusive electron transmission in CMOS
  • At about 0.35 micron chip feature size, CMOS transmission time changes from domination by transmission (distance) issues to resistive (diffusive) effects
Niobium used in constructing such superconducting circuits can be processed by fabrication techniques similar to CMOS
Josephson Junctions allow picosecond performance switches
BUT IBM (1969-1983) and Japan (MITI 1981-90) terminated major efforts in this area

Foil 52 Superconducting Technology -- Present

New ideas have resurrected this concept using RSFQ -- Rapid Single Flux Quantum -- approach
This naturally gives a bit which is 0 or 1 (or in fact n units!)
This gives interesting circuits of similar structure to CMOS systems but with a clock speed of order 100-300GHz -- factor of 100 better than CMOS which will asymptote at around 1 GHz (= one nanosecond cycle time)

Foil 53 Superconducting Technology -- Problems

At least two major problems:
The Semiconductor industry will invest some $40B in CMOS "plants" and infrastructure
  • Currently perhaps $100M a year is going into the superconducting circuit area!
  • How do we "bootstrap" a superconducting industry?
Cannot build memory to match CPU speed, and current designs have superconducting CPU's (with perhaps 256 Kbytes of superconducting memory per processor) but conventional CMOS memory
  • So compared with current computers one has a thousand times faster CPU, a cache a factor of four smaller running at CPU speed, and basic memory of the same speed as now
  • Can such machines perform well -- are new algorithms needed?
  • Can one design new superconducting memories?
Superconducting technology also has a bad "name" due to the IBM termination!
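
To see the size of the gap, a back-of-the-envelope sketch; the clock and latency numbers are illustrative assumptions, not from the foil:

    # How many CPU cycles one memory access costs a superconducting CPU.
    cpu_cycle_ns = 0.01            # ~100 GHz class superconducting CPU
    dram_latency_ns = 100          # conventional CMOS DRAM access
    print(dram_latency_ns / cpu_cycle_ns, "cycles per DRAM access")
    # ~10^4 cycles: an algorithm must keep thousands of independent
    # operations in flight (or in the small superconducting cache) to
    # avoid stalling -- the latency tolerance challenge discussed above.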

Foil 54 IV. Architecture Point Designs & SW Design Studies: Point Design Study:

Sponsored by NSF, DARPA and NASA
Eight awards made of $100,000 each
6 month study of architecture / SW environment / algorithms

Foil 55 IV. Architecture Point Designs & SW Design Studies: 1996 Point Design Awards

Reconfigurable OO architecture
Processor in memory architecture
Algorithmic focus
Hierarchical design
Aggressive cache only architecture
Architecture for N-body problems
Single quantum flux superconducting design
Optical interconnect

Foil 56 The 8 NSF Point Designs

Relatively Conventional but still Innovative!
  • Andrew Chien et al. (Illinois) Reconfigurable MORPH Architecture
  • Torrellas, Padua (Illinois) Aggressive Cache-Only Memory Architecture Multiprocessor (I-Acoma)
  • Ziavras et al. (NJIT, Wayne State) Optical Interconnect with Conventional Processors and Memory
  • Kumar and Sameh (Minnesota) Focus on Algorithms for Hybrid Systems (Clusters of clusters of deep memory hierarchies)
  • Fortes and Taylor (Purdue/Northwestern) Application focus on using Hierarchical Processors and Memory Architecture
Special Purpose Systems
  • McMillan, Hut et al. (Drexel, Princeton, Tokyo, Illinois) Special Purpose GRAPE System for N body Problems
Architecture Innovation (Perhaps Special Purpose)
  • Kogge et al. (Notre Dame) Processor in Memory (PIM) Technology Point Design
Radical Technology Innovation (Superconducting Processors)
  • Sterling et al. (Caltech, SUNY-SB, McGill) HTMT: Hybrid Technology Multi-Threaded Architecture

Foil 57 Architecture of MORPH NSF Petaflop Point Study

See Original Foil

Foil 58 Architecture of I-ACOMA Petaflop Point Study

See Original Foil

Foil 59 Some MetaComputer Systems

Cluster of workstations or PC's
Heterogeneous MetaComputer System

Foil 60 The GRAPE N-Body Machine

N body problems (e.g. Newton's laws for one million stars in a globular cluster) can have successful special purpose devices
See the GRAPE (GRAvity PipE) machine (Sugimoto et al., Nature 345, page 90, 1990)
  • The essential reason is that such problems need much less memory per floating point unit than most problems
  • Globular Cluster: 10^6 computations per datum stored
  • Finite Element Iteration: A few computations per datum stored
  • The rule of thumb is that one needs one gigabyte of memory per gigaflop of computation in general problems, and this general design puts most cost into memory, not into CPU.
Note GRAPE uses EXACTLY the same parallel algorithm that one finds in the books (e.g. Solving Problems on Concurrent Processors) for N-body problems on classic distributed memory MIMD machines

Foil 61 GRAPE architecture in NSF Petaflop Point Study

See Original Foil

Foil 62 GRAPE Processing Unit in NSF Petaflop Point Study

See Original Foil

Foil 63 Why isn't GRAPE a Perfect Solution?

GRAPE will execute the classic O(N^2) (parallel) N body algorithm BUT this is not the algorithm used in most such computations
Rather there is the O(N) or O(N log N) so called "fast multipole" algorithm which uses a hierarchical approach (see the sketch below)
  • On one million stars, fast multipole is a factor of 100-1000 faster than the GRAPE algorithm
  • Fast multipole works in most but not all N-body problems (in globular clusters, extreme heterogeneity makes the direct O(N^2) method most attractive)
So special purpose devices cannot usually take advantage of new nifty algorithms!
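
A small sketch of the operation-count gap on one million stars; raw counts ignore the large constant factors in fast multipole, which is why the foil quotes only a factor of 100-1000 rather than the ~10^4 below:

    # Raw operation counts: direct O(N^2) versus O(N log N) multipole.
    import math

    N = 1_000_000                       # one million stars
    direct = N * N                      # classic method (GRAPE)
    multipole = N * math.log2(N)        # fast multipole, constants ignored
    print(f"direct / multipole ~ {direct / multipole:.0f}")   # ~50000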

Foil 64 Current PIM Chips

Chip          First Silicon   Peak            Storage    MB/Perf.             Organization
EXECUBE       1993            50 Mips         0.5 MB     0.01 MB/Mip          16 bit SIMD/MIMD CMOS
AD SHARC      1994            120 Mflops      0.5 MB     0.005 MB/MF          Single CPU and Memory
TI MVP        1994            2000 Mops       0.05 MB    0.000025 MB/Mop      1 CPU, 4 DSP's
MIT MAP       1996            800 Mflops      0.128 MB   0.00016 MB/MF        4 Superscalar CPU's
Terasys PIM   1993            625 M bit ops   0.016 MB   0.000026 MB/bit op   1024 16-bit ALU's

Foil 65 New "Strawman" PIM Processing Node Macro


Foil 66 "Strawman" Chip Floorplan


Foil 67 SIA-Based PIM Chip Projections

[Charts: SIA-based projections of MB per cm^2, MF per cm^2, and the resulting MB/MF ratios]

Foil 68 Superconducting Architecture in NSF Petaflop Point Study

See Original Foil

Foil 69 IV. Architecture Point Designs & SW Design Studies: Architectural Framework

PetaFLOPS Applications which are grouped into sets with an interface to their own
Problem Solving Environments
Application Level or Virtual Problem Interface ADI
Operating System Services
Multi Resolution Virtual Machine Interfaces joining at lowest levels with
Machine Specific Software
Hardware Systems

Foil 70 IV. Architecture Point Designs & SW Design Studies: Key SW Development Areas

The mission critical applications
Development of shared problem solving environments with rich set of application targeted libraries and resources
Development of common systems software
Programming environments from compilers to multi-level runtime support at the machine independent ADI's
Machine specific software including lowest level of data movement/manipulation

Foil 71 IV. Architecture Point Designs & SW Design Studies: Software Implementation Strategy

Start now on initial studies to explore the possible system architectures.
These "PetaFLOPS software point studies" should be interdisciplinary involving hardware, systems software and applications expertise.

Foil 72 Suggested Software Strategy for JNAC (aka Petaflops) Initiative

August 28 1996
Geoffrey Fox

Foil 73 Some Key Observations on PetaSoft Software

All proposed hardware architectures have a complex memory hierarchy which should be abstracted with a software architecture
  • Consisting of a mix of machine specific and generic levels with well defined ADI's or Abstract Device Interfaces
  • Management of latency, with concurrent threads or otherwise, is critical
This implies a layered software architecture reflected in all components
  • Compiler, Language and Runtime, Tools, Systems Software etc.
The Software Architecture should be defined early on so that hardware and software respect it!
  • The JNAC Architecture Review Board will be responsible for interfaces and for evaluating compliance with them
Users and Compilers must be able to have full control of data movement and placement in all parts of a petaflop system
The Size and Complex Memory Structure of PetaFlop machines represent major challenges in scaling existing Software Concepts

Foil 74 Time for a Software Revolution?

Well the rest of the Software World is Changing with emergence of WebWindows Environment!
Current approaches (HPF,MPI) lack needed capability to address memory hierarchy of either today's or any future contemplated high performance architecture -- whether sequential or parallel
Problem Solving Environments are needed to support complex applications implied by both Web and increasing capabilities of scientific simulations
So I suggest rethinking High Performance Computing Software Models and Implementations!

Foil 75 Architectural Framework from PetaSoft Meeting

PetaFlop Applications which are grouped into sets with an interface to their own
Problem Solving Environments
Application Level or Virtual Problem Interface ADI
Operating System Services
Multi Resolution Virtual Machine Interfaces joining at lowest levels with
Machine Specific Software
Hardware Systems

Foil 76 Hierarchy from Application to Complex Computer

Domain Specific Application Problem Solving Environment
Numerical Objects in (C++/Fortran/C/Java) High Level Virtual Problem
Expose the Coarse Grain Parallelism of the Real Complex Computer
Expose All Levels of Memory Hierarchy of the Real Complex Computer
Virtual Problem / Application ADI
Multi Level Machine ADI
Pure Script (Interpreted)
Semi-Interpreted a la Applets
High Level Language but Optimized Compilation
Machine Optimized RunTime

Foil 77 The Current HPCC Program Execution Model (PEM) illustrated by MPI/HPF

MPI represents data movement with an abstraction of a structure of machines with just two levels of memory
  • On Processor and Off Processor
This was a reasonable model in the past but even today it fails to represent the complex memory structure of a typical microprocessor node
Note the HPF Distribution Model has a similar (to MPI) underlying, relatively simple Abstraction for the PEM

Foil 78 The PetaSoft Program Execution Model

This addresses memory hierarchy intra-processor as well as inter-processor
  • Data Movement and Replication defined between Processors as well as between levels of hierarchy on a given processor
[Diagram: memory hierarchy including Level 1 and Level 2 caches]
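
A hypothetical sketch of what such a PEM could look like; the level names and the move() call below are invented for illustration and are not a PetaSoft-defined API:

    # Data movement between any two named levels of the hierarchy,
    # intra-processor (DRAM -> L2) as well as inter-processor.
    LEVELS = ["registers", "L1", "L2", "DRAM", "remote_DRAM", "disk"]

    def move(block, src, dst):
        assert src in LEVELS and dst in LEVELS
        print(f"move {block}: {src} -> {dst}")

    move("tile A[0:64, 0:64]", "DRAM", "L2")      # stage a tile into cache
    move("halo rows", "DRAM", "remote_DRAM")      # MPI-style exchange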

Foil 79 Some Examples of a Layered Software System

From high level to low level:
  • Application Specific Problem Solving Environment
  • Coarse Grain Coordination Layer e.g. AVS
  • Massively Parallel Modules (libraries) -- such as DAGH, HPF, F77, C, HPC++, HPJava
  • Fortran or C plus generic message passing (get, put) and generic memory hierarchy and locality control
  • Assembly Language plus specific (to architecture) data movement, shared memory and cache control

Foil 80 Features of JNAC Software Implementation Strategy

The main JNAC Program is a mix of both research and development
Development is focused on JNAC machines and identified application areas, along the lines of a Broad Systems Architecture established (evaluated and evolved) by JNAC
Work should start now on initial studies to explore the possible system architectures and
Suggest locations for the "sweet spots" (necks in the hour glass) to define interfaces
These "petaflop software point studies" should be interdisciplinary, involving hardware, systems software and applications expertise

Foil 81 Role of The Architecture Review Board

Establish and Review the Software Architecture and consistent use of Interfaces
[Diagram: the JNAC Architecture Review Board overseeing the Five Software Development Areas]

Foil 82 The Five Key JNAC Software Development Areas

The mission Critical Applications
Development of Approximately 3 shared Problem Solving Environments with rich set of application targeted libraries and resources
  • e.g. PSE's for PDE's, Image Analysis, Forces Modeling
Development of Common Systems Software
Programming Environments from Compilers to multi-level runtime support at the machine independent ADI's
Machine Specific software including lowest level of data movement/manipulation

Foil 83 Examples of Machine Specific Software

code generation
memory management
routing/interconnect
thread management
diagnostics
fault containment
interrupt handling
device drivers

Foil 84 Examples of Operating System Services I

scalable filesystems
networking interfaces
scheduling
HL-memory management
HL-latency management
performance data
debugging tools
intermediate code representations

Foil 85 Examples of Operating System Services II

object files (a.out)
HL-resource management
query of systems state
operating systems services
compiler middleware
basic visualization tools
numerical libraries

Foil 86 General Philosophy from PetaSoft Meeting

Define a "clean" model for machine architecture
  • Memory hierarchy including caches and geometrical (distributed) effects
Define a low level "Program Execution Model" (PEM) which allows one to describe movement of information and computation in the machine
  • This can be thought of as the "MPI"/assembly language of the machine
On top of the low level PEM, one can build a hierarchical (layered) software model
  • At the top of this layered software model, one finds objects or Problem Solving Environments (PSE's)
  • At an intermediate level there is Parallel C, C++ or Fortran
One can program at each layer of the software and augment it by "escaping" to a lower level to improve performance
  • Directives (HPF assertions) and explicit insertion of lower level code (HPF extrinsics) are possible

Foil 87 Features of the Layered Software Model

This is not really a simple stack but a set of complex relations between layers with many interfaces and modules
Interfaces are critical ( for composition across layers)
  • Enable control and performance for application scientists
  • Decouple CS system issues and allow exploration and innovation
[Diagram: higher level abstractions nearer to the application domain enable the next 10000 users; increasing machine detail, control and management serves the first 100 pioneer users]

Foil 88 PetaSoft Findings 1) and 2) -- Memory Hierarchy

1) Deep Memory Hierarchies present New Challenges to High Performance Implementation of programs
  • Latency
  • Bandwidth
  • Capacity
2) There are two dimensions of memory hierarchy management
  • Geometric or Global Structure
  • Local (cache) hierarchies seen from a thread or processor centric view

Foil 89 PetaSoft Findings 3) and 4) -- Using Memory Hierarchy

3) One needs a machine "mode" which supports a predictable and controllable memory system, leading to communication and computation with the same characteristics
  • Allow Compiler optimization
  • Allow Programmer control and optimization
  • For instance high performance would often need full program control of caches
4) One needs a low level software layer which provides direct control of the machine (memory hierarchy etc.) by a user program
  • This is for initial users and program tuning

Foil 90 PetaSoft Findings 5) and 6) -- Layered Software

5) One needs a layered (hierarchical) software model which supports efficient use of multiple levels of abstraction in a single program.
  • Higher levels of the programming model hide extraneous complexity
  • The highest layers are application dependent Problem Solving Environments; the lower levels are machine dependent
  • Lower levels can be accessed for additional performance
  • e.g. HPF extrinsics, GCC inline assembly, Fortran routines called from MATLAB, native methods in Java (one instance is sketched below)
6) One needs a set of software tools which match the layered software (programming model)
  • Debuggers, performance and load balancing tools
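One concrete instance of the escape mechanisms listed under finding 5), sketched here as an illustration (GCC- and x86-specific, not from the foils): dropping from C into inline assembly to read the processor's cycle counter, the sort of machine-level access no portable layer exposes.

/* GCC/x86-specific illustration of "escaping" from C to assembly. */
#include <stdio.h>

/* Read the x86 time-stamp counter via inline assembly. */
static inline unsigned long long rdtsc(void) {
    unsigned int lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void) {
    unsigned long long start = rdtsc();
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++) x += 1.0;  /* work to time */
    unsigned long long end = rdtsc();
    printf("cycles: %llu\n", end - start);
    return 0;
}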

Foil 91 PetaSoft Recommendations 1) to 3) Memory and Software Hierarchy

1) Explore issues in the design of petaComputer machine models which will support controllable hierarchical memory systems in a range of important architectures
  • Research and development in the areas of findings 3) and 4)
2) Explore techniques for control of the memory hierarchy for petaComputer architectures
  • Use testbeds
3) Explore issues in designing layered software architectures -- particularly efficient mapping and efficient interfaces to lower levels
  • Use the context of petaflop applications and machines
  • e.g. HPF is a possible layer, while HPF extrinsics are an interface to a lower (MPI) layer

Foil 92 III. Key drivers: Summary

There are compelling applications
New architectures need to be investigated
Component technologies need to be developed
Major advances are needed in system software and tools
Industry is less likely than ever to push limits.

Foil 93 V. A National program concept: Basis

Key, focused R&D must be explicitly funded
The program is mostly development (D), augmented with increases in research (R) in hardware and software.
Advanced systems designed and prototyped by the program.
Development will need strong central management.
Applications tightly coupled with coordinated software development groups.

Foil 94 V. A National program concept: Scope & Strategy

Target dozens of applications (not hundreds)
Hundreds of programmers, not thousands
Deploy PetaFLOPS (PF) class systems in under 10 years
Starting in FY98
Multiple technology options
New technologies and architectures
Balance vendor versus direct development
Open RFPs for future systems

Foil 95 V. A National program concept: Technology Projection Model

Three "tracks" for illustration (might be more or less)
  • current trends (SIA roadmap)
  • new architecture w/commercial process tech.
  • new arch and new process tech.
Deploy systems continuously
Span generations with software model
Pull with RFPs
Push with technology investments

Foil 96 V. A National program concept: Structure & Flow

[Image-only foil: no HTML text available]

Foil 97 V. A National program concept: Technology Model

[Image-only foil: no HTML text available]

Foil 98 V. A National program concept: Research Projects - Technology

Chip Interface:
  • Fabrication Technology
  • Laser & CMOS chip integration
Optical Networks:
  • Pbps bandwidths
  • 1000 ports
Superconducting Memories:
  • 100 billion accesses/sec.
Holographic Memories

Foil 99 V. A National program concept: Research Projects - Architecture

Natural Evolution Systems:
  • Likely path of COTS technology.
Special Purpose Architecture:
  • Driven by Specific algorithms
  • Develop proof of concept.
Hybrid Technology Architecture Development:
  • Exploit advanced technologies.
  • Integrate advanced technologies into system.

Foil 100 V. A National program concept: Research Projects - System Software

PetaFLOPS Languages:
  • Develop a consistent set of languages
  • Develop programming interfaces.
  • Support multiple architecture projects
Operating Systems:
  • Highly scalable.
  • Massive concurrency
  • High bandwidth & virtual memory.
Runtime Systems

Foil 101 V. A National program concept: Research Projects - Applications & Algorithms

Algorithms to reduce the latency associated with petaFLOPS-scale memory hierarchies and processor ensembles (an overlap sketch follows this foil's text)
Driver Applications:
  • Important to national objectives
  • Includes scaling up to petaFLOPS
  • Supports multiple architecture projects
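A standard latency-reduction idiom at the processor-ensemble level is overlapping communication with computation; the following C/MPI sketch (an illustration added here, not from the foils) posts a nonblocking ring exchange, computes on local data while the messages are in flight, and only then waits.

/* Illustrative latency hiding: start the exchange, compute on local
 * data while messages are in flight, then finish. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

static double data[N];   /* zero-initialized local data */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double halo_out = data[0], halo_in = 0.0;
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* Post nonblocking communication ... */
    MPI_Request reqs[2];
    MPI_Irecv(&halo_in, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... and compute on interior data while the messages travel. */
    double sum = 0.0;
    for (int i = 1; i < N; i++) sum += data[i];

    /* Only now wait: the communication latency is (ideally) hidden. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    sum += halo_in;

    if (rank == 0) printf("sum = %g\n", sum);
    MPI_Finalize();
    return 0;
}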

Foil 102 V. A National focused program concept: Early Program Milestones

Software-level interface definitions
Projection of performance requirements to lower levels (performance-based design)
Applications analysis with respect to specific programming models (machines)
Experimental testbeds simulated/modeled on existing MPPs

Foil 103 Now we follow with A Comparison of JNAC and HPCC

The next three foils isolate some differences and commonalities between the two programs

Foil 104 Comparison of HPCC and JNAC - I

Both set a hardware goal (teraflop for HPCC and petaflop for JNAC) to focus activity, but in each case systems and applications were the main justification
Both couple applications with software and architectures in multidisciplinary teams with multi-agency support
HPCC was dominantly research
  • JNAC is roughly 50-50 research and development
HPCC inevitably developed MPP's and transferred parallel computing to the computing mainstream
  • JNAC's challenge is the memory hierarchy, and it will transfer understanding of this to the mainstream independently of parallelism

Foil 105 Comparison of HPCC and JNAC - II

HPCC aimed at Grand Challenges in industry, government and academia
  • JNAC is aimed at government (including NSF) mission critical applications
HPCC developed software (PSE's) largely independently in each Grand Challenge
  • JNAC will link software efforts to a few PSE's and a common set of JNAC interfaces
HPCC tended to develop hardware with rapidly changing architectures which software "chased" rather laboriously
  • JNAC develops software simultaneously with hardware, and to a uniform common architecture, allowing better re-use of both application and systems software

Foil 106 Comparison of HPCC and JNAC - III

HPCC aimed to transfer technology to industry for commercialization
  • JNAC relies on industry to build systems designed by laboratory, university and industry consortia
HPCC is Research --> Capitalization --> Product
  • JNAC is mission driven development linked to supporting research, with engineering prototypes as the capitalization stage
HPCC was a broad program aimed at "all" (large scale) users of computers
  • JNAC is a focused program aimed at the "top 100" power users

Foil 107 VI. Future actions necessary to mold an R&D program: The Message

Need to invest in computing at the high end.
PetaFLOPS levels of performance are feasible.
The private sector is not going to do it alone.

Foil 108 VI. Future actions necessary to mold an R&D program: Near Term Recommendation

Conduct detailed PetaFLOPS architecture design & simulation studies.
Initiate early software development of the layered architecture.
Develop PetaFLOPS-scale latency management.
Accelerate R&D in advanced technologies.
Invent algorithms for special purpose and reconfigurable structures.

Foil 109 VI. Future actions necessary to mold an R&D program: Next Steps

The PetaFLOPS Frontier (Oct. 96)
  • Open community, international
PetaFLOPS Algorithms Workshop (Apr. 97)
PetaFLOPS II Conference (Sep. 97)
  • Algorithm-driven architecture
  • Based on Point Designs against components
  • Applications drivers/scenarios
  • Leap Technology elements
Engage community in establishing challenges, directions, topics for research

Foil 110 VI. Future actions necessary to mold an R&D program: PetaFLOPS Algorithms Workshop (PAL'97)

Location: Williamsburg Hospitality House, Williamsburg, VA; April 20-25, 1997
Chair: David Bailey, NASA Ames
Objectives:
  • Identify novel algorithmic approaches that may be better suited to future PetaFLOPS systems.
  • Present quantitative analysis of PetaFLOPS algorithms
  • Plan future PetaFLOPS algorithm research activities.
  • Present and analyze new PetaFLOPS architecture designs.

Foil 111 VI. Future actions: Next Steps -- Integrate into Federal R&D Planning

Coordinate with the High End Computing & Computation (HECC) Working Group.
Develop the Technical Approach -- NOW
Strategy for developing a National Initiative.
Multi-agency efforts.
Federal agencies plan for the FY'98 budget submission

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
