Full HTML for Scripted Foilset "Variety of Foils Used Starting January 97"

Given by Geoffrey C. Fox in 1997. Foils prepared 26 January 97
Outside Index Summary of Material


This uses material from Paul Smith and Peter Kogge as well as Fox
We describe the "National PetaFlop Study(s)" and what you can expect with or without a specific initiative
We discuss traditional, Processor in Memory, Superconducting, Special Purpose architectures as well as future Quantum Computers!
We survey possible applications, new needs and opportunities for software as well as the technologies and designs for new machines one can expect in the year 2007!
We review findings of studies and structure of a possible initiative

Table of Contents for full HTML of Variety of Foils Used Starting January 97

1 Remarks on Petaflop Technology and National Program
See: http://www.npac.syr.edu/users/gcf/petaflopjan97

2 Abstract of PetaFlop Presentation Jan 97
3 NSTC/Committee on Computing, Information & Communication Meeting PetaFLOPS Workshops Results September 19, 1996 Dr. Paul H. Smith Department of Energy
4 Contents
5 I. Workshop series... background: PetaFLOPS workshops.
6 I. Workshop series... background: Community & Sponsoring Agencies
7 I. Workshop series... background: Workshops Purposes
8 I. Workshop series... background: Coordinating Chairs
9 I. Workshop series... background: Coordinating Chairs (continued)
10 Peak Supercomputer Performance
11 Some Important Trends -- COTS is King!
12 Comments on COTS for Hardware
13 Returning to Today - I
14 Returning to Today - II
15 Overall Remarks on the March to PetaFlops - I
16 Overall Remarks on the March to PetaFlops - II
17 II. Major Findings & Recommendations: Findings.
18 II. Major Findings & Recommendations: Findings.
19 II. Major Findings & Recommendations: Recommendations
20 II. Major Findings & Recommendations: Recommendations
21 PP Presentation
22 III. Key drivers for advanced computational capabilities beyond HPCC. Why PetaFLOPS??
23 III. Key drivers: The State of the Art
24 III. Key drivers: The Need for PetaFLOPS Computing
25 10 Possible PetaFlop Applications
26 Petaflop Performance for Flow in Porous Media?
27 Target Flow in Porous Media Problem (Glimm - Petaflop Workshop)
28 NASA's Projection of Memory and Computational Requirements up to Petaflops for Aerospace Applications
29 Bodega Bay: Primary Memory
30 Bodega Bay: Secondary Memory
31 Bodega Bay: Aggregate I/O
32 III. Key drivers: Technological Limitations
33 Chip Density Projections to year 2013
34 Clock Speed and I/O Speed in megabytes/sec per pin through year 2013
35 Supercomputer Architectures in Years 2005-2010 -- I
36 Supercomputer Architectures in Years 2005-2010 -- II
37 Supercomputer Architectures in Years 2005-2010 -- III
38 Performance Per Transistor
39 Comparison of Supercomputer Architectures
40 Three Major Markets -- Logic,ASIC,DRAM
41 Chip and Package Characteristics
42 Fabrication Characteristics
43 Electrical Design and Test Metrics
44 Technologies for High Performance Computers
45 Architectures for High Performance Computers - I
46 Architectures for High Performance Computers - II
47 There is no Best Machine!
48 Quantum Computing - I
49 Quantum Computing - II
50 Quantum Computing - III
51 Superconducting Technology -- Past
52 Superconducting Technology -- Present
53 Superconducting Technology -- Problems
54 IV. Architecture Point Designs & SW Design Studies: Point Design Study:
55 IV. Architecture Point Designs & SW Design Studies: 1996 Point Design Awards
56 The 8 NSF Point Designs
57 Architecture of MORPH NSF Petaflop Point Study
58 Architecture of I-ACOMA Petaflop Point Study
59 Some MetaComputer Systems
60 The GRAPE N-Body Machine
61 GRAPE architecture in NSF Petaflop Point Study
62 GRAPE Processing Unit in NSF Petaflop Point Study
63 Why isn't GRAPE a Perfect Solution?
64 Current PIM Chips
65 New "Strawman" PIM Processing Node Macro
66 "Strawman" Chip Floorplan
67 SIA-Based PIM Chip Projections
68 Superconducting Architecture in NSF Petaflop Point Study
69 IV. Architecture Point Designs & SW Design Studies: Architectural Framework
70 IV. Architecture Point Designs & SW Design Studies: Key SW Development Areas
71 IV. Architecture Point Designs & SW Design Studies: Software Implementation Strategy
72 Suggested Software Strategy for JNAC (aka Petaflops) Initiative
73 Some Key Observations on PetaSoft Software
74 Time for a Software Revolution?
75 Architectural Framework from PetaSoft Meeting
76 Hierarchy from Application to Complex Computer
77 The Current HPCC Program Execution Model (PEM) illustrated by MPI/HPF
78 The PetaSoft Program Execution Model
79 Some Examples of a Layered Software System
80 Features of JNAC Software Implementation Strategy
81 Role of The Architecture Review Board
82 The Five Key JNAC Software Development Areas
83 Examples of Machine Specific Software
84 Examples of Operating System Services I
85 Examples of Operating System Services II
86 General Philosophy from PetaSoft Meeting
87 Features of the Layered Software Model
88 PetaSoft Findings 1) and 2) -- Memory Hierarchy
89 PetaSoft Findings 3) and 4) -- Using Memory Hierarchy
90 PetaSoft Findings 5) and 6) -- Layered Software
91 PetaSoft Recommendations 1) to 3) Memory and Software Hierarchy
92 III. Key drivers: Summary
93 V. A National program concept: Basis
94 V. A National program concept: Scope & Strategy
95 V. A National program concept: Technology Projection Model
96 V. A National program concept: Structure & Flow
97 V. A National program concept: Technology Model
98 V. A National program concept: Research Projects - Technology
99 V. A National program concept: Research Projects - Architecture
100 V. A National program concept: Research Projects - System Software
101 V. A National program concept: Research Projects - Applications & Algorithms
102 V. A National focused program concept: Early Program Milestones
103 Now we follow with A Comparison of JNAC and HPCC
104 Comparison of HPCC and JNAC - I
105 Comparison of HPCC and JNAC - II
106 Comparison of HPCC and JNAC - III
107 VI. Future actions necessary to mold an R&D program: The Message
108 VI. Future actions necessary to mold an R&D program: Near Term Recommendation
109 VI. Future actions necessary to mold an R&D program: Next Steps
110 VI. Future actions necessary to mold an R&D program: PetaFLOPS Algorithms Workshop (PAL'97)
111 VI. Future actions : Next Steps Integrate into Federal R&D Planning

Foil 1 Remarks on Petaflop Technology and National Program
See: http://www.npac.syr.edu/users/gcf/petaflopjan97

Geoffrey Fox
Syracuse University
111 College Place
Syracuse
New York 13244-4100

Foil 2 Abstract of PetaFlop Presentation Jan 97

This uses material from Paul Smith and Peter Kogge as well as Fox
We describe the "National PetaFlop Study(s)" and what you can expect with or without a specific initiative
We discuss traditional, Processor in Memory, Superconducting, Special Purpose architectures as well as future Quantum Computers!
We survey possible applications, new needs and opportunities for software as well as the technologies and designs for new machines one can expect in the year 2007!
We review findings of studies and structure of a possible initiative

Foil 3 NSTC/Committee on Computing, Information & Communication Meeting PetaFLOPS Workshops Results September 19, 1996 Dr. Paul H. Smith Department of Energy


Foil 4 Contents

I. Workshop series... background.
II. Major findings & recommendations from the PetaFLOPS workshops.
III. Key drivers for advanced computational capabilities beyond HPCC.
IV. PetaFLOPS Architecture Point Designs & SW Design Studies.
V. A National program concept.
VI. Future actions to mold an R&D program.

Foil 5 I. Workshop series... background: PetaFLOPS workshops.

PetaFLOPS I
  • January 1994 in Pasadena, CA.
  • 60+ experts.
  • Set tone for subsequent Workshops.
PetaFLOPS Bodega Bay Summer Study
  • August 1995
PetaFLOPS Architecture Workshop, PAWS'96
  • April 1996
PetaSOFT'96
  • June 1996

Foil 6 I. Workshop series... background: Community & Sponsoring Agencies

Sponsoring Agencies: NASA, NSF, DOE, DARPA, NSA, BMDO
Community: Private sector, Academic, Federal, National Laboratories

Foil 7 I. Workshop series... background: Workshops Purposes

To identify immediate & future applications.
To provide standard base (PetaFLOPS I) to measure advances in PetaFLOPS R&D.
To identify critical enabling technologies.
To assist technology directors to plan for future programs beyond HPCC.

Foil 8 I. Workshop series... background: Coordinating Chairs

Dr. Paul H. Smith ..................................................General
Special Assistant, Advanced Computing Technology
U.S. Department of Energy
Dr. David Bailey ...................................................Algorithms
NASA/Ames Research Center
Dr. Ian Foster .....................................................Software
Division of Mathematics and Computer Science
Argonne National Laboratory
Prof. Geoffrey Fox .................................................Architecture
Departments of Physics & Computer Science, Syracuse University
Prof. Peter Kogge ..................................................Architecture
McCourtney Professor of Computer Science & Engineering
University of Notre Dame

Foil 9 I. Workshop series... background: Coordinating Chairs (continued)

Prof. Sidney Karin .................................................General
Director for Advanced Computational Science & Engineering
University of California, San Diego
Dr. Paul Messina ...................................................PetaFLOPS-I
Director, Center for Advanced Computing
California Institute of Technology
Dr. Thomas Sterling ................................................Architecture
Senior Scientist
Jet Propulsion Laboratory
Dr. Rick Stevens ...................................................Applications
Director, Mathematics & Computer Science Division
Argonne National Laboratory
Dr. John Van Rosendale .............................................Point Design
Division of Advanced Scientific Computing
National Science Foundation

Foil 10 Peak Supercomputer Performance

For "Conventional" MPP/Distributed Shared Memory Architectures
Now(1996) Peak is 0.1 to 0.2 Teraflops in Production Centers
  • Note both SGI and IBM are changing architectures:
  • IBM Distributed Memory to Distributed Shared Memory
  • SGI Shared Memory to Distributed Shared Memory
In 1999, one will see production 1 Teraflop systems
In 2003, one will see production 10 Teraflop Systems
In 2007, one will see production 50-100 Teraflop Systems
Memory is Roughly 0.25 to 1 Terabyte per 1 Teraflop
If you are lucky/work hard: Realized performance is 30% of Peak

Foil 11 Some Important Trends -- COTS is King!

Everybody now believes in COTS -- Commercial Off-The-Shelf technology -- one must use commercial building blocks for any specialized system, whether it be a DoD weapons program or a high end Supercomputer
  • These are both Niche Markets!
COTS for hardware can be applied to a greater or lesser extent
  • Gordon Bell's SNAP system says we will only have ATM networks of PC's running WindowsNT
  • SGI, HP and IBM will take commodity processor nodes but link them with custom switches (with different versions of distributed shared memory support)
COTS for Software is less common but I expect it to become much more common
  • HPF producing HTTP not MPI, with Java visualization, is an example of Software COTS

Foil 12 Comments on COTS for Hardware

Currently MPP's have COTS processors and specialized networks but this could reverse
  • Pervasive ATM will indeed lead to COTS Networks BUT
  • Current microprocessors are roughly near optimal in terms of megaflops per square millimeter of silicon BUT
  • As (explicit) parallelism is shunned by modern microprocessors, silicon is used for wasteful speculative execution, with the expectation that future systems will move to 8-way functional parallelism.
Thus one estimates that 250,000 transistors (excluding on chip cache) is optimal for performance per square mm of silicon
  • A modern microprocessor is around ten times this size
Again simplicity is optimal but this requires parallelism
A contrary trend is that memory dominates the use of silicon, and so performance per square mm of silicon is often not relevant

Foil 13 Returning to Today - I

Tightly Coupled MPP's (SP2, Paragon, CM5 etc.) were distributed memory but at least at the low end they are becoming hardware assisted shared memory
  • Unclear how well compilers will support this in a scaling fashion -- we will see how SGI/Cray systems based on ideas pioneered at Stanford fare!
Note this is an example of COTS at work -- SGI/Sun/.. Symmetric Multiprocessors (Power Challenge from SGI) are attractive as the bus will support up to 16 processors in an elegant shared memory software world.
  • Previously such systems were not powerful enough to be interesting
Clustering such SGI Power Challenge like systems produces a powerful but difficult to program (as both distributed and shared memory) heterogeneous system
Meanwhile Tera Computer will offer a true Uniform Memory Access shared memory using ingenious multi threaded software/hardware to hide latency
  • Unclear if this is competitive in cost/performance with the (scruffy) COTS approach

Foil 14 Returning to Today - II

Trend I -- Hardware Assisted Tightly Coupled Shared Memory MPP's are replacing pure distributed memory systems
Trend II -- The World Wide Web and increasing power of individual workstations is making geographically distributed heterogeneous distributed memory systems more attractive
Trend III -- To confuse the issue, the technology trends in next ten years suggest yet different architecture(s) such as PIM
With conflicting technology/architecture trends one had better use Scalable Portable Software, BUT one must address the latency agenda, which isn't clearly portable!

Foil 15 Overall Remarks on the March to PetaFlops - I

I find the study interesting not only in its results but also in its methodology of several intense workshops combined with general discussions at national conferences
Exotic technologies such as "DNA Computing" and Quantum Computing do not seem relevant on this timescale
Note clock speeds will NOT improve much in the future but density of chips will continue to improve at roughly the current exponential rate over the next 10-20 years
Superconducting technology is currently seriously limited by the lack of a memory technology that matches its factor of 100-1000 faster CPU processing
The current project views software as perhaps the hardest problem

Foil 16 Overall Remarks on the March to PetaFlops - II

All proposed designs have VERY deep memory hierarchies which are a challenge to algorithms, compilers and even communication subsystems
The major need for high-end performance computers comes from government (both civilian and military) applications
  • DoE ASCI (study of aging of nuclear weapons) and Weather/Climate prediction are two examples
Government must develop systems using commercial suppliers but NOT rely on traditional industry applications as motivation
So currently the Petaflop initiative is thought of as an applied development project whereas HPCC was mainly a research endeavour

Foil 17 II. Major Findings & Recommendations: Findings.

PetaFLOPS possible; accelerate goals to 10 years.
Many important application drivers exist.
Memory dominant implementation factor.
Cost, power & efficiency dominate.
Innovation critical, new technology necessary.
Layered SW architecture mandatory.
Opportunities for immediate SW effort.

Foil 18 II. Major Findings & Recommendations: Findings.

New technology means paradigm shift.
  • Superconductivity technology is example.
Memory bandwidth.
Latency.
Software important.
Closer relationship between architecture and programming is needed.
Role of algorithms must improve.

Foil 19 II. Major Findings & Recommendations: Recommendations

Conduct point design studies.
  • in hardware and software.
  • of promising architectures.
Develop engineering prototypes.
  • Multiple technology track demonstrations
Start SW now, independent of HW.
Develop layered software architecture for scalability and code reuse
Explore algorithms for special purpose & reconfigurable structures.

Foil 20 II. Major Findings & Recommendations: Recommendations

Support & accelerate R&D in paradigm shift technologies:
  • Superconductor RSFQ
  • Holographic photo-refractive storage
  • Optical guided and free-space interconnect
  • New semiconductor materials.
Perform detailed applications studies at scale.
Develop petaFLOPS scale latency management.

Foil 21 PP Presentation

Nation's experts participated:
  • Academia
  • Private Sector
  • Federal Government
Strong need for computing at the high end.
PetaFLOPS levels of performance are feasible.
Preliminary set of goals for the next decade formulated with a PetaFLOPS system as the end product.
II. Major Findings & Recommendations: Workshops Summaries.

Foil 22 III. Key drivers for advanced computational capabilities beyond HPCC. Why PetaFLOPS??

There are compelling applications that need that level of performance.
PetaFLOPS levels of performance are feasible, but substantial research is needed.
Private sector is not going to do it alone.

Foil 23 III. Key drivers: The State of the Art

TeraFLOPS machine architecture in hand.
Programming still is explicit message passing.
TeraFLOPS applications are coarse grain
Latency management not showstopper for TeraFLOPS.
Operating systems and tools provide relatively little support for the users
Parallelism has to be managed explicitly

Foil 24 III. Key drivers: The Need for PetaFLOPS Computing

Applications that require petaFLOPS can already be identified
  • (DOE) Nuclear weapons stewardship
  • (NSA) Cryptology and digital signal processing
  • (NASA and NSF) Satellite data assimilation and climate modeling
The need for ever greater computing power will remain.
PetaFLOPS systems are right step for the next decade

Foil 25 10 Possible PetaFlop Applications

Nuclear Weapons Stewardship (ASCI)
Cryptology and Digital Signal Processing
Satellite Data Analysis
Climate and Environmental Modeling
3-D Protein Molecule Reconstruction
Real-Time Medical Imaging
Severe Storm Forecasting
Design of Advanced Aircraft
DNA Sequence Matching
Molecular Simulations for nanotechnology
Large Scale Economic Modelling
Intelligent Planetary Spacecraft

Foil 26 Petaflop Performance for Flow in Porous Media?

Why does one need a petaflop (10^15 operations per second) computer?
These are problems where quite viscous (oil, pollutants) liquids percolate through the ground
Very sensitive to details of material
Most important problems are already solved at some level, but most solutions are insufficient and need improvement in various respects:
  • under resolution of solution details, averaging of local variations and under representation of physical details
  • rapid solutions to allow efficient exploration of system parameters
  • robust and automated solution, to allow integration of results in high level decision, design and control functions
  • inverse problems (history match) to reconstruct missing data require multiple solutions of the direct problem

Foil 27 Target Flow in Porous Media Problem (Glimm - Petaflop Workshop)

Oil Reservoir Simulation
Geological variation occurs down to the pore size of rock - almost 10^-6 metres - model this (statistically)
Want to calculate flow between wells which are about 400 metres apart
10^3 x 10^3 x 10^2 = 10^8 grid elements
30 species
10^4 time steps
300 separate cases need to be considered
3x10^9 words of memory per case
10^12 words total if all cases considered in parallel
10^19 floating point operations
3 hours on a petaflop computer
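
As a sanity check, the quoted totals are mutually consistent; a minimal sketch in Python, with every constant taken from the foil above:

    # Rough check of the porous media estimate; all numbers from the foil.
    cases = 300
    words_per_case = 3e9                  # 3x10^9 words of memory per case
    total_words = cases * words_per_case  # ~10^12 words if run in parallel
    total_flops = 1e19                    # quoted floating point operations
    hours = total_flops / 1e15 / 3600     # on a 10^15 op/s machine
    print(f"memory ~ {total_words:.0e} words, runtime ~ {hours:.1f} hours")
    # -> memory ~ 9e+11 words, runtime ~ 2.8 hours ("3 hours on a petaflop")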

Foil 28 NASA's Projection of Memory and Computational Requirements up to Petaflops for Aerospace Applications


Foil 29 Bodega Bay: Primary Memory


Foil 30 Bodega Bay: Secondary Memory


Foil 31 Bodega Bay: Aggregate I/O


Foil 32 III. Key drivers: Technological Limitations

Semiconductor component technology
  • feature size, other issues will finally put us in a regime in which Moore's law no longer holds
Architecture
  • levels of parallelism must be used
  • memory hierarchy due to increased processor speeds
System software
  • latency management
  • efficient handling of
    • millions of concurrent threads
    • thousands of I/O devices

Foil 33 Chip Density Projections to year 2013

Extrapolated from SIA Projections to year 2007 -- See Chapter 6 of Petaflops Report -- July 94

Foil 34 Clock Speed and I/O Speed in megabytes/sec per pin through year 2013

Extrapolated from SIA Projections to year 2007 -- See Chapter 6 of Petaflops Report -- July 94

Foil 35 Supercomputer Architectures in Years 2005-2010 -- I

Conventional (Distributed Shared Memory) Silicon
  • Clock Speed 1 GHz
  • 4 eight way parallel Complex RISC nodes per chip
  • 4000 Processing chips gives over 100 tera(fl)ops
  • 8000 2 Gigabyte DRAMs gives 16 Terabytes Memory
Note Memory per Flop is much less than one to one
Natural scaling says time steps decrease at the same rate as spatial intervals, and so memory needed goes like (FLOPS in Gigaflops)**0.75
  • If One Gigaflop requires One Gigabyte of memory (Or is it one Teraflop that needs one Terabyte?)
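
A minimal sketch of this scaling rule, assuming (as the foil does) that one Gigaflop pairs with one Gigabyte: for a 3D problem whose time step shrinks with the mesh spacing, compute grows like n^4 while memory grows like n^3, so memory ~ compute^(3/4):

    # Memory needed under the foil's rule: Gigabytes ~ (Gigaflops)**0.75
    def memory_gbytes(gigaflops):
        return gigaflops ** 0.75

    for gflops in (1.0, 1e3, 1e6):    # 1 Gigaflop, 1 Teraflop, 1 Petaflop
        print(f"{gflops:9.0e} Gflops -> ~{memory_gbytes(gflops):.0e} GB")
    # A Petaflop needs only ~3e4 GB (~32 TB), the same order as the 16
    # Terabyte design above: much less than one byte per flop.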

Foil 36 Supercomputer Architectures in Years 2005-2010 -- II

Superconducting Technology is promising but can it compete with silicon juggernaut?
Should be able to build a simple 200 GHz Superconducting CPU with modest superconducting caches (around 32 Kilobytes)
Must use same DRAM technology as for silicon CPU ?
So tremendous challenge to build latency tolerant algorithms (as over a factor of 100 difference in CPU and memory speed) but advantage of factor 30-100 less parallelism needed

Foil 37 Supercomputer Architectures in Years 2005-2010 -- III

Processor in Memory (PIM) Architecture is a follow on to the J machine (MIT), Execube (IBM -- Peter Kogge) and Mosaic (Seitz)
  • More Interesting in 2007 as processors will be "real" and have a nontrivial amount of memory
  • Naturally fetch a complete row (column) of memory at each access - perhaps 1024 bits
One could take in year 2007 each two gigabyte memory chip and alternatively build it as a mosaic of
  • One Gigabyte of Memory
  • 1000 250,000-transistor simple CPU's running at 1 Gigaflop each, each with one megabyte of on chip memory
12000 chips (the same amount of Silicon as in the first design but perhaps more power) gives:
  • 12 Terabytes of Memory
  • 12 Petaflops performance
  • This design "extrapolates" specialized DSP's, the GRAPE (specialized teraflop N body machine) etc. to a "somewhat specialized" system with a general CPU but a special memory poor architecture with a particular 2/3D layout
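
The arithmetic behind those totals, as a one-line check (all numbers from the foil):

    # PIM mosaic totals quoted on the foil.
    chips = 12000
    memory_tb = chips * 1 / 1000          # one Gigabyte of memory per chip
    petaflops = chips * 1000 * 1 / 1e6    # 1000 CPU's per chip at 1 Gigaflop
    print(memory_tb, "Terabytes,", petaflops, "Petaflops")   # 12.0, 12.0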

Foil 38 Performance Per Transistor

Performance data from uP vendors
Transistor count excludes on-chip caches
Performance normalized by clock rate
Conclusion: Simplest is best! (250K Transistor CPU)
[Charts: Normalized SPECINTs and SPECFLTs plotted against millions of CPU transistors]
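
A sketch of the normalization the foil describes -- benchmark rating divided by clock rate, per million CPU transistors, caches excluded. The two entries are invented illustrative values, not vendor data:

    # Performance per transistor, normalized by clock rate (illustrative).
    cpus = [
        # (name, millions of transistors excl. cache, SPECint, clock MHz)
        ("simple 250K-transistor core", 0.25, 4.0, 200),
        ("complex speculative core",    2.5, 10.0, 200),
    ]
    for name, mtrans, specint, mhz in cpus:
        print(name, (specint / mhz) / mtrans)
    # With numbers like these the simple core wins per unit of silicon,
    # which is the foil's "Simplest is best!" conclusion.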

Foil 39 Comparison of Supercomputer Architectures

Fixing 10-20 Terabytes of Memory, we can get:
A 16000 way parallel natural evolution of today's machines with various architectures from distributed shared memory to clustered hierarchy
  • Peak Performance is 150 Teraflops with memory systems like today's but worse with more levels of cache
A 5000 way parallel Superconducting system with 1 Petaflop performance but a terrible imbalance between CPU and memory speeds
A 12 million way parallel PIM system with 12 petaflop performance and a "distributed memory architecture", as off chip access will have serious penalties
There are many hybrid and intermediate choices -- these are extreme examples of "pure" architectures

Foil 40 Three Major Markets -- Logic,ASIC,DRAM

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994
L=Logic, D=DRAM, A=ASIC, mP = microprocessor

Foil 41 Chip and Package Characteristics

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

Foil 42 Fabrication Characteristics

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

Foil 43 Electrical Design and Test Metrics

Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994

Foil 44 Technologies for High Performance Computers

We can choose technology and architecture separately in designing our high performance system
Technology is like choosing ants, people or tanks as the basic units in our society analogy
  • or, less frivolously, neurons or brains
In the HPCC arena, we can distinguish current technologies
  • COTS (Commercial off-the-shelf) Microprocessors
  • Custom node computer architectures
  • More generally these are all CMOS technologies
Near term technology choices include
  • Gallium Arsenide or Superconducting materials as opposed to Silicon
  • These are faster by a factor of 2 (GaAs) to 300 (Superconducting)
Further term technology choices include
  • DNA (Chemical) or Quantum technologies
It will cost $40 Billion for the industry's next investment in CMOS plants, and this huge investment makes it hard for new technologies to "break in"

Foil 45 Architectures for High Performance Computers - I

Architecture is equivalent to organization or design in society analogy
  • Different models for society (Capitalism etc.) or different types of groupings in a given society
  • Businesses or Armies are more precisely controlled/organized than a crowd at the State Fair
  • We will generalize this to formal (army) and informal (crowds) organizations
We can distinguish formal and informal parallel computers
Informal parallel computers are typically "metacomputers"
  • i.e. a bunch of computers sitting on a department network

Foil 46 Architectures for High Performance Computers - II

Metacomputers are a very important trend which uses similar software and algorithms to conventional "MPP's" but with typically less optimized parameters
  • In particular network latency is higher and bandwidth is lower for an informal HPC
  • Latency is the time for a zero length communication -- the start up time
Formal high performance computers are the classic (basic) object of study and are
"closely coupled" specially designed collections of compute nodes which have (in principle) been carefully optimized and balanced in the areas of
  • Processor (computer) nodes
  • Communication (internal) Network
  • Linkage of Memory and Processors
  • I/O (external network) capabilities
  • Overall Control or Synchronization Structure

Foil 47 There is no Best Machine!

In society, we see a rich set of technologies and architectures
  • Ant Hills
  • Brains as bunch of neurons
  • Cities as informal bunch of people
  • Armies as formal collections of people
With several different communication mechanisms with different trade-offs
  • One can walk -- low latency, low bandwidth
  • Go by car -- high latency (especially if can't park), reasonable bandwidth
  • Go by air -- higher latency and bandwidth than car
  • Phone -- High speed at long distance but can only communicate modest material (low capacity)

Foil 48 Quantum Computing - I

Quantum-Mechanical Computers by Seth Lloyd, Scientific American, Oct 95
Chapter 6 of The Feynman Lectures on Computation edited by Tony Hey and Robin Allen, Addison-Wesley, 1996
Quantum Computing: Dream or Nightmare? Haroche and Raimond, Physics Today, August 96 page 51
Basically any physical system can "compute", as one "just" needs a system that gives answers that depend on inputs, and all physical systems have this property
Thus one can build "superconducting", "DNA" or "Quantum" computers exploiting respectively superconducting, molecular or quantum mechanical rules

Foil 49 Quantum Computing - II

For a "new technology" computer to be useful, one needs to be able to
  • conveniently prepare inputs,
  • conveniently program,
  • reliably produce answers (quicker than other techniques), and
  • conveniently read out answers
Conventional computers are built around bit (taking values 0 or 1) manipulation
One can build arbitrarily complex arithmetic if one has some way of implementing NOT and AND (as the sketch below illustrates)
Quantum Systems naturally represent bits
  • A spin (of say an electron or proton) is either up or down
  • A hydrogen atom is either in its lowest or (first) excited state etc.
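
A minimal illustration (not from the foil) of that universality claim: with NOT and AND one gets OR by De Morgan's law, and then XOR, the sum bit of a one-bit adder:

    # Building logic from NOT and AND alone.
    def NOT(a):    return 1 - a
    def AND(a, b): return a & b
    def OR(a, b):  return NOT(AND(NOT(a), NOT(b)))        # De Morgan
    def XOR(a, b): return AND(OR(a, b), NOT(AND(a, b)))   # one-bit sum

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "sum", XOR(a, b), "carry", AND(a, b))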

Foil 50 Quantum Computing - III

Interactions between quantum systems can cause "spin-flips" or state transitions and so implement arithmetic
Incident photons can "read" state of system and so give I/O capabilities
Quantum "bits" called qubits have another property as one has not only
  • State |0> and state |1> but also
  • Coherent states such as .7071*(|0> + |1>) which are equally in either state
Lloyd describes how such coherent states provide new types of computing capabilities
  • Natural random number as measuring state of qubit gives answer 0 or 1 randomly with equal probability
  • As Feynman suggests, qubit based computers are natural for large scale simulation of quantum physical systems -- this is "just" analog computing
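
A hedged sketch of the "natural random number" bullet: by the Born rule, measuring the coherent state .7071*(|0> + |1>) gives 0 or 1 with probability |0.7071|^2 = 0.5 each:

    # Simulating repeated measurement of an equal superposition qubit.
    import random

    p0 = 0.7071 ** 2                      # probability of reading |0>
    samples = [0 if random.random() < p0 else 1 for _ in range(10000)]
    print("fraction of 0s:", samples.count(0) / len(samples))   # ~0.5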

Foil 51 Superconducting Technology -- Past

Superconductors produce wonderful "wires" which transmit picosecond (10^-12 second) pulses at near the speed of light
  • Superconducting transmission is lower power and faster than diffusive electron transmission in CMOS
  • At about 0.35 micron chip feature size, CMOS transmission time changes from domination by transmission (distance) issues to resistive (diffusive) effects
Niobium used in constructing such superconducting circuits can be processed by fabrication techniques similar to CMOS
Josephson Junctions allow picosecond performance switches
BUT IBM (1969-1983) and Japan (MITI 1981-90) terminated major efforts in this area

Foil 52 Superconducting Technology -- Present

New ideas have resurrected this concept using RSFQ -- Rapid Single Flux Quantum -- approach
This naturally gives a bit which is 0 or 1 (or in fact n units!)
This gives interesting circuits of similar structure to CMOS systems but with a clock speed of order 100-300GHz -- factor of 100 better than CMOS which will asymptote at around 1 GHz (= one nanosecond cycle time)

Foil 53 Superconducting Technology -- Problems

At least two major problems:
The Semiconductor industry will invest some $40B in CMOS "plants" and infrastructure
  • Currently perhaps $100M a year is going into the superconducting circuit area!
  • How do we "bootstrap" a superconducting industry?
Cannot build memory to match CPU speed, and current designs have superconducting CPU's (with perhaps 256 Kbytes of superconducting memory per processor) but conventional CMOS memory
  • So compared with current computers one has a thousand times faster CPU, a cache a factor of four smaller running at CPU speed, and basic memory of the same speed as now
  • Can such machines perform well -- are new algorithms needed?
  • Can one design new superconducting memories?
Superconducting technology also has a bad "name" due to the IBM termination!
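
To see the size of the gap, a back-of-the-envelope sketch; the clock and latency numbers are illustrative assumptions, not from the foil:

    # How many CPU cycles one memory access costs a superconducting CPU.
    cpu_cycle_ns = 0.01            # ~100 GHz class superconducting CPU
    dram_latency_ns = 100          # conventional CMOS DRAM access
    print(dram_latency_ns / cpu_cycle_ns, "cycles per DRAM access")
    # ~10^4 cycles: an algorithm must keep thousands of independent
    # operations in flight (or in the small superconducting cache) to
    # avoid stalling -- the latency tolerance challenge discussed above.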

Foil 54 IV. Architecture Point Designs & SW Design Studies: Point Design Study:

Sponsored by NSF, DARPA and NASA
Eight awards made of $100,000 each
6 month study of architecture / SW environment / algorithms

Foil 55 IV. Architecture Point Designs & SW Design Studies: 1996 Point Design Awards

Reconfigurable OO architecture
Processor in memory architecture
Algorithmic focus
Hierarchical design
Aggressive cache only architecture
Architecture for N-body problems
Single quantum flux superconducting design
Optical interconnect

Foil 56 The 8 NSF Point Designs

Relatively Conventional but still Innovative!
  • Andrew Chien et al. (Illinois) Reconfigurable MORPH Architecture
  • Torrellas, Padua (Illinois) Aggressive Cache-Only Memory Architecture Multiprocessor (I-Acoma)
  • Ziavras et al. (NJIT, Wayne State) Optical Interconnect with Conventional Processors and Memory
  • Kumar and Sameh (Minnesota) Focus on Algorithms for Hybrid Systems (Clusters of clusters of deep memory hierarchies)
  • Fortes and Taylor (Purdue/Northwestern) Application focus on using Hierarchical Processors and Memory Architecture
Special Purpose Systems
  • McMillan, Hut et al. (Drexel, Princeton, Tokyo, Illinois) Special Purpose GRAPE System for N body Problems
Architecture Innovation (Perhaps Special Purpose)
  • Kogge et al. (Notre Dame) Processor in Memory (PIM) Technology Point Design
Radical Technology Innovation (Superconducting Processors)
  • Sterling et al. (Caltech, SUNY-SB, McGill) HTMT: Hybrid Technology Multi-Threaded Architecture

Foil 57 Architecture of MORPH NSF Petaflop Point Study

See Original Foil

Foil 58 Architecture of I-ACOMA Petaflop Point Study

See Original Foil

Foil 59 Some MetaComputer Systems

Cluster of workstations or PC's
Heterogeneous MetaComputer System

Foil 60 The GRAPE N-Body Machine

N body problems (e.g. Newton's laws for one million stars in a globular cluster) can have successful special purpose devices
See the GRAPE (GRAvity PipE) machine (Sugimoto et al., Nature 345, page 90, 1990)
  • The essential reason is that such problems need much less memory per floating point unit than most problems
  • Globular Cluster: 10^6 computations per datum stored
  • Finite Element Iteration: A few computations per datum stored
  • The rule of thumb is that one needs one gigabyte of memory per gigaflop of computation in general problems, and this general design puts most cost into memory, not into CPU.
Note GRAPE uses EXACTLY the same parallel algorithm that one finds in the books (e.g. Solving Problems on Concurrent Processors) for N-body problems on classic distributed memory MIMD machines

Foil 61 GRAPE architecture in NSF Petaflop Point Study

See Original Foil

Foil 62 GRAPE Processing Unit in NSF Petaflop Point Study

See Original Foil

Foil 63 Why isn't GRAPE a Perfect Solution?

GRAPE will execute the classic O(N^2) (parallel) N body algorithm BUT this is not the algorithm used in most such computations
Rather there is the O(N) or O(N log N) so called "fast multipole" algorithm which uses a hierarchical approach (see the sketch below)
  • On one million stars, fast multipole is a factor of 100-1000 faster than the GRAPE algorithm
  • Fast multipole works in most but not all N-body problems (in globular clusters, extreme heterogeneity makes the direct O(N^2) method most attractive)
So special purpose devices cannot usually take advantage of new nifty algorithms!
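
A small sketch of the operation-count gap on one million stars; raw counts ignore the large constant factors in fast multipole, which is why the foil quotes only a factor of 100-1000 rather than the ~10^4 below:

    # Raw operation counts: direct O(N^2) versus O(N log N) multipole.
    import math

    N = 1_000_000                       # one million stars
    direct = N * N                      # classic method (GRAPE)
    multipole = N * math.log2(N)        # fast multipole, constants ignored
    print(f"direct / multipole ~ {direct / multipole:.0f}")   # ~50000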

Foil 64 Current PIM Chips

Chip          First Silicon   Peak            Storage    MB/Perf.             Organization
EXECUBE       1993            50 Mips         0.5 MB     0.01 MB/Mip          16 bit SIMD/MIMD CMOS
AD SHARC      1994            120 Mflops      0.5 MB     0.005 MB/MF          Single CPU and Memory
TI MVP        1994            2000 Mops       0.05 MB    0.000025 MB/Mop      1 CPU, 4 DSP's
MIT MAP       1996            800 Mflops      0.128 MB   0.00016 MB/MF        4 Superscalar CPU's
Terasys PIM   1993            625 M bit ops   0.016 MB   0.000026 MB/bit op   1024 16-bit ALU's

Foil 65 New "Strawman" PIM Processing Node Macro


Foil 66 "Strawman" Chip Floorplan


Foil 67 SIA-Based PIM Chip Projections

[Charts: SIA-based projections of MB per cm^2, MF per cm^2, and the resulting MB/MF ratios]

Foil 68 Superconducting Architecture in NSF Petaflop Point Study

See Original Foil

Foil 69 IV. Architecture Point Designs & SW Design Studies: Architectural Framework

PetaFLOPS Applications which are grouped into sets with an interface to their own
Problem Solving Environments
Application Level or Virtual Problem Interface ADI
Operating System Services
Multi Resolution Virtual Machine Interfaces joining at lowest levels with
Machine Specific Software
Hardware Systems

Foil 70 IV. Architecture Point Designs & SW Design Studies: Key SW Development Areas

The mission critical applications
Development of shared problem solving environments with rich set of application targeted libraries and resources
Development of common systems software
Programming environments from compilers to multi-level runtime support at the machine independent ADI's
Machine specific software including lowest level of data movement/manipulation

Foil 71 IV. Architecture Point Designs & SW Design Studies: Software Implementation Strategy

Start now on initial studies to explore the possible system architectures.
These "PetaFLOPS software point studies" should be interdisciplinary involving hardware, systems software and applications expertise.

Foil 72 Suggested Software Strategy for JNAC (aka Petaflops) Initiative

August 28 1996
Geoffrey Fox

Foil 73 Some Key Observations on PetaSoft Software

All proposed hardware architectures have a complex memory hierarchy which should be abstracted with a software architecture
  • Consisting of a mix of machine specific and generic levels with well defined ADI's or Abstract Device Interfaces
  • Management of latency, with concurrent threads or otherwise, is critical
This implies a layered software architecture reflected in all components
  • Compiler, Language and Runtime, Tools, Systems Software etc.
The Software Architecture should be defined early on so that hardware and software respect it!
  • The JNAC Architecture Review Board will be responsible for interfaces and for evaluating compliance with them
Users and Compilers must be able to have full control of data movement and placement in all parts of a petaflop system
The Size and Complex Memory Structure of PetaFlop machines represent major challenges in scaling existing Software Concepts

Foil 74 Time for a Software Revolution?

Well the rest of the Software World is Changing with emergence of WebWindows Environment!
Current approaches (HPF,MPI) lack needed capability to address memory hierarchy of either today's or any future contemplated high performance architecture -- whether sequential or parallel
Problem Solving Environments are needed to support complex applications implied by both Web and increasing capabilities of scientific simulations
So I suggest rethinking High Performance Computing Software Models and Implementations!

Foil 75 Architectural Framework from PetaSoft Meeting

PetaFlop Applications which are grouped into sets with an interface to their own
Problem Solving Environments
Application Level or Virtual Problem Interface ADI
Operating System Services
Multi Resolution Virtual Machine Interfaces joining at lowest levels with
Machine Specific Software
Hardware Systems

Foil 76 Hierarchy from Application to Complex Computer

Domain Specific Application Problem Solving Environment
Numerical Objects in (C++/Fortran/C/Java) High Level Virtual Problem
Expose the Coarse Grain Parallelism of the Real Complex Computer
Expose All Levels of Memory Hierarchy of the Real Complex Computer
Virtual Problem / Application ADI
Multi Level Machine ADI
Pure Script (Interpreted)
Semi-Interpreted a la Applets
High Level Language but Optimized Compilation
Machine Optimized RunTime

Foil 77 The Current HPCC Program Execution Model (PEM) illustrated by MPI/HPF

MPI represents data movement with an abstraction of a structure of machines with just two levels of memory
  • On Processor and Off Processor
This was a reasonable model in the past but even today it fails to represent the complex memory structure of a typical microprocessor node
Note the HPF Distribution Model has a similar (to MPI) underlying, relatively simple Abstraction for the PEM

Foil 78 The PetaSoft Program Execution Model

This addresses memory hierarchy intra-processor as well as inter-processor
  • Data Movement and Replication defined between Processors as well as between levels of hierarchy on a given processor
[Diagram: memory hierarchy including Level 1 and Level 2 caches]
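
A hypothetical sketch of what such a PEM could look like; the level names and the move() call below are invented for illustration and are not a PetaSoft-defined API:

    # Data movement between any two named levels of the hierarchy,
    # intra-processor (DRAM -> L2) as well as inter-processor.
    LEVELS = ["registers", "L1", "L2", "DRAM", "remote_DRAM", "disk"]

    def move(block, src, dst):
        assert src in LEVELS and dst in LEVELS
        print(f"move {block}: {src} -> {dst}")

    move("tile A[0:64, 0:64]", "DRAM", "L2")      # stage a tile into cache
    move("halo rows", "DRAM", "remote_DRAM")      # MPI-style exchange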

Foil 79 Some Examples of a Layered Software System

From high level to low level:
  • Application Specific Problem Solving Environment
  • Coarse Grain Coordination Layer e.g. AVS
  • Massively Parallel Modules (libraries) -- such as DAGH, HPF, F77, C, HPC++, HPJava
  • Fortran or C plus generic message passing (get, put) and generic memory hierarchy and locality control
  • Assembly Language plus specific (to architecture) data movement, shared memory and cache control

Foil 80 Features of JNAC Software Implementation Strategy

The main JNAC Program is a mix of both research and development
Development is focused on JNAC machines and identified application areas, along the lines of a Broad Systems Architecture established (evaluated and evolved) by JNAC
Work should start now on initial studies to explore the possible system architectures and
Suggest locations for the "sweet spots" (necks in the hour glass) to define interfaces
These "petaflop software point studies" should be interdisciplinary, involving hardware, systems software and applications expertise

Foil 81 Role of The Architecture Review Board

Establish and Review the Software Architecture and consistent use of Interfaces
[Diagram: the JNAC Architecture Review Board overseeing the Five Software Development Areas]

Foil 82 The Five Key JNAC Software Development Areas

The mission Critical Applications
Development of Approximately 3 shared Problem Solving Environments with rich set of application targeted libraries and resources
  • e.g. PSE's for PDE's, Image Analysis, Forces Modeling
Development of Common Systems Software
Programming Environments from Compilers to multi-level runtime support at the machine independent ADI's
Machine Specific software including lowest level of data movement/manipulation

Foil 83 Examples of Machine Specific Software

code generation
memory management
routing/interconnect
thread management
diagnostics
fault containment
interrupt handling
device drivers

Foil 84 Examples of Operating System Services I

scalable filesystems
networking interfaces
scheduling
HL-memory management
HL-latency management
performance data
debugging tools
intermediate code representations

Foil 85 Examples of Operating System Services II

object files (a.out)
HL-resource management
query of systems state
operating systems services
compiler middleware
basic visualization tools
numerical libraries

Foil 86 General Philosophy from PetaSoft Meeting

Define a "clean" model for machine architecture
  • Memory hierarchy including caches and geometrical (distributed) effects
Define a low level "Program Execution Model" (PEM) which allows one to describe movement of information and computation in the machine
  • This can be thought of as the "MPI"/assembly language of the machine
On top of the low level PEM, one can build a hierarchical (layered) software model
  • At the top of this layered software model, one finds objects or Problem Solving Environments (PSE's)
  • At an intermediate level there is Parallel C, C++ or Fortran
One can program at each layer of the software and augment it by "escaping" to a lower level to improve performance
  • Directives (HPF assertions) and explicit insertion of lower level code (HPF extrinsics) are possible

Foil 87 Features of the Layered Software Model

This is not really a simple stack but a set of complex relations between layers with many interfaces and modules
Interfaces are critical ( for composition across layers)
  • Enable control and performance for application scientists
  • Decouple CS system issues and allow exploration and innovation
[Diagram: higher level abstractions nearer to the application domain enable the next 10000 users; increasing machine detail, control and management serves the first 100 pioneer users]

Foil 88 PetaSoft Findings 1) and 2) -- Memory Hierarchy

1) Deep Memory Hierarchies present New Challenges to High Performance Implementation of programs
  • Latency
  • Bandwidth
  • Capacity
2) There are two dimensions of memory hierarchy management
  • Geometric or Global Structure
  • Local (cache) hierarchies seen from a thread or processor centric view

Foil 89 PetaSoft Findings 3) and 4) -- Using Memory Hierarchy

3) One needs a machine "mode" which supports a predictable and controllable memory system, leading to communication and computation with the same characteristics
  • Allow Compiler optimization
  • Allow Programmer control and optimization
  • For instance high performance would often need full program control of caches
4) One needs a low level software layer which provides direct control of the machine (memory hierarchy etc.) by a user program
  • This is for initial users and program tuning

Foil 90 PetaSoft Findings 5) and 6) -- Layered Software

5) One needs a layered (hierarchical) software model which supports efficient use of multiple levels of abstraction in a single program.
  • Higher levels of the programming model hide extraneous complexity
  • The highest layers are application dependent Problem Solving Environments; the lower levels are machine dependent
  • Lower levels can be accessed for additional performance
  • e.g. HPF extrinsics, GCC inline assembly, Fortran routines called from MATLAB, native methods in Java (one instance is sketched below)
6) One needs a set of software tools which match the layered software (programming model)
  • Debuggers, performance and load balancing tools
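One concrete instance of the escape mechanisms listed under finding 5), sketched here as an illustration (GCC- and x86-specific, not from the foils): dropping from C into inline assembly to read the processor's cycle counter, the sort of machine-level access no portable layer exposes.

/* GCC/x86-specific illustration of "escaping" from C to assembly. */
#include <stdio.h>

/* Read the x86 time-stamp counter via inline assembly. */
static inline unsigned long long rdtsc(void) {
    unsigned int lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
}

int main(void) {
    unsigned long long start = rdtsc();
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; i++) x += 1.0;  /* work to time */
    unsigned long long end = rdtsc();
    printf("cycles: %llu\n", end - start);
    return 0;
}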

Foil 91 PetaSoft Recommendations 1) to 3) Memory and Software Hierarchy

1) Explore issues in the design of petaComputer machine models which will support controllable hierarchical memory systems in a range of important architectures
  • Research and development in the areas of findings 3) and 4)
2) Explore techniques for control of the memory hierarchy for petaComputer architectures
  • Use testbeds
3) Explore issues in designing layered software architectures -- particularly efficient mapping and efficient interfaces to lower levels
  • Use the context of petaflop applications and machines
  • e.g. HPF is a possible layer, while HPF extrinsics are an interface to a lower (MPI) layer

Foil 92 III. Key drivers: Summary

There are compelling applications
New architectures need to be investigated
Component technologies need to be developed
Major advances are needed in system software and tools
Industry is less likely than ever to push limits.

Foil 93 V. A National program concept: Basis

Key, focused R&D must be explicitly funded
The program is mostly development (D), augmented with increases in research (R) in hardware and software.
Advanced systems designed and prototyped by the program.
Development will need strong central management.
Applications tightly coupled with coordinated software development groups.

Foil 94 V. A National program concept: Scope & Strategy

Target dozens of applications (not hundreds)
Hundreds of programmers, not thousands
Deploy PetaFLOPS (PF) class systems in under 10 years
Starting in FY98
Multiple technology options
New technologies and architectures
Balance vendor versus direct development
Open RFPs for future systems

Foil 95 V. A National program concept: Technology Projection Model

Three "tracks" for illustration (might be more or less)
  • current trends (SIA roadmap)
  • new architecture w/commercial process tech.
  • new arch and new process tech.
Deploy systems continuously
Span generations with software model
Pull with RFPs
Push with technology investments

Foil 96 V. A National program concept: Structure & Flow

[Image-only foil: no HTML text available]

Foil 97 V. A National program concept: Technology Model

[Image-only foil: no HTML text available]

Foil 98 V. A National program concept: Research Projects - Technology

Chip Interface:
  • Fabrication Technology
  • Laser & CMOS chip integration
Optical Networks:
  • Pbps bandwidths
  • 1000 ports
Superconducting Memories:
  • 100 billion accesses/sec.
Holographic Memories

Foil 99 V. A National program concept: Research Projects - Architecture

Natural Evolution Systems:
  • Likely path of COTS technology.
Special Purpose Architecture:
  • Driven by Specific algorithms
  • Develop proof of concept.
Hybrid Technology Architecture Development:
  • Exploit advanced technologies.
  • Integrate advanced technologies into system.

Foil 100 V. A National program concept: Research Projects - System Software

PetaFLOPS Languages:
  • Develop a consistent set of languages
  • Develop programming interfaces.
  • Support multiple architecture projects
Operating Systems:
  • Highly scalable.
  • Massive concurrency
  • High bandwidth & virtual memory.
Runtime Systems

Foil 101 V. A National program concept: Research Projects - Applications & Algorithms

Algorithms to reduce the latency associated with petaFLOPS-scale memory hierarchies and processor ensembles (an overlap sketch follows this foil's text)
Driver Applications:
  • Important to national objectives
  • Includes scaling up to petaFLOPS
  • Supports multiple architecture projects
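A standard latency-reduction idiom at the processor-ensemble level is overlapping communication with computation; the following C/MPI sketch (an illustration added here, not from the foils) posts a nonblocking ring exchange, computes on local data while the messages are in flight, and only then waits.

/* Illustrative latency hiding: start the exchange, compute on local
 * data while messages are in flight, then finish. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

static double data[N];   /* zero-initialized local data */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double halo_out = data[0], halo_in = 0.0;
    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* Post nonblocking communication ... */
    MPI_Request reqs[2];
    MPI_Irecv(&halo_in, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&halo_out, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... and compute on interior data while the messages travel. */
    double sum = 0.0;
    for (int i = 1; i < N; i++) sum += data[i];

    /* Only now wait: the communication latency is (ideally) hidden. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    sum += halo_in;

    if (rank == 0) printf("sum = %g\n", sum);
    MPI_Finalize();
    return 0;
}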

Foil 102 V. A National focused program concept: Early Program Milestones

Software-level interface definitions
Projection of performance requirements to lower levels (performance-based design)
Applications analysis with respect to specific programming models (machines)
Experimental testbeds simulated/modeled on existing MPPs

Foil 103 Now we follow with A Comparison of JNAC and HPCC

The next three foils isolate some differences and commonalities between the two programs

Foil 104 Comparison of HPCC and JNAC - I

Both set a hardware goal (teraflop for HPCC and petaflop for JNAC) to focus activity, but in each case systems and applications were the main justification
Both couple applications with software and architectures in multidisciplinary teams with multi-agency support
HPCC was dominantly research
  • JNAC is roughly 50-50 research and development
HPCC inevitably developed MPP's and transferred parallel computing to the computing mainstream
  • JNAC's challenge is the memory hierarchy, and it will transfer understanding of this to the mainstream independently of parallelism

Foil 105 Comparison of HPCC and JNAC - II

HPCC aimed at Grand Challenges in industry, government and academia
  • JNAC is aimed at government (including NSF) mission critical applications
HPCC developed software (PSE's) largely independently in each Grand Challenge
  • JNAC will link software efforts to a few PSE's and a common set of JNAC interfaces
HPCC tended to develop hardware with rapidly changing architectures which software "chased" rather laboriously
  • JNAC develops software simultaneously with hardware, and to a uniform common architecture, allowing better re-use of both application and systems software

Foil 106 Comparison of HPCC and JNAC - III

HPCC aimed to transfer technology to industry for commercialization
  • JNAC relies on industry to build systems designed by laboratory, university and industry consortia
HPCC is Research --> Capitalization --> Product
  • JNAC is mission driven development linked to supporting research, with engineering prototypes as the capitalization stage
HPCC was a broad program aimed at "all" (large scale) users of computers
  • JNAC is a focused program aimed at the "top 100" power users

Foil 107 VI. Future actions necessary to mold an R&D program: The Message

Need to invest in computing at the high end.
PetaFLOPS levels of performance are feasible.
The private sector is not going to do it alone.

Foil 108 VI. Future actions necessary to mold an R&D program: Near Term Recommendation

Conduct detailed PetaFLOPS architecture design & simulation studies.
Initiate early software development of the layered architecture.
Develop PetaFLOPS-scale latency management.
Accelerate R&D in advanced technologies.
Invent algorithms for special purpose and reconfigurable structures.

Foil 109 VI. Future actions necessary to mold an R&D program: Next Steps

The PetaFLOPS Frontier (Oct. 96)
  • Open community, international
PetaFLOPS Algorithms Workshop (Apr. 97)
PetaFLOPS II Conference (Sep. 97)
  • Algorithm-driven architecture
  • Based on Point Designs against components
  • Applications drivers/scenarios
  • Leap Technology elements
Engage community in establishing challenges, directions, topics for research

Foil 110 VI. Future actions necessary to mold an R&D program: PetaFLOPS Algorithms Workshop (PAL'97)

Location: Williamsburg Hospitality House, Williamsburg, VA; April 20-25, 1997
Chair: David Bailey, NASA Ames
Objectives:
  • Identify novel algorithmic approaches that may be better suited to future PetaFLOPS systems.
  • Present quantitative analysis of PetaFLOPS algorithms
  • Plan future PetaFLOPS algorithm research activities.
  • Present and analyze new PetaFLOPS architecture designs.

Foil 111 VI. Future actions: Next Steps -- Integrate into Federal R&D Planning

Coordinate with the High End Computing & Computation (HECC) Working Group.
Develop the Technical Approach -- NOW
Strategy for developing a National Initiative.
Multi-agency efforts.
Federal agencies plan for the FY'98 budget submission

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
