Printing List for CPS615 - Overview of Computer Architectures

Find this at http://www.npac.syr.edu/users/gcf/cps615arch98/

CPS615 - Overview of Computer Architectures

Given by Geoffrey C. Fox at CPS615 Basic Simulation Track for Computational Science in Fall Semester 98. Foils prepared 17 November 1998

This presentation draws on material developed by David Culler and Jack Dongarra that is available on the Web
See the summary by Saleh Elmohamed and Ken Hawick at http://nhse.npac.syr.edu/hpccsurvey/
We discuss several examples in detail including T3E, Origin 2000, Sun E10000 and Tera MTA
These are used to illustrate major architecture types
We discuss key sequential architecture issues including cache structure
We also discuss technologies from today's commodities through Petaflop ideas and Quantum Computing

This mixed presentation uses parts of the following base foilsets which can also be looked at on their own!

CPS615ArchMasterFall98     Master Foilset for HPC Architecture Overview
CPS615Master96             Master Set of Foils for 1996 Session of 
                            CPS615
CPS615Master97             Master Set of Foils for 1997 Session of 
                            CPS615
DynamicWebPagesgivenbyURL  Title and Abstract of FakeFoilset
CPS615-95A                 Master Set A of Overview Material on 
                            Parallel Computing for CPS615 Foils
CPS615-95B                 Master Set B of Overview Material on 
                            Parallel Computing for CPS615 Foils
SmithPetaOverview2         PetaFlop(JNAC) Overview Presentations -- 
                            Results of Studies and Next Steps Sep 
                            19,96
GeneralFoils97             Variety of Foils Used Starting January 97
GeneralResFoils96          Miscellaneous Presentation Material used in
                             1996
CornellHPCCOverview96Master Master Foils for A Short Overview of HPCC 
                            -- From GigaFlops to PetaFlops and From 
                            Tightly Coupled MPP's to the World Wide 
                            Web
KoggePimTalk               Processing-In-Memory (PIM) Architectures 
                            for Very High Performance MPP Computing

Table of Contents for CPS615 - Overview of Computer Architectures

A Brief Discussion of Computer Architectures

Enough to motivate Introductory Technologies
Start with Some Pictures from NPAC

     CPS615ArchMasterFall98 001 001 Computer Architecture for 
                                    Computational Science
     CPS615ArchMasterFall98 002 002 Abstract of Computer Architecture 
                                    Overview
     CPS615ArchMasterFall98 003 003 Some NPAC Parallel Machines

Technologies of Relevance

             CPS615Master96 012 004 Technologies for High Performance 
                                    Computers
             CPS615Master96 013 005 Architectures for High Performance
                                     Computers - I
             CPS615Master96 014 006 Architectures for High Performance
                                     Computers - II
             CPS615Master96 015 007 There is no Best Machine!

Commodity Driving Forces

     CPS615ArchMasterFall98 004 008 Architectural Trends I
     CPS615ArchMasterFall98 005 009 Architectural Trends
     CPS615ArchMasterFall98 006 010 3 Classes of VLSI Design?
             CPS615Master97 011 011 Ames Summer 97 Workshop on Device 
                                    Technology -- Moore's Law - I
             CPS615Master97 012 012 Ames Summer 97 Workshop on Device 
                                    Technology -- Moore's Law - II
             CPS615Master97 013 013 Ames Summer 97 Workshop on Device 
                                    Technology -- Alternate 
                                    Technologies I
             CPS615Master97 014 014 Ames Summer 97 Workshop on Device 
                                    Technology -- Alternate 
                                    Technologies II
     CPS615ArchMasterFall98 007 015 Architectural Trends: Bus-based 
                                    SMPs
     CPS615ArchMasterFall98 008 016 Bus Bandwidth
     CPS615ArchMasterFall98 009 017 Economics

Parallel Computing Architectures

     CPS615ArchMasterFall98 010 018 Important High Performance 
                                    Computing Architectures
     CPS615ArchMasterFall98 011 019 Some General Issues Addressed by 
                                    High Performance Architectures
             CPS615Master96 022 020 Architecture Classes of High 
                                    Performance Computers
             CPS615Master96 028 021 Flynn's Classification of HPC 
                                    Systems

Performance Issues

     CPS615ArchMasterFall98 022 022 Raw Uniprocessor  Performance: 
                                    Cray v. Microprocessor LINPACK n 
                                    by n Matrix Solves
     CPS615ArchMasterFall98 023 023 Raw Parallel  Performance: LINPACK
     CPS615ArchMasterFall98 024 024 Linear Linpack HPC Performance 
                                    versus Time
     CPS615ArchMasterFall98 025 025 Top 10 Supercomputers November 
                                    1998
     CPS615ArchMasterFall98 026 026 Distribution of 500 Fastest 
                                    Computers
     CPS615ArchMasterFall98 027 027 CPU Technology used in Top 500 
                                    versus Time
     CPS615ArchMasterFall98 028 028 Geographical Distribution of Top 
                                    500 Supercomputers versus time
     CPS615ArchMasterFall98 029 029 Node Technology used in Top 500 
                                    Supercomputers versus Time
     CPS615ArchMasterFall98 030 030 Total Performance in Top 500 
                                    Supercomputers versus Time and 
                                    Manufacturer
     CPS615ArchMasterFall98 031 031 Number of Top 500 Systems as a 
                                    function of time and Manufacturer
     CPS615ArchMasterFall98 032 032 Total Number of Top 500 Systems 
                                    Installed June 98 versus 
                                    Manufacturer
  DynamicWebPagesgivenbyURL 003 033 Netlib Benchweb Benchmarks
  DynamicWebPagesgivenbyURL 004 034 Linpack Benchmarks
  DynamicWebPagesgivenbyURL 005 035 Java Linpack Benchmarks
  DynamicWebPagesgivenbyURL 006 036 Java Numerics

Sequential Computer Architecture

             CPS615Master96 023 037 von Neumann Architecture in a 
                                    Nutshell

Pipelining

     CPS615ArchMasterFall98 012 038 What is a Pipeline -- Cafeteria 
                                    Analogy?
                 CPS615-95A 042 039 Instruction Flow in A Simple 
                                    Machine  Pipeline
     CPS615ArchMasterFall98 013 040 Example of MIPS R4000 Floating 
                                    Point
     CPS615ArchMasterFall98 014 041 MIPS R4000 Floating Point Stages

Caches

             CPS615Master96 024 042 Illustration of Importance of 
                                    Cache
     CPS615ArchMasterFall98 015 043 Sequential Memory Structure
     CPS615ArchMasterFall98 016 044 Cache Issues I
     CPS615ArchMasterFall98 017 045 Cache Issues II
     CPS615ArchMasterFall98 018 046 Spatial versus Temporal Locality I
     CPS615ArchMasterFall98 019 047 Spatial versus Temporal Locality 
                                    II

Cray T3E as an Example of a Cache

     CPS615ArchMasterFall98 051 048 Cray/SGI memory latencies
     CPS615ArchMasterFall98 052 049 Architecture of Cray T3E
     CPS615ArchMasterFall98 053 050 T3E Messaging System
     CPS615ArchMasterFall98 054 051 Cray T3E Cache Structure
     CPS615ArchMasterFall98 055 052 Cray T3E Cache Performance
     CPS615ArchMasterFall98 056 053 Finite Difference Example for T3E 
                                    Cache Use I
     CPS615ArchMasterFall98 057 054 Finite Difference Example for T3E 
                                    Cache Use II
     CPS615ArchMasterFall98 058 055 How to use Cache in Example I
     CPS615ArchMasterFall98 059 056 How to use Cache in Example II

Vector Architecture

     CPS615ArchMasterFall98 050 057 Cray Vector  Supercomputers
             CPS615Master96 025 058 Vector Supercomputers in a 
                                    Nutshell - I
             CPS615Master96 026 059 Vector Supercomputing in a picture
             CPS615Master96 027 060 Vector Supercomputers in a 
                                    Nutshell - II

Parallel Memory Structure

             CPS615Master96 029 061 Parallel Computer Architecture 
                                    Memory Structure
             CPS615Master96 030 062 Comparison of Memory Access 
                                    Strategies
             CPS615Master96 031 063 Types of Parallel Memory 
                                    Architectures -- Physical 
                                    Characteristics
             CPS615Master96 032 064 Diagrams of Shared and Distributed
                                     Memories

Parallel Control Structure

             CPS615Master96 033 065 Parallel Computer Architecture 
                                    Control Structure

MIMD Architectures

     CPS615ArchMasterFall98 043 066 Mark2 Hypercube  built by 
                                    JPL(1985) Cosmic Cube (1983)  
                                    built by  Caltech (Chuck Seitz)
     CPS615ArchMasterFall98 044 067 64 Ncube Processors (each with 6 
                                    memory chips) on a large board
     CPS615ArchMasterFall98 045 068 ncube1 Chip -- integrated CPU  and
                                     communication channels
     CPS615ArchMasterFall98 046 069 Example of Message  Passing 
                                    System:  IBM SP-2
     CPS615ArchMasterFall98 047 070 Example of Message  Passing 
                                    System:  Intel Paragon
     CPS615ArchMasterFall98 101 071 ASCI Red -- Intel Supercomputer at
                                     Sandia

Parallel Computing Cache Issues

     CPS615ArchMasterFall98 020 072 Parallel Computer Memory Structure
     CPS615ArchMasterFall98 066 073 Cache Coherent or Not?
     CPS615ArchMasterFall98 021 074 Cache Coherence

Origin 2000 as an Example of Cache Coherence

     CPS615ArchMasterFall98 060 075 SGI Origin 2000 I
     CPS615ArchMasterFall98 061 076 SGI Origin II
     CPS615ArchMasterFall98 062 077 SGI Origin Block Diagram
     CPS615ArchMasterFall98 063 078 SGI Origin III
     CPS615ArchMasterFall98 064 079 SGI Origin 2 Processor Node Board
     CPS615ArchMasterFall98 065 080 Performance of NCSA 128 node SGI 
                                    Origin 2000
     CPS615ArchMasterFall98 067 081 Summary of Cache Coherence 
                                    Approaches

SIMD Architectures

             CPS615Master96 036 082 Some Major Hardware Architectures 
                                    - SIMD
             CPS615Master96 037 083 SIMD (Single Instruction Multiple 
                                    Data) Architecture
     CPS615ArchMasterFall98 093 084 Examples of Some SIMD machines
     CPS615ArchMasterFall98 097 085 SIMD CM 2 from Thinking Machines
     CPS615ArchMasterFall98 098 086 Official  Thinking  Machines  
                                    Specification  of CM2

Metacomputers

             CPS615Master96 038 087 Some Major Hardware Architectures 
                                    - Mixed
             CPS615Master96 039 088 Some MetaComputer Systems
     CPS615ArchMasterFall98 048 089 Clusters of PC's 1986-1998
     CPS615ArchMasterFall98 049 090 HP Kayak PC (300 MHz Intel Pentium
                                     II) vs Origin 2000

Special Purpose Devices

             CPS615Master96 040 091 Comments on Special Purpose 
                                    Devices
             CPS615Master96 041 092 The GRAPE N-Body Machine
             CPS615Master96 042 093 Why isn't GRAPE a Perfect 
                                    Solution?
     CPS615ArchMasterFall98 099 094 GRAPE Special Purpose Machines
     CPS615ArchMasterFall98 100 095 Quantum ChromoDynamics (QCD) 
                                    Special Purpose Machines

Granularity

             CPS615Master96 043 096 Granularity of Parallel Components
                                     - I
             CPS615Master96 044 097 Granularity of Parallel Components
                                     - II

Parallel Computer Networks

             CPS615Master96 045 098 Classes of Communication Networks
             CPS615Master96 046 099 Switch and Bus based Architectures
             CPS615Master96 047 100 Examples of Interconnection 
                                    Topologies
             CPS615Master96 048 101 Useful Concepts in Communication 
                                    Systems

Network Performance

                 CPS615-95B 021 102 Latency and Bandwidth of a Network
                 CPS615-95B 022 103 Transfer Time in Microseconds for 
                                    both Shared Memory Operations and 
                                    Explicit Message Passing
                 CPS615-95B 023 104 Latency/Bandwidth Space for 0-byte
                                     message(Latency) and 1 MB 
                                    message(bandwidth).
             CPS615Master96 049 105 Communication Performance of Some 
                                    MPP's
             CPS615Master96 050 106 Implication of Hardware 
                                    Performance
     CPS615ArchMasterFall98 077 107 MPI Bandwidth on SGI Origin and 
                                    Sun Shared Memory Machines
     CPS615ArchMasterFall98 078 108 Latency Measurements on Origin and
                                     Sun for MPI

Architectures according to Culler

     CPS615ArchMasterFall98 033 109 Two Basic Programming Models
     CPS615ArchMasterFall98 034 110 Shared Address Space Architectures
     CPS615ArchMasterFall98 035 111 Shared Address Space Model
     CPS615ArchMasterFall98 036 112 Communication Hardware
     CPS615ArchMasterFall98 037 113 History -- Mainframe
     CPS615ArchMasterFall98 038 114 History -- Minicomputer
     CPS615ArchMasterFall98 039 115 Scalable Interconnects
     CPS615ArchMasterFall98 040 116 Message Passing Architectures
     CPS615ArchMasterFall98 041 117 Message-Passing Abstraction e.g. 
                                    MPI
     CPS615ArchMasterFall98 042 118 First Message-Passing Machines

Intel SMP

     CPS615ArchMasterFall98 068 119 SMP Example: Intel Pentium Pro 
                                    Quad

Sun E10000 as Example of UMA Commodity System

     CPS615ArchMasterFall98 069 120 Sun E10000 in a Nutshell
     CPS615ArchMasterFall98 070 121 Sun Enterprise Systems E6000/10000
     CPS615ArchMasterFall98 071 122 Starfire E10000 Architecture I
     CPS615ArchMasterFall98 072 123 Starfire E10000 Architecture II
     CPS615ArchMasterFall98 073 124 Sun Enterprise E6000/6500 
                                    Architecture
     CPS615ArchMasterFall98 074 125 Sun's Evaluation of E10000 
                                    Characteristics I
     CPS615ArchMasterFall98 075 126 Sun's Evaluation of E10000 
                                    Characteristics II
     CPS615ArchMasterFall98 076 127 Scalability of E10000

Current Near Term Trends

     CPS615ArchMasterFall98 094 128 Consider Scientific Supercomputing
     CPS615ArchMasterFall98 095 129 Toward Architectural Convergence
     CPS615ArchMasterFall98 096 130 Convergence: Generic Parallel 
                                    Architecture

Emerging Architectures MTA and COMA

     CPS615ArchMasterFall98 079 131 Tera Multithreaded Supercomputer
     CPS615ArchMasterFall98 080 132 Tera Computer at San Diego 
                                    Supercomputer Center
     CPS615ArchMasterFall98 081 133 Overview of the Tera MTA I
     CPS615ArchMasterFall98 082 134 Overview of the Tera MTA II
     CPS615ArchMasterFall98 083 135 Tera 1 Processor Architecture from
                                      H. Bokhari (ICASE)
     CPS615ArchMasterFall98 084 136 Tera Processor Characteristics
     CPS615ArchMasterFall98 085 137 Tera System Diagram
     CPS615ArchMasterFall98 086 138 Interconnect / Communications 
                                    System of Tera I
     CPS615ArchMasterFall98 087 139 Interconnect / Communications 
                                    System of Tera II
     CPS615ArchMasterFall98 088 140 T90/Tera MTA Hardware Comparison
     CPS615ArchMasterFall98 089 141 Tera Configurations / Performance
     CPS615ArchMasterFall98 090 142 Performance of MTA wrt T90 and in 
                                    parallel
     CPS615ArchMasterFall98 091 143 Tera MTA Performance on NAS 
                                    Benchmarks Compared to T90
     CPS615ArchMasterFall98 092 144 Cache Only COMA Machines

Application Motivation for PetaFlops

         SmithPetaOverview2 015 145 III. Key drivers:  The Need for 
                                    PetaFLOPS  Computing
             GeneralFoils97 007 146 10 Possible PetaFlop Applications
          GeneralResFoils96 031 147 Petaflop Performance for Flow in 
                                    Porous Media?
          GeneralResFoils96 032 148 Target Flow in Porous Media 
                                    Problem (Glimm - Petaflop 
                                    Workshop)
          GeneralResFoils96 033 149 NASA's Projection of Memory and 
                                    Computational Requirements up to 
                                    Petaflops for Aerospace 
                                    Applications

The 3 classes of PetaFlop Designs

CornellHPCCOverview96Master 005 150 Supercomputer Architectures in 
                                    Years 2005-2010 -- I
CornellHPCCOverview96Master 006 151 Supercomputer Architectures in 
                                    Years 2005-2010 -- II
CornellHPCCOverview96Master 007 152 Supercomputer Architectures in 
                                    Years 2005-2010 -- III
               KoggePimTalk 037 153 Performance Per Transistor
CornellHPCCOverview96Master 008 154 Comparison of Supercomputer 
                                    Architectures

The Processor in Memory Design

               KoggePimTalk 023 155 Current PIM Chips
               KoggePimTalk 030 156 New "Strawman" PIM Processing Node 
                                    Macro
               KoggePimTalk 031 157 "Strawman" Chip Floorplan
               KoggePimTalk 038 158 SIA-Based PIM Chip Projections

Exotic Technology: Quantum Computing

             CPS615Master96 016 159 Quantum Computing - I
             CPS615Master96 017 160 Quantum Computing - II
             CPS615Master96 018 161 Quantum Computing - III

Exotic Technology: Superconducting Technology

             CPS615Master96 019 162 Superconducting Technology -- Past
             CPS615Master96 020 163 Superconducting Technology -- 
                                    Present
             CPS615Master96 021 164 Superconducting Technology -- 
                                    Problems

List of Foils Used as they occur

CPS615ArchMasterFall98 Master Foilset for HPC Architecture Overview
1 2 3 4 5 6 7 8 9 10 11 22 23 24 25 26 27 28 29 30 31 32 12 13 14 15 16 17 18 19 51 52 53 54 55 56 57 58 59 50 43 44 45 46 47 101 20 66 21 60 61 62 63 64 65 67 93 97 98 48 49 99 100 77 78 33 34 35 36 37 38 39 40 41 42 68 69 70 71 72 73 74 75 76 94 95 96 79 80 81 82 83 84 85 86 87 88 89 90 91 92

CPS615Master96 Master Set of Foils for 1996 Session of CPS615
12 13 14 15 22 28 23 24 25 26 27 29 30 31 32 33 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 16 17 18 19 20 21

CPS615Master97 Master Set of Foils for 1997 Session of CPS615
11 12 13 14

DynamicWebPagesgivenbyURL Title and Abstract of FakeFoilset
3 4 5 6

CPS615-95A Master Set A of Overview Material on Parallel Computing for CPS615 Foils
42

CPS615-95B Master Set B of Overview Material on Parallel Computing for CPS615 Foils
21 22 23

SmithPetaOverview2 PetaFlop(JNAC) Overview Presentations -- Results of Studies and Next Steps Sep 19,96
15

GeneralFoils97 Variety of Foils Used Starting January 97
7

GeneralResFoils96 Miscellaneous Presentation Material used in 1996
31 32 33

CornellHPCCOverview96Master Master Foils for A Short Overview of HPCC -- From GigaFlops to PetaFlops and From Tightly Coupled MPP's to the World Wide Web
5 6 7 8

KoggePimTalk Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing
37 23 30 31 38

Sorted List of Foils Used

CPS615ArchMasterFall98 Master Foilset for HPC Architecture Overview
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101

CPS615Master96 Master Set of Foils for 1996 Session of CPS615
12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

CPS615Master97 Master Set of Foils for 1997 Session of CPS615
11 12 13 14

DynamicWebPagesgivenbyURL Title and Abstract of FakeFoilset
3 4 5 6

CPS615-95A Master Set A of Overview Material on Parallel Computing for CPS615 Foils
42

CPS615-95B Master Set B of Overview Material on Parallel Computing for CPS615 Foils
21 22 23

SmithPetaOverview2 PetaFlop(JNAC) Overview Presentations -- Results of Studies and Next Steps Sep 19,96
15

GeneralFoils97 Variety of Foils Used Starting January 97
7

GeneralResFoils96 Miscellaneous Presentation Material used in 1996
31 32 33

CornellHPCCOverview96Master Master Foils for A Short Overview of HPCC -- From GigaFlops to PetaFlops and From Tightly Coupled MPP's to the World Wide Web
5 6 7 8

KoggePimTalk Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing
23 30 31 37 38

Page produced by wwwfoil on Mon Apr 12 1999