This presentation draws on material developed by David Culler and Jack Dongarra that is available on the Web
See the summary by Saleh Elmohamed and Ken Hawick at http://nhse.npac.syr.edu/hpccsurvey/
We discuss several examples in detail, including the T3E, Origin 2000, Sun E10000 and Tera MTA
These are used to illustrate the major architecture types
We discuss key sequential architecture issues, including cache structure
We also discuss technologies from today's commodities through Petaflop ideas and Quantum Computing
CPS615ArchMasterFall98 Master Foilset for HPC Architecture Overview
CPS615Master96 Master Set of Foils for 1996 Session of CPS615
CPS615Master97 Master Set of Foils for 1997 Session of CPS615
DynamicWebPagesgivenbyURL Title and Abstract of FakeFoilset
CPS615-95A Master Set A of Overview Material on Parallel Computing for CPS615 Foils
CPS615-95B Master Set B of Overview Material on Parallel Computing for CPS615 Foils
SmithPetaOverview2 PetaFlop(JNAC) Overview Presentations -- Results of Studies and Next Steps Sep 19,96
GeneralFoils97 Variety of Foils Used Starting January 97
GeneralResFoils96 Miscellaneous Presentation Material used in 1996
CornellHPCCOverview96Master Master Foils for A Short Overview of HPCC -- From GigaFlops to PetaFlops and From Tightly Coupled MPP's to the World Wide Web
KoggePimTalk Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing
CPS615ArchMasterFall98 001 001 Computer Architecture for Computational Science
CPS615ArchMasterFall98 002 002 Abstract of Computer Architecture Overview
CPS615ArchMasterFall98 003 003 Some NPAC Parallel Machines

CPS615Master96 012 004 Technologies for High Performance Computers
CPS615Master96 013 005 Architectures for High Performance Computers - I
CPS615Master96 014 006 Architectures for High Performance Computers - II
CPS615Master96 015 007 There is no Best Machine!

CPS615ArchMasterFall98 004 008 Architectural Trends I
CPS615ArchMasterFall98 005 009 Architectural Trends
CPS615ArchMasterFall98 006 010 3 Classes of VLSI Design?
CPS615Master97 011 011 Ames Summer 97 Workshop on Device Technology -- Moore's Law - I
CPS615Master97 012 012 Ames Summer 97 Workshop on Device Technology -- Moore's Law - II
CPS615Master97 013 013 Ames Summer 97 Workshop on Device Technology -- Alternate Technologies I
CPS615Master97 014 014 Ames Summer 97 Workshop on Device Technology -- Alternate Technologies II
CPS615ArchMasterFall98 007 015 Architectural Trends: Bus-based SMPs
CPS615ArchMasterFall98 008 016 Bus Bandwidth
CPS615ArchMasterFall98 009 017 Economics

CPS615ArchMasterFall98 010 018 Important High Performance Computing Architectures
CPS615ArchMasterFall98 011 019 Some General Issues Addressed by High Performance Architectures
CPS615Master96 022 020 Architecture Classes of High Performance Computers
CPS615Master96 028 021 Flynn's Classification of HPC Systems

CPS615ArchMasterFall98 022 022 Raw Uniprocessor Performance: Cray v. Microprocessor LINPACK n by n Matrix Solves
CPS615ArchMasterFall98 023 023 Raw Parallel Performance: LINPACK
CPS615ArchMasterFall98 024 024 Linear Linpack HPC Performance versus Time
CPS615ArchMasterFall98 025 025 Top 10 Supercomputers November 1998
CPS615ArchMasterFall98 026 026 Distribution of 500 Fastest Computers
CPS615ArchMasterFall98 027 027 CPU Technology used in Top 500 versus Time
CPS615ArchMasterFall98 028 028 Geographical Distribution of Top 500 Supercomputers versus time
CPS615ArchMasterFall98 029 029 Node Technology used in Top 500 Supercomputers versus Time
CPS615ArchMasterFall98 030 030 Total Performance in Top 500 Supercomputers versus Time and Manufacturer
CPS615ArchMasterFall98 031 031 Number of Top 500 Systems as a function of time and Manufacturer
CPS615ArchMasterFall98 032 032 Total Number of Top 500 Systems Installed June 98 versus Manufacturer
DynamicWebPagesgivenbyURL 003 033 Netlib Benchweb Benchmarks
DynamicWebPagesgivenbyURL 004 034 Linpack Benchmarks
DynamicWebPagesgivenbyURL 005 035 Java Linpack Benchmarks
DynamicWebPagesgivenbyURL 006 036 Java Numerics

CPS615Master96 023 037 von Neumann Architecture in a Nutshell

CPS615ArchMasterFall98 012 038 What is a Pipeline -- Cafeteria Analogy?
CPS615-95A 042 039 Instruction Flow in A Simple Machine Pipeline
CPS615ArchMasterFall98 013 040 Example of MIPS R4000 Floating Point
CPS615ArchMasterFall98 014 041 MIPS R4000 Floating Point Stages

CPS615Master96 024 042 Illustration of Importance of Cache
CPS615ArchMasterFall98 015 043 Sequential Memory Structure
CPS615ArchMasterFall98 016 044 Cache Issues I
CPS615ArchMasterFall98 017 045 Cache Issues II
CPS615ArchMasterFall98 018 046 Spatial versus Temporal Locality I
CPS615ArchMasterFall98 019 047 Spatial versus Temporal Locality II

CPS615ArchMasterFall98 051 048 Cray/SGI memory latencies
CPS615ArchMasterFall98 052 049 Architecture of Cray T3E
CPS615ArchMasterFall98 053 050 T3E Messaging System
CPS615ArchMasterFall98 054 051 Cray T3E Cache Structure
CPS615ArchMasterFall98 055 052 Cray T3E Cache Performance
CPS615ArchMasterFall98 056 053 Finite Difference Example for T3E Cache Use I
CPS615ArchMasterFall98 057 054 Finite Difference Example for T3E Cache Use II
CPS615ArchMasterFall98 058 055 How to use Cache in Example I
CPS615ArchMasterFall98 059 056 How to use Cache in Example II

CPS615ArchMasterFall98 050 057 Cray Vector Supercomputers
CPS615Master96 025 058 Vector Supercomputers in a Nutshell - I
CPS615Master96 026 059 Vector Supercomputing in a picture
CPS615Master96 027 060 Vector Supercomputers in a Nutshell - II

CPS615Master96 029 061 Parallel Computer Architecture Memory Structure
CPS615Master96 030 062 Comparison of Memory Access Strategies
CPS615Master96 031 063 Types of Parallel Memory Architectures -- Physical Characteristics
CPS615Master96 032 064 Diagrams of Shared and Distributed Memories

CPS615Master96 033 065 Parallel Computer Architecture Control Structure

CPS615ArchMasterFall98 043 066 Mark2 Hypercube built by JPL(1985) Cosmic Cube (1983) built by Caltech (Chuck Seitz)
CPS615ArchMasterFall98 044 067 64 Ncube Processors (each with 6 memory chips) on a large board
CPS615ArchMasterFall98 045 068 ncube1 Chip -- integrated CPU and communication channels
CPS615ArchMasterFall98 046 069 Example of Message Passing System: IBM SP-2
CPS615ArchMasterFall98 047 070 Example of Message Passing System: Intel Paragon
CPS615ArchMasterFall98 101 071 ASCI Red -- Intel Supercomputer at Sandia

CPS615ArchMasterFall98 020 072 Parallel Computer Memory Structure
CPS615ArchMasterFall98 066 073 Cache Coherent or Not?
CPS615ArchMasterFall98 021 074 Cache Coherence

CPS615ArchMasterFall98 060 075 SGI Origin 2000 I
CPS615ArchMasterFall98 061 076 SGI Origin II
CPS615ArchMasterFall98 062 077 SGI Origin Block Diagram
CPS615ArchMasterFall98 063 078 SGI Origin III
CPS615ArchMasterFall98 064 079 SGI Origin 2 Processor Node Board
CPS615ArchMasterFall98 065 080 Performance of NCSA 128 node SGI Origin 2000
CPS615ArchMasterFall98 067 081 Summary of Cache Coherence Approaches

CPS615Master96 036 082 Some Major Hardware Architectures - SIMD
CPS615Master96 037 083 SIMD (Single Instruction Multiple Data) Architecture
CPS615ArchMasterFall98 093 084 Examples of Some SIMD machines
CPS615ArchMasterFall98 097 085 SIMD CM 2 from Thinking Machines
CPS615ArchMasterFall98 098 086 Official Thinking Machines Specification of CM2

CPS615Master96 038 087 Some Major Hardware Architectures - Mixed
CPS615Master96 039 088 Some MetaComputer Systems
CPS615ArchMasterFall98 048 089 Clusters of PC's 1986-1998
CPS615ArchMasterFall98 049 090 HP Kayak PC (300 MHz Intel Pentium II) vs Origin 2000

CPS615Master96 040 091 Comments on Special Purpose Devices
CPS615Master96 041 092 The GRAPE N-Body Machine
CPS615Master96 042 093 Why isn't GRAPE a Perfect Solution?
CPS615ArchMasterFall98 099 094 GRAPE Special Purpose Machines
CPS615ArchMasterFall98 100 095 Quantum ChromoDynamics (QCD) Special Purpose Machines

CPS615Master96 043 096 Granularity of Parallel Components - I
CPS615Master96 044 097 Granularity of Parallel Components - II

CPS615Master96 045 098 Classes of Communication Networks
CPS615Master96 046 099 Switch and Bus based Architectures
CPS615Master96 047 100 Examples of Interconnection Topologies
CPS615Master96 048 101 Useful Concepts in Communication Systems

CPS615-95B 021 102 Latency and Bandwidth of a Network
CPS615-95B 022 103 Transfer Time in Microseconds for both Shared Memory Operations and Explicit Message Passing
CPS615-95B 023 104 Latency/Bandwidth Space for 0-byte message(Latency) and 1 MB message(bandwidth).
CPS615Master96 049 105 Communication Performance of Some MPP's
CPS615Master96 050 106 Implication of Hardware Performance
CPS615ArchMasterFall98 077 107 MPI Bandwidth on SGI Origin and Sun Shared Memory Machines
CPS615ArchMasterFall98 078 108 Latency Measurements on Origin and Sun for MPI

CPS615ArchMasterFall98 033 109 Two Basic Programming Models
CPS615ArchMasterFall98 034 110 Shared Address Space Architectures
CPS615ArchMasterFall98 035 111 Shared Address Space Model
CPS615ArchMasterFall98 036 112 Communication Hardware
CPS615ArchMasterFall98 037 113 History -- Mainframe
CPS615ArchMasterFall98 038 114 History -- Minicomputer
CPS615ArchMasterFall98 039 115 Scalable Interconnects
CPS615ArchMasterFall98 040 116 Message Passing Architectures
CPS615ArchMasterFall98 041 117 Message-Passing Abstraction e.g. MPI
CPS615ArchMasterFall98 042 118 First Message-Passing Machines

CPS615ArchMasterFall98 068 119 SMP Example: Intel Pentium Pro Quad

CPS615ArchMasterFall98 069 120 Sun E10000 in a Nutshell
CPS615ArchMasterFall98 070 121 Sun Enterprise Systems E6000/10000
CPS615ArchMasterFall98 071 122 Starfire E10000 Architecture I
CPS615ArchMasterFall98 072 123 Starfire E10000 Architecture II
CPS615ArchMasterFall98 073 124 Sun Enterprise E6000/6500 Architecture
CPS615ArchMasterFall98 074 125 Sun's Evaluation of E10000 Characteristics I
CPS615ArchMasterFall98 075 126 Sun's Evaluation of E10000 Characteristics II
CPS615ArchMasterFall98 076 127 Scalability of E10000

CPS615ArchMasterFall98 094 128 Consider Scientific Supercomputing
CPS615ArchMasterFall98 095 129 Toward Architectural Convergence
CPS615ArchMasterFall98 096 130 Convergence: Generic Parallel Architecture

CPS615ArchMasterFall98 079 131 Tera Multithreaded Supercomputer
CPS615ArchMasterFall98 080 132 Tera Computer at San Diego Supercomputer Center
CPS615ArchMasterFall98 081 133 Overview of the Tera MTA I
CPS615ArchMasterFall98 082 134 Overview of the Tera MTA II
CPS615ArchMasterFall98 083 135 Tera 1 Processor Architecture from H. Bokhari (ICASE)
CPS615ArchMasterFall98 084 136 Tera Processor Characteristics
CPS615ArchMasterFall98 085 137 Tera System Diagram
CPS615ArchMasterFall98 086 138 Interconnect / Communications System of Tera I
CPS615ArchMasterFall98 087 139 Interconnect / Communications System of Tera II
CPS615ArchMasterFall98 088 140 T90/Tera MTA Hardware Comparison
CPS615ArchMasterFall98 089 141 Tera Configurations / Performance
CPS615ArchMasterFall98 090 142 Performance of MTA wrt T90 and in parallel
CPS615ArchMasterFall98 091 143 Tera MTA Performance on NAS Benchmarks Compared to T90
CPS615ArchMasterFall98 092 144 Cache Only COMA Machines

SmithPetaOverview2 015 145 III. Key drivers: The Need for PetaFLOPS Computing
GeneralFoils97 007 146 10 Possible PetaFlop Applications
GeneralResFoils96 031 147 Petaflop Performance for Flow in Porous Media?
GeneralResFoils96 032 148 Target Flow in Porous Media Problem (Glimm - Petaflop Workshop)
GeneralResFoils96 033 149 NASA's Projection of Memory and Computational Requirements up to Petaflops for Aerospace Applications

CornellHPCCOverview96Master 005 150 Supercomputer Architectures in Years 2005-2010 -- I
CornellHPCCOverview96Master 006 151 Supercomputer Architectures in Years 2005-2010 -- II
CornellHPCCOverview96Master 007 152 Supercomputer Architectures in Years 2005-2010 -- III
KoggePimTalk 037 153 Performance Per Transistor
CornellHPCCOverview96Master 008 154 Comparison of Supercomputer Architectures

KoggePimTalk 023 155 Current PIM Chips
KoggePimTalk 030 156 New "Strawman" PIM Processing Node Macro
KoggePimTalk 031 157 "Strawman" Chip Floorplan
KoggePimTalk 038 158 SIA-Based PIM Chip Projections

CPS615Master96 016 159 Quantum Computing - I
CPS615Master96 017 160 Quantum Computing - II
CPS615Master96 018 161 Quantum Computing - III

CPS615Master96 019 162 Superconducting Technology -- Past
CPS615Master96 020 163 Superconducting Technology -- Present
CPS615Master96 021 164 Superconducting Technology -- Problems

CPS615ArchMasterFall98 Master Foilset for HPC Architecture Overview: 1 2 3 4 5 6 7 8 9 10 11 22 23 24 25 26 27 28 29 30 31 32 12 13 14 15 16 17 18 19 51 52 53 54 55 56 57 58 59 50 43 44 45 46 47 101 20 66 21 60 61 62 63 64 65 67 93 97 98 48 49 99 100 77 78 33 34 35 36 37 38 39 40 41 42 68 69 70 71 72 73 74 75 76 94 95 96 79 80 81 82 83 84 85 86 87 88 89 90 91 92
CPS615Master96 Master Set of Foils for 1996 Session of CPS615: 12 13 14 15 22 28 23 24 25 26 27 29 30 31 32 33 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 16 17 18 19 20 21
CPS615Master97 Master Set of Foils for 1997 Session of CPS615: 11 12 13 14
DynamicWebPagesgivenbyURL Title and Abstract of FakeFoilset: 3 4 5 6
CPS615-95A Master Set A of Overview Material on Parallel Computing for CPS615 Foils: 42
CPS615-95B Master Set B of Overview Material on Parallel Computing for CPS615 Foils: 21 22 23
SmithPetaOverview2 PetaFlop(JNAC) Overview Presentations -- Results of Studies and Next Steps Sep 19,96: 15
GeneralFoils97 Variety of Foils Used Starting January 97: 7
GeneralResFoils96 Miscellaneous Presentation Material used in 1996: 31 32 33
CornellHPCCOverview96Master Master Foils for A Short Overview of HPCC -- From GigaFlops to PetaFlops and From Tightly Coupled MPP's to the World Wide Web: 5 6 7 8
KoggePimTalk Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing: 37 23 30 31 38
CPS615ArchMasterFall98 Master Foilset for HPC Architecture Overview: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
CPS615Master96 Master Set of Foils for 1996 Session of CPS615: 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
CPS615Master97 Master Set of Foils for 1997 Session of CPS615: 11 12 13 14
DynamicWebPagesgivenbyURL Title and Abstract of FakeFoilset: 3 4 5 6
CPS615-95A Master Set A of Overview Material on Parallel Computing for CPS615 Foils: 42
CPS615-95B Master Set B of Overview Material on Parallel Computing for CPS615 Foils: 21 22 23
SmithPetaOverview2 PetaFlop(JNAC) Overview Presentations -- Results of Studies and Next Steps Sep 19,96: 15
GeneralFoils97 Variety of Foils Used Starting January 97: 7
GeneralResFoils96 Miscellaneous Presentation Material used in 1996: 31 32 33
CornellHPCCOverview96Master Master Foils for A Short Overview of HPCC -- From GigaFlops to PetaFlops and From Tightly Coupled MPP's to the World Wide Web: 5 6 7 8
KoggePimTalk Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing: 23 30 31 37 38