Full HTML for Basic foilset Introduction to Computational Science

Given by Geoffrey C. Fox at CPS615 Computational Science Class on Spring Semester 2000. Foils prepared 20 February 00
Outside Index Summary of Material


Course Logistics
Course Overview arranged around 3 Exemplar applications with simple and complex versions
Status of High Performance Computing and Computation (HPCC) nationally
Application Driving Forces
Computational Science and Information Technology
  • Internetics
Technology and Commodity Driving Forces
  • Inevitability of Parallelism in different forms

Table of Contents for full HTML of Introduction to Computational Science


1 Introduction to Computational Science
2 Abstract of Computational Science Introduction
3 Basic CPS615 Contact Points
4 Course Organization
5 Books and Registration
6 Material Covered in CPS615- I
7 Structure of CPS615 - II -- More on Principles of Parallel Programming
8 Structure of CPS615 - III
9 Structure of CPS615 - IV
10 Very Useful References
11 Some Comments on Simulation and HPCC
12 More Features of Current HPCC II
13 More Features of Current HPCC III
14 Application Driving Forces
15 Selection of Motivating Applications
16 Units of HPCC
17 Application Motivation I: Earthquakes
18 Application Motivation I: Earthquakes (Contd.)
19 Application Motivation II: Web Search
20 Exemplar III: Database transaction processing
21 Application Motivation IV: Numerical Relativity
22 Application Motivation IV: Numerical Relativity (Contd.)
23 Application Exemplar V: SMOP
24 SMOP: Space Mission Operations Portal
25 Summary of Application Trends
26 Problem Solving Environments I
27 Problem Solving Environments II
28 Problem Solving Environments III
29 Computational Science and Information Technology or Internetics
30 Computational Science and Information Technology (CSIT)?
31 Synergy of Parallel Computing and Web Internetics/CSIT as Unifying Principle
32 Is Computational Science an Academic Discipline?
33 Technology Driving Forces
34 TOP 500 from Dongarra, Meuer, Strohmaier
35 Top 500 Performance versus time 93-99
36 Projected Top 500 Until Year 2009
37 Architecture of Top 500 Computers
38 CPU Chips of the TOP 500
39 The Computing Pyramid
40 Technology Trends -- CPU's
41 General Technology Trends
42 Technology: A Closer Look
43 Clock Frequency Growth Rate
44 Transistor Count Growth Rate
45 Similar Story for Storage
46 Sequential Memory Structure
47 How to use more transistors?
48 Possible Gain from ILP
49 Trends in Parallelism
50 Parallel Computing Rationale
51 Importance of Memory Structure in High Performance Architectures
52 Processor-DRAM Growing Performance Gap (latency)
53 Parallel Computer Memory Structure




HTML version of Basic Foils prepared 20 February 00

Foil 1 Introduction to Computational Science

From Introduction to Computational Science CPS615 Computational Science Class -- Spring Semester 2000. *
Full HTML Index
Spring Semester 2000
Geoffrey Fox
Northeast Parallel Architectures Center
Syracuse University
111 College Place
Syracuse NY
gcf@npac.syr.edu
gcf@cs.fsu.edu

Foil 2 Abstract of Computational Science Introduction

Course Logistics
Course Overview arranged around 3 Exemplar applications with simple and complex versions
Status of High Performance Computing and Computation (HPCC) nationally
Application Driving Forces
Computational Science and Information Technology
  • Internetics
Technology and Commodity Driving Forces
  • Inevitability of Parallelism in different forms

Foil 3 Basic CPS615 Contact Points

Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 3154432163 and Room 3-131 CST
Backup: Nancy McCracken -- njm@npac.syr.edu, 3154434687 and Room 3-234 CST
Grader: Qiang Zheng -- zhengq@npac.syr.edu, 314439209
NPAC administrative support: Nicole Caza -- ncaza@npac.syr.edu, 3154431722 and room 3-206 CST
The class alias cps615spring00@npac.syr.edu
Powers that be alias cps615adm@npac.syr.edu
Home Page is: http://www.npac.syr.edu/projects/cps615spring00
Homework etc.: http://class-server.npac.syr.edu:3768

Foil 4 Course Organization

Graded on the basis of approximately 8 homework sets, each due the Thursday of the week after the day (Monday or Wednesday) it is given out
There will be one project -- which will start after message passing (MPI) discussed
Total grade is 70% homework, 30% project
Languages will be Fortran or C, and Java -- we will do a survey early on to clarify this
All homework will be handled through the web and indeed all computer access will be through the VPL or Virtual Programming Laboratory which gives access to compilers, Java visualization etc. through the web

Foil 5 Books and Registration

Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1997.
Gropp, Lusk and Skjellum, Using MPI, Second Edition 1999, MIT Press.
Please register for CPS615 at the student course records database at

Foil 6 Material Covered in CPS615- I

Status of High Performance Computing and Computation (HPCC) nationally
Application driving forces
  • Some Case Studies -- Importance of algorithms, data and simulations
  • Systems approach: Problem Solving Environments
What is Computational Science nationally, and how does it relate to Information Technology
Technology driving forces
  • Moore's law and exponentially increasing transistors
  • Dominance of Commodity Implementation
Basic Principles of High Performance Systems
  • We have a good methodology but software that trails applications and technologies

Foil 7 Structure of CPS615 - II -- More on Principles of Parallel Programming

Overview of Sequential and Parallel Computer Architectures
  • Comparison of Architecture of World Wide Web and Parallel Systems (Clusters versus Integrated Systems)
Elementary discussion of Parallel Computing in Society and why it must obviously work for computers!
What Features of Applications matter
  • Decomposition, Communication, Irregular, Dynamic ....
Issues of Scaling
What sort of software exists and Programming Paradigms
  • Data parallel, Message Passing

Foil 8 Structure of CPS615 - III

Three Exemplars: Partial Differential Equations (PDE), Particle Dynamics, Matrix Problems
Simple base version of first Example -- Laplace's Equation
  • Illustrate parallel computing and lead to a
General Discussion of Programming Models -- SPMD and Message Passing (MPI) with Fortran, C and Java
Data Parallel (HPF, Fortran90) languages will be discussed later but NOT used
Visualization is important but will not be discussed
Return to First Example: Computational Fluid Dynamics and other PDE based Applications
  • Grid Generation
Return to Parallel Architectures -- Real Systems in more detail, Trends, Petaflops
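The Laplace's Equation base example mentioned above can be sketched compactly. Below is a minimal illustrative Python version (the course itself uses Fortran, C, and Java; the grid size, boundary values, and tolerance are arbitrary choices for the sketch) of Jacobi iteration, which repeatedly replaces each interior point by the average of its four neighbours:

```python
def jacobi_laplace(n, boundary_top=1.0, tol=1e-5, max_iter=10_000):
    """Solve Laplace's equation on an n x n grid: top edge held at
    boundary_top, the other three edges held at zero."""
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[0][j] = boundary_top          # Dirichlet condition on the top row
    for _ in range(max_iter):
        diff = 0.0
        new = [row[:] for row in u]     # Jacobi: update from the old grid
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                new[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])
                diff = max(diff, abs(new[i][j] - u[i][j]))
        u = new
        if diff < tol:                  # largest update small enough: converged
            break
    return u

u = jacobi_laplace(16)
```

In the SPMD/MPI version discussed later, each process would own a strip of rows and exchange its boundary ("ghost") rows with neighbouring processes every iteration; that decomposition is exactly the parallelism the example is meant to illustrate.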

Foil 9 Structure of CPS615 - IV

Second Exemplar: Particle Dynamics
  • Simple Example O(N^2) Law -- best possible parallel performance
  • Real Applications: Astrophysics and Green's functions for Earthquakes with Fast Multipole Solvers
Third Exemplar: Matrix Problems
  • Matrix Multiplication is "too easy"
  • Linear solvers are demanding but use libraries
  • Sparse Solvers are most important in practice
Advanced Software Systems
  • OpenMP and Threads
  • Parallel Compilers and Data Structures
  • Tools -- Debuggers, Performance, Load Balancing
  • Parallel I/O
  • Problem Solving Environments
Application Wrap-up: Integration of Ideas and the Future
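The O(N^2) particle-dynamics exemplar above is easy to sketch. This illustrative Python version (2-D, unit constants, and a made-up softening parameter to avoid division by zero) does the direct pairwise force summation; since every particle interacts with every other, the work splits evenly across processors, which is why this case achieves the best possible parallel performance:

```python
import math

def direct_forces(positions, masses, G=1.0, eps=1e-3):
    """O(N^2) direct summation of pairwise gravitational forces (2-D)."""
    n = len(positions)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi = positions[i]
        for j in range(n):
            if i == j:
                continue                        # no self-interaction
            dx, dy = positions[j][0] - xi, positions[j][1] - yi
            r2 = dx * dx + dy * dy + eps * eps  # softened squared distance
            f = G * masses[i] * masses[j] / (r2 * math.sqrt(r2))
            forces[i][0] += f * dx
            forces[i][1] += f * dy
    return forces

# Two unit masses attract each other with equal and opposite force
forces = direct_forces([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0])
```

The fast multipole methods mentioned for the real astrophysics and earthquake applications replace this double loop with an O(N) or O(N log N) hierarchical approximation.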

Foil 10 Very Useful References

David Bailey and Bob Lucas CS267 Applications of Parallel Computers
Jim Demmel's Parallel Applications Course: http://www.cs.berkeley.edu/~demmel/cs267_Spr99/
Dave Culler's Parallel Architecture course: http://www.cs.berkeley.edu/~culler/cs258-s99/
David Culler and Horst Simon 1997 Parallel Applications: http://now.CS.Berkeley.edu/cs267/
Jack Dongarra High Performance Computing: http://www.cs.utk.edu/~dongarra/WEB-PAGES/index.html
Michael Heath Parallel Numerical Algorithms: http://www.cse.uiuc.edu/cse412/index.html
Willi Schonauer book (hand written): http://www.uni-karlsruhe.de/Uni/RZ/Personen/rz03/book/index.html
Parallel computing at CMU: http://www.cs.cmu.edu/~scandal/research/parallel.html

Foil 11 Some Comments on Simulation and HPCC

HPCC is a maturing field with many organizations installing large scale systems
These include NSF (academic computations) with PACI activity, DoE (Dept of Energy) with ASCI and DoD (Defense) with Modernization
There are new applications with new algorithmic challenges
  • Web Search and Hosting Applications
  • Our work on Black Holes has novel adaptive meshes
  • On earthquake simulation, new "fast multipole" approaches to a problem not tackled this way before
  • On financial modeling, new Monte Carlo methods for complex options
These have many new issues, but typically the issue is integrating new systems and not some new mathematical idea

Foil 12 More Features of Current HPCC II

One counter example (SIAM News December 99) is Akamai Technologies founded by MIT Computer Science theorist Tom Leighton and others
  • Supports optimal mirror sites network topology to minimize Web Page access delays
  • Nasdaq AKAM currently has market capitalization of $20 Billion
Hardware trends reflect both commodity technology and commodity systems
  • SGI/Cray declining in importance and
  • Sun (leveraging commercial Enterprise Servers) gains as do
  • PC/Workstation Clusters typically aimed at specialized applications
  • e.g. Inktomi (market capitalization $10B) uses 100's of PC's to provide Web Search services
Note Sun systems are "pure" Shared Memory and Inktomi are "pure" distributed memory showing architectural focus in two distinct areas with distributed memory mainly supporting specialized services

Foil 13 More Features of Current HPCC III

Software ideas are "sound" but not very easy to use as so far nobody has found a precise way of expressing parallelism in a way that combines:
  • Users think is natural for problem
  • Compilers or other tools can easily interpret
  • Applies to a range of problems
  • Curiously it is "easy" to find good "qualitative" ways of seeing why parallelism is "obvious" etc.
Integration of Simulation and Data is of growing importance
  • Internet technologies good at such integration but don't help parallelization
  • "Computational Grids" or Metacomputing focus on such integration
Problem Solving Environments help bring all components of a problem area into a single interface and
  • Help one access multiple available hosts but
  • again don't directly address parallelism (or more precisely decomposition)

Foil 14 Application Driving Forces

5 Exemplars

Foil 15 Selection of Motivating Applications

Large Scale Simulations in Engineering
  • Model airflow around an aircraft
  • Study environmental issues -- flow of contaminants
  • Forecast weather
  • Oil Industry: Reservoir Simulation and analysis of Seismic data
Large Scale Academic Simulations (Physics, Chemistry, Biology)
  • Study of Evolution of Universe
  • Study of fundamental particles: Quarks and Gluons
  • Study of protein folding
  • Study of catalysts
  • Forecast Earthquakes (has real world relevance)
"Other Critical Real World Applications"
  • Transaction Processing
  • Web Search Engines and Web Document Repositories
  • Run optimization and classification algorithms in datamining of Enterprise Information Systems
  • Model Financial Instruments

Foil 16 Units of HPCC

From Jim Demmel we need to define:

Foil 17 Application Motivation I: Earthquakes

Kobe 1995 Earthquake caused $200 Billion in damage and was quite unexpected -- the big one(s) in California are expected to be worse
Field involves integration of simulation (of earth dynamics) with sensor data (e.g. new GPS satellite measurements of strains, http://www.scign.org) and with information obtained by pick and shovel at the fault line.
  • Laboratory experiments on shaking buildings and measurements of friction between the types of rock material found at faults
Northridge Quake

Foil 18 Application Motivation I: Earthquakes (Contd.)

Technologies include data-mining (is dog barking really correlated with earthquakes) as well as PDE solvers where both finite element and fast multipole methods (for Green's function problems) are important
Multidisciplinary links of ground motion to building response simulations
Applications include real-time estimates of after-shocks used by scientists and perhaps crisis management groups
http://www.npac.syr.edu/projects/gem

Foil 19 Application Motivation II: Web Search

Note Web Search, like transaction analysis, has "obvious" parallelization (over both users and data) with modest synchronization issues
Critical issues are: fault-tolerance (.9999 to .99999 reliability); bounded search time (a fraction of a second); scalability (to the world); fast system upgrade times (days)
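A toy sketch of that "obvious" data parallelism (all names and the round-robin sharding below are illustrative assumptions, not how any real engine works): shard the documents across workers, let each worker scan only its own shard, then merge the per-shard hit lists:

```python
def shard(docs, n_workers):
    """Round-robin the documents across n_workers shards."""
    shards = [[] for _ in range(n_workers)]
    for i, doc in enumerate(docs):
        shards[i % n_workers].append((i, doc))
    return shards

def search_shard(shard_docs, term):
    """Each worker scans only its own shard -- no synchronization needed."""
    return [doc_id for doc_id, text in shard_docs if term in text.split()]

def parallel_search(docs, term, n_workers=4):
    hits = []
    for s in shard(docs, n_workers):   # in a real system these run concurrently
        hits.extend(search_shard(s, term))
    return sorted(hits)

docs = ["parallel computing", "web search engines", "parallel web servers"]
matches = parallel_search(docs, "parallel")   # -> [0, 2]
```

Parallelism over users is the same picture one level up: each query is independent, so a front end can dispatch queries to replicas with no coordination, which is what keeps the synchronization issues modest.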

Foil 20 Exemplar III: Database transaction processing

TPC-C Benchmark Results from March 96
Parallelism is pervasive (more natural in SQL than Fortran)
Small to moderate scale parallelism very important

Foil 21 Application Motivation IV: Numerical Relativity

As with all physical simulations, realistic 3D computations require "Teraflop" (10^12 operations per second) performance
Numerical Relativity just solves the "trivial" Einstein equations Gμν = 8πTμν with indices running over 4 dimensions
Apply to collision of two black holes which are expected to be a major source of gravitational waves for which US and Europe are building major detectors
Unique features include freedom to choose coordinate systems (gauge freedom) in ways that change the nature of the equations
Black Hole has amazing boundary condition that no information can escape from it.
  • Not so clear how to formulate this numerically and involves interplay between computer science and physics
At infinity, one has a "simple" (but numerically difficult) wave equation; near the black hole one finds a very nonlinear system
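For reference, the field equations the foil abbreviates (the Greek indices were garbled in the HTML conversion) read, in geometrized units with G = c = 1:

```latex
G_{\mu\nu} \;=\; R_{\mu\nu} - \tfrac{1}{2}\,R\,g_{\mu\nu} \;=\; 8\pi\,T_{\mu\nu},
\qquad \mu,\nu \in \{0,1,2,3\}
```

Symmetry in μ and ν leaves 10 independent equations, and the gauge freedom mentioned above is the freedom to choose the 4 coordinate functions.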

Foil 22 Application Motivation IV: Numerical Relativity (Contd.)

Fortran90 (array syntax) very attractive to handle equations which are naturally written in Tensor (multi-dimensional) form
12 independent field values defined on a mesh with black holes excised -- non trivial dynamic irregularity as holes rotate and spiral into each other in interesting domain
Irregular dynamic mesh is not so natural in (parallel) Fortran 90 and one needs technology (including distributed data structures like DAGH) to support adaptive finite difference codes.
[Figure: separate holes are simulated until merger; snapshots shown at times 0.7, 6.9, and 13.2]

Foil 23 Application Exemplar V: SMOP

SMOP is Space Mission Operations Portal or Portal to Space Internet
Ground stations are placed internationally (linked to the "Space Internet") and so networking has international scope
Real-time control for monitoring
Dataflow computing model (Khoros) for customized filtering of data
  • This is SIP (Signal and Image Processing) for DoD
Direct connection to hand-held devices for mission or processing status changes
  • alert experts on call (a major expense of space missions is the large number of people needed "just in case" during all or part of a mission)
Illustrates Integration of special needs (Space) with more general infrastructure and the integration of computing with data
  • Note role of Agreed Interfaces in enabling this integration
  • Web Success has shown that Interfaces are critical technology
Current NASA Contract (CSOC) for this is $3.5 Billion over 10 years for a team led by Lockheed Martin

Foil 24 SMOP: Space Mission Operations Portal

[Diagram: Satellite + Sensor(s) --> Relay Station --> (Remote) Ground Station --> Compute Engines, connected through an XML Computing Portals Interface; further Compute Engines (Filter, Monitor, Plan ...) attach through an XML Grid Forum Interface; "or ..." branches mark alternative routes at each stage]

Foil 25 Summary of Application Trends

There is a dynamic interplay between application needing more hardware and hardware allowing new/more applications
Transition to parallel computing has occurred for scientific and engineering computing but this is 1-2% of computer market
  • Integration of Data/Computing
Rapid progress in commercial computing
  • Database and transactions as well as financial modeling/oil reservoir simulation
  • Web servers including multi-media and search growing importance
  • Typically functional or pleasingly parallel
  • Traditionally smaller-scale, but recently large-scale systems of growing use for Internet (world-wide) resources

Foil 26 Problem Solving Environments I

SMOP illustrates a Computing Portal or PSE
From John Rice at http://www.cs.purdue.edu/research/cse/pses
"A PSE is a computer system that provides all the computational facilities needed to solve a target class of problems.
These features include advanced solution methods, automatic and semiautomatic selection of solution methods, and ways to easily incorporate novel solution methods.
Moreover, PSEs use the language of the target class of problems, so users can run them without specialized knowledge of the underlying computer hardware or software.
By exploiting modern technologies such as interactive color graphics, powerful processors, and networks of specialized services, PSEs can track extended problem solving tasks and allow users to review them easily.

Foil 27 Problem Solving Environments II

Overall, PSEs create a framework that is all things to all people: they solve simple or complex problems, support rapid prototyping or detailed analysis, and can be used in introductory education or at the frontiers of science."
PSEs can be traced to the 1963 proposal of Culler and Fried for an "Online Computer Center for Scientific Problems".
Important examples of PSEs are:
  • Matlab, which has been a popular commercial system in the linear algebra and signal processing fields.
  • Khoros is another well-known PSE in the latter field.
  • Purdue also produced a high level interface PDElab to solving 2D and 3D partial differential equations.
The current set of PSEs is built around the observation that Yahoo is a "PSE for the World's Information"

Foil 28 Problem Solving Environments III

Improved computer performance gives one the opportunity to integrate together multiple capabilities
Thus Object Web technology allows one to implement a PSE as a Web Portal to Computational Science
Other essentially equivalent terms to PSEs are Scientific Workbenches or Toolkits.

Foil 29 Computational Science and Information Technology or Internetics

http://www.npac.syr.edu/users/gcf/internetics2/ http://www.npac.syr.edu/users/gcf/internetics/

Foil 30 Computational Science and Information Technology (CSIT)?

Together cover practical use of leading edge computer science technologies to address "real" applications
Two tracks at Syracuse:
  • CPS615/713 (Simulation Track): Large Scale Parallel Computing -- low latency, closely coupled components. Topics include Parallel Algorithms, Performance, Visualization, Fortran90, HPF, MPI, Interconnection Networks, (Parallel) I/O
  • CPS606/616/640/714 (Information Track): World Wide Distributed Computing -- loosely coupled components. Topics include Transactions, Security, Compression, PERL, JavaScript, Multimedia, e-commerce, Wide Area Networks
  • Shared by both tracks: Java, VRML, Collaboration, Integration (middleware), Metacomputing / PSE's, CORBA, Databases

Foil 31 Synergy of Parallel Computing and Web Internetics/CSIT as Unifying Principle

The two forms of Large Scale Computing: scale the computer for the power user, or scale users in proportion to the number of computers
[Diagram: Parallel Computers -- Commodity Information Systems Technology -- Distributed Computers, spanned by <--------------- Internetics Technologies --------------->; Parallel Computer side: HPF, MPI, PDE's; Distributed Computer side: HPJava, HTML, CORBA]

Foil 32 Is Computational Science an Academic Discipline?

There can be no doubt that topics in Computational Science are useful and those in CSIT (Computational Science and Information Technology) are even more useful
  • CSIT incorporates trend towards data intensive computing and networked scientific collaboration
The CSIT technologies are also difficult and involve fundamental ideas
The area is of interest to both those in computer science and application fields
Probably most jobs going to Computer Science graduates really need CSIT education but unfortunately
  • Employers only know about "Computer Science/Engineering"
So most current implementations offer CSIT as a set of courses within existing disciplines
  • Situation could change though as CSIT is "correct"

Foil 33 Technology Driving Forces

The commodity Stranglehold

Foil 34 TOP 500 from Dongarra, Meuer, Strohmaier

http://www.netlib.org/utk/people/JackDongarra/SLIDES/top500-11-99.htm
Here are Top 10 for 1999

Foil 35 Top 500 Performance versus time 93-99

First, Tenth, 100th, 500th, SUM of all 500 versus Time
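The slope of these curves can be estimated with a quick back-of-envelope calculation. The numbers below are approximate, rounded public Linpack figures for the #1 machine (TMC CM-5 on the June 1993 list, ASCI Red on the November 1999 list), quoted from memory and used purely for illustration:

```python
import math

r_1993 = 59.7     # ~Gflops, TMC CM-5, June 1993 (approximate)
r_1999 = 2380.0   # ~Gflops, ASCI Red, November 1999 (approximate)
years = 6.4       # June 1993 to November 1999

factor = r_1999 / r_1993                         # ~40x in ~6.4 years
annual = factor ** (1.0 / years)                 # ~1.8x per year
doubling_years = math.log(2) / math.log(annual)  # ~1.2 years per doubling
per_decade = annual ** 10                        # well over 100x per decade
```

The roughly one-year doubling time is faster than Moore's law for single chips, since the top machines grow in processor count as well as in per-processor speed.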

Foil 36 Projected Top 500 Until Year 2009

First, Tenth, 100th, 500th, SUM of all 500 Projected in Time
Earth Simulator from Japan
http://geofem.tokyo.rist.or.jp/

Foil 37 Architecture of Top 500 Computers

From Jack Dongarra http://www.netlib.org/utk/people/JackDongarra
Shared Memory
(designed distributed memory)

Foil 38 CPU Chips of the TOP 500

From Jack Dongarra http://www.netlib.org/utk/people/JackDongarra

Foil 39 The Computing Pyramid

Bottom of Pyramid has 100 times dollar value and 1000 times compute power of best supercomputer

Foil 40 Technology Trends -- CPU's

The natural building block for multiprocessors is now also about the fastest!

Foil 41 General Technology Trends

Microprocessor performance increases 50% - 100% per year
Transistor count doubles every 3 years
DRAM size quadruples every 3 years
Huge $ investment per generation is carried by commodity PC market
Note that "simple" single-processor performance is plateauing, but parallelism is a natural way to improve it. However, parallelism comes in many different forms: "Data or Thread Parallel" or "Automatic Instruction Parallel"
[Chart: Linpack Performance -- the linear equation solver benchmark]

Foil 42 Technology: A Closer Look

From David Culler again
Basic driver of performance advances is decreasing feature size (λ)
  • Circuits become either faster or lower in power
Die size is growing too
  • Clock rate improves roughly in proportion to the improvement in λ (faster as feature size decreases)
  • Number of transistors improves like λ² (or faster, as die size increases)
Performance increases > 100x per decade; clock rate ~10x, rest of increase is due to transistor count
Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect
But of course you could change this natural "sweet spot"
[Figure: die area split roughly equally among CPU, cache, and interconnect]

Foil 43 Clock Frequency Growth Rate

30% per year

Foil 44 Transistor Count Growth Rate

100 million transistors on a chip by the early 2000s
Transistor count grows faster than clock rate: ~40% per year, an order of magnitude more contribution over two decades
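Compounding this rate with the clock-rate figure from the previous foil shows how the ">100x per decade" performance growth quoted earlier splits between the two contributions:

```python
clock_growth = 1.30        # ~30% per year (clock frequency foil)
transistor_growth = 1.40   # ~40% per year (this foil)

clock_decade = clock_growth ** 10            # ~13.8x per decade
transistor_decade = transistor_growth ** 10  # ~28.9x per decade
```

Roughly an order of magnitude per decade comes from clock rate; the rest of the >100x must come from putting the extra transistors to work, i.e. from parallelism in some form.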

Foil 45 Similar Story for Storage

Divergence between memory capacity and speed more pronounced
  • Capacity increased by 1000x from 1980-95, speed only 2x
  • Gigabit DRAM by c. 2000, but gap with processor speed much greater
Larger memories are slower, while processors get faster
  • Need to transfer more data in parallel
  • Need deeper cache hierarchies
  • How to organize caches?
Parallelism increases effective size of each level of hierarchy, without increasing access time
Parallelism and locality within memory systems too
  • New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface
  • Buffer caches most recently accessed data
Disks too: Parallel disks plus caching

Foil 46 Sequential Memory Structure

Data locality implies CPU finds information it needs in cache which stores most recently accessed information
This means one reuses a given memory reference in many nearby computations e.g.
A1 = B*C
A2 = B*D + B*B
.... Reuses B
[Figure: memory hierarchy -- cache, L3 cache, main memory, disk; capacity increases and speed decreases going down the hierarchy, with a factor of ~100 between processor and main-memory speed]
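A toy direct-mapped cache model makes the locality point concrete (the cache size and line length are invented for illustration): a unit-stride sweep reuses every fetched line, while a large power-of-two stride keeps mapping to the same slot and never hits:

```python
def hit_rate(addresses, n_lines=64, line_words=8):
    """Tiny direct-mapped cache model: an access hits if its memory
    line is already resident in the slot that line maps to."""
    resident = [None] * n_lines
    hits = 0
    for a in addresses:
        line = a // line_words       # which memory line the word is in
        slot = line % n_lines        # direct-mapped placement
        if resident[slot] == line:
            hits += 1
        else:
            resident[slot] = line    # miss: fetch the whole line
    return hits / len(addresses)

seq = list(range(4096))                             # unit-stride sweep
strided = [(i * 512) % 4096 for i in range(4096)]   # stride-512 sweep
```

The sequential sweep hits 7 accesses out of 8 (one miss per 8-word line), while the strided sweep conflicts on a single slot and never hits -- the same reuse the "Reuses B" example above exploits at small scale.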

Foil 47 How to use more transistors?

Parallelism in processing
  • multiple operations per cycle reduces CPI
  • soon thread level parallelism
Cache to give locality in data access
  • avoids latency and reduces CPI
  • also improves processor utilization
Both need (transistor) resources, so tradeoff
ILP (Instruction Level Parallelism) drove the performance gains of sequential microprocessors
The success of ILP was not expected by aficionados of parallel computing, and it "delayed" the relevance of scaling "outer-loop" parallelism, as users simply purchased faster "sequential" machines
CPI = Clock Cycles per Instruction
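The classic performance equation ties these quantities together; a small worked example (illustrative numbers, not from the foil):

```python
# Classic performance equation: time = instructions * CPI / clock rate.
# ILP reduces CPI by issuing multiple instructions per cycle.
def exec_time(instructions, cpi, clock_hz):
    """Execution time in seconds."""
    return instructions * cpi / clock_hz

# Halving CPI at a fixed clock halves execution time,
# e.g. a 10^9-instruction program on a 1 GHz processor:
t1 = exec_time(1e9, 2.0, 1e9)   # 2.0 s at CPI = 2
t2 = exec_time(1e9, 1.0, 1e9)   # 1.0 s at CPI = 1
print(t1 / t2)                  # 2.0x speedup
```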


Foil 48 Possible Gain from ILP

Hardware allowed many instructions per cycle, using the transistor budget for ILP parallelism
Speedup was limited (an average of 2.75 in the study on this foil) and inefficient (50% utilization or worse)
However, it is TOTALLY automatic (compiler generated)


Foil 49 Trends in Parallelism

Thread-level parallelism is the on-chip version of the dominant scaling (data) parallelism


Foil 50 Parallel Computing Rationale

Transistors keep getting cheaper and cheaper, and it takes only some 0.5 million transistors to make a very high quality CPU
Such a chip would have little ILP (or parallelism in "innermost loops")
Thus the next generation of processor chips more or less has to have multiple CPUs, as the gain from ILP is limited
However, getting much more speedup than this requires use of "outer loop" or data parallelism
  • This is naturally implemented with threads on chip
The March of Parallelism: Multiple boards --> Multiple chips on a board --> Multiple CPUs on a chip
This implies that "outer loop" parallel computing gets more and more important in the dominant commodity market
Use of "outer loop" parallelism cannot (yet) be automated
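The "outer loop with threads" pattern can be sketched as follows. This is a structural sketch only: CPython threads will not actually speed up pure-Python arithmetic (the interpreter lock serializes it), but the chunking of an outer index range across threads is exactly the pattern meant here:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the inner-loop body applied to one chunk of the data.
def work(chunk):
    return [x * x for x in chunk]

# "Outer loop" (data) parallelism: split the outer index range into
# chunks, give each chunk to a thread, and concatenate the results.
def parallel_map(data, nthreads=4):
    size = (len(data) + nthreads - 1) // nthreads
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        results = pool.map(work, chunks)  # preserves chunk order
    return [y for part in results for y in part]

print(parallel_map(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The programmer, not the compiler, chose how to split the outer loop, which is the sense in which this parallelism "cannot (yet) be automated".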


Foil 51 Importance of Memory Structure in High Performance Architectures

Memory bandwidth is an essential problem in any computer, as doing more computations per second requires accessing more memory cells per second!
  • Harder for sequential than parallel computers
The key limit is that memory gets slower as it gets larger, so one tries to keep information as near to the CPU as possible (in necessarily small storage)
This data locality is the unifying concept behind caches (sequential) and the multiple memories of parallel computers
The problem is seen in an extreme case for superconducting CPUs, which could be 100X faster than current CPUs but seem to need conventional memory


Foil 52 Processor-DRAM Growing Performance Gap (latency)

From Jim Demmel, originally from David Culler
This implies the need for complex memory systems to hide memory latency
[Figure: performance vs. time, 1980-2000, log scale 1 to 1000; processor ("Moore's Law") performance grows ~60%/yr while DRAM grows ~7%/yr, so the processor-memory performance gap grows ~50% per year]
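Compounding the two growth rates on the chart shows why the gap is so serious; a quick check that the foil's numbers are consistent:

```python
# Processor performance grows ~60%/yr, DRAM ~7%/yr (rates from the foil).
# The ratio between them is how much the gap widens per year, compounded.
def gap(years, cpu_rate=1.60, dram_rate=1.07):
    return (cpu_rate / dram_rate) ** years

print(round(gap(1), 2))   # 1.5 -- the "50% per year" gap growth
print(round(gap(10)))     # 56  -- the gap compounds to ~56x in a decade
```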


Foil 53 Parallel Computer Memory Structure

For both parallel and sequential computers, the cost is accessing remote memories with some form of "communication"
Data locality addresses this in both cases
The differences are quantitative: the size of the effect, and what is done by the user versus what is done automatically
[Figure: parallel computer memory structure; each node has its own L3 cache and main memory, nodes are joined by board-level interconnection networks ("slow"), and boards by a system-level interconnection network ("very slow")]

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Thu Mar 16 2000