Given by Geoffrey C. Fox at the CPS615 Computational Science class in the Spring Semester of 2000. Foils prepared 20 February 00
Spring Semester 2000 |
Geoffrey Fox |
Northeast Parallel Architectures Center |
Syracuse University |
111 College Place |
Syracuse NY |
gcf@npac.syr.edu |
gcf@cs.fsu.edu |
Course Logistics |
Course Overview arranged around 3 Exemplar applications with simple and complex versions |
Status of High Performance Computing and Computation HPCC nationally |
Application Driving Forces |
Computational Science and Information Technology |
Technology and Commodity Driving Forces |
Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 3154432163 and Room 3-131 CST |
Backup: Nancy McCracken -- njm@npac.syr.edu, 3154434687 and Room 3-234 CST |
Grader: Qiang Zheng -- zhengq@npac.syr.edu, 314439209 |
NPAC administrative support: Nicole Caza -- ncaza@npac.syr.edu, 3154431722 and room 3-206 CST |
The class alias cps615spring00@npac.syr.edu |
Powers that be alias cps615adm@npac.syr.edu |
Home Page is: http://www.npac.syr.edu/projects/cps615spring00 |
Homework etc.: http://class-server.npac.syr.edu:3768 |
Graded on the basis of approximately 8 homework sets, which will be due the Thursday of the week following the day (Monday or Wednesday) on which they are given out |
There will be one project -- which will start after message passing (MPI) is discussed |
Total grade is 70% homework, 30% project |
Languages will be Fortran or C and Java -- we will do a survey early on to clarify this |
All homework will be handled through the web and indeed all computer access will be through the VPL or Virtual Programming Laboratory which gives access to compilers, Java visualization etc. through the web |
Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1997 |
Gropp, Lusk and Skjellum, Using MPI, Second Edition, MIT Press, 1999 |
Please register for CPS615 at the student course records database at |
Status of High Performance Computing and Computation HPCC nationally |
Application driving forces |
What is Computational Science Nationally and how does it relate to Information Technology |
Technology driving forces |
Basic Principles of High Performance Systems |
Overview of Sequential and Parallel Computer Architectures |
Elementary discussion of Parallel Computing in Society and why it must obviously work for computers! |
What Features of Applications matter |
Issues of Scaling |
What sort of software exists and Programming Paradigms |
Three Exemplars: Partial Differential Equations (PDE), Particle Dynamics, Matrix Problems |
Simple base version of first Example -- Laplace's Equation |
General Discussion of Programming Models -- SPMD and Message Passing (MPI) with Fortran, C and Java (a small illustrative MPI sketch follows this outline) |
Data Parallel (HPF, Fortran90) languages will be discussed later but NOT used |
Visualization is important but will not be discussed |
Return to First Example: Computational Fluid Dynamics and other PDE based Applications |
Return to Parallel Architectures -- Real Systems in more detail, Trends, Petaflops |
Second Exemplar: Particle Dynamics |
Third Exemplar: Matrix Problems |
Advanced Software Systems |
Application Wrap-up: Integration of Ideas and the Future |
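To make the SPMD message-passing model in the outline above concrete, here is a minimal illustrative sketch in C -- it is not the course's assignment code; the grid size, sweep count and 1-D row decomposition are assumptions chosen for brevity. Every MPI process runs the same program on its own block of rows of the Laplace grid and exchanges ghost rows with its neighbours each Jacobi sweep |

/* Minimal SPMD sketch: Jacobi iteration for Laplace's equation on an N x N grid,
 * rows split across MPI processes, ghost rows exchanged each sweep.
 * Illustrative only: N and NSTEPS are arbitrary; assumes size divides N. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N      64      /* global grid is N x N; boundary fixed at 0 except top row = 1 */
#define NSTEPS 200     /* number of Jacobi sweeps (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = N / size;                    /* rows owned by this process */
    /* local+2 rows: row 0 and row local+1 hold ghost copies of neighbours' edge rows */
    double (*u)[N]    = malloc((local + 2) * sizeof *u);
    double (*unew)[N] = malloc((local + 2) * sizeof *unew);

    for (int i = 0; i < local + 2; i++)
        for (int j = 0; j < N; j++)
            u[i][j] = unew[i][j] = 0.0;
    if (rank == 0)                           /* top physical boundary held at 1.0 */
        for (int j = 0; j < N; j++) u[1][j] = 1.0;

    int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int step = 0; step < NSTEPS; step++) {
        /* exchange ghost rows with neighbours (every process runs the same code: SPMD) */
        MPI_Sendrecv(u[1],         N, MPI_DOUBLE, up,   0,
                     u[local + 1], N, MPI_DOUBLE, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(u[local],     N, MPI_DOUBLE, down, 1,
                     u[0],         N, MPI_DOUBLE, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Jacobi update on interior points; physical top/bottom boundary rows stay fixed */
        int ilo = (rank == 0)        ? 2 : 1;
        int ihi = (rank == size - 1) ? local - 1 : local;
        for (int i = ilo; i <= ihi; i++)
            for (int j = 1; j < N - 1; j++)
                unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
        for (int i = ilo; i <= ihi; i++)
            for (int j = 1; j < N - 1; j++)
                u[i][j] = unew[i][j];
    }

    if (rank == 0) printf("done: %d sweeps on %d processes\n", NSTEPS, size);
    free(u); free(unew);
    MPI_Finalize();
    return 0;
}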
David Bailey and Bob Lucas CS267 Applications of Parallel Computers |
Jim Demmel's Parallel Applications Course: http://www.cs.berkeley.edu/~demmel/cs267_Spr99/ |
Dave Culler's Parallel Architecture course: http://www.cs.berkeley.edu/~culler/cs258-s99/ |
David Culler and Horst Simon 1997 Parallel Applications: http://now.CS.Berkeley.edu/cs267/ |
Jack Dongarra High Performance Computing: http://www.cs.utk.edu/~dongarra/WEB-PAGES/index.html |
Michael Heath Parallel Numerical Algorithms: http://www.cse.uiuc.edu/cse412/index.html |
Willi Schonauer book (hand written): http://www.uni-karlsruhe.de/Uni/RZ/Personen/rz03/book/index.html |
Parallel computing at CMU: http://www.cs.cmu.edu/~scandal/research/parallel.html |
HPCC is a maturing field with many organizations installing large scale systems |
These include NSF (academic computations) with the PACI activity, DoE (Dept of Energy) with ASCI, and DoD (Defense) with its Modernization program |
There are new applications with new algorithmic challenges |
These have many new issues, but typically the issue is integrating new systems and not some new mathematical idea |
One counterexample (SIAM News, December 99) is Akamai Technologies, founded by MIT Computer Science theorist Tom Leighton and others |
Hardware trends reflect both commodity technology and commodity systems |
Note Sun systems are "pure" shared memory and Inktomi systems are "pure" distributed memory, showing architectural focus in two distinct areas, with distributed memory mainly supporting specialized services |
Software ideas are "sound" but not very easy to use, as so far nobody has found a precise way of expressing parallelism in a way that combines: |
Integration of Simulation and Data is of growing importance |
Problem Solving Environments help bring all components of a problem area into a single interface and |
5 Exemplars |
Large Scale Simulations in Engineering |
Large Scale Academic Simulations (Physics, Chemistry, Biology) |
"Other Critical Real World Applications" |
From Jim Demmel we need to define: |
Kobe 1995 Earthquake caused $200 Billion in damage and was quite unexpected -- the big one(s) in California are expected to be worse |
Field involves integration of simulation (of earth dynamics) with sensor data (e.g. new GPS satellite measurements of strains, http://www.scign.org) and with information gotten from pick and shovel at the fault line |
Northridge Quake |
Technologies include data-mining (is dog barking really correlated with earthquakes?) as well as PDE solvers, where both finite element and fast multipole methods (for Green's function problems) are important |
Multidisciplinary links of ground motion to building response simulations |
Applications include real-time estimates of after-shocks used by scientists and perhaps crisis management groups |
http://www.npac.syr.edu/projects/gem |
Note Web Search, like transaction analysis, has "obvious" parallelization (over both users and data) with modest synchronization issues |
Critical issues are: fault-tolerance (.9999 to .99999 reliability); bounded search time (a fraction of a second); scalability (to the world); fast system upgrade times (days) |
TPC-C Benchmark Results from March 96 |
Parallelism is pervasive (more natural in SQL than Fortran) |
Small to moderate scale parallelism very important |
As with all physical simulations, realistic 3D computations require "Teraflop" (10^12 operations per second) performance |
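A rough back-of-the-envelope calculation makes the Teraflop claim concrete; the mesh size, operations per point and step count below are illustrative assumptions, not figures from the course |

/* Rough illustrative estimate (assumed numbers) of why realistic 3D simulations
 * push toward Teraflop (10^12 operations per second) machines. */
#include <stdio.h>

int main(void)
{
    double points_per_dim = 1.0e3;   /* assume a 1000^3 mesh                        */
    double grid_points    = points_per_dim * points_per_dim * points_per_dim;  /* 1e9 */
    double flops_per_pt   = 1.0e3;   /* assume ~1000 operations per point per step  */
    double steps          = 1.0e4;   /* assume 10,000 time steps                    */
    double total_flops    = grid_points * flops_per_pt * steps;   /* ~1e16 operations */

    double teraflop = 1.0e12;        /* 10^12 operations per second                 */
    printf("total operations ~ %.1e\n", total_flops);
    printf("time at 1 Teraflop ~ %.1f hours\n", total_flops / teraflop / 3600.0);
    printf("time at 1 Gigaflop ~ %.0f days\n",  total_flops / 1.0e9 / 86400.0);
    return 0;
}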
Numerical Relativity just solves the "trivial" Einstein equations Gμν = 8πTμν with indices running over 4 dimensions |
Apply to collision of two black holes which are expected to be a major source of gravitational waves for which US and Europe are building major detectors |
Unique features include the freedom to choose coordinate systems (gauge freedom) in ways that change the nature of the equations |
A Black Hole has the amazing boundary condition that no information can escape from it |
At infinity, one has a "simple" (but numerically difficult) wave equation; near the black hole one finds a very nonlinear system |
Fortran90 (array syntax) very attractive to handle equations which are naturally written in Tensor (multi-dimensional) form |
12 independent field values defined on a mesh with black holes excised -- non trivial dynamic irregularity as holes rotate and spiral into each other in interesting domain |
Irregular dynamic mesh is not so natural in (parallel) Fortran 90 and one needs technology (including distributed data structures like DAGH) to support adaptive finite difference codes. |
Separate holes are simulated until merger (figure: snapshots at times 0.7, 6.9 and 13.2) |
SMOP is Space Mission Operations Portal or Portal to Space Internet |
Ground stations are placed internationally (linked to the "Space Internet") and so networking has international scope |
Real-time control for monitoring |
Dataflow computing model (Khoros) for customized filtering of data |
Direct connection to hand-held devices for mission or processing status changes |
Illustrates integration of special needs (Space) with more general infrastructure and the integration of computing with data |
Current NASA Contract (CSOC) for this is $3.5 Billion over 10 years for a team led by Lockheed Martin |
(diagram: Satellite + Sensor(s), Relay Station, alternative (Remote) Ground Stations, Compute Engines (Filter, Monitor, Plan ..), an XML Computing Portals Interface and an XML Grid Forum Interface) |
There is a dynamic interplay between application needing more hardware and hardware allowing new/more applications |
Transition to parallel computing has occurred for scientific and engineering computing but this is 1-2% of the computer market |
Rapid progress in commercial computing |
SMOP illustrates a Computing Portal or PSE |
From John Rice at http://www.cs.purdue.edu/research/cse/pses |
"A PSE is a computer system that provides all the computational facilities needed to solve a target class of problems. |
These features include advanced solution methods, automatic and semiautomatic selection of solution methods, and ways to easily incorporate novel solution methods. |
Moreover, PSEs use the language of the target class of problems, so users can run them without specialized knowledge of the underlying computer hardware or software. |
By exploiting modern technologies such as interactive color graphics, powerful processors, and networks of specialized services, PSEs can track extended problem solving tasks and allow users to review them easily. |
Overall, PSEs create a framework that is all things to all people: they solve simple or complex problems, support rapid prototyping or detailed analysis, and can be used in introductory education or at the frontiers of science." |
PSEs can be traced to the 1963 proposal of Culler and Fried for an "Online Computer Center for Scientific Problems". |
Important examples of PSEs are: |
The current set of PSEs is built around the observation that Yahoo is a "PSE for the World's Information" |
Improved computer performance gives one the opportunity to integrate together multiple capabilities |
Thus Object Web technology allows one to implement a PSE as a Web Portal to Computational Science |
Other essentially equivalent terms to PSEs are Scientific Workbenches or Toolkits. |
http://www.npac.syr.edu/users/gcf/internetics2/ http://www.npac.syr.edu/users/gcf/internetics/ |
Together cover practical use of leading edge computer science technologies to address "real" applications |
Two tracks at Syracuse |
CPS615/713: Simulation Track |
CPS606/616/640/714: Information Track |
(diagram: Large Scale Parallel Computing with low latency, closely coupled components on one side and World Wide Distributed Computing with loosely coupled components on the other; the technologies spread between them include Parallel Algorithms, Performance, Visualization, Fortran90, HPF, MPI, Interconnection Networks, (Parallel) I/O, Transactions, Security, Compression, PERL, JavaScript, Multimedia, e-commerce, Wide Area Networks, Java, VRML, Collaboration, Integration (middleware), Metacomputing / PSE's, CORBA and Databases) |
(diagram: the two forms of Large Scale Computing -- the Parallel Computer (scale computer for power user; HPF, MPI, PDE's, HPJava) and the Distributed Computer (scale users in proportion to number of computers; HTML, CORBA) -- with Internetics Technologies spanning from Parallel Computing to Commodity Distributed Information Systems Technology) |
There can be no doubt that topics in Computational Science are useful and those in CSIT (Computational Science and Information Technology) are even more useful |
The CSIT technologies are also difficult and involve fundamental ideas |
The area is of interest to both those in computer science and those in application fields |
Probably most jobs going to Computer Science graduates really need CSIT education but unfortunately |
So most current implementations make CSIT a set of courses within existing disciplines |
The commodity Stranglehold |
http://www.netlib.org/utk/people/JackDongarra/SLIDES/top500-11-99.htm |
Here are Top 10 for 1999 |
First, Tenth, 100th, 500th, SUM of all 500 versus Time |
First, Tenth, 100th, 500th, SUM of all 500 Projected in Time |
Earth Simulator from Japan |
http://geofem.tokyo.rist.or.jp/ |
From Jack Dongarra http://www.netlib.org/utk/people/JackDongarra |
Shared Memory (designed distributed memory) |
From Jack Dongarra http://www.netlib.org/utk/people/JackDongarra |
Bottom of Pyramid has 100 times dollar value and 1000 times compute power of best supercomputer |
The natural building block for multiprocessors is now also about the fastest! |
Microprocessor performance increases 50% - 100% per year |
Transistor count doubles every 3 years |
DRAM size quadruples every 3 years |
Huge $ investment per generation is carried by commodity PC market |
Note that "simple" single-processor performance is plateauing, but that parallelism is a natural way to improve it. But many different forms of parallelism "Data or Thread Parallel" or "Automatic Instruction Parallel" |
Linpack Performance |
Linear Equation Solver Benchmark |
From David Culler again |
Basic driver of performance advances is decreasing feature size (λ) |
Die size is growing too |
Performance increases > 100x per decade; clock rate improvement accounts for part of this, and the rest of the increase is due to transistor count |
Current microprocessor: 1/3 compute, 1/3 cache, 1/3 off-chip connect |
But of course you could change this natural "sweet spot" |
(diagram: the chip area divided among CPU, Cache and Interconnect, as in the 1/3 split above; clock rate grows about 30% per year) |
100 million transistors on chip by early 2000's A.D. |
Transistor count grows faster than clock rate -- 40% per year, an order of magnitude more contribution in 2 decades |
Divergence between memory capacity and speed more pronounced |
Larger memories are slower, while processors get faster |
Parallelism increases effective size of each level of hierarchy, without increasing access time |
Parallelism and locality within memory systems too |
Disks too: Parallel disks plus caching |
Data locality implies CPU finds information it needs in cache which stores most recently accessed information |
This means one reuses a given memory reference in many nearby computations e.g. |
A1 = B*C |
A2 = B*D + B*B |
.... Reuses B |
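A standard way to exploit this kind of reuse in practice is cache blocking; the sketch below (matrix size and block size are illustrative assumptions, not course code) keeps a small tile of B in cache and reuses it many times before fetching the next tile |

/* Sketch of cache blocking: each BS x BS tile of B is brought into cache once and
 * reused for a whole block of C updates instead of being re-fetched from main memory.
 * N and BS are illustrative; BS is chosen so a few tiles fit comfortably in cache. */
#include <stdlib.h>

#define N  512
#define BS 64

void matmul_blocked(const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < N; ii += BS)
        for (int kk = 0; kk < N; kk += BS)
            for (int jj = 0; jj < N; jj += BS)
                /* work on one tile: B[kk..kk+BS)[jj..jj+BS) stays in cache and is reused */
                for (int i = ii; i < ii + BS; i++)
                    for (int k = kk; k < kk + BS; k++) {
                        double a = A[i * N + k];   /* A element reused across the j loop */
                        for (int j = jj; j < jj + BS; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

int main(void)
{
    double *A = calloc(N * N, sizeof *A);
    double *B = calloc(N * N, sizeof *B);
    double *C = calloc(N * N, sizeof *C);
    matmul_blocked(A, B, C);
    free(A); free(B); free(C);
    return 0;
}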
(diagram: memory hierarchy continuing through L3 Cache, Main Memory and Disk -- increasing memory capacity and decreasing memory speed, with roughly a factor of 100 difference between processor and main memory speed) |
Parallelism in processing |
Cache to give locality in data access |
Both need (transistor) resources, so there is a tradeoff |
ILP (Instruction Level Parallelism) drove performance gains of sequential microprocessors |
ILP success was not expected by aficionados of parallel computing and this "delayed" the relevance of scaling "outer-loop" parallelism, as users just purchased faster "sequential machines" |
CPI = Clock Cycles per Instruction |
Hardware allowed many instructions per cycle using transistor budget for ILP parallelism |
Limited Speed up (average 2.75 below) and inefficient (50% or worse) |
However TOTALLY automatic (compiler generated) |
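Since the argument rests on the CPI relation, here is a tiny worked sketch (instruction count, clock rate and CPI values are purely illustrative assumptions): execution time = instructions x CPI / clock rate, so ILP speedup appears as a lower effective CPI |

/* Illustrative numbers only: execution time = instructions * CPI / clock rate,
 * so completing several instructions per cycle (ILP) lowers the effective CPI. */
#include <stdio.h>

int main(void)
{
    double instructions = 1.0e9;      /* assumed program size: 10^9 instructions            */
    double clock_hz     = 500.0e6;    /* assumed 500 MHz clock (circa 2000)                 */
    double cpi_serial   = 1.0;        /* one instruction completed per cycle                */
    double cpi_ilp      = 1.0 / 2.75; /* ~2.75 instructions per cycle, matching the average
                                         ILP speedup quoted above                           */

    double t_serial = instructions * cpi_serial / clock_hz;
    double t_ilp    = instructions * cpi_ilp    / clock_hz;

    printf("serial: %.2f s, with ILP: %.2f s, speedup %.2fx\n",
           t_serial, t_ilp, t_serial / t_ilp);
    return 0;
}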
Thread level parallelism is the on chip version of dominant scaling (data) parallelism |
Transistors are still getting cheaper and cheaper and it only takes some 0.5 million transistors to make a very high quality CPU |
This chip would have little ILP (or parallelism in "innermost loops") |
Thus the next generation of processor chips more or less have to have multiple CPU's, as the gain from ILP is limited |
However getting much more speedup than this requires use of "outer loop" or data parallelism |
The March of Parallelism: Multiple boards --> Multiple chips on a board --> Multiple CPU's on a chip |
Implies that "outer loop" Parallel Computing gets more and more important in dominant commodity market |
Use of "Outer Loop" parallelism can not (yet) be automated |
Actually memory bandwidth is an essential problem in any computer as doing more computations per second requires accessing more memory cells per second! |
Key limit is that memory gets slower as it gets larger and one tries to keep information as near to CPU as possible (in necessarily small size storage) |
This data locality is the unifying concept behind caches (on sequential machines) and the multiple memory accesses of parallel computers |
Problem seen in extreme case for Superconducting CPU's which can be 100X current CPU's but seem to need to use conventional memory |
From Jim Demmel -- remaining from David Culler |
This implies need for complex memory systems to hide memory latency |
(chart: "Moore's Law" -- processor (µProc) performance grows about 60%/yr while DRAM performance grows about 7%/yr over 1980-2000, plotted on a log scale from 1 to 1000, so the Processor-Memory Performance Gap grows about 50% / year) |
For both parallel and sequential computers, cost is accessing remote memories with some form of "communication" |
Data locality addresses this cost in both cases |
The differences are the quantitative size of the effect and what is done by the user versus what is done automatically |
(diagram: several nodes, each with an L3 Cache and Main Memory, joined by Board Level Interconnection Networks and a System Level Interconnection Network; accesses across these networks are Slow and Very Slow respectively) |