Given by Geoffrey C. Fox in the Basic Simulation Track for Computational Science, CPS615, Fall Semester 97. Foils prepared 25 August 1997
Summary of Material
This introduces the course, which covers the essential programming and algorithmic skills needed for large scale (serious) computing, also known as simulation |
We start with an overview of the course itself and describe how it compares to other computational science offerings |
In the introductory material, we describe some large simulations of interest (the so-called Grand Challenges) |
An overview of technology trends driving the computer industry (in web and simulation areas!) |
An overview of the computational science education program here and elsewhere |
Parallel Computing in Society and why it works in computers! |
http://www.npac.syr.edu/users/gcf/cps615intro97 |
Geoffrey Fox |
Syracuse University NPAC |
111 College Place, Syracuse NY 13244-4100 |
315-443-2163 |
Practical use of leading-edge computer science technologies to address "real" applications |
CPS615/713 (Simulation Track): Large Scale Parallel Computing -- low latency, closely coupled components |
Technologies: Parallel Algorithms, Fortran90, HPF, MPI, Interconnection Networks |
CPS606/616/640/714 (Information Track): World Wide Distributed Computing -- loosely coupled components |
Technologies: Transactions, Security, Compression, PERL, JavaScript, Multimedia, Wide Area Networks, Java, VRML, Collaboration, Integration (middleware), CORBA, Databases |
These ideas come from Xiaoming Li, Peking University |
CPS615 |
Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 315-443-2163, Room 3-131 CST |
Backup: Nancy McCracken -- njm@npac.syr.edu, 315-443-4687, Room 3-234 CST |
NPAC administrative support: Nora Downey-Easter -- nora@npac.syr.edu, 315-443-1722, Room 3-206 CST |
All of the above can be reached at cps615ad@npac.syr.edu |
Students will be on alias cps615@npac.syr.edu |
Homepage is: http://www.npac.syr.edu/projects/cps615fall97 |
There is a slightly out-of-date paper, "Basic Issues and Current Status of Parallel Computing" by Fox (SCCS-736 in the NPAC technical reports) |
Grading is based on approximately 7 homework sets, each due on the Thursday of the week after the day (a Tuesday or Thursday) it is given out |
There will be two projects -- one will start after message passing (MPI) is discussed, the other about 60% of the way through the course |
The total grade is 50% homework and 50% the two projects combined |
All homework will be handled through the web; indeed, all computer access will be through VPL, the Virtual Programming Laboratory, which gives access to compilers, Java visualization, etc. through the web |
Status of High Performance Computing and Communications (HPCC) nationally |
What is Computational Science nationally (and at Syracuse) |
Technology driving forces (until around 2010) |
Elementary discussion of Parallel Computing in Society and why it must obviously work in simulation! |
Sequential and Parallel Computer Architectures |
Comparison of Architecture of World Wide Web and Parallel Systems |
Simple base example -- Laplace's Equation (see the Jacobi sketch after this outline) |
Then we discuss software and algorithms (mini-applications) intermixed |
Programming Models -- SPMD and Message Passing (MPI), Software Integration middleware (Java, WebFlow), Data Parallel (HPF, Fortran90) |
Visualization -- Java Applets |
Other tools: Collaboration, Performance Measurement |
Large Scale Simulations in Engineering |
Large Scale Academic Simulations (Physics, Chemistry, Biology) |
"Other Critical Real World Applications" |
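As a preview of the Laplace base example mentioned above, here is a minimal sketch of Jacobi iteration on a regular grid. It is purely illustrative -- pure Python, with grid size and iteration count chosen arbitrarily; the course itself develops this example with Fortran90, HPF and MPI |

    # Jacobi iteration for Laplace's equation on an N x N grid:
    # interior points relax toward the average of their four neighbors.
    N = 16
    u = [[0.0] * N for _ in range(N)]
    for j in range(N):
        u[0][j] = 1.0            # fix one boundary edge at potential 1.0

    for step in range(200):      # arbitrary iteration count
        new = [row[:] for row in u]
        for i in range(1, N - 1):
            for j in range(1, N - 1):
                new[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                    u[i][j-1] + u[i][j+1])
        u = new

    print(u[N // 2][N // 2])     # potential at the center of the grid

Each interior point depends only on its neighbors' previous values, which is why this update parallelizes naturally over subdomains of the grid |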
f is the feature size -- the width of the basic lines on a chip; it is now 0.25 microns (250 nm) and will decrease inexorably to 100 nm in the 2005-2010 time period |
Roughly, the number of transistors is proportional to (1/f)^2 |
The speed of the chip is proportional to (1/f) |
The overall figure of merit is therefore (1/f)^3 |
Die size is also increasing like (1/f), and this enhances the effect! |
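A quick arithmetic check of these scaling laws, using the 250 nm and 100 nm feature sizes quoted in these foils (a sketch; the exponents are exactly those stated above) |

    r = 250.0 / 100.0                        # linear shrink factor: 2.5x
    print("transistors:     %.2fx" % r**2)   # ~6.25x more transistors
    print("clock speed:     %.2fx" % r)      # ~2.5x faster
    print("figure of merit: %.2fx" % r**3)   # ~15.6x overall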
These foils come from a summary by David Bailey |
First of all, there seems to be general agreement that Moore's Law will not stop anytime soon. |
The current state of the art in the semiconductor industry is 250 nm, as recently announced by Intel among others. Researchers are confident that the current approach can be used for at least another three generations. |
In the years ahead we may even see some manufacturers skip a generation, proceeding directly to significantly smaller feature sizes. This means that the 100 nm technology wall will be reached earlier than previously anticipated. |
Below about 100 nm feature sizes, progress will be more difficult, for various well-known reasons, among them the lack of any substance that can be used as a lens for photolithography at such wavelengths. |
Nonetheless, there are a number of "tricks" in the works, including the use of diffraction gratings, parallel exposures, and others. |
Groups working on these "ultimate" silicon techniques include Frances Houle's group at IBM Almaden and Jeff Baker's group at UC Berkeley. |
Also helping will be some improvements such as better materials and increased wafer sizes. The consensus is that these techniques will be good for another two generations, to about 50 nm. Thus Moore's Law should continue until about 2010 or 2012. |
Below about 50 nm feature sizes, it appears that a completely new device fabrication technology is needed. |
Progress in electron beams and X-rays, which were the leading candidates of a few years ago, has stalled. |
The researchers I spoke with nonetheless agree that future devices will be electronic for the foreseeable future. |
As Stan Williams of HP points out, electrons are the ideal basis for device technology because they have basically zero size and mass, can travel at large fractions of the speed of light and interact strongly with matter. |
In contrast, DNA computing and the like are still curiosities with no clear path to practical high-speed computing. |
One exciting development, which has emerged just in the past year or two, is nanotubes, i.e. cylindrical buckyballs. |
It was recently discovered that a nanotube can be "tuned" from insulator to semiconductor to conductor just by changing the pitch of the helical structure of carbon atoms. |
Kinks introduced in a tube can be used to form a conductor-semiconductor junction. |
More than one person with whom Bailey talked emphasized that future research should not focus exclusively on fabrication technology. |
An equally important question is what is worth fabricating at these small feature sizes. This is because conventional transistor-based logic appears increasingly troublesome below about 50 nm. |
Thus a new paradigm in logic design is as much needed as a new paradigm in fabrication technology. |
Several at the meeting were aware of Likharev's rapid single flux quantum (RSFQ) superconducting logic. They agree that it is a very interesting development, and further that it is fairly certain to work well. |
However, they feel that the requirement for liquid helium temperatures is a severe impediment to widespread utilization, and without commodity production it is unlikely to be economic. |
According to Stan Williams, Likharev recently came to Hewlett-Packard. |
He offered HP all of his patents, only asking that they develop the technology. |
But HP management still declined. Williams quipped that if some company decided to produce RSFQ technology, somehow assembling the $100 million or so required in start-up capital, then the day after it produced its first devices the technology would be pronounced a great success, but on the second day the company would fold financially. |
One interesting possibility for a future logic system is "quantum dot cellular automata", abbreviated QCA. |
Wolfgang Porod of the University of Notre Dame gave a talk on these devices at the Ames workshop. His group has proposed configurations of cells, each of which is a collection of five electron sites arranged in an X, with two free electrons within the cell. |
According to their studies, configurations of these cells can propagate signals, invert signals, form majority vote logic, and thus can be formed to produce practical devices. |
The devices are semi-classical, semi-quantum, in that interactions within a cell can involve quantum tunneling, but interactions between cells involve only classical electromagnetic effects. |
Like RSFQ, initial demonstration devices of QCA logic will need to be super-cooled, assuming fabrication feature sizes in the tens of nanometers. |
But unlike RSFQ (from what we know of RSFQ), the smaller the fabrication dimension, the higher the tolerable temperature, according to thermodynamic analyses. |
Indeed, if one could eventually fabricate QCA devices in the realm of 2-5 nanometers, then they could operate at room temperature! |
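To make "majority vote logic" concrete: a majority gate outputs whichever value appears on at least two of its three inputs, and pinning one input to 0 or 1 turns it into AND or OR. The sketch below is our own illustration of that standard construction, not code from Porod's group |

    def majority(a, b, c):
        # 1 if and only if at least two of the three inputs are 1
        return (a & b) | (b & c) | (a & c)

    def and_gate(a, b):
        return majority(a, b, 0)   # one input pinned to 0 gives AND

    def or_gate(a, b):
        return majority(a, b, 1)   # one input pinned to 1 gives OR

    # together with an inverter, majority gates form a complete logic family
    assert and_gate(1, 1) == 1 and and_gate(1, 0) == 0
    assert or_gate(0, 0) == 0 and or_gate(1, 0) == 1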
HPCC is a maturing field with many organizations installing large scale systems |
These include NSF (academic computations), DoE (Department of Energy) and DoD (Department of Defense) |
There are new applications with new algorithmic challenges |
However these are not "qualitatively" new concepts |
Software ideas are "sound", but note that simulation is only a few percent of the total computer market |
Transistors are getting cheaper and cheaper, and it only takes some 0.5 million transistors to make a very high quality CPU |
Already we build chips with a factor of ten more transistors than this, which are used for "automatic" instruction-level parallelism |
However, getting much more speedup than this requires use of "outer loop" or data parallelism |
Actually, memory bandwidth is an essential problem in any computer, as doing more computations per second requires accessing more memory cells per second! |
Data locality implies the CPU finds the information it needs in cache, which stores the most recently accessed information |
This means one reuses a given memory reference in many nearby computations, e.g. |
A1 = B*C
A2 = B*D + B*B
.... reuses B
[Figure: the memory hierarchy -- L3 cache, main memory, disk -- with memory capacity increasing and memory speed decreasing as one moves away from the processor; there is a factor of 100 difference between processor and main memory speed]
For both parallel and sequential computers, the cost is accessing remote memories with some form of "communication" |
Data locality addresses this cost in both cases |
The differences are quantitative: the size of the effect, and what is done by the user versus what is done automatically |
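One way to glimpse the quantitative effect (a sketch; pure Python only hints at it, since each row here is a separate list -- in C or Fortran the gap between cache-friendly and cache-hostile traversal is far larger) |

    import time

    N = 1000
    a = [[1.0] * N for _ in range(N)]

    def traverse(row_major):
        s, t0 = 0.0, time.time()
        for i in range(N):
            for j in range(N):
                # row-major touches consecutive elements of one row;
                # column-major jumps between rows on every access
                s += a[i][j] if row_major else a[j][i]
        return time.time() - t0

    print("row order:   ", traverse(True))
    print("column order:", traverse(False))   # typically slower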
Now we go through a set of semi-realistic examples of parallel computing and show they use various forms of data parallelism |
Seismic Wave Simulation: Regular mesh of grid points |
Cosmology (Universe) Simulation: Irregular collection of stars or galaxies |
Computer Chess: "Data" is now parts of a game tree, with the major complication of pruning algorithms |
Defending the Nation: Tracking multiple missiles achieves parallelism from the set of missiles |
Finite Element (NASTRAN) structures problem: Irregular collection of nodal (grid) points clustered near a crack |
http://www.npac.syr.edu/users/gcf/cps615arch97 |
Geoffrey Fox |
Syracuse University NPAC |
111 College Place, Syracuse NY 13244-4100 |
315-443-2163 |
We describe the simple nature of parallel machines like the nCUBE2 |
Technologies include conventional, superconducting, and quantum systems, but the first dominates |
Sequential architectures are dominated by memory (cache) issues |
Vector, SIMD, and MIMD are classic architectures |
There is a growing interest in distributed metacomputers |
Special purpose machines are typically not so successful |
The details of networks are not as important as they used to be! |
Familiar from such everyday activities as getting food in a cafeteria, where one processes one person per "clock cycle"; the clock cycle here is the maximum time anybody takes at a single "stage", a stage being one component of the meal (salad, entrée, etc.) |
Note any one person takes about 5 clock cycles in this pipeline, but the pipeline processes one person per clock cycle |
The pipeline has a problem if there is a "stall" -- here, that one of the people wants an entrée which must be fetched from the kitchen. This delays everybody! |
In the computer case, a stall is typically caused by data not being ready for a particular instruction |
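A toy model of this pipeline (our own sketch, assuming 5 stages): the first person takes the full pipeline depth, each later person finishes one clock behind the person ahead, and a stall anywhere delays everyone behind it |

    def total_clocks(people, stages=5, stall_cycles=0):
        # ideal pipeline: 'stages' clocks for the first person,
        # then one completion per clock; a stall adds its cycles
        # to the finish time of everyone behind it
        return stages + (people - 1) + stall_cycles

    print(total_clocks(100))                   # 104 clocks: ~1 person/clock
    print(total_clocks(100, stall_cycles=20))  # one kitchen fetch delays all

Throughput approaches one person (or instruction) per clock only when stalls are rare |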
Taken from David Patterson CS252 Berkeley Fall 1996 |
3 Functional Units: FP Adder, FP Multiplier, FP Divider |
8 Kinds of Stages in FP Units |
Stage  Functional Unit  Description
A      FP Adder         Mantissa add stage
D      FP Divider       Divide pipeline stage
E      FP Multiplier    Exception test stage
M      FP Multiplier    First stage of multiplier
N      FP Multiplier    Second stage of multiplier
R      FP Adder         Rounding stage
S      FP Adder         Operand shift stage
U      --               Unpack floating point numbers
Several different pipelines with different lengths! |
Add, Subtract -- 4 clocks: U S+A A+R R+S |
Multiply -- 8 clocks: U E+M M M M N+A R |
Divide -- 38 clocks: U A R D(28) D+A D+R D+R D+A D+R A R |
Square Root -- 110 clocks: U E (A+R)(108) A R |
Negate -- 2 clocks: U S |
Absolute Value -- 2 clocks: U S |
Floating Point Compare -- 3 clocks: U A R |
Suppose two processors cache the same variable stored in memory of one of the processors |
One must ensure cache coherence so that when one cache value changes, all do! |
[Figure: multiple processors linked by an interconnection network, each holding a cached value of the same shared variable]
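A toy write-invalidate sketch of the coherence requirement (our own illustration, not a description of any real protocol): without the invalidation step, the second processor would keep reading the stale value 0 |

    memory = {"x": 0}          # the shared variable lives in one memory

    class Cache:
        def __init__(self):
            self.data = {}
        def read(self, key):
            if key not in self.data:       # miss: fetch from memory
                self.data[key] = memory[key]
            return self.data[key]
        def write(self, key, value, others):
            memory[key] = value
            self.data[key] = value
            for c in others:               # invalidate all other copies
                c.data.pop(key, None)

    p0, p1 = Cache(), Cache()
    p0.read("x"); p1.read("x")             # both now cache x = 0
    p0.write("x", 42, others=[p1])         # coherence: p1's copy is dropped
    print(p1.read("x"))                    # re-fetches and prints 42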
Machines like the SGI Origin 2000 have a distributed shared memory with a so-called directory implementation of cache coherence (pioneered in the DASH project at Stanford) |
Machines like the SGI Cray T3E are distributed memory but do have fast get and put, so as to be able to access single variables stored on remote memories |
The Origin 2000 approach does not scale as well as the Cray T3E, and large Origin 2000 systems must use message passing to link 128-node coherent memory subsystems |
The Cray T3E offers a uniform interface (though not as good for a small number of nodes) |
Message passing / distributed memory is the natural web model |
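A minimal message-passing sketch using Python's standard multiprocessing module (our choice for a self-contained example; the course itself uses MPI): the two processes share no variables and cooperate only through explicit sends and receives |

    from multiprocessing import Process, Pipe

    def worker(conn):
        x = conn.recv()        # explicit receive: no shared memory
        conn.send(x * x)       # explicit send of the result
        conn.close()

    if __name__ == "__main__":
        parent_end, child_end = Pipe()
        p = Process(target=worker, args=(child_end,))
        p.start()
        parent_end.send(7)
        print(parent_end.recv())   # prints 49
        p.join()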