Full HTML for Scripted Foilset: Computational Science Issues for GEM

Given by Geoffrey C. Fox at the GEM Workshop, Santa Fe, 24-25 October 1997. Foils prepared 27 October 1997.
Outside Index Summary of Material


We briefly discuss relevant features of the national scene
We state the computational problem for GEM and analyze parallel computing issues for key steps
We map the problem into a particle dynamics analogy using a Green's function formalism and consider O(N^2), cut-off O(N), and fast multipole methods
We go in detail through the fast multipole method
We expect object-oriented approaches to be very important, both at the language level and in using technologies like CORBA to integrate world-wide collaboration, simulations and experiments.

Table of Contents for full HTML of Computational Science Issues for GEM


1 Computational Science and the GEM Project GEM Workshop Santa Fe October 24-25 1997
2 Abstract of GEM Workshop Presentation on Computational Science
3 Some International HPCC Activities
4 Some Relevant Lessons from HPCC
5 New Initiatives of Current HPCC
6 Possible Special Features of Earthquake Simulation
7 Basic Computational Structure - I
8 Basic Computational Structure - II
9 Analysis of Computational Structure
10 First two Solutions of O(N^2) Computational Complexity
11 Second two Solutions of O(N^2) Computational Complexity
12 Basic Idea of Fast Multipole Algorithm
13 Intermediate results of a computation of 322 million particles on ASCI Red
14 Intermediate results of a computation of 9.7 million particles on PC Cluster loki
15 Some Performance Results of Interest from Salmon and Warren
16 Hierarchical Breakup of 2D Space
17 Simple Illustration of Tree Data Structure
18 Tree Structure for 10,000 bodies centrally clustered in a disk
19 Generation of Tree for a small number of particles
20 3 Approximations to Force on a Particle in Fast Multipole Approach
21 Parallelism in O(N^2) N Body Approach I
22 Parallelism in O(N^2) N Body Approach II
23 Parallelism in Cut Off Force Approach
24 Problems in Cut off Force Parallelism
25 Example of Graphics Rendering
26 Generation of Keys in Salmon Warren Method
27 Generation of 3D Key for Salmon Warren
28 Parallelism in Salmon Warren Approach
29 Two Space Filling Curves
30 Morton Curve split up into 8 processors represented by different gray levels
31 Space Filling Curve chopped up into equal length parts
32 Parallel Algorithm in Fast Multipole I
33 Locally essential Data for Processor in Bottom Left Hand Corner of Processor Array
34 Parallel Algorithm in Fast Multipole II
35 Scaling Ideas in GEM
36 Lessons from Other Fields
37 General Web based Middle Tier Server Architecture
38 High Functionality Software Layer
39 Diagram of HPCORBA Architecture




Foil 1 Computational Science and the GEM Project GEM Workshop Santa Fe October 24-25 1997

Geoffrey Fox
Syracuse University
NPAC
111 College Place, Syracuse NY 13244-4100
315-443-2163

Foil 2 Abstract of GEM Workshop Presentation on Computational Science

We briefly discuss relevant features of the national scene
We state the computational problem for GEM and analyze parallel computing issues for key steps
We map the problem into a particle dynamics analogy using a Green's function formalism and consider O(N^2), cut-off O(N), and fast multipole methods
We go in detail through the fast multipole method
We expect object-oriented approaches to be very important, both at the language level and in using technologies like CORBA to integrate world-wide collaboration, simulations and experiments.

Foil 3 Some International HPCC Activities

There are national HPCC programs in:
  • USA
  • China
  • Europe
  • Japan
The USA activities include
  • NCSA and NPACI NSF Centers
  • DoE ASCI Program
  • DoD Modernization Program involving 4 major centers and 11 application areas (CTA's)
    • Dayton (Wright-Patterson Air Force Base), Aberdeen (ARL), CEWES (Vicksburg), NAVO (Stennis Space Center in Mississippi)

Foil 4 Some Relevant Lessons from HPCC

Software and Algorithms will be more important to GEM than hardware
Hardware is driven by "commodity technology" e.g. use of chips from PC and workstation market
  • Detailed hardware architecture irrelevant if one uses software standards such as:
    • Fortran, C++, Java Programming Languages
    • MPI for Message passing in parallel codes
    • Web and CORBA for distributed information and objects
  • If you need to support a broad class of approaches to earthquake prediction, you cannot beat commercial efforts
    • e.g. there isn't a useful GEM special purpose computer
  • You can spend more money and get more powerful machines

Foil 5 New Initiatives of Current HPCC

Some of the current new developments focus on
  • "Data Intensive" applications which merge data processing (from instruments) and simulation
  • Distributed Computing or Metacomputing
    • Geographically distributed computers
  • Collaboration
    • Linking Computers and People together
  • Multidisciplinary Problems
    • Link several Codes and Disciplines together to solve a large problem with several components

Foil 6 Possible Special Features of Earthquake Simulation

Link of geographically distributed observations and computation
  • Seismic sensors, SAR etc.
Special purpose computers for O(N^2) particle dynamics, which is used in some approaches to parts of the problem
Less legacy code than in other application areas
  • So one can more easily adopt modern technology such as Java (Web) and CORBA

Foil 7 Basic Computational Structure - I

Green's functions relate stress and deformation to integrals over the slip deficit f(x,t) = slip(x,t) - V*t
After discretization, the Green's function integrals become sums with
  • deformation_i and stress_i = Σ_j Green(i,j,t) f_j(t)
with indices i and j running over fault segments
The friction law then determines if there is an "earthquake", which is a correlated change in the slip deficit
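As a minimal sketch of this discretized sum (a hypothetical helper with illustrative random numbers, not GEM code), the relation above is just a dense matrix-vector product over the fault segments:

```python
import numpy as np

def stress_from_slip_deficit(green, f):
    """Evaluate stress_i = sum_j Green(i,j) * f_j for every fault segment i.

    green : (N, N) array -- discretized Green's function between segments
    f     : (N,)   array -- slip deficit f_j(t) = slip_j(t) - V*t
    """
    return green @ f          # dense O(N^2) sum

# toy example with N = 4 segments (numbers illustrative only)
rng = np.random.default_rng(0)
N = 4
green = rng.normal(size=(N, N))
f = rng.normal(size=N)
print(stress_from_slip_deficit(green, f))
```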

Foil 8 Basic Computational Structure - II

The Green's function is, in a first approximation, independent of time; time enters only through the time dependence of the slip deficit
We are evolving a set of N differential equations -- think of each fault segment "i" as a particle -- and can do this as either
  • deterministic "particle" dynamics
  • Monte Carlo
These are two standard choices in "ordinary" particle dynamics
After each time step (or Monte Carlo step) one needs to check, using the friction law, whether a slip will occur
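A schematic time-stepping loop in the same spirit (the threshold test and fixed stress drop below are placeholders, not the actual GEM friction law):

```python
import numpy as np

def evolve_slip_deficit(green, f, v_plate, dt, threshold, slip_drop, n_steps):
    """Deterministic evolution sketch for N fault segments: load slowly, form
    the Green's function sum, then let segments whose stress exceeds a
    (placeholder) friction threshold slip by a fixed (placeholder) amount."""
    for _ in range(n_steps):
        f = f + v_plate * dt              # slow tectonic loading of the slip deficit
        stress = green @ f                # O(N^2) Green's function sum
        failing = stress > threshold      # stand-in for the real friction law
        f = np.where(failing, f - slip_drop, f)
    return f
```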

Foil 9 Analysis of Computational Structure

So the basic computation is "long range" and O(N^2) for N fault segments
We can compare with other classic long range force problems such as particle dynamics (chemistry, astrophysics) and the vortex method in CFD
  • This problem is linear in the independent variables f_i(t)
  • The positions of the fault segments are irregular but roughly fixed
  • The force law is that of a "double-couple" (equal and opposite dipoles)
But we have the same essential computational feature: the O(N^2) terms lead to a high computational load which needs to be addressed somehow.

Foil 10 First two Solutions of O(N^2) Computational Complexity

1) Ignore it and implement straightforwardly on a conventional computer
2) Note that one can build special purpose parallel machines, as the algorithm needs less communication bandwidth (just a one dimensional pipeline) and much less memory per unit of CPU power than a general purpose (parallel) computer
  • the parallel algorithm is well known and incredibly efficient
  • This is the approach of the GRAPE (Gravity Pipeline) machine, which can be about a factor of 100 more cost effective for astrophysics on large problems
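For reference, option 1) is the textbook double loop below (a softened gravity-like pair force used purely as an illustration); this regular, communication-light kernel is also what pipelined machines such as GRAPE hard-wire:

```python
import numpy as np

def direct_forces(pos, mass, eps=1e-3):
    """Naive O(N^2) pairwise force sum over all particle pairs."""
    n = len(mass)
    force = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dr = pos[j] - pos[i]
            r2 = dr @ dr + eps**2                    # softening avoids division by zero
            force[i] += mass[i] * mass[j] * dr / r2**1.5
    return force
```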

Foil 11 Second two Solutions of O(N^2) Computational Complexity

3) Cut off long range forces (a common strategy in chemistry, where forces fall off faster (e.g. 1/r^5) than in gravity)
  • This leads to a classic nearest neighbor O(N) algorithm; the irregular geometry causes only minor implementation issues
4) Use the new "fast multipole" algorithms, which have been very successful in other related areas
  • Astrophysics
  • Computational Electromagnetic
  • Vortex approach to CFD
Instead of zeroing out contributions of far away particles, these methods do an ingenious multipole expansion of the contributions of far away regions
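A minimal sketch of solution 3): bin particles into cells of side r_cut and sum only over a cell and its neighbours, which gives the O(N) nearest neighbor structure (the 1/r^5-style pair term is a placeholder, not an earthquake force law):

```python
import numpy as np
from collections import defaultdict

def cutoff_forces(pos, r_cut):
    """O(N) cut-off interaction in 3D using a simple cell list."""
    cells = defaultdict(list)
    for idx, p in enumerate(pos):
        cells[tuple((p // r_cut).astype(int))].append(idx)

    force = np.zeros_like(pos)
    for (cx, cy, cz), members in cells.items():
        # particles in this cell and its 26 neighbouring cells
        neigh = [j for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
                 for j in cells.get((cx + dx, cy + dy, cz + dz), [])]
        for i in members:
            for j in neigh:
                if j == i:
                    continue
                dr = pos[j] - pos[i]
                r = float(np.linalg.norm(dr))
                if r < r_cut:
                    force[i] += dr / r**6            # placeholder ~1/r^5 short-range law
    return force
```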

Foil 12 Basic Idea of Fast Multipole Algorithm

These ideas came from many people: Appel, Barnes-Hut, Greengard and Rokhlin
Salmon (Caltech) and Warren (Los Alamos) implemented this very well for astrophysics with up to 10^8 particles
Instead of summing over the M stars in a far away cluster, "just" replace the sum by the contribution of the centroid
Fast Multipole "just" applies this idea recursively with a sum over multipoles extending simple centroid formula.

Foil 13 Intermediate results of a computation of 322 million particles on ASCI Red


Foil 14 Intermediate results of a computation of 9.7 million particles on PC Cluster loki


Foil 15 Some Performance Results of Interest from Salmon and Warren

16 Pentium Pro's in a cluster (cost $30K today) reach about 1 gigaflop on a tree code with 10^7 particles
6800 Pentium Pro's (ASCI Sandia machine) with around 300 million particles
  • 600 gigaflops with simple O(N^2) code
  • 400 gigaflops with tree code
But the tree code is 10^5 times as efficient as the O(N^2) code on this problem
  • i.e. if Execution Time(naïve code) is c · N^2
  • then Time(tree code) is 3000 · c · N
  • and the tree code is more efficient than the O(N^2) code for more than 3000 particles
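The 10^5 factor follows directly from the two cost models quoted above:

```python
# Time(naive) = c * N**2 and Time(tree) = 3000 * c * N cross at N = 3000;
# at N ~ 3e8 particles the tree code therefore wins by N / 3000 ~ 10^5
N = 3e8
print(N / 3000)      # 100000.0
```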

Foil 16 Hierarchical Breakup of 2D Space

Real astrophysics is 3D but we illustrate the 2D case
Chop space recursively into square cells so that at most one particle (or some specified number) is in each "leaf" cell
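A minimal 2D quadtree sketch of this recursive break-up (points are assumed to lie in the half-open square [0, size) x [0, size)):

```python
def build_quadtree(points, xlo=0.0, ylo=0.0, size=1.0, max_per_leaf=1):
    """Recursively split a square cell into 4 children until each leaf holds
    at most max_per_leaf points (the 3D analogue splits into 8 children)."""
    node = {"bounds": (xlo, ylo, size), "n": len(points), "children": []}
    if len(points) <= max_per_leaf:
        node["points"] = points
        return node
    half = size / 2.0
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = xlo + dx * half, ylo + dy * half
            inside = [(x, y) for (x, y) in points
                      if cx <= x < cx + half and cy <= y < cy + half]
            if inside:
                node["children"].append(
                    build_quadtree(inside, cx, cy, half, max_per_leaf))
    return node
```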

Foil 17 Simple Illustration of Tree Data Structure

We show this recursive break-up in a simple case
Note that the refinement depends on the local particle density

Foil 18 Tree Structure for 10,000 bodies centrally clustered in a disk


Foil 19 Generation of Tree for a small number of particles


Foil 20 3 Approximations to Force on a Particle in Fast Multipole Approach

1) Simple Direct Sum over all particles in region
2) Just Single Centroid Contribution
3) Open up region into 8 cells and use centroid of each cell

Foil 21 Parallelism in O(N^2) N Body Approach I

O(N^2) algorithms are extremely efficient on parallel machines as
  • Load balancing is guaranteed with equal numbers of particles in each node
  • Communication overhead goes like (1/n) · (typical time to communicate a word / typical time to do a floating point operation), where the grain size n is the number of particles per node; this is roughly 0.25 (from algorithm details) · 10 (= t_comm/t_float) / n
  • So it is very efficient as long as there are around 50 or more particles per processor
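Plugging in the numbers quoted on this foil:

```python
# overhead ~ 0.25 * (t_comm / t_float) / n, with t_comm / t_float ~ 10
for n in (10, 50, 500):
    print(n, 0.25 * 10 / n)    # n = 50 gives ~5% overhead, hence "around 50 particles or more"
```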

Foil 22 Parallelism in O(N^2) N Body Approach II

This type of performance should be compared with classic grid problems in d dimensions, where the overhead is proportional to edge over area and goes like 1/n^(1/d)
So the grain size needed in 3D local problems is the "cube" of that needed in the O(N^2) problem
Of course the reason that the communication overhead is so small is that the computation is exceptionally large!
  • Every silver lining has a cloud
Note one can "understand" the 1/n dependence by writing the formula as Σ_j Σ_i Force on i = function(i,j) --
it is important that the outer sum is over j, not i!
Each j is communicated to each node and is then used to increment the force on n/2 particles (using Newton's law of action and reaction) in the given node
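A schematic (single-process) sketch of that pattern, with a placeholder pair term: each j value is in effect broadcast to every node and reused against all n local particles, which is where the 1/n overhead comes from:

```python
import numpy as np

def pair_term(local, xj, eps=1e-3):
    """Placeholder interaction of one broadcast value xj with local particles."""
    return 1.0 / (np.abs(np.asarray(local) - xj) + eps)

def replicated_j_sum(particles_per_node, all_x):
    """Each node accumulates forces on its n local particles while the full
    stream of j values passes by; one communicated value is reused n times.
    (The self term is left in for simplicity of the sketch.)"""
    results = []
    for local in particles_per_node:         # one loop iteration plays the role of one node
        f = np.zeros(len(local))
        for xj in all_x:                     # the "communicated" stream of j values
            f += pair_term(local, xj)        # reused by every local particle
        results.append(f)
    return results
```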

Foil 23 Parallelism in Cut Off Force Approach

This is the classic "nearest neighbor" problem where one uses "domain decomposition" and communicates particles around the edge of each processor's domain.
  • Computation ∝ n
  • Communication ∝ n^(1-1/d) in d dimensions
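Comparing this surface-to-volume overhead numerically with the 1/n behaviour of the O(N^2) case makes the "cube" remark on the previous foil concrete:

```python
# cut-off decomposition: communication ~ n**(1 - 1/d), computation ~ n,
# so overhead ~ n**(-1/d); compare with the 1/n overhead of the O(N^2) case
d = 3
for n in (10**2, 10**4, 10**6):
    print(n, round(n ** (-1.0 / d), 4), round(1.0 / n, 6))
# n = 10**6 is needed before n**(-1/3) reaches the 1% that 1/n already gives at n = 10**2
```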

Foil 24 Problems in Cut off Force Parallelism

Earthquakes cause load imbalance as the "action" is concentrated in the processors where the quake occurs
This can be addressed, at the cost of increased communication, by using a "block-cyclic" decomposition so that each processor gets part of the action
The problem is particularly bad for Monte Carlo as, like the classic "clustering algorithms" needed near critical points to get good performance, it has poor parallel performance
Long range versions of Monte Carlo will perform satisfactorily as one can parallelize over the long range force part of the computation

Foil 25 Example of Graphics Rendering

Here we show a 16 by 16 array of pixels with either a CYCLIC or an 8 by 8 two dimensional (BLOCK,BLOCK) decomposition

Foil 26 Generation of Keys in Salmon Warren Method

In 2 (3) dimensions, add 2 (3) binary bits at each step down the tree

Foil 27 Generation of 3D Key for Salmon Warren

Each time one goes down one level of the tree, add an X, a Y and a Z bit
A 64 bit key supports 21 levels
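A sketch of the key construction (the x/y/z bit order and the use of a leading 1 bit as a sentinel are illustrative choices here, not necessarily the exact Salmon-Warren convention):

```python
def morton_key(ix, iy, iz, levels=21):
    """Append one x, one y and one z bit per tree level: 21 levels x 3 bits
    = 63 bits, which with a leading sentinel bit fits in a 64 bit integer."""
    key = 1                                   # sentinel bit marking the root
    for level in reversed(range(levels)):
        key = (key << 3) | (((ix >> level) & 1) << 2) \
                         | (((iy >> level) & 1) << 1) \
                         | ((iz >> level) & 1)
    return key

print(bin(morton_key(0b101, 0b011, 0b110, levels=3)))   # -> 0b1101011110
```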

Foil 28 Parallelism in Salmon Warren Approach

First you must decompose the cells among processors, which is nontrivial as the tree is an irregular, dynamic (maybe not in GEM) structure
Originally orthogonal recursive bisection was used, chopping space successively in two along different dimensions
However it is better to run a "space filling curve" through the cells and divide the curve into chunks of equal work (which is NOT equal numbers of particles, as there is more computation in dense areas)
  • Note the fast multipole method essentially adds in nearby particles using the "direct computation" approach
The simplest choice is to sort the keys and divide up this sorted list (the Morton curve)
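A sketch of that simplest decomposition: sort cells by key (which walks the Morton curve) and cut the sorted list into chunks of roughly equal total work; the per-cell work estimates are whatever cost model one assigns (e.g. interaction counts from the previous step):

```python
import numpy as np

def partition_by_work(keys, work, n_proc):
    """Split cells, ordered along the Morton curve, into n_proc contiguous
    chunks of roughly equal total work -- not equal numbers of particles."""
    order = np.argsort(keys)                         # walk the Morton curve
    cum = np.cumsum(work[order])
    targets = cum[-1] * np.arange(1, n_proc) / n_proc
    cuts = np.searchsorted(cum, targets)
    return np.split(order, cuts)                     # one index array per processor

# toy usage: 12 cells with uneven work, divided among 3 processors
keys = np.array([5, 1, 9, 3, 7, 2, 11, 8, 0, 4, 10, 6])
work = np.array([1, 5, 1, 1, 8, 1, 1, 1, 2, 1, 1, 1], dtype=float)
print(partition_by_work(keys, work, 3))
```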

Foil 29 Two Space Filling Curves


Foil 30 Morton Curve split up into 8 processors represented by different gray levels


Foil 31 Space Filling Curve chopped up into equal length parts


Foil 32 Parallel Algorithm in Fast Multipole I

Parallelism is clear -- update particles in parallel but what about communication?
It requires a thoughtful algorithm but one can efficiently fetch the information needed.
  • Fetch all the information needed by ANY particle in a given processor, as the multipole approach implies that nearby particles tend to need the same long range components
  • call this "locally essential" data
  • cf. the O(N) multipole-multipole approach
Effectively the top of the tree is replicated in all processors and they just differ near the bottom, which is expanded in a way that depends on where the processor is

Foil 33 Locally essential Data for Processor in Bottom Left Hand Corner of Processor Array


Foil 34 Parallel Algorithm in Fast Multipole II

Having fetched all the needed data, one can update all particles in a processor with the high efficiency seen in the O(N^2) problem.
  • Namely, communicated data is re-used many times as it is needed by many particles in the processor
  • In practice one has some 10,000 or more particles per processor
In astrophysics, one also needs to construct the tree and rebalance the load at essentially every time step
  • not needed in GEM
There is a straightforward parallel tree building algorithm
These methods outperform O(N^2) when there are more than a few thousand particles (unless the distribution is very inhomogeneous)

Foil 35 Scaling Ideas in GEM

Note that tree codes are a simple example of the general need in GEM for technologies that bridge different physical scales
  • This is well known from the multigrid approach to elliptic partial differential equation solvers
Information travels in O(1) or O(log K) steps, rather than K steps, for a grid of size K

Foil 36 Lessons from Other Fields

If we look at
Computational Fluid Dynamics
Global Climate or Weather Simulations
Flow in Porous Media
Study of Phase Transitions
We see, as in GEM, uncertainties in some of the critical basic physics, but these fields have made clear progress even with a mix of phenomenology (turbulence models) and fundamental equations (e.g. the Navier-Stokes equations)

Foil 37 General Web based Middle Tier Server Architecture

We have a set of services hosted by Web servers, which form the middleware and are accessed by clients
Groups of clients (electronic societies) are linked by Java server based collaboration systems such as TANGO or Habanero
[Diagram: clients (General User, Person1, Person2, a shared whiteboard and shared client applications) access middle tier services -- a collaboration server, an object broker and simulation servers (e.g. NEOS, Netsolve) -- which in turn access resources such as file systems and/or databases, multimedia information stores and computers]

Foil 38 High Functionality Software Layer

This is "middleware" which is implemented in simplest form as a network of Java Servers
  • Web based distributed computing or Metacomputing, based on linking mobile Web modules
  • Collaboration as in Habanero or Tango
  • Support services including databases, object brokers, instruments, MPP's etc.
[Diagram: clients access stored multimedia information, file systems and/or databases, an object broker, a collaboration server, and simulation services (network-enabled servers such as NEOS and Netsolve) running on sequential or parallel computers]

Foil 39 Diagram of HPCORBA Architecture


© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
