Full HTML for Scripted Foilset: Computational Science Issues for GEM

Given by Geoffrey C. Fox at the GEM Workshop, Santa Fe, 24-25 October 1997. Foils prepared 27 October 1997.
Outside Index Summary of Material


We briefly discuss relevant features of the national scene
We state the computational problem for GEM and analyze parallel computing issues for key steps
We map the problem into a particle dynamics analogy using a Green's function formalism and consider O(N^2), cut-off O(N), and fast multipole methods
We go in detail through the fast multipole method
We expect object-oriented approaches to be very important, both at the language level and in using technologies like CORBA to integrate world-wide collaboration, simulations and experiments.

Table of Contents for full HTML of Computational Science Issues for GEM


1 Computational Science and the GEM Project GEM Workshop Santa Fe October 24-25 1997
2 Abstract of GEM Workshop Presentation on Computational Science
3 Some International HPCC Activities
4 Some Relevant Lessons from HPCC
5 New Initiatives of Current HPCC
6 Possible Special Features of Earthquake Simulation
7 Basic Computational Structure - I
8 Basic Computational Structure - II
9 Analysis of Computational Structure
10 First two Solutions of O(N^2) Computational Complexity
11 Second two Solutions of O(N^2) Computational Complexity
12 Basic Idea of Fast Multipole Algorithm
13 Intermediate results of a computation of 322 million particles on ASCI Red
14 Intermediate results of a computation of 9.7 million particles on PC Cluster loki
15 Some Performance Results of Interest from Salmon and Warren
16 Hierarchical Breakup of 2D Space
17 Simple Illustration of Tree Data Structure
18 Tree Structure for 10,000 bodies centrally clustered in a disk
19 Generation of Tree for a small number of particles
20 3 Approximations to Force on a Particle in Fast Multipole Approach
21 Parallelism in O(N^2) N Body Approach I
22 Parallelism in O(N^2) N Body Approach II
23 Parallelism in Cut Off Force Approach
24 Problems in Cut off Force Parallelism
25 Example of Graphics Rendering
26 Generation of Keys in Salmon Warren Method
27 Generation of 3D Key for Salmon Warren
28 Parallelism in Salmon Warren Approach
29 Two Space Filling Curves
30 Morton Curve split up into 8 processors represented by different gray levels
31 Space Filling Curve chopped up into equal length parts
32 Parallel Algorithm in Fast Multipole I
33 Locally essential Data for Processor in Bottom Left Hand Corner of Processor Array
34 Parallel Algorithm in Fast Multipole II
35 Scaling Ideas in GEM
36 Lessons from Other Fields
37 General Web based Middle Tier Server Architecture
38 High Functionality Software Layer
39 Diagram of HPCORBA Architecture




Foil 1 Computational Science and the GEM Project GEM Workshop Santa Fe October 24-25 1997

Geoffrey Fox
Syracuse University
NPAC
111 College Place, Syracuse NY 13244-4100
315-443-2163

Foil 2 Abstract of GEM Workshop Presentation on Computational Science

We briefly discuss relevant features of the national scene
We state the computational problem for GEM and analyze parallel computing issues for key steps
We map the problem into a particle dynamics analogy using a Green's function formalism and consider O(N^2), cut-off O(N), and fast multipole methods
We go in detail through the fast multipole method
We expect object-oriented approaches to be very important, both at the language level and in using technologies like CORBA to integrate world-wide collaboration, simulations and experiments.

Foil 3 Some International HPCC Activities

There are national HPCC programs in:
  • USA
  • China
  • Europe
  • Japan
The USA activities include
  • NCSA and NPACI NSF Centers
  • DoE ASCI Program
  • DoD Modernization Program involving 4 major centers and 11 application areas (CTA's)
    • Dayton (Wright-Patterson Air Force Base), Aberdeen (ARL), CEWES (Vicksburg), NAVO (Stennis Space Center in Mississippi)

Foil 4 Some Relevant Lessons from HPCC

Software and Algorithms will be more important to GEM than hardware
Hardware is driven by "commodity technology" e.g. use of chips from PC and workstation market
  • Detailed hardware architecture irrelevant if one uses software standards such as:
    • Fortran, C++, Java Programming Languages
    • MPI for Message passing in parallel codes
    • Web and CORBA for distributed information and objects
  • If you need to support a broad class of approaches to earthquake prediction, you cannot beat commercial efforts
    • e.g. there isn't a useful GEM special purpose computer
  • You can spend more money and get more powerful machines

Foil 5 New Initiatives of Current HPCC

Some of the current new developments focus on
  • "Data Intensive" applications which merge data processing (from instruments) and simulation
  • Distributed Computing or Metacomputing
    • Geographically distributed computers
  • Collaboration
    • Linking Computers and People together
  • Multidisciplinary Problems
    • Link several Codes and Disciplines together to solve a large problem with several components

Foil 6 Possible Special Features of Earthquake Simulation

Link of geographically distributed observations and computation
  • Seismic sensors, SAR etc.
Special purpose computers for O(N^2) particle dynamics, which is used in some approaches to parts of the problem
Less legacy code than in other application areas
  • So one can more easily adopt modern technology such as Java (Web) and CORBA

Foil 7 Basic Computational Structure - I

Green's functions relate stress and deformation to integrals over the slip deficit f(x,t) = slip(x,t) - V*t
After discretization, the Green's function integrals become sums with
  • deformation_i and stress_i = Σ_j Green(i,j,t) f_j(t)
with indices i and j running over fault segments
The friction law then determines if there is an "earthquake", which is a correlated change in the slip deficit
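As a minimal sketch of this discretized sum (a hypothetical helper with illustrative random numbers, not GEM code), the relation above is just a dense matrix-vector product over the fault segments:

```python
import numpy as np

def stress_from_slip_deficit(green, f):
    """Evaluate stress_i = sum_j Green(i,j) * f_j for every fault segment i.

    green : (N, N) array -- discretized Green's function between segments
    f     : (N,)   array -- slip deficit f_j(t) = slip_j(t) - V*t
    """
    return green @ f          # dense O(N^2) sum

# toy example with N = 4 segments (numbers illustrative only)
rng = np.random.default_rng(0)
N = 4
green = rng.normal(size=(N, N))
f = rng.normal(size=N)
print(stress_from_slip_deficit(green, f))
```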

Foil 8 Basic Computational Structure - II

The Green's function is, in a first approximation, independent of time; time enters only through the time dependence of the slip deficit
We are evolving a set of N differential equations -- think of each fault segment "i" as a particle -- and can do this as either
  • deterministic "particle" dynamics
  • Monte Carlo
These are two standard choices in "ordinary" particle dynamics
After each time step (or Monte Carlo step) one needs to check, using the friction law, whether a slip will occur
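A schematic time-stepping loop in the same spirit (the threshold test and fixed stress drop below are placeholders, not the actual GEM friction law):

```python
import numpy as np

def evolve_slip_deficit(green, f, v_plate, dt, threshold, slip_drop, n_steps):
    """Deterministic evolution sketch for N fault segments: load slowly, form
    the Green's function sum, then let segments whose stress exceeds a
    (placeholder) friction threshold slip by a fixed (placeholder) amount."""
    for _ in range(n_steps):
        f = f + v_plate * dt              # slow tectonic loading of the slip deficit
        stress = green @ f                # O(N^2) Green's function sum
        failing = stress > threshold      # stand-in for the real friction law
        f = np.where(failing, f - slip_drop, f)
    return f
```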

Foil 9 Analysis of Computational Structure

So the basic computation is "long range" and O(N^2) for N fault segments
We can compare with other classic long range force problems such as particle dynamics (chemistry, astrophysics) and the vortex method in CFD
  • This problem is linear in the independent variables f_i(t)
  • The positions of the fault segments are irregular but roughly fixed
  • The force law is that of a "double-couple" (equal and opposite dipoles)
But we have the same essential computational feature: the O(N^2) terms lead to a high computational load which needs to be addressed somehow.

Foil 10 First two Solutions of O(N^2) Computational Complexity

1) Ignore it and implement straightforwardly on a conventional computer
2) Note that one can build special purpose parallel machines, as the algorithm needs less communication bandwidth (just a one dimensional pipeline) and much less memory per unit of CPU power than a general purpose (parallel) computer
  • the parallel algorithm is well known and incredibly efficient
  • This is the approach of the GRAPE (Gravity Pipeline) machine, which can be about a factor of 100 more cost effective for astrophysics on large problems
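For reference, option 1) is the textbook double loop below (a softened gravity-like pair force used purely as an illustration); this regular, communication-light kernel is also what pipelined machines such as GRAPE hard-wire:

```python
import numpy as np

def direct_forces(pos, mass, eps=1e-3):
    """Naive O(N^2) pairwise force sum over all particle pairs."""
    n = len(mass)
    force = np.zeros_like(pos)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dr = pos[j] - pos[i]
            r2 = dr @ dr + eps**2                    # softening avoids division by zero
            force[i] += mass[i] * mass[j] * dr / r2**1.5
    return force
```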

Foil 11 Second two Solutions of O(N^2) Computational Complexity

3) Cut off long range forces (a common strategy in chemistry, where forces fall off faster (e.g. 1/r^5) than in gravity)
  • This leads to a classic nearest neighbor O(N) algorithm; the irregular geometry causes only minor implementation issues
4) Use the new "fast multipole" algorithms, which have been very successful in other related areas
  • Astrophysics
  • Computational Electromagnetic
  • Vortex approach to CFD
Instead of zeroing out contributions of far away particles, these methods do an ingenious multipole expansion of the contributions of far away regions
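A minimal sketch of solution 3): bin particles into cells of side r_cut and sum only over a cell and its neighbours, which gives the O(N) nearest neighbor structure (the 1/r^5-style pair term is a placeholder, not an earthquake force law):

```python
import numpy as np
from collections import defaultdict

def cutoff_forces(pos, r_cut):
    """O(N) cut-off interaction in 3D using a simple cell list."""
    cells = defaultdict(list)
    for idx, p in enumerate(pos):
        cells[tuple((p // r_cut).astype(int))].append(idx)

    force = np.zeros_like(pos)
    for (cx, cy, cz), members in cells.items():
        # particles in this cell and its 26 neighbouring cells
        neigh = [j for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
                 for j in cells.get((cx + dx, cy + dy, cz + dz), [])]
        for i in members:
            for j in neigh:
                if j == i:
                    continue
                dr = pos[j] - pos[i]
                r = float(np.linalg.norm(dr))
                if r < r_cut:
                    force[i] += dr / r**6            # placeholder ~1/r^5 short-range law
    return force
```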

Foil 12 Basic Idea of Fast Multipole Algorithm

These ideas came from many people: Appel, Barnes-Hut, Greengard and Rokhlin
Salmon (Caltech) and Warren (Los Alamos) implemented this very well for astrophysics with up to 10^8 particles
Instead of summing over the M stars in a far away cluster, "just" replace the sum by the contribution of the centroid
Fast Multipole "just" applies this idea recursively with a sum over multipoles extending simple centroid formula.

Foil 13 Intermediate results of a computation of 322 million particles on ASCI Red


Foil 14 Intermediate results of a computation of 9.7 million particles on PC Cluster loki


Foil 15 Some Performance Results of Interest from Salmon and Warren

16 Pentium Pro's in a cluster (cost $30K today) reach about 1 gigaflop on a tree code with 10^7 particles
6800 Pentium Pro's (ASCI Sandia machine) with around 300 million particles
  • 600 gigaflops with simple O(N^2) code
  • 400 gigaflops with tree code
But the tree code is 10^5 times as efficient as the O(N^2) code on this problem
  • i.e. if Execution Time(naïve code) is c · N^2
  • then Time(tree code) is 3000 · c · N
  • and the tree code is more efficient than the O(N^2) code for more than 3000 particles
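The 10^5 factor follows directly from the two cost models quoted above:

```python
# Time(naive) = c * N**2 and Time(tree) = 3000 * c * N cross at N = 3000;
# at N ~ 3e8 particles the tree code therefore wins by N / 3000 ~ 10^5
N = 3e8
print(N / 3000)      # 100000.0
```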

Foil 16 Hierarchical Breakup of 2D Space

Real astrophysics is 3D but we illustrate the 2D case
Chop space recursively into square cells so that at most one particle (or some specified number) is in each "leaf" cell
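A minimal 2D quadtree sketch of this recursive break-up (points are assumed to lie in the half-open square [0, size) x [0, size)):

```python
def build_quadtree(points, xlo=0.0, ylo=0.0, size=1.0, max_per_leaf=1):
    """Recursively split a square cell into 4 children until each leaf holds
    at most max_per_leaf points (the 3D analogue splits into 8 children)."""
    node = {"bounds": (xlo, ylo, size), "n": len(points), "children": []}
    if len(points) <= max_per_leaf:
        node["points"] = points
        return node
    half = size / 2.0
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = xlo + dx * half, ylo + dy * half
            inside = [(x, y) for (x, y) in points
                      if cx <= x < cx + half and cy <= y < cy + half]
            if inside:
                node["children"].append(
                    build_quadtree(inside, cx, cy, half, max_per_leaf))
    return node
```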

Foil 17 Simple Illustration of Tree Data Structure

We show this recursive break-up in a simple case
Note that the refinement depends on the local particle density

Foil 18 Tree Structure for 10,000 bodies centrally clustered in a disk


Foil 19 Generation of Tree for a small number of particles


Foil 20 3 Approximations to Force on a Particle in Fast Multipole Approach

1) Simple Direct Sum over all particles in region
2) Just Single Centroid Contribution
3) Open up region into 8 cells and use centroid of each cell

Foil 21 Parallelism in O(N^2) N Body Approach I

O(N^2) algorithms are extremely efficient on parallel machines as
  • Load balancing is guaranteed with equal numbers of particles in each node
  • Communication overhead goes like (1/n) · (typical time to communicate a word / typical time to do a floating point operation), where the grain size n is the number of particles per node; this is roughly 0.25 (from algorithm details) · 10 (= t_comm/t_float) / n
  • So it is very efficient as long as there are around 50 or more particles per processor
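Plugging in the numbers quoted on this foil:

```python
# overhead ~ 0.25 * (t_comm / t_float) / n, with t_comm / t_float ~ 10
for n in (10, 50, 500):
    print(n, 0.25 * 10 / n)    # n = 50 gives ~5% overhead, hence "around 50 particles or more"
```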

Foil 22 Parallelism in O(N^2) N Body Approach II

This type of performance should be compared with classic grid problems in d dimensions, where the overhead is proportional to edge over area and goes like 1/n^(1/d)
So the grain size needed in 3D local problems is the "cube" of that needed in the O(N^2) problem
Of course the reason that the communication overhead is so small is that the computation is exceptionally large!
  • Every silver lining has a cloud
Note one can "understand" the 1/n dependence by writing the formula as Σ_j Σ_i Force on i = function(i,j) --
it is important that the outer sum is over j, not i!
Each j is communicated to each node and is then used to increment the force on n/2 particles (using Newton's law of action and reaction) in the given node
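A schematic (single-process) sketch of that pattern, with a placeholder pair term: each j value is in effect broadcast to every node and reused against all n local particles, which is where the 1/n overhead comes from:

```python
import numpy as np

def pair_term(local, xj, eps=1e-3):
    """Placeholder interaction of one broadcast value xj with local particles."""
    return 1.0 / (np.abs(np.asarray(local) - xj) + eps)

def replicated_j_sum(particles_per_node, all_x):
    """Each node accumulates forces on its n local particles while the full
    stream of j values passes by; one communicated value is reused n times.
    (The self term is left in for simplicity of the sketch.)"""
    results = []
    for local in particles_per_node:         # one loop iteration plays the role of one node
        f = np.zeros(len(local))
        for xj in all_x:                     # the "communicated" stream of j values
            f += pair_term(local, xj)        # reused by every local particle
        results.append(f)
    return results
```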

Foil 23 Parallelism in Cut Off Force Approach

This is the classic "nearest neighbor" problem where one uses "domain decomposition" and communicates particles around the edge of each processor's domain.
  • Computation ∝ n
  • Communication ∝ n^(1-1/d) in d dimensions
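Comparing this surface-to-volume overhead numerically with the 1/n behaviour of the O(N^2) case makes the "cube" remark on the previous foil concrete:

```python
# cut-off decomposition: communication ~ n**(1 - 1/d), computation ~ n,
# so overhead ~ n**(-1/d); compare with the 1/n overhead of the O(N^2) case
d = 3
for n in (10**2, 10**4, 10**6):
    print(n, round(n ** (-1.0 / d), 4), round(1.0 / n, 6))
# n = 10**6 is needed before n**(-1/3) reaches the 1% that 1/n already gives at n = 10**2
```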

Foil 24 Problems in Cut off Force Parallelism

Earthquakes cause load imbalance as the "action" is concentrated in the processors where the quake occurs
This can be addressed, at the cost of increased communication, by using a "block-cyclic" decomposition so that each processor gets part of the action
The problem is particularly bad for Monte Carlo as, like the classic "clustering algorithms" needed near critical points to get good performance, it has poor parallel performance
Long range versions of Monte Carlo will perform satisfactorily as one can parallelize over the long range force part of the computation

Foil 25 Example of Graphics Rendering

Here we show a 16 by 16 array of pixels with either a CYCLIC or an 8 by 8 two dimensional (BLOCK,BLOCK) decomposition

Foil 26 Generation of Keys in Salmon Warren Method

In 2 (3) dimensions, add 2 (3) binary bits at each step down the tree

Foil 27 Generation of 3D Key for Salmon Warren

Each time one goes down one level of the tree, add an X, a Y and a Z bit
A 64 bit key supports 21 levels
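A sketch of the key construction (the x/y/z bit order and the use of a leading 1 bit as a sentinel are illustrative choices here, not necessarily the exact Salmon-Warren convention):

```python
def morton_key(ix, iy, iz, levels=21):
    """Append one x, one y and one z bit per tree level: 21 levels x 3 bits
    = 63 bits, which with a leading sentinel bit fits in a 64 bit integer."""
    key = 1                                   # sentinel bit marking the root
    for level in reversed(range(levels)):
        key = (key << 3) | (((ix >> level) & 1) << 2) \
                         | (((iy >> level) & 1) << 1) \
                         | ((iz >> level) & 1)
    return key

print(bin(morton_key(0b101, 0b011, 0b110, levels=3)))   # -> 0b1101011110
```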

Foil 28 Parallelism in Salmon Warren Approach

First you must decompose the cells among processors, which is nontrivial as the tree is an irregular, dynamic (maybe not in GEM) structure
Originally orthogonal recursive bisection was used, chopping space successively in two along different dimensions
However it is better to run a "space filling curve" through the cells and divide the curve into chunks of equal work (which is NOT equal numbers of particles, as there is more computation in dense areas)
  • Note the fast multipole method essentially adds in nearby particles using the "direct computation" approach
The simplest choice is to sort the keys and divide up this sorted list (the Morton curve)
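A sketch of that simplest decomposition: sort cells by key (which walks the Morton curve) and cut the sorted list into chunks of roughly equal total work; the per-cell work estimates are whatever cost model one assigns (e.g. interaction counts from the previous step):

```python
import numpy as np

def partition_by_work(keys, work, n_proc):
    """Split cells, ordered along the Morton curve, into n_proc contiguous
    chunks of roughly equal total work -- not equal numbers of particles."""
    order = np.argsort(keys)                         # walk the Morton curve
    cum = np.cumsum(work[order])
    targets = cum[-1] * np.arange(1, n_proc) / n_proc
    cuts = np.searchsorted(cum, targets)
    return np.split(order, cuts)                     # one index array per processor

# toy usage: 12 cells with uneven work, divided among 3 processors
keys = np.array([5, 1, 9, 3, 7, 2, 11, 8, 0, 4, 10, 6])
work = np.array([1, 5, 1, 1, 8, 1, 1, 1, 2, 1, 1, 1], dtype=float)
print(partition_by_work(keys, work, 3))
```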

Foil 29 Two Space Filling Curves


Foil 30 Morton Curve split up into 8 processors represented by different gray levels


Foil 31 Space Filling Curve chopped up into equal length parts


Foil 32 Parallel Algorithm in Fast Multipole I

Parallelism is clear -- update particles in parallel but what about communication?
It requires a thoughtful algorithm but one can efficiently fetch the information needed.
  • Fetch all the information needed by ANY particle in a given processor, as the multipole approach implies that nearby particles tend to need the same long range components
  • call this "locally essential" data
  • cf. the O(N) multipole-multipole approach
Effectively the top of the tree is replicated in all processors and they just differ near the bottom, which is expanded in a way that depends on where the processor is

Foil 33 Locally essential Data for Processor in Bottom Left Hand Corner of Processor Array


Foil 34 Parallel Algorithm in Fast Multipole II

Having fetched all the needed data, one can update all particles in a processor with the high efficiency seen in the O(N^2) problem.
  • Namely, communicated data is re-used many times as it is needed by many particles in the processor
  • In practice one has some 10,000 or more particles per processor
In astrophysics, one also needs to construct the tree and rebalance the load at essentially every time step
  • not needed in GEM
There is a straightforward parallel tree building algorithm
These methods outperform O(N^2) when there are more than a few thousand particles (unless the distribution is very inhomogeneous)

Foil 35 Scaling Ideas in GEM

Note that tree codes are a simple example of the general need in GEM for technologies that bridge different physical scales
  • This is well known from the multigrid approach to elliptic partial differential equation solvers
Information travels in O(1) or O(log K) steps, rather than K steps, for a grid of size K

Foil 36 Lessons from Other Fields

If we look at
Computational Fluid Dynamics
Global Climate or Weather Simulations
Flow in Porous Media
Study of Phase Transitions
We see, as in GEM, uncertainties in some of the critical basic physics, but these fields have made clear progress even with a mix of phenomenology (turbulence models) and fundamental equations (e.g. the Navier-Stokes equations)

Foil 37 General Web based Middle Tier Server Architecture

We have a set of services hosted by Web servers, which form the middleware and are accessed by clients
Groups of clients (electronic societies) are linked by Java server based collaboration systems such as TANGO or Habanero
[Diagram: clients (General User, Person1, Person2, a shared whiteboard and shared client applications) access middle tier services -- a collaboration server, an object broker and simulation servers (e.g. NEOS, Netsolve) -- which in turn access resources such as file systems and/or databases, multimedia information stores and computers]

Foil 38 High Functionality Software Layer

This is "middleware" which is implemented in simplest form as a network of Java Servers
  • Web based distributed computing or Metacomputing, based on linking mobile Web modules
  • Collaboration as in Habanero or Tango
  • Support services including databases, object brokers, instruments, MPP's etc.
[Diagram: clients access stored multimedia information, file systems and/or databases, an object broker, a collaboration server, and simulation services (network-enabled servers such as NEOS and Netsolve) running on sequential or parallel computers]

Foil 39 Diagram of HPCORBA Architecture


© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
