Full HTML for

Scripted foilset CPS615-End of N-Body Discussion and Beginning of Numerical Integration

Given by Geoffrey C. Fox at Delivered Lectures of CPS615 Basic Simulation Track for Computational Science on 15 October 96. Foils prepared 12 November 1996
Outside Index Summary of Material


This finishes the last part of N body and ODE discussion fociussing on pipeline data parallel algorithm
Note several foils were changed after presentation and so discussion is a little disconnected from foils at times
We start Numerical Integration with a basic discussion of Newton-Cotes formulae (including Trapezoidal and Simpson's rule)
We illustrate them pictorially

Table of Contents for full HTML of CPS615-End of N-Body Discussion and Beginning of Numerical Integration

Denote Foils where Image Critical
Denote Foils where HTML is sufficient
Denote Foils where Image is not available
Indicates Available audio which is lightpurpleed out if missing
1 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 15 - 1996

2 Abstract of Oct 15 1996 CPS615 Lecture
3 Simple Data Parallel Version of N Body Force Computation -- Grav -- I
4 The Grav Function in Data Parallel Algorithm - II
5 Some Inefficiencies of the N2 Algorithm - I
6 Some Inefficiencies of the N2 Algorithm - II
7 Better Pipeline Algorithm for Computation of Accelerations,
taking 1/2 the iterations of force computation - I

8 Pipeline Algorithm in detail
9 Basic pipeline operation
10 Examples of Pipeline Algorithm
11 Pipeline Algorithm Grav -- Part I
12 Pipeline Algorithm for Grav -- Part II
13 Grav Pipeline Algorithm, concluded
14 Parallel Implementation - I
15 Parallel Execution Time -I
16 Parallel Execution Time -II
17 N-body Problem is a one dimensional Algorithm
18 1:Introduction to Numerical Integration
19 2:NI.1: Newton-Cotes Formulae
20 3:Linear Equations for Coefficients
21 4:Examples of Newton Cotes Rules I
22 5:Examples of Newton Cotes Rules II
23 6:Formulae for Examples: Trapezoidal and Simpson's Rules
24 7:Summary of Newton Cotes Rules
25 8:Errors in Newton Cotes Formulae
26 9:Use of High Order Newton Cotes

Outside Index Summary of Material



HTML version of Scripted Foils prepared 12 November 1996

Foil 1 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of October 15 - 1996

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index
Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

HTML version of Scripted Foils prepared 12 November 1996

Foil 2 Abstract of Oct 15 1996 CPS615 Lecture

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index
This finishes the last part of N body and ODE discussion fociussing on pipeline data parallel algorithm
Note several foils were changed after presentation and so discussion is a little disconnected from foils at times
We start Numerical Integration with a basic discussion of Newton-Cotes formulae (including Trapezoidal and Simpson's rule)
We illustrate them pictorially

HTML version of Scripted Foils prepared 12 November 1996

Foil 3 Simple Data Parallel Version of N Body Force Computation -- Grav -- I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 118
function Grav(X,M)
C accepts positions of particles X and masses of particles M
C returns accelerations in Grav
C Uses completely parallel calculation, ignoring anti-symmetry of force
  • integer, parameter :: N =number of particles
  • real, parameter :: G = gravitational constant
  • real array (1:3, 1:N) :: X,M,Grav
! calculates acceleration on body i due to body j in entries ( :,i,j )
  • real array (1:3, 1:N,1:N) :: A, Xi, Xj, Ms, D, R
  • logical, array (1:3, 1:N,1:N) ::diag
  • integer i,j,k
! diag is true for diagonal of N by N slices
  • forall ( k=1:3, i=1:N, j=1:N ) diag = (i.eq.j)

HTML version of Scripted Foils prepared 12 November 1996

Foil 4 The Grav Function in Data Parallel Algorithm - II

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 110.8
! set up arrays of particles Xi and particles Xj
  • Xi = spread (X, dim =3, ncopies = N )
  • Xj = spread (X, dim = 2, ncopies = N )
  • Ms = spread (M, dim=2, ncopies = N )
! displacements and Euclidean distance
  • D = Xj - Xi
  • R = sqrt (spread ( sum ( D*D, dim=1), dim =1, ncopies=3)))
! calculate accelerations for all pairs except on main diagonal
  • where (diag)
    • A=0.0
  • elsewhere
    • A=G*Ms*D / R**3
  • end where
  • Grav = sum (A, dim=3)
end function Grav

HTML version of Scripted Foils prepared 12 November 1996

Foil 5 Some Inefficiencies of the N2 Algorithm - I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 54.7
Symmetry of force on particles: Fij = -Fji (Newton's Law of Action and Reaction!)
  • only half need to be computed so should use triangular arrays
  • i.e. just do loops with sum over particles i, sum over particles j <= i and then calculate algebraic form of Fij and then accumulate
  • Force on i increments by Fij
  • Force on j decrements by Fij
There is a Load balancing problem with triangular arrays
Assuming for example that processors assigned with block distribution in column direction.
  • To calculate the force between 2 particles
  • will take N/Nproc iterations in the
  • longest running processor
  • i.e. you don't get factor of two back!

HTML version of Scripted Foils prepared 12 November 1996

Foil 6 Some Inefficiencies of the N2 Algorithm - II

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 115.2
Also, all particle information is sent to all processors, taking O(N2) space whereas natural algorithms use O(N) space and this is how special purpose machines like GRAPE get their cost effectiveness
  • If N is one million as for globular cluster problem, there is a big difference between DRAM cost of 106 and 1012 units of memory
Space is further wasted as everything is spread to 3 dimensional arrays even when arrays like mass are naturally one dimensional!

HTML version of Scripted Foils prepared 12 November 1996

Foil 7 Better Pipeline Algorithm for Computation of Accelerations,
taking 1/2 the iterations of force computation - I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 115.2
Pair together the data for every particle Xi with the data for every particle Xj by iterating over a pipeline (circulating) array.

HTML version of Scripted Foils prepared 12 November 1996

Foil 8 Pipeline Algorithm in detail

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 198.7
Case when i compared to i is not needed as particles dont interact with themselves
At step k, interact particle i with particle j= 1 + mod((i+k-1),N)
Accumulate force on i due to j in fixed Ai
Accumulate negative of this as force on j due to i in circulating Acj
At the end of the algorithm, add Ai and Aci
In parallel version, Note that Ai will be calculated in "home processor for particle i but Aci will travel around the machine being accumulated in processor holding particle j
Thus this violates the owner computes rule and so this parallel algorithm must be implemented by hand -- the compiler will not find it automatically

HTML version of Scripted Foils prepared 12 November 1996

Foil 9 Basic pipeline operation

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 187.2
First step is to Circulate(shift) one position and calculate acelerations Fij and Fji in all index positions
Shifting pipeline (N-1) times gives correct algorithm but does not save "Newton's factor of two".
Just need (N-1)/2 steps when N is odd and N/2 steps when N is even which saves factor of two.

HTML version of Scripted Foils prepared 12 November 1996

Foil 10 Examples of Pipeline Algorithm

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 116.6

HTML version of Scripted Foils prepared 12 November 1996

Foil 11 Pipeline Algorithm Grav -- Part I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 84.9
function Grav(X,M)
C accepts positions of particles X and masses of particles M
C returns accelerations in Grav
  • integer, parameter :: N =number of particles
  • real, parameter :: G = gravitational constant
  • real, dimension (1:3, 1:N) :: X,M,Grav
  • real, dimension (1:3,1:N) :: A, Xc, Mc, Ac, D, R
  • integer k
! A is fixed accelerations - X and M are used for fixed positions and masses
  • A=0.0
  • Ac=0.0 ! circulating accelerations
  • Xc=X ! circulating positions
  • M=G*M ! precalculate mass * gravitational constant
  • Mc=M ! circulating masses

HTML version of Scripted Foils prepared 12 November 1996

Foil 12 Pipeline Algorithm for Grav -- Part II

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 38.8
!Shift Circulating Arrays Xc Mc Ac to the right
  • Xc = cshift (Xc, dim=2, shift = -1)
  • Mc = cshift (Mc, dim=2, shift = -1)
  • Ac = cshift (Ac, dim=2, shift = -1)
! calculate R to be distance over 3-D cordinates
  • D= Xc-X
  • R = sqrt (spread (sum (D*D, dim=1), dim=1, ncopies=3)))
  • D = D/R**3
  • A = A + Mc*D ! fixed acceleration
  • Ac = Ac - M*D ! circulating acceleration
  • end do

HTML version of Scripted Foils prepared 12 November 1996

Foil 13 Grav Pipeline Algorithm, concluded

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 123.8
if (( N mod 2) = 0 ) then ! final one way acceleration if N even
  • Xc = cshift (Xc, dim=2, shift = -1)
  • Mc = cshift (Mc, dim=2, shift = -1)
  • Ac = cshift (Ac, dim=2, shift = -1)
  • D = Xc-X
  • R = sqrt (spread (sum (D*D, dim=1), dim=1, ncopies=3)))
  • D = D/R**3
  • A = A + Mc*D
end if
! combine accelerations for final result - circulating particle in i'th
! position corresponds to fixed particle (i-(N-1)/2)
Grav = A + cshift (Ac, dim=2, shift = (N-1)/2)
end function Grav

HTML version of Scripted Foils prepared 12 November 1996

Foil 14 Parallel Implementation - I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 279.3
Distribute arrays in naive block fashion - if Nproc is the number of processors, each processor has N/Nproc particles.

HTML version of Scripted Foils prepared 12 November 1996

Foil 15 Parallel Execution Time -I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 79.2
Consider time for Runge Kutta invocation of function Grav
Shifting particles communicates one set of particle information - all processors communicate at the same time giving estimate:
9 * tcomm (factor should be 7 as need only 1 not 3 masses as we used in simple implementation earlier)
  • we are ignoring latency which in practice means best implementation transfers several (not N-1 as in naive data parallel algorithm) particles at a time
Floating point calculations: roughly 3(x,y,z) of -, *, sum, sqrt, exp, /, *, +, *, + which can be summarized as estimate: > 30 tfloat
Each communicated particle is interacted with the N/Nproc particles in the local partition of that processor and each step has one shift giving a total time for (N-1)/2 steps in Grav:

HTML version of Scripted Foils prepared 12 November 1996

Foil 16 Parallel Execution Time -II

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 138.2
Then the total time for the Runge-Kutta solver is:
Giving O(N2/Nproc) running time in the number of particles.
Note that parallel overhead or communication time/computation time is proportional to 1/n where n = N/Nproc is grain size.
Note this algorithm has an overhead characteristic of a one dimensional problem for
  • overhead goes like (1/n)1/d in d dimensions -- which is edge over area -- giving 1/n for d=1

HTML version of Scripted Foils prepared 12 November 1996

Foil 17 N-body Problem is a one dimensional Algorithm

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 313.9
Also we used a one-dimension parallel decomposition
Note these one dimensional characteristics are independent of dimension of space that particles move in.
Normally d is thought of as dimension of physical space in which problem posed.
But the geometrical interpretation of d is only valid where interaction between particles is itself geometrical i.e. short range
Rather N body algorithm has no geometric structure and you see d=1 characteristic of algorithm
See Chapter 3 of Parallel Computing Works for a longer discussion of this from a "Complex Systems" point of view
Note the simple N-body problem has some interesting features
  • Very low communication/compute ratios
  • Best parallel methods violate owner computes rule -- also true in recent molecular dynamics problems with shorter range force
  • Unexpected dimension

HTML version of Scripted Foils prepared 12 November 1996

Foil 18 1:Introduction to Numerical Integration

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 267.8
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 19 2:NI.1: Newton-Cotes Formulae

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 240.4
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 20 3:Linear Equations for Coefficients

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 218.8
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 21 4:Examples of Newton Cotes Rules I

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 123.8
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 22 5:Examples of Newton Cotes Rules II

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 224.6
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 23 6:Formulae for Examples: Trapezoidal and Simpson's Rules

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 90.7
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 24 7:Summary of Newton Cotes Rules

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 252
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 25 8:Errors in Newton Cotes Formulae

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 339.8
See Original Foil

HTML version of Scripted Foils prepared 12 November 1996

Foil 26 9:Use of High Order Newton Cotes

From CPS615-End of N-Body Discussion and Beginning of Numerical Integration Delivered Lectures of CPS615 Basic Simulation Track for Computational Science -- 15 October 96. *
Full HTML Index Secs 279.3
See Original Foil

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Fri Aug 15 1997