function Grav(X,M)

! accepts positions of particles X and masses of particles M
! returns accelerations in Grav
! Uses completely parallel calculation, ignoring anti-symmetry of force

integer, parameter :: N = number of particles
real, parameter :: G = gravitational constant
real, dimension (1:3, 1:N) :: X, M, Grav

! calculates acceleration on body i due to body j in entries (:,i,j)
real, dimension (1:3, 1:N, 1:N) :: A, Xi, Xj, Ms, D, R
logical, dimension (1:3, 1:N, 1:N) :: diag
integer i, j, k

! diag is true for diagonal of N by N slices
forall ( k=1:3, i=1:N, j=1:N ) diag(k,i,j) = (i.eq.j)

! set up arrays of particles Xi and particles Xj
Xi = spread (X, dim=3, ncopies=N)
Xj = spread (X, dim=2, ncopies=N)
Ms = spread (M, dim=2, ncopies=N)

! displacements and Euclidean distance
D = Xj - Xi
R = sqrt (spread (sum (D*D, dim=1), dim=1, ncopies=3))

! calculate accelerations for all pairs except on main diagonal
where (diag)
   A = 0.0                 ! no self-interaction
elsewhere
   A = G * Ms * D / R**3   ! acceleration on i due to j
end where

Grav = sum (A, dim=3)

end function Grav
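
As a usage illustration only (not from the original notes), such a function would be called once per time step by the integrator driving the simulation; the names V, dt and nsteps below are hypothetical, and a simple Euler update stands in for the Runge-Kutta scheme discussed later:

! hypothetical driver sketch, not part of the lecture code
real, dimension (1:3, 1:N) :: X, V, M   ! positions, velocities, masses
real :: dt                              ! time step
integer :: step, nsteps
do step = 1, nsteps
   V = V + dt * Grav (X, M)   ! accelerations from the function above
   X = X + dt * V             ! simple Euler position update
end do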

Case where i is compared with i is not needed, as particles don't interact with themselves

At step k, interact particle i with particle j = 1 + mod((i+k-1), N)

Accumulate force on i due to j in fixed Ai

Accumulate the negative of this as force on j due to i in circulating Acj

At the end of the algorithm, add Ai and Aci

In the parallel version, note that Ai will be calculated in the "home" processor for particle i, but Aci will travel around the machine, being accumulated in the processor holding particle j

Thus this violates the owner-computes rule, and so this parallel algorithm must be implemented by hand -- the compiler will not find it automatically
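
For example (an illustration, not in the original notes), with N = 5 there are (N-1)/2 = 2 steps: step k=1 generates the pairs {1,2}, {2,3}, {3,4}, {4,5}, {5,1} and step k=2 the pairs {1,3}, {2,4}, {3,5}, {4,1}, {5,2}, so all 10 distinct pairs appear exactly once and each interaction is used twice (once for i, once for j). For even N the "opposite" pairs {i, i+N/2} would each be generated twice, which is why the code below adds a final one-way step in that case.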

function Grav(X,M)

! accepts positions of particles X and masses of particles M
! returns accelerations in Grav

integer, parameter :: N = number of particles
real, parameter :: G = gravitational constant
real, dimension (1:3, 1:N) :: X, M, Grav
real, dimension (1:3, 1:N) :: A, Xc, Mc, Ac, D, R
integer k

! A is fixed accelerations - X and M are used for fixed positions and masses
A = 0.0
Ac = 0.0   ! circulating accelerations
Xc = X     ! circulating positions
M = G*M    ! precalculate mass * gravitational constant
Mc = M     ! circulating masses

! main loop over the (N-1)/2 shift steps described above: the force on the
! fixed particle accumulates in A, and minus that force accumulates in the
! circulating Ac (using the anti-symmetry of the force)
do k = 1, (N-1)/2
   Xc = cshift (Xc, dim=2, shift=-1)
   Mc = cshift (Mc, dim=2, shift=-1)
   Ac = cshift (Ac, dim=2, shift=-1)
   D = Xc - X
   R = sqrt (spread (sum (D*D, dim=1), dim=1, ncopies=3))
   D = D/R**3
   A = A + Mc*D    ! acceleration on fixed particle due to circulating one
   Ac = Ac - M*D   ! minus of this, acting on the circulating particle
end do

if ( mod(N,2) .eq. 0 ) then ! final one way acceleration if N even

   Xc = cshift (Xc, dim=2, shift=-1)
   Mc = cshift (Mc, dim=2, shift=-1)
   Ac = cshift (Ac, dim=2, shift=-1)

   D = Xc - X
   R = sqrt (spread (sum (D*D, dim=1), dim=1, ncopies=3))
   D = D/R**3
   A = A + Mc*D

end if

! combine accelerations for final result - circulating particle in i'th
! position corresponds to fixed particle (i-(N-1)/2)
Grav = A + cshift (Ac, dim=2, shift = (N-1)/2)

end function Grav
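
As a concrete check (not in the original notes): for N = 5 the circulating arrays are shifted by -1 twice, so the accumulator for particle i ends up in circulating slot i+2; the final cshift with shift = (N-1)/2 = 2 therefore brings Ac(i+2) back to position i before it is added to the fixed A.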

Consider the time for a Runge-Kutta invocation of function Grav

Shifting particles communicates one set of particle information - all processors communicate at the same time, giving the estimate:

9 * tcomm (the factor should really be 7, as only 1 mass is needed, not the 3 copies we used in the simple implementation earlier)
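
(A plausible breakdown, not spelled out above: each shifted particle carries 3 position components + 3 mass copies + 3 acceleration components = 9 words, which drops to 3 + 1 + 3 = 7 if only a single mass value is circulated.)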

We are ignoring latency, which in practice means the best implementation transfers several particles at a time (not N-1 as in the naive data-parallel algorithm)

Floating-point calculations: roughly 3 (one per x, y, z component) of each of -, *, sum, sqrt, exp, /, *, +, *, + (about 10 operations on each of 3 components), which can be summarized as the estimate: > 30 tfloat per interaction

Each communicated particle is interacted with the N/Nproc particles in the local partition of that processor, and each step has one shift, giving a total time for the (N-1)/2 steps in Grav:
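
Using the per-step figures above, a rough overall estimate (a sketch, assuming one particle is communicated per processor per shift):

Time(Grav) is roughly ((N-1)/2) * ( 9 tcomm + 30 (N/Nproc) tfloat )

so the communication overhead per step is roughly 9 tcomm / (30 (N/Nproc) tfloat), i.e. about 0.3 tcomm/tfloat divided by the grain size n = N/Nproc.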
|
Also we used a one-dimension parallel decomposition
|
Note these one dimensional characteristics are independent of dimension of space that particles move in.
|
Normally d is thought of as dimension of physical space in which problem posed.
|
But the geometrical interpretation of d is only valid where interaction between particles is itself geometrical i.e. short range
|
Rather N body algorithm has no geometric structure and you see d=1 characteristic of algorithm
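
One way to see this (a sketch of the argument, in the spirit of the discussion cited below): the communication overhead goes as tcomm / ((N/Nproc) tfloat), i.e. as 1/n in the grain size n = N/Nproc, whereas a geometrically local problem in d dimensions has overhead scaling as 1/n^(1/d); comparing the two gives d = 1, whatever the dimension of the physical space the particles move in.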

See Chapter 3 of Parallel Computing Works for a longer discussion of this from a "Complex Systems" point of view

Note the simple N-body problem has some interesting features:

- Very low communication/compute ratios
- The best parallel methods violate the owner-computes rule -- also true in recent molecular dynamics problems with shorter-range forces
- Unexpected dimension