Given by Geoffrey C. Fox at Computational Science CPS615 in the Spring 2000 Semester. Foils prepared February 25 2000
Summary of Material
This uses the simple O(N²) Particle Dynamics Problem as a motivator to discuss the solution of ordinary differential equations |
We discuss Euler, Runge-Kutta and predictor-corrector methods |
Various Message parallel O(N²) algorithms are described with performance comments |
There is a related data parallel module sharing the same initial foils and another module with a discussion of advanced methods of this type (tree algorithms for Green's function solutions) for Earthquake science |
Spring Semester 2000 |
Geoffrey Fox |
Northeast Parallel Architectures Center |
Syracuse University |
111 College Place |
Syracuse NY |
gcf@npac.syr.edu |
gcf@cs.fsu.edu |
Whenever f satisfies certain smoothness conditions, there is always a sufficiently small step size h such that the difference between the real function value Yi+1 at ti+1 and the approximation Xi+1 is less than some required error magnitude ε. [e.g. Burden and Faires] |
Euler's method: very quick as only one computation of the derivative function f at each step. |
Other methods require more computation per step but can produce the specified error ε with fewer steps, as they can use a larger value of h. Thus Euler's method is not best. |
Mathematical (Numerical) formulations of simulations fall into a few key classes which have their own distinctive algorithmic and parallelism issues |
Most common formalism is that of a field theory where quantities of interest are represented by densities defined over a 1, 2, 3 or 4 dimensional space |
Another set of models of physical systems represent them as a coupled set of discrete entities evolving over time |
Within the coupled discrete system class, one has two important approaches |
Particles are sets of entities -- sometimes fixed (atoms in a crystal) or sometimes moving (galaxies in a universe) |
They are characterized by force Fij on particle i due to particle j |
Forces are characterized by their range r: Fij(xi,xj) is zero if the distance |xi-xj| is greater than r |
Examples: |
Laws of Motion are typically ordinary differential equations |
If the range r is small (as in a crystal), then one gets numerical formulations and parallel computing considerations similar to those in the Laplace example with local communication |
If r is infinite (no cut-off for the force), as in the gravitational problem, one finds rather different issues which we will discuss in this module |
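A minimal sketch (in Python, not from the original foils) of this O(N²) structure: every particle interacts with every other particle, so the force loop below visits all N(N-1) ordered pairs; the softening parameter eps is an illustrative addition to avoid division by zero |

import numpy as np

def gravitational_forces(x, m, G=1.0, eps=1e-3):
    # x: (N,3) array of positions, m: (N,) array of masses
    # Returns the (N,3) array of total forces F_i = sum over j of F_ij
    N = len(m)
    F = np.zeros_like(x)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = x[j] - x[i]                      # vector from particle i to j
            r = np.sqrt(np.dot(d, d)) + eps      # softened separation |x_i - x_j|
            F[i] += G * m[i] * m[j] * d / r**3   # attractive F_ij along d
    return F

For a short-range force one would add "if r > cutoff: continue" inside the loop, recovering the sparse, local-communication structure of the crystal case |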
There are several "non-particle" problems discussed later that reduce to the long range force problem, characterized by every entity interacting with every other entity |
An electrical or electronic network has the same structure as a particle problem where "particles" are components (transistor, resistance, inductance etc.) and the "force" between components i and j is nonzero if and only if i and j are linked in the circuit |
If one is simulating something like a chip, then a time stepped approach is very wasteful as 99.99% of the components are doing nothing (i.e. remain in the same state) at any given time step! |
Discrete Event Simulations are clearly preferable on sequential machines, but parallel algorithms are hard due to the need for dynamic load balancing (events are dynamic and not uniform throughout the system) and synchronization (which events can be executed in parallel?) |
There are several important approaches to DES, of which the best known is the Time Warp method originally proposed by David Jefferson -- here one optimistically executes events in parallel and rolls back to an earlier state if this is found to be inconsistent |
Conservative methods (only execute those events you are certain cannot be impacted by earlier events) have little parallelism |
DES do not exhibit the classic loosely synchronous compute-communicate structure as there is no uniform global time |
Suppose we try to execute in parallel events E1 and E2 at times t1 and t2 with t1 < t2. |
We show the timelines of several (4) objects in the system and our two events E1 and E2 |
If E1 generates no interfering events, or generates an event E12 only at a time greater than t2, then our parallel execution of E2 is consistent |
However if E1 generates E12 before t2 then the execution of E2 has to be rolled back and E12 should be executed first |
[Figure: timelines of the objects in the system plotted against time, showing events E1 and E2 and the generated event E12] |
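A heavily simplified sketch of this rollback mechanism in Python; all names are invented for the example, and there are no anti-messages or global virtual time, so this illustrates the idea rather than Jefferson's full protocol |

class TimeWarpObject:
    # One simulated object: executes events optimistically and rolls back
    # when a straggler event arrives in its past.
    def __init__(self):
        self.state = 0        # simplified object state
        self.lvt = 0.0        # local virtual time
        self.saved = []       # checkpoints: (event_time, state_before_event)

    def execute(self, t, delta):
        self.saved.append((t, self.state))  # checkpoint before acting
        self.state += delta
        self.lvt = t

    def receive(self, t, delta):
        if t < self.lvt:      # straggler, like E12 arriving before t2
            self.rollback(t)  # undo optimistic work past time t
        self.execute(t, delta)

    def rollback(self, t):
        # pop every optimistically executed event with timestamp after t
        # (in the full method the undone events are then re-executed
        # from the input queue, and anti-messages cancel any events they
        # had already sent to other objects)
        while self.saved and self.saved[-1][0] > t:
            event_time, self.state = self.saved.pop()
        self.lvt = t

obj = TimeWarpObject()
obj.receive(3.0, +1)   # optimistically execute an event like E2 at t2 = 3
obj.receive(2.0, +5)   # straggler like E12 at an earlier time: triggers rollback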
Especially in cases where the "force" is linear in the Φi(t), it is convenient to think of the force as being specified by a matrix M whose elements mij are nonzero if and only if the force between i and j is nonzero. A typical force law is: Fi = Σj mij Φj(t) |
In the Laplace example, the matrix M is sparse (most elements are zero) and this is an especially common case where one can and needs to develop efficient algorithms |
We will return to the matrix formulation in the case of partial differential solvers |
Another way of looking at these problems is as graphs G where the nodes of the graphs are labeled by the particles i, and one has edges linking i to j if and only if the force Fij is nonzero |
In these languages, long range force problems correspond to a dense matrix M (all elements nonzero) and a fully connected graph G |
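A small sketch (Python; illustrative names) of this matrix/graph view: the nonzero pattern of M, equivalently the edge set of G, follows directly from the particle positions and the range r |

import numpy as np

def interaction_pattern(x, r=np.inf):
    # m_ij nonzero (edge i--j in graph G) iff i != j and |x_i - x_j| <= r
    N = len(x)
    M = np.zeros((N, N), dtype=bool)
    for i in range(N):
        for j in range(N):
            if i != j and np.linalg.norm(x[i] - x[j]) <= r:
                M[i, j] = True
    return M

# Small r (crystal): M is sparse and G has only local edges.
# r = infinity (gravity): M is dense and G is fully connected.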
In ordinary differential equations, one focuses on different ways of representing dΦi(t)/dt in some difference form, such as Euler's form dΦi(t)/dt = (Φi(t + δt) - Φi(t)) / δt |
Or more accurate Runge-Kutta and other formulae |
The same time discretization methods are applicable to (time derivatives in) partial differential equations involving time and are used to approximate ∂Φi(x,t)/∂t |
Let X(ti) be the numerical solution and Y(ti) the exact solution. We have the Taylor expansion Y(ti+h) = Y(ti) + h Y'(ti) + (h²/2) Y''(ti) + ... |
By assumption X(ti) = Y(ti), and in Euler's method we use the exact derivative at ti, so X(ti+h) = Y(ti) + h Y'(ti). Thus X(ti+h) = Y(ti+h) + O(h²) and so the local error is O(h²) |
Accumulating this error over the 1/h steps needed to cover a fixed interval gives a global error of order h |
Data Parallel approach is really only useful for the simple O(N²) case and even here it is quite tricky to express the algorithm so that it is efficient |
The shared memory approach is effective for a modest number of processors in both algorithms |
Message Parallel approach gives you very efficient algorithms in both the O(N²) and O(N log N) cases |
The CSEP Book http://csep.hpcc.nectec.or.th/CSEP/ has lots of useful material. Consider a simple example from there: |
With the trivial and uninteresting (as the step size is too large) choice h = 0.25, we get: [the worked steps appeared as equations in the original foil] |
Euler's method involves a simple linear extrapolation to proceed from point to point with the ODE providing the slope |
Grid ti = t0 + h*i and X0 = Initial Value |
Xi+1 = Xi + h * f(ti,Xi) |
In practice it is not used, as it is not accurate enough |
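A minimal sketch of this scheme in Python; the test problem dX/dt = X with X(0) = 1 (exact solution e^t) is an illustrative choice, not from the foils |

import math

def euler(f, t0, x0, h, nsteps):
    # Grid t_i = t0 + h*i ; X_{i+1} = X_i + h * f(t_i, X_i)
    t, x = t0, x0
    for _ in range(nsteps):
        x = x + h * f(t, x)
        t = t + h
    return x

# Halving h roughly halves the global error at t = 1: first order, O(h),
# consistent with the error analysis above.
for h in (0.1, 0.05, 0.025):
    err = abs(euler(lambda t, x: x, 0.0, 1.0, h, round(1.0 / h)) - math.e)
    print(h, err)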
To extrapolate from ti to ti+1 with just a straight line, one should clearly use the average slope (of the tangent) over the interval and NOT the slope at the start |
This immediately gives an error smaller by a factor of h, but it isn't trivial because you don't seem to know f(ti+0.5*h, X(ti+0.5*h)) |
In the Runge-Kutta methods, one uses intermediate values to calculate such midpoint derivatives |
Key idea: one may use an approximation for X(ti+0.5*h), as it only appears as an argument of f, which is multiplied by h. Thus the resulting error is (at least) one factor of h better than that of the approximation |
So if one wishes to do just one factor of h better than Euler, one can use Euler itself to estimate the value of X(ti+0.5*h) |
Midpoint Method is: ξ1 = f(ti, Xi); ξ2 = f(ti + 0.5*h, Xi + 0.5*h*ξ1) (the Euler estimate of the midpoint slope); Xi+1 = Xi + h * ξ2 |
[Figure: extrapolation from ti to ti+1, showing Xi = Yi, the tangent ξ1 at ti, the approximate tangent ξ2 at the midpoint drawn through ti, the Midpoint Approximation Xi+1, and the exact Yi+1] |
Global Error is O(h²): a Second Order Method |
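The same sketch for the Midpoint Method (same illustrative test problem as the Euler sketch above); here halving h should cut the global error by about a factor of four |

import math

def midpoint(f, t0, x0, h, nsteps):
    # xi1: slope at the start; xi2: slope at the Euler-estimated midpoint
    t, x = t0, x0
    for _ in range(nsteps):
        xi1 = f(t, x)
        xi2 = f(t + 0.5 * h, x + 0.5 * h * xi1)
        x = x + h * xi2              # X_{i+1} = X_i + h * xi2
        t = t + h
    return x

# Error falls ~4x per halving of h: second order, O(h^2), as stated above.
for h in (0.1, 0.05, 0.025):
    err = abs(midpoint(lambda t, x: x, 0.0, 1.0, h, round(1.0 / h)) - math.e)
    print(h, err)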
We discussed, in the case of the Jacobi example, that in many problems there is an elegant formula fcomm = constant × tcomm / (n^(1/d) × tfloat) |
d is the system information dimension, which is equal to the geometric dimension in problems like Jacobi Iteration, where communication is a surface effect and calculation a volume effect |
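A tiny numeric illustration of this formula (Python; the machine parameters tcomm and tfloat, the constant, and the grain sizes n are made-up values) |

def f_comm(n, d, t_comm, t_float, const=1.0):
    # f_comm = constant * t_comm / (n**(1/d) * t_float)
    return const * t_comm / (n ** (1.0 / d) * t_float)

# d = 2 (surface/volume, Jacobi-like): quadrupling the grain size n
# halves the communication overhead.
print(f_comm(n=400,  d=2, t_comm=10.0, t_float=1.0))   # -> 0.5
print(f_comm(n=1600, d=2, t_comm=10.0, t_float=1.0))   # -> 0.25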