Given by Geoffrey C. Fox at CPS615 Spring Semester 00 on March 00. Foils prepared March 15 00
Spring Semester 2000
Geoffrey Fox
Northeast Parallel Architectures Center
Syracuse University
111 College Place
Syracuse NY
gcf@npac.syr.edu
gcf@cs.fsu.edu
This introduces the three fundamental types of PDEs -- Elliptic, Parabolic and Hyperbolic -- and studies the numerical solution of Elliptic Equations
The sparse matrix formulation is used and iterative approaches -- Jacobi, Gauss-Seidel and SOR -- are defined
These are motivated by analogies between the equilibrium of diffusive equations and elliptic systems
Parallel computing is discussed for Gauss-Seidel
Eigenvalue analysis is used to discuss the convergence of the methods
We discuss Multigrid methods at a simple level
Cauchy Boundary Conditions: φ and ∂φ/∂n given on a curve C, where n is the normal direction to the curve. This is appropriate for hyperbolic equations
[Figure: curve C bounding the region determined by the given boundary conditions]
φ and ∂φ/∂n cannot both be given on a closed curve, as the solution obtained by integrating up along the path X Z1 Y will then clash with that obtained by integrating down along X Z2 Y
Dirichlet Boundary Conditions: φ given on closed curve C
Neumann Boundary Conditions: ∂φ/∂n given on closed curve C
General Equation is of the form:
a ∂²φ/∂x² + b ∂²φ/∂x∂y + c ∂²φ/∂y² = f(x, y, t)
b² > 4ac: Hyperbolic Equations
b² = 4ac: Parabolic Equations
b² < 4ac: Elliptic Equations
Equation Type | Boundary Curve | Boundary Conditions | Example
Hyperbolic | Open | Cauchy | Wave Equation
Parabolic | Open | Dirichlet or Neumann | Diffusion Equation
Elliptic | Closed | Dirichlet or Neumann | Poisson Equation
There are many iterative methods which can be applied to solve any matrix equation, but they are particularly effective for sparse matrices as they directly exploit the "zero structure"
Here we look at the three simplest, or stationary, methods -- so called because the iteration equation is the same at each iteration
The Jacobi method is based on solving for every variable locally with respect to the other variables; one iteration of the method corresponds to solving for every variable once. The resulting method is easy to understand and implement, but convergence is slow.
The Gauss-Seidel method is like the Jacobi method, except that it uses updated values as soon as they are available. In general, it will converge faster than the Jacobi method, though still relatively slowly.
Successive Overrelaxation (SOR) can be derived from the Gauss-Seidel method by introducing an extrapolation parameter ω. For the optimal choice of ω, SOR converges faster than Gauss-Seidel by an order of magnitude.
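A minimal sketch (not from the foils; plain NumPy on a small dense test matrix, function names illustrative) of how the three updates differ -- a real sparse solver would loop only over the nonzero entries of each row:

```python
import numpy as np

def jacobi_step(A, b, x):
    # Every component is computed from the OLD vector x
    x_new = np.empty_like(x)
    for i in range(len(b)):
        off_diag = A[i, :] @ x - A[i, i] * x[i]
        x_new[i] = (b[i] - off_diag) / A[i, i]
    return x_new

def gauss_seidel_step(A, b, x):
    # Components are overwritten in place, so later rows see the NEW values
    x = x.copy()
    for i in range(len(b)):
        off_diag = A[i, :] @ x - A[i, i] * x[i]
        x[i] = (b[i] - off_diag) / A[i, i]
    return x

def sor_step(A, b, x, omega):
    # Gauss-Seidel value blended with the old value using the parameter omega
    x = x.copy()
    for i in range(len(b)):
        off_diag = A[i, :] @ x - A[i, i] * x[i]
        x_gs = (b[i] - off_diag) / A[i, i]
        x[i] = omega * x_gs + (1.0 - omega) * x[i]
    return x
```

Repeating any of these steps until the residual b - Ax is small gives the corresponding stationary method.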
The generic iteration strategy is to split the matrix as A = M - N
Ax = b then implies Mx = Nx + b
So write M x(k) = N x(k-1) + b   (*)
All iteration methods have this form for different choices of M and N for given A and b
We must ensure that iteration (*) converges, and we naturally prefer that it converges fast!
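A sketch of the splitting view (not from the foils; NumPy assumed, names illustrative). Choosing M = diag(A) gives Jacobi, and M = the lower triangle of A gives Gauss-Seidel:

```python
import numpy as np

def split_iteration(A, b, M, x0, num_iters):
    # A = M - N, so Ax = b becomes the iteration M x(k) = N x(k-1) + b
    N = M - A
    x = x0
    for _ in range(num_iters):
        x = np.linalg.solve(M, N @ x + b)   # one application of (*)
    return x

# Jacobi:       x = split_iteration(A, b, np.diag(np.diag(A)), x0, 100)
# Gauss-Seidel: x = split_iteration(A, b, np.tril(A), x0, 100)
```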
One can change A and b by preconditioning
M1⁻¹ A (M2⁻¹ M2) x = M1⁻¹ b
is the same equation as before for any choice of matrices M1 and M2
All these choices are designed to accelerate convergence of iterative methods
Anew = M1⁻¹ A M2⁻¹
xnew = M2 x
bnew = M1⁻¹ b
We choose M1 and M2 to make our standard methods perform better
There are specialized preconditioning ideas and perhaps better general approaches such as multigrid and Incomplete LU (ILU) decomposition
Anew xnew = bnew has the same form as before and we can apply any of the methods that we used on Ax = b
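As a minimal illustration (not from the foils), the simplest choice M1 = diag(A), M2 = I -- diagonal or "Jacobi" preconditioning -- can be formed explicitly:

```python
import numpy as np

def diagonal_precondition(A, b):
    # M1 = diag(A), M2 = I:  Anew = M1^-1 A,  bnew = M1^-1 b,  xnew = x
    d = np.diag(A)
    A_new = A / d[:, None]   # divide row i by A[i, i]
    b_new = b / d
    return A_new, b_new

# Any iterative method applied to (A_new, b_new) has the same solution x
# as A x = b, but possibly converges in fewer iterations.
```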
For Jacobi the iteration matrix G = M⁻¹N takes the form D⁻¹(D - A), where D is the diagonal part of A
One can calculate the eigenvalues analytically in this case and find that if one has N grid points, the largest eigenvalue of G is cos(π/N) ≈ 1 - π²/(2N²)
Thus if one requires an error of 10⁻⁶, then the number of iterations is given by a dreadfully large number K, where (1 - π²/(2N²))^K ≈ 10⁻⁶, i.e. K is of order N²
The slowest convergence is for a discrepancy (residual) that varies slowly from grid point to grid point.
For this smooth discrepancy to be "noticed" by the iterative method, points a long way apart need to affect each other. This takes of order N steps, as information moves slowly (one grid point per iteration) in Jacobi
Then, after the information arrives, one still needs to actually correct the value
Thus convergence takes of order N² iterations
On the other hand, if the residual varies rapidly with position, it will be corrected quickly, as only a few iterations are needed for the information to be communicated
Thus a rapidly varying residual corresponds to the smallest (in magnitude) eigenvalues of the iteration matrix
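A small numerical check of this (a sketch, not from the foils, assuming the standard 1D Laplacian on N interior points): the spectral radius of the Jacobi iteration matrix comes out as cos(π/(N+1)), which approaches 1 like 1 - π²/(2N²), consistent with the estimate above.

```python
import numpy as np

def jacobi_spectral_radius(N):
    # 1D Laplacian on N interior points: tridiagonal (-1, 2, -1)
    A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    D = np.diag(np.diag(A))
    G = np.linalg.solve(D, D - A)            # Jacobi iteration matrix D^-1 (D - A)
    return max(abs(np.linalg.eigvals(G)))

for N in (15, 31, 63):
    print(N, jacobi_spectral_radius(N), np.cos(np.pi / (N + 1)))
```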
Jacobi and Gauss-Seidel give a formula for x(k) in terms of x(k-1); call this xbasic(k)
Overrelaxation forms xSOR(k) = ω xbasic(k) + (1 - ω) xSOR(k-1)
Typically only 0 < ω < 2 is sensible; ω < 1 is relaxation and 1 < ω < 2 is overrelaxation
It is "over" because if ω > 1, you go further in the direction of the new estimate than calculated
In "relaxation", you conservatively average the prediction with the old value
Unfortunately, the best value of ω cannot be calculated except in simple cases, and so SOR is not so reliable for general problems
One can show that for Jacobi, the best choice is ω = 1, i.e. the relaxation strategy is not very useful
For Gauss-Seidel, in the simple one-dimensional case with N grid points, one can show that there is an optimal choice ω = 2(1 - π/N), i.e. almost equal to 2
Then the number of iterations needed by optimal SOR to get a given error can be shown to be proportional to N, not N² as in Jacobi or Gauss-Seidel. So the ratio of iterations is:
Jacobi: 2N²/π²
Gauss-Seidel: N²/π²
SOR on Gauss-Seidel: N/π
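A sketch (not on the foils) of a numerical experiment exhibiting this scaling on the 1D Poisson model problem with N interior points; NumPy is assumed and the stopping tolerance is arbitrary.

```python
import numpy as np

def count_iterations(N, method, omega=1.0, tol=1e-6, max_iter=200000):
    # 1D Poisson problem -u'' = 1 on N interior points, u = 0 at both ends
    h2 = (1.0 / (N + 1)) ** 2
    b = np.full(N, h2)
    u = np.zeros(N)
    for k in range(1, max_iter + 1):
        new = np.empty_like(u) if method == "jacobi" else u   # Jacobi: separate array
        for i in range(N):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < N - 1 else 0.0
            u_basic = 0.5 * (left + right + b[i])
            new[i] = omega * u_basic + (1.0 - omega) * u[i]
        u = new
        # residual of the tridiagonal (-1, 2, -1) system
        r = b - 2.0 * u
        r[:-1] += u[1:]
        r[1:] += u[:-1]
        if np.max(np.abs(r)) < tol:
            return k
    return max_iter

N = 31
print("Jacobi       ", count_iterations(N, "jacobi"))
print("Gauss-Seidel ", count_iterations(N, "gs"))
print("SOR          ", count_iterations(N, "gs", omega=2.0 * (1.0 - np.pi / N)))
```

Jacobi takes roughly twice as many sweeps as Gauss-Seidel, while SOR with ω near 2(1 - π/N) needs far fewer.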
The remaining foils discuss some details that are not essential
Details on the convergence of Gauss-Seidel
Detailed derivation of the convergence of Jacobi in one dimension
Discussion of SOR with computation of the optimal ω for the Jacobi case
Motivation of iterative methods with an artificial time whose discretization gives the index k for the iteration
We remarked that the key problem with iterative methods is that they get the detail (short wavelength) structure correct quickly, but convergence is controlled by the coarse (long wavelength) structure
Then in simple methods one needs of order N² iterations to get good results
Ironically, one goes to large N to get the detail; if all you wanted was the coarse structure, a smaller mesh would be fine
The basic idea in multigrid is key in many areas of science
We get coarse structure from small N and fine detail from large N
Material taken from the tutorial by Ulrich Ruede at http://www.mgnet.org/mgnet/www/
Take Laplace's equation in the unit square with initial guess φ = 0 and boundary conditions that are zero except on one side
For an N = 31 by 31 grid it takes around 1000 (of order N²) iterations to get a reasonable answer
[Figures: boundary conditions and exact solution, and the iterate after 1, 10, 100 and 1000 iterations]
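A minimal sketch of this experiment (not the foils' code; NumPy assumed, with an illustrative choice of boundary values and stopping criterion):

```python
import numpy as np

N = 31                                    # interior grid points per side
u = np.zeros((N + 2, N + 2))              # array includes the boundary layer
u[0, :] = 1.0                             # boundary values: zero except on one side

for k in range(5000):
    # Jacobi sweep: each interior point becomes the average of its four OLD
    # neighbours; the right-hand side is evaluated before the assignment
    new = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    change = np.max(np.abs(new - u[1:-1, 1:-1]))
    u[1:-1, 1:-1] = new
    if change < 1e-6:
        break

print("sweeps used:", k + 1)
```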
Suppose we have a finest level M(1) with N by N points (in 2D)
Then the k'th coarsest approximation M(k) to this has N/2^k by N/2^k points
One way to think about multigrid is that M(k+1) can form a preconditioner to M(k), so that one can replace the natural matrix A(k) by A⁻¹(k+1) A(k)
The basic issue is that A(k) and A(k+1) are of different sizes, so we need prolongation and restriction to map solutions between levels k and k+1
[Figure: a V-cycle -- relax on the finest grid, restrict to successively coarser grids with a relax at each level, then interpolate back up, relaxing again at each level]
In the picture, "relax" is the application of a standard iteration scheme (e.g. Jacobi, Gauss-Seidel, conjugate gradient) to "solve" for the short wavelength structure at a given level
"Interpolate" takes a solution on a coarser grid and interpolates it to find a solution at half the grid spacing
"Restrict" takes the solution on a given grid and averages it to find the solution on the coarser grid
procedure Cycle(level, u, f)
  if level = coarsest then
    solve A(level) u = f exactly
  else
    relax u a few times on A(level) u = f
    r = f - A(level) u
    restrict r to level + 1
    v = 0; repeat mc times: v = Cycle(level + 1, v, restricted r)
    interpolate v to level and correct u = u + v
    relax u a few times on A(level) u = f
  endif
endprocedure
The exact solution satisfies A(level) uexact = f; writing uexact = u + v, the correction v obeys A(level) v = r = f - A(level) u
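The same cycle as a runnable sketch (not the foils' code) for the 1D Poisson problem, assuming Gauss-Seidel smoothing, full-weighting restriction and linear interpolation; grid sizes are of the form 2^L - 1 so each coarser level has half as many points.

```python
import numpy as np

def relax(u, f, sweeps=3):
    # Gauss-Seidel sweeps on -u'' = f with u = 0 at both ends, h = 1/(len(u)+1)
    h2 = (1.0 / (len(u) + 1)) ** 2
    for _ in range(sweeps):
        for i in range(len(u)):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < len(u) - 1 else 0.0
            u[i] = 0.5 * (left + right + h2 * f[i])
    return u

def residual(u, f):
    h2 = (1.0 / (len(u) + 1)) ** 2
    left = np.concatenate(([0.0], u[:-1]))
    right = np.concatenate((u[1:], [0.0]))
    return f - (2.0 * u - left - right) / h2

def restrict(r):
    # full weighting onto the coarser grid (every other point)
    return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def interpolate(v, n_fine):
    # linear interpolation back to the finer grid
    u = np.zeros(n_fine)
    u[1::2] = v
    vpad = np.concatenate(([0.0], v, [0.0]))
    u[0::2] = 0.5 * (vpad[:-1] + vpad[1:])
    return u

def cycle(u, f, mc=1):
    if len(u) <= 3:                       # coarsest level: solve (essentially) exactly
        return relax(u, f, sweeps=50)
    u = relax(u, f)                       # pre-smoothing removes short wavelengths
    r = restrict(residual(u, f))          # residual moved to the coarser grid
    v = np.zeros(len(r))
    for _ in range(mc):                   # mc = 1 is a V-cycle, mc = 2 a W-cycle
        v = cycle(v, r, mc)
    u = u + interpolate(v, len(u))        # coarse-grid correction
    return relax(u, f)                    # post-smoothing

N = 31
f = np.ones(N)
u = np.zeros(N)
for _ in range(10):
    u = cycle(u, f)
print("residual:", np.max(np.abs(residual(u, f))))
```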
The parameter mc determines the exact strategy for how one iterates through the different meshes
One ends up with a full cycle as shown
[Figure: the Full Multigrid Cycle, sweeping repeatedly between the finest and coarsest grids]
One must exploit the special structure of the matrix
Gauss-Seidel for "general matrices" cannot be parallelized
[Figure: a Gauss-Seidel update stencil -- neighbours already visited in the sweep are at iteration k, the others still at iteration k-1]
This is the best approach to parallelizing Gauss-Seidel for the natural ordering along rows -- first in x, then y (and then z in 3D)
4 processors with a cyclic row decomposition
Consider points on lines parallel to the diagonal
All points in a given phase can be updated in parallel, as one only needs updated (iteration k) points from the previous phase and old (iteration k-1) points from the next phase
There is load imbalance, but at worst processors differ by 1 in load, and the effect goes away for large systems
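A serial sketch of this wavefront ordering for Laplace's equation with the 5-point stencil (not from the foils; NumPy assumed). It visits the points phase by phase along anti-diagonals and produces exactly the same values as the row-by-row sweep; in a parallel code the inner loop over a phase is what gets distributed.

```python
import numpy as np

def gauss_seidel_wavefront_sweep(u):
    # u includes the boundary layer; interior points are (i, j) with 1 <= i, j <= G
    G = u.shape[0] - 2
    # Phase p holds the interior points with i + j = p (a line parallel to the
    # anti-diagonal). Each such point only needs iteration-k values from phase
    # p-1 and iteration-(k-1) values from phase p+1, so a whole phase is independent.
    for p in range(2, 2 * G + 1):
        for i in range(max(1, p - G), min(G, p - 1) + 1):
            j = p - i
            u[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1])
    return u
```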
[Figure: suppose the conventional Gauss-Seidel ordering of the update -- along rows, starting at the bottom left. All points needed by a general point P at iteration k lie below the dotted line through P parallel to the diagonal; points are labelled by iteration number (k or k-1) and by phase number, which runs from 1 to 31]
The pipeline method has high communication costs (as one will in fact use a "block cyclic" decomposition to preserve locality) and is complex to implement well
Thus instead we note that we can get new versions of Gauss-Seidel by reordering the update order -- this could (in principle) make for a better or worse method (or, more likely, we won't know whether it is better or worse!)
There is a natural reordering which is typically as good if not better, and for which parallelism is "trivial"
This ONLY works for a nearest-neighbor stencil, but there are versions of red-black for similar stencils
K = (G²/2) + 1, where the grid is G by G
[Figure: the first 4 and the middle 4 labels of the Gauss-Seidel update order are shown]
In red-black ordering, you color every alternate point red and every other point black
This gives the checkerboard pattern shown on the previous foil
Now reorder the update so that all red points are updated first, then all black points
Updating red points to iteration k only requires black points at iteration k-1
Updating black points to iteration k requires NO black points, just red points at iteration k
So one can divide the parallel update into two phases
Do normal block decomposition
Parallel Phase I: Update all Red Points
Parallel Phase II: Update all Black Points
This has a similar efficiency analysis to Jacobi except a little more sensitive to latency
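A sketch of one red-black sweep for Laplace's equation (not from the foils; NumPy assumed). Each of the two phases is a single data-parallel operation; in a distributed-memory version each phase becomes an exchange of halo rows with neighbouring processors followed by a local update of that colour.

```python
import numpy as np

def red_black_sweep(u):
    # u includes the boundary layer; colour interior points by the parity of i + j
    i, j = np.indices(u.shape)
    interior = (i > 0) & (i < u.shape[0] - 1) & (j > 0) & (j < u.shape[1] - 1)
    red = interior & ((i + j) % 2 == 0)
    black = interior & ((i + j) % 2 == 1)

    for mask in (red, black):       # Phase I: red points, Phase II: black points
        avg = np.zeros_like(u)
        avg[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                  u[1:-1, :-2] + u[1:-1, 2:])
        u[mask] = avg[mask]         # red sees old black values; black sees new red values
    return u
```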
In electrical power system simulations and similar problems, one gets irregular sparse matrices and there is no way to get such a clean parallel algorithm
Fortunately, very irregular problems like power systems tend not to be huge.
In the PDE case, one can show for some elliptic problems that red-black ordering can be better than the original ordering
For some hyperbolic equations used in computational fluid dynamics, this is not so clear, and SOR methods are used
If one has more complex differencing schemes, e.g. fourth-order differencing, then red-black does not work, but a scheme with more colors (and more update phases) can be devised