16 by 16 Wavefront Parallel Gauss Seidel
This is best approach to parallelizing Gauss Seidel for the natural ordering along rows -- first in x, the y (and then z in 3D)
4 Processors with cyclic row decomposition
- Processor 1 rows 1 5 9 13
- Processor 4 rows 4 8 12 16 etc.
Consider points on lines parallel to diagonal
- 31 such lines
- Execute each phase consecutively starting at bottom left
All points in a given phase can be updated in parallel as one only needs updated (iteration k) points from previous phase -- need iteration (k-1) from next phase
Load Imbalance but at worst processors differ by 1 in load and effect goes away for large systems
- This example has speed up of 3.4 on 4 processors
- 8 phases 1 unit, 8 phases 2 units, 8 phases 3 units, 7 phases 4 units of work