1 |
Consider an N_NAS^3 grid decomposed onto P processors
|
2 |
We must start each ADI solve either at the beginning (i, j, or k = 1) or at the end (i, j, or k = N_NAS)
|
3 |
For each sweep in which the elements are not all stored in the same processor, start half the solves at the beginning and half at the end
-
In each case, work towards the middle
|
4 |
After at most N_NAS/2 steps, we will have "made it" to the middle on the initial solve and all processors will be active.
|
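The start-up behaviour described above can be sketched with a small model. This is only an illustration under stated assumptions: a one-dimensional block decomposition along the sweep direction, P dividing N_NAS evenly, and one "step" meaning a solve advances by one grid point. The function names are hypothetical and not taken from the original material.

```python
def first_active_two_ended(n_nas, num_procs):
    """Step at which each processor first has work when solves start at both ends.

    Assumes a 1D block decomposition and that num_procs divides n_nas.
    """
    block = n_nas // num_procs
    steps = []
    for p in range(num_procs):
        lo = p * block            # first grid index owned by processor p
        hi = lo + block - 1       # last grid index owned by processor p
        from_start = lo                  # steps for a solve begun at index 0 to reach lo
        from_end = (n_nas - 1) - hi      # steps for a solve begun at the far end to reach hi
        steps.append(min(from_start, from_end))
    return steps


def first_active_one_ended(n_nas, num_procs):
    """Same model, but with every solve started at the beginning only."""
    block = n_nas // num_procs
    return [p * block for p in range(num_procs)]


if __name__ == "__main__":
    print(first_active_two_ended(64, 8))
    print(first_active_one_ended(64, 8))
```

In this model the last processor to become active waits fewer than N_NAS/2 steps with the two-ended start, compared with almost N_NAS steps when every solve starts at the beginning.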
5 |
This algorithm is sensitive to latency: in racing to reach the middle as fast as possible, each processor has performed only part of one solve before it must communicate.
-
Each message carries only one (5-component) element
|
6 |
Maximizing message size by partially completing several solves before each communication:
-
Decreases communication cost, since fewer messages are sent
-
But worsens load balance, since it delays the start of processors near the middle
|
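The trade-off between message size and load balance can be made concrete with a rough cost model. This is a sketch under assumptions, not a measurement: the cost of a message is taken to be a fixed latency plus a per-element transfer time, a processor cannot start until its neighbour has finished its part of a whole batch, and all parameter names and values are hypothetical.

```python
def batching_tradeoff(batch, n_solves, block, depth, t_calc, t_latency, t_elem):
    """Return (communication cost per processor, start delay of a processor).

    batch     : number of partial solves completed before each message
    n_solves  : total solves passing through the processor in one sweep
    block     : grid points per solve owned by each processor
    depth     : number of processors between this one and the nearest end
    t_calc    : compute time per grid point per solve
    t_latency : fixed start-up cost of one message
    t_elem    : transfer time of one (5-component) element
    """
    stage = batch * block * t_calc             # compute one batch of partial solves
    message = t_latency + batch * t_elem       # one message carrying `batch` elements
    n_messages = -(-n_solves // batch)         # ceiling division
    comm_cost = n_messages * message
    start_delay = depth * (stage + message)    # wait for every upstream stage
    return comm_cost, start_delay


if __name__ == "__main__":
    for b in (1, 8):
        print(b, batching_tradeoff(b, 64, 8, 3, 1.0, 100.0, 5.0))
```

With these illustrative numbers, going from one solve per message to eight cuts the communication cost per processor from 6720 to 1120 time units, while the start delay of a processor three hops from the end grows from 339 to 612.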