1. We need a criterion for deciding where to compute F(i,j), the force between particles i and j, so that for EACH j block there is an equal amount of computation in each processor.

2. If the criterion is to calculate F(i,j) in the home processor of i when i < j (the natural sequential choice), then the work is NOT load balanced for a one-dimensional block decomposition. Particle i interacts only with the particles j > i, so the amount of work decreases as i increases, that is, as one moves to later-numbered processors.

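A minimal sketch of this imbalance, assuming N particles numbered 0..N-1 split into equal contiguous blocks over P processors (P dividing N), with each pair (i, j), i < j, counted as one unit of work charged to the home of i. The names `block_home` and `work_per_processor` are hypothetical, chosen only for illustration.

```python
def block_home(i, n, p):
    """Owner of particle i under a 1-D block decomposition (assumes p divides n)."""
    return i // (n // p)


def work_per_processor(n, p, home):
    """Count pair interactions charged to each processor.

    Criterion from the text: the pair (i, j) with i < j is computed in the
    home processor of i, so particle i contributes n - 1 - i interactions.
    """
    work = [0] * p
    for i in range(n):
        work[home(i, n, p)] += n - 1 - i
    return work


if __name__ == "__main__":
    # 16 particles on 4 processors: work falls off on later processors.
    print(work_per_processor(16, 4, block_home))  # [54, 38, 22, 6]
```

The total is 120 = 16·15/2 pairs, but processor 0 does nine times the work of processor 3.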
3. However, one can use a cyclic (scattered) decomposition with the same interaction criterion i < j, since each processor then holds particles distributed throughout the array, mixing early particles (many partners) with late ones (few partners).
   - This is typical of load balancing for triangular-matrix algorithms such as Gaussian elimination.

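The same counting can compare block and cyclic ownership. This is a sketch under the same assumptions (N particles, P processors, pair (i, j) with i < j charged to the home of i); `cyclic_home` assigns particle i to processor i mod P, and all function names are hypothetical.

```python
def block_home(i, n, p):
    """Owner of particle i under a 1-D block decomposition (assumes p divides n)."""
    return i // (n // p)


def cyclic_home(i, n, p):
    """Owner of particle i under a cyclic (scattered) decomposition."""
    return i % p


def work_per_processor(n, p, home):
    """Pair interactions charged to each processor with the i < j criterion."""
    work = [0] * p
    for i in range(n):
        work[home(i, n, p)] += n - 1 - i  # pairs (i, j) with j > i
    return work


def imbalance(work):
    """Ratio of the busiest processor's work to the average (1.0 is perfect)."""
    return max(work) / (sum(work) / len(work))


if __name__ == "__main__":
    for n in (16, 1000):
        blk = work_per_processor(n, 4, block_home)
        cyc = work_per_processor(n, 4, cyclic_home)
        print(n, "block:", blk, "cyclic:", cyc)
        print("   imbalance block %.3f, cyclic %.3f" % (imbalance(blk), imbalance(cyc)))
```

For 16 particles the cyclic split gives [36, 32, 28, 24] against [54, 38, 22, 6] for blocks. Consecutive processors under the cyclic split differ by only N/P pairs, which becomes negligible next to the roughly N²/(2P) pairs each processor handles as N grows.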
4. Note that this "best" algorithm halves the computation but "doubles" the communication, because the force accumulated on each particle j must travel with j's data: one transfers roughly twice as much information (7 rather than 4 items for each j).
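The trade-off can be sketched serially, assuming gravitational-style forces in three dimensions with G = 1 and a small softening length (both assumptions, as are all function names). The full method evaluates every ordered pair; the halved method evaluates each pair once and reuses it through Newton's third law, F(j,i) = -F(i,j), so it must also update j. The item counts take the 4 items per j to be three coordinates plus a mass, and the 7 to be those plus three accumulated force components; this reading is an assumption, not stated in the text.

```python
import math
import random

# Assumed contents of the message carried for each travelling particle j.
ITEMS_FULL = ("x", "y", "z", "mass")            # 4 items
ITEMS_HALF = ITEMS_FULL + ("fx", "fy", "fz")    # 7 items


def pair_force(xi, mi, xj, mj, eps=1e-3):
    """Force on particle i due to j (G = 1, softening eps); antisymmetric in i, j."""
    d = [xj[k] - xi[k] for k in range(3)]
    r2 = sum(c * c for c in d) + eps * eps
    s = mi * mj / (r2 * math.sqrt(r2))
    return [s * c for c in d]


def forces_full(x, m):
    """Evaluate F(i, j) for every ordered pair i != j."""
    n = len(x)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    evals = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                fij = pair_force(x[i], m[i], x[j], m[j])
                evals += 1
                for k in range(3):
                    f[i][k] += fij[k]
    return f, evals


def forces_half(x, m):
    """Evaluate F(i, j) only for i < j and reuse it as F(j, i) = -F(i, j)."""
    n = len(x)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    evals = 0
    for i in range(n):
        for j in range(i + 1, n):
            fij = pair_force(x[i], m[i], x[j], m[j])
            evals += 1
            for k in range(3):
                f[i][k] += fij[k]
                # In a parallel version this contribution to j must be sent
                # back with j's data: the source of the extra communication.
                f[j][k] -= fij[k]
    return f, evals


if __name__ == "__main__":
    random.seed(1)
    n = 8
    x = [[random.random() for _ in range(3)] for _ in range(n)]
    m = [random.uniform(0.5, 1.5) for _ in range(n)]
    f_full, e_full = forces_full(x, m)
    f_half, e_half = forces_half(x, m)
    print("pair evaluations:", e_full, "vs", e_half)               # 56 vs 28
    print("items per j:", len(ITEMS_FULL), "vs", len(ITEMS_HALF))  # 4 vs 7
    diff = max(abs(a - b) for fa, fb in zip(f_full, f_half) for a, b in zip(fa, fb))
    print("max force difference:", diff)
```

Both methods give the same forces up to rounding, with half the pair evaluations in exchange for a larger message per particle.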