Similarly, we will not dimension the array as PHI(4,4) but rather as PHI(6,6), so that it can also hold the communicated values |
Now we preload PHI(1, . ), PHI( . , 1), PHI(6, . ), PHI( . , 6) by communication. The basic "parallel" program update is then identical to the sequential case: one loops over x(I), y(J) with I and J in the range 2 to 5 |
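
The padded-array update described above can be sketched as follows. This is a minimal illustration, not the course's own code: the four-neighbour averaging rule is an assumed example update, and the `halo_value` argument is a hypothetical stand-in for the values that would really arrive from neighbouring processors by communication. Python is 0-based, so PHI(I, J) in the text corresponds to `phi[I-1][J-1]` here.

```python
N_LOCAL = 4               # points owned by this processor in each direction
N_PADDED = N_LOCAL + 2    # one extra row/column on each side for communicated values


def make_padded(local_block, halo_value=0.0):
    """Copy a 4x4 local block into the middle of a 6x6 array.

    halo_value is a placeholder (assumption) for the data that the real
    program would preload into PHI(1,.), PHI(.,1), PHI(6,.), PHI(.,6).
    """
    phi = [[halo_value] * N_PADDED for _ in range(N_PADDED)]
    for i in range(N_LOCAL):
        for j in range(N_LOCAL):
            phi[i + 1][j + 1] = local_block[i][j]
    return phi


def update(phi):
    """One sweep over I, J = 2..5 in the text's numbering (1..4 here).

    The loop body never needs to know whether a neighbour is a locally
    owned point or a communicated one, which is why it looks sequential.
    The four-neighbour average is an assumed example update rule.
    """
    new = [row[:] for row in phi]
    for i in range(1, N_LOCAL + 1):
        for j in range(1, N_LOCAL + 1):
            new[i][j] = 0.25 * (
                phi[i - 1][j] + phi[i + 1][j] + phi[i][j - 1] + phi[i][j + 1]
            )
    return new
```

Because the border rows and columns are filled before the sweep, the loop needs no special cases for points that sit next to another processor's data.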
There is some "load imbalance": not all processors have the same number of internal points to update. "Edge" processors have only 9 or 12 internal points rather than the full 16 |
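
The counts of 9 and 12 can be reproduced with a short calculation. The sketch below assumes a 16 x 16 global grid whose outermost ring of points is held fixed as boundary values, divided among a 4 x 4 array of processors with 4 x 4 points each. These sizes and the function name are assumptions chosen to match the numbers quoted above.

```python
GRID = 16     # assumed global grid size in each direction
PROCS = 4     # assumed processors in each direction
BLOCK = GRID // PROCS


def internal_points(p, q):
    """Count the points owned by processor (p, q) that are actually updated.

    A point is internal if it is not on the outermost ring of the global
    grid (assumed to hold fixed boundary values).
    """
    count = 0
    for i in range(p * BLOCK, (p + 1) * BLOCK):
        for j in range(q * BLOCK, (q + 1) * BLOCK):
            if 0 < i < GRID - 1 and 0 < j < GRID - 1:
                count += 1
    return count


counts = [[internal_points(p, q) for q in range(PROCS)] for p in range(PROCS)]
for row in counts:
    print(row)
```

Under these assumptions the four corner processors update 9 points, the other eight processors on the edge update 12, and the four in the middle update 16, so the middle processors determine the time per sweep.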