The compiler decomposes the grid array into equal-sized blocks, one per processor.
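As an illustration of what "equal pieces" means for one dimension of the grid, here is a minimal sketch that assumes the rows are split into contiguous blocks of size ceil(n / p). When n is not divisible by p, the last block comes out smaller. The function name `block_range` and this rounding convention are assumptions made for the example, not something the text specifies.

```python
def block_range(n: int, p: int, rank: int) -> tuple[int, int]:
    """Half-open range [start, end) of the n row indices owned by `rank`
    when the rows are split into contiguous blocks over p processors
    (hypothetical helper; block size assumed to be ceil(n / p))."""
    block = -(-n // p)            # ceil(n / p) using integer arithmetic
    start = min(rank * block, n)  # clamp so trailing ranks may own nothing
    end = min(start + block, n)
    return start, end


# Example: 10 rows distributed over 4 processors
ranges = [block_range(10, 4, r) for r in range(4)]
print(ranges)  # [(0, 3), (3, 6), (6, 9), (9, 10)]
```

Blocks that are contiguous keep most stencil neighbors on the same processor. Only the rows at block edges have to be communicated.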
For each interior point, the algorithm must access its four immediate neighbors (north, south, east and west) according to a 5-point stencil. Whenever one of those neighbors lies in a block owned by another processor, the access requires communication with that processor.
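The sketch below shows which stencil accesses become communication. It assumes an n x n grid whose rows are block-distributed over p processors with block size ceil(n / p). The helpers `owner` and `remote_neighbors` are hypothetical names introduced for this illustration.

```python
def owner(i: int, n: int, p: int) -> int:
    """Processor that owns global row i (rows assumed block-distributed)."""
    block = -(-n // p)  # ceil(n / p)
    return i // block


def remote_neighbors(i: int, j: int, n: int, p: int) -> list[tuple[int, int]]:
    """5-point-stencil neighbors of interior point (i, j) that are stored
    on a different processor than (i, j) itself."""
    me = owner(i, n, p)
    # North, south, west, east neighbors of the centre point.
    neighbors = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for (a, b) in neighbors if owner(a, n, p) != me]


# 8 x 8 grid on 4 processors: each processor owns 2 rows.
print(remote_neighbors(2, 3, 8, 4))  # [(1, 3)]  north neighbor is remote
print(remote_neighbors(5, 3, 8, 2))  # []        all neighbors are local
```

Under a row-only decomposition, only the north and south neighbors can be remote. Points on the first or last row of a block each need exactly one remote value.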
The algorithm can then be implemented by code that first communicates the internal boundaries (the edge values of each block that neighboring processors need) and then loops over the interior points on each processor.
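The two phases can be simulated in a single process. The sketch below makes several assumptions. The rows are split evenly over p simulated processors. Each processor keeps one extra "ghost" row above and below its block to hold the exchanged boundary. The stencil update is a Jacobi-style average of the four neighbors, which the text does not specify. The distributed sweep is then checked against a plain serial sweep.

```python
def jacobi_serial(u: list[list[float]]) -> list[list[float]]:
    """Reference: one 5-point averaging sweep over the interior points."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])
    return new


def jacobi_distributed(u: list[list[float]], p: int) -> list[list[float]]:
    """Same sweep, simulated on p processors with one ghost row per side."""
    n, m = len(u), len(u[0])
    assert n % p == 0, "sketch assumes the rows divide evenly among processors"
    b = n // p  # rows owned by each processor
    # Local storage: index 0 = top ghost, 1..b = owned rows, b+1 = bottom ghost.
    local = [[[0.0] * m] + [u[r * b + k][:] for k in range(b)] + [[0.0] * m]
             for r in range(p)]

    # Phase 1: communicate internal boundaries between neighboring processors.
    for r in range(p):
        if r > 0:
            local[r][0] = local[r - 1][b][:]      # last owned row of r-1
        if r < p - 1:
            local[r][b + 1] = local[r + 1][1][:]  # first owned row of r+1

    # Phase 2: each processor updates its interior points using local data only.
    result = [row[:] for row in u]
    for r in range(p):
        loc = local[r]
        for k in range(1, b + 1):
            i = r * b + (k - 1)          # global row index of local row k
            if i == 0 or i == n - 1:
                continue                 # global boundary rows stay fixed
            for j in range(1, m - 1):
                result[i][j] = 0.25 * (loc[k-1][j] + loc[k+1][j]
                                       + loc[k][j-1] + loc[k][j+1])
    return result


grid = [[float(i * i + 3 * j) for j in range(5)] for i in range(8)]
assert jacobi_distributed(grid, 4) == jacobi_serial(grid)
```

The sweep never needs more than one remote row per side, because the stencil reaches only one point in each direction. That is why a single boundary exchange before the loop is enough.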