This type of topology will almost always require the use of
the CYCLIC(N) directive in the data distribution. Because
of the symmetry inherent in this type of application, and because
some compilers either fail to compile the (CYCLIC(N),CYCLIC(M))
distribution (a cyclic distribution over both dimensions) or
generate poorly performing code for it, the (*,CYCLIC(N))
distribution is recommended. Further, the size
of column blocks N to be used should be chosen such that the edge over
area ratio x is in the range . The size of
the blocks in this case is such that load balancing is nearly perfect
but the edge over area ratio is not unnecessarily large.
The experiment bears this out, as the best distribution
(*,CYCLIC(20)) has an edge over area ratio of 0.12. It is
important to note that depending on the number of processors used,
this ratio may correspond to a (*,BLOCK) distribution. If
a (*,BLOCK) distribution yields a larger ratio than this to
start with, the problem size is probably not large enough to justify
that many processors in the first place.
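
The relationship between (*,CYCLIC(N)) and (*,BLOCK) noted above can be illustrated with a small sketch. It assumes the usual HPF mapping rules: under CYCLIC(N), consecutive column blocks of width N are dealt out round-robin to the processors, and under BLOCK each processor receives one contiguous block of ceil(ncols/nprocs) columns. The function names and the array sizes (320 columns; 4 or 16 processors) are hypothetical and chosen only for illustration; they are not taken from the experiment.

```python
from collections import Counter
from math import ceil


def cyclic_owner(col, block, nprocs):
    """Processor owning 0-based column `col` under (*,CYCLIC(block))."""
    return (col // block) % nprocs


def block_owner(col, ncols, nprocs):
    """Processor owning 0-based column `col` under (*,BLOCK)."""
    return col // ceil(ncols / nprocs)


def columns_per_processor(ncols, block, nprocs):
    """Number of columns each processor holds under (*,CYCLIC(block))."""
    counts = Counter(cyclic_owner(c, block, nprocs) for c in range(ncols))
    return [counts.get(p, 0) for p in range(nprocs)]


# Hypothetical sizes, for illustration only.
NCOLS = 320

# With 4 processors, CYCLIC(20) gives each processor four separate
# 20-column blocks spread across the array.
print(columns_per_processor(NCOLS, 20, 4))

# With 16 processors, each processor gets exactly one 20-column block,
# so CYCLIC(20) assigns every column to the same owner as BLOCK.
same_as_block = all(
    cyclic_owner(c, 20, 16) == block_owner(c, NCOLS, 16) for c in range(NCOLS)
)
print(same_as_block)
```

In this sketch the two distributions coincide as soon as N times the number of processors reaches the number of columns, which is the situation the text describes when it says the chosen ratio may already correspond to a (*,BLOCK) distribution.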