- BLOCK is good for local (e.g. nearest-neighbor) communication
  - Look for subscripts like a(i+1,j-1)
  - Look for intrinsics like CSHIFT
  - Warning: the BLOCK mapping depends on the array size, so BLOCK-distributed arrays of different sizes do not line up element for element
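The BLOCK points can be checked with a small ownership model. This is a minimal Python sketch, not HPF code. It assumes the common convention that BLOCK uses a block size of ceil(N/P), with 1-based array indices and processors numbered from 0. The helper names are hypothetical. The sketch counts how many nearest-neighbor references a(i+1) cross a processor boundary, and it shows the same index landing on different processors when only the array size changes.

```python
def block_owner(i: int, n: int, p: int) -> int:
    """Hypothetical helper: 0-based processor owning 1-based element i of an
    n-element array under BLOCK on p processors (block size ceil(n / p))."""
    block = -(-n // p)  # integer ceil(n / p)
    return (i - 1) // block


def shift_crossings(n: int, p: int) -> int:
    """Count references a(i+1), i = 1..n-1, whose two elements sit on
    different processors, i.e. references that need communication."""
    return sum(1 for i in range(1, n)
               if block_owner(i, n, p) != block_owner(i + 1, n, p))


# Nearest-neighbor shift: only the block edges need communication.
print(shift_crossings(1000, 4))  # → 3 (out of 999 references)

# The same index moves to another processor when the array size changes.
print(block_owner(30, 100, 4), block_owner(30, 200, 4))  # → 1 0
```

Communication grows with the number of block edges, which is P - 1, and not with N. That is why shifts and CSHIFT-style access suit BLOCK.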
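- CYCLIC has non-obvious locality
  - Look for subscripts like x(i+jmp), where jmp is a multiple of the number of processors; x(i) and x(i+jmp) are then on the same processor
  - For example, on a hypercube machine the number of processors is a power of 2, so any power-of-2 jmp at least that large qualifies
  - CYCLIC always balances the memory load: per-processor element counts differ by at most one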
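The CYCLIC points can be modeled the same way. This is a hedged Python sketch, assuming 1-based indices, processors numbered from 0, and BLOCK with block size ceil(N/P) for comparison. All helper names are hypothetical. With 8 processors (a 3-dimensional hypercube), the sketch tests whether every x(i+jmp) is local to x(i). It then compares per-processor element counts for a 9-element array on 4 processors.

```python
def cyclic_owner(i: int, p: int) -> int:
    """Hypothetical helper: 0-based owner of 1-based element i under CYCLIC."""
    return (i - 1) % p


def block_owner(i: int, n: int, p: int) -> int:
    """BLOCK owner with block size ceil(n / p), for comparison."""
    return (i - 1) // -(-n // p)


def shift_is_local(n: int, p: int, jmp: int) -> bool:
    """True if every in-bounds reference x(i+jmp) is on x(i)'s processor."""
    return all(cyclic_owner(i, p) == cyclic_owner(i + jmp, p)
               for i in range(1, n - jmp + 1))


def load(owner, n: int, p: int) -> list[int]:
    """Number of elements each processor holds under the given owner map."""
    counts = [0] * p
    for i in range(1, n + 1):
        counts[owner(i)] += 1
    return counts


p = 8  # e.g. a 3-dimensional hypercube
print(shift_is_local(100, p, 16), shift_is_local(100, p, 12))  # → True False

n, p = 9, 4
print(load(lambda i: cyclic_owner(i, p), n, p))    # → [3, 2, 2, 2]
print(load(lambda i: block_owner(i, n, p), n, p))  # → [3, 3, 3, 0]
```

In this sketch BLOCK leaves the last processor empty. CYCLIC spreads the same 9 elements with a spread of at most one.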
- CYCLIC(K) has some of the advantages of each (K = 1 gives CYCLIC; K = ceil(N/P) gives BLOCK)
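CYCLIC(K) deals out blocks of K consecutive elements round-robin, so it sits between the two extremes. This is a minimal Python sketch under the same assumptions as above (1-based indices, processors numbered from 0). The helper name is hypothetical. It prints the owner of each element of a 16-element array on 4 processors for K = 1, 2 and 4.

```python
def cyclic_k_owner(i: int, p: int, k: int) -> int:
    """Hypothetical helper: 0-based owner of 1-based element i under CYCLIC(k),
    i.e. blocks of k consecutive elements dealt round-robin to p processors."""
    return ((i - 1) // k) % p


n, p = 16, 4
for k in (1, 2, 4):  # k = 1 is CYCLIC; k = ceil(n / p) = 4 is BLOCK here
    print(k, [cyclic_k_owner(i, p, k) for i in range(1, n + 1)])
```

With K = 2, each pair of neighbors shares a processor, as under BLOCK. Every processor still receives elements from the whole index range, as under CYCLIC.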
- Strides are expensive on any distribution
  - But some special cases are worth recognizing
- Broadcasts are equally expensive on any distribution
- Communication between arrays with different distributions is very expensive
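One way to see why mixing distributions is costly is to count the data that moves when an array is remapped from BLOCK to CYCLIC. This is a hedged Python sketch, not a cost model. It assumes 1-based indices, processors numbered from 0, and BLOCK with block size ceil(N/P). The helper names are hypothetical. It counts only the elements that change owner and the distinct sender/receiver pairs.

```python
def block_owner(i: int, n: int, p: int) -> int:
    """BLOCK owner (block size ceil(n / p)), 0-based, for 1-based i."""
    return (i - 1) // -(-n // p)


def cyclic_owner(i: int, p: int) -> int:
    """CYCLIC owner, 0-based, for 1-based i."""
    return (i - 1) % p


def redistribution_traffic(n: int, p: int) -> tuple[int, int]:
    """Hypothetical helper: elements that change owner, and distinct
    (sender, receiver) pairs, when remapping from BLOCK to CYCLIC."""
    moved = 0
    pairs = set()
    for i in range(1, n + 1):
        src, dst = block_owner(i, n, p), cyclic_owner(i, p)
        if src != dst:
            moved += 1
            pairs.add((src, dst))
    return moved, len(pairs)


print(redistribution_traffic(32, 4))  # → (24, 12)
```

In this example, 24 of the 32 elements move, and all 4 × 3 = 12 ordered processor pairs exchange data. The remapping therefore amounts to an all-to-all exchange, not a few boundary messages.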