The compiler must recognize the presence of collective communication
patterns in the computations in order to generate the appropriate
communication calls. Specifically, this involves a number of tests
on the relationship among subscripts of various arrays in a statement.
These tests should also include information about array alignments and
distributions. We use pattern matching techniques similar to those
proposed by Marina Chen [37] and also used by Gupta [41].
However, we also need to perform transformation of array indices in order
to account for different alignments and distributions.
Further, we extend the above tests to include unstructured communication.
We also note that we do not expand scalars because scalars are replicated
accross processors.
We consider the following statement to illustrate the steps involved
in communication detection.
forall(i1=l1:u1:s1, i2= ..., ...)
LHS(,
,...,
) = RHS1(
,
,...,
) + ...
where and
,
,
are functions of
index variables or indirection arrays.
The steps involved in determining a communication pattern are summarized
in Algorithm 1.
The algorithm uses two different tables. Table detects
structured communication primitives based on the relationship
between the
and
array subscript reference pattern
for BLOCK distribution.
Table
detects
unstructured communication primitives.
Tables
and
use variables
to represent
: compile time constant,
: scalar,
: invertible function,
: an indirection array.
The algorithm first attempts to detect structured communication
if the arrays are aligned to the same template. For each array on the ,
the following processing is performed. For each subscript of the array,
it is coupled with the corresponding subscript on the
array such that
both subscripts are aligned with the same dimension of the template.
For each such pair, the algorithm attempts to find a structured communication
pattern in that dimension according to Table
.
If a structured communication pattern is found then the subscript on
the
is is tagged, indicating the appropriate communication
primitive.
If any distributed dimension of an array on the is left
untagged then the array is marked with one of
the unstructured communication primitives
(the third column of Table
) depending on
the reference pattern given in the second column of Table
.
The algorithm tags the array as postcomp_write
or scatter according to the reference pattern
given in Table
if one or more of the distributed
dimension's subscript is in non-canonical form , or is vector-valued or is
unknown. Note that any pattern that can not be classified according
to Tables
or
, is marked as
unknown (such as when a subscript involves more than one forall index, e.g
) so that scatter and gather can be used to parallelize
any forall statement.
Finally, the algorithm marks the distributed RHS array with the concatenation primitive if the LHS array is replicated.