Alignment of data arrays to templates is specified by the ALIGN directives. In this section, we describe how the ALIGN directive is processed.
Alignment determines which portions of two or more arrays will be on the same processor for a particular data partitioning. Clearly, if arrays involved in the same computation are aligned so that after distribution their respective sections lie on the same processors, the number of non-local accesses is reduced.
Alignment is a relation that specifies a one-to-one correspondence
between elements of a pair of array objects.
The template is defined by a DECOMPOSITION directive that gives its shape and rank. Let $A$ be an $n$-dimensional array and $TEMPL$ be an $m$-dimensional template.
The general form of an alignment directive is:
!F90D$ ALIGN A(i_1, ..., i_n) WITH TEMPL(f_1(i_{a_1}), ..., f_m(i_{a_m}))
Each exhibited element $A(i_1, \ldots, i_n)$ is aligned to the template element $TEMPL(f_1(i_{a_1}), \ldots, f_m(i_{a_m}))$.
The template is eventually distributed on a set of processors.
The compiler guarantees that the array elements aligned to
the same element of the template will be mapped to the same processor.
The alignment function $f_j$ is required to be a linear function $f_j(i_{a_j}) = s_j\, i_{a_j} + o_j$ or a constant $f_j = o_j$.
The parameters $a_j$, $s_j$, and $o_j$ correspond to the three components of the alignment function: axis, stride, and offset.
Misalignment in the axis or stride components causes
irregular communication, and misalignment in the offset
component causes nearest-neighbor communication [46].
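To make the axis/stride/offset decomposition concrete, the following Python sketch describes each template dimension by its axis, stride, and offset, maps an array index to its template element, and applies the communication rule stated above. It is a minimal illustration under stated assumptions, not the compiler's code: the names `AxisAlign`, `template_index`, and `classify_misalignment` are hypothetical, axes are written 0-based for Python convenience, and the three-way classification ignores details such as how large an offset difference is.

```python
from typing import NamedTuple, Sequence, Tuple


class AxisAlign(NamedTuple):
    """Alignment of one template dimension: TEMPL index = stride * i[axis] + offset."""
    axis: int    # which array dimension feeds this template dimension (0-based here)
    stride: int  # stride component
    offset: int  # offset component


def template_index(alignment: Sequence[AxisAlign],
                   array_index: Tuple[int, ...]) -> Tuple[int, ...]:
    """Map a (1-based) array index tuple to the template element it is aligned with."""
    return tuple(d.stride * array_index[d.axis] + d.offset for d in alignment)


def classify_misalignment(align_a: Sequence[AxisAlign],
                          align_b: Sequence[AxisAlign]) -> str:
    """Rough communication class implied by the relative alignment of two arrays."""
    result = "none"
    for da, db in zip(align_a, align_b):
        if da.axis != db.axis or da.stride != db.stride:
            return "irregular"            # axis or stride misalignment
        if da.offset != db.offset:
            result = "nearest-neighbor"   # offset-only misalignment
    return result


# Hypothetical example: ALIGN A(I,J) WITH TEMPL(J, 2*I+1)
A = [AxisAlign(1, 1, 0), AxisAlign(0, 2, 1)]
# ALIGN B(I,J) WITH TEMPL(J+1, 2*I+1): differs from A only in an offset
B = [AxisAlign(1, 1, 1), AxisAlign(0, 2, 1)]
# ALIGN C(I,J) WITH TEMPL(I, 2*J+1): axes swapped relative to A
C = [AxisAlign(0, 1, 0), AxisAlign(1, 2, 1)]

print(template_index(A, (3, 5)))      # (5, 7)
print(classify_misalignment(A, B))    # nearest-neighbor
print(classify_misalignment(A, C))    # irregular
```

Keeping the three components separate is what lets a compiler compare two alignments field by field instead of comparing arbitrary index expressions.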
Algorithm 1 gives the steps used by Fortran 90D/HPF to process ALIGN directives. Algorithm 1 takes a Fortran 90D/HPF syntax tree with arbitrary alignment functions and transforms them into perfect alignment functions; that is, it transforms array indices from the array index domain to the template index domain. The following example illustrates each step and the transformations performed by Algorithm 1.
Consider the Fortran 90D/HPF code fragment shown in Figure .
There are three arrays ODD(N/2), EVEN(N/2) and NUM(N).
Elements of the array ODD are aligned with odd elements of TEMPL.
Similarly, elements of the array EVEN are aligned
with the even elements of TEMPL.
NUM is aligned identically with TEMPL, which is called perfect alignment.
Hence, ODD and EVEN are aligned
with odd and even indices of NUM respectively, because they are aligned to the
same template.
Step 1. Extend aligned arrays to match template size.
Note that the array size must be equal to or smaller than the template size in the distributed dimension(s).
If an array is smaller than the template in a distributed dimension, the compiler extends the array to match the template size.
For example, the ODD and EVEN arrays, each of size $N/2$, are extended to size $N$ to match the size of the template TEMPL.
This is a limitation of our compiler.
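One way to picture Step 1 is to give each one-dimensional array a template-sized buffer and store every element at the template position its alignment function selects. The sketch below assumes $f(i) = \text{stride}\cdot i + \text{offset}$ and a small hypothetical size $N = 8$; the helper name `extend_to_template` and the fill value are illustrative only, since the compiler's actual storage scheme is not described here.

```python
def extend_to_template(values, stride, offset, template_size, fill=0):
    """Place a 1-D array into a template-sized buffer at its aligned positions.

    values[k - 1] holds the element with original 1-based index k; it is
    stored at template index stride*k + offset (also 1-based).  Template
    positions that no array element is aligned with keep the fill value.
    """
    if len(values) > template_size:
        raise ValueError("array must not be larger than the template")
    extended = [fill] * template_size
    for k, value in enumerate(values, start=1):
        t = stride * k + offset
        if not 1 <= t <= template_size:
            raise ValueError(f"element {k} is aligned outside the template")
        extended[t - 1] = value
    return extended


N = 8
odd = [10, 30, 50, 70]     # ODD(1:N/2), hypothetical values
even = [20, 40, 60, 80]    # EVEN(1:N/2), hypothetical values
# f_ODD(i) = 2i - 1
print(extend_to_template(odd, 2, -1, N))   # [10, 0, 30, 0, 50, 0, 70, 0]
# f_EVEN(i) = 2i
print(extend_to_template(even, 2, 0, N))   # [0, 20, 0, 40, 0, 60, 0, 80]
```

The unused slots are the memory cost of the limitation mentioned above: each extended array occupies the full template size.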
Step 2. Apply alignment functions to the aligned arrays.
In this step, the indices of every array occurrence in every statement of the input program are transformed into the template index domain using the array's alignment function $f(i)$.
Arrays ODD, EVEN and NUM are associated with the functions $f_{ODD}(i) = 2i - 1$, $f_{EVEN}(i) = 2i$ and $f_{NUM}(i) = i$, respectively.
Figure
illustrates this transformation on the array ODD.
For example, the first forall assignment statement in Figure
:
NUM(I)=ODD((I+1)/2)
is transformed into
NUM(I)=ODD(2*((I+1)/2)-1) (1)
by applying $f_{NUM}(i) = i$ (the identity function) and $f_{ODD}(i) = 2i - 1$ to the lhs and rhs, respectively.
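Step 2 can be sketched as a purely textual rewrite: each subscript is wrapped in its array's alignment function, and no simplification happens yet. The table `ALIGNMENT` and the function `to_template_domain` are hypothetical names; the sketch only handles one-dimensional references and affine functions $f(i) = \text{stride}\cdot i + \text{offset}$, and it reproduces statement (1) from the example.

```python
# Hypothetical table of alignment functions f(i) = stride*i + offset,
# taken from the ODD/EVEN/NUM example.
ALIGNMENT = {
    "NUM": (1, 0),    # f_NUM(i)  = i      (perfect alignment)
    "ODD": (2, -1),   # f_ODD(i)  = 2i - 1
    "EVEN": (2, 0),   # f_EVEN(i) = 2i
}


def to_template_domain(array: str, subscript: str) -> str:
    """Textually apply the array's alignment function to one subscript.

    No simplification is done here; that is left to the canonicalization step.
    """
    stride, offset = ALIGNMENT[array]
    expr = subscript if stride == 1 else f"{stride}*({subscript})"
    if offset > 0:
        expr += f"+{offset}"
    elif offset < 0:
        expr += str(offset)
    return f"{array}({expr})"


lhs = to_template_domain("NUM", "I")
rhs = to_template_domain("ODD", "(I+1)/2")
print(f"{lhs}={rhs}")   # NUM(I)=ODD(2*((I+1)/2)-1)
```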
Step 3. Transform into canonical form.
In this step, the compiler simplifies the index expressions produced in Step 2 by performing symbolic manipulation and partial evaluation of constants.
For example, the statement (1) becomes:
NUM(I)=ODD(I).
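A minimal sketch of this canonicalization, assuming subscripts are affine in the loop index I: storing the coefficient and constant as exact fractions lets composition with $f_{ODD}$ fold $2\cdot((I+1)/2) - 1$ back to $I$. The class `Affine` is hypothetical. It reads `/` as exact rational division, as symbolic simplification does; under Fortran integer division, `2*((I+1)/2)-1` equals `I` only for odd I, so this kind of rewrite is valid when the FORALL ranges over odd values of I.

```python
from fractions import Fraction


class Affine:
    """Index expression coef*I + const with exact rational coefficients."""

    def __init__(self, coef, const):
        self.coef = Fraction(coef)
        self.const = Fraction(const)

    def apply(self, stride, offset):
        """Compose with f(x) = stride*x + offset and fold the constants."""
        return Affine(stride * self.coef, stride * self.const + offset)

    def __eq__(self, other):
        if not isinstance(other, Affine):
            return NotImplemented
        return (self.coef, self.const) == (other.coef, other.const)

    def __str__(self):
        terms = []
        if self.coef != 0:
            terms.append("I" if self.coef == 1 else f"{self.coef}*I")
        if self.const != 0 or not terms:
            sign = "+" if terms and self.const > 0 else ""
            terms.append(f"{sign}{self.const}")
        return "".join(terms)


# (I+1)/2 read as the rational expression I/2 + 1/2
subscript = Affine(Fraction(1, 2), Fraction(1, 2))
canonical = subscript.apply(2, -1)     # f_ODD(x) = 2x - 1
print(f"NUM(I)=ODD({canonical})")      # NUM(I)=ODD(I)
```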
The above simplification of indices helps the compiler to choose efficient
collective communication routines.
Our communication detection algorithm [50][37] is based on
symbolically comparing the lhs and rhs reference patterns
and determining if the pattern is associated with one of the predefined
collective communication routines.
In the above statement, the compiler compares the lhs and rhs indices and determines that no communication is required, because both array reference patterns are given by I and both arrays are aligned to the same template. However, if the rhs were ODD(I+2), the compiler would recognize the operation as a shift communication.
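The comparison described above can be sketched for one-dimensional references whose canonical template-domain subscripts have the form coef*I + const. This is a simplified illustration, not the detection algorithm of [50][37], which matches patterns against a set of predefined collective communication routines: here `detect_communication` (a hypothetical name) recognizes only the no-communication and shift cases and reports everything else as "general".

```python
from typing import Optional, Tuple

Pattern = Tuple[int, int]   # canonical template-domain subscript: coef*I + const


def detect_communication(lhs: Pattern, rhs: Pattern) -> Tuple[str, Optional[int]]:
    """Classify a 1-D reference pair whose arrays are aligned to the same template.

    Returns ("none", 0) when the patterns coincide, ("shift", d) when the rhs
    pattern is the lhs pattern displaced by a constant d, and ("general", None)
    otherwise.
    """
    (l_coef, l_const), (r_coef, r_const) = lhs, rhs
    if lhs == rhs:
        return ("none", 0)
    if l_coef == r_coef and l_coef != 0:
        return ("shift", r_const - l_const)
    return ("general", None)


print(detect_communication((1, 0), (1, 0)))   # NUM(I)=ODD(I):   ('none', 0)
print(detect_communication((1, 0), (1, 2)))   # NUM(I)=ODD(I+2): ('shift', 2)
print(detect_communication((1, 0), (2, 0)))   # rhs ODD(2*I):    ('general', None)
```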
Step 4. Compute $f^{-1}$.
For each array, we compute the inverse alignment function $f^{-1}$ corresponding to its alignment function $f$.
$f^{-1}$ is stored in a Distributed Array Descriptor (DAD) [51].
This function is needed whenever a computation must be performed using the original index of an array.
For example,
the last statement in Figure
calls the intrinsic function MAXLOC to find the location of the maximum
element in the array ODD. This function must be evaluated using the original array indices.
The inverse function for array ODD is $f_{ODD}^{-1}(i) = (i+1)/2$.
MAXLOC returns the location of the maximum value in the original array index domain by applying the $f_{ODD}^{-1}$ function.
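The sketch below shows how an inverse alignment function lets MAXLOC search in the template index domain yet report an original array index. It assumes a one-dimensional array stored at template positions $f(1), \ldots, f(\text{size})$ with $f(i) = \text{stride}\cdot i + \text{offset}$; `inverse_alignment` and `maxloc_original` are hypothetical helpers, and the data are made up for N = 8. Like Fortran's MAXLOC, it returns the first maximum.

```python
from fractions import Fraction


def inverse_alignment(stride, offset):
    """Return f^{-1}(t) = (t - offset) / stride for f(i) = stride*i + offset."""
    def f_inv(t):
        i = Fraction(t - offset, stride)
        if i.denominator != 1:
            raise ValueError(f"no array element is aligned with template index {t}")
        return int(i)
    return f_inv


def maxloc_original(extended, stride, offset, size):
    """MAXLOC of a 1-D array stored at template positions f(1), ..., f(size).

    The search runs in the template index domain; the winning template index
    is mapped back to the original 1-based array index with f^{-1}.
    """
    positions = [stride * k + offset for k in range(1, size + 1)]
    best_t = max(positions, key=lambda t: extended[t - 1])  # first maximum wins
    return inverse_alignment(stride, offset)(best_t)


# ODD(1:4) = [10, 70, 30, 50] stored in template-domain form for N = 8
extended_odd = [10, 0, 70, 0, 30, 0, 50, 0]
print(maxloc_original(extended_odd, 2, -1, 4))   # 2
```

For ODD, `inverse_alignment(2, -1)` computes $(t+1)/2$, which matches $f_{ODD}^{-1}$.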
Figure shows the compiler-generated Fortran 77+MP code
for the Fortran 90D/HPF code given in Figure
.
We emphasize that the transformation shown in Figure
from the array index domain to the template index domain
has two advantages.
1-) This allows the compiler to easily detect regular collective communication patterns among arrays aligned to the same template.
2-) The compiler keeps data distribution functions only for the template and not for all the arrays aligned to the template.
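The second advantage can be illustrated by computing ownership from the template alone: the owner of an array element is found by composing the array's alignment function with the template's distribution. This sketch assumes a hypothetical BLOCK distribution of a 16-element TEMPL over 4 processors (the distribution directive is not part of this section), and `block_owner` and `owner` are illustrative names. It also checks the guarantee stated earlier that elements aligned to the same template element land on the same processor.

```python
def block_owner(t, template_size, nprocs):
    """0-based processor that owns 1-based template index t under BLOCK."""
    block = -(-template_size // nprocs)   # ceiling(template_size / nprocs)
    return (t - 1) // block


# Hypothetical alignment table f(i) = stride*i + offset (ODD/EVEN/NUM example).
ALIGNMENT = {"NUM": (1, 0), "ODD": (2, -1), "EVEN": (2, 0)}


def owner(array, i, template_size, nprocs):
    """Owner of array(i): apply the array's alignment, then the template's distribution."""
    stride, offset = ALIGNMENT[array]
    return block_owner(stride * i + offset, template_size, nprocs)


N, P = 16, 4
for k in range(1, N // 2 + 1):
    # ODD(k) and NUM(2k-1) share template element 2k-1, so they share a processor.
    assert owner("ODD", k, N, P) == owner("NUM", 2 * k - 1, N, P)
    assert owner("EVEN", k, N, P) == owner("NUM", 2 * k, N, P)
print([owner("ODD", k, N, P) for k in range(1, N // 2 + 1)])   # [0, 0, 1, 1, 2, 2, 3, 3]
```

Only one distribution function (`block_owner`) exists here; each array contributes just its alignment entry.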