The following examples illustrate how the data distribution functions can be used for various constructs.
For these examples, the array A has the following alignment and distribution:
C$ DECOMPOSITION TEMPL(N,M)
C$ ALIGN A(I,J) WITH TEMPL(I,J)
C$ DISTRIBUTE TEMPL(CYCLIC,BLOCK)
and TEMPL is distributed on a two-dimensional PxQ processor grid.
Example 1 (Masking) Consider the statement:
A(5,8)=99.0
The owner processor of the array element executes
the statement. Since the compiler generates SPMD style code, it masks the rest of the processors:
if (mod(5,P) .eq. my_id(1) .and. 8*Q/M .eq. my_id(2)) then
A(5/P, 8-my_id(2)*M/Q) = 99.0
end if
where my_id(1) and my_id(2) describe the processor's position in the two-dimensional logical grid. In this case, the compiler uses the global-to-processor and global-to-local functions for the cyclic and block distributions. Because the logical processors are arranged in a grid topology, they are masked according to their coordinate id numbers.
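The masking in Example 1 can be sketched as a small Python simulation in which one loop plays every processor of the grid in turn. This is only an illustration under stated assumptions, not the compiler's generated code: indices are 0-based (the source uses Fortran indexing), Q is assumed to divide M, and all function names are hypothetical.

```python
# Sketch of the distribution functions behind the mask in Example 1.
# Assumptions (not from the source): 0-based indices, Q divides M,
# and every helper name below is hypothetical.

def cyclic_owner(i, P):
    """Global-to-processor function for a CYCLIC dimension."""
    return i % P

def cyclic_local(i, P):
    """Global-to-local function for a CYCLIC dimension."""
    return i // P

def block_owner(j, M, Q):
    """Global-to-processor function for a BLOCK dimension."""
    return j * Q // M

def block_local(j, M, Q):
    """Global-to-local function for a BLOCK dimension."""
    return j - block_owner(j, M, Q) * (M // Q)

def assign_element(i, j, value, N, M, P, Q):
    """Simulate the SPMD code for A(i,j)=value on every grid position."""
    rows = -(-N // P)  # ceil(N/P): local rows held by each processor
    local = {(p, q): [[0.0] * (M // Q) for _ in range(rows)]
             for p in range(P) for q in range(Q)}
    executed = []
    for my_id in local:  # each iteration plays one processor
        # The mask: only the owner of A(i,j) performs the assignment.
        if cyclic_owner(i, P) == my_id[0] and block_owner(j, M, Q) == my_id[1]:
            local[my_id][cyclic_local(i, P)][block_local(j, M, Q)] = value
            executed.append(my_id)
    return local, executed

local, executed = assign_element(5, 8, 99.0, N=8, M=12, P=2, Q=3)
```

With these illustrative sizes, exactly one grid position passes the mask and writes a single element of its local block.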
Example 2 (Grouping) Consider the statement:
A(:,8)=99.0
Only the group of processors owning column 8 of array A needs to
execute this statement. The rest of the grid must be masked:
do i=my_id(1),N,P
if(8*Q/M .eq. my_id(2)) A(i/P, 8-my_id(2)*M/Q) = 99.0
end do
Note that the iterations (indexed by i above) are distributed cyclically,
following the owner-computes rule.
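A possible simulation of the grouping in Example 2 is sketched below. It assumes 0-based indices and that Q divides M; the names are hypothetical. Because the mask does not depend on the row index, the sketch tests it once per processor rather than in every iteration as the listing in the source does.

```python
# Sketch of Example 2: only one grid column executes A(:,col)=value.
# Assumptions (not from the source): 0-based indices, Q divides M,
# hypothetical function and variable names.

def column_assign(col, value, N, M, P, Q):
    b = M // Q             # block width of the BLOCK dimension
    rows = -(-N // P)      # ceil(N/P): local rows per processor
    local = {(p, q): [[0.0] * b for _ in range(rows)]
             for p in range(P) for q in range(Q)}
    active = set()
    for (p, q), block in local.items():   # each iteration plays one processor
        if col * Q // M != q:             # mask processors that do not own the column
            continue
        for i in range(p, N, P):          # cyclic iterations: i = p, p+P, ...
            block[i // P][col - q * b] = value
            active.add((p, q))
    return local, active

local, active = column_assign(8, 99.0, N=8, M=12, P=2, Q=3)
```

Here `active` collects the grid positions that performed at least one assignment, which makes the owning group of processors visible.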
Example 3 (Forall) Consider the statement:
forall(i=1:N,j=1:M) A(i,j)=j
In the above computation, all elements of each column of array A are assigned the corresponding column number (in the global index domain). The compiler generates:
do i=my_id(1),N,P
do j=1,M/Q
A(i/P,j)=j+my_id(2)*M/Q
end do
end do
The compiler distributes the iterations i and j in cyclic and block fashion, respectively, since array A is distributed in that fashion. Iteration index j is localized. The compiler transforms j back to a global index using local-to-global index conversion in the rhs expression.
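The local-to-global conversion for the forall can be checked with a short simulation. The sketch assumes 0-based indices and that Q divides M; the helper `to_global` is a hypothetical function used only to reassemble the distributed array for inspection.

```python
# Sketch of Example 3: distributed execution of forall(i,j) A(i,j)=j.
# Assumptions (not from the source): 0-based indices, Q divides M,
# hypothetical function names.

def forall_column_numbers(N, M, P, Q):
    b = M // Q             # block width of the BLOCK dimension
    rows = -(-N // P)      # ceil(N/P): local rows per processor
    local = {(p, q): [[None] * b for _ in range(rows)]
             for p in range(P) for q in range(Q)}
    for (p, q), block in local.items():    # each iteration plays one processor
        for i in range(p, N, P):           # cyclic: i is a global row index
            for j in range(b):             # block: j is a local column index
                block[i // P][j] = j + q * b   # local-to-global conversion of j
    return local

def to_global(local, N, M, P, Q):
    """Reassemble the global array from the per-processor blocks."""
    b = M // Q
    return [[local[(i % P, j // b)][i // P][j % b] for j in range(M)]
            for i in range(N)]

A = to_global(forall_column_numbers(8, 12, 2, 3), 8, 12, 2, 3)
```

Each row of the reassembled array should hold the global column numbers, which is what the original forall specifies.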
Example 4 (Broadcast) Consider the statement:
x=A(5,8)
where x is a scalar variable (scalars are replicated on all processors).
The above statement causes a broadcast communication. The source processor
of the broadcast is found using a global-to-processor function similar to
that in Example 1.
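As an illustration, the source of the broadcast and the resulting replicated scalar might be simulated as follows. The sketch assumes 0-based indices, that P divides N and Q divides M, and it fills A with the made-up test values 100*i+j so the result is easy to recognize; all names are hypothetical.

```python
# Sketch of Example 4: x = A(i,j) as a broadcast from the owner.
# Assumptions (not from the source): 0-based indices, P divides N,
# Q divides M, hypothetical names, synthetic test data.

def broadcast_source(i, j, M, P, Q):
    """Global-to-processor function: cyclic rows, block columns."""
    return (i % P, j * Q // M)

def broadcast_element(local, i, j, M, P, Q):
    src = broadcast_source(i, j, M, P, Q)
    # The source reads its local copy using the global-to-local functions.
    value = local[src][i // P][j - src[1] * (M // Q)]
    # After the broadcast every processor holds the same scalar.
    return {my_id: value for my_id in local}

N, M, P, Q = 8, 12, 2, 3
b = M // Q
# Local block of processor (p, q): entry (r, c) is global element
# (r*P + p, c + q*b), stored here as 100*i + j.
local = {(p, q): [[100 * (r * P + p) + (c + q * b) for c in range(b)]
                  for r in range(N // P)]
         for p in range(P) for q in range(Q)}
x = broadcast_element(local, 5, 8, M, P, Q)
```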
Example 5 (Gather) Consider the statement:
B=A(U,V)
where U and V are one-dimensional replicated arrays. B is a two-dimensional array distributed in the same way as array A.
This vector-valued assignment causes an unstructured communication (also called gather[28] in this case). The owner processors of array B may need some values of array A, depending on the contents of arrays U and V at run-time. The compiler makes each owner processor of array B calculate which processor holds the non-local part of array A, using the global-to-processor function. The compiler also generates code that computes the local index of array A on each source processor, using the global-to-local index conversion function.
After each processor has calculated the local list and the processor list,
the compiler generates a call to the gather collective communication routine.
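The construction of the processor list and the local list can be sketched as below. This is a simulation under assumptions, not the run-time library's interface: indices are 0-based, P divides N and Q divides M, the names `gather_lists` and `simulated_gather` are hypothetical, and A is filled with the synthetic values 10*i+j.

```python
# Sketch of Example 5: lists computed by each owner of B for B=A(U,V).
# Assumptions (not from the source): 0-based indices, P divides N,
# Q divides M, hypothetical names, synthetic test data.

def gather_lists(U, V, my_id, N, M, P, Q):
    """Processor list and local-index list for this processor's part of B."""
    b = M // Q
    p, q = my_id
    proc_list, local_list = [], []
    for i in range(p, N, P):                 # rows of B owned here (cyclic)
        for j in range(q * b, (q + 1) * b):  # columns of B owned here (block)
            gi, gj = U[i], V[j]              # global index of the needed A element
            owner = (gi % P, gj * Q // M)    # global-to-processor
            proc_list.append(owner)
            local_list.append((gi // P, gj - owner[1] * b))  # global-to-local
    return proc_list, local_list

N, M, P, Q = 4, 6, 2, 3
b = M // Q
U = [3, 2, 1, 0]
V = [1, 2, 3, 4, 5, 0]
# Local block of processor (p, q): entry (r, c) is A(r*P + p, c + q*b) = 10*i + j.
local_A = {(p, q): [[10 * (r * P + p) + (c + q * b) for c in range(b)]
                    for r in range(N // P)]
           for p in range(P) for q in range(Q)}

def simulated_gather(my_id):
    """Fetch the listed elements, standing in for the collective call."""
    procs, locs = gather_lists(U, V, my_id, N, M, P, Q)
    return [local_A[src][r][c] for src, (r, c) in zip(procs, locs)]
```

In a real SPMD program the final lookup would be a communication step; here it is a direct read from the other processor's block.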
Example 6 (Scatter) Consider the statement:
A(U,V)=B
The above statement causes scatter communication.
Again, the compiler generates code such that each owner processor of array B uses the data distribution functions to find the destination of its local part of array B.
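The destination computation for the scatter can be simulated in the same spirit. The sketch assumes 0-based indices, that P divides N and Q divides M, and that U and V are permutations so that no two elements of B target the same element of A; the names and the test values 10*i+j are hypothetical.

```python
# Sketch of Example 6: destinations computed by each owner of B for A(U,V)=B.
# Assumptions (not from the source): 0-based indices, P divides N,
# Q divides M, U and V are permutations, hypothetical names.

def scatter_destinations(U, V, my_id, N, M, P, Q):
    """For each local element of B: (destination processor,
    local index in A there, local index in B here)."""
    b = M // Q
    p, q = my_id
    dests = []
    for i in range(p, N, P):                 # rows of B owned here (cyclic)
        for j in range(q * b, (q + 1) * b):  # columns of B owned here (block)
            gi, gj = U[i], V[j]              # global index of the target A element
            owner = (gi % P, gj * Q // M)    # global-to-processor
            dests.append((owner, (gi // P, gj - owner[1] * b),
                          (i // P, j - q * b)))
    return dests

N, M, P, Q = 4, 6, 2, 3
b = M // Q
U = [3, 2, 1, 0]
V = [1, 2, 3, 4, 5, 0]
grid = [(p, q) for p in range(P) for q in range(Q)]
# Local block of B on (p, q): entry (r, c) is B(r*P + p, c + q*b) = 10*i + j.
local_B = {(p, q): [[10 * (r * P + p) + (c + q * b) for c in range(b)]
                    for r in range(N // P)] for (p, q) in grid}
local_A = {pq: [[None] * b for _ in range(N // P)] for pq in grid}
for sender in grid:   # each iteration plays one sending processor
    for dest, (r, c), (br, bc) in scatter_destinations(U, V, sender, N, M, P, Q):
        local_A[dest][r][c] = local_B[sender][br][bc]   # stands in for the send
```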