
Usage of the data distribution functions

The following examples illustrate how the data distribution functions can be used for various constructs. In these examples, array A has the following alignment and distribution:


         C$ DECOMPOSITION TEMPL(N,M) 
         C$ ALIGN A(I,J) WITH TEMPL(I,J)  
         C$ DISTRIBUTE TEMPL(CYCLIC,BLOCK)

and TEMPL is distributed on a two-dimensional PxQ processor grid.

Example 1 (Masking) Consider the statement:


         A(5,8)=99.0

The owner processor of the array element executes the statement. Since the compiler generates SPMD-style code, it masks out the rest of the processors:


         if (mod(5,P) .eq. my_id(1) .and. 8*Q/M .eq. my_id(2)) then
              A(5/P, 8-my_id(2)*M/Q) = 99.0
         end if

Here my_id(1) and my_id(2) describe the processor's position in the two-dimensional logical grid. In this case, the compiler uses the global-to-processor and global-to-local functions for the cyclic and block distributions. The processors are masked according to their coordinate id numbers, since the logical processors are arranged in a grid topology.
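The index conversions used in this mask can be sketched as small functions. The Python below is an illustrative simulation only, not the compiler's output: the function names are hypothetical, global indices are assumed to be 0-based, and Q is assumed to divide M evenly.

```python
# Hypothetical helpers; 0-based global indices, Q assumed to divide M.

def cyclic_owner(i, P):
    """Global-to-processor function for a CYCLIC dimension."""
    return i % P

def cyclic_local(i, P):
    """Global-to-local function for a CYCLIC dimension."""
    return i // P

def block_owner(j, M, Q):
    """Global-to-processor function for a BLOCK dimension of extent M."""
    return j * Q // M

def block_local(j, M, Q):
    """Global-to-local function for a BLOCK dimension of extent M."""
    return j - block_owner(j, M, Q) * (M // Q)

def owners_of(i, j, M, P, Q):
    """Return the grid coordinates that pass the mask for element (i, j)."""
    return [(p, q) for p in range(P) for q in range(Q)
            if cyclic_owner(i, P) == p and block_owner(j, M, Q) == q]
```

Looping over every grid coordinate in owners_of mimics each processor evaluating the mask on its own; exactly one coordinate pair should pass for any single element.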

Example 2 (Grouping) Consider the statement:


         A(:,8)=99.0

Only the group of processors owning column 8 of array A needs to execute this statement. The rest of the grid must be masked.


         do i=my_id(1),N,P
            if(8*Q/M .eq. my_id(2))  A(i/P, 8-my_id(2)*M/Q) = 99.0
         end do

Note that the iterations (indexed by i above) are distributed cyclically, following the owner-computes rule.
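To see which processors actually perform work under this scheme, the loop can be simulated sequentially over the whole grid. This is a rough Python sketch under stated assumptions (0-based global indices, Q dividing M); the function name column_assign is hypothetical.

```python
# Hypothetical simulation; 0-based indices, Q assumed to divide M.

def column_assign(col, value, N, M, P, Q):
    """Simulate A(:, col) = value on a P x Q grid.

    Returns, for each processor (p, q), a dict mapping the local
    indices it wrote to the assigned value.
    """
    block = M // Q
    local = {}
    for p in range(P):
        for q in range(Q):
            a = {}
            for i in range(p, N, P):        # rows are distributed cyclically
                if col * Q // M == q:       # mask: block owner of the column
                    a[(i // P, col - q * block)] = value
            local[(p, q)] = a
    return local
```

Only the processors in one grid column end up with a non-empty dict, and together they cover all N rows.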

Example 3 (Forall) Consider the statement:


         forall(i=1:N,j=1:M)  A(i,j)=j

In the above computation, every element of each column of array A is assigned its column number (in the global index domain).


         do i=my_id(1),N,P
         do j=1,M/Q
           A(i/P,j)=j+my_id(2)*M/Q
         end do
         end do

The compiler distributes iterations i and j in cyclic and block fashion, respectively, since array A is distributed in that fashion. Iteration index j is localized. The compiler transforms it back to a global index using the local-to-global index conversion in the right-hand-side expression.
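One way to check the local-to-global conversion is to run the localized loops for every processor and reassemble the global array. The Python sketch below does this sequentially; it assumes 0-based global indices, P dividing N, and Q dividing M, and both function names are hypothetical.

```python
# Hypothetical simulation; 0-based indices, P divides N and Q divides M.

def forall_column_number(N, M, P, Q):
    """Run the localized loops on every processor; return the local arrays."""
    block = M // Q
    local = {}
    for p in range(P):
        for q in range(Q):
            a = [[None] * block for _ in range(N // P)]
            for i in range(p, N, P):              # cyclic rows owned by p
                for j in range(block):            # localized column index
                    a[i // P][j] = j + q * block  # local-to-global on the rhs
            local[(p, q)] = a
    return local

def global_view(local, N, M, P, Q):
    """Reassemble the global array from the local pieces."""
    block = M // Q
    return [[local[(i % P, j // block)][i // P][j % block]
             for j in range(M)]
            for i in range(N)]
```

If the offset in the right-hand side used the wrong divisor, the reassembled rows would no longer equal the sequence of global column numbers.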

Example 4 (Broadcast) Consider the statement:

 
         x=A(5,8)

where x is a scalar variable (scalars are replicated on all processors). The above statement causes a broadcast communication. The source processor of the broadcast is found using a global-to-processor function similar to that in Example 1.

Example 5 (Gather) Consider the statement:


         B=A(U,V)

where U and V are one-dimensional replicated arrays. B is a two-dimensional array and is distributed in the same way as array A. This vector-valued assignment causes an unstructured communication (also called gather [28] in this case). The owner processors of array B may need some values of array A, depending on the contents of arrays U and V at run time. The compiler makes each owner processor of array B calculate which processor has the non-local part of array A using the global-to-processor function. The compiler also generates code that computes the local index of array A, using the global-to-local index conversion function, for each source processor. After making each processor calculate the local list and the processor list, the compiler generates a call to the gather collective communication routine.
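The construction of the processor list and the local list can be sketched as follows. This Python is a simplified illustration, not the generated code: gather_lists is a hypothetical name, indices are 0-based, B is assumed to have shape len(U) by len(V), and Q is assumed to divide the column extent.

```python
# Hypothetical sketch; 0-based indices, Q assumed to divide len(V).

def gather_lists(U, V, P, Q, me):
    """Build the processor list and local list for processor `me`.

    For each element B(i, j) owned by `me`, record which processor
    holds A(U[i], V[j]) and at which local index it is stored there.
    """
    p, q = me
    N, M = len(U), len(V)
    block = M // Q
    proc_list, local_list = [], []
    for i in range(p, N, P):                         # cyclic rows of B owned by me
        for j in range(q * block, (q + 1) * block):  # block columns of B owned by me
            gi, gj = U[i], V[j]                      # global index of the needed A element
            proc_list.append((gi % P, gj // block))    # global-to-processor
            local_list.append((gi // P, gj % block))   # global-to-local
    return proc_list, local_list
```

The two lists are parallel: entry k of each describes the source of the k-th element of B that the processor owns, which is the information a gather routine would need.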

Example 6 (Scatter) Consider the statement:

 
         A(U,V)=B

The above statement causes scatter communication. Again, the compiler generates code such that each owner processor of array B uses the data distribution functions to find the destinations of its local elements of array B.
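A send schedule for this scatter might be built as in the following Python sketch, which groups a processor's local elements of B by destination. It is only an illustration under assumptions: scatter_schedule is a hypothetical name, indices are 0-based, B has shape len(U) by len(V), and Q divides the column extent.

```python
# Hypothetical sketch; 0-based indices, Q assumed to divide len(V).

def scatter_schedule(U, V, B_local, P, Q, me):
    """Group the elements of B owned by `me` by destination processor.

    Returns {destination: [(destination local index, value), ...]}.
    """
    p, q = me
    N, M = len(U), len(V)
    block = M // Q
    sends = {}
    for li, i in enumerate(range(p, N, P)):     # local row li is global row i
        for lj in range(block):                 # local column lj
            gi, gj = U[i], V[q * block + lj]    # global target index in A
            dest = (gi % P, gj // block)        # global-to-processor
            target = (gi // P, gj % block)      # global-to-local on the destination
            sends.setdefault(dest, []).append((target, B_local[li][lj]))
    return sends
```

Grouping by destination reflects one plausible design choice, a single message per destination processor; the source does not say how the generated code batches its sends.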





zbozkus@
Thu Jul 6 21:09:19 EDT 1995