All the examples discussed below use the following mapping directives.
CHPF$ PROCESSORS(P,Q)
CHPF$ DISTRIBUTE TEMPL(BLOCK,BLOCK)
CHPF$ ALIGN A(I,J) WITH TEMPL(I,J)
CHPF$ ALIGN B(I,J) WITH TEMPL(I,J)
Example 1 (transfer) Consider the statement:
FORALL(I=1:N) A(I,8)=B(I,3)
The first subscript of B is marked as no_communication
because A and B are aligned in the first dimension
and have identical indices.
The second dimension is marked as transfer.
1. call set_BOUND(lb,ub,st,1,N,1)
2. call set_DAD(B_DAD,.....)
3. call transfer(B, B_DAD, TMP, src=global_to_proc(8),
&       dest=global_to_proc(3))
4. DO I=lb,ub,st
5. A(I,global_to_local(8)) = TMP(I)
6. END DO
In the above code, the set_BOUND primitive (line 1) computes the local bounds for the computation assignment based on the iteration distribution (Section ). In line 2, the set_DAD primitive fills the Distributed Array Descriptor (DAD) associated with array B so that it can be passed to the transfer communication primitive at run time. The DAD carries sufficient information for the communication primitives to compute everything they need, including local bounds, distributions, and the global shape. Note that transfer performs one-to-one send-receive communication based on the logical processor grid. In this example, one column of grid processors communicates with another column of grid processors, as shown in Figure (a).
Example 2 (multicast) Consider the statement:
FORALL(I=1:N,J=1:M) A(I,J)=B(I,3)
The second subscript of B is marked as multicast
and the first as no_communication.
1. call set_BOUND(lb,ub,st,1,N,1)
2. call set_BOUND(lb1,ub1,st1,1,M,1)
3. call set_DAD(B_DAD,.....)
4. call multicast(B, B_DAD, TMP,
& source_proc=global_to_proc(3), dim=2)
5. DO I=lb,ub,st
6. DO J=lb1,ub1,st1
7. A(I,J) = TMP(I)
8. END DO
9. END DO
Line 4 performs a broadcast along dimension 2 of the logical processor grid from the processors owning the elements B(I,3), as shown in Figure (b).
Example 3 (multicast_shift) Consider the statement:
FORALL(I=1:N,J=1:M) A(I,J)=B(3,J+s)
The first subscript of array B is marked as multicast and
the second subscript is marked as temporary_shift.
The above communication can be implemented as two separate steps: a multicast along the first dimension of the logical grid followed by a temporary_shift along the second dimension. Alternatively, the two patterns can be composed into a single, more efficient primitive, multicast_shift.
call set_BOUND(lb,ub,st,1,N,1)      ! compute local lb, ub, and st
call set_BOUND(lb1,ub1,st1,1,M,1)   ! compute local lb1, ub1, and st1
call multicast_shift(B, B_DAD, TMP, source=global_to_proc(3),
&    shift=s, multicast_dim=1, shift_dim=2)
DO I=lb,ub,st
DO J=lb1,ub1,st1
A(I,J)=TMP(J)
END DO
END DO
Composing the two primitives eliminates the need for temporary storage and removes some of the intra-processor copying, message packing, and unpacking.