In order to perform a collective communication on array elements, the communication primitive needs the following information
There are two ways of determining the above information: (1) use a preprocessing loop to compute the values, or (2) rely on the type of communication, for which the information may be implicitly available and therefore requires no preprocessing. We classify our communication primitives into unstructured and structured communication.
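As a rough illustration of the first option, the sketch below simulates a preprocessing loop for an assignment of the form A(i) = B(f(i)). Everything in it is an assumption made for illustration: the BLOCK distribution, the 0-based indexing, the helper names `owner_and_offset` and `build_receive_lists`, and the choice to record the owning processor and local offset of each off-processor operand. It is not the paper's primitive interface.

```python
def owner_and_offset(g, n, nprocs):
    """Map global index g (0-based) of an array of size n to (processor, local offset).

    Assumes a BLOCK distribution with block size ceil(n / nprocs).
    """
    block = -(-n // nprocs)  # ceiling division
    return g // block, g % block


def build_receive_lists(me, n, nprocs, index_fn):
    """Preprocessing loop run by processor `me` for A(i) = B(index_fn(i)).

    Returns {source processor: [local offsets on that processor]} for every
    operand of B that `me` does not own.
    """
    block = -(-n // nprocs)
    lo, hi = me * block, min(n, (me + 1) * block)
    recv = {}
    for i in range(lo, hi):  # iterations owned by `me`
        src, off = owner_and_offset(index_fn(i), n, nprocs)
        if src != me:
            recv.setdefault(src, []).append(off)
    return recv


# Hypothetical usage: 8 elements on 4 processors, A(i) = B((i + 3) mod 8).
lists_for_p0 = build_receive_lists(0, 8, 4, lambda i: (i + 3) % 8)
```

For a structured pattern such as a shift, the same information follows directly from the processor's grid position, so a loop of this kind is unnecessary.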
Our structured communication primitives are based on a logical grid configuration of the processors. Hence, they use grid-based communications such as shifts and broadcasts along dimensions. The following summarizes some of the structured communication primitives.
The other structured communications in data parallel languages are tree-based communications, which perform reduction operations on specified dimensions of arrays. For example, in Fortran 90D/HPF, the reduction operations on arrays are included as intrinsic functions, which can be efficiently hand-coded and supplied as part of the run-time library for the compiler. Therefore, tree-based communication patterns are not considered in this chapter.
A further advantage of these types of communication primitives is
that they can be combined to form composite communication patterns
for better performance.
(This will be elaborated on in section .) Further,
some structured communication calls can be eliminated using appropriate
alignment directives.
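To make the grid-based view concrete, the following sketch computes the partners of a shift along one dimension of a logical processor grid. The row-major rank numbering, the function names, and the `circular` flag are assumptions chosen for illustration and are not taken from the run-time library described here. The point is only that source and destination follow from the grid coordinates alone.

```python
def grid_coords(rank, shape):
    """Row-major coordinates of `rank` in a logical processor grid of `shape`."""
    coords = []
    for extent in reversed(shape):
        coords.append(rank % extent)
        rank //= extent
    return tuple(reversed(coords))


def grid_rank(coords, shape):
    """Inverse of grid_coords: row-major rank of the given coordinates."""
    rank = 0
    for c, extent in zip(coords, shape):
        rank = rank * extent + c
    return rank


def shift_partners(rank, shape, dim, offset, circular=True):
    """Return (destination, source) ranks for a shift by `offset` along `dim`.

    A partner is None when the shift is not circular and falls off the grid.
    """
    coords = list(grid_coords(rank, shape))

    def neighbor(delta):
        c = coords[dim] + delta
        if circular:
            c %= shape[dim]
        elif not 0 <= c < shape[dim]:
            return None
        return grid_rank(coords[:dim] + [c] + coords[dim + 1:], shape)

    return neighbor(offset), neighbor(-offset)


# Hypothetical usage: processor 4 on a 2 x 3 grid, shifting by +1 along dimension 1.
dest, src = shift_partners(4, (2, 3), dim=1, offset=1)
```

A shift in two dimensions can be expressed as two such calls, one per dimension, which is one simple way to picture how primitives compose.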
Example 1 (Alignment) Consider the following statement:
!F90D$ ALIGN A,B with T
A(1:N-1,1:N-1) = B(2:N,2:N)
The above code results in an overlap_shift of array B in two dimensions.
However, note that this shift communication may be avoided by aligning
arrays A and B as shown below.
!F90D$ ALIGN A(I,J) with T(I,J)
!F90D$ ALIGN B(I,J) with T(I-1,J-1)
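The effect of the two alignments can be checked with a small simulation. The sketch below assumes a (BLOCK, BLOCK) distribution of the template T over a logical processor grid, which the example itself does not specify, and uses hypothetical helper names. It counts how many elements of the assignment A(1:N-1,1:N-1) = B(2:N,2:N) read an operand of B owned by a different processor.

```python
def owner(t_i, t_j, n, grid):
    """Grid coordinates of the processor owning template cell (t_i, t_j).

    Assumes 1-based template indices and a (BLOCK, BLOCK) distribution.
    """
    p, q = grid
    block_i = -(-n // p)  # ceiling division
    block_j = -(-n // q)
    return (t_i - 1) // block_i, (t_j - 1) // block_j


def remote_references(n, grid, b_offset):
    """Count assignments A(i,j) = B(i+1,j+1) whose operand is off-processor.

    A(I,J) is aligned with T(I,J); b_offset = (di, dj) means that
    B(I,J) is aligned with T(I+di, J+dj).
    """
    di, dj = b_offset
    count = 0
    for i in range(1, n):
        for j in range(1, n):
            a_owner = owner(i, j, n, grid)
            b_owner = owner(i + 1 + di, j + 1 + dj, n, grid)
            count += int(a_owner != b_owner)
    return count


# Hypothetical usage: N = 8 on a 2 x 2 grid.
identity = remote_references(8, (2, 2), (0, 0))    # ALIGN A,B with T
shifted = remote_references(8, (2, 2), (-1, -1))   # ALIGN B(I,J) with T(I-1,J-1)
```

With the shifted alignment, B(i+1,j+1) and A(i,j) map to the same template cell T(i,j), so the count is zero regardless of how T is distributed. With the identity alignment, only elements on block boundaries are remote.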
We have implemented two sets of unstructured communication primitives. The first supports cases where the communicating processors can determine the send and receive lists from local information alone, and hence require only preprocessing that involves local computations [21]. The second supports cases where determining the send and receive lists itself requires communication among the processors during preprocessing [27]. The primitives are as follows.