Array restriction

Library functions operating on distributed arrays often specify certain alignment relations between their array arguments. Two arrays are aligned if they have the same distribution group and the same ranges ^4.5. The Adlib member dotProduct, for example, takes two distributed array arguments. These arguments must be aligned.

Occasionally it happens that two arrays we want to pass as arguments to a library function are essentially aligned, but one is replicated over a particular process dimension and the other isn't. It may be intuitively obvious that all the data needed by the function is in the right place, but still we cannot call the function--the ranges may match, but the replicated array has a larger distribution group. By the definition given above the arrays are not identically aligned.
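The mismatch can be made concrete with a small model in plain Java. This is only a sketch, not HPJava and not the Adlib API: the class `AlignmentSketch`, the type `ArrayInfo`, and the method `aligned` are hypothetical names introduced for illustration, and a group is modelled simply as a set of process ids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AlignmentSketch {

    /** Hypothetical stand-in for a distributed array: just its group and its ranges. */
    public static final class ArrayInfo {
        final Set<Integer> group;   // ids of the processes that hold the array
        final List<String> ranges;  // one label per array dimension

        public ArrayInfo(Set<Integer> group, List<String> ranges) {
            this.group = group;
            this.ranges = ranges;
        }
    }

    /** The simple definition: same distribution group and same ranges. */
    public static boolean aligned(ArrayInfo u, ArrayInfo v) {
        return u.group.equals(v.group) && u.ranges.equals(v.ranges);
    }

    public static void main(String[] args) {
        // Assumed layout: a 2 x 2 process grid with ids 0..3; ids 0 and 1 form one slice of it.
        Set<Integer> wholeGrid = new HashSet<>(Arrays.asList(0, 1, 2, 3));
        Set<Integer> oneSlice = new HashSet<>(Arrays.asList(0, 1));
        List<String> ranges = Arrays.asList("x");

        // Same ranges, but the replicated array lives on the larger group.
        ArrayInfo replicated = new ArrayInfo(wholeGrid, ranges);
        ArrayInfo unreplicated = new ArrayInfo(oneSlice, ranges);

        System.out.println("same array:          " + aligned(replicated, replicated));
        System.out.println("replicated vs slice: " + aligned(replicated, unreplicated));
    }
}
```

In this model the second comparison fails only because the groups differ, which is the situation described above: the ranges match, but the replicated array has the larger distribution group.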

One possibility is to relax the definition of argument alignment to take account of this situation. Anyone writing a library callable from HPJava is free to take this path--they can write their functions to accept arrays with some weaker alignment constraint. But experience suggests that the simple definition of alignment given above is easy to understand, and the specification and implementation of functions will be simpler if they are based on this definition.

A minor extension to the HPJava language takes care of this situation. The restriction operation introduced for groups in the previous section can also be applied to an array. It returns a new array object--akin to an array section--which has the same ranges as the parent array, but has its group restricted by the specified location. Applied to a replicated array, it returns an array object referencing only the copies of the elements held in the restricted group.
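The effect of restricting an array can be sketched in plain Java as follows. This is an illustrative model under stated assumptions, not the HPJava implementation: `RestrictionSketch`, `View`, `restrict`, and `overWholeGrid` are hypothetical names, the process grid is assumed to be 2-dimensional, and a group is modelled as a list of process coordinates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RestrictionSketch {

    /** Hypothetical model of an array handle on a 2-dimensional process grid. */
    public static final class View {
        private final List<String> ranges;  // one label per array dimension
        private final List<int[]> group;    // coordinates (x, y) of the processes holding a copy

        View(List<String> ranges, List<int[]> group) {
            this.ranges = ranges;
            this.group = group;
        }

        public List<String> ranges() { return ranges; }

        public int groupSize() { return group.size(); }

        /** Models "array / location": same ranges, group cut down to one coordinate in dimension dim. */
        public View restrict(int dim, int coord) {
            List<int[]> kept = new ArrayList<>();
            for (int[] process : group) {
                if (process[dim] == coord) {
                    kept.add(process);
                }
            }
            return new View(ranges, kept);
        }
    }

    /** Builds a view whose group is the whole px x py grid. */
    public static View overWholeGrid(int px, int py, String... ranges) {
        List<int[]> group = new ArrayList<>();
        for (int x = 0; x < px; x++) {
            for (int y = 0; y < py; y++) {
                group.add(new int[] {x, y});
            }
        }
        return new View(Arrays.asList(ranges), group);
    }

    public static void main(String[] args) {
        // One range distributed over x; the array is therefore replicated over y.
        View a = overWholeGrid(2, 2, "x");
        // Restrict by the location with coordinate 0 in the y dimension (dimension index 1).
        View restricted = a.restrict(1, 0);

        System.out.println("parent group size:     " + a.groupSize());
        System.out.println("restricted group size: " + restricted.groupSize());
        System.out.println("ranges unchanged:      " + restricted.ranges().equals(a.ranges()));
    }
}
```

The restricted view shares its ranges with the parent and differs only in its group, which mirrors the description above of an object that references only the copies held in the restricted group.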

Figure 4.6 is a generalization of the matrix multiplication program in Figure 3.13 to the case where the arrays are suitably distributed over a 3-dimensional process grid. Note that array c is replicated over the process dimension of z, a is replicated over the dimension of y, and b is replicated over the dimension of x. The sequential inner loop of Figure 3.13 is replaced by a call to dotProduct which directly forms the inner product of two sections with distributed range z.

**Figure 4.6:** A maximally parallel matrix multiplication program. (The listing is truncated in the source; only the fragments below survive.)

    Procs3 p = new Procs3(P, P, P) ;
    on(p) ...
    ...roduct(a [[i, :]] / j, b [[:, j]] / i) ; }
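The division of work implied by this layout can be imitated sequentially in plain Java. The sketch below is not the program of the figure. It assumes square n x n matrices, a block distribution with n divisible by P, and hypothetical names (`Grid3dMatmulSketch`, `multiplyOnGrid`, `multiplyDirect`). Each iteration of the three outer loops stands for one process (x, y, z), and the accumulation into `c[i][j]` stands for the sum over the z dimension that the distributed inner product performs.

```java
import java.util.Arrays;

class Grid3dMatmulSketch {

    /** Simulates the work split over a p x p x p grid; assumes square matrices with n divisible by p. */
    public static long[][] multiplyOnGrid(long[][] a, long[][] b, int p) {
        int n = a.length;
        if (p <= 0 || n % p != 0) {
            throw new IllegalArgumentException("matrix size must be divisible by p");
        }
        int blk = n / p;
        long[][] c = new long[n][n];
        for (int x = 0; x < p; x++) {
            for (int y = 0; y < p; y++) {
                for (int z = 0; z < p; z++) {
                    // Work of process (x, y, z): rows in block x, columns in block y,
                    // and the part of the inner-product range that falls in block z.
                    for (int i = x * blk; i < (x + 1) * blk; i++) {
                        for (int j = y * blk; j < (y + 1) * blk; j++) {
                            long partial = 0;
                            for (int k = z * blk; k < (z + 1) * blk; k++) {
                                partial += a[i][k] * b[k][j];
                            }
                            // Stands for combining the partial sums across the z dimension.
                            c[i][j] += partial;
                        }
                    }
                }
            }
        }
        return c;
    }

    /** Ordinary triple-loop product, used only as a reference result. */
    public static long[][] multiplyDirect(long[][] a, long[][] b) {
        int n = a.length;
        long[][] c = new long[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        long[][] a = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}};
        long[][] b = {{1, 0, 2, 0}, {0, 1, 0, 2}, {3, 0, 1, 0}, {0, 3, 0, 1}};
        long[][] onGrid = multiplyOnGrid(a, b, 2);
        System.out.println("matches direct product: "
                + Arrays.deepEquals(onGrid, multiplyDirect(a, b)));
    }
}
```

In this simulation every element of the result is touched by P processes, one per z coordinate, which corresponds to c being replicated over the z dimension while the inner-product range is distributed over it.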

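If we didn't know about array restriction we would probably try to write the loop body with the same call applied to the unrestricted sections a [[i, :]] and b [[:, j]], omitting the restrictions / j and / i. Those two sections have the same range z but different distribution groups, so by the definition given above they are not aligned.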

This is the first example we have given of a call to a collective library function inside the parallel overall construct. The library, Adlib, supports this kind of "nested parallelism" provided certain precautions are observed. These will be explained in section 6.