Parallel arrays and collective communication

Next: ``Array syntax'' in Java. Up: Data parallellism in Java Previous: Data parallellism in Java

Parallel arrays and collective communication

At the run-time level, a class library implementation of the HPF model is likely to include

Classes to describe process arrays and distributed data arrays.
Classes or functions to simplify access to locally held elements of a distributed array (including parallel iteration).
Functions for collective communication through operations on distributed arrays: regular ``copying'' operations including shifts and transposes, arithmetic reduction operations, irregular gather/scatter operations, and so on.

Our first experiments with a Java binding only touch the surface of the full HPF semantics, but they provide some hints about a general framework. The interface given here borrows from the C++ class library, Adlib, developed by one of us [6].

A distributed array is parametrized by a member of the Array class. In C++ Array would naturally be a template for a container class. In Java, generic container classes are problematic. Without the template mechanism, the obvious options are that a container holds items of type Object, the base class for all non-primitive types, or that a separate container class is provided for each allowed type of element. The first option doesn't allow for array elements of primitive type, and prevents compile-time type-checking (reminiscent of using void* in C). The second approach presumably involves restricting elements to the finite set of primitive types (int, float, ...). For now we have side-stepped the issue by leaving the data elements out of the Array class. Array defines the shape and distribution of an array, but space for elements is allocated in a separately declared vector of the appropriate type.

The constructor for an Array defines its shape and distribution format. This is expressed through two auxilliary classes: the Procs and Range classes. The Procs class corresponds directly to the HPF processor arrangement. It maps the set of physical processes on which the program is executing to a multi-dimensional grid. A Range describes a single dimension of an HPF array. It embodies an array extent (the size of the array in the dimension concerned), and a mapping of the subscript range to a dimension of a Procs grid.

In our pilot implementation any parallel Java application is written as a class extending the library class Node. The Node class maintains some global information and provides various collective operations on arrays as member functions. The code for the ``main program'' goes in the run member of the application class.

A simplified version of the code for the ``Life'' demo is given in figure 6.

Figure 6: Simplified code of the Life demo program.

The object p represents a 2 by 2 process grid. The Procs constructor takes the current Node object as an argument, from which it obtains information on the available physical processes. In this simplified example we assume that the program executes on exactly four processors.

The objects x and y represent index ranges of size N distributed over the first and second dimensions of the grid p. The default distribution format is blockwise. Cyclic distribution format can also be specified by using a range object of class CRange, which is derived from Range (the pilot implementation does not provide any more general distribution or alignment options).

The object r represents the shape and distribution of a two dimensional array. This ``template'' is be shared by several distributed arrays--it does not contain a data vector. The data vectors that hold the local segments of arrays are created separately using the inquiry function seg, which returns the number of locally held elements. In the example the elements of the main data array are held in w. The extra arrays cn_, cp_, ..., cnn, ... will be used to hold arrays of neighbour values.

The ``forall loop'' initializing w should be read as something like

  forall(i in range x, j in range y)
    w(i, j) = fun(i, j)

where fun is some function of the global indices defining the initial state of the Life board. The members forall, test, next update internal state of r so that r.sub() returns the local subscript for the current iteration, and r.idx(0) and r.idx(1) return the global index values for the current iteration. We are using r as an iterator class.

The main loop uses cyclic shift operations to obtain neighbours, communicating data where necessary. The shift operation is a member of the Node class. It should be overloaded to accept data vectors of any primitive type--here the array elements are bytes.

Finally w is updated in terms of its neighbours.

Note some characteristic features of the data-parallel style of programming:

The distribution format of the arrays can be changed just by altering a few parameters at the start of the program--the main program is insensitive to these details
low level message-passing is abstracted into high-level collective operations on distributed array structures.

It may be unclear that this framework has the same power as HPF. If the distributed array model is extended to the full HPF model, as it can be, and the shift operation is augmented by some more powerful collective operations including gather and scatter operations parametrized by subscript arrays, we claim that it does. The proof lies in the observation that this is, as advertised, a simplified model of the kind of run-time library that various HPF translators target. This is not to say that programming in this style is always as straightforward as the example given here, or as comprehensible as the corresponding HPF program (if it was, there would be no need for HPF).

Next: ``Array syntax'' in Java. Up: Data parallellism in Java Previous: Data parallellism in Java

Geoffrey Fox, Northeast Parallel Architectures Center at Syracuse University, gcf@npac.syr.edu