At the run-time level, a class library implementation of the HPF model is likely to include
A distributed array is parametrized by a member of the Array class. In C++ Array would naturally be a template for a container class. In Java, generic container classes are problematic. Without the template mechanism, the obvious options are that a container holds items of type Object, the base class for all non-primitive types, or that a separate container class is provided for each allowed type of element. The first option doesn't allow for array elements of primitive type, and prevents compile-time type-checking (reminiscent of using void* in C). The second approach presumably involves restricting elements to the finite set of primitive types (int, float, ...). For now we have side-stepped the issue by leaving the data elements out of the Array class. Array defines the shape and distribution of an array, but space for elements is allocated in a separately declared vector of the appropriate type.
The constructor for an Array defines its shape and distribution format. This is expressed through two auxilliary classes: the Procs and Range classes. The Procs class corresponds directly to the HPF processor arrangement. It maps the set of physical processes on which the program is executing to a multi-dimensional grid. A Range describes a single dimension of an HPF array. It embodies an array extent (the size of the array in the dimension concerned), and a mapping of the subscript range to a dimension of a Procs grid.
In our pilot implementation any parallel Java application is written as a class extending the library class Node. The Node class maintains some global information and provides various collective operations on arrays as member functions. The code for the ``main program'' goes in the run member of the application class.
A simplified version of the code for the ``Life'' demo is given in figure 6.
Figure 6: Simplified code of the Life demo program.
The object p represents a 2 by 2 process grid. The Procs constructor takes the current Node object as an argument, from which it obtains information on the available physical processes. In this simplified example we assume that the program executes on exactly four processors.
The objects x and y represent index ranges of size N distributed over the first and second dimensions of the grid p. The default distribution format is blockwise. Cyclic distribution format can also be specified by using a range object of class CRange, which is derived from Range (the pilot implementation does not provide any more general distribution or alignment options).
The object r represents the shape and distribution of a two dimensional array. This ``template'' is be shared by several distributed arrays--it does not contain a data vector. The data vectors that hold the local segments of arrays are created separately using the inquiry function seg, which returns the number of locally held elements. In the example the elements of the main data array are held in w. The extra arrays cn_, cp_, ..., cnn, ... will be used to hold arrays of neighbour values.
The ``forall loop'' initializing w should be read as something like
forall(i in range x, j in range y) w(i, j) = fun(i, j)
where fun is some function of the global indices defining the initial state of the Life board. The members forall, test, next update internal state of r so that r.sub() returns the local subscript for the current iteration, and r.idx(0) and r.idx(1) return the global index values for the current iteration. We are using r as an iterator class.
The main loop uses cyclic shift operations to obtain neighbours, communicating data where necessary. The shift operation is a member of the Node class. It should be overloaded to accept data vectors of any primitive type--here the array elements are bytes.
Finally w is updated in terms of its neighbours.
Note some characteristic features of the data-parallel style of programming: