Java already provides parallelism through threads. But that model of parallelism can only be easily exploited on shared memory computers. HPJava is targeted at distributed memory parallel computers (most likely, networks of PCs and workstations).
HPJava extends Java with class libraries and some additional syntax for dealing with distributed arrays. Some or all of the dimensions of these arrays can be declared as distributed ranges. A distributed range defines a range of integer subscripts, and specifies how they are mapped into a process grid dimension. It is represented by an object of base class Range. Process grids--equivalent to processor arrangements in HPF--are described by suitable classes. A base class Group describes a general group of processes and has subclasses Procs1, Procs2, ..., representing one-dimensional process grids, two-dimensional process grids, and so on. The inquiry function dim returns an object describing a particular dimension of a grid. In the example
Procs2 p = new Procs2(3, 2) ;

Range x = new BlockRange(100, p.dim(0)) ;
Range y = new BlockRange(200, p.dim(1)) ;

float [[,]] a = new float [[x, y]] on p ;
a is created as a 100 × 200 array, block-distributed over the 6 processes in p. The Range subclass BlockRange describes a simple block-distributed range of subscripts--analogous to BLOCK distribution format in HPF. The arguments of the BlockRange constructor are the extent of the range and an object defining the process grid dimension over which the range is distributed.
In HPJava the type-signatures and constructors of distributed arrays use double brackets to distinguish them from ordinary Java arrays. Selected dimensions of a distributed array may have collapsed (sequential) ranges rather than distributed ranges: the corresponding slots in the type signature of the array should include a * symbol. In general the constructor of the distributed array is followed by an on clause, specifying the process group over which the array is distributed. (If this is omitted the group defaults to the APG, see below.) Distributed ranges of the array must be distributed over distinct dimensions of this group.
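For illustration, the following sketch declares an array whose second dimension is collapsed (the use of a plain integer extent as the constructor argument for the sequential dimension is our assumption here):

float [[,*]] c = new float [[x, 10]] on p ;

Every process holding a portion of c stores all 10 elements along the second dimension locally, so that dimension can be subscripted with ordinary integers.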
A standard library, Adlib, provides functions for manipulating distributed arrays, including functions closely analogous to the array transformational intrinsic functions of Fortran 90. For example:
float [[,]] b = new float [[x, y]] on p ;

Adlib.shift(b, a, -1, 0, CYCL) ;

float g = Adlib.sum(b) ;
The shift operation with shift-mode CYCL executes a cyclic shift on the data in its second argument, copying the result to its first argument. The sum operation simply adds all elements of its argument array. In general these functions imply inter-processor communication.
Often in SPMD programming it is necessary to restrict execution of a block of code to processes in a particular group p. Our language provides a short way of writing this construct:
on(p) { ... }
The language incorporates a formal idea of an active process group (APG). At any point of execution some group is singled out as the APG. An on(p) construct specifically changes its value to p. On exit from the construct, the APG is restored to its value on entry.
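A minimal sketch, reusing the grid p constructed earlier:

on(p) {
  // Executes only in the 6 processes of p; within this block,
  // p is the active process group.
  float [[,]] c = new float [[x, y]] ;  // omitted 'on' clause defaults to the APG, i.e. p
}
// On exit, the APG reverts to its value on entry.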
Subscripting operations on distributed arrays are subject to some restrictions that ensure data accesses are local. An array access such as
a [17, 23] = 13 ;
is forbidden because typical processes do not hold the specified element. The idea of a location is introduced. A location can be viewed as an abstract element, or ``slot'', of a distributed range. The syntax x[n] stands for location n in range x. In simple array subscripting operations, distributed dimensions of arrays can only be subscripted using locations (not integer subscripts). These must be locations in the appropriate range of the array. Moreover, locations appearing in simple subscripting operations must be named locations, and named locations can only be scoped by at and overall constructs.
The at construct is analogous to on, except that its body is executed only on processes that hold the specified location. The array access above can be safely written as:
at(i = x [17]) at(j = y [23]) a [i, j] = 13 ;
Any location is mapped to a particular slice of a process grid; the body of an at construct executes only on the processes in the slice that holds the location named in its header.
The last distributed control construct in the language is called overall. It implements a distributed parallel loop, and is parametrized by a range. Like at, the header of this construct scopes a named location. In this case the location can be regarded as a parallel loop index.
float [[,]] a = new float [[x, y]],
            b = new float [[x, y]] ;

overall(i = x)
  overall(j = y)
    a [i, j] = 2 * b [i, j] ;
The body of an overall construct executes, conceptually in parallel, for every location in the range of its index. An individual ``iteration'' executes on just those processors holding the location associated with the iteration. Because of the rules about use of subscripts, the body of an overall can usually only combine elements of arrays that have some simple alignment relation relative to one another. The idx member of Range can be used in parallel updates to yield expressions that depend on global index values.
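As a sketch of the latter point (we assume here that idx takes the loop's named location and returns the corresponding global subscript):

overall(i = x)
  overall(j = y)
    a [i, j] = x.idx(i) + y.idx(j) ;  // element value depends on the global indices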
Other important features of the language include Fortran-90-style regular array sections (section construction operations look similar to simple subscripting operations, but are distinguished by use of double brackets), an associated idea of subranges, and subgroups, which can be used to represent the restricted APG inside at and overall constructs.
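A hypothetical example of a section, assuming Fortran-90-style triplet subscripts inside the double brackets:

float [[,]] s = a [[0 : 49, 0 : 199 : 2]] ;

Here s would be a view of every second column of the first fifty rows of a; its dimensions have subranges of x and y as their ranges.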
The language extensions are most directly targeted at data parallelism. But an HPJava program is implicitly an SPMD Java program, and task parallelism is available by default. A structured way to write a task parallel program is to write an overall construct parametrized by a process dimension (which is a particular kind of range). The body of the loop executes once in each process. The body can execute one or more ``tasks'' of arbitrary complexity. Task parallel programming with distributed arrays can be facilitated by extending the standard library with one-sided communication operations to access remote patches of the arrays, and we are investigating integration of software from the PNNL Global Array Toolset [8] in this connection.
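A minimal task-parallel sketch along these lines, using a one-dimensional grid:

Procs1 q = new Procs1(4) ;

on(q)
  overall(i = q.dim(0)) {
    // This body executes exactly once in each of the 4 processes of q.
    // Each process can run an arbitrarily complex, independent task here.
  }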