Next: Discussion. Up: HPJava: Data Parallel Extensions Previous: Locations and the at

Distributed loops.

The last and most important distributed control construct in the language is called over. It implements a distributed parallel loop. The argument of over is a member of the special class Index. This class is a subclass of Location, so it is syntactically correct to use an index as an array subscript (the effect of such subscripting is only well-defined inside an over construct parametrised by the index in question). Here is an example of a pair of nested over loops:

  float [[#,#]] a = new float [[x, y]],
                b = new float [[x, y]] ;
  Index i, j ;
  over(i = x | :)
    over(j = y | :)
      a [i, j] = 2 * b [i, j] ;

The body of an over construct executes, conceptually in parallel, for every location in the range of its index (or for some subrange if a non-trivial triplet is specified). An individual "iteration" executes on just those processors holding the location associated with that iteration. Because of this rule about where each iteration executes, the body of an over can usually only combine elements of arrays that have some simple alignment relation to one another. The idx member of Range can be used in parallel updates to yield expressions that depend on global index values.
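The ownership rule can be illustrated with a small sequential model in plain Java. This is only a sketch under stated assumptions, not HPJava code and not the language's runtime: the class OverLoopSketch, the helper blockOwner, and the choice of a block distribution are all hypothetical. Each simulated process visits only the global indices it owns, and the global index g stands in for the value the idx member would supply.

```java
/** Hypothetical sequential model of an over loop; not HPJava's actual runtime. */
public class OverLoopSketch {

    /** Owner of global index g when n elements are block-distributed over p processes (assumed layout). */
    public static int blockOwner(int g, int n, int p) {
        int blockSize = (n + p - 1) / p;   // ceiling(n / p)
        return g / blockSize;
    }

    /**
     * Emulates a loop of the form "over(i = x | :) a[i] = 2 * b[i] + (global index of i)".
     * The outer loop plays the role of the processes; in a real run they would work concurrently.
     */
    public static float[] run(float[] b, int p) {
        int n = b.length;
        float[] a = new float[n];
        for (int proc = 0; proc < p; proc++) {
            for (int g = 0; g < n; g++) {
                // An iteration executes only on the process holding its location.
                if (blockOwner(g, n, p) != proc) {
                    continue;
                }
                // a and b are aligned, so element g of both lives on the same process.
                a[g] = 2 * b[g] + g;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        float[] a = run(new float[] {1f, 2f, 3f, 4f, 5f}, 2);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

Because a and b share one distribution in this model, no iteration needs data from another process, which is the "simple alignment relation" the text refers to.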

The figure below gives a parallel implementation of Choleski decomposition in the extended language. The first dimension of a is sequential ("collapsed" in HPF parlance). The second dimension is distributed (cyclically, to improve load-balancing). This is a column-oriented decomposition. The example involves one new operation from the standard library: the function remap copies the elements of one distributed array or section to another of the same shape. The two arrays can have arbitrary, unrelated decompositions. In the current example remap is used to implement a broadcast. Because b has no range distributed over p, it implicitly has replicated mapping; remap accordingly copies identical values to all processors. This example also illustrates construction of Fortran-90-like sections of distributed arrays (using double brackets and triplet subscripts) and use of non-trivial triplets in the over construct.

 
Figure:  Choleski decomposition.
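As a rough aid to the description above, and not the HPJava listing from the figure, the following plain-Java sketch emulates a column-oriented Choleski factorisation in which columns are assigned cyclically to p simulated processes. Everything named here is an assumption made for illustration: the class CholeskySketch, the ownership rule j % p, the use of double rather than float, and the plain array copy that stands in for the broadcast performed by remap.

```java
/** Hypothetical sequential emulation of a column-cyclic Choleski factorisation. */
public class CholeskySketch {

    /**
     * Returns a copy of the symmetric positive-definite matrix "input" whose lower
     * triangle holds the Choleski factor L (input = L * L^T). The upper triangle is zeroed.
     */
    public static double[][] factor(double[][] input, int p) {
        int n = input.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = input[i].clone();
        }
        double[] b = new double[n];   // models the replicated buffer b

        for (int k = 0; k < n; k++) {
            // Only the process owning column k (k % p in this model) scales that column.
            double d = Math.sqrt(a[k][k]);
            a[k][k] = d;
            for (int s = k + 1; s < n; s++) {
                a[s][k] /= d;
            }
            // Stand-in for the broadcast: every process receives column k below the diagonal.
            for (int s = k + 1; s < n; s++) {
                b[s] = a[s][k];
            }
            // Each process updates only the remaining columns it owns (cyclic ownership).
            for (int proc = 0; proc < p; proc++) {
                for (int j = k + 1; j < n; j++) {
                    if (j % p != proc) {
                        continue;
                    }
                    for (int i = j; i < n; i++) {
                        a[i][j] -= b[i] * b[j];
                    }
                }
            }
        }
        // Clear the untouched upper triangle so the result is exactly L.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                a[i][j] = 0.0;
            }
        }
        return a;
    }
}
```

With cyclic ownership, the columns still to be updated at step k remain spread across all processes as k grows, which is the load-balancing motivation mentioned in the text; a block assignment would leave the processes holding early columns idle in later steps.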



Theresa Canzian
Mon Jul 27 22:18:44 EDT 1998