
Distributed loops

In this chapter we discuss various ways to use the Adlib run-time technology in the translation of distributed loops: loops whose ranges are partitioned across the active process group. Typically such loops are used to access and modify the data in distributed arrays. For definiteness, we work in the context of the ad++ interface; the techniques can be adapted to other interfaces to the kernel library.

In ad++, the general overall construct is a distributed, parallel loop. It is parametrized by an Index object, which maintains the local loop state. If x is a range, the overall construct has the syntax:

  Index i(x) ;
  OVERALL(i) {
    ...
  } ALLOVER(i) ;

If x has extent N, this construct can be compared to the sequential loop

  int i ;
  for(i = 0 ; i < N ; i++) {
    ...
  }

The difference is that in the overall construct the N instances of the loop body are partitioned across the set of active processes, following the mapping of x, so each process executes only the iterations assigned to it by that mapping.
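To make the partitioning concrete, here is a minimal sketch that assumes the simplest case: a range of extent N block-distributed over P processes, with block size ceiling(N / P). The struct BlockRange and the function localIterations are hypothetical illustrations, not Adlib's Range class, which supports more general distributions. Each simulated process collects the global iterations it would execute; together they cover 0 .. N-1 exactly once, just like the sequential loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: a block-distributed range of extent n over nprocs
// processes. Only the simple block distribution is modelled here.
struct BlockRange {
    int n;        // global extent N
    int nprocs;   // number of processes in the distributed dimension

    // Block size: ceiling(n / nprocs).
    int blockSize() const { return (n + nprocs - 1) / nprocs; }

    // First global index owned by process p (clamped to n).
    int lower(int p) const {
        int lo = p * blockSize();
        return lo < n ? lo : n;
    }

    // One past the last global index owned by process p (clamped to n).
    int upper(int p) const {
        int hi = (p + 1) * blockSize();
        return hi < n ? hi : n;
    }
};

// The global iterations that process p executes in an overall-style loop.
std::vector<int> localIterations(const BlockRange& x, int p) {
    std::vector<int> its;
    for (int g = x.lower(p); g < x.upper(p); g++)
        its.push_back(g);
    return its;
}
```

With N = 10 and P = 4 the blocks are [0,3), [3,6), [6,9) and [9,10); with N = 5 and P = 4 the last process holds no iterations at all, which an overall construct must handle gracefully.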

The Index class is a subclass of Location. Within an overall construct parametrized by an Index i, the Subscript component of i is set to the local subscript for the current iteration, so i can be used as an array subscript, as in:

  Array1<float> c(x) ;

  Index i(x) ;
  OVERALL(i) {
    c(i) = ...
  } ALLOVER(i) ;
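The point that i carries a local subscript can be illustrated without the library. The sketch below assumes a block-distributed one-dimensional array and models only the segment held by one process; LocalSegment, globalIdx and fillWithGlobalIndex are hypothetical names. Element access uses the local subscript, and the global subscript (the quantity ad++ exposes through x.idx(i)) is recovered by adding the segment's base offset.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: the segment of a block-distributed 1D array held by
// process p. Elements are addressed by LOCAL subscript; globalIdx maps a
// local subscript back to the global one.
struct LocalSegment {
    int base;                  // global subscript of local element 0
    std::vector<float> data;   // locally held elements only

    LocalSegment(int n, int nprocs, int p) {
        int b  = (n + nprocs - 1) / nprocs;          // block size
        int lo = p * b < n ? p * b : n;
        int hi = (p + 1) * b < n ? (p + 1) * b : n;
        base = lo;
        data.assign(hi - lo, 0.0f);
    }

    float& operator()(int localSub) { return data[localSub]; }
    int globalIdx(int localSub) const { return base + localSub; }
    int localCount() const { return static_cast<int>(data.size()); }
};

// What "OVERALL(i) { c(i) = ... } ALLOVER(i)" does on one process: visit
// local subscripts only, so every element access is local.
void fillWithGlobalIndex(LocalSegment& c) {
    for (int s = 0; s < c.localCount(); s++)
        c(s) = static_cast<float>(c.globalIdx(s));
}
```

For example, with 10 elements over 4 processes, process 1 holds three elements whose local subscripts 0, 1, 2 correspond to global subscripts 3, 4, 5.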

The general overall construct also affects the active process group, as described in section 2.7. If a construct parametrized by i appears in the context of an active process group p, the body of the construct executes in the context of the active process group p / i (recall that Index is a subclass of Location, which in turn is a subclass of Coord, so this expression is well-formed). The parent range of i must be distributed over a dimension of p.
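A simplified model of the restriction p / i follows. It assumes a two-dimensional process grid with row-major rank numbering, and treats restriction as keeping the processes whose coordinate in the dimension over which i's range is distributed equals a given value c. The function restrictGroup is hypothetical; Adlib's Group and Coord classes carry more structure than a list of ranks.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of group restriction on an extent0-by-extent1 process
// grid, ranks laid out row-major: rank = c0 * extent1 + c1. Restricting to
// coordinate c in dimension dim keeps only the processes sharing that
// coordinate; they form the active group for the loop body.
std::vector<int> restrictGroup(int extent0, int extent1, int dim, int c) {
    std::vector<int> members;
    for (int c0 = 0; c0 < extent0; c0++)
        for (int c1 = 0; c1 < extent1; c1++) {
            int coord = (dim == 0) ? c0 : c1;
            if (coord == c)
                members.push_back(c0 * extent1 + c1);
        }
    return members;
}
```

On a 2 by 3 grid, restricting dimension 0 to coordinate 1 leaves ranks 3, 4 and 5: the processes that share the current iteration of a loop whose range is distributed over dimension 0.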

Combining these features, we can give a more complete example:

  Array2<float> a(x, y) ;
  Array1<float> b(y) ;
  ...
  Index i(x), j(y) ;
  OVERALL(i) {
    OVERALL(j) {
      a(i, j) = 2 * b(j) + x.idx(i) ;
    } ALLOVER(j) ;
  } ALLOVER(i) ;

To each element of a, this assigns an expression computed from the aligned value of b and the global subscript of x (obtained through x.idx(i)). All data accesses through legal subscripting operations are local. If a non-local array element were required, accessing it would take an explicit call to a member of the communication library.
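Why every access in the nested example is local can be checked with a sequential simulation. This sketch assumes a px by py process grid, a block-distributed over both grid dimensions, and b block-distributed over grid dimension 1 and replicated over dimension 0, so a process holding a(i, j) also holds b(j). The functions block and nestedOverall are hypothetical; the global result is reassembled only so it can be compared against the sequential formula.

```cpp
#include <cassert>
#include <vector>

// Half-open block [lo, hi) of extent n owned by coordinate p of nprocs.
static void block(int n, int nprocs, int p, int& lo, int& hi) {
    int b = (n + nprocs - 1) / nprocs;
    lo = p * b < n ? p * b : n;
    hi = (p + 1) * b < n ? (p + 1) * b : n;
}

// Simulates a(i, j) = 2 * b(j) + x.idx(i) on every process of the grid and
// returns the assembled nx-by-ny result in row-major order.
std::vector<float> nestedOverall(int nx, int ny, int px, int py,
                                 const std::vector<float>& bGlobal) {
    std::vector<float> aGlobal(nx * ny, 0.0f);
    for (int cx = 0; cx < px; cx++)
        for (int cy = 0; cy < py; cy++) {        // one simulated process
            int xlo, xhi, ylo, yhi;
            block(nx, px, cx, xlo, xhi);
            block(ny, py, cy, ylo, yhi);
            // This process's own segment of b and block of a.
            std::vector<float> bLocal(bGlobal.begin() + ylo,
                                      bGlobal.begin() + yhi);
            int lx = xhi - xlo, ly = yhi - ylo;
            std::vector<float> aLocal(lx * ly);
            for (int si = 0; si < lx; si++)      // OVERALL(i)
                for (int sj = 0; sj < ly; sj++)  // OVERALL(j)
                    // Only local data is read: bLocal and the offset xlo.
                    aLocal[si * ly + sj] = 2 * bLocal[sj] + (xlo + si);
            // Copy back purely for checking; no process reads another's data.
            for (int si = 0; si < lx; si++)
                for (int sj = 0; sj < ly; sj++)
                    aGlobal[(xlo + si) * ny + (ylo + sj)] = aLocal[si * ly + sj];
        }
    return aGlobal;
}
```

The inner loops read only bLocal and the block offset, which is the sequential analogue of the claim that legal subscripting never touches non-local elements.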

The remainder of this chapter discusses several schemes for translating the distributed loop. The first scheme uses the Adlib Index class directly. The "translation" is the trivial one, using only the C macro preprocessor to replace the OVERALL and ALLOVER "keywords". The second scheme uses another auxiliary iterator class from the library, LocBlocksIndex. The translation is still relatively straightforward and has the advantage of being independent of the level of the parametric range. The LocBlocksIndex mechanism is quite efficient, and is used extensively in the implementation of the Adlib communication library. Finally, we describe a scheme which works directly in terms of the members of the Range class, without introducing any auxiliary iterator class.
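The first scheme, where OVERALL and ALLOVER are plain preprocessor macros, can be sketched as follows. The member names beg, test, next and end, and this minimal Index layout, are illustrative assumptions rather than Adlib's actual definitions; the sketch only shows how the macros can expand so that the syntax `OVERALL(i) { ... } ALLOVER(i) ;` becomes an ordinary for loop followed by a statement that closes the construct.

```cpp
#include <cassert>

// Hypothetical minimal Index: iterates over the local subscripts held by
// this process and tracks whether the construct is currently open.
class Index {
public:
    explicit Index(int localCount) : count_(localCount) {}

    void beg()  { sub_ = 0; inLoop_ = true; }    // enter construct (e.g. restrict group)
    bool test() const { return sub_ < count_; }  // more local iterations?
    void next() { ++sub_; }                      // advance local subscript
    void end()  { inLoop_ = false; }             // leave construct (e.g. restore group)

    int sub() const { return sub_; }             // current local subscript
    bool inLoop() const { return inLoop_; }

private:
    int count_;          // number of iterations held by this process
    int sub_ = 0;
    bool inLoop_ = false;
};

// "} ALLOVER(i) ;" expands to "} (i).end() ;", a complete statement.
#define OVERALL(i) for ((i).beg(); (i).test(); (i).next())
#define ALLOVER(i) (i).end()

// Usage: sums the local subscripts visited by one process.
int sumLocalSubscripts(int localCount) {
    Index i(localCount);
    int sum = 0;
    OVERALL(i) {
        sum += i.sub();
    } ALLOVER(i) ;
    return sum;
}
```

One consequence of this purely textual translation is that a break or return inside the body skips the end() call, so any state restored there (such as the active process group) would not be reset.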





Guansong Zhang
Fri Oct 9 12:29:23 EDT 1998