HPJava follows HPF in allowing several different distribution formats for the dimensions of its distributed arrays. The new formats are provided without further extension to the syntax of the language. Instead, the Range class hierarchy is extended. The full hierarchy is shown in Figure 3.1.
The BlockRange subclass is very familiar by now. The Dimension class is also familiar, although previously it wasn't presented as a range class. Later examples will demonstrate how it can be convenient to use process dimension objects as array ranges. CyclicRange and BlockCyclicRange are directly analogous to the cyclic and block-cyclic distribution formats available in HPF.
Cyclic distributions are sometimes favoured because they can lead to better load balancing than the simple block distributions introduced so far. Some algorithms (for example, dense matrix algorithms) don't have the kind of locality that favours block distribution for stencil updates, but they do involve phases where parallel computation is limited to subsections of the whole array. Under a block distribution these subsections may map to only a fraction of the available processes, leaving the remaining processes idle. Here is a contrived example:
  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    Range x = new BlockRange(N, p.dim(0)) ;
    Range y = new BlockRange(N, p.dim(1)) ;

    float [[,]] a = new float [[x, y]] ;

    overall(i = x for 0 : N / 2 - 1)
      overall(j = y for 0 : N / 2 - 1)
        a [i, j] = complicatedFunction(i`, j`) ;
  }

As shown in Figure 3.2, this leads to a very poor distribution of workload: the whole of the updated subsection falls on the two processes with coordinate 0 in the first process dimension, with the process with coordinates (0, 0) doing about twice as much work as its neighbour at (0, 1), while the remaining four processes are completely idle.
In the cyclic distribution format, the index space is mapped to the process dimension in wraparound fashion: with P processes in the dimension, global index g is assigned to the process with coordinate g mod P. If we change our example to
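To make the two mappings concrete, here is a small fragment of ordinary Java (not HPJava) that prints which process coordinate owns each global index under a block and a cyclic mapping. The extent N = 8, the process count P = 3, and the ceiling-division block size are illustrative assumptions for this sketch, not values taken from the examples above.

  public class OwnerDemo {
      public static void main(String[] args) {
          final int N = 8, P = 3 ;              // illustrative extent and process count
          final int B = (N + P - 1) / P ;       // block size, by ceiling division
          for (int g = 0 ; g < N ; g++) {
              int blockOwner  = g / B ;         // contiguous blocks of size B
              int cyclicOwner = g % P ;         // round-robin, wraparound assignment
              System.out.println("index " + g + ":  block -> " + blockOwner
                                 + ",  cyclic -> " + cyclicOwner) ;
          }
      }
  }

Under the block mapping, indices 0 to 2, 3 to 5, and 6 to 7 go to coordinates 0, 1, and 2 respectively; under the cyclic mapping the indices are dealt out 0, 1, 2, 0, 1, 2, and so on.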
  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    Range x = new CyclicRange(N, p.dim(0)) ;
    Range y = new CyclicRange(N, p.dim(1)) ;

    float [[,]] a = new float [[x, y]] ;

    overall(i = x for 0 : N / 2 - 1)
      overall(j = y for 0 : N / 2 - 1)
        a [i, j] = complicatedFunction(i`, j`) ;
  }

Figure 3.3 shows that the imbalance is much less extreme. Notice that nothing changed in the program apart from the choice of range constructors. This is an attractive feature that HPJava shares with HPF. If HPJava programs are written in a sufficiently pure data parallel style (using overall loops and collective array operations), it is often possible to change the distribution format of arrays dramatically but leave much of the code that processes them invariant. HPJava does not guarantee this property in the same way that HPF does, and fully optimized HPJava programs are unlikely to be so easily redistributed. But it is still a useful feature.
As a more graphic example of how cyclic distribution can improve load balancing, consider the Mandelbrot set code of Figure 3.4. Points away from the set are generally eliminated after a few iterations, whereas those close to the set or inside it take much more computation; a point is assumed to be inside the set, and the corresponding element of the array is set to 1, when the number of iterations reaches CUTOFF. In the case N = 64 (with CUTOFF = 100), the block decomposition of the set is shown in Figure 3.5. The middle column of processes will do most of the work, because they hold most of the set. Changing the class of the ranges to CyclicRange gives the much more even distribution shown in Figure 3.6.
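Figure 3.4 itself is not reproduced here, but the central loop of such a program might look roughly like the sketch below. Only CUTOFF and the convention of setting an element to 1 when a point is taken to be inside the set come from the text above; the mapping of array indices onto the complex plane, the starting value z = 0, and the name of the array are choices made for this sketch, and N and CUTOFF are assumed to be declared elsewhere (for example 64 and 100, as quoted above).

  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    Range x = new CyclicRange(N, p.dim(0)) ;
    Range y = new CyclicRange(N, p.dim(1)) ;

    int [[,]] set = new int [[x, y]] ;

    overall(i = x for 0 : N - 1)
      overall(j = y for 0 : N - 1) {

        // Map the global indices i`, j` onto a point c in the complex plane
        // (the square [-2, 2] x [-2, 2] is an arbitrary choice for this sketch).
        double cr = 4.0 * i` / N - 2.0 ;
        double ci = 4.0 * j` / N - 2.0 ;

        // Standard escape-time iteration z = z * z + c, starting from z = 0.
        double zr = 0.0, zi = 0.0 ;
        int k = 0 ;
        while (k < CUTOFF && zr * zr + zi * zi < 4.0) {
          double t = zr * zr - zi * zi + cr ;
          zi = 2.0 * zr * zi + ci ;
          zr = t ;
          k++ ;
        }

        // Points that survive CUTOFF iterations are assumed to be in the set.
        set [i, j] = (k == CUTOFF) ? 1 : 0 ;
      }
  }

Because neighbouring rows and columns are dealt out to different processes, the expensive points near the set are spread across the whole grid rather than concentrated on the processes holding the middle blocks.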
Block cyclic distribution format is a generalization of cyclic distribution that is used in some libraries for parallel linear algebra. It will not be discussed further here.
The ExtBlockRange subclass represents block-distributed ranges extended with ghost regions.
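As a brief preview of how such a range might be used, a Jacobi-style stencil update could be sketched as follows. The exact constructor signature is an assumption here (a third argument giving the ghost-region width is shown), and the ghost regions are filled with the Adlib.writeHalo collective from HPJava's communication library.

  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    // Assumed constructor form: extent, process dimension, ghost-region width.
    Range x = new ExtBlockRange(N, p.dim(0), 1) ;
    Range y = new ExtBlockRange(N, p.dim(1), 1) ;

    float [[,]] u = new float [[x, y]] ;
    float [[,]] v = new float [[x, y]] ;

    // ... initialize u ...

    Adlib.writeHalo(u) ;   // copy edge elements into neighbouring ghost regions

    overall(i = x for 1 : N - 2)
      overall(j = y for 1 : N - 2)
        v [i, j] = 0.25f * (u [i - 1, j] + u [i + 1, j] +
                            u [i, j - 1] + u [i, j + 1]) ;
  }

The ghost regions allow the shifted subscripts such as u [i - 1, j] to be read locally, so no communication is needed inside the overall loops themselves.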