HPJava follows HPF in allowing several different distribution formats for the dimensions of its distributed arrays. The new formats are provided without further extension to the syntax of the language. Instead, the Range class hierarchy is extended. The full hierarchy is shown in Figure 3.1.
The BlockRange subclass is very familiar by now. The Dimension class is also familiar, although previously it wasn't presented as a range class. Later examples will demonstrate how it can be convenient to use process dimension objects as array ranges. CyclicRange and BlockCyclicRange are directly analogous to the cyclic and block-cyclic distribution formats available in HPF.
Cyclic distributions are sometimes favoured because they can lead to better load balancing than the simple block distributions introduced so far. Some algorithms (for example, dense matrix algorithms) don't have the kind of locality that favours block distribution for stencil updates, but they do involve phases where parallel computation is limited to subsections of the whole array. Under a block distribution these subsections may map to only a fraction of the available processes, leaving the remaining processes idle. Here is a contrived example:
  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    Range x = new BlockRange(N, p.dim(0)) ;
    Range y = new BlockRange(N, p.dim(1)) ;

    float [[,]] a = new float [[x, y]] ;

    overall(i = x for 0 : N / 2 - 1)
      overall(j = y for 0 : N / 2 - 1)
        a [i, j] = complicatedFunction(i`, j`) ;
  }

As shown in Figure 3.2, this leads to a very poor distribution of workload: the whole of the updated subsection falls on the two processes with coordinate 0 in the first process dimension, with the process with coordinates (0, 0) doing about twice as much work as its neighbour at (0, 1), while the remaining four processes are completely idle.
In the cyclic distribution format, the index space is mapped to the process dimension in wraparound fashion: with P processes in the dimension, global index g is assigned to the process with coordinate g mod P. If we change our example to
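To make the two mappings concrete, here is a small fragment of ordinary Java (not HPJava) that prints which process coordinate owns each global index under a block and a cyclic mapping. The extent N = 8, the process count P = 3, and the ceiling-division block size are illustrative assumptions for this sketch, not values taken from the examples above.

  public class OwnerDemo {
      public static void main(String[] args) {
          final int N = 8, P = 3 ;              // illustrative extent and process count
          final int B = (N + P - 1) / P ;       // block size, by ceiling division
          for (int g = 0 ; g < N ; g++) {
              int blockOwner  = g / B ;         // contiguous blocks of size B
              int cyclicOwner = g % P ;         // round-robin, wraparound assignment
              System.out.println("index " + g + ":  block -> " + blockOwner
                                 + ",  cyclic -> " + cyclicOwner) ;
          }
      }
  }

Under the block mapping, indices 0 to 2, 3 to 5, and 6 to 7 go to coordinates 0, 1, and 2 respectively; under the cyclic mapping the indices are dealt out 0, 1, 2, 0, 1, 2, and so on.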
  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    Range x = new CyclicRange(N, p.dim(0)) ;
    Range y = new CyclicRange(N, p.dim(1)) ;

    float [[,]] a = new float [[x, y]] ;

    overall(i = x for 0 : N / 2 - 1)
      overall(j = y for 0 : N / 2 - 1)
        a [i, j] = complicatedFunction(i`, j`) ;
  }

Figure 3.3 shows that the imbalance is much less extreme. Notice that nothing changed in the program apart from the choice of range constructors. This is an attractive feature that HPJava shares with HPF. If HPJava programs are written in a sufficiently pure data parallel style (using overall loops and collective array operations), it is often possible to change the distribution format of arrays dramatically but leave much of the code that processes them invariant. HPJava does not guarantee this property in the same way that HPF does, and fully optimized HPJava programs are unlikely to be so easily redistributed. But it is still a useful feature.
As a more graphic example of how cyclic distribution can improve load balancing, consider the Mandelbrot set code of Figure 3.4. Points away from the set are generally eliminated after a few iterations, whereas those close to the set or inside it take much more computation; a point is assumed to be inside the set, and the corresponding element of the array is set to 1, when the number of iterations reaches CUTOFF. In the case N = 64 (with CUTOFF = 100), the block decomposition of the set is shown in Figure 3.5. The middle column of processes will do most of the work, because they hold most of the set. Changing the class of the ranges to CyclicRange gives the much more even distribution shown in Figure 3.6.
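Figure 3.4 itself is not reproduced here, but the central loop of such a program might look roughly like the sketch below. Only CUTOFF and the convention of setting an element to 1 when a point is taken to be inside the set come from the text above; the mapping of array indices onto the complex plane, the starting value z = 0, and the name of the array are choices made for this sketch, and N and CUTOFF are assumed to be declared elsewhere (for example 64 and 100, as quoted above).

  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    Range x = new CyclicRange(N, p.dim(0)) ;
    Range y = new CyclicRange(N, p.dim(1)) ;

    int [[,]] set = new int [[x, y]] ;

    overall(i = x for 0 : N - 1)
      overall(j = y for 0 : N - 1) {

        // Map the global indices i`, j` onto a point c in the complex plane
        // (the square [-2, 2] x [-2, 2] is an arbitrary choice for this sketch).
        double cr = 4.0 * i` / N - 2.0 ;
        double ci = 4.0 * j` / N - 2.0 ;

        // Standard escape-time iteration z = z * z + c, starting from z = 0.
        double zr = 0.0, zi = 0.0 ;
        int k = 0 ;
        while (k < CUTOFF && zr * zr + zi * zi < 4.0) {
          double t = zr * zr - zi * zi + cr ;
          zi = 2.0 * zr * zi + ci ;
          zr = t ;
          k++ ;
        }

        // Points that survive CUTOFF iterations are assumed to be in the set.
        set [i, j] = (k == CUTOFF) ? 1 : 0 ;
      }
  }

Because neighbouring rows and columns are dealt out to different processes, the expensive points near the set are spread across the whole grid rather than concentrated on the processes holding the middle blocks.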
Block cyclic distribution format is a generalization of cyclic distribution that is used in some libraries for parallel linear algebra. It will not be discussed further here.
The ExtBlockRange subclass represents block-distributed ranges extended with ghost regions.
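As a brief preview of how such a range might be used, a Jacobi-style stencil update could be sketched as follows. The exact constructor signature is an assumption here (a third argument giving the ghost-region width is shown), and the ghost regions are filled with the Adlib.writeHalo collective from HPJava's communication library.

  Procs2 p = new Procs2(2, 3) ;
  on(p) {
    // Assumed constructor form: extent, process dimension, ghost-region width.
    Range x = new ExtBlockRange(N, p.dim(0), 1) ;
    Range y = new ExtBlockRange(N, p.dim(1), 1) ;

    float [[,]] u = new float [[x, y]] ;
    float [[,]] v = new float [[x, y]] ;

    // ... initialize u ...

    Adlib.writeHalo(u) ;   // copy edge elements into neighbouring ghost regions

    overall(i = x for 1 : N - 2)
      overall(j = y for 1 : N - 2)
        v [i, j] = 0.25f * (u [i - 1, j] + u [i + 1, j] +
                            u [i, j - 1] + u [i, j + 1]) ;
  }

The ghost regions allow the shifted subscripts such as u [i - 1, j] to be read locally, so no communication is needed inside the overall loops themselves.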