A sometimes effective optimization of the translation scheme given in the previous section is to parametrize outer loops by the subkernel instead of the kernel. If the parametric range is a small subrange of its parent template and covers only a small part of the kernel, iterating over the subkernel avoids the overhead of inspecting many blocks that hold no elements of the range. See figures 3.1 and 3.2.
Figure: Possible definition of the subkernel for the range of figure 3.1.
Figure: Possible definition of the subkernel for the range of figure 3.2.
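The saving can be illustrated with a minimal Python sketch. It assumes a one-dimensional parent template of n elements distributed block-cyclically (block size b over nprocs processes) and a subrange covering the contiguous global interval [lo, hi], with hi < n. The kernel of process p is modelled as the triplet of all block indices p holds, and the subkernel as the triplet of only those blocks that intersect the subrange. The Triplet class and the kernel, subkernel and blocks_via_kernel functions are hypothetical illustrations, not the Range API described in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical (lb, ub, stride) triplet of block indices; ub is inclusive."""
    lb: int
    ub: int
    stride: int

    def indices(self):
        # Assumes a positive stride; lb > ub yields an empty enumeration.
        return list(range(self.lb, self.ub + 1, self.stride))


def kernel(p, n, b, nprocs):
    """All blocks of an n-element block-cyclic template held by process p."""
    nblocks = -(-n // b)  # ceil(n / b)
    last = p + ((nblocks - 1 - p) // nprocs) * nprocs
    return Triplet(p, last, nprocs)


def subkernel(p, b, nprocs, lo, hi):
    """Only those blocks of process p that intersect the global interval [lo, hi]."""
    first_blk, last_blk = lo // b, hi // b
    lb = first_blk + (p - first_blk) % nprocs  # first block >= first_blk owned by p
    ub = last_blk - (last_blk - p) % nprocs    # last block <= last_blk owned by p
    return Triplet(lb, ub, nprocs)


def blocks_via_kernel(p, n, b, nprocs, lo, hi):
    """Kernel-based scheme: inspect every local block and skip the empty ones."""
    inspected, useful = 0, []
    for k in kernel(p, n, b, nprocs).indices():
        inspected += 1
        first, last = k * b, min((k + 1) * b, n) - 1
        if max(first, lo) <= min(last, hi):  # block k intersects the subrange
            useful.append(k)
    return inspected, useful


# Template of 1000 elements, blocks of 10, 4 processes; subrange [420, 479].
inspected, useful = blocks_via_kernel(1, 1000, 10, 4, 420, 479)
print(inspected, useful)                          # 25 [45]
print(subkernel(1, 10, 4, 420, 479).indices())    # [45]
```

With the kernel, process 1 inspects 25 blocks to find the single block that holds any element of the subrange; the subkernel names that block directly. In this model both enumerations yield the same useful blocks, so the optimization changes only the cost of the loop, not its result.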
For a parametric range of level greater than zero, the translation is summarized in figure 7.14. The kernel index j is now parametrized by x.subker() rather than by x.ker().
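Under a simplified block-cyclic model (block size b over nprocs processes, a stride-1 subrange covering global indices [lo, hi]), the loop nest generated for an overall construct can be sketched as follows. The outer loop takes the kernel index j from the subkernel, and the inner loop visits the part of block j that falls inside the subrange. The names subkernel_blocks and overall_local_indices are hypothetical; the point of the sketch is that, in this stride-1 model, no block yielded by the subkernel needs an emptiness test.

```python
def subkernel_blocks(p, b, nprocs, lo, hi):
    """Hypothetical subkernel: blocks owned by process p that intersect [lo, hi]."""
    first_blk, last_blk = lo // b, hi // b
    first = first_blk + (p - first_blk) % nprocs
    last = last_blk - (last_blk - p) % nprocs
    return range(first, last + 1, nprocs)


def overall_local_indices(p, b, nprocs, lo, hi):
    """Sketch of a translated overall loop for process p over subrange [lo, hi]."""
    visited = []
    for j in subkernel_blocks(p, b, nprocs, lo, hi):  # j plays the role of x.subker()
        start = max(j * b, lo)                        # clip block j to the subrange
        stop = min((j + 1) * b - 1, hi)
        for g in range(start, stop + 1):              # local elements of block j
            visited.append(g)
    return visited


# Blocks of 3 elements dealt cyclically to 2 processes; subrange [4, 17].
print(overall_local_indices(0, 3, 2, 4, 17))  # [6, 7, 8, 12, 13, 14]
print(overall_local_indices(1, 3, 2, 4, 17))  # [4, 5, 9, 10, 11, 15, 16, 17]
```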
In practice, this optimization matters more for the global block enumerations of section 7.6 than for distributed loops. Besides making that enumeration cheaper, the subkernel imposes an ordering on the enumerated blocks, and certain communication operations (specifically, remap) depend on this ordering. If the alignment stride of the parent range is negative, the stride returned by subker().str() is also negative.
Figure 7.14: Summary of the subkernel-based translation scheme for the overall construct with level greater than 0.
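The role of the sign of subker().str() can be illustrated with a third sketch, again a simplified model rather than the library's definition. Range index i is assumed to sit at template position offset + align_stride * i, with align_stride equal to +1 or -1, the template is distributed block-cyclically, and the hypothetical subkernel triplet takes the sign of align_stride. Enumerating blocks in subkernel order then yields each process's elements in increasing range-index order even when the range runs backwards through the template. That this is the ordering remap needs is our illustrative assumption, not a description of how remap is implemented.

```python
def subkernel(p, b, nprocs, t_lo, t_hi, align_stride):
    """Hypothetical subkernel triplet (start, stop, step), stop inclusive.

    The step takes the sign of the alignment stride, so blocks are visited in
    the order in which increasing range indices reach them.
    """
    first_blk, last_blk = t_lo // b, t_hi // b
    first = first_blk + (p - first_blk) % nprocs
    last = last_blk - (last_blk - p) % nprocs
    if align_stride > 0:
        return first, last, nprocs
    return last, first, -nprocs


def local_range_indices(p, b, nprocs, m, offset, align_stride):
    """Range indices of an m-element range held by process p, in subkernel order.

    Assumes range index i sits at template position offset + align_stride * i,
    with align_stride equal to +1 or -1.
    """
    ends = (offset, offset + align_stride * (m - 1))
    t_lo, t_hi = min(ends), max(ends)
    start, stop, step = subkernel(p, b, nprocs, t_lo, t_hi, align_stride)
    out = []
    for j in range(start, stop + (1 if step > 0 else -1), step):
        cells = range(max(j * b, t_lo), min((j + 1) * b - 1, t_hi) + 1)
        if align_stride < 0:
            cells = reversed(cells)                  # walk each block backwards too
        for t in cells:
            out.append((t - offset) * align_stride)  # invert the alignment (s * s == 1)
    return out


# 8-element range aligned with stride -1 onto template positions 7, 6, ..., 0;
# blocks of 2 template cells dealt cyclically to 2 processes.
print(local_range_indices(0, 2, 2, 8, 7, -1))  # [2, 3, 6, 7]
print(local_range_indices(1, 2, 2, 8, 7, -1))  # [0, 1, 4, 5]
```

Had the blocks been visited with a positive step despite the negative alignment, process 0 would have produced [6, 7, 2, 3] (with each block reversed), so the sign of the step is what keeps local enumeration consistent with range order in this model.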