Another optimization Fortran 90D/HPF performs involves how expressions are evaluated. For example, consider the following program fragment:
REAL A(N), B(N), C(N), D(N), E(N)
!HPF$ distribute E(BLOCK)
!HPF$ distribute (CYCLIC) :: A, B, C, D
E = (A + B) * (C + D)
In a compiler that blindly applies the owner computes rule,
,
,
and
will be redistributed into temporaries
that match the distribution of
, then the computation will be performed
locally. This may cause four different communications with four different
temporaries.
A more intelligent approach might perform the sum of and
locally
into a cyclically distributed temporary, then
perform the sum of
and
locally into another cyclically distributed
temporary. Then multiply those two temporaries locally into a cyclically
distributed a temporary, finally, redistribute the result into
.
This approach may cause one communication with three temporaries (shown in Figure
(b) ).
To apply the above optimization, the compiler has to evaluate
the expression according the partial order induced by the expression tree
(shown Figure (a) ). However, Li and Chen [40] show
that the problem of determining an optimal static alignment between
the dimension of distinct arrays arrays is NP-complete.
Chatterjee et al. [62] and Bouchitte
et al. [63] propose some heuristic algorithms to evaluate
the expression tree with minimal cost.