Recent parallel computers such as the Intel Paragon and the Thinking Machines CM-5
have a vector unit on each processor. However, most vector units do not
perform well when the loop body contains a branch instruction, and
such loops may not be vectorized at all.
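A forall mask is translated into a conditional, so it may make the generated loop un-vectorizable.
The Fortran 90D/HPF compiler therefore tries to test the mask only in the loops over the indices
it depends on, rather than inside the loops over all indices.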
For example, the Gaussian elimination code contains the following forall:
      forall (i = 1:N, j = k:NN, indx(i) .EQ. -1)
     &    a(i,j) = a(i,j) - fac(i)*row(j)
Here the mask depends only on the i index, not on j, so the mask test can be moved outside the j loop:
      do i=..
        if(indx(i).eq.-1) then
          do j=..
            a(i,j)=....
          enddo
        endif
      enddo
The inner j loop no longer contains a branch, so it becomes a vectorizable loop.
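
The effect of moving the mask can be checked with a small stand-alone program. The following is only a sketch under stated assumptions, not the code the compiler actually generates: the sizes n, k and nn, the test data, and the array names a_inner and a_hoist are made up for illustration, and the loops run over full global bounds rather than the local bounds a distributed-memory compiler would use. It applies the update once with the mask tested inside the j loop and once with the mask hoisted to the i loop, then confirms that both give the same result.

```fortran
! Sketch: hoisting a forall mask that depends only on i out of the j loop.
! n, k, nn and the data values are illustrative assumptions, not from the text.
program mask_hoist
  implicit none
  integer, parameter :: n = 4, k = 2, nn = 5
  real    :: a_inner(n, nn), a_hoist(n, nn), fac(n), row(nn)
  integer :: indx(n), i, j

  ! Arbitrary test data: rows 1 and 3 are selected by the mask.
  indx = [-1, 0, -1, 0]
  fac  = [(real(i), i = 1, n)]
  row  = [(0.5 * real(j), j = 1, nn)]
  do j = 1, nn
    do i = 1, n
      a_inner(i, j) = real(i + j)
    end do
  end do
  a_hoist = a_inner

  ! Mask tested inside the innermost loop:
  ! the body of the j loop contains a branch.
  do i = 1, n
    do j = k, nn
      if (indx(i) == -1) then
        a_inner(i, j) = a_inner(i, j) - fac(i) * row(j)
      end if
    end do
  end do

  ! Mask hoisted to the i loop: it is tested once per i,
  ! leaving a branch-free j loop.
  do i = 1, n
    if (indx(i) == -1) then
      do j = k, nn
        a_hoist(i, j) = a_hoist(i, j) - fac(i) * row(j)
      end do
    end if
  end do

  ! Both forms perform the same arithmetic on the same elements.
  if (any(a_inner /= a_hoist)) error stop "hoisted loop changed the result"
  print *, "results match"
end program mask_hoist
```

The hoisting is safe here because each a(i,j) is computed only from its own old value, fac(i) and row(j), so reordering the mask test does not change which elements are updated or what values they receive.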