The compiler tries to move communication routines up as far as possible
by analyzing definition-use chains [56].
This may move the scheduling code out of
one or more nested loops, which can significantly reduce overhead.
The code movement may also vectorize some of the
communication. For example, the following loop cannot be written as an array
construct or statement because the loop contains the
user-defined function FOO.
DO i=1, N-3
B(i) = A(i+3) + FOO(C(:,i))
ENDDO
Without the optimization, the compiler would communicate the array element A(i+3) inside the loop, once per iteration. If it applies the optimization, the code becomes:
tmp(1:N-3) = A(4:N)
DO i=1, N-3
B(i) = tmp(i) + FOO(C(:,i))
ENDDO
The communication from A to tmp
is taken outside the loop
and is vectorized.
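The effect on message count can be sketched with a small serial simulation. This is only an illustration under stated assumptions, not the compiler's actual algorithm: it assumes A and B are BLOCK-distributed over a hypothetical `nproc` processors, that the owner of B(i) computes iteration i, and that one vectorized message is sent per sender/receiver pair. The names `owner`, `pair_used`, `n`, `nproc`, and `shift` are invented for this example.

```fortran
program comm_vectorization_sketch
  implicit none
  ! Assumed values for illustration only; not taken from the source.
  integer, parameter :: n = 16, nproc = 4, shift = 3
  integer :: i, src, dst
  integer :: elementwise_msgs, vectorized_msgs
  logical :: pair_used(0:nproc-1, 0:nproc-1)

  elementwise_msgs = 0
  pair_used = .false.

  do i = 1, n - shift
     dst = owner(i)          ! processor that owns B(i) and computes it
     src = owner(i + shift)  ! processor that owns A(i+3)
     if (src /= dst) then
        ! Communication inside the loop: one message per remote element.
        elementwise_msgs = elementwise_msgs + 1
        ! Communication hoisted: elements for the same pair share a message.
        pair_used(src, dst) = .true.
     end if
  end do

  vectorized_msgs = count(pair_used)

  print '(a,i0)', 'messages inside loop: ', elementwise_msgs
  print '(a,i0)', 'messages after hoist: ', vectorized_msgs

contains

  ! Owner of global index k under an assumed BLOCK distribution.
  pure integer function owner(k)
    integer, intent(in) :: k
    integer :: block_size
    block_size = (n + nproc - 1) / nproc
    owner = (k - 1) / block_size
  end function owner

end program comm_vectorization_sketch
```

With the assumed values (16 elements, 4 processors, block size 4), three elements cross each of the three block boundaries, so the in-loop version sends 9 single-element messages while the hoisted version sends 3 block messages.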