One important consideration for message passing on
distributed-memory machines is the setup time required to send a
message. Typically, this cost is equivalent to the cost of sending
hundreds of bytes.
Message vectorization combines messages with the same source and destination
into a single message to reduce this overhead [61][17].
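To illustrate why combining messages pays off, here is a minimal sketch. It assumes a simple linear cost model (a fixed setup time plus a per-byte transfer time). The constants `T_SETUP` and `T_BYTE`, the 8-byte element size, and the helper names are hypothetical and were chosen only to match the "hundreds of bytes" setup cost described above. The `vectorize` function groups `(src, dst, payload)` messages by their source/destination pair, so each pair sends one message.

```python
from collections import defaultdict

# Hypothetical linear cost model: setup time plus per-byte time (arbitrary units).
# T_SETUP is assumed equal to the time needed to send 100 bytes.
T_SETUP = 100.0
T_BYTE = 1.0
ELEM_BYTES = 8  # assumed size of one array element in bytes


def message_cost(nbytes):
    """Cost of one message carrying nbytes of data under the linear model."""
    return T_SETUP + T_BYTE * nbytes


def total_cost(messages):
    """Sum the cost of a list of (src, dst, payload) messages."""
    return sum(message_cost(len(payload) * ELEM_BYTES) for _, _, payload in messages)


def vectorize(messages):
    """Combine messages with the same (src, dst) pair into a single message."""
    combined = defaultdict(list)
    for src, dst, payload in messages:
        combined[(src, dst)].extend(payload)
    return [(src, dst, payload) for (src, dst), payload in combined.items()]


# Ten single-element messages from processor 0 to processor 1.
individual = [(0, 1, [i]) for i in range(10)]
combined = vectorize(individual)
print(total_cost(individual))  # 10 * (100 + 8) = 1080.0
print(total_cost(combined))    # 100 + 80 = 180.0
```

Under this model the setup time is paid once per source/destination pair instead of once per element. The saving grows with the number of elements that would otherwise be sent separately.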
Since in Fortran 90D/HPF we parallelize only array
assignments and forall loops, there are no data dependencies between
different loop iterations. Thus, all the required communication can be
performed before or after the execution of a loop on each of the processors
involved, as shown in Figure .
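The following is a minimal simulation of this "communicate first, then compute" structure. It models a forall such as `A(i) = B(i+1)` on block-distributed arrays. The function name, the list-of-blocks representation, and the choice of a shift by one are all illustrative assumptions. This is not the compiler's actual code generation. Because no iteration depends on another, each processor can receive its one non-local element of `B` before the loop, and the loop body is then purely local.

```python
def shifted_assign(b_blocks):
    """Simulate the forall A(i) = B(i+1), i = 1..N-1, on block-distributed arrays.

    b_blocks[p] is processor p's contiguous block of B (hypothetical layout).
    Returns (a_blocks, number_of_messages).
    """
    nprocs = len(b_blocks)

    # Communication phase: performed entirely before the loop.
    # Processor p needs the first element of block p+1 (one message each).
    received = [None] * nprocs
    messages = 0
    for p in range(nprocs - 1):
        received[p] = b_blocks[p + 1][0]
        messages += 1

    # Computation phase: no communication inside the loop.
    a_blocks = []
    for p, block in enumerate(b_blocks):
        extended = block + ([received[p]] if received[p] is not None else [])
        a_blocks.append([extended[i + 1] for i in range(len(extended) - 1)])
    return a_blocks, messages


a_blocks, msgs = shifted_assign([[1, 2, 3], [4, 5, 6], [7, 8]])
print(a_blocks)  # [[2, 3, 4], [5, 6, 7], [8]]
print(msgs)      # 2
```

Fetching all right-hand-side values before any assignment also matches forall semantics. The right-hand side is evaluated for every index before any element of the left-hand side is updated.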