The communication library routines try to aggregate messages [61][17][37]
(corresponding to several array sections) into a single larger message,
possibly at the expense of extra copying into a contiguous buffer.
A communication routine first calculates the largest possible array section
to be sent from this processor to each of the others. Each such section may
consist of several contiguous blocks of data. The routine then tries to sort
these blocks by destination and copies the noncontiguous array sections
(messages) into contiguous message buffers.
Messages with an identical destination processor can then be collected into
a single communication operation, as shown in Figure .
The gain from message aggregation is similar to vectorization in
that multiple communication operations can be eliminated at the cost of
increasing the message length.
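
The aggregation steps described above can be illustrated with a minimal sketch. It is not the library's actual routine: the function name `aggregate_messages`, the representation of an array section as a `(destination, values)` pair, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_messages(sections):
    """Pack array sections into one contiguous buffer per destination.

    `sections` is a hypothetical list of (dest, values) pairs, where each
    pair stands for one contiguous block of data bound for processor `dest`.
    """
    # Sort by destination so blocks for the same processor become adjacent.
    # sorted() is stable, so blocks keep their original relative order.
    ordered = sorted(sections, key=lambda s: s[0])

    buffers = defaultdict(list)  # dest -> contiguous message buffer
    offsets = defaultdict(list)  # dest -> (start, length) of each block
    for dest, block in ordered:
        offsets[dest].append((len(buffers[dest]), len(block)))
        buffers[dest].extend(block)  # the extra copy into the buffer
    return dict(buffers), dict(offsets)


# Four array sections, but only two distinct destination processors.
sections = [(2, [1.0, 2.0]), (1, [3.0]), (2, [4.0, 5.0, 6.0]), (1, [7.0, 8.0])]
buffers, offsets = aggregate_messages(sections)
```

In this example the four sections collapse into two messages, one per destination. The total volume of data is unchanged, but each message is longer, and the recorded offsets let the receiver unpack the individual blocks.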