There is a tradeoff between parallelism and communication
|
The programmer defines the data mapping, and the compiler uses it to assign the computation to processors
|
The underlying assumptions are that:
|
An operation on two or more data objects is likely to be carried out much faster if they all reside on the same processor,
|
And that it may be possible to carry out many such operations concurrently if they can be performed on different processors
|
This is embodied in the "owner computes" rule: for instance, in an assignment such as
-
A(i,j)= .....
-
One brings everything on the right-hand side to the processor "owning" A(i,j) and performs the computation on that processor
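
A minimal sketch of the rule in Python, not taken from the original notes. It assumes a hypothetical block distribution of array rows over p processors and a hypothetical right-hand side A(i,j) = B(i,j) + C(j,i); the names owner and owner_computes are illustrative only. The processors are simulated serially, and the function simply counts how many operands would have to be fetched from another processor.

```python
def owner(i, n, p):
    """Assumed block distribution by rows: index of the processor owning row i."""
    block = -(-n // p)  # ceil(n / p) rows per processor
    return i // block


def owner_computes(B, C, p):
    """Evaluate A(i,j) = B(i,j) + C(j,i) under the owner-computes rule.

    Returns A and the number of operands that live on a processor other
    than the one owning A(i,j), i.e. the communication the rule implies.
    """
    n = len(B)
    A = [[0] * n for _ in range(n)]
    remote = 0
    for i in range(n):
        me = owner(i, n, p)  # processor owning row i of A does this work
        for j in range(n):
            # B(i,j) is in row i, so it is already local to the owner.
            # C(j,i) is in row j, so it may belong to a different processor.
            if owner(j, n, p) != me:
                remote += 1
            A[i][j] = B[i][j] + C[j][i]
    return A, remote


if __name__ == "__main__":
    n = 4
    B = [[i * n + j for j in range(n)] for i in range(n)]
    C = [[10 * (i * n + j) for j in range(n)] for i in range(n)]
    for p in (1, 2, 4):
        _, remote = owner_computes(B, C, p)
        print(p, "processors:", remote, "remote operands")
```

In this toy case, raising p spreads the work over more processors but also raises the count of remote operands, which is the parallelism versus communication tradeoff described above.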
|
The "owner computes" rule is usually a good strategy and often the best one
|