Use owner-computes rule to partition computation
-
See implementation of DISTRIBUTE
|
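As a rough illustration of the owner-computes rule, the sketch below assumes a one-dimensional block distribution and the statement `a[i] = b[i] + c[i]`. The helper names (`block_bounds`, `owner`, `owner_computes`) are hypothetical and are not taken from any DISTRIBUTE implementation. The processors are simulated by a sequential loop over ranks.

```python
def block_bounds(n, nprocs, rank):
    """Half-open range [lo, hi) of indices owned by `rank` (assumed block distribution)."""
    block = -(-n // nprocs)  # ceiling division: size of each block
    lo = min(rank * block, n)
    hi = min(lo + block, n)
    return lo, hi


def owner(i, n, nprocs):
    """Rank that owns element i under the same assumed block distribution."""
    block = -(-n // nprocs)
    return i // block


def owner_computes(a, b, c, nprocs):
    """Execute a[i] = b[i] + c[i]; each rank runs only the iterations whose a[i] it owns."""
    n = len(a)
    for rank in range(nprocs):  # stands in for nprocs processors running concurrently
        lo, hi = block_bounds(n, nprocs, rank)
        for i in range(lo, hi):
            # b[i] and c[i] may live on another rank; a real system would fetch them first.
            a[i] = b[i] + c[i]
    return a
```

With 10 elements on 4 processors, this gives ranks 0 to 2 three elements each and rank 3 the remaining one; every iteration is executed exactly once, by the owner of its left-hand side.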
Semantics of the construct allow parallelism
-
All rows of dependence diagram can execute in parallel
|
Dependence analysis can further limit synchronization
-
If RHS and LHS do not access the same element, no synchronization is needed
|
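The test stated above can be sketched as a set-intersection check. This is a minimal illustration, assuming both sides refer to the same array and that the index expressions can be enumerated over the iteration space; `needs_synchronization` and its parameters are hypothetical names, not part of any compiler.

```python
def needs_synchronization(iterations, lhs_index, rhs_indices):
    """Return True if some element is both written (LHS) and read (RHS).

    iterations  : iterable of loop index values
    lhs_index   : function mapping an index value to the element written
    rhs_indices : list of functions mapping an index value to an element read
    Assumes the LHS and RHS references are to the same array.
    """
    written = {lhs_index(i) for i in iterations}
    read = {idx(i) for i in iterations for idx in rhs_indices}
    return not written.isdisjoint(read)


# a[i] = a[i-1] + 1 for i = 1..7: elements 1..6 are both read and written.
shifted = needs_synchronization(range(1, 8), lambda i: i, [lambda i: i - 1])

# a[2*i] = a[2*i+1] for i = 0..3: even elements written, odd elements read.
disjoint = needs_synchronization(range(4), lambda i: 2 * i, [lambda i: 2 * i + 1])
```

The check is deliberately conservative: it reports a conflict whenever the written and read element sets overlap at all, which is the condition given above.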
Data need only be moved at synchronization points
-
Before each RHS is evaluated and before each LHS is assigned
-
These may be combined with those of the previous or following statement
|
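The two synchronization points can be illustrated with a small simulation. It assumes the construct evaluates every right-hand side before any left-hand side is assigned (an assumption here, since the construct is not named in this section) and uses the hypothetical statement `a[i] = a[i-1] + a[i+1]`. The function names are illustrative only.

```python
def two_phase_assign(a):
    """a[i] = a[i-1] + a[i+1] on interior elements, all RHS evaluated first."""
    n = len(a)
    # Synchronization point 1: operands owned elsewhere would be moved here,
    # before the right-hand sides are evaluated.
    rhs = {i: a[i - 1] + a[i + 1] for i in range(1, n - 1)}
    # Synchronization point 2: computed values would be moved to the owners
    # of the left-hand-side elements here, before the assignments.
    for i, value in rhs.items():
        a[i] = value
    return a


def sequential_loop(a):
    """Same statement as an ordinary loop, for comparison."""
    for i in range(1, len(a) - 1):
        a[i] = a[i - 1] + a[i + 1]
    return a
```

Starting from `[1, 2, 3, 4, 5]`, the two-phase version yields `[1, 4, 6, 8, 5]`, while the sequential loop yields `[1, 4, 8, 13, 5]` because later iterations see already-updated values. No data needs to move anywhere except at the two marked points.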