1 |
Use the owner-computes rule to partition the computation
See implementation of DISTRIBUTE
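
A minimal sketch of the owner-computes rule, assuming a one-dimensional block distribution with a ceiling-sized block. The slides do not say which distribution is in use, and the names `block_owner` and `owned_iterations` are hypothetical. Each processor runs only the iterations whose left-hand-side element it owns.

```python
def block_owner(i, n, nprocs):
    """Rank that owns global index i when n elements are split into blocks.

    Assumption: block size is ceil(n / nprocs), so trailing ranks may own
    fewer elements (or none).
    """
    block = -(-n // nprocs)  # ceiling division
    return i // block


def owned_iterations(rank, n, nprocs):
    """Iterations that `rank` executes under owner-computes for a[i] = ..."""
    return [i for i in range(n) if block_owner(i, n, nprocs) == rank]


if __name__ == "__main__":
    n, nprocs = 10, 4
    for rank in range(nprocs):
        print(rank, owned_iterations(rank, n, nprocs))
```

Every iteration is assigned to exactly one rank, so the partition covers the whole index space without overlap.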
2 |
Semantics of the construct allow parallelism
All rows of the dependence diagram can execute in parallel
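
A sketch of why the rows are independent, assuming the construct evaluates every right-hand side from the old values before storing to any left-hand side. That two-phase reading is an assumption here, and `parallel_assign` is a hypothetical helper, not part of any library. Within each phase the iterations do not depend on one another, so they could run in any order or concurrently.

```python
def parallel_assign(a, lo, hi, rhs):
    """Apply a[i] = rhs(a, i) for lo <= i < hi with two-phase semantics."""
    # Phase 1: every RHS reads only the old contents of a.
    values = [rhs(a, i) for i in range(lo, hi)]
    # Phase 2: every LHS is stored; no iteration reads a during this phase.
    for offset, i in enumerate(range(lo, hi)):
        a[i] = values[offset]
    return a


def sequential_assign(a, lo, hi, rhs):
    """Ordinary loop for comparison: later iterations see earlier stores."""
    for i in range(lo, hi):
        a[i] = rhs(a, i)
    return a


if __name__ == "__main__":
    stencil = lambda a, i: a[i - 1] + a[i + 1]
    print(parallel_assign([1, 2, 3, 4, 5], 1, 4, stencil))
    print(sequential_assign([1, 2, 3, 4, 5], 1, 4, stencil))
```

The two functions give different results on the same input, which shows that the parallelism comes from the construct's semantics rather than from the loop body.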
3 |
Dependence analysis can further limit synchronization
If RHS and LHS do not access the same element, no synchronization is needed
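
The test above can be sketched as a brute-force check over a small iteration space. This assumes the statement reads and writes a single array, and `needs_synchronization` is a hypothetical name. A real compiler would reason about the subscript expressions symbolically instead of enumerating them.

```python
def needs_synchronization(iterations, lhs_index, rhs_indices):
    """True if some element written by the statement is also read by it.

    lhs_index(i)   -> the element written by iteration i
    rhs_indices(i) -> iterable of elements read by iteration i
    """
    iterations = list(iterations)
    written = {lhs_index(i) for i in iterations}
    read = {j for i in iterations for j in rhs_indices(i)}
    return not written.isdisjoint(read)


if __name__ == "__main__":
    # a[2*i] = a[2*i + 1]: writes even elements, reads odd ones.
    print(needs_synchronization(range(4), lambda i: 2 * i, lambda i: [2 * i + 1]))
    # a[i] = a[i - 1]: written and read elements overlap.
    print(needs_synchronization(range(1, 5), lambda i: i, lambda i: [i - 1]))
```

This check is conservative: it also reports an overlap when an iteration reads only the element it writes itself, a case that finer analysis could exclude.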
4 |
Data need only be moved at synchronization points
Before each RHS evaluation and before each LHS assignment
These points may be combined with those of the previous or following statement