1. Use the owner-computes rule to partition the computation
   - See the implementation of DISTRIBUTE
2. The semantics of the construct allow parallelism
   - All rows of the dependence diagram can execute in parallel
3. Dependence analysis can further limit synchronization
   - If the RHS and LHS do not access the same element, no synchronization is needed
4. Data need only be moved at synchronization points
   - Before each RHS and before each LHS
   - These synchronization points may be combined with those of the previous/following statement
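Point 1 (owner-computes) can be illustrated with a minimal Python sketch. It assumes an HPF-style BLOCK distribution of a 1-D array of length n over p processors, with block size ceil(n/p). The helper names `block_size`, `block_owner` and `local_iterations` are hypothetical and do not come from the DISTRIBUTE implementation the slide refers to. Each processor executes exactly the iterations whose LHS element it owns.

```python
def block_size(n: int, p: int) -> int:
    # Ceiling division: each processor owns at most this many elements
    # (assumed BLOCK distribution).
    return -(-n // p)


def block_owner(i: int, n: int, p: int) -> int:
    # Processor that owns global element i under the assumed BLOCK distribution.
    return i // block_size(n, p)


def local_iterations(rank: int, lo: int, hi: int, n: int, p: int) -> range:
    # Owner-computes: processor `rank` runs exactly those iterations i in
    # [lo, hi) whose LHS element a[i] it owns. Intersecting the iteration
    # range with the owned block avoids testing every i individually.
    b = block_size(n, p)
    start = max(lo, rank * b)
    stop = min(hi, (rank + 1) * b, n)
    return range(start, max(start, stop))  # empty if there is no overlap


if __name__ == "__main__":
    n, p = 10, 3
    # 0-based iterations 1..8 of a hypothetical assignment a(i) = ...
    for rank in range(p):
        print(rank, list(local_iterations(rank, 1, 9, n, p)))
```

With n = 10 and p = 3, the iterations split as [1, 2, 3], [4, 5, 6, 7] and [8]. Every iteration lands on the owner of its LHS element, so the LHS assignment itself never needs communication.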
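Point 2 can be made concrete with a small sketch. It assumes the construct has FORALL-style semantics, in which every RHS instance is evaluated (reading old values) before any LHS instance is assigned; the slide does not spell this out. Under that assumption the rows, one per index value, are independent. `forall_shift` and `do_loop_shift` are hypothetical names for the hypothetical statement a(i) = a(i-1). The thread pool only illustrates that the RHS rows may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def forall_shift(a):
    # Hypothetical statement a(i) = a(i-1) for i = 1..n-1, with assumed
    # FORALL-style semantics: all RHS values are computed before any LHS write.
    idx = range(1, len(a))
    with ThreadPoolExecutor() as pool:
        # RHS phase: each row reads only old values of `a`, so rows can run in parallel.
        # list() waits for every row to finish; this is the synchronization point.
        tmp = list(pool.map(lambda i: a[i - 1], idx))
    # LHS phase: each row writes a distinct element.
    for i, v in zip(idx, tmp):
        a[i] = v
    return a


def do_loop_shift(a):
    # Sequential DO-loop semantics for comparison: iteration i sees the value
    # written by iteration i-1, so the rows are NOT independent.
    for i in range(1, len(a)):
        a[i] = a[i - 1]
    return a


if __name__ == "__main__":
    print(forall_shift([1, 2, 3, 4]))   # [1, 1, 2, 3]
    print(do_loop_shift([1, 2, 3, 4]))  # [1, 1, 1, 1]
```

The different results show why the construct's semantics, not the loop order, are what permit the parallel execution.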
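Point 3 can be sketched for a simple class of statements: a(i + lhs_off) = f(a(i + rhs_offs[0]), ...) over a 1-D array. The sketch is a Python illustration under that assumption. `needs_sync` and `forall_assign` are hypothetical helpers, and a real compiler would test the subscripts symbolically rather than enumerate index sets. When the set of elements written and the set of elements read are disjoint, the temporary and the barrier between the RHS and LHS phases can be dropped.

```python
from typing import Callable, List, Sequence


def needs_sync(lo: int, hi: int, lhs_off: int, rhs_offs: Sequence[int]) -> bool:
    # True if some element of `a` is both written (LHS) and read (RHS) by the
    # hypothetical statement a(i + lhs_off) = f(a(i + off), ...), i in [lo, hi).
    written = {i + lhs_off for i in range(lo, hi)}
    read = {i + off for i in range(lo, hi) for off in rhs_offs}
    return not written.isdisjoint(read)


def forall_assign(a: List[int], lo: int, hi: int, lhs_off: int,
                  rhs_offs: Sequence[int], f: Callable[..., int]) -> List[int]:
    idx = range(lo, hi)
    if needs_sync(lo, hi, lhs_off, rhs_offs):
        # Overlap: evaluate every RHS into a temporary first, then assign.
        tmp = [f(*(a[i + o] for o in rhs_offs)) for i in idx]
        for i, v in zip(idx, tmp):
            a[i + lhs_off] = v
    else:
        # Disjoint: no row can observe another row's write, so each row may
        # read and assign immediately, with no temporary and no barrier.
        for i in idx:
            a[i + lhs_off] = f(*(a[i + o] for o in rhs_offs))
    return a


if __name__ == "__main__":
    print(forall_assign([1, 2, 3, 4], 1, 4, 0, [-1], lambda x: x))
    print(forall_assign(list(range(8)), 0, 4, 4, [0], lambda x: x))
```

The test is conservative in the same way as the slide's rule. For a(i) = a(i) + 1 it reports overlap, although each row touches only its own element. A finer analysis would ignore such same-row (loop-independent) overlap.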
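Point 4 can be illustrated by simulating a distributed-memory execution of the hypothetical statement a(i) = a(i-1), i = 1..n-1. The simulation assumes a BLOCK distribution and owner-computes on the LHS. Data crosses processor boundaries only at the synchronization point before the RHS, where each processor fetches one "ghost" value from its left neighbour. The RHS phase and the LHS assignments that follow are purely local. `distribute` and `shift_distributed` are hypothetical names, and processors are modelled as list entries rather than real MPI ranks.

```python
from typing import List, Optional, Tuple


def distribute(a: List[int], p: int) -> List[List[int]]:
    # Assumed BLOCK distribution: processor r owns a[r*b:(r+1)*b], b = ceil(n/p).
    b = -(-len(a) // p)
    return [a[r * b:(r + 1) * b] for r in range(p)]


def shift_distributed(blocks: List[List[int]]) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    p = len(blocks)
    messages: List[Tuple[int, int]] = []  # (sender, receiver) pairs

    # Synchronization point before the RHS: the only data movement.
    # Processor r needs old a(i-1) for its first element, owned by r-1.
    ghost: List[Optional[int]] = [None] * p
    for r in range(1, p):
        if blocks[r]:  # under BLOCK, only trailing blocks can be empty
            ghost[r] = blocks[r - 1][-1]
            messages.append((r - 1, r))

    # RHS phase: local only; each processor reads old values (block + ghost).
    rhs = [([ghost[r]] + blk[:-1]) if blk else [] for r, blk in enumerate(blocks)]

    # Synchronization point before the LHS: all RHS values are ready.
    # Owner-computes: each processor assigns only elements it owns (no messages).
    new_blocks = []
    for r, (blk, vals) in enumerate(zip(blocks, rhs)):
        if r == 0 and blk:
            vals[0] = blk[0]  # global i = 0 is outside the range 1..n-1
        new_blocks.append(vals)
    return new_blocks, messages


if __name__ == "__main__":
    new, msgs = shift_distributed(distribute(list(range(10)), 3))
    print([x for blk in new for x in blk], msgs)
```

For ten elements on three processors, the sketch sends exactly two messages, (0, 1) and (1, 2), both before the RHS phase. The flattened result matches the single-processor FORALL result.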