1. Minimize forking and synchronization overhead
   - Use one parallel region at the highest possible level, rather than opening a new region around every loop.
   - Mark the outermost possible loop for work sharing.
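2. Keep each processor working on the same data
   - Use a consistent schedule for DO loops that touch the same arrays, so each thread is assigned the same iterations, and therefore the same data, every time.
   - Trust the underlying system not to migrate threads between processors without reason.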
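3. Lay out data to be contiguous
   - Fortran stores arrays in column-major order: consecutive elements of a column are adjacent in memory.
   - Therefore, make the column index the loop variable of the outermost work-shared loop, so each thread processes whole, contiguous columns.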
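The first guideline can be illustrated with a minimal C sketch. The advice above is written for Fortran, but the directives correspond directly: `!$omp parallel do` is `#pragma omp parallel for`, and `!$omp do` is `#pragma omp for`. The kernel, the function names `two_regions` and `one_region`, and the column-major array layout are hypothetical. `two_regions` forks and joins the thread team once per loop. `one_region` opens a single parallel region and uses `omp for` to share the outermost loop of each nest among the threads that already exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel on m-by-n arrays stored column-major
   (element (i,j) at index i + j*m, as in Fortran):
   b = 2*a, then c = b + 1. */

/* Costlier pattern: two parallel regions, so the team is forked and
   joined twice. */
void two_regions(size_t m, size_t n, const double *a, double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp parallel for
    for (size_t j = 0; j < n; j++)          /* outermost loop is work-shared */
        for (size_t i = 0; i < m; i++)
            b[i + j * m] = 2.0 * a[i + j * m];

    #pragma omp parallel for
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            c[i + j * m] = b[i + j * m] + 1.0;
}

/* Preferred pattern: one parallel region encloses both loop nests;
   each "omp for" only divides iterations among the existing team. */
void one_region(size_t m, size_t n, const double *a, double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp parallel
    {
        #pragma omp for
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < m; i++)
                b[i + j * m] = 2.0 * a[i + j * m];
        /* Implicit barrier here: all of b is written before any thread
           starts the next loop, which reads it. */

        #pragma omp for
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < m; i++)
                c[i + j * m] = b[i + j * m] + 1.0;
    }
}
```

Compiled without OpenMP support (for example, GCC without `-fopenmp`), the pragmas are ignored and both functions run serially with identical results, which makes this restructuring easy to test before measuring it.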
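For the second guideline, one way to keep each thread on the same data is to give every work-shared loop over an array the same static schedule. The OpenMP specification guarantees that two `schedule(static)` loops with the same iteration count and the same chunk size (or none), bound to the same parallel region, assign each iteration to the same thread. The sketch below relies on that guarantee, so the columns a thread updates in the first pass are likely still in its cache for the second. The function `two_pass` and the array `x` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-pass update of a column-major m-by-n array x
   (element (i,j) at x[i + j*m]): x = 2*x, then x = x + 1. */
void two_pass(size_t m, size_t n, double *x)
{
    assert(x != NULL);

    #pragma omp parallel
    {
        /* Pass 1: each thread gets a fixed block of columns. */
        #pragma omp for schedule(static) nowait
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < m; i++)
                x[i + j * m] *= 2.0;

        /* Pass 2: same trip count, same static schedule, same parallel
           region, so column j is handled by the thread that updated it in
           pass 1. That is also why "nowait" above is safe: no thread reads
           a column another thread is still writing. */
        #pragma omp for schedule(static)
        for (size_t j = 0; j < n; j++)
            for (size_t i = 0; i < m; i++)
                x[i + j * m] += 1.0;
    }
}
```

A consistent schedule also lets the first loop drop its barrier, which serves the first guideline as well. Thread migration by the operating system would undo this locality; if it turns out to be a problem in practice, binding can be requested through the `OMP_PROC_BIND` environment variable (for example `OMP_PROC_BIND=true`).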
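The third guideline can be sketched in C as well. C arrays are row-major, so this example stores the matrix in Fortran's column-major layout explicitly, using a hypothetical helper `col_major(i, j, m)` that returns the offset of element (i, j) in an array with `m` rows. In Fortran, the good version corresponds to `!$omp parallel do` over `j` wrapped around an inner `do i` loop that updates `a(i, j)`. The function names are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j) in a column-major array with m rows,
   mimicking Fortran's storage order. */
static size_t col_major(size_t i, size_t j, size_t m) { return i + j * m; }

/* Good: the work-shared loop runs over columns j, so each thread owns
   whole columns and its inner loop walks consecutive addresses. */
void scale_by_columns(size_t m, size_t n, double *a, double s)
{
    assert(a != NULL);
    #pragma omp parallel for schedule(static)
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            a[col_major(i, j, m)] *= s;
}

/* Worse: the work-shared loop runs over rows i, so each inner loop
   strides m elements at a time, and threads on neighboring rows write
   to the same cache lines. Same result, poorer locality. */
void scale_by_rows(size_t m, size_t n, double *a, double s)
{
    assert(a != NULL);
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            a[col_major(i, j, m)] *= s;
}
```

Both functions produce the same values; only the memory access pattern differs. For arrays much larger than the cache, the row-wise version is typically slower because of the strided access and the false sharing between threads.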