1. Allocate enough memory on each processor for its section of each distributed array
   - Distributed memory: 1 malloc per processor or shrink bounds
   - Shared memory: 1 shared area, with usage divided
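
As a rough illustration of the distributed-memory case, the sketch below assumes a one-dimensional BLOCK distribution with 1-based global indices. The helper names `block_bounds` and `allocate_local` are hypothetical, chosen only for this example. Each processor computes the bounds of its own section and allocates only that many elements.

```python
def block_bounds(n, nprocs, rank):
    """Inclusive 1-based global bounds (lb, ub) of the section owned by `rank`.

    Assumes a BLOCK distribution with block size ceil(n / nprocs);
    trailing ranks may own an empty section (lb > ub).
    """
    block = -(-n // nprocs)              # ceiling division
    lb = rank * block + 1
    ub = min(n, (rank + 1) * block)
    return lb, ub


def allocate_local(n, nprocs, rank):
    """Allocate storage for this processor's section only (one allocation per processor)."""
    lb, ub = block_bounds(n, nprocs, rank)
    size = max(0, ub - lb + 1)
    return [0.0] * size
```

For example, with `n = 10` and `nprocs = 4` the block size is 3, so the last processor allocates a single element rather than the full array.
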
2. Adjust indexing
   - Distributed memory: translate global indices to local numbering
   - Shared memory: permute elements, keep each processor's together
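
The two indexing adjustments can be sketched as follows. This is only an illustration under stated assumptions: 1-based global indices, a BLOCK distribution for the distributed-memory translation, and a CYCLIC distribution for the shared-memory permutation. All function names are hypothetical.

```python
def global_to_local(g, n, nprocs):
    """Distributed memory: map global index g to (owner rank, 1-based local index).

    Assumes a BLOCK distribution with block size ceil(n / nprocs).
    """
    block = -(-n // nprocs)
    owner = (g - 1) // block
    local = (g - 1) % block + 1
    return owner, local


def local_to_global(owner, local, n, nprocs):
    """Inverse of global_to_local under the same BLOCK assumption."""
    block = -(-n // nprocs)
    return owner * block + local


def cyclic_permuted_position(g, n, nprocs):
    """Shared memory: 0-based position of global element g after permuting a
    CYCLIC-distributed array so that each processor's elements are contiguous.
    """
    owner = (g - 1) % nprocs
    local = (g - 1) // nprocs
    # Elements owned by all lower-numbered ranks come first in the shared area.
    base = sum(len(range(r, n, nprocs)) for r in range(owner))
    return base + local
```

With the permutation applied, each processor touches one contiguous region of the shared area instead of every `nprocs`-th element.
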
3. Adjust loops (including implicit loops)
   - Pick a reference in the loop (e.g. A(J+1))
   - Each processor executes the iterations for which that reference is local (e.g. lb-1:ub-1)
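
A minimal sketch of the loop adjustment, assuming the chosen reference has the form A(J + offset) and that the processor owns global indices lb:ub. The function name `local_iterations` and its parameters are hypothetical. The local iteration set is the original loop range intersected with lb-offset:ub-offset.

```python
def local_iterations(loop_lo, loop_hi, lb, ub, offset):
    """Iterations J in loop_lo..loop_hi for which A(J + offset) lies in lb..ub.

    For the reference A(J+1), offset is 1, which gives the range lb-1:ub-1
    clipped to the original loop bounds.
    """
    first = max(loop_lo, lb - offset)
    last = min(loop_hi, ub - offset)
    return range(first, last + 1)        # empty if first > last
```

Because the owned sections do not overlap, every original iteration is executed by exactly one processor.
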
4. Handle nonlocal data
   - Distributed memory: allocate buffer space, SEND/RECV needed data
   - Shared memory: access data directly, or allocate & make local copies
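
The distributed-memory case can be illustrated with a single-process simulation. The sketch below assumes the loop `A(J+1) = B(J)` for J = 1..n-1, a BLOCK distribution of both arrays, and a plain list `mailbox` standing in for real SEND/RECV calls; all names are hypothetical. Each processor needs one nonlocal value, B(lb-1), which it receives into a one-element buffer from its left neighbour.

```python
def block_bounds(n, nprocs, rank):
    """Inclusive 1-based global bounds of the BLOCK section owned by `rank`."""
    block = -(-n // nprocs)
    return rank * block + 1, min(n, (rank + 1) * block)


def shift_with_halo(b_global, nprocs):
    """Simulate A(J+1) = B(J), J = 1..n-1, with each rank holding only its block."""
    n = len(b_global)
    bounds = [block_bounds(n, nprocs, r) for r in range(nprocs)]
    b_local = [b_global[lb - 1:ub] for lb, ub in bounds]
    a_local = [[0] * len(b) for b in b_local]   # A(1) is never assigned, stays 0

    # SEND phase: each rank sends its last B element to its right neighbour.
    mailbox = [None] * nprocs                   # stand-in for message passing
    for r in range(nprocs - 1):
        if b_local[r]:
            mailbox[r + 1] = b_local[r][-1]

    # RECV phase and computation: iterate so that A(J+1) is local.
    for r, (lb, ub) in enumerate(bounds):
        halo = mailbox[r]                       # buffer holding B(lb-1)
        for j in range(max(1, lb - 1), min(n - 1, ub - 1) + 1):
            value = halo if j < lb else b_local[r][j - lb]
            a_local[r][j + 1 - lb] = value

    # Gather the sections only to inspect the result.
    return [x for block in a_local for x in block]
```

In a real message-passing program the mailbox would be replaced by matching send and receive calls; in shared memory the same loop could read B(lb-1) directly instead of using the buffer.
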