Implementation of DISTRIBUTE (2)
Adjust loops (including implicit loops)
- Pick a reference in the loop (e.g. A(J+1))
- Each processor executes only the iterations for which that reference is local (e.g. J = lb-1:ub-1 if the processor owns A(lb:ub))
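
The bounds adjustment above can be sketched in a few lines. This is only an illustration, not how any particular compiler does it: it assumes a BLOCK distribution with block size ceiling(n/nprocs), 1-based array indices, and 0-based processor numbers. The names block_bounds and local_iterations are hypothetical.

```python
def block_bounds(n, nprocs, p):
    """Inclusive 1-based range (lb, ub) of A(1:n) owned by processor p.

    Assumes a BLOCK distribution with block size ceiling(n / nprocs);
    p is a 0-based processor number.
    """
    block = -(-n // nprocs)          # ceiling division
    lb = p * block + 1
    ub = min((p + 1) * block, n)
    return lb, ub


def local_iterations(n, nprocs, p, offset, loop_lo, loop_hi):
    """Iterations J in loop_lo..loop_hi for which A(J+offset) is local to p.

    A(J+offset) is local when lb <= J+offset <= ub, i.e. J in
    lb-offset..ub-offset; that range is then clipped to the original loop.
    The result is empty when lo > hi.
    """
    lb, ub = block_bounds(n, nprocs, p)
    lo = max(loop_lo, lb - offset)
    hi = min(loop_hi, ub - offset)
    return lo, hi


# DO J = 1, 15 with reference A(J+1), A(1:16) distributed over 4 processors
for p in range(4):
    print(p, local_iterations(16, 4, p, 1, 1, 15))
```

With the reference A(J+1), each processor's range is its owned range shifted down by one (lb-1:ub-1), clipped to the original loop bounds, so the four ranges together cover J = 1..15 exactly once.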
Handle nonlocal data
- Distributed memory: allocate buffer space, then SEND/RECV the needed data
- Shared memory: access the data directly, or allocate and fill local copies
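
As a rough illustration of the distributed-memory case, the sketch below simulates the loop A(J+1) = B(J), J = 1..n-1, in a single Python process. It assumes A and B are both BLOCK-distributed and that A starts as zeros; the SEND/RECV pair is modelled by a plain copy into a one-element buffer per processor. The function name shift_with_buffers and the buffer layout are hypothetical, and no real message-passing library is used.

```python
def shift_with_buffers(b_global, nprocs):
    """Simulate A(J+1) = B(J) with A and B BLOCK-distributed over nprocs.

    Lists are 0-based here, so the loop reads as a[g] = b[g-1] for g >= 1.
    Each simulated processor touches only its own block of B plus a
    one-element receive buffer holding the left neighbour's last element.
    """
    n = len(b_global)
    block = -(-n // nprocs)                      # ceiling division
    local_b = [b_global[p * block:min((p + 1) * block, n)]
               for p in range(nprocs)]

    # "SEND/RECV": processor p-1 sends its last element of B to processor p.
    recv_buf = [None] * nprocs
    for p in range(1, nprocs):
        if local_b[p]:                           # skip processors with no data
            recv_buf[p] = local_b[p - 1][-1]

    # Each processor assigns the elements of A it owns.
    local_a = [[0] * len(local_b[p]) for p in range(nprocs)]
    for p in range(nprocs):
        for i in range(len(local_a[p])):
            if i > 0:
                local_a[p][i] = local_b[p][i - 1]    # local data
            elif p > 0:
                local_a[p][i] = recv_buf[p]          # nonlocal data, from buffer
            # i == 0 and p == 0 is A(1), which the loop never assigns

    # Gather the blocks only to inspect the result.
    return [x for blk in local_a for x in blk]


print(shift_with_buffers([10, 20, 30, 40, 50, 60, 70, 80], 4))
```

On a shared-memory machine the same loop could read b_global[g - 1] directly instead of going through recv_buf; making local copies, as in the buffer version, is the alternative the last bullet mentions.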