Implementation of DISTRIBUTE
Allocate enough memory on each processor for its section of each distributed array
- Distributed memory: 1 malloc per processor
- Shared memory: 1 shared area, partitioned among the processors
Adjust indexing
- Distributed memory: translate global indices <-> local numbering
- Shared memory: permute elements so each processor's elements are stored together
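
The two steps above can be sketched for one concrete case. The following is a minimal illustration only, assuming a 1-D array with a CYCLIC distribution and 0-based indices; the function names (cyclic_extent, distribute, shared_layout, and so on) are hypothetical and not taken from any HPF compiler. Python lists stand in for the per-processor malloc and for the single shared area.

```python
# Sketch: CYCLIC distribution of a 1-D array over p processors (0-based).
# All names here are illustrative assumptions, not a real compiler's API.

def cyclic_owner(g, p):
    """Processor that owns global index g."""
    return g % p

def cyclic_local(g, p):
    """Local index of global index g on its owning processor."""
    return g // p

def cyclic_global(rank, l, p):
    """Inverse translation: local index l on processor rank -> global index."""
    return l * p + rank

def cyclic_extent(n, p, rank):
    """Number of elements of an n-element array stored on processor rank."""
    return (n - rank + p - 1) // p

def distribute(data, p):
    """Distributed memory: one separate allocation per processor."""
    n = len(data)
    local = [[None] * cyclic_extent(n, p, r) for r in range(p)]
    for g, v in enumerate(data):
        local[cyclic_owner(g, p)][cyclic_local(g, p)] = v
    return local

def shared_layout(data, p):
    """Shared memory: one area, permuted so each processor's
    elements occupy a contiguous region starting at bases[rank]."""
    n = len(data)
    bases, total = [], 0
    for r in range(p):
        bases.append(total)
        total += cyclic_extent(n, p, r)
    shared = [None] * n
    for g, v in enumerate(data):
        shared[bases[cyclic_owner(g, p)] + cyclic_local(g, p)] = v
    return shared, bases
```

With 10 elements on 4 processors, processor 1 would hold global indices 1, 5, and 9 as local indices 0, 1, and 2. In the shared-memory variant the same three elements sit next to each other in the shared area, which is the point of the permutation. For a BLOCK distribution the permutation would be the identity, since each processor's elements are already contiguous.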