Distributed Memory Machines -- Notes
Conceptually, the nCUBE, CM-5, Paragon, SP-2, and Beowulf PC clusters are quite similar.
- Bandwidth and latency of the interconnects differ
- The network topology differs: a two-dimensional mesh for the Paragon, a fat tree for the CM-5, a hypercube for the nCUBE, and a multistage switch for the SP-2
To program these machines:
- Divide the problem so as to minimize the number of messages while retaining parallelism
- Convert all references to global data structures into references to local pieces (explicit messages bring distant data into local variables)
- Optimization: Pack small messages together to amortize the fixed per-message overhead (almost always needed)
- Optimization: Carefully schedule messages (usually handled by the library)