An important and interesting aspect of parallel computer systems is how the individual processors communicate with one another. This is particularly important for distributed-memory systems, but it also matters for shared-memory systems, since the connection to the shared memory can be implemented by several different communication schemes. We next briefly discuss a number of the more common schemes.
A distributed-memory machine consists of a set of processors linked by an interconnection network. Each processor has its own memory, which is directly accessible only by that processor. Data exchange and global operations among processors are accomplished through message passing. Each processor is connected to a fixed number of other processors in some regular geometry (Figure 4.3), such as a ring, a 2-D mesh (e.g., Intel Touchstone Delta and Intel Paragon), a fat tree (e.g., TMC CM-5), or a hypercube (e.g., Intel iPSC/860 and nCUBE 2).
Writing an efficient program for a distributed-memory machine is more difficult than programming a sequential machine or a shared-memory machine. Several issues arise in programming distributed-memory machines that must be carefully addressed:
Load balancing and reduction of communication cost are two key factors for achieving good performance on distributed-memory machines. They are discussed in the implementation section.
In Appendix , we describe important features of the three distributed-memory architectures that were used in this thesis. We will focus in detail on the architecture of the Connection Machine CM-5 and also comment briefly on the Intel and the IBM SP-1.