Naturally parallel implementations work ``best'' if the machine architecture is ``similar'' to that of the problem. This is summarized in Table 5 where to be precise, success requires that the machine architecture ``contains'' (is a superset of) the problem architecture. Thus, both SIMD and MIMD machines express synchronous problems, but SIMD machines are typically unsuitable for loosely synchronous problems.
Table 5: What is the ``Correct'' Machine Architecture for each Problem
Class?
Software systems need to be designed so that they can express problems well, and be targeted to relevant machines. Software should not be designed for a particular machine model---it expresses problem and not machine characteristics.
Table 6: Candidate Software Paradigms for Each Problem
Architectures
We have described those issues at length in [Fox:90p;91g;94a], and here we will just present a simple table (Table 6) mapping the five problem architectures into possible software environments. This is presented in a different fashion for HPF and HPC++ in Figure 9 and Table 7, which also points out the distinct runtime support needed for each problem class. One always has a tradeoff between performance and flexibility. Systems listed under ``asynchronous'' in Table 6 can typically also be used for synchronous and loosely synchronous problems. As shown in Figure 10, the ``asynchronous'' software used on loosely synchronous problems will probably provide greater flexibility, but lower performance than software systems explicitly designed for this problem class.
Table 7: Imprecise Mapping of Problem Classes into Runtime and
Language Terms
Figure 9: General Applicability of HPF, HPF+, HPC++ Classified by
Problem Architecture and type of Runtime Support needed
Figure 10: Mapping of Asynchronous, Loosely Synchronous, and Synchronous
Levels or Components of Machine, Software and Problem. Each is
pictured hierarchically with the asynchronous level at the top and
synchronous components at lowest level. Any one of the components may
be absent.
Loosely synchronous problems are in some sense the hardest as they have difficult irregularities which must be expressed with high efficiency by the underlying compiler and runtime systems. We, and others, have discussed this at length, both in general [Choudhary:92d;92e], [Fox:90p], [Goil:94a;95a],
, and in case of High Performance Fortran [Bogucz:94a], [Chapman:94b], [Cheng:94e], [Choudhary:92g;94c], [Fox:94g], [Hawick:95a;95c], [HPF:94a], [HPFapp:95a], [Joubert:95a], [Muller:95a], [Robinson:95a], [Sturler:95a].
Note that Figure 9 refers to ``HPF+''---this is some extension, called officially HPF2 (and later 3 perhaps) of HPF [HPF:93a], [HPFF:95a] to fill gaps in the original language. The current HPF1 handles most synchronous and embarrassingly applications, but requires extension to handle the adaptive irregular data structures typical of loosely synchronous problems.
We now quantify these remarks with three case studies, which will link the material of Sections 2,3, and 4.