Naturally parallel implementations work ``best'' if the machine architecture is ``similar'' to that of the problem. This is summarized in Table 5 where to be precise, success requires that the machine architecture ``contains'' (is a superset of) the problem architecture. Thus, both SIMD and MIMD machines express synchronous problems, but SIMD machines are typically unsuitable for loosely synchronous problems.
Software systems need to be designed so that they can express problems well, and be targeted to relevant machines. Software should not be designed for a particular machine model---it expresses problem and not machine characteristics.
We have described those issues at length in [Fox:90p;91g;94a], and
here we present
only a simple table (Table 6) mapping the five problem architectures into
possible software environments. This is presented in a different
fashion for HPF and HPC++ in Figure and Table 7, which
also points out the distinct runtime support needed for each problem
class. One always has a tradeoff between performance and flexibility.
Systems listed under ``asynchronous'' in Table 6 can typically also be
used for synchronous and loosely synchronous problems. As shown in
Figure
, the ``asynchronous'' software used on loosely
synchronous problems will probably provide greater flexibility, but
lower performance than software systems explicitly designed for this
problem class.
Figure: General Applicability of HPF, HPF+, HPC++ Classified by
Problem Architecture and type of Runtime Support needed
Figure: Mapping of Asynchronous, Loosely Synchronous, and Synchronous
Levels or Components of Machine, Software and Problem. Each is
pictured hierarchically with the asynchronous level at the top and
synchronous components at lowest level. Any one of the components may
be absent.
Loosely synchronous problems are in some sense the hardest as they have difficult irregularities which must be expressed with high efficiency by the underlying compiler and runtime systems. We, and others, have discussed this at length, both in general [Choudhary:92d;92e], [Fox:90p], [Goil:94a;95a],
, and in case of High Performance Fortran [Bogucz:94a], [Chapman:94b], [Cheng:94e], [Choudhary:92g;94c], [Fox:94g], [Hawick:95a;95c], [HPF:94a], [HPFapp:95a], [Joubert:95a], [Muller:95a], [Robinson:95a], [Sturler:95a].
Note that Figure refers to ``HPF+''---this is some
extension, called officially HPF2 (and later 3 perhaps) of HPF
[HPF:93a], [HPFF:95a] to fill gaps in the original language. The
current HPF1 handles most synchronous and embarrassingly applications,
but requires extension to handle the adaptive irregular data structures
typical of loosely synchronous problems.
We now quantify these remarks with three case studies, which will link
the material of Sections and
.