So far we have presented techniques used in our compiler that map data onto logical processors. In this section we describe the mapping of logical processors onto physical processors.
There are several advantages of decoupling logical processors from physical system configurations. These advantages include locality, portability and grouping.
Locality: Multiple accesses to consecutive memory locations exhibit
spatial locality. Spatial locality is very important for
distributed memory machines, where arrays representing
spatial locations are distributed across the parallel computer.
For instance, it makes sense to distribute data in such a way
that processors that need to communicate frequently are neighbors in
the hardware topology. This has been shown to be extremely important for
common regular problems in scientific applications such as
relaxation [49]. Our template is a d-dimensional mesh. If this template
is BLOCK distributed on a d-dimensional grid of processors,
neighboring array elements will reside on the same or on neighboring processors.
The grid topology is therefore well suited to preserving spatial locality.
Fortran 90D/HPF makes the logical processor topology a grid whose dimensionality is
the number of dimensions of the DECOMPOSITION as shown in Figure .
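The locality argument can be illustrated with a minimal sketch in Python. It is not the compiler's code: the helper names block_owner and owner_coords, the 0-based indexing, and the block size of ceil(extent / nprocs) are all assumptions made for this example. The sketch computes which logical processor owns each element of a BLOCK-distributed template, so that one can check that adjacent elements land on the same or on adjacent processors.

```python
def block_owner(index, extent, nprocs):
    """Logical processor coordinate that owns `index` along one dimension.

    Assumes a BLOCK distribution with block size ceil(extent / nprocs)
    and 0-based indices (an illustrative convention, not the compiler's).
    """
    block = -(-extent // nprocs)  # ceiling division
    return index // block


def owner_coords(element, shape, grid):
    """Owner of a d-dimensional template element on a d-dimensional grid."""
    return tuple(block_owner(i, n, p) for i, n, p in zip(element, shape, grid))


if __name__ == "__main__":
    shape, grid = (8, 8), (2, 2)          # 8x8 template on a 2x2 logical grid
    print(owner_coords((3, 3), shape, grid))  # (0, 0)
    print(owner_coords((3, 4), shape, grid))  # (0, 1): the adjacent processor
```

Under these assumptions, two elements that are adjacent in the template never have owners whose coordinates differ by more than one in any dimension.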
Portability: The physical topology of a hardware system may be a grid, a tree, a hypercube or some other layout. The mapping for the best (possible) grid topology changes from one physical topology to another. To enhance portability, we separate the physical and logical topologies. Therefore, porting the compiler from one hardware platform to another involves changing the functions that map the logical grid topology to the target hardware.
Grouping: Operations on a subset of the dimensions of an array are very common in scientific programming, e.g., row and column operations on matrices. Fortran 90D/HPF provides intrinsic functions such as SPREAD, SUM, MAXVAL and CSHIFT that let a user specify operations along different dimensions through the DIM parameter. These dimensional operations conceptually group elements in the same dimension. The dimensional array operations result in ``dimensional array communications''. We have designed a set of collective communication routines that operate along one or more dimensions (groups of processors) of the grid. For example, we have developed spread (broadcast along a dimension), shift along dimensions, and concatenate communications. These primitives are discussed in Chapter 5.
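As a rough illustration of grouping, the following Python sketch simulates a spread along one dimension of a logical grid. The function names dimension_group and spread_along, and the dictionary that stands in for per-processor data, are hypothetical; the actual primitives are the ones discussed in Chapter 5.

```python
from itertools import product


def dimension_group(coord, grid, dim):
    """Logical coordinates of all processors that differ from `coord`
    only along dimension `dim` (one group of the grid)."""
    return [coord[:dim] + (k,) + coord[dim + 1:] for k in range(grid[dim])]


def spread_along(values, grid, dim, root_index=0):
    """Simulate a broadcast along `dim`.

    `values` maps each logical coordinate (a tuple) to its local value.
    Every processor receives the value held by the member of its group
    whose coordinate along `dim` equals `root_index`.
    """
    result = {}
    for coord in product(*(range(n) for n in grid)):
        root = coord[:dim] + (root_index,) + coord[dim + 1:]
        result[coord] = values[root]
    return result


if __name__ == "__main__":
    grid = (2, 3)
    print(dimension_group((1, 0), grid, 1))  # [(1, 0), (1, 1), (1, 2)]
```

Each group is defined purely by logical coordinates, so nothing in this sketch depends on the physical machine.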
The performance of the resulting code may be adversely affected if the logical grid to physical system mapping is not efficient. Therefore, one of the goals of these mapping functions is to map nearby processors in the logical grid to physically close processors in the machine architecture.
Definition 2: A logical processor grid consists of d dimensions,
$(P_1, P_2, \ldots, P_d)$, where $P_i$, $1 \le i \le d$, is the size of the $i$th
dimension.
A processor grid mapping function,
$\phi(p_1, p_2, \ldots, p_d) = p$,
maps a processor index in the d-dimensional space,
where $0 \le p_i < P_i$
(i.e., $p_i$ is the index of the logical processor in the $i$th
dimension), and
p is the physical processor number
($0 \le p < \prod_{i=1}^{d} P_i$).
The inverse mapping function $\phi^{-1}$
transforms the processor number p back into a logical grid index.
For example, the grid mapping function and its inverse
for a hypercube using Gray code can be found in [49], and the grid mapping onto a fat tree can be
found in [52].
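For readers unfamiliar with the idea, the sketch below shows the textbook binary-reflected Gray code embedding of a grid into a hypercube, written in Python. It is not taken from [49]; the function names are hypothetical, and the sketch assumes that every grid extent is a power of two. Each coordinate is Gray coded separately and the resulting bit fields are concatenated to form the hypercube node number.

```python
def gray(i):
    """Binary-reflected Gray code of a non-negative integer."""
    return i ^ (i >> 1)


def gray_inverse(g):
    """Inverse of `gray`: XOR together all right shifts of g."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def grid_to_hypercube(coord, grid):
    """Map logical grid coordinates to a hypercube node number.

    Assumption: each extent in `grid` is a power of two.
    """
    node = 0
    for c, n in zip(coord, grid):
        bits = n.bit_length() - 1
        if n != 1 << bits:
            raise ValueError("grid extents must be powers of two")
        node = (node << bits) | gray(c)
    return node


def hypercube_to_grid(node, grid):
    """Recover the logical grid coordinates from a hypercube node number."""
    coord = []
    for n in reversed(grid):
        coord.append(gray_inverse(node & (n - 1)))
        node >>= n.bit_length() - 1
    return tuple(reversed(coord))
```

Because consecutive Gray codes differ in exactly one bit, processors that are neighbors in the logical grid are mapped to hypercube nodes joined by a direct link.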
Figure gives some of the grid mapping functions implemented in
the Fortran 90D/HPF compiler.
The first routine, gridinit, takes the dimensionality of the grid,
d,
and the number of physical processors in each dimension as an array,
and
performs the necessary initializations in order to use the other two grid mapping functions,
gridcoord and gridproc.
The routine gridcoord implements the mapping function
to generate the physical processor number corresponding to the logical grid coordinates specified in the parameter array ``coord(*)''.
Similarly,
the routine gridproc implements the inverse mapping function.
Its input parameter ``proc'' specifies the physical processor id
and its output is the corresponding index in the logical grid which
is stored in the array ``coord(*)''.
The details of these functions can be found in [49].
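The interplay of the three routines can be sketched as follows. The routine names come from the text, but the Python signatures, the module-level state, and the row-major numbering of processors are assumptions made for this example; the actual routines are Fortran-callable and are described in [49].

```python
_grid = {}  # state set up by gridinit and shared by the other two routines


def gridinit(ndim, dims):
    """Record the grid shape and precompute row-major strides."""
    if len(dims) != ndim or any(n < 1 for n in dims):
        raise ValueError("dims must hold ndim positive extents")
    strides = [1] * ndim
    for i in range(ndim - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    _grid["dims"] = tuple(dims)
    _grid["strides"] = tuple(strides)


def gridcoord(coord):
    """Logical grid coordinates -> physical processor number."""
    for c, n in zip(coord, _grid["dims"]):
        if not 0 <= c < n:
            raise ValueError("coordinate outside the logical grid")
    return sum(c * s for c, s in zip(coord, _grid["strides"]))


def gridproc(proc):
    """Physical processor number -> logical grid coordinates."""
    return [proc // s % n for s, n in zip(_grid["strides"], _grid["dims"])]


if __name__ == "__main__":
    gridinit(2, [2, 3])        # a 2x3 logical grid on 6 processors
    print(gridcoord([1, 2]))   # 5
    print(gridproc(5))         # [1, 2]
```

Porting to another topology would, under this design, mean replacing only the bodies of these routines, for example with a Gray code numbering on a hypercube.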
The goal of these functions is to enhance portability. The compiler generates all the communication calls based on the logical coordinates of the processors. The communication routines in turn use the above functions to compute the physical processor ids of involved processors. Another important point to note is that by using the logical grid at the compiler level, masking and grouping are performed using logical grid coordinates.
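The division of labor described above can be illustrated with a small Python sketch of a circular shift along one dimension. The function shift_partners and its to_physical argument are hypothetical: to_physical stands in for whatever mapping function the target machine provides, and the communication routine itself reasons only in logical coordinates.

```python
def shift_partners(coord, grid, dim, offset, to_physical):
    """Physical ids a processor talks to in a circular shift along `dim`.

    Returns (destination, source): the processor this one sends to and
    the processor it receives from. All arithmetic is done on logical
    coordinates; `to_physical` is applied only at the very end.
    """
    def moved(delta):
        c = list(coord)
        c[dim] = (c[dim] + delta) % grid[dim]  # wrap around the grid edge
        return to_physical(tuple(c))

    return moved(offset), moved(-offset)


if __name__ == "__main__":
    # Assumed row-major numbering of a 2x3 grid, for illustration only.
    def row_major(c):
        return c[0] * 3 + c[1]

    print(shift_partners((0, 2), (2, 3), 1, 1, row_major))  # (0, 1)
```

Swapping in a different to_physical changes the processor ids that are returned but leaves the shift logic untouched.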