
Grid Mapping Functions (Stage 3)

So far we have presented techniques used in our compiler that map data onto logical processors. In this section we describe the mapping of logical processors onto physical processors.

There are several advantages of decoupling logical processors from physical system configurations. These advantages include locality, portability and grouping.

Locality: Repeated access to consecutive memory locations is called spatial locality, and it is very important for distributed memory machines. Arrays representing spatial locations are distributed across the parallel computer, so it makes sense to distribute data in such a way that processors that need to communicate frequently are neighbors in the hardware topology. This has been shown to be extremely important for the regular problems common in scientific applications, such as relaxation [49]. Our template is a d-dimensional mesh. If this template is BLOCK distributed on a d-dimensional grid of processors, neighboring array elements (spatial locality) reside on the same or on neighboring processors. The grid is therefore a very good topology for spatial locality. Fortran 90D/HPF makes the logical processor topology a grid whose dimensionality equals the number of dimensions of the DECOMPOSITION, as shown in Figure .
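
The locality property of a BLOCK distribution can be illustrated with a minimal sketch. It assumes 0-based indices and a block size of ceil(N/P) per dimension; the helper names block_owner and owner_coords are hypothetical and are not routines of the compiler.

```python
def block_owner(index, extent, nprocs):
    """Logical processor coordinate that owns `index` (0-based) along one
    dimension when `extent` elements are BLOCK distributed over `nprocs`."""
    block = (extent + nprocs - 1) // nprocs   # ceil(extent / nprocs)
    return index // block


def owner_coords(element, shape, grid):
    """Owner of a d-dimensional template element on a d-dimensional grid."""
    return tuple(block_owner(i, n, p) for i, n, p in zip(element, shape, grid))


# An 8x8 template on a 2x2 logical grid: each processor holds a 4x4 block.
shape, grid = (8, 8), (2, 2)
print(owner_coords((3, 3), shape, grid))   # (0, 0)
print(owner_coords((3, 4), shape, grid))   # (0, 1)
```

Elements (3, 3) and (3, 4) are adjacent in the template, and their owners are adjacent in the logical grid. Under this distribution the owners of two adjacent elements never differ by more than one grid position.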

Portability: The physical topology of a hardware system may be a grid, a tree, a hypercube or some other layout. The mapping for the best (possible) grid topology changes from one physical topology to another. To enhance portability, we separate the physical and logical topologies. Therefore, porting the compiler from one hardware platform to another involves changing the functions that map the logical grid topology to the target hardware.

Grouping: Operations on a subset of the dimensions of an array are very common in scientific programming, e.g., row and column operations on matrices. Fortran 90D/HPF provides intrinsic functions such as SPREAD, SUM, MAXVAL and CSHIFT that let a user specify operations along different dimensions through the DIM parameter. These dimensional operations conceptually group elements in the same dimension, and they result in "dimensional array communications". We have designed a set of collective communication routines that operate along one or more dimensions (groups of processors) of the grid. For example, we have developed spread (broadcast along a dimension), shift along a dimension, and concatenate communications. These primitives are discussed in Chapter 5.
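
To make the notion of a processor group concrete, the following sketch enumerates the logical processors that take part in an operation along one dimension. The function name dimension_group and the 0-based dimension numbering are assumptions made for illustration (the Fortran DIM argument is 1-based).

```python
def dimension_group(coord, grid, dim):
    """Logical coordinates of all processors that agree with `coord` in every
    dimension except `dim`. This is the set of processors that a spread or
    shift along `dim` would involve for the processor at `coord`."""
    coord = tuple(coord)
    return [coord[:dim] + (k,) + coord[dim + 1:] for k in range(grid[dim])]


grid = (2, 3)   # 2 rows x 3 columns of logical processors
print(dimension_group((1, 0), grid, dim=1))   # [(1, 0), (1, 1), (1, 2)]
print(dimension_group((1, 0), grid, dim=0))   # [(0, 0), (1, 0)]
```

Because the groups are expressed in logical coordinates, the same grouping logic applies unchanged on any physical topology.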

The performance of the resulting code may be adversely affected if the logical grid to physical system mapping is not efficient. Therefore, one of the goals of these mapping functions is to map nearby processors in the logical grid to physically close processors in the machine architecture.

Definition 2: A logical processor grid consists of d dimensions, $(P_1, P_2, \ldots, P_d)$, where $P_i$, $1 \le i \le d$, is the size of the $i$-th dimension. A processor grid mapping function, $\phi(p_1, p_2, \ldots, p_d) = p$, maps a processor index in the d-dimensional space to a physical processor, where $0 \le p_i < P_i$ (i.e., $p_i$ is the index of the logical processor in the $i$-th dimension), and $p$ is the physical processor number ($0 \le p < \prod_{i=1}^{d} P_i$). The inverse mapping function, $\phi^{-1}$, transforms the processor number $p$ back into the logical grid index.

For example, the grid mapping function and its inverse for a hypercube using Gray code can be found in [49], and the grid mapping onto a fat tree can be found in [52].
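
As an illustration of the hypercube case, the sketch below uses the standard binary-reflected Gray code to embed a grid with power-of-two extents into a hypercube. It is not taken from [49]; the function names and the bit layout (one bit field per grid dimension) are assumptions.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def gray_inverse(g):
    """Recover i from its binary-reflected Gray code."""
    i = 0
    while g:
        i ^= g
        g >>= 1
    return i


def grid_to_hypercube(coord, bits):
    """Map logical grid coordinates to a hypercube node id. bits[k] is the
    number of address bits for dimension k, so the grid extent in that
    dimension is 2**bits[k] (power-of-two extents are assumed)."""
    node = 0
    for c, b in zip(coord, bits):
        node = (node << b) | gray(c)
    return node


def hypercube_to_grid(node, bits):
    """Inverse of grid_to_hypercube."""
    coord = []
    for b in reversed(bits):
        coord.append(gray_inverse(node & ((1 << b) - 1)))
        node >>= b
    return tuple(reversed(coord))


# A 4x2 logical grid on a 3-dimensional hypercube (8 nodes).
bits = (2, 1)
a = grid_to_hypercube((1, 0), bits)
b = grid_to_hypercube((2, 0), bits)
print(bin(a ^ b).count("1"))   # 1: the two node ids differ in a single bit
```

Consecutive Gray codes differ in exactly one bit, so processors that are adjacent in the logical grid are placed on directly connected hypercube nodes.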

Figure gives some of the grid mapping functions implemented in the Fortran 90D/HPF compiler. The first routine, gridinit, takes the dimensionality of the grid, d, and the number of physical processors in each dimension as an array, and performs the initializations needed by the other two routines, which implement the grid mapping function and its inverse. The routine gridcoord implements the forward mapping function: it generates the physical processor number corresponding to the logical grid coordinates given in the parameter array "coord(*)". Similarly, the routine gridproc implements the inverse mapping function. Its input parameter "proc" specifies the physical processor id, and its output, stored in the array "coord(*)", is the corresponding index in the logical grid. The details of these functions can be found in [49].
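
The interplay of the three routines can be sketched as follows. Only the routine names come from the text; the argument lists, the 0-based numbering, and the row-major linearization used here are assumptions chosen for illustration and may differ from the compiler's implementation, which depends on the target topology.

```python
_dims = []   # extents of the logical grid, recorded by gridinit


def gridinit(ndim, dims):
    """Record the grid shape: ndim is the dimensionality of the grid and
    dims holds the number of processors in each dimension."""
    if ndim != len(dims) or any(n < 1 for n in dims):
        raise ValueError("dims must hold ndim positive extents")
    _dims[:] = dims


def gridcoord(coord):
    """Logical grid coordinates -> physical processor number
    (row-major linearization, an assumption of this sketch)."""
    proc = 0
    for c, n in zip(coord, _dims):
        if not 0 <= c < n:
            raise ValueError("coordinate outside the grid")
        proc = proc * n + c
    return proc


def gridproc(proc):
    """Physical processor number -> logical grid coordinates
    (inverse of gridcoord)."""
    coord = []
    for n in reversed(_dims):
        proc, c = divmod(proc, n)
        coord.append(c)
    return list(reversed(coord))


gridinit(2, [2, 3])        # a 2x3 logical grid
print(gridcoord([1, 2]))   # 5
print(gridproc(5))         # [1, 2]
```

Porting to another machine would then amount to replacing the bodies of gridcoord and gridproc while leaving their callers unchanged.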

The goal of these functions is to enhance portability. The compiler generates all the communication calls based on the logical coordinates of the processors. The communication routines in turn use the above functions to compute the physical processor ids of involved processors. Another important point to note is that by using the logical grid at the compiler level, masking and grouping are performed using logical grid coordinates.
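
The division of labor described above can be sketched for a circular shift along one dimension. The partner computation is done entirely in logical coordinates, and the translation to physical ids happens only at the end. The function names, the row-major stand-in mapping, and the send/receive convention are assumptions of this sketch.

```python
def to_physical(coord, grid):
    """Stand-in for the logical-to-physical mapping (row-major, assumed)."""
    proc = 0
    for c, n in zip(coord, grid):
        proc = proc * n + c
    return proc


def shift_partners(coord, grid, dim, offset):
    """Physical ids of the processors that the processor at `coord` sends to
    and receives from in a circular shift by `offset` along `dim`."""
    def moved(delta):
        c = list(coord)
        c[dim] = (c[dim] + delta) % grid[dim]   # wrap around within the dimension
        return to_physical(c, grid)
    return moved(offset), moved(-offset)


grid = (2, 3)
dest, src = shift_partners((1, 2), grid, dim=1, offset=1)
print(dest, src)   # 3 4
```

Only to_physical would need to change for a different machine; the partner logic stays in terms of the logical grid.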





zbozkus@
Thu Jul 6 21:09:19 EDT 1995