From haupt@nova.npac.syr.edu Mon May 2 22:24:54 1994
Date: Mon, 2 May 94 21:57:30 EDT
From: Tomasz Haupt
To: paulc@nova.npac.syr.edu
Cc: haupt@nova.npac.syr.edu, gcf@nova.npac.syr.edu
Subject: HPF data mapping

Paul,

This is probably too long. Take whatever you feel will best suit the paper you are writing.

Tom

-------------------------------------------------------------------------

Data Mapping

HPF data alignment and distribution directives allow the programmer to advise the compiler how to assign data objects (typically array elements) to processors' memories. The model is a two-level mapping of data objects to memory regions, referred to as "abstract processors": arrays are first aligned relative to one another, and then this group of arrays is distributed onto a user-defined, rectilinear arrangement of abstract processors. The final mapping, from abstract to physical processors, is not specified by HPF and is language-processor dependent.

The alignment itself is logically accomplished in two steps. First, the index space spanned by the array that serves as the align target defines a natural template of that array. Then an alignee is associated with this template. In addition, HPF allows users to declare a template explicitly; this is particularly convenient when aligning arrays of different sizes and/or shapes. It is the template (either natural or explicit) that is distributed onto the abstract processors. This means that all array elements aligned with a given element of the template are mapped to the same processor. In this way locality of data is enforced.

Arrays and other data objects that are not explicitly distributed using the compiler directives are mapped according to an implementation-dependent default distribution. One possible choice of default distribution is replication: each processor is given its own copy of the data.

The data mapping is declared using the declarative directives PROCESSORS, ALIGN, DISTRIBUTE, and, optionally, TEMPLATE.
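The two-level mapping described above (array element to template cell, then template cell to abstract processor) can be sketched in a few lines of Python. This is only an illustration, not HPF syntax or any compiler's runtime: the template extent, the processor count, the two alignment functions, and the BLOCK block size of ceil(size/procs) are all assumptions chosen for the example.

```python
from math import ceil

# Hypothetical sizes, chosen only for illustration.
TEMPLATE_SIZE = 16   # extent of an assumed 1-D template T(1:16)
NUM_PROCS = 4        # assumed 1-D arrangement of abstract processors P(1:4)


def align_a(i):
    """Assumed alignment of A(i) with T(i): the identity."""
    return i


def align_b(j):
    """Assumed alignment of B(j) with T(2*j - 1): stride 2, shift -1."""
    return 2 * j - 1


def block_owner(t, size=TEMPLATE_SIZE, procs=NUM_PROCS):
    """1-based abstract processor owning template cell t under BLOCK.

    Assumes the block size is ceil(size / procs).
    """
    block = ceil(size / procs)
    return (t - 1) // block + 1


def owner_of_a(i):
    """Step 1: align A(i) to a template cell. Step 2: distribute that cell."""
    return block_owner(align_a(i))


def owner_of_b(j):
    """Same two steps for B(j), using its own alignment function."""
    return block_owner(align_b(j))
```

Because only the template is distributed, B(3) and A(5), which are both aligned with T(5) in this sketch, necessarily end up on the same abstract processor; that is the locality property stated above.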
In addition, arrays may be remapped at run time. To this end, an array must be declared using the DYNAMIC directive, and the actual remapping is triggered by the executable directives REALIGN and REDISTRIBUTE.

It is important to note that a template is not a first-class Fortran 90 object, in the sense that it cannot be passed to a subprogram as an argument. As a consequence, a distributed array passed to a subprogram is aligned either with the natural template of the actual argument or with a user-defined template. In both cases this may lead to an implicit remapping of the array at run time. To allow more efficient implementations, in particular when the mapping of the actual argument is known at compile time, HPF provides the INHERIT directive, which specifies that a dummy argument should be aligned with a copy of the template of the corresponding actual argument in the same way the actual argument is aligned. In addition, the user may use a special syntax of the ALIGN and DISTRIBUTE directives (with asterisks preceding the align and/or distribute attributes) that serves as an assertion about, rather than a declaration of, the mapping of the dummy argument.

In HPF, arrays may be aligned with one another in many ways. The repertoire includes shifts, strides, or any other linear function of a subscript (i.e., n*i + m), transposition of indices, and collapse or replication of an array's dimensions. Skewed or irregular alignments are, however, not allowed.

The template may be distributed in BLOCK, CYCLIC, BLOCK(n), or CYCLIC(n) fashion. In addition, any dimension of the template may be collapsed or replicated onto a processor grid (note that this does not change the relative alignment of the arrays!). The BLOCK distribution specifies that the template should be distributed across the set of abstract processors by slicing it uniformly into blocks of contiguous elements.
The BLOCK(n) distribution specifies that groups of exactly n elements should be mapped to successive abstract processors; the directive can be satisfied only if there are at least (array size)/n abstract processors. The CYCLIC(n) distribution specifies that successive blocks of n array elements are dealt out to successive abstract processors in round-robin fashion. Finally, the CYCLIC distribution is equivalent to CYCLIC(1).
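
The four distribution formats described above can be compared with a small Python sketch that computes, for a 1-based index i, the 1-based abstract processor that owns it. The function names are hypothetical, and the sketch assumes that plain BLOCK uses a block size of ceil(size/procs); it is meant as an illustration of the rules, not as a description of any particular compiler.

```python
from math import ceil


def owner_block_n(i, n, size, procs):
    """BLOCK(n): groups of n consecutive elements go to successive processors.

    Raises ValueError when there are fewer than size/n processors, which is
    the case in which the directive cannot be satisfied.
    """
    if n * procs < size:
        raise ValueError("BLOCK(n) needs at least size/n abstract processors")
    return (i - 1) // n + 1


def owner_block(i, size, procs):
    """BLOCK: uniform contiguous blocks (assumed block size ceil(size/procs))."""
    return owner_block_n(i, ceil(size / procs), size, procs)


def owner_cyclic_n(i, n, procs):
    """CYCLIC(n): blocks of n elements dealt out round-robin."""
    return ((i - 1) // n) % procs + 1


def owner_cyclic(i, procs):
    """CYCLIC: equivalent to CYCLIC(1)."""
    return owner_cyclic_n(i, 1, procs)


def owners(owner_fn, size, *args):
    """Owner of every index 1..size, for side-by-side comparison."""
    return [owner_fn(i, *args) for i in range(1, size + 1)]
```

For example, with 8 elements, owners(owner_block, 8, 8, 4) gives [1, 1, 2, 2, 3, 3, 4, 4], owners(owner_cyclic, 8, 4) gives [1, 2, 3, 4, 1, 2, 3, 4], and owners(owner_cyclic_n, 8, 2, 2) gives [1, 1, 2, 2, 1, 1, 2, 2].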