The Fortran 90D/HPF compiler relies on a powerful run-time support system, which consists of functions that can be called from the node programs of a distributed memory machine. The run-time system efficiently supports the address translations and data movements that occur when mapping a shared address space program onto a multiple-processor architecture.
Intrinsic functions support many of the basic data parallel operations
in Fortran 90.
The intrinsics not only provide a concise means of expressing
operations on arrays,
but also identify parallel computation patterns that may be difficult to detect automatically. Fortran 90 provides intrinsic functions for operations
such as shift, reduction, transpose, and matrix multiplication.
The intrinsic functions that may induce communication can be
divided into five categories as shown in Table .
The first category requires data to be transferred using low-overhead structured shift communication operations. The second category requires computation on local data followed by a reduction tree over the processors involved in the execution of the intrinsic function. The third category uses multiple broadcast trees to spread data. The fourth category is implemented using unstructured communication patterns. The fifth category is implemented using existing research on parallel matrix algorithms [49]. Some of the intrinsic functions can be further optimized for the underlying hardware architecture. The Fortran 90D/HPF compiler includes more than 500 parallel run-time support routines. The Fortran 90D/HPF run-time system is written in Fortran 77. If the run-time system were implemented in C, the number of routines could be reduced drastically. The run-time implementation details can be found in [51].
Arrays may be redistributed across subroutine boundaries. A dummy argument which is distributed differently from its actual argument in the calling routine is automatically redistributed upon entry to the subroutine, and is automatically redistributed back to its original distribution at subroutine exit. These operations are performed by redistribution primitives which transform, for example, from a block to a cyclic distribution or vice versa.
When a distributed array is passed as an argument to one of the run-time support primitives, it is also necessary to provide information such as its size and its distribution among the nodes of the distributed memory machine. All of this information is stored in a structure called the distributed array descriptor (DAD) [51]. DADs pass compile-time information to the run-time system and carry information between run-time primitives. The run-time primitives query alignment and distribution information from a DAD and act upon that information.
The basic layer of a run-time system should be a portable message-passing system such as Express [57], PVM [58], MPI [59] or PARMACS [60]. Only this approach guarantees the portability of the HPF compiler across many different platforms. PARMACS is based on host-node style programming, which is not well suited to Fortran 90D/HPF compilation. PVM is available for nearly all machines, but its functionality and efficiency are rather limited. MPI supports many different communication modes, in particular both blocking and non-blocking communication.
Our run-time library uses the Express parallel programming environment [57] for its message-passing communication primitives. Express guarantees a level of portability across various platforms, including the Intel iPSC/860, the nCUBE/2, and networks of workstations. We chose Express because it was the environment available at the time of the Fortran 90D/HPF implementation.
In summary, the run-time support system comprises parallel intrinsic functions, communication routines, dynamic data redistribution primitives, and other supporting routines.