Low Level HPF Compiler Benchmarks
-------------------------------------------------------------------------

Overview
---------

The benchmark suite comprises a number of simple, synthetic applications
that test various aspects of HPF compilation. The current version of the
suite addresses the basic features of HPF and is designed to measure the
performance of early compiler implementations. The kernels concentrate on
the parallel implementation of explicitly parallel statements, i.e., array
assignments, FORALL statements, INDEPENDENT DO loops, and intrinsic
functions, combined with different mapping directives. In addition, the
low level compiler benchmarks address the problem of passing distributed
arrays as arguments to subprograms. Language features not included in the
HPF subset are not addressed in this release of the suite. Future releases
will contain more kernels that address all features of HPF and are
sensitive to advanced compiler transformations. The codes included in this
suite are either adapted from existing benchmark suites (the NAS suite
\cite{NAS}, the Livermore Loops \cite{Liv}, and the Purdue Set
\cite{Rice}) or were developed at Syracuse University.

FORALL statement - kernel FL
----------------------------------------

The FORALL statement provides a convenient syntax for simultaneous
assignments to large groups of array elements. Such assignments lie at the
heart of the data parallel computations that HPF is designed to express.
FORALL was introduced in HPF to generalize Fortran 90 array assignment and
thus make parallelism easier to express. Kernel FL provides several
examples of FORALL statements that are difficult or inconvenient to write
using Fortran 90 syntax.

Explicit template - kernel TL
----------------------------------------

Parallel implementation of array assignments, including FORALL statements,
is a central issue for an early HPF compiler. Given a data distribution,
the compiler distributes the computation over the available processors. An
efficient compiler achieves an optimal load balance with minimal
interprocessor communication. The programmer may help the compiler
minimize interprocessor communication through suitable data mapping, in
particular by defining the relative alignment of different data objects.
This may be achieved by aligning the data objects with an explicitly
declared template. Kernel TL provides an example of this kind.

Communication detection in array assignments - kernels AA, SH, ST, and IR
-------------------------------------------------------------------------

Once the data and the iteration space are distributed, the next step that
strongly influences the efficiency of the resulting code is communication
detection and the generation of code to carry out the data movement. In
general, off-processor data elements must be gathered before an array
assignment is executed, and the results must be scattered to their
destination processors after the assignment completes. In other words,
some array assignments require a preprocessing phase that determines which
off-processor elements are needed and performs the gather; similarly, they
may require a postprocessing (scatter) phase. Many different techniques
may be used to optimize these operations, and to achieve high efficiency
it is particularly important that the compiler recognize structured
communication patterns such as shifts and multicasts.

Kernels AA, SH, and ST introduce different structured communication
patterns, while kernel IR is an example of an array assignment that
requires unstructured communication because of indirection.
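As a rough illustration (not code taken from the suite), the fragment
below collects the kinds of constructs the kernels described above
exercise: a FORALL assignment of the sort kernel FL targets, alignment of
several arrays with an explicit template as in kernel TL, a shift-style
array assignment, and an indirect assignment. All names, extents, and
mappings are placeholders.

  ! Illustrative fragment only; names, extents, and mappings are placeholders.
  PROGRAM SKETCH
     INTEGER, PARAMETER :: N = 1000
     INTEGER :: I
     REAL    :: A(N,N), B(N,N), X(N), Y(N)
     INTEGER :: IDX(N)
  !HPF$ TEMPLATE T(N,N)
  !HPF$ DISTRIBUTE T(BLOCK,BLOCK)
  !HPF$ ALIGN A(I,J) WITH T(I,J)
  !HPF$ ALIGN B(I,J) WITH T(I,J)
  !HPF$ ALIGN X(I)   WITH T(I,1)
  !HPF$ ALIGN Y(I)   WITH T(I,1)
  !HPF$ ALIGN IDX(I) WITH T(I,1)

     B   = 2.0
     X   = 1.0
     IDX = (/ (MOD(7*I, N) + 1, I = 1, N) /)
     A   = 0.0

     ! FORALL of the kind kernel FL exercises: a diagonal assignment is
     ! inconvenient to write with Fortran 90 array syntax alone.
     FORALL (I = 1:N) A(I,I) = 2.0 * X(I)

     ! Shift-style assignment (cf. kernels AA, SH, ST): nearest-neighbour
     ! references map onto structured (shift) communication.
     A(2:N-1,2:N-1) = 0.25 * (B(1:N-2,2:N-1) + B(3:N,2:N-1)  &
                            + B(2:N-1,1:N-2) + B(2:N-1,3:N))

     ! Indirect assignment (cf. kernel IR): the vector subscript makes the
     ! required communication unstructured.
     Y = X(IDX)

     PRINT *, SUM(A), SUM(Y)
  END PROGRAM SKETCH

Under the BLOCK,BLOCK distribution, the shift-style assignment should lead
to regular nearest-neighbour communication, whereas the gather through IDX
is the kind of assignment that needs the preprocessing phase described
above.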
Non-elemental intrinsic functions - kernel RD
-----------------------------------------------

Fortran 90 intrinsics and the HPF library functions offer yet another way
to express parallelism. Kernel RD tests the implementation of several
reduction functions.

Passing distributed arrays as subprogram arguments - kernels AS, IT, IM
------------------------------------------------------------------------

The last group of kernels demonstrates passing distributed arrays as
subprogram arguments. The kernels represent three typical cases (a small
illustrative sketch of the corresponding interface styles is given after
the Summary):

- the known mapping of the actual argument is to be preserved by the dummy
  argument (AS);

- the mapping of the dummy argument is inherited from the actual argument,
  so no remapping is necessary; the mapping is known at compile time (IT);

- the mapping of the dummy argument is identical to that of the actual
  argument, but the mapping is not known at compile time (IM).

Summary
-----------------------------

The synthetic compiler benchmark suite described here is an addition to
the PARKBENCH benchmark kernels and applications. It is not meant as a
tool to evaluate the overall performance of compiler-generated codes.
Rather, it has been introduced as an aid for compiler developers and
implementors in addressing selected aspects of the HPF compilation
process. In its current version, the suite does not comprise a
comprehensive sample of HPF codes; it addresses only the HPF subset. We
hope that in this way we will contribute to the establishment of a
systematic compiler benchmarking methodology, and we intend to continue
our effort to develop a complete, fully representative HPF benchmark
suite.
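To close, here is a minimal sketch of the three argument-passing cases
listed above (kernels AS, IT, and IM). The module, routine, and array
names are placeholders, and the code is illustrative rather than taken
from the suite: the prescriptive BLOCK mapping corresponds to the AS case,
while the INHERIT directive covers the IT and IM cases, the difference
being whether the mapping of the actual argument is known at compile time.

  MODULE KERNELS
  CONTAINS
     ! AS-style interface: the dummy prescribes a BLOCK mapping that is
     ! meant to match the known mapping of the actual argument.
     SUBROUTINE PRESCRIBED(X)
        REAL X(:)
  !HPF$ DISTRIBUTE X(BLOCK)
        X = X + 1.0
     END SUBROUTINE PRESCRIBED

     ! IT/IM-style interface: the dummy inherits whatever mapping the
     ! actual argument has; in the IM case that mapping is not known
     ! until run time.
     SUBROUTINE INHERITED(X)
        REAL X(:)
  !HPF$ INHERIT X
        X = X + 1.0
     END SUBROUTINE INHERITED
  END MODULE KERNELS

  PROGRAM ARGS
     USE KERNELS
     INTEGER, PARAMETER :: N = 1000
     REAL A(N)
  !HPF$ DISTRIBUTE A(BLOCK)
     A = 0.0
     CALL PRESCRIBED(A)
     CALL INHERITED(A)
     PRINT *, SUM(A)          ! expected value: 2*N
  END PROGRAM ARGS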