SERC proposal # GR/J89507
-------------------------

Efficient translation of High Performance Fortran
-------------------------------------------------
for distributed memory architectures
------------------------------------

We understand that the above SERC proposal, which requested funding for
4 MY (i.e. 2 people for 2 years), has been awarded 2 MY funding, as the
requirement for 2 researchers was not adequately justified in the case
for support. We would like to argue for this award to be increased to
3 MY. Our justifications are as follows:

Currently one of the researchers named in the proposal, Dr. D.B.
Carpenter, is funded for 1 year (starting 1 January 1994) on a JISC
grant to develop the run-time communications library associated with the
HPF translation system that is the basis of this SERC project. We
therefore propose that he continues on this grant until the end of 1994,
when we would like to transfer him to the SERC project.

We propose to employ the other named researcher, Dr. J.H. Merlin, on the
SERC project from 1 April this year. In the first few months he will
concentrate on completing the development of the HPF-to-Fortran 90
translator, and thus work in parallel with Dr. Carpenter. We expect to
have an initial prototype Subset HPF translation system completed by the
end of July 1994. For the remainder of 1994 Dr. Carpenter will
concentrate on bringing this prototype to a standard where it can be
released to the research community. This complete system (translator +
run-time communications library) will form the basis for exploring and
implementing the HPF optimisations that are the main subject of this
SERC project.

From the start of 1995 we would like to employ both Dr. Merlin and Dr.
Carpenter on the SERC project. They have complementary areas of
expertise: Dr. Carpenter in terms of implementing run-time
communications libraries, Dr.
Merlin in terms of compiler construction, the code analysis and
optimisation performed within a compiler, and HPF semantics and
implementation. The listed objectives of the SERC project divide
naturally into these two realms of expertise, and we had planned to
divide them between the two researchers as follows:

1. To design and develop a `layered' run-time library for implementing
   communications in HPF as efficiently as the available static
   information allows. (Dr. Carpenter)

2. To develop methods to determine the optimal placement of
   computations in HPF. (Dr. Merlin)

3. To develop methods to determine the optimal organisation of
   communications in HPF. (Drs. Merlin and Carpenter)

4. To implement the results of this research in a Subset HPF compiler.
   (Translator: Dr. Merlin; run-time library: Dr. Carpenter)

Outline workplan
----------------

In outline, our workplan for these items is as follows:

April 94 - end 94 (Dr. Merlin only)
-----------------------------------

April 94 - July 94: complete prototype translator.

Aug 94 - Dec 94: investigate computation placement; report Dec 94.
(Alignment of temporaries, particularly in the context of the important
FORALL statement, is an area of research that must be completed before
the optimised implementations can be carried out in year 2.)

1995 (Drs. Carpenter and Merlin)
--------------------------------

There are two threads:

1) Optimising transformations, specific to the HPF model. (e.g. overlap
   areas?)

2) Optimising calls to the run-time (communication) library.

Thread 1 is largely ``platform-independent''. Thread 2 will incorporate
platform dependencies as described below.

[Words about thread 1]

Words about thread 2
--------------------

The prototype implementation will involve calls to a high-level
``communication'' library; in fact, a library directly supporting the
HPF model of distributed data. Incorporation of so much functionality in
the run-time library is not conducive to highly efficient execution.
On the other hand, the complexity of the operations involved is such
that ``hardwiring'' these run-time functions into the basic translation
scheme is likely to lead to an unmanageably complex compiler. Consider
also that the detailed implementation of these functions will differ
from platform to platform.

As an example, the top-level run-time library will incorporate
operations to perform general remapping (in the HPF sense) of
distributed arrays. In common cases this remapping might be expressible
as a simple data shift in a processor dimension, and such an operation
may have an especially efficient implementation on some platforms. The
top-layer run-time library for such a platform may check for this
special case and then dispatch a lower-level shift operation. But in
many cases the compiler could have checked statically for this case, and
invoked the shift operation directly, had it been aware of the
platform-specific details. Similar observations apply at successive
layers of the run-time system, in principle down to the lowest layers of
message-passing.

We propose to investigate a scheme where layered run-time libraries are
described systematically in some ``database'', together with conditions
(predicates on the values of the actual parameters of a call) under
which simpler code, directly invoking lower layers of the library, may
be ``in-lined''. By reading this database the translator can generate
efficient in-line code without a priori knowledge of the platform.

Workplan, year 2, ``thread 2''
------------------------------

1) Tidy up prototype run-time libraries, with a view to exploiting
   ``in-lining''. A more abstract design, with less direct concern for
   petty optimisation, is likely to be favoured. 2 MMs.
2) Database design. What kinds of predicates on actual parameters can be
   exploited effectively (by reference to the output of 1)?
   Representation of these conditions and of the code selected when
   they hold. 4 MMs.
3) Modify prototype translator to exploit this database, peeling away
   the layers of the run-time operations to generate efficient
   low-level code. 3 MMs.
4) Synthesis of these optimisations with those investigated in
   thread 1. 3 MMs.
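
Illustrative sketch
-------------------

To make the proposed database scheme concrete, the following is a
minimal sketch only, written in Python for brevity. It is not the
design of the actual translator or library: the routine names
(rt_remap, rt_shift), the parameter names (src_stride, dst_offset,
etc.) and the rule format are all hypothetical, chosen purely for
illustration. Each database entry pairs a top-layer call with a
predicate on its actual parameters and the lower-layer code that may be
in-lined when the predicate is known to hold at compile time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Actual parameters of a run-time call as known statically by the
# translator; None means "not known until run time". (Hypothetical.)
Args = Dict[str, Optional[int]]


@dataclass
class Rule:
    """One database entry for a top-layer library routine."""
    call: str                          # top-layer routine this rule refines
    predicate: Callable[[Args], bool]  # condition on the actual parameters
    emit: Callable[[Args], str]        # lower-layer code to in-line


def is_static_shift(a: Args) -> bool:
    # Assumed special case: both mappings are fully known at compile
    # time and have equal strides, so the remap is a constant shift.
    keys = ("src_stride", "dst_stride", "src_offset", "dst_offset")
    known = all(a.get(k) is not None for k in keys)
    return known and a["src_stride"] == a["dst_stride"]


# The "database": in practice this would be read from a per-platform
# description rather than written into the translator.
DATABASE: List[Rule] = [
    Rule(
        call="remap",
        predicate=is_static_shift,
        emit=lambda a: f"CALL rt_shift(A, {a['dst_offset'] - a['src_offset']})",
    ),
]


def translate_call(call: str, args: Args, database: List[Rule] = DATABASE) -> str:
    """Emit the lowest-layer code whose predicate is statically satisfied."""
    for rule in database:
        if rule.call == call and rule.predicate(args):
            return rule.emit(args)
    # No rule applies: fall back to the general top-layer routine,
    # which performs its own checks at run time.
    return f"CALL rt_{call}(A)"
```

With all four parameters known and equal strides, translate_call emits
the direct shift; if any parameter is unknown at compile time it falls
back to the general remap call. Supporting a new platform would then
mean supplying a different rule list, leaving the translator unchanged.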