SERC proposal # GR/J89507
		-------------------------
	Efficient translation of High Performance Fortran
	-------------------------------------------------
		for distributed memory architectures
		------------------------------------

We understand that the above SERC proposal, which requested funding for
4 MY (i.e. 2 people for 2 years), has been awarded 2 MY funding, as
the requirement for 2 researchers was not adequately justified in the
case for support.

We would like to argue for this award to be increased to 3 MY.
Our justifications are as follows:

-- Currently one of the researchers named in the proposal,
Dr. D.B. Carpenter, is funded for 1 year (starting Jan 1 1994)
on a JISC grant to develop the run-time communications library
associated with the HPF translation system that is the basis
of this SERC project.  We therefore propose that he continues 
on this grant until the end of 1994, when we would like to transfer 
him to the SERC project.

We propose to employ the other named researcher, Dr. J.H. Merlin,
on the SERC project from 1 April this year.  In the first few months
he will concentrate on completing the development of the 
HPF-to-Fortran 90 translator, and thus work in parallel with 
Dr. Carpenter.  We expect to have an initial prototype Subset HPF translation
system completed by the end of July 1994.  For the remainder of
1994 Dr. Carpenter will concentrate on bringing this prototype to a 
standard where it can be released to the research community.

This complete system (translator + run-time communications library)
will form the basis for exploring and implementing the HPF optimisations
that are the main subject of this SERC project.  

From the start of 1995 we would like to employ both Dr. Merlin and
Dr. Carpenter on the SERC project.  They have complementary areas of
expertise: Dr. Carpenter in implementing run-time communications
libraries; Dr. Merlin in compiler construction, compile-time code
analysis and optimisation, and HPF semantics and implementation.

The listed objectives of the SERC project divide naturally into 
these two realms of expertise, and we had planned them to be divided between
these two researchers as follows:

1. To design and develop a `layered' run-time library for 
implementing communications in HPF as efficiently as the available
static information allows.  (Dr. Carpenter)

2. To develop methods to determine the optimal placement of
computations in HPF.  (Dr. Merlin)

3. To develop methods to determine the optimal organisation
of communications in HPF.  (Drs. Merlin and Carpenter)

4. To implement the results of this research in a Subset HPF 
compiler.  (Translator -- Dr. Merlin; run-time library -- Dr. Carpenter)

Outline workplan
----------------
In outline, our workplan for these items is as follows:

April 94 - end 94 (Dr. Merlin only)
-----------------------------------
April 94 - July 94:  complete prototype translator

Aug 94 - Dec 94: investigating computation placement -- report Dec 94.

(Alignment of temporaries, particularly in the context of the important
FORALL statement, is an area of research that must be addressed before
the optimised implementations can be carried out in year 2.)

1995 (Drs. Carpenter and Merlin)
--------------------------------

There are two threads:

1) Optimising transformations specific to the HPF model (e.g.
   overlap areas?).

2) Optimising calls to the run-time (communication) library.

Thread 1 is largely ``platform-independent''.  Thread 2 will incorporate
platform dependencies as described below.

[Words about thread 1]

Words about thread 2
--------------------

The prototype implementation will involve calls to a high-level
``communication'' library---in fact a library directly supporting the
HPF model of distributed data.

Incorporation of so much functionality in the run-time library is not
conducive to highly efficient execution.  On the other hand, the
complexity of the operations involved is such that ``hardwiring'' these
run-time functions into the basic translation scheme is likely to lead
to an unmanageably complex compiler---consider also that the detailed
implementation of these functions will differ from platform to
platform.

As an example, the top-level run-time library will incorporate
operations to perform general remapping (in the HPF sense) of
distributed arrays.  In common cases this remapping might be
expressible as a simple data shift in a processor dimension, and such
an operation may be especially efficiently implemented on some
platform.  The top-layer run-time library for that platform may check
for this special case then dispatch a lower level shift operation.  But
in many cases the compiler could have checked statically for this case,
and invoked the shift operation directly, had it been aware of the
platform-specific details.  Similar observations apply at successive
layers of the run-time system, in principle down to the lowest layers
of message-passing.
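
The run-time check described above can be sketched as follows.  This is
an illustration only, written in Python for brevity: the function names
(remap, shift, general_remap, shift_amount) and the model of a
distributed array as one block per processor are assumptions made for
the example, not the design of the proposed library.

```python
def shift(blocks, k):
    """Lower-layer operation (hypothetical): cyclic shift by k processors."""
    p = len(blocks)
    # The block held by processor i ends up on processor (i + k) mod p.
    return [blocks[(j - k) % p] for j in range(p)]


def general_remap(blocks, dest_of):
    """General path: send the block on processor src to dest_of[src]."""
    out = [None] * len(blocks)
    for src, dst in enumerate(dest_of):
        out[dst] = blocks[src]
    return out


def shift_amount(dest_of):
    """Return k if dest_of describes a cyclic shift by k, otherwise None."""
    p = len(dest_of)
    if p == 0:
        return None
    k = dest_of[0] % p
    if all(dest_of[i] == (i + k) % p for i in range(p)):
        return k
    return None


def remap(blocks, dest_of):
    """Top-layer operation: test for the special case, then dispatch."""
    k = shift_amount(dest_of)
    if k is not None:
        return "shift", shift(blocks, k)
    return "general", general_remap(blocks, dest_of)
```

Here the test in shift_amount is repeated on every call.  Where dest_of
is known at compile time, the same test could be made once by the
translator, which is the motivation for the scheme proposed next.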

We propose to investigate a scheme where layered run-time libraries are
described systematically in some ``database'', together with conditions
(predicates on the values of the actual parameters of a call) under
which simpler code, directly invoking lower layers of the library, may
be ``in-lined''.  By reading this data-base the translator can generate
efficient in-line code without a priori knowledge of the platform.
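
A minimal sketch of how such a description might be consulted by the
translator is given below, again in Python for illustration.  The rule
format, the RULES table, the function translate_call and the emitted
routine names (rt_remap, rt_shift) are all hypothetical; the actual
representation is a subject of the proposed research.  A parameter whose
value is not known statically is represented here by None.

```python
# Hypothetical "database": for each top-level library call, an ordered
# list of (predicate on the actual parameters, lower-layer code template).
RULES = {
    "remap": [
        # A shift by zero needs no communication at all.
        (lambda a: a["shift"] == 0,
         "{dst} = {src}"),
        # A statically known shift can invoke the lower layer directly.
        (lambda a: isinstance(a["shift"], int),
         "call rt_shift({src}, {dst}, {shift})"),
    ],
}


def translate_call(name, args):
    """Return the code to emit for a run-time call with the given arguments."""
    for predicate, template in RULES.get(name, []):
        if predicate(args):
            return template.format(**args)
    # No predicate could be established statically: keep the general call.
    return "call rt_{}({}, {})".format(name, args["src"], args["dst"])
```

For example, translate_call("remap", {"src": "a", "dst": "b", "shift": 1})
selects the second rule, while the same call with "shift": None falls
back to the general top-level routine.  Because the rules are data
rather than translator code, a different platform would supply a
different table without changes to the translator itself.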

Workplan, year 2, ``thread 2''
------------------------------

1) Tidy up prototype run-time libraries, with a view to exploiting
``in-lining''.  A more abstract design, with less direct concern for
petty optimisation, is likely to be favoured.  2 MMs.

2) Data-base design.  What kind of predicates on actual parameters can
be effectively exploited (by reference to the output of 1)?
Representation of these conditions, and the selected code.  4 MMs.

3) Modify prototype translator to exploit this data-base, unpeeling the
layers of the run-time operations to generate efficient low-level
code.  3 MMs.

4) Synthesis of these optimisations with those investigated in thread
1.  3 MMs.