To understand the Fortran 90 code it is informative to take a look at the Fortran 77 code.
The structure of the Fortran 77 code is as follows :
As can be seen from the program structure, the major computational bottleneck lies in the routines GMIDDLE and INTERPOLATE, since each is called nt * (nx - 2) times. Parallelizing these two routines therefore yields the largest performance gain.
The GMIDDLE routine in Fortran 77 consists essentially of three nested DO loops. Fortunately, the loops have no data dependences between successive iterations. In the Fortran 90 code, we have left the outermost loop as in the original Fortran 77 code. The two inner loops have been replaced by FORALL statements, thus exposing the parallelism.
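The transformation described above can be sketched as follows. This is only a minimal illustration: the array names, extents, and the update formula are hypothetical and are not taken from GMIDDLE itself. Note also that FORALL is accepted by HPF and Fortran 95 compilers.

```fortran
program gmiddle_sketch
  implicit none
  ! Hypothetical sizes and arrays, chosen only for illustration.
  integer, parameter :: n = 4, m = 3, nsteps = 2
  real :: gold(n, m), gloop(n, m), gfor(n, m)
  integer :: i, j, it

  gold = 1.0
  gloop = 0.0
  gfor = 0.0

  ! Fortran 77 style: three nested DO loops.
  do it = 1, nsteps
     do j = 1, m
        do i = 1, n
           ! Placeholder update; the real routine computes something else.
           gloop(i, j) = gold(i, j) + real(it) * real(i + j)
        end do
     end do
  end do

  ! Restructured style: outermost loop kept, inner two loops merged
  ! into one FORALL. This is valid only because no iteration reads a
  ! value written by another iteration.
  do it = 1, nsteps
     forall (i = 1:n, j = 1:m)
        gfor(i, j) = gold(i, j) + real(it) * real(i + j)
     end forall
  end do

  ! Both forms are meant to produce the same array.
  print *, 'max difference:', maxval(abs(gloop - gfor))
end program gmiddle_sketch
```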
To see the Fortran 90 version of GMIDDLE click here.
The INTERPOLATE routine poses a much greater challenge for parallelization. Closely related to the routine is the array "mask". The structure of the array is as follows:
As shown in the above diagram the array "mask" is non-zero only for the outer boundary.
This is of great significance in understanding the operation of the routine INTERPOLATE. The routine computes only for those values of k and l for which mask(k,l).ne.0. Thus, although the loops in the Fortran 77 version of the routine run from -(nd+1) to (nd+1), actual computation is done only at the boundary points.
The different values of mask are used to indicate the regions where different techniques of interpolation have to be used. Thus, for example, the technique used for all the points where mask(k,l)=11 is the same and it differs from the technique used for the points having a different value of mask.
In the Fortran 90 version of the code, this routine has been completely restructured. First, we do not use the mask array at all. Instead, we have determined the values of k and l for which "mask" is 11, 12, etc. Using these values, we perform interpolation only at those values of k and l for which "mask" was non-zero in the Fortran 77 version. Further, we have grouped together all the operations for the same mask value. We are then in a position to use FORALL statements within these groups to exploit parallelism.
For example, we know that
mask(k,l)=11 for k=(nd+1), l=-(nd+1)/2 to (nd+1)/2
and k=-(nd+1), l=-(nd+1)/2 to (nd+1)/2
Using this information all the operations performed for mask(k,l)=11 in the Fortran 77 version are now explicitly programmed for the values of k and l given above.
Thus, if the Fortran 77 version had the following operation:
The corresponding Fortran 90 version would look as follows:
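As a purely illustrative sketch of this pattern (not the original listings), the program below contrasts a mask-tested loop with an explicit loop over the index ranges quoted above for mask value 11, taking the second index to be l. The value of nd, the array names src, g77 and g90, and the averaging formula are all hypothetical stand-ins for the real interpolation.

```fortran
program interpolate_sketch
  implicit none
  ! nd, the arrays, and the averaging formula are hypothetical.
  integer, parameter :: nd = 3
  integer, parameter :: nb = nd + 1        ! boundary index, nd+1
  integer, parameter :: h = (nd + 1) / 2   ! half-width of the mask-11 edge
  integer :: mask(-nb:nb, -nb:nb)
  real :: src(-nb:nb, -nb:nb), g77(-nb:nb, -nb:nb), g90(-nb:nb, -nb:nb)
  integer :: k, l

  ! Assumed mask: 11 on the edges k = +/-(nd+1) for |l| <= (nd+1)/2.
  mask = 0
  mask(nb, -h:h) = 11
  mask(-nb, -h:h) = 11

  forall (k = -nb:nb, l = -nb:nb) src(k, l) = real(k * k + l)
  g77 = 0.0
  g90 = 0.0

  ! Fortran 77 style: scan the full index range and test the mask.
  do l = -nb, nb
     do k = -nb, nb
        if (mask(k, l) .eq. 11) then
           g77(k, l) = 0.5 * (src(k, l) + src(-k, l))
        end if
     end do
  end do

  ! Restructured style: no mask array; visit only the known indices,
  ! with all mask-11 operations grouped in one FORALL.
  forall (l = -h:h)
     g90(nb, l) = 0.5 * (src(nb, l) + src(-nb, l))
     g90(-nb, l) = 0.5 * (src(-nb, l) + src(nb, l))
  end forall

  ! Both forms are meant to touch the same points with the same values.
  print *, 'max difference:', maxval(abs(g77 - g90))
end program interpolate_sketch
```

The restructured form does work proportional to the number of boundary points rather than to the full (2*(nd+1)+1) squared index range, and it contains no conditional inside the parallel construct.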
To see the Fortran 90 version of INTERPOLATE click here.
We have also eliminated the routine COPY. This routine merely copies the values from the array gn to the array go, so it has been replaced by the single array assignment go=gn.
All the other routines have been left as in the original Fortran 77 code. Although some parallelization is possible in them, we judged that the overhead would exceed the achievable performance gain.
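A minimal sketch of this replacement is shown below. The array extent and the rank of gn and go are assumptions made for the example; only the names gn and go and the statement go=gn come from the code described here.

```fortran
program copy_sketch
  implicit none
  integer, parameter :: n = 4   ! hypothetical array extent
  real :: gn(n, n), go(n, n), go_loop(n, n)
  integer :: i, j

  call random_number(gn)

  ! Element-by-element copy, as a COPY-style routine might do it.
  do j = 1, n
     do i = 1, n
        go_loop(i, j) = gn(i, j)
     end do
  end do

  ! Fortran 90 whole-array assignment replaces the loop nest.
  go = gn

  print *, 'copies agree:', all(go == go_loop)
end program copy_sketch
```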
To view the entire Fortran 90 code click here.