Distributed-memory computers are characterized by scalable architectures: they can be expanded to achieve a proportional performance increase without changing the basic architecture. To take full advantage of scalable hardware, the software must also scale so that it can exploit the added computing capability. This section presents benchmark results illustrating that the Fortran 90D/HPF compiler generates scalable code for scalable distributed-memory machines.
All of these benchmarks were run on a 15-processor Intel Paragon. The processors run at 50 MHz, and each node has 32 MBytes of physical memory. The programs were compiled using the Fortran 90D/HPF compiler with all optimizations turned on, including the i860 vectorizer, which exploits single-node parallelism.
The shallow water (shallow) benchmark is a small program (300 lines)
abstracting a 2-dimensional flow system.
The data is distributed in block fashion in one dimension, (*, block).
The generated code consists of many computations
of order N^2, with communication consisting mostly of overlap-shifts
of order N. The accompanying figure
shows the performance of shallow.
The super-linear speed-up on the large data set shows that
the Fortran 90D/HPF compiler can make large
problems more tractable simply through efficient use of the larger aggregate
core memory available on a multi-processor system.
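
To make the (*, block) distribution and the overlap-shift concrete, the following is a minimal Python sketch, not the code the Fortran 90D/HPF compiler emits. It assumes contiguous blocks of ceil(n / nprocs) columns per processor and simulates the processors as entries in a list; the names `block_bounds` and `overlap_shift` and the sizes used are hypothetical, chosen only for illustration. Each simulated processor updates all the points in its block of columns but receives at most two boundary columns from its neighbours.

```python
def block_bounds(n, nprocs, rank):
    """Half-open column range [lo, hi) owned by `rank` when n columns are
    split into contiguous blocks of size ceil(n / nprocs) (an assumption)."""
    block = -(-n // nprocs)  # ceiling division
    lo = min(rank * block, n)
    hi = min(lo + block, n)
    return lo, hi


def overlap_shift(local_blocks):
    """For each processor, return copies of the two boundary columns it
    needs from its neighbours: (left neighbour's last column, right
    neighbour's first column). None marks a missing neighbour.
    Assumes every processor owns at least one column."""
    ghosts = []
    last = len(local_blocks) - 1
    for rank in range(len(local_blocks)):
        left = list(local_blocks[rank - 1][-1]) if rank > 0 else None
        right = list(local_blocks[rank + 1][0]) if rank < last else None
        ghosts.append((left, right))
    return ghosts


n, nprocs = 8, 4  # illustrative sizes, not the benchmark's
# Column j of the global n x n grid holds the values j*n .. j*n + n - 1.
grid = [[j * n + i for i in range(n)] for j in range(n)]
bounds = [block_bounds(n, nprocs, r) for r in range(nprocs)]
blocks = [grid[lo:hi] for lo, hi in bounds]
ghosts = overlap_shift(blocks)

for rank, (lo, hi) in enumerate(bounds):
    received = sum(len(col) for col in ghosts[rank] if col is not None)
    updated = n * (hi - lo)
    print(f"rank {rank}: columns [{lo}, {hi}), "
          f"updates {updated} points, receives {received} values")
```

In this sketch the work per processor grows with the area of its block, while the data it receives grows only with the length of a column, which is why the ratio of computation to communication improves as the grid gets larger.
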
The partial differential equation benchmark (pde1) is
a small program (360 lines) from the Genesis Parallel Benchmark Suite
that implements a 3D Poisson solver using five iterations of
red-black relaxation. The accompanying figure shows the performance of pde1.
The data is distributed in block fashion in one dimension, (*,*,block).
The benchmark scales well. The communication consists mostly of
overlap-shifts arising from the stencil computations of pde1.
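
As an illustration of the relaxation scheme named above, here is a minimal serial Python sketch of red-black Gauss-Seidel for the standard 7-point discretisation of a 3D Poisson problem. It is not the Genesis source: the function names, the grid size, and the test problem (zero right-hand side with boundary value 1, whose exact solution is u = 1) are assumptions made for the example. Points are coloured by the parity of i + j + k, so each colour's update reads only points of the other colour; that is what lets all points of one colour be updated in parallel once boundary planes have been exchanged.

```python
def make_grid(n, interior, boundary):
    """Build an n x n x n nested list with the given interior and boundary values."""
    def on_edge(*idx):
        return any(t in (0, n - 1) for t in idx)
    return [[[boundary if on_edge(i, j, k) else interior
              for k in range(n)] for j in range(n)] for i in range(n)]


def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for laplace(u) = f on a cubic grid
    with spacing h. Boundary values of u are held fixed."""
    n = len(u)
    for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                for k in range(1, n - 1):
                    if (i + j + k) % 2 != colour:
                        continue
                    neighbours = (u[i - 1][j][k] + u[i + 1][j][k] +
                                  u[i][j - 1][k] + u[i][j + 1][k] +
                                  u[i][j][k - 1] + u[i][j][k + 1])
                    u[i][j][k] = (neighbours - h * h * f[i][j][k]) / 6.0
    return u


n = 6                       # illustrative grid size, not the benchmark's
h = 1.0 / (n - 1)
u = make_grid(n, interior=0.0, boundary=1.0)
f = make_grid(n, interior=0.0, boundary=0.0)  # exact solution is u = 1

for _ in range(5):          # five iterations, as in the benchmark description
    red_black_sweep(u, f, h)

max_error = max(abs(1.0 - u[i][j][k])
                for i in range(1, n - 1)
                for j in range(1, n - 1)
                for k in range(1, n - 1))
print("max error after five sweeps:", max_error)
```
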
The hydflo benchmark is a small hydrodynamics program (2000 lines).
The accompanying figure shows the performance of hydflo.
The data is distributed in block fashion in one dimension, (*,*,block).
The benchmark scales well. The communication consists mostly of
copy-section and collective-communication operations.
As the data show, benchmark programs written in Fortran 90D/HPF achieve reasonable efficiency provided the problem is sufficiently large, and the figures show reasonably good scalability as the number of processors increases.
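
For readers who want to relate these terms to timings, the sketch below uses the usual definitions, which are assumed here rather than taken from the benchmark reports: speed-up is the one-processor time divided by the P-processor time, efficiency is speed-up divided by P, and a speed-up is super-linear when it exceeds P. The timings in the example are made-up placeholders, not measurements from these benchmarks.

```python
def speedup(t_serial, t_parallel):
    """Speed-up of a parallel run relative to the one-processor run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nprocs):
    """Fraction of ideal (linear) speed-up achieved on nprocs processors."""
    return speedup(t_serial, t_parallel) / nprocs


def is_superlinear(t_serial, t_parallel, nprocs):
    """True when the speed-up exceeds the number of processors."""
    return speedup(t_serial, t_parallel) > nprocs


# Placeholder timings in seconds (hypothetical, for illustration only).
t1 = 120.0
timings = {4: 40.0, 8: 12.0}

for p, tp in timings.items():
    print(p, speedup(t1, tp), efficiency(t1, tp, p), is_superlinear(t1, tp, p))
```
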