Using the compiler directive
!HPF$ DISTRIBUTE augmented (CYCLIC,*)
I got the following timing:
merlin4:~/CPS615.dir/hw4> source SETENV
merlin4:~/CPS615.dir/hw4> pghpf -O gauss.hpf -o gauss
** gauss === End of Compilation 1 ===
** gauss_elim === End of Compilation 2 ===
1501-510 Compilation successful for file pghpfE8bcVFXAAB.f.
Linking:
merlin4:~/CPS615.dir/hw4> gauss -pghpf -host merlin1,merlin2,merlin3,merlin4 -np 4 -stat cpus

cpu      real    user     sys  ratio  node
0*     133.19   17.02   13.34    23%  merlin4
1      125.57   17.79   13.81    25%  merlin1
2      121.51   17.53   13.55    26%  merlin2
3      121.51   17.72   14.34    26%  merlin3
min    121.51   17.02   13.34
avg    125.45   17.52   13.76
max    133.19   17.79   14.34
total  133.19   70.06   55.04  0.94x
merlin4:~/CPS615.dir/hw4>
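The summary rows of this output can be cross-checked from the per-node figures. The short Python sketch below does that, assuming the "ratio" column means CPU time (user + sys) divided by real time; that reading is inferred from the numbers themselves, not taken from the pghpf documentation, and the helper name cpu_ratio is made up for illustration.

```python
# Per-node timings copied from the -stat cpus output: (node, real, user, sys) in seconds.
rows = [
    ("merlin4", 133.19, 17.02, 13.34),
    ("merlin1", 125.57, 17.79, 13.81),
    ("merlin2", 121.51, 17.53, 13.55),
    ("merlin3", 121.51, 17.72, 14.34),
]

def cpu_ratio(real, user, sys_time):
    """Fraction of wall-clock time spent on the CPU (assumed meaning of 'ratio')."""
    return (user + sys_time) / real

# Per-node ratio as a whole-number percentage, as printed in the table.
per_node = {node: round(100 * cpu_ratio(real, user, s)) for node, real, user, s in rows}

# The 'total' row: real time is that of the slowest node, CPU times are summed.
total_real = max(r[1] for r in rows)
total_user = round(sum(r[2] for r in rows), 2)
total_sys = round(sum(r[3] for r in rows), 2)
total_ratio = round((total_user + total_sys) / total_real, 2)

print(per_node)
print(total_real, total_user, total_sys, total_ratio)
```

Under that assumption each node spends only about a quarter of the elapsed time computing, which fits the caveat that the machine is shared with other users.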
We see that, although the same amount of computation takes place for both the BLOCK and CYCLIC distributions (the user and sys times are approximately the same), the real time is lower for the CYCLIC distribution. This is expected, as the Gaussian elimination algorithm is better adapted to the CYCLIC distribution (there is less communication). However, because NPAC's SP2 is a multi-user machine, we cannot say for sure how much of this speedup is attributable to the change in distribution.
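One way to see how the row distribution interacts with Gaussian elimination is to count, at elimination step k, how many of the rows still being updated each processor owns. The Python sketch below is only an illustration under stated assumptions: it is not the homework code, the function names are hypothetical, n = 16 and p = 4 are arbitrary, and it models row ownership only, not communication cost or run time.

```python
def owner_block(row, n, p):
    """Owner of `row` when n rows are split into p contiguous blocks."""
    block = -(-n // p)  # ceil(n / p), the block size used for a BLOCK distribution
    return row // block

def owner_cyclic(row, n, p):
    """Owner of `row` when rows are dealt out round-robin (CYCLIC)."""
    return row % p

def active_rows(owner, k, n, p):
    """Count, per processor, the rows k+1 .. n-1 that are updated at step k."""
    counts = [0] * p
    for row in range(k + 1, n):
        counts[owner(row, n, p)] += 1
    return counts

n, p = 16, 4
for k in (0, 7, 11):
    print(k, active_rows(owner_block, k, n, p), active_rows(owner_cyclic, k, n, p))
```

In this small model the BLOCK counts become [0, 0, 4, 4] at k = 7, so two processors have no rows left to update, while the CYCLIC counts stay at [2, 2, 2, 2]. This evenness of the remaining work is a load-balance effect that goes beyond the communication argument given in the text.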