We design and implement algorithms on existing hardware; however, for industrial applications such as power systems network analysis, it is equally important to predict algorithm performance on future architectures. Such predictions help determine whether it will be cost-effective to port critical software to parallel architectures now, or simply to wait for faster single-processor computers to deliver speedup in the future.
This analysis is a good case in point: performance of the parallel block-diagonal-bordered sparse solvers developed here is rather good on the Thinking Machines CM-5 for moderate numbers of processors (2--32). For Choleski solver applications, the parallel block-diagonal-bordered Gauss-Seidel algorithm yields good speedups and offers substantial algorithmic speedup when compared with parallel block-diagonal-bordered direct solvers. However, in this chapter we show that the superb computation-to-communication ratio available on the CM-5 through low-latency active messages will probably not be equaled on future architectures where processor performance increases significantly. The performance of our parallel Gauss-Seidel algorithm is latency dependent, because it exchanges a large number of small messages, whereas the performance of our parallel direct algorithm is bandwidth dependent, because it exchanges a limited number of moderate-size messages.
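To make this distinction concrete, consider a simple latency and bandwidth communication cost model; this model is offered only as an illustration and is not part of the overhead-based estimates developed later in this chapter. Under this model, sending a single message of $m$ words costs approximately
\[
T_{\mathrm{msg}} \approx \alpha + \frac{m}{\beta},
\]
where $\alpha$ is the per-message startup latency and $\beta$ is the sustained bandwidth. An algorithm that sends many small messages, such as the parallel Gauss-Seidel solver, accumulates a total communication cost near $n_{\mathrm{msg}}\,\alpha$ and is therefore latency dependent, while an algorithm that sends a few moderate-size messages, such as the parallel direct solver, accumulates a cost near $\left(\sum_i m_i\right)/\beta$ and is therefore bandwidth dependent.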
We show in this chapter that while the bandwidth-dependent parallel sparse block-diagonal-bordered direct solvers may port to future architectures with equal or better performance, the latency-dependent parallel sparse block-diagonal-bordered Gauss-Seidel solvers may not. Future architectures will offer greater bandwidth than the Thinking Machines CM-5, but they are unlikely to offer a comparable reduction in communications latency, since latency has historically been limited more by per-message software overhead than by raw link speed. Consequently, the algorithmic performance gains possible with the parallel Gauss-Seidel algorithm would not be realized on future architectures that lack the computation-to-communication ratio available on the CM-5.
We open this chapter by discussing future computing architectures and the requirements of the power utility industry. We then introduce the overhead-based performance estimates that we developed to predict algorithm performance on future high-performance computing architectures, and we apply these estimation techniques to both the sparse parallel block-diagonal-bordered direct solvers and the iterative solvers developed in this research. Because these estimates predict poor performance for the parallel iterative solver on future SPP architectures, we include comments on improving the latency performance of SPP communications, and we close by reiterating the significant conclusions for porting our parallel linear solvers to future SPP architectures.