We design and implement algorithms on existing hardware; however, for industrial applications such as power systems network analysis, it is equally important to predict algorithm performance on future architectures. Such predictions help determine whether it will be cost-effective to port critical software to parallel architectures now, or simply to wait for speedup from faster single-processor computers.
This analysis is a good case in point: performance for the parallel block-diagonal-bordered sparse solvers developed here is rather good on the Thinking Machines CM-5 for moderate numbers of processors (2--32). For applications suited to Choleski solvers, the parallel block-diagonal-bordered Gauss-Seidel algorithm yields good parallel speedup and offers substantial additional algorithmic speedup when compared with the parallel block-diagonal-bordered direct solvers. However, the superb computation-to-communication ratio available on the CM-5 through low-latency active messages will probably not be equaled on future architectures whose processor performance increases significantly.
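One way to make the computation-to-communication ratio concrete is the standard latency-bandwidth model; the sketch below is illustrative, and its symbols are not drawn from the analysis above. If a communication phase sends $m$ messages of $b$ bytes each, then
\[
T_{\mathrm{comm}} = m\,(\alpha + \beta b), \qquad
R = \frac{T_{\mathrm{comp}}}{T_{\mathrm{comm}}},
\]
where $\alpha$ is the per-message latency, $\beta$ is the time per byte (inverse bandwidth), and $T_{\mathrm{comp}}$ is the computation time between communication phases. The fine-grained Gauss-Seidel communications send small messages, so $\beta b \ll \alpha$ and $R \approx T_{\mathrm{comp}} / (m \alpha)$: faster processors shrink $T_{\mathrm{comp}}$, and unless $\alpha$ shrinks proportionally, the ratio $R$ falls.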
While the bandwidth-dependent parallel sparse block-diagonal-bordered direct solvers may port to future architectures with equal or better performance, the latency-dependent parallel sparse block-diagonal-bordered Gauss-Seidel solvers may not. Future architectures will have greater bandwidth than the Thinking Machines CM-5, but they will not have a comparable reduction in communication latency. Consequently, any algorithmic performance gains possible with the parallel Gauss-Seidel algorithm would not be realized on future architectures that lack the computation-to-communication ratio available on the CM-5.
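To illustrate the porting argument numerically, the following sketch evaluates the latency-bandwidth model above for a CM-5-class node and a hypothetical future node whose computation rate and bandwidth improve one hundredfold while per-message latency stays fixed. All machine parameters and message counts are assumed round numbers for illustration, not measured values.
\begin{verbatim}
/*
 * Illustrative sketch only: projected computation-to-communication
 * ratios under the latency-bandwidth model.  All machine parameters
 * are hypothetical round numbers, not CM-5 measurements.
 */
#include <stdio.h>

typedef struct {
    const char *name;
    double flops;    /* per-node computation rate (flop/s) */
    double alpha;    /* per-message latency (s)            */
    double bw;       /* sustained bandwidth (byte/s)       */
} machine;

/* alpha-beta model: time to send msgs messages of bytes each */
static double comm_time(const machine *m, double msgs, double bytes)
{
    return msgs * (m->alpha + bytes / m->bw);
}

int main(void)
{
    /* Hypothetical: a CM-5-class node, and a future node with 100x
     * the computation rate and bandwidth but the same latency.     */
    machine cm5 = { "CM-5-like", 5.0e6, 5.0e-6, 1.0e7 };
    machine fut = { "future",    5.0e8, 5.0e-6, 1.0e9 };
    machine *ms[] = { &cm5, &fut };

    double work = 1.0e6;   /* flops between communication phases */

    for (int i = 0; i < 2; i++) {
        double t_comp = work / ms[i]->flops;
        /* latency-bound pattern: 1000 messages of 8 bytes
           (Gauss-Seidel-like fine-grained updates)        */
        double t_lat = comm_time(ms[i], 1000.0, 8.0);
        /* bandwidth-bound pattern: 4 messages of 250 kbytes
           (direct-solver-like block transfers)            */
        double t_bw = comm_time(ms[i], 4.0, 2.5e5);
        printf("%-10s latency-bound R = %5.2f   "
               "bandwidth-bound R = %5.2f\n",
               ms[i]->name, t_comp / t_lat, t_comp / t_bw);
    }
    return 0;
}
\end{verbatim}
Under these assumed parameters, the bandwidth-bound ratio is essentially preserved (about 2.0 on both machines), while the latency-bound ratio collapses from roughly 34 to below 1; this is the sense in which the direct solvers may port while the Gauss-Seidel solver may not.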