
Conclusions

We have extensively analyzed the performance of parallel linear solvers for power systems applications on the Thinking Machines CM-5. We have shown that the node-tearing-based partitioning algorithm can yield matrices in block-diagonal-bordered form with balanced workloads, and that our parallel block-diagonal-bordered sparse direct linear solvers can achieve good speedups for LU factorization. Power system matrices are so sparse that relative speedups for parallel Choleski factorization and complex-variate LU factorization can differ by a factor of four for an eight-fold increase in the number of calculations. Matrix sparsity has an even greater effect on the triangular solution steps than it does on factorization. Communications overhead when reducing or substituting in the last diagonal block is so great that no speedup is available there, so the performance of these algorithms becomes limited by Amdahl's law on both the Thinking Machines CM-5 architecture and the IBM SPPs.
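For reference, the bound invoked above is the standard statement of Amdahl's law: if a fraction f of the work must be performed serially (here, interpreting the communication-bound reduction and substitution of the last diagonal block as that serial fraction is our reading of the results, not a measurement reported in this section), the speedup on p processors satisfies

\begin{displaymath}
S(p) \le \frac{1}{f + \frac{1-f}{p}} \le \frac{1}{f}.
\end{displaymath}

Even a small serial fraction in the last-block computation therefore caps the overall speedup attainable by the factorization and triangular solution, regardless of the number of processors.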


