
Direct Methods

We have developed parallel block-diagonal-bordered sparse direct linear solver algorithms optimized for the special irregular sparse matrices that originate in the electrical power systems community. The available parallelism in the block-diagonal-bordered matrix structure has shown promise for simplified implementation and provides a straightforward decomposition of the problem into clearly identifiable sub-problems. Parallel block-diagonal-bordered direct linear solvers require a three-step preprocessing phase, after which the ordered matrix is reusable for many sparse linear solutions: the matrix is ordered into block-diagonal-bordered form, pseudo-factored to identify the location of all fill-in and to obtain operation counts in the mutually independent diagonal blocks and the corresponding portions of the borders, and load-balanced to distribute the workload uniformly across all processors.
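
As an illustration of how the block-diagonal-bordered structure decomposes the solve, the following minimal NumPy sketch eliminates each independent diagonal block, folds the border contributions into a Schur complement of the last diagonal block, and back-substitutes. The block names (A_i, B_i, C_i, D), the dimensions, and the dense solves are illustrative assumptions for a toy dense system, not the sparse implementation described in this work.

import numpy as np

rng = np.random.default_rng(0)

k, nb, nd = 3, 4, 2   # number of independent blocks, block size, last-block size

# Mutually independent diagonal blocks A_i, right border B_i, bottom
# border C_i, and the last (coupling) diagonal block D. The diagonal
# shift just keeps this toy system comfortably nonsingular.
A = [rng.standard_normal((nb, nb)) + 8.0 * np.eye(nb) for _ in range(k)]
B = [rng.standard_normal((nb, nd)) for _ in range(k)]
C = [rng.standard_normal((nd, nb)) for _ in range(k)]
D = rng.standard_normal((nd, nd)) + 8.0 * np.eye(nd)
b = [rng.standard_normal(nb) for _ in range(k)]
d = rng.standard_normal(nd)

# Step 1 -- independent work, one diagonal block per processor:
# eliminate each A_i and fold its border contribution into the
# Schur complement of the last diagonal block.
S, g = D.copy(), d.copy()
for Ai, Bi, Ci, bi in zip(A, B, C, b):
    S -= Ci @ np.linalg.solve(Ai, Bi)     # S = D - sum C_i A_i^{-1} B_i
    g -= Ci @ np.linalg.solve(Ai, bi)     # g = d - sum C_i A_i^{-1} b_i

# Step 2 -- the only sequential coupling: solve the last-block system.
x_last = np.linalg.solve(S, g)

# Step 3 -- independent again: back-substitute within each block.
x = [np.linalg.solve(Ai, bi - Bi @ x_last) for Ai, Bi, bi in zip(A, B, b)]

# Sanity check against the assembled full system.
n = k * nb + nd
M = np.zeros((n, n))
for i in range(k):
    r = slice(i * nb, (i + 1) * nb)
    M[r, r], M[r, n - nd:], M[n - nd:, r] = A[i], B[i], C[i]
M[n - nd:, n - nd:] = D
rhs = np.concatenate(b + [d])
assert np.allclose(np.concatenate(x + [x_last]), np.linalg.solve(M, rhs))

The passes over the diagonal blocks in steps 1 and 3 are mutually independent, which is exactly the parallelism the ordering phase exposes; only the last-block solve couples the processors.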

We developed an implementation that achieved speedups of nearly ten on 32 processors for double-precision LU factorization, and even greater speedups for complex variate LU factorization. Speedups for parallel block-diagonal-bordered Choleski factorization were lower than for LU factorization, and forward reduction poses formidable implementation problems because of the data distribution in the last diagonal block. These parallel block-diagonal-bordered direct solvers address the most difficult power systems applications to implement on a multiprocessor: solutions to the linear equations corresponding to the power system network alone. Load-flow has the smallest matrices and the fewest calculations, owing to symmetry and to the absence of any requirement for pivoting to ensure numerical stability. LU factorization of the network equations for decoupled solutions of differential-algebraic equations entails additional calculations, but can often be performed without numerical pivoting. These parallel direct algorithms are very sensitive to communications overhead and to the capabilities of the particular parallel architecture. We have quantified the effects of granularity on implementation performance, showing that simply increasing the granularity by a factor of eight improves the parallel speedup of the algorithm significantly, as the toy model below illustrates.
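
The granularity effect can be illustrated with a toy performance model in which per-processor compute time shrinks as the work is divided while a fixed communication and synchronization overhead does not. All parameter values below are assumptions chosen only to show the trend; they are not measurements from this work.

def speedup(work_flops, p=32, t_flop=1e-7, overhead_s=5e-3):
    """Modeled speedup with a fixed per-solve communication overhead."""
    serial = work_flops * t_flop
    parallel = serial / p + overhead_s   # compute shrinks, overhead does not
    return serial / parallel

base = 1e6   # floating-point operations at the base granularity (assumed)
for scale in (1, 8):
    print(f"granularity x{scale}: modeled speedup = {speedup(scale * base):.1f}")

In this model, multiplying the work per solve by eight while the overhead stays fixed roughly doubles the modeled 32-processor speedup, mirroring the sensitivity to granularity reported above.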


