In general, solving the linear systems built from load-flow matrices of real power systems has proven to be a very difficult challenge for parallel sparse Choleski solvers. As we have illustrated, the actual data matrices require so little computation, even for factorization, that the speedup of the parallel block-diagonal-bordered Choleski solver is limited by computational starvation and, in some instances, by load imbalance.
In this paper we present research into parallel block-diagonal-bordered sparse Choleski factorization algorithms developed with special consideration for the irregular sparse matrices that originate in the electrical power systems community. The parallelism available in the block-diagonal-bordered matrix structure promises a simpler implementation and offers a straightforward decomposition of the problem into clearly identifiable subproblems. Parallel block-diagonal-bordered Choleski solvers require a three-step preprocessing phase that can be reused for static matrices: the matrix is ordered into block-diagonal-bordered form; it is pseudo-factored to locate all fill-in and to obtain operation counts for the mutually independent diagonal blocks and the corresponding portions of the borders; and it is then load-balanced to distribute the operations uniformly (when possible).
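The pseudo-factorization and load-balancing steps can be illustrated with a small sequential sketch. It is not the implementation described in this paper; it assumes the matrix has already been ordered into block-diagonal-bordered form, stores the sparsity pattern as an adjacency dictionary, charges each pivot a simplified operation count, and uses a greedy largest-first heuristic as a stand-in for the load-balancing step. The names `pseudo_factor` and `balance_blocks`, and the 7-node example matrix, are hypothetical.

```python
import heapq


def pseudo_factor(adj, order):
    """Symbolic (pseudo-)factorization of a symmetric sparsity pattern.

    adj   -- dict: node -> set of neighbouring nodes (off-diagonal nonzeros)
    order -- elimination order: diagonal-block nodes first, last-block nodes last
    Returns (fill, ops): fill-in entries as (i, j) pairs with i < j, and a
    per-node operation count under an assumed, simplified cost model.
    """
    pos = {v: k for k, v in enumerate(order)}
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    fill, ops = set(), {}
    for v in order:
        # Neighbours not yet eliminated become the nonzeros of column v in L.
        later = sorted((u for u in graph[v] if pos[u] > pos[v]), key=pos.get)
        d = len(later)
        # Assumed cost model: scale d entries, then update the d-by-d
        # trailing lower triangle (diagonal included).
        ops[v] = d + d * (d + 1) // 2
        # Eliminating v connects all of its later neighbours pairwise;
        # any new connection is fill-in.
        for i, a in enumerate(later):
            for b in later[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill.add((min(a, b), max(a, b)))
    return fill, ops


def balance_blocks(block_ops, nprocs):
    """Greedy largest-first assignment of independent diagonal blocks."""
    heap = [(0, p) for p in range(nprocs)]  # (current load, processor)
    assignment = {}
    for blk, work in sorted(block_ops.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)  # least-loaded processor so far
        assignment[blk] = p
        heapq.heappush(heap, (load + work, p))
    return assignment


# Hypothetical example: diagonal blocks {0,1}, {2,3}, {4}; last block {5,6}.
adj = {0: {1, 5}, 1: {0, 6}, 2: {3, 5}, 3: {2, 5}, 4: {6},
       5: {0, 2, 3}, 6: {1, 4}}
blocks = {0: [0, 1], 1: [2, 3], 2: [4]}
fill, ops = pseudo_factor(adj, order=list(range(7)))
block_ops = {b: sum(ops[v] for v in nodes) for b, nodes in blocks.items()}
print(sorted(fill))                         # [(1, 5), (5, 6)]
print(block_ops)                            # {0: 10, 1: 7, 2: 2}
print(balance_blocks(block_ops, nprocs=2))  # {0: 0, 1: 1, 2: 1}
```

In this example the fill entry (1, 5) falls in the border of the first diagonal block and (5, 6) falls in the last diagonal block, which is why pseudo-factoring must cover the borders as well as the diagonal blocks. The resulting per-processor loads are 10 and 9 operations.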
We developed an implementation that achieved efficiencies of 60% for Choleski factorization on four processors, although the implementation was not efficient beyond four processors for any of the power system load-flow matrices examined. Further investigation of techniques for solving the last diagonal block more effectively could significantly improve performance. Although this implementation uses a dense Choleski solver for the last diagonal block, the structure and available parallelism of block-diagonal-bordered matrices allow any technique to be used for this sub-matrix, including iterative methods.
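The numerical factorization that follows preprocessing, including a dense Choleski solve of the last diagonal block, can be sketched in plain Python. This is a minimal sequential illustration under stated assumptions: every block is stored as a dense list of lists, each diagonal block comes with one dense border block, and the loop over diagonal blocks (the independent work a parallel solver would distribute across processors) runs serially. The function names and the 4x4 example matrix are hypothetical. The final `cholesky(schur)` call is the step that, as noted above, could be replaced by another solver.

```python
import math


def cholesky(a):
    """Dense Choleski factorization: returns lower-triangular L with L L^T = a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(a[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for y, with L lower triangular."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def bdb_cholesky(diag_blocks, borders, last_block):
    """Block-diagonal-bordered Choleski factorization (sequential sketch).

    diag_blocks -- list of dense symmetric positive definite blocks A_ii
    borders     -- list of border blocks A_ni (rows: last block, cols: block i)
    last_block  -- dense last diagonal block A_nn
    Returns (L_diag, L_border, L_last).
    """
    L_diag, L_border = [], []
    schur = [row[:] for row in last_block]  # becomes A_nn - sum_i L_ni L_ni^T
    for A_ii, A_ni in zip(diag_blocks, borders):
        # Factoring block i and its border touches no other diagonal block.
        L_ii = cholesky(A_ii)
        # Row r of L_ni solves L_ii y = (row r of A_ni)^T, so L_ni = A_ni L_ii^{-T}.
        L_ni = [forward_solve(L_ii, row) for row in A_ni]
        # Update the last block; a parallel code would sum these across processors.
        for r, yr in enumerate(L_ni):
            for c, yc in enumerate(L_ni):
                schur[r][c] -= sum(p * q for p, q in zip(yr, yc))
        L_diag.append(L_ii)
        L_border.append(L_ni)
    # The updated last diagonal block is factored with a dense Choleski solver.
    return L_diag, L_border, cholesky(schur)


# Hypothetical example: diagonal blocks [[4,2],[2,3]] and [[5]], one last-block row.
L_diag, L_border, L_last = bdb_cholesky(
    diag_blocks=[[[4.0, 2.0], [2.0, 3.0]], [[5.0]]],
    borders=[[[1.0, 0.0]], [[2.0]]],
    last_block=[[6.0]],
)
# L_last[0][0] ** 2 equals the updated last block, 6 - 0.375 - 0.8 = 4.825.
```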
The parallel block-diagonal-bordered Choleski algorithm presented in this paper addresses one of the most difficult power systems applications to implement on a multiprocessor. Load-flow analysis involves the smallest matrices, and it requires the fewest calculations because the matrices are symmetric and need no pivoting to ensure numerical stability. In the near future, we will investigate applying block-diagonal-bordered LU factorization to transient stability analysis simulations. These matrices offer substantially more available parallelism, because adding the generator equations to the block-diagonal-bordered network equations increases the number of diagonal blocks.