Block-diagonal-bordered sparse matrix algorithms require modifications to the normal preprocessing phase described in numerous papers on parallel Choleski factorization [19,20,21,29,55,56,57,58,64]. Each of these papers follows the same paradigm: order the sparse matrix, then perform symbolic factorization to determine the locations of all fill-in values so that static data structures can be used for maximum efficiency during numerical factorization. We modify this commonly used preprocessing phase to include an explicit load-balancing step coupled to the ordering step, so that the workload is distributed uniformly across a distributed-memory multiprocessor and the parallel algorithms make efficient use of the computational resources.
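The symbolic factorization step in this paradigm can be illustrated with a minimal sketch. The function and variable names below (`symbolic_factorization`, `adjacency`) are illustrative only, not taken from the papers cited above; the sketch uses the standard graph-elimination model in which eliminating a pivot connects all of its not-yet-eliminated neighbors pairwise, and each new connection is a fill-in entry:

```python
# Illustrative sketch of symbolic factorization on a symmetric
# sparsity pattern, modeled as graph elimination. Names are
# hypothetical, not from the cited papers.

def symbolic_factorization(n, adjacency, order):
    """Return the set of (row, col) fill-in entries produced when a
    symmetric n-by-n sparse matrix with the given adjacency structure
    is factored in the given elimination order."""
    # position of each node in the elimination order
    pos = {v: i for i, v in enumerate(order)}
    # working copy of the adjacency sets
    adj = {v: set(adjacency[v]) for v in range(n)}
    fill = set()
    for k in order:
        # neighbors of k that are eliminated after k
        later = [j for j in adj[k] if pos[j] > pos[k]]
        # eliminating k connects all later neighbors pairwise;
        # every new edge is a fill-in entry
        for a in later:
            for b in later:
                if a != b and b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill.add((max(a, b), min(a, b)))
    return fill

# Tiny example: a 4-node "star" pattern. Eliminating the hub (node 0)
# first fills in every off-diagonal entry among the leaves, while
# eliminating it last produces no fill-in at all -- which is why the
# ordering step precedes symbolic factorization.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
fill_bad = symbolic_factorization(4, star, [0, 1, 2, 3])
fill_good = symbolic_factorization(4, star, [1, 2, 3, 0])
```

Once the fill-in locations are known, static data structures sized to hold both the original nonzeros and the fill-in can be allocated before numerical factorization begins.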
Parallel block-diagonal-bordered sparse direct linear solvers offer the potential for regularity that is often absent from other parallel sparse solvers [34,35,36,38]. Our research into specialized matrix ordering techniques has shown that actual power system matrices can readily be ordered into block-diagonal-bordered form, and that the workload can be balanced effectively enough that relative speedups greater than ten have been observed in empirical performance measurements of direct solvers on a 32-processor Thinking Machines CM-5.