Block-diagonal-bordered sparse matrix algorithms require modifications to the standard preprocessing phase described in numerous papers on parallel Choleski factorization [14,15,16,20,37,38,39,40,41,45]. Each of these papers uses the same paradigm: order the sparse matrix and then perform symbolic factorization to determine the locations of all fill-in values, so that static data structures can be used for maximum efficiency during numerical factorization. We modify this common preprocessing phase to couple an explicit load-balancing step to the ordering step, so that the workload is distributed uniformly across a distributed-memory multi-processor and the parallel algorithms make efficient use of the available computational resources.
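To make the coupled ordering and load-balancing step concrete, the following is a minimal sketch, not the implementation from this work. It assumes the ordering has already identified independent diagonal blocks and that a hypothetical per-block operation-count estimate is available; it then assigns blocks to processors with a greedy longest-processing-time heuristic so that estimated factorization work is distributed uniformly before symbolic factorization fixes the static data structures.

```python
# Sketch only: greedy load balancing of independent diagonal blocks
# across processors. Block "workloads" are hypothetical operation-count
# estimates produced during ordering, not values from the paper.
import heapq

def balance_blocks(block_workloads, n_procs):
    """Assign diagonal blocks to processors, largest workload first,
    always placing the next block on the least-loaded processor."""
    # Visit blocks in decreasing order of estimated factorization work.
    order = sorted(range(len(block_workloads)),
                   key=lambda b: block_workloads[b], reverse=True)
    # Min-heap of (accumulated work, processor id).
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    for b in order:
        work, p = heapq.heappop(heap)
        assignment[p].append(b)
        heapq.heappush(heap, (work + block_workloads[b], p))
    return assignment

# Example: 10 diagonal blocks with uneven workloads, 4 processors.
workloads = [120, 80, 75, 60, 55, 40, 30, 20, 10, 5]
print(balance_blocks(workloads, 4))
```

This heuristic is one simple way to realize an "explicit load balancing step coupled to the ordering"; the actual coupling in the research may use a different cost model or assignment strategy.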
Parallel block-diagonal-bordered sparse linear solvers offer the potential for regularity often absent from other parallel sparse solvers [23,24,25,27]. Our research into specialized matrix ordering techniques has shown that actual power system matrices can readily be ordered into block-diagonal-bordered form, and that load balancing is effective enough that relative speedups greater than ten have been observed in empirical performance measurements on 32 processors of a Thinking Machines CM-5 multi-processor.
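As a concrete illustration of the target structure, the following minimal sketch (ours, not part of the ordering algorithm) checks that a permuted sparsity pattern conforms to block-diagonal-bordered form: every nonzero must lie inside one of the independent diagonal blocks or in the border rows and columns that couple them. The `blocks` and `border_start` arguments stand in for hypothetical outputs of the ordering step.

```python
# Sketch only: structural check for block-diagonal-bordered form.
import numpy as np

def is_bdb(pattern, blocks, border_start):
    """True if every nonzero of the boolean pattern lies inside a
    diagonal block or in the border rows/columns."""
    # Map each interior row/column to the diagonal block that owns it.
    owner = np.full(border_start, -1)
    for b, (lo, hi) in enumerate(blocks):
        owner[lo:hi] = b
    rows, cols = np.nonzero(pattern)
    for i, j in zip(rows, cols):
        # Interior nonzeros must couple rows/columns of the same block;
        # anything touching the border (i or j >= border_start) is allowed.
        if i < border_start and j < border_start and owner[i] != owner[j]:
            return False
    return True

# Toy 6x6 pattern: two 2x2 diagonal blocks plus a 2-row/column border.
P = np.zeros((6, 6), dtype=bool)
P[np.diag_indices(6)] = True
P[0, 1] = P[1, 0] = True          # nonzero inside block 0
P[4, 0] = P[0, 4] = True          # coupling through the border (allowed)
print(is_bdb(P, blocks=[(0, 2), (2, 4)], border_start=4))  # True
```

The independence of the diagonal blocks is what gives these solvers their regularity: each block can be factored concurrently, with inter-processor communication confined to the border.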
In addition to the promising speedup observed for the parallel direct linear solver alone, electrical power system applications offer other dimensions of parallelism that can be exploited to make efficient use of multi-processors with more than 32 processors. We believe this research also has utility for other irregular sparse matrix applications where the data is hierarchical, very sparse, and irregular. Other sources of hierarchical matrices exist, for example electrical circuits, which have the potential for even larger numbers of equations than power system matrices.