Research is in progress to examine efficient parallel algorithms for the direct solution of sparse systems of equations
where the ordered sparse matrix is in
block-diagonal-bordered form. Direct block-diagonal-bordered sparse
linear solvers exhibit distinct advantages when compared to current
general parallel direct sparse linear solvers: all processors can be
kept busy throughout the factorization. Task assignments for numerical factorization on
distributed-memory multi-processors depend only on the assignment of
mutually independent diagonal blocks to processors and the processor
assignments of data in the last diagonal block. In addition, data
communications are significantly reduced and those remaining
communications are uniform and structured. Figure 1
illustrates the factorization steps for a block-diagonal-bordered
sparse matrix with four mutually independent sub-matrices. The
mapping of data for the independent sub-matrices to the four
processors is included in this figure. The factorization involves highly
parallel operations on data in the mutually independent diagonal
blocks, operations that couple the data in the mutually independent
blocks to data in the last block, and operations on data in the last
block. The coupling operations are the ones that require data
communications in distributed-memory multi-processor algorithms.
Figure 1: Block Diagonal Bordered Sparse Matrix Factorization for 4 Processors, P1 - P4
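A minimal serial sketch of these factorization phases is given below, assuming the matrix is symmetric and stored as hypothetical dense NumPy arrays: A_blocks holds the mutually independent diagonal blocks, borders holds the coupling blocks along the lower border, and C is the last diagonal block. In the parallel algorithm, each A_blocks[k] and borders[k] would reside on its own processor, and the accumulated updates to C would be the quantities communicated between processors.

import numpy as np

def bdb_cholesky(A_blocks, borders, C):
    """Sketch of Choleski factorization of a block-diagonal-bordered matrix.

    A_blocks[k] : k-th mutually independent diagonal block (symmetric positive definite)
    borders[k]  : coupling block linking block k to the last diagonal block
    C           : last diagonal block
    Returns the diagonal-block factors, the factored border blocks, and
    the factor of the last diagonal block.
    """
    L_blocks, W_blocks = [], []
    schur = C.copy()
    for A_k, B_k in zip(A_blocks, borders):
        # Step 1: factor an independent diagonal block (no communication;
        # one processor owns each block).
        L_k = np.linalg.cholesky(A_k)
        # Step 2: factor the border block, W_k = B_k * L_k^{-T}
        # (a triangular solve in a real implementation).
        W_k = np.linalg.solve(L_k, B_k.T).T
        # Step 3: accumulate this block's update to the last diagonal block;
        # on a distributed-memory machine this is the communicated quantity.
        schur -= W_k @ W_k.T
        L_blocks.append(L_k)
        W_blocks.append(W_k)
    # Step 4: factor the updated last diagonal block.
    L_C = np.linalg.cholesky(schur)
    return L_blocks, W_blocks, L_C

Because each loop iteration touches only its own diagonal and border blocks, the iterations map directly onto the per-processor work shown in Figure 1; only the accumulation into the last block crosses processor boundaries.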
When A is a sparse symmetric positive definite matrix, a
specialized form of LU factorization, Choleski factorization, can be
used to determine L such that A = LL^T. All features of
parallel block-diagonal-bordered sparse linear solvers are applicable
to parallel Choleski factorization; the only modifications to the
algorithm are to limit calculations to the lower triangular portion
of the symmetric matrix and to calculate only L, instead of both L
and U.
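As a concrete illustration of that restriction, the sketch below is a generic dense column-oriented Choleski routine (not the solver developed in this research); it computes only L and references only the lower triangular entries of A.

import numpy as np

def cholesky_lower(A):
    """Dense Choleski factorization returning only the lower factor L.

    Only the lower triangle of the symmetric matrix A is referenced,
    and no upper factor U is formed, since U = L^T for a symmetric
    positive definite matrix.
    """
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: subtract the contributions of previous columns.
        d = A[j, j] - L[j, :j] @ L[j, :j]
        L[j, j] = np.sqrt(d)
        # Sub-diagonal entries of column j (lower triangle only).
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L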
Parallel block-diagonal-bordered sparse linear solvers require modifications to the traditional sparse matrix preprocessing phase for Choleski factorization, which consists of ordering the matrix to minimize the number of calculations and performing symbolic factorization to identify the location of all fillin so that static data structures can be used [7]. Parallel block-diagonal-bordered sparse linear solvers must include a specialized ordering step coupled with an explicit load balancing step, so that the original matrix is placed in block-diagonal-bordered form in a manner that minimizes the additional calculations due to fillin during factorization and distributes the workload uniformly across a distributed-memory multi-processor. Our research has shown that the lower right-hand diagonal block in a block-diagonal-bordered ordered power system matrix is not extremely sparse, so it should be solved using dense techniques to take advantage of regular memory access.
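As an illustration of an explicit load balancing step of this kind, the sketch below assigns mutually independent diagonal blocks to processors with a simple greedy largest-block-first heuristic; the workload estimates and the heuristic itself are placeholders chosen for illustration, not the ordering and load balancing algorithm developed in this research.

def balance_blocks(block_workloads, num_procs):
    """Greedy assignment of independent diagonal blocks to processors.

    block_workloads : estimated factorization cost of each independent
                      block (a placeholder metric, e.g. number of
                      nonzeros or predicted floating-point operations).
    Returns a list mapping each processor to the blocks it owns.
    """
    # Sort blocks by decreasing workload, then repeatedly give the next
    # block to the currently least-loaded processor.
    order = sorted(range(len(block_workloads)),
                   key=lambda k: block_workloads[k], reverse=True)
    loads = [0.0] * num_procs
    assignment = [[] for _ in range(num_procs)]
    for k in order:
        p = loads.index(min(loads))
        assignment[p].append(k)
        loads[p] += block_workloads[k]
    return assignment

# Example: eleven independent blocks distributed over four processors.
print(balance_blocks([40, 35, 30, 22, 20, 18, 15, 12, 10, 8, 5], 4))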