The most significant aspect of parallel sparse Choleski factorization is that the sparsity structure can be exploited to offer more parallelism than is available with dense matrix solvers. Parallelism in dense matrix factorization is achieved by distributing the data so that the calculations in one of the for loops can be performed in parallel. Sparse factorization algorithms have too few calculations in any single row or column for that form of parallelism to be efficient; however, sparse matrices offer additional parallelism as a result of the nature of the data and the precedence rules governing the order of calculations. Instead of parallelizing a single for loop as in parallel dense matrix factorization, entire independent portions of a sparse matrix can be factored in parallel, especially when the sparse matrix has been ordered into block-diagonal-bordered form. The description of parallelism presented here has some features in common with the elimination graphs and super-nodes described in [14]; however, this work adds a two-dimensional flavor to these concepts. Provided that the matrix can be ordered into block-diagonal-bordered form, the parallel sparse Choleski algorithm can reap additional benefits, such as the elimination of task graphs for distributed-memory multi-processor implementations. Minimizing or eliminating task graphs is significant because, for more conventional parallel sparse Choleski solvers, the task graph can contain as much information as the representation of the sparse matrix itself [8].
There are several distinct ways to examine the available parallelism in block-diagonal-bordered form matrices. The first is to consider the graph of the matrix. Figure 6 represents the form of a graph with four mutually independent sub-matrices (subgraphs) interconnected by shared coupling equations. No graph node in a subgraph has an interconnection to another subgraph except through the coupling equations. It should be intuitive that the data in columns associated with nodes in a subgraph can be factored independently up to the point where the coupling equations are factored. While this graph offers intuition into the available parallelism in block-diagonal-bordered sparse matrices, the theoretical mathematics of matrix partitioning identifies the available parallelism in this sparse matrix form precisely. We begin by partitioning the block-diagonal-bordered matrix.
Figure 6: Graph with Four Independent Sub-Matrices
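To make the structure of Figure 6 concrete, the following sketch builds the zero/nonzero pattern of such a matrix in Python/NumPy; the block sizes and border width are illustrative assumptions rather than values from the paper.

```python
import numpy as np

# Four mutually independent diagonal blocks, coupled only through a shared
# border of coupling equations in the last rows and columns (cf. Figure 6).
block_sizes = [3, 3, 3, 3]        # illustrative sub-matrix sizes
m = 2                             # border width (number of coupling equations)
n = sum(block_sizes) + m

pattern = np.zeros((n, n), dtype=int)
start = 0
for size in block_sizes:
    stop = start + size
    pattern[start:stop, start:stop] = 1   # an independent diagonal block
    pattern[n - m:, start:stop] = 1       # lower border: coupling to this block
    pattern[start:stop, n - m:] = 1       # upper border (symmetric)
    start = stop
pattern[n - m:, n - m:] = 1               # the last (coupling) diagonal block

print(pattern)   # the zero blocks show that the sub-matrices share no equations
```

The zero blocks between the sub-matrices are precisely what allows their columns to be factored without reference to one another until the coupling equations are reached.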
Define a partition of $A$ as

$$
A \;=\; \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
  \;=\; L L^T ,
\qquad
L \;=\; \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} ,
\tag{7}
$$

where:

- $A_{11}$ and $L_{11}$ are of size $k \times k$,
- $A_{12}$ and $L_{21}^T$ are of size $k \times m$,
- $A_{21}$ and $L_{21}$ are of size $m \times k$,
- $A_{22}$ and $L_{22}$ are of size $m \times m$,

with $k + m = n$, the order of $A$. The Schur complement of the partitioned matrices in equation 7 can be calculated by simply performing the matrix multiplication on the right-hand side:

$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\;=\;
\begin{bmatrix} L_{11} L_{11}^T & L_{11} L_{21}^T \\ L_{21} L_{11}^T & L_{21} L_{21}^T + L_{22} L_{22}^T \end{bmatrix} .
\tag{8}
$$
By equating blocks in equation 8, we can easily identify how to solve for the partitions:

$$
L_{11} L_{11}^T = A_{11} ,
$$
$$
L_{21} = A_{21} \left( L_{11}^T \right)^{-1} ,
$$
$$
L_{22} L_{22}^T = A_{22} - L_{21} L_{21}^T .
$$

In other words, $A_{11}$ is factored first, the border $L_{21}$ is obtained from $A_{21}$ and the factor of $A_{11}$, and the last diagonal block is factored only after it has been updated by $L_{21} L_{21}^T$.
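As a sketch of these three partition-level steps, the routine below applies them to small dense NumPy blocks; the name `bordered_cholesky`, the dense-array interface, and the random test matrix are illustrative assumptions, since the solvers discussed here operate on sparse matrices.

```python
import numpy as np

def bordered_cholesky(A11, A21, A22):
    """Factor a 2x2 partitioned SPD matrix following the block equations above."""
    L11 = np.linalg.cholesky(A11)                  # L11 L11^T = A11
    L21 = np.linalg.solve(L11, A21.T).T            # L21 = A21 (L11^T)^{-1}
    L22 = np.linalg.cholesky(A22 - L21 @ L21.T)    # L22 L22^T = A22 - L21 L21^T
    return L11, L21, L22

# Quick check on a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
k = 4                                              # size of the A11 partition
L11, L21, L22 = bordered_cholesky(A[:k, :k], A[k:, :k], A[k:, k:])
L = np.block([[L11, np.zeros((k, A.shape[0] - k))], [L21, L22]])
assert np.allclose(L @ L.T, A)
```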
Before we can proceed and prove the theorem that the $LL^T$ factors of a block-diagonal-bordered (BDB) symmetric sparse matrix are also in block-diagonal-bordered form, we must define additional matrix partitions in the desired form and prove a Lemma that the $LL^T$ factors of a block-diagonal (BD) matrix are also in block-diagonal form. At this point, we must define additional partitions of $A_{11}$ and $A_{21}$ that represent the block-diagonal-bordered nature of the original $A$ matrix:

$$
A_{11} = \begin{bmatrix} A_1 & & \\ & \ddots & \\ & & A_p \end{bmatrix} ,
\qquad
A_{21} = \begin{bmatrix} \hat{A}_1 & \hat{A}_2 & \cdots & \hat{A}_p \end{bmatrix} ,
$$

where each diagonal block $A_i$ corresponds to one of the mutually independent sub-matrices and each border block $\hat{A}_i$ holds the coupling equations that tie sub-matrix $i$ to the last diagonal block $A_{22}$.
Lemma --- The factors of a block-diagonal matrix are also in block-diagonal form
Proof:
Let a block-diagonal matrix with two diagonal blocks, together with a general block lower-triangular factor, be written as:

$$
\begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}
=
\begin{bmatrix} L_1 & 0 \\ \tilde{L}_{21} & L_2 \end{bmatrix}
\begin{bmatrix} L_1^T & \tilde{L}_{21}^T \\ 0 & L_2^T \end{bmatrix} .
\tag{17}
$$

By applying the Schur complement to equation 17, we obtain:

$$
A_1 = L_1 L_1^T
$$

and

$$
0 = L_1 \tilde{L}_{21}^T .
$$

If $A_1$ is non-singular and has a numerical $LL^T$ factor, then $L_1$ and $L_1^{-1}$ must exist and be non-zero: thus

$$
\tilde{L}_{21}^T = L_1^{-1} \, 0 = 0 ,
$$

and the factor is block diagonal.
This lemma can be applied recursively to a block-diagonal matrix with any number of diagonal blocks to prove that the factorization of a block-diagonal matrix preserves the block structure.
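The lemma can be checked numerically; the sketch below (arbitrary block sizes and values, chosen only for illustration) confirms that the Choleski factor of a block-diagonal matrix is the block-diagonal assembly of the factors of its individual blocks.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
blocks = []
for size in (3, 2, 4):                       # three SPD diagonal blocks
    M = rng.standard_normal((size, size))
    blocks.append(M @ M.T + size * np.eye(size))

A = block_diag(*blocks)                      # block-diagonal matrix
L = np.linalg.cholesky(A)                    # factor of the whole matrix
L_blockwise = block_diag(*[np.linalg.cholesky(B) for B in blocks])

# The factor is itself block diagonal: factoring block-by-block is identical.
assert np.allclose(L, L_blockwise)
```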
Theorem --- The factors of a block-diagonal-bordered matrix are also in block-diagonal-bordered form. To restate this theorem, we must show that

$$
L = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} ,
\qquad
L_{11} = \begin{bmatrix} L_1 & & \\ & \ddots & \\ & & L_p \end{bmatrix} ,
\qquad
L_{21} = \begin{bmatrix} \hat{L}_1 & \hat{L}_2 & \cdots & \hat{L}_p \end{bmatrix} ,
$$

that is, that $L_{11}$ is block diagonal and the border $L_{21}$ is partitioned conformally with the diagonal blocks.
Proof:
First, the matrix partitions $A_{11}$ and $A_{21}$ have simply been further partitioned to match the sizes of the diagonal blocks. Meanwhile, the matrix partition $A_{22}$ has been left unchanged. In the lemma, we proved that the factors of $A_{11}$ are block-diagonal if $A_{11}$ is block-diagonal. Consequently,

$$
L_{11} = \begin{bmatrix} L_1 & & \\ & \ddots & \\ & & L_p \end{bmatrix}
\quad \text{with } L_i L_i^T = A_i ,
\qquad
L_{21} = A_{21} \left( L_{11}^T \right)^{-1}
       = \begin{bmatrix} \hat{A}_1 (L_1^T)^{-1} & \cdots & \hat{A}_p (L_p^T)^{-1} \end{bmatrix} ,
$$

and the factor $L$ retains the block-diagonal-bordered form of $A$.
As a result of this theorem, it is relatively straightforward to identify the available parallelism by performing the matrix multiplication in a manner similar to the Schur complement, which yields:

$$
L_i L_i^T = A_i , \qquad i = 1, \ldots, p ,
$$
$$
\hat{L}_i = \hat{A}_i \left( L_i^T \right)^{-1} , \qquad i = 1, \ldots, p ,
$$
$$
L_{22} L_{22}^T = A_{22} - \sum_{i=1}^{p} \hat{L}_i \hat{L}_i^T .
$$

Each diagonal block $A_i$ and its border $\hat{A}_i$ can therefore be factored independently of every other block; only the update and factorization of the last diagonal block $A_{22}$ requires contributions from all of the blocks.
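The loop below sketches this block-wise factorization on dense NumPy blocks; the function name `bdb_factor` and its list-of-blocks interface are illustrative assumptions. In a distributed-memory implementation, each loop iteration would run on the processor that owns the corresponding diagonal block and border section, and only the accumulated update to the last block would have to be communicated.

```python
import numpy as np

def bdb_factor(diag_blocks, border_blocks, A_last):
    """Factor a block-diagonal-bordered SPD matrix given as block lists."""
    L_diag, L_border = [], []
    update = np.zeros(A_last.shape)
    # Each iteration is independent of the others -- this loop is the
    # parallelism exposed by the block-diagonal-bordered ordering.
    for A_i, B_i in zip(diag_blocks, border_blocks):
        L_i = np.linalg.cholesky(A_i)            # L_i L_i^T = A_i
        Lb_i = np.linalg.solve(L_i, B_i.T).T     # border block: B_i (L_i^T)^{-1}
        L_diag.append(L_i)
        L_border.append(Lb_i)
        update += Lb_i @ Lb_i.T                  # only this sum must be combined
    L_last = np.linalg.cholesky(A_last - update) # factor the coupling block last
    return L_diag, L_border, L_last
```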
This derivation identifies the parallelism in the Choleski factorization step of a block-diagonal-bordered sparse matrix. The parallelism in the forward reduction and backward substitution steps also benefits from the aforementioned data/processor distribution. Let $y$ denote the intermediate vector from the forward reduction $Ly = b$ and $x$ the solution from the backward substitution $L^T x = y$, partitioned conformally with the matrix blocks. By assigning the data in a matrix block and its associated border section to the same processor, no communications would be required in the forward reduction phase until the right-hand side of the last block of the factored matrix, $L_{22}$, is updated by the products of the dense vector partitions $y_i$ and the sparse border matrices $\hat{L}_i$ ($\hat{L}_i y_i$). No communications are required in the backward substitution phase after the values of the last solution partition, $x_{22}$, are broadcast to all processors holding the matrix blocks $L_i$ and $\hat{L}_i$ ($L_i^T x_i = y_i - \hat{L}_i^T x_{22}$).
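The sketch below mirrors that reduction/substitution pattern on dense blocks; the names `bdb_solve`, `L_diag`, `L_border`, and `L_last` are illustrative assumptions that match the factorization sketch above, and the comments mark the only points where data would have to move between processors.

```python
import numpy as np
from scipy.linalg import solve_triangular

def bdb_solve(L_diag, L_border, L_last, b_blocks, b_last):
    """Forward reduction and backward substitution on the block factors."""
    # Forward reduction, L y = b: each diagonal block proceeds independently.
    y_blocks = [solve_triangular(L_i, b_i, lower=True)
                for L_i, b_i in zip(L_diag, b_blocks)]
    # Communication point 1: the last block's right-hand side collects the
    # products L_border[i] @ y_blocks[i] from all processors.
    rhs_last = b_last - sum(Lb_i @ y_i for Lb_i, y_i in zip(L_border, y_blocks))
    y_last = solve_triangular(L_last, rhs_last, lower=True)

    # Backward substitution, L^T x = y.
    x_last = solve_triangular(L_last.T, y_last, lower=False)
    # Communication point 2: broadcast x_last; afterwards every block
    # again proceeds independently.
    x_blocks = [solve_triangular(L_i.T, y_i - Lb_i.T @ x_last, lower=False)
                for L_i, Lb_i, y_i in zip(L_diag, L_border, y_blocks)]
    return x_blocks, x_last
```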
Figure 7 illustrates both the Choleski factorization steps and the reduction/substitution steps for a block-diagonal-bordered sparse matrix. In this figure, the strictly lower triangular portion of the matrix is $L$, and the strictly upper triangular portion of the matrix is $L^T$. This figure depicts four diagonal blocks, and processor assignments (P1, P2, P3, and P4) are listed with each data block.
Figure 7: Block Bordered Diagonal Form Sparse Matrix Solution Steps