The most significant aspect of parallel sparse LU factorization is that the sparsity structure can be exploited to offer more parallelism than is available with dense matrix solvers. Parallelism in dense matrix factorization is achieved by distributing the data so that the calculations in one of the for loops can be performed in parallel. Sparse factorization algorithms have too few calculations in any single row or column for this form of parallelism to be efficient; however, sparse matrices offer additional parallelism as a result of the nature of the data and the precedence rules governing the order of calculations. Instead of parallelizing a single for loop as in parallel dense matrix factorization, entire independent portions of a sparse matrix can be factored in parallel, especially when the sparse matrix has been ordered into block-diagonal-bordered form. Provided that a matrix can be ordered into block-diagonal-bordered form, the parallel sparse LU algorithm can reap additional benefits, such as the elimination of task graphs for distributed-memory multi-processor implementations. Minimizing or eliminating task graphs is significant because the task graph can contain as much information as the representation of the sparse matrix itself in more conventional parallel sparse LU solvers [13].
There are several distinct ways to examine the available parallelism
in block-diagonal-bordered form matrices. The first way to consider
available parallelism in a block-diagonal-bordered sparse matrix is to
examine the graph of the matrix. Figure 10 represents the
form of a graph with four mutually independent sub-matrices
(subgraphs) interconnected by shared coupling equations. No graph
node in a subgraph has an interconnection to another subgraph except
through the coupling equations. It should be intuitive that data in
columns associated with nodes in subgraphs can be factored
independently up to the point where the coupling equations are
factored. The description of parallelism presented here is closely
related to the concept of elimination graphs and supernodes described
in [20]. A block-diagonal-bordered form sparse matrix can be
represented by an elimination tree with supernodes at only two levels.
Supernodes form the elimination tree leaves, with another supernode as
the root of the tree. By simply restructuring the graph
presented in figure 10, it is possible to represent the same
concept as a tree. An elimination tree for a block-diagonal-bordered
form matrix with four supernodes as leaves and a single supernode as
the tree's root is presented in figure 11.
Figure 10: Graph with Four Independent Sub-Matrices
Figure 11: Elimination Tree with Four Supernode Leaves
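To make this structure concrete, the following sketch assembles a small sparse matrix of the form shown in figure 10: four mutually independent diagonal blocks coupled only through a shared border. The block sizes, densities, and variable names are illustrative assumptions, not taken from the implementation described in this paper.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Four mutually independent diagonal blocks (the sub-graphs of figure 10);
# the added identity keeps each block comfortably non-singular.
blocks = [sp.random(3, 3, density=0.6, random_state=rng) + 3 * sp.eye(3)
          for _ in range(4)]

# Border strips: every block couples to the border (coupling equations),
# but never directly to another diagonal block.
n_border = 2
right = sp.random(12, n_border, density=0.3, random_state=rng)
lower = sp.random(n_border, 12, density=0.3, random_state=rng)
corner = sp.random(n_border, n_border, density=0.9, random_state=rng) \
         + 3 * sp.eye(n_border)

A = sp.bmat([[sp.block_diag(blocks), right],
             [lower, corner]]).tocsr()

print((A.toarray() != 0).astype(int))  # block-diagonal-bordered nonzero pattern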
While an elimination graph offers intuition into the available parallelism in block-diagonal-bordered sparse matrices, it is possible to examine the theoretical mathematics of matrix partitioning to clearly identify available parallelism in this sparse matrix form. Define a partition of $A$ as

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} = LU = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}, \qquad (7)$$

where $A_{11}$, $L_{11}$, and $U_{11}$ are of size $(n-m) \times (n-m)$; $A_{12}$ and $U_{12}$ are of size $(n-m) \times m$; $A_{21}$ and $L_{21}$ are of size $m \times (n-m)$; and $A_{22}$, $L_{22}$, and $U_{22}$ are of size $m \times m$. The Schur complement of the partitioned matrices in equation 7 can be calculated by simply performing the matrix multiplication on the partitioned $L$ and $U$ factors:

$$A = \begin{bmatrix} L_{11}U_{11} & L_{11}U_{12} \\ L_{21}U_{11} & L_{21}U_{12} + L_{22}U_{22} \end{bmatrix}. \qquad (8)$$

By equating blocks in equation 8, we can easily identify how to solve for the partitions:

$$A_{11} = L_{11}U_{11}, \qquad U_{12} = L_{11}^{-1}A_{12}, \qquad L_{21} = A_{21}U_{11}^{-1}, \qquad L_{22}U_{22} = A_{22} - L_{21}U_{12}.$$
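As an illustrative sanity check (a hypothetical sketch, not part of the original derivation), these four block identities can be verified numerically on a small dense example; the row permutation produced by partial pivoting is folded into the matrix first so that it factors exactly as $LU$:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(1)
n1, n2 = 6, 2                      # sizes of the interior and border partitions
A = rng.standard_normal((n1 + n2, n1 + n2)) + (n1 + n2) * np.eye(n1 + n2)

P, L, U = lu(A)                    # A = P L U
B = P.T @ A                        # row-permuted A, so B = L U exactly

B11, B12 = B[:n1, :n1], B[:n1, n1:]
B21, B22 = B[n1:, :n1], B[n1:, n1:]
L11, L21, L22 = L[:n1, :n1], L[n1:, :n1], L[n1:, n1:]
U11, U12, U22 = U[:n1, :n1], U[:n1, n1:], U[n1:, n1:]

assert np.allclose(B11, L11 @ U11)                        # A11 = L11 U11
assert np.allclose(U12, np.linalg.solve(L11, B12))        # U12 = L11^{-1} A12
assert np.allclose(L21.T, np.linalg.solve(U11.T, B21.T))  # L21 = A21 U11^{-1}
assert np.allclose(L22 @ U22, B22 - L21 @ U12)            # Schur complement
```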
Before we can proceed and prove the theorem that the $L$ and $U$ factors of a block-diagonal-bordered (BDB) position-symmetric sparse matrix are also in block-diagonal-bordered form, we must define additional matrix partitions in the desired form and prove a lemma that the $L$ and $U$ factors of a block-diagonal (BD) matrix are also in block-diagonal form. At this point, we define additional partitions of $A_{11}$, $A_{12}$, and $A_{21}$ that represent the block-diagonal-bordered nature of the original $A$ matrix:

$$A_{11} = \begin{bmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_N \end{bmatrix}, \qquad A_{12} = \begin{bmatrix} A_{12}^{(1)} \\ \vdots \\ A_{12}^{(N)} \end{bmatrix}, \qquad A_{21} = \begin{bmatrix} A_{21}^{(1)} & \cdots & A_{21}^{(N)} \end{bmatrix}.$$
Lemma --- The factors of a block-diagonal
matrix are also in block-diagonal form
Proof:
Let:

$$A = \begin{bmatrix} A_{11} & 0 \\ 0 & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix}. \qquad (17)$$

By applying the Schur complement to equation 17, we obtain:

$$L_{11}U_{12} = A_{12} = 0$$

and

$$L_{21}U_{11} = A_{21} = 0.$$

If $A_{11}$ is non-singular and has a numerical factor, then $L_{11}^{-1}$ and $U_{11}^{-1}$ must exist and be non-zero; thus $U_{12} = L_{11}^{-1} \cdot 0 = 0$ and $L_{21} = 0 \cdot U_{11}^{-1} = 0$, and the factors of $A$ are block-diagonal.
This lemma can be applied recursively to a block-diagonal matrix with any number of diagonal blocks to prove that the factorization of a block-diagonal matrix preserves the block structure.
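The lemma can be illustrated numerically (a hypothetical sketch, not from the original text): factor a block-diagonal matrix and confirm that the off-diagonal partitions of $L$ and $U$ are exactly zero. Partial pivoting cannot select a pivot outside the current diagonal block, because every candidate entry there is zero, so the block structure survives.

```python
import numpy as np
from scipy.linalg import lu, block_diag

rng = np.random.default_rng(2)

# Two well-conditioned diagonal blocks; exact zeros everywhere else.
A1 = rng.standard_normal((3, 3)) + 4 * np.eye(3)
A2 = rng.standard_normal((4, 4)) + 4 * np.eye(4)
A = block_diag(A1, A2)

P, L, U = lu(A)                    # A = P L U, pivoting confined to each block
assert np.allclose(L[3:, :3], 0)   # L21 = 0: L is block-diagonal
assert np.allclose(U[:3, 3:], 0)   # U12 = 0: U is block-diagonal
```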
Theorem --- The factors of a
block-diagonal-bordered matrix are also in block-diagonal-bordered
form. To restate this theorem, we must show that

$$L = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \qquad \text{and} \qquad U = \begin{bmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{bmatrix},$$

where $L_{11}$ and $U_{11}$ are block-diagonal with the same block structure as $A_{11}$.
Proof:
First, the matrix partitions $A_{12}$ and $A_{21}$ have simply been further partitioned to match the sizes of the diagonal blocks, while the matrix partition $A_{22}$ has been left unchanged. In the lemma, we proved that the factors of $A_{11}$ are block-diagonal if $A_{11}$ is block-diagonal. Consequently, $L_{11}$ and $U_{11}$ are block-diagonal, and the factors of the full matrix remain in block-diagonal-bordered form.
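To illustrate the theorem numerically (again a hypothetical sketch; a small unpivoted Doolittle factorization is used so that the structural claim can be checked exactly, and the diagonally dominant test matrix makes that numerically safe):

```python
import numpy as np
from scipy.linalg import block_diag

def lu_nopivot(A):
    """Doolittle LU without pivoting (safe here: the demo matrix is
    diagonally dominant, so no pivot is ever zero or tiny)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])
        U[k + 1:, k] = 0.0
    return L, U

rng = np.random.default_rng(3)
# Block-diagonal interior (two 3x3 blocks) plus a border of width 2.
A11 = block_diag(rng.standard_normal((3, 3)) + 6 * np.eye(3),
                 rng.standard_normal((3, 3)) + 6 * np.eye(3))
A = np.block([[A11, rng.standard_normal((6, 2))],
              [rng.standard_normal((2, 6)),
               rng.standard_normal((2, 2)) + 6 * np.eye(2)]])

L, U = lu_nopivot(A)
assert np.allclose(L @ U, A)
assert np.allclose(L[3:6, :3], 0)   # L11 stays block-diagonal
assert np.allclose(U[:3, 3:6], 0)   # U11 stays block-diagonal
```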
As a result of this theorem, it is relatively straightforward to identify available parallelism by simply performing the matrix multiplication in a manner similar to the Schur complement, obtaining, for $i = 1, \ldots, N$:

$$A_i = L_i U_i, \qquad U_{12}^{(i)} = L_i^{-1} A_{12}^{(i)}, \qquad L_{21}^{(i)} = A_{21}^{(i)} U_i^{-1}, \qquad L_{22}U_{22} = A_{22} - \sum_{i=1}^{N} L_{21}^{(i)} U_{12}^{(i)}.$$

Each diagonal block $A_i$ and its associated border strips $A_{12}^{(i)}$ and $A_{21}^{(i)}$ can therefore be factored independently; only the update of the last diagonal block requires contributions from all blocks.
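The following sketch renders these equations as a serial program (a hypothetical illustration; the function and variable names are assumptions, not the authors' implementation). The loop body is independent across blocks, so on a distributed-memory machine each iteration maps to one processor, and only the Schur-complement accumulation requires communication:

```python
import numpy as np
from scipy.linalg import lu

def bdb_factor(diag_blocks, right_strips, lower_strips, corner):
    """Factor a block-diagonal-bordered matrix held as separate partitions."""
    factors = []
    schur = corner.astype(float).copy()
    for Ai, A12i, A21i in zip(diag_blocks, right_strips, lower_strips):
        Pi, Li, Ui = lu(Ai)                        # Ai = Pi Li Ui (independent)
        U12i = np.linalg.solve(Li, Pi.T @ A12i)    # U12_i = Li^{-1} Pi^T A12_i
        L21i = np.linalg.solve(Ui.T, A21i.T).T     # L21_i = A21_i Ui^{-1}
        schur -= L21i @ U12i                       # the only coupled update
        factors.append((Pi, Li, Ui, U12i, L21i))
    corner_factors = lu(schur)                     # factor the last block
    return factors, corner_factors
```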
This derivation identifies the parallelism in the LU factorization step of a block-diagonal-bordered sparse matrix. The parallelism in the forward reduction and backward substitution steps also benefits from the aforementioned data/processor distribution. By assigning the data in a matrix block and its associated border section to the same processor, no communications would be required in the forward reduction phase (solving $Ly = b$) until the last partition of the vector, $y_2$, is updated by the product of the sparse matrix partition $L_{21}$ and the dense vector partition $y_1$ ($L_{22}y_2 = b_2 - L_{21}y_1$). No communications are required in the backward substitution phase (solving $Ux = y$) after the values of $x_2$ are broadcast to all processors holding the matrix blocks $U_i$ and $U_{12}^{(i)}$ ($U_i x_1^{(i)} = y_1^{(i)} - U_{12}^{(i)} x_2$).
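A matching sketch of the triangular-solve phases, continuing the hypothetical factor layout from the factorization sketch above: every per-block solve is independent, the accumulation of $b_2 - \sum_i L_{21}^{(i)} y_1^{(i)}$ is the lone reduction, and the broadcast of $x_2$ is the lone communication before the final independent block solves.

```python
import numpy as np

def bdb_solve(factors, corner_factors, b_blocks, b_corner):
    """Forward reduction and backward substitution for BDB factors."""
    P22, L22, U22 = corner_factors

    # Forward reduction (Ly = b): per-block solves run in parallel; the
    # border right-hand side is the only value needing a global reduction.
    y1, rhs2 = [], b_corner.astype(float).copy()
    for (Pi, Li, Ui, U12i, L21i), bi in zip(factors, b_blocks):
        yi = np.linalg.solve(Li, Pi.T @ bi)
        y1.append(yi)
        rhs2 -= L21i @ yi                          # reduction across processors
    y2 = np.linalg.solve(L22, P22.T @ rhs2)

    # Backward substitution (Ux = y): once x2 is broadcast, every block
    # finishes independently.
    x2 = np.linalg.solve(U22, y2)
    x1 = [np.linalg.solve(Ui, yi - U12i @ x2)
          for (Pi, Li, Ui, U12i, L21i), yi in zip(factors, y1)]
    return x1, x2
```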
Figure 12 illustrates both the LU factorization steps and the
reduction/substitution steps for a block-diagonal-bordered sparse
matrix. In this figure, the strictly lower triangular portion of the matrix is $L$, and the strictly upper triangular portion of the matrix is $U$. The figure depicts four diagonal blocks, and processor assignments (P1, P2, P3, and P4) are listed with each data block. This figure would represent the block-diagonal-bordered form
matrix and data distribution for the data represented in
figures 10 and 11.
Figure 12: Block Bordered Diagonal Form Sparse Matrix Solution Steps