
Available Parallelism in Block-Diagonal-Bordered Form Matrices

The most significant aspect of parallel sparse Choleski factorization is that the sparsity structure can be exploited to offer more parallelism than is available with dense matrix solvers. Parallelism in dense matrix factorization is achieved by distributing the data so that the calculations in one of the for loops can be performed in parallel. Sparse factorization algorithms have too few calculations in any single row or column for this kind of parallelism to be efficient; however, sparse matrices offer additional parallelism as a result of the nature of the data and the precedence rules governing the order of calculations. Instead of parallelizing a single for loop as in parallel dense matrix factorization, entire independent portions of a sparse matrix can be factored in parallel --- especially when the sparse matrix has been ordered into block-diagonal-bordered form. The description of parallelism presented here has features in common with the elimination graphs and super-nodes described in [14]; however, this work adds a two-dimensional flavor to these concepts. Provided that the matrix can be ordered into block-diagonal-bordered form, the parallel sparse Choleski algorithm can reap additional benefits, such as the elimination of task graphs for distributed-memory multi-processor implementations. Minimizing or eliminating task graphs is significant because the task graph can contain as much information as the representation of the sparse matrix itself for more conventional parallel sparse Choleski solvers [8].

There are several distinct ways to examine the available parallelism in block-diagonal-bordered form matrices. The first is to consider the graph of the matrix. Figure 6 represents the form of a graph with four mutually independent sub-matrices (subgraphs) interconnected only by shared coupling equations; no graph node in a subgraph has an interconnection to another subgraph except through the coupling equations. It should be intuitive that the data in columns associated with nodes in the subgraphs can be factored independently up to the point where the coupling equations are factored. While this graph offers intuition into the available parallelism in block-diagonal-bordered sparse matrices, the theoretical mathematics of matrix partitioning identifies that parallelism precisely. By partitioning the block-diagonal-bordered matrix into

\[
A = \left[ \begin{array}{cc} A_{11} & A_{21}^{T} \\ A_{21} & A_{22} \end{array} \right]
\]

and calculating the Schur complement [4], it is possible to identify the available parallelism by proving a theorem that states that the Choleski factors of a block-diagonal-bordered matrix are also in block-diagonal-bordered form. A supporting lemma, stating that the Choleski factors of a block-diagonal matrix are also in block-diagonal form, is required to complete the proof of the theorem. A similar derivation can be used to identify the parallelism in general LU factorization.

 
Figure 6: Graph with Four Independent Sub-Matrices  
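As a concrete illustration of this structure (not taken from the paper), the following NumPy sketch builds a small dense matrix with four independent diagonal blocks coupled only through a last block of shared equations, factors it, and checks that the factor has non-zero entries only in the diagonal blocks, the lower border, and the last block. The block sizes, the values, and the helper name spd_block are arbitrary choices made for this demonstration.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def spd_block(n):
    """Return a random symmetric positive-definite block of size n x n."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

block_sizes = [3, 3, 2, 2]   # four independent diagonal blocks (cf. Figure 6)
border = 2                   # number of coupling equations (the last block)
n = sum(block_sizes) + border

A = np.zeros((n, n))
offset = 0
for size in block_sizes:
    A[offset:offset+size, offset:offset+size] = spd_block(size)
    # Coupling: each diagonal block touches only the border rows/columns.
    coupling = 0.1 * rng.standard_normal((border, size))
    A[-border:, offset:offset+size] = coupling
    A[offset:offset+size, -border:] = coupling.T
    offset += size
A[-border:, -border:] = spd_block(border)

L = np.linalg.cholesky(A)

# Entries of L linking two different diagonal blocks must be zero, i.e.
# fill-in is confined to the diagonal blocks, the lower border, and the
# last block.
offset = 0
for size in block_sizes:
    assert np.allclose(L[offset+size:-border, offset:offset+size], 0.0)
    offset += size
print("Choleski factor retains block-diagonal-bordered form")
\end{verbatim}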

Define a partition of $A = LL^{T}$ as

\begin{equation}
A \;=\;
\left[ \begin{array}{cc} A_{11} & A_{21}^{T} \\ A_{21} & A_{22} \end{array} \right]
\;=\;
\left[ \begin{array}{cc} L_{11} & 0 \\ L_{21} & L_{22} \end{array} \right]
\left[ \begin{array}{cc} L_{11}^{T} & L_{21}^{T} \\ 0 & L_{22}^{T} \end{array} \right]
\tag{7}
\end{equation}

where:

$A_{11}$ and $L_{11}$ are of size $n_{1} \times n_{1}$,

$A_{21}$ and $L_{21}$ are of size $n_{2} \times n_{1}$,

$A_{21}^{T}$ and $L_{21}^{T}$ are of size $n_{1} \times n_{2}$,

$A_{22}$ and $L_{22}$ are of size $n_{2} \times n_{2}$,

with $n_{1}$ the combined dimension of the independent diagonal blocks and $n_{2}$ the number of coupling equations.

The Schur complement of the partitioned matrices in equation (7) can be identified by simply performing the matrix multiplication on the partitions, which yields:

\begin{equation}
\left[ \begin{array}{cc} A_{11} & A_{21}^{T} \\ A_{21} & A_{22} \end{array} \right]
\;=\;
\left[ \begin{array}{cc}
L_{11}L_{11}^{T} & L_{11}L_{21}^{T} \\
L_{21}L_{11}^{T} & L_{21}L_{21}^{T} + L_{22}L_{22}^{T}
\end{array} \right]
\tag{8}
\end{equation}

By equating blocks in equation (8), we can easily identify how to solve for the partitions of $L$:

\begin{eqnarray*}
A_{11} & = & L_{11}L_{11}^{T} \\
L_{21} & = & A_{21}\left(L_{11}^{T}\right)^{-1} \\
L_{22}L_{22}^{T} & = & A_{22} - L_{21}L_{21}^{T}
\end{eqnarray*}

The last relation is the Schur complement of $A_{11}$ in $A$; it must itself be factored to obtain $L_{22}$.
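These block equations can be checked numerically. The following sketch is an illustration under assumed dense partitions, not the paper's code; the partition sizes n1 and n2 and the random values are arbitrary. It factors $A_{11}$, recovers $L_{21}$ and $L_{22}$ from the equated blocks, and compares the result against the factor of the full matrix.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 5, 3                    # arbitrary partition sizes

# Build a random symmetric positive-definite matrix and partition it.
m = rng.standard_normal((n1 + n2, n1 + n2))
A = m @ m.T + (n1 + n2) * np.eye(n1 + n2)
A11, A21, A22 = A[:n1, :n1], A[n1:, :n1], A[n1:, n1:]

# Solve for the partitions of the factor, block by block.
L11 = np.linalg.cholesky(A11)                # A11 = L11 L11^T
L21 = np.linalg.solve(L11, A21.T).T          # A21 = L21 L11^T
L22 = np.linalg.cholesky(A22 - L21 @ L21.T)  # Schur complement of A11

# The blocks assembled this way reproduce the factor of the full matrix.
L = np.linalg.cholesky(A)
assert np.allclose(L[:n1, :n1], L11)
assert np.allclose(L[n1:, :n1], L21)
assert np.allclose(L[n1:, n1:], L22)
print("block equations verified")
\end{verbatim}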

Before we can proceed to prove the theorem that the factors of a block-diagonal-bordered (BDB) symmetric sparse matrix are also in block-diagonal-bordered form, we must define additional matrix partitions in the desired form and prove a lemma that the factors of a block-diagonal (BD) matrix are also in block-diagonal form. At this point, we must define additional partitions of $A_{11}$, $A_{21}$, $L_{11}$, and $L_{21}$ that represent the block-diagonal-bordered nature of the original matrix:

\begin{eqnarray*}
A_{11} & = & \mbox{diag}\left( A_{1}, A_{2}, \ldots, A_{N} \right) \\
A_{21} & = & \left[ \begin{array}{cccc} A_{N+1,1} & A_{N+1,2} & \cdots & A_{N+1,N} \end{array} \right] \\
A_{22} & = & A_{N+1,N+1} \\
L_{11} & = & \mbox{diag}\left( L_{1}, L_{2}, \ldots, L_{N} \right) \\
L_{21} & = & \left[ \begin{array}{cccc} L_{N+1,1} & L_{N+1,2} & \cdots & L_{N+1,N} \end{array} \right] \\
L_{22} & = & L_{N+1,N+1}
\end{eqnarray*}

where $A_{1}, \ldots, A_{N}$ are the mutually independent diagonal blocks, $A_{N+1,1}, \ldots, A_{N+1,N}$ are the border (coupling) blocks, and $A_{N+1,N+1}$ is the last diagonal block.

Lemma --- The factors of a block-diagonal matrix are also in block-diagonal form

Proof:

Let:

\[
A \;=\;
\left[ \begin{array}{cc} A_{11} & 0 \\ 0 & A_{22} \end{array} \right]
\;=\;
\left[ \begin{array}{cc} L_{11} & 0 \\ L_{21} & L_{22} \end{array} \right]
\left[ \begin{array}{cc} L_{11}^{T} & L_{21}^{T} \\ 0 & L_{22}^{T} \end{array} \right]
\]

By applying the block equations obtained from the Schur complement to this partition, we obtain:

\[
L_{21} \;=\; 0 \cdot \left( L_{11}^{T} \right)^{-1}
\]

and

\[
L_{22}L_{22}^{T} \;=\; A_{22} - L_{21}L_{21}^{T} .
\]

If $A_{11}$ is non-singular and has a numerical factor, then $L_{11}$ and $\left( L_{11}^{T} \right)^{-1}$ must exist and be non-singular; thus $L_{21} = 0$ and the factor is block-diagonal.
This lemma can be applied recursively to a block-diagonal matrix with any number of diagonal blocks to prove that the factorization of a block-diagonal matrix preserves the block structure.
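A quick numerical check of the lemma, assuming small dense blocks with arbitrary values (an illustration only): the factor of an assembled block-diagonal matrix has a zero off-diagonal partition, and its diagonal blocks equal the factors of the individual blocks, which is what allows each block to be factored independently.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

def spd(n):
    """Random symmetric positive-definite block of size n x n."""
    m = rng.standard_normal((n, n))
    return m @ m.T + n * np.eye(n)

A1, A2 = spd(4), spd(3)
A = np.block([[A1, np.zeros((4, 3))],
              [np.zeros((3, 4)), A2]])

L = np.linalg.cholesky(A)
assert np.allclose(L[4:, :4], 0.0)                     # factor stays block-diagonal
assert np.allclose(L[:4, :4], np.linalg.cholesky(A1))  # blocks factor independently
assert np.allclose(L[4:, 4:], np.linalg.cholesky(A2))
print("lemma verified numerically")
\end{verbatim}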

Theorem --- The factors of a block-diagonal-bordered matrix are also in block-diagonal-bordered form. To restate this theorem, we must show that $L_{11}$ is block-diagonal, so that $L$ has non-zero values only within the diagonal blocks, the lower border, and the last block.

Proof:

First, the matrix partitions $A_{11}$ and $A_{21}$ have simply been further partitioned to match the sizes of the diagonal blocks, while the matrix partition $A_{22}$ has been left unchanged. In the lemma, we proved that the factor $L_{11}$ is block-diagonal whenever $A_{11}$ is block-diagonal. Consequently, $L_{11} = \mbox{diag}(L_{1}, \ldots, L_{N})$ and $L$ is in block-diagonal-bordered form.

As a result of this theorem, it is relatively straightforward to identify the available parallelism by simply performing the matrix multiplication in a manner similar to the Schur complement calculation, which yields (a sequential sketch of these steps follows the list):

  1. Diagonal Blocks: $A_{i} = L_{i}L_{i}^{T}, \quad i = 1, \ldots, N$
  2. Lower Border: $L_{N+1,i} = A_{N+1,i}\left(L_{i}^{T}\right)^{-1}, \quad i = 1, \ldots, N$
  3. Last Block: $L_{N+1,N+1}L_{N+1,N+1}^{T} = A_{N+1,N+1} - \sum_{i=1}^{N} L_{N+1,i}L_{N+1,i}^{T}$
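The following is a minimal sequential sketch of these three steps, assuming dense NumPy partitions; the names factor_bdb, diag_blocks, borders, and last_block are invented for the illustration and this is not the paper's implementation. In a parallel implementation each pair $(A_{i}, A_{N+1,i})$ would reside on one processor, and only the partial products $L_{N+1,i}L_{N+1,i}^{T}$ would be communicated before the last block is factored.

\begin{verbatim}
import numpy as np

def factor_bdb(diag_blocks, borders, last_block):
    """Factor a block-diagonal-bordered matrix given by its partitions.

    diag_blocks : the independent diagonal blocks A_1 .. A_N
    borders     : the border blocks A_{N+1,1} .. A_{N+1,N}
    last_block  : the last diagonal block A_{N+1,N+1}
    """
    diag_factors, border_factors = [], []
    schur_update = np.zeros_like(last_block)
    for A_i, B_i in zip(diag_blocks, borders):
        L_i = np.linalg.cholesky(A_i)            # 1. diagonal block
        LB_i = np.linalg.solve(L_i, B_i.T).T     # 2. lower border
        schur_update += LB_i @ LB_i.T            #    partial sum for step 3
        diag_factors.append(L_i)
        border_factors.append(LB_i)
    # 3. last block: factor the Schur complement after all partial sums
    #    (the only point where data from different blocks is combined).
    L_last = np.linalg.cholesky(last_block - schur_update)
    return diag_factors, border_factors, L_last
\end{verbatim}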

If the matrix blocks $A_{i}$ and $A_{N+1,i}$ ($i = 1, \ldots, N$) are assigned to the same processor, then there are no communications until the last block is factored. At that time, only the sums of the sparse matrix products $L_{N+1,i}L_{N+1,i}^{T}$ are sent to the processors that hold the appropriate data in the last block. This assignment of data to processors is similar to that of column-oriented sparse Choleski algorithms, although a significant difference exists with block-diagonal-bordered form matrices: the data have a two-dimensional blocked nature that groups calculations and should permit efficient parallel operations.

This derivation identifies the parallelism in the Choleski factorization step for a block-diagonal-bordered sparse matrix. The parallelism in the forward reduction and backward substitution steps also benefits from the aforementioned data-to-processor distribution. By assigning the data in a matrix block and its associated border section to the same processor, no communications are required in the forward reduction phase until the vector partition associated with the last block is updated by the products of the border sub-matrices $L_{N+1,i}$ and the previously reduced vector partitions. No communications are required in the backward substitution phase after the values of the solution partition associated with the last block are broadcast to all processors holding the matrix blocks $L_{i}$ and $L_{N+1,i}$ ($i = 1, \ldots, N$).
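A matching sketch of the forward reduction and backward substitution for the factor produced above; again an illustration with assumed dense partitions (solve_bdb and its argument names are invented), not the paper's code. The only data exchanged between blocks are the border products gathered for the last partition during forward reduction and the last solution partition broadcast before backward substitution.

\begin{verbatim}
import numpy as np

def solve_bdb(diag_factors, border_factors, L_last, b_parts, b_last):
    """Solve A x = b given the block-diagonal-bordered factor of A."""
    # Forward reduction L y = b: each y_i depends only on its own block;
    # the last partition gathers the border products (the one communication
    # point in a distributed implementation).
    y_parts = [np.linalg.solve(L_i, b_i)
               for L_i, b_i in zip(diag_factors, b_parts)]
    y_last = np.linalg.solve(
        L_last,
        b_last - sum(LB_i @ y_i for LB_i, y_i in zip(border_factors, y_parts)))
    # Backward substitution L^T x = y: once x_last is known (and broadcast,
    # in a parallel setting) every block finishes independently.
    x_last = np.linalg.solve(L_last.T, y_last)
    x_parts = [np.linalg.solve(L_i.T, y_i - LB_i.T @ x_last)
               for L_i, y_i, LB_i in zip(diag_factors, y_parts, border_factors)]
    return x_parts, x_last
\end{verbatim}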

Figure 7 illustrates both the Choleski factorization steps and the reduction/substitution steps for a block-diagonal-bordered sparse matrix. In this figure, the strictly lower triangular portion of the matrix is $L$, and the strictly upper triangular portion of the matrix is $L^{T}$. This figure depicts four diagonal blocks, and processor assignments (P1, P2, P3, and P4) are listed with the data blocks.

 
Figure 7: Block Bordered Diagonal Form Sparse Matrix Solution Steps 


