Available Parallelism

The most significant aspect of parallel sparse LU factorization is that the sparsity structure can be exploited to offer more parallelism than is available with dense matrix solvers. Parallelism in dense matrix factorization is achieved by distributing the data so that the calculations in one of the for loops can be performed in parallel. Sparse factorization algorithms have too few calculations in any single row or column for efficient parallelism of that kind; however, sparse matrices offer additional parallelism as a result of the nature of the data and the precedence rules governing the order of calculations. Instead of parallelizing a single for loop, as in parallel dense matrix factorization, entire independent portions of a sparse matrix can be factored in parallel, especially when the sparse matrix has been ordered into block-diagonal-bordered form.
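
As an illustration of this block-level parallelism, the following sketch factors the mutually independent diagonal blocks of a block-diagonal-bordered matrix concurrently. It is a minimal illustration in Python, not the implementation described in this work; the block sizes and the helper names make_block and factor_block are assumptions made for the example. The coupling (border) equations would be factored afterwards, once the contributions from every block are available.

    import numpy as np
    from scipy.linalg import lu_factor
    from concurrent.futures import ProcessPoolExecutor

    def make_block(n, seed):
        # An illustrative dense, diagonally dominant block standing in for
        # one independent sub-matrix of the block-diagonal portion.
        rng = np.random.default_rng(seed)
        return rng.random((n, n)) + n * np.eye(n)

    def factor_block(block):
        # Each diagonal block touches no data owned by any other block,
        # so its LU factorization requires no synchronization.
        return lu_factor(block)

    if __name__ == "__main__":
        blocks = [make_block(100, seed) for seed in range(4)]   # four independent blocks
        with ProcessPoolExecutor(max_workers=4) as pool:
            factors = list(pool.map(factor_block, blocks))
        # The border (coupling) equations are factored only after the
        # contributions from every diagonal block have been accumulated.
        print(len(factors), "diagonal blocks factored independently")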

Provided that a matrix can be ordered into block-diagonal-bordered form, the parallel sparse LU algorithm can reap additional benefits, such as the elimination of task graphs for distributed-memory multi-processor implementations. Minimizing or eliminating task graphs is significant because the task graph can contain as much information as the representation of the sparse matrix for more conventional parallel sparse LU solvers [18].

The same independence between diagonal blocks in these sparse matrices can also be exploited in our parallel sparse Gauss-Seidel algorithm. While Gauss-Seidel algorithms for dense matrices are inherently sequential, it is possible to identify portions of sparse matrices that have no mutual data dependencies, so calculations can proceed in parallel on mutually independent matrix partitions while maintaining the strict precedence rules of the Gauss-Seidel technique. All parallelism in the Gauss-Seidel algorithm is derived from the interconnection relationships among elements in the matrix. Furthermore, the extremely sparse last diagonal block also has inherent parallelism that can be identified using graph multi-coloring techniques.
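
The following sketch shows how one sweep can respect these dependencies; it is a minimal illustration, not the parallel implementation described here, and the block-by-block storage (diag_blocks, borders, the lower-border rows C) is an assumed layout chosen for the example. Each independent partition is swept with ordinary Gauss-Seidel, and because the partitions share no unknowns, the per-partition sweeps could be dispatched in parallel before the coupling unknowns in the last block are updated.

    import numpy as np

    def sweep_partition(A_kk, B_k, b_k, x_k, x_last):
        # Ordinary Gauss-Seidel inside partition k: strict precedence is kept
        # within the block, using the latest values of its own unknowns and
        # the current values of the coupling unknowns x_last.
        for i in range(len(x_k)):
            s = A_kk[i] @ x_k - A_kk[i, i] * x_k[i] + B_k[i] @ x_last
            x_k[i] = (b_k[i] - s) / A_kk[i, i]

    def sweep_last_block(A_ll, C, b_l, x_parts, x_last):
        # The coupling equations depend on every partition through the lower
        # border rows C[k], so this block is swept after all partitions.
        for i in range(len(x_last)):
            s = A_ll[i] @ x_last - A_ll[i, i] * x_last[i]
            s += sum(C[k][i] @ x_parts[k] for k in range(len(x_parts)))
            x_last[i] = (b_l[i] - s) / A_ll[i, i]

    def gauss_seidel_sweep(diag_blocks, borders, C, A_ll, b_parts, b_l, x_parts, x_last):
        # The partition sweeps are mutually independent and could be run in
        # parallel (for example, one partition per processor).
        for k in range(len(diag_blocks)):
            sweep_partition(diag_blocks[k], borders[k], b_parts[k], x_parts[k], x_last)
        sweep_last_block(A_ll, C, b_l, x_parts, x_last)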

There are several distinct ways to illustrate the available parallelism in block-diagonal-bordered form matrices. One is the graph of the matrix: the first figure below depicts a graph with four mutually independent sub-matrices (subgraphs) interconnected by shared coupling equations. No graph node in a subgraph has an interconnection to another subgraph except through the coupling equations. It should be intuitive that data in the columns associated with the nodes of a subgraph can be factored independently, up to the point where the coupling equations are factored. The description of parallelism presented here for direct methods is also closely related to the concept of elimination graphs and supernodes described in [29]. For the parallel Gauss-Seidel, a concept similar to elimination graphs could be used to depict the precedence in the calculations.
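
This independence can be verified directly from the sparsity pattern. The check below is a minimal sketch under assumed inputs, where partitions is a list of index sets for the subgraphs and border holds the indices of the coupling equations; it simply confirms that no nonzero entry links two different partitions except through the border.

    import numpy as np
    from scipy.sparse import coo_matrix

    def partitions_are_independent(A, partitions, border):
        # Map each row/column index to the partition that owns it.
        owner = {}
        for k, part in enumerate(partitions):
            for idx in part:
                owner[idx] = k
        border = set(border)
        A = coo_matrix(A)
        for i, j in zip(A.row, A.col):
            if i in border or j in border:
                continue            # entries in the coupling rows/columns are allowed
            if owner[i] != owner[j]:
                return False        # a direct link between two different subgraphs
        return True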

A block-diagonal-bordered form sparse matrix can be represented by an elimination or general precedence tree with supernodes at only two levels. Supernodes are collections of nodes in the elimination tree that are treated as a single entity; they form the leaves of the tree, with another supernode as its root. By simply restructuring the graph in the first figure below, it is possible to represent the same concept as a tree. An elimination tree for a block-diagonal-bordered form matrix, with four supernodes as leaves and a single supernode as the root, is shown in the second figure below. Each leaf supernode has an inherent hierarchy of calculations that can be performed independently of the other leaf supernodes. For factorization algorithms, the root supernode has little additional parallelism from independent calculations and would generally be represented as a vertical chain of nodes, a consequence of fillin during factorization. Conversely, for the block-diagonal-bordered parallel Gauss-Seidel method, the last block represents only the interconnection structure within the equations that couple the partitions in the block-diagonal portion of the matrix.
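
In code, this two-level precedence structure reduces to a set of leaf supernodes with no mutual dependencies and a root supernode that must wait for all of them. The sketch below captures only that scheduling constraint; the names and index ranges are illustrative assumptions, not taken from this work.

    from dataclasses import dataclass, field

    @dataclass
    class Supernode:
        name: str
        columns: list                      # column indices owned by this supernode
        children: list = field(default_factory=list)

    def schedule(root):
        # Leaves are mutually independent and may be processed in any order
        # (or in parallel); the root supernode must come last.
        return list(root.children) + [root]

    leaves = [Supernode(f"block{k}", columns=list(range(10 * k, 10 * (k + 1))))
              for k in range(4)]
    root = Supernode("border", columns=list(range(40, 46)), children=leaves)
    print([node.name for node in schedule(root)])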

 
Figure: Graph with Four Independent Sub-Matrices

Figure: Elimination Tree with Four Supernode Leaves

For iterative methods, the reduced interconnectivity in the last diagonal block offers more available parallelism, but at the cost of requiring multiple iterations for a change in the value of any unknown to propagate through the interconnected values. Fillin in a factored matrix increases the number of calculations, but every interconnection theoretically possible in the matrix graph representation has already been accounted for. Unfortunately, there is little parallelism inherent in factoring the last diagonal block, so pipelined techniques similar to those used for dense matrices are required for parallel factorization of the last diagonal block.
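
The parallelism in the last diagonal block for the iterative method is exposed by multi-coloring its graph. The sketch below is a simple greedy coloring, not the ordering algorithm developed in this work; it assigns colors so that unknowns sharing a color have no direct interconnection and can be updated simultaneously within a sweep, with distinct colors processed one after another to preserve the precedence rules.

    import numpy as np
    from scipy.sparse import csr_matrix

    def greedy_color(A_last):
        # Color the (structurally symmetric) graph of the last diagonal block:
        # vertex i is adjacent to vertex j whenever A_last[i, j] is nonzero.
        A = csr_matrix(A_last)
        n = A.shape[0]
        colors = -np.ones(n, dtype=int)
        for i in range(n):
            neighbors = A.indices[A.indptr[i]:A.indptr[i + 1]]
            used = {colors[j] for j in neighbors if colors[j] >= 0}
            c = 0
            while c in used:
                c += 1
            colors[i] = c
        return colors

Within a single Gauss-Seidel sweep over the last block, all unknowns of color 0 can then be updated in parallel, followed by those of color 1, and so on.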







