
Recursive Spectral Bisection-Based Ordering

Recursive spectral bisection can be used to produce block-diagonal-bordered matrices for the parallel sparse matrix solvers. The recursive spectral bisection algorithm was developed to balance computational load while minimizing interprocessor communications. Although this goal is similar to generating block-diagonal-bordered sparse matrices with independent blocks, the ordering technique based on recursive spectral bisection requires two steps to generate matrices in this form. The first step is to apply a multilevel version of the recursive spectral bisection method for partitioning unstructured graphs [8]. This technique computes the eigenvector associated with the smallest non-trivial eigenvalue of the Laplacian matrix of the graph (the Fiedler vector) and uses it to partition the sparse matrix into a predetermined number of subgraphs such that the number of edges connecting the subgraphs is minimized.
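
As a concrete illustration of this first step, the sketch below performs a single level of spectral bisection on a small graph: it forms the Laplacian, computes the eigenvector of the smallest non-trivial eigenvalue, and splits the vertices at its median so the two halves are nearly equal sized. This is only an illustrative Python sketch with a dense eigensolver and an assumed connected graph; the multilevel implementation cited above [8] uses sparse eigensolvers and applies the bisection recursively to reach the predetermined number of partitions.

    import numpy as np

    def spectral_bisect(adj):
        """One level of spectral bisection on a dense 0/1 adjacency matrix.

        Returns two index arrays of nearly equal size.  Assumes the graph
        is connected.  A dense eigensolve is used purely for clarity; a
        production RSB code would use a sparse (Lanczos) eigensolver and
        recurse on each half.
        """
        adj = np.asarray(adj, dtype=float)
        lap = np.diag(adj.sum(axis=1)) - adj   # graph Laplacian L = D - A
        vals, vecs = np.linalg.eigh(lap)       # eigenvalues in ascending order
        fiedler = vecs[:, 1]                   # eigenvector of the smallest
                                               # non-trivial eigenvalue
        order = np.argsort(fiedler)
        half = len(order) // 2
        return order[:half], order[half:]      # split at the median entry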

The second step in this ordering technique requires the extraction of the coupling equations to form the block-diagonal-bordered matrix. Edges that interconnect multiple sub-matrices must be identified, and the corresponding equations moved to the set of coupling equations. The initial recursive spectral bisection partitioning yields approximately equal-sized sub-blocks, and the algorithm for this second step also considers the requirement to keep the sub-blocks balanced as the coupling equations are removed. At the same time, the number of coupling equations must be minimized to ensure good performance of the parallel block-diagonal-bordered sparse matrix solver. A greedy heuristic algorithm has been developed to perform this task while addressing both optimization goals.

The heuristic developed to extract the coupling equations from a graph ordered by recursive spectral bisection repeatedly removes nodes with edges in multiple partitions and places these nodes in a separate set. To minimize the number of coupling equations, nodes are removed in order of the largest number of inter-partition connections. When more than one node has the same number of inter-partition connections, ties are broken by selecting nodes from the partitions with the largest number of remaining nodes. This maintains equal numbers of nodes per independent partition, a criterion of the first step in the algorithm.
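
One possible rendering of this greedy heuristic is sketched below; the data structures and function names are illustrative assumptions, not the author's implementation. The routine repeatedly selects the node with the most edges into other partitions, breaking ties in favour of the largest remaining partition, and moves it to the coupling set until no inter-partition edges remain.

    def extract_coupling_nodes(adjacency, partition):
        """Greedy extraction of coupling equations from an RSB partitioning.

        adjacency -- dict: node -> set of neighbouring nodes
        partition -- dict: node -> partition id
        Returns the set of nodes moved to the coupling (border) block.
        """
        part = dict(partition)                 # working copy; None = moved to border
        coupling = set()

        def cut_degree(n):
            # neighbours still assigned to a *different* partition
            return sum(1 for m in adjacency[n]
                       if part[m] is not None and part[m] != part[n])

        while True:
            candidates = [n for n in part
                          if part[n] is not None and cut_degree(n) > 0]
            if not candidates:
                break                          # no inter-partition edges remain
            sizes = {}
            for n, p in part.items():
                if p is not None:
                    sizes[p] = sizes.get(p, 0) + 1
            # largest cut degree first; ties go to the largest remaining partition
            chosen = max(candidates,
                         key=lambda n: (cut_degree(n), sizes[part[n]]))
            coupling.add(chosen)
            part[chosen] = None                # move the node to the border block
        return coupling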

Examples of recursive spectral bisection-based ordering are presented in figures 4 and 5 for eight and sixteen partitions/processors respectively, again using the BCSPWR09.PSA matrix from the Boeing-Harwell series [4]. Each partition could be assigned to a separate processor, with work proceeding independently until communication is required to send data to the last block before the non-sparse last block is factored. Note that the mutually independent partitions are equal sized, although the number of variables in each equation can vary. The amount of fill-in for these examples is 3,386 and 4,809 respectively, with a substantial portion of the fill-in occurring in the lower right-hand corner.

 
Figure 4: RSB-Based Ordering --- 8 Processors  

 
Figure 5: RSB-Based Ordering --- 16 Processors  

The general performance of recursive spectral bisection-based ordering depends on the number of graph partitions, which is equal to the number of independent blocks. The number of coupling equations and the size of the last block depend on the number of processors, as illustrated in figure 6. As the number of processors increases, so do both the size of the last block and the load imbalance. Load imbalance is defined as the difference between the maximum and minimum numbers of floating point operations per processor divided by the maximum number of floating point operations per processor; it indicates the percentage of time that the processor with the least work is inactive. Load imbalance for recursive spectral bisection-based ordering as a function of the number of processors is presented in figure 7. Even for only two partitions of the matrix there is load imbalance: approximately 20% for two processors and nearly 90% for more than eight processors. While this ordering technique attempts to maintain equal numbers of elements in the graph partitions, the actual number of floating point operations depends on the sub-matrix structure.
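
Stated as a formula, if f_max and f_min are the maximum and minimum per-processor floating point operation counts, the load imbalance is (f_max - f_min) / f_max. A trivial helper (not part of the cited work) makes the definition concrete:

    def load_imbalance(flops_per_processor):
        """(max - min) / max of the per-processor floating point operation counts."""
        worst = max(flops_per_processor)
        return (worst - min(flops_per_processor)) / worst

    # e.g. load_imbalance([100, 95, 80, 60]) == 0.4: the least-loaded
    # processor is idle for roughly 40% of the busiest processor's time.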

 
Figure 6: RSB --- Nodes in Last Block  

 
Figure 7: RSB --- Load Imbalance  





