next up previous
Next: Ordering the Last Up: The Preprocessing Phase Previous: The Preprocessing Phase

Ordering the Matrix into Block-Bordered-Diagonal Form

We require a technique that orders irregular matrices into block-diagonal-bordered form while limiting the number of coupling equations. Minimizing the number of coupling equations minimizes the size of the last diagonal block in a block-diagonal-bordered sparse matrix, and minimizes the amount of broadcast communications required when calculating values of in the last diagonal block. The effects of minimizing the size of the last diagonal block are not all positive. We have found that minimizing the size of the last block can affect potential parallelism if the resulting workload for calculating in the diagonal blocks cannot be distributed uniformly throughout a multi-processor --- in which case there is load imbalance between multi-processors [9]. When determining the optimal ordering for a sparse matrix, the size of the last diagonal block and the subsequent additional communications may be traded for an ordering that yields good load balance in the highly parallel portion of the calculations, especially when using larger numbers of processors.

The method we have chosen to order a sparse matrix into block-diagonal-bordered form is referred to as node-tearing [13], which is a specialized form of diakoptics [5]. We have selected node-tearing nodal analysis because this algorithm determines the natural structure in the matrix while providing the means to minimize the number of coupling equations. With the node-tearing algorithm, we can determine the hierarchical structure in a power system distribution grid solely from the interconnection relationships in the sparse matrices. Tearing here refers to breaking the original problem into smaller sub-problems whose partial solutions can be combined to give the solution of the original problem. Load balancing techniques must be used after the node tearing matrix ordering step to uniformly distribute the processing load onto a multi-processor.

The node-tearing-based ordering algorithm has the ability to adjust the characteristics of the ordering by varying an input parameter. Empirical data is presented later in section 9 for multiple orderings to illustrate the parallel linear solver algorithm performance as a function of input parameters to the node-tearing algorithm.

Load balancing for node-tearing-based ordering can be performed with a simple pigeon-hole type algorithm that uses a metric based on the number of floating point multiply/add operations in a partition, instead of simply using the number of rows per partition. Load balancing examines the number of operations when calculating in the matrix partitions and the number of operations when calculating the sparse matrix vector products in preparation to solve for in the last diagonal block. These metrics do not consider indexing overhead, which can be rather extensive when working with very sparse matrices stored in an implicit form. This algorithm finds an optimal distribution for workload to processors, however, actual disparity in processor workload is dependent on the irregular sparse matrix structure. This algorithm works best when there are minimal disparities in the workloads for independent blocks or when there are significantly more independent blocks than processors. In this instance, the workloads in multiple small blocks can sum to equal the workload in a single block with more computational workload.



next up previous
Next: Ordering the Last Up: The Preprocessing Phase Previous: The Preprocessing Phase



David P. Koester
Sun Oct 22 15:29:26 EDT 1995