We require a technique that orders irregular matrices into
block-diagonal-bordered form while limiting the number of coupling
equations. Minimizing the number of coupling equations minimizes the
size of the last diagonal block in a block-diagonal-bordered sparse
matrix, and minimizes the amount of broadcast communications required
when calculating values of in the last diagonal
block. The effects of minimizing the size of the last diagonal block
are not all positive. We have found that minimizing the size of the
last block can affect potential parallelism if the resulting workload
for calculating
in the diagonal blocks cannot be
distributed uniformly throughout a multi-processor --- in which case
there is load imbalance between multi-processors [9].
When determining the optimal ordering for a sparse matrix, the size of
the last diagonal block and the subsequent additional communications
may be traded for an ordering that yields good load balance in the
highly parallel portion of the calculations, especially when using
larger numbers of processors.
The method we have chosen to order a sparse matrix into block-diagonal-bordered form is referred to as node-tearing [13], which is a specialized form of diakoptics [5]. We have selected node-tearing nodal analysis because this algorithm determines the natural structure in the matrix while providing the means to minimize the number of coupling equations. With the node-tearing algorithm, we can determine the hierarchical structure in a power system distribution grid solely from the interconnection relationships in the sparse matrices. Tearing here refers to breaking the original problem into smaller sub-problems whose partial solutions can be combined to give the solution of the original problem. Load balancing techniques must be used after the node tearing matrix ordering step to uniformly distribute the processing load onto a multi-processor.
The node-tearing-based ordering algorithm has the ability to adjust the characteristics of the ordering by varying an input parameter. Empirical data is presented later in section 9 for multiple orderings to illustrate the parallel linear solver algorithm performance as a function of input parameters to the node-tearing algorithm.
Load balancing for node-tearing-based ordering can be performed with a
simple pigeon-hole type algorithm that uses a metric based on the
number of floating point multiply/add operations in a partition,
instead of simply using the number of rows per partition. Load
balancing examines the number of operations when calculating in the matrix partitions and the number of operations when
calculating the sparse matrix vector products in preparation to solve
for
in the last diagonal block. These metrics do
not consider indexing overhead, which can be rather extensive when
working with very sparse matrices stored in an implicit form. This
algorithm finds an optimal distribution for workload to processors,
however, actual disparity in processor workload is dependent on the
irregular sparse matrix structure. This algorithm works best when
there are minimal disparities in the workloads for independent blocks
or when there are significantly more independent blocks than
processors. In this instance, the workloads in multiple small blocks
can sum to equal the workload in a single block with more
computational workload.