In the previous chapter, we developed the theoretical foundations for
parallel sparse block-diagonal-bordered linear solvers, and we now
will discuss the procedures required to generate the permutation
matrices, , to produce block-diagonal-bordered/multi-colored
sparse matrices so that our parallel algorithms are efficient. To
reach this goal, we must ensure that the parallel algorithms are not
overwhelmed by load-imbalance-overhead and are as efficient as
possible on an architecture. We must reiterate that all parallelism
for our linear solvers is identified from the power systems network
structure during this preprocessing phase. The specifics for the
direct and iterative solvers are discussed separately below.
The preprocessing phase may be incorporated into an optimization framework that is used to produce matrix orderings with optimal overall performance for a particular version of the block-diagonal-bordered sparse matrix factorization algorithm. For this research, the optimization has been performed by hand. We have examined ordered matrices resulting from various values of the input parameter for the node-tearing algorithm, and we have identified the block-diagonal-bordered form sparse matrix with the best empirical run-time performance for each of five sample power system networks. These results are presented in sections 7.1 and 7.2.
This preprocessing phase incurs significantly more overhead than solving a single instance of the sparse matrix; consequently, the use of this technique will be limited to problems that have static matrix structures that can reuse the ordered matrix and load balanced processor assignments multiple times in order to amortize the cost of the preprocessing phase over numerous matrix solutions. The times for solving the linear power systems network matrices in parallel are measured in fractions of a second to seconds; however, the times for the preprocessing stage are at least an order of magnitude greater. The preprocessing phase for direct methods takes longer than the preprocessing phase for iterative methods, due to requirements for multiple minimum degree ordering. The preprocessing phase produces a permutation matrix with which to order the actual matrix, and defines what rows/columns are placed on what processors. The permutation matrix can be calculated with no more information than the graph interconnectivity of the power systems networks, which represent actual power systems generators, power lines, and distribution substations. The infrastructure underlying the power systems network matrices changes infrequently, so the effort to perform a single ordering can be amortized over days or even over weeks.