Critical to the efficient operation of these parallel block-diagonal-bordered direct sparse matrix solvers is the ability to order sparse power systems networks into block-diagonal-bordered form with equal workloads in all processors. In this section, we illustrate that it is possible to order power systems networks to the desired form, and later we present empirical data that show the load-balancing capabilities of the preprocessing phase.
To demonstrate the performance of the graph partitioning algorithm, we present pseudo-images that show the locations of the non-zero values in the sparse matrices, both the original non-zero values and those that would become non-zero due to fillin during factorization. In the following pseudo-images, original non-zero values are represented as black pixels and fillin values are represented by a lighter grey color. A bounding box has been placed around the sparse matrix. These pseudo-images clearly show the block-diagonal-bordered form of the power systems network matrices after the preprocessing phase.
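The distinction between original non-zero values and fillin values can be determined symbolically, before any numerical factorization. The sketch below is a generic symbolic elimination pass, not the dissertation's implementation; the function name, the edge-list input, and the elimination-in-index-order assumption are our own illustrative choices. It reports which entries of a symmetric sparsity pattern become non-zero as nodes are eliminated:

```python
def symbolic_fill(n, edges):
    """Illustrative sketch: return the set of fillin entries (i, j), i < j,
    created when nodes 0..n-1 of a symmetric sparsity pattern are
    eliminated in index order."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    fill = set()
    for v in range(n):
        # Eliminating v pairwise-connects its not-yet-eliminated neighbours.
        higher = sorted(w for w in adj[v] if w > v)
        for a in range(len(higher)):
            for b in range(a + 1, len(higher)):
                x, y = higher[a], higher[b]
                if y not in adj[x]:      # a new edge, i.e. a fillin entry
                    adj[x].add(y)
                    adj[y].add(x)
                    fill.add((x, y))
    return fill
```

The same star graph illustrates why ordering matters: eliminating the hub first (`symbolic_fill(4, [(0, 1), (0, 2), (0, 3)])`) creates three fillin entries, while eliminating the leaves first (`symbolic_fill(4, [(0, 3), (1, 3), (2, 3)])`) creates none.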
We examine the performance of our parallel block-diagonal-bordered LU and Choleski solvers with five separate power systems network matrices.
Our parallel block-diagonal-bordered direct algorithms require that the power systems network matrix be ordered into block-diagonal-bordered form in a manner that yields a minimum of floating point operations and that distributes the workload uniformly across all processors. A single input parameter, the maximum partition size, defines the shape of the matrix after ordering by the node-tearing algorithm; it directly affects the size of the borders and the last diagonal block, the number of floating point operations, and the efficacy of load balancing. To illustrate the capability of the node-tearing-based ordering algorithm, we present a detailed analysis of graph partitioning for the BCSPWR09 power systems network in figure 7.1, with sample ordered matrices for maximum diagonal block sizes of 16, 32, 64, and 96 nodes. For the larger values of maximum partition size, the application of minimum degree ordering within a partition is evident in these figures: the upper left-hand corner of a diagonal block has fewer non-zero values than the lower right-hand corner.
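The idea behind node tearing can be conveyed with a simplified sketch. To be clear, the BFS-based heuristic below is only a stand-in for the actual node-tearing algorithm, and the function name `tear` and the adjacency-list representation are illustrative assumptions: diagonal blocks are grown up to the maximum partition size, and any node that couples two blocks is deferred to the coupling set that forms the borders and last diagonal block:

```python
from collections import deque

def tear(adj, max_size):
    """Simplified node-tearing sketch (not the dissertation's algorithm):
    grow connected blocks by BFS up to max_size nodes, then move every
    node adjacent to a different block into the coupling set."""
    n = len(adj)
    block = [-1] * n          # block id per node, -1 = unassigned
    b = 0
    for s in range(n):
        if block[s] != -1:
            continue
        q, size = deque([s]), 1
        block[s] = b
        while q and size < max_size:
            v = q.popleft()
            for w in adj[v]:
                if block[w] == -1 and size < max_size:
                    block[w] = b
                    size += 1
                    q.append(w)
        b += 1
    # Nodes with a neighbour in another block become coupling equations.
    coupling = {v for v in range(n)
                for w in adj[v] if block[w] != block[v]}
    blocks = [[v for v in range(n) if block[v] == i and v not in coupling]
              for i in range(b)]
    return [blk for blk in blocks if blk], sorted(coupling)
```

On two triangles joined by a single edge, for example, the two interior pairs become independent diagonal blocks and the two endpoints of the joining edge land in the coupling set.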
Detailed statistics for the four example orderings of the BCSPWR09 matrix are presented in a table in the appendix. This table includes the amount of fillin and the number of rows/columns in the borders and last diagonal block of the ordered matrix. The table shows that the ordering with a maximum partition size of 32 has the least fillin, the fewest total operations, and the largest percentage of operations in the mutually independent matrix partitions. Empirical data collected when benchmarking the parallel software implementation on the CM-5 show that this partitioning yields the best parallel direct linear solver performance for this power systems network.
Figure 7.1: BCSPWR09 --- Block-Diagonal-Bordered Form --- Load Balanced for 8 Processors
The pseudo-images in figure 7.1 illustrate that the size of the borders and last diagonal block can be manipulated by varying the maximum partition size. The number of rows/columns in the borders and last diagonal block of these ordered matrices varies from 277 for a maximum partition size of 16 down to 131 for a maximum partition size of 96. Each of these four orderings has been load-balanced for eight processors, and the pseudo-images in figure 7.1 include additional markings to illustrate how the matrix would be distributed to the eight processors, P1 through P8. The metric for load balancing is the number of operations, not the number of columns or rows assigned to a processor. The load-balancing step is simply another permutation of the matrix that keeps rows/columns within partitions together in the same order. Consequently, as the matrix is load-balanced for various numbers of processors, neither the amount of fillin nor the total number of operations changes.
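Balancing on operation counts rather than row counts can be sketched as a greedy longest-processing-time assignment. This is a generic heuristic offered only as an illustration, not the dissertation's load-balancing procedure; the function name `balance` and the per-block operation counts are assumed inputs:

```python
import heapq

def balance(block_ops, n_procs=8):
    """Illustrative greedy sketch: assign each independent diagonal block
    to the currently least-loaded processor, where load is measured in
    factorization operations, not rows or columns."""
    heap = [(0, p) for p in range(n_procs)]   # (load, processor id)
    heapq.heapify(heap)
    assign = {}
    # Place the most expensive blocks first (classic LPT heuristic).
    for blk, ops in sorted(enumerate(block_ops), key=lambda t: -t[1]):
        load, p = heapq.heappop(heap)
        assign[blk] = p
        heapq.heappush(heap, (load + ops, p))
    return assign, max(load for load, _ in heap)
```

Because the assignment only permutes whole partitions among processors, it leaves the sparsity pattern within each partition, and hence the fillin and total operation counts, unchanged.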
Figure 7.2 presents families of curves that illustrate the relationship between the maximum partition size and the size of the borders and last diagonal block when partitioning each of the five power systems networks used in this analysis. The partitioning results for the BCSPWR09 network are very similar to those for the Niagara Mohawk operations network, NiMo-OPS. These matrices are similar in size and have similar numbers of edges per node. Meanwhile, the larger matrices have significantly greater numbers of rows in the border and last diagonal block, and significantly greater variation in the size of the last diagonal block as the maximum partition size changes. This empirical evidence suggests that there are fundamental differences between operational analysis networks and larger planning networks. Additional evidence of these differences is discussed below, both as we present orderings for these matrices and as we discuss the performance of the parallel direct linear solvers.
Figure 7.2: Last Diagonal Block Size after Partitioning
Note that in figure 7.2, the maximum size of the diagonal blocks is inversely related to the size of the last diagonal block. This is intuitive: as diagonal matrix blocks are permitted to grow larger, multiple smaller blocks can be incorporated into a single block. Not only are the smaller blocks consolidated into a single block, but any elements in the coupling equations that are unique to those network partitions are also moved into the larger block. Another interesting aspect of the relationship between the maximum diagonal block size and the size of the last block is that the percentage of non-zeros and fillin in the last diagonal block increases significantly as the size of the last block decreases. The empirical performance data for the parallel solvers show that the best parallel performance is closely correlated with minimum numbers of operations.
In tables in the appendix, we present summary statistics for the remaining power systems networks used in this analysis. In each table, the maximum partition size that yielded the best parallel performance has been identified. In figure 7.3, we provide an accompanying visual reference to the partitioning performance data presented in those tables.
For each power systems network, we present a representation of the matrix after partitioning and load-balancing for 8 processors. The partitioned graphs presented here use the maximum partition size values that yielded the best empirical parallel block-diagonal-bordered direct linear solver performance. The pseudo-images of the block-diagonal-bordered matrices are highlighted to illustrate the manner in which each matrix would be distributed to eight processors, P1 through P8.
Figure 7.3: Block-Diagonal-Bordered Form Matrices --- Load Balanced for 8 Processors
We want to reiterate that the block-diagonal-bordered matrix for the BCSPWR09 network has many similarities with the NiMo-OPS network, and that the EPRI6K matrix has noticeable similarities with the NiMo-PLANS matrix. The BCSPWR09 and NiMo-OPS matrices are operational networks that are homogeneous and have very similar voltage distributions throughout. Meanwhile, the EPRI6K and NiMo-PLANS matrices are from planning applications, and one subsection of each of these networks includes some lower voltage electrical distribution lines. These matrices have enhanced detail in the local area, with less detail in areas distant from the power utility's own network. This causes additional rows/columns in the borders and the last diagonal blocks, but our parallel block-diagonal-bordered direct solvers appear to have little difficulty solving these matrices efficiently. The small, highly connected graph section can be seen at the lower right-hand corner of the EPRI6K and NiMo-PLANS matrices in figure 7.3.