Critical to the efficient operation of these parallel block-diagonal-bordered direct sparse matrix solvers is the ability to order the sparse power systems networks into block-diagonal-bordered form with equal workloads across all processors. In this section, we show the results of ordering power systems networks to the desired block-diagonal-bordered form; later, we present empirical data that illustrate the load-balancing capabilities of this algorithm. We demonstrate the performance of the graph partitioning algorithm with pseudo-images that mark the locations of the non-zero values in the sparse matrices with black pixels. A bounding box has been placed around each sparse matrix. The pseudo-images clearly show both the block-diagonal-bordered form of the power systems network matrices after ordering with diakoptic techniques and the additional multi-colored ordering of the last diagonal block.
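For reference, the sketch below shows one way such a pseudo-image can be produced; it uses SciPy and matplotlib, and the randomly generated matrix is only a stand-in for an actual power systems network matrix.

    import matplotlib.pyplot as plt
    from scipy.sparse import random as sparse_random

    # Stand-in for a power systems network matrix (the real matrices are
    # read from file); each stored entry becomes one black pixel.
    A = sparse_random(200, 200, density=0.02, format="coo")

    fig, ax = plt.subplots(figsize=(4, 4))
    ax.spy(A, markersize=1, color="black")  # black pixel per non-zero value
    ax.set_xticks([])                       # bare bounding box, no tick marks
    ax.set_yticks([])
    plt.savefig("pseudo_image.png", dpi=300)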
Performance of our parallel block-diagonal-bordered LU and Choleski solvers will be examined with five separate power systems network matrices:
Our parallel block-diagonal-bordered Gauss-Seidel algorithm requires that the power systems network matrix be ordered into block-diagonal-bordered form in a manner that yields uniformly distributed workloads across all processors. The algorithm also requires that the last block be multi-colored in order to expose parallelism in this portion of the algorithm. A single input parameter, the maximum partition size, defines the shape of the matrix after ordering by the node-tearing algorithm. Examples of applying the node-tearing algorithm to the BCSPWR09 matrix are presented in figure 7.11, with separate pseudo-images for maximum diagonal block sizes of 32, 96, and 160 nodes. These three sparse matrices in figure 7.11 have been load balanced for 32, 16, and 8 processors, respectively.
Figure 7.11: BCSPWR09 --- Block-Diagonal-Bordered Form for Parallel Gauss-Seidel
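A much-simplified sketch of node-tearing is given below: diagonal blocks are grown by breadth-first search up to the maximum partition size, and nodes that couple two different blocks are then torn into the border, which becomes the last diagonal block. This stand-in omits the contour-tracing refinements of the actual algorithm, and all names in it are our own.

    from collections import deque

    def node_tearing(adj, max_partition_size):
        """Partition a graph (adj: dict node -> set of neighbours) into
        independent diagonal blocks plus a border of coupling nodes."""
        unassigned = set(adj)
        block_of, blocks = {}, []
        while unassigned:
            # Grow one diagonal block by breadth-first search from a seed.
            block, frontier = [], deque([next(iter(unassigned))])
            while frontier and len(block) < max_partition_size:
                v = frontier.popleft()
                if v not in unassigned:
                    continue
                unassigned.discard(v)
                block.append(v)
                block_of[v] = len(blocks)
                frontier.extend(n for n in adj[v] if n in unassigned)
            blocks.append(block)
        # Greedily cover every cut edge with one endpoint: the torn nodes
        # form the border, leaving the diagonal blocks mutually independent.
        border = set()
        for v in adj:
            if v in border:
                continue
            if any(block_of[n] != block_of[v] and n not in border
                   for n in adj[v]):
                border.add(v)
        blocks = [[v for v in b if v not in border] for b in blocks]
        return blocks, sorted(border)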
The pseudo-image for a maximum block size of 160 nodes in figure 7.11 includes additional markings to illustrate how the blocks and border of this matrix would be distributed to eight processors, P1 through P8. The metric for load balancing is the number of operations, not the number of columns assigned to a processor. To avoid clutter within the pseudo-images, no processor assignments are shown for the pseudo-images partitioned with maxima of 32 and 96 nodes.
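The assignment itself can be illustrated with a standard greedy longest-processing-time heuristic, sketched below; this illustrates balancing by operation counts rather than column counts, and is not necessarily the exact assignment procedure used in our implementation.

    import heapq

    def balance_blocks(block_op_counts, num_procs):
        """Assign diagonal blocks to processors so that estimated operation
        counts (not column counts) are as even as possible."""
        heap = [(0, p) for p in range(num_procs)]  # (ops assigned, processor)
        heapq.heapify(heap)
        assignment = [[] for _ in range(num_procs)]
        # Place the most expensive blocks first, always onto the processor
        # with the least work assigned so far.
        for b in sorted(range(len(block_op_counts)),
                        key=block_op_counts.__getitem__, reverse=True):
            ops, p = heapq.heappop(heap)
            assignment[p].append(b)
            heapq.heappush(heap, (ops + block_op_counts[b], p))
        return assignment

For example, balance_blocks([1900, 1200, 850, 600, 400], 2) assigns the blocks with 1900 and 600 operations to one processor (2500 operations) and the remaining three to the other (2450 operations).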
These pseudo-images represent the actual matrix orderings that yielded the best performance for the respective numbers of processors. We have found in this work that there are two significant areas that must be considered for optimal parallel Gauss-Seidel performance on a particular number of processors:
The pseudo-images in figure 7.12 provide an accompanying visual reference to the partitioning performance data presented in the tables that follow. We present a figure for each power systems network that illustrates the ordering after partitioning, multi-coloring, and load balancing for eight processors, P1 through P8. The partitioned graphs presented here use the values of the maximum partition size that yielded the best empirical parallel block-diagonal-bordered Gauss-Seidel performance.
Figure 7.12: Block-Diagonal-Bordered Form for Parallel Gauss-Seidel --- Load Balanced for 8 Processors
It is important to note that the block-diagonal-bordered matrix for the BCSPWR09 network has many similarities with that for the NiMo-OPS network; likewise, the EPRI6K and NiMo-PLANS matrices are notably similar. The BCSPWR09 and NiMo-OPS matrices are for operations networks that are homogeneous and have very similar voltage distributions throughout. The EPRI6K and NiMo-PLANS matrices, by contrast, are from planning applications, and one subsection of each of these networks includes some lower-voltage distribution lines.
Planning networks have enhanced detail in the area that represents the local power systems grid, with less detail in areas distant from the power utility's own network. Depending on the size of this highly connected portion of the graph and the maximum partition size, the size and interconnectivity of the last diagonal block can be affected. As a result, the last diagonal block for planning matrices may require more than four colors when this portion of the matrix is ordered. Algorithmic efficiency is severely hampered when solving the last diagonal block with eight or more colors on 32 processors, as will be shown below. This highly interconnected graph section can be seen at the upper left-hand corner of the matrix and in the last diagonal block after coloring for the EPRI6K and NiMo-PLANS power systems network matrices in figure 7.12.
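The relationship between interconnectivity and the number of colors can be seen with even the simple greedy coloring sketched below (our actual ordering heuristics are more elaborate, but the trend is the same): the more densely the last-block nodes are coupled, the more colors are required, and each additional color adds a synchronization point to the parallel Gauss-Seidel sweep.

    def greedy_color(adj, nodes):
        """Color the last diagonal block so that no two coupled nodes share
        a color; all nodes of one color can then be updated concurrently."""
        color = {}
        for v in nodes:
            used = {color[n] for n in adj[v] if n in color}
            c = 0
            while c in used:  # smallest color not used by any neighbour
                c += 1
            color[v] = c
        return color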
When using additional processors, orderings with smaller values of maximum nodes per partition give better performance because they accept additional rows/columns in the last diagonal block in exchange for better load balance. The dense partition in the upper left-hand corner of these sparse matrices is torn into smaller partitions, with the coupling equations being directed to the last diagonal block.
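This trade-off can be explored directly with the simplified node_tearing sketch above; the random graph and the printed counts below are purely illustrative.

    import random

    random.seed(0)
    # Hypothetical sparse graph standing in for a power network; real
    # networks are far more structured, but the trend is comparable.
    adj = {v: set() for v in range(1000)}
    for v in range(1, 1000):
        for n in random.sample(range(v), min(2, v)):
            adj[v].add(n)
            adj[n].add(v)

    # Smaller maxima give more, smaller diagonal blocks (easier to balance
    # across many processors) at the cost of more coupling rows/columns.
    for max_size in (160, 96, 32):
        blocks, border = node_tearing(adj, max_size)
        print(f"max partition {max_size:3d}: {len(blocks):3d} blocks, "
              f"{len(border):4d} coupling rows/columns")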