
Ordering Power Systems Network Matrices into Block-Diagonal-Bordered Form

Critical to the efficient operation of these parallel block-diagonal-bordered direct sparse matrix solvers is the ability to order the sparse power systems networks into block-diagonal-bordered form with equal workload on all processors. In this section, we show the results of ordering power systems networks into the desired block-diagonal-bordered form; later, we present empirical data that illustrate the load-balancing capabilities of this algorithm. We demonstrate the performance of the graph partitioning algorithm with pseudo-images that mark the locations of the non-zero values in the sparse matrices with black pixels. A bounding box has been placed around each sparse matrix. The pseudo-images clearly show both the block-diagonal-bordered form of the power systems network matrices after ordering with diakoptic techniques and the additional multi-colored ordering of the last diagonal block.

The performance of our parallel block-diagonal-bordered LU and Choleski solvers will be examined with five separate power systems network matrices:

Pseudo-images of the matrices without ordering have been presented in chapter gif to provide a baseline that illustrates the irregular sparse nature of these power systems network matrices. In order to illustrate the utility of the node-tearing ordering algorithm described in appendix gif and the graph multi-coloring algorithm described in appendix gif, we present a detailed analysis of graph partitioning for the parallel block-diagonal-bordered sparse Gauss-Seidel algorithm using the BCSPWR09 power systems network.

Our parallel block-diagonal-bordered Gauss-Seidel algorithm requires that the power systems network matrix be ordered into block-diagonal-bordered form in a manner that yields uniformly distributed workloads across all processors. The algorithm also requires that the last block be multi-colored in order to expose parallelism in this portion of the algorithm. A single input parameter, the maximum partition size, defines the shape of the matrix after ordering by the node-tearing algorithm. Examples of applying the node-tearing algorithm to the BCSPWR09 matrix are presented in figure 7.11, with separate pseudo-images for maximum diagonal block sizes of 32, 96, and 160 nodes. These three sparse matrices in figure 7.11 have been load balanced for 32, 16, and 8 processors, respectively.
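The core idea of the node-tearing step can be pictured with a small sketch: grow diagonal blocks up to a maximum partition size, then move the coupling nodes (those with edges into other blocks) to the last diagonal block. This is only an illustrative sketch, not the thesis's actual node-tearing implementation, and the names (`tear_graph`, `max_partition_size`) are hypothetical.

```python
from collections import deque

def tear_graph(adj, max_partition_size):
    """Illustrative node-tearing sketch (hypothetical, simplified).

    adj maps each node to its list of neighbors. Diagonal blocks are
    grown by breadth-first search up to max_partition_size; any node
    adjacent to a different block is a coupling node and is moved to
    the last diagonal block.
    """
    unassigned = set(adj)
    block_of = {}
    blocks = []
    while unassigned:
        seed = next(iter(unassigned))
        block, frontier = [], deque([seed])
        while frontier and len(block) < max_partition_size:
            v = frontier.popleft()
            if v not in unassigned:
                continue  # already claimed by this or an earlier block
            unassigned.discard(v)
            block.append(v)
            block_of[v] = len(blocks)
            frontier.extend(n for n in adj[v] if n in unassigned)
        blocks.append(block)
    # Coupling nodes: any node with an edge into a different block.
    last_block = {v for v in adj
                  if any(block_of[n] != block_of[v] for n in adj[v])}
    diagonal_blocks = [[v for v in b if v not in last_block] for b in blocks]
    return diagonal_blocks, sorted(last_block)
```

After tearing, the surviving diagonal blocks are mutually independent: every edge between two blocks has at least one endpoint in the last diagonal block, which is what permits the blocks to be factored or relaxed in parallel.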

 
Figure 7.11: BCSPWR09 --- Block-Diagonal-Bordered Form for Parallel Gauss-Seidel  

The pseudo-image for a maximum block size of 160 nodes in figure 7.11 includes additional markings to illustrate how the blocks and border of this matrix would be distributed to eight processors --- P1 through P8. The metric for load balancing is the number of operations assigned to a processor, not the number of columns. Because of clutter within the pseudo-images, no processor assignments are shown for the pseudo-images partitioned with maxima of 32 and 96 nodes.
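Balancing by operation count rather than column count can be sketched as a greedy bin-packing: assign the costliest blocks first, each to the currently least-loaded processor. This is only an illustrative heuristic under assumed names (`assign_blocks`, per-block operation counts); the thesis's actual load-balancing procedure may differ.

```python
import heapq

def assign_blocks(block_ops, num_procs):
    """Greedy longest-processing-time sketch (hypothetical).

    block_ops maps a diagonal-block label to its estimated floating
    point operation count. Blocks are assigned largest-first to the
    processor with the smallest accumulated workload.
    """
    heap = [(0, p) for p in range(num_procs)]  # (current load, proc id)
    heapq.heapify(heap)
    assignment = {}
    for block, ops in sorted(block_ops.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)  # least-loaded processor
        assignment[block] = p
        heapq.heappush(heap, (load + ops, p))
    return assignment
```

Because workloads per block are highly uneven in irregular power systems matrices, this op-count criterion spreads work far more evenly than simply dealing out equal numbers of columns.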

These pseudo-images represent the actual matrix orderings that yielded the best performance for the respective numbers of processors. We have found in this work that two significant areas must be considered for optimal parallel Gauss-Seidel performance on a particular number of processors:

The pseudo-images in figure 7.11 illustrate that the sizes of the borders and last diagonal block can be manipulated by varying the value of the maximum partition size. The number of rows/columns in the borders and last diagonal block of these ordered matrices varies from 190 to 117 for maximum partition sizes of 32 and 160, respectively. Statistics are presented in table gif (appendix gif) for six example orderings of the BCSPWR09 matrix. The data in this table are for maximum partition sizes ranging from 16 to 160. For each ordering, only three separate colors were required for the last diagonal block, and as many as 96.8% of the floating point operations are in the diagonal blocks and border. The table also provides a detailed listing of the number of floating point operations for each ordering of the BCSPWR09 power systems network.
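The multi-coloring of the last diagonal block can be illustrated with a simple greedy coloring: rows of the same color have no mutual dependencies, so they can be relaxed concurrently. The function name `color_graph` and the degree-ordered heuristic are assumptions for this sketch, not the thesis's exact coloring algorithm.

```python
def color_graph(adj):
    """Greedy multi-coloring sketch (hypothetical).

    Visits nodes in order of decreasing degree and gives each the
    smallest color not already used by a colored neighbor. Nodes of
    the same color form an independent set, so the corresponding
    Gauss-Seidel updates can proceed in parallel.
    """
    color = {}
    for v in sorted(adj, key=lambda u: -len(adj[u])):
        used = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in used:
            c += 1  # smallest color absent from the neighborhood
        color[v] = c
    return color
```

On last diagonal blocks as loosely coupled as those reported for BCSPWR09, a greedy scheme of this kind typically needs only a handful of colors, which keeps the number of parallel sweep phases small.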

The pseudo-images in figure 7.12 provide an accompanying visual reference to the partitioning performance data presented in tables gif through gif. We present a figure for each power systems network that illustrates the ordering after partitioning, multi-coloring, and load-balancing for eight processors --- P1 through P8. Partitioned graphs presented here have values of the maximum partition size that yielded the best empirical parallel block-diagonal-bordered Gauss-Seidel performance.

 
Figure 7.12: Block-Diagonal-Bordered Form for Parallel Gauss-Seidel --- Load Balanced for 8 Processors 

It is important to note that the block-diagonal-bordered matrix for the BCSPWR09 network has many similarities to that for the NiMo-OPS network. The EPRI6K and NiMo-PLANS matrices likewise have many notable similarities. The BCSPWR09 and NiMo-OPS matrices are for operations networks that are homogeneous and have very similar voltage distributions throughout. Meanwhile, the EPRI6K and NiMo-PLANS matrices are from planning applications, and one subsection of these networks includes some lower-voltage distribution lines.

Planning networks have enhanced detail in the area that represents the local power systems grid, with less detail in areas distant from the power utility's own network. Depending on the size of this highly connected portion of the graph and the maximum partition size, the size and interconnectivity of the last diagonal block can be affected. As a result, the last diagonal block for planning matrices may require more than four colors when this portion of the matrix is ordered. Algorithmic efficiency is severely hampered when solving the last diagonal block with eight or more colors on 32 processors, as will be shown below. The highly interconnected graph section can be seen at the upper left-hand matrix corner and in the last diagonal block after coloring for the EPRI6K and NiMo-PLANS power systems network matrices in figure 7.12.

When using additional processors, orderings with smaller values of the maximum nodes per partition give better performance because they accept additional rows/columns in the last diagonal block in exchange for better load balance. The dense partition in the upper left-hand corner of these sparse matrices is torn into smaller partitions, with the coupling equations being directed to the last diagonal block.






David P. Koester
Sun Oct 22 17:27:14 EDT 1995