We have developed an efficient parallel Gauss-Seidel algorithm for the irregular, sparse matrices that arise in electrical power systems applications. Even though Gauss-Seidel algorithms for dense matrices are inherently sequential, it is possible to identify sparse matrix partitions without data dependencies, so that calculations can proceed in parallel while maintaining the strict precedence rules of the Gauss-Seidel technique. All data parallelism in our Gauss-Seidel algorithm is derived from the actual interconnection relationships between elements in the matrix. We employed two distinct ordering techniques in a preprocessing phase to identify the available parallelism within the matrix structure: node-tearing-based partitioning and graph multi-coloring.
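For reference, the sequential Gauss-Seidel update on which the parallel algorithm builds can be sketched as follows. This is a minimal dense illustration, not the paper's sparse implementation; the key point is that each updated component is used immediately by later rows, which creates the precedence rules the text refers to.

```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Sequential Gauss-Seidel sweep: each updated x[i] is consumed
    immediately by later rows in the same sweep, which is the source
    of the precedence constraints discussed in the text."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        x_prev = x.copy()
        for i in range(n):
            # Off-diagonal contributions use the freshest available values.
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_prev, np.inf) < tol:
            break
    return x
```

Parallelism becomes possible when the sparsity pattern shows that two rows do not reference each other's unknowns, so their updates can proceed concurrently without violating these precedence rules.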
Power system distribution networks are generally hierarchical, with a limited number of high-voltage lines transmitting electricity to connected local networks that eventually distribute power to customers. To ensure reliability, highly interconnected local networks are fed electricity from multiple high-voltage sources. Electrical power grids have graph representations that in turn can be expressed as matrices: electrical buses are graph nodes and matrix diagonal elements, while electrical transmission lines are graph edges that can be represented as non-zero off-diagonal matrix elements.
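This graph-to-matrix correspondence can be made concrete with a small sketch. The five-bus network and the numerical values below are hypothetical, chosen only to show buses landing on the diagonal and lines producing symmetric off-diagonal non-zeros.

```python
import numpy as np

# Hypothetical 5-bus network: buses are graph nodes (matrix diagonal),
# transmission lines are graph edges (non-zero off-diagonal elements).
lines = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
n_buses = 5

A = np.zeros((n_buses, n_buses))
for i, j in lines:
    A[i, j] = A[j, i] = -1.0        # one symmetric off-diagonal pair per line
for i in range(n_buses):
    # Illustrative diagonal derived from node degree, kept dominant so
    # that an iterative solve on this toy matrix would converge.
    A[i, i] = 1.0 - A[i].sum()
```

The non-zero pattern of `A` is exactly the adjacency structure of the network, which is what the ordering techniques in the next section exploit.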
We show that it is possible to identify the hierarchical structure within a power system matrix, using only knowledge of the interconnection pattern, by tearing the matrix into partitions and coupling equations that yield a block-diagonal-bordered matrix. Node-tearing-based partitioning identifies the basic network structure, which provides parallelism for the majority of calculations within a Gauss-Seidel iteration. The last diagonal block represents the interconnection structure within the equations that couple the partitions found in the previous step; without additional ordering, this block would be purely sequential, limiting the potential speedup of the algorithm in accordance with Amdahl's law. We therefore use graph multi-coloring to order this matrix partition and identify the rows that can be solved in parallel.
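The multi-coloring step can be sketched with a simple greedy, first-fit coloring over the graph of the last diagonal block. This is an illustrative heuristic, not necessarily the paper's exact coloring procedure; the invariant it establishes is that rows of the same color share no graph edge (no mutual off-diagonal non-zero) and can therefore be updated concurrently.

```python
def greedy_color(adj):
    """First-fit greedy multi-coloring. `adj` maps each node to the set
    of its neighbors. Rows whose nodes receive the same color are not
    adjacent and can be solved in parallel within a Gauss-Seidel sweep."""
    colors = {}
    for v in sorted(adj):
        used = {colors[u] for u in adj[v] if u in colors}
        c = 0
        while c in used:          # smallest color not used by a neighbor
            c += 1
        colors[v] = c
    return colors
```

A Gauss-Seidel sweep over the colored block then processes colors in sequence, with all rows of one color updated in parallel before moving to the next color.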
We implemented explicit load balancing as part of each of the aforementioned ordering steps to maximize efficiency when the parallel algorithm is applied to real power system load-flow matrices. We attempted to place equal amounts of processing in each partition and in each matrix color. The metric employed when load-balancing the partitions is the number of floating-point multiply/add operations, not simply the number of rows per partition. Empirical performance data collected for the parallel Gauss-Seidel algorithm illustrate the ability to balance the workload for as many as 32 processors.
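One simple way to realize this flop-count-based balancing is a largest-first greedy assignment of partitions to the least-loaded processor. The sketch below is an assumption about the flavor of heuristic involved, not the paper's actual load-balancing procedure; it illustrates weighting by multiply/add counts rather than row counts.

```python
import heapq

def balance_partitions(flop_counts, n_procs):
    """Assign partitions to processors by estimated multiply/add counts:
    largest partition first, always onto the least-loaded processor."""
    heap = [(0, p) for p in range(n_procs)]        # (current load, processor)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    for part in sorted(range(len(flop_counts)),
                       key=flop_counts.__getitem__, reverse=True):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(part)
        heapq.heappush(heap, (load + flop_counts[part], proc))
    loads = [load for load, _ in sorted(heap, key=lambda t: t[1])]
    return assignment, loads
```

Balancing on operation counts matters because sparse rows vary widely in their number of non-zeros, so equal row counts can still leave processors with very unequal work.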
We implemented the parallel Gauss-Seidel algorithm on the Thinking Machines CM-5 distributed-memory multi-processor using the Connection Machine active message layer (CMAML). With this communications paradigm, we observed significant improvements in the performance of the algorithm compared to more traditional paradigms that use the standard blocking send and receive functions in conjunction with packing data into communications buffers. To significantly reduce communications overhead and to hide communications behind calculations, we implemented each portion of the algorithm using CMAML remote procedure calls. The communications paradigm used throughout the algorithm is to send a double-precision data value to the destination processor as soon as that value is calculated. The use of active messages greatly simplified the development and implementation of this parallel sparse Gauss-Seidel algorithm.
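The active-message idea can be illustrated with a toy single-process simulation. This is emphatically not the CMAML API; `Processor`, `on_value`, and `send_active_message` are hypothetical names used only to show the paradigm of a message that carries its own handler, so a freshly computed value is consumed on arrival without posted blocking receives or buffer packing.

```python
# Toy simulation of the active-message paradigm (NOT the CMAML API;
# all names here are hypothetical illustrations).
class Processor:
    def __init__(self):
        self.x = {}                      # solution components received so far

    def on_value(self, row, value):
        """Remote handler: record an incoming solution component x[row]."""
        self.x[row] = value

def send_active_message(dest, handler, *args):
    # A real implementation performs a one-sided network send that invokes
    # `handler` on the destination node; here it is a direct call.
    getattr(dest, handler)(*args)

neighbor = Processor()
send_active_message(neighbor, "on_value", 3, 0.125)  # sent as soon as x[3] is known
```

Because each value is dispatched the moment it is computed, communication overlaps with the remaining calculations instead of waiting for a whole buffer to fill.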
Parallel implementations of Gauss-Seidel have generally been developed for regular problems, such as the solution of Laplace's equation by finite differences [3,4], where red-black coloring schemes are used to provide independence in the calculations and some parallelism. This scheme has been extended to multi-coloring for additional parallelism in more complicated regular problems [4]; however, we are interested in the solution of irregular linear systems. There has been some research into applying parallel Gauss-Seidel to circuit simulation problems [12], although that work showed poor parallel speedup potential in a theoretical study. Reference [12] also extended traditional Gauss-Seidel and Gauss-Jacobi methods to waveform relaxation methods that trade overhead and convergence rate for parallelism. A theoretical discussion of parallel Gauss-Seidel methods for power system load-flow problems on an alternating sequential/parallel (ASP) multi-processor is presented in [15]. Other research on parallel Gauss-Seidel methods for power systems applications is presented in [7], although our research differs substantially from that work: we utilize a different matrix ordering paradigm, a different load-balancing paradigm, and a different parallel implementation paradigm. Our work utilizes diakoptics-based matrix partitioning techniques developed initially for a parallel block-diagonal-bordered direct sparse linear solver [9,10]. In reference [9] we examined load-balancing issues associated with partitioning power systems matrices for parallel Choleski factorization.
The paper is organized as follows. In section 2, we introduce the electrical power system applications that are the basis for this work. In section 3, we briefly review the Gauss-Seidel iterative method and then present a theoretical derivation of the parallelism available in Gauss-Seidel for a block-diagonal-bordered sparse matrix. Paramount to exploiting the advantages of this parallel linear solver is the preprocessing phase, which orders the irregular sparse power system matrices and performs load balancing. We discuss the overall preprocessing phase in section 5, and describe node-tearing-based ordering and graph multi-coloring-based ordering in sections 6 and 7, respectively. We describe our parallel Gauss-Seidel algorithm in section 8, including a discussion of the hierarchical data structures used to store the sparse matrices. Analysis of the performance of these ordering techniques for actual power system load-flow matrices from the Boeing-Harwell series, and for a matrix distributed with the Electric Power Research Institute (EPRI) ETMSP software, is presented in section 9; examinations of the convergence of the algorithm are presented along with parallel algorithm performance. We state our conclusions in section 10.