Even though Gauss-Seidel algorithms for dense matrices are inherently sequential, it is possible to identify sparse-matrix partitions with no data dependencies between them, so that calculations can proceed in parallel while maintaining the strict precedence rules of the Gauss-Seidel technique [35,36]. All data parallelism in our Gauss-Seidel algorithm is derived from the actual interconnection relationships among the elements of the matrix. We employed two distinct ordering techniques in a preprocessing phase to identify the available parallelism within the matrix structure:
Node-tearing-based partitioning identifies the underlying network structure that provides parallelism for the majority of the calculations within a Gauss-Seidel iteration. Without additional ordering, however, the last diagonal block would be purely sequential, limiting the potential speedup of the algorithm in accordance with Amdahl's law. This block represents the interconnection structure of the equations that couple the partitions in the block-diagonal-bordered matrix. Graph multi-coloring is used to order this matrix partition and thereby identify the rows that can be solved in parallel, as sketched below.
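To make the multi-coloring idea concrete, the following is a minimal sketch, assuming the last diagonal block is available as a dense array `A` with right-hand side `b`. The helper names (`coupling_graph`, `greedy_multicolor`, `colored_gauss_seidel_sweep`) and the greedy highest-degree-first heuristic are illustrative assumptions, not the authors' implementation; rows that share a color have no mutual coupling and can therefore be updated concurrently, while the colors themselves are processed in sequence to preserve Gauss-Seidel precedence.

```python
import numpy as np

def coupling_graph(A):
    """Symmetric adjacency of rows coupled through nonzero off-diagonals."""
    n = len(A)
    return {i: {j for j in range(n)
                if j != i and (A[i, j] != 0 or A[j, i] != 0)}
            for i in range(n)}

def greedy_multicolor(adj):
    """Greedily color rows so no two coupled rows share a color."""
    color = {}
    for row in sorted(adj, key=lambda r: -len(adj[r])):  # high degree first
        taken = {color[n] for n in adj[row] if n in color}
        c = 0
        while c in taken:
            c += 1
        color[row] = c
    return color

def colored_gauss_seidel_sweep(A, b, x, colors):
    """One sweep: colors run sequentially, rows within a color in parallel."""
    for c in sorted(set(colors.values())):
        # Rows of one color have no mutual coupling, so these updates are
        # independent and could execute concurrently on separate processors.
        for i in (r for r, rc in colors.items() if rc == c):
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = residual / A[i, i]
    return x

# Small usage example on a diagonally dominant system.
A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
b = np.array([1., 2., 3.])
colors = greedy_multicolor(coupling_graph(A))
x = colored_gauss_seidel_sweep(A, b, np.zeros(3), colors)
```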
We implemented explicit load balancing as part of each of these ordering steps to maximize efficiency when the parallel Gauss-Seidel algorithm is applied to real power-system load-flow matrices. We attempted to place equal amounts of processing in each partition and in each matrix color. The metric employed when load balancing the partitions is the number of floating-point multiply/add operations, not simply the number of rows per partition. This load balancing is sufficiently effective that relative speedups greater than 20 have been observed in empirical performance measurements for iterative solvers on a 32-processor Thinking Machines CM-5.
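As a hedged illustration of balancing on operation counts rather than row counts, the sketch below assigns partitions to processors with a greedy longest-processing-time heuristic. The function name `balance_partitions`, the `partition_ops` input (a multiply/add count per partition), and the heuristic itself are assumptions for exposition, not the paper's procedure; the same idea could be applied to the rows within each color of the last diagonal block.

```python
import heapq

def balance_partitions(partition_ops, n_procs):
    """Assign partitions to processors, balancing multiply/add totals."""
    # Min-heap of (assigned multiply/add total, processor id).
    heap = [(0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {}
    # Longest-processing-time heuristic: place the heaviest partitions
    # first, always onto the currently least-loaded processor.
    for part, ops in sorted(partition_ops.items(), key=lambda kv: -kv[1]):
        total, proc = heapq.heappop(heap)
        assignment[part] = proc
        heapq.heappush(heap, (total + ops, proc))
    return assignment

# Example: four partitions with unequal multiply/add counts, 2 processors.
print(balance_partitions({"P0": 900, "P1": 850, "P2": 500, "P3": 450}, 2))
# -> {'P0': 0, 'P1': 1, 'P2': 1, 'P3': 0}  (1350 operations per processor)
```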