While Gauss-Seidel algorithms for dense matrices are inherently
sequential, it is possible to identify portions of sparse matrices
that do not have mutual data dependencies, so calculations can proceed
in parallel on mutually independent matrix partitions while
maintaining the strict precedence rules in the Gauss-Seidel technique.
All parallelism in the Gauss-Seidel algorithm is derived from within
the actual interconnection relationships between elements in the
matrix. Ordering sparse matrices into block-diagonal-bordered form
can offer substantial opportunity for parallelism, because the values
of in entire sparse matrix partitions can be
calculated in parallel without requiring communications. Because the
sparse matrix is a single system of equations, all equations (with
off-diagonal variables) are dependent. Dependencies within the linear
system requires data movement from mutually independent partitions to
those equations that couple the linear system. After we develop the
Gauss-Seidel algorithm for a block-diagonal-bordered matrix, the
optimum data/processor assignments for an efficient parallel
implementation are straightforward.
While much of the parallelism in this algorithm comes from the
block-diagonal-bordered ordering of the sparse matrix, further
ordering of the last diagonal block is required to provide parallelism
in what would otherwise be a purely sequential portion of the
algorithm. The last diagonal block represents the interconnection
structure within the equations that couple the partitions
in the block-diagonal portion of the matrix. These equations are
rather sparse, often with substantially fewer off-diagonal matrix
elements (graph edges) than diagonal matrix elements (graph nodes).
Consequently, it is rather simple to color the graph representing this
portion of the matrix. Separate graph colors represent rows where
can be calculated in parallel, because within a
color, no two nodes have any adjacent edges. For the parallel
Gauss-Seidel algorithm, a synchronization barrier is required between
colors to ensure that all new
values are distributed
to the processors so that the strict precedence relation in the
calculations are maintained.