All parallelism in the Gauss-Seidel algorithm is derived from the
actual interconnection relationships between elements in the
matrix. Ordering sparse matrices into block-diagonal-bordered form
can offer substantial opportunity for parallelism, because the values
of $x$ in entire sparse matrix partitions can be
calculated in parallel without requiring communications. Because the
sparse matrix is a single system of equations, all equations sharing
off-diagonal variables are dependent. Dependencies within the linear
system requires data movement from mutually independent partitions to
those equations that couple the linear system. After we derive the
formulation of the Gauss-Seidel algorithm for a
block-diagonal-bordered matrix, the optimum data/processor assignments
for an efficient parallel implementation are straightforward.
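To make this concrete, the sketch below applies the standard Gauss-Seidel update, $x_i \leftarrow (b_i - \sum_{j \ne i} a_{ij} x_j)/a_{ii}$ using the most recently computed values of $x_j$, to one sweep over a block-diagonal-bordered system. It is a minimal illustration under stated assumptions, not the implementation derived here: the dense NumPy storage, the function name, and the tiny two-block example are choices made for readability, and a real solver would use sparse storage with the blocks distributed across processors.

```python
# Minimal sketch (illustrative assumptions, not the paper's code) of one
# Gauss-Seidel sweep over a block-diagonal-bordered system A x = b.
import numpy as np

def bdb_gauss_seidel_sweep(A, x, b, blocks, border):
    """`blocks` lists the row indices of each mutually independent
    diagonal block; `border` lists the rows of the last diagonal block
    that couple the partitions."""
    # Each diagonal block references only its own unknowns plus the
    # border unknowns, so this loop over blocks could run in parallel
    # with no inter-block communication.
    for idx in blocks:
        for i in idx:
            s = A[i, :] @ x - A[i, i] * x[i]  # sum of off-diagonal terms
            x[i] = (b[i] - s) / A[i, i]
    # The last diagonal block couples the partitions, so it is updated
    # only after the independent blocks have finished.
    for i in border:
        s = A[i, :] @ x - A[i, i] * x[i]
        x[i] = (b[i] - s) / A[i, i]
    return x

# Tiny example: two independent 2x2 blocks coupled by one border equation.
A = np.array([[4., 1., 0., 0., 1.],
              [1., 4., 0., 0., 0.],
              [0., 0., 4., 1., 1.],
              [0., 0., 1., 4., 0.],
              [1., 0., 1., 0., 4.]])
b = np.ones(5)
x = np.zeros(5)
for _ in range(25):
    bdb_gauss_seidel_sweep(A, x, b, blocks=[[0, 1], [2, 3]], border=[4])
print(np.allclose(A @ x, b))  # True: A is diagonally dominant, so GS converges
```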
While much of the parallelism in this algorithm is made explicit by
the block-diagonal-bordered ordering of the
sparse matrix, further ordering of the last diagonal block is required
to provide parallelism in what would otherwise be a purely sequential
portion of the algorithm. The last diagonal block represents the
interconnection structure within the equations that couple the
partitions in the block-diagonal portion of the matrix. These
equations are rather sparse, often with substantially fewer
off-diagonal matrix elements (graph edges) than diagonal matrix
elements (graph nodes). Consequently, it is rather simple to color
the graph representing this portion of the matrix. Separate graph
colors represent rows where $x$ can be calculated in
parallel, because within a color no two nodes share an
edge, and hence there is no precedence among the calculations. For
the parallel Gauss-Seidel algorithm, a synchronization barrier is
required between colors to ensure that all new $x$
values are distributed to the processors so that the strict precedence
relation in the calculations is maintained.
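Continuing the assumptions of the earlier sketch, the code below pairs a coloring of the last diagonal block's graph with a sweep that finishes each color before starting the next; the point between colors is where the synchronization barrier falls. The greedy heuristic shown is one simple illustrative coloring strategy, not necessarily the one used here, and both function names are hypothetical.

```python
# Greedy coloring of the last diagonal block's graph, then a colored
# sweep over its rows.  The greedy heuristic is an illustrative choice.
def greedy_color(A, border):
    """Give each border row the smallest color unused by any adjacent
    border row (rows i and j are adjacent when A[i, j] != 0)."""
    color = {}
    for i in border:
        used = {color[j] for j in color if A[i, j] != 0}
        c = 0
        while c in used:
            c += 1
        color[i] = c
    return color

def colored_border_sweep(A, x, b, border, color):
    for c in sorted(set(color.values())):
        # Rows of one color share no edges, so these updates are
        # mutually independent: this inner loop is the parallel work.
        for i in (r for r in border if color[r] == c):
            s = A[i, :] @ x - A[i, i] * x[i]
            x[i] = (b[i] - s) / A[i, i]
        # Barrier here: every processor must see the new x values of
        # this color before any row of the next color is updated,
        # preserving the strict precedence relation.
    return x
```

Called with the colors from `greedy_color`, the colored sweep would replace the single sequential border loop in the earlier sketch, turning the last diagonal block's update into a sequence of parallel steps separated by barriers.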