We have implemented a parallel version of a block-diagonal-bordered sparse Gauss-Seidel algorithm in the C programming language for the Thinking Machines CM-5 multi-computer, using the Connection Machine active message layer (CMAML) remote procedure call as the basis for interprocessor communications [14]. Active messages yielded significant performance improvements over more traditional communications paradigms that use the standard blocking CMMD_send and CMMD_receive functions in conjunction with packing data into communications buffers. A substantial portion of the communications requires each processor to send short data buffers to every other processor, incurring considerable overhead due to latency. To reduce this overhead and to attempt to hide communications behind calculations, we implemented each portion of the algorithm using CMAML remote procedure calls (CMAML_rpc). The communications paradigm used throughout the algorithm is to send a double precision data value to the destination processor as soon as the value is calculated. Communications in the algorithm occur at distinct time phases, making polling for the active message handler function efficient. An active message on the CM-5 has a four-word payload, which is more than adequate to send a double precision floating point value and an integer position indicator. The use of active messages greatly simplified the development and implementation of this parallel sparse Gauss-Seidel algorithm, because there was no requirement to maintain and pack communications buffers.
This implementation uses implicit data structures based on vectors of
C programming language structures to store and retrieve data
efficiently within the sparse matrix. These data structures provide
good cache coherence, because non-zero data values and column location
indicators are stored in adjacent physical memory locations. The data
structure is composed of six separate parts that implicitly store the
block-diagonal-bordered sparse matrix and the last block.
Figure 6 graphically illustrates the relationships
within the data structure. As illustrated in the figure, the
block-diagonal structure, the border-row structure, and the
last-block-diagonal structure contain pointers to the sparse row
vectors. The second values in the two diagonal pointers are the
values of the corresponding diagonal elements, while the second value
in the border-row structure is the destination processor for the
vector-vector product from this border row used in calculating values
in the last diagonal block.
Our parallel Gauss-Seidel algorithm has the following distinct sections (blocks are as defined in section 4):
Figure 7: Parallel Gauss-Seidel