
Parallel Gauss-Seidel Implementation

We have implemented a parallel version of a block-diagonal-bordered sparse Gauss-Seidel algorithm in the C programming language for the Thinking Machines CM-5 multi-computer, using the Connection Machine active message layer (CMAML) remote procedure call as the basis for interprocessor communications [14]. Active messages yielded significant performance improvements over more traditional communications paradigms that use the standard blocking CMMD_send and CMMD_receive functions in conjunction with packing data into communications buffers. A substantial portion of the communications requires each processor to send short data buffers to every other processor, which imposes considerable overhead due to latency. To reduce this overhead and to hide communications behind calculations, we implemented each portion of the algorithm using CMAML remote procedure calls (CMAML_rpc). The communications paradigm used throughout the algorithm is to send a double precision data value to the destination processor as soon as the value is calculated. Communications occur at distinct time phases, which makes polling for the active message handler function efficient. An active message on the CM-5 has a four-word payload, which is more than adequate to send a double precision floating point value and an integer position indicator. Active messages greatly simplified the development and implementation of this parallel sparse Gauss-Seidel algorithm, because there was no requirement to maintain and pack communications buffers.
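As a minimal sketch of this per-value paradigm, the following C fragment shows one way a computed value could be shipped with its position indicator in a single active message. The CMAML_rpc signature assumed here (destination node, handler, four one-word arguments) follows the four-word payload noted above; the header path, handler, and variable names are illustrative assumptions, not the authors' code.

/* Sketch only: send each double precision value, tagged with its
   position, to the destination processor as soon as it is computed. */
#include <cm/cmmd.h>            /* CM-5 CMMD/CMAML header (assumed path) */

#define MAX_N 4096              /* illustrative maximum problem size */

static double x_recv[MAX_N];    /* values received from other processors */
static volatile int n_pending;  /* values still expected in this phase */

/* Active-message handler: reassemble the double from two 32-bit words
   and store it at the indicated position. */
static void recv_value(int pos, int hi, int lo, int unused)
{
    union { double d; int w[2]; } u;
    u.w[0] = hi;
    u.w[1] = lo;
    x_recv[pos] = u.d;
    n_pending--;
}

/* Send x_i, split into two words plus its integer position, to
   processor dest -- three words of the four-word payload. */
static void send_value(int dest, int pos, double x_i)
{
    union { double d; int w[2]; } u;
    u.d = x_i;
    CMAML_rpc(dest, recv_value, pos, u.w[0], u.w[1], 0);
}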

This implementation uses implicit data structures based on vectors of C structures to store and retrieve data within the sparse matrix efficiently. These data structures provide good cache coherence, because non-zero data values and column location indicators are stored in adjacent physical memory locations. The data structure is composed of six separate parts that implicitly store the block-diagonal-bordered sparse matrix and the last block. Figure 6 graphically illustrates the relationships within the data structure. As illustrated in the figure, the block-diagonal structure, the border-row structure, and the last-block-diagonal structure contain pointers to the sparse row vectors. The second values in the two diagonal structures are the corresponding diagonal elements, while the second value in the border-row structure is the destination processor for the vector-vector product from this border row, used in calculating values in the last diagonal block.

 
Figure 6: The Data Structure 
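To make the layout concrete, the following is a hypothetical C rendering of the row headers described above. Field and type names are illustrative assumptions, not taken from the authors' code.

/* Hypothetical sketch of the implicit block-diagonal-bordered storage. */
typedef struct {
    int    col;        /* column index of this non-zero */
    double val;        /* non-zero value */
} sparse_elem;         /* sparse row vectors: elements packed contiguously,
                          giving the cache behavior noted in the text */

typedef struct {
    sparse_elem *row;  /* pointer to this row's sparse row vector */
    int          len;  /* number of off-diagonal non-zeros in the row */
    double       diag; /* second value: the diagonal element */
} diag_row;            /* used for both the block-diagonal and the
                          last-block-diagonal structures */

typedef struct {
    sparse_elem *row;  /* non-zeros of this border row */
    int          len;
    int          dest; /* second value: destination processor for this
                          border row's vector-vector product */
} border_row;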

Our parallel Gauss-Seidel algorithm has the following distinct sections, where the blocks are as defined in Section 4:

  1. solve for the unknowns in the diagonal blocks
  2. calculate the contributions to the last diagonal block by forming the matrix-vector products in parallel
  3. solve for the unknowns in the last diagonal block
A pseudo-code representation of the parallel Gauss-Seidel solver is presented in Figure 7; an illustrative C sketch of the first two phases follows the figure. A version of the software is available that runs on a single processor of the CM-5 to provide empirical speed-up data for quantifying multi-processor performance. This sequential software also includes the capability to gather convergence-rate data. The parallel implementation has been developed as an instrumented proof-of-concept to examine the efficiency of each section of the code described above. The host processor gathers and tabulates statistics on the multi-processor calculations. Statistics are gathered at synchronization points, so they have no impact on total empirical measures of performance. Empirical performance data is presented in the next section for varied numbers of processors solving real power systems sparse load-flow matrices.

 
Figure 7: Parallel Gauss-Seidel 
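As an illustration only, not the authors' code, the first two phases might look as follows in C, using the row headers and the send_value routine sketched earlier; loop bounds and block-ownership details are assumptions.

/* Phase 1: one Gauss-Seidel sweep over a diagonal block owned by this
   processor.  Each update uses the latest available x values, as the
   method requires; convergence testing is omitted for brevity. */
void gs_sweep_block(diag_row *rows, const double *b, double *x,
                    int first, int last)
{
    int i, k;
    for (i = first; i <= last; i++) {
        double sum = b[i];
        for (k = 0; k < rows[i].len; k++)   /* off-diagonal terms */
            sum -= rows[i].row[k].val * x[rows[i].row[k].col];
        x[i] = sum / rows[i].diag;
    }
}

/* Phase 2: form each border row's vector-vector product with the
   locally solved x and send it, as soon as it is complete, to the
   destination processor recorded in the row header. */
void border_products(border_row *rows, int nrows, const double *x)
{
    int i, k;
    for (i = 0; i < nrows; i++) {
        double sum = 0.0;
        for (k = 0; k < rows[i].len; k++)
            sum += rows[i].row[k].val * x[rows[i].row[k].col];
        send_value(rows[i].dest, i, sum);   /* from the earlier sketch */
    }
}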





