
Parallel Sparse Iterative Solver Implementations

We have implemented a parallel version of a block-diagonal-bordered sparse Gauss-Seidel algorithm in the C programming language for the Thinking Machines CM-5 multi-computer, using explicit message passing and a host-node paradigm. To obtain an implementation with extremely low-latency communications, we utilized the Connection Machine active message layer (CMAML) remote procedure call features [53,59]. We also implemented versions of the algorithms using non-blocking, buffered communications. A significant portion of the communications requires each processor to send short data buffers to every other processor, imposing significant communications overhead due to latency. Substantial improvements in the performance of the algorithm were observed for low-latency active messages when compared to more traditional communications paradigms that pack data into communications buffers and send them with non-blocking communications functions. Throughout this discussion of parallel iterative sparse solvers, the active message communications paradigm is the means by which we implemented low-latency communications on the Thinking Machines CM-5.
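For contrast, the traditional buffered paradigm we compared against follows roughly the pattern sketched below: pack each computed value into a per-destination buffer, then issue one non-blocking send per processor. This is only an illustrative sketch, not the original code; send_async() is a hypothetical stand-in for a generic non-blocking send primitive, and NPROCS, MAXVALS, dest_buf, and dest_count are names we introduce here.

    /* Sketch of the buffered pattern: pack values per destination,
     * then send one message to each processor.  send_async() is a
     * hypothetical stand-in for a non-blocking send primitive. */
    #define NPROCS  32      /* number of node processors (example)  */
    #define MAXVALS 256     /* per-destination buffer capacity      */

    extern void send_async(int dest, void *buf, int nbytes); /* hypothetical */

    typedef struct {
        int    pos;         /* position of the value in the vector  */
        double val;         /* the computed double precision value  */
    } packed_entry;

    static packed_entry dest_buf[NPROCS][MAXVALS];
    static int          dest_count[NPROCS];

    /* Stage one value for later transmission to processor dest
     * (bounds checking omitted for brevity). */
    void pack_value(int dest, int pos, double val)
    {
        packed_entry *e = &dest_buf[dest][dest_count[dest]++];
        e->pos = pos;
        e->val = val;
    }

    /* Send every non-empty buffer with one message per destination. */
    void flush_buffers(void)
    {
        for (int dest = 0; dest < NPROCS; dest++) {
            if (dest_count[dest] > 0) {
                send_async(dest, dest_buf[dest],
                           dest_count[dest] * (int)sizeof(packed_entry));
                dest_count[dest] = 0;
            }
        }
    }

Because each message is short, the per-message latency dominates in this pattern, which is the overhead the active message implementation avoids.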

The low-latency communications paradigm we use throughout this algorithm is to send a double precision or complex data value to the destination processor as soon as the value is calculated, and to send the value only to those processors that need it. Communications in the algorithm occur in distinct time phases, making polling for the active message handler function efficient. An active message on the CM-5 has a four-word payload, which is more than adequate to send a double precision floating point value and an integer position indicator, or a similar complex value and integer position indicator. The use of active messages greatly simplified the development and implementation of this parallel sparse Gauss-Seidel algorithm, because there was no requirement to maintain and pack communications buffers.
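As a concrete illustration, an active-message send of one value might look like the following sketch. We assume a CMAML-style remote procedure call that takes a destination node, a handler, and four one-word arguments (matching the four-word payload described above); the handler name, the packing of the double into two 32-bit words, and the vector x are our illustrative assumptions, not the original code.

    #include <string.h>

    /* Assumed CMAML-style interface: a remote procedure call with a
     * four-word payload, and a polling call that drains pending
     * handlers.  Exact signatures are an assumption here. */
    extern void CMAML_rpc(int node, void (*handler)(),
                          int a0, int a1, int a2, int a3);
    extern void CMAML_poll(void);

    static double x[1024];      /* local copy of the iterate (example) */

    /* Handler runs on the receiving node: reassemble the double from
     * two 32-bit words and store it at position pos. */
    void recv_value(int pos, int w0, int w1, int unused)
    {
        unsigned int w[2] = { (unsigned int)w0, (unsigned int)w1 };
        double val;
        memcpy(&val, w, sizeof(double));
        x[pos] = val;
        (void)unused;
    }

    /* Send x[pos] = val to processor dest the moment it is computed:
     * one integer position indicator plus the double split into two
     * words fits within the four-word payload. */
    void send_value(int dest, int pos, double val)
    {
        unsigned int w[2];
        memcpy(w, &val, sizeof(double));
        CMAML_rpc(dest, recv_value, pos, (int)w[0], (int)w[1], 0);
    }

During the distinct communication phases, each node would repeatedly call CMAML_poll() so that pending handlers fire; this is the polling referred to above.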

A version of the software is available that runs on a single processor of the CM-5 to provide empirical speed-up data for quantifying multi-processor performance. Empirical performance data has been gathered for a range of numbers of processors and real power systems sparse network matrices. Results based on empirical data collected in benchmarking trials are presented in section 7.2. This block-diagonal-bordered sparse Gauss-Seidel method has the following distinct segments, which were derived in an earlier chapter (a serial sketch of one sweep follows the list):

  1. calculate $x_i^{(k+1)}$ in the diagonal blocks and upper border,
     $A_{i,i}$ and $A_{i,m}$ respectively, where $(1 \le i < m)$,
  2. update the values of $\hat{x}_m$ using values in the lower border
     $A_{m,i}$; the actual implementations use values of $x_i^{(k+1)}$,
  3. calculate $x_m^{(k+1)}$ using the values of $\hat{x}_m$ and the last
     diagonal block $A_{m,m}$.
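A minimal serial sketch of one such sweep follows, assuming the block-diagonal-bordered structure described above. Dense storage is used only for readability (the actual implementation is sparse), and the block extents first[] and last[], the arrays A, b, and x, and the loop structure are illustrative names and assumptions; the parallel version interleaves the active-message communications shown earlier into segment 2.

    /* Sketch of one block-diagonal-bordered Gauss-Seidel sweep.
     * first[i]..last[i] delimit diagonal block i; block m is the
     * last (border) block.  All names are illustrative. */
    #define N 512                  /* matrix order (example)          */
    extern double A[N][N];         /* block-diagonal-bordered matrix  */
    extern double b[N];            /* right-hand side                 */
    extern double x[N];            /* iterate, updated in place       */
    extern int    first[], last[], m;

    void gs_sweep(void)
    {
        /* Segment 1: relax x within each independent diagonal block
         * A_{i,i} (i < m); coupling to x_m enters through the upper
         * border columns A_{i,m}. */
        for (int i = 0; i < m; i++) {
            for (int r = first[i]; r <= last[i]; r++) {
                double s = b[r];
                for (int c = first[i]; c <= last[i]; c++)
                    if (c != r) s -= A[r][c] * x[c];
                for (int c = first[m]; c <= last[m]; c++)
                    s -= A[r][c] * x[c];       /* upper border A_{i,m} */
                x[r] = s / A[r][r];
            }
        }

        /* Segment 2: accumulate lower-border contributions A_{m,i} x_i
         * into xm_hat using the newly computed x_i values.  In the
         * parallel code this is where values are exchanged between
         * processors. */
        static double xm_hat[N];
        for (int r = first[m]; r <= last[m]; r++) {
            xm_hat[r] = b[r];
            for (int c = 0; c < first[m]; c++)
                xm_hat[r] -= A[r][c] * x[c];
        }

        /* Segment 3: relax x_m using xm_hat and the last block A_{m,m}. */
        for (int r = first[m]; r <= last[m]; r++) {
            double s = xm_hat[r];
            for (int c = first[m]; c <= last[m]; c++)
                if (c != r) s -= A[r][c] * x[c];
            x[r] = s / A[r][r];
        }
    }

Segment 1 is where the parallelism lies: the diagonal blocks are mutually independent, so different processors can relax different blocks concurrently before cooperating on the border block.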







David P. Koester
Sun Oct 22 17:27:14 EDT 1995