next up previous
Next: The Hierarchical Data Up: Parallel Choleski Factorization of Previous: The Node-tearing Implementation

Sparse Matrix Solver Implementations

Implementations of a block-diagonal-bordered sparse Choleski solver have been developed primarily in the C programming language for the Thinking Machines CM-5 multi-computer using message passing and a host-node paradigm. Portions of our implementation use existing FORTRAN routines to factor and solve the last diagonal block of the matrix. A version of the software is available that runs on a single processor on the CM-5 to provide empirical speed-up data to quantify multi-processor performance. Empirical performance data has been gathered for a range of numbers of processors and real power systems sparse load-flow matrices. This empirical data is presented in the next section. Our block-diagonal-bordered sparse Choleski solver can be viewed as having the following distinct sections where blocks are defined in section 4:

  1. Choleski factorization
  2. forward reduction
  3. backward substitution

The parallel implementation presented in this section has been developed as an instrumented proof-of-concept to examine the efficiency of each section of the code described above. The host processor is used to gather and tabulate statistics on the multi-processor calculations. Statistics are gathered in a manner that do not impact the total empirical measures of performance for factorization, forward reduction, or backward substitution. The last diagonal block can be factored in various manners, using either parallel dense or parallel sparse codes. Because the last block is generally relatively dense, this parallel implementation uses a parallel dense pipelined blocked kij-saxpy-based Choleski factorization algorithm. This algorithm uses both LAPACK and generalized BLAS routines for this portion of the algorithm [6]. When this algorithm is run on a single processor, a dense Choleski factorization algorithm from the LAPACK library and non-blocked forward reduction and backward substitution are utilized.





next up previous
Next: The Hierarchical Data Up: Parallel Choleski Factorization of Previous: The Node-tearing Implementation



David P. Koester
Sun Oct 22 15:40:25 EDT 1995