
Sparse Matrix Solver Implementations

Implementations of a block-diagonal-bordered sparse matrix solver have been developed in the C programming language for both sequential computers and for the Thinking Machines CM-5 multi-computer, using message passing and a host-node paradigm. Performance data have been gathered for each of the two software implementations, and performance comparisons are presented in the next section. Both the sequential and parallel block-diagonal-bordered sparse matrix solvers use implicit hierarchical data structures based on vectors of C programming language structures. These data structures are well suited to both sequential and parallel implementations and provide good cache coherence, because non-zero data values and their row and column location indicators are stored in adjacent physical memory locations.

The parallel implementation presented in this section has been developed as an instrumented proof-of-concept to examine the efficiency of the data structures and of the basic message passing when partial sums are sent to the last block to update values. In this initial parallel implementation, the last block is factored sequentially on the host processor. To minimize communications times in the second step of factorization, when partial sums of updates from the borders are sent to the host processor, vectors of all non-zero partial sums from each processor are constructed. No attempt has been made in this implementation to perform a parallel reduction of the partial sums of updates in the borders.
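To make the data structure concept concrete, the following is a minimal sketch rather than the actual implementation; the type and field names (sparse_elem, sparse_block, pack_partial_sums) are illustrative assumptions. It shows how each non-zero value can be stored in a C structure alongside its row and column location indicators, how a block can be held as a contiguous vector of such structures, and how the non-zero partial sums destined for the last block might be packed into one vector before being sent to the host processor.

    #include <stdlib.h>

    /* One non-zero entry: the value is stored next to its row and      */
    /* column indices, so a traversal touches adjacent memory locations.*/
    /* Names are illustrative, not those of the actual solver code.     */
    typedef struct {
        double value;   /* non-zero matrix element    */
        int    row;     /* row location indicator     */
        int    col;     /* column location indicator  */
    } sparse_elem;

    /* A diagonal block or border stored as a contiguous vector of      */
    /* sparse_elem structures.                                          */
    typedef struct {
        sparse_elem *elems;  /* vector of non-zeros                */
        int          nnz;    /* number of non-zeros in this block  */
    } sparse_block;

    /* Pack the non-zero partial sums of border updates into a single   */
    /* contiguous vector, so they can be sent to the host processor as  */
    /* one message.  Returns the number of non-zeros packed.            */
    int pack_partial_sums(const double *sums, const int *rows,
                          const int *cols, int n, sparse_elem *out)
    {
        int i, k = 0;
        for (i = 0; i < n; i++) {
            if (sums[i] != 0.0) {
                out[k].value = sums[i];
                out[k].row   = rows[i];
                out[k].col   = cols[i];
                k++;
            }
        }
        return k;
    }

Because the values and their indices are interleaved in one vector, a sweep over a block reads memory sequentially, which is the source of the cache behavior noted above.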

Sections 5.2 and 5.3 include pseudo-code outlines of the sequential and parallel algorithms, respectively. Appendix B contains more detailed versions of these algorithms that include all for and for-each loops.




