next up previous
Next: Empirical Results Up: Sparse Matrix Solver Previous: The Hierarchical Data

The Parallel Algorithm

The parallel block-diagonal-bordered sparse matrix solver has been implemented primarily in the C programming language for the Thinking Machines CM-5 multi-processor, using a host-node paradigm with message passing. A pseudo-code representation of the Choleski factorization algorithm is presented in figure 11. The same implicit hierarchical data structure used in the parallel implementation can readily be used in a sequential implementation, although the sparse matrix data is then no longer physically allocated to the distributed memories of the individual processors. This block-diagonal-bordered Choleski factorization implementation solves the last block with a dense blocked kij-saxpy Choleski algorithm [28]. Efforts to minimize the size of the last diagonal block have been relatively successful; consequently, this block is relatively dense, often 20% to 35% nonzero. The reduced indexing overhead and the regularity of data access justify the use of parallel dense Choleski algorithms for this relatively small partition of the overall matrix.

Because the factorization of the last block is distributed to all active node processors, the partial sums of updates for values in the borders, computed in the second phase of the factorization step, must also be distributed to all processors. Only the nonzero rows in the last block are examined when calculating the required partial sums, which significantly limits the amount of calculation in this phase. To minimize communication time in this factorization step, care is taken to minimize the length of the interprocessor messages: separate vectors containing only the information needed by a particular processor are sent to each machine. Because the rows in the border are sparse, no attempt has been made to reduce the partial sums of border updates in parallel.
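The per-processor message vectors described above can be sketched as a simple packing step: for each destination processor, only the border-update entries that processor owns (and only those that are nonzero) are copied into a compact buffer. The names `partial`, `owner`, and `pack_updates` are hypothetical, introduced here for illustration; the actual data layout in the CM-5 implementation may differ.

```c
/* Sketch: pack the partial border-update sums destined for processor
 * `dest` into the compact buffer `buf`, skipping zero entries that
 * arise from the sparsity of the border rows.
 * partial[i] = partial sum for border entry i (hypothetical layout)
 * owner[i]   = processor that owns border entry i
 * Returns the number of packed values, i.e. the message length. */
int pack_updates(const double *partial, const int *owner, int n,
                 int dest, double *buf)
{
    int len = 0;
    for (int i = 0; i < n; i++)
        if (owner[i] == dest && partial[i] != 0.0)
            buf[len++] = partial[i];
    return len;
}
```

A real implementation would also pack the corresponding border indices (or rely on an agreed-upon ordering) so the receiver can accumulate each value into the correct matrix entry.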

 
Figure 11: Parallel Block-diagonal-Bordered Sparse Choleski Factorization 

The remaining steps in the parallel algorithm are forward reduction and backward substitution. The parallel versions of these algorithms take advantage of the fact that the calculations can be performed in either of two distinct orders that preserve the precedence relations in the calculations. These techniques are combined to minimize communication time when solving for the last diagonal block. Pseudo-code versions of the parallel forward reduction and parallel backward substitution algorithms are presented in figures 12 and 13, respectively.
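For reference, the two triangular solves can be sketched in their simplest sequential dense form below; the parallel versions in figures 12 and 13 perform the same arithmetic on the sparse block-bordered structure and reorder independent calculations across blocks. The dense row-major layout and function names here are assumptions for illustration.

```c
/* Forward reduction: solve L y = b, where l holds the lower-triangular
 * Choleski factor of an n-by-n matrix in row-major order. */
void forward_reduction(const double *l, const double *b, double *y, int n)
{
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < i; j++)
            s -= l[i*n + j] * y[j];    /* subtract already-known terms */
        y[i] = s / l[i*n + i];
    }
}

/* Backward substitution: solve L^T x = y using the same factor l. */
void backward_substitution(const double *l, const double *y, double *x, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        double s = y[i];
        for (int j = i + 1; j < n; j++)
            s -= l[j*n + i] * x[j];    /* column i of L is row i of L^T */
        x[i] = s / l[i*n + i];
    }
}
```

The precedence relations are visible in the loop bounds: y[i] depends only on y[0..i-1], and x[i] only on x[i+1..n-1], which is what permits the two alternative evaluation orders exploited by the parallel algorithms.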

 
Figure 12: Parallel Sparse Forward Reduction 

 
Figure 13: Parallel Sparse Backward Substitution 






David P. Koester
Sun Oct 22 15:40:25 EDT 1995