
Forward Reduction/Backward Substitution

Block-diagonal-bordered matrices have the potential for efficient reduction and substitution steps because of their limited, regular communications, as seen in figure 1(b). The values in the y vector corresponding to the independent diagonal blocks can be computed in a highly parallel manner. For the forward reduction step, the only communications required are to send the partial sums of products from rows in the borders to the processors that hold the data for the last block, and these partial sums can be reduced in parallel in the same manner as the partial sums in the factorization step, depicted in figure 11. In essence, the partial sums of products of border values and the y values solved in the independent blocks are used to modify the corresponding y values in the last block. After all updates have been made using the y values from the independent blocks, the forward reduction of the last block can proceed using the best available technique for the nature of this sub-matrix.
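The forward reduction step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name, the dense per-block triangular solves, and the serial loops standing in for parallel block solves and the parallel reduction of partial sums are all assumptions made for clarity.

```python
import numpy as np

def forward_reduction_bdb(blocks, borders, last_block, b_blocks, b_last):
    """Forward reduction L y = b for a block-diagonal-bordered matrix.

    blocks:     list of lower-triangular independent diagonal blocks L_k
    borders:    list of border strips B_k (rows of the last block,
                columns aligned with block k)
    last_block: lower-triangular last (coupling) block
    """
    # Each independent block is solved with no interprocessor
    # communication (a serial loop stands in for the parallel step).
    y_blocks = [np.linalg.solve(Lk, bk) for Lk, bk in zip(blocks, b_blocks)]

    # Each border strip contributes a partial sum of products B_k @ y_k;
    # these are reduced (summed) into the right-hand side of the last
    # block, mirroring the parallel reduction of partial sums.
    rhs_last = b_last - sum(Bk @ yk for Bk, yk in zip(borders, y_blocks))

    # Forward reduction of the last block proceeds once all updates arrive.
    y_last = np.linalg.solve(last_block, rhs_last)
    return y_blocks, y_last
```

On a distributed-memory machine the list comprehension would run one block per processor, and the `sum` over border contributions would be the communicated parallel reduction.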

For the backward substitution step that calculates the values in the vector x, the x values from the last block must be broadcast to all processors. If the data in the last block is distributed to multiple processors, each value must be broadcast to all other processors immediately after it is calculated. In this instance, special techniques that support lightweight communications, such as active messages [22] on the Thinking Machines CM-5, can be used to minimize communications latency. Regardless of the communications capabilities of the distributed-memory multi-processor, as soon as x vector values from the last block have been broadcast to all processors, the sub-matrix precedence relationship in backward substitution permits calculations that update the y vector with values in the borders. After all x vector values from the last block have been solved, the remaining values of x in the independent blocks can be computed in parallel without additional interprocessor communications, because all data for a row is allocated to a single processor and the blocks are independent, so no x vector values need be sent to other processors.
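The backward substitution ordering, where the last block is solved first and its x values are broadcast before the independent blocks finish locally, can be sketched in the same NumPy style. As before, the function name and dense per-block solves are illustrative assumptions; the broadcast is implicit in passing x_last to every block's update.

```python
import numpy as np

def backward_substitution_bdb(blocks, borders, last_block, y_blocks, y_last):
    """Backward substitution U x = y for a block-diagonal-bordered matrix.

    blocks:     list of upper-triangular independent diagonal blocks U_k
    borders:    list of border strips C_k (rows aligned with block k,
                columns in the last block)
    last_block: upper-triangular last (coupling) block
    """
    # The last block is solved first; on a parallel machine its x values
    # would be broadcast to all processors holding border data.
    x_last = np.linalg.solve(last_block, y_last)

    # Each processor updates its local y values with its border
    # contribution C_k @ x_last, then solves its independent block with
    # no further interprocessor communication.
    x_blocks = [np.linalg.solve(Uk, yk - Ck @ x_last)
                for Uk, Ck, yk in zip(blocks, borders, y_blocks)]
    return x_blocks, x_last
```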

Additional implementation features can improve the efficiency of forward reduction and backward substitution. By storing the row and column locations of each value explicitly in the data structures for the implicitly stored matrix, indexing overhead can be minimized: updates in both the forward reduction and backward substitution steps can then be performed without recalculating the row or column location of a value.
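One way to realize this is coordinate-style storage, where each nonzero carries its own row and column indices. This is a hypothetical sketch, not the paper's data structure: the `Nonzero` record and `apply_border_updates` helper are invented names for illustration.

```python
from dataclasses import dataclass

@dataclass
class Nonzero:
    row: int    # explicit row location stored with the value
    col: int    # explicit column location stored with the value
    val: float

def apply_border_updates(border_nonzeros, y):
    """Accumulate partial sums of products for forward reduction.

    Because each nonzero carries its own (row, col) coordinates, the
    update loop indexes directly into y with no index arithmetic to
    recover a value's location in the matrix.
    """
    partial = {}
    for nz in border_nonzeros:
        partial[nz.row] = partial.get(nz.row, 0.0) + nz.val * y[nz.col]
    return partial
```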






David P. Koester
Sun Oct 22 16:27:33 EDT 1995