Precedence in LU Factorization

Parallel implementations of direct linear solvers must cope with the precedence relationships inherent in the order of calculations: before calculations on a row or column can be completed, non-zero values from previous rows and columns must be available, and all calculations in those rows and columns must be finished. These precedence relationships force frequent synchronization in parallel algorithms, which reduces the granularity of the available calculations and makes it more difficult to distribute the processing load uniformly.
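The precedence constraint can be seen in a minimal dense LU kernel. The sketch below is illustrative only (the function name and the in-place packed storage are our assumptions, not part of the text): the update of entry a[i][j] at step k requires the fully factored entries a[i][k] and a[k][j], so step k cannot proceed until all earlier rows and columns are complete.

```python
def lu_in_place(a):
    """In-place dense LU factorization without pivoting (illustrative sketch).

    On return, a holds U in its upper triangle and the multipliers of a
    unit-lower-triangular L below the diagonal.
    """
    n = len(a)
    for k in range(n):                          # step k depends on steps 0..k-1
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                  # multiplier: column k of L
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]    # update trailing submatrix
    return a
```

For example, factoring the 2x2 matrix [[4, 3], [6, 3]] yields the packed result [[4, 3], [1.5, -1.5]], i.e. L = [[1, 0], [1.5, 1]] and U = [[4, 3], [0, -1.5]].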

Varying the order of the for loops in a dense LU factorization yields different algorithms, and some LU factorization algorithms offer more available parallelism than others. One class of LU factorization algorithms is referred to as fan-in algorithms, because data from previous rows and columns are sent inward to the column and row being modified. This algorithm type is illustrated in figure 2, alongside the fan-out factorization technique. In fan-out factorization, data from one row or column can be used in the modification step for all subsequent rows and columns; however, a row or column is not completely factored until all previous rows and columns have been applied. To develop an efficient parallel block-diagonal-bordered LU factorization algorithm, a hybrid combination of both fan-in and fan-out precedence relations has been included in the same algorithm.
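The two loop orderings can be contrasted in a short sketch. This is a hedged illustration (function names are ours; both variants use the standard dense kernel without pivoting): the fan-out version pushes data from a just-completed column outward to all later columns, while the fan-in version has each column pull in updates from all earlier columns before it is itself factored. Both produce the same packed L and U.

```python
def lu_fan_out(a):
    """Right-looking ('fan-out') ordering: once column k is factored,
    its data immediately modifies every subsequent column."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]                  # factor column k
        for j in range(k + 1, n):
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]    # fan data out to columns k+1..n-1
    return a

def lu_fan_in(a):
    """Left-looking ('fan-in') ordering: column j gathers updates from
    all previous columns before being factored."""
    n = len(a)
    for j in range(n):
        for k in range(j):                      # fan data in from columns 0..j-1
            for i in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
        for i in range(j + 1, n):
            a[i][j] /= a[j][j]                  # column j is now complete
    return a
```

Note how the fan-in variant touches only column j in its inner loops, which is why, on a sparse matrix, several mutually independent columns could be processed at once.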

 
Figure 2: Fan-in and Fan-out LU Factorization Algorithms for Dense Matrices  

For dense matrix fan-in algorithms, only a single column or row can be modified at any time; in sparse matrices, however, multiple rows or columns can be modified concurrently because they may be independent of one another. For a fan-in algorithm, the block-diagonal-bordered sparse matrix form presented in figure 1 clearly illustrates column and row independence within the independent diagonal blocks: there simply is no data in previous rows or columns to be included in the calculations. This phenomenon is described in greater detail in section 3.

The goal of this research is to develop highly efficient, scalable parallel sparse LU factorization algorithms. To accomplish this, we propose redefining the matrix ordering phase to exploit mutually independent calculations more efficiently while significantly reducing precedence concerns throughout as many of the calculations in a parallel implementation as possible. In this work, we plan to minimize the effects of precedence in block-diagonal-bordered sparse matrix algorithms by localizing calculations in independent sub-matrices that are distributed across a multiprocessor.




David P. Koester
Sun Oct 22 16:27:33 EDT 1995