In this paper, we describe both sequential and parallel variants of the LU factorization algorithm used to solve the system of linear equations Ax=b, where A is a sparse matrix in block-diagonal-bordered form. We describe methods to transform a general sparse matrix to block-diagonal-bordered form and we also describe an implementation of the parallel block-diagonal-bordered sparse matrix solver on the Thinking Machines CM-5 using explicit message passing. In section 2 of this paper, we describe in detail the mathematics of sparse LU factorization including the precedence relationships in the algorithms, the two step process presented in recent literature to preprocess a sparse matrix before performing sparse Choleski factorization, and a version of Crout's LU factorization algorithm for sparse matrix factorization. This section includes a general discussion of forward reduction and backward substitution, and also includes a survey of the sparse matrix literature in order to place this work in context with other research.
In section 3, we propose a new three-step preprocessing phase to replace the present two-step preprocessing phase. We describe the specialized ordering, pseudo factorization, and load balancing steps required for efficient parallel algorithm implementations on distributed-memory multi-processors. This preprocessing step is the key to the efficient use of block-diagonal-bordered sparse matrix algorithms for highly irregular sparse matrices. In section 4, we describe the options available for the LU factorization of block-diagonal-bordered matrices that are dependent on the precedence relationships inherent in the algorithm. In section 5, we describe both a sequential block-diagonal-bordered sparse matrix algorithm and a parallel version implemented on the CM-5. In section 6, we present performance results when testing the software implementations on sample matrices. These data include measured speedup and efficiency as a function of the order of the matrix for both the LU factorization step and the forward reduction/backward substitution steps. Lastly, in section 7, we present our conclusions and briefly describe future research.