
A Survey of the Literature

Significant research effort has been expended to examine parallel matrix solvers for both dense and sparse matrices. Numerous papers have documented research on parallel dense matrix solvers [4,29,30], and these articles illustrate that good efficiency is possible when solving dense matrices on multi-processor computers. The computational complexity of dense matrix LU factorization is O(n^3), and there are sufficient, regular calculations for good parallel algorithm performance. Some implementations are better than others [29,30]; nevertheless, performance is deterministic, because the operation count depends only on the matrix order and not on the sparsity structure.
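As a point of reference, the cubic cost follows from a standard operation count for LU factorization of a dense matrix of order n; the derivation below is generic and is not tied to any implementation cited above:

    \sum_{k=1}^{n-1} \left[ (n-k) + 2(n-k)^2 \right] \;\approx\; \frac{2}{3}n^3
    \quad\Longrightarrow\quad T_{\mathrm{LU}}(n) = O(n^3),

where elimination step k performs (n-k) divisions to form the multipliers and 2(n-k)^2 multiply-subtract operations to update the trailing submatrix.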

Direct sparse matrix solvers, on the other hand, have computational complexity significantly less than O(n^3), and some papers report complexities for particular applications that fall well below this bound [18]. With significantly fewer calculations than dense direct solvers, and lacking uniform, organized communication patterns, parallel direct sparse matrix solvers often require detailed knowledge of the application to permit efficient implementations. The greater complexity and irregularity of sparse matrix computations keep this field an area of active research as parallel sparse matrix solvers are developed and optimized for particular applications.

The basics of general sequential direct sparse matrix solvers, including numerical stability and ordering techniques to limit fill-in, are presented in [5]. Many papers examine sparse matrix algorithms for multi-processors and vector computers; however, relatively few describe research into parallel non-symmetric direct sparse matrix solvers. A concurrent non-symmetric sparse solver was developed as part of a petrochemical processing simulation [21]. Frontal and multi-frontal techniques offer some promise for algorithms designed for vector computers and for multi-processors built from vector processors [4]. General sparse matrix solvers store data in compressed, implicit data structures; to use vector processors effectively, the hardware must support indirect addressing or the data must be processed with vector scatter and gather operations.
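To make the discussion of compressed storage and scatter/gather operations concrete, the following is a minimal sketch of compressed sparse row (CSR) storage and the indirect addressing (gather) it requires; it is written in Python with NumPy purely for illustration, and the array names and the small example matrix are not taken from any of the solvers cited above.

    import numpy as np

    # Compressed sparse row (CSR) storage: only the nonzeros are kept, with
    # integer arrays recording where they live in the full matrix.
    # A = [[4, 0, 0, 1],
    #      [0, 3, 0, 0],
    #      [2, 0, 5, 0],
    #      [0, 0, 0, 6]]
    values  = np.array([4.0, 1.0, 3.0, 2.0, 5.0, 6.0])  # nonzeros, row by row
    col_idx = np.array([0,   3,   1,   0,   2,   3])    # column of each nonzero
    row_ptr = np.array([0, 2, 3, 5, 6])                  # start of each row in values

    def csr_matvec(values, col_idx, row_ptr, x):
        """y = A x using indirect addressing (a gather of x[col_idx])."""
        n = len(row_ptr) - 1
        y = np.zeros(n)
        for i in range(n):
            lo, hi = row_ptr[i], row_ptr[i + 1]
            # The gather: x is read through the index vector col_idx, which is
            # why vector hardware needs indirect addressing or gather support.
            y[i] = np.dot(values[lo:hi], x[col_idx[lo:hi]])
        return y

    print(csr_matvec(values, col_idx, row_ptr, np.array([1.0, 2.0, 3.0, 4.0])))
    # -> [ 8.  6. 17. 24.]

The same indexed access pattern appears in factorization kernels, which is why vector performance hinges on hardware gather/scatter support.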

Meanwhile, the bulk of recent research into parallel direct sparse matrix techniques has centered on symmetric positive definite matrices and implementations of Choleski factorization. A significant number of papers concerning parallel Choleski factorization for symmetric positive definite matrices have been published recently [7,8,9,12]. These papers have thoroughly examined many aspects of parallel direct sparse matrix solvers, including implementation, symbolic factorization, and appropriate data structures. Techniques to improve interprocessor communications using block partitioning methods have been examined in [24,25,26,27]. Techniques for sparse Choleski factorization have even been developed for single-instruction-multiple-data (SIMD) computers like the Thinking Machines CM-1 and the MasPar MPP [15]. This discussion is by no means an exhaustive literature survey, although it does represent a significant portion of the direct sparse matrix research performed for vector and multi-processor computers.
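For readers less familiar with the underlying computation that these papers parallelize, the following is a minimal dense Choleski factorization sketch in Python with NumPy; it is illustrative only, and sparse implementations perform the same arithmetic restricted to the nonzero (and fill-in) entries of the factor.

    import numpy as np

    def choleski(A):
        """Factor a symmetric positive definite matrix as A = L L^T
        (column by column, returning the lower-triangular factor L)."""
        n = A.shape[0]
        L = np.zeros_like(A, dtype=float)
        for j in range(n):
            # Diagonal entry: remove contributions of the previous columns.
            L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
            for i in range(j + 1, n):
                L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
        return L

    A = np.array([[4.0, 2.0, 0.0],
                  [2.0, 5.0, 1.0],
                  [0.0, 1.0, 3.0]])
    print(np.allclose(choleski(A) @ choleski(A).T, A))  # True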

References [7,8,9,12,24,25,26,27] have kept with a general four-step paradigm for parallel sparse Choleski factorization:

1. order the matrix to limit fill-in,
2. perform symbolic factorization to determine the nonzero structure of the factor,
3. perform numerical Choleski factorization, and
4. solve the triangular systems by forward reduction and backward substitution.

In this paper, we break from this four-step process and introduce a process that is applicable to highly irregular sparse matrices. We propose a new three-step preprocessing phase to replace the two-step preprocessing phase of ordering and symbolic factorization. While our method was originally developed for applications from the electrical power systems community, it has wider applicability to many irregular sparse matrix applications. Because of the irregular nature of the sparse matrices, explicit load balancing of the workload among the processors is required.
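The kind of explicit load balancing required can be illustrated with a simple greedy heuristic that assigns the most expensive independent blocks to the least-loaded processor; the sketch below (in Python, with hypothetical workload estimates) is illustrative only and is not the preprocessing algorithm developed in this paper.

    import heapq

    def balance_blocks(block_workloads, n_procs):
        """Greedy load balancing: assign each block (largest first) to the
        processor with the smallest accumulated workload so far."""
        heap = [(0.0, p) for p in range(n_procs)]   # (work, processor id)
        heapq.heapify(heap)
        assignment = [[] for _ in range(n_procs)]
        for blk in sorted(range(len(block_workloads)),
                          key=lambda i: block_workloads[i], reverse=True):
            work, proc = heapq.heappop(heap)
            assignment[proc].append(blk)
            heapq.heappush(heap, (work + block_workloads[blk], proc))
        return assignment

    # Irregular block workloads spread over three processors.
    print(balance_blocks([90, 40, 35, 30, 20, 10, 5], 3))
    # -> [[0], [1, 4, 5], [2, 3, 6]]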

The direct matrix technique presented in this paper may even compete with other parallel sparse matrix solvers for regular problems, because task assignments for numerical factorization of a block-diagonal-bordered matrix depend only on the assignment of independent blocks to processors and on the processor assignments of data in the last diagonal block. Techniques based on block-diagonal-bordered sparse matrices require no task-organization graph, because all task assignment is performed in an initial load-balancing stage, and all data needed for calculations resides on the assigned processor until the last diagonal block in the matrix is solved. Communications are also more uniform and structured.
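To show why the task assignment reduces to placing the independent blocks and the last diagonal block, the following is a minimal sketch of block elimination for a block-diagonal-bordered system, written in Python with NumPy; dense solves stand in for the sparse factorization kernels, and the function and variable names are illustrative rather than part of the solver described in this paper.

    import numpy as np

    def bdb_solve(D, B, C, A_last, b_blocks, b_last):
        """Solve a block-diagonal-bordered system with independent diagonal
        blocks D[i], border strips B[i] (right) and C[i] (bottom), and a
        last diagonal block A_last, via block elimination."""
        schur = A_last.copy()
        rhs = b_last.copy()
        DinvB, Dinvb = [], []
        # Independent work: each block touches only its own data, so each
        # iteration could run on a different processor with no communication.
        for Di, Bi, Ci, bi in zip(D, B, C, b_blocks):
            DinvB.append(np.linalg.solve(Di, Bi))   # D_i^{-1} B_i
            Dinvb.append(np.linalg.solve(Di, bi))   # D_i^{-1} b_i
            schur -= Ci @ DinvB[-1]                  # Schur-complement update
            rhs   -= Ci @ Dinvb[-1]
        # Shared work: only the (small) last diagonal block couples the
        # processors, so only its updates must be communicated and combined.
        x_last = np.linalg.solve(schur, rhs)
        # Independent back-substitution into each block.
        x_blocks = [db - dB @ x_last for db, dB in zip(Dinvb, DinvB)]
        return x_blocks, x_last

In the actual solver each D[i] would itself be factored with a sparse method, but the communication pattern is the same: no data is exchanged until the Schur-complement updates to the last diagonal block are accumulated.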


