
A Survey of the Literature for Parallel Direct Methods

Significant research effort has been expended on parallel direct methods for both dense and sparse matrices. Numerous papers have documented parallel dense matrix solvers [11,60,61], and these articles illustrate that good efficiency is possible when solving dense matrices on multiprocessor computers. The time complexity of dense LU factorization is O(n^3), and there are sufficient, regular calculations for good parallel algorithm performance. Some implementations are better than others [60,61]; nevertheless, performance is deterministic for a given matrix order.
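
That cubic cost comes directly from the three nested loops of Gaussian elimination, whose regular access pattern is also what makes the dense problem amenable to parallelization. A minimal sketch (in C, without pivoting, purely to make the operation count concrete):

    /* In-place LU factorization of a dense n x n matrix stored row-major
       (no pivoting -- shown only to illustrate the operation count).
       The three nested loops execute roughly n^3/3 multiply-add pairs,
       and every iteration touches data in a regular, predictable pattern. */
    #include <stddef.h>

    void dense_lu(double *a, size_t n)
    {
        for (size_t k = 0; k < n; k++) {
            for (size_t i = k + 1; i < n; i++) {
                a[i*n + k] /= a[k*n + k];                   /* multiplier l(i,k) */
                for (size_t j = k + 1; j < n; j++)
                    a[i*n + j] -= a[i*n + k] * a[k*n + j];  /* rank-1 update of the trailing block */
            }
        }
    }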

Direct sparse matrix solvers, on the other hand, have computational complexity significantly less than O(n^3), and the power system sparse matrices used in this work have even lower orders of complexity, consistent with matrices from circuit analysis applications [48]. With significantly fewer calculations than dense direct solvers, and lacking uniform, organized communication patterns, parallel direct sparse matrix solvers often require detailed knowledge of the application to permit efficient implementations.
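
The gap in operation counts is tied to matrix structure: for sparse Choleski-type factorization a common estimate charges roughly the square of each factor column's nonzero count to that column, so the total work depends on the fill pattern rather than on the matrix order alone. A minimal sketch of that estimate (the column counts are made-up values, purely for illustration):

    /* Rough work estimate for sparse Choleski factorization: the arithmetic
       is governed by the nonzero structure of the factor L, not by the matrix
       order n alone.  A common estimate charges about (column count)^2
       operations per column.  The column counts below are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        int    n = 8;
        int    colcount[8] = {3, 2, 4, 2, 3, 2, 2, 1};   /* nnz in each column of L (made up) */
        double sparse_flops = 0.0;

        for (int j = 0; j < n; j++)
            sparse_flops += (double)colcount[j] * colcount[j];

        double dense_flops = (double)n * n * n / 3.0;    /* dense factorization grows as ~n^3/3 */

        printf("structural estimate: %.0f flops, dense estimate: %.0f flops\n",
               sparse_flops, dense_flops);
        return 0;
    }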

The bulk of recent research into parallel direct sparse matrix techniques has centered on symmetric positive definite matrices and implementations of Choleski factorization. A significant number of papers on parallel Choleski factorization for such matrices have been published recently [19,20,21,29]. These papers thoroughly examine many aspects of parallel direct sparse solver implementation, including symbolic factorization and appropriate data structures. Techniques to improve interprocessor communication using block partitioning methods have been examined in [46,56,57,58].
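
Much of this symbolic machinery is organized around the elimination tree, which records the column dependencies that both symbolic factorization and parallel scheduling exploit. A minimal sketch of the standard construction with path compression (in the spirit of Liu's algorithm; the CSC arrays Ap and Ai and the function name etree are assumptions of this sketch, not taken from the cited codes):

    /* Elimination tree of a symmetric matrix whose upper-triangular pattern is
       given in compressed-column form: Ap[0..n], Ai[] hold the row indices of
       column j in Ai[Ap[j] .. Ap[j+1]-1].  parent[j] is the parent of column j
       in the tree, or -1 at a root.  The ancestor[] workspace performs path
       compression so each walk toward the root stays short. */
    #include <stdlib.h>

    int *etree(int n, const int *Ap, const int *Ai)
    {
        int *parent   = malloc(n * sizeof(int));
        int *ancestor = malloc(n * sizeof(int));
        for (int j = 0; j < n; j++) {
            parent[j]   = -1;
            ancestor[j] = -1;
            for (int p = Ap[j]; p < Ap[j+1]; p++) {
                int i = Ai[p];
                while (i != -1 && i < j) {          /* walk from row i toward the root */
                    int next    = ancestor[i];
                    ancestor[i] = j;                /* path compression */
                    if (next == -1) parent[i] = j;  /* i had no parent yet: j becomes it */
                    i = next;
                }
            }
        }
        free(ancestor);
        return parent;
    }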

Some of the most celebrated recent work has revived research into parallel sparse multifrontal Choleski techniques [25,33]. Multifrontal techniques identify parallelism within the matrix structure in a manner similar to references [19,20,21,29], but then create multiple small, dense matrices from independent rows/columns of data and update each frontal matrix with dense techniques. Parallel sparse multifrontal algorithms have shown scalable performance for very large, extremely regular sparse structural matrices. There has also been some work on solving less-regular problems: recently published research [47] describes load-balancing techniques that support the work in [46], and ongoing research has examined techniques that can efficiently factor irregular matrices with multifrontal methods [8,9,10].
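
The computational kernel of a multifrontal method is a partial dense factorization: the fully summed variables of a frontal matrix are eliminated with ordinary dense operations, and the trailing block (the update or contribution matrix) is passed on to the parent front. A minimal sketch of that kernel for the symmetric case (the storage layout and function name are illustrative assumptions, not drawn from the cited implementations):

    /* Partial dense Cholesky on an m x m symmetric frontal matrix F (row-major,
       full storage, lower triangle significant).  The first k variables are
       "fully summed" and are eliminated with ordinary dense operations; the
       trailing (m-k) x (m-k) block is left holding the Schur complement, i.e.
       the update matrix that a multifrontal method passes to the parent front. */
    #include <math.h>

    void partial_cholesky(double *F, int m, int k)
    {
        for (int j = 0; j < k; j++) {
            double d = sqrt(F[j*m + j]);
            F[j*m + j] = d;
            for (int i = j + 1; i < m; i++)
                F[i*m + j] /= d;                        /* column of L below the pivot */
            for (int c = j + 1; c < m; c++)             /* rank-1 update of the rest */
                for (int r = c; r < m; r++)
                    F[r*m + c] -= F[r*m + j] * F[c*m + j];
        }
    }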

Techniques for sparse Choleski factorization have even been developed for single-instruction-multiple-data (SIMD) computers such as the Thinking Machines CM-1 and the MasPar MPP [44]. These techniques rely on regularity in the data to avoid processor load imbalance.

Developing efficient parallel sparse matrix factorization algorithms requires more than just implementing parallel versions of sparse direct algorithms. All available parallelism is identified from the structure of the sparse matrix, so the matrix must be preprocessed before parallel factorization can proceed. References [19,20,21,29,56,57,58] have utilized a general two-step preprocessing paradigm for parallel sparse Choleski factorization:

  1. order the matrix to minimize fill-in,
  2. symbolically factor the matrix to identify fill-in and to define static data structures (see the sketch below).
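
The fill-identification step can be illustrated with a deliberately naive sketch that simulates elimination on the nonzero pattern alone; production codes use elimination trees and compressed structures rather than the dense boolean map assumed here:

    /* Naive symbolic factorization sketch: simulate elimination on a dense
       boolean copy of the nonzero pattern, pat[i*n + j].  Eliminating pivot k
       creates fill wherever a row with a nonzero in column k meets a column
       with a nonzero in row k.  On exit, pat holds the pattern of the factors. */
    void symbolic_factor(int n, char *pat)
    {
        for (int k = 0; k < n; k++)
            for (int i = k + 1; i < n; i++)
                if (pat[i*n + k])
                    for (int j = k + 1; j < n; j++)
                        if (pat[k*n + j] && !pat[i*n + j])
                            pat[i*n + j] = 1;            /* fill-in created by pivot k */
    }
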
In this work, we break from this two-step preprocessing paradigm and introduce a new three-step preprocessing phase that includes:
  1. order the matrix,
  2. pseudo-factor the matrix,
  3. explicitly load balance the matrix.
Pseudo-factorization is similar to the symbolic factorization step, except that we also require the number of calculations in each matrix partition, so that we can perform explicit load balancing on the majority of the sparse matrix. Our three-step preprocessing phase is described in a later chapter.
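
The intent of steps 2 and 3 can be illustrated with a simple sketch: once pseudo-factorization has produced a predicted operation count for each matrix partition, partitions can be assigned to the least-loaded processor in decreasing order of work. The greedy assignment and the operation counts below are illustrative assumptions only, not the load-balancing scheme developed later in this work:

    /* Illustrative explicit load balancing: given the per-partition operation
       counts predicted by a pseudo-factorization pass (the counts below are
       made up and presorted in decreasing order), assign each partition to the
       processor with the least predicted work so far. */
    #include <stdio.h>

    int main(void)
    {
        enum { NPART = 6, NPROC = 2 };
        long work[NPART] = {900, 750, 600, 400, 300, 150};  /* predicted ops per partition */
        long load[NPROC] = {0, 0};

        for (int p = 0; p < NPART; p++) {
            int best = 0;                        /* pick the least-loaded processor */
            for (int q = 1; q < NPROC; q++)
                if (load[q] < load[best]) best = q;
            load[best] += work[p];
            printf("partition %d -> processor %d\n", p, best);
        }
        for (int q = 0; q < NPROC; q++)
            printf("processor %d: predicted work %ld\n", q, load[q]);
        return 0;
    }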

This discussion is by no means an exhaustive literature survey, although it does represent a significant portion of the general direct sparse matrix research performed for vector and multi-processor computers.





