This thesis presents research into parallel linear solvers for
block-diagonal-bordered sparse matrices. The block-diagonal-bordered
form exposes parallelism that can be exploited by both
direct and iterative linear solvers. Direct methods obtain the
exact solution in a finite number of
operations, whereas iterative methods calculate sequences of
approximations that may or may not converge to the solution. In order
to compare the performance of parallel sparse direct and iterative linear
solvers for power systems network applications, we have developed
efficient parallel block-diagonal-bordered sparse direct
methods based on LU factorization and Choleski factorization
algorithms, and we have developed an efficient parallel
block-diagonal-bordered sparse iterative method based on the
Gauss-Seidel method. We are examining parallel sparse linear solvers
for embedded power systems applications, so the direct solvers we
implement also require parallel forward reduction and backward
substitution algorithms, not just parallel factorization.
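For concreteness, a block-diagonal-bordered matrix has the structure sketched below for $d$ diagonal blocks (the block names $A_i$, $B_i$, $C_i$, and $A_{d+1}$ are illustrative here, not fixed notation):
\[
A =
\begin{pmatrix}
A_1 &     &        & B_1 \\
    & A_2 &        & B_2 \\
    &     & \ddots & \vdots \\
C_1 & C_2 & \cdots & A_{d+1}
\end{pmatrix},
\]
where every block off the diagonal and outside the last block row and column is zero. Each diagonal block $A_i$ can therefore be factored independently; only the border couples the blocks, and in the symmetric positive definite (Choleski) case $C_i = B_i^{T}$. A direct solver factors $A = LU$ (or $A = LL^{T}$) and then solves the triangular systems $Ly = b$ and $Ux = y$ by forward reduction and backward substitution.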
Solving sparse linear systems pervades scientific computing, yet the performance of direct sparse matrix solvers has tended to trail that of their dense matrix counterparts [29]. The performance of parallel sparse matrix solvers is generally lower than that of comparable dense matrix solvers, even though sparse matrix algorithms contain more inherent parallelism than dense matrix algorithms. This additional parallelism is often described by elimination trees, graphs that illustrate the dependencies in the calculations [19,20,21,29,55,56,57,58,64]. Parallel sparse linear solvers can simultaneously factor entire groups of mutually independent contiguous blocks of columns or rows without communication; dense linear solvers, in contrast, can update only blocks of contiguous columns or rows during each pipelined communication cycle. The limited success with efficient sparse matrix solvers is not unexpected, because general sparse linear solvers require more complicated data structures and algorithms that must contend with irregular memory reference patterns. The irregular nature of many real-world sparse matrices has complicated the task of implementing sparse matrix solvers on vector or parallel architectures: efficient algorithms for these classes of machines require regularity in available data vector lengths and in interprocessor communication patterns [11,25,47].
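As a minimal illustration of this block independence, the following Python sketch (with hypothetical dense blocks and sizes, using NumPy/SciPy rather than the sparse data structures discussed in this thesis) solves a small block-diagonal-bordered system by block elimination. The $d$ diagonal-block factorizations in phase 1 are mutually independent and would require no communication in a parallel solver; only the border Schur complement in phase 2 couples the blocks.
\begin{verbatim}
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical sizes: d diagonal blocks of order nb, border of order m.
rng = np.random.default_rng(0)
d, nb, m = 3, 4, 2
A  = [10 * np.eye(nb) + rng.random((nb, nb)) for _ in range(d)]  # A_i
B  = [rng.random((nb, m)) for _ in range(d)]                     # border cols
C  = [rng.random((m, nb)) for _ in range(d)]                     # border rows
Ab = 10 * np.eye(m) + rng.random((m, m))                         # border block
b  = [rng.random(nb) for _ in range(d)]
bb = rng.random(m)

# Phase 1: factor each diagonal block independently -- in a parallel
# solver these d factorizations proceed with no communication.
lu = [lu_factor(Ai) for Ai in A]

# Phase 2: the border Schur complement is the only coupling step.
S  = Ab - sum(Ci @ lu_solve(lui, Bi) for Ci, lui, Bi in zip(C, lu, B))
yb = bb - sum(Ci @ lu_solve(lui, bi) for Ci, lui, bi in zip(C, lu, b))
xb = np.linalg.solve(S, yb)

# Phase 3: independent forward/backward substitution per block.
x  = [lu_solve(lui, bi - Bi @ xb) for lui, bi, Bi in zip(lu, b, B)]
\end{verbatim}
In a distributed implementation, each processor would own one or more diagonal blocks together with their border strips, so only phase 2 would involve interprocessor communication.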
We have focused on developing parallel linear solvers optimized for sparse matrices from the power systems community --- in particular, we have examined linear solvers for matrices arising from power distribution system networks. These matrices are among the sparsest encountered in real applications, and they are irregular as well. Recently, references [25,33] have reported scalable Choleski solvers with very good performance on large numbers of processors, but those results are for matrices that have more rows and columns, more nonzero elements per row, and more regular structure than power systems matrices. When the empirical performance of sparse linear solvers is examined on real, irregular sparse matrices, limited available parallelism in the sparse matrix or load-imbalance overhead can be as much the cause of poor parallel efficiency as the parallel algorithm or its implementation [37,47].