NPAC Technical Report SCCS-679

Parallel Direct Methods for Block-Diagonal-Bordered Sparse Matrices

David Koester, Sanjay Ranka, Geoffrey Fox

Submitted December 17 1994

Abstract

This paper presents research into parallel direct methods for block-diagonal-bordered sparse matrices --- LU factorization and Choleski factorization algorithms developed with special consideration for irregular sparse matrices from the electrical power systems community. Direct block-diagonal-bordered sparse linear solvers exhibit distinct advantages when compared to general direct parallel sparse algorithms for irregular matrices. Task assignments for numerical factorization on distributed-memory multi-processors depend only on the assignment of data to blocks, and data communications are significantly reduced with uniform and structured communications. Factorization algorithms for block-diagonal-bordered form matrices require a specialized ordering step coupled to an explicit load balancing step in order to generate this matrix form and to uniformly distribute the computational workload for an irregular matrix throughout a distributed-memory multi-processor. This ordering relates to more general sparse direct solver algorithms that use elimination trees --- however, this algorithm develops an elimination tree with only two-levels of supernodes. Matrix orderings are performed using a diakoptic technique based on node-tearing-nodal analysis, with load balancing to optimize performance when factoring the diagonal blocks and borders, the lower layer of supernodes in the elimination tree. Empirical performance measurements for real power system networks are presented for implementations of a parallel block-diagonal-bordered LU algorithm and a similar Choleski algorithm run on a distributed memory Thinking Machines CM-5 multi-processor. The algorithms presented here requires active message remote procedure calls in order to minimize communications overhead and obtain good relative speedup. The paradigm used with active messages greatly simplified the implementation of these sparse matrix algorithms.