Next: Power System Applications Up: Parallel Choleski Factorization of Previous: Parallel Choleski Factorization of

Introduction

Solving sparse linear systems practically dominates scientific computing, but the performance of direct sparse matrix solvers has tended to trail behind that of their dense matrix counterparts [14]. Parallel sparse matrix solvers generally perform worse than comparable dense matrix solvers even though sparse matrix algorithms contain more inherent parallelism: a parallel sparse linear solver can simultaneously factor entire groups of mutually independent contiguous blocks of columns or rows without communications, whereas a dense linear solver can update only a block of contiguous columns or rows in each pipelined communication cycle. The limited success with efficient sparse matrix solvers is not surprising, because general sparse linear solvers require more complicated data structures and algorithms that must contend with irregular memory reference patterns. This irregularity has aggravated the task of implementing scalable sparse matrix solvers on vector or parallel architectures, since efficient, scalable algorithms for these classes of machines require regularity both in available data vector lengths and in interprocessor communications patterns [3]. Nevertheless, when the scalability of sparse linear solvers is examined using real irregular sparse matrices, the parallelism available in the sparse matrix itself can be as much the reason for poor parallel efficiency as the parallel algorithm or its implementation [15].

In this paper we examine the applicability of parallel direct block-diagonal-bordered sparse solvers to real power system load-flow applications, which require the solution of symmetric positive definite sparse matrices. Parallel block-diagonal-bordered sparse linear solvers offer the potential for regularity often absent from other parallel sparse solvers. Load-flow analysis entails the solution of systems of non-linear simultaneous equations, which is performed by repeatedly solving systems of sparse linear equations. For power system load-flow applications, however, the limited size of the matrices and the load imbalance caused by limited parallelism in the matrix structure significantly restrict the number of processors that can be used efficiently by a single parallel Choleski solver. This fact will be evident in the empirical data collected on the CM-5. Our research into specialized matrix ordering techniques has shown that it is possible to readily order actual power system matrices into block-diagonal-bordered form, but load imbalance becomes excessive beyond four processors, limiting the potential parallelism of a single parallel Choleski solver within an application. Nevertheless, electrical power system applications have other dimensions that can be exploited to make efficient use of large-scale multi-processors. We believe that this research also has utility for other irregular sparse matrix applications in which the data is hierarchical. Other sources of hierarchical matrices, such as electrical circuits, have the potential for larger numbers of equations than power system matrices.
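To make the block-diagonal-bordered (BDB) structure concrete, the following sketch (an illustration only, not the paper's CM-5 implementation) builds a small symmetric positive definite matrix in BDB form. The diagonal blocks couple only to the border rows and columns, never to each other, so each block contributes no fill-in to the others and its columns can be factored without interprocessor communication. The helper names and the coupling value are hypothetical.

```python
import numpy as np

def spd_block(k, seed):
    # A small symmetric positive definite block (M M^T plus a shifted diagonal).
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

def make_bdb(blocks, border, coupling=0.1):
    # Assemble independent diagonal blocks plus a shared border.
    n = sum(b.shape[0] for b in blocks) + border
    A = np.zeros((n, n))
    off = 0
    for b in blocks:
        k = b.shape[0]
        A[off:off + k, off:off + k] = b
        A[off:off + k, n - border:] = coupling   # block couples to the border only
        A[n - border:, off:off + k] = coupling
        off += k
    A[n - border:, n - border:] = n * np.eye(border)  # heavy border diagonal keeps A SPD
    return A

A = make_bdb([spd_block(3, 0), spd_block(3, 1)], border=2)
L = np.linalg.cholesky(A)
# No fill-in links the two diagonal blocks: that region of the factor is exactly zero,
# so the blocks may be factored simultaneously on different processors.
print(np.allclose(L[3:6, 0:3], 0.0))   # prints True
```

The zero off-diagonal coupling between blocks is precisely what the node-tearing ordering of later sections is designed to expose in real power system matrices.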

In this paper, we examine the performance of a block-diagonal-bordered Choleski solver to be incorporated within electrical power system applications. Because we are considering software to be embedded within a more extensive application, we examine efficient parallel forward reduction and backward substitution algorithms in addition to parallel Choleski factorization algorithms. Because the triangular solution phases of solving a system of symmetric positive definite equations require far fewer calculations than factorization, these algorithms are often ignored when parallel Choleski algorithms are presented in the literature. We not only include a discussion of these algorithms, we also include an analysis of load balancing for each solution phase: sparse block-diagonal-bordered Choleski factorization and forward reduction/backward substitution. Interprocessor communications costs would be too high to redistribute the data from an optimal load-balance data/processor assignment for parallel Choleski factorization to one optimal for parallel forward reduction and backward substitution, so performance is examined for both factorization and the triangular solutions under each load-balance data/processor assignment.
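The two triangular solution phases named above can be stated compactly. The sketch below is an illustrative dense serial version (the paper's solver operates on sparse BDB structures in parallel): after factoring A = L L^T once, forward reduction solves L y = b and backward substitution solves L^T x = y, and the factor is reused across the repeated right-hand sides that arise in load-flow iterations.

```python
import numpy as np

def forward_reduction(L, b):
    # Solve L y = b, sweeping down the rows of the lower-triangular factor.
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_substitution(L, y):
    # Solve L^T x = y, sweeping up the rows; column i of L is row i of L^T.
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - L[i + 1:, i] @ x[i + 1:]) / L[i, i]
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)              # symmetric positive definite test matrix
L = np.linalg.cholesky(A)                # factor once ...
for b in rng.standard_normal((3, 5)):    # ... reuse for repeated right-hand sides
    x = backward_substitution(L, forward_reduction(L, b))
    assert np.allclose(A @ x, b)
```

Note the data-dependency asymmetry visible even here: forward reduction consumes earlier unknowns while backward substitution consumes later ones, which is why a data distribution tuned for factorization need not be ideal for the triangular solves.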

Block-diagonal-bordered sparse matrix algorithms require modifications to the usual preprocessing phase described in numerous papers on parallel Choleski factorization [9,10,11,14,22,23,24,25,26,30]. Each of these papers follows the same paradigm: order the sparse matrix, then perform symbolic factorization to determine the locations of all fill-in values so that static data structures can be used for maximum efficiency during numerical factorization. We modify this commonly used sparse matrix preprocessing phase to include an explicit load-balancing step coupled to the ordering step, so that the workload is distributed uniformly across a distributed-memory multi-processor and the parallel algorithms make efficient use of the computational resources.
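The explicit load-balancing step can be sketched as follows. This is a hedged illustration, not the pigeon-hole algorithm detailed later in the paper: a simple greedy largest-first heuristic stands in for it, and the block names and per-block work estimates are hypothetical. Given estimated factorization work for each mutually independent diagonal block, blocks are assigned to processors so that accumulated work stays as even as possible.

```python
import heapq

def balance_blocks(block_work, n_procs):
    # Greedy largest-first assignment: repeatedly give the heaviest remaining
    # block to the currently least-loaded processor (min-heap of loads).
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {}
    for blk in sorted(block_work, key=block_work.get, reverse=True):
        load, p = heapq.heappop(heap)
        assignment[blk] = p
        heapq.heappush(heap, (load + block_work[blk], p))
    return assignment

# Hypothetical per-block work estimates (e.g., fill-weighted operation counts).
work = {"B0": 40.0, "B1": 35.0, "B2": 20.0, "B3": 18.0, "B4": 12.0, "B5": 10.0}
print(balance_blocks(work, 2))
```

Because the assignment is computed once during preprocessing, the cost of this step is amortized over the many factorizations performed in an iterative load-flow solution.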

This paper is organized as follows. In section 2, we introduce the electrical power system applications that are the basis for this work. In section 3, we briefly review Choleski factorization and forward reduction/backward substitution, and we review the literature on general parallel Choleski factorization algorithms. In section 4, we present a theoretical derivation of the parallelism available in both the Choleski factorization and the forward reduction/backward substitution phases when solving block-diagonal-bordered sparse matrices. Paramount to exploiting the advantages of this parallel linear solver is ordering the irregular sparse power system matrices into this form in a manner that balances the workload among multi-processors. In section 5, we describe the three-step preprocessing phase used to generate matrix orderings for block-diagonal-bordered matrices with uniformly distributed processing load; in this section, we present pseudo-factorization and review minimum degree ordering and pigeon-hole load-balancing algorithms. In section 6, we present the node-tearing algorithm developed to order matrices into block-diagonal-bordered form. In section 7, we describe our block-diagonal-bordered sparse Choleski algorithm as implemented on the CM-5. Analysis of the performance of these ordering techniques on actual power system load-flow matrices from the Harwell-Boeing series is presented in section 8. Lastly, in section 9, we present our conclusions concerning block-diagonal-bordered Choleski solvers for electrical power system applications.






David P. Koester
Sun Oct 22 15:40:25 EDT 1995