
Embedded Software Applications

Our research has examined the performance of block-diagonal-bordered direct solvers, with implementations for both Choleski and LU factorization, to be incorporated within electrical power system applications. Because we are considering software to be embedded within a more extensive application, we examine efficient parallel forward reduction and backward substitution algorithms in addition to parallel factorization algorithms. Because the triangular solution phases require far fewer calculations than factorization, these algorithms are often ignored when parallel Choleski or LU factorization algorithms are presented in the literature. However, we will show that for large power system network matrices, the empirical run times of parallel factorization and parallel triangular solution differ by less than an order of magnitude.
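To make the two triangular solution phases concrete, the following is a minimal sequential, dense sketch in C of forward reduction and backward substitution for a factored system A = LU. The function names, row-major storage, and non-unit diagonals are illustrative assumptions; the solvers discussed in this work are sparse, block-diagonal-bordered, and parallel.

    #include <stddef.h>

    /* Forward reduction: solve L*y = b, with L lower triangular.
     * Dense, row-major storage; illustrative only. */
    static void forward_reduce(size_t n, const double *L,
                               const double *b, double *y)
    {
        for (size_t i = 0; i < n; i++) {
            double sum = b[i];
            for (size_t j = 0; j < i; j++)
                sum -= L[i * n + j] * y[j];
            y[i] = sum / L[i * n + i];
        }
    }

    /* Backward substitution: solve U*x = y, with U upper triangular. */
    static void backward_subst(size_t n, const double *U,
                               const double *y, double *x)
    {
        for (size_t i = n; i-- > 0; ) {
            double sum = y[i];
            for (size_t j = i + 1; j < n; j++)
                sum -= U[i * n + j] * x[j];
            x[i] = sum / U[i * n + i];
        }
    }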

In our research, we have found that the development of parallel factorization algorithms must also consider forward reduction and backward substitution, because the order of calculations chosen in factorization can greatly influence the performance of the parallel triangular solutions. To ensure cache hits, data structures depend on the order of calculations, and those data structures in turn determine the amount of communications in parallel forward reduction and backward substitution. We have found that this additional communications overhead can eliminate any potential speedup for parallel forward reduction with column-oriented data storage. The overhead cannot be eliminated for Choleski factorization, where either forward reduction or backward substitution must be performed with an implicit transpose of the factored matrix [29,40]. Fortunately, the LU factorization algorithm can be implemented in a manner that eliminates this communications overhead.
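The implicit transpose mentioned above can be illustrated with a small sequential sketch: solving L^T x = y when only the Choleski factor L is stored, so that row i of L^T must be read as column i of L. The dense storage and the function name are assumptions for illustration; it is this column-wise access pattern that, on a distributed matrix, produces the communications overhead described in the text.

    #include <stddef.h>

    /* Solve L^T * x = y without explicitly forming L^T:
     * the entry (L^T)[i][j] is read as L[j][i]. */
    static void backward_subst_implicit_transpose(size_t n, const double *L,
                                                  const double *y, double *x)
    {
        for (size_t i = n; i-- > 0; ) {
            double sum = y[i];
            for (size_t j = i + 1; j < n; j++)
                sum -= L[j * n + i] * x[j];   /* column i of L */
            x[i] = sum / L[i * n + i];
        }
    }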

One goal of our research has been to compare the performance of parallel block-diagonal-bordered direct and iterative solvers to determine which algorithm provides the best performance when embedded within a parallel power systems application. Direct methods obtain the exact solution of Ax = b in a finite number of operations, whereas iterative methods calculate sequences of approximations that may or may not converge to the solution. Nevertheless, if the parallel sparse Gauss-Seidel method converges, and does so rapidly for an application, the iterative technique can provide algorithmic speedup when compared to parallel sparse direct methods. For sequential dense Gauss-Seidel and LU factorization, a Gauss-Seidel iteration and the combination of forward reduction and backward substitution have the same computational complexity, O(n^2); however, a single iteration of the sparse Gauss-Seidel algorithm requires fewer calculations than the combination of sparse forward reduction and backward substitution. Both algorithms perform a sparse matrix-vector product; however, a Gauss-Seidel iteration operates only on the original non-zeros in the sparse matrix, while forward reduction and backward substitution for a direct method must also traverse fill-in, the values that become non-zero during the process of factorization. The computational complexity of dense factorization, O(n^3), is greater still than that of forward reduction and backward substitution, adding further computational cost to the parallel sparse direct methods.
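The per-iteration advantage can be seen in a sketch of one sparse Gauss-Seidel sweep. The compressed sparse row layout and the names below are assumptions for illustration; the point is that the sweep touches only the original non-zeros of A, whereas the triangular solutions of a direct method must also traverse the fill-in in the factors.

    #include <stddef.h>

    /* One Gauss-Seidel sweep for A*x = b, with A in compressed sparse
     * row (CSR) form.  A non-zero diagonal entry is assumed present in
     * every row, as it is for power system network matrices. */
    static void gauss_seidel_sweep(size_t n,
                                   const size_t *row_ptr,  /* n+1 entries       */
                                   const size_t *col_idx,  /* column of each nz */
                                   const double *val,      /* value of each nz  */
                                   const double *b,
                                   double *x)              /* updated in place  */
    {
        for (size_t i = 0; i < n; i++) {
            double sum  = b[i];
            double diag = 1.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++) {
                size_t j = col_idx[k];
                if (j == i)
                    diag = val[k];
                else
                    sum -= val[k] * x[j];  /* uses already-updated x[j] for j < i */
            }
            x[i] = sum / diag;
        }
    }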

To illustrate the potential algorithmic speedup available from the parallel iterative solver, we provide parametric comparisons of the parallel sparse Gauss-Seidel algorithm and the parallel sparse direct methods using empirical data collected on the Thinking Machines CM-5. Algorithm performance can be compared as a function of the number of iterations that can be performed in the time required for the factorization and any number of forward reductions and backward substitutions. Performance improvements can only be assured for the solution of diagonally dominant or positive definite matrices, where convergence of the Gauss-Seidel method is ensured [23].
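A hypothetical instance of such a parametric comparison is sketched below. All of the timing values are placeholders, not measurements from the CM-5 study; the break-even figure is simply the number of Gauss-Seidel iterations per solution that fit within the time for one factorization plus the corresponding triangular solutions.

    #include <stdio.h>

    int main(void)
    {
        double t_factor = 2.50;   /* parallel factorization time (s), assumed      */
        double t_solve  = 0.40;   /* one forward reduction + backward substitution */
        double t_iter   = 0.15;   /* one parallel Gauss-Seidel iteration           */
        int    s        = 10;     /* number of repeated solutions required         */

        double direct_time   = t_factor + s * t_solve;
        double iters_allowed = direct_time / (s * t_iter);

        printf("direct method total: %.2f s\n", direct_time);
        printf("Gauss-Seidel is faster if it converges in fewer than "
               "%.1f iterations per solution\n", iters_allowed);
        return 0;
    }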





