Next: Conclusions Up: Comparing Parallel Direct Previous: Parallel Choleski versus

Parallel LU Solvers versus Parallel Gauss-Seidel

LU factorization imposes none of the restrictions on matrix characteristics that Choleski factorization requires; as a result, there is no guarantee that an iterative solution will converge for such matrices. Nevertheless, we extend the performance comparison of parallel direct solvers with parallel iterative Gauss-Seidel solvers from the previous section. Again we compare direct and iterative solver performance by determining the number of Gauss-Seidel iterations that can be performed in the time of the parallel direct solution, given a number of (re)uses of the factored matrix. Families of curves plotting the number of iterations versus the number of dishonest (re)uses for a double precision parallel block-diagonal-bordered LU-based solver are presented in figure 7.24 for one through ten reuses and one through 32 processors for the BCSPWR09 and BCSPWR10 power systems networks. Similar families of curves for parallel complex variate LU-based and Gauss-Seidel solvers are presented in figure 7.25. The shape of the curves shows that the largest number of iterations possible for a constant-time solution occurs for a single use of the factored matrix. As the factorization is (re)used, the cost to factor the matrix is amortized over the additional (re)uses. For large numbers of factorization (re)uses, the curve becomes asymptotic to

T_LU / T_GS

where:

T_LU is the time for a single (parallel) LU triangular solution.

T_GS is the time for a single (parallel) Gauss-Seidel iteration.
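The amortization behavior described above can be sketched as a small timing model. The function below is an illustration only, not code from the thesis; the argument names (t_factor, t_solve, t_iter) are assumed stand-ins for the factorization time, the triangular solution time, and the Gauss-Seidel iteration time:

```python
def affordable_iterations(t_factor, t_solve, t_iter, n_reuses):
    """Number of Gauss-Seidel iterations that fit in the per-use time of a
    direct solver whose factorization cost is amortized over n_reuses
    triangular solutions.  All arguments are times in the same units."""
    per_use_direct_time = t_factor / n_reuses + t_solve
    return per_use_direct_time / t_iter

# As n_reuses grows, t_factor / n_reuses vanishes and the result
# approaches the asymptote t_solve / t_iter.
```

This makes the shape of the curves explicit: the count is largest at a single use and decays toward the ratio of triangular-solution time to iteration time.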

 
Figure 7.24: Gauss-Seidel Iterations as a Function of Dishonest Reuses of the Double Precision Matrix  

 
Figure 7.25: Gauss-Seidel Iterations as a Function of Dishonest Reuses of the Complex Matrix  

The graphs in figures 7.24 and 7.25 illustrate that, for a single use of the factored matrix, the number of Gauss-Seidel iterations per dishonest reuse is greater for the LU-based solvers than for the parallel Choleski solvers. We showed in section 7.1 that the time to factor a matrix into LU is greater than the time for Choleski factorization because LU factorization requires twice as many calculations. However, the time to perform the triangular solutions is less for parallel LU solvers than for parallel Choleski solvers --- Choleski solvers must perform one of the triangular solutions for the last diagonal block with less than optimal communications. As a result, while the number of iterations for a single use of the double precision solver may be greater, the number of available iterations for multiple dishonest reuses actually decreases when compared to the Choleski/Gauss-Seidel comparison curves. The effect of the higher Choleski cost in the triangular solution phase can be seen clearly by comparing the graphs in figures 7.23 and 7.24: for ten reuses with 32 processors on the BCSPWR10 data set, there would be time for 45 iterations per Choleski solution, but only 25 parallel Gauss-Seidel iterations per parallel double precision LU solution.
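The crossover described above can be sketched numerically. The timings below are hypothetical, chosen only to encode the two claims in the text: LU factorization costs roughly twice as much as Choleski, while the LU triangular solutions are cheaper because the Choleski solver performs one solve step with less than optimal communications. They are not measured values from the thesis:

```python
# Hypothetical timings (illustrative only; not measurements from the thesis).
t_factor_chol, t_solve_chol = 50.0, 9.0    # Choleski: cheaper factor, costlier solve
t_factor_lu,   t_solve_lu   = 100.0, 5.0   # LU: ~2x factor cost, cheaper solve
t_iter = 1.0                               # one parallel Gauss-Seidel iteration

def iters_per_use(t_factor, t_solve, n_reuses):
    """Gauss-Seidel iterations that fit in the amortized direct-solve time."""
    return (t_factor / n_reuses + t_solve) / t_iter

# Single use: the LU time budget affords MORE iterations, because its larger
# factorization cost has not yet been amortized.
# Many reuses: the LU budget affords FEWER iterations, because its cheaper
# triangular solution now dominates the per-use time.
```

With these numbers, one use affords 105 iterations against the LU budget but only 59 against Choleski, while at 20 reuses the ordering reverses (10 versus 11.5), mirroring the crossover between figures 7.23 and 7.24.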

Parallel performance of the complex LU solver improves significantly relative to the parallel Choleski solver, and substantially relative to the double precision LU solver. Meanwhile, the added calculations in parallel complex variate Gauss-Seidel yield only a small performance improvement over the double precision version of that solver. As a result, parallel complex Gauss-Seidel offers less potential for improved performance than parallel double precision Gauss-Seidel.






David P. Koester
Sun Oct 22 17:27:14 EDT 1995