LU factorization has none of the limitations on matrix characteristics that define Choleski factorization. As a result, there is no guarantee that the iterative solution will converge. Nevertheless, we extend the performance comparison of parallel direct solvers with parallel iterative Gauss-Seidel solvers from the previous section. Again, we compare direct and iterative solver performance by determining the number of iterations available to the parallel Gauss-Seidel solver for a given number of (re)uses of the factorization. Families of curves plotting the number of iterations versus the number of dishonest (re)uses for a double precision parallel block-diagonal-bordered LU-based solver are presented in figure 7.24 for one through ten reuses and one through 32 processors for the BCSPWR09 and BCSPWR10 power systems networks. Likewise, we present similar families of curves in figure 7.25 for parallel complex variate LU-based and Gauss-Seidel solvers. The shape of the curves shows that the largest number of iterations possible for a constant-time solution occurs for a single use of the factored matrix. As the factorization is (re)used, the cost to factor the matrix is amortized over the additional (re)uses. For large numbers of factorization (re)uses, the curve becomes asymptotic to
\[ \frac{T_{\mathrm{solve}}}{T_{\mathrm{iter}}} \]
where:
\( T_{\mathrm{solve}} \) is the time for a single (parallel) LU triangular solution.
\( T_{\mathrm{iter}} \) is the time for a single (parallel) Gauss-Seidel iteration.
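The amortization argument can be made concrete with a short sketch that computes the equal-time Gauss-Seidel iteration budget per reuse; the function name and timings below are hypothetical, chosen only to illustrate the shape of the curves, not measured values from this chapter.

```python
def equal_time_iterations(t_factor, t_solve, t_iter, n_reuses):
    """Gauss-Seidel iterations per reuse that fit within the time of a
    direct solution, with the factorization cost t_factor amortized
    over n_reuses triangular solutions of cost t_solve each."""
    return (t_factor / n_reuses + t_solve) / t_iter

# Hypothetical timings in seconds (illustrative only).
T_FACTOR, T_SOLVE, T_ITER = 1.0, 0.05, 0.01

for n in (1, 2, 5, 10, 1000):
    print(f"{n:4d} reuses: {equal_time_iterations(T_FACTOR, T_SOLVE, T_ITER, n):.1f} iterations")

# As n_reuses grows, the budget approaches the asymptote T_SOLVE / T_ITER,
# i.e. 5 iterations for these illustrative timings.
```

The single-use case yields the largest budget because the entire factorization cost is charged to one solution; each additional reuse charges only a triangular solution.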
Figure 7.24: Gauss-Seidel Iterations as a Function of Dishonest Reuses of the Double Precision Matrix
Figure 7.25: Gauss-Seidel Iterations as a Function of Dishonest Reuses of the Complex Matrix
Graphs in figures 7.24 and 7.25 illustrate that for one use of the factored matrix, the number of Gauss-Seidel iterations per dishonest reuse of the factored matrix is greater for LU factorization than for parallel Choleski solvers. We showed in section 7.1 that the time to factor a matrix into LU form is greater than for Choleski factorization because LU factorization requires twice as many calculations. However, the time to perform the triangular solutions is less for parallel LU solvers than for parallel Choleski solvers --- Choleski solvers must perform one of the triangular solutions for the last diagonal block with less than optimal communications. As a result, while the number of iterations for a single reuse of the double precision solver may be greater, the number of available iterations for multiple dishonest reuses actually decreases when compared to the Choleski/Gauss-Seidel comparison curves. The effects of the Choleski solver's higher cost in the triangular solution phase can be clearly seen when comparing the graphs in figures 7.23 and 7.24.
For ten reuses with 32 processors, there would be time for 45 iterations per Choleski solution for the BCSPWR10 data set, but time for only 25 parallel Gauss-Seidel iterations per parallel double precision LU solution.
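The reversal between the single-use and many-reuse cases can be illustrated numerically. The timings below are hypothetical (not the measured BCSPWR10 values), chosen only so that LU factorization costs about twice Choleski factorization while the Choleski triangular solution is the slower of the two, as described above.

```python
def iterations_per_reuse(t_factor, t_solve, t_iter, n_reuses):
    # Equal-time Gauss-Seidel iteration budget per reuse of the factorization.
    return (t_factor / n_reuses + t_solve) / t_iter

# Hypothetical timings in seconds (illustrative only).
T_ITER = 0.002                         # one Gauss-Seidel iteration
chol_factor, chol_solve = 0.2, 0.09    # Choleski: cheaper factorization, slower solve
lu_factor, lu_solve = 0.4, 0.05        # LU: ~2x factorization cost, faster solve

for n in (1, 10):
    k_chol = iterations_per_reuse(chol_factor, chol_solve, T_ITER, n)
    k_lu = iterations_per_reuse(lu_factor, lu_solve, T_ITER, n)
    print(f"{n:2d} reuses: Choleski budget {k_chol:.0f}, LU budget {k_lu:.0f}")
```

With these illustrative numbers, the LU budget exceeds the Choleski budget for a single use (the expensive factorization dominates), while after ten reuses the cheaper LU triangular solution leaves a smaller iteration budget than Choleski --- the same qualitative reversal seen in figures 7.23 and 7.24.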
Parallel performance of the complex variate LU solver improves significantly relative to the parallel Choleski solver and substantially relative to the double precision LU solver. Meanwhile, there is only a small performance improvement from the added calculations in the parallel complex variate Gauss-Seidel solver versus the double precision version of this solver. As a result, parallel complex Gauss-Seidel offers less potential for improved performance than parallel double precision Gauss-Seidel.