For the three solver implementations, there are increasing amounts of floating point calculations in double precision Choleski factorization, double precision LU factorization, and complex LU factorization, with relative workloads of approximately 1:2:6. While the number of calculations differs among these algorithms, the amount of communications is the same. We present timing comparisons for the three solver implementations in figures 43 through 47 for the five sample power system networks. Each graph has six curves --- three for factorization and three for the triangular solution. These graphs illustrate just how sensitive parallel sparse direct solvers for power systems networks are to the relative amount of communications overhead. This sensitivity is not totally unexpected, given the extremely sparse nature of power systems matrices.
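A rough operation count suggests why the workloads scale this way; the symbol W below is introduced here only for illustration and denotes the floating point work in each factorization:
\[
W_{\mathrm{LU}} \approx 2\,W_{\mathrm{Choleski}},
\qquad
W_{\mathrm{complex\;LU}} \approx 3\,W_{\mathrm{LU}} \approx 6\,W_{\mathrm{Choleski}},
\]
since LU factorization computes both triangular factors where Choleski computes only one, and a complex multiplication requires four real multiplications and two real additions in place of a single real multiplication and addition. The exact figure depends on how complex additions are tallied, so the 1:2:6 ratio should be read as an approximation.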
Figure 43: Parallel Choleski and LU Timing Comparisons --- BCSPWR09 --- Maximum Nodes per Partition = 32
Figure 44: Parallel Choleski and LU Timing Comparisons --- BCSPWR10 --- Maximum Nodes per Partition = 32
Figure 45: Parallel Choleski and LU Timing Comparisons --- EPRI6K --- Maximum Nodes per Partition = 16
Figure 46: Parallel Choleski and LU Timing Comparisons --- NiMo-OPS --- Maximum Nodes per Partition = 32
Figure 47: Parallel Choleski and LU Timing Comparisons --- NiMo-PLANS --- Maximum Nodes per Partition = 32
These graphs show the time in milliseconds required to factor these matrices and to calculate a triangular solution. The single processor timings also reflect the relative number of floating point operations in each implementation. The ratio between the factorization times of the implementations decreases as additional processors are applied to solving the sparse matrix. With the amount of communications held constant, communications overhead has proportionally less effect when there are more calculations, and it is easier to hide communications behind calculations when there are more floating point operations to perform.
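A simple idealized timing model makes this explicit. The symbols below (W for the floating point workload, r for the per-processor floating point rate, C for the communications time, and p for the number of processors) are introduced here only for illustration and are not taken from the measurements:
\[
T_{p} \;\approx\; \frac{W}{p\,r} + C,
\qquad
E_{p} \;=\; \frac{T_{1}}{p\,T_{p}} \;\approx\; \frac{1}{1 + p\,r\,C / W},
\]
so with C essentially constant across the three implementations, the efficiency loss term p r C / W shrinks as the workload W grows from Choleski to double precision LU to complex LU. Overlapping communications with calculations improves on this bound, and a larger W likewise provides more computation behind which to hide C.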
These graphs also illustrate some important information concerning
parallel triangular solutions for Choleski factorization. First,
while Choleski factorization requires only half the calculations of LU factorization, there is no corresponding reduction in the number of calculations in the triangular
solution step. To calculate the triangular solution, every non-zero
coefficient in L is used once during forward reduction and
every non-zero coefficient in L^T must also be used once during
backward substitution. While it is possible to avoid explicitly performing the matrix transpose, one of the triangular solutions will then require additional overhead because the data must be accessed in an explicitly or implicitly column-oriented manner, as illustrated in the sketch at the end of this section. This solution phase requires additional communications as compared to the corresponding triangular solution phase in the LU-based implementations, and because the number of non-zeros is greater than the number of rows/columns in the matrix, this additional communications traffic is not negligible. For a single processor, there is the same
amount of work when solving the factored equations for double
precision LU and Choleski. However, the additional communications overhead has a noticeable effect on the slope of the curve representing the triangular solution for the Choleski solvers. This
phenomenon can be seen for all five power systems networks.
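A minimal sequential sketch in C illustrates the point about the implicit transpose. The compressed sparse row layout, the routine names, and the convention that each row stores its diagonal entry last are choices made here for illustration only; they are not taken from the solver implementations being timed. Forward reduction consumes the rows of L directly, while backward substitution reuses each row of L as a column of L^T, scattering updates into lower-numbered entries of the solution vector.

#include <stdio.h>

/*
 * Sketch: triangular solution with a Choleski factor stored once, row
 * oriented (compressed sparse row), so that backward substitution with
 * the transpose never forms L^T explicitly.  Layout assumption: columns
 * within a row are in increasing order, diagonal entry stored last.
 */

/* Forward reduction: solve L*y = b, consuming L row by row. */
static void forward_reduce(int n, const int *rowptr, const int *col,
                           const double *val, const double *b, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int k = rowptr[i]; k < rowptr[i + 1] - 1; k++)
            s -= val[k] * y[col[k]];          /* uses y[j], j < i */
        y[i] = s / val[rowptr[i + 1] - 1];    /* divide by L(i,i) */
    }
}

/* Backward substitution: solve (L^T)*x = y.  Each row i of L is treated
 * as column i of L^T, so the updates scatter into entries x[j], j < i,
 * that a distributed implementation would generally own elsewhere. */
static void back_substitute(int n, const int *rowptr, const int *col,
                            const double *val, double *x /* in: y, out: x */)
{
    for (int i = n - 1; i >= 0; i--) {
        x[i] /= val[rowptr[i + 1] - 1];       /* divide by L(i,i) */
        for (int k = rowptr[i]; k < rowptr[i + 1] - 1; k++)
            x[col[k]] -= val[k] * x[i];       /* scatter to x[j], j < i */
    }
}

int main(void)
{
    /* 3x3 example: A = L*L^T with L = [2 0 0; 1 1 0; 0 3 1],
     * and b = A*(1,1,1)^T so that the computed solution is x = (1,1,1). */
    int    rowptr[] = {0, 1, 3, 5};
    int    col[]    = {0,  0, 1,  1, 2};
    double val[]    = {2.0,  1.0, 1.0,  3.0, 1.0};
    double b[]      = {6.0, 7.0, 13.0};
    double x[3];

    forward_reduce(3, rowptr, col, val, b, x);
    back_substitute(3, rowptr, col, val, x);
    printf("x = %g %g %g\n", x[0], x[1], x[2]);
    return 0;
}

In a distributed memory setting, the scattered updates in the backward substitution become messages to the processors that own those entries of the solution vector, which is the additional traffic visible in the slope of the Choleski triangular solution curves.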