Comparing Timing Performance for Direct Solver Implementations

For the three solver implementations, the amount of floating point calculation increases from double precision Choleski factorization, to double precision LU factorization, to complex LU factorization, with relative workloads of approximately 1:2:6. While the amounts of calculations in these algorithms differ, the amounts of communications are equal. We present timing comparisons for the three solver implementations in figures 43 through 47 for the five sample power system networks. Each of these graphs has six curves: three for factorization and three for the triangular solution. These graphs illustrate just how sensitive parallel sparse direct solvers for power systems networks are to the relative amount of communications overhead. This sensitivity is not totally unexpected, given the extremely sparse nature of power systems matrices.
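
The approximate 1:2:6 workload ratio can be reproduced with a simple flop count per innermost-loop update. The C sketch below encodes one such back-of-the-envelope model; the per-update costs (two flops for a real multiply-add, roughly six flops for a complex multiply) and the halving for Choleski are our illustrative assumptions, not measurements from these experiments.

#include <stdio.h>

/* Back-of-the-envelope flop model for the three factorization kernels.
 * Assumptions (ours, not measurements): a real multiply-add costs 2
 * flops, a complex multiply costs roughly 6 flops (4 multiplies plus
 * 2 adds), and Choleski updates only the lower triangle, i.e. half of
 * the elements that LU updates.  Communications are taken as equal. */
int main(void)
{
    const double real_madd    = 2.0;   /* flops per real multiply-add */
    const double complex_mult = 6.0;   /* flops per complex multiply  */

    double choleski   = 0.5 * real_madd;     /* half the matrix, real */
    double lu_real    = 1.0 * real_madd;     /* full matrix, real     */
    double lu_complex = 1.0 * complex_mult;  /* full matrix, complex  */

    printf("Choleski : LU : complex LU = %.0f : %.0f : %.0f\n",
           choleski / choleski, lu_real / choleski, lu_complex / choleski);
    return 0;
}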

 
Figure 43: Parallel Choleski and LU Timing Comparisons --- BCSPWR09 --- Maximum Nodes per Partition = 32  

 
Figure 44: Parallel Choleski and LU Timing Comparisons --- BCSPWR10 --- Maximum Nodes per Partition = 32  

 
Figure 45: Parallel Choleski and LU Timing Comparisons --- EPRI6K --- Maximum Nodes per Partition = 16  

 
Figure 46: Parallel Choleski and LU Timing Comparisons --- NiMo-OPS --- Maximum Nodes per Partition = 32  

 
Figure 47: Parallel Choleski and LU Timing Comparisons --- NiMo-PLANS --- Maximum Nodes per Partition = 32  

These graphs show the time in milliseconds required to factor these matrices and to calculate a triangular solution. For a single processor, the graphs also reflect the relative number of floating point operations in the three implementations. The ratio between the implementations' factorization times decreases as additional processors are used to solve the sparse matrix. Because the amounts of communications are constant, communications overhead has proportionally less effect when there are more calculations, and it is easier to hide communications behind calculations when there are more floating point operations to perform.
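
To make this argument concrete, the following C sketch evaluates a simple fixed-communications timing model, T(p) = W/p + C, for the three implementations. The relative workloads (1:2:6) and the constant communications term C are assumed values chosen only to show the trend, not numbers taken from the graphs.

#include <stdio.h>

/* Minimal sketch of the fixed-communications model suggested above:
 * T(p) = W/p + C, where W is the single-processor calculation time of
 * an implementation and C is a communications term that does not
 * shrink as processors are added.  All constants are assumptions. */
int main(void)
{
    const double w_chol = 1.0, w_lu = 2.0, w_zlu = 6.0; /* relative workloads */
    const double comm = 0.25;  /* fixed communications term, same for all three */

    for (int p = 1; p <= 32; p *= 2) {
        double t_chol = w_chol / p + comm;
        double t_lu   = w_lu   / p + comm;
        double t_zlu  = w_zlu  / p + comm;
        /* Both ratios fall toward 1 as the constant communications
           term comes to dominate every running time. */
        printf("p = %2d   LU/Choleski = %.2f   complex LU/Choleski = %.2f\n",
               p, t_lu / t_chol, t_zlu / t_chol);
    }
    return 0;
}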

These graphs also illustrate some important information concerning parallel triangular solutions for Choleski factorization. First, while the factorization step has only half as many calculations, there is no reduction in the number of calculations in the triangular solution step. To calculate the triangular solution, every non-zero coefficient in L is used once during forward reduction, and every non-zero coefficient in L^T must also be used once during backward substitution. While it is possible to avoid explicitly performing the matrix transpose, one of the triangular solutions will require additional overhead, because the data must be explicitly or implicitly column oriented. This solution phase will require additional communications, as compared to the corresponding solution phase with U in the LU-based implementations. The number of non-zeros in L is greater than the number of rows/columns in the matrix. For a single processor, there is the same amount of work when solving the factored equations for double precision LU and for Choleski. However, the additional communications overhead has a noticeable effect on the slope of the curve representing the triangular solution for the Choleski solvers. This phenomenon can be seen for all five power systems networks.
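
The orientation issue can be illustrated with a small serial C sketch in which L is stored by columns: forward reduction consumes each column of L directly with scatter-style updates, while backward substitution with L^T must gather the same columns as rows of L^T. The 3-by-3 unit lower triangular matrix below is a made-up example; in the parallel solvers, this gather access pattern is what translates into the additional communications described above.

#include <stdio.h>

/* L is unit lower triangular, stored by columns (compressed sparse
 * column form, strictly-below-diagonal entries only).  Forward
 * reduction L y = b uses each column of L directly; backward
 * substitution L^T x = y must read each column of L as a row of L^T,
 * which is the column-oriented access that costs the extra overhead. */
#define N 3

static const int    colptr[N + 1] = { 0, 2, 3, 3 };  /* column starts */
static const int    rowind[3]     = { 1, 2, 2 };     /* row indices   */
static const double lval[3]       = { 0.5, 0.25, 0.5 };

int main(void)
{
    double b[N] = { 1.0, 1.0, 1.0 }, y[N], x[N];

    /* Forward reduction: column-oriented "scatter" updates. */
    for (int j = 0; j < N; j++) {
        y[j] = b[j];
        for (int k = colptr[j]; k < colptr[j + 1]; k++)
            b[rowind[k]] -= lval[k] * y[j];
    }

    /* Backward substitution with L^T: each column of L is "gathered"
     * as a row of L^T, a dot product against already-solved x.       */
    for (int j = N - 1; j >= 0; j--) {
        x[j] = y[j];
        for (int k = colptr[j]; k < colptr[j + 1]; k++)
            x[j] -= lval[k] * x[rowind[k]];
    }

    for (int i = 0; i < N; i++)
        printf("x[%d] = %g\n", i, x[i]);
    return 0;
}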


