Next: Performance Analysis Up: A Parallel Gauss-Seidel Algorithm Previous: Parallel Gauss-Seidel Implementation

Empirical Results

Overall performance of our parallel Gauss-Seidel linear solver depends on both the performance of the matrix ordering in the preprocessing phase and the performance of the parallel Gauss-Seidel implementation. Because these two components are inextricably related, the best way to assess the potential of this technique is to measure speedup using real power system load-flow matrices. We first present speedup and efficiency data for three separate power systems matrices: BCSPWR09, BCSPWR10, and EPRI-6K.

Matrices BCSPWR09 and BCSPWR10 are from the Harwell-Boeing collection and represent electrical power system networks from the Western and Eastern United States, respectively. The EPRI-6K matrix is distributed with the Extended Transient-Midterm Stability Program (ETMSP) from EPRI. These matrices were preprocessed using a sequential program that ordered the matrix, load balanced each ordering step, and subsequently produced the implicit data structures required by the parallel block-diagonal-bordered Gauss-Seidel linear solver. Due to the static nature of the power system grid, such an ordering would be reused over many hours of calculations in real electrical power utility load-flow applications.

Matrix preprocessing was performed for multiple values of the input parameter to the node-tearing algorithm. Empirical performance data were collected for each of the aforementioned power systems matrices using 1 through 32 processors on the Thinking Machines CM-5 at the Northeast Parallel Architectures Center at Syracuse University. The NPAC CM-5 is configured with all 32 nodes in a single partition, so user software was required to define the number of processors actually used to solve a linear system. Empirical data collected on the parallel Gauss-Seidel algorithm are presented in two ways. We first present speedup and efficiency data for the three power systems matrices; relative speedup and efficiency are computed from the times required to perform four iterations and a single convergence check. Next, we provide a detailed performance analysis using actual run times for the individual subsections of the parallel Gauss-Seidel linear solver. This detailed analysis illustrates the efficacy of the load balancing step in the preprocessing phase and exposes other performance bottlenecks.
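The timing unit used above (four iterations followed by a single convergence check) can be sketched in plain sequential form. The helper below is a minimal illustration of that measurement unit, not the block-diagonal-bordered parallel implementation described in the text; the test matrix and tolerance in the usage example are hypothetical.

```python
def gauss_seidel(A, b, x0, iterations=4, tol=1e-3):
    """Sequential Gauss-Seidel: run a fixed number of iterations,
    then perform a single convergence check, mirroring the unit of
    work timed in the experiments (four iterations plus one check).

    A is a dense row-major matrix (list of lists), b and x0 are lists.
    """
    n = len(b)
    x = list(x0)
    for _ in range(iterations):
        for i in range(n):
            # Use the newest values of x as soon as they are computed --
            # this is what distinguishes Gauss-Seidel from Jacobi.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    # Single convergence check on the max-norm of the residual b - A x.
    residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
                   for i in range(n))
    return x, residual < tol
```

For a small diagonally dominant system such as `A = [[4, 1], [1, 3]]`, `b = [1, 2]`, four iterations from a zero initial guess already bring the residual below a modest tolerance.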

Definition --- Relative Speedup. Given a single problem with a sequential algorithm running on one processor and a concurrent algorithm running on $p$ independent processors, relative speedup $S_p$ is defined as

$$S_p = \frac{T_1}{T_p},$$

where $T_1$ is the time to run the sequential algorithm as a single process and $T_p$ is the time to run the concurrent algorithm on $p$ processors.

Definition --- Relative Efficiency. Relative efficiency $E_p$ is defined as

$$E_p = \frac{S_p}{p},$$

where $S_p$ is relative speedup and $p$ is the number of processors.
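The two definitions above translate directly into code. The sketch below computes relative speedup and efficiency from measured wall-clock times; the timing values in the usage note are hypothetical, not measurements from the CM-5 experiments.

```python
def relative_speedup(t1, tp):
    """Relative speedup S_p = T_1 / T_p, where t1 is the sequential
    run time and tp is the run time on p processors."""
    return t1 / tp

def relative_efficiency(t1, tp, p):
    """Relative efficiency E_p = S_p / p; a value of 1.0 would mean
    perfectly linear speedup on p processors."""
    return relative_speedup(t1, tp) / p
```

For example, with a hypothetical sequential time of 10.0 seconds and a parallel time of 2.0 seconds on 8 processors, the relative speedup is 5.0 and the relative efficiency is 0.625.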








David P. Koester
Sun Oct 22 15:29:26 EDT 1995