Overall performance of our parallel Gauss-Seidel linear solver is dependent on both the performance of the matrix ordering in the preprocessing phase and the performance of the parallel Gauss-Seidel implementation. Because these two components of the parallel Gauss-Seidel implementation are inextricably related, the best way to assess the potential of this technique is to measure the speedup performance using real power system load-flow matrices. We first present speedup and efficiency data for three separate power systems matrices:
Matrix preprocessing was performed for multiple values of ,
the input value to the node-tearing algorithm. Empirical performance
data was collected for each of the aforementioned power systems
matrices using 1 through 32 processors on the Thinking Machines CM-5
at the Northeast Parallel Architectures Center at Syracuse University.
The NPAC CM-5 is configured with all 32 nodes in a single partition,
so user software was required to define the number of processors used
to actually solve a linear system. Empirical data collected on the
parallel Gauss-Seidel algorithm will be presented in two ways. We
first present speedup and efficiency data from the three power systems
matrices. Relative speedup and efficiency are presented using the
times required to perform four iterations and a single convergence
check. Next, we provide a detailed performance analysis using actual
run times for the individual subsections of the parallel Gauss-Seidel
linear solver. This detailed performance analysis illustrates the
efficacy of the load balancing step in the preprocessing phase, and
illustrates other performance bottlenecks.
Definition --- Relative Speedup Given a single problem with a sequential algorithm running on one processor and a concurrent algorithm running on p independent processors, relative speedup is defined as
where is the time to run the sequential algorithm as a
single process and
is the time to run the concurrent algorithm
on p processors.
Definition --- Relative Efficiency Relative efficiency is defined as
where is relative speedup and p is the number of processors.