
Techniques to Speed Up Transient Stability Simulations

At present, the techniques available to the computational scientist to either improve the performance of an application or make the implementation of a grand-challenge computing problem feasible fall into two categories:

  1. faster hardware
  2. more efficient algorithms
While emphasis is often directed primarily at applying more computing power, the development of more efficient underlying algorithms can be an equally effective means to reduce the wall-clock time for a problem as computationally intensive as transient stability simulation. Moreover, when attempting to enhance the performance of an application, a strong synergism exists between faster hardware and better algorithms. In particular, the use of supercomputer hardware requires research into algorithms that make efficient use of vector processing or parallel processing architectures, especially when significant portions of an application are not embarrassingly parallel.

Supercomputer technologies based on scalar or vector processing architectures are rapidly approaching the physical limitations of circuit size and logic switching speeds; consequently, concurrent or parallel processing has become the focus for addressing computational grand challenges [39]. Without a doubt, concurrent processing architectures will be the basis for the forthcoming teraflop (trillion floating-point operations per second) supercomputers. Detailed, accurate, faster-than-real-time transient stability analysis for state or regional electrical power governing authorities will require these impressive computing capabilities to be harnessed effectively using efficient algorithms.

Basic definitions of speedup for sequential and concurrent algorithms are required so that preliminary estimates of algorithm performance can be developed without extensive modeling of algorithms, computer architectures, and particular power systems.

Definition 1 (Speedup) Given a single problem with two sequential algorithms that exhibit execution times $T_1$ and $T_2$ with $T_1 > T_2$, speedup is defined as

$$ S = \frac{T_1}{T_2} $$
This simple, intuitive definition will be expanded in order to compare the performance of sequential and concurrent algorithms.

Definition 2 (Relative Speedup) Given a single problem with a sequential algorithm running on one processor and a concurrent algorithm running on $p$ independent processors, relative speedup is defined as

$$ S_p = \frac{T_1}{T_p} $$

where $T_1$ is the time to run the sequential algorithm as a single process and $T_p$ is the time to run the concurrent algorithm on $p$ processors.

Definition 3 (Fair Speedup) Given a single problem with a sequential algorithm and a concurrent algorithm running on $p$ independent processors, fair speedup is defined as

$$ FS_p = \frac{T_{best}}{T_p} $$

where $T_{best}$ is the time to run the most efficient sequential algorithm as a single process and $T_p$ is the time to run the concurrent algorithm on $p$ processors.
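
To make the distinction between these two measures concrete, the following minimal Python sketch evaluates both definitions for hypothetical timings; the timing values are illustrative assumptions, not measurements from the literature.

    # Relative vs. fair speedup (Definitions 2 and 3) for hypothetical timings.

    def relative_speedup(t_seq, t_par):
        """Definition 2: the same algorithm's one-process time over its p-processor time."""
        return t_seq / t_par

    def fair_speedup(t_best, t_par):
        """Definition 3: the best sequential algorithm's time over the p-processor time."""
        return t_best / t_par

    # Assumed timings in seconds: the parallelized algorithm takes 100 s
    # sequentially, but the best known sequential algorithm takes only 60 s.
    t_seq, t_best, t_par = 100.0, 60.0, 10.0

    print(relative_speedup(t_seq, t_par))  # 10.0
    print(fair_speedup(t_best, t_par))     # 6.0, a less flattering but fairer figure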

Definition 4 (Amdahl's Law) Given $T_1$, the time to solve a problem on a single processor, the time $T_p$ to solve the problem on $p$ processors can be parameterized in $\sigma$ by

$$ T_p = T_1 \left( \sigma + \frac{1 - \sigma}{p} \right) $$

where $\sigma$ is the inherently sequential fraction of the computations. This estimate of $T_p$ can be used when estimating the relative speedup [37] by

$$ S_p = \frac{T_1}{T_p} = \frac{1}{\sigma + \frac{1 - \sigma}{p}} $$

Amdahl's Law can be used to estimate the maximum potential relative speedup by taking the inverse of the inherently sequential fraction: as $p \rightarrow \infty$, $S_p \rightarrow 1/\sigma$. According to Amdahl's Law, a task with 1% sequential operations could obtain a speedup of no more than 100, regardless of the number of processors applied to the problem.
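
As an illustration of this bound, the following minimal Python sketch tabulates the relative speedup predicted by Amdahl's Law for an assumed sequential fraction of 1% and increasing processor counts.

    # Amdahl's Law (Definition 4); the sample values are illustrative.

    def amdahl_speedup(sigma, p):
        """Relative speedup S_p = 1 / (sigma + (1 - sigma) / p),
        where sigma is the inherently sequential fraction."""
        return 1.0 / (sigma + (1.0 - sigma) / p)

    sigma = 0.01  # 1% inherently sequential operations
    for p in (10, 100, 1000, 10000):
        print(p, round(amdahl_speedup(sigma, p), 1))
    # p=10: 9.2, p=100: 50.3, p=1000: 91.0, p=10000: 99.0
    # As p grows without bound, S_p approaches 1/sigma = 100.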

Definition 5 (Overhead-Based Estimates of Speedup) Amdahl's Law gives one preliminary estimate of the potential speedup of a concurrent algorithm; however, for some concurrent algorithms, the overhead associated with concurrency appears more critical than the inherent percentage of sequential operations. In these instances, the time for a parallel algorithm can be defined as

$$ T_p = \frac{T_{best}}{p} + T_o $$

Consequently, an estimate of fair speedup can be obtained by

$$ FS_p = \frac{T_{best}}{T_p} = \frac{T_{best}}{T_{best}/p + T_o} $$

In this formula, $T_o$ is the total amount of overhead from numerous sources [16], which include:

  1. nonoptimal algorithm or algorithmic overhead --- additional calculations in the concurrent algorithm that are not present in a sequential algorithm
  2. load balancing --- speedup is limited by the processing time of the slowest node
  3. software overhead --- additional calculations that must be replicated at each processor, such as additional index calculations
  4. communications overhead --- idle time for processors as they wait for interprocessor communications that have not been overlapped with calculations
It is not always easy to predict the amount of overhead in a parallel algorithm without extensive, detailed simulation. Nevertheless, this measure of concurrent algorithm performance, along with Amdahl's Law, provides a preliminary estimate of potential performance improvement when comparing algorithms with differing communications properties or when comparing substantially different concurrent and sequential algorithms.
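
The following minimal Python sketch illustrates Definition 5 with a single lumped overhead term; the timing and overhead values are hypothetical, chosen only to show how quickly overhead erodes speedup.

    # Overhead-based fair speedup estimate (Definition 5), with T_o lumped
    # into one term as an illustrative simplification.

    def overhead_fair_speedup(t_best, p, t_overhead):
        """FS_p = T_best / (T_best / p + T_o)."""
        t_par = t_best / p + t_overhead
        return t_best / t_par

    t_best = 100.0  # hypothetical best sequential time, seconds
    for t_o in (0.0, 1.0, 5.0):
        print(t_o, round(overhead_fair_speedup(t_best, 16, t_o), 1))
    # T_o=0.0: 16.0 (ideal), T_o=1.0: 13.8, T_o=5.0: 8.9 on 16 processors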

The two techniques available to the computational scientist to address this problem can thus be restated as:

  1. execute as many instructions as possible concurrently
  2. reduce the number of instructions to solve the problem
Both of these techniques have been addressed in previous publications on transient stability simulation research; nevertheless, significant areas of application-dependent research remain. Numerous papers address the use of vector or parallel processing architectures for the transient stability problem [2,8,10,12,13,15,25,33,34,35,40,41,44,47]. Many papers in this field addressed theoretical applications of parallelism within the transient stability problem but offered no implementations [2,8,13,15,33,34,35]. Many recent papers address specific numerical analysis techniques, and only a few papers present benchmark information from actual parallel processing hardware, and that hardware is now often outdated [10,25,47]. Frequently, the authors who have developed parallel processing implementations have relied on optimizing or automatically parallelizing compilers applied to existing sequential transient stability software [10,25,47].

Meanwhile, algorithmic speedup research has been reported frequently in the IEEE Transactions on Power Systems, with only a single recent paper addressing algorithmic speedup in combination with parallelism [47]. Some parallel processing papers have addressed the synergism of algorithms with the computer architecture; however, the focus has generally been limited to techniques that permit parallelism by modifying the precedence when solving the DAEs at the normally sequential time-steps [2,10,33,34,35]. This parallel time-domain technique [2] has been widely used because it may offer more calculations that can be performed in parallel than would a concurrent version of a sequential transient stability simulation, and it has permitted implementations that avoid addressing parallel implementations of such algorithms as the direct solution of linear systems of equations [10]. Nevertheless, the parallel time-domain technique requires significant additional calculations to enhance parallelism.

The example in [2], which illustrates the parallel time-domain technique, requires nearly 35% more calculations than solving the DAEs for sequential time-steps. These additional calculations result from reordering the lower bi-diagonal matrix that represents the application of the trapezoidal rule to the entire set of time-steps. Reordering the matrix modifies the precedence between calculations and causes fill-in that is not present in the purely sequential algorithm. Such additional calculations contribute to algorithmic overhead --- additional calculations that a parallel algorithm must perform that a sequential program does not. Any overhead specific to a concurrent implementation of an algorithm decreases the potential efficiency of that algorithm. For example, when 33% more calculations are required in a parallel algorithm, the overhead-based estimate of speedup limits the maximum speedup to less than $P/1.33 \approx 0.75P$, where $P$ is the number of processors. In spite of the significant additional calculations, techniques like this are one method to utilize more parallel processors at a time. Techniques that require additional calculations are an inefficient use of resources if other techniques exist that can be exploited for parallelism without requiring additional calculations.
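
A quick numerical check of this bound follows; the 1.33 work factor is taken from the example above, while the processor counts are illustrative assumptions.

    # Upper bound on fair speedup when the parallel algorithm performs
    # work_ratio times the work of the best sequential algorithm.

    def max_fair_speedup(p, work_ratio=1.33):
        # With perfect load balance, T_p >= work_ratio * T_best / p,
        # so fair speedup is bounded by p / work_ratio (about 0.75 p here).
        return p / work_ratio

    for p in (8, 32, 128):
        print(p, round(max_fair_speedup(p), 1))  # 6.0, 24.1, 96.2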

Additionally, the parallel time-domain technique is inherently limited to a low-order numerical integration technique, the trapezoidal rule, which may require shorter time-steps and consequently more calculations due to accuracy limitations [20]. Because the power system transient stability problem primarily involves the solution of DAEs, additional discussion of DAE solvers is included in the next section.

One of the interesting areas for algorithmic speedup in transient stability simulations is the partitioning of the electrical network under test into study and external areas, which permits larger integration step sizes in the external areas. Variable time-steps can be used because the generator equations are better behaved the greater their electrical distance from the initial fault. Such techniques are reported in [6,28,30,47], with a parallel implementation on an Alliant FX/8 shared-memory multiprocessor discussed in [47]. That article concludes that an algorithmic speedup of approximately five is possible simply by limiting this source of unnecessary calculations. Significant improvements in both reducing the number of computer instructions and executing as many instructions as possible in parallel on distributed-memory multiprocessors for the transient stability problem will require research into many areas not previously addressed in the state-of-the-art parallel power system transient stability analysis research described in the literature.
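
The multirate idea behind this partitioning can be sketched in a few lines. The following Python fragment is a highly simplified illustration, not the method of the cited papers: it integrates a fast "study-area" variable with a small step and a slow "external-area" variable with a step m times larger, holding coupling values fixed between external updates. The test dynamics and step sizes are arbitrary assumptions.

    # Multirate forward Euler: small step h for the study area,
    # larger step m*h for the external area (illustrative sketch only).

    def multirate_euler(f_study, f_ext, x0, y0, h, m, t_end):
        x, y, t = x0, y0, 0.0
        while t < t_end:
            y = y + (m * h) * f_ext(x, y)   # one large external-area step
            for _ in range(m):              # m small study-area steps
                x = x + h * f_study(x, y)   # external state y held constant
            t += m * h
        return x, y

    # Hypothetical test system: a fast study-area dynamic coupled to a
    # slow external-area dynamic.
    fast = lambda x, y: -10.0 * x + y
    slow = lambda x, y: -0.1 * y + 0.01 * x
    print(multirate_euler(fast, slow, 1.0, 1.0, 0.001, 10, 1.0))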


