next up previous
Next: Conclusions Up: Power Systems Transient Stability Previous: Relevant Transient Stability

Estimates of Potential Speedup

The numerical analysis techniques proposed in this paper have the potential to be the basis for power system transient stability simulations that exhibit substantial speedup compared to both present sequential programs and the parallel implementations developed for distributed-memory multiprocessors. To estimate potential speedup for power system transient stability simulations, estimates of the percentage of sequential calculations and of concurrent overhead will be used as inputs to Amdahl's law and to overhead-based estimates of speedup, respectively. These estimators of speedup are given in Definitions 4 and 5.

The source of most sequential operations in the parallel algorithm will be precedence in the direct solution of linear equations when solving the DAEs. If the matrix is reordered into block-bordered diagonal form, there will be a significant reduction in the precedence of operations and a significant increase in the amount of available parallelism in the algorithm. We estimate that it should be possible to limit the solely sequential operations to less than 5% of the total, so, according to Amdahl's law, a maximum relative speedup of 20 could be possible for a concurrent transient stability algorithm. This estimate does not include speedup due to better transient stability algorithms. Even if the performance of dynamic partitioning algorithms is less than the speedup of five encountered in [47], there is the possibility of substantial speedup when combining better algorithms and concurrent processing. This is a very coarse estimate of potential performance that considers only the sequential portions of the algorithm.
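The Amdahl's-law bound above can be checked with a short calculation. The sketch below is illustrative only; the 5% sequential fraction is the estimate from the text, not a measured value:

```python
def amdahl_speedup(sequential_fraction, processors):
    """Relative speedup under Amdahl's law for a given sequential fraction."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)

# With 5% strictly sequential work, speedup is capped near 1/0.05 = 20,
# no matter how many processors are used; 32 processors reach only ~12.5.
limit = amdahl_speedup(0.05, 10**9)
p32 = amdahl_speedup(0.05, 32)
```

This makes concrete why the 5% figure matters: the asymptotic limit of 20 is approached only slowly, so reducing the sequential fraction further pays off directly.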

Fair speedup is often a more realistic estimator of potential performance improvement than relative speedup because fair speedup accounts for more of the factors involved in the development of parallel algorithms. Thus, estimates of potential speedup based on examining overhead in the algorithm can produce a much more realistic estimate of potential performance than does Amdahl's law. The research opportunities discussed in this paper could experience any of the four types of overhead listed in Definition 5. Research into parallel algorithms is often an attempt to examine the trade-offs between various sources of overhead in order to minimize the aggregate overhead. Examples of overhead in the proposed research areas are presented below.

The parallel time-domain method [2] is an example of algorithmic overhead: additional calculations that are performed in a parallel implementation but are not required in a sequential implementation. These additional calculations can significantly reduce potential speedup. The methods proposed in this paper to solve the power system transient stability DAEs require no additional calculations for the parallel versus the sequential algorithm, removing a significant source of overhead that erodes potential performance. The proposal to examine dynamic partitioning techniques, however, does introduce a source of algorithmic overhead: a real-time parallel implementation would have to perform load balancing in real time, calculations that are not required in sequential algorithms. Fortunately, the power systems under study change slowly, so some load-balancing work for pre-chosen fault locations can be performed in advance to minimize the impact on real-time calculations.
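As a rough illustration of how such load balancing might be precomputed for pre-chosen fault cases, the sketch below assigns diagonal blocks to processors with a greedy longest-processing-time heuristic. The block names and per-block cost estimates are hypothetical; a production scheme would use measured per-block solution times:

```python
import heapq

def assign_blocks(block_costs, n_procs):
    """Greedily assign blocks (largest first) to the least-loaded processor."""
    loads = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(loads)
    assignment = {}
    for blk, cost in sorted(block_costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(loads)   # processor with the smallest load
        assignment[blk] = p
        heapq.heappush(loads, (load + cost, p))
    return assignment

# Hypothetical per-block work estimates (e.g., factorization flop counts).
costs = {"block_a": 5.0, "block_b": 4.0, "block_c": 3.0, "block_d": 2.0}
plan = assign_blocks(costs, 2)
```

Because such assignments depend only on the (slowly changing) network structure, they can be computed offline and reused across real-time simulation runs.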

The block-bordered diagonal form of the sparse matrix can simplify the distribution of data to processors in a manner that minimizes communications overhead for either direct or iterative techniques to solve the sparse linear systems, even if partial pivoting is required in the direct solutions or in the preconditioning step for iterative solutions. Little interprocessor communication should be required in the factorization of the block-bordered matrix until the narrow borders are factored. Likewise, some communication would be required in the forward-reduction and backward-substitution phases for the variables in the borders, but none should be required for these operations in the other portions of block-bordered diagonal matrices. Consequently, the amount of interprocessor communication should be minimal, while required communications should be performed in a regular manner to minimize bottlenecks and should be overlapped with calculations whenever possible. Active Messages is presently available for parallel software developers to overlap communications with calculations using Split-C [45].
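To make this communication pattern concrete, the following serial sketch solves a small block-bordered diagonal system via the Schur complement of the border. In a parallel implementation each diagonal block would be handled by its own processor, and communication would be needed only to assemble and solve the small border system; the matrices here are hypothetical toy data, not a power system model:

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def solve_bbd(A, B, C, D, b, bd):
    """Solve a block-bordered diagonal system via the border Schur complement.

    Each diagonal block A[i] is eliminated independently (the parallel part);
    only the small border system S couples the blocks.
    """
    m = len(D)
    S = [row[:] for row in D]            # Schur complement, initialized to D
    g = bd[:]
    Y, Z = [], []
    for Ai, Bi, Ci, bi in zip(A, B, C, b):
        yi = solve(Ai, bi)               # local solve: no communication
        Zi = [solve(Ai, [row[j] for row in Bi]) for j in range(m)]
        Y.append(yi)
        Z.append(Zi)
        for r in range(m):               # each block's border contribution
            g[r] -= sum(Ci[r][k] * yi[k] for k in range(len(yi)))
            for c in range(m):
                S[r][c] -= sum(Ci[r][k] * Zi[c][k] for k in range(len(yi)))
    xb = solve(S, g)                     # small shared border solve
    xs = [[yi[k] - sum(Zi[c][k] * xb[c] for c in range(m))
           for k in range(len(yi))] for yi, Zi in zip(Y, Z)]
    return xs, xb

# Toy two-block system with a one-variable border.
A = [[[4.0, 1.0], [1.0, 3.0]], [[5.0]]]
B = [[[1.0], [0.0]], [[2.0]]]
C = [[[1.0, 1.0]], [[1.0]]]
D = [[6.0]]
b = [[1.0, 2.0], [3.0]]
bd = [4.0]
xs, xb = solve_bbd(A, B, C, D, b, bd)
```

Note that pivoting inside each block stays local to that block's processor, which is why partial pivoting need not add communication in this structure.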

The remaining form of overhead, parallel software overhead, should be minimal as long as the granularity of calculations per processor is reasonably large. Consequently, the total amount of overhead in a concurrent transient stability simulation should be minimal. While there is no way to estimate the amount of overhead exactly without detailed simulations of the various algorithms or actual implementations, preliminary estimates of speedup can be developed using parametric values of overhead. We believe that aggregate overhead should be less than 20% for well-constructed concurrent algorithms. This would yield potential speedups of p/(1 + 0.2) = 32/1.2 ≈ 26.7, or greater than 26, for p = 32 processors. [47] reported a speedup of five for a transient stability dynamic partitioning algorithm on a shared-memory multiprocessor. Assuming 20% greater overhead for a distributed-memory multiprocessor implementation of the dynamic partitioning algorithm, it may be possible to obtain total speedups of as much as 100 for 32 processors when combining algorithmic and concurrent speedup. For larger multicomputers with as many as 1024 processors, it may be possible to obtain nearly thousand-fold speedup for power system simulations, as long as the system modeled is sufficiently large. In general, only state or regional controlling authorities would have power system networks large enough to efficiently use so many processors. If significant numbers of processors are utilized for each independent block in the block-bordered diagonal matrix, communications overhead could increase sharply and affect these estimates.
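The overhead-based estimate above reduces to a simple formula. The sketch below uses the 20% aggregate overhead assumed in the text, which is a parametric estimate rather than a measurement:

```python
def overhead_speedup(processors, overhead_fraction):
    """Fair-speedup estimate: ideal linear speedup degraded by overhead."""
    return processors / (1.0 + overhead_fraction)

# About 26.7 for 32 processors and about 853 for 1024 processors
# under 20% aggregate overhead.
s32 = overhead_speedup(32, 0.20)
s1024 = overhead_speedup(1024, 0.20)
```

Unlike Amdahl's law, this estimate grows linearly with processor count, which is why it supports the near-thousand-fold projection for 1024 processors.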

In concluding this discussion of potential speedup for advanced implementations of transient stability simulations, there are areas where speedup may be obtained simply by utilizing the advanced features of distributed-memory multiprocessors. Most notably, a sequential power system transient stability simulation is so large that only small portions of the data can be stored in cache or other very fast memory at any time. Due to the size of the data structures in a sequential implementation, there will be many cache misses, which slow processing as data in the cache is swapped out for new data required in the present stage of processing. Partitioning the problem into smaller pieces on multiple processors may permit significantly more, and possibly all, of the data to reside in cache or fast memory, minimizing cache misses and offering significant unseen benefits that speed up the calculations.






David P. Koester
Sun Oct 22 16:35:54 EDT 1995