
Overhead-Based Performance Estimates

 

For many concurrent algorithms, the overhead associated with the concurrent algorithm appears more critical than the inherent percentage of sequential operations in the algorithm. In these situations, parallel overhead provides a better preliminary estimate of the potential speedup of a concurrent algorithm than Amdahl's Law, and the parallel execution time can be defined as

\[ T_p = \frac{T_s}{p} + T_o , \]

where:

$T_p$ is the parallel execution time,

$T_s$ is the sequential execution time,

$p$ is the number of processors,

$T_o$ is the total parallel overhead.

For predicting parallel algorithm performance on future architectures, we assume that the algorithm will be applied to a matrix that can be partitioned into block-diagonal-bordered form with no load imbalance, and that the same algorithm is implemented on both architectures. As a result, we can assume that the only component of interest in $T_o$, the total parallel overhead, is $T_c$, the communications overhead, and we can rewrite the equation above as

\[ T_p = \frac{T_s}{p} + T_c . \]

This can be further rewritten as

\[ S = \frac{p}{1 + p \, T_c / T_s} \]

by substituting the relative speedup, $S = T_s / T_p$, which was defined previously.
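As a quick sanity check on the overhead model, the relationships above can be evaluated numerically. The following Python sketch is illustrative only; the function names and timing values are chosen for the example and do not come from the text:

```python
# Minimal numeric sketch of the overhead-based speedup model.
# All timing values are hypothetical.

def parallel_time(t_s, p, t_c):
    """Parallel execution time: T_p = T_s/p + T_c."""
    return t_s / p + t_c

def speedup(t_s, p, t_c):
    """Relative speedup: S = T_s/T_p = p / (1 + p*T_c/T_s)."""
    return t_s / parallel_time(t_s, p, t_c)

# Example: a 100-second sequential run on 32 processors with
# 0.5 seconds of communications overhead.
s = speedup(100.0, 32, 0.5)
print(round(s, 2))  # prints 27.59, below the ideal speedup of 32
```

Even a small overhead relative to $T_s/p$ visibly erodes the speedup, which is why the communications term dominates the estimate for fine-grained algorithms.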

Communications overhead, $T_c$, is a measure of the additional workload incurred in a parallel algorithm as a result of interprocessor communications [17,25,28], and is dependent on the computation-to-communications ratio, not just the amount of communications:

\[ T_c \propto \frac{t_{comm}}{t_{calc}} \]

or

\[ T_c = K \, \frac{t_{comm}}{t_{calc}} , \]

where $t_{calc}$ is the metric describing the computational capability of a single processor [17], $t_{comm}$ is the metric describing the communications characteristics, and the constant $K$ is an algorithm-dependent coefficient of proportionality. The quantity $t_{calc}/t_{comm}$ is often referred to as the computation-to-communications ratio and is related to the granularity of the parallel algorithm. For traditional buffered interprocessor communications, $t_{comm}$ is a linear combination of latency, bandwidth, and message size:

\[ t_{comm} = \alpha + \frac{4n}{\beta} , \]

where $\alpha$ is the communications latency or startup time, $\beta$ is the bandwidth measured in bytes per second, and $n$ is the number of words, or four-byte units, of data. For active messages, $t_{comm}$ is the product of the message latency and the number of messages, $m$:

\[ t_{comm} = \alpha \, m . \]
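The two communications-time models can be sketched as simple functions. The parameter values below are placeholders for illustration, not measurements of any machine discussed here:

```python
# Sketch of the two communications-time models; parameter values
# are hypothetical, not measurements of a real machine.

def t_comm_buffered(alpha, beta, n):
    """Buffered communications: startup latency plus transfer time
    for n four-byte words at beta bytes per second."""
    return alpha + 4 * n / beta

def t_comm_active(alpha, m):
    """Active messages: message latency times the number of messages m."""
    return alpha * m

# Example: 50 microsecond startup latency, 10 MB/s bandwidth,
# a single 1000-word message.  Roughly 0.45 ms total.
t = t_comm_buffered(50e-6, 10e6, 1000)
```

Note that for short messages the latency term dominates the buffered model, while for long messages the bandwidth term dominates; the active-message model scales only with the message count.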

We can determine $K$, the algorithm-dependent constant of proportionality, by combining the speedup and communications-overhead formulas above to yield

\[ K = \frac{t_{calc}}{t_{comm}} \, T_s \left( \frac{p - S}{p S} \right) . \]

We can estimate the speedup for a new parallel architecture, without detailed simulation of the algorithm, by using available empirical timing data to calculate the algorithm-dependent constant of proportionality and combining it with the number of processors and the parallel architecture characteristics. Let $\hat{T}_c$ be the communications overhead for the new architecture; then

\[ \hat{T}_c = K \, \frac{\hat{t}_{comm}}{\hat{t}_{calc}} . \]

Terms can be reordered to yield

\[ \hat{T}_c = T_s \left( \frac{p - S}{p S} \right) \frac{t_{calc}}{t_{comm}} \, \frac{\hat{t}_{comm}}{\hat{t}_{calc}} . \]

The value of $\hat{T}_c$ can be used to calculate $\hat{S}$, the speedup for the new architecture, using

\[ \hat{S} = \frac{p}{1 + p \, \hat{T}_c / T_s} . \]
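The whole prediction procedure, calibrating the constant of proportionality from empirical timings on an existing machine and then applying it with the new machine's characteristics, can be sketched in Python. All parameter values below are hypothetical:

```python
# End-to-end sketch of the prediction procedure: calibrate K from
# empirical timings on an existing machine, then predict speedup on
# a new architecture.  All machine parameters are hypothetical.

def calibrate_k(t_s, p, s_measured, t_calc, t_comm):
    """K = (t_calc/t_comm) * T_s * (p - S) / (p*S)."""
    return (t_calc / t_comm) * t_s * (p - s_measured) / (p * s_measured)

def predict_speedup(t_s, p, k, t_calc_new, t_comm_new):
    """Predicted overhead T_c = K * t_comm_new/t_calc_new, then
    S = p / (1 + p*T_c/T_s)."""
    t_c_new = k * t_comm_new / t_calc_new
    return p / (1 + p * t_c_new / t_s)

# Calibrate on an existing machine where a speedup of 24 was measured
# on 32 processors for a 100-second sequential run.
k = calibrate_k(t_s=100.0, p=32, s_measured=24.0, t_calc=1.0, t_comm=2.0)

# Predict for a new architecture with twice the per-processor speed
# and four times the communications performance.
s_new = predict_speedup(100.0, 32, k, t_calc_new=2.0, t_comm_new=0.5)
print(round(s_new, 2))  # prints 30.72
```

As a consistency check, feeding the original machine's characteristics back into the prediction reproduces the measured speedup exactly, since the calibration simply inverts the speedup formula.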

In the next section, we present an analysis that predicts performance for future architectures using these formulas.






David P. Koester
Sun Oct 22 17:27:14 EDT 1995