For many concurrent algorithms, the overhead associated with the concurrent implementation appears more critical than the inherent percentage of sequential operations in the algorithm. In these situations, parallel overhead provides a better preliminary estimate of the potential speedup of a concurrent algorithm than Amdahl's Law, and the parallel execution time of the algorithm can be defined as

\[ T_p = \frac{T_s}{p} + T_o , \]

where $T_p$ is the parallel execution time, $T_s$ is the sequential execution time, $p$ is the number of processors, and $T_o$ is the total parallel overhead. This can be further rewritten as

\[ \frac{1}{S} = \frac{1}{p} + \frac{T_o}{T_s} \]

by dividing through by $T_s$ and substituting the relative speedup, $S = T_s/T_p$, for $T_s/T_p$; relative speedup is defined in an earlier equation.

For predicting parallel algorithm performance on future architectures, we assume that the algorithm will be applied to a matrix that can be partitioned into block-diagonal-bordered form with no load imbalance, and we assume that the same algorithm is implemented on both architectures. As a result, we can assume that the only component of interest in the total parallel overhead $T_o$ is the communications overhead.
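Before turning to the communications overhead itself, a quick sanity check of the execution-time model may help. The following minimal Python sketch (not from the original paper; all numerical values are assumed for illustration) evaluates $T_p$ and the resulting relative speedup for a lumped overhead term.

```python
# Minimal sketch of the execution-time model T_p = T_s/p + T_o.
# All numerical values below are illustrative assumptions, not measurements.

def parallel_time(T_s, p, T_o):
    """Parallel execution time: sequential work divided over p processors plus overhead."""
    return T_s / p + T_o

def relative_speedup(T_s, T_p):
    """Relative speedup S = T_s / T_p."""
    return T_s / T_p

T_s = 10.0   # assumed sequential execution time (seconds)
p = 8        # number of processors
T_o = 0.5    # assumed total parallel overhead (seconds)

T_p = parallel_time(T_s, p, T_o)
S = relative_speedup(T_s, T_p)
print(f"T_p = {T_p:.3f} s, S = {S:.2f} (ideal: {p})")
```

With these assumed values, even a modest overhead term caps the speedup at roughly 5.7 on eight processors.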
Communications overhead, $T_o$, is a measure of the additional workload incurred in a parallel algorithm as a result of interprocessor communications [17,25,28], and is dependent on the computation-to-communications ratio, not just the amount of communications:

\[ T_o \propto \frac{\tau_{comm}}{\tau_{calc}} \]

or

\[ T_o = C \, \frac{\tau_{comm}}{\tau_{calc}} , \]

where $\tau_{calc}$ is the metric describing the computational capability of a single processor [17], $\tau_{comm}$ is the metric describing the communications characteristics, and the constant $C$ is an algorithm-dependent coefficient of proportionality. The quantity $\tau_{calc}/\tau_{comm}$ is often referred to as the computation-to-communications ratio and is related to the granularity of the parallel algorithm. For traditional buffered interprocessor communications, $\tau_{comm}$ is a linear combination of latency, bandwidth, and message size,

\[ \tau_{comm} = \alpha + \frac{4n}{\beta} , \]

where $\alpha$ is the communications latency or startup time, $\beta$ is the bandwidth measured in bytes per second, and $n$ is the number of words, or four-byte units, of data. For active messages, $\tau_{comm}$ is the product of the message latency and the number of messages, $m$:

\[ \tau_{comm} = \alpha \, m . \]
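To illustrate the difference between the two communications models, the sketch below evaluates $\tau_{comm}$ for a buffered message and for a stream of active messages; the latency and bandwidth figures are placeholder assumptions rather than measured machine parameters.

```python
# Sketch of the two communications-cost models described above.
# alpha (latency), beta (bandwidth), n (four-byte words), and m (messages)
# are placeholder assumptions, not parameters of any particular machine.

def tau_comm_buffered(alpha, beta, n):
    """Buffered communications: startup latency plus the time to move
    n four-byte words at beta bytes per second."""
    return alpha + 4.0 * n / beta

def tau_comm_active(alpha, m):
    """Active messages: message latency times the number of messages."""
    return alpha * m

alpha = 50e-6   # assumed startup latency (seconds)
beta = 10e6     # assumed bandwidth (bytes per second)

print(tau_comm_buffered(alpha, beta, n=1024))  # one 4 KB buffered transfer
print(tau_comm_active(alpha, m=1024))          # 1024 short active messages
```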
We can determine $C$, the algorithm-dependent constant of proportionality, by combining the rewritten execution-time model with the expression for $T_o$ above to yield

\[ C = \frac{\tau_{calc}}{\tau_{comm}} \left( \frac{T_s}{S} - \frac{T_s}{p} \right) . \]
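In practice, $C$ can be backed out from measured timings. The short sketch below does exactly that, assuming illustrative values for the empirical speedup and for the machine metrics $\tau_{calc}$ and $\tau_{comm}$ (in arbitrary but consistent units).

```python
# Sketch: derive the algorithm-dependent constant C from empirical data,
#   C = (tau_calc / tau_comm) * (T_s/S - T_s/p).
# All input values are illustrative assumptions.

def overhead_constant(T_s, S, p, tau_calc, tau_comm):
    T_o = T_s / S - T_s / p           # parallel overhead implied by the measured speedup
    return T_o * tau_calc / tau_comm

C = overhead_constant(T_s=10.0, S=5.7, p=8, tau_calc=1.0, tau_comm=2.0e-3)
print(C)
```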
We can calculate estimates of speedup for a new parallel architecture, without detailed simulation of the algorithm, by using available empirical timing data to calculate the algorithm-dependent constant of proportionality and combining it with the number of processors and the characteristics of the new parallel architecture. Let $\hat{T}_o$ be the communications overhead for the new architecture; then

\[ \hat{T}_o = C \, \frac{\hat{\tau}_{comm}}{\hat{\tau}_{calc}} , \]

where $\hat{\tau}_{calc}$ and $\hat{\tau}_{comm}$ are the computation and communications metrics of the new architecture. Terms can be reordered to yield

\[ \hat{T}_o = \left( \frac{T_s}{S} - \frac{T_s}{p} \right) \frac{\tau_{calc}}{\hat{\tau}_{calc}} \, \frac{\hat{\tau}_{comm}}{\tau_{comm}} . \]

The value of $\hat{T}_o$ can be used to calculate $\hat{S}$, the speedup for the new architecture, using

\[ \hat{S} = \frac{T_s}{\dfrac{T_s}{p} + \hat{T}_o} . \]
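Putting the pieces together, the sketch below walks through the full prediction: derive $C$ from timings on the existing architecture, scale the overhead by the new architecture's metrics, and evaluate the predicted speedup. Every numerical value, including the twofold improvements assumed for the new machine, is a placeholder for illustration, not data from the paper.

```python
# Sketch of the full speedup-prediction procedure:
#   1. derive C from empirical timings on the existing architecture,
#   2. scale the communications overhead to the new architecture's metrics,
#   3. evaluate the predicted relative speedup.
# All numerical values are illustrative assumptions.

def predict_speedup(T_s, S, p, tau_calc, tau_comm, tau_calc_new, tau_comm_new):
    T_o = T_s / S - T_s / p                      # overhead implied by the measured speedup
    C = T_o * tau_calc / tau_comm                # algorithm-dependent constant
    T_o_new = C * tau_comm_new / tau_calc_new    # overhead on the new architecture
    return T_s / (T_s / p + T_o_new)             # predicted speedup

# Assume the new architecture has twice the per-processor compute capability
# and half the per-message communications cost.
S_new = predict_speedup(T_s=10.0, S=5.7, p=8,
                        tau_calc=1.0, tau_comm=2.0e-3,
                        tau_calc_new=2.0, tau_comm_new=1.0e-3)
print(f"predicted speedup on the new architecture: {S_new:.2f}")
```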
In the next section, we present an analysis that predicts performance for future architectures using these formulas.