- Speedup --- $S$:
  Given a single problem with two algorithms that exhibit execution times of $T_1$ and $T_2$, with $T_1 \le T_2$, the speedup for algorithm 1 is defined as
  $$S = \frac{T_2}{T_1}.$$
  This simple, intuitive definition will be expanded in order to compare the performance of sequential and parallel algorithms [51].
- Parallel Execution Time --- $T_p$:
  The time to run a parallel algorithm on $p$ processors [17,25].
- Sequential Execution Time --- $T_s$:
  The time to run a sequential algorithm as a single process [17,25].
- Relative Speedup --- $S_p$:
  Given a single problem with a sequential algorithm running on one processor and a concurrent algorithm running on $p$ independent processors, relative speedup is defined as
  $$S_p = \frac{T_s}{T_p}.$$
- Relative Efficiency --- $E_p$:
  Relative efficiency is defined as
  $$E_p = \frac{S_p}{p}.$$
  Efficiency can be viewed as the speedup-per-node [17].
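As a numerical sketch of the two definitions above, $S_p = T_s/T_p$ and $E_p = S_p/p$; the timing values below are hypothetical, not measurements:

```python
# Relative speedup and efficiency from measured execution times.
# The timing values below are hypothetical.
T_s = 120.0  # sequential execution time (seconds) on one processor
T_p = 18.0   # parallel execution time (seconds) on p processors
p = 8

S_p = T_s / T_p  # relative speedup
E_p = S_p / p    # relative efficiency: speedup-per-node

print(f"S_p = {S_p:.2f}, E_p = {E_p:.2f}")  # S_p = 6.67, E_p = 0.83
```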
- Amdahl's Law:
  A rule formalized by Gene Amdahl, which defines limits on parallel program speedup as a function of the sequential portion of the overall calculations. Given $T_s$, the time to solve a problem on a single processor, then $T_p$ can be parameterized in $p$ by
  $$T_p = f\,T_s + \frac{(1 - f)\,T_s}{p},$$
  where $f$ is the inherently sequential fraction of computations [17,25,28,51]. The aforementioned estimate of $T_p$ can be used when estimating the speedup [51] by
  $$S_p = \frac{T_s}{T_p} = \frac{1}{f + (1 - f)/p}.$$
  Amdahl's Law can be used to estimate the maximum potential relative speedup, $1/f$, by taking the inverse of the sequential portion of the parallel problem. According to Amdahl's Law, a task with 10% sequential operations could obtain no more than a speedup of 10, regardless of the number of processors applied to the problem.
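A short sketch of Amdahl's Law as a function, $S = 1/(f + (1-f)/p)$; the processor counts chosen are only illustrative:

```python
def amdahl_speedup(f, p):
    """Speedup predicted by Amdahl's Law for inherently sequential
    fraction f on p processors: S = 1 / (f + (1 - f) / p)."""
    return 1.0 / (f + (1.0 - f) / p)

# A task that is 10% sequential never exceeds a speedup of 1/f = 10,
# no matter how many processors are applied.
for p in (10, 100, 10_000):
    print(p, round(amdahl_speedup(0.10, p), 2))  # 5.26, 9.17, 9.99
```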
- Total Parallel Overhead --- $T_o$:
  The sum of all overhead incurred in the parallel calculations by all processors. $T_o$ includes overhead due to communications costs, load imbalance, costs for additional software replicated on each processor, and non-optimal algorithms for the parallel processor [17,25]. $T_o$ is defined as
  $$T_o = p\,T_p - T_s.$$
  When $T_o$ is calculated using empirical data, it can be either a non-negative or a negative quantity. Negative overhead signifies that speedup for the problem has not been bounded by $p$, and thus superlinear speedup has occurred. Superlinear speedup can result from cache effects: as a problem is divided onto multiple processors, the amount of memory required on each processor is reduced, generally in proportion to the number of processors. The entire problem may not fit in fast cache for one or several processors, but the problem may fit into fast cache memory as the number of processors increases. Consequently, doubling the number of processors may more than double the measured speedup.
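The definition $T_o = p\,T_p - T_s$ can be sketched with hypothetical timings; a negative result flags superlinear speedup:

```python
def total_overhead(p, T_p, T_s):
    """Total parallel overhead: T_o = p * T_p - T_s."""
    return p * T_p - T_s

# Typical case: positive overhead; speedup (100/30 ~ 3.3) is below p = 4.
print(total_overhead(4, 30.0, 100.0))  # 20.0

# Superlinear case (e.g. cache effects): negative overhead;
# speedup (100/24 ~ 4.2) exceeds p = 4.
print(total_overhead(4, 24.0, 100.0))  # -4.0
```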
- Communications Overhead --- $T_{comm}$:
  A measure of the additional workload incurred in a parallel algorithm as a result of interprocessor communications [17,25,28]. $T_{comm}$ is dependent on the ratio of communications to calculations, not just the amount of communications:
  $$T_{comm} \propto \frac{t_{comm}}{t_{calc}},$$
  where $t_{calc}$ is a metric describing the computational capability of a single processor [17], and $t_{comm}$ represents the communications characteristics. The quantity $t_{calc}/t_{comm}$ is often referred to as the computation-to-communications ratio. For traditional buffered interprocessor communications, $t_{comm}$ is a linear combination of latency, bandwidth, and message size:
  $$t_{comm} = t_{latency} + \frac{4n}{B},$$
  where $t_{latency}$ is the communications latency or startup time, $B$ is the bandwidth measured in bytes per second, and $n$ is the number of words, or four-byte units, of data.
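The linear cost model above can be sketched as follows; the latency and bandwidth figures are hypothetical defaults, not measurements from any particular machine:

```python
def t_comm(n_words, t_latency=50e-6, bandwidth=100e6):
    """Buffered message cost: t_comm = t_latency + 4 * n / B, with n in
    four-byte words and B in bytes per second. The default latency and
    bandwidth values are hypothetical."""
    return t_latency + 4.0 * n_words / bandwidth

# One-word messages are dominated by the startup latency...
print(t_comm(1))
# ...while million-word (4 MB) messages are dominated by bandwidth.
print(t_comm(1_000_000))
```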
- Load-Imbalance Overhead --- $T_{LI}$:
  Parallel speedup is limited by the time of the slowest processor in the calculations. $T_{LI}$ is the sum of idle time for processors waiting for the slowest processor [17,28].
- Parallel Software Overhead --- $T_{SW}$:
  Parallel algorithms may require additional calculations that must be replicated at each processor, such as additional index calculations: the overhead to start up loops must be replicated even though each loop may have fewer iterations, due to work being distributed amongst multiple processors [17].
- Parallel Algorithmic Overhead --- $T_A$:
  Efficient parallel algorithms may require additional calculations that are not present in a sequential algorithm. $T_A$ denotes the additional work performed in the parallel algorithm that is not required in the sequential algorithm [17].
- Overhead-Based Performance Estimate:
  Amdahl's Law gives one preliminary estimate of the potential speedup in a concurrent algorithm; however, for some concurrent algorithms, the overhead associated with the concurrent algorithm appears more critical than the inherent percentage of sequential operations. In these instances, the time for a parallel algorithm can be defined as
  $$T_p = \frac{T_s + T_o}{p}.$$
  This can be rewritten as
  $$p\,T_p = T_s + T_o$$
  and
  $$T_s = p\,T_p - T_o,$$
  which yields an estimate for the relative speedup $S_p$,
  $$S_p = \frac{T_s}{T_p} = \frac{p\,T_s}{T_s + T_o}.$$
  This measure of parallel algorithm performance, along with Amdahl's Law, can be used to generate estimates of potential performance for algorithms on other existing or even future parallel architectures [17].
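The overhead-based estimate, $S_p = p\,T_s/(T_s + T_o)$, can be sketched as a function of the measured total overhead; the numbers below are hypothetical:

```python
def overhead_speedup(p, T_s, T_o):
    """Overhead-based speedup estimate: S_p = p * T_s / (T_s + T_o),
    equivalent to T_s / T_p with T_p = (T_s + T_o) / p."""
    return p * T_s / (T_s + T_o)

p, T_s = 16, 100.0
for T_o in (0.0, 25.0, 100.0):  # growing total parallel overhead
    print(T_o, overhead_speedup(p, T_s, T_o))  # 16.0, then 12.8, then 8.0
```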
- Computation-to-Communications Ratio:
  The computation-to-communications ratio denotes the relative number of calculations on a processor compared to the amount of communications. This quantity is related to granularity, and is also related to the inverse of the communications overhead $T_{comm}$ [17,28].
- Granularity:
  The number of operations performed by a process between interprocessor communications events. A fine-grain process performs only a few operations between requisite communications, while a coarse-grain process performs many operations between interprocessor communications [17,28].