Next: Node-Tearing Nodal Analysis Up: Nomenclature Previous: Parallel Processor Architectures

Parallel Computing Analysis

Speedup --- $S$:
Given a single problem with two algorithms that exhibit execution times of $T_1$ and $T_2$ with $T_2 < T_1$, speedup for algorithm 2 is defined as

$S = T_1 / T_2$

This simple, intuitive definition will be expanded in order to compare the performance of sequential and parallel algorithms [51].

Parallel Execution Time --- $T_p$:
The time to run a parallel algorithm on $p$ processors [17,25].

Sequential Execution Time --- $T_s$:
The time to run a sequential algorithm as a single process [17,25].

Relative Speedup --- $S_p$:
Given a single problem with a sequential algorithm running on one processor and a concurrent algorithm running on $p$ independent processors, relative speedup is defined as

$S_p = T_s / T_p$

Relative Efficiency --- $E_p$:
Relative efficiency is defined as

$E_p = S_p / p$

Efficiency can be viewed as the speedup-per-node [17].
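As a concrete illustration of the two definitions above, relative speedup and efficiency can be computed directly from measured run times. The timings and processor count below are hypothetical, not measurements from the text.

```python
def relative_speedup(t_seq, t_par):
    """Relative speedup S_p = T_s / T_p."""
    return t_seq / t_par

def relative_efficiency(t_seq, t_par, p):
    """Relative efficiency E_p = S_p / p, the speedup per node."""
    return relative_speedup(t_seq, t_par) / p

# Assumed timings: 120 s sequentially, 20 s on p = 8 processors.
t_seq, t_par, p = 120.0, 20.0, 8

print(relative_speedup(t_seq, t_par))        # 6.0
print(relative_efficiency(t_seq, t_par, p))  # 0.75
```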

Amdahl's Law:
A rule formalized by Gene Amdahl, which defines limits on parallel program speedup as a function of the sequential portion of the overall calculations. Given $T_s$, the time to solve a problem on a single processor, then $T_p$ can be parameterized in $f$ by

$T_p = f T_s + (1 - f) T_s / p$

where $f$ is the inherently sequential fraction of computations [17,25,28,51]. The aforementioned estimate of $T_p$ can be used when estimating the speedup [51] by

$S_p = T_s / T_p = 1 / ( f + (1 - f) / p )$

Amdahl's Law can be used to estimate the maximum potential relative speedup by taking the inverse of the sequential portion of the parallel problem: as $p \to \infty$, $S_p \to 1/f$. According to Amdahl's Law, a task with 10% sequential operations could obtain no more than a speedup of 10, regardless of the number of processors applied to the problem.
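A minimal sketch of Amdahl's Law as a function; the sequential fraction and processor counts used here are illustrative.

```python
def amdahl_speedup(f, p):
    """Estimated speedup S_p = 1 / (f + (1 - f) / p)
    for sequential fraction f on p processors."""
    return 1.0 / (f + (1.0 - f) / p)

# With 10% sequential work (f = 0.1), speedup saturates below 10
# no matter how many processors are applied.
print(amdahl_speedup(0.1, 10))     # ~5.26
print(amdahl_speedup(0.1, 1000))   # ~9.91
```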

Total Parallel Overhead --- $T_o$:
The sum of all overhead incurred in the parallel calculations by all processors. $T_o$ includes overhead due to communications costs, load-imbalance, costs for additional software replicated on each processor, and non-optimal algorithms for the parallel processor [17,25]. $T_o$ is defined as

$T_o = p T_p - T_s$

When $T_o$ is calculated using empirical data, it can be either a non-negative or a negative quantity. Negative overhead signifies that speedup for the problem has not been bounded by $p$, and thus superlinear speedup has occurred. Superlinear speedup can result from cache effects: as a problem is divided onto multiple processors, the amount of memory required on each processor is reduced, generally in proportion to the number of processors. The entire problem may not fit in fast cache for one or several processors, but it may fit into fast cache memory as the number of processors increases. Consequently, doubling the number of processors may more than double the measured speedup.
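The sign of $T_o$ can be checked directly from empirical timings; the values below are hypothetical.

```python
def total_overhead(t_seq, t_par, p):
    """Total parallel overhead T_o = p * T_p - T_s."""
    return p * t_par - t_seq

# Ordinary case: positive overhead.
print(total_overhead(100.0, 15.0, 8))  # 20.0

# Superlinear case (e.g. cache effects): T_o is negative, and the
# measured speedup 100/11 > 8 exceeds the processor count p = 8.
print(total_overhead(100.0, 11.0, 8))  # -12.0
```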

Communications Overhead --- $T_{comm}$:
A measure of the additional workload incurred in a parallel algorithm as a result of interprocessor communications [17,25,28]. $T_{comm}$ is dependent on the ratio of communications to calculations, not just the amount of communications,

$T_{comm} \propto t_{comm} / t_{calc}$

where $t_{calc}$ is a metric describing the computational capability of a single processor [17], and $t_{comm}$ represents the communications characteristics. The quantity $t_{calc} / t_{comm}$ is often referred to as the computation-to-communications ratio. For traditional buffered interprocessor communications, $t_{comm}$ is a linear combination of latency, bandwidth, and message size,

$t_{comm} = \alpha + 4 n / \beta$

where $\alpha$ is the communications latency or startup time, $\beta$ is the bandwidth measured in bytes per second, and $n$ is the number of words, or four-byte units, of data.
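The latency-plus-bandwidth cost model above can be sketched as a small function. The latency, bandwidth, and message sizes chosen here are illustrative assumptions, not figures from the text.

```python
def comm_time(n_words, alpha=50e-6, beta=10e6):
    """Buffered message cost t_comm = alpha + 4n/beta:
    startup latency plus transfer time for n four-byte words,
    with bandwidth beta in bytes per second."""
    return alpha + 4 * n_words / beta

# Latency dominates short messages; bandwidth dominates long ones.
print(comm_time(10))         # ~5.4e-05 s
print(comm_time(1_000_000))  # ~0.40005 s
```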

Load-Imbalance Overhead --- $T_{LI}$:
Parallel speedup is limited by the time of the slowest processor in the calculations. $T_{LI}$ is the sum of idle time for processors waiting for the slowest processor [17,28].
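This definition amounts to summing, over all processors, the gap between each processor's busy time and the slowest processor's time; the per-processor timings below are hypothetical.

```python
def load_imbalance_overhead(busy_times):
    """Sum of idle time spent waiting for the slowest processor."""
    slowest = max(busy_times)
    return sum(slowest - t for t in busy_times)

# Four processors finishing at different times: idle = 0 + 1 + 3 + 0.
print(load_imbalance_overhead([10.0, 9.0, 7.0, 10.0]))  # 4.0
```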

Parallel Software Overhead --- $T_{PS}$:
Parallel algorithms may require additional calculations that must be replicated at each processor, such as additional index calculations --- the overhead to start up loops must be replicated on every processor even though each processor's loops may execute fewer iterations, due to the work being distributed amongst multiple processors [17].

Parallel Algorithmic Overhead --- $T_{PA}$:
Efficient parallel algorithms may require additional calculations that are not present in a sequential algorithm. $T_{PA}$ denotes the additional work performed in the parallel algorithm that is not required in the sequential algorithm [17].

Overhead-Based Performance Estimate --- $E_p$:
Amdahl's Law gives one preliminary estimate of the potential speedup in a concurrent algorithm; however, for some concurrent algorithms, overhead associated with the concurrent algorithm appears more critical than the inherent percentage of sequential operations in a concurrent algorithm. In these instances, the time for a parallel algorithm can be defined as

$T_p = ( T_s + T_o ) / p$

This can be rewritten as

$p T_p = T_s + T_o$

and

$S_p = T_s / T_p = p T_s / ( T_s + T_o )$

which yields an estimate for $E_p$,

$E_p = S_p / p = T_s / ( T_s + T_o ) = 1 / ( 1 + T_o / T_s )$
This measure of parallel algorithm performance, along with Amdahl's law, can be used to generate estimates of potential performance for algorithms on other existing or even future parallel architectures [17].
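The overhead-based efficiency estimate can be checked numerically: computing $E_p$ from the overhead ratio and from the equivalent route through $T_p = (T_s + T_o)/p$ gives the same value. The timings below are hypothetical.

```python
def efficiency_from_overhead(t_seq, t_overhead):
    """E_p = 1 / (1 + T_o / T_s)."""
    return 1.0 / (1.0 + t_overhead / t_seq)

def efficiency_from_times(t_seq, t_par, p):
    """E_p = S_p / p = T_s / (p * T_p)."""
    return t_seq / (p * t_par)

t_seq, t_overhead, p = 100.0, 25.0, 8
t_par = (t_seq + t_overhead) / p            # T_p = (T_s + T_o) / p

print(efficiency_from_overhead(t_seq, t_overhead))  # 0.8
print(efficiency_from_times(t_seq, t_par, p))       # 0.8
```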

Computation-to-Communications Ratio:
The computation-to-communications ratio denotes the relative number of calculations on a processor compared to the amount of communications. This quantity is related to granularity, and is also related to the inverse of the communications overhead $T_{comm}$ [17,28].

Granularity:
The amount of operations performed by a process between interprocessor communications events. A fine-grain process performs only a few operations between requisite communications, while a coarse-grain process performs many operations between interprocessor communications [17,28].






David P. Koester
Sun Oct 22 17:27:14 EDT 1995