
Parallelism and Performance Issues

Conventional algorithms that have been optimized for a single-processor or vector-pipelined computer are often taken for granted, since they are readily available as subroutine packages. However, these algorithms may perform poorly on a particular parallel computer. To achieve superior performance, new algorithms must be developed that conform better to the parallel architecture.

To illustrate parallelism and load balancing, we consider the problem of adding two n-vectors $a$ and $b$. The additions

\[ c_i = a_i + b_i, \qquad i = 1, \ldots, n, \tag{1} \]

are all independent and can be done in parallel. Thus, the problem has perfect mathematical parallelism. On the other hand, it may not achieve perfect parallelism on a parallel computer, because it may not have perfect load balancing. By load balancing we mean the assignment of tasks to the processors of the system so as to keep each processor doing useful work as much as possible. For example, suppose that $n = 98$ in (1) and that there are, say, $p = 32$ processors. Then the processors can work in perfect parallelism on 96 of the additions, but only two processors will be busy during the remaining 2 additions. Thus, there is not perfect load balancing to match the perfect mathematical parallelism.
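As a minimal sketch of this counting argument, the short C program below computes how many rounds of simultaneous additions are needed and what fraction of the available processor-rounds do useful work; the values $n = 98$ and $p = 32$ are the illustrative numbers assumed above, not measurements.

/* Sketch: load imbalance when n independent additions are shared by p processors.
 * The values n = 98 and p = 32 are illustrative assumptions, not from a real run. */
#include <stdio.h>

int main(void) {
    int n = 98;   /* number of independent additions c_i = a_i + b_i */
    int p = 32;   /* number of processors */

    int full_rounds = n / p;          /* rounds in which all p processors are busy */
    int leftover    = n % p;          /* additions left for a final, partial round */
    int rounds      = full_rounds + (leftover > 0);

    /* Utilization: useful work divided by total processor-rounds available. */
    double utilization = (double)n / (rounds * p);

    printf("full rounds: %d, leftover additions: %d, total rounds: %d\n",
           full_rounds, leftover, rounds);
    printf("processor utilization: %.2f\n", utilization);  /* < 1.0 means imbalance */
    return 0;
}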

In general, load balancing may be done either statically or dynamically. In static load balancing, tasks (and, perhaps, data for a distributed memory system) are assigned to processors at the beginning of a computation. In dynamic load balancing, tasks (and data) are assigned to processors as the computation proceeds. A useful concept for dynamic load balancing is that of a pool of tasks, from which a processor obtains its next task when it is ready to do so.
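As one possible realization of a pool of tasks (a sketch only; the task count, the number of workers, and the do_task placeholder are assumptions for illustration), each worker thread below claims the index of its next task from a shared atomic counter until the pool is exhausted.

/* Sketch of a pool of tasks: each worker repeatedly takes the next un-started
 * task from a shared counter until the pool is empty.  Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTASKS   20
#define NWORKERS  4

static atomic_int next_task = 0;   /* index of the next task in the pool */

static void do_task(int t) { printf("task %d done\n", t); }  /* placeholder work */

static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        int t = atomic_fetch_add(&next_task, 1);  /* claim the next task */
        if (t >= NTASKS) break;                   /* pool exhausted */
        do_task(t);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NWORKERS];
    for (int i = 0; i < NWORKERS; i++) pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++) pthread_join(tid[i], NULL);
    return 0;
}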

Related to load balancing is the idea of granularity. Large-scale granularity means large tasks that can be performed independently in parallel. Small-scale granularity means small tasks that can be performed in parallel.

In general, in a distributed memory system, exchange of data between processors will be necessary at various times during an overall computation, and to the extent that processors are not doing useful computation during the communication, this constitutes an overhead.
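On a distributed memory system such an exchange is typically carried out by explicit message passing. The following sketch assumes MPI and a run with at least two processes (neither assumption comes from the text above): process 0 sends a block of data to process 1, and while the transfer is in progress neither process is doing useful computation.

/* Sketch of data exchange between two processors using MPI (an assumption;
 * the text does not name a message-passing library).
 * Run with at least two processes, e.g.  mpirun -np 2 ./a.out  */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[100];                      /* data to be exchanged */

    if (rank == 0) {
        for (int i = 0; i < 100; i++) buf[i] = (double)i;
        /* While this send is in progress, processor 0 does no useful computation. */
        MPI_Send(buf, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Processor 1 is idle until the data arrives: communication overhead. */
        MPI_Recv(buf, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("process 1 received %g ... %g\n", buf[0], buf[99]);
    }

    MPI_Finalize();
    return 0;
}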

Synchronization is necessary when certain parts of a computation must be completed before the overall computation can proceed. There are two aspects of synchronization that contribute to overhead. The first is the time to do the synchronization; usually this requires that all processors perform certain checks. The second aspect is that some, or even almost all, processors may become idle, waiting for clearance to proceed with the computation.
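A common synchronization mechanism is a barrier, at which every processor must arrive before any may continue. The sketch below assumes POSIX threads with barrier support and uses artificial, unequal sleep times to stand in for phase-1 work; both are illustrative assumptions. Threads that finish early sit idle at the barrier, which is exactly the second source of overhead described above.

/* Sketch: a barrier synchronization point.  No thread may pass the barrier
 * until all threads have reached it, so fast threads sit idle (overhead).
 * Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 4
static pthread_barrier_t barrier;

static void *phase_worker(void *arg) {
    int id = *(int *)arg;
    usleep((id + 1) * 10000);              /* unequal "phase 1" work */
    printf("thread %d reached the barrier\n", id);
    pthread_barrier_wait(&barrier);        /* idle until everyone arrives */
    printf("thread %d starts phase 2\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    int ids[NTHREADS];
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, phase_worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++) pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}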

The degree to which an algorithm can exploit a multiprocessor is often measured by either speed-up or efficiency. Ideally, we could solve a problem $p$ times as fast on $p$ processors as on a single processor. This ideal is rarely achieved; what is achieved is called the speed-up, defined by

\[ S_p = \frac{\text{execution time on a single processor}}{\text{execution time on } p \text{ processors}}. \tag{2} \]

The efficiency can be defined as

\[ E_p = \frac{S_p}{p}. \]

Since $S_p \le p$, we have $E_p \le 1$, and an efficiency of $1$ corresponds to a perfect speed-up of $p$.

The speed-up defined in (2) is a measure of how a given algorithm compares with itself on one and on $p$ processors. However, the parallel algorithm may not be the best algorithm on a single processor. Hence, a better measure of what is gained by parallel computation is given by the alternative definition

\[ S_p' = \frac{\text{execution time of the fastest serial algorithm on one processor}}{\text{execution time of the parallel algorithm on } p \text{ processors}}. \]

Both of the measures $S_p$ and $S_p'$ are useful, and the context of the discussion will determine which is to be used.
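As a small numerical illustration of these definitions, the C program below computes $S_p$, $E_p$, and $S_p'$ from assumed execution times; the timings and the processor count are made-up values, not measurements.

/* Sketch: computing S_p, E_p, and S'_p from their definitions.
 * All execution times below are assumed values for illustration only. */
#include <stdio.h>

int main(void) {
    int    p        = 16;     /* number of processors                         */
    double t1       = 100.0;  /* parallel algorithm run on a single processor */
    double tp       = 8.0;    /* parallel algorithm run on p processors       */
    double t_serial = 80.0;   /* fastest serial algorithm on one processor    */

    double sp       = t1 / tp;        /* speed-up of the algorithm vs. itself */
    double ep       = sp / p;         /* efficiency, at most 1                */
    double sp_prime = t_serial / tp;  /* speed-up vs. the best serial code    */

    printf("S_p = %.2f, E_p = %.2f, S'_p = %.2f\n", sp, ep, sp_prime);
    return 0;
}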




