next up previous
Next: LU Factorization of Up: A New Three-step Previous: Pseudo Factorization

Load Balancing

The load balancing step of the preprocessing phase can be performed with a simple pigeon-hole type algorithm that uses one of several metrics based on the numbers of calculations determined in the pseudo factorization step. There are three distinct steps in the proposed block-diagonal-bordered matrix solver:

1. factor the independent blocks and the associated borders,
2. update the last diagonal block with partial sums calculated from data in the borders,
3. factor the last diagonal block.

Load balancing may emphasize uniformly distributing the processing workload in any one of the steps, or it may attempt to optimize the distribution of processing workload over a combination of the three phases. A preliminary investigation into this area has emphasized uniformly distributing the workload to separate processors based on the number of calculations both when factoring the independent blocks and when calculating the updates to the last block from data in the borders [14]. The second factorization step, updating the last block using data in the borders, requires that partial sums be accumulated from multiple processors and sent to the processor that holds the data for an element in the last block. However, we are assuming that communications can be made regular for all processors (a research goal), so the independent nature of calculations in the independent blocks permits a processor to start the second phase as soon as it has finished factoring its independent blocks. No processor synchronization is required between these steps, and it is assumed that communications will occur independently of the calculations. Consequently, the sum of all calculations in the independent blocks can be used for load balancing.
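Because no synchronization separates the factorization of an independent block from the border-update calculations, the two operation counts for each block can simply be summed into a single objective function value per block. A minimal sketch of this metric, with hypothetical names (the actual counts would come from the pseudo factorization step):

```python
def block_workloads(factor_ops, border_update_ops):
    """Combined objective function value for each independent block.

    factor_ops[i] is the operation count to factor block i, and
    border_update_ops[i] is the count to accumulate block i's border
    contributions to the last block (both names are illustrative).
    Since a processor proceeds from one phase to the next without
    synchronization, their sum is a valid per-block load metric.
    """
    return [f + b for f, b in zip(factor_ops, border_update_ops)]
```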

A pigeon-hole algorithm is a simple greedy assignment algorithm that distributes objective function values among multiple pigeon-holes in a manner that minimizes the disparity between the sums of the objective function values in each pigeon-hole. This is performed in a three-step process. First, the objective function values for the independent blocks are sorted into descending order. Second, the p greatest values are distributed one to a pigeon-hole for each processor, where p is the number of processors in the distributed-memory multi-computer. Third, the remaining objective function values are selected in descending order, and each is placed in the pigeon-hole with the least aggregate workload. This straightforward greedy algorithm minimizes the disparity in aggregate workloads between processors for a given set of blocks; however, the actual disparity in processor workload depends on the irregular sparse matrix structure. The algorithm works best when there are minimal disparities in the workloads of the independent blocks, or when there are many more independent blocks than processors. In the latter case, the workloads of multiple small blocks can sum to equal the workload of a single larger block.
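The three-step pigeon-hole algorithm above can be sketched as follows. A min-heap keyed on aggregate workload makes "the pigeon-hole with the least aggregate workload" an O(log p) lookup; on the first p iterations every hole is empty, which reproduces step two of the description. Function and variable names here are illustrative, not from the paper.

```python
import heapq

def pigeon_hole(block_workloads, num_processors):
    """Greedy pigeon-hole assignment of independent blocks to processors.

    block_workloads: one objective function value (operation count) per
    independent block, e.g. from the pseudo factorization step.
    Returns a list of block-index lists, one per processor.
    """
    # Step 1: place the objective function values in descending order,
    # remembering each block's original index.
    order = sorted(range(len(block_workloads)),
                   key=lambda i: block_workloads[i], reverse=True)

    # Steps 2 and 3: repeatedly place the next-greatest value into the
    # pigeon-hole with the least aggregate workload.  The first p
    # placements land one per (empty) pigeon-hole, as in step 2.
    heap = [(0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_processors)]
    for i in order:
        load, p = heapq.heappop(heap)
        assignment[p].append(i)
        heapq.heappush(heap, (load + block_workloads[i], p))
    return assignment
```

For example, six blocks with workloads [50, 30, 30, 20, 10, 10] distributed to two processors yield aggregate workloads of 80 and 70, illustrating how several small blocks can balance one large block.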

The pseudo factorization step incurs significantly more computational cost than the symbolic factorization step in previous sparse matrix solvers. Additionally, the ordering phase is more costly than the minimum degree ordering used extensively for irregular sparse matrix problems, where load balancing is often not explicitly considered. Block-diagonal-bordered sparse matrix solvers therefore have significantly more overhead in the preprocessing phase, so the use of this technique will be limited to problems with static matrix structures, where the ordered matrix and load-balanced processor assignments can be reused multiple times to amortize the cost of the preprocessing phase over numerous matrix factorizations.
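The amortization argument can be made concrete with a simple break-even calculation: the one-time preprocessing cost divided by the time saved per factorization gives the number of factorizations needed before preprocessing pays for itself. All names and units below are illustrative assumptions, not figures from the paper.

```python
import math

def break_even_factorizations(preprocessing_cost,
                              time_per_factor_without,
                              time_per_factor_with):
    """Number of factorizations needed to amortize preprocessing.

    All arguments are in the same (arbitrary) time units.  Returns
    None when the preprocessed solver is no faster per factorization,
    in which case the overhead is never recouped.
    """
    savings = time_per_factor_without - time_per_factor_with
    if savings <= 0:
        return None
    return math.ceil(preprocessing_cost / savings)
```

For instance, a preprocessing phase costing 1000 units that reduces each factorization from 10 to 6 units breaks even after 250 factorizations, which is plausible for static matrix structures factored repeatedly, but not for structures that change between solves.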






David P. Koester
Sun Oct 22 16:27:33 EDT 1995