The load-balancing step of the preprocessing phase can be performed
with the same simple pigeon-hole-type algorithm described above for
direct methods. The
load-balancing algorithm uses a metric based on the number of
floating-point multiply/add operations in a partition, not simply the
number of rows per partition. The independent nature of the calculations
in the diagonal blocks and the border permits a processor to start
updating the last diagonal block as soon as that processor has
completed its calculations in the diagonal blocks. The
parallel calculations in the last diagonal block are load-balanced
separately, by dividing the rows within each color evenly among
processors to minimize communication overhead. These metrics do not
consider indexing overhead, which can be considerable when working with
very sparse matrices stored in an implicit form.
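A minimal sketch of such a pigeon-hole assignment is given below in C: partitions are placed greedily, largest workload first, onto the currently least-loaded processor, with the per-partition count of multiply/add operations as the workload metric. The data structures and names (Partition, madd_ops, assign_partitions) are illustrative only and are not taken from the actual implementation.

    #include <stdlib.h>

    /* Hypothetical per-partition workload: the number of floating-point
       multiply/add operations, not the number of rows. */
    typedef struct {
        int    id;
        double madd_ops;   /* multiply/add count for this partition */
    } Partition;

    /* Sort partitions by decreasing workload so the largest pieces are
       placed first (simple pigeon-hole / greedy packing). */
    static int by_cost_desc(const void *a, const void *b)
    {
        double ca = ((const Partition *)a)->madd_ops;
        double cb = ((const Partition *)b)->madd_ops;
        return (ca < cb) - (ca > cb);
    }

    /* Assign each partition to the currently least-loaded processor.
       owner[i] receives the processor index chosen for partition i. */
    void assign_partitions(Partition *parts, int nparts,
                           int nprocs, int *owner)
    {
        double *load = calloc(nprocs, sizeof(double));

        qsort(parts, nparts, sizeof(Partition), by_cost_desc);
        for (int i = 0; i < nparts; i++) {
            int best = 0;
            for (int p = 1; p < nprocs; p++)
                if (load[p] < load[best])
                    best = p;
            owner[parts[i].id] = best;
            load[best] += parts[i].madd_ops;
        }
        free(load);
    }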
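The separate load balancing of the last diagonal block can be sketched in a similar way: within each color the rows are split as evenly as possible into contiguous slices, one per processor, so that processors synchronize only at color boundaries. The function below is again purely illustrative; color_slice and its arguments are hypothetical names.

    /* Hypothetical even split of one color's rows among processors:
       processor p gets rows [start, end) of that color's row list.
       Colors are processed in sequence; within a color the rows are
       independent, so each processor updates its slice without
       communicating until the color is finished. */
    void color_slice(int color_rows, int nprocs, int p,
                     int *start, int *end)
    {
        int base  = color_rows / nprocs;   /* rows every processor gets      */
        int extra = color_rows % nprocs;   /* first 'extra' procs get one more */

        *start = p * base + (p < extra ? p : extra);
        *end   = *start + base + (p < extra ? 1 : 0);
    }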
We have found load-balancing for the parallel block-diagonal-bordered Gauss-Seidel algorithm to be easier to perform than load-balancing for direct methods, because the workload per partition has less variability for iterative methods than for direct methods. Workloads within a partition have a higher computational complexity for direct methods than for iterative methods, so the same partitioned matrix may yield significantly different distributions of data to processors for the two methods.