Pseudo-Factorization

The metric for performing load balancing, or for comparing the performance of ordering techniques, must be based on the actual workload required by the processors in a distributed-memory multi-computer. Consequently, more information is required than just the locations of fillin, as in previous work that used symbolic factorization to identify fillin for static data structures [21,29,56]. The number of floating-point operations may not be proportional to the number of equations assigned to a processor, because the number of calculations in an independent subgraph is a function of the number and location of non-zero matrix elements in that block, not of the number of equations in the block. For dense matrices, the computational complexity of factorization is O(n^3). Meanwhile, the computational complexity of factoring the entire sparse power systems matrices used in later parallel algorithm performance studies is less than O(n^2) but greater than O(n). Determining the actual workload requires a detailed simulation of all processing for the factorization phases, which we refer to as pseudo-factorization.

Pseudo-factorization is simply a replication of the numerical factorization process without actually performing the calculations: it tallies the number of operations required to factor the independent data blocks and the number of operations required to update the last block using data from the borders.
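The tallying step above can be sketched as a symbolic elimination over the matrix's sparsity pattern. The following is a minimal illustration, not the author's actual data structures or counting rules: it assumes a structurally symmetric matrix stored as adjacency sets and charges each pivot the standard cost of its divides and multiply-adds, propagating fill-in as it goes.

```python
# Sketch of pseudo-factorization: simulate sparse elimination on the
# sparsity pattern only, tallying the operations each pivot would cost.
# The adjacency-set representation and the per-pivot cost formula
# (m divides + m(m+1)/2 multiply-adds for a symmetric factorization)
# are illustrative assumptions.

def pseudo_factor(n, nonzeros):
    """nonzeros: dict mapping row i -> set of column indices j > i with
    structural nonzeros (strict upper triangle, symmetric pattern).
    Returns the simulated operation count for each pivot."""
    adj = {i: set(nonzeros.get(i, ())) for i in range(n)}
    ops = [0] * n
    for k in range(n):
        struct = sorted(adj[k])           # nonzeros to the right of pivot k
        m = len(struct)
        ops[k] = m + m * (m + 1) // 2     # divides + multiply-adds
        for idx, i in enumerate(struct):  # propagate fill-in to later rows
            adj[i].update(struct[idx + 1:])
    return ops

# 4x4 "arrow" pattern: the first three equations couple only to the last,
# so the three independent pivots cost the same and the last costs nothing.
counts = pseudo_factor(4, {0: {3}, 1: {3}, 2: {3}})
print(counts, sum(counts))
```

Summing the per-pivot counts over the rows assigned to each processor yields exactly the workload estimate that load balancing needs; no floating-point arithmetic is ever performed.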

There is no way to avoid the computational expense of this preprocessing step, because the computational workload in factorization is not correlated with the number of equations in an independent block. Efficient parallel sparse matrix solvers require that disparities in processor workloads be minimized in order to reduce load-imbalance overhead and, consequently, maximize processor utilization.



David P. Koester
Sun Oct 22 17:27:14 EDT 1995