The metric for performing load balancing or for comparing the
performance of ordering techniques must be based on the actual
workload required by the processors in a distributed-memory
multi-computer. Consequently, more information is required than just
the locations of fill-in, as in previous work that used symbolic
factorization to identify fill-in for static data structures
[21,29,56]. The number of floating-point
operations may not be proportional to the number of equations assigned
to a processor because the number of calculations in an independent
subgraph is a function of the number and location of non-zero matrix
elements in that block --- not the number of equations in a block.
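To make this concrete, the operation count for eliminating a block can be tallied by simulating Gaussian elimination on the sparsity pattern alone. The sketch below is a hypothetical illustration, not the authors' code: it counts divides and multiply-add pairs (including those caused by fill-in) for two blocks of identical order n = 8, one dense and one tridiagonal, showing that the number of equations alone does not determine workload.

```python
def elimination_op_count(n, nonzeros):
    """Simulate Gaussian elimination on a sparsity pattern (no arithmetic),
    counting divides and multiply-add pairs, including work caused by
    fill-in.  `nonzeros` is a set of (row, col) index pairs."""
    pattern = set(nonzeros)
    ops = 0
    for k in range(n):
        col = [i for i in range(k + 1, n) if (i, k) in pattern]
        row = [j for j in range(k + 1, n) if (k, j) in pattern]
        ops += len(col)                  # divides to form the multipliers
        for i in col:
            for j in row:
                ops += 2                 # one multiply, one add per update
                pattern.add((i, j))      # record fill-in for later steps
    return ops

n = 8
dense = {(i, j) for i in range(n) for j in range(n)}
tridiag = {(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1}
print(elimination_op_count(n, dense))    # 308 operations
print(elimination_op_count(n, tridiag))  # 21 operations
```

Both blocks hold eight equations, yet the dense block costs roughly fifteen times as many operations, which is exactly why equation counts are a poor load-balancing metric.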
For dense matrices, the computational complexity of factorization is
O(n^3). Meanwhile, the computational complexity of factoring the
entire sparse power system matrices used in later parallel algorithm
performance studies is less than O(n^1.5), but greater than
O(n). Determining the actual workload requires a detailed
simulation of all processing for the factorization phases, which we
refer to as pseudo-factorization.
Pseudo-factorization is simply a replication of the numerical factorization process without actually performing the calculations. It tallies the numbers of calculations required to factor the independent data blocks and the numbers of calculations required to update the last block using data from the borders.
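A minimal sketch of this tallying is given below, assuming the matrix is already ordered into independent blocks whose sparsity patterns are known; the function names and the block representation are hypothetical, and the border updates to the last block, which the full procedure would tally in the same fashion, are omitted for brevity.

```python
def symbolic_ops(n, nonzeros):
    """Count divides and multiply-adds needed to factor one sparsity
    pattern, tracking fill-in, without performing any numerical work."""
    pattern = set(nonzeros)
    ops = 0
    for k in range(n):
        col = [i for i in range(k + 1, n) if (i, k) in pattern]
        row = [j for j in range(k + 1, n) if (k, j) in pattern]
        ops += len(col)                  # divides
        for i in col:
            for j in row:
                ops += 2                 # multiply-add pair
                pattern.add((i, j))      # fill-in affects later steps
    return ops

def pseudo_factorize(blocks):
    """blocks: list of (order, nonzero-pattern) pairs, one per
    independent block.  Returns the per-block operation tallies; the
    updates to the last block from the borders would be tallied in the
    same way in the full procedure."""
    return [symbolic_ops(n, pat) for n, pat in blocks]

blocks = [
    (6, {(i, j) for i in range(6) for j in range(6) if abs(i - j) <= 1}),
    (4, {(i, j) for i in range(4) for j in range(4)}),
]
print(pseudo_factorize(blocks))  # [15, 34]
```

The per-block tallies, not the block orders, are what a load balancer would then distribute across processors.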
There is no way to avoid the computational expense of this preprocessing step, because the computational workload in factorization is not correlated with the number of equations in an independent block. Efficient parallel sparse matrix solvers require that disparities in processor workloads be minimized in order to reduce load-imbalance overhead and, consequently, to maximize processor utilization.
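Given the per-block operation tallies produced by pseudo-factorization, one standard way to minimize the maximum processor load is the longest-processing-time-first heuristic: assign blocks in decreasing order of workload, each to the currently least-loaded processor. This is a generic illustration of the load-balancing step under that assumption, not the specific assignment algorithm used in this work.

```python
import heapq

def lpt_schedule(workloads, num_procs):
    """Assign per-block operation counts to processors, largest block
    first, always to the currently least-loaded processor."""
    heap = [(0, p) for p in range(num_procs)]   # (current load, processor)
    heapq.heapify(heap)
    assignment = {}
    for b in sorted(range(len(workloads)), key=lambda b: -workloads[b]):
        load, p = heapq.heappop(heap)           # least-loaded processor
        assignment[b] = p
        heapq.heappush(heap, (load + workloads[b], p))
    loads = [0] * num_procs
    for b, p in assignment.items():
        loads[p] += workloads[b]
    return assignment, loads

# Five independent blocks with unequal op counts, two processors.
_, loads = lpt_schedule([300, 200, 150, 100, 50], 2)
print(loads)  # [400, 400]: perfectly balanced in this instance
```

Because block workloads are irregular, a heuristic like this cannot always balance perfectly, but it keeps the disparity, and hence the load-imbalance overhead, small.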