We have extensively examined ordering techniques based on node-tearing and recursive spectral bisection. The implementations of these ordering techniques include simulated factorization of independent sub-matrices to examine load-imbalance. Node-tearing-based techniques produce numerous diagonal blocks, and the workload assigned to separate processors is determined in an explicit load balancing step. Recursive-spectral bisection attempts to balance the workload by assigning equal numbers of network nodes or matrix rows/columns per processor. However, work in sparse LU factorization is dependent on the number of non-zero and fillin values in the partition rather than the number of nodes.
Node-tearing-based ordering can vary the maximum size of the diagonal
blocks by a user-selectable parameter. Varying the size of the
diagonal blocks affects load imbalance within the diagonal blocks and
borders, while also affecting the size of the last block. It is
desirable to minimize the size of the last block, because that portion
of the matrix is not sparse and traditional dense factorization
techniques are used to minimize indexing overhead. The computational
complexity of dense solvers is , so even a small reduction
in the number of coupling equations can have a significant impact on
parallel solver performance.