For parallel sparse block-diagonal-bordered
matrix algorithms to be efficient when factoring irregular sparse
matrices, the following three-step preprocessing phase must be
performed:
- order the matrix into block-diagonal-bordered form while minimizing the number of calculations,
- perform pseudo-factorization to identify both fill-in and the number of calculations for all diagonal blocks and the corresponding portions of the borders, and
- load balance to distribute the calculations uniformly among processors.
The first step determines the block-diagonal-bordered form and the
ordering of nodes within diagonal blocks to minimize calculations; the
second step determines the locations of fill-in values for static data
structures and also determines the number of calculations in
independent blocks for the load-balancing step; and the third step
determines a mapping of data to processors for efficient
implementation of the algorithm on the user-specified data. These
three steps may be incorporated into an optimization framework that
uses the three-step preprocessing phase to produce matrix orderings
with optimal overall performance for a particular version of the
block-diagonal-bordered sparse matrix factorization algorithm. For
this paper, the optimization was performed by hand: various values
of the input parameters for the node-tearing routine were examined,
and the block-diagonal-bordered sparse matrix with the best load
balance and the fewest operations was chosen for collecting
performance benchmarks.
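The sketch below illustrates this hand optimization. Each candidate
setting of the node-tearing input parameter is represented here only by
the per-block operation counts its ordering produces (in the scheme
described above, those counts come from pseudo-factorization); the
parameter name max_block, the scoring rule, and all numbers are
hypothetical illustrations, not taken from this work.

    def score(block_ops, num_procs):
        """Rank a candidate ordering: prefer better balance, then fewer
        total operations. The largest single block bounds achievable
        balance, so compare it with the ideal per-processor share."""
        total = sum(block_ops)
        ideal = total / num_procs
        return (max(block_ops) / ideal, total)

    # Two hypothetical node-tearing parameter settings for a 4-processor run,
    # each summarized by the operation counts of its diagonal blocks.
    candidates = {
        "max_block=32": [900, 850, 820, 790, 120],
        "max_block=64": [1700, 1600, 300, 250],
    }
    best = min(candidates, key=lambda name: score(candidates[name], 4))
    print(best)   # the candidate with the more uniform blocks wins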
The metric for load balancing, or the metric for an optimization
routine to determine the most efficient overall ordering technique,
must be based on the actual workload required of the individual
processors. This number may differ substantially from the number of
equations assigned to processors, because the number of calculations in
an independent sub-block is a function of the number and location of
non-zero matrix elements in that block, not the number of equations
in the block. For dense matrices, the computational complexity of
factorization is $O(n^3)$, and the computational complexity of
factoring the entire sparse matrices used in the later parallel
algorithm performance studies is substantially less than $O(n^3)$.
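A minimal sketch of load balancing driven by per-block operation
counts rather than equation counts follows, assuming the counts have
already been obtained from pseudo-factorization. It places blocks
largest-first on the currently least-loaded processor; this greedy
longest-processing-time heuristic is a standard stand-in, not
necessarily the exact scheme used in this work, and the counts are
illustrative only.

    import heapq

    def balance(block_ops, num_procs):
        """Greedy largest-first assignment; returns a block -> processor map."""
        loads = [(0, p) for p in range(num_procs)]   # (current load, processor)
        heapq.heapify(loads)
        assignment = {}
        for block in sorted(range(len(block_ops)), key=lambda b: -block_ops[b]):
            load, proc = heapq.heappop(loads)
            assignment[block] = proc
            heapq.heappush(loads, (load + block_ops[block], proc))
        return assignment

    print(balance([900, 850, 820, 790, 120], 2))

Note that the assignment is made on operation counts, so a processor
holding one large block and a processor holding several small ones can
end up with nearly equal work even though they hold different numbers
of equations.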
Determining the actual workload requires a detailed simulation of all
processing for the factorization and triangular solution phases, which
we refer to as pseudo-factorization.
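A minimal sketch of pseudo-factorization as symbolic elimination on a
single diagonal block appears below: it walks the elimination in
order, records where fill-in appears (so static data structures can be
sized in advance), and counts the multiply-add updates. The
one-update-per-affected-entry counting model is an assumption made for
illustration, not necessarily the exact operation model of this work.

    def pseudo_factor(nonzeros, n):
        """Symbolic LU on a set of (row, col) positions of an n-by-n block.
        Returns (operation count, list of fill-in positions)."""
        nz = set(nonzeros) | {(i, i) for i in range(n)}   # diagonal assumed nonzero
        ops, fillin = 0, []
        for k in range(n):
            rows = [i for i in range(k + 1, n) if (i, k) in nz]
            cols = [j for j in range(k + 1, n) if (k, j) in nz]
            for i in rows:
                for j in cols:
                    ops += 1                  # one multiply-add update
                    if (i, j) not in nz:
                        nz.add((i, j))        # new fill-in value
                        fillin.append((i, j))
        return ops, fillin

    # Two 6-equation blocks: same size, very different workloads.
    n = 6
    tridiag = [(i, i + 1) for i in range(n - 1)] + [(i + 1, i) for i in range(n - 1)]
    arrow = [(0, j) for j in range(1, n)] + [(i, 0) for i in range(1, n)]
    print(pseudo_factor(tridiag, n))   # 5 operations, no fill-in
    print(pseudo_factor(arrow, n))     # pivot 0 fills the trailing block

The two example blocks contain the same number of equations yet require
55 versus 5 operations, which is precisely why equation counts alone
are a poor metric for load balancing.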