This block-diagonal-bordered sparse LU solver uses implicit hierarchical data structures, based on vectors of C programming language structures, to efficiently store and retrieve the data in a block-diagonal-bordered sparse matrix. These data structures provide good cache performance because non-zero data values and their row and column location indicators are stored in adjacent physical memory locations. For this work, we assume that there is no requirement for pivoting; consequently, we can use static data structures, and the locations of all fillin can be determined before memory is allocated. We use explicit pointers to subsequent data locations in order to reduce indexing overhead: row location indicators are stored explicitly, as are the pointers to subsequent values within a column that are required when updating values in the matrix. Additional memory in the data structures is thus traded for reduced indexing overhead. Modern distributed-memory multi-processors are available with substantial amounts of random access memory at each node, and power systems network matrices are small relative to other matrices found in the academic and industrial communities. Consequently, this research examines data structures designed to optimize processing speed at the cost of increased memory usage when compared to other compressed storage formats. We compare the memory requirements of these data structures to those of more conventional compressed data structures below.
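As a concrete illustration, the following is a minimal sketch of the kind of element record this design implies; the field names are hypothetical, not the declarations used in this work:

    /* Hypothetical element record: the non-zero value, its row
     * location indicator, and an explicit pointer to the next
     * non-zero value in the same column sit in adjacent physical
     * memory, so traversals enjoy cache locality and require no
     * index searches. */
    struct nz_elem {
        double          value;    /* non-zero matrix value              */
        int             row;      /* explicit row location indicator    */
        struct nz_elem *next_col; /* next non-zero value in this column */
    };

Because no pivoting is required, static vectors of records such as this can be allocated once, after all fillin locations are known.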
The hierarchical data structure is composed of eight separate parts that implicitly store a block-diagonal-bordered sparse matrix. These structures store only non-zero values; this is especially important in the borders, where entire rows may be zero. Eight separate C language structures are employed to store the data in a manner that can be accessed efficiently with minimal indexing overhead, and static vectors of each structure type are used to store the block-diagonal-bordered sparse matrix. The figure below graphically illustrates the hierarchical nature of the data structure and the distinct C structure elements of which it is composed.
Figure: The Hierarchical Data Structure for Parallel LU Factorization
At the top of the hierarchical data structure is the information on the storage locations of the independent diagonal blocks and of the lower and upper borders. The next layer in the hierarchy holds the matrix diagonal and the identifiers of non-zero border rows and columns. Data values on the original matrix diagonal are stored in the diagonal portion of the data structure; however, most of the remaining information stored with each diagonal element consists of pointers, so that data in related columns or rows can be accessed rapidly.
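A sketch of what such a diagonal record might look like (field names again hypothetical; the four pointers correspond to the access paths discussed below):

    /* Hypothetical diagonal record: the diagonal value plus pointers
     * giving direct access to the related row, column, and border
     * data. */
    struct diag_elem {
        double          value;        /* value on the matrix diagonal              */
        struct nz_elem *lower_row;    /* first non-zero in the lower-tri row       */
        struct nz_elem *upper_col;    /* first non-zero in the upper-tri column    */
        struct nz_elem *lower_border; /* first non-zero in the lower border row    */
        struct nz_elem *upper_border; /* first non-zero in the upper border column */
    };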
Data in the strictly lower triangular portion of the matrix is stored as sparse row vectors; likewise, data in the strictly upper triangular portion of the matrix is stored as sparse column vectors. By consecutively storing the values in lower triangular rows and upper triangular columns, this data storage scheme minimizes the effort to find the non-zero pairs $l_{ik}$ and $u_{kj}$ used to modify $a_{ij}$.
However, Crout- and Doolittle-based LU factorization algorithms also require access to the next non-zero value in the same column or row of the lower/upper triangular matrices, so pointers are used to permit direct access to those values without the searches that other compressed storage formats require. This data structure provides the benefits of a doubly linked data structure in order to minimize indexing overhead: the entry for any diagonal element has pointers to the first non-zero element in the corresponding lower triangular row and upper triangular column, and to the first non-zero elements in the lower and upper border. The data structure trades memory utilization for speed by storing pointers to all subsequent non-zero column values. In addition, the combination of adjacent storage of non-zero row values and explicit storage of column identifiers greatly simplifies the forward reduction and backward substitution steps.
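For example, under the hypothetical records sketched earlier, dividing the sub-diagonal portion of a column by its pivot reduces to chasing the explicit column pointers, with no searching:

    /* Hedged sketch: every non-zero l_ik below the pivot in column k
     * is reached through the next_col chain in one dereference
     * apiece; the search other compressed formats would perform is
     * unnecessary. */
    void divide_column_by_pivot(struct nz_elem *first_in_col, double pivot)
    {
        for (struct nz_elem *e = first_in_col; e != NULL; e = e->next_col)
            e->value /= pivot;   /* l_ik = a_ik / a_kk for each i > k */
    }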
The remaining portions of the hierarchical data structure efficiently store the non-zero values in the borders. Because entire lower border rows or upper border columns may be zero within a block, two layers are required to store these data efficiently. The next level in this portion of the hierarchy stores the location of the first non-zero value in each row or column segment; the corresponding row and column identifiers can be found by referencing the structure to which that pointer refers. The non-zero values in the lower and upper borders are stored in the same format as data in the diagonal blocks.
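The two border layers might be sketched as follows (again with hypothetical names): a per-block header listing only the non-zero row and column segments, each holding a pointer to the segment's first stored element.

    /* Hypothetical two-layer border storage: one header per block
     * records only the non-zero border segments; each segment entry
     * points at its first stored element, from which the row or
     * column identifier is recovered. */
    struct border_seg {
        struct nz_elem *first;       /* first non-zero value in this segment       */
    };
    struct block_border {
        int                n_lower;  /* non-zero lower-border rows in this block   */
        int                n_upper;  /* non-zero upper-border columns in this block*/
        struct border_seg *lower;    /* one entry per non-zero border row          */
        struct border_seg *upper;    /* one entry per non-zero border column       */
    };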
Conventional compressed data formats require less storage than this data structure; however, additional memory has been traded for reduced indexing overhead. Two reasons justify the use of additional memory: large memories are available on state-of-the-art distributed-memory multi-processors, and these algorithms have been designed with the express intention of supporting real-time applications. The compressed data format requires $S_{CDS}$ bytes to store the A matrix implicitly; likewise, the hierarchical data structure used in this implementation requires $S_{HDS}$ bytes to store the same matrix implicitly, where:
$S_{CDS}$ is the storage requirement in bytes for the compressed data structure,
$S_{HDS}$ is the storage requirement in bytes for the hierarchical data structure,
$b_{fp}$ is the length in bytes of a floating point data type,
$b_{int}$ is the length in bytes of an integer data type,
$\eta$ is the number of non-zero values in the matrix,
$N$ is the order of the matrix,
$n_b$ is the number of independent blocks, and
$n_s$ is the number of non-zero row and column segments in the borders.
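One plausible accounting, sketched here purely for illustration (it assumes the compressed format stores one value and one index per non-zero plus a vector of row starts, while the hierarchical format adds a row indicator and a column pointer per off-diagonal non-zero, four pointers per diagonal record, and one pointer per block and border segment), is:

    \begin{align*}
      S_{CDS} &\approx \eta\,(b_{fp} + b_{int}) + (N + 1)\,b_{int},\\
      S_{HDS} &\approx (\eta - N)\,(b_{fp} + 2\,b_{int})
                + N\,(b_{fp} + 4\,b_{int})
                + (n_b + n_s)\,b_{int}.
    \end{align*}

With $b_{fp} = 8$ and $b_{int} = 4$, such an accounting yields on the order of twice the compressed storage once the per-diagonal and per-segment pointers are included, consistent with the observation below.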
For double precision floating point or single precision complex representations of the data values, and single-word integer representations of all pointers, the hierarchical data structure takes approximately twice the storage of the compressed data structure. In exchange for doubling the storage requirements, row and column data are available as sparse vectors for ready cache access when updating values, and subsequent column or row values are directly addressable. When conventional compressed data structures are used, indexing information is stored along only a single dimension, and values along the other dimension must be found by searching through the data structures. To find a value in a row or column, the average number of operations in the search is one-half the average number of values in that row or column. Given that this costly search must be performed for nearly every non-zero value in the matrix, substantial indexing overhead is incurred with the implicit compressed storage format. By using this data structure and doubling the storage, we obtain a significant decrease in indexing overhead even in the sequential version of this sparse block-diagonal-bordered LU factorization algorithm.
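For contrast, the search that a conventional compressed row format forces on nearly every update might look like the following (hypothetical CSR-style arrays); on average the loop scans half of the row's non-zeros before finding the target.

    /* Hypothetical compressed-row search: locate a_ij by scanning
     * row i's stored column indices.  The hierarchical structure
     * replaces this loop with a single pointer dereference. */
    double *find_in_row(int i, int j,
                        const int *row_start,  /* index of each row's first entry */
                        const int *col_index,  /* column index of each entry      */
                        double *values)        /* non-zero values                 */
    {
        for (int p = row_start[i]; p < row_start[i + 1]; ++p)
            if (col_index[p] == j)
                return &values[p];
        return NULL;   /* a_ij is structurally zero */
    }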
While Crout and Doolittle factorization algorithms permit partial pivoting [30], this static hierarchical data structure assumes that no pivoting is required to maintain numerical stability in the calculations. Traditional numerical pivoting can be difficult in a general sparse matrix due to the sparsity structure and concerns about fillin, so the normal numerical pivoting rules are often relaxed, as in Markowitz pivoting, when the matrix is neither positive definite nor diagonally dominant [12]. Block-diagonal-bordered sparse matrices offer the potential for an additional relaxed pivoting rule that limits pivoting choices to within a diagonal matrix block. For the present research, it is assumed that numerical pivoting will not be required, because the matrices for power systems distribution networks will be derived from matrices that are diagonally dominant or even positive definite.