Next: References Up: Parallel LU Factorization of Previous: Comparison to Amdahl's

Conclusions

This paper describes research into the applicability of parallel direct block-diagonal-bordered sparse matrix solvers. Direct sparse solvers for matrices in this form exhibit distinct advantages. In particular, data communications are significantly reduced compared to the general direct sparse matrix solvers described in the present literature, and communications are also more uniform and structured. Communications in block-diagonal-bordered sparse matrix solvers are a function of the number of processors and are less dependent on the data. Task assignments for numerical factorization of a block-diagonal-bordered sparse matrix depend only on the assignment of independent blocks to processors and the processor assignments of data in the last diagonal block. Block-diagonal-bordered sparse matrix solvers require special preprocessing to be efficient. Other papers in the literature have proposed a two-step preprocessing phase that orders the matrix and then performs symbolic factorization to determine the locations of all fill-in values, so that static data structures can be utilized for maximum efficiency during numerical factorization. In this paper, we describe the requirements for a three-step preprocessing phase for efficient parallel block-diagonal-bordered sparse matrix solvers on distributed-memory multi-processors: ordering the matrix into block-diagonal-bordered form, explicitly balancing the computational workload across processors, and performing symbolic factorization.
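To make the structure concrete, the factorization of a block-diagonal-bordered matrix can be modeled in a few lines of code. The sketch below is a minimal serial model in Python with dense nested-list blocks; the function names and block layout are illustrative choices, not the paper's implementation, and pivoting and sparse data structures are omitted for clarity.

```python
def lu_factor(a):
    """In-place dense LU without pivoting; L has a unit diagonal."""
    n = len(a)
    for k in range(n):
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return a

def lu_solve(a, b):
    """Solve LU x = b given the in-place factors from lu_factor."""
    n = len(a)
    y = b[:]
    for i in range(n):                      # forward substitution (unit L)
        for j in range(i):
            y[i] -= a[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):            # back substitution
        for j in range(i + 1, n):
            x[i] -= a[i][j] * x[j]
        x[i] /= a[i][i]
    return x

def bdb_factor(diag, border_cols, border_rows, last):
    """Factor a block-diagonal-bordered matrix of the form
         [ D1          B1 ]
         [      D2     B2 ]
         [ C1   C2     An ]
    Each (Di, Bi, Ci) triple is independent, so each could be assigned
    to a different processor; only the update of the last block An
    couples the triples together.
    """
    for Di, Bi, Ci in zip(diag, border_cols, border_rows):
        lu_factor(Di)                       # independent: no communication
        # Schur-complement update of the last block: An -= Ci * Di^{-1} * Bi
        for j in range(len(Bi[0])):
            col = [Bi[r][j] for r in range(len(Bi))]
            z = lu_solve(Di, col)           # Di^{-1} times column j of Bi
            for r in range(len(Ci)):
                last[r][j] -= sum(Ci[r][k] * z[k] for k in range(len(z)))
    lu_factor(last)                         # the sequential last-block factor
    return diag, last
```

On a distributed-memory machine, each (Di, Bi, Ci) triple would be factored on its own processor, and only the accumulation into the last block requires communication; this is why communications depend on the number of processors rather than on the detailed sparsity structure.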

This first implementation of block-diagonal-bordered sparse matrix algorithms demonstrates good speedup for factorization with test matrices, in spite of the sequential portions of the code that significantly limit parallel performance and algorithm scalability. There are areas for improvement in the algorithms presented herein. As part of our on-going research, we plan to parallelize all calculations, focusing on those in the last matrix block. We plan to examine ordering techniques to determine whether efficient vectorization of the calculations is feasible for real power systems admittance matrices. We also plan to employ sophisticated communications techniques such as active messages and virtual channels to minimize processing bottlenecks and to hide communications overhead behind calculations. Further research is planned to tune the parallel block-diagonal-bordered sparse matrix algorithms for optimum performance on the Thinking Machines CM-5 with power systems network matrices, and then to develop additional versions of these algorithms for efficient operation on other multi-processor architectures. Lastly, we intend to compare the performance and accuracy of these techniques with iterative methods such as waveform relaxation [18].

Future research may examine the feasibility of an automated tool that finds both an optimal matrix ordering and an optimal version of the multi-processor algorithm, driven by the characteristics of the irregular sparse matrix and the target computer architecture. Numerous versions of the block-diagonal-bordered sparse matrix algorithms are possible, depending on the specific computer architecture and the characteristics of the matrix.

The research presented in this paper shows great promise for parallel block-diagonal-bordered sparse matrix solvers when the data is regular and the parallel algorithm incurs no load-imbalance overhead. To minimize load-imbalance overhead with real sparse matrices, explicit load balancing will be required in distributed-memory multi-processor algorithms. To ensure the most uniform distribution of workload to processors, research into graph partitioning, node clustering, and node-tearing techniques for irregular matrices will be required as part of the ordering phase. Techniques to order irregular matrices into recursive block-diagonal-bordered form have been examined for power systems distribution networks; this work is discussed in a companion paper [14].
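One simple form the explicit load balancing discussed above might take is a greedy longest-processing-time assignment of independent blocks to processors. The sketch below is an illustrative Python example under that assumption, not the method of the companion paper; the per-block work estimates and processor count are hypothetical inputs.

```python
def balance_blocks(block_work, n_procs):
    """Assign independent diagonal blocks to processors so that the
    per-processor work totals are as even as a greedy pass allows.
    block_work maps a block index to its estimated factorization work;
    these estimates are assumed inputs, e.g. from symbolic factorization.
    """
    loads = [0.0] * n_procs
    assignment = {}
    # Place the largest blocks first: the classic LPT heuristic
    # for minimizing the maximum processor load.
    for blk in sorted(block_work, key=block_work.get, reverse=True):
        p = min(range(n_procs), key=loads.__getitem__)  # least-loaded processor
        assignment[blk] = p
        loads[p] += block_work[blk]
    return assignment, loads
```

Such a heuristic only addresses the independent blocks; balancing the work in the last diagonal block remains the harder problem, since its updates couple all processors.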






David P. Koester
Sun Oct 22 16:27:33 EDT 1995