We have implemented a parallel version of a block-diagonal-bordered sparse LU solver and a similar Choleski solver in the C programming language for the Thinking Machines CM-5 multi-computer, using message passing and a host-node paradigm. To obtain extremely low-latency communications, we used the remote-procedure-call features of the Connection Machine active message layer (CMAML) [53,59]. We also implemented versions of the algorithms using non-blocking, buffered communications. We observed substantial performance improvements with low-latency active messages compared to more traditional communications paradigms that use non-blocking communications functions together with packing data into communications buffers. Throughout this discussion of parallel direct sparse solvers, the active message communications paradigm is the means by which we implemented low-latency communications on the Thinking Machines CM-5.
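The following minimal sketch contrasts the two paradigms in plain C. It is an illustration of the structural difference, not CM-5 code: the names node_state, am_send(), accumulate_handler(), and buffered_send() are hypothetical stand-ins that model the behavior locally. The key distinction is that an active message carries a handler reference plus a few words of payload, and the handler consumes the data immediately on arrival, whereas the buffered paradigm requires the sender to pack a buffer and the receiver to unpack it later.

```c
/* Illustrative sketch only: hypothetical local models of the two
 * communication styles; CMAML itself ran on CM-5 hardware. */
#include <stdio.h>
#include <string.h>

typedef struct { double partial_sums[8]; } node_state;

/* Active-message style: the message names a handler and carries a few
 * words of payload; the handler runs on arrival and updates the
 * receiver's state directly, with no receive-side buffering. */
typedef void (*am_handler)(node_state *dst, int index, double value);

static void accumulate_handler(node_state *dst, int index, double value)
{
    dst->partial_sums[index] += value;   /* consume payload immediately */
}

static void am_send(node_state *dst, am_handler h, int index, double value)
{
    h(dst, index, value);   /* models handler invocation at the receiver */
}

/* Buffered non-blocking style: the sender packs values into a buffer,
 * the buffer is transferred, and the receiver later unpacks it. */
static void buffered_send(node_state *dst, const double *buf, int n)
{
    for (int i = 0; i < n; i++)
        dst->partial_sums[i] += buf[i];  /* unpack on receipt */
}

int main(void)
{
    node_state node = { {0} };
    double buf[8];

    /* Active-message path: one small message per update. */
    am_send(&node, accumulate_handler, 0, 1.5);
    am_send(&node, accumulate_handler, 1, 2.5);

    /* Buffered path: pack all updates, then send one large message. */
    memset(buf, 0, sizeof buf);
    buf[0] = 0.5;
    buf[1] = 0.25;
    buffered_send(&node, buf, 8);

    printf("sums: %.2f %.2f\n", node.partial_sums[0], node.partial_sums[1]);
    return 0;
}
```

The latency advantage of the active-message style comes from eliminating the pack/unpack steps and the intermediate buffer management shown in the buffered path.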
A version of the software is available that runs on a single processor of the CM-5, providing the baseline for empirical speed-up data used to quantify multi-processor performance. Empirical performance data have been gathered for a range of numbers of processors using real power-system sparse network matrices. Results based on the empirical data collected in these benchmarking trials are presented in the next chapter.
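Speed-up here follows the usual definition $S_p = T_1 / T_p$, with parallel efficiency $E_p = S_p / p$, where $T_1$ is the single-processor time and $T_p$ the time on $p$ processors. The short C sketch below computes both from timing measurements; the numeric values are hypothetical placeholders, not results from the benchmarking trials.

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical wall-clock times in seconds; actual values come
     * from the benchmarking trials reported in the next chapter. */
    double t1      = 4.80;                          /* single-processor */
    int    procs[] = { 2, 4, 8, 16, 32 };
    double tp[]    = { 2.60, 1.45, 0.85, 0.55, 0.40 };
    int    n       = sizeof procs / sizeof procs[0];

    for (int i = 0; i < n; i++) {
        double speedup    = t1 / tp[i];             /* S_p = T_1 / T_p */
        double efficiency = speedup / procs[i];     /* E_p = S_p / p   */
        printf("p=%2d  speedup=%5.2f  efficiency=%4.2f\n",
               procs[i], speedup, efficiency);
    }
    return 0;
}
```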
block-diagonal-bordered sparse direct solvers comprise the following distinct segments, which were derived in chapter :