We implemented two versions of the parallel block-diagonal-bordered
sparse direct solver on the Thinking Machines CM-5; the notable
difference between the two implementations is the communications
paradigm used when updating the last diagonal block in the matrix. One
communications paradigm uses low-latency, active message-based
communications, and the other uses buffered communications. Active
message-based communications on the CM-5 has a latency of 1.6
microseconds to send four words, while the buffered communications
version of the algorithm utilizes the traditional CMMD communications
library, which has an 86-microsecond latency and a 0.12-microsecond
per-word communications cost [6]. Both versions of the algorithm
utilized the active message s-copy-based buffered communications for
factoring the last diagonal block. S-copy communications has a
23-microsecond latency and a 0.12-microsecond per-word communications
cost [6]. The CM-5 has a
multi-tiered communications network with 40 megabyte-per-second
bandwidth at the lowest layer
[6].
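As a rough illustration of the cost model implied by these figures, the
following minimal C sketch compares estimated transfer times for the three
communications mechanisms. The latency and per-word constants are those
quoted above; the message sizes and helper names are hypothetical and are
not taken from the implementation.

#include <stdio.h>

/* Rough CM-5 communication cost model using the figures quoted above.
 * All times are in microseconds; message sizes (in words) are
 * hypothetical and chosen only to illustrate the crossover behavior. */

/* Active messages: roughly 1.6 microseconds per four-word packet. */
static double active_message_cost(long words)
{
    long packets = (words + 3) / 4;          /* four words per packet */
    return 1.6 * (double)packets;
}

/* Buffered CMMD send: 86 us latency plus 0.12 us per word. */
static double cmmd_buffered_cost(long words)
{
    return 86.0 + 0.12 * (double)words;
}

/* Active message s-copy: 23 us latency plus 0.12 us per word. */
static double scopy_cost(long words)
{
    return 23.0 + 0.12 * (double)words;
}

int main(void)
{
    long sizes[] = { 4, 64, 1024, 16384 };   /* hypothetical message sizes */
    size_t i;

    printf("%8s %14s %12s %12s\n", "words", "active (us)", "CMMD (us)", "s-copy (us)");
    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        long w = sizes[i];
        printf("%8ld %14.1f %12.1f %12.1f\n",
               w, active_message_cost(w), cmmd_buffered_cost(w), scopy_cost(w));
    }
    return 0;
}

Under this model, the low per-packet latency of active messages dominates
for the short update messages sent to the last diagonal block, while the
lower per-word cost of the buffered s-copy mechanism favors the larger
transfers performed when factoring that block.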