We have ported this parallel block-diagonal-bordered direct solver to the IBM scalable parallel processors (SPPs), the IBM SP1 and SP2. These multi-computers are workstation clusters connected by a switched communications network. The communications available on these IBM parallel machines required a non-blocking, buffered communications paradigm. Our message-passing library of choice for this architecture has been the Message Passing Interface (MPI), because it is being developed as a communications standard for multi-processors with a strong emphasis on optimizing message-passing performance. In Table 7.1, we present empirical performance data for the IBM SP1 and SP2 using MPI, and for the IBM SP2 using standard Transmission Control Protocol/Internet Protocol (TCP/IP) communications through the embedded communications switch. In addition to the timing data for factorization and the triangular solutions of the EPRI6K data set, this table provides the relative speedups for factorization. The table shows that we measured no speedup for forward reduction or backward substitution in these benchmarks, while factorization speedup on the SP2 reached a maximum of approximately 3.2 on eight processors.
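The non-blocking, buffered communications paradigm can be illustrated with a short MPI sketch in C. This is only an illustration under assumed names (a ring exchange of N double-precision values), not the solver's actual communication code: a user buffer is attached with MPI_Buffer_attach, each process posts MPI_Irecv before issuing a buffered non-blocking MPI_Ibsend, and both requests are completed with MPI_Waitall so that local computation can overlap the communication.

    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024                             /* illustrative message length */

    int main(int argc, char **argv)
    {
        double sendbuf[N], recvbuf[N];
        MPI_Request reqs[2];
        int rank, size, i;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < N; i++)                /* fill the outgoing message */
            sendbuf[i] = (double) rank;

        /* Attach a user buffer large enough for one outstanding buffered send. */
        int bufsize = N * sizeof(double) + MPI_BSEND_OVERHEAD;
        void *buffer = malloc(bufsize);
        MPI_Buffer_attach(buffer, bufsize);

        int right = (rank + 1) % size;         /* hypothetical ring exchange */
        int left  = (rank - 1 + size) % size;

        /* Post the receive first, then the buffered non-blocking send. */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Ibsend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... local factorization work could be overlapped here ... */

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

        /* Detaching blocks until all buffered messages have been delivered. */
        MPI_Buffer_detach(&buffer, &bufsize);
        free(buffer);

        MPI_Finalize();
        return 0;
    }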
Table 7.1: EPRI6K --- IBM SP1 and SP2 Performance Data --- Complex Variate LU Solver
These preliminary performance data from the IBM SP1 and SP2 SPPs also illustrate the following: