Next: Low-Latency Communications Considerations
Up: Algorithm Performance on
Previous: Performance Predictions
As stated above, we expect interprocessor communications for SPPs to
improve significantly in the near future, with latency for buffered
communications decreasing to 1 microsecond and with bandwidths of 100
megabytes per second between individual processors. Per-word
communications costs for this architecture should be less than 0.04
microsecond. The figure for the low-latency communications paradigm
presents actual and predicted speedup values for the complex
Gauss-Seidel algorithm with the BCSPWR10 and EPRI6K power systems
networks for
- empirical speedup data from the CM-5 implementation using
  low-latency communications,
- predicted speedup for processor speeds and communications networks
  with 1 microsecond latency and 100 megabyte-per-second bandwidth,
- predicted speedup for processor speeds and communications networks
  with 1 microsecond latency and 1000 megabyte-per-second bandwidth.
The two graphs in this figure show that we may see significantly
reduced speedup for this algorithm with either future architecture.
For the BCSPWR10 data set, with processor speeds and communications
networks with 1 microsecond latency and 100 megabyte-per-second
bandwidth, speedups would be less than three for 32 processors and
only slightly better, four, with a network that is 10 times faster.
The computation-to-communications ratios for both network options are
less than for the Thinking Machines CM-5 with low-latency, active
message-based communications: the communications time would decrease
only to 1.16 microseconds and 1.016 microseconds, from 1.6
microseconds, for the two anticipated communications capabilities
respectively. This improvement in communications is small in
comparison to the improvement anticipated for processor speeds.
Performance of this parallel Gauss-Seidel implementation is, not
unexpectedly, highly dependent on communications latency, due to the
large number of small messages. Similar poor performance is predicted
for future architectures running the EPRI6K data set.
Figure: Performance Predictions for Parallel Complex Gauss-Seidel --- Low-Latency Communications Paradigm
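The 1.16 and 1.016 figures quoted above can be reproduced with a simple
linear cost model, time = latency + bytes / bandwidth, if the times are
read as microseconds and each small message is taken to carry one
16-byte value (for example, a double-precision complex number). Both
the model and the 16-byte message size are assumptions made here for
illustration and are not stated in the text; the function names are
hypothetical. A minimal sketch:

```python
# Hypothetical linear communications cost model (an assumption, not taken
# from the text): time = latency + bytes / bandwidth.
# With decimal megabytes, 1 megabyte-per-second equals 1 byte per
# microsecond, so bytes / (MB/s) is already a time in microseconds.

ASSUMED_MESSAGE_BYTES = 16  # assumption: one double-precision complex value
ASSUMED_WORD_BYTES = 4      # assumption: 32-bit words


def message_time_us(latency_us, bandwidth_mb_per_s,
                    n_bytes=ASSUMED_MESSAGE_BYTES):
    """Time in microseconds to deliver one message of n_bytes."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_us + n_bytes / bandwidth_mb_per_s


def per_word_time_us(bandwidth_mb_per_s, word_bytes=ASSUMED_WORD_BYTES):
    """Transfer time in microseconds for a single word, excluding latency."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return word_bytes / bandwidth_mb_per_s


if __name__ == "__main__":
    for bandwidth in (100, 1000):
        t = message_time_us(latency_us=1.0, bandwidth_mb_per_s=bandwidth)
        w = per_word_time_us(bandwidth)
        print(f"{bandwidth:5d} MB/s: message {t:.3f} us, per word {w:.3f} us")
```

Under these assumptions the latency term dominates: raising the
bandwidth tenfold changes the per-message time by less than 15 percent,
which is consistent with the observation that this implementation is
bound by latency rather than by bandwidth.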
The figure for the buffered communications paradigm presents actual
and predicted speedup values for the complex Gauss-Seidel algorithm
solving applications using the BCSPWR10 and EPRI6K power systems
networks for
- empirical speedup data from the CM-5 implementation using buffered
  communications,
- predicted speedup for processor speeds and communications networks
  with 1 microsecond latency and 100 megabyte-per-second bandwidth,
- predicted speedup for processor speeds and communications networks
  with 1 microsecond latency and 1000 megabyte-per-second bandwidth.
The graphs in this figure show that we may see slightly improved
speedup for this algorithm for both future architectures with respect
to the empirical data collected on the Thinking Machines CM-5,
although performance is not scalable to 32 processors. This lack of
scalability is due primarily to the parallel software overhead
required to set up the buffers. For the BCSPWR10 data set, with
processor speeds and communications networks with 1 microsecond
latency and 100 megabyte-per-second bandwidth, speedups would be
greater than eight for 16 processors and slightly better, ten, with a
network that is 10 times faster. The computation-to-communications
ratios for both network options are greater than for the Thinking
Machines CM-5 with buffered communications: communications
performance would increase by a factor greater than 62 for 100
megabyte-per-second bandwidth communications and greater than 83 for
the faster proposed network. These communications performance
improvements compare favorably to the improvement anticipated for
processor speeds. Similar improved performance is predicted for
future architectures running the EPRI6K data set, although peak
performance improvement is not as great as for the BCSPWR10 data set.
Figure: Performance Predictions for Parallel Complex Gauss-Seidel --- Buffered Communications Paradigm
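To put the quoted BCSPWR10 speedups on a common scale, they can be
converted to parallel efficiency using the standard definition,
efficiency = speedup / number of processors. The sketch below does only
that arithmetic. The speedup values are the approximate bounds quoted
in the text ("less than three", "greater than eight"), so the resulting
efficiencies are bounds as well; the function name and the table layout
are illustrative choices.

```python
# Parallel efficiency from speedup, using the standard definition
# efficiency = speedup / processors. Speedup values are the approximate
# bounds quoted in the text for the BCSPWR10 data set, not measured data.


def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup achieved on the given processor count."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# (paradigm, bandwidth in MB/s, processors, quoted speedup bound)
QUOTED_CASES = [
    ("low-latency", 100, 32, 3.0),   # "less than three"
    ("low-latency", 1000, 32, 4.0),  # "four"
    ("buffered", 100, 16, 8.0),      # "greater than eight"
    ("buffered", 1000, 16, 10.0),    # "ten"
]

if __name__ == "__main__":
    for paradigm, bandwidth, procs, speedup in QUOTED_CASES:
        eff = parallel_efficiency(speedup, procs)
        print(f"{paradigm:12s} {bandwidth:5d} MB/s  p={procs:2d}  "
              f"speedup~{speedup:4.1f}  efficiency~{eff:.1%}")
```

Read this way, the buffered paradigm is predicted to keep at least
about half of the ideal speedup on 16 processors, while the low-latency
paradigm would stay near or below one eighth of ideal on 32 processors.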
David P. Koester
Sun Oct 22 17:27:14 EDT 1995