
Performance Predictions for Iterative Solvers

 

As stated above, we expect interprocessor communications for SPPs to improve significantly in the near future, with latency for buffered communications decreasing to 1 microsecond and bandwidth between individual processors reaching 100 megabytes per second. Per-word communications costs for this architecture should be less than 0.04 microsecond. In the figure for the low-latency communications paradigm, we present actual and predicted speedup values for the complex Gauss-Seidel algorithm with the BCSPWR10 and EPRI6K power systems networks for

  1. empirical speedup data from the CM-5 implementation using low-latency communications,

  2. predicted speedup for processor speeds and communications networks with 1 microsecond latency and 100 megabyte-per-second bandwidth,

  3. predicted speedup for processor speeds and communications networks with 1 microsecond latency and 1000 megabyte-per-second bandwidth.
The two graphs in this figure show that we may see significantly reduced speedup for this algorithm with either future architecture. For the BCSPWR10 data set, with processor speeds and communications networks with 1 microsecond latency and 100 megabyte-per-second bandwidth, speedups would be less than three for 32 processors, and only slightly better (four) with a network that is 10 times faster. The computation-to-communications ratio for both network options is lower than for the Thinking Machines CM-5 with low-latency, active message-based communications: the communications cost [symbol missing in source] would decrease only to 1.16 microseconds and 1.016 microseconds, respectively, for the two anticipated communications capabilities, from 1.6 microseconds. This improvement in communications is small in comparison to the improvement anticipated for [symbol missing in source]. Performance of this parallel Gauss-Seidel implementation is (not unexpectedly) highly dependent on communications latency, due to the large number of small messages. Similar poor performance is predicted for future architectures running the EPRI6K data set.
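
The communications figures quoted above can be checked with a simple linear cost model, message cost = latency + words x per-word cost. The sketch below is illustrative only: the 4-byte word size and the 4-word message length are assumptions, chosen because they reproduce the per-word cost and the per-message costs quoted in the text. The function names are hypothetical and are not part of the original implementation.

```python
# Linear communications cost model (illustrative sketch).
# ASSUMPTIONS (not stated in the source): 4-byte words, 4-word messages,
# and 1 megabyte = 1e6 bytes.

def per_word_cost_us(bandwidth_mb_per_s, bytes_per_word=4):
    """Microseconds needed to transfer one word at the given bandwidth."""
    # bytes / (1e6 bytes per second) gives seconds * 1e-6, i.e. microseconds.
    return bytes_per_word / bandwidth_mb_per_s


def message_cost_us(latency_us, bandwidth_mb_per_s, words, bytes_per_word=4):
    """Latency plus per-word transfer time for one message, in microseconds."""
    return latency_us + words * per_word_cost_us(bandwidth_mb_per_s, bytes_per_word)


for bandwidth in (100, 1000):  # megabytes per second
    word_cost = per_word_cost_us(bandwidth)
    msg_cost = message_cost_us(1.0, bandwidth, words=4)
    print(f"{bandwidth:5d} MB/s: per-word {word_cost:.3f} us, "
          f"4-word message {msg_cost:.3f} us")
```

Under these assumptions the latency term dominates the cost of a small message, so a tenfold increase in bandwidth changes the per-message cost by little. This is consistent with the observation that the implementation is highly dependent on communications latency.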

 
Figure: Performance Predictions for Parallel Complex Gauss-Seidel --- Low-Latency Communications Paradigm 

In the figure for the buffered communications paradigm, we present actual and predicted speedup values for the complex Gauss-Seidel algorithm applied to the BCSPWR10 and EPRI6K power systems networks for

  1. empirical speedup data from the CM-5 implementation using buffered communications,
  2. predicted speedup for processor speeds and communications networks with 1 microsecond latency and 100 megabyte-per-second bandwidth,
  3. predicted speedup for processor speeds and communications networks with 1 microsecond latency and 1000 megabyte-per-second bandwidth.
The graphs in this figure show that we may see slightly improved speedup for this algorithm on both future architectures relative to the empirical data collected on the Thinking Machines CM-5, although performance is not scalable to 32 processors. This lack of scalability is due primarily to the parallel software overhead required to set up the buffers. For the BCSPWR10 data set, with processor speeds and communications networks with 1 microsecond latency and 100 megabyte-per-second bandwidth, speedups would be greater than eight for 16 processors, and slightly better (ten) with a network that is 10 times faster. The computation-to-communications ratio for both network options is greater than for the Thinking Machines CM-5 with buffered communications: [symbol missing in source] would increase by a factor greater than 62 for 100 megabyte-per-second bandwidth communications and greater than 83 for the faster proposed network. These communications performance improvements compare favorably to the improvement anticipated for [symbol missing in source]. Similar improved performance is predicted for future architectures running the EPRI6K data set, although peak performance improvement is not as great as for the BCSPWR10 data set.
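
To put the predicted BCSPWR10 speedups for the two communications paradigms on a common scale, one can convert them to parallel efficiency (speedup divided by processor count). The sketch below is only a rough illustration: the text gives bounds ("less than three", "greater than eight"), and the code treats those bounds as point values. The function and variable names are hypothetical.

```python
# Parallel efficiency from the speedups quoted in the text (illustrative sketch).
# ASSUMPTION: the bounds quoted in the text are treated as point values.

def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup achieved on the given processor count."""
    return speedup / processors


# (label, speedup, processors) for the BCSPWR10 predictions
predictions = [
    ("low-latency, 100 MB/s", 3.0, 32),
    ("low-latency, 1000 MB/s", 4.0, 32),
    ("buffered, 100 MB/s", 8.0, 16),
    ("buffered, 1000 MB/s", 10.0, 16),
]

for label, speedup, processors in predictions:
    efficiency = parallel_efficiency(speedup, processors)
    print(f"{label:24s} speedup {speedup:4.1f} on {processors:2d} processors "
          f"-> efficiency {efficiency:.3f}")
```

Read this way, neither paradigm approaches ideal scaling on the predicted architectures, although the buffered paradigm uses its processors considerably more efficiently at 16 processors than the low-latency paradigm does at 32.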

 
Figure: Performance Predictions for Parallel Complex Gauss-Seidel --- Buffered Communications Paradigm 






David P. Koester
Sun Oct 22 17:27:14 EDT 1995