
Low-Latency Communications Considerations

 

This research was inspired, in part, by the low-latency communications made possible using active messages on the Thinking Machines CM-5. The parallel block-diagonal-bordered Gauss-Seidel algorithm, like many others, would benefit from extremely low-latency communications, especially for short messages. As a result, future SPP architectures may provide low-latency communications for short messages, because many classes of parallel algorithms can only be implemented efficiently with this type of interprocessor communications support. SPP hardware developers recognize that low-latency communications increase the utility of their multi-computers and, consequently, improve market potential. There are, however, limits to possible reductions in latency.

Network latency can be viewed as a linear combination of several factors: signal propagation time across the physical network, software latency, processing latency, and switch latency.

Physical network size contributes to latency as a function of the distance signals must travel. At the speed of light, eleven inches of travel requires a nanosecond, or one billionth of a second. In order to limit latency, physical network size will be a concern in future SPPs --- extremely low-latency communications will not be possible in spatially diverse networked workstations. The other three latencies are more difficult to control, but can be reduced in proportion to processor speeds.
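The linear-combination view of latency above can be sketched as a simple model. The constants below are illustrative assumptions chosen for the example, not measured values for any machine; only the eleven-inches-per-nanosecond propagation figure comes from the text.

```python
# Sketch of network latency as a linear combination of four factors.
# All per-factor values here are hypothetical, for illustration only.

SPEED_OF_LIGHT_IN_PER_NS = 11.0  # ~11 inches of signal travel per nanosecond

def network_latency_ns(distance_in, software_ns, processing_ns, switch_ns):
    """Total latency = propagation + software + processing + switch time."""
    propagation_ns = distance_in / SPEED_OF_LIGHT_IN_PER_NS
    return propagation_ns + software_ns + processing_ns + switch_ns

# Example: a 33-inch signal path through one switch stage
total = network_latency_ns(distance_in=33.0,
                           software_ns=500.0,
                           processing_ns=500.0,
                           switch_ns=200.0)
print(total)  # 1203.0 (nanoseconds)
```

Note that only the propagation term is fixed by physics; the other three terms are the ones that can shrink with faster processors and switch components.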

Active messages on the CM-5 are able to significantly limit software latency and processing latency by making the user responsible for identifying both the data to send and the handler function to invoke at the receiving processor. As a result, no more than 50 machine cycles are required for these operations. If active-message-style communications were implemented on future, faster processors, the implementations should scale with processor speed.
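The active-message idea described above can be illustrated with a minimal sketch: the sender names the handler to run at the destination, and the message's arrival invokes that handler directly on the payload, with no receive-side buffering or matching step. The `Network` class and handler names below are hypothetical illustrations, not the CM-5 (CMAM) interface.

```python
# Minimal sketch of active-message semantics using an in-process
# "network" for illustration. Hypothetical API, not a CM-5 library.

class Network:
    def __init__(self):
        self.handlers = {}  # handler name -> function

    def register(self, name, fn):
        """Receiver registers the handler functions it is willing to run."""
        self.handlers[name] = fn

    def send(self, handler_name, *data):
        # An active message carries the handler's identity along with the
        # data; on arrival, the handler runs immediately on the payload.
        self.handlers[handler_name](*data)

results = []
net = Network()
net.register("accumulate", lambda x: results.append(x))
net.send("accumulate", 42)
print(results)  # [42]
```

Because the sender pre-identifies the data and the handler, the receiving processor does no message interpretation of its own, which is what keeps the per-message cycle count so low.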

The final factor that contributes to latency is the time required to send signals through the switches in the interconnection network. Switch latencies will decrease as faster components are used or as parallel switching implementations are utilized. Given these contributions to latency, it may be possible to continue to decrease latency with faster components, although latency will always be bounded by the physical size of the network and by hardware speeds.

If communications capabilities can improve significantly more than the performance of individual processors, additional classes of parallel algorithms can be implemented effectively on new multi-processor architectures. Nevertheless, improving communications capabilities merely in proportion to increases in computational performance will itself prove sufficiently challenging.



David P. Koester
Sun Oct 22 17:27:14 EDT 1995