Parallel Computing Rationale
Transistors keep getting cheaper, and it takes only about 0.5 million transistors to build a very high-quality CPU.
- It is essentially impossible to keep increasing clock speed, so we must instead exploit increasing transistor density, with figure of merit (1/f)^2 to (1/f)^4
Chips are already built with roughly ten times more transistors than this, and the extra transistors are used for “automatic” instruction-level parallelism.
- This corresponds to parallelism in “innermost loops”
However, getting much more speedup than this requires exploiting “outer loop” or data parallelism.
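The inner-loop versus outer-loop distinction can be illustrated with a minimal Python sketch. All function names here are hypothetical and chosen only for illustration. The outer loop over rows has independent iterations, so the rows can be split into blocks and handed to separate workers. Threads are used only to keep the example simple; under CPython's global interpreter lock this shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def row_sums_sequential(matrix):
    """Sum each row with an explicit outer loop and inner loop."""
    sums = []
    for row in matrix:      # outer loop: iterations are independent of each other
        total = 0
        for x in row:       # inner loop: the kind of work instruction-level parallelism targets
            total += x
        sums.append(total)
    return sums


def _sum_block(block):
    """Work done by one worker: sum every row in its block of rows."""
    return [sum(row) for row in block]


def row_sums_data_parallel(matrix, n_workers=4):
    """Split the outer loop (rows) into contiguous blocks, one block per worker."""
    n = len(matrix)
    size = max(1, -(-n // n_workers))  # ceiling division, at least 1
    blocks = [matrix[i:i + size] for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(_sum_block, blocks))  # map preserves block order
    # Concatenate the per-block results back into one list.
    return [s for block in partial for s in block]
```

Both functions return the same list of row sums; only the way the outer loop is divided among workers differs.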
Memory bandwidth is in fact an essential problem in any computer, since doing more computations per second requires accessing more memory cells per second!
- This is harder for sequential than for parallel computers
- Data locality is the unifying concept!
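One way to make the data locality point concrete is a toy count of non-local accesses. This is a sketch under stated assumptions, not a model of any particular machine. It assumes a 1-D array where updating each point reads its left and right neighbours, and it compares two hypothetical ways of assigning points to workers: contiguous blocks and round-robin (cyclic). The names `remote_accesses`, `block_owner`, and `cyclic_owner` are invented for this example.

```python
def remote_accesses(n_points, owner):
    """Count neighbour reads that touch a point owned by a different worker."""
    remote = 0
    for i in range(n_points):
        for j in (i - 1, i + 1):  # left and right neighbours of point i
            if 0 <= j < n_points and owner(j) != owner(i):
                remote += 1
    return remote


def block_owner(n_points, n_workers):
    """Contiguous blocks: neighbouring points usually share a worker."""
    size = -(-n_points // n_workers)  # ceiling division
    return lambda i: i // size


def cyclic_owner(n_workers):
    """Round-robin assignment: neighbouring points never share a worker."""
    return lambda i: i % n_workers


n, p = 16, 4
print(remote_accesses(n, block_owner(n, p)))  # 6: only reads across block boundaries are remote
print(remote_accesses(n, cyclic_owner(p)))    # 30: every neighbour read is remote
```

With the block assignment, only the points at block boundaries need data held elsewhere, so most accesses stay local. With the cyclic assignment, every neighbour read crosses a worker boundary.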