Transistors keep getting cheaper, and it takes only about 0.5 million transistors to make a very high-quality CPU
-
Essentially impossible to increase clock speed, so we must exploit increasing transistor density, with a figure of merit of (1/f)^2 to (1/f)^4
|
Chips are already built with roughly ten times this many transistors, and the extra transistors are used for "automatic" instruction-level parallelism.
-
This corresponds to parallelism in "innermost loops"
|
However, getting much more speedup than this requires the use of "outer loop" or data parallelism.
|
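The inner-loop versus outer-loop distinction can be sketched as follows. This is only an illustration under assumptions: the function names and the row-wise decomposition are hypothetical, and Python threads are used purely to show how the work is divided (in CPython they will not give a real speedup for this kind of arithmetic).

```python
from concurrent.futures import ThreadPoolExecutor


def row_sum_of_squares(row):
    # Innermost loop: the multiplies x * x are independent of one another,
    # which is the kind of work that hardware instruction-level
    # parallelism can overlap automatically.
    total = 0.0
    for x in row:
        total += x * x
    return total


def outer_loop_parallel(rows, workers=4):
    # Outer loop: each row is independent of the others, so whole rows can
    # be handed to different workers (data parallelism).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row_sum_of_squares, rows))


rows = [[1.0, 2.0], [3.0, 4.0], [0.0, 5.0]]
print(outer_loop_parallel(rows))  # [5.0, 25.0, 25.0]
```

The point of the sketch is that the outer decomposition has to be chosen by the programmer (or a parallelizing tool), whereas the overlap inside the inner loop is found by the hardware.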
Actually, memory bandwidth is an essential problem in any computer, as doing more computations per second requires accessing more memory cells per second!
-
This is harder for sequential than for parallel computers
-
Data locality is unifying concept!
|
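Why data locality eases the memory bandwidth problem can be illustrated with a rough counting model. The sketch below is an assumption-laden example, not a measurement: it takes an n x n matrix multiply, assumes the naive version reloads both operands from memory on every inner iteration, and assumes the blocked version keeps two b x b blocks in fast memory while it works on them. The function names and the sizes n and b are hypothetical.

```python
def words_loaded_naive(n):
    # Assumed model: each of the n*n*n inner iterations loads one element
    # of A and one element of B from main memory (no reuse).
    return 2 * n**3


def words_loaded_blocked(n, b):
    # Assumed model: the multiply is done as (n/b)^3 block multiplications,
    # each loading two b x b blocks that are then reused from fast memory.
    assert n % b == 0, "block size must divide the matrix size"
    blocks = n // b
    return 2 * blocks**3 * b * b


n, b = 1024, 32
flops = 2 * n**3  # one multiply and one add per inner iteration

print(flops / words_loaded_naive(n))       # 1.0 flop per word loaded
print(flops / words_loaded_blocked(n, b))  # 32.0 flops per word loaded
```

Under these assumptions the same computation needs b times fewer memory accesses once data is reused locally, which is the sense in which locality reduces the bandwidth demand.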