Transistors keep getting cheaper, and it takes only about 0.5 million transistors to make a very high-quality CPU
|
Such a chip would exploit little ILP (instruction-level parallelism, i.e. parallelism in "innermost loops")
|
Thus the next generation of processor chips more or less has to have multiple CPUs, as the gain from ILP is limited
|
However, getting much more speedup than this requires use of "outer loop" or data parallelism
-
This is naturally implemented with threads on chip
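
As a minimal sketch of the distinction, assuming a simple sum-of-squares workload: the inner loop in `process_row` is the short sequential work that ILP targets, while the outer loop over rows has independent iterations that can be handed to separate threads. The names `process_row`, `outer_loop_parallel`, and `num_threads` are hypothetical, chosen only for illustration. In standard CPython the global interpreter lock limits real speedup for pure-Python work like this, so the example shows the decomposition rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor


def process_row(row):
    # "Innermost loop": short sequential work on one data item.
    # This is the level at which hardware ILP operates.
    total = 0
    for x in row:
        total += x * x
    return total


def outer_loop_parallel(rows, num_threads=4):
    # "Outer loop": each row is independent of the others, so the
    # iterations can be distributed across threads (data parallelism).
    # The programmer identifies this independence explicitly.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the same order as the input rows.
        return list(pool.map(process_row, rows))


rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
results = outer_loop_parallel(rows)
print(results)
```

The split into rows is made by hand here; nothing in the code discovers on its own that the outer iterations are independent.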
|
The March of Parallelism: Multiple boards --> Multiple chips on a board --> Multiple CPUs on a chip
|
This implies that "outer loop" parallel computing becomes more and more important in the dominant commodity market
|
Use of "outer loop" parallelism cannot (yet) be automated
|