How to use more transistors?
Parallelism in processing
- multiple operations per cycle reduce CPI
- thread-level parallelism to follow soon
Caches to exploit locality in data access
- avoid memory latency and reduce CPI
- also improve processor utilization
Both need (transistor) resources, so there is a tradeoff
ILP (Instruction Level Parallelism) drove performance gains of sequential microprocessors
The success of ILP was not expected by aficionados of parallel computing, and this “delayed” the relevance of scaling “outer-loop” parallelism, as users just purchased faster “sequential machines”
CPI = Clock Cycles per Instruction
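
To make the role of CPI concrete, here is a minimal sketch of the standard textbook relations: execution time = instruction count x CPI / clock rate, and effective CPI = base CPI + memory stall cycles per instruction. The function names and all numbers are illustrative assumptions, not figures from these notes.

```python
def execution_time(instructions, cpi, clock_hz):
    """Seconds to run a program: instruction count * CPI / clock rate."""
    return instructions * cpi / clock_hz


def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """Base CPI plus the average memory stall cycles per instruction."""
    return base_cpi + mem_refs_per_instr * miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions on a 2 GHz clock.
    n, clock = 1_000_000_000, 2e9

    # Issuing two operations per cycle (ideal case) halves the base CPI.
    scalar = execution_time(n, 1.0, clock)
    dual_issue = execution_time(n, 0.5, clock)

    # Cache misses add stall cycles; a lower miss rate lowers effective CPI.
    poor_cache = effective_cpi(1.0, 0.3, 0.10, 100)
    good_cache = effective_cpi(1.0, 0.3, 0.02, 100)

    print(f"scalar: {scalar:.3f} s, dual issue: {dual_issue:.3f} s")
    print(f"effective CPI, poor cache: {poor_cache:.2f}, good cache: {good_cache:.2f}")
```

Both levers lower CPI, but each costs transistors, which is the tradeoff noted above.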