Parallelism in processing
-
issuing multiple operations per cycle reduces CPI
-
soon to include thread-level parallelism
|
Caches to exploit locality in data access
-
avoids memory latency and reduces CPI
-
also improves processor utilization
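
A rough sketch of how a cache affects CPI, using the common textbook stall model (base CPI plus memory stall cycles per instruction). The function names and all numbers here are hypothetical and chosen only for illustration; they do not describe any particular processor.

```python
def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_cycles):
    """CPI including cycles lost waiting on memory (simple stall model)."""
    stall_cycles_per_instr = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles_per_instr


def utilization(base_cpi, cpi_with_stalls):
    """Fraction of cycles spent on useful work rather than memory stalls."""
    return base_cpi / cpi_with_stalls


# Hypothetical workload: 0.5 memory references per instruction,
# 100-cycle penalty for every access that goes to main memory.
no_cache = effective_cpi(1.0, 0.5, 1.0, 100)     # every access pays the penalty
with_cache = effective_cpi(1.0, 0.5, 0.02, 100)  # assumed 2% miss rate

print(f"CPI without cache: {no_cache:.1f}, utilization {utilization(1.0, no_cache):.1%}")
print(f"CPI with cache:    {with_cache:.1f}, utilization {utilization(1.0, with_cache):.1%}")
```

Under these assumed numbers the cache lowers CPI and raises utilization together, which is the link between the two bullets above.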
|
Both need (transistor) resources, so there is a tradeoff between them
|
ILP (Instruction-Level Parallelism) drove the performance gains of sequential microprocessors
|
The success of ILP was not expected by aficionados of parallel computing, and it "delayed" the relevance of scaling "outer-loop" parallelism, as users simply purchased faster "sequential machines"
|
CPI = Clock Cycles per Instruction
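
As a small illustration of the definition, the sketch below computes CPI from cycle and instruction counts and plugs it into the standard relation: execution time = instruction count × CPI / clock rate. The function names and the workload numbers are assumptions made up for this example.

```python
def cpi(clock_cycles, instructions):
    """Average clock cycles per instruction."""
    return clock_cycles / instructions


def execution_time(instructions, cpi_value, clock_hz):
    """Seconds to run a program: instructions * CPI / clock rate."""
    return instructions * cpi_value / clock_hz


# Hypothetical program of 1e9 instructions on a 2 GHz clock.
instructions = 1_000_000_000
clock_hz = 2_000_000_000

# One instruction completed per cycle.
scalar_cpi = cpi(1_000_000_000, instructions)
# Assumed ILP case: two instructions completed per cycle on average.
ilp_cpi = cpi(500_000_000, instructions)

print(f"scalar: CPI={scalar_cpi}, time={execution_time(instructions, scalar_cpi, clock_hz)} s")
print(f"ILP:    CPI={ilp_cpi}, time={execution_time(instructions, ilp_cpi, clock_hz)} s")
```

With the clock rate and instruction count held fixed, halving CPI halves the run time, which is why completing more operations per cycle made "sequential" machines faster without any change to the program.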
|