Choosing the Processing Macro
- Today’s “conventional wisdom”:
- A complex memory hierarchy drives superscalar and superpipelined designs, branch prediction, fast TLBs, multiple function units, multiported register files, ...
- Does that make sense in a PIM environment?
- Large bandwidth from direct row buffer access
- Reduced latency (no chip crossings)
- Naturally closely coupled parallelism
- Answer: No! A better choice is to design for:
- Maximum performance “per transistor”
- Minimum power per MIPS