1 | Need (to investigate) Threads and Parallel Compilers for latency tolerance |
2 | Need Latency Tolerant BLAS and higher level Capabilities |
3 | Performance should be monitored (with no software overhead) in hardware |
4 | Resource Management in presence of complex memory hierarchy |