Need (to investigate) Threads and Parallel Compilers for latency tolerance |
Need Latency-Tolerant BLAS and higher-level Capabilities |
Performance should be monitored in hardware (with no software overhead) |
Resource Management in the presence of a complex memory hierarchy |