Classic geometrically data-parallel problems seem to be the easiest case for PIM architectures, even if adaptive and irregular (these are "engineering" issues) |
Must extend MPI, HPF .. HPJava to support several layers of memory hierarchy |
Need to study data movement in key algorithms and how they can be supported in all prototypical architectures |
Can possibly use "load balancing ideas" (developed for decomposition onto CPUs today) to address fine-grain memory hierarchy decomposition in PetaFlop architectures |
Need layered software model with capability to "escape" to lower levels |
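As one illustration of the "load balancing ideas" point above, the sketch below applies a CPU-style balancing heuristic recursively, once per level of a memory hierarchy. Everything in it is an assumption made for illustration: the names `balance`, `decompose` and `fanouts`, the two-level hierarchy (nodes, then memory banks per node), and the choice of a simple greedy "heaviest item to lightest bin" heuristic. It is not a method taken from the points above, and real decompositions would also have to account for data movement between levels.

```python
import heapq


def balance(weights, n_bins):
    """Greedy heuristic: place each item, heaviest first, in the lightest bin."""
    heap = [(0.0, b) for b in range(n_bins)]  # (current load, bin index)
    heapq.heapify(heap)
    bins = [[] for _ in range(n_bins)]
    for item in sorted(weights, key=weights.get, reverse=True):
        load, b = heapq.heappop(heap)
        bins[b].append(item)
        heapq.heappush(heap, (load + weights[item], b))
    return bins


def decompose(weights, fanouts):
    """Apply `balance` once per hierarchy level.

    `fanouts` lists the number of children at each level (hypothetical
    example: [nodes, banks_per_node]). Returns a dict mapping each item
    to a tuple giving its position at every level.
    """
    if not fanouts:
        return {item: () for item in weights}
    placement = {}
    for b, items in enumerate(balance(weights, fanouts[0])):
        sub = decompose({i: weights[i] for i in items}, fanouts[1:])
        for item, path in sub.items():
            placement[item] = (b,) + path
    return placement


if __name__ == "__main__":
    # Hypothetical workload: 16 data blocks with unequal costs,
    # spread over 2 nodes with 2 memory banks each.
    work = {f"block{i}": float(i + 1) for i in range(16)}
    for item, path in sorted(decompose(work, [2, 2]).items()):
        print(item, path)
```

The same routine is reused unchanged at every level, which is the sense in which a decomposition technique built for CPUs might carry over to finer-grain levels of the hierarchy.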