The issues in Petaflops software fall into three classes: fundamental, engineering, and usability.
We view as fundamental the linked combination of memory hierarchy, latency, and bandwidth, which expresses itself differently in the various architectures. Each architecture also has a different geometric structure for these parameters, that is, a different variation of them with data location. Because applications have different geometric structures, and consequently different locality and bandwidth needs, the tradeoffs among memory hierarchy, latency, and bandwidth differ for each application. Correspondingly, each architecture has different performance characteristics on each application. PIM performs well on geometrically structured applications; ``classic shared memory'' performs well on less structured, dynamic, non-local problems; superconducting systems do not obviously perform well on any application with substantial memory bandwidth and communication needs.
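To make the dependence on data location concrete, the following is a minimal sketch that assumes the simplest common cost model: the time to move a block is the latency of the memory level plus the block size divided by that level's bandwidth. The names `Level`, `transfer_time`, and `average_access_time`, the two-level hierarchy, and every number below are illustrative assumptions, not measurements of any of the architectures discussed here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One level of a hypothetical memory hierarchy."""
    name: str
    latency_s: float              # time to the first byte, in seconds
    bandwidth_bytes_per_s: float  # sustained transfer rate


def transfer_time(level: Level, nbytes: int) -> float:
    """Latency plus size over bandwidth for one block fetched from `level`."""
    return level.latency_s + nbytes / level.bandwidth_bytes_per_s


def average_access_time(levels, fractions, nbytes: int) -> float:
    """Mean block transfer time when `fractions[i]` of accesses hit `levels[i]`."""
    if len(levels) != len(fractions):
        raise ValueError("need one fraction per level")
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError("fractions must sum to 1")
    return sum(f * transfer_time(lv, nbytes) for lv, f in zip(levels, fractions))


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the shape of the tradeoff.
    local = Level("local memory", latency_s=100e-9, bandwidth_bytes_per_s=10e9)
    remote = Level("remote memory", latency_s=10e-6, bandwidth_bytes_per_s=1e9)
    block = 4096
    # A geometrically structured application keeps most accesses local;
    # a less structured one spreads them across the machine.
    for label, fractions in (("structured", [0.99, 0.01]),
                             ("unstructured", [0.50, 0.50])):
        t = average_access_time([local, remote], fractions, block)
        print(f"{label:>12}: {t * 1e6:.2f} microseconds per {block}-byte block")
```

Under this model the same hardware yields very different effective access costs for the two access patterns, which is the sense in which the tradeoffs differ for each application.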
The first engineering challenge is scaling. We call this an engineering issue because most potential Petaflops applications naturally exhibit enough parallelism (100,000 to 1,000,000) to exploit Petaflops architectures. However, careful architecture is required so that systems do not neglect parallelism in some area (say, input-output) and thereby fail to deliver the natural parallelism. We claim that most applications are naturally massively parallel (they scale), but not necessarily with a hierarchical structure that matches the hierarchy of the hardware. Thus, memory hierarchy is a fundamental problem, while scaling is an engineering one. The second obvious engineering issue is building a software environment that is relatively complete and works reliably.
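The cost of neglecting parallelism in one area, such as input-output, can be illustrated with Amdahl's law. This is a minimal sketch that assumes a fixed fraction of the work runs sequentially while the rest parallelizes perfectly. The function name `amdahl_speedup` and the serial fractions are illustrative assumptions; only the processor counts come from the parallelism range quoted above.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup over one processor when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


if __name__ == "__main__":
    # Processor counts span the 100,000 to 1,000,000 range of natural parallelism.
    for processors in (100_000, 1_000_000):
        # Hypothetical fractions of run time spent in a sequential component.
        for serial_fraction in (0.0, 1e-6, 1e-4, 1e-2):
            speedup = amdahl_speedup(serial_fraction, processors)
            print(f"p={processors:>9,}  serial={serial_fraction:<8g}  speedup={speedup:,.0f}")
```

In this model the speedup can never exceed the reciprocal of the serial fraction, so even a small sequential component caps the achievable speedup far below the processor count, however much parallelism the application itself offers.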
I suggest that current HPCC software has addressed ``fundamental'' issues quite well (expressing geometric structure with HPF and more general decompositions with MPI). However, ``engineering'' issues have dominated HPCC software, with unreliable, slow-to-appear HPF compilers, inadequate tools, and architecture blunders such as sequential I/O.
Under usability issues, we put those software features designed to make parallel systems easier to use. Portable software is an obvious theme, and HPF and HPC++ are examples of systems that try to be more usable than explicit message passing with MPI. If Petaflops systems are targeted at the ``marine corps'' of initial users, it is not clear how important usability issues are. Broad use of Petaflops systems, however, demands usable software. In summary, ``fundamental'' software issues are those that allow initial users, by hook or by crook, to implement their applications and get whatever performance the architecture is capable of. ``Engineering'' issues ensure that we manage, design, and fund Petaflops software well enough that it robustly expresses and supports what we know. ``Usability'' issues allow a broad range of users access to Petaflops architectures.
If we confront the three software issues with the three prototypical architectures, the ``engineering'' and ``usability'' issues are roughly independent of the chosen architecture. However, the ``fundamental'' issues are extremely sensitive to the architecture as they are intrinsically entwined with the study of what algorithms and applications run on the machine and how they should be implemented.