Contribution to Rice ASCI Proposal by Geoffrey Fox, Adjunct Professor in Computer Science at Rice

MultiScale Applications and Computers

Background: The Petaflop workshops have emphasized that both machines and problems will exhibit dynamic, hierarchical, and heterogeneous structure of increasing complexity. On the application side, machines of increasing power will allow us to simulate physical systems with greater fidelity and to see their intrinsic structure in space and time at multiple scales. Metaproblems, such as those in multidisciplinary design, lead to a heterogeneous linkage of these multiscale subproblems into the full application. For machines, CPU power is increasing faster than memory speed and bandwidth, which will require increasingly complex memory hierarchies if the potential performance is to be realized. Further, the increased importance of distributed computing, driven by high-speed networks and the World Wide Web phenomenon, leads to heterogeneous, adaptive target compute environments. Mapping adaptive, irregular, hierarchical Metaproblems onto Metacomputers with the same characteristics poses major challenges to machine designers, compiler writers, and users.

Tools for Hierarchical Problems and Computers

We have extensive experience and success dealing with one level of memory hierarchy, namely the distribution of data and computing seen in classic distributed-memory parallel computing. We intend to develop these ideas in two directions to handle the complex memory hierarchies seen in current and future ASCI machines. First, one needs to explore extensions of both MPI-style message passing and HPF-style data-parallel directives to specify where communication and computation take place within these complex memory hierarchies. As suggested in the Petaflop workshops, this should be implemented in a layered fashion, so that the user or compiler can work at a high, machine-independent layer wherever possible but can escape to a lower level for higher performance where necessary. Second, we intend to adapt some of the automatic data decomposition techniques developed for parallel machines to memory hierarchies. In particular, some years ago we developed powerful deterministic annealing methods aimed at hierarchical systems. Here one introduces an artificial temperature that exposes more of the hierarchy as it is lowered; this is naturally suited to the multilevel approach suggested in the Petaflop software studies (a brief illustrative sketch appears below). These runtime tools will be designed so that they can integrate with Fortran, C++, and Java, which can be expected to be critical languages for ASCI.

MultiScale Algorithms

This proposal includes many strong applications and their corresponding algorithms. Here we enrich these key algorithms with a study of the so-called fast multipole method (or Barnes-Hut method, in the astrophysics community), which is one of the best-understood dynamic multiscale algorithms. We will use it as a test case for the runtime tools and compiler work in this proposal. Fox already has strong links to Los Alamos in this area through Mike Warren, and both have studied this problem for about eight years. This method has broad applicability to pure particle problems (as in molecular dynamics), mixed particle-continuum systems (where one links adaptive meshes to the same spatial subdivisions used for the particles, the subject of a recent Ph.D. by Edelsohn at Syracuse), and Green's function boundary value problems.
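As a concrete illustration of the structure such a tree code exploits, the following is a minimal Barnes-Hut sketch in C++: a two-dimensional quadtree with the standard opening-angle criterion. It is an illustrative toy, not the production tree codes developed by Warren and collaborators; the class names, the theta value, and the sample particles are all hypothetical choices.

```cpp
// Minimal 2D Barnes-Hut sketch: build a quadtree over point masses and
// approximate the force on each particle with the standard opening-angle test.
// Illustrative only; names and parameters (theta, G) are hypothetical choices.
#include <cmath>
#include <cstdio>
#include <memory>
#include <vector>

struct Particle { double x, y, m, fx = 0, fy = 0; };

struct Node {
    double cx, cy, half;                    // square cell: center and half-width
    double mass = 0, comx = 0, comy = 0;    // total mass and center of mass
    const Particle* p = nullptr;            // particle stored in a leaf
    std::unique_ptr<Node> child[4];

    Node(double cx_, double cy_, double half_) : cx(cx_), cy(cy_), half(half_) {}

    int quadrant(double x, double y) const { return (x > cx) + 2 * (y > cy); }

    void insert(const Particle* q) {
        if (mass == 0 && !p) { p = q; accumulate(q); return; }   // empty leaf
        if (p) {                                                 // split a leaf
            const Particle* old = p; p = nullptr;
            childFor(old)->insert(old);
        }
        accumulate(q);
        childFor(q)->insert(q);
    }

    void accumulate(const Particle* q) {
        comx = (comx * mass + q->x * q->m) / (mass + q->m);
        comy = (comy * mass + q->y * q->m) / (mass + q->m);
        mass += q->m;
    }

    Node* childFor(const Particle* q) {
        int k = quadrant(q->x, q->y);
        if (!child[k]) {
            double h = half / 2;
            child[k] = std::make_unique<Node>(cx + ((k & 1) ? h : -h),
                                              cy + ((k & 2) ? h : -h), h);
        }
        return child[k].get();
    }

    // Accumulate the force on q; open a cell only if it is too close or large.
    void force(Particle& q, double theta, double G) const {
        if (mass == 0 || p == &q) return;
        double dx = comx - q.x, dy = comy - q.y;
        double r = std::sqrt(dx * dx + dy * dy) + 1e-9;   // softened distance
        if (p || (2 * half) / r < theta) {                // leaf or far cell
            double f = G * q.m * mass / (r * r * r);
            q.fx += f * dx; q.fy += f * dy;
        } else {
            for (const auto& c : child) if (c) c->force(q, theta, G);
        }
    }
};

int main() {
    std::vector<Particle> ps = {{0.1, 0.2, 1}, {0.8, 0.7, 2}, {0.4, 0.9, 1}, {0.6, 0.3, 3}};
    Node root(0.5, 0.5, 0.5);                 // unit square domain
    for (auto& p : ps) root.insert(&p);
    for (auto& p : ps) root.force(p, /*theta=*/0.7, /*G=*/1.0);
    for (auto& p : ps) std::printf("force = (%g, %g)\n", p.fx, p.fy);
}
```

The dynamic, data-dependent tree built here is exactly the kind of irregular hierarchical structure that must be mapped onto a machine's memory hierarchy, which is why this method is a natural test case for the runtime tools and compiler work above.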
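Returning briefly to the annealing-based decomposition described under Tools for Hierarchical Problems and Computers, the following minimal C++ sketch illustrates the temperature-controlled soft assignment at the heart of deterministic annealing clustering: as the temperature T is lowered, assignments sharpen and finer structure is exposed. The data, the number of clusters, and the cooling schedule are hypothetical, and the sketch is a generic illustration rather than the decomposition tool proposed here.

```cpp
// Minimal deterministic-annealing clustering sketch (1-D "work coordinates"):
// soft assignments p(j|i) ~ exp(-d(x_i,c_j)/T) sharpen as the temperature T
// is lowered, gradually exposing finer levels of the decomposition hierarchy.
// Illustrative only; the data, K, and the cooling schedule are hypothetical.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> x = {0.05, 0.1, 0.15, 0.5, 0.55, 0.9, 0.95, 1.0};  // work items
    const int K = 3;                                  // clusters (e.g. memory domains)
    std::vector<double> c = {0.3, 0.5, 0.7};          // perturbed initial centers

    for (double T = 1.0; T > 1e-4; T *= 0.8) {        // cooling schedule
        for (int sweep = 0; sweep < 50; ++sweep) {    // relax centers at this temperature
            std::vector<double> num(K, 0.0), den(K, 1e-12);
            for (double xi : x) {
                // Soft (Gibbs) assignment of item xi to each center.
                std::vector<double> p(K);
                double z = 0;
                for (int j = 0; j < K; ++j) {
                    p[j] = std::exp(-(xi - c[j]) * (xi - c[j]) / T);
                    z += p[j];
                }
                for (int j = 0; j < K; ++j) {
                    num[j] += (p[j] / z) * xi;
                    den[j] += (p[j] / z);
                }
            }
            for (int j = 0; j < K; ++j) c[j] = num[j] / den[j];  // re-estimate centers
        }
        std::printf("T = %.4f  centers:", T);
        for (double cj : c) std::printf(" %.3f", cj);
        std::printf("\n");
    }
}
```

At high temperature the centers crowd toward the global mean; as T falls they separate toward the natural groupings of the data, which is the sense in which lowering the temperature exposes successive levels of the hierarchy.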
A nice example of the Green's function boundary value case, of direct interest to DoE, is the simulation of earthquake stresses, where Fox is already working with a group led by Rundle at Colorado that again involves Los Alamos. There have been several investigations of incorporating fast multipole algorithms into high-level languages, without great success. We believe a promising approach is a generalized library in which optimization parameters (such as block sizes and decomposition strategies) can be set at compile time or at run time. We will develop a fast multipole library that can be tuned by the compiler to the particular memory structure of the computer. This work will thus also link with the proposed studies of MPI and HPF extensions for deep memory hierarchies.
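To indicate what such a tunable library interface might look like, the sketch below (in C++, with entirely hypothetical names) exposes some parameters as compile-time template arguments, which a compiler or autotuner could instantiate for a given memory hierarchy, and others as run-time settings; the blocked loop is only a stand-in for the near-field kernel of a real fast multipole library.

```cpp
// Hypothetical sketch of a tunable tree-code library interface: some parameters
// are fixed at compile time (template arguments a compiler or autotuner could
// instantiate for a given memory hierarchy), others are chosen at run time.
// The "kernel" below is a cache-blocked direct sum standing in for the
// near-field part of a real fast multipole library.
#include <cstddef>
#include <cstdio>
#include <vector>

enum class Decomposition { Morton, OrthogonalBisection };   // runtime strategy choice

struct RuntimeTuning {
    double theta = 0.6;                       // multipole acceptance parameter
    Decomposition decomp = Decomposition::Morton;
};

// LeafBlock: compile-time block size matched to a cache or TLB level.
template <std::size_t LeafBlock>
double blockedNearField(const std::vector<double>& x, const RuntimeTuning& tune) {
    (void)tune;                               // theta/decomp would steer the far field
    double acc = 0.0;
    const std::size_t n = x.size();
    for (std::size_t ib = 0; ib < n; ib += LeafBlock)        // block over targets
        for (std::size_t jb = 0; jb < n; jb += LeafBlock)    // block over sources
            for (std::size_t i = ib; i < n && i < ib + LeafBlock; ++i)
                for (std::size_t j = jb; j < n && j < jb + LeafBlock; ++j)
                    if (i != j) acc += 1.0 / (1e-6 + (x[i] - x[j]) * (x[i] - x[j]));
    return acc;
}

int main() {
    std::vector<double> x(1000);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = 0.001 * static_cast<double>(i);

    RuntimeTuning tune;                       // runtime knobs
    tune.theta = 0.7;

    // The same library can be instantiated with different block sizes; a
    // compiler or autotuner would pick the one matching the memory hierarchy.
    std::printf("block 32 : %g\n", blockedNearField<32>(x, tune));
    std::printf("block 256: %g\n", blockedNearField<256>(x, tune));
}
```

Fixing a block size at compile time lets the compiler specialize and unroll the leaf kernel for a particular cache level, while the run-time settings leave the decomposition strategy and accuracy parameter free to adapt to the problem, which is the division of labor the generalized library above is intended to support.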