Contribution to Rice ASCI Proposal by Geoffrey Fox, Adjunct Professor in Computer Science at Rice

MultiScale Applications and Computers

Background: The Petaflop workshops have emphasized that both machines and problems will exhibit dynamic, hierarchical, and heterogeneous structure of increasing complexity. On the application side, machines of increasing power will allow us to simulate physical systems with greater fidelity and to see their intrinsic structure in space and time at multiple scales. Metaproblems, such as those in multidisciplinary design, lead to a heterogeneous linkage of these multiscale subproblems into the full application. For machines, CPU power is increasing faster than memory speed and bandwidth, which will require increasingly complex memory hierarchies if the potential performance is to be realized. Further, the increased importance of distributed computing, driven by high-speed networks and the World Wide Web phenomenon, leads to heterogeneous, adaptive target compute environments. Mapping adaptive, irregular, hierarchical Metaproblems onto Metacomputers with the same characteristics poses major challenges to machine designers, compiler writers, and users.

Tools for Hierarchical Problems and Computers

We have extensive experience and success dealing with one level of memory hierarchy, namely the distribution of data and computing seen in classic distributed-memory parallel computing. We intend to develop these ideas in two directions to handle the complex memory hierarchies seen in current and future ASCI machines. First, one needs to explore extensions of both MPI-style message passing and HPF-style data-parallel directives to specify where communication and computation take place within these complex memory hierarchies. As suggested in the Petaflop workshops, this should be implemented in a layered fashion, so that the user or compiler can work at a high, machine-independent layer wherever possible but can escape to a lower level for higher performance where necessary. Second, we intend to adapt some of the automatic data decomposition techniques developed for parallel machines to memory hierarchies. In particular, some years ago we developed powerful deterministic annealing methods aimed at hierarchical systems. Here one introduces an artificial temperature that exposes more of the hierarchy as it is lowered; this is naturally suited to the multilevel approach suggested in the Petaflop software studies (a brief illustrative sketch appears below). These runtime tools will be designed so that they can integrate with Fortran, C++, and Java, which can be expected to be critical languages for ASCI.

MultiScale Algorithms

This proposal includes many strong applications and their corresponding algorithms. Here we enrich these key algorithms with a study of the so-called fast multipole method (or Barnes-Hut method, in the astrophysics community), which is one of the best-understood dynamic multiscale algorithms. We will use it as a test case for the runtime tools and compiler work in this proposal. Fox already has strong links to Los Alamos in this area through Mike Warren, and both have studied this problem for about eight years. This method has broad applicability to pure particle problems (as in molecular dynamics), mixed particle-continuum systems (where one links adaptive meshes to the same spatial subdivisions used for the particles, the subject of a recent Ph.D. by Edelsohn at Syracuse), and Green's function boundary value problems.
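As a concrete illustration of the structure such a tree code exploits, the following is a minimal Barnes-Hut sketch in C++: a two-dimensional quadtree with the standard opening-angle criterion. It is an illustrative toy, not the production tree codes developed by Warren and collaborators; the class names, the theta value, and the sample particles are all hypothetical choices.

```cpp
// Minimal 2D Barnes-Hut sketch: build a quadtree over point masses and
// approximate the force on each particle with the standard opening-angle test.
// Illustrative only; names and parameters (theta, G) are hypothetical choices.
#include <cmath>
#include <cstdio>
#include <memory>
#include <vector>

struct Particle { double x, y, m, fx = 0, fy = 0; };

struct Node {
    double cx, cy, half;                    // square cell: center and half-width
    double mass = 0, comx = 0, comy = 0;    // total mass and center of mass
    const Particle* p = nullptr;            // particle stored in a leaf
    std::unique_ptr<Node> child[4];

    Node(double cx_, double cy_, double half_) : cx(cx_), cy(cy_), half(half_) {}

    int quadrant(double x, double y) const { return (x > cx) + 2 * (y > cy); }

    void insert(const Particle* q) {
        if (mass == 0 && !p) { p = q; accumulate(q); return; }   // empty leaf
        if (p) {                                                 // split a leaf
            const Particle* old = p; p = nullptr;
            childFor(old)->insert(old);
        }
        accumulate(q);
        childFor(q)->insert(q);
    }

    void accumulate(const Particle* q) {
        comx = (comx * mass + q->x * q->m) / (mass + q->m);
        comy = (comy * mass + q->y * q->m) / (mass + q->m);
        mass += q->m;
    }

    Node* childFor(const Particle* q) {
        int k = quadrant(q->x, q->y);
        if (!child[k]) {
            double h = half / 2;
            child[k] = std::make_unique<Node>(cx + ((k & 1) ? h : -h),
                                              cy + ((k & 2) ? h : -h), h);
        }
        return child[k].get();
    }

    // Accumulate the force on q; open a cell only if it is too close or large.
    void force(Particle& q, double theta, double G) const {
        if (mass == 0 || p == &q) return;
        double dx = comx - q.x, dy = comy - q.y;
        double r = std::sqrt(dx * dx + dy * dy) + 1e-9;   // softened distance
        if (p || (2 * half) / r < theta) {                // leaf or far cell
            double f = G * q.m * mass / (r * r * r);
            q.fx += f * dx; q.fy += f * dy;
        } else {
            for (const auto& c : child) if (c) c->force(q, theta, G);
        }
    }
};

int main() {
    std::vector<Particle> ps = {{0.1, 0.2, 1}, {0.8, 0.7, 2}, {0.4, 0.9, 1}, {0.6, 0.3, 3}};
    Node root(0.5, 0.5, 0.5);                 // unit square domain
    for (auto& p : ps) root.insert(&p);
    for (auto& p : ps) root.force(p, /*theta=*/0.7, /*G=*/1.0);
    for (auto& p : ps) std::printf("force = (%g, %g)\n", p.fx, p.fy);
}
```

The dynamic, data-dependent tree built here is exactly the kind of irregular hierarchical structure that must be mapped onto a machine's memory hierarchy, which is why this method is a natural test case for the runtime tools and compiler work above.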
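Returning briefly to the annealing-based decomposition described under Tools for Hierarchical Problems and Computers, the following minimal C++ sketch illustrates the temperature-controlled soft assignment at the heart of deterministic annealing clustering: as the temperature T is lowered, assignments sharpen and finer structure is exposed. The data, the number of clusters, and the cooling schedule are hypothetical, and the sketch is a generic illustration rather than the decomposition tool proposed here.

```cpp
// Minimal deterministic-annealing clustering sketch (1-D "work coordinates"):
// soft assignments p(j|i) ~ exp(-d(x_i,c_j)/T) sharpen as the temperature T
// is lowered, gradually exposing finer levels of the decomposition hierarchy.
// Illustrative only; the data, K, and the cooling schedule are hypothetical.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> x = {0.05, 0.1, 0.15, 0.5, 0.55, 0.9, 0.95, 1.0};  // work items
    const int K = 3;                                  // clusters (e.g. memory domains)
    std::vector<double> c = {0.3, 0.5, 0.7};          // perturbed initial centers

    for (double T = 1.0; T > 1e-4; T *= 0.8) {        // cooling schedule
        for (int sweep = 0; sweep < 50; ++sweep) {    // relax centers at this temperature
            std::vector<double> num(K, 0.0), den(K, 1e-12);
            for (double xi : x) {
                // Soft (Gibbs) assignment of item xi to each center.
                std::vector<double> p(K);
                double z = 0;
                for (int j = 0; j < K; ++j) {
                    p[j] = std::exp(-(xi - c[j]) * (xi - c[j]) / T);
                    z += p[j];
                }
                for (int j = 0; j < K; ++j) {
                    num[j] += (p[j] / z) * xi;
                    den[j] += (p[j] / z);
                }
            }
            for (int j = 0; j < K; ++j) c[j] = num[j] / den[j];  // re-estimate centers
        }
        std::printf("T = %.4f  centers:", T);
        for (double cj : c) std::printf(" %.3f", cj);
        std::printf("\n");
    }
}
```

At high temperature the centers crowd toward the global mean; as T falls they separate toward the natural groupings of the data, which is the sense in which lowering the temperature exposes successive levels of the hierarchy.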
A nice example of the Green's function boundary value case, of direct interest to DoE, is the simulation of earthquake stresses, where Fox is already working with a group led by Rundle at Colorado that again involves Los Alamos. There have been several investigations of incorporating fast multipole algorithms into high-level languages, without great success. We believe a promising approach is a generalized library in which optimization parameters (such as block sizes and decomposition strategies) can be set at compile time or at run time. We will develop a fast multipole library that can be tuned by the compiler to the particular memory structure of the computer. This work will thus also link with the proposed studies of MPI and HPF extensions for deep memory hierarchies.
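To indicate what such a tunable library interface might look like, the sketch below (in C++, with entirely hypothetical names) exposes some parameters as compile-time template arguments, which a compiler or autotuner could instantiate for a given memory hierarchy, and others as run-time settings; the blocked loop is only a stand-in for the near-field kernel of a real fast multipole library.

```cpp
// Hypothetical sketch of a tunable tree-code library interface: some parameters
// are fixed at compile time (template arguments a compiler or autotuner could
// instantiate for a given memory hierarchy), others are chosen at run time.
// The "kernel" below is a cache-blocked direct sum standing in for the
// near-field part of a real fast multipole library.
#include <cstddef>
#include <cstdio>
#include <vector>

enum class Decomposition { Morton, OrthogonalBisection };   // runtime strategy choice

struct RuntimeTuning {
    double theta = 0.6;                       // multipole acceptance parameter
    Decomposition decomp = Decomposition::Morton;
};

// LeafBlock: compile-time block size matched to a cache or TLB level.
template <std::size_t LeafBlock>
double blockedNearField(const std::vector<double>& x, const RuntimeTuning& tune) {
    (void)tune;                               // theta/decomp would steer the far field
    double acc = 0.0;
    const std::size_t n = x.size();
    for (std::size_t ib = 0; ib < n; ib += LeafBlock)        // block over targets
        for (std::size_t jb = 0; jb < n; jb += LeafBlock)    // block over sources
            for (std::size_t i = ib; i < n && i < ib + LeafBlock; ++i)
                for (std::size_t j = jb; j < n && j < jb + LeafBlock; ++j)
                    if (i != j) acc += 1.0 / (1e-6 + (x[i] - x[j]) * (x[i] - x[j]));
    return acc;
}

int main() {
    std::vector<double> x(1000);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = 0.001 * static_cast<double>(i);

    RuntimeTuning tune;                       // runtime knobs
    tune.theta = 0.7;

    // The same library can be instantiated with different block sizes; a
    // compiler or autotuner would pick the one matching the memory hierarchy.
    std::printf("block 32 : %g\n", blockedNearField<32>(x, tune));
    std::printf("block 256: %g\n", blockedNearField<256>(x, tune));
}
```

Fixing a block size at compile time lets the compiler specialize and unroll the leaf kernel for a particular cache level, while the run-time settings leave the decomposition strategy and accuracy parameter free to adapt to the problem, which is the division of labor the generalized library above is intended to support.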