Linear waves in the BBH GC code have been run in both DAGH and HPF. Both implementations allow for very good parallel speedup, since all calculations are regular.
The figure on the left shows the parallel speedup of DAGH vs. F90+MPI. Here DAGH was faster.
Black holes
For unigrid calculations with black holes cut out of the domain, apparent horizon boundary conditions call for special forward and backward (one-sided) differencing to determine the derivatives of the grid functions at the ``boundary'' lattice points (a sketch of such one-sided differencing is given below).
These points are not uniformly distributed across the domain because they are generated by placing a spherical excision boundary onto Cartesian coordinates.
There has been great difficulty in implementing this in HPF.
We have a DAGH version of this code which allows for black holes in the computational domain.
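As an illustration of the stencil selection involved, the C++ sketch below uses a simple 1D mask (all names and the mask layout are invented; the actual BBH code's masks and stencils are more elaborate) and picks a centered, forward, or backward second-order difference depending on which neighbours of a point are excised.

  #include <cstdio>
  #include <vector>

  // mask[i] == 1 marks active points, mask[i] == 0 marks points inside the hole.
  // Assumes at least two active points on the open side of a boundary point.
  double dfdx(const std::vector<double>& f, const std::vector<int>& mask,
              int i, double h) {
    bool left  = (i - 1 >= 0) && mask[i - 1];
    bool right = (i + 1 < (int)f.size()) && mask[i + 1];
    if (left && right)   // regular interior point: centered difference
      return (f[i + 1] - f[i - 1]) / (2.0 * h);
    if (right)           // hole on the left: 2nd-order forward difference
      return (-3.0 * f[i] + 4.0 * f[i + 1] - f[i + 2]) / (2.0 * h);
    else                 // hole on the right: 2nd-order backward difference
      return (3.0 * f[i] - 4.0 * f[i - 1] + f[i - 2]) / (2.0 * h);
  }

  int main() {
    const double h = 0.1;
    std::vector<double> f(10);
    std::vector<int> mask(10, 1);
    for (int i = 0; i < 10; ++i) f[i] = (i * h) * (i * h);   // f(x) = x^2
    mask[3] = mask[4] = 0;                                   // a small excised region
    std::printf("df/dx at i=5 (hole on the left):  %g\n", dfdx(f, mask, 5, h));
    std::printf("df/dx at i=2 (hole on the right): %g\n", dfdx(f, mask, 2, h));
  }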
Linear wave problem with a matching outer boundary condition for gravitational wave extraction.
The difficulty in this problem, for both DAGH and HPF, is the following: there are two separate evolutions inside the main program. The Cauchy evolution is a 3D evolution, and the characteristic matching problem uses a 1D evolution. The calculation proceeds as follows: the ADM code evolves Einstein's equations, lattice points from this evolution are interpolated to the points on the 1D lattice, the 1D evolution then evolves that system, and points from the 1D evolution are interpolated back onto the 3D lattice to give the outer boundary points.
The problem is that there are two separate evolutions, and communication must take place between them at every step.
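The interpolation between the two lattices is where the two distributions meet. The self-contained C++ sketch below (with a made-up grid layout, direction vector, and function names; not code from the BBH infrastructure) shows trilinear interpolation of a 3D Cartesian grid function onto the points of a 1D radial lattice, which is the kind of data motion that must cross between the two distributed evolutions.

  #include <cmath>
  #include <cstdio>
  #include <vector>

  struct Grid3D {
    int n; double x0, h;                  // n^3 points, origin (x0,x0,x0), spacing h
    std::vector<double> f;                // flattened grid function
    double at(int i, int j, int k) const { return f[(i * n + j) * n + k]; }
  };

  // Trilinear interpolation of g at the Cartesian point (x, y, z).
  double interp(const Grid3D& g, double x, double y, double z) {
    double fx = (x - g.x0) / g.h, fy = (y - g.x0) / g.h, fz = (z - g.x0) / g.h;
    int i = (int)std::floor(fx), j = (int)std::floor(fy), k = (int)std::floor(fz);
    double a = fx - i, b = fy - j, c = fz - k, v = 0.0;
    for (int di = 0; di <= 1; ++di)
      for (int dj = 0; dj <= 1; ++dj)
        for (int dk = 0; dk <= 1; ++dk)
          v += (di ? a : 1 - a) * (dj ? b : 1 - b) * (dk ? c : 1 - c) *
               g.at(i + di, j + dj, k + dk);
    return v;
  }

  int main() {
    Grid3D g{17, -2.0, 0.25, std::vector<double>(17 * 17 * 17)};
    for (int i = 0; i < g.n; ++i)
      for (int j = 0; j < g.n; ++j)
        for (int k = 0; k < g.n; ++k)
          g.f[(i * g.n + j) * g.n + k] =
              (g.x0 + i * g.h) + 2.0 * (g.x0 + j * g.h) + 3.0 * (g.x0 + k * g.h);
    // Sample the 3D data onto a 1D radial lattice along a fixed unit direction.
    const double nx = 0.6, ny = 0.48, nz = 0.64;
    for (int m = 0; m <= 4; ++m) {
      double r = 0.3 * m;
      std::printf("r = %.2f   f = %g\n", r, interp(g, r * nx, r * ny, r * nz));
    }
  }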
Moving Inner boundary
During the evolution, the black holes will move across the computational domain. Since the computations around the holes are more expensive, the data must be redistributed in order to achieve ``better'' load balancing.
In HPF there is no mechanism to do this.
In DAGH we are currently ignoring the load-balancing problems around the hole, although DAGH does provide a weight function for grid points, and this feature may be used (a sketch of such a per-point weight appears below).
DAGH does not have any special techniques to handle holes, so a large slowdown is expected on processors that own many boundary points.
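A per-point work estimate of the kind such a weight function needs is easy to write down; the C++ fragment below (invented numbers, a 1D mask, and our own names; not DAGH's interface) simply charges more for points that border the excised region, where the one-sided stencils and boundary conditions are applied.

  #include <cstdio>
  #include <vector>

  // Crude per-point cost estimate: excised points do no work, points next to
  // the hole pay for the extra boundary treatment.  The factor 4 is invented.
  double point_weight(const std::vector<int>& mask, int i) {
    if (!mask[i]) return 0.0;
    int n = (int)mask.size();
    bool near_hole = (i > 0 && !mask[i - 1]) || (i + 1 < n && !mask[i + 1]);
    return near_hole ? 4.0 : 1.0;
  }

  int main() {
    std::vector<int> mask{1, 1, 1, 0, 0, 1, 1, 1};
    double total = 0.0;
    for (int i = 0; i < (int)mask.size(); ++i) total += point_weight(mask, i);
    std::printf("total estimated work: %g\n", total);   // prints 12 for this mask
  }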
Finding the Apparent Horizon
We have two methods to solve this elliptic problem. One method expands the terms in spherical harmonics; the other solves the elliptic problem by finite differencing, using a Newton shooting method combined with a conjugate gradient solver.
In order to ``DAGH'' the second method, the algorithm must be completely rewritten, since it uses a ``non-parallel'' method, and DAGH has no support for linear matrix solvers.
In HPF the second method can make use of available parallel elliptic solvers.
The first method has been written in DAGH and has been used with one of the black hole evolution codes.
Elliptic Problem for the shift
During the 3D evolution, we will ``probably'' need to solve a coupled non-linear elliptic PDE, with holes cut out of the domain.
To solve this elliptic problem we want to use a multigrid solver (or a Multi-Level Adaptive Technique, MLAT, solver); a sketch of the basic V-cycle appears at the end of this section.
DAGH does support multigrid, but it does ``nothing special'' on the coarse grids, where the communication will outweigh the computation.
We have implemented multigrid in HPF for ``trivial'' problems, but problems with holes cut out of the domain will once again cause difficulties.
It is unclear how best to do multigrid on the AMR hierarchy. We can solve the full MLAT equations on the hierarchy, adding extra coarse grids beyond the coarsest grid in order to accelerate convergence, or we can solve the elliptic equation on each grid independently, using Dirichlet conditions taken from the parent grid.
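For reference, the basic cycle we have in mind is the standard V-cycle. The self-contained C++ sketch below (a textbook 1D Poisson solver with Gauss-Seidel smoothing, full-weighting restriction, and linear prolongation; all names and the model problem are ours, purely for illustration) shows the structure. The actual shift equations are 3D, coupled, non-linear, and have holes cut out of the domain, so this is only a schematic of the algorithm.

  #include <algorithm>
  #include <cmath>
  #include <cstdio>
  #include <vector>
  using Vec = std::vector<double>;

  // Gauss-Seidel sweeps for -u'' = f with zero Dirichlet boundaries.
  void smooth(Vec& u, const Vec& f, double h, int sweeps) {
    for (int s = 0; s < sweeps; ++s)
      for (size_t i = 1; i + 1 < u.size(); ++i)
        u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
  }

  Vec residual(const Vec& u, const Vec& f, double h) {
    Vec r(u.size(), 0.0);
    for (size_t i = 1; i + 1 < u.size(); ++i)
      r[i] = f[i] + (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h);
    return r;
  }

  void vcycle(Vec& u, const Vec& f, double h) {
    if (u.size() <= 3) { smooth(u, f, h, 20); return; }   // coarsest grid: "solve"
    smooth(u, f, h, 3);                                   // pre-smooth
    Vec r = residual(u, f, h);
    size_t nc = (u.size() - 1) / 2 + 1;
    Vec rc(nc, 0.0), ec(nc, 0.0);
    for (size_t i = 1; i + 1 < nc; ++i)                   // full-weighting restriction
      rc[i] = 0.25 * (r[2 * i - 1] + 2.0 * r[2 * i] + r[2 * i + 1]);
    vcycle(ec, rc, 2.0 * h);                              // recursive coarse-grid solve
    for (size_t i = 0; i < nc; ++i) u[2 * i] += ec[i];    // linear prolongation
    for (size_t i = 0; i + 1 < nc; ++i) u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1]);
    smooth(u, f, h, 3);                                   // post-smooth
  }

  int main() {
    const int n = 129;                                    // 2^7 + 1 points on [0,1]
    const double h = 1.0 / (n - 1), pi = 3.14159265358979;
    Vec u(n, 0.0), f(n, 0.0);
    for (int i = 0; i < n; ++i) f[i] = pi * pi * std::sin(pi * i * h);
    for (int c = 1; c <= 6; ++c) {
      vcycle(u, f, h);
      Vec r = residual(u, f, h);
      double rmax = 0.0;
      for (double v : r) rmax = std::max(rmax, std::fabs(v));
      std::printf("V-cycle %d: max residual = %.2e\n", c, rmax);
    }
  }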
Implementation
The problem requires support for Adaptive Mesh Refinement on multiprocessor machines.
The unigrid PDE solver can be implemented in a data-parallel paradigm, and an HPF implementation works very efficiently on systems such as the SP-2 or T3D. A parallel implementation of AMR in HPF is difficult at this time because the current generation of HPF compilers lacks adequate language features.
Implementation using a ``raw'' message-passing library such as MPI is very tedious.
Anticipating these problems, the Alliance developed the DAGH (Distributed Adaptive Grid Hierarchy) system.
DAGH System
The DAGH system provides a set of high-level abstractions that can be used to take an existing sequential unigrid PDE solver into the AMR regime on multiprocessor systems.
The user defines a Cartesian computational domain (the ``base grid'') and specifies parameters for creating a grid hierarchy. The list of parameters includes the maximum number of refinement levels, the refinement ratio, a switch between cell-centered and vertex-centered grids, and more.
The user provides routines to update the grid and the boundary conditions, and routines that provide data for testing the regridding criteria (typically a truncation error estimate; a sketch of such a criterion appears below).
DAGH provides a library of routines for inter-grid data movement (prolongation and restriction), and the user is allowed to replace them with routines that better match the problem at hand.
For parallel execution, the user specifies the size of the ghost regions.
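As a concrete example of such a regridding criterion, the C++ fragment below (illustrative only; not DAGH's interface, and the names are ours) flags points where a Richardson-style truncation error estimate, the difference between solutions obtained with spacings h and 2h, exceeds a tolerance.

  #include <cmath>
  #include <cstdio>
  #include <vector>

  // Flag points whose estimated truncation error |u_h - u_2h| exceeds tol.
  // u_2h has already been interpolated onto the fine grid.
  std::vector<int> flag_for_refinement(const std::vector<double>& u_h,
                                       const std::vector<double>& u_2h,
                                       double tol) {
    std::vector<int> flags(u_h.size(), 0);
    for (size_t i = 0; i < u_h.size(); ++i)
      if (std::fabs(u_h[i] - u_2h[i]) > tol) flags[i] = 1;
    return flags;
  }

  int main() {
    std::vector<double> u_h  {0.0, 0.10, 0.41, 0.90, 1.00};
    std::vector<double> u_2h {0.0, 0.11, 0.38, 0.91, 1.00};
    std::vector<int> flags = flag_for_refinement(u_h, u_2h, 0.02);
    for (size_t i = 0; i < flags.size(); ++i) std::printf("%d", flags[i]);
    std::printf("\n");                                     // prints 00100
  }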
Parallel Execution in DAGH
The DAGH system automatically decomposes all grids into ``grid components'' and distributes them onto the available processors. The algorithm used guarantees optimal load balance and alignment of grid components on different refinement levels defined over the same computational subdomain. Rebalancing the load after adaptive regridding requires only next-neighbor communications.
Each processor is assigned a set of grid components, so that the computations are performed in a data-parallel fashion. Interprocessor communication is facilitated by the prefetched ghost regions and a library of DAGH communication functions implemented in C++ on top of MPI (reductions, moving subregions defined by bounding boxes, etc.).
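The underlying communication pattern is ordinary ghost-region exchange. The sketch below performs it directly in MPI for a 1D block decomposition with one ghost point per side; it only illustrates the pattern that DAGH generates and schedules automatically, and all names and sizes are made up.

  #include <mpi.h>
  #include <cstdio>
  #include <vector>

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nlocal = 8;                              // interior points per processor
    std::vector<double> u(nlocal + 2, rank);           // one ghost point on each side
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    // Fill my right ghost u[nlocal+1] from the right neighbour's first interior
    // point, while sending my own first interior point u[1] to the left.
    MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                 &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    // Fill my left ghost u[0] from the left neighbour's last interior point,
    // while sending my own last interior point u[nlocal] to the right.
    MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 1,
                 &u[0], 1, MPI_DOUBLE, left, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    std::printf("rank %d: left ghost = %g, right ghost = %g\n",
                rank, u[0], u[nlocal + 1]);
    MPI_Finalize();
  }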
DAGH is implemented in C++, and the user routines, including the PDE solver, can be written in Fortran (particularly important when porting an existing code to DAGH). The Fortran routines take grid components (a flat set of arrays) as arguments. The price of this is that the DAGH communication library cannot be called from the Fortran routines, and all computations must be restricted to local data augmented by the prefetched ghost regions. All other interprocessor communication must be performed by the driving C++ routine.
As a consequence, the Fortran code must be partitioned into ``local access only'' fragments to be called from a driving C++ routine. Splitting the code in such a way may be tedious and error prone.
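Schematically, the split looks like the fragment below. The kernel is written here in C++ as a stand-in for the Fortran routine (names, extents, and the model update are ours; the real calling convention depends on the Fortran compiler), but the shape is the same: the driving routine hands a flattened grid component and its extents to a kernel that touches only local data plus the prefetched ghost zones.

  #include <cstdio>
  #include <vector>

  extern "C" void update_local(double* g, const int* nx, const int* ny) {
    // Stand-in kernel: one Gauss-Seidel sweep of Laplace's equation on the
    // local interior; ghost zones are read but never written.
    for (int j = 1; j < *ny - 1; ++j)
      for (int i = 1; i < *nx - 1; ++i)
        g[j * (*nx) + i] = 0.25 * (g[j * (*nx) + i - 1] + g[j * (*nx) + i + 1] +
                                   g[(j - 1) * (*nx) + i] + g[(j + 1) * (*nx) + i]);
  }

  int main() {
    const int nx = 6, ny = 5;                     // local extents, ghosts included
    std::vector<double> g(nx * ny, 1.0);          // one flattened grid function
    update_local(g.data(), &nx, &ny);             // the driver passes flat arrays only
    std::printf("g(2,2) = %g\n", g[2 * nx + 2]);
  }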
It is particularly inconvenient when advanced data structures are used in Fortran. For example, the black hole code, using Fortran 90 derived types, aggregates grid functions into 3-, 6-, 9- or 18-component tensors with specific symmetries built in. Here, the DAGH distributed grid hierarchy is flattened, sent to Fortran as a set of arrays, and the tensor structure is rebuilt. This operation must be repeated each time interprocessor communication is needed.
Fortunately, we have not found any noticeable performance penalty in doing this, and the burden of generating the calling sequences is removed by using custom-developed Perl scripts.
DAGH is not a complete language such as Fortran or C++ (and it was never meant to be), and therefore it has some limitations of expressiveness. On several occasions we found advanced features missing, such as support for interoperability of grid hierarchies of different dimensionality (3D interior evolution and 2D boundaries). These can be custom developed, though, using a mechanism resembling the HPF extrinsic procedure mechanism.
HPF & DAGH usage in the Binary Black Hole Grand Challenge
Fox, Haupt, Klasky
Goals of the Alliance
Computational infrastructure
ADM Equations
Causal Differencing
Apparent Horizon boundary conditions
Difficulties of the BBH problem in DAGH and HPF
Implementation
Goals of the Alliance
Develop a problem solving environment for Einstein's equations including a dynamical adaptive multilevel parallel infrastructure.
Provide controllable, convergent algorithms to compute gravitational waveforms which arise from black hole encounters, which are relevant to astrophysical events, and which may be used to predict signals for detection by ground- and space-based detectors (e.g., LIGO).
Provide representative examples of computed waveforms.
Products
A Parallel Adaptive Computational Infrastructure (DAGH), HPF.
Initial Value Solvers.
Outer Boundary Solvers.
3D Evolution Codes.
Coordinate Conditions.
Computational Infrastructure
To get accurate waveform extraction with stable evolution codes, the memory requirements of unigrid codes are very large.
If we use Adaptive Mesh Refinement, our memory requirements can be greatly reduced.
The grids will dynamically adapt to the equations, using a truncation error estimate.
DAGH (Distributed Adaptive Grid Hierarchy) is used to handle the memory management across parallel computers.
DAGH must also handle the solution of elliptic PDEs as well as hyperbolic PDEs.
We use finite difference techniques for the spatial and temporal derivatives.
HPF can also be used for problems which do not contain black holes.
ADM Equations
3+1 Splitting of Einstein's equations
We decompose the four-dimensional covariance of Einstein's equations such that it is possible to pose them as a Cauchy problem.
A relationship between the components of the metric of spacetime and those of the spatial 3-metric must now be defined.
We can look at the Arnowitt-Deser-Misner (ADM) form of the metric to see this:
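  ds^2 = -(\alpha^2 - \beta_i \beta^i)\, dt^2 + 2\,\beta_i\, dx^i\, dt + g_{ij}\, dx^i\, dx^j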
where the functions $\alpha$ and $\beta^i$ are called the lapse and shift functions, and $g_{ij}$ is the 3-metric.
$\alpha$ determines the lapse in proper time, $\beta^i$ describes the shift in the spatial coordinates, and $t^a$ is the tangent to a coordinate observer who, in moving through the spacetime, moves in a timelike direction by $\alpha\,dt$ and in a spacelike direction by $\beta^i\,dt$.
The extrinsic curvature $K_{ij}$ is a spacelike object which determines the contraction and shear of the normals to the spatial slices $\Sigma(t)$.
The evolution equations we solve are the vacuum ADM evolution equations for $g_{ij}$ and $K_{ij}$, which in their standard form read
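  \partial_t g_{ij} = -2\alpha K_{ij} + D_i \beta_j + D_j \beta_i

  \partial_t K_{ij} = -D_i D_j \alpha
                      + \alpha \left( R_{ij} + K K_{ij} - 2 K_{ik} K^k{}_j \right)
                      + \beta^k D_k K_{ij} + K_{ik} D_j \beta^k + K_{jk} D_i \beta^k

where $D_i$ is the covariant derivative compatible with $g_{ij}$, $R_{ij}$ is the Ricci tensor of the 3-metric, and $K = g^{ij} K_{ij}$.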
We have the ability to use either a 3-level non-staggered leapfrog scheme or a 2-level semi-iterative Crank-Nicolson scheme to evolve the equations forward in time.
In order to evolve the equations in a stable manner, we make use of causal differencing.
Causal Differencing
Causal differencing handles the shift terms (the advection along $\beta^i$) in the evolution equations.
The shift moves the world lines of the coordinate points through the spacetime and changes the coordinate basis vectors.
Our implementation is as follows (the steps combine into the schematic update written after the list):
Evaluate the RHS at time level n.
Compute shifted coordinates at levels n-1 and n.
Interpolate the RHSs to the new shifted coordinates at time level n.
Interpolate $g_{ij}$ and $K_{ij}$ at the shifted coordinates on time level n-1.
Carry out a leapfrog update.
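Schematically, for each evolved field $u \in \{g_{ij}, K_{ij}\}$ the steps above combine into the leapfrog update

  u^{n+1}(x) = u^{n-1}(\tilde{x}^{\,n-1}) + 2\,\Delta t\; (\partial_t u)^{n}(\tilde{x}^{\,n})

where $\tilde{x}^{\,n-1}$ and $\tilde{x}^{\,n}$ are the shifted coordinate locations from step 2, and the values on the right-hand side are obtained by the interpolations of steps 3 and 4.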
Advantages
2nd-order convergent and Courant stable for any shift.
Courant stability condition independent of shift.
Natural way to repopulate grid points flowing out of the AH.
Disadvantages
Innermost grid points computed by extrapolation, not interpolation.
Long-term stability is questionable.
Apparent Horizon boundary conditions
To achieve long-term stability we must cut out the singularity from the computational domain.
Previous black-hole codes produced long-term instabilities due to problems associated with grid points being ``sucked'' inside the singularity.
Cosmic censorship allows us to cut out the points inside the event horizon.
These points are causally disconnected from our universe and do not affect the other points in our universe.
Since we cannot find the event horizon for general black holes, we find the ``apparent'' horizon, which is defined by the equation for trapped surfaces; it is the outermost marginally trapped surface.
The apparent horizon is found as the solution of an elliptic equation:
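In terms of the outward unit normal $s^i$ of a closed 2-surface within the slice, the marginally trapped surface condition is

  \Theta \equiv D_i s^i + K_{ij} s^i s^j - K = 0

where $D_i$ is the covariant derivative of the 3-metric and $K = g^{ij} K_{ij}$; solving this equation for the surface gives the apparent horizon.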
Difficulties of the BBH Problem