CRPC Talk
HIGH PERFORMANCE FORTRAN: An Application Perspective
Tomasz Haupt
Northeast Parallel Architectures Center at Syracuse University
March 1995

High Performance Fortran Applications Project
- educational material
- support for application developers (HPF users group)
- testing and evaluation of the HPF compilers available at NPAC (APR, DEC, PGI)
- development and collection of HPF applications, including Grand Challenges:
  - 4-Dimensional Data Assimilation
  - Binary Black Hole Simulations
- language evaluation

How We Test Compilers
- Compiler-level suite (PARKBENCH):
  - AA, SH, ST, and IR kernels: communication detection in array assignments
  - TL: explicit templates
  - FL: FORALL statements that are difficult or inconvenient to write using Fortran 90 syntax
  - RD: nonelemental intrinsic functions (reductions)
  - AS, IT, IM: passing distributed arrays to subroutines
- Application kernels from the existing NPAC suite

Early Experience with the Compilers
- focus on robustness; performance is sometimes poor
- different implementation strategies (nonportable performance)
- different scope of the language currently implemented
- reasonable tools: debuggers and profilers
- better diagnostics would be helpful

HPF Applications
- ~50 applications collected, in various dialects of Fortran, including CM Fortran and HPF
- from different disciplines: physics, chemistry, biology, finance, and others
- different classes of algorithms: PDE solvers, FFT, Monte Carlo, clustering, multilevel paradigms, and more
- also irregular problems: motivating applications for HPF-2

A Few Representative Examples
- Option Pricing Model: Monte Carlo and binomial approximation models incorporating stochastic volatility (G. Cheng, NPAC)
- Gravitational Wave Extraction (4-dimensional scalar wave): characteristic initial value problem (R. Gomez, U. Pittsburgh; T. Haupt)
- ADI methods (Alternating Direction Implicit) (A.
Degani, NPAC)
- Acoustic Wave Propagation (K. Roe, T. Haupt): a system of coupled hyperbolic PDEs solved using the Lax method

More HPF Application Examples
- Region Growing / Clustering: split and merge (N. Copty, thesis, Sept. 1995, NPAC)

Four-Dimensional Data Assimilation (NASA Goddard / JPL / NPAC)
- The goal of four-dimensional data assimilation is to incorporate actual observations (satellite, ship, land surface, balloon) into mathematical and computational models in order to create a unified, complete description of the atmosphere.

Data Assimilation: Optimal Interpolation Analysis
- Optimal interpolation combines model predictions and observational data to produce a minimum-error representation of the state of the atmosphere.
- Two stages:
  - preprocessing of observational data to eliminate gross failures
  - actual injection of observational data into the computational grid of the numerical model
- HPF implementation (in progress):
  - Phase I: "fine grain" - all observational data are processed in parallel
  - Phase II: "coarse grain" - the computational domain is divided into subdomains ("mini-volumes"), and these subdomains are processed in parallel

Binary Black Hole Simulations
- The Alliance will produce an accurate, efficient description of the coalescence of black holes, and of the gravitational radiation emitted, by computationally solving Einstein's equations for gravitational fields, with direct application to the gravity-wave detection systems LIGO and VIRGO under construction in the USA and Europe.

BBH: Computational Challenge
- Problem size: analysis with a uniform grid
  - Required spatial resolution: 50 mesh points per black hole (radius of event horizon, R)
  - To extract gravitational waves, a space region of ~100 R is necessary
  - Number of mesh points: (50 x 100)^3 => ~10^11
  - Time evolution: 50,000 steps (corresponds to a distance of ~1000 R with dt = dx)
  - Total number of events: ~10^16
  - Floating-point operations per event: ~10^4
  - Total FLOP count: ~10^20 => 30 years on a Teraflop machine!
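The Lax method mentioned for the acoustic wave code above updates each grid point from the average of its neighbours plus a centered difference, which maps naturally onto HPF array syntax. The following is a minimal, hypothetical one-dimensional sketch, not the actual NPAC code; the variable names, grid size, and BLOCK distribution are illustrative assumptions:

```fortran
! Hedged sketch: one-dimensional Lax scheme for du/dt + c du/dx = 0
! in HPF/Fortran 90 array syntax.  All names are illustrative.
PROGRAM lax_sketch
  INTEGER, PARAMETER :: n = 1000
  REAL :: u(n), unew(n)
  REAL :: c = 1.0, dt = 0.001, dx = 0.01
  INTEGER :: step
!HPF$ DISTRIBUTE u(BLOCK)
!HPF$ ALIGN unew(:) WITH u(:)

  u = 0.0                       ! initial data would go here
  DO step = 1, 100
     ! Lax update: average of neighbours minus a centered difference.
     ! The compiler detects the shift communication in this assignment.
     unew(2:n-1) = 0.5*(u(3:n) + u(1:n-2)) &
                 - 0.5*(c*dt/dx)*(u(3:n) - u(1:n-2))
     u(2:n-1) = unew(2:n-1)     ! boundaries left fixed in this sketch
  END DO
END PROGRAM lax_sketch
```

Because the stencil references only nearest neighbours, a BLOCK distribution keeps most accesses local, with communication only at subdomain edges.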
- Solution: Adaptive Mesh Refinement (AMR) => one week on a Teraflop machine

Implementation of AMR in BBH
- architecture of AMR for both HPF and other implementations
- Fortran 90 module structure: we use the object-oriented as well as the parallel features of HPF

Data Structures in the F90 BBH Code
- see the previous foil for the definition of the module types

Parallelization of AMR (in HPF) -- I
- HPF currently does not support the general irregular distributions necessary for good AMR parallelization

Parallelization of AMR (in HPF) -- II
- the currently "allowed" distribution of all meshes over all processors is not optimal, but may be sufficient

HPF Experiences from BBH
- Sequential Fortran 90 allows for fast prototyping:
  - support for dynamic data structures
  - use of explicitly (data-)parallel constructs
- HPF imposes restrictions. Features not yet supported:
  - mapping of components of derived types
  - irregular block distribution
- It is crucial to adjust the mapping to a particular problem.

BBH Test Case: 2D Wave Equation
- two levels of refinement in HPF
- this is a simple prototype of the BBH activity

Our Overall HPF Evaluation Today
- If all you need is FORALL and reduction functions, HPF is there!
  - descriptive mapping is essential
  - "passing" relative alignment is even more important
- More realistically, you need "coarse-grain parallelism" as well:
  - PURE functions called from FORALL
  - INDEPENDENT DO
- Sooner or later, you will need to use the HPF library: sorting, packing, scanning, random number generators...
- Support for full Fortran 90 (including modules and derived types) makes programming easier.
- It is difficult to reimplement "dusty decks" in HPF. We need much more sophisticated compilers to achieve this.
- To get performance now, you must think HPF.
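The two "coarse-grain" features named in the evaluation, PURE functions called from FORALL and the INDEPENDENT directive, can be sketched as follows. This is a hedged illustration: all routine names, array shapes, and the (*,BLOCK) distribution are hypothetical, chosen only to show the constructs:

```fortran
! Hedged sketch of PURE-in-FORALL and INDEPENDENT DO.
! All names are hypothetical.
MODULE kernels
CONTAINS
  PURE REAL FUNCTION avg_flux(a, b)   ! PURE: legal to call from FORALL
    REAL, INTENT(IN) :: a, b
    avg_flux = 0.5*(a + b)
  END FUNCTION avg_flux
END MODULE kernels

PROGRAM coarse_grain
  USE kernels
  INTEGER, PARAMETER :: n = 512
  REAL :: u(n,n), f(n,n)
  INTEGER :: i, j
!HPF$ DISTRIBUTE u(*,BLOCK)
!HPF$ ALIGN f(:,:) WITH u(:,:)

  u = 1.0
  ! Fine grain: an elemental FORALL calling a PURE function.
  FORALL (i = 2:n-1, j = 1:n) f(i,j) = avg_flux(u(i-1,j), u(i+1,j))

  ! Coarse grain: INDEPENDENT asserts the iterations do not interfere,
  ! so whole columns can be processed in parallel on their home processors.
!HPF$ INDEPENDENT
  DO j = 1, n
     f(:,j) = f(:,j) / MAXVAL(u(:,j))   ! per-column reduction intrinsic
  END DO
END PROGRAM coarse_grain
```

Distributing the second dimension keeps each column on one processor, so the INDEPENDENT loop needs no communication at all.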