Given by Geoffrey C. Fox at CPS713 Case Studies in Computational Science, Spring Semester 1996. Foils prepared 15 March 1996
Summary of Material
This is the second of three foilsets on CFD and the NAS Benchmarks |
We describe spatial differencing, including numerical dissipation for stability |
General analysis of the resultant sparse matrix structure and iterative solvers |
Role of preconditioning and artificial time formulations |
Focus on the NAS ADI solution and its parallel solution with various distributions and their consequent communication costs |
Geoffrey Fox |
NPAC |
111 College Place |
Syracuse NY 13244-4100 |
Sections 3.4.2, 3.4.3 and 3.4.4 cover the basic spatial discretization |
We are using a uniform grid in the ξ, η, ζ directions with respective step sizes h_ξ, h_η, h_ζ |
These cover the unit cube in (ξ, η, ζ) space, with a total number of points that is, for the benchmark problems, the same N_NAS in each spatial direction |
We can use finite difference methods because of the uniform grid, and these involve such approximations as: |
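A representative second order central difference in, say, the ξ direction (an illustrative sketch of the kind of approximation meant here, not a transcription of the benchmark's formulas): |
\[
\left.\frac{\partial U}{\partial \xi}\right|_{i} \approx \frac{U_{i+1}-U_{i-1}}{2h_{\xi}},
\qquad
\left.\frac{\partial^{2} U}{\partial \xi^{2}}\right|_{i} \approx \frac{U_{i+1}-2U_{i}+U_{i-1}}{h_{\xi}^{2}}
\]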
These discretizations need to be extended for the general case, where the quantity differenced is not U itself but rather a function of U and its partial derivatives |
All the derivatives need to be modified near boundaries, where some of the needed points are missing |
NAS also adds a numerical dissipation term to improve the stability of the numerical solution. Previously the highest order derivative came from viscous terms like ∇²U |
The new term is proportional to ∇⁴U |
Note the negative sign, which gives dissipation, i.e. stability |
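Generically, a fourth difference dissipation term of this kind added along the ξ direction looks like the following (ε > 0 a small constant; the benchmark specifies its own coefficient): |
\[
-\varepsilon\left(U_{i-2}-4U_{i-1}+6U_{i}-4U_{i+1}+U_{i+2}\right)
\;\approx\;
-\varepsilon\,h_{\xi}^{4}\,\frac{\partial^{4} U}{\partial \xi^{4}}
\]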
Equations (3.25), (3.26) and (3.27) summarize the equations after spatial and temporal discretization |
We can write the right hand side of equation (3.26) as a matrix vector product |
The vector U_vec has N_vec = 5 N_ξ N_η N_ζ components and RHS is an N_vec by N_vec matrix. The forcing term H_vec is also a vector with N_vec components |
Formally we can write the full equations: |
If we had used the explicit Euler method, then the equations would have read simply: |
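Schematically, in the U_vec / RHS / H_vec notation above, the implicit scheme and its explicit Euler counterpart can be sketched as follows (a hedged reconstruction, not a transcription of equations (3.25)-(3.27)): |
\[
\text{implicit:}\qquad
\left(I-\Delta t\,\mathrm{RHS}\right)\Delta U^{n}_{\mathrm{vec}}
=\Delta t\left(\mathrm{RHS}\,U^{n}_{\mathrm{vec}}+H_{\mathrm{vec}}\right),
\qquad
U^{n+1}_{\mathrm{vec}}=U^{n}_{\mathrm{vec}}+\Delta U^{n}_{\mathrm{vec}}
\]
\[
\text{explicit Euler:}\qquad
\Delta U^{n}_{\mathrm{vec}}
=\Delta t\left(\mathrm{RHS}\,U^{n}_{\mathrm{vec}}+H_{\mathrm{vec}}\right)
\]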
The nonzero structure of a typical row of RHS is: |
with 5 nonzeros about the diagonal and two bands on each side of the diagonal, each band containing two nonzeros |
Each nonzero corresponds to a 5 by 5 block matrix (5 being the number of CFD labels) which is approximately "full" |
So each of the 5 N_NAS³ rows of RHS has 65 nonzero elements (13 nonzero blocks times 5 columns per block) |
This can be contrasted with simple differencing to solve the three dimensional Laplace equation ∇²ψ = 0, where the nonzero structure of a typical row is: |
with 7 nonzeros in each row of the underlying matrix |
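For reference, the standard 7 point stencil for ∇²ψ = 0 on a uniform grid with spacing h is: |
\[
\frac{\psi_{i+1,j,k}+\psi_{i-1,j,k}+\psi_{i,j+1,k}+\psi_{i,j-1,k}+\psi_{i,j,k+1}+\psi_{i,j,k-1}-6\,\psi_{i,j,k}}{h^{2}}=0
\]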
As with all nonlinear coupled differential equations, one has three distinct iterative processes in the solution of the NAS equations |
1) The first iteration is due to the nonlinearity of the equations: one linearizes the nonlinear CFD equations as in equation (3.17) by Taylor expansion, to make the equations linear in the unknown U^(n+1) |
2) The second iteration is that needed, for fixed n, to solve the implicit equations for ΔU^n governed by the matrix |
3) The third iteration is that in the index n, needed to evolve in time t. In our case, where time is somewhat artificial as we only need the steady state solution, one can view n as labelling the iterative solution of the steady state equations of CFD |
Both types of equations can be written in matrix form |
Elliptic or steady state hyperbolic-parabolic, as in steady state CFD |
Time dependent hyperbolic or parabolic |
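In matrix notation these two cases can be written as below (the sign conventions are an assumption, chosen to match the iteration matrix G = I + ω_faketime K quoted later): |
\[
\text{elliptic / steady state:}\quad K\,U=b,
\qquad
\text{time dependent:}\quad \frac{\partial U}{\partial t}=K\,U-b
\]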
We already showed in CPS615 how one could derive the Jacobi, Gauss-Seidel and SOR iterative schemes from artificial diffusion equations |
One derives iterative methods by splitting the matrix K of the steady state equation KU = b into two pieces |
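A sketch of the splitting (the notation K = M - N is an assumption; for Jacobi, M is the diagonal of K): |
\[
K=M-N,
\qquad
M\,U^{(m+1)}=N\,U^{(m)}+b
\;\Longleftrightarrow\;
U^{(m+1)}=U^{(m)}-M^{-1}\!\left(K\,U^{(m)}-b\right)
\]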
The matrix M corresponds to M_im(ex)plicit in a typical time dependent hyperbolic numerical solver |
Note that M is, in simple cases (Jacobi), diagonal, and is typically diagonally dominant, as in a hyperbolic solver |
We can write |
as the time discretization of: |
Putting Δt = -1, one gets Richardson's method, where ω_faketime is an acceleration or over relaxation parameter which is, up to a factor, the same as the over relaxation parameter ω in SOR |
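Concretely, Richardson's method can be sketched as the iteration below (a reconstruction consistent with the iteration matrix G quoted on the next foil): |
\[
U^{(m+1)}=U^{(m)}+\omega_{\mathrm{faketime}}\left(K\,U^{(m)}-b\right),
\qquad
G=I+\omega_{\mathrm{faketime}}\,K
\]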
This is identical, up to a scale factor, with that of relaxation for the Jacobi method, and one finds the plot of eigenvalues of the iteration matrix |
G = I + ω_faketime K as below |
The optimal choice of ω_faketime is 0.25 for a normalization where the diagonal terms of K for the Laplace equation are -4. This is identical to ω = 1 being optimal in the relaxation analysis of Jacobi Over Relaxation |
We can generalize the simple equations |
to the form |
with an arbitrary matrix P, since the left hand side vanishes in equilibrium |
This can be discretized in time to: |
with P playing the role of the implicit matrix in the NAS Benchmark |
Note that only the product ωΔt enters, i.e. the time step Δt plays the same role as the relaxation parameter ω |
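A hedged reconstruction of the generalized form and its time discretization (the exact signs are assumptions): |
\[
P\,\frac{\partial U}{\partial t}=K\,U-b
\quad\longrightarrow\quad
P\left(U^{(m+1)}-U^{(m)}\right)=\omega\,\Delta t\left(K\,U^{(m)}-b\right)
\]
so the left hand side indeed vanishes at the equilibrium K U = b, and only the combination ωΔt appears |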
The matrix form of preconditioning was to replace |
by the form: |
which has obvious similarities with the artificial time form: |
Exploiting the analogy between the preconditioning matrix and P, we can relate particular approaches to solving the implicit equations of CFD to particular preconditioning schemes for the steady state equations |
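In matrix form the replacement can be sketched as below, writing the preconditioning matrix as P to make the analogy explicit (an illustrative reconstruction): |
\[
K\,U=b
\quad\longrightarrow\quad
P^{-1}K\,U=P^{-1}b
\]
which mirrors the artificial time step P ΔU^(m) = ωΔt (K U^(m) - b) above: a good preconditioner P plays the role of a good implicit matrix |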
In Dongarra's template book and Hirsch, ADI (Alternating Direction Iteration) is referred to as a preconditioning scheme |
In our treatment of the NAS benchmarks, it is derived intuitively from the solution of the implicit equations |
ADI exploits the special form of the implicit matrix, which is |
one minus a sum of "small" terms -- corresponding to the separate discretizations in the ξ, η and ζ directions |
This is the approximate factorization (Beam-Warming) algorithm, with unnumbered equations at the start of section 3.4.7 of the NAS Benchmark |
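Writing the three one dimensional implicit pieces schematically as A_ξ, A_η, A_ζ (illustrative names, not the benchmark's own notation), the approximate factorization is: |
\[
I-\Delta t\left(A_{\xi}+A_{\eta}+A_{\zeta}\right)
\;\approx\;
\left(I-\Delta t\,A_{\xi}\right)\left(I-\Delta t\,A_{\eta}\right)\left(I-\Delta t\,A_{\zeta}\right)
\]
the neglected cross terms are of order Δt², consistent with the temporal accuracy of the scheme |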
This factorized operator can now be solved in three steps: |
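In the same illustrative notation, with ΔU* and ΔU** as intermediate results and the explicit right hand side written schematically as R: |
\[
\left(I-\Delta t\,A_{\xi}\right)\Delta U^{*}=R,
\qquad
\left(I-\Delta t\,A_{\eta}\right)\Delta U^{**}=\Delta U^{*},
\qquad
\left(I-\Delta t\,A_{\zeta}\right)\Delta U^{n}=\Delta U^{**}
\]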
Each step involves the solution of a 5 by 5 block tridiagonal matrix |
The matrix is tridiagonal -- not pentadiagonal -- as the numerical dissipation with ∇⁴U is only added to the explicit RHS term in this approximation |
The original nonfactorized equations were sparse matrices with bandwidth L = 2 N_NAS² + 1 |
Solving equations with M rows and columns and bandwidth L takes M L² operations |
This gives the original complexity, for M = N_NAS³, proportional to N_NAS⁷ (as M L² ≈ N_NAS³ × (2 N_NAS²)²) |
A tridiagonal matrix has bandwidth 3, and so the complexity of the ADI factorized equations is just proportional to M, i.e. N_NAS³ |
Effect of the 5 by 5 blocks: |
Here each symbol denotes a nonzero element of the tridiagonal matrix |
Each individual tridiagonal solve is not usefully parallelized |
However this is not important, as one has N_NAS² independent tridiagonal solves for each of the three sweeps |
For instance, for the equation |
with components labelled by (i,j,k) for the three ξ, η and ζ directions, the equations for each distinct value of (j,k) are independent |
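As an illustrative sketch (not the NAS benchmark's own code), the independence over (j,k) in the ξ sweep can be expressed as a doubly nested loop of banded solves; the names here are assumptions, SciPy's banded solver stands in for the benchmark's block tridiagonal routine, and for simplicity the unknowns are scalars rather than 5 by 5 blocks: |
```python
import numpy as np
from scipy.linalg import solve_banded

def adi_sweep_xi(lower, diag, upper, rhs):
    """One ADI sweep in the xi (i) direction.

    lower, diag, upper, rhs all have shape (N, N, N); for every fixed
    (j, k) the system along i is tridiagonal and independent of all
    other (j, k), so the double loop below is trivially parallel.
    """
    N = rhs.shape[0]
    out = np.empty_like(rhs)
    for j in range(N):
        for k in range(N):
            # Pack the three diagonals into SciPy's banded storage.
            ab = np.zeros((3, N))
            ab[0, 1:] = upper[:-1, j, k]    # super-diagonal
            ab[1, :] = diag[:, j, k]        # main diagonal
            ab[2, :-1] = lower[1:, j, k]    # sub-diagonal
            out[:, j, k] = solve_banded((1, 1), ab, rhs[:, j, k])
    return out
```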
We decompose the three dimensional grid over the N_proc nodes of a parallel machine |
There are the usual choices for this, depending on the shape of the regions stored in each processor |
Example 1: Distribute over j (η), with all values of i and k (ξ and ζ) for each j stored in a node |
Example 2: Distribute as cubic blocks, with all three indices i, j and k distributed |
This type of distribution is normally optimal for nearest neighbor problems, as it minimizes surface over volume effects |
No communication at sweep 1 (loop over i for fixed j) |
Sweep 2 has communication for each j value in the loop for fixed i |
Communication at each sweep, but only at every fourth index value |
Need to pipeline, as at the start only the edge processors are active and one needs to work from both ends towards the middle |
Communication is needed when one sweeps in a given index, say i, and that index is distributed, so the sweep crosses processor boundaries |
Note Algorithms for sequential tridiagonal solvers are given in the appendix of volume 1 of Hirsch |
For a scalar tridiagonal matrix with N rows, the solution takes time 5N |
For a block tridiagonal matrix, one increases this by a factor between 25 and 125 for 5 by 5 blocks. |
Note that communication is proportional to 5, i.e. the size of the vector being calculated, as one changes block size |
However calculation is proportional to 5² to 5³ as one changes block size, i.e. it increases faster with block size than communication |
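For reference, a minimal sketch of the Thomas algorithm for a scalar tridiagonal system (the kind of sequential solver referenced from Hirsch above); the small fixed number of operations per row in the two passes is the origin of the roughly 5N count quoted earlier. The function name is illustrative: |
```python
import numpy as np

def thomas_solve(lower, diag, upper, rhs):
    """Sequential scalar tridiagonal solve (Thomas algorithm).

    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i;
    lower[0] and upper[-1] are unused.
    """
    n = len(diag)
    c = np.empty(n)   # modified super-diagonal
    d = np.empty(n)   # modified right hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination: a handful of operations per row.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: two operations per row.
    x = np.empty(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```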
Consider a 64 by 64 by 64 grid with 64 processors |
Example 1: One Dimensional Decomposition -- each processor holds one j plane of 64 by 64 points |
Example 2: Two Dimensional Decomposition -- an 8 by 8 processor array, each processor holding a 64 by 8 by 8 block |
Example 3: Three Dimensional Decomposition -- a 4 by 4 by 4 processor array, each processor holding a 16 by 16 by 16 block |
The naive ratio of communication in the three distributions is: |
64 : 16 : 12 for One : Two : Three Dimensional Decompositions (one distributed sweep direction crossing 64 processors, versus 8 + 8, versus 4 + 4 + 4) |
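A small sketch of the naive counting behind this ratio (the counting model -- one crossing per processor along each distributed sweep direction -- is an assumption chosen to reproduce the quoted numbers): |
```python
def naive_comm(procs_per_direction):
    """procs_per_direction: processor counts along the (xi, eta, zeta) axes.

    A sweep along a direction needs communication only if that direction
    is split over more than one processor; it then crosses that many
    processors.
    """
    return sum(p for p in procs_per_direction if p > 1)

decompositions = {
    "1D (1 x 64 x 1)": (1, 64, 1),
    "2D (1 x 8 x 8)":  (1, 8, 8),
    "3D (4 x 4 x 4)":  (4, 4, 4),
}

for name, shape in decompositions.items():
    print(name, "->", naive_comm(shape))   # prints 64, 16 and 12
```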
Three dimensions also have lower pipelining start-up effects |
However -- certainly for the one dimensional case -- it is better to redistribute the data before a sweep that crosses processor boundaries |