Full HTML for Scripted Foilset CPS615 -- Linear Programming and Whirlwind Full Matrix Discussion

Given by Geoffrey C. Fox at Delivered Lectures of CPS615 Basic Simulation Track for Computational Science on 5 December 1996. Foils prepared 29 December 1996.


This lecture covers two distinct areas.
Firstly, a short discussion of Linear Programming -- what type of problems it is used for, what the equations look like, and the basic issues that make it difficult to parallelize
Then we give an abbreviated discussion of Full Matrix algorithms covering
  • The types of applications that use them
  • Matrix Multiplication including Cannon's algorithm in detail
  • Use of MPI primitives including communicator groups
  • Performance Analysis

Table of Contents for full HTML of CPS615-Linear Programming and Whirlwind Full Matrix Discussion

1 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of December 5 - 1996

2 Abstract of Dec 5 1996 CPS615 Lecture
3 5:Examples IV -- Linear Programming
4 46:Linear Programming
5 47:Convex Regions and Linear Programming
6 48:Matrix Formulation of Linear Programming
7 Review of Matrices seen in PDE's
8 Examples of Full Matrices in Chemistry
9 Operations used with Hamiltonian operator
10 Examples of Full Matrices in Electromagnetics
11 Computational Electromagnetics Formalism I
12 Computational Electromagnetics Formalism II
13 Comments on Computational Electromagnetics
14 Summary of Use of Full Matrices in Chemistry
15 Notes on the use of full matrices
16 Full Matrix Multiplication
17 Sub-block definition of Matrix Multiply
18 Some References
19 The First Algorithm
(Broadcast, Multiply, and Roll)

20 The first stage -- index n=0 in sub-block sum -- of the algorithm on N=16 example
21 The second stage -- n=1 in sum over subblock indices -- of the algorithm on N=16 example
22 Second stage, continued
23 Look at the whole algorithm on one element
24 Cartesian Topology in MPI -- General
25 Matrix Multiplication MPI Style Pseudocode
26 Matrix Multiplication Pseudocode, continued
27 Performance Analysis of Matrix Multiplication
28 Cannon's Algorithm for Matrix Multiplication
29 Parallel Decomposition




Foil 1 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 --
Lecture of December 5 - 1996

Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

Foil 2 Abstract of Dec 5 1996 CPS615 Lecture

This lecture covers two distinct areas.
Firstly, a short discussion of Linear Programming -- what type of problems it is used for, what the equations look like, and the basic issues that make it difficult to parallelize
Then we give an abbreviated discussion of Full Matrix algorithms covering
  • The types of applications that use them
  • Matrix Multiplication including Cannon's algorithm in detail
  • Use of MPI primitives including communicator groups
  • Performance Analysis

Foil 3 5:Examples IV -- Linear Programming

See Original Foil

Foil 4 46:Linear Programming

See Original Foil

Foil 5 47:Convex Regions and Linear Programming

See Original Foil

Foil 6 48:Matrix Formulation of Linear Programming

See Original Foil

Foil 7 Review of Matrices seen in PDE's

We have studied partial differential equations, for example Laplace's equation, and shown how they translate into matrix equations.
These matrices were "sparse": the differential operator only linked a few states (a few of the components of the unknown vector).
"Full" matrices are those with "essentially all" elements nonzero. More precisely, they are matrices where it is not worth exploiting the zeros.
  • i.e. treat zeros as "ordinary" numbers with a*0=0 and a+0=a, implemented using the floating point unit and not by omitting the computation!

Foil 8 Examples of Full Matrices in Chemistry

Full matrices come from "operators" which link several (~ all) of the basis states used in expressing a general state.
Examples from Chemistry:
The operator is the Hamiltonian H with matrix elements H_ij = <i|H|j>.
States can be labelled by many quantities:
  • positions of electrons
  • number of electrons
  • orbits of electrons
  • vibrational modes
  • chemical compound
  • "channel" etc.

Foil 9 Operations used with Hamiltonian operator

Often you want to find the eigenstates, i.e. a basis set with respect to which the operator H is diagonal. However, this is usually impossible.
Often one knows that H = H_0 + V;
for example, if the compound is a collection of molecules, then H_0 is the "free" Hamiltonian for the isolated molecules,
and V is the interaction, i.e. the forces between the atoms.
Simple states diagonalize H_0, but the matrix elements V_ij will be nonzero for "most" i and j.

Foil 10 Examples of Full Matrices in Electromagnetics

According to a survey by Edelman, the dominant use of "large" matrix inversion on supercomputers is the Method of Moments for Computational Electromagnetics. This method of reducing such problems to matrix equations was invented by Harrington at Syracuse University in about 1967.
Suppose we have an operator equation L φ = f,
and expand the unknown as φ = Σ_j a_j φ_j, where the φ_j are suitable expansion functions for which L φ_j can be calculated. Then L φ = Σ_j a_j (L φ_j) = f.
The easiest method would be to use eigenfunctions, with L φ_j = λ_j φ_j.

Foil 11 Computational Electromagnetics Formalism I

This would be a very important method, although one can rarely find the eigenfunctions.
For example, in "scattering problems", which are usual in electromagnetics, the eigenvalue spectrum is continuous: in free space, every plane wave e^{i k.x} is an eigenfunction, for any wavevector k, with an eigenvalue that varies continuously with k.
So we look at problems where φ is not an eigenfunction,
and we need to solve for the expansion coefficients a_j.

Foil 12 Computational Electromagnetics Formalism II

Choose "suitable" set of weight functions
becomes with
  • the vector a given by:
  • is matrix with matrix elements
  • is vector with coefficients
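To make this construction concrete, here is a small self-contained C sketch (an illustration of the recipe above, not code from the foils): a toy 1D integral operator (L φ)(x) = ∫ K(x,y) φ(y) dy over [0,1], with a smooth kernel standing in for the Green's function, pulse (piecewise constant) expansion functions, and point-matching weights w_i = δ(x - x_i). The resulting matrix A is full, exactly as the foils emphasize.

#include <stdio.h>
#include <math.h>

#define N 8                         /* number of expansion functions */

/* Toy smooth kernel; in electromagnetics this would be a Green's function. */
static double K(double x, double y) { return exp(-fabs(x - y)); }

/* Right-hand side f(x) of the operator equation L phi = f. */
static double f(double x) { return x; }

int main(void)
{
    double A[N][N], b[N], a[N];
    double h = 1.0 / N;

    /* Pulse basis phi_j on cell j, point-matching weights w_i:
       A_ij = (w_i, L phi_j) ~ K(x_i, y_j) * h  (midpoint rule),
       b_i  = (w_i, f) = f(x_i). Every A_ij is nonzero: a full matrix. */
    for (int i = 0; i < N; i++) {
        double xi = (i + 0.5) * h;
        for (int j = 0; j < N; j++) {
            double yj = (j + 0.5) * h;
            A[i][j] = K(xi, yj) * h;
        }
        b[i] = f(xi);
        a[i] = 0.0;
    }

    /* Solve A a = b by Gaussian elimination without pivoting
       (adequate here: this kernel gives a symmetric positive definite A). */
    for (int k = 0; k < N; k++)
        for (int i = k + 1; i < N; i++) {
            double l = A[i][k] / A[k][k];
            for (int j = k; j < N; j++) A[i][j] -= l * A[k][j];
            b[i] -= l * b[k];
        }
    for (int i = N - 1; i >= 0; i--) {     /* back substitution */
        a[i] = b[i];
        for (int j = i + 1; j < N; j++) a[i] -= A[i][j] * a[j];
        a[i] /= A[i][i];
    }

    for (int i = 0; i < N; i++) printf("a[%d] = %g\n", i, a[i]);
    return 0;
}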

Foil 13 Comments on Computational Electromagnetics

Read the literature (for example, Computer Physics Communications, Nov. 1991) for choices of φ_j and w_i. Clearly one will take as φ_j functions for which L φ_j can be easily calculated.
Comments:
  • With N expansion functions, the matrix A is full and the computational work of solving is O(N^3)
  • if instead we discretize with N grid points, the matrix is sparse:
    • with the best methods, the work is far lower (approaching O(N))
    • with the worst methods, the work climbs back toward the full O(N^3)
  • However, wave equations have "oscillating" solutions. These could be very hard to represent numerically with finite differences and grid points, but could be just fine if the oscillation is embodied in oscillating expansion functions.

Foil 14 Summary of Use of Full Matrices in Chemistry

Eigenvalues/vectors - to find "bound states", "equilibrium states" - for example, using MOPAC
Equation Solutions - for reasons similar to those just discussed
Multiplication to "change basis"

Foil 15 Notes on the use of full matrices

From the previous discussion, we see that matrix multiplication is actually very rarely used in scientific computing for large N. (Yet it's a favorite computer science algorithm!)
Full matrix equation solvers are sometimes used, but are not very common. Of course, equation solvers are incredibly important for sparse matrices, because if the matrix is large
  • the physics of the problem will make it sparse
  • it will be insoluble unless it is sparse
In solving A x = b, formally x = A^{-1} b, but computing the inverse is NOT normally the best numerical method.
Note that if A is sparse, both A^{-1} and the factors of the better LU decomposition are, in general, NOT sparse.

Foil 16 Full Matrix Multiplication

Suppose we want to multiply the matrices A and B together to form the matrix C:
    C = A B
We will assume all matrices are square - the algorithm can be generalized to deal with rectangular matrices.
The input matrices, A and B, are decomposed into rectangular sub-blocks.
If we have N processors, we have √N rows and √N columns of sub-blocks. This means that N should be a perfect square. (Of course this condition can be relaxed -- it is used for simplicity of discussion.)
One sub-block is assigned to each processor in a grid corresponding to the sub-block structure.
The algorithm ensures that the output matrix C has the same decomposition as A and B.
If A, B and C are M by M with M² = N m², and thus M = m √N,
then each processor stores an m by m sub-block.
Processors are labelled by a two-vector (l,k), l,k = 1 ... √N.
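As a quick bookkeeping sketch (my 0-based convention; the foils count from 1), here is the mapping from the processor label (l,k) to the global index range of its sub-block:

/* Sub-block ownership on a q x q process grid, q = sqrt(N), M = m*q.
   Processor (l,k) owns global rows l*m .. (l+1)*m - 1 and
   columns k*m .. (k+1)*m - 1 (0-based). */
typedef struct { int row0, col0, m; } SubBlock;

static SubBlock owned_block(int l, int k, int m)
{
    SubBlock s = { l * m, k * m, m };
    return s;
}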

Foil 17 Sub-block definition of Matrix Multiply

Let C_lk be the sub-block at position (l,k); then the problem can be stated in block matrix form:
    C_lk = Σ_n A_ln B_nk,  with the sum running over n = 1 ... √N
Example of the sub-block structure of matrix A where N = 16 (√N = 4):
    A11 A12 A13 A14
    A21 A22 A23 A24
    A31 A32 A33 A34
    A41 A42 A43 A44
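Each term of this block sum is an ordinary m by m matrix multiply-accumulate; a minimal C kernel for it (an illustration, assuming row-major storage) is:

/* m x m sub-block multiply-accumulate: C += A * B, row-major storage.
   This is the local kernel each processor runs once per stage. */
void block_madd(int m, const double *A, const double *B, double *C)
{
    for (int i = 0; i < m; i++)
        for (int k = 0; k < m; k++) {
            double a = A[i*m + k];          /* reuse across the j loop */
            for (int j = 0; j < m; j++)
                C[i*m + j] += a * B[k*m + j];
        }
}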

Foil 18 Some References

Fox, Johnson, Lyzenga, Otto, Salmon and Walker, "Solving Problems on Concurrent Processors", Vol. 1, Prentice-Hall, 1988, p. 167, for the broadcast, multiply, and roll algorithm.
Cannon, L., Ph.D. thesis, Montana State University, Bozeman, MT, 1969.
Demmel, Heath, and van der Vorst, "Parallel Numerical Linear Algebra", Acta Numerica, 1993, for a detailed survey of algorithms.
Agarwal, Gustavson, and Zubair, "A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication", IBM Journal of Research and Development, which discusses issues of overlapping computation and communication in block matrix multiply.

Foil 19 The First Algorithm
(Broadcast, Multiply, and Roll)

See Fox, Johnson, Lyzenga, Otto, Salmon and Walker, "Solving Problems on Concurrent Processors", Vol. 1, 1988

Foil 20 The first stage -- index n=0 in sub-block sum -- of the algorithm on N=16 example

Broadcast the diagonal sub-blocks A_ll of A along the rows to form the matrix T
Multiply sub-blocks and add into C: C_lk += T B_lk (= A_ll B_lk at this stage)
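In MPI this stage might look as follows -- a sketch under assumptions, not the foils' code: comm2drows is the per-row communicator introduced at Foil 26, my_col is the processor's column coordinate, and block_madd is the kernel shown at Foil 17.

#include <mpi.h>
#include <string.h>

void block_madd(int m, const double *A, const double *B, double *C);

/* One broadcast-and-multiply stage: within each process row, the current
   owner of the needed A sub-block broadcasts it into t; every processor
   in the row then accumulates C += T * B locally. */
void bcast_multiply_stage(MPI_Comm comm2drows, int my_col, int root, int m,
                          const double *a, const double *b, double *c,
                          double *t)
{
    if (my_col == root)
        memcpy(t, a, (size_t)m * m * sizeof(double));
    MPI_Bcast(t, m * m, MPI_DOUBLE, root, comm2drows);
    block_madd(m, t, b, c);   /* C += T * B */
}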

Foil 21 The second stage -- n=1 in sum over subblock indices -- of the algorithm on N=16 example

Roll B up one row (the picture shows the result after the roll, with the correct elements of B in place)
Broadcast the next sub-blocks of A along the rows
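The "roll" is a cyclic shift along the columns of the process grid; a minimal MPI sketch (my illustration, assuming the periodic Cartesian communicator comm2d set up as at Foil 24):

#include <mpi.h>

/* Roll an m x m block up one process row: send to the neighbour one row
   up, receive from the neighbour one row down; the periodic grid makes
   the shift wrap around. */
void roll_up(MPI_Comm comm2d, int m, double *block)
{
    int src, dst;
    MPI_Cart_shift(comm2d, 0, -1, &src, &dst);
    MPI_Sendrecv_replace(block, m * m, MPI_DOUBLE, dst, 0, src, 0,
                         comm2d, MPI_STATUS_IGNORE);
}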

Foil 22 Second stage, continued

Multiply sub-blocks and add into C

Foil 23 Look at the whole algorithm on one element


Foil 24 Cartesian Topology in MPI -- General

Under the new MPI Cartesian topology, there are two functions which convert between the logical process coordinates, such as [i,j] in our 2D topology, and the (possibly new) 1D rank number required by the send and receive functions.
MPI_CART_COORDS(comm2d, myrank, 2, coords)
takes a processor number myrank, in a 2-dimensional communicator called comm2d, and returns the 2-element vector coords corresponding to the [i,j] position of this processor in the new system.
MPI_CART_RANK(comm2d, [i,j], myrank)
takes a 2-element integer array with the logical process coordinates i and j, and returns the 1D processor rank in the variable myrank.
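A complete little C program exercising these calls (a sketch; the periodic flags anticipate the "roll" operations used by the multiply algorithm):

#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm comm2d;
    int nprocs, myrank, rank_back, q, coords[2];
    int dims[2], periods[2] = {1, 1};     /* periodic so rolls wrap around */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    q = (int)(sqrt((double)nprocs) + 0.5); /* assume N is a perfect square */
    dims[0] = dims[1] = q;

    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &comm2d);
    MPI_Comm_rank(comm2d, &myrank);

    MPI_Cart_coords(comm2d, myrank, 2, coords);  /* rank -> [i,j] */
    MPI_Cart_rank(comm2d, coords, &rank_back);   /* [i,j] -> rank */
    printf("rank %d is at [%d,%d]\n", rank_back, coords[0], coords[1]);

    MPI_Finalize();
    return 0;
}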

Foil 25 Matrix Multiplication MPI Style Pseudocode

Assume that the main program has set up the 2-dimensional grid structure (the original foil's pseudocode is not reproduced here; the sketches at Foils 24 and 26 show one way to realize it).

Foil 26 Matrix Multiplication Pseudocode, continued

Assume that each processor has the array coords, which gives its position in the processor grid, and that a "rows" communicator group is created:
  • MPI_Cart_coords(comm2d, myrank, 2, coords)
  • MPI_Cart_sub(comm2d, [F,T], comm2drows)
(the [F,T] argument drops the first dimension and keeps the second, so each resulting subgroup is one row of the processor grid)
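Putting the pieces together, the body of the multiply might read as below -- again a sketch under the stated assumptions, not the original pseudocode: q = √N, the grid was created with periodic boundaries, and block_madd and roll_up are the helpers sketched at Foils 17 and 21.

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

void block_madd(int m, const double *A, const double *B, double *C);
void roll_up(MPI_Comm comm2d, int m, double *block);

/* Broadcast-multiply-roll: q stages; at stage n, process row l needs the
   sub-block A_{l,(l+n) mod q}, whose owner sits at column (l+n) mod q and
   is therefore rank (l+n) mod q in the row communicator comm2drows. */
void bmr_matmul(MPI_Comm comm2d, MPI_Comm comm2drows, int q, int m,
                const double *a, double *b, double *c, const int coords[2])
{
    double *t = malloc((size_t)m * m * sizeof(double));
    for (int n = 0; n < q; n++) {
        int root = (coords[0] + n) % q;
        if (coords[1] == root)
            memcpy(t, a, (size_t)m * m * sizeof(double));
        MPI_Bcast(t, m * m, MPI_DOUBLE, root, comm2drows);
        block_madd(m, t, b, c);     /* C += T * B */
        roll_up(comm2d, m, b);      /* shift B up one process row */
    }
    free(t);
}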

Foil 27 Performance Analysis of Matrix Multiplication

Time of matrix multiply (per processor, √N stages, m by m blocks; t_float per floating point operation, t_comm per word communicated):
  • Time for pipe broadcast of A: ≈ √N m² t_comm
  • Time to roll B: √N m² t_comm
  • Time to calculate C: 2 √N m³ t_float
  • Total time: T ≈ 2 √N m³ t_float + 2 √N m² t_comm
Efficiency is given by ε = 1/(1 + f_C)
Overhead f_C = T_comm/T_calc, where:
  • If n = m² is the grain size, then f_C ≈ t_comm/(n^(1/2) t_float)
  • with the typical two dimensional character: overhead falls off as the square root of the grain size
  • overhead proportional to 1/n^(1/d) in d dimensions
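As a check of these bullets, here is the derivation in LaTeX (the factor-of-2 constants are my reconstruction, consistent with Fox et al. 1988, not read off the original foil):

\[
T_{\mathrm{calc}} = 2\sqrt{N}\,m^{3}\,t_{\mathrm{float}},\qquad
T_{\mathrm{comm}} \approx \sqrt{N}\,m^{2}\,t_{\mathrm{comm}}\ (\text{broadcast } A)
                 \;+\; \sqrt{N}\,m^{2}\,t_{\mathrm{comm}}\ (\text{roll } B)
\]
\[
f_{C} \;=\; \frac{T_{\mathrm{comm}}}{T_{\mathrm{calc}}}
      \;\approx\; \frac{t_{\mathrm{comm}}}{m\,t_{\mathrm{float}}}
      \;=\; \frac{t_{\mathrm{comm}}}{n^{1/2}\,t_{\mathrm{float}}}
      \quad (n=m^{2}),\qquad
\varepsilon = \frac{1}{1+f_{C}}.
\]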

Foil 28 Cannon's Algorithm for Matrix Multiplication

This is a matrix multiplication algorithm similar to the first, in that it assumes the block definition of matrix multiply
    C_lk = Σ_n A_ln B_nk
and the calculation of each C_lk is done in place (owner computes rule) by moving the pairs of blocks A_ln and B_nk to the processor in which C_lk resides.
Cannon's algorithm differs in the order in which the pairs of blocks are multiplied. We first "skew" both the A and the B matrix so that we can "roll" both A and B - A to the left and B to the top - to circulate the blocks needed in row l and column k to calculate C_lk.
Reference:
Ho, Johnsson and Edelman
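A hedged MPI sketch of Cannon's algorithm (my reconstruction of the scheme just described, reusing block_madd and the periodic grid comm2d; after the initial skew, processor (i,j) holds A_{i,(j+i) mod q} and B_{(i+j) mod q,j}, so each of the q multiply-roll stages pairs exactly the blocks needed for C_ij):

#include <mpi.h>

void block_madd(int m, const double *A, const double *B, double *C);

void cannon_matmul(MPI_Comm comm2d, int q, int m,
                   double *a, double *b, double *c, const int coords[2])
{
    int src, dst;

    /* Initial skew: shift row i of A left by i, column j of B up by j.
       (MPI_Cart_shift with a multi-step displacement wraps correctly
       because the grid is periodic.) */
    MPI_Cart_shift(comm2d, 1, -coords[0], &src, &dst);
    MPI_Sendrecv_replace(a, m * m, MPI_DOUBLE, dst, 0, src, 0,
                         comm2d, MPI_STATUS_IGNORE);
    MPI_Cart_shift(comm2d, 0, -coords[1], &src, &dst);
    MPI_Sendrecv_replace(b, m * m, MPI_DOUBLE, dst, 0, src, 0,
                         comm2d, MPI_STATUS_IGNORE);

    /* q stages: multiply local blocks, then roll A left and B up by one. */
    for (int n = 0; n < q; n++) {
        block_madd(m, a, b, c);

        MPI_Cart_shift(comm2d, 1, -1, &src, &dst);
        MPI_Sendrecv_replace(a, m * m, MPI_DOUBLE, dst, 0, src, 0,
                             comm2d, MPI_STATUS_IGNORE);
        MPI_Cart_shift(comm2d, 0, -1, &src, &dst);
        MPI_Sendrecv_replace(b, m * m, MPI_DOUBLE, dst, 0, src, 0,
                             comm2d, MPI_STATUS_IGNORE);
    }
    /* (A and B finish skewed/rolled; a final unskew would restore them.) */
}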

Foil 29 Parallel Decomposition

We must choose a decomposition for the matrix A which is load balanced throughout the algorithm, and which minimizes communication.
  • Contiguous blocks of rows or columns?
  • This won't work, since it is not load balanced. Once processing of a block of rows or columns is completed, the corresponding processor will have nothing to do. For example, in a decomposition into blocks of rows, when the computational window has reached the point shaded in the diagram (see the original foil), processor 0 is idle for the rest of the algorithm.

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
