Cornell Theory Center

MPI Example Programs: mm

Matrix Multiply


Table of Contents


Quick Summary

This example program illustrates a simple domain decomposition (by blocks of rows or columns).


Description of Problem

For the matrix multiplication A * B = C, each element in the result matrix C is formed by taking a row of A and a column of B, multiplying each element of the row with the corresponding element of the column (these must be the same length), and summing these values.

In C:

for (k=0; k<NCB; k++)
  for (i=0; i<NRA; i++) {
      c[i][k] = 0.0;
      for (j=0; j<NCA; j++)
      c[i][k] = c[i][k] + a[i][j] * b[j][k];
     }
In Fortran:
do 100 k=1, NCB
  do 100 i=1, NRA
    c(i,k) = 0.0
    do 100 j=1, NCA
      c(i,k) = c(i,k) + a(i,j) * b(j,k)
100  continue
NCB is the number of columns in matrix B (and C), NRA is the number of rows in A (and C), and NCA is the number of columns in A (and rows in B).


Parallel Implementation

This implementation of a parallel matrix multiply was chosen as a simple example of domain decomposition. There are much more efficient ways to program a matrix multiply.

For the C version, the problem is decomposed by assigning each task a number of consecutive rows of matrix A, and replicating matrix B on all tasks. Each task generates one or more rows of the result matrix C. Looking at the C code fragment above, the work is distributed according to the "i" index.

This decomposition is convenient because C stores matrices in row-major order (elements in the same row of the matrix are consecutive in storage). All message data is therefore contiguous. Each task has all the information needed for its part of the problem -- no intertask communication is necessary during the matrix multiply. In addition, since each row of A has to be multiplied against all columns of B, all data passed to each task is used by that task.

Since Fortran stores matrices in column-major order, the Fortran version is decomposed by replicating matrix A on all tasks, and distributing columns of matrix B to each task. Each task generates one or more columns of the result matrix. Looking at the Fortran code fragment, the work is distributed according to the "k" index.

The code is SPMD, i.e. each task runs the same executable. The task with taskid 0 is the master task, which distributes the matrices and collects results, but does not contribute to the calculation.


Instructions for Compiling and Running

  1. Copy the needed files to your working directory. You may copy the files from within your browser, or use the cp command. Some browsers add or replace characters when files are copied.

    Tutorial directory: /usr/local/doc/www/Edu/Tutor/MPI/Templates/mm/

    Fortran file: mm.f
    C file: mm.c

  2. Compile using the mpxlf or mpcc command.

  3. Specify how many nodes to run on (choose n <= 4):

  4. Execute the master program:
    or, to redirect output to a file:


Cleanup

Some or all of the following files will be left in your directory, and should be removed to save space:
[CTC Home Page] [Search] [Education] [Resources]
[Copyright Statement] [Feedback] [Tutorials] [IBM SP Documentation]

URL http://www.tc.cornell.edu/Edu/Tutor/
updated June 5, 1996