Full Matrix Multiplication

Short Description

Run Applet

Source Code

 

1. Fox's Algorithm

The purpose of this applet is to show how Fox's algorithm (Broadcast, Multiply, and Roll) works on a parallel machine. Fox's algorithm is a well-known, memory-efficient parallel algorithm for multiplying dense matrices. On a machine with N processors, the algorithm performs the multiplication in repeated stages, each consisting of a broadcast, a multiply, and a roll.

For a detailed description of the algorithm, see Fox, Johnson, Lyzenga, Otto, Salmon and Walker, "Solving Problems on Concurrent Processors", Vol. 1, 1988.

Take as an example a machine with N = 16 processors, arranged as a 4 x 4 grid. The first stage begins by broadcasting sub-blocks of A along the rows.
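As a rough illustration of the row broadcast, the sketch below labels each sub-block of A by its grid position and reports which one every processor holds after the broadcast. It assumes the usual formulation of Fox's algorithm, in which processor row i broadcasts the sub-block from column (i + stage) mod q, so that stage 0 broadcasts the diagonal. The function name `broadcast_along_rows` and the string labels are illustrative only and are not taken from the applet source.

```python
def broadcast_along_rows(a_blocks, stage):
    """Return the A sub-block each processor holds after the row broadcast.

    Assumption: in stage `stage`, processor row i broadcasts the sub-block
    stored in column (i + stage) mod q to every processor in that row.
    """
    q = len(a_blocks)
    received = []
    for i in range(q):
        source_col = (i + stage) % q
        # Every processor in row i receives the same sub-block.
        received.append([a_blocks[i][source_col] for _ in range(q)])
    return received


# Label each sub-block by its (row, column) position on a 4 x 4 grid.
q = 4
a_blocks = [[f"A{i}{j}" for j in range(q)] for i in range(q)]

stage0 = broadcast_along_rows(a_blocks, 0)
for row in stage0:
    print(row)
```

Calling the same function with `stage` set to 1, 2, or 3 gives the sub-blocks broadcast in the later stages.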

Then multiply the sub-blocks and add the products into C.
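The multiply-and-add step on a single processor can be sketched as follows. This is a minimal pure-Python version that assumes each sub-block is a square list of rows; `multiply_add` is a hypothetical helper name, not a function from the applet.

```python
def multiply_add(c_block, a_block, b_block):
    """Accumulate the product a_block * b_block into c_block, in place."""
    n = len(a_block)
    for r in range(n):
        for s in range(n):
            total = 0
            for t in range(n):
                total += a_block[r][t] * b_block[t][s]
            # Add to the running sum instead of overwriting it,
            # because every stage contributes one partial product.
            c_block[r][s] += total
    return c_block


c = [[0, 0], [0, 0]]
multiply_add(c, [[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c)
```

Because the result is accumulated, C must start at zero before the first stage.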

In the second stage, roll B up one row. The picture below shows the result after the roll, with the correct elements of B in place.
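The roll can be pictured as a cyclic upward shift of the rows of the processor grid. The sketch below assumes wrap-around, so the top row of B moves to the bottom; `roll_up` and the string labels are illustrative names only.

```python
def roll_up(b_blocks):
    """Shift every row of B sub-blocks up by one, wrapping the top row around."""
    q = len(b_blocks)
    # After the roll, processor row i holds what row i + 1 held before.
    return [list(b_blocks[(i + 1) % q]) for i in range(q)]


q = 4
b_blocks = [[f"B{i}{j}" for j in range(q)] for i in range(q)]
rolled = roll_up(b_blocks)
for row in rolled:
    print(row)
```

Applying `roll_up` q times returns B to its starting arrangement.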

Then broadcast the next sub-blocks of A along the rows.

Again, multiply the sub-blocks and add the products into C.

The third and fourth stages repeat the pattern of the second stage. After the fourth stage, the final result is stored in C.

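Assuming the program has already set up the 2-dimensional grid structure, the MPI-style pseudocode for this matrix multiplication is a loop over the stages, each performing a row broadcast of A, a local multiply-and-add into C, and an upward roll of B.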
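A minimal sketch of that structure is given below. It is simulated sequentially in Python rather than run under MPI, so it can be tried without a parallel machine. The comments note where an MPI program would typically call `MPI_Bcast` on a row communicator and `MPI_Sendrecv_replace` on a column communicator. All function names here (`fox_multiply`, `split_blocks`, `join_blocks`, `direct_multiply`) are hypothetical helpers written for this illustration and are not taken from the applet source.

```python
def multiply_add(c, a, b):
    """Accumulate the product a * b into c in place (square lists of rows)."""
    n = len(a)
    for r in range(n):
        for s in range(n):
            c[r][s] += sum(a[r][t] * b[t][s] for t in range(n))


def fox_multiply(a_blocks, b_blocks):
    """Simulate Fox's algorithm on a q x q grid of processors.

    a_blocks[i][j] and b_blocks[i][j] are the sub-blocks owned by the
    processor at grid position (i, j). Returns the grid of C sub-blocks.
    """
    q = len(a_blocks)
    bs = len(a_blocks[0][0])  # side length of one sub-block
    local_b = [[b_blocks[i][j] for j in range(q)] for i in range(q)]
    local_c = [[[[0] * bs for _ in range(bs)] for _ in range(q)]
               for _ in range(q)]

    for stage in range(q):
        for i in range(q):
            # Broadcast: under MPI this would be an MPI_Bcast on the row
            # communicator, rooted at column (i + stage) mod q.
            temp_a = a_blocks[i][(i + stage) % q]
            for j in range(q):
                # Multiply: local work on processor (i, j).
                multiply_add(local_c[i][j], temp_a, local_b[i][j])
        # Roll: under MPI this would be an MPI_Sendrecv_replace on the
        # column communicator, sending B up and receiving from below.
        local_b = [local_b[(i + 1) % q] for i in range(q)]
    return local_c


def split_blocks(m, q):
    """Cut a square matrix into a q x q grid of equal square sub-blocks."""
    bs = len(m) // q
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(q)] for i in range(q)]


def join_blocks(blocks):
    """Reassemble a grid of sub-blocks into one matrix."""
    q, bs = len(blocks), len(blocks[0][0])
    return [[blocks[i][j][r][s] for j in range(q) for s in range(bs)]
            for i in range(q) for r in range(bs)]


def direct_multiply(a, b):
    """Ordinary triple-loop product, used only as a reference."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# 8 x 8 matrices on a 4 x 4 grid, so each processor owns a 2 x 2 sub-block.
a = [[i + 2 * j for j in range(8)] for i in range(8)]
b = [[i * j - 3 for j in range(8)] for i in range(8)]
c = join_blocks(fox_multiply(split_blocks(a, 4), split_blocks(b, 4)))
print(c == direct_multiply(a, b))
```

Each processor keeps only its own sub-blocks of B and C plus one received sub-block of A at a time, which is what makes the scheme memory-efficient.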

 

2. How to use this applet

  1. Control panel
    Start, Next: Press the Start button to run the applet, and use the Next button to move to the next step.

    Slow, Fast: These buttons adjust the speed of the applet.

  2. Communication Steps: This panel helps users understand the communication between nodes by showing graphically which communications take place in each step of the algorithm.
  3. Example panel: This panel runs the algorithm on an actual example. Each box stands for one node of the parallel machine. Matrix C stores the result of multiplying A and B, while A and B display the current values passed from other nodes by the algorithm.
  4. Instruction panel: This panel gives information about the current step, along with short instructions.

 
