Full Matrix Multiplication
1. Fox's Algorithm
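The purpose of this applet is to show how Fox's algorithm (Broadcast, Multiply, and Roll) works on a parallel machine. Fox's algorithm is a well-known, memory-efficient parallel algorithm for multiplying dense matrices. On a machine with N processors arranged as a √N × √N grid, each holding one sub-block of A, B, and C, the algorithm repeats three steps √N times: broadcast a sub-block of A along each row, multiply it by the local sub-block of B and add the product into C, then roll B up one row.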
For a detailed description of the algorithm, see Fox, Johnson, Lyzenga, Otto, Salmon and Walker, "Solving Problems on Concurrent Processors", Vol. 1, 1988.
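As an example, take a machine with N = 16 processors arranged as a 4 × 4 grid. In the first stage, the diagonal sub-blocks of A are broadcast along the rows.
Then each processor multiplies the broadcast sub-block of A by its local sub-block of B and adds the product into C.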
In the second stage, B is rolled up one row, with the top row wrapping around to the bottom. The picture below shows the result after the roll, with the correct elements of B in place.
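The roll is a cyclic shift of the block rows of B. The following is a minimal sketch in Python, assuming a 4 × 4 grid and using text labels in place of real sub-blocks; the helper name `roll_up` and the labels are hypothetical and chosen only for illustration.

```python
q = 4  # grid side for N = 16 processors (assumption: square q x q grid)


def roll_up(grid):
    """Cyclic shift: every row moves up one place, the top row wraps to the bottom."""
    return grid[1:] + grid[:1]


# Label each sub-block of B by its original (row, column) position.
B = [[f"B{i}{j}" for j in range(q)] for i in range(q)]
rolled = roll_up(B)

for i, row in enumerate(rolled):
    # After one roll, processor row i holds what used to be block row (i + 1) mod q of B.
    print(f"processor row {i}: {' '.join(row)}")
```

After one roll, processor row 0 holds the blocks that started in row 1, and the blocks that started in row 0 sit in the bottom row. Applying the roll q times returns B to its starting layout.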
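Then the next sub-blocks of A, one column to the right of those used in the previous stage (wrapping around at the last column), are broadcast along the rows.
Again, each processor multiplies the sub-blocks and adds the product into C.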
The third and fourth stages are similar to the second stage. After the fourth stage, the final result is stored in the sub-blocks of C.
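To see why these stages suffice, the accumulation can be written out in block notation. The symbols here are introduced for illustration and are assumptions, not part of the original description: q = √N is the grid side (q = 4 for N = 16), indices are zero-based, k counts the stages, and A_{il}, B_{lj}, C_{ij} are the sub-blocks at grid position (row, column).

```latex
C_{ij} \;=\; \sum_{k=0}^{q-1} A_{i,\,(i+k) \bmod q}\; B_{(i+k) \bmod q,\; j}
       \;=\; \sum_{l=0}^{q-1} A_{il}\, B_{lj}
```

For a fixed row i, the index l = (i + k) mod q takes each value 0, ..., q-1 exactly once as k runs over the stages, so the staged sum equals the ordinary block matrix product.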
Assuming that the program has already set up the 2-dimensional grid structure, the MPI-style pseudocode for this matrix multiplication is:
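As a stand-in that runs without an MPI installation, the following is a minimal serial sketch in Python, not the MPI pseudocode itself. It simulates the processor grid with nested lists: the row broadcast becomes a lookup of the selected A block, and the roll becomes a rotation of the block rows of B. All function names are hypothetical, and the sketch assumes square matrices, a perfect-square processor count, and a matrix size divisible by the grid side.

```python
import math


def matmul(X, Y):
    """Plain dense product of two small matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def add_into(C, P):
    """Accumulate matrix P into matrix C in place."""
    for c_row, p_row in zip(C, P):
        for j, p in enumerate(p_row):
            c_row[j] += p


def split_blocks(M, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) sub-blocks."""
    b = len(M) // q
    return [[[row[j * b:(j + 1) * b] for row in M[i * b:(i + 1) * b]]
             for j in range(q)] for i in range(q)]


def join_blocks(G):
    """Inverse of split_blocks: stitch a grid of sub-blocks back together."""
    return [sum((blk[r] for blk in grid_row), [])
            for grid_row in G for r in range(len(grid_row[0]))]


def fox_multiply(A, B, n_procs):
    """Simulate broadcast, multiply, and roll on a sqrt(n_procs)-sided grid."""
    q = math.isqrt(n_procs)
    assert q * q == n_procs, "processor count must be a perfect square"
    assert len(A) == len(B) and len(A) % q == 0, "matrix size must divide evenly"
    b = len(A) // q
    a_blk = split_blocks(A, q)
    b_blk = split_blocks(B, q)
    c_blk = [[[[0] * b for _ in range(b)] for _ in range(q)] for _ in range(q)]
    for stage in range(q):
        for i in range(q):
            k = (i + stage) % q  # column of the A block that grid row i broadcasts
            for j in range(q):
                # Processor (i, j) multiplies the broadcast block by its local B block.
                add_into(c_blk[i][j], matmul(a_blk[i][k], b_blk[i][j]))
        b_blk = b_blk[1:] + b_blk[:1]  # roll B up one row, top row wraps around
    return join_blocks(c_blk)


n = 8
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
print(fox_multiply(A, B, 16) == matmul(A, B))  # compares against the direct product
```

In a real MPI program each processor would hold only its own sub-blocks, and the lookup and rotation would be replaced by communication within the row and column of the grid; the loop structure of the stages stays the same.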
2. How to use this applet
Start, Next: Press the Start button to run the applet, and use the Next button to move to the next step.
Slow, Fast: These buttons adjust the speed of this applet.