- Input to this program
4 files (file0, file1, file2, file3). The whole data matrix is distributed
across these 4 files, and each of the 4 processors reads in one of them.
This design is meant for cases where the data set is larger than one
processor can handle: by breaking the data up into files, we can scale
better and handle more data.
In case it is not yet clear: the reason we divide the data matrix into 4
files is that we assume each processor can hold only one file at a time
because of its limited memory size.
- Output of this program
4 files (file0.out, file1.out, file2.out, file3.out) containing the results
of the computation.
- Transpose step
   p0       p1       p2       p3
-------- -------- -------- --------
|  X   | |      | |      | |      |
|      | |      | |      | |      |
-------- -------- -------- --------
-------- -------- -------- --------
|      | |  X   | |      | |      |
|      | |      | |      | |      |
-------- -------- -------- --------
-------- -------- -------- --------
|      | |      | |  X   | |      |
|      | |      | |      | |      |
-------- -------- -------- --------
-------- -------- -------- --------
|      | |      | |      | |  X   |
|      | |      | |      | |      |
-------- -------- -------- --------
Suppose there are 4 processors. In the picture above, the square as a whole
represents the whole data matrix. It is divided into 4 files, and each
processor is in charge of one file. To transpose, each processor divides
its own data set into 4 blocks (4 being the number of processors). The
blocks marked 'X' stay where they are, which means they do not involve MPI
communication. The other 3 blocks on each processor must be sent to the
other processors to complete the transpose.
- Compile and run
mpicc trans.c -lm -o trans
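mpirun -np 4 -machinefile me trans
Here -np 4 starts 4 processes (one per input file), and "me" is the machine
file listing the hosts to run on.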