So here we can use a send and receive to achieve our upward roll: |
MPI_SENDRECV_REPLACE(B, m*m, MPI_REAL, top_neighbor, 0, sourceproc, 0, comm2d, status, ierr) |
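To make the effect of the roll concrete, here is a minimal serial sketch in C of what one column of the processor grid looks like after every processor in it has made the call above. It involves no MPI at all: the function name `roll_up` and the idea of standing in for each B sub-block with a single integer label are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Serial stand-in for one upward roll within a single grid column.
 * col[i] is a label for the B sub-block held by the processor in row i
 * (hypothetical representation: one int per sub-block, not real data).
 * After the call, row i holds what row i+1 held, and the last row holds
 * what row 0 held. That is the net effect of every processor sending its
 * block to the neighbor above and receiving from the neighbor below. */
void roll_up(int *col, size_t q)
{
    assert(col != NULL && q > 0);
    int first = col[0];                 /* block leaving the top row */
    for (size_t i = 0; i + 1 < q; ++i)
        col[i] = col[i + 1];            /* take the block from the row below */
    col[q - 1] = first;                 /* wrap around: top block goes to the bottom */
}
```

Applying `roll_up` q times to a column of q rows returns every block to where it started, which is why the algorithm finishes after q steps on a q-by-q grid.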
In the case where N=16, we choose the processor (sub-block) at position (2,1) in Fortran indexing, which is (1,0) in C, and follow the algorithm through all 4 steps. |
By default, MPI numbers its nodes in a one-dimensional sequence: processor numbers (ranks), as returned by the MPI_COMM_RANK function, range from 0 to n-1 and correspond to the physical processors. |
For this problem, we can use the MPI Cartesian coordinate functions to define and use a different topology of virtual processors. |
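As a rough illustration of what such a two-dimensional virtual topology provides, the serial C sketch below maps a rank to grid coordinates and finds the neighbors above and below with wraparound. It assumes a periodic q-by-q grid with row-major rank ordering, which is the ordering MPI uses for Cartesian communicators. In a real program these values would come from MPI_Cart_create, MPI_Cart_coords and MPI_Cart_shift; the type `GridCoord` and all function names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper type: zero-based (C-style) grid coordinates. */
typedef struct { int row; int col; } GridCoord;

/* Row-major mapping from rank to coordinates on a q-by-q grid. */
GridCoord rank_to_coord(int rank, int q)
{
    assert(q > 0 && rank >= 0 && rank < q * q);
    GridCoord c = { rank / q, rank % q };
    return c;
}

/* Inverse mapping; coordinates wrap around, as on a periodic grid. */
int coord_to_rank(GridCoord c, int q)
{
    assert(q > 0);
    int row = ((c.row % q) + q) % q;    /* keep the result in 0..q-1 */
    int col = ((c.col % q) + q) % q;
    return row * q + col;
}

/* Rank of the processor one row up (destination of an upward roll). */
int top_neighbor(int rank, int q)
{
    GridCoord c = rank_to_coord(rank, q);
    c.row -= 1;
    return coord_to_rank(c, q);
}

/* Rank of the processor one row down (source of an upward roll). */
int bottom_neighbor(int rank, int q)
{
    GridCoord c = rank_to_coord(rank, q);
    c.row += 1;
    return coord_to_rank(c, q);
}
```

For the 16-processor case, the sub-block at C position (1,0) is rank 4 under this mapping; its top neighbor is rank 0 and its bottom neighbor is rank 8, while rank 0 in the top row wraps around to rank 12 in the bottom row.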