integer status(MPI_STATUS_SIZE) ! An array to store status
integer mpierr, count, datatype, source, tag, comm
integer recvbuf(100)
source=MPI_ANY_SOURCE ! accept any source processor
tag=MPI_ANY_TAG ! accept any message tag
call MPI_RECV (recvbuf,count,datatype,source,tag,comm,status,mpierr)
Note source and tag can be wild-carded
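The wild-card rule can be pictured as a simple test on the message "envelope". The following is a minimal serial sketch in C, not MPI's actual matching code: the names ANY_SOURCE, ANY_TAG, Envelope and envelope_matches are hypothetical stand-ins, and the sentinel values are assumptions (the real constants come from mpi.h).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MPI_ANY_SOURCE and MPI_ANY_TAG;
   the real values are implementation-defined constants in mpi.h. */
enum { ANY_SOURCE = -1, ANY_TAG = -2 };

/* Envelope of a pending message: who sent it and with which tag. */
typedef struct {
    int source;
    int tag;
} Envelope;

/* Return true if a receive posted with (want_source, want_tag)
   would accept a message carrying the given envelope. */
bool envelope_matches(Envelope msg, int want_source, int want_tag)
{
    /* A message in flight always has a concrete source rank and tag. */
    assert(msg.source >= 0 && msg.tag >= 0);
    bool source_ok = (want_source == ANY_SOURCE) || (want_source == msg.source);
    bool tag_ok    = (want_tag == ANY_TAG) || (want_tag == msg.tag);
    return source_ok && tag_ok;
}
```

After a wild-carded receive completes, the status array is where one would look up the actual source and tag of the message that arrived.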
#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main( int argc, char **argv )
{
  char message[20];
  int i, rank, size, tag=137; /* Any value of type int allowed */
  MPI_Status status;
  MPI_Init (&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size); /* Number of Processors */
  MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* Who is this processor */
  if( rank == 0 ) { /* We are on "root" -- Processor 0 */
    strcpy(message,"Hello MPI World"); /* Generate message */
    for(i=1; i<size; i++) /* Send message to size-1 other processors */
      MPI_Send(message, (int)strlen(message)+1, MPI_CHAR, i, tag, MPI_COMM_WORLD);
  }
  else /* All other processors receive the message from root */
    MPI_Recv(message,20, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
  printf("This is a message from node %d saying %s\n", rank, message);
  MPI_Finalize(); /* Shut down MPI before exiting */
  return 0;
}
One can often perform computing during a collective communication
MPI_REDUCE performs reduction operation of type chosen from
maximum(value or value and location), minimum(value or value and location), sum, product, logical and/or/xor, bit-wise and/or/xor
e.g. the operation labelled MPI_MAX stores in location result of processor rank the global maximum of the values of original held in each processor, as in
call MPI_REDUCE(original,result,1,MPI_REAL,MPI_MAX,rank,comm,ierror)
One can also supply one's own reduction function
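To illustrate what a reduction function does, here is a serial sketch in C of a "maximum value and location" operator applied across processors. It is not MPI code: ValueLoc, ReduceOp, op_maxloc and reduce_all are hypothetical names, and the array index plays the role of the processor rank. Ties are resolved in favour of the lower rank, which is also how MPI_MAXLOC behaves.

```c
#include <assert.h>
#include <stddef.h>

/* A value together with the rank that owns it. */
typedef struct {
    float value;
    int   rank;
} ValueLoc;

/* A binary reduction operator: folds contribution a into acc. */
typedef void (*ReduceOp)(ValueLoc *acc, const ValueLoc *a);

/* Keep the larger value; on a tie keep the lower rank. */
void op_maxloc(ValueLoc *acc, const ValueLoc *a)
{
    if (a->value > acc->value ||
        (a->value == acc->value && a->rank < acc->rank)) {
        *acc = *a;
    }
}

/* Serial stand-in for a reduction: original[r] is the value held by
   rank r, and op is applied over ranks 0..nproc-1 in order. */
ValueLoc reduce_all(const float *original, int nproc, ReduceOp op)
{
    assert(original != NULL && op != NULL && nproc > 0);
    ValueLoc acc = { original[0], 0 };
    for (int r = 1; r < nproc; r++) {
        ValueLoc next = { original[r], r };
        op(&acc, &next);
    }
    return acc;
}
```

Passing the operator as a function pointer mirrors the idea of supplying one's own reduction function: the folding loop stays the same and only the operator changes.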
MPI_ALLREDUCE is as MPI_REDUCE but stores result in all -- not just one -- processors
MPI_SCAN performs reductions with result for processor r depending on data in processors 0 to r
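The dependence of processor r's result on processors 0 to r can be sketched serially. The C function below is a hypothetical stand-in (scan_sum is not an MPI routine) for an inclusive scan with the sum operation and one value per processor.

```c
#include <assert.h>
#include <stddef.h>

/* Serial stand-in for an inclusive sum scan, one value per processor:
   result[r] = original[0] + original[1] + ... + original[r]. */
void scan_sum(const int *original, int *result, int nproc)
{
    assert(original != NULL && result != NULL && nproc > 0);
    int running = 0;
    for (int r = 0; r < nproc; r++) {
        running += original[r];  /* include this processor's own value */
        result[r] = running;
    }
}
```

With original values 2, 5, 0, 6 on processors 0 to 3, the results are 2, 7, 7, 13; the last processor ends up with the same total a sum reduction would produce.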
Four Processors where each has a send buffer of size 2
0 1 2 3 Processors
(2,4) (5,7) (0,3) (6,2) Initial Send Buffers
MPI_BCAST with root=2
(0,3) (0,3) (0,3) (0,3) Resultant Buffers
MPI_REDUCE with action MPI_MIN and root=0
(0,2) (_,_) (_,_) (_,_) Resultant Buffers
MPI_ALLREDUCE with action MPI_MIN (no root: every processor receives the result)
(0,2) (0,2) (0,2) (0,2) Resultant Buffers
MPI_REDUCE with action MPI_SUM and root=1
(_,_) (13,16) (_,_) (_,_) Resultant Buffers
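The reduction rows above can be checked by hand: reductions act element by element across processors, so position 0 combines 2, 5, 0, 6 and position 1 combines 4, 7, 3, 2. A minimal serial sketch in C follows; the names NPROC, NBUF, BinaryOp and reduce_elementwise are hypothetical, and no MPI calls are made.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 4  /* number of processors in the example */
#define NBUF  2  /* send buffer size on each processor  */

typedef int (*BinaryOp)(int, int);

int op_min(int a, int b) { return a < b ? a : b; }
int op_sum(int a, int b) { return a + b; }

/* Combine the send buffers element by element:
   result[k] = send[0][k] op send[1][k] op ... op send[NPROC-1][k]. */
void reduce_elementwise(int send[NPROC][NBUF], int result[NBUF], BinaryOp op)
{
    assert(op != NULL);
    for (int k = 0; k < NBUF; k++) {
        result[k] = send[0][k];
        for (int p = 1; p < NPROC; p++)
            result[k] = op(result[k], send[p][k]);
    }
}
```

Applied to the send buffers (2,4) (5,7) (0,3) (6,2), op_min gives (0,2) and op_sum gives (13,16), matching the table. Which processors end up holding the result is what distinguishes the reduce and allreduce forms.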
Four Processors where each has a send buffer of size 2
0 1 2 3 Processors
(2,4) (5,7) (0,3) (6,2) Initial Send Buffers
MPI_SENDRECV with 0,1 and 2,3 paired
(5,7) (2,4) (6,2) (0,3) Resultant Buffers
MPI_GATHER with root=0
(2,4,5,7,0,3,6,2) (_,_) (_,_) (_,_) Resultant Buffers
Four Processors where only rank=0 has send buffer
(2,4,5,7,0,3,6,2) (_,_) (_,_) (_,_) Initial send Buffers
MPI_SCATTER with root=0
(2,4) (5,7) (0,3) (6,2) Resultant Buffers
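Gather and scatter are inverse data movements, which a serial sketch makes easy to see. In the C code below the row local[p] stands for processor p's buffer; scatter, gather, NPROC and CHUNK are hypothetical names for this illustration, not MPI routines.

```c
#include <assert.h>
#include <string.h>

#define NPROC 4  /* number of processors          */
#define CHUNK 2  /* items each processor receives */

/* Scatter: cut root's buffer into NPROC pieces of CHUNK items;
   piece p lands in the buffer of "processor" p. */
void scatter(const int root_buf[NPROC * CHUNK], int local[NPROC][CHUNK])
{
    assert(root_buf != NULL && local != NULL);
    for (int p = 0; p < NPROC; p++)
        memcpy(local[p], root_buf + p * CHUNK, CHUNK * sizeof(int));
}

/* Gather: concatenate every processor's buffer, in rank order,
   into root's buffer. */
void gather(int local[NPROC][CHUNK], int root_buf[NPROC * CHUNK])
{
    assert(root_buf != NULL && local != NULL);
    for (int p = 0; p < NPROC; p++)
        memcpy(root_buf + p * CHUNK, local[p], CHUNK * sizeof(int));
}
```

Scattering (2,4,5,7,0,3,6,2) and then gathering the pieces again returns the original root buffer.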
All to All Communication with i'th location in j'th processor being sent to j'th location in i'th processor
Processor 0 1 2 3
Start (a0,a1,a2,a3) (b0,b1,b2,b3) (c0,c1,c2,c3) (d0,d1,d2,d3)
After (a0,b0,c0,d0) (a1,b1,c1,d1) (a2,b2,c2,d2) (a3,b3,c3,d3)
The extension MPI_ALLTOALLV handles the case where data is stored in noncontiguous fashion in each processor and where each processor sends different amounts of data to the other processors
Many MPI routines have such "vector" extensions
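Seen as a whole, the basic all-to-all exchange is a transpose of the processor-by-location table. The serial C sketch below illustrates only the data movement; all_to_all and NPROC are hypothetical names, and the row send[p] stands for processor p's send buffer.

```c
#include <assert.h>

#define NPROC 4  /* number of processors, and items per buffer */

/* Location i of processor j is delivered to location j of processor i,
   i.e. recv[i][j] = send[j][i]. */
void all_to_all(int send[NPROC][NPROC], int recv[NPROC][NPROC])
{
    assert(send != recv);  /* in-place exchange is not handled by this sketch */
    for (int i = 0; i < NPROC; i++)
        for (int j = 0; j < NPROC; j++)
            recv[i][j] = send[j][i];
}
```

With rows a, b, c, d as the send buffers, row 0 of recv becomes (a0,b0,c0,d0), as in the table above.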
Derived Datatypes should be declared integer in Fortran and MPI_Datatype in C
A derived datatype generally has the form {(type0,disp0),(type1,disp1)...(type(n-1),disp(n-1))}, a list of primitive data types typei and displacements (from start of buffer) dispi
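As an illustration of such a list of (type, displacement) pairs, the C sketch below builds one for a small record using offsetof. The Particle record, the Primitive enum, TypeMapEntry and particle_typemap are all hypothetical names invented for this example; a real MPI program would pass comparable displacements to the MPI type constructors rather than store them in a struct like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record whose layout we want to describe. */
typedef struct {
    int    id;
    double mass;
    char   flag;
} Particle;

/* Stand-ins for primitive datatypes. */
typedef enum { T_INT, T_DOUBLE, T_CHAR } Primitive;

/* One (type, displacement) pair of the type map. */
typedef struct {
    Primitive type;
    size_t    disp;  /* bytes from the start of the buffer */
} TypeMapEntry;

/* Fill map with the pairs describing Particle; returns the entry count. */
int particle_typemap(TypeMapEntry *map, int capacity)
{
    assert(map != NULL && capacity >= 3);
    map[0] = (TypeMapEntry){ T_INT,    offsetof(Particle, id)   };
    map[1] = (TypeMapEntry){ T_DOUBLE, offsetof(Particle, mass) };
    map[2] = (TypeMapEntry){ T_CHAR,   offsetof(Particle, flag) };
    return 3;
}
```

The displacements depend on the compiler's padding rules, which is why they are computed with offsetof rather than written down by hand.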
call MPI_TYPE_CONTIGUOUS (count, oldtype, newtype, ierr)
creates a new datatype newtype made up of count repetitions of old datatype oldtype
one must use call MPI_TYPE_COMMIT(derivedtype,ierr)
before one can use the type derivedtype in a communication call
call MPI_TYPE_FREE(derivedtype,ierr) frees up space used by this derived type
integer derivedtype, ...
call MPI_TYPE_CONTIGUOUS(10, MPI_REAL, derivedtype, ierr)
call MPI_TYPE_COMMIT(derivedtype, ierr)
call MPI_SEND(data, 1, derivedtype, dest,tag, MPI_COMM_WORLD, ierr)
call MPI_TYPE_FREE(derivedtype, ierr)
is equivalent to simpler single call
call MPI_SEND(data, 10, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)
and each sends 10 contiguous real values at location data to process dest
integer North,South,East,West
! These are the processor ranks of 4 nearest neighbors
integer rowtype,coltype ! the new derived types
! Fortran stores elements in columns contiguously
! (C has opposite convention!)
! assumed definition of coltype: NLOC contiguous reals
call MPI_TYPE_CONTIGUOUS(NLOC, MPI_REAL, coltype, ierr)
call MPI_TYPE_COMMIT(coltype,ierr)
! rows (North and South) are not contiguous
call MPI_TYPE_VECTOR(NLOC, 1, NLOC2, MPI_REAL, rowtype, ierr)
call MPI_TYPE_COMMIT(rowtype,ierr)
call MPI_SEND(array(2,2), 1, coltype, west,0,comm,ierr)
call MPI_SEND(array(2,NLOC+1), 1, coltype, east,0,comm,ierr)
call MPI_SEND(array(2,2), 1, rowtype, north, 0,comm,ierr)
call MPI_SEND(array(NLOC+1,2), 1, rowtype, south, 0,comm,ierr)
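What a strided row type selects from memory can be sketched serially. The C function below copies the elements that a vector-style layout (count blocks of blocklength elements, block starts stride elements apart) would pick out of a column-major array. pack_vector is a hypothetical helper, not an MPI routine, and it assumes stride >= blocklength, which is narrower than what MPI_TYPE_VECTOR permits.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count blocks of blocklength elements into packed, where block b
   starts at start[b * stride]. Returns the number of elements copied. */
int pack_vector(const float *start, int count, int blocklength,
                int stride, float *packed)
{
    assert(start != NULL && packed != NULL);
    assert(count >= 0 && blocklength >= 0 && stride >= blocklength);
    int n = 0;
    for (int b = 0; b < count; b++)
        for (int k = 0; k < blocklength; k++)
            packed[n++] = start[b * stride + k];
    return n;
}
```

For a column-major array with leading dimension 4, a row is count=4, blocklength=1, stride=4 starting at the row's first element, while a column is a single contiguous block of 4.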
The example uses three idealized (not present in real message passing system) primitives
Message Passing Shift to right MPSHIFT (+1, SOURCE, DEST)
Sends 1 word in location SOURCE to processor on the right
Receives word in location DEST from the processor on the left
SOURCE and DEST are locations -- if set to "DUMMY", then no information is to be sent or received
Message Passing Shift to left MPSHIFT (-1, SOURCE, DEST)
Sends 1 word in SOURCE to processor on the left
Receives word in DEST from processor on the right
The third primitive is a global maximum, which takes TEST from all processors
forms TESTMAX = maximum value of TEST over all processors
replaces TEST by TESTMAX in all processors
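The effect of the three idealized primitives can be sketched serially, with array index p standing for processor p. The C code below is an illustration under that assumption; mpshift, globalmax and NPROC are hypothetical names, and the "DUMMY" case at the ends of the processor line is modelled by leaving dest untouched.

```c
#include <assert.h>

#define NPROC 4  /* processors in a one dimensional line */

/* shift = +1: each processor sends source to its right neighbour;
   shift = -1: each processor sends source to its left neighbour.
   A processor with no sending neighbour keeps its old dest value. */
void mpshift(int shift, const float source[NPROC], float dest[NPROC])
{
    assert(shift == 1 || shift == -1);
    assert(source != dest);
    for (int p = 0; p < NPROC; p++) {
        int from = p - shift;  /* the processor whose word arrives at p */
        if (from >= 0 && from < NPROC)
            dest[p] = source[from];
    }
}

/* Replace test[p] on every processor by the maximum over all processors. */
void globalmax(float test[NPROC])
{
    float testmax = test[0];
    for (int p = 1; p < NPROC; p++)
        if (test[p] > testmax)
            testmax = test[p];
    for (int p = 0; p < NPROC; p++)
        test[p] = testmax;
}
```

In an iterative solver, the shifts would typically carry boundary values between neighbours and the global maximum would carry the convergence test.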
Let PROCNUM be processor "rank" (number or order in one dimensional decomposition)
IF (IPAR.EQ.0) THEN ! even processors
CALL MPI_RECV (DEST,1,MPI_REAL, PROCNUM-1,tag,comm,status,ierr)
ELSE ! odd processors
Note: MPI uses the reserved constant MPI_PROC_NULL rather than "DUMMY". Also, we could remove the setting of "DUMMY" from the calling routine and place it in MPSHIFT as a test on PROCNUM
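The test on PROCNUM mentioned in the note could look like the following C sketch. PROC_NULL is a hypothetical sentinel standing in for MPI_PROC_NULL (whose actual value is implementation-defined), and shift_partner is an invented helper name.

```c
#include <assert.h>

/* Hypothetical stand-in for MPI_PROC_NULL. */
enum { PROC_NULL = -1 };

/* Rank of the processor at distance offset from procnum in a one
   dimensional, non-periodic line of nproc processors, or PROC_NULL
   if that position falls off either end. */
int shift_partner(int procnum, int nproc, int offset)
{
    assert(nproc > 0 && procnum >= 0 && procnum < nproc);
    int partner = procnum + offset;
    return (partner >= 0 && partner < nproc) ? partner : PROC_NULL;
}
```

Since MPI treats a send to or receive from MPI_PROC_NULL as an operation with no effect, the end processors can then issue the same calls as the interior ones.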
We can implement MPSHIFT (shown here for the shift to the right) directly in MPI as
CALL MPI_SENDRECV(SOURCE,1,MPI_REAL, PROCNUM+1, sendtag, DEST,1,MPI_REAL, PROCNUM-1, recvtag,comm,status,ierr)
MPI_REAL denotes that the variable is real
"sendtag" and "recvtag" are, for this purpose, largely irrelevant additional message tags
"comm" is the communicator, an extra system message tag defining "scope" -- i.e. the set of processors involved -- here it is all of them
"status" tells you about received data. You needn't look at it if you trust your code and hardware