Basic MPI Point-to-Point Communication in C

The MPI World

In MPI, the group of all nodes used in a particular MPI application is the "world," represented by the predefined communicator MPI_COMM_WORLD. A communicator provides all the information necessary to do message-passing within its group: it constrains message-passing to remain within the group and prevents nodes outside the group from receiving spurious messages. Each node in the group is identified by a unique integer "rank." If the total number of nodes in the group is n, the ranks are numbered 0, 1, 2, ..., n-1.

Starting and Ending MPI

Every MPI program must contain one and only one call to MPI_Init, which starts up MPI and enables communication. The format of the MPI_Init call is:
     MPI_Init(&argc, &argv);
MPI_Init must be called before any other MPI routine is called.

Before exiting an MPI process, MPI_Finalize should be called to clean up the MPI state. The user must ensure that all pending communications involving a process complete before the program calls MPI_Finalize. This call has the following format:

     MPI_Finalize();
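
A minimal sketch of this startup/shutdown pattern (the printf message is illustrative):

     #include <mpi.h>
     #include <stdio.h>

     int main(int argc, char *argv[])
     {
        MPI_Init(&argc, &argv);    /* must precede any other MPI call */
        printf("MPI is up\n");
        MPI_Finalize();            /* no MPI calls may follow this */
        return 0;
     }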
     

Rank & Size

Rank, an integer, is the process identifier MPI associates with a communicator. The MPI call MPI_Comm_rank is used to obtain the calling process's rank. The format of calling MPI_Comm_rank is given by:
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

where MPI_COMM_WORLD is the communicator predefined in the file mpi.h, and rank (an integer) is the rank identifying the process making the call.

To find out the size of the communication world, one can call MPI_Comm_size. The format of calling MPI_Comm_size is given by:
     MPI_Comm_size(MPI_COMM_WORLD, &size);

where size (an integer) is the total number of processes in MPI_COMM_WORLD.
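
For example, a short fragment (the printf message is illustrative) that reports each process's identity:

     int rank, size;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);
     printf("This is process %d of %d\n", rank, size);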

Point-to-Point Communication

The most elementary form of message-passing communication involves two nodes, one passing a message to the other. Although there are several ways that this might happen in hardware, logically the communication is point-to-point: one node calls a send routine and the other calls a receive.

A message sent from a sender contains two parts: the data (message content) and the message envelope. The data part of the message consists of a sequence of successive items of the type indicated by the variable datatype. MPI supports all the basic C datatypes and allows a more elaborate application to construct new datatypes at runtime (discussed in an advanced topics tutorial). The basic MPI datatypes for C include MPI_CHAR, MPI_INT, MPI_FLOAT, and MPI_DOUBLE. The message envelope contains information such as the source (sender), destination (receiver), tag, and communicator.

As with most existing message-passing systems today, MPI provides blocking send and receive as well as nonblocking send and receive. We will introduce and discuss both blocking and nonblocking communication in this tutorial.

Blocking Send and Receive

  1. Blocking Send Operation:

    Below is the syntax of the blocking send operation:

         MPI_Send(void* buf, int count, MPI_Datatype datatype,
                  int dest, int tag, MPI_Comm comm);

    where buf is the address of the send buffer, count is the number of items in the send buffer, datatype (MPI_INT, MPI_FLOAT, MPI_CHAR, etc.) describes the type of these items, dest is the rank of the destination node, tag is the message type identifier, and comm is the communicator to be used (see MPI advanced topics for more information on communicators).

    The blocking send call does not return until the message has been safely stored away so that the sender can freely reuse the send buffer.

  2. Blocking Receive Operation:

    The MPI blocking receive call has the form:

         MPI_Recv(void* buf, int count, MPI_Datatype datatype,
                  int source, int tag, MPI_Comm comm,
                  MPI_Status *status);

    where all the arguments have the same meaning as for the blocking send except for source and status. Source is the rank of the node that sent the message, and status is an MPI-defined structure of type MPI_Status that records information about the received message (such as its actual source and tag). The information carried by status can be used in other MPI routines.

    The blocking receive does not return until the message has been stored in the receive buffer.

  3. Order:

    Messages are non-overtaking: if a sender sends two messages in succession to the same destination and both match the same receive, then the receive cannot take the second message while the first is still pending. Likewise, if a receiver posts two receives in succession and both match the same message, then the message cannot satisfy the second receive as long as the first one is still pending. This requirement facilitates matching sends to receives. It guarantees that message-passing code is deterministic if processes are single-threaded and the wildcard MPI_ANY_SOURCE is not used in receives.

  4. Progress:

    If a pair of matching send and receive operations has been initiated on two processes, then at least one of these two operations will complete, independent of other actions in the system. The send operation will complete unless its matching receive is satisfied by another message; the receive operation will complete unless the message sent is consumed by another matching receive posted at the same destination process.

  5. Avoid a Deadlock:

    It is possible to get into a deadlock situation if one uses blocking send and receive. Here is a fragment of code to illustrate the deadlock situation:

         MPI_Comm_rank(comm, &rank);
         if (rank == 0) {
            MPI_Recv(recvbuf, count, MPI_FLOAT, 1, tag, comm, &status);
            MPI_Send(sendbuf, count, MPI_FLOAT, 1, tag, comm);
         }
         else if (rank == 1) {
            MPI_Recv(recvbuf, count, MPI_FLOAT, 0, tag, comm, &status);
            MPI_Send(sendbuf, count, MPI_FLOAT, 0, tag, comm);
         }
         
    The receive operation of the first process must complete before its send, and can complete only if the matching send of the second process is executed. The receive operation of the second process must complete before its send and can complete only if the matching send of the first process is executed. This program will always deadlock. To avoid deadlock, one can use one of the following two examples:
         MPI_Comm_rank(comm, &rank);
         if (rank == 0) {
            MPI_Send(sendbuf, count, MPI_FLOAT, 1, tag, comm);
            MPI_Recv(recvbuf, count, MPI_FLOAT, 1, tag, comm, &status);
         }
         else if (rank == 1) {
            MPI_Recv(recvbuf, count, MPI_FLOAT, 0, tag, comm, &status);
            MPI_Send(sendbuf, count, MPI_FLOAT, 0, tag, comm);
         }
         
    or
         MPI_Comm_rank(comm, &rank);
         if (rank == 0) {
            MPI_Recv(recvbuf, count, MPI_FLOAT, 1, tag, comm, &status);
            MPI_Send(sendbuf, count, MPI_FLOAT, 1, tag, comm);
         }
         else if (rank == 1) {
            MPI_Send(sendbuf, count, MPI_FLOAT, 0, tag, comm);
            MPI_Recv(recvbuf, count, MPI_FLOAT, 0, tag, comm, &status);
         }
         

A Minimum Set of MPI Calls

Here are the six functions introduced so far:
Function        Explanation
-----------------------------------------------------
MPI_Init        Initiate MPI
MPI_Comm_size   Find out how many processes there are
MPI_Comm_rank   Determine rank of the calling process
MPI_Send        Send a message
MPI_Recv        Receive a message
MPI_Finalize    Terminate MPI
With these six calls, you can write a vast number of useful and efficient programs.

Other functions in MPI not yet introduced all add flexibility, robustness, efficiency, modularity, and convenience. It is a good practice, however, for those who are just beginning to learn message-passing programming to ignore other more advanced MPI functions and concepts and to concentrate first on these six basic functions.
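
As an illustration, here is a minimal sketch of a complete program built from only these six calls; the tag and payload values are arbitrary choices for the example, and it assumes the program is run with at least two processes. Process 0 sends one integer to process 1, which prints it:

     #include <mpi.h>
     #include <stdio.h>

     int main(int argc, char *argv[])
     {
        int rank, size, value, tag = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
           value = 42;    /* arbitrary payload */
           MPI_Send(&value, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        }
        else if (rank == 1) {
           MPI_Recv(&value, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
           printf("Process 1 of %d received %d\n", size, value);
        }

        MPI_Finalize();
        return 0;
     }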

Nonblocking Send and Receive

One can improve performance on many systems by overlapping communication and computation. One way to achieve that is to use nonblocking communication. MPI includes nonblocking send and receive calls, described below:

  1. Nonblocking Send Operation:

    A nonblocking send call initiates the send operation, but does not complete it. The send start call will return before the message is copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data have been copied out of the send buffer. Here is the syntax of the nonblocking send operation:

         MPI_Isend(void* buf, int count, MPI_Datatype datatype,
                   int dest, int tag, MPI_Comm comm,
                   MPI_Request *request);
         
    where the first six arguments have the same meaning as in the blocking send. Request is a communication request handle that you can use later to query the status of the communication or to wait for its completion. Calling the nonblocking send indicates that the system may start copying data out of the send buffer. The sender should not access any part of the send buffer after a nonblocking send operation is called until the send completes.

  2. Nonblocking Receive Operation:

    A nonblocking receive start call initiates the receive operation, but does not complete it. The call will return before a message is stored into the receive buffer. An MPI_Wait call is needed to complete the receive operation. The format of the nonblocking receive is:

    MPI_Irecv(void* buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm,
              MPI_Request *request);
    
    where source is the rank of the node that sent the message and the remaining arguments are as previously discussed.

    The receive buffer stores count consecutive elements of the type specified by datatype, starting at the address in buf. The length of the received message must be less than or equal to the length of the receive buffer. An overflow error occurs if all incoming data does not fit, without truncation, into the receive buffer.

  3. Check for Completion:

    To check the status of a nonblocking send or receive, one can call

    MPI_Test(MPI_Request *request, int *flag,
             MPI_Status *status);

    where request is a communication request (from a call to MPI_Isend, for example), flag is set to true (nonzero) if the operation has completed, and status is a status object containing information on the transaction. The MPI_Test call returns immediately; a fragment that polls with MPI_Test is shown at the end of this list.

    A call to MPI_Wait returns when the operation identified by request is complete.

    MPI_Wait(MPI_Request *request, MPI_Status *status);
    
    where all arguments are the same as for MPI_Test.

  4. Code Fragments Using Nonblocking Send and Receive:

    Order: Nonblocking communication operations occur in the same order as the execution of the calls that initiate the communication. In both blocking and nonblocking communication, operations are non-overtaking. An example of a nonblocking send and receive with an MPI_Wait call is given below:

    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {
       MPI_Isend(sendbuf, count, MPI_FLOAT, 1, tag, comm, &request);
       /* ... do some computation to mask latency ... */
       MPI_Wait(&request, &status);
    }
    else if (rank == 1) {
       MPI_Irecv(recvbuf, count, MPI_FLOAT, 0, tag, comm, &request);
       /* ... do some computation to mask latency ... */
       MPI_Wait(&request, &status);
    }
    
    A request object must not be reused while the communication associated with it is still pending. One way to avoid such errors is to use a different request object for each outstanding send and receive. For example:
    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {
       MPI_Isend(sbuf1, count, MPI_FLOAT, 1, tag, comm, &req1);
       MPI_Isend(sbuf2, count, MPI_FLOAT, 1, tag, comm, &req2);
    }
    else if (rank == 1) {
       MPI_Irecv(rbuf1, count, MPI_FLOAT, 0, tag, comm, &req1);
       MPI_Irecv(rbuf2, count, MPI_FLOAT, 0, tag, comm, &req2);
    }
    MPI_Wait(&req1, &status);
    MPI_Wait(&req2, &status);
    

  5. Progress:

    A call to MPI_Wait that completes a receive will eventually terminate and return if a matching send has been started, unless the send is satisfied by another receive. In particular, if the matching send is nonblocking, then the receive should complete even if no call is executed by the sender to complete the send. Similarly, a call to MPI_Wait that completes a send will eventually return if a matching receive has been started, unless the receive is satisfied by another send, even if no call is executed to complete the receive.
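
As promised in item 3 above, here is a minimal sketch of polling with MPI_Test to overlap communication and computation; do_some_work() is a hypothetical placeholder, and recvbuf, count, tag, and comm are assumed to be set up as in the earlier fragments:

     int flag = 0;
     MPI_Request request;
     MPI_Status status;

     MPI_Irecv(recvbuf, count, MPI_FLOAT, 0, tag, comm, &request);
     while (!flag) {
        do_some_work();                        /* hypothetical computation */
        MPI_Test(&request, &flag, &status);    /* returns immediately */
     }
     /* the receive has completed; recvbuf is now safe to read */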

Additional Operations

The MPI_Probe and MPI_Iprobe operations check for incoming messages without actually receiving them. After using these operations, the user can decide how to receive the messages, based on the information returned by the probe. In particular, the user may allocate adequate memory for the receive buffer according to the length of the probed message. The syntax for the blocking and nonblocking probes is as follows:
     MPI_Probe(int source, int tag, MPI_Comm comm,
               MPI_Status *status);

     MPI_Iprobe(int source, int tag, MPI_Comm comm,
                int *flag, MPI_Status *status);
where source and tag are selection criteria for messages, comm is the communicator and status contains information about a waiting message. In the nonblocking call, MPI_Iprobe, flag returns TRUE if a message matching the criteria is waiting. The blocking call, MPI_Probe, waits until such a message is available for receipt. Note that there are times when the length of the incoming message is not known a priori. In this case, one can use MPI_Probe or MPI_Iprobe with MPI_Get_count:
     MPI_Probe(source, tag, comm, &status);
     MPI_Get_count(&status, datatype, &count);
where count returns the number of items contained in the incoming message.
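
For instance, here is a hedged sketch of receiving a message whose length is not known in advance; source, tag, and comm are assumed to be set up as in the earlier fragments, and malloc requires <stdlib.h>:

     int count;
     float *buf;
     MPI_Status status;

     MPI_Probe(source, tag, comm, &status);          /* block until a message is waiting */
     MPI_Get_count(&status, MPI_FLOAT, &count);      /* number of floats in the message */
     buf = (float *) malloc(count * sizeof(float));  /* allocate exactly enough space */
     MPI_Recv(buf, count, MPI_FLOAT, status.MPI_SOURCE, status.MPI_TAG,
              comm, &status);
     free(buf);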