/* 
  This file contains the routines to handle receiving a message

  Because we don't know the length of messages (at least the long ones with
  tag MPID_PT2PT2_TAG(src) tags), we never post a receive.  Rather, we have
  a MPID_CH_check_incoming routine that looks for headers.  Note that
  messages sent from a source to destination with the MPID_PT2PT2_TAG(src)
  are ordered (we assume that the message-passing system preserves order).

  Possible Improvements:
  This current system does not "prepost" Irecv's, and so, on systems with
  aggressive delivery (like the Paragon), can suffer performance penalties.
  Obvious places for improvements are
     On blocking receives for long messages, post the irecv FIRST.
     If the message is actually short, cancel the receive.
         (potential problem - if the next message from the same source is
         long, it might match (incorrectly) with the posted irecv.  
         Possible fix: use sequence number for each processor pair, with
         the sequence numbers incremented for each message, short or long, 
         and place the sequence number into the tag field)

     For tags/sources/contexts in a specified range, post the irecv with
     at tag synthesized from all of the tag/source/context(/sequence number)
     May need to use cancel if the message is shorter than expected.
     This can be done with both blocking and non-blocking messages.

     Another approach is to generate "go-ahead" messages, to be handled in 
     chsend.c, perhaps by an interrupt-driven receive.  
 */

This is the description of chget's algorithm
/*
    This code provides routines for performing get-operation copies.
    This code allows partial data to be returned; the packet is returned
    to the sender when the transfer is completed (as a control packet).

    A receive get happens in two parts.
    If the recv_id field in the packet is null, then this is the first
    time we've seen this packet.  Otherwise, the recv_id gives us the
    address of the matching receive handle.  

    The address is the address to get from; len_avail is the amount to
    copy.  When the copy is complete, the mode is changed to MPID_PKT_DONE_GET
    and sent back the the sender.

    It turns out that to send the same packet back in the case where the 
    packets are dynamically allocated is a bit too tricky for the current
    implementation.  One possibility is an "in_use" bit; this requires that
    any packet that is returned eventually comes back.  Another is to keep
    track of whether the packet should be free; perhaps by inlining the 
    "DO_GET" code (then we can use the MPID_PKT_RECV_CLR(pkt) call).
    Left as an exercise for the reader.

    Details:
    Short messsages are sent in the usual way, by putting them into the 
    packet itself.  The packet is returned to the sender's avail stack
    with the macro MPID_PKT_SEND_FREE.  

    Long messages are sent in parts.  In the simplest situation, 
    Sender                            Receiver
    (xxx_post_send_long)
    MPID_PKT_DO_GET
    Increment MPID_n_pending
                                      (MPID_xx_do_get, do_get_mem)
                                      receives, processes. 
				      Always return an MPID_PKT_DONE_GET
                                      (even if not done, i.e., in the
				      case of a partial send).  We
				      put the address of the dmpi_recv_handle
				      into the packet (recv_id field).
    (MPID_xx_done_get)				     
    If all of the data has not been
       sent, setup the next bunch of
       data and send an MPID_PKT_CONT_GET
       packet.  
    else
       decrement MPID_n_pending
                                      (MPID_xxx_cont_get) 
				      (if needed)
				      Copy the data, send a 
				      MPID_PKT_DONE_GET message pack
				      with updated info how much
				      data has been copied.

    There are some ways to reduce the number of messages that are sent; 
    for now, we're assuming that the long messages are long enough to
    hide the overhead.  This isn't actually a good assumption, but
    we need to simplify some of the receive code before we can 
    cleanly handle a leaner protocol.

    Note on MPID_CMPL_RECV_GET (completer for get's)
    This is simply a blocking loop over check_incoming until the completer
    field is set to clear (0), since completion happens by receiving 
    additional packets.

    Note the while it might seem unnecessary in the two-copy case (copy into
    and out of shared memory) to send an "I'm done" message to the sender, 
    this will allow us to have the sender maintain their own memory pool and
    to do the return without any shared locks (in the memory pool).
 */