TCP Implementation

The benchmark is started by a CGI script or a program that calls rsh to run the benchmarking routines on the selected machines. On each node, one process, which we will call the Node Status Process (NSP), is the parent of all other benchmark processes on that node. It is also responsible for reporting the results from that node. The NSP forks nphase children, each of which forks nnode receiving children and nnode sending children.

Let us first list all the processes that are involved in the benchmark. The total number of processes involved in the benchmark on the SP2 nodes can be calculated with the following formula.

TotalProcess = (((nhost*2)*(nphase-1))+nphase+1)*nhost;

The above equation does not include the process that is used to start up the benchmark, which can reside on any remote node. The main process started on each node forks nphase children, each of which (with the exception of the first and last phase) in turn forks nhost children for receiving and nhost children for sending. The first phase process only forks children for sending and the last phase process only forks children for receiving.
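
The forking rules above can be turned into a small counting routine. This is only a sketch that counts processes directly from the prose description; the function names are hypothetical, and it assumes nphase is at least 2 so that the first and last phases are distinct.

```python
def processes_per_node(nhost, nphase):
    """Count the benchmark processes on one node from the forking rules."""
    nsp = 1                      # the main (Node Status) process
    phase_procs = nphase         # one process per phase
    children = 0
    for phase in range(nphase):
        if phase > 0:            # every phase except the first forks receivers
            children += nhost
        if phase < nphase - 1:   # every phase except the last forks senders
            children += nhost
    return nsp + phase_procs + children


def total_processes(nhost, nphase):
    """Total over all nodes, excluding the process that starts the benchmark."""
    return processes_per_node(nhost, nphase) * nhost


print(total_processes(3, 3))  # prints 48
```

For a 3-node/3-phase test this gives 16 processes per node: 1 main process, 3 phase processes, and 12 sending or receiving children.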

Socket connections are established between the sending processes of one phase and the receiving processes of the next phase. If the two endpoints of a connection are on the same node, Unix-domain sockets are used.
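
The choice of socket type can be sketched as follows. The helper names are hypothetical, and the use of IPv4 for connections between different nodes is an assumption; the text only states that Unix-domain sockets are used within a node.

```python
import socket


def choose_family(sender_host, receiver_host):
    """Return the address family for a sender-to-receiver connection."""
    if sender_host == receiver_host:
        return socket.AF_UNIX   # both endpoints on the same node
    return socket.AF_INET       # different nodes: TCP over IPv4 (assumed)


def open_socket(sender_host, receiver_host):
    """Create an unconnected stream socket of the appropriate family."""
    family = choose_family(sender_host, receiver_host)
    return socket.socket(family, socket.SOCK_STREAM)
```

Both families are used here with SOCK_STREAM, so the sending and receiving code can treat the two kinds of connection the same way once the socket is open.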

After all the socket connections are established, each Node Status Process signals the first phase on its node to start transmitting and notes the current time. After the last phase sends a signal notifying completion, the current time is recorded as the completion time.

Between two phases, two parallel buffers are used. Each of these buffers is divided into nnode blocks, which the sending and receiving children read from and write to. While the receiving children are writing to one buffer, the sending children read from the other buffer, to maximize the load on the switch. These buffers are indexed with a "boolean", as the pseudo-code below shows ( k=!k; ).
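
A minimal sketch of this two-buffer layout is given below. The class and method names are hypothetical, the block size is an assumed parameter, and a single shared index k is used for simplicity, whereas in the pseudo-code each process keeps its own k.

```python
class DoubleBuffer:
    """Two parallel buffers, each divided into nnode fixed-size blocks."""

    def __init__(self, nnode, block_size):
        self.nnode = nnode
        self.block_size = block_size
        self.buffers = [bytearray(nnode * block_size) for _ in range(2)]
        self.k = 0  # buffer currently being written by the receivers

    def _block(self, which, node):
        """Writable view of the block belonging to `node` in buffer `which`."""
        start = node * self.block_size
        return memoryview(self.buffers[which])[start:start + self.block_size]

    def write_block(self, node, data):
        """Receiving child `node` stores data in the buffer indexed by k."""
        self._block(self.k, node)[:len(data)] = data

    def read_block(self, node):
        """Sending child `node` reads from the other buffer."""
        return bytes(self._block(1 - self.k, node))

    def swap(self):
        """The k=!k; step: exchange the roles of the two buffers."""
        self.k = 1 - self.k
```

For example, after `db = DoubleBuffer(3, 4)`, `db.write_block(1, b"abcd")` and `db.swap()`, the call `db.read_block(1)` returns `b"abcd"`, while the receivers are free to write into the other buffer.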

Socket connections are established only for the purpose of transferring data. Interprocess communication is carried out through signals and shared memory.

The first and the last phase processes are special. They either read from or write to a local file, or read from or write to core memory. Their pseudo-code is as follows:

First Phase:

    [fork nnode children (senders)]
    fill_buffer(k);
    signal(senders);
    for (i = 1; i < nbuf; i++) {
      k = !k;                          /* switch to the other buffer */
      fill_buffer(k);
      signal(senders);
      while (!signal_received) wait;   /* block until a signal arrives */
    }
    while (!signal_received) wait;
    exit;

Last Phase:

    [fork nnode children (receivers)]
    for (i = 0; i < nbuf; i++) {
      while (!signal_received) wait;   /* block until a signal arrives */
      signal(receivers);
      read_buffer(k);
      k = !k;                          /* switch to the other buffer */
    }
    exit;

The middle phase processes only fork the receiving and sending children and afterwards just monitor their activity. After forking the children, the middle phase processes step away and let the sending and receiving children do all the communication. The pseudo-code for the receivers and the senders is as follows:

Receivers:

    for (i = 0; i < nbuf; i++) {
      receive_buffer(k);
      signal(senders);
      while (!signal_received) wait;   /* block until a signal arrives */
      k = !k;                          /* switch to the other buffer */
    }
    exit;

Senders:

    for (i = 0; i < nbuf; i++) {
      signal(receivers);
      while (!signal_received) wait;   /* block until a signal arrives */
      send_buffer(k);
      k = !k;                          /* switch to the other buffer */
    }
    exit;
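
The receiver and sender loops above can be exercised with a small simulation. This is only a sketch under several assumptions: it models one receiver/sender pair rather than nnode of each, uses threads instead of forked processes, uses counting semaphores as a stand-in for signals (so that no notification is lost), and uses in-memory queues as a stand-in for the sockets. The function name is hypothetical.

```python
import queue
import threading


def run_middle_phase(blocks):
    """Pass `blocks` through one receiver/sender pair using two buffers."""
    nbuf = len(blocks)
    inbound = queue.Queue()    # stand-in for the socket from the previous phase
    outbound = queue.Queue()   # stand-in for the socket to the next phase
    for block in blocks:
        inbound.put(block)

    buffers = [None, None]                  # the two parallel buffers
    to_sender = threading.Semaphore(0)      # stand-in for signal(senders)
    to_receiver = threading.Semaphore(0)    # stand-in for signal(receivers)

    def receiver():
        k = 0
        for _ in range(nbuf):
            buffers[k] = inbound.get()      # receive_buffer(k)
            to_sender.release()             # signal(senders)
            to_receiver.acquire()           # while(!signal_received) wait;
            k = 1 - k                       # k=!k;

    def sender():
        k = 0
        for _ in range(nbuf):
            to_receiver.release()           # signal(receivers)
            to_sender.acquire()             # while(!signal_received) wait;
            outbound.put(buffers[k])        # send_buffer(k)
            k = 1 - k                       # k=!k;

    threads = [threading.Thread(target=receiver),
               threading.Thread(target=sender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [outbound.get() for _ in range(nbuf)]
```

In this model the receiver can fill one buffer while the sender drains the other, and a buffer is only reused after the sender has finished with it, so the blocks leave the phase in the order in which they arrived.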

Here is an example of the implementation architecture and dataflow for a 3-node/3-phase test.