Full HTML for Scripted Foilset CPS615 -- Completion of MPI Foilset and Application to Jacobi Iteration in 2D

Given by Geoffrey C. Fox in the Delivered Lectures of CPS615 Basic Simulation Track for Computational Science on 7 November 1996. Foils prepared 11 November 1996
Summary of Material


This completes the general MPI discussion, covering basic message passing, collective communication and some advanced features
It then returns to the Laplace example foilset to show how MPI can be used there
  • We have previously used this example for HPF and performance analysis

Table of Contents for full HTML of CPS615-Completion of MPI foilset and Application to Jacobi Iteration in 2D

1 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science, Fall Semester 1996 -- Lecture of November 7 1996

2 Abstract of Nov 7 1996 CPS615 Lecture
3 Blocking Receive MPI_Recv(C) MPI_RECV(Fortran)
4 Fortran example:Blocking Receive MPI_RECV
5 Hello World:C Example of Send and Receive
6 Interpretation of Returned Message Status
7 Naming Conventions for Send and Receive
8 Collective Communication
9 Hello World:C Example of Broadcast
10 Collective Computation
11 Examples of Collective Communication/Computation
12 More Examples of Collective Communication/Computation
13 Examples of MPI_ALLTOALL
14 Motivation for Derived Datatypes in MPI
15 Derived Datatype Basics
16 Simple Example of Derived Datatype
17 More Complex Datatypes MPI_TYPE_VECTOR/INDEXED
18 Use of Derived Types in Jacobi Iteration with Guard Rings--I
19 Use of Derived Types in Jacobi Iteration with Guard Rings--II
20 Other Useful Concepts in MPI
21 Parallel Laplace Programming: Set Up of Message Passing for Jacobi Iteration in One Dimension

22 Node Program: Message Passing for Laplace Solver
23 Collective Communication Primitives
24 Implementation of MPSHIFT(+1, SOURCE,DEST)
25 Possible Implementation of MPSHIFT in MPI
26 Implementation of SHIFT in MPI
27 Implementation of GLOBALMAX (TEST)

Foil 1 Delivered Lectures for CPS615 -- Base Course for the Simulation Track of Computational Science
Fall Semester 1996 -- Lecture of November 7 1996

Geoffrey Fox
NPAC
Room 3-131 CST
111 College Place
Syracuse NY 13244-4100

Foil 2 Abstract of Nov 7 1996 CPS615 Lecture

This completes the general MPI discussion, covering basic message passing, collective communication and some advanced features
It then returns to the Laplace example foilset to show how MPI can be used there
  • We have previously used this example for HPF and performance analysis

Foil 3 Blocking Receive MPI_Recv(C) MPI_RECV(Fortran)

call MPI_RECV(
  • IN start_of_buffer Address of the place to store the data (the address is INput;
    • the values of the data are of course OUTput, starting at this address!)
  • IN buffer_len Maximum number of items allowed
  • IN datatype Type of each data item
  • IN source_rank Processor number (rank) of the source
  • IN tag Only accept messages with this tag value
  • IN communicator Communicator of both sender and receiver group
  • OUT return_status Data structure describing what happened!
  • OUT error_message) Error flag (absent as an argument in C, where it is the function return value)
Note that return_status is used after completion of the receive to find the actual received length (buffer_len is a MAXIMUM), the actual source processor rank and the actual message tag
In C the syntax is
int error_message = MPI_Recv(void *start_of_buffer, int buffer_len, MPI_Datatype datatype, int source_rank, int tag, MPI_Comm communicator, MPI_Status *return_status)

Foil 4 Fortran example: Blocking Receive MPI_RECV

integer status(MPI_STATUS_SIZE)   ! an array to store the returned status
integer mpierr, count, datatype, source, tag, comm
real recvbuf(100)
count = 100
datatype = MPI_REAL
comm = MPI_COMM_WORLD
source = MPI_ANY_SOURCE           ! accept any source processor
tag = MPI_ANY_TAG                 ! accept any message tag
call MPI_RECV (recvbuf, count, datatype, source, tag, comm, status, mpierr)
Note source and tag can be wild-carded

Foil 5 Hello World: C Example of Send and Receive

#include "mpi.h"
main( int argc, char **argv )
{
  • char message[20];
  • int i, rank, size, tag=137; # Any value of type allowed
  • MPI_Status status;
  • MPI_Init (&argc, &argv);
  • MPI_Comm_size(MPI_COMM_WORLD, &size); # Number of Processors
  • MPI_Comm_rank(MPI_COMM_WORLD, &rank); # Who is this processor
  • if( rank == 0 ) { # We are on "root" -- Processor 0
    • strcpy(message,"Hello MPI World"); # Generate message
    • for(i=1; i<size; i++) # Send message to size-1 other processors
    • MPI_Send(message, strlen(message)+1, MPI_CHAR, i, tag, MPI_COMM_WORLD);
  • }
  • else
    • MPI_Recv(message,20, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);
  • printf("This is a message from node %d saying %s\n", rank, message);
  • MPI_Finalize();
}

Foil 6 Interpretation of Returned Message Status

In C status is a structure of type MPI_Status
  • status.MPI_SOURCE gives the actual source process
  • status.MPI_TAG gives the actual message tag
In Fortran the status is an integer array and different elements give:
  • in status(MPI_SOURCE) the actual source process
  • in status(MPI_TAG) the actual message tag
In C and Fortran, the number of elements (called count) in the message can be found from a call to
MPI_GET_COUNT (IN status, IN datatype,
OUT count, OUT error_message)
  • where, as usual in C, the last argument is missing because the error flag is returned as the function value
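
As an illustration (not on the original foil), here is a minimal C sketch of a wildcarded receive followed by inspection of the returned status; it assumes "mpi.h" is included and MPI_Init has been called, and the buffer and variable names are invented for the example:

  float buffer[100];
  MPI_Status status;
  int actual_source, actual_tag, actual_count;

  MPI_Recv(buffer, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
           MPI_COMM_WORLD, &status);
  actual_source = status.MPI_SOURCE;                  /* who really sent it */
  actual_tag    = status.MPI_TAG;                     /* which tag it carried */
  MPI_Get_count(&status, MPI_FLOAT, &actual_count);   /* how many elements arrived */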

Foil 7 Naming Conventions for Send and Receive

SEND          Blocking    Nonblocking
Standard      MPI_Send    MPI_Isend
Ready         MPI_Rsend   MPI_Irsend
Synchronous   MPI_Ssend   MPI_Issend
Buffered      MPI_Bsend   MPI_Ibsend

RECEIVE       Blocking    Nonblocking
Standard      MPI_Recv    MPI_Irecv

Any type of receive routine can be used to receive messages from any type of send routine
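
A hedged C sketch (not from the foils) of the nonblocking forms in the table: post MPI_Irecv and MPI_Isend, then complete both with MPI_Wait. The partner rank, tag and buffers are assumed names.

  float inbuf[10], outbuf[10];           /* outbuf assumed filled elsewhere */
  MPI_Request send_request, recv_request;
  MPI_Status  send_status, recv_status;
  int partner = 1, tag = 99;             /* assumed partner rank and tag */

  /* Post the receive and the send; both calls return immediately */
  MPI_Irecv(inbuf,  10, MPI_FLOAT, partner, tag, MPI_COMM_WORLD, &recv_request);
  MPI_Isend(outbuf, 10, MPI_FLOAT, partner, tag, MPI_COMM_WORLD, &send_request);

  /* ... computation could overlap the communication here ... */

  MPI_Wait(&send_request, &send_status);
  MPI_Wait(&recv_request, &recv_status);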

Foil 8 Collective Communication

MPI_BARRIER(comm) Global Synchronization within a given communicator
MPI_BCAST Global Broadcast
MPI_GATHER Concatenate data from all processors in a communicator into one process
  • MPI_ALLGATHER puts result of concatenation in all processors
MPI_SCATTER takes data from one processor and scatters it over all processors
MPI_ALLTOALL sends data from all processes to all other processes
MPI_SENDRECV exchanges data between two processors -- often used to implement "shifts"
  • viewed by some as pure point-to-point rather than collective

Foil 9 Hello World: C Example of Broadcast

#include "mpi.h"
main( int argc, char **argv )
{
  • char message[20];
  • int rank;
  • MPI_Init (&argc, &argv);
  • MPI_Comm_rank(MPI_COMM_WORLD, &rank); # Who is this processor
  • if( rank == 0 ) # We are on "root" -- Processor 0
    • strcpy(message,"Hello MPI World"); # Generate message
  • # MPI_Bcast sends from root=0 and receives on all other processors
  • MPI_Bcast(message,20, MPI_CHAR, 0, MPI_COMM_WORLD);
  • printf("This is a message from node %d saying %s\n", rank,
    • message);
  • MPI_Finalize();
}

Foil 10 Collective Computation

One can often perform computation during a collective communication
MPI_REDUCE performs a reduction operation of a type chosen from
  • maximum (value, or value and location), minimum (value, or value and location), sum, product, logical and/or/xor, bit-wise and/or/xor
  • e.g. the operation labelled MPI_MAX stores, in location result on the root processor, the global maximum of original over all processors, as in
  • call MPI_REDUCE(original, result, 1, MPI_REAL, MPI_MAX, root, comm, ierror)
  • One can also supply one's own reduction function
MPI_ALLREDUCE is as MPI_REDUCE but stores the result in all -- not just one -- processors
MPI_SCAN performs reductions with the result for processor r depending on the data in processors 0 to r
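
As a C sketch of the calls just described (the choice of root and the variable names are assumptions, and MPI_Init is assumed to have been called):

  int rank, root = 0;                       /* root is an assumed choice */
  float mydata, globalmax, globalsum;

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  mydata = (float) rank;                    /* any per-processor value */

  /* Maximum of mydata over all processors, result stored only on root */
  MPI_Reduce(&mydata, &globalmax, 1, MPI_FLOAT, MPI_MAX, root, MPI_COMM_WORLD);

  /* Sum of mydata over all processors, result stored on every processor */
  MPI_Allreduce(&mydata, &globalsum, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);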

Foil 11 Examples of Collective Communication/Computation

Four Processors where each has a send buffer of size 2
0 1 2 3 Processors
(2,4) (5,7) (0,3) (6,2) Initial Send Buffers
MPI_BCAST with root=2
(0,3) (0,3) (0,3) (0,3) Resultant Buffers
MPI_REDUCE with action MPI_MIN and root=0
(0,2) (_,_) (_,_) (_,_) Resultant Buffers
MPI_ALLREDUCE with action MPI_MIN (no root needed)
(0,2) (0,2) (0,2) (0,2) Resultant Buffers
MPI_REDUCE with action MPI_SUM and root=1
(_,_) (13,16) (_,_) (_,_) Resultant Buffers

Foil 12 More Examples of Collective Communication/Computation

Four Processors where each has a send buffer of size 2
0 1 2 3 Processors
(2,4) (5,7) (0,3) (6,2) Initial Send Buffers
MPI_SENDRECV with 0,1 and 2,3 paired
(5,7) (2,4) (6,2) (0,3) Resultant Buffers
MPI_GATHER with root=0
(2,4,5,7,0,3,6,2) (_,_) (_,_) (_,_) Resultant Buffers
Four Processors where only rank=0 has send buffer
(2,4,5,7,0,3,6,2) (_,_) (_,_) (_,_) Initial send Buffers
MPI_SCATTER with root=0
(2,4) (5,7) (0,3) (6,2) Resultant Buffers
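
The gather and scatter on this foil could be written in C roughly as below (a sketch only; the buffer names are assumptions and the sizes match the four-processor, two-element example):

  float sendbuf[2], gathered[8], scattered[2];   /* 4 processors x 2 elements each */
  int root = 0;

  /* Concatenate every processor's 2-element buffer into "gathered" on root */
  MPI_Gather(sendbuf, 2, MPI_FLOAT, gathered, 2, MPI_FLOAT, root, MPI_COMM_WORLD);

  /* Inverse operation: root hands out 2 elements to each processor */
  MPI_Scatter(gathered, 2, MPI_FLOAT, scattered, 2, MPI_FLOAT, root, MPI_COMM_WORLD);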

Foil 13 Examples of MPI_ALLTOALL

All-to-all communication, with the i'th location in the j'th processor being sent to the j'th location in the i'th processor
Processor 0 1 2 3
Start (a0,a1,a2,a3) (b0,b1,b2,b3) (c0,c1,c2,c3) (d0,d1,d2,d3)
After (a0,b0,c0,d0) (a1,b1,c1,d1) (a2,b2,c2,d2) (a3,b3,c3,d3)
There is an extension MPI_ALLTOALLV to handle the case where the data is stored in a noncontiguous fashion in each processor and where each processor sends different amounts of data to the other processors
Many MPI routines have such "vector" extensions
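
The four-processor example above corresponds roughly to the following C sketch (buffer names assumed):

  float sendbuf[4], recvbuf[4];   /* one element per destination / per source */

  /* sendbuf[j] on processor i arrives in recvbuf[i] on processor j */
  MPI_Alltoall(sendbuf, 1, MPI_FLOAT, recvbuf, 1, MPI_FLOAT, MPI_COMM_WORLD);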

Foil 14 Motivation for Derived Datatypes in MPI

These are an elegant solution to a problem we struggled with a lot in the early days -- all message passing is naturally built on buffers holding contiguous data
However, often (usually) the data is not stored contiguously. One can address this with a set of small MPI_SEND commands, but we want messages to be as big as possible because latency is so high
One can copy all the data elements into a single buffer and transmit this, but that is tedious for the user and not very efficient
It involves extra memory-to-memory copies which are often quite slow
So derived datatypes can be used to set up arbitrary memory templates with variable offsets and primitive datatypes. Derived datatypes can then be used in "ordinary" MPI calls in place of primitive datatypes such as MPI_REAL or MPI_FLOAT

Foil 15 Derived Datatype Basics

Derived Datatypes should be declared integer in Fortran and MPI_Datatype in C
Generally a derived datatype has the form {(type0,disp0),(type1,disp1),...,(type(n-1),disp(n-1))} with a list of primitive datatypes typei and displacements dispi (measured from the start of the buffer)
call MPI_TYPE_CONTIGUOUS (count, oldtype, newtype, ierr)
  • creates a new datatype newtype made up of count repetitions of old datatype oldtype
one must use call MPI_TYPE_COMMIT(derivedtype,ierr)
before one can use the type derivedtype in a communication call
call MPI_TYPE_FREE(derivedtype,ierr) frees up space used by this derived type

Foil 16 Simple Example of Derived Datatype

integer derivedtype, ...
call MPI_TYPE_CONTIGUOUS(10, MPI_REAL, derivedtype, ierr)
call MPI_TYPE_COMMIT(derivedtype, ierr)
call MPI_SEND(data, 1, derivedtype, dest,tag, MPI_COMM_WORLD, ierr)
call MPI_TYPE_FREE(derivedtype, ierr)
is equivalent to the simpler single call
call MPI_SEND(data, 10, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)
and each sends 10 contiguous real values at location data to process dest

Foil 17 More Complex Datatypes MPI_TYPE_VECTOR/INDEXED

MPI_TYPE_VECTOR (count,blocklen,stride,oldtype,newtype,ierr)
  • IN count Number of blocks to be added
  • IN blocklen Number of elements in block
  • IN stride Number of elements (NOT bytes) between start of each block
  • IN oldtype Datatype of each element
  • OUT newtype Handle(pointer) for new derived type
MPI_TYPE_INDEXED (count,blocklens,indices,oldtype,newtype,ierr)
  • IN count Number of blocks to be added
  • IN blocklens Number of elements in each block -- an array of length count
  • IN indices Displacements (an array of length count) for each block
  • IN oldtype Datatype of each element
  • OUT newtype Handle(pointer) for new derived type
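
For instance (a sketch, not from the foils), MPI_TYPE_VECTOR can describe a column of a row-major C matrix -- the mirror image of the Fortran row case used on the next foils. N, dest and tag below are assumed values:

  #define N 10                  /* assumed matrix dimension */
  float a[N][N];
  MPI_Datatype column_type;
  int dest = 1, tag = 0;        /* assumed destination rank and tag */

  /* In C a row is contiguous but a column is not: N blocks of 1 element,
     with a stride of N elements between the starts of successive blocks */
  MPI_Type_vector(N, 1, N, MPI_FLOAT, &column_type);
  MPI_Type_commit(&column_type);

  /* Send column 3 of a in one message */
  MPI_Send(&a[0][3], 1, column_type, dest, tag, MPI_COMM_WORLD);

  MPI_Type_free(&column_type);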

Foil 18 Use of Derived Types in Jacobi Iteration with Guard Rings--I

Assume each processor stores an NLOC by NLOC set of grid points in an array PHI dimensioned PHI(NLOC2,NLOC2) with NLOC2=NLOC+2 to establish guard rings

Foil 19 Use of Derived Types in Jacobi Iteration with Guard Rings--II

integer North, South, East, West
! these are the processor ranks of the 4 nearest neighbors
integer rowtype, coltype          ! the new derived types
! Fortran stores the elements of a column contiguously
! (C has the opposite convention!)
call MPI_TYPE_CONTIGUOUS(NLOC, MPI_REAL, coltype, ierr)
call MPI_TYPE_COMMIT(coltype, ierr)
! rows (sent North and South) are not contiguous
call MPI_TYPE_VECTOR(NLOC, 1, NLOC2, MPI_REAL, rowtype, ierr)
call MPI_TYPE_COMMIT(rowtype, ierr)
call MPI_SEND(PHI(2,2), 1, coltype, West, 0, comm, ierr)
call MPI_SEND(PHI(2,NLOC+1), 1, coltype, East, 0, comm, ierr)
call MPI_SEND(PHI(2,2), 1, rowtype, North, 0, comm, ierr)
call MPI_SEND(PHI(NLOC+1,2), 1, rowtype, South, 0, comm, ierr)
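
The foil shows only the sends; matching receives into the guard rows and columns are also needed. Below is a hedged C sketch of the North/South part of the exchange using MPI_Sendrecv with a derived type, so that sends and receives pair up without deadlock. NLOC, the phi array and the neighbor ranks are assumptions standing in for the set-up on the previous foils; the East/West exchange with the column type is analogous.

  enum { NLOC = 16, NLOC2 = NLOC + 2 };            /* assumed local grid size */
  float phi[NLOC2][NLOC2];                         /* grid values assumed filled in */
  int north = MPI_PROC_NULL, south = MPI_PROC_NULL; /* neighbor ranks, assumed set elsewhere */
  MPI_Datatype rowtype;
  MPI_Status status;

  /* In C the rows of phi are contiguous, so a row needs only a contiguous type */
  MPI_Type_contiguous(NLOC, MPI_FLOAT, &rowtype);
  MPI_Type_commit(&rowtype);

  /* Send the first interior row to the North neighbor and receive the South
     neighbor's row into the South guard row, in one deadlock-free call */
  MPI_Sendrecv(&phi[1][1],      1, rowtype, north, 0,
               &phi[NLOC+1][1], 1, rowtype, south, 0,
               MPI_COMM_WORLD, &status);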

Foil 20 Other Useful Concepts in MPI

More general versions of MPI_?SEND and associated inquiry routines to see if messages have arrived. Use of these allows you to overlap communication and computation. In general this is not used even though it is more efficient
  • They are also used in more general asynchronous applications -- blocking routines are most natural in loosely synchronous communicate-compute cycles
Application topology routines allow one to find the ranks of the nearest neighbor processors, such as North, South, East and West in the Jacobi iteration (see the sketch after this list)
Packing and unpacking of data to make single buffers -- derived datatypes are usually a more elegant approach to this
Communicators to set up subgroups of processors (remember the matrix example) and to set up independent MPI universes, as needed to build libraries so that messages generated by a library do not interfere with those from other libraries or user code
  • Historically (in my work) it would have been useful to distinguish debugging and application messages
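
As an illustration of the application topology routines mentioned above (a sketch; the 2D grid set-up and the labelling of the neighbors are assumptions), MPI_Cart_create and MPI_Cart_shift can supply the neighbor ranks used in the Jacobi iteration:

  MPI_Comm grid_comm;
  int dims[2] = {0, 0}, periods[2] = {0, 0};     /* non-periodic 2D grid */
  int nprocs, north, south, east, west;

  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  MPI_Dims_create(nprocs, 2, dims);              /* choose a near-square processor grid */
  MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);

  /* Displacement +1 along each dimension; boundary processors get MPI_PROC_NULL.
     Which neighbor is called North is just a labelling convention here.        */
  MPI_Cart_shift(grid_comm, 0, 1, &north, &south);
  MPI_Cart_shift(grid_comm, 1, 1, &west, &east);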

Foil 21 Parallel Laplace Programming: Set Up of Message Passing for Jacobi Iteration in One Dimension

[Figure: set-up of message passing for the one-dimensional Jacobi iteration, with Processor 1 shown as the "typical" processor]

Foil 22 Node Program: Message Passing for Laplace Solver

Initialization: NLOC = NTOT/NPROC  (assume NTOT is divisible by NPROC)
    I1 = 2
    NLOC1 = NLOC + 1
    IF (PROCNUM.EQ.NPROC-1 .OR. PROCNUM.EQ.0) NLOC1 = NLOC1 - 1
BASIC LOOP
BEGIN   TEST = 0
Shift to the right:
    SOURCE = PHIOLD(NLOC1)                 ! address of data to be sent
    IF (PROCNUM.EQ.NPROC-1) SOURCE = "DUMMY"
    DEST = PHIOLD(1)                       ! address where data is to be stored
    IF (PROCNUM.EQ.0) DEST = "DUMMY"
    CALL MPSHIFT(+1, SOURCE, DEST)
Shift to the left:
    SOURCE = PHIOLD(I1)                    ! address of data to be sent
    IF (PROCNUM.EQ.0) SOURCE = "DUMMY"
    DEST = PHIOLD(NLOC1+1)                 ! address where data is to be stored
    IF (PROCNUM.EQ.NPROC-1) DEST = "DUMMY"
    CALL MPSHIFT(-1, SOURCE, DEST)
    DO 1 I = I1, NLOC1
      TEMP = 0.5 * (PHIOLD(I-1) + PHIOLD(I+1))
      TEST = AMAX1(TEST, ABS(TEMP - PHIOLD(I)))
1   PHINEW(I) = TEMP
    DO 2 I = I1, NLOC1
2   PHIOLD(I) = PHINEW(I)
    CALL GLOBALMAX(TEST)
    IF (TEST > CONVG) GO TO BEGIN, ELSE STOP

Foil 23 Collective Communication Primitives

The example uses three idealized primitives (not present as such in a real message passing system)
Message Passing Shift to right MPSHIFT (+1, SOURCE, DEST)
Sends 1 word in location SOURCE to processor on the right
Receives word in location DEST from the processor on the left
SOURCE and DEST are locations -- if set to "DUMMY", then no information is to be sent or received
Message Passing Shift to left MPSHIFT (-1, SOURCE, DEST)
Sends 1 word in SOURCE to processor on the left
Receives word in DEST from processor on the right
GLOBALMAX (TEST)
takes TEST from all processors
forms TESTMAX = maximum value of TEST over all processors
replaces TEST by TESTMAX in all processors

Foil 24 Implementation of MPSHIFT(+1, SOURCE, DEST)

Consider, for example, the shift to the right
Then processors 0 .. Nproc-2 send to the right
    • and processors 1 .. Nproc-1 receive from the left
We can't necessarily send all messages and then receive all messages. A "standard" send can be "blocking", i.e. it will not return unless the receive has been completed by the destination processor. In this case we can deadlock, as all processors "hang" after the send, waiting for a receive
So we are more careful

Foil 25 Possible Implementation of MPSHIFT in MPI

Let PROCNUM be the processor "rank" (number or order in the
one dimensional decomposition)
    IPAR = MOD(PROCNUM, 2)
    IF (IPAR.EQ.0) THEN      ! even processors
      IF (SOURCE.NE."DUMMY")
        CALL MPI_SEND (SOURCE, 1, MPI_REAL, PROCNUM+1, tag, comm, ierr)
      IF (DEST.NE."DUMMY")
        CALL MPI_RECV (DEST, 1, MPI_REAL, PROCNUM-1, tag, comm, status, ierr)
    ELSE                     ! odd processors
      IF (DEST.NE."DUMMY")
        CALL MPI_RECV (DEST, 1, MPI_REAL, PROCNUM-1, tag, comm, status, ierr)
      IF (SOURCE.NE."DUMMY")
        CALL MPI_SEND (SOURCE, 1, MPI_REAL, PROCNUM+1, tag, comm, ierr)
    ENDIF
Note: MPI uses the reserved value MPI_PROC_NULL rather than "DUMMY". Also, we could remove the setting of "DUMMY" in the calling routine and place the test on PROCNUM inside MPSHIFT itself

Foil 26 Implementation of SHIFT in MPI

We can implement MPSHIFT directly in MPI as CALL MPI_SENDRECV(SOURCE,1,MPI_REAL,PROCNUM+1,sendtag, DEST,1,MPI_REAL,PROCNUM-1,recvtag, comm,status,ierr)
Notes:
MPI_REAL denotes that the variable is real
"sendtag"/"recvtag" are, for this purpose, largely irrelevant additional message tags
"comm" is the communicator defining the "scope" -- i.e. the set of processors involved -- here it is all of them
"status" tells you about the received data. You needn't look at it if you trust your code and hardware

Foil 27 Implementation of GLOBALMAX (TEST)

In MPI, this is essentially a single call
CALL MPI_ALLREDUCE (TEST, TESTMAX, 1, MPI_REAL, MPI_MAX, comm, ierror)
followed by TEST = TESTMAX (the send and receive buffers must be distinct)
The flag MPI_MAX specifies the global maximum
The implementation is quite subtle and usually involves a logarithmic tree
There is a clever extension to get the maximum in all processors in the "same" time as that on one processor, on a hypercube and other architectures
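
The same call in C might look as follows (a sketch; note the distinct send and receive buffers):

  float test = 0.0f, testmax;   /* test holds the local convergence measure */

  /* Global maximum over all processors, result available on every processor */
  MPI_Allreduce(&test, &testmax, 1, MPI_FLOAT, MPI_MAX, MPI_COMM_WORLD);
  test = testmax;               /* "replace TEST by TESTMAX", as on Foil 23 */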

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
