Given by Ian Foster, Gina Goff, Ehtesham Hayder, and Chuck Koelbel at the DoD Modernization Tutorial, 1995-1998. Foils prepared August 29, 1998
Summary of Material
Day 1
Day 2
A standard message-passing library
An MPI program defines a set of processes
... that communicate by calling MPI functions
... and can be constructed in a modular fashion
MPI defines a language-independent interface
Bindings are defined for different languages
Multiple implementations
Special compiler commands for simple programs
Options enable MPI profiling features
Standard makefiles for larger programs
To run a program on two processors
To list available command-line arguments
To list commands mpirun would execute
Standard does not specify startup mechanism
1st generation message passing systems allowed only a contiguous array of bytes
MPI specifies the buffer by starting address, datatype, and count
Specifying elementary datatypes allows heterogeneous communication
Replacing length with count is clearer
Specifying application-oriented layouts allows maximal use of special hardware
1st generation message passing systems used hardware addresses
MPI supports process groups
All communication takes place in groups
1st generation message passing systems used an integer "tag" (a.k.a. "type" or "id") to match messages when received
Calls Sub1 and Sub2 are from different libraries
Same sequence of calls on all processes, with no global synch
Each library was self-consistent
Interaction between the libraries killed them
The lesson:
Other examples teach other lessons:
A separate communication context for each family of messages
No wild cards allowed, for security
Allocated by the system, for security
Tags retained for use within a context
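The matching rule described above can be sketched in plain C. This is an illustrative model only, not MPI's implementation: the `envelope` struct, `envelope_matches`, and the `ANY_SOURCE`/`ANY_TAG` constants are hypothetical stand-ins (the real `MPI_ANY_SOURCE` and `MPI_ANY_TAG` values are implementation-defined).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)

/* Envelope of a message or of a posted receive (illustrative, not an MPI type). */
typedef struct {
    int context;  /* system-allocated id of the communication context */
    int source;   /* sending rank, or ANY_SOURCE in a receive */
    int tag;      /* user-chosen tag, or ANY_TAG in a receive */
} envelope;

/* Returns true when a posted receive may accept an incoming message. */
bool envelope_matches(envelope recv, envelope msg)
{
    /* A message that was actually sent never carries a wildcard. */
    assert(msg.source != ANY_SOURCE && msg.tag != ANY_TAG);
    if (recv.context != msg.context)   /* no wildcard exists for the context */
        return false;
    if (recv.source != ANY_SOURCE && recv.source != msg.source)
        return false;
    return recv.tag == ANY_TAG || recv.tag == msg.tag;
}
```

Under this model, a receive posted by one library with wildcards for both source and tag still cannot intercept a message sent in another library's context, which is the failure the two-library example illustrates.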
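Thus the basic (blocking) send is: MPI_SEND(start, count, datatype, dest, tag, comm)
and the receive is: MPI_RECV(start, count, datatype, source, tag, comm, status)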
MPI is very simple: 6 functions allow you to write many programs
      program main
      include 'mpif.h'
      integer rank, size, tag, count, i, ierr
      integer src, dest, st_source, st_tag, st_count
      integer data
      integer status(MPI_STATUS_SIZE)
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, size, ierr )
      print *, 'Process ', rank, ' of ', size, ' is alive'
      ...
      ...
c     Pass a running total from rank 0 down the line to the last rank
      dest = size - 1
      tag = 99
      count = 1
      if (rank .eq. 0) then
         data = 0
         call MPI_SEND( data, 1, MPI_INTEGER, rank+1, 99,
     +                  MPI_COMM_WORLD, ierr )
      else if (rank .ne. dest) then
         call MPI_RECV( data, count, MPI_INTEGER, rank-1,
     +                  tag, MPI_COMM_WORLD, status, ierr )
         data = data + rank
         call MPI_SEND( data, 1, MPI_INTEGER, rank+1, 99,
     +                  MPI_COMM_WORLD, ierr )
      else
         call MPI_RECV( data, count, MPI_INTEGER, rank-1,
     +                  tag, MPI_COMM_WORLD, status, ierr )
         print *, rank, ' received', data
      endif
      ...
      ...
      call MPI_FINALIZE( ierr )
      end
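As a check on what the program above should print, the pipeline can be traced serially. The following is a minimal sketch in plain C with no MPI calls; `pipeline_result` is a hypothetical helper name, and it assumes at least two processes, since rank 0 sends to rank 1.

```c
#include <assert.h>

/* Serial stand-in for the pipeline: returns the value that the last rank
   (size - 1) would print. No MPI calls are made. */
int pipeline_result(int size)
{
    assert(size >= 2);             /* rank 0 needs a right-hand neighbor */
    int data = 0;                  /* rank 0 starts the pipeline with 0 */
    for (int rank = 1; rank < size - 1; rank++)
        data += rank;              /* each middle rank adds its own rank */
    return data;                   /* the last rank receives without adding */
}
```

For four processes the middle ranks 1 and 2 contribute, so the last rank would report 3; with only two processes it would report 0.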
Collective communication
Nonblocking communication
Intercommunicators
Cartesian Grids
Provides standard interfaces to common global operations
A collective operation uses a process group
Message tags not needed (generated internally)
Useful for producing meaningful timings
Typically, synchronization comes from messages themselves
"ALL" routines deliver results to all participating processes
Routines ending in "V" allow different sized inputs on different processors
      call MPI_COMM_RANK( comm, rank, ierr )
      if (rank .eq. 0) then
         read *, n
      end if
c     Every process receives n from rank 0
      call MPI_BCAST( n, 1, MPI_INTEGER, 0, comm, ierr )
      lo = rank*n+1
      hi = lo+n-1
      sum = 0.0d0
      do i = lo, hi
         sum = sum + 1.0d0 / i
      end do
c     Add the partial sums; every process receives the total in sumout
      call MPI_ALLREDUCE( sum, sumout, 1, MPI_DOUBLE_PRECISION,
     &                    MPI_SUM, comm, ierr )
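The arithmetic of the example above can be checked without MPI. The sketch below is a serial stand-in written in plain C: `partial_sum` mirrors the per-rank loop, and `allreduce_sum` (a hypothetical helper, not an MPI routine) adds the partial sums the way a sum all-reduce would, assuming every rank uses the same `n`.

```c
#include <assert.h>

/* Partial sum computed by one rank: terms lo..hi of the harmonic series,
   with lo = rank*n + 1 and hi = lo + n - 1 as in the example above. */
double partial_sum(int rank, int n)
{
    assert(rank >= 0 && n >= 0);
    int lo = rank * n + 1;
    int hi = lo + n - 1;
    double sum = 0.0;
    for (int i = lo; i <= hi; i++)
        sum += 1.0 / i;
    return sum;
}

/* Serial stand-in for the all-reduce: adds the partial sums of all ranks.
   In the MPI program every rank would end up holding this value. */
double allreduce_sum(int size, int n)
{
    double total = 0.0;
    for (int rank = 0; rank < size; rank++)
        total += partial_sum(rank, n);
    return total;
}
```

With `size` processes and block length `n`, the blocks tile the range 1..size*n without gaps or overlap, so the combined result is the harmonic number of order size*n.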
Where does data go when you send it?
Right is more efficient, but not always correct
Copies are not needed if
MPI provides modes to arrange this
All combinations are legal
MPI provides support for connecting separate message-passing programs
Intercommunicators connect disjoint communicators
Programs use separate (disjoint) communicators for internal messages
To connect them
MPI contains routines to simplify writing programs for regular grid operations
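int size, dims[2] = {0, 0}, periods[2];  /* zero entries let MPI_Dims_create choose */
MPI_Comm comm2d;
MPI_Comm_size( MPI_COMM_WORLD, &size );
MPI_Dims_create( size, 2, dims );
periods[0] = periods[1] = 0;             /* non-periodic in both dimensions */
MPI_Cart_create( MPI_COMM_WORLD, 2, dims,
                 periods, 1, &comm2d );  /* 1 = allow rank reordering */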
Getting neighbors
Sending data
MPI_Cart_shift( comm2d, 0, 1, &nbrleft, &nbrright );
MPI_Cart_shift( comm2d, 1, 1, &nbrbottom, &nbrtop );
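To make the neighbor lookup concrete, here is a plain-C model of what a shift returns on a non-periodic 2-D grid. It is a sketch rather than MPI itself: `cart_rank` and `cart_neighbor` are hypothetical helpers, `PROC_NULL` is a stand-in for `MPI_PROC_NULL` (whose real value is implementation-defined), and the model relies on the row-major rank numbering that MPI uses for Cartesian communicators.

```c
#include <assert.h>

#define PROC_NULL (-1)  /* stand-in for MPI_PROC_NULL */

/* Row-major rank of the process at coordinates (c0, c1); dims1 is the
   extent of the second dimension. */
int cart_rank(int dims1, int c0, int c1)
{
    return c0 * dims1 + c1;
}

/* Neighbor of `rank` displaced by `disp` along `direction` (0 or 1) in a
   non-periodic dims0 x dims1 grid; PROC_NULL if it falls off the grid. */
int cart_neighbor(int dims0, int dims1, int rank, int direction, int disp)
{
    assert(direction == 0 || direction == 1);
    assert(rank >= 0 && rank < dims0 * dims1);
    int c[2] = { rank / dims1, rank % dims1 };  /* rank -> coordinates */
    int dims[2] = { dims0, dims1 };
    c[direction] += disp;
    if (c[direction] < 0 || c[direction] >= dims[direction])
        return PROC_NULL;                       /* no wraparound */
    return cart_rank(dims1, c[0], c[1]);
}
```

In these terms, a shift with displacement 1 along dimension 0 yields `nbrleft` as the neighbor at displacement -1 and `nbrright` as the neighbor at displacement +1. Processes on the edge of the grid get the null process, and sends to or receives from it complete immediately, so boundary processes need no special-case code.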
Numerically solve a PDE on a square mesh
Partitioning is simple
Communication is simple
Agglomeration works along dimensions
Mapping: Cartesian grid directly supported by MPI virtual topologies
For generality, write as the 2-D version
Adjust array bounds, iterate over local array
PARAMETER(nxblock=nx/nxp, nyblock=ny/nyp, nxlocal=nxblock+1, nylocal=nyblock+1)
REAL u(0:nxlocal,0:nylocal),unew(0:nxlocal,0:nylocal),f(0:nxlocal,0:nylocal)
dims(1) = nxp; dims(2) = nyp
periods(1) = .false.; periods(2) = .false.
reorder = .true.
ndim = 2
call MPI_CART_CREATE(MPI_COMM_WORLD,ndim,dims,periods,reorder,comm2d,ierr)
! reorder = .true. may renumber processes, so take rank and coordinates from comm2d
CALL MPI_COMM_RANK( comm2d, myrank, ierr )
CALL MPI_CART_COORDS( comm2d, myrank, 2, coords, ierr )
CALL MPI_CART_SHIFT( comm2d, 0, 1, nbrleft, nbrright, ierr )
CALL MPI_CART_SHIFT( comm2d, 1, 1, nbrbottom, nbrtop, ierr )
! One row of the local block: nyblock elements, stride = leading dimension of u
CALL MPI_TYPE_VECTOR( nyblock, 1, nxlocal+1, MPI_REAL, rowtype, ierr )
CALL MPI_TYPE_COMMIT( rowtype, ierr )
dx = 1.0/nx; dy = 1.0/ny; err = tol * 1e6
DO j = 0, nylocal
DO WHILE (err > tol)