Given by Ian Foster, Gina Goff, Ehtesham Hayder, and Chuck Koelbel at the DoD Modernization Tutorial, 1995-1998. Foils prepared August 29, 1998
(c) 1995, 1996, 1997, 1998 |
Ian Foster Gina Goff |
Ehtesham Hayder Charles Koelbel |
A Tutorial Presented by the Department of Defense HPC Modernization Programming Environment & Training Program |
Day 1
|
Day 2
|
A standard message-passing library
|
An MPI program defines a set of processes
|
... that communicate by calling MPI functions
|
... and can be constructed in a modular fashion
|
MPI defines a language-independent interface
|
Bindings are defined for different languages
|
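As an illustration of the bindings, a minimal C sketch of querying a process rank; the equivalent Fortran call is shown in a comment (the program itself is illustrative, not taken from the tutorial):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);

        /* C binding: the error code is the function's return value */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Fortran binding of the same operation:
         *   CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
         * The extra ierr argument carries the error code. */
        printf("rank %d\n", rank);

        MPI_Finalize();
        return 0;
    }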
Multiple implementations
|
Special compiler commands for simple programs
|
Options enable MPI profiling features
|
Standard makefiles for larger programs
|
To run a program on two processors
|
To list available command-line arguments
|
To list commands mpirun would execute
|
Standard does not specify startup mechanism
|
Questions:
|
1st generation message passing systems allowed only a contiguous array of bytes
|
MPI specifies the buffer by starting address, datatype, and count
|
Specification of elementary datatypes allows heterogeneous communication. |
Elimination of length in favor of count is clearer |
Specifying application-oriented layouts allows maximal use of special hardware |
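A hedged C sketch of an application-oriented layout: a strided column of a matrix is described once with MPI_Type_vector and then sent with a count of 1 (the matrix size N, destination, and tag are invented for the example):

    #include <mpi.h>

    #define N 8   /* illustrative matrix dimension, not from the tutorial */

    /* Send one column of an N x N row-major matrix: N elements, stride N. */
    void send_column(double a[N][N], int col, int dest, MPI_Comm comm)
    {
        MPI_Datatype column;

        MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);  /* count, blocklength, stride */
        MPI_Type_commit(&column);

        /* buffer = starting address, count = 1, datatype = the derived layout */
        MPI_Send(&a[0][col], 1, column, dest, /* tag */ 0, comm);

        MPI_Type_free(&column);
    }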
1st generation message passing systems used hardware addresses
|
MPI supports process groups
|
All communication takes place in groups
|
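A hedged C sketch of working with process groups, assuming MPI_Comm_split is used to divide MPI_COMM_WORLD into two halves (the split criterion is invented for illustration):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Comm half;   /* communicator for this process's half of the machine */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* color selects the subgroup; key (= rank) orders ranks within it */
        MPI_Comm_split(MPI_COMM_WORLD, rank < size / 2 ? 0 : 1, rank, &half);

        /* communication on 'half' involves only that subgroup ... */

        MPI_Comm_free(&half);
        MPI_Finalize();
        return 0;
    }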
1st generation message passing systems used an integer "tag" (a.k.a. "type" or "id") to match messages when received
|
Calls Sub1 and Sub2 are from different libraries |
Same sequence of calls on all processes, with no global synch
|
Each library was self-consistent
|
Interaction between the libraries killed them
|
The lesson:
|
Other examples teach other lessons:
|
A separate communication context for each family of messages
|
No wild cards allowed, for security |
Allocated by the system, for security |
Tags retained for use within a context
|
Thus the basic (blocking) send is:
|
and the receive is:
|
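In the C binding these are (the Fortran binding adds a final integer ierror argument):

    /* buffer = (start, count, datatype); envelope = destination/source, tag, communicator */
    int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm);

    int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                 int source, int tag, MPI_Comm comm, MPI_Status *status);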
MPI is very simple: 6 functions (MPI_INIT, MPI_FINALIZE, MPI_COMM_SIZE, MPI_COMM_RANK, MPI_SEND, MPI_RECV) allow you to write many programs
|
program main |
include 'mpif.h' |
integer rank, size, tag, count, i, ierr |
integer src, dest, data, st_source, st_tag, st_count |
integer status(MPI_STATUS_SIZE) |
call MPI_INIT( ierr ) |
call MPI_COMM_RANK( MPI_COMM_WORLD, rank, ierr ) |
call MPI_COMM_SIZE( MPI_COMM_WORLD, size, ierr ) |
print *, 'Process ', rank, ' of ', size, ' is alive' |
... |
... |
c Pass a value down the line of processes: rank 0 starts it, each |
c intermediate rank adds its own rank, and the last rank prints the result |
dest = size - 1 |
tag = 99 |
count = 1 |
if (rank .eq. 0) then |
data = 0 |
call MPI_SEND( data, 1, MPI_INTEGER, rank+1, 99, |
+ MPI_COMM_WORLD, ierr ) |
else if (rank .ne. dest) then |
call MPI_RECV( data, count, MPI_INTEGER, rank-1, |
+ tag, MPI_COMM_WORLD, status, ierr ) |
data = data + rank |
call MPI_SEND( data, 1, MPI_INTEGER, rank+1, 99, |
+ MPI_COMM_WORLD, ierr ) |
else |
call MPI_RECV( data, count, MPI_INTEGER, rank-1, |
+ tag, MPI_COMM_WORLD, status, ierr ) |
print *, rank, ' received', data |
endif |
... |
... |
call MPI_FINALIZE( ierr ) |
end |
Collective communication
|
Buffering
|
Nonblocking communication
|
Intercommunicators |
Cartesian Grids |
Provides standard interfaces to common global operations
|
A collective operation uses a process group
|
Message tags not needed (generated internally) |
MPI_Barrier(comm)
|
Useful for producing meaningful timings
|
Typically, synchronization comes from messages themselves |
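A hedged C sketch of the usual timing idiom: a barrier so all processes start together, then the difference of two MPI_Wtime readings (the measured region is a placeholder):

    #include <mpi.h>
    #include <stdio.h>

    void time_section(MPI_Comm comm)
    {
        double t0, t1;
        int rank;

        MPI_Comm_rank(comm, &rank);

        MPI_Barrier(comm);        /* line everyone up before starting the clock */
        t0 = MPI_Wtime();

        /* ... code being timed goes here ... */

        MPI_Barrier(comm);        /* make sure everyone has finished */
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("elapsed: %f seconds\n", t1 - t0);
    }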
"ALL" routines deliver results to all participating processes |
Routines ending in "V" allow different sized inputs on different processors |
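A hedged C sketch of a "V" routine, MPI_Gatherv, which collects a different number of elements from each process onto rank 0 (the per-rank counts are invented for illustration):

    #include <mpi.h>
    #include <stdlib.h>

    void gather_variable(MPI_Comm comm)
    {
        int rank, size, i;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int mycount = rank + 1;                 /* each rank contributes a different amount */
        double *mine = malloc(mycount * sizeof(double));
        for (i = 0; i < mycount; i++) mine[i] = rank;

        /* counts, displacements, and the receive buffer matter only at the root */
        int *counts = NULL, *displs = NULL;
        double *all = NULL;
        if (rank == 0) {
            counts = malloc(size * sizeof(int));
            displs = malloc(size * sizeof(int));
            for (i = 0; i < size; i++) {
                counts[i] = i + 1;
                displs[i] = (i * (i + 1)) / 2;   /* prefix sum of the counts */
            }
            all = malloc((size * (size + 1) / 2) * sizeof(double));
        }

        MPI_Gatherv(mine, mycount, MPI_DOUBLE,
                    all, counts, displs, MPI_DOUBLE, 0, comm);

        free(mine); free(counts); free(displs); free(all);
    }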
integer rank, n, lo, hi, i, ierr |
double precision sum, sumout |
call MPI_COMM_RANK( comm, rank, ierr ) |
if (rank .eq. 0) then |
read *, n |
end if |
c Everyone needs n: broadcast it from rank 0 |
call MPI_BCAST( n, 1, MPI_INTEGER, 0, comm, ierr ) |
lo = rank*n+1 |
hi = lo+n-1 |
sum = 0.0d0 |
do i = lo, hi |
sum = sum + 1.0d0 / i |
end do |
c Combine the partial sums; every rank receives the total |
call MPI_ALLREDUCE( sum, sumout, 1, MPI_DOUBLE_PRECISION, |
& MPI_SUM, comm, ierr ) |
Where does data go when you send it?
|
The right-hand approach, which avoids intermediate copies, is more efficient, but not always correct |
Copies are not needed if
|
MPI provides modes to arrange this
|
All combinations are legal
|
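A hedged C sketch of two of those modes: MPI_Bsend always copies into a user-attached buffer, while MPI_Ssend does not complete until the matching receive has started (buffer size and message contents are illustrative):

    #include <mpi.h>
    #include <stdlib.h>

    void send_modes(int dest, MPI_Comm comm)
    {
        double msg[100] = {0};
        int bufsize = 100 * sizeof(double) + MPI_BSEND_OVERHEAD;
        void *buf = malloc(bufsize);

        /* Buffered mode: MPI copies the message, so the send returns right away */
        MPI_Buffer_attach(buf, bufsize);
        MPI_Bsend(msg, 100, MPI_DOUBLE, dest, 0, comm);
        MPI_Buffer_detach(&buf, &bufsize);   /* blocks until buffered data is delivered */

        /* Synchronous mode: completes only after the receiver has started to receive */
        MPI_Ssend(msg, 100, MPI_DOUBLE, dest, 1, comm);

        free(buf);
    }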
MPI provides support for connecting separate message-passing programs |
Intercommunicators connect disjoint communicators |
Programs use separate (disjoint) communicators for internal messages
|
To connect them
|
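A hedged C sketch of one way to connect them, assuming the two programs are modeled as two halves of MPI_COMM_WORLD joined by MPI_Intercomm_create (leader ranks and the peer tag are illustrative):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, color;
        MPI_Comm local, inter;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Two disjoint "programs": lower half and upper half of the ranks */
        color = (rank < size / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local);

        /* Local leader is rank 0 of each half; the remote leader is named by
           its rank in the peer communicator MPI_COMM_WORLD. */
        MPI_Intercomm_create(local, 0, MPI_COMM_WORLD,
                             color == 0 ? size / 2 : 0, /* tag */ 99, &inter);

        /* sends on 'inter' go to processes in the other group ... */

        MPI_Comm_free(&inter);
        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }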
MPI contains routines to simplify writing programs for regular grid operations
|
int size, dims[2], periods[2]; |
MPI_Comm comm2d; |
MPI_Comm_size( MPI_COMM_WORLD, &size ); |
dims[0] = dims[1] = 0;  /* 0 lets MPI_Dims_create choose both factors */ |
MPI_Dims_create( size, 2, dims ); |
periods[0] = periods[1] = 0; |
MPI_Cart_create( MPI_COMM_WORLD, 2, dims, |
periods, 1, &comm2d ); |
Getting neighbors |
Sending data
|
MPI_Cart_shift( comm2d, 0, 1, &nbrleft, &nbrright ); |
MPI_Cart_shift( comm2d, 1, 1, &nbrbottom, &nbrtop ); |
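A hedged C sketch of the sending step, using the neighbor ranks from MPI_Cart_shift in a halo exchange with MPI_Sendrecv (the row buffers and length N are placeholders):

    #include <mpi.h>

    #define N 128   /* illustrative edge length, not from the tutorial */

    /* Exchange boundary data with the left and right neighbors in comm2d.
       Neighbors that fall off a non-periodic grid come back as MPI_PROC_NULL,
       so the calls below are safe at the edges of the mesh. */
    void exchange(double send_left[N], double recv_right[N],
                  double send_right[N], double recv_left[N],
                  int nbrleft, int nbrright, MPI_Comm comm2d)
    {
        MPI_Status status;

        MPI_Sendrecv(send_left, N, MPI_DOUBLE, nbrleft, 0,
                     recv_right, N, MPI_DOUBLE, nbrright, 0,
                     comm2d, &status);
        MPI_Sendrecv(send_right, N, MPI_DOUBLE, nbrright, 1,
                     recv_left, N, MPI_DOUBLE, nbrleft, 1,
                     comm2d, &status);
    }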
Partitioning
|
Communication
|
Agglomeration
|
Mapping
|
Numerically solve a PDE on a square mesh |
Method:
|
Partitioning is simple
|
Communication is simple
|
Agglomeration works along dimensions
|
Mapping: Cartesian grid directly supported by MPI virtual topologies
|
For generality, write as the 2-D version
|
Adjust array bounds, iterate over local array
|
PARAMETER(nxblock=nx/nxp, nyblock=ny/nyp, nxlocal=nxblock+1, nylocal=nyblock+1) |
REAL u(0:nxlocal,0:nylocal),unew(0:nxlocal,0:nylocal),f(0:nxlocal,0:nylocal) |
dims(1) = nxp; dims(2) = nyp; |
periods(1) = .false.; periods(2) = .false. |
reorder = .true. |
ndim = 2 |
call MPI_CART_CREATE(MPI_COMM_WORLD,ndim,dims,periods, reorder,comm2d,ierr) |
CALL MPI_COMM_RANK( comm2d, myrank, ierr ) |
CALL MPI_CART_COORDS( comm2d, myrank, 2, coords, ierr ) |
CALL MPI_CART_SHIFT( comm2d, 0, 1, nbrleft, nbrright, ierr ) |
CALL MPI_CART_SHIFT( comm2d, 1, 1, nbrbottom, nbrtop, ierr ) |
CALL MPI_TYPE_VECTOR( nyblock, 1, nxlocal+1, MPI_REAL, rowtype, ierr ) |
CALL MPI_TYPE_COMMIT( rowtype, ierr ) |
dx = 1.0/nx; dy = 1.0/ny; err = tol * 1e6 |
DO j = 0, nylocal
|
END DO |
DO WHILE (err > tol)
|
END DO |