Given by Geoffrey C. Fox at CPS615 Basic Simulation Track for Computational Science on Fall Semester 95. Foils prepared 10 Oct 1995
Summary of Material
This covers MPI from a user's point of view and is meant to be a supplement to other online resources: the MPI Forum documents, David Walker's Tutorial, Ian Foster's "Designing and Building Parallel Programs", and Gropp, Lusk and Skjellum's "Using MPI" |
An overview is based on a subset of 6 routines covering send/receive, environment inquiry (for rank and total number of processors), initialization and finalization, with simple examples |
Processor Groups, Collective Communication and Computation and Derived Datatypes are also discussed |
Geoffrey Fox |
NPAC |
Syracuse University |
111 College Place |
Syracuse NY 13244-4100 |
MPI collected ideas from many previous message passing systems and put them into a "standard" so we could write portable (runs on all current machines) and scalable (runs on future machines we can think of) parallel software |
MPI was agreed in May 1994 after a process that began with a workshop in April 1992 |
MPI plays the same role for message passing systems that HPF does for data parallel languages |
BUT whereas MPI has essentially all one could want, since message passing is fully understood, HPF will still evolve because many data parallel compiler issues remain unsolved |
HPF runs on SIMD and MIMD machines and is high level as it expresses a style of programming or problem architecture |
MPI runs on MIMD machines (in principle it could run on SIMD machines but this would be unnatural and inefficient) -- it expresses a machine architecture |
The traditional software model has high level languages compiled down to a universal machine language |
So in this analogy MPI is the universal "machine language" of parallel processing |
Point to Point Message Passing |
Collective Communication -- messages between >2 simultaneous processes |
Support for Process Groups -- messaging in subsets of processors |
Support for communication contexts -- general specification of message labels, ensuring they are unique to a set of routines, as in a precompiled library |
Support for application (virtual) topologies analogous to distribution types in HPF |
Inquiry routines to find out about environment such as number of processors |
The "kitchen sink" has 129 functions, each with many arguments |
It is not a complete operating environment and does not have the ability to create and spawn processes etc. |
PVM is the previously dominant approach |
MPI sits outside the distributed computing world of HTTP on the Web, ATM protocols and systems like ISIS from Cornell |
However it does look as though MPI is being adopted as the general messaging system by parallel computer vendors |
We find a good example when we consider a typical matrix algorithm (matrix multiplication): |
A(i,j) = Σ_k B(i,k) C(k,j) |
summed over k, combining the k'th column of B with the k'th row of C |
Consider a square decomposition of 16 by 16 matrices B and C, as for Laplace's equation (a good choice) |
Each operation involves a subset (group) of 4 processors, as sketched below |
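A minimal sketch of how such groups of 4 could be formed, assuming 16 processes arranged as a logical 4 by 4 grid (the splitting rule and the name row_comm are illustrative, not from the foil): |

#include "mpi.h"
/* Sketch: split MPI_COMM_WORLD into 4 row groups of 4 processes each,
   assuming 16 processes arranged as a 4 by 4 grid (rank = 4*row + col) */
int main(int argc, char **argv)
{
    int rank;
    MPI_Comm row_comm;                    /* communicator for this processor's grid row */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_split(MPI_COMM_WORLD, rank / 4, rank % 4, &row_comm);
    /* ... the broadcasts and sums needed by the matrix multiply now run inside row_comm ... */
    MPI_Comm_free(&row_comm);
    MPI_Finalize();
    return 0;
}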
All MPI routines are prefixed by MPI_ |
MPI constants are in upper case, e.g. the MPI datatype MPI_FLOAT for a floating point number in C |
Specify overall constants with the appropriate include file: mpi.h in C, mpif.h in Fortran |
C routines are actually integer functions and always return a status (error) code |
Fortran routines are really subroutines and return the status code as their last argument |
There is a set of predefined constants in the include files for each language and these include: |
MPI_SUCCESS -- successful return code |
MPI_COMM_WORLD (everything) and MPI_COMM_SELF(current process) are predefined reserved communicators in C and Fortran |
Fortran elementary datatypes are: |
MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_COMPLEX, MPI_DOUBLE_COMPLEX, MPI_LOGICAL, MPI_CHARACTER, MPI_BYTE, MPI_PACKED |
C elementary datatypes are: |
MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_UNSIGNED_CHAR, MPI_UNSIGNED_SHORT, MPI_UNSIGNED, MPI_UNSIGNED_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_BYTE, MPI_PACKED |
call MPI_INIT(mpierr) -- initialize |
call MPI_COMM_RANK (comm,rank,mpierr) -- find processor label (rank) in group |
call MPI_COMM_SIZE(comm,size,mpierr) -- find total number of processors |
call MPI_SEND (sndbuf,count,datatype,dest,tag,comm,mpierr) -- send a message |
call MPI_RECV (recvbuf,count,datatype,source,tag,comm,status,mpierr) -- receive a message |
call MPI_FINALIZE(mpierr) -- End Up |
This MUST be called to set up MPI before any other MPI routines may be called |
For C: int MPI_Init(int *argc, char **argv[]) |
For Fortran: call MPI_INIT(mpierr) |
This allows you to identify each processor by a unique integer called the rank which runs from 0 to N-1 where there are N processors |
If we divide the region 0 to 1 by domain decomposition into N parts, the processor with rank r controls the subregion from r/N to (r+1)/N |
For C: int MPI_Comm_rank(MPI_Comm comm, int *rank) |
For Fortran: call MPI_COMM_RANK (comm,rank,mpierr) |
This returns in the integer size the number of processes in the given communicator comm (remember this specifies the processor group) |
For C: int MPI_Comm_size(MPI_Comm comm, int *size) |
For Fortran: call MPI_COMM_SIZE (comm,size,mpierr) |
Before an MPI application exits, it is courteous to clean up the MPI state, and MPI_FINALIZE does this. No MPI routine may be called in a given process after that process has called MPI_FINALIZE |
For C: int MPI_Finalize() |
For Fortran: call MPI_FINALIZE(mpierr) |
#include <stdio.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* set up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* find this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* find total number of processes */
    printf("Hello World from process %d of %d\n", rank, size);
    MPI_Finalize();                          /* clean up MPI */
    return 0;
}
Parallel I/O has technical issues -- how best to optimize access to a file whose contents may be stored on N different disks which can deliver data in parallel -- and |
semantic issues -- what does printf in C (and PRINT in Fortran) mean? |
The meaning of printf/PRINT is both undefined and changing |
Today, memory costs have declined and ALL mainstream MIMD distributed memory machines whether clusters of workstations or integrated systems such as T3D/ Paragon/ SP-2 have enough memory on each node to run UNIX |
Thus printf today typically means that the node on which it runs will write the output to the "standard output" file for that node |
New MPI-IO initiative will link I/O to MPI in a standard fashion |
Data is propagated between processors via messages, which can be divided into packets, but at the MPI level we see only logically complete messages |
The building block is Point to Point Communication with one processor sending information and one other receiving it |
Collective communication involves more than one message |
Collective communication can ALWAYS be implemented in terms of elementary point to point communications, but is provided for convenience and because the library implementation can be more efficient than what a user would write |
Communication is between two processors and the receiving process must expect a message, although it can be uncertain as to the message type and sending process |
Information required to specify a message includes the data (buffer address, count and datatype) and the envelope (destination or source, message tag and communicator) |
Two types of communication operations are applicable to send and receive: blocking and nonblocking |
In addition there are four types of send operation: standard, ready, synchronous and buffered |
call MPI_SEND (sndbuf, count, datatype, dest, tag, comm, mpierr) -- the arguments are the address of the data to send, the number of elements, their MPI datatype, the rank of the destination process, the message tag, the communicator and the returned error code |
Fortran example: |
call MPI_SEND (sndbuf,count,datatype,dest,tag,comm,mpierr) |
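In C the corresponding syntax (the standard C binding, added here for comparison with the MPI_Recv form given below) is |
int error_code = MPI_Send(void *start_of_buffer, int buffer_len, MPI_Datatype datatype, int dest_rank, int tag, MPI_Comm communicator) |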
call MPI_RECV (recvbuf, count, datatype, source, tag, comm, status, mpierr) |
Note that the returned status is used after completion of the receive to find the actual received length (the count given to MPI_RECV is a MAXIMUM), the actual source processor rank and the actual message tag |
In C the syntax is |
int error_message = MPI_Recv(void *start_of_buffer, int buffer_len, MPI_Datatype datatype, int source_rank, int tag, MPI_Comm communicator, MPI_Status *return_status) |
integer status(MPI_STATUS_SIZE)   ! an array to store the returned status
integer mpierr, count, datatype, source, tag, comm
real recvbuf(100)
count=100
datatype=MPI_REAL
comm=MPI_COMM_WORLD
source=MPI_ANY_SOURCE             ! accept any source processor
tag=MPI_ANY_TAG                   ! accept any message tag
call MPI_RECV (recvbuf,count,datatype,source,tag,comm,status,mpierr)
Note source and tag can be wild-carded |
#include "mpi.h" |
main( int argc, char **argv ) |
{
|
} |
In C status is a structure of type MPI_Status; status.MPI_SOURCE and status.MPI_TAG give the actual source and tag |
In Fortran the status is an integer array and different elements give the actual source and tag: status(MPI_SOURCE) and status(MPI_TAG) |
In C and Fortran, the number of elements (called count) in the message can be found from a call to |
MPI_GET_COUNT (IN status, IN datatype, OUT count, OUT error_message) |
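A small C sketch (a hypothetical two-process program, not from the foils) showing how the returned status and MPI_GET_COUNT are used after a wildcarded receive: |

#include <stdio.h>
#include "mpi.h"
/* Hypothetical two-process example: rank 1 sends 37 integers, rank 0 receives with
   wildcarded source and tag and then inspects the status and MPI_Get_count */
int main(int argc, char **argv)
{
    int rank, count, i, buf[100];
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1) {
        for (i = 0; i < 37; i++) buf[i] = i;
        MPI_Send(buf, 37, MPI_INT, 0, 5, MPI_COMM_WORLD);
    } else if (rank == 0) {
        MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Get_count(&status, MPI_INT, &count);   /* actual number of elements received (at most 100) */
        printf("received %d ints from rank %d with tag %d\n",
               count, status.MPI_SOURCE, status.MPI_TAG);
    }
    MPI_Finalize();
    return 0;
}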
SEND Blocking Nonblocking |
Standard MPI_Send MPI_Isend |
Ready MPI_Rsend MPI_Irsend |
Synchronous MPI_Ssend MPI_Issend |
Buffered MPI_Bsend MPI_Ibsend |
RECEIVE Blocking Nonblocking |
Standard MPI_Recv MPI_Irecv |
Any type of receive routine can be used to receive messages from any type of send routine |
MPI_BARRIER(comm) Global Synchronization within a given communicator |
MPI_BCAST Global Broadcast |
MPI_GATHER Concatenate data from all processors in a communicator into one process |
MPI_SCATTER takes data from one processor and scatters over all processors |
MPI_ALLTOALL sends data from all processes to all other processes |
MPI_SENDRECV exchanges data between two processors -- often used to implement "shifts" |
#include "mpi.h" |
main( int argc, char **argv ) |
{
|
} |
One can often perform computing during a collective communication |
MPI_REDUCE performs a reduction operation with type chosen from MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_LOR, MPI_LXOR, MPI_BAND, MPI_BOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC (or a user-defined operation) |
MPI_ALLREDUCE is the same as MPI_REDUCE but stores the result in all processors, not just one |
MPI_SCAN performs reductions with result for processor r depending on data in processors 0 to r |
Four Processors where each has a send buffer of size 2 |
0 1 2 3 Processors |
(2,4) (5,7) (0,3) (6,2) Initial Send Buffers |
MPI_BCAST with root=2 |
(0,3) (0,3) (0,3) (0,3) Resultant Buffers |
MPI_REDUCE with action MPI_MIN and root=0 |
(0,2) (_,_) (_,_) (_,_) Resultant Buffers |
MPI_ALLREDUCE with action MPI_MIN (no root is needed as the result goes to all processors) |
(0,2) (0,2) (0,2) (0,2) Resultant Buffers |
MPI_REDUCE with action MPI_SUM and root=1 |
(_,_) (13,16) (_,_) (_,_) Resultant Buffers |
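A C sketch of the MPI_MIN cases above (assuming it is run with exactly 4 processes; the buffer names are illustrative): |

#include "mpi.h"
/* Run with 4 processes so the initial buffers are (2,4) (5,7) (0,3) (6,2);
   MPI_Reduce leaves (0,2) on root 0 and MPI_Allreduce leaves (0,2) everywhere */
int main(int argc, char **argv)
{
    static const int initial[4][2] = { {2,4}, {5,7}, {0,3}, {6,2} };
    int rank, sendbuf[2], recvbuf[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendbuf[0] = initial[rank][0];
    sendbuf[1] = initial[rank][1];
    MPI_Reduce(sendbuf, recvbuf, 2, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);   /* result only on rank 0 */
    MPI_Allreduce(sendbuf, recvbuf, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);   /* result on all ranks */
    MPI_Finalize();
    return 0;
}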
Four Processors where each has a send buffer of size 2 |
0 1 2 3 Processors |
(2,4) (5,7) (0,3) (6,2) Initial Send Buffers |
MPI_SENDRECV with 0,1 and 2,3 paired |
(5,7) (2,4) (6,2) (0,3) Resultant Buffers |
MPI_GATHER with root=0 |
(2,4,5,7,0,3,6,2) (_,_) (_,_) (_,_) Resultant Buffers |
Four Processors where only rank=0 has send buffer |
(2,4,5,7,0,3,6,2) (_,_) (_,_) (_,_) Initial send Buffers |
MPI_SCATTER with root=0 |
(2,4) (5,7) (0,3) (6,2) Resultant Buffers |
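A C sketch of the gather and scatter patterns above (assuming 4 processes; the buffer names are illustrative): |

#include "mpi.h"
/* Each process contributes 2 values; MPI_Gather concatenates them into all[8]
   on root 0, and MPI_Scatter hands 2 values back out to each process */
int main(int argc, char **argv)
{
    int rank, mine[2], all[8];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    mine[0] = 2 * rank;                                                  /* illustrative data */
    mine[1] = 2 * rank + 1;
    MPI_Gather(mine, 2, MPI_INT, all, 2, MPI_INT, 0, MPI_COMM_WORLD);    /* all[] is filled on rank 0 only */
    MPI_Scatter(all, 2, MPI_INT, mine, 2, MPI_INT, 0, MPI_COMM_WORLD);   /* rank 0 distributes 2 values to each */
    MPI_Finalize();
    return 0;
}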
All to All Communication with i'th location in j'th processor being sent to j'th location in i'th processor |
Processor 0 1 2 3 |
Start (a0,a1,a2,a3) (b0,b1,b2,b3) (c0,c1,c2,c3) (d0,d1,d2,d3) |
After (a0,b0,c0,d0) (a1,b1,c1,d1) (a2,b2,c2,d2) (a3,b3,c3,d3) |
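A C sketch of this all-to-all transpose (assuming 4 processes; the buffer names are illustrative): |

#include "mpi.h"
/* Run with 4 processes: element i of process j's send buffer arrives as
   element j of process i's receive buffer, i.e. the data is transposed */
int main(int argc, char **argv)
{
    int rank, i, sendbuf[4], recvbuf[4];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 4; i++)
        sendbuf[i] = 10 * rank + i;       /* rank 0 holds a0..a3, rank 1 holds b0..b3, ... */
    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);
    /* rank j now holds (a_j, b_j, c_j, d_j) */
    MPI_Finalize();
    return 0;
}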
There are extensions such as MPI_ALLTOALLV to handle the case where the data is stored in a noncontiguous fashion in each processor and where each processor sends different amounts of data to other processors |
Many MPI routines have such "vector" extensions |
These are an elegant solution to a problem we struggled with a lot in the early days -- all message passing is naturally built on buffers holding contiguous data |
However often (usually) the data is not stored contiguously. One can address this with a set of small MPI_SEND commands but we want messages to be as big as possible as latency is so high |
One can copy all the data elements into a single buffer and transmit this but this is tedious for the user and not very efficient |
It has extra memory to memory copies which are often quite slow |
So derived datatypes can be used to set up arbitrary memory templates with variable offsets and primitive datatypes. Derived datatypes can then be used in "ordinary" MPI calls in place of primitive datatypes MPI_REAL, MPI_FLOAT, etc. |
Derived Datatypes should be declared integer in Fortran and MPI_Datatype in C |
Generally have form { (type0,disp0), (type1,disp1) ... (type(n-1),disp(n-1)) } with list of primitive data types typei and displacements (from start of buffer) dispi |
call MPI_TYPE_CONTIGUOUS (count, oldtype, newtype, ierr) creates a new datatype newtype consisting of count contiguous copies of oldtype |
one must use call MPI_TYPE_COMMIT(derivedtype,ierr) |
before one can use the type derivedtype in a communication call |
call MPI_TYPE_FREE(derivedtype,ierr) frees up space used by this derived type |
integer derivedtype, ... |
call MPI_TYPE_CONTIGUOUS(10, MPI_REAL, derivedtype, ierr) |
call MPI_TYPE_COMMIT(derivedtype, ierr) |
call MPI_SEND(data, 1, derivedtype, dest,tag, MPI_COMM_WORLD, ierr) |
call MPI_TYPE_FREE(derivedtype, ierr) |
is equivalent to simpler single call |
call MPI_SEND(data, 10, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr) |
and each sends 10 contiguous real values at location data to process dest |
MPI_TYPE_VECTOR (count,blocklen,stride,oldtype,newtype,ierr) creates newtype as count blocks, each of blocklen copies of oldtype, with the starts of consecutive blocks separated by stride elements of oldtype |
MPI_TYPE_INDEXED (count,blocklens,indices,oldtype,newtype,ierr) is similar but allows a different block length (blocklens) and displacement (indices, in units of oldtype) for each of the count blocks -- see the sketch below |
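A minimal C sketch of MPI_TYPE_INDEXED (the block lengths and displacements are just an illustration, not from the foils): |

#include "mpi.h"
/* Build a derived type that picks two blocks out of an array of MPI_FLOATs:
   3 elements starting at offset 0 and 2 elements starting at offset 6 */
int main(int argc, char **argv)
{
    int blocklens[2] = { 3, 2 };
    int indices[2]   = { 0, 6 };      /* displacements measured in units of the old type */
    MPI_Datatype picktype;
    MPI_Init(&argc, &argv);
    MPI_Type_indexed(2, blocklens, indices, MPI_FLOAT, &picktype);
    MPI_Type_commit(&picktype);       /* picktype can now replace a primitive datatype in sends/receives */
    MPI_Type_free(&picktype);
    MPI_Finalize();
    return 0;
}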
Assume each processor stores an NLOC by NLOC set of grid points in an array PHI dimensioned PHI(NLOC2,NLOC2) with NLOC2=NLOC+2 to establish guard rings |
integer north, south, east, west   ! processor ranks of the 4 nearest neighbors
integer rowtype, coltype           ! the new derived types
! Fortran stores column elements contiguously (C has the opposite convention!)
call MPI_TYPE_CONTIGUOUS(NLOC, MPI_REAL, coltype, ierr)
call MPI_TYPE_COMMIT(coltype, ierr)
! rows (north and south boundaries) are not contiguous
call MPI_TYPE_VECTOR(NLOC, 1, NLOC2, MPI_REAL, rowtype, ierr)
call MPI_TYPE_COMMIT(rowtype, ierr)
call MPI_SEND(PHI(2,2), 1, coltype, west, 0, comm, ierr)
call MPI_SEND(PHI(2,NLOC+1), 1, coltype, east, 0, comm, ierr)
call MPI_SEND(PHI(2,2), 1, rowtype, north, 0, comm, ierr)
call MPI_SEND(PHI(NLOC+1,2), 1, rowtype, south, 0, comm, ierr)
More general versions of MPI_?SEND and associated inquiry routines to see if messages have arrived. Use of these allows you to overlap communication and computation; in general this is not used even though it is more efficient (a sketch is given after this list) |
Application Topology routines allow you to find the rank of the nearest neighbor processors, such as north, south, east and west in a Jacobi iteration |
Packing and Unpacking of data to make single buffers -- derived datatypes are usually a more elegant approach to this |
Communicators to set up subgroups of processors (remember the matrix example) and to set up independent MPI universes, as needed to build libraries so that messages generated by a library do not interfere with those from other libraries or user code |
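A minimal C sketch of the nonblocking send/receive style mentioned in the first item of this list (assuming 2 processes; the buffer sizes and tags are illustrative): |

#include "mpi.h"
/* Post a nonblocking receive and send, do local computation while the
   messages are in transit, then wait for both operations to complete */
int main(int argc, char **argv)
{
    int rank, i, inbuf[100], outbuf[100];
    MPI_Request reqs[2];
    MPI_Status stats[2];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 100; i++) outbuf[i] = rank;
    MPI_Irecv(inbuf, 100, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(outbuf, 100, MPI_INT, (rank + 1) % 2, 0, MPI_COMM_WORLD, &reqs[1]);  /* partner rank assumes 2 processes */
    /* ... computation that does not touch inbuf or outbuf can overlap here ... */
    MPI_Waitall(2, reqs, stats);      /* both operations are now complete */
    MPI_Finalize();
    return 0;
}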