Abstract
The Message Passing Interface (MPI) is a message passing library standard
based on the consensus of the MPI Forum, which has over 40 participating
organizations, including vendors, researchers, software library developers,
and users. The goal of MPI is to establish a portable, efficient, and
flexible standard for message passing that will be widely used for writing
message passing programs. As such, MPI is the first standardized,
vendor-independent message passing library. The advantages of developing
message passing software using MPI closely match these design goals of
portability, efficiency, and flexibility. MPI is not an IEEE or ISO
standard, but it has, in fact, become the "industry standard" for writing
message passing programs on HPC platforms.
The goal of this tutorial is to teach those unfamiliar with MPI how to develop and run parallel programs according to the MPI standard. The primary topics that are presented focus on those which are the most useful for new MPI programmers. The tutorial begins with an introduction, background, and basic information for getting started with MPI. This is followed by a detailed look at the MPI routines that are most useful for new MPI programmers, including MPI Environment Management, Point-to-Point Communications, and Collective Communications routines. Numerous examples in both C and Fortran are provided, as well as a lab exercise.
The tutorial materials also include more advanced topics such as Derived Data Types, Group and Communicator Management Routines, and Virtual Topologies. However, these are not actually presented during the lecture, but are meant to serve as "further reading" for those who are interested.
Level/Prerequisites: This tutorial is ideal for those who are new to parallel programming with MPI. A basic understanding of parallel programming in C or Fortran is assumed. For those who are unfamiliar with parallel programming in general, the material covered in EC3500: Introduction To Parallel Computing would be helpful.
What is MPI?
An Interface Specification:
History and Evolution:
Reasons for Using MPI:
Programming Model:
Getting Started
Header File:
| C include file | Fortran include file |
|---|---|
| #include "mpi.h" | include 'mpif.h' |
Format of MPI Calls:
| | C Binding | Fortran Binding |
|---|---|---|
| Format: | rc = MPI_Xxxxx(parameter, ...) | call mpi_xxxxx(parameter,..., ierr) |
| Example: | rc = MPI_Init(&argc,&argv) | call MPI_INIT(ierr) |
| Error code: | Returned as "rc". MPI_SUCCESS if successful | Returned as "ierr" parameter. MPI_SUCCESS if successful |
General MPI Program Structure:
Communicators and Groups:
Rank:
Environment Management Routines
MPI environment management routines are used for an assortment of purposes, such as initializing and terminating the MPI environment, querying the environment and identity, etc. Most of the commonly used ones are described below.
MPI_INIT (ierr): Initializes the MPI execution environment. Must be called once in every MPI program, before any other MPI routine.
MPI_COMM_SIZE (comm,size,ierr): Returns the total number of MPI tasks in the specified communicator, such as MPI_COMM_WORLD.
MPI_COMM_RANK (comm,rank,ierr): Returns the rank of the calling MPI task within the specified communicator.
MPI_ABORT (comm,errorcode,ierr): Terminates all MPI tasks associated with a communicator.
MPI_GET_PROCESSOR_NAME (name,resultlength,ierr): Returns the processor name and the length of that name.
MPI_INITIALIZED (flag,ierr): Indicates whether MPI_INIT has been called.
MPI_WTIME (): Returns elapsed wall-clock time in seconds (double precision) on the calling processor.
MPI_WTICK (): Returns the resolution in seconds of MPI_WTIME.
MPI_FINALIZE (ierr): Terminates the MPI execution environment. Should be the last MPI routine called in a program.
#include "mpi.h" #include <stdio.h> int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, rc; rc = MPI_Init(&argc,&argv); if (rc != MPI_SUCCESS) { printf ("Error starting MPI program. Terminating.\n"); MPI_Abort(MPI_COMM_WORLD, rc); } MPI_Comm_size(MPI_COMM_WORLD,&numtasks); MPI_Comm_rank(MPI_COMM_WORLD,&rank); printf ("Number of tasks= %d My rank= %d\n", numtasks,rank); /******* do some work *******/ MPI_Finalize(); } |
      program simple
      include 'mpif.h'

      integer numtasks, rank, ierr, rc

      call MPI_INIT(ierr)
      if (ierr .ne. MPI_SUCCESS) then
         print *,'Error starting MPI program. Terminating.'
         call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
      end if

      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
      print *, 'Number of tasks=',numtasks,' My rank=',rank

C     ****** do some work ******

      call MPI_FINALIZE(ierr)
      end
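Neither example above exercises the timing routines. The short sketch below (an illustration added here, not part of the original example set) shows the typical use of MPI_Wtime and MPI_Wtick to time a section of work; the loop being timed is just a placeholder.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
   double start, end, resolution;
   double sum = 0.0;
   int i;

   MPI_Init(&argc, &argv);

   resolution = MPI_Wtick();          /* timer resolution in seconds */
   start = MPI_Wtime();               /* wall-clock time in seconds  */

   for (i = 0; i < 1000000; i++)      /* placeholder for real work   */
      sum += (double)i;

   end = MPI_Wtime();
   printf("Work took %f seconds (timer resolution %e, sum= %f)\n",
          end - start, resolution, sum);

   MPI_Finalize();
}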
Point-to-Point Communication Routines
Order and Fairness:
Point-to-Point Communication Routines
| Point-to-point operation | Routine (C naming) |
|---|---|
| Blocking send | MPI_Send (buf,count,datatype,dest,tag,comm) |
| Non-blocking send | MPI_Isend (buf,count,datatype,dest,tag,comm,request) |
| Blocking receive | MPI_Recv (buf,count,datatype,source,tag,comm,status) |
| Non-blocking receive | MPI_Irecv (buf,count,datatype,source,tag,comm,request) |
| C Data Types | | Fortran Data Types | |
|---|---|---|---|
| MPI_CHAR | signed char | MPI_CHARACTER | character(1) |
| MPI_SHORT | signed short int | | |
| MPI_INT | signed int | MPI_INTEGER | integer |
| MPI_LONG | signed long int | | |
| MPI_UNSIGNED_CHAR | unsigned char | | |
| MPI_UNSIGNED_SHORT | unsigned short int | | |
| MPI_UNSIGNED | unsigned int | | |
| MPI_UNSIGNED_LONG | unsigned long int | | |
| MPI_FLOAT | float | MPI_REAL | real |
| MPI_DOUBLE | double | MPI_DOUBLE_PRECISION | double precision |
| MPI_LONG_DOUBLE | long double | | |
| | | MPI_COMPLEX | complex |
| | | MPI_DOUBLE_COMPLEX | double complex |
| | | MPI_LOGICAL | logical |
| MPI_BYTE | 8 binary digits | MPI_BYTE | 8 binary digits |
| MPI_PACKED | data packed or unpacked with MPI_Pack()/MPI_Unpack() | MPI_PACKED | data packed or unpacked with MPI_Pack()/MPI_Unpack() |
Notes:
Point-to-Point Communication Routines
MPI_SEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_RECV (buf,count,datatype,source,tag,comm,status,ierr)
MPI_SSEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_BSEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_BUFFER_ATTACH (buffer,size,ierr)
MPI_BUFFER_DETACH (buffer,size,ierr)
MPI_RSEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_SENDRECV (sendbuf,sendcount,sendtype,dest,sendtag, recvbuf,recvcount,recvtype,source,recvtag, comm,status,ierr)
MPI_WAIT (request,status,ierr)
MPI_WAITANY (count,array_of_requests,index,status,ierr)
MPI_WAITALL (count,array_of_requests,array_of_statuses,ierr)
MPI_WAITSOME (incount,array_of_requests,outcount, array_of_offsets,array_of_statuses,ierr)
MPI_PROBE (source,tag,comm,status,ierr)
Task 0 pings task 1 and awaits return ping
#include "mpi.h" #include <stdio.h> int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, dest, source, rc, count, tag=1; char inmsg, outmsg='x'; MPI_Status Stat; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); MPI_Comm_rank(MPI_COMM_WORLD, &rank); if (rank == 0) { dest = 1; source = 1; rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD); rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat); } else if (rank == 1) { dest = 0; source = 0; rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat); rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD); } rc = MPI_Get_count(&Stat, MPI_CHAR, &count); printf("Task %d: Received %d char(s) from task %d with tag %d \n", rank, count, Stat.MPI_SOURCE, Stat.MPI_TAG); MPI_Finalize(); } |
      program ping
      include 'mpif.h'

      integer numtasks, rank, dest, source, count, tag, ierr
      integer stat(MPI_STATUS_SIZE)
      character inmsg, outmsg
      outmsg = 'x'
      tag = 1

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      if (rank .eq. 0) then
         dest = 1
         source = 1
         call MPI_SEND(outmsg, 1, MPI_CHARACTER, dest, tag,
     &                 MPI_COMM_WORLD, ierr)
         call MPI_RECV(inmsg, 1, MPI_CHARACTER, source, tag,
     &                 MPI_COMM_WORLD, stat, ierr)
      else if (rank .eq. 1) then
         dest = 0
         source = 0
         call MPI_RECV(inmsg, 1, MPI_CHARACTER, source, tag,
     &                 MPI_COMM_WORLD, stat, ierr)
         call MPI_SEND(outmsg, 1, MPI_CHARACTER, dest, tag,
     &                 MPI_COMM_WORLD, ierr)
      endif

      call MPI_GET_COUNT(stat, MPI_CHARACTER, count, ierr)
      print *, 'Task ',rank,': Received', count, 'char(s) from task',
     &         stat(MPI_SOURCE), 'with tag',stat(MPI_TAG)

      call MPI_FINALIZE(ierr)
      end
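MPI_Sendrecv, listed above but not used in the ping example, combines a blocking send and a blocking receive in one call and sidesteps the deadlock that can occur when two tasks each post a blocking send first. The following is a minimal sketch (assuming exactly two tasks) of the same character exchange written with MPI_Sendrecv.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
   int rank, numtasks, partner, tag = 1;
   char outmsg = 'x', inmsg;
   MPI_Status stat;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   if (numtasks == 2) {
      partner = (rank == 0) ? 1 : 0;   /* exchange with the other task */

      /* send outmsg to partner and receive inmsg from partner in one call */
      MPI_Sendrecv(&outmsg, 1, MPI_CHAR, partner, tag,
                   &inmsg, 1, MPI_CHAR, partner, tag,
                   MPI_COMM_WORLD, &stat);

      printf("Task %d received '%c' from task %d\n",
             rank, inmsg, stat.MPI_SOURCE);
   }
   else if (rank == 0)
      printf("Must specify 2 tasks. Terminating.\n");

   MPI_Finalize();
}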
Point-to-Point Communication Routines
MPI_ISEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_IRECV (buf,count,datatype,source,tag,comm,request,ierr)
MPI_ISSEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_IBSEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_IRSEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_TEST (request,flag,status,ierr)
MPI_TESTANY (count,array_of_requests,index,flag,status,ierr)
MPI_TESTALL (count,array_of_requests,flag,array_of_statuses,ierr)
MPI_TESTSOME (incount,array_of_requests,outcount, array_of_offsets,array_of_statuses,ierr)
MPI_IPROBE (source,tag,comm,flag,status,ierr)
Nearest neighbor exchange in ring topology
#include "mpi.h" #include <stdio.h> int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, next, prev, buf[2], tag1=1, tag2=2; MPI_Request reqs[4]; MPI_Status stats[4]; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); MPI_Comm_rank(MPI_COMM_WORLD, &rank); prev = rank-1; next = rank+1; if (rank == 0) prev = numtasks - 1; if (rank == (numtasks - 1)) next = 0; MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]); MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]); MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]); MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]); { do some work } MPI_Waitall(4, reqs, stats); MPI_Finalize(); } |
      program ringtopo
      include 'mpif.h'

      integer numtasks, rank, next, prev, buf(2), tag1, tag2, ierr
      integer stats(MPI_STATUS_SIZE,4), reqs(4)
      tag1 = 1
      tag2 = 2

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

C     determine left and right neighbors in the ring
      prev = rank - 1
      next = rank + 1
      if (rank .eq. 0) then
         prev = numtasks - 1
      endif
      if (rank .eq. numtasks - 1) then
         next = 0
      endif

      call MPI_IRECV(buf(1), 1, MPI_INTEGER, prev, tag1,
     &               MPI_COMM_WORLD, reqs(1), ierr)
      call MPI_IRECV(buf(2), 1, MPI_INTEGER, next, tag2,
     &               MPI_COMM_WORLD, reqs(2), ierr)

      call MPI_ISEND(rank, 1, MPI_INTEGER, prev, tag2,
     &               MPI_COMM_WORLD, reqs(3), ierr)
      call MPI_ISEND(rank, 1, MPI_INTEGER, next, tag1,
     &               MPI_COMM_WORLD, reqs(4), ierr)

C     do some work

      call MPI_WAITALL(4, reqs, stats, ierr)

      call MPI_FINALIZE(ierr)
      end
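The ring example simply blocks in MPI_Waitall after its work. When progress needs to be checked without blocking, MPI_Test can be polled instead. The sketch below (an added illustration, not one of the original examples; it assumes at least two tasks) keeps doing placeholder work on task 1 until a message posted with MPI_Irecv has arrived.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
   int rank, numtasks, flag = 0, data = 0, work = 0, tag = 1;
   MPI_Request req;
   MPI_Status stat;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   if (numtasks >= 2) {
      if (rank == 0) {
         data = 42;
         MPI_Send(&data, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
      }
      else if (rank == 1) {
         MPI_Irecv(&data, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &req);
         while (!flag) {
            work++;                          /* placeholder for useful work */
            MPI_Test(&req, &flag, &stat);    /* has the message arrived yet? */
         }
         printf("Task 1 received %d after %d work iterations\n", data, work);
      }
   }
   else if (rank == 0)
      printf("Must specify at least 2 tasks. Terminating.\n");

   MPI_Finalize();
}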
Collective Communication Routines
All or None:
Types of Collective Operations:
MPI_BARRIER (comm,ierr)
Perform a scatter operation on the rows of an array
#include "mpi.h" #include <stdio.h> #define SIZE 4 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, sendcount, recvcount, source; float sendbuf[SIZE][SIZE] = { {1.0, 2.0, 3.0, 4.0}, {5.0, 6.0, 7.0, 8.0}, {9.0, 10.0, 11.0, 12.0}, {13.0, 14.0, 15.0, 16.0} }; float recvbuf[SIZE]; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); if (numtasks == SIZE) { source = 1; sendcount = SIZE; recvcount = SIZE; MPI_Scatter(sendbuf,sendcount,MPI_FLOAT,recvbuf,recvcount, MPI_FLOAT,source,MPI_COMM_WORLD); printf("rank= %d Results: %f %f %f %f\n",rank,recvbuf[0], recvbuf[1],recvbuf[2],recvbuf[3]); } else printf("Must specify %d processors. Terminating.\n",SIZE); MPI_Finalize(); } |
      program scatter
      include 'mpif.h'

      integer SIZE
      parameter(SIZE=4)
      integer numtasks, rank, sendcount, recvcount, source, ierr
      real*4 sendbuf(SIZE,SIZE), recvbuf(SIZE)

C     Fortran stores this array in column major order, so the
C     scatter will actually scatter columns, not rows.
      data sendbuf /1.0, 2.0, 3.0, 4.0,
     &              5.0, 6.0, 7.0, 8.0,
     &              9.0, 10.0, 11.0, 12.0,
     &              13.0, 14.0, 15.0, 16.0 /

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      if (numtasks .eq. SIZE) then
         source = 1
         sendcount = SIZE
         recvcount = SIZE
         call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf,
     &        recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
         print *, 'rank= ',rank,' Results: ',recvbuf
      else
         print *, 'Must specify',SIZE,' processors. Terminating.'
      endif

      call MPI_FINALIZE(ierr)
      end
rank= 0 Results: 1.000000 2.000000 3.000000 4.000000
rank= 1 Results: 5.000000 6.000000 7.000000 8.000000
rank= 2 Results: 9.000000 10.000000 11.000000 12.000000
rank= 3 Results: 13.000000 14.000000 15.000000 16.000000
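Scatter is only one of the collective operations. As a further illustration (a minimal sketch added here, not one of the original examples), the program below broadcasts a value from task 0 with MPI_Bcast and then uses MPI_Reduce to sum a per-task contribution back onto task 0.

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[]) {
   int rank, numtasks, root = 0;
   int base = 0, contribution, total = 0;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   if (rank == root)
      base = 100;

   /* every task receives the value of base held by the root task */
   MPI_Bcast(&base, 1, MPI_INT, root, MPI_COMM_WORLD);

   /* each task contributes base + its own rank; the sum lands on root */
   contribution = base + rank;
   MPI_Reduce(&contribution, &total, 1, MPI_INT, MPI_SUM, root,
              MPI_COMM_WORLD);

   if (rank == root)
      printf("Sum over %d tasks = %d\n", numtasks, total);

   MPI_Finalize();
}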
Derived Data Types
C Data Types: MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_UNSIGNED_CHAR, MPI_UNSIGNED_SHORT, MPI_UNSIGNED_LONG, MPI_UNSIGNED, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_BYTE, MPI_PACKED

Fortran Data Types: MPI_CHARACTER, MPI_INTEGER, MPI_REAL, MPI_DOUBLE_PRECISION, MPI_COMPLEX, MPI_DOUBLE_COMPLEX, MPI_LOGICAL, MPI_BYTE, MPI_PACKED
MPI_TYPE_CONTIGUOUS (count,oldtype,newtype,ierr)
MPI_TYPE_VECTOR (count,blocklength,stride,oldtype,newtype,ierr)
MPI_TYPE_INDEXED (count,blocklens(),offsets(),old_type,newtype,ierr)
MPI_TYPE_STRUCT (count,blocklens(),offsets(),old_types,newtype,ierr)
MPI_TYPE_EXTENT (datatype,extent,ierr)
MPI_TYPE_COMMIT (datatype,ierr)
MPI_TYPE_FREE (datatype,ierr)
Create a data type representing a row of an array and distribute a
different row to all processes.
#include "mpi.h" #include <stdio.h> #define SIZE 4 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, source=0, dest, tag=1, i; float a[SIZE][SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0}; float b[SIZE]; MPI_Status stat; MPI_Datatype rowtype; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); MPI_Type_contiguous(SIZE, MPI_FLOAT, &rowtype); MPI_Type_commit(&rowtype); if (numtasks == SIZE) { if (rank == 0) { for (i=0; i<numtasks; i++) MPI_Send(&a[i][0], 1, rowtype, i, tag, MPI_COMM_WORLD); } MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat); printf("rank= %d b= %3.1f %3.1f %3.1f %3.1f\n", rank,b[0],b[1],b[2],b[3]); } else printf("Must specify %d processors. Terminating.\n",SIZE); MPI_Type_free(&rowtype); MPI_Finalize(); } |
      program contiguous
      include 'mpif.h'

      integer SIZE
      parameter(SIZE=4)
      integer numtasks, rank, source, dest, tag, i, ierr
      real*4 a(0:SIZE-1,0:SIZE-1), b(0:SIZE-1)
      integer stat(MPI_STATUS_SIZE), columntype

C     Fortran stores this array in column major order
      data a /1.0, 2.0, 3.0, 4.0,
     &        5.0, 6.0, 7.0, 8.0,
     &        9.0, 10.0, 11.0, 12.0,
     &        13.0, 14.0, 15.0, 16.0 /

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      call MPI_TYPE_CONTIGUOUS(SIZE, MPI_REAL, columntype, ierr)
      call MPI_TYPE_COMMIT(columntype, ierr)

      tag = 1
      if (numtasks .eq. SIZE) then
         if (rank .eq. 0) then
            do 10 i=0, numtasks-1
               call MPI_SEND(a(0,i), 1, columntype, i, tag,
     &                       MPI_COMM_WORLD, ierr)
  10        continue
         endif

         source = 0
         call MPI_RECV(b, SIZE, MPI_REAL, source, tag,
     &                 MPI_COMM_WORLD, stat, ierr)
         print *, 'rank= ',rank,' b= ',b
      else
         print *, 'Must specify',SIZE,' processors. Terminating.'
      endif

      call MPI_TYPE_FREE(columntype, ierr)
      call MPI_FINALIZE(ierr)
      end
rank= 0 b= 1.0 2.0 3.0 4.0
rank= 1 b= 5.0 6.0 7.0 8.0
rank= 2 b= 9.0 10.0 11.0 12.0
rank= 3 b= 13.0 14.0 15.0 16.0
Create a data type representing a column of an array and distribute
different columns to all processes.
#include "mpi.h" #include <stdio.h> #define SIZE 4 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, source=0, dest, tag=1, i; float a[SIZE][SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0}; float b[SIZE]; MPI_Status stat; MPI_Datatype columntype; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); MPI_Type_vector(SIZE, 1, SIZE, MPI_FLOAT, &columntype); MPI_Type_commit(&columntype); if (numtasks == SIZE) { if (rank == 0) { for (i=0; i<numtasks; i++) MPI_Send(&a[0][i], 1, columntype, i, tag, MPI_COMM_WORLD); } MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat); printf("rank= %d b= %3.1f %3.1f %3.1f %3.1f\n", rank,b[0],b[1],b[2],b[3]); } else printf("Must specify %d processors. Terminating.\n",SIZE); MPI_Type_free(&columntype); MPI_Finalize(); } |
      program vector
      include 'mpif.h'

      integer SIZE
      parameter(SIZE=4)
      integer numtasks, rank, source, dest, tag, i, ierr
      real*4 a(0:SIZE-1,0:SIZE-1), b(0:SIZE-1)
      integer stat(MPI_STATUS_SIZE), rowtype

C     Fortran stores this array in column major order
      data a /1.0, 2.0, 3.0, 4.0,
     &        5.0, 6.0, 7.0, 8.0,
     &        9.0, 10.0, 11.0, 12.0,
     &        13.0, 14.0, 15.0, 16.0 /

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      call MPI_TYPE_VECTOR(SIZE, 1, SIZE, MPI_REAL, rowtype, ierr)
      call MPI_TYPE_COMMIT(rowtype, ierr)

      tag = 1
      if (numtasks .eq. SIZE) then
         if (rank .eq. 0) then
            do 10 i=0, numtasks-1
               call MPI_SEND(a(i,0), 1, rowtype, i, tag,
     &                       MPI_COMM_WORLD, ierr)
  10        continue
         endif

         source = 0
         call MPI_RECV(b, SIZE, MPI_REAL, source, tag,
     &                 MPI_COMM_WORLD, stat, ierr)
         print *, 'rank= ',rank,' b= ',b
      else
         print *, 'Must specify',SIZE,' processors. Terminating.'
      endif

      call MPI_TYPE_FREE(rowtype, ierr)
      call MPI_FINALIZE(ierr)
      end
rank= 0 b= 1.0 5.0 9.0 13.0
rank= 1 b= 2.0 6.0 10.0 14.0
rank= 2 b= 3.0 7.0 11.0 15.0
rank= 3 b= 4.0 8.0 12.0 16.0
Create a datatype by extracting variable portions of an array and distribute it to all tasks.
#include "mpi.h" #include <stdio.h> #define NELEMENTS 6 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, source=0, dest, tag=1, i; int blocklengths[2], displacements[2]; float a[16] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0}; float b[NELEMENTS]; MPI_Status stat; MPI_Datatype indextype; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); blocklengths[0] = 4; blocklengths[1] = 2; displacements[0] = 5; displacements[1] = 12; MPI_Type_indexed(2, blocklengths, displacements, MPI_FLOAT, &indextype); MPI_Type_commit(&indextype); if (rank == 0) { for (i=0; i<numtasks; i++) MPI_Send(a, 1, indextype, i, tag, MPI_COMM_WORLD); } MPI_Recv(b, NELEMENTS, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat); printf("rank= %d b= %3.1f %3.1f %3.1f %3.1f %3.1f %3.1f\n", rank,b[0],b[1],b[2],b[3],b[4],b[5]); MPI_Type_free(&indextype); MPI_Finalize(); } |
      program indexed
      include 'mpif.h'

      integer NELEMENTS
      parameter(NELEMENTS=6)
      integer numtasks, rank, source, dest, tag, i, ierr
      integer blocklengths(0:1), displacements(0:1)
      real*4 a(0:15), b(0:NELEMENTS-1)
      integer stat(MPI_STATUS_SIZE), indextype

      data a /1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
     &        9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0 /

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      blocklengths(0) = 4
      blocklengths(1) = 2
      displacements(0) = 5
      displacements(1) = 12

      call MPI_TYPE_INDEXED(2, blocklengths, displacements, MPI_REAL,
     &                      indextype, ierr)
      call MPI_TYPE_COMMIT(indextype, ierr)

      tag = 1
      if (rank .eq. 0) then
         do 10 i=0, numtasks-1
            call MPI_SEND(a, 1, indextype, i, tag, MPI_COMM_WORLD,
     &                    ierr)
  10     continue
      endif

      source = 0
      call MPI_RECV(b, NELEMENTS, MPI_REAL, source, tag,
     &              MPI_COMM_WORLD, stat, ierr)
      print *, 'rank= ',rank,' b= ',b

      call MPI_TYPE_FREE(indextype, ierr)
      call MPI_FINALIZE(ierr)
      end
rank= 0 b= 6.0 7.0 8.0 9.0 13.0 14.0
rank= 1 b= 6.0 7.0 8.0 9.0 13.0 14.0
rank= 2 b= 6.0 7.0 8.0 9.0 13.0 14.0
rank= 3 b= 6.0 7.0 8.0 9.0 13.0 14.0
Create a data type that represents a particle and distribute an array
of such particles to all processes.
#include "mpi.h" #include <stdio.h> #define NELEM 25 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, source=0, dest, tag=1, i; typedef struct { float x, y, z; float velocity; int n, type; } Particle; Particle p[NELEM], particles[NELEM]; MPI_Datatype particletype, oldtypes[2]; int blockcounts[2]; /* MPI_Aint type used to be consistent with syntax of */ /* MPI_Type_extent routine */ MPI_Aint offsets[2], extent; MPI_Status stat; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); /* Setup description of the 4 MPI_FLOAT fields x, y, z, velocity */ offsets[0] = 0; oldtypes[0] = MPI_FLOAT; blockcounts[0] = 4; /* Setup description of the 2 MPI_INT fields n, type */ /* Need to first figure offset by getting size of MPI_FLOAT */ MPI_Type_extent(MPI_FLOAT, &extent); offsets[1] = 4 * extent; oldtypes[1] = MPI_INT; blockcounts[1] = 2; /* Now define structured type and commit it */ MPI_Type_struct(2, blockcounts, offsets, oldtypes, &particletype); MPI_Type_commit(&particletype); /* Initialize the particle array and then send it to each task */ if (rank == 0) { for (i=0; i<NELEM; i++) { particles[i].x = i * 1.0; particles[i].y = i * -1.0; particles[i].z = i * 1.0; particles[i].velocity = 0.25; particles[i].n = i; particles[i].type = i % 2; } for (i=0; i<numtasks; i++) MPI_Send(particles, NELEM, particletype, i, tag, MPI_COMM_WORLD); } MPI_Recv(p, NELEM, particletype, source, tag, MPI_COMM_WORLD, &stat); /* Print a sample of what was received */ printf("rank= %d %3.2f %3.2f %3.2f %3.2f %d %d\n", rank,p[3].x, p[3].y,p[3].z,p[3].velocity,p[3].n,p[3].type); MPI_Type_free(&particletype); MPI_Finalize(); } |
      program struct
      include 'mpif.h'

      integer NELEM
      parameter(NELEM=25)
      integer numtasks, rank, source, dest, tag, i, ierr
      integer stat(MPI_STATUS_SIZE)

      type Particle
        sequence
        real*4 x, y, z, velocity
        integer n, type
      end type Particle

      type (Particle) p(0:NELEM-1), particles(0:NELEM-1)
      integer particletype, oldtypes(0:1), blockcounts(0:1),
     &        offsets(0:1), extent

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

C     Setup description of the 4 MPI_REAL fields x, y, z, velocity
      offsets(0) = 0
      oldtypes(0) = MPI_REAL
      blockcounts(0) = 4

C     Setup description of the 2 MPI_INTEGER fields n, type
C     Need to first figure offset by getting size of MPI_REAL
      call MPI_TYPE_EXTENT(MPI_REAL, extent, ierr)
      offsets(1) = 4 * extent
      oldtypes(1) = MPI_INTEGER
      blockcounts(1) = 2

C     Now define structured type and commit it
      call MPI_TYPE_STRUCT(2, blockcounts, offsets, oldtypes,
     &                     particletype, ierr)
      call MPI_TYPE_COMMIT(particletype, ierr)

C     Initialize the particle array and then send it to each task
      tag = 1
      if (rank .eq. 0) then
         do 10 i=0, NELEM-1
            particles(i) = Particle ( 1.0*i, -1.0*i, 1.0*i,
     &                                0.25, i, mod(i,2) )
  10     continue

         do 20 i=0, numtasks-1
            call MPI_SEND(particles, NELEM, particletype, i, tag,
     &                    MPI_COMM_WORLD, ierr)
  20     continue
      endif

      source = 0
      call MPI_RECV(p, NELEM, particletype, source, tag,
     &              MPI_COMM_WORLD, stat, ierr)

      print *, 'rank= ',rank,' p(3)= ',p(3)

      call MPI_TYPE_FREE(particletype, ierr)
      call MPI_FINALIZE(ierr)
      end
rank= 0 3.00 -3.00 3.00 0.25 3 1
rank= 2 3.00 -3.00 3.00 0.25 3 1
rank= 1 3.00 -3.00 3.00 0.25 3 1
rank= 3 3.00 -3.00 3.00 0.25 3 1
Group and Communicator Management Routines
Groups vs. Communicators:
Primary Purposes of Group and Communicator Objects:
Programming Considerations and Restrictions:
Create two different process groups for separate collective communications exchange. Requires creating new communicators also.
#include "mpi.h" #include <stdio.h> #define NPROCS 8 int main(argc,argv) int argc; char *argv[]; { int rank, new_rank, sendbuf, recvbuf, numtasks, ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7}; MPI_Group orig_group, new_group; MPI_Comm new_comm; MPI_Init(&argc,&argv); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); if (numtasks != NPROCS) { printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS); MPI_Finalize(); exit(0); } sendbuf = rank; /* Extract the original group handle */ MPI_Comm_group(MPI_COMM_WORLD, &orig_group); /* Divide tasks into two distinct groups based upon rank */ if (rank < NPROCS/2) { MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group); } else { MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group); } /* Create new new communicator and then perform collective communications */ MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm); MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm); MPI_Group_rank (new_group, &new_rank); printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf); MPI_Finalize(); } |
      program group
      include 'mpif.h'

      integer NPROCS
      parameter(NPROCS=8)
      integer rank, new_rank, sendbuf, recvbuf, numtasks
      integer ranks1(4), ranks2(4), ierr
      integer orig_group, new_group, new_comm
      data ranks1 /0, 1, 2, 3/, ranks2 /4, 5, 6, 7/

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      if (numtasks .ne. NPROCS) then
         print *, 'Must specify MP_PROCS= ',NPROCS,' Terminating.'
         call MPI_FINALIZE(ierr)
         stop
      endif

      sendbuf = rank

C     Extract the original group handle
      call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)

C     Divide tasks into two distinct groups based upon rank
      if (rank .lt. NPROCS/2) then
         call MPI_GROUP_INCL(orig_group, NPROCS/2, ranks1,
     &                       new_group, ierr)
      else
         call MPI_GROUP_INCL(orig_group, NPROCS/2, ranks2,
     &                       new_group, ierr)
      endif

C     Create new communicator and then perform collective communications
      call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group,
     &                     new_comm, ierr)
      call MPI_ALLREDUCE(sendbuf, recvbuf, 1, MPI_INTEGER,
     &                   MPI_SUM, new_comm, ierr)

      call MPI_GROUP_RANK(new_group, new_rank, ierr)
      print *, 'rank= ',rank,' newrank= ',new_rank,' recvbuf= ',
     &         recvbuf

      call MPI_FINALIZE(ierr)
      end
rank= 7 newrank= 3 recvbuf= 22
rank= 0 newrank= 0 recvbuf= 6
rank= 1 newrank= 1 recvbuf= 6
rank= 2 newrank= 2 recvbuf= 6
rank= 6 newrank= 2 recvbuf= 22
rank= 3 newrank= 3 recvbuf= 6
rank= 4 newrank= 0 recvbuf= 22
rank= 5 newrank= 1 recvbuf= 22
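The same two-way split can also be obtained with MPI_Comm_split, which builds new communicators directly from a "color" value instead of going through explicit group extraction and inclusion. The following sketch (an alternative illustration, not the original example) reproduces the half-communicator MPI_Allreduce using this approach.

#include "mpi.h"
#include <stdio.h>
#define NPROCS 8

int main(int argc, char *argv[]) {
   int rank, numtasks, new_rank, color, sendbuf, recvbuf;
   MPI_Comm new_comm;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &numtasks);

   if (numtasks != NPROCS) {
      if (rank == 0)
         printf("Must specify %d tasks. Terminating.\n", NPROCS);
      MPI_Finalize();
      return 0;
   }

   sendbuf = rank;
   color = rank / (NPROCS/2);   /* 0 for ranks 0-3, 1 for ranks 4-7 */

   /* tasks with the same color end up in the same new communicator,
      ordered within it by their original rank (the "key" argument) */
   MPI_Comm_split(MPI_COMM_WORLD, color, rank, &new_comm);

   MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
   MPI_Comm_rank(new_comm, &new_rank);

   printf("rank= %d newrank= %d recvbuf= %d\n", rank, new_rank, recvbuf);

   MPI_Comm_free(&new_comm);
   MPI_Finalize();
}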
Virtual Topologies
What Are They?
Why Use Them?
Example:
A simplified mapping of processes into a Cartesian virtual topology appears below:
Create a 4 x 4 Cartesian topology from 16 processors and have each process exchange its rank with four neighbors.
#include "mpi.h" #include <stdio.h> #define SIZE 16 #define UP 0 #define DOWN 1 #define LEFT 2 #define RIGHT 3 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, source, dest, outbuf, i, tag=1, inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,}, nbrs[4], dims[2]={4,4}, periods[2]={0,0}, reorder=0, coords[2]; MPI_Request reqs[8]; MPI_Status stats[8]; MPI_Comm cartcomm; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); if (numtasks == SIZE) { MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cartcomm); MPI_Comm_rank(cartcomm, &rank); MPI_Cart_coords(cartcomm, rank, 2, coords); MPI_Cart_shift(cartcomm, 0, 1, &nbrs[UP], &nbrs[DOWN]); MPI_Cart_shift(cartcomm, 1, 1, &nbrs[LEFT], &nbrs[RIGHT]); outbuf = rank; for (i=0; i<4; i++) { dest = nbrs[i]; source = nbrs[i]; MPI_Isend(&outbuf, 1, MPI_INT, dest, tag, MPI_COMM_WORLD, &reqs[i]); MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag, MPI_COMM_WORLD, &reqs[i+4]); } MPI_Waitall(8, reqs, stats); printf("rank= %d coords= %d %d neighbors(u,d,l,r)= %d %d %d %d\n", rank,coords[0],coords[1],nbrs[UP],nbrs[DOWN],nbrs[LEFT], nbrs[RIGHT]); printf("rank= %d inbuf(u,d,l,r)= %d %d %d %d\n", rank,inbuf[UP],inbuf[DOWN],inbuf[LEFT],inbuf[RIGHT]); } else printf("Must specify %d processors. Terminating.\n",SIZE); MPI_Finalize(); } |
      program cartesian
      include 'mpif.h'

      integer SIZE, UP, DOWN, LEFT, RIGHT
      parameter(SIZE=16)
      parameter(UP=1)
      parameter(DOWN=2)
      parameter(LEFT=3)
      parameter(RIGHT=4)
      integer numtasks, rank, source, dest, outbuf, i, tag, ierr,
     &        inbuf(4), nbrs(4), dims(2), coords(2),
     &        stats(MPI_STATUS_SIZE, 8), reqs(8), cartcomm,
     &        periods(2), reorder
      data inbuf /MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,
     &            MPI_PROC_NULL/, dims /4,4/, tag /1/,
     &            periods /0,0/, reorder /0/

      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)

      if (numtasks .eq. SIZE) then
         call MPI_CART_CREATE(MPI_COMM_WORLD, 2, dims, periods,
     &                        reorder, cartcomm, ierr)
         call MPI_COMM_RANK(cartcomm, rank, ierr)
         call MPI_CART_COORDS(cartcomm, rank, 2, coords, ierr)
         print *,'rank= ',rank,'coords= ',coords

         call MPI_CART_SHIFT(cartcomm, 0, 1, nbrs(UP), nbrs(DOWN),
     &                       ierr)
         call MPI_CART_SHIFT(cartcomm, 1, 1, nbrs(LEFT), nbrs(RIGHT),
     &                       ierr)

         outbuf = rank

         do i=1,4
            dest = nbrs(i)
            source = nbrs(i)
            call MPI_ISEND(outbuf, 1, MPI_INTEGER, dest, tag,
     &                     MPI_COMM_WORLD, reqs(i), ierr)
            call MPI_IRECV(inbuf(i), 1, MPI_INTEGER, source, tag,
     &                     MPI_COMM_WORLD, reqs(i+4), ierr)
         enddo

         call MPI_WAITALL(8, reqs, stats, ierr)

         print *,'rank= ',rank,' coords= ',coords,
     &           ' neighbors(u,d,l,r)= ',nbrs
         print *,'rank= ',rank,' inbuf(u,d,l,r)= ',inbuf
      else
         print *, 'Must specify',SIZE,' processors. Terminating.'
      endif

      call MPI_FINALIZE(ierr)
      end
rank= 0 coords= 0 0 neighbors(u,d,l,r)= -3 4 -3 1
rank= 0 inbuf(u,d,l,r)= -3 4 -3 1
rank= 1 coords= 0 1 neighbors(u,d,l,r)= -3 5 0 2
rank= 1 inbuf(u,d,l,r)= -3 5 0 2
rank= 2 coords= 0 2 neighbors(u,d,l,r)= -3 6 1 3
rank= 2 inbuf(u,d,l,r)= -3 6 1 3
. . . . .
rank= 14 coords= 3 2 neighbors(u,d,l,r)= 10 -3 13 15
rank= 14 inbuf(u,d,l,r)= 10 -3 13 15
rank= 15 coords= 3 3 neighbors(u,d,l,r)= 11 -3 14 -3
rank= 15 inbuf(u,d,l,r)= 11 -3 14 -3
A Brief Word on MPI-2
History:
Key Areas of New Functionality:
More Information on MPI-2:
LLNL Specific Information and Recommendations
MPI Implementations:
| Platform | Implementations | Comments |
|---|---|---|
| IBM AIX | IBM MPI library | Thread-safe |
| Intel Linux | Quadrics MPI | Clusters with a switch. Not thread-safe. |
| Intel Linux | MPICH | Clusters without a switch. On-node communications only. Not thread-safe. |
| Opteron Linux | MVAPICH | Clusters with a switch. Not thread-safe. |
| Opteron Linux | MPICH | Clusters without a switch. On-node communications only. Not thread-safe. |
Compiling, Running and More:
This completes the tutorial.
Please complete the online evaluation form - unless you are doing the exercise, in which case please complete it at the end of the exercise.
References and More Information
Appendix A: MPI-1 Routine Index