Module 3
Distributed-Memory Programming with MPI
Roadmap
Writing your first MPI program.
Using the common MPI functions.
The trapezoidal rule in MPI.
Collective communication.
MPI derived datatypes.
Performance evaluation of MPI programs.
A parallel sorting algorithm.
Safety in MPI programs.
Hello World!
(a classic)
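A minimal version of the program whose output appears below (the send/receive details are covered later in this module):

    #include <stdio.h>
    #include <string.h>
    #include <mpi.h>

    int main(void) {
        char greeting[100];
        int comm_sz, my_rank, q;

        MPI_Init(NULL, NULL);
        MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        if (my_rank != 0) {
            /* Every process except 0 builds a greeting and sends it in. */
            sprintf(greeting, "Greetings from process %d of %d!", my_rank, comm_sz);
            MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        } else {
            /* Process 0 prints its own greeting, then the others' in rank order. */
            printf("Greetings from process %d of %d!\n", my_rank, comm_sz);
            for (q = 1; q < comm_sz; q++) {
                MPI_Recv(greeting, 100, MPI_CHAR, q, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("%s\n", greeting);
            }
        }

        MPI_Finalize();
        return 0;
    }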
Compilation

mpicc -g -Wall -o mpi_hello mpi_hello.c

mpicc: wrapper script to compile
-g: produce debugging information
-Wall: turns on all warnings
-o mpi_hello: name the executable mpi_hello (as opposed to the default a.out)
mpi_hello.c: source file
Execution

mpiexec -n <number of processes> <executable>

mpiexec -n 1 ./mpi_hello     (run with 1 process)
mpiexec -n 4 ./mpi_hello     (run with 4 processes)
Execution
mpiexec -n 1 ./mpi_hello
Greetings from process 0 of 1!
mpiexec -n 4 ./mpi_hello
Greetings from process 0 of 4!
Greetings from process 1 of 4!
Greetings from process 2 of 4!
Greetings from process 3 of 4!
MPI Programs
Written in C.
Has main.
Uses stdio.h, string.h, etc.
Need to add the mpi.h header file.
Identifiers defined by MPI start with "MPI_".
MPI Components
MPI_Init: tells MPI to do all the necessary setup.
MPI_Finalize: tells MPI we're done, so clean up anything allocated for this program.
Basic Outline
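A sketch of the outline every MPI program follows (no MPI calls are allowed before MPI_Init or after MPI_Finalize):

    #include <mpi.h>
    ...
    int main(int argc, char* argv[]) {
        ...
        /* No MPI calls before this */
        MPI_Init(&argc, &argv);
        ...
        MPI_Finalize();
        /* No MPI calls after this */
        ...
        return 0;
    }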
Communicators
A collection of processes that can send messages to each other.
MPI_Init defines a communicator that consists of all the processes created when the program is started, called MPI_COMM_WORLD.
Communicators
MPI_Comm_size: the number of processes in the communicator.
MPI_Comm_rank: my rank (the process making this call).
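In practice the two calls almost always look like this:

    int comm_sz;   /* number of processes */
    int my_rank;   /* my rank among them  */
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);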
SPMD
Single Program, Multiple Data.
We compile one program.
Process 0 does something different: it receives the messages and prints them, while the other processes create them.
The if-else construct makes our program SPMD.
Communication
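The sending side of the exchange uses MPI_Send, whose signature is:

    int MPI_Send(
        void*        msg_buf_p,    /* in: pointer to the data       */
        int          msg_size,     /* in: number of elements        */
        MPI_Datatype msg_type,     /* in: type of each element      */
        int          dest,         /* in: rank of the receiver      */
        int          tag,          /* in: distinguishes messages    */
        MPI_Comm     communicator  /* in: e.g. MPI_COMM_WORLD       */
    );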
Data types
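The slide's table of predefined MPI datatypes did not survive extraction; some common ones and the C types they correspond to are:

    MPI_CHAR          char
    MPI_INT           int
    MPI_LONG          long int
    MPI_FLOAT         float
    MPI_DOUBLE        double
    MPI_LONG_DOUBLE   long double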
Communication
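On the receiving side, MPI_Recv has a matching signature plus a status argument:

    int MPI_Recv(
        void*        msg_buf_p,    /* out: where to put the data    */
        int          buf_size,     /* in:  capacity of the buffer   */
        MPI_Datatype buf_type,     /* in:  type of each element     */
        int          source,       /* in:  rank of the sender       */
        int          tag,          /* in:  must match the send tag  */
        MPI_Comm     communicator, /* in:  must match the send comm */
        MPI_Status*  status_p      /* out: or MPI_STATUS_IGNORE     */
    );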
Message matching
Suppose process q calls MPI_Send with dest = r, and process r calls MPI_Recv with src = q.
The message sent by q is received by r's call only if recv_comm = send_comm, recv_tag = send_tag, dest = r, and src = q.
Receiving messages
A receiver can get a message without knowing the amount of data in the message, the sender of the message, or the tag of the message.
status_p argument
MPI_Status is a struct with at least the three members MPI_SOURCE, MPI_TAG, and MPI_ERROR.
MPI_Status status;
status.MPI_SOURCE
status.MPI_TAG
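A sketch of how the status is used after a wildcard receive, together with MPI_Get_count to recover the element count (recv_buf and MAX_SIZE are assumed declared elsewhere):

    MPI_Status status;
    int count;

    MPI_Recv(recv_buf, MAX_SIZE, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
             MPI_COMM_WORLD, &status);

    /* Who sent it, with which tag, and how many elements arrived? */
    printf("source = %d, tag = %d\n", status.MPI_SOURCE, status.MPI_TAG);
    MPI_Get_count(&status, MPI_DOUBLE, &count);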
One trapezoid
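The slide's picture is gone, but the formula it illustrates is standard: with h = (b - a)/n and x_i = a + i*h, the area of the trapezoid over [x_i, x_{i+1}] is

    Area of one trapezoid = (h/2) * [ f(x_i) + f(x_{i+1}) ]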
Parallel pseudo-code
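The pseudo-code itself is missing from the extraction; a C sketch close to the book's first version (Trap is the serial trapezoid helper, and comm_sz is assumed to divide n evenly):

    h = (b - a) / n;                  /* h is the same on all processes   */
    local_n = n / comm_sz;            /* number of trapezoids per process */
    local_a = a + my_rank * local_n * h;
    local_b = local_a + local_n * h;
    local_int = Trap(local_a, local_b, local_n, h);

    if (my_rank != 0) {
        MPI_Send(&local_int, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        total_int = local_int;
        for (source = 1; source < comm_sz; source++) {
            MPI_Recv(&local_int, 1, MPI_DOUBLE, source, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total_int += local_int;
        }
        printf("With n = %d trapezoids, the estimate is %.15e\n", n, total_int);
    }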
Unpredictable output
When several processes write to stdout, the order in which their output appears is nondeterministic and can change from run to run.
Input
Most MPI implementations only allow process 0 in MPI_COMM_WORLD access to stdin, so process 0 reads the data and sends it to the other processes.
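A sketch of a Get_input function in this spirit (later in the module the three sends are replaced by a broadcast or a derived datatype):

    void Get_input(int my_rank, int comm_sz,
                   double* a_p, double* b_p, int* n_p) {
        int dest;
        if (my_rank == 0) {
            printf("Enter a, b, and n\n");
            scanf("%lf %lf %d", a_p, b_p, n_p);
            for (dest = 1; dest < comm_sz; dest++) {
                MPI_Send(a_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
                MPI_Send(b_p, 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
                MPI_Send(n_p, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
            }
        } else {
            MPI_Recv(a_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(b_p, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(n_p, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }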
COLLECTIVE COMMUNICATION
Tree-structured communication
1. In the first phase:
   (a) Process 1 sends to 0, 3 sends to 2, 5 sends to 4, and 7 sends to 6.
   (b) Processes 0, 2, 4, and 6 add in the received values.
   (c) Processes 2 and 6 send their new values to processes 0 and 4, respectively.
   (d) Processes 0 and 4 add the received values into their new values.
2. (a) Process 4 sends its newest value to process 0.
   (b) Process 0 adds the received value to its newest value.
A code sketch of this pattern follows the list.
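A sketch of the tree-structured sum described above, assuming comm_sz is a power of two and my_val holds this process's value:

    double sum = my_val, recv_val;
    int step;
    for (step = 1; step < comm_sz; step *= 2) {
        if (my_rank % (2*step) == 0) {
            /* Receive from the partner `step` away and add it in. */
            MPI_Recv(&recv_val, 1, MPI_DOUBLE, my_rank + step, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += recv_val;
        } else if (my_rank % (2*step) == step) {
            /* Send my partial sum and drop out of later rounds. */
            MPI_Send(&sum, 1, MPI_DOUBLE, my_rank - step, 0, MPI_COMM_WORLD);
            break;
        }
    }
    /* Process 0 now holds the global sum. */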
An alternative tree-structured global sum
MPI_Reduce
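Its signature, plus the call that replaces the hand-coded global sum in the trapezoid program:

    int MPI_Reduce(
        void*        input_data_p,  /* in:  each process's operand    */
        void*        output_data_p, /* out: result, on dest_process   */
        int          count,         /* in:  number of elements        */
        MPI_Datatype datatype,
        MPI_Op       operator,      /* e.g. MPI_SUM, MPI_MAX          */
        int          dest_process,  /* in:  rank that gets the result */
        MPI_Comm     comm
    );

    /* Global sum of local_int onto process 0: */
    MPI_Reduce(&local_int, &total_int, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);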
The output_data_p argument is only used on dest_process.
However, all of the processes still need to pass in an actual argument corresponding to output_data_p, even if it's just NULL.
Example (1)
Example (2)
Example (3)
MPI_Allreduce
A global sum followed by distribution of the result.
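Its signature is MPI_Reduce's without the dest_process argument, since every process gets the result:

    int MPI_Allreduce(
        void*        input_data_p,
        void*        output_data_p,
        int          count,
        MPI_Datatype datatype,
        MPI_Op       operator,
        MPI_Comm     comm
    );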
Broadcast
Data belonging to a single process is sent to all of the processes in the communicator.
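The call that implements it:

    int MPI_Bcast(
        void*        data_p,       /* in/out: the data being broadcast */
        int          count,
        MPI_Datatype datatype,
        int          source_proc,  /* in: rank whose data is sent      */
        MPI_Comm     comm
    );

    /* e.g., process 0 shares n with everyone: */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);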
A tree-structured broadcast.
Data distributions
Partitioning options
Block partitioning: assign blocks of consecutive components to each process.
Cyclic partitioning: assign components in a round-robin fashion.
Block-cyclic partitioning: use a cyclic distribution of blocks of components.
Parallel implementation of vector addition
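With a block partition, each process just adds its own pieces; a sketch in the book's style (local_n is n/comm_sz):

    void Parallel_vector_sum(
            double local_x[],  /* in  */
            double local_y[],  /* in  */
            double local_z[],  /* out */
            int    local_n) {
        int local_i;
        for (local_i = 0; local_i < local_n; local_i++)
            local_z[local_i] = local_x[local_i] + local_y[local_i];
    }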
Scatter
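MPI_Scatter lets process 0 read a whole vector but send each process only the block it needs:

    int MPI_Scatter(
        void*        send_buf_p,  /* in:  the whole vector (on src_proc)  */
        int          send_count,  /* in:  elements sent to EACH process   */
        MPI_Datatype send_type,
        void*        recv_buf_p,  /* out: this process's block            */
        int          recv_count,
        MPI_Datatype recv_type,
        int          src_proc,
        MPI_Comm     comm
    );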
Gather
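The inverse operation, collecting all of the blocks onto one process (e.g., so process 0 can print the whole result):

    int MPI_Gather(
        void*        send_buf_p,  /* in:  this process's block             */
        int          send_count,
        MPI_Datatype send_type,
        void*        recv_buf_p,  /* out: the whole vector (on dest_proc)  */
        int          recv_count,  /* in:  elements received from EACH proc */
        MPI_Datatype recv_type,
        int          dest_proc,
        MPI_Comm     comm
    );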
Allgather
Concatenates the contents of each process's send_buf_p and stores the result in every process's recv_buf_p; recv_count is the amount of data received from each process.
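Its signature:

    int MPI_Allgather(
        void*        send_buf_p,
        int          send_count,
        MPI_Datatype send_type,
        void*        recv_buf_p,
        int          recv_count,
        MPI_Datatype recv_type,
        MPI_Comm     comm
    );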
Matrix-vector multiplication
i-th component of y: y_i = a_i0 x_0 + a_i1 x_1 + ... + a_i,n-1 x_n-1
Serial pseudo-code
C-style arrays are stored as one contiguous row-major block: the element in row i, column j of an m x n matrix sits at offset i*n + j from the start.
Serial matrix-vector multiplication
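The book-style serial function, using the flat row-major layout above:

    void Mat_vect_mult(
            double A[],  /* in:  m x n matrix, row-major */
            double x[],  /* in:  vector of length n      */
            double y[],  /* out: vector of length m      */
            int m, int n) {
        int i, j;
        for (i = 0; i < m; i++) {
            y[i] = 0.0;
            for (j = 0; j < n; j++)
                y[i] += A[i*n + j] * x[j];
        }
    }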
An MPI matrix-vector multiplication function
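The two slides' code reconstructs to something close to the following sketch: each process owns local_m = m/comm_sz rows of A and local_n = n/comm_sz components of x, and the full x is assembled with MPI_Allgather (stdlib.h is assumed for malloc/free):

    void Mat_vect_mult(
            double   local_A[],  /* in:  my local_m rows of A */
            double   local_x[],  /* in:  my block of x        */
            double   local_y[],  /* out: my block of y        */
            int      local_m,
            int      n,
            int      local_n,
            MPI_Comm comm) {
        double* x = malloc(n * sizeof(double));
        int local_i, j;

        /* Every process needs all of x for its dot products. */
        MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                      x, local_n, MPI_DOUBLE, comm);

        for (local_i = 0; local_i < local_m; local_i++) {
            local_y[local_i] = 0.0;
            for (j = 0; j < n; j++)
                local_y[local_i] += local_A[local_i*n + j] * x[j];
        }
        free(x);
    }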
Derived datatypes
Used to represent any collection of data items in memory by storing both the types of the items and their relative locations in memory.
A sending function can then collect the items from memory before they are sent, and a receiving function can distribute them into their correct destinations when they arrive.
MPI_Type_create_struct
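Builds a derived datatype that consists of individual elements with different basic types:

    int MPI_Type_create_struct(
        int           count,                    /* number of blocks       */
        int           array_of_blocklengths[],  /* elements in each block */
        MPI_Aint      array_of_displacements[], /* byte offsets           */
        MPI_Datatype  array_of_types[],
        MPI_Datatype* new_type_p                /* out: the new type      */
    );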
MPI_Get_address
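Returns the address of the memory location referenced by location_p; the special type MPI_Aint is an integer wide enough to store an address:

    int MPI_Get_address(
        void*     location_p,  /* in  */
        MPI_Aint* address_p    /* out */
    );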
MPI_Type_commit
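Allows the MPI implementation to optimize its internal representation of the datatype for use in communication functions:

    int MPI_Type_commit(MPI_Datatype* new_mpi_t_p /* in/out */);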
MPI_Type_free
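Frees the storage when we're finished with the type:

    int MPI_Type_free(MPI_Datatype* old_mpi_t_p /* in/out */);

Putting the four calls together, a sketch close to the book's Build_mpi_type, which packs a, b, and n into a single type so the trapezoid input needs only one broadcast:

    void Build_mpi_type(double* a_p, double* b_p, int* n_p,
                        MPI_Datatype* input_mpi_t_p) {
        int          array_of_blocklengths[3] = {1, 1, 1};
        MPI_Datatype array_of_types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};
        MPI_Aint     a_addr, b_addr, n_addr;
        MPI_Aint     array_of_displacements[3] = {0};

        MPI_Get_address(a_p, &a_addr);
        MPI_Get_address(b_p, &b_addr);
        MPI_Get_address(n_p, &n_addr);
        array_of_displacements[1] = b_addr - a_addr;
        array_of_displacements[2] = n_addr - a_addr;

        MPI_Type_create_struct(3, array_of_blocklengths,
                               array_of_displacements, array_of_types,
                               input_mpi_t_p);
        MPI_Type_commit(input_mpi_t_p);
        /* The caller uses the type in MPI_Bcast, then calls MPI_Type_free. */
    }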
PERFORMANCE EVALUATION
MPI_Barrier
Ensures that no process will return from calling it until every process in the communicator has started calling it.
MPI_Barrier
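The standard timing idiom this slide shows, sketched in C (my_rank, comm, and the code being timed are assumed from context):

    double local_start, local_finish, local_elapsed, elapsed;

    MPI_Barrier(comm);                 /* line all processes up first */
    local_start = MPI_Wtime();
    /* ... code to be timed ... */
    local_finish = MPI_Wtime();
    local_elapsed = local_finish - local_start;

    /* Report the slowest process's time. */
    MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
    if (my_rank == 0)
        printf("Elapsed time = %e seconds\n", elapsed);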
Speedup
S(n, p) = T_serial(n) / T_parallel(n, p)
Efficiency
E(n, p) = S(n, p) / p = T_serial(n) / (p * T_parallel(n, p))
Scalability
A program is scalable if the problem size can be increased at a rate so that the efficiency doesn't decrease as the number of processes increases.
Programs that can maintain a constant efficiency without increasing the problem size are strongly scalable.
Programs that can maintain a constant efficiency if the problem size increases at the same rate as the number of processes are weakly scalable.
A PARALLEL SORTING ALGORITHM
Sorting
Odd-even transposition sort: a sequence of phases.
Even phases: compare-swap the pairs (a[0], a[1]), (a[2], a[3]), (a[4], a[5]), ...
Odd phases: compare-swap the pairs (a[1], a[2]), (a[3], a[4]), (a[5], a[6]), ...
Example
Start: 5, 9, 4, 3
Even phase: compare-swap (5,9) and (4,3), getting the list 5, 9, 3, 4.
Odd phase: compare-swap (9,3), getting the list 5, 3, 9, 4.
Even phase: compare-swap (5,3) and (9,4), getting the list 3, 5, 4, 9.
Odd phase: compare-swap (5,4), getting the list 3, 4, 5, 9.
Pseudo-code
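The slide body is missing; the book's parallel odd-even pseudo-code is roughly:

    Sort local keys;
    for (phase = 0; phase < comm_sz; phase++) {
        partner = Compute_partner(phase, my_rank);
        if (I'm not idle) {
            Send my keys to partner;
            Receive keys from partner;
            if (my_rank < partner)
                Keep smaller keys;
            else
                Keep larger keys;
        }
    }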
Compute_partner
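A C version close to the book's (idle processes get the dummy rank MPI_PROC_NULL, which turns sends and receives involving them into no-ops):

    int Compute_partner(int phase, int my_rank, int comm_sz) {
        int partner;
        if (phase % 2 == 0) {          /* even phase: pairs (0,1), (2,3), ... */
            if (my_rank % 2 != 0) partner = my_rank - 1;
            else                  partner = my_rank + 1;
        } else {                       /* odd phase: pairs (1,2), (3,4), ...  */
            if (my_rank % 2 != 0) partner = my_rank + 1;
            else                  partner = my_rank - 1;
        }
        if (partner == -1 || partner == comm_sz)
            partner = MPI_PROC_NULL;   /* idle this phase */
        return partner;
    }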
MPI_Ssend
An alternative to MPI_Send defined by the MPI standard. The extra "s" stands for synchronous: MPI_Ssend is guaranteed to block until the matching receive starts.
Restructuring communication
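A sketch of the restructured exchange: even and odd ranks order their send and receive oppositely, so the pairwise swap cannot deadlock even if MPI_Send blocks (msg_size, partner, and the buffers are assumed from context):

    if (my_rank % 2 == 0) {
        MPI_Send(msg, msg_size, MPI_INT, partner, 0, comm);
        MPI_Recv(new_msg, msg_size, MPI_INT, partner, 0, comm,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(new_msg, msg_size, MPI_INT, partner, 0, comm,
                 MPI_STATUS_IGNORE);
        MPI_Send(msg, msg_size, MPI_INT, partner, 0, comm);
    }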
MPI_Sendrecv
An alternative to scheduling the communications ourselves: it carries out a blocking send and a receive in a single call. The dest and the source can be the same or different, and MPI schedules the communications so that the program won't hang or crash.
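Its signature:

    int MPI_Sendrecv(
        void*        send_buf_p,    int sendbuf_size,
        MPI_Datatype send_buf_type, int dest,   int send_tag,
        void*        recv_buf_p,    int recvbuf_size,
        MPI_Datatype recv_buf_type, int source, int recv_tag,
        MPI_Comm     communicator,  MPI_Status* status_p
    );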