Distributed Systems
References
1. Distributed Snapshots: Determining Global States of Distributed Systems, K. Mani Chandy and Leslie Lamport, ACM Transactions on Computer Systems, vol. 3, no. 1, Feb. 1985.
2. PUBLISHING: A Reliable Broadcast Communication Mechanism, Michael L. Powell and David L. Presotto, Proceedings of the Ninth ACM Symposium on Operating Systems Principles, Oct. 1983.
3. Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms, Ozalp Babaoglu and Keith Marzullo, in Distributed Systems, Sape J. Mullender (ed.), Addison-Wesley, 1993.
Global State Detection
Model of Computation
Finite set of processes
Processes send messages on a finite set of
unidirectional channels
Channels are error free, FIFO and have infinite
buffers
Messages experience arbitrary but finite delays
Strongly connected network
[Figure: timelines of processes p and q passing through states Sp0 to Sp3 and Sq0 to Sq3]
Happened-before relation (→)
Transitive: if e → e' and e' → e'' then e → e''
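The transitivity rule can be illustrated with a small sketch. It assumes the direct orderings (program order within a process, and each send before its matching receive) are given as pairs; the function name `happened_before` and the event names are hypothetical and used only for illustration.

```python
from itertools import product

def happened_before(direct):
    """Return the transitive closure of a set of direct (e, e2) orderings."""
    closure = set(direct)
    events = {e for pair in direct for e in pair}
    # Warshall-style closure: product() varies its first element slowest,
    # so the intermediate event k acts as the outermost loop.
    for k, i, j in product(events, repeat=3):
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure

# Hypothetical events: p sends m1, q receives it, then q sends m2 back to p.
direct = {
    ("send_m1", "recv_m1"),   # a send happens before its receive
    ("recv_m1", "send_m2"),   # program order on q
    ("send_m2", "recv_m2"),   # a send happens before its receive
}
hb = happened_before(direct)
related = ("send_m1", "recv_m2") in hb      # follows by transitivity
reverse = ("recv_m2", "send_m1") in hb      # the relation is not symmetric
```

Two events that are unrelated in either direction under this closure are concurrent.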
More on States
process state
memory state + register state + signal masks + open
files + kernel buffers + ...
or
application-specific info such as transactions completed,
functions executed, etc.
channel state
Messages in transit i.e. those messages that have
been sent but not yet received
Global State Detection
e.g.
distributed deadlock detection is finding a cycle in the
Wait For Graph.
Termination detection
Checkpointing
and many more
Global State Detection
Difficulties
Instantaneous recording not possible
No global clock : Distributed recording of local states
cannot be synchronized based on time
Random Network Delays : No centralized process can
initiate the detection
Non-Deterministic Computation
At any point in computation there can be more than one
event that can happen next.
Producer code:
while (1)
{
produce m;
send m;
wait for ack;
}
Consumer code:
while (1)
{
recv m;
consume m;
send ack;
}
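The producer and consumer loops can be sketched as a runnable program. This is a minimal illustration, assuming two threads and two `queue.Queue` objects stand in for the processes and channels; the loops are bounded by a hypothetical `n_items` parameter so that the program terminates, unlike the `while (1)` loops in the pseudocode.

```python
import queue
import threading

def run(n_items):
    data = queue.Queue()   # channel producer -> consumer
    acks = queue.Queue()   # channel consumer -> producer
    consumed = []

    def producer():
        for m in range(n_items):   # produce m
            data.put(m)            # send m
            acks.get()             # wait for ack

    def consumer():
        for _ in range(n_items):
            m = data.get()         # recv m
            consumed.append(m)     # consume m
            acks.put("ack")        # send ack

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed

result = run(5)
```

Because the producer waits for each ack before producing again, at most one message is ever in transit.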
Non-deterministic computation
3 processes: p, q, r
Messages: m1, m2, m3
[Figure: processes p, q and r exchanging messages m1, m2 and m3]
Non-Determinism
In a deterministic computation, a local event would reveal
everything about the global state!
Each process would know the state of every other process
Example
Producer Consumer problem
p records its state
[Figure: p sends message m to q]
q records its state
[Figure: the recorded state]
Error!!
The sender's recorded state has no record of sending m
The receiver's recorded state has a record of receiving m
Result
The global state records the receive event but
not the send event, violating the happened-before
relation!!
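The error above suggests a simple check: a recorded global state is inconsistent if some message is recorded as received but not as sent. A minimal sketch follows, assuming each process's recorded state is summarized by the number of local events it covers; the function name `is_consistent` and the tuple layout are hypothetical.

```python
def is_consistent(cut, messages):
    """cut: {process: number of local events included in its recorded state}.
    messages: list of (sender, send_index, receiver, recv_index), where the
    indices are 1-based positions of the send and receive events on their
    own process."""
    for sender, send_idx, receiver, recv_idx in messages:
        received = recv_idx <= cut[receiver]   # receive is inside the cut
        sent = send_idx <= cut[sender]         # send is inside the cut
        if received and not sent:
            return False                       # receive without its send
    return True

# Producer-consumer example: p sends m as its 1st event, q receives it as its 1st event.
msgs = [("p", 1, "q", 1)]
bad = is_consistent({"p": 0, "q": 1}, msgs)         # p recorded before sending, q after receiving
in_transit = is_consistent({"p": 1, "q": 0}, msgs)  # m sent but not yet received
both = is_consistent({"p": 1, "q": 1}, msgs)        # both events recorded
```

In the second case the state is consistent and m belongs to the channel state.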
An Example
[Figure: timelines of p (states Sp0 to Sp3) and q (states Sq0 to Sq3) exchanging messages m1, m2 and m3]
A Consistent State?
(Sp1, Sq1)
Yes
A Consistent State?
(Sp2, Sq3), with m3 in transit
Yes
An inconsistent State
(Sp1, Sq3)
Algorithm in Action
[Figure: timelines of p (states Sp0 to Sp3) and q (states Sq0 to Sq3) exchanging messages m1, m2 and m3]
q records state as Sq1, sends marker to p
p records state as Sp2, channel state as empty
q records channel state as m3
Recorded Global State = ((Sp2, Sq1), (∅, m3))
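The marker steps shown in the walkthrough can be sketched as a small single-threaded simulation. This is an illustration in the spirit of the snapshot algorithm of reference 1, not the deck's own code: the `Process` and `Network` classes, the token-passing application and the event order are all assumptions made for the example.

```python
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens          # application state
        self.recorded_state = None    # local state saved by the snapshot
        self.channel_state = {}       # sender -> messages recorded as in transit
        self.recording = set()        # incoming channels still being recorded

class Network:
    def __init__(self, processes):
        self.procs = {p.name: p for p in processes}
        # One FIFO channel per ordered pair (src, dst).
        self.chan = {(a, b): deque()
                     for a in self.procs for b in self.procs if a != b}

    def send(self, src, dst, amount):
        self.procs[src].tokens -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, name):
        """Record the local state, then send a marker on every outgoing channel."""
        p = self.procs[name]
        p.recorded_state = p.tokens
        p.recording = {src for (src, dst) in self.chan if dst == name}
        p.channel_state = {src: [] for src in p.recording}
        for (src, dst), c in self.chan.items():
            if src == name:
                c.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel src -> dst."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self.record(dst)      # first marker seen: record state now
            p.recording.discard(src)  # this channel's recorded state is final
        else:
            p.tokens += msg
            if src in p.recording:    # arrived after recording, before the marker
                p.channel_state[src].append(msg)

net = Network([Process("p", 10), Process("q", 5)])
net.send("p", "q", 3)    # a message is in transit from p to q
net.record("q")          # q initiates: records its state, sends marker to p
net.deliver("p", "q")    # q receives the message and records it as channel state
net.deliver("q", "p")    # p gets the marker: records state, channel q -> p is empty
net.deliver("p", "q")    # q gets p's marker: channel p -> q is closed
p, q = net.procs["p"], net.procs["q"]
total = (p.recorded_state + q.recorded_state
         + sum(p.channel_state["q"]) + sum(q.channel_state["p"]))
```

In this run the recorded state has the same shape as the one in the walkthrough: one channel is recorded as empty and the other holds a single in-transit message, and the recorded token total equals the initial total.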
S* is reachable from Si
Sj is reachable from S*
(Si: global state when the snapshot algorithm starts; Sj: global state when it terminates; S*: the recorded global state)
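Stable Properties
A property y is stable if y(S) implies y(S') for every global state S' reachable from S
Eg: Deadlock, Termination, Token loss
Since S* is reachable from Si and Sj is reachable from S*:
if y holds in Si, it holds in S*
if y holds in S*, it holds in Sj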
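As one illustration of evaluating a stable property on a recorded state, the sketch below checks termination. It assumes a hypothetical snapshot layout in which each process state is the string "idle" or "active" and each channel state is a list of in-transit messages; neither the layout nor the function name comes from the source.

```python
def terminated(snapshot):
    """True if every process is idle and every channel is empty."""
    all_idle = all(s == "idle" for s in snapshot["states"].values())
    no_messages = all(not msgs for msgs in snapshot["channels"].values())
    return all_idle and no_messages

done = terminated({
    "states": {"p": "idle", "q": "idle"},
    "channels": {("p", "q"): [], ("q", "p"): []},
})
pending = terminated({
    "states": {"p": "idle", "q": "idle"},
    "channels": {("p", "q"): ["m3"], ("q", "p"): []},   # m3 may reactivate q
})
```

Because termination is stable, a result of True on the recorded state S* means the property also holds in the later state Sj.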
Checkpointing
S* serves as a checkpoint
On a failure, restart the computation from S*
Problem!
Not able to restore to Sj
[Figure: global states Si, S* and Sj]
Solution: Publishing
A Broadcast medium
A central recorder process records all the
messages received by each process
Processes record their states at their own
time and send them to the recorder
Architecture of Publishing
[Figure: the recorder keeps a table with columns STATE, SENT ID and MSGS RECD for processes p and q]
p sends an ack
recorder records m1
Last checkpoint plus messages received since the last checkpoint
Problems
Publishing keeps track of all messages
received by each process
Expensive!
Solution
When the recorder takes a checkpoint of process p at time t,
it deletes all messages received by p before t.
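The recorder's bookkeeping can be sketched as follows. This is a minimal illustration under stated assumptions, not the mechanism of reference 2: the `Recorder` class, its method names and the `apply` callback (which folds one message into a state) are hypothetical.

```python
class Recorder:
    """Hypothetical central recorder: logs received messages and checkpoints."""

    def __init__(self):
        self.checkpoint = {}   # process -> last checkpointed state
        self.log = {}          # process -> messages received since that checkpoint

    def record_message(self, receiver, msg):
        self.log.setdefault(receiver, []).append(msg)

    def take_checkpoint(self, process, state):
        self.checkpoint[process] = state
        self.log[process] = []   # messages received before the checkpoint are dropped

    def recover(self, process, apply):
        """Rebuild a state from the checkpoint by replaying the logged messages."""
        state = self.checkpoint[process]
        for msg in self.log.get(process, []):
            state = apply(state, msg)
        return state

# Assumed application: the state of q is the list of messages it has consumed.
consume = lambda state, msg: state + [msg]

rec = Recorder()
rec.take_checkpoint("q", [])
rec.record_message("q", "m1")
after_m1 = rec.recover("q", consume)      # checkpoint plus replay of m1
rec.take_checkpoint("q", ["m1"])          # new checkpoint: m1 is deleted from the log
rec.record_message("q", "m2")
after_m2 = rec.recover("q", consume)      # only m2 needs to be replayed
```

Truncating the log at each checkpoint bounds the recorder's storage by the traffic between checkpoints rather than by the whole history.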
p checkpoints
[Figure: recorder table before and after p's checkpoint]
Say p crashes
[Figure: recorder table while p recovers]
Replays back m1
q crashes
[Figure: recorder table while q recovers]
Ignore m1
Comparison
                 SNAPSHOT              PUBLISHING
Network          Strongly connected    Need not be
Mode             Distributed           Centralized
Scalability      Yes                   No
Restorability    No                    Yes
Summary
Global state detection is difficult in
distributed systems
The snapshot algorithm may not record a state the
system actually passed through, but it is very helpful
in detecting stable properties
Publishing gives an asynchronous way of
determining global states but does not scale