7. Repeated Games
Dana Nau, University of Maryland
Repeated Games
!! Used by game theorists, economists, social and behavioral scientists
Roshambo
Prisoner's Dilemma (row player 1, column player 2):

              C       D
    C        3, 3    0, 5
    D        5, 0    1, 1
!! Each repetition of the stage game is called an iteration or a round
!! Usually each agent knows what all the agents did in the previous iterations, but not what they're doing in the current iteration
!! Thus, a repeated game is an imperfect-information game
!! Iterated Prisoner's Dilemma with 2 iterations:
   Agent 1: Round 1: C, Round 2: D, total payoff 3 + 5 = 8
   Agent 2: Round 1: C, Round 2: C, total payoff 3 + 0 = 3
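The two-round bookkeeping above can be checked with a short script. This is an illustrative sketch: the `PAYOFF` table encodes the stage-game matrix, and `total_payoffs` is a hypothetical helper name.

```python
# Payoffs for the Prisoner's Dilemma stage game, as (row, column):
# mutual cooperation 3 each, mutual defection 1 each,
# lone defector 5, lone cooperator 0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def total_payoffs(moves1, moves2):
    """Sum each agent's stage-game payoffs over all iterations."""
    t1 = t2 = 0
    for m1, m2 in zip(moves1, moves2):
        p1, p2 = PAYOFF[(m1, m2)]
        t1 += p1
        t2 += p2
    return t1, t2

# Agent 1 plays C then D; Agent 2 plays C in both rounds.
print(total_payoffs(['C', 'D'], ['C', 'C']))  # (8, 3)
```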
Strategies
!! The repeated game has a much bigger strategy space than the stage game !! One kind of strategy is a stationary strategy:
!! Use the same strategy at every iteration
!! More generally, a strategy can choose the action at each iteration based on what all the agents did in the previous iterations
Backward Induction
!! If the number of iterations is finite and known, we can use backward induction: D strictly dominates C in the last round; given that, D dominates in the next-to-last round, and so on
!! Thus both agents defect at every round:
   Agent 1: D D D D
   Agent 2: D D D D
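The backward-induction argument can be sketched in a few lines. This is a minimal illustration, not the slides' own code: since the continuation play is fixed regardless of today's actions, each round reduces to the one-shot stage game, in which D strictly dominates C.

```python
# Stage-game payoffs as (row, column) pairs.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def spe_play(rounds):
    """Backward induction on the finitely repeated PD.

    In the last round, play is a one-shot PD and D strictly dominates C.
    Given that the remaining play is fixed, the same argument applies to
    every earlier round, so both agents defect throughout."""
    for opp in 'CD':  # check: D dominates C against either opponent move
        assert PAYOFF[('D', opp)][0] > PAYOFF[('C', opp)][0]
    return [('D', 'D')] * rounds

print(spe_play(4))  # [('D', 'D'), ('D', 'D'), ('D', 'D'), ('D', 'D')]
```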
!! Agent i's average reward is the limiting average payoff per iteration:

   lim_{k → ∞} Σ_{j=1}^{k} r_i(j) / k

!! Agent i's future discounted reward is the discounted sum of the payoffs, i.e.,

   Σ_{j=1}^{∞} β^j r_i(j),   where 0 ≤ β < 1 is the discount factor

!! Two ways to interpret the discount factor:
1.! The agent cares more about the present than the future
2.! The agent cares about the future, but the game ends at any round with probability 1 − β
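Assuming the definitions above (payoff r_i(j) at iteration j, discount factor β), the two reward criteria can be sketched as follows; `discounted_reward` and `average_reward` are illustrative helper names.

```python
def discounted_reward(payoffs, beta):
    """Future discounted reward: sum over j of beta**j * r(j)."""
    return sum(beta**j * r for j, r in enumerate(payoffs, start=1))

def average_reward(payoffs):
    """Average payoff over the first k iterations; the limit as
    k -> infinity, when it exists, is the agent's average reward."""
    return sum(payoffs) / len(payoffs)

# An agent receiving the mutual-cooperation payoff of 3 forever has
# discounted reward 3*beta/(1 - beta); with beta = 0.9 that's 27.
print(round(discounted_reward([3] * 1000, 0.9), 6))  # 27.0
print(average_reward([3] * 1000))                    # 3.0
```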
Nau: Game Theory 7
Example
!! Some well-known strategies for the Iterated Prisoner's Dilemma:
! AllD: always defect
! Grim: cooperate until the other agent defects, then defect forever
! Tit-for-Tat (TFT): cooperate on the first move; on the nth move, repeat the other agent's (n−1)th move
! Tester: defect on move 1. If the other agent retaliates, play TFT. Otherwise, randomly intersperse cooperation and defection
!! If the discount factor is large enough, each of the following is a Nash equilibrium:
!! (TFT, TFT), (TFT, Grim), and (Grim, Grim)
!! The infinitely repeated game has a Nash equilibrium whose average payoffs are (p1, p2, …, pn) if and only if
!! G has a mixed-strategy profile (s1, s2, …, sn) with the following property:
! For each i, i's payoff would be ≤ pi if the other agents used minimax strategies against i
!! Proof idea: use the notion of best response to show that in every equilibrium, an agent's average payoff ≥ the agent's minimax value
!! Show how to construct an equilibrium that gives each agent i the average payoff pi, given certain constraints on (p1, p2, …, pn)
! In this equilibrium, the agents cycle in lock-step through a sequence of game outcomes that achieve (p1, p2, …, pn)
! If any agent i deviates, then the others punish i forever, by playing their minimax strategies against i
!! There's a large family of such theorems, known collectively as folk theorems
!! For two-player zero-sum games, the theorem becomes vacuous:
!! Suppose we iterate a two-player zero-sum game G
!! Let V be the value of G (from the Minimax Theorem)
!! If agent 2 uses a minimax strategy against 1, then 1's maximum payoff is V
! If agent 1 plays a non-minimax strategy s1 and agent 2 plays his/her best response, 2's expected payoff will be higher than −V
www.cs.ualberta.ca/~darse/rsbpc1.html
!! Round-robin tournament:
! 55 programs, 1000 iterations for each pair of programs
! Lowest possible score = −55000, highest possible score = 55000
!! Average over 25 tournaments:
!! Widely used to study the emergence of cooperation

                    P2
                 Cooperate   Defect
  P1 Cooperate     3, 3       0, 5
     Defect        5, 0       1, 1
!! TFT did well in Axelrod's tournaments:
! It could establish and maintain cooperation with many other agents
! It could prevent malicious agents from taking advantage of it
TFT vs. AllD:    TFT: C D D D D D D    AllD:   D D D D D D D
TFT vs. Grim:    TFT: C C C C C C C    Grim:   C C C C C C C
TFT vs. TFT:     TFT: C C C C C C C    TFT:    C C C C C C C
TFT vs. Tester:  TFT: C D C C C C C    Tester: D C C C C C C
Example
!! A real-world example of the IPD, described in Axelrod's book: World War I trench warfare
!! Incentive to cooperate:
!! If I attack the other side, then they'll retaliate and I'll get hurt
!! If I don't attack, maybe they won't either
Noise
!! With some probability, a noise gremlin will change some of the actions
! Cooperate (C) becomes Defect (D), and vice versa
! e.g., an agent's C C C C may come out as C C D C
!! Can use this to model accidents
! Compute the score using the changed action
!! Can also model misinterpretations
! Compute the score using the original action
Example of Noise
"… out to investigate. We found our men and the Germans standing on their respective parapets. Suddenly a salvo arrived but did no damage. Naturally both sides got down and our men started swearing at the Germans, when all at once a brave German got onto his parapet and shouted out: 'We are very sorry about that; we hope no one was hurt. It is not our fault. It is that damned Prussian artillery.'"
!! The salvo wasn't the German infantry's intention
!! They didn't expect it, nor desire it
!! Consider two agents who both use TFT
!! One accident or misinterpretation can cause a long string of retaliations:

   Agent 1: C C C C D C D C ...
   Agent 2: C C C D C D C D ...
                  ↑
            noise flips Agent 2's move from C to D, and the two agents then retaliate against each other in alternation
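A quick simulation (illustrative names, not the slides' code) reproduces the echo: one flipped move between two TFT agents turns into an endless alternation of retaliations.

```python
def tft(their_hist):
    """Tit-for-Tat: cooperate first, then copy the other's last move."""
    return their_hist[-1] if their_hist else 'C'

def play_with_noise(rounds, flip):
    """Two TFT agents; the move at (round, agent) given by `flip`
    is inverted, as if changed by the noise gremlin."""
    h1, h2 = [], []
    for t in range(rounds):
        m1, m2 = tft(h2), tft(h1)
        if flip == (t, 1):
            m1 = 'D' if m1 == 'C' else 'C'
        if flip == (t, 2):
            m2 = 'D' if m2 == 'C' else 'C'
        h1.append(m1)
        h2.append(m2)
    return ''.join(h1), ''.join(h2)

# Flip Agent 2's 4th move (t = 3): the defection echoes back and
# forth for the rest of the game.
print(play_with_noise(8, (3, 2)))  # ('CCCCDCDC', 'CCCDCDCD')
```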
Discussion
!! The British army officer's story:
! A German shouted, "We are very sorry about that; we hope no one was hurt"
!! The apology was consistent with the German infantry's past behavior
!! The British had ample evidence that the German infantry wanted to keep the peace, so they attributed the salvo to the noise
!! IPD agents often behave deterministically
! For others to cooperate with you, it helps if you're predictable
!! From the other agent's recent behavior, build a model π of the other agent's strategy
!! Use the model to filter noise
!! Use the model to help plan our next move
Au & Nau. Accident or intention: That is the question (in the iterated prisoner's dilemma). AAMAS, 2006.
Au & Nau. Is it accidental or intentional? A symbolic approach to the noisy iterated prisoner's dilemma. In G. Kendall (ed.), The Iterated Prisoner's Dilemma: 20 Years On. World Scientific, 2007.
!! Each rule has the form: if our last move was m and their last move was m', then P[their next move will be C] = p
!! Four rules: one for each of (C,C), (C,D), (D,C), and (D,D)
! e.g., p = 1 for (C,D); p = 1 for (D,C); p = 0 for (D,D)
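One plausible way to estimate the four rule probabilities from an observed history is simple frequency counting, sketched below; `build_model` is a hypothetical helper, a simplified stand-in for the learning scheme described in the cited papers.

```python
from collections import defaultdict

def build_model(my_moves, their_moves):
    """Estimate the four rules (m, m') -> P[their next move is C]
    from the observed history, by frequency counts."""
    seen = defaultdict(lambda: [0, 0])  # (m, m') -> [count of C, total]
    for t in range(1, len(my_moves)):
        key = (my_moves[t - 1], their_moves[t - 1])  # last joint move
        seen[key][1] += 1
        if their_moves[t] == 'C':
            seen[key][0] += 1
    return {k: c / n for k, (c, n) in seen.items()}

# History against a TFT-like agent: they play C after we played C,
# and D after we played D.
model = build_model('CCDCD', 'CCCDC')
print(model)  # {('C', 'C'): 1.0, ('D', 'C'): 0.0, ('C', 'D'): 1.0}
```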
!! The model π tracks the other agent's recent behavior
!! If an agent's behavior changes, then the probabilities in π will change
! e.g., after Grim defects a few times, the rules will give a very low probability that Grim will cooperate
Noise Filtering
!! Suppose the applicable rule is deterministic:
! P[their next move will be C] = 0 or 1
!! If the observed move contradicts a deterministic rule, treat the move as noise: "The other agent cooperates when I do. I think these defections are actually noise, so I won't retaliate here."

   Observed moves: C C C C C C C D C C C C D C C C C C ...
   (the isolated D's are filtered out as noise)
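The filtering idea can be sketched in a few lines; `filter_noise` is a hypothetical helper, not DBS's actual code.

```python
def filter_noise(rule_prob, observed):
    """If the applicable rule is deterministic and the observed move
    contradicts it, treat the observation as noise and keep the
    predicted move instead."""
    if rule_prob == 1.0 and observed == 'D':
        return 'C'   # they always cooperate here, so the D was noise
    if rule_prob == 0.0 and observed == 'C':
        return 'D'   # they always defect here, so the C was noise
    return observed  # rule not deterministic: take the move at face value

print(filter_noise(1.0, 'D'))  # 'C' -- an isolated defection is discounted
print(filter_noise(0.5, 'D'))  # 'D'
```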
Change of Behavior
!! Anomalies in observed behavior can be due to noise, or to a genuine change of behavior, which can happen anytime
!! E.g., if noise affects one of Agent 1's actions, this may trigger a change in Agent 2's behavior
! Agent 1 does not know this has happened

   Agent 1: C C C→D C C C ...    (noise flips Agent 1's 3rd move)
   Agent 2: C C C D D D D D ...  (Agent 2 now always defects)

!! How to distinguish noise from a real change of behavior?
! "The other agent cooperates when I do. The defections might be accidents, so I shouldn't lose my temper too soon."
! "I think the other agent's behavior has really changed, so I'll change mine too."

   Agent 1: C C C C C C D D ...
   Agent 2: C C C D D D D D ...
Move generation
!! Modified version of game-tree search
!! Use the policy π to predict probabilities of the other agent's moves
!! Compute the expected utility of move x as

   u1(x) = Σ_{y ∈ {C,D}} u1(x, y) · P(y | π, previous moves)

   where x = my move, y = the other agent's move
!! Choose the move with the highest expected utility
Example

Suppose we have the rules:
1. (C,C) → 0.7
2. (C,D) → 0.4
3. (D,C) → 0.1
4. (D,D) → 0.1
Previous moves:
  Agent 1: C C D C
  Agent 2: C D C C
  Agent 2's next move: ?
u1(C) = 0.7·u1(C,C) + 0.3·u1(C,D) = 2.1 + 0 = 2.1
u1(D) = 0.7·u1(D,C) + 0.3·u1(D,D) = 3.5 + 0.3 = 3.8
! So D looks better
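The depth-1 computation can be reproduced directly from the rules; `PAYOFF_ME`, `RULES`, and `expected_utility` below are illustrative names encoding the example's numbers.

```python
# Agent 1's stage-game payoffs and the example's four rules,
# keyed by (my move, their move).
PAYOFF_ME = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
RULES = {('C', 'C'): 0.7, ('C', 'D'): 0.4, ('D', 'C'): 0.1, ('D', 'D'): 0.1}

def expected_utility(my_move, p_c):
    """u1(x) = sum over y of u1(x, y) * P(y), where P(C) = p_c."""
    return (p_c * PAYOFF_ME[(my_move, 'C')]
            + (1 - p_c) * PAYOFF_ME[(my_move, 'D')])

p_c = RULES[('C', 'C')]  # last joint move was (C, C), so rule 1 applies
print(round(expected_utility('C', p_c), 6))  # 2.1
print(round(expected_utility('D', p_c), 6))  # 3.8
```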
!! Is D really what we should choose?
!! The rules say that after I defect, the other agent will retaliate with P = 0.9
! The depth-1 search didn't see this
!! But if we search to depth d > 1, we'll see it
!! C will look better, and we'll choose it instead
!! In general, it's best to look far ahead
! e.g., 60 moves
!! Treating the other agent's moves as a stochastic environment governed by π:
! Makes the search polynomial in the search depth
! Can easily search to depth 60
! Equivalent to solving an acyclic MDP of depth 60
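A depth-d search of this kind can be sketched as a memoized recursion over the last joint move, using the example's four rules; since the other agent's move is drawn from the model rather than chosen adversarially, there are only four states per level, so the cost grows linearly with depth (this is the acyclic-MDP view; all names are illustrative).

```python
from functools import lru_cache

# Agent 1's payoffs and the example's rules, keyed by (my move, their move).
PAYOFF_ME = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
RULES = {('C', 'C'): 0.7, ('C', 'D'): 0.4, ('D', 'C'): 0.1, ('D', 'D'): 0.1}

@lru_cache(maxsize=None)
def value(last, depth):
    """Best expected utility over the next `depth` moves, given the
    last joint move `last`.  Their move is random under the model, so
    this is a small acyclic MDP solved by dynamic programming."""
    if depth == 0:
        return 0.0
    p_c = RULES[last]  # P[they play C | last joint move]
    return max(
        sum(p * (PAYOFF_ME[(x, y)] + value((x, y), depth - 1))
            for y, p in (('C', p_c), ('D', 1 - p_c)))
        for x in 'CD')

def best_move(last, depth):
    """The move that maximizes expected utility at the given depth."""
    p_c = RULES[last]
    return max('CD', key=lambda x: sum(
        p * (PAYOFF_ME[(x, y)] + value((x, y), depth - 1))
        for y, p in (('C', p_c), ('D', 1 - p_c))))

print(best_move(('C', 'C'), 1))  # 'D': depth 1 ignores the retaliation
print(best_move(('C', 'C'), 2))  # 'C': depth 2 already sees it
```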
!! This generates fairly good moves
!! Versions of DBS took most of the top 10 places
!! Two agents scored higher than DBS; both used master-and-slaves strategies
! 1 master, 19 slaves
!! When a slave plays with its master
! The slave cooperates while the master defects => maximizes the master's payoff
!! When a slave plays with an agent not in its team
! It defects => minimizes the other agent's payoff
Comparison
!! Analysis
!! Each master-and-slaves team's average score was much lower than DBS's
!! If BWIN and IMM01 had each been restricted to ≤ 10 slaves, DBS would have outscored them
Summary
!! Finitely repeated games: backward induction
!! Infinitely repeated games
!! average reward, future discounted reward
!! equilibrium payoffs
!! Non-equilibrium strategies
!! opponent modeling in roshambo
!! iterated prisoner's dilemma with noise
! opponent models based on observed behavior
! detection and removal of noise
! game-tree search against the opponent model
!! 20th-anniversary IPD competition