Modeling Poker - 3part

1. Introduction
This is Part 3 of the article series "Modeling Poker" where we'll study poker theory analytically using mathematical modeling and simplified poker games (so-called toy games).
In Part 1 and Part 2 we solved the following version of the AKQ game over 1/2 street with a fixed-limit betting structure:

Rules for the AKQ game:
- We have two players: Alice (out of position) and Bob (in position)
- At the start of the game both players put a 1 bb ante into the pot
- Both players are dealt one random card from the AKQ deck
- Alice checks "in the dark"
- Bob can now check and see a showdown, or he can bet 1 bb
- If Bob bets, Alice can fold, or she can call and see a showdown
- When the betting round is over and nobody has folded, the highest card wins at showdown

We then solved the game under the assumption that both players play perfectly. We deduced the following optimal strategies for Alice and Bob:

Alice
- Always check-calls A
- Check-calls K 1/3 of the time
- Always check-folds Q

Bob
- Always bets A for value
- Always checks behind with K
- Bluffs Q 1/3 of the time
In Part 3 we'll continue to work on this game. This time we'll experiment with the solution found previously and see how Alice and Bob can increase their EV if they play against an opponent that makes mistakes. If one of them deviates from optimal play, the other can increase EV by also deviating from optimal play. We then move from optimal play to exploitive play.
2. Using exploitive strategies in the AKQ game

In Part 1 and Part 2 we found optimal strategies for both Alice and Bob, and we calculated the value of the game for each of them. We found that Bob made +1/18 bb per game, which is 100/18 = 5.56 bb/100 using standard poker jargon. Since the game is heads-up with no rake, it's a zero-sum game. Alice's win rate is then the negative of Bob's win rate, namely -5.56 bb/100.

Both Alice and Bob are playing optimally. An optimal strategy is what we get when we're trying to play perfectly against an opponent who is also trying to play perfectly against us. We can imagine an evolutionary process where each player continually adjusts to the other player to exploit his or her mistakes maximally. At the same time each player is trying to defend against the other player's attempts to exploit. The process converges towards a state where neither player can do anything to increase his EV for the game. Any attempt to do so will create an opportunity for the other player to increase his EV even more.

We then end up with an optimal strategy pair. The pair is made up of two optimal strategies, one for each player. If one player deviates from his optimal strategy, the other player can increase EV by adapting to this deviation. Thus, neither player has any incentive to deviate from optimal play, since:
1. They can't profit from it if their opponent continues to play optimally.
2. They risk losing, since their opponent has the option to adjust to exploit them.
Therefore, deviating from optimal strategy for no reason is a "negative freeroll". In the best case scenario our EV stays the same (if our opponent continues to play optimally). In the worst case our EV drops (if he deviates from optimal play to exploit our deviation from optimal play).
Now we'll let Alice and Bob make various mistakes in the AKQ game, and then we'll see how the other player can increase EV by adjusting to exploit the mistakes. To measure the effects of the adjustments, we'll look at Bob's EV in the game. We'll use an Excel spreadsheet to calculate Bob's EV as a function of the strategies employed by him and Alice:
Link for downloading the spreadsheet (right click and choose "Save as .."): AKQ-game.xls

We have used the following abbreviations:
- c/c = check-call
- c/f = check-fold

We now simply fill in the percentages for Alice's and Bob's folding, checking, calling and betting with each hand (the top part of the spreadsheet). Then Bob's EV is calculated for us (bottom part). We also get the individual EV contributions from each of the 6 possible scenarios that can occur (Alice has A/Bob has K, Alice has A/Bob has Q, etc.). The percentages that are most interesting for us to tweak (how often Bob bluffs with Q and how often Alice calls with K) are marked in red.

When we plug in the optimal strategies for both players, we find, as expected, Bob's EV to be +5.56 bb/100.

Now we'll introduce systematic mistakes (= leaks) for both Alice and Bob, where we define a mistake as any deviation from optimal play. Then we'll let the other player adjust to exploit the mistakes.
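For readers without the spreadsheet, the same calculation can be sketched in a few lines of Python. This is a minimal sketch, not the spreadsheet itself: the function name `bob_ev` and the dictionary layout are assumptions made for illustration, and only the two frequencies the article tweaks (Bob's bluff% with Q, Alice's call% with K) are left as inputs. Everything else follows the rules and fixed strategies listed in the introduction.

```python
from fractions import Fraction

def bob_ev(bluff_q, call_k):
    """Bob's EV in bb per game, plus the per-scenario results.

    Assumes the rest of both strategies is fixed as in the article: Bob always
    bets A and checks K; Alice always calls with A and always folds Q.
    """
    b, c = Fraction(bluff_q), Fraction(call_k)
    # Bob's net result for each (Alice card, Bob card) deal; antes are 1 bb each.
    scenarios = {
        ("A", "K"): Fraction(-1),                  # Bob checks K, loses the showdown
        ("A", "Q"): b * -2 + (1 - b) * -1,         # a bluff is always called by A
        ("K", "A"): c * 2 + (1 - c) * 1,           # value bet: called, or folded to
        ("K", "Q"): b * (c * -2 + (1 - c) * 1) + (1 - b) * -1,
        ("Q", "A"): Fraction(1),                   # Alice folds Q to the bet
        ("Q", "K"): Fraction(1),                   # Bob checks K, wins the showdown
    }
    # Each of the six deals is equally likely.
    return sum(scenarios.values()) / 6, scenarios

ev, parts = bob_ev(Fraction(1, 3), Fraction(1, 3))
print(f"Bob's EV: {ev} bb/game = {float(ev) * 100:.2f} bb/100")
for (alice_card, bob_card), value in parts.items():
    print(f"  Alice {alice_card} / Bob {bob_card}: {value} bb")
```

Exact fractions are used so that the optimal pair reproduces the +1/18 bb per game quoted earlier without rounding noise.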
2.1 Exploitive adjustment: Betting against a calling station

Assume Alice is a calling station. She of course folds her pure "air" (the worthless hand Q), but she now always calls with the bluffcatcher K. Let's first see what happens when Bob continues to play optimally:
Alice always calls with K, but Bob's EV stays the same. Is this unexpected? No, since the definition of an optimal strategy is that our opponent cannot change our EV by changing his or her strategy. Alice now attempts to increase her EV by calling more. But since Bob is valuebetting and bluffing with perfect balance, she achieves nothing and the EV does not change.

Alice is now making a mistake. However, Bob does not profit from this mistake if he continues to play optimally. But Alice has created a leak in her game that Bob can attack. So what should he do? In real poker there are two intuitively obvious adjustments to make against an unbluffable calling station:
- Bet more hands for value
- Bluff less

Here Bob doesn't have any more value hands to use (the only hand he can bet profitably is A), so his adjustment must be to decrease his bluffing percentage with Q. Since Alice always calls Bob's bluffs, it's obviously correct for Bob to never bluff at all. He is getting 2 : 1 in pot odds on a bluff, but he always gets called (Alice can have A or K when Bob has Q, and she always calls with both of them). So bluffing becomes pointless, and we decrease Bob's bet% with Q to 0%.
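Both claims above can be checked numerically. The sketch below assumes a closed form for Bob's EV, (b + c - 3bc)/6 bb per game, where b is Bob's bluff frequency with Q and c is Alice's call frequency with K. That formula is not quoted from the article; it comes from summing the six equally likely deals under the rules given in the introduction, and the function name is hypothetical.

```python
from fractions import Fraction

def bob_ev(bluff_q, call_k):
    # Closed form obtained by summing the six equally likely deals
    # (an assumption of this sketch, not a formula quoted from the article).
    b, c = Fraction(bluff_q), Fraction(call_k)
    return (b + c - 3 * b * c) / 6

# 1) Against an optimal Bob, Alice's call frequency with K does not matter.
optimal_bluff = Fraction(1, 3)
for call_k in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    ev = bob_ev(optimal_bluff, call_k)
    print(f"Alice calls K {float(call_k):.0%}: Bob's EV = {float(ev) * 100:.2f} bb/100")

# 2) Against the calling station (call_k = 1), Bob gains by dropping his bluffs.
for bluff_q in (Fraction(1, 3), Fraction(0)):
    ev = bob_ev(bluff_q, 1)
    print(f"Bob bluffs Q {float(bluff_q):.0%}: Bob's EV = {float(ev) * 100:.2f} bb/100")
```

With b fixed at 1/3 the terms containing c cancel, which is the balance property described in the text.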
Bob's EV now increases from 5.56 bb/100 to 16.67 bb/100. That is three times his optimal win rate! This illustrates an important principle for play against loose opponents in all forms of poker: You can make a lot of money playing opponents with loose calling standards. But you have to be willing to "gear down" and drop most (if not all) of your bluffs. A lot of the extra profit you make against these players comes from the fact that you don't have to bluff them to get your strong hands paid off. Bluffing will just cost you money.

You can also profit from betting more hands for value in real poker. The same principle goes for these hands: You'll get paid when you bet them for value, even if you never bluff. Bob here bets the same hands (A) as before, since he doesn't have additional value hands to use. But he gets paid more every time, and he can drop bluffing from his strategy, since Alice is unbluffable. When Alice is willing to pay off Bob's valuebets every time she has a bluffcatcher, Bob's EV triples, even if his only adjustment is to stop bluffing.

However, against a loose player who is observant and capable of adjusting, Bob's adjustment opens him up to getting counter-exploited. If Alice realizes that Bob now only bets A for value and never bluffs, she can exploit him back by never calling with K. Bob then loses EV relative to optimal play, and this is the next scenario we'll look at:
2.2 Exploitive adjustment: Betting against a nit

Let's go one step further with the scenario from the previous example and assume that Alice has adjusted to Bob's lack of bluffing, and she now folds K every time.
Bob's EV now drops to 0. He only bets A and checks down K and Q. Since Alice never calls when Bob bets, Bob could just as well have checked A also. The net result of both players' adjustments is that no money changes hands as a result of betting, and the outcome becomes similar to both players checking to see who wins. And then the game becomes symmetrical, and neither player can win money.

We remember that this process started with Bob dropping his bluffs to exploit Alice's always-call strategy with her bluffcatchers. Then Alice adjusted to this adjustment by always folding her bluffcatchers. Now Bob is the one who gets exploited, since his EV drops from 5.56 bb/100 to 0.

So which player is making the mistake here? If we use game theory optimal thought processes, this question becomes unimportant. What matters is which player is getting exploited at the moment. It began with Alice making a mistake, and then Bob adjusted. Alice then adjusted to Bob's adjustment, and his adjustment ended up costing him money. Both players are now making mistakes (since we define a mistake as any deviation from optimal play), but in the end Bob is the one getting exploited.

Note that Alice's adjustment to Bob's lack of bluffing transforms her from a calling station to a nit. A nit plays too tight, and we can exploit a nit by bluffing a lot. When Alice changes from a calling station to a nit, Bob can make another adjustment and begin bluffing again. Since Alice always folds her bluffcatchers, it's obvious that Bob maximizes his EV by bluffing at every opportunity.
And Bob is back at +16.67 bb/100 again, which is what he had when he never bluffed against the calling-station version of Alice. We have another situation where both players are making mistakes, but this time it's Alice who ends up getting exploited.

We see that the process of exploitation/counter-exploitation sends both players on a journey of "strategic ping-pong" with sudden and extreme strategy shifts. When two observant players are trying to exploit each other aggressively, this can happen. Both players are using reads and previous history to predict how their opponent will play right now. Then they are both trying to stay one step ahead of the other.

When you are playing against a weak opponent with systematic leaks, you should adjust to exploit them. For example, Bob can triple his win rate by never bluffing against a calling station in the AKQ game, or by always bluffing against a nit. If your opponent is unaware of what you are doing, you don't have to worry about getting counter-exploited. But against an observant opponent, your exploitive adjustment opens you up to counter-exploitation. Bob therefore has a decision to make in the AKQ game when he wants to take advantage of Alice's leaks:
- He can continue to play optimally and take his guaranteed profit of 5.56 bb/100
- He can deviate from optimal play himself and hope Alice is not aware enough to punish him for it
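The "strategic ping-pong" described above can be sketched as a loop of alternating best responses. The function names are hypothetical, and the EV formula (b + c - 3bc)/6 is the sum over the six equally likely deals under the article's rules, not something the article states directly. Because that formula is linear in each player's frequency, each best response is simply 0% or 100%.

```python
def bob_ev_bb100(bluff_q, call_k):
    # Closed form from summing the six equally likely deals (sketch assumption).
    return 100 * (bluff_q + call_k - 3 * bluff_q * call_k) / 6

def bob_best_bluff(call_k):
    # Bob's EV has slope (1 - 3*call_k)/6 in bluff_q: bluff always if Alice
    # calls K less than 1/3 of the time, never if she calls more.
    return 1.0 if call_k < 1 / 3 else 0.0

def alice_best_call(bluff_q):
    # Alice wants Bob's EV low; its slope in call_k is (1 - 3*bluff_q)/6, so
    # she folds K against too few bluffs and calls against too many.
    return 0.0 if bluff_q < 1 / 3 else 1.0

def ping_pong(bluff_q, call_k, rounds=4):
    """Alternate maximally exploitive adjustments, Bob moving first."""
    history = []
    for i in range(rounds):
        if i % 2 == 0:
            mover, bluff_q = "Bob", bob_best_bluff(call_k)
        else:
            mover, call_k = "Alice", alice_best_call(bluff_q)
        history.append((mover, bluff_q, call_k, bob_ev_bb100(bluff_q, call_k)))
    return history

# Start from the calling station of section 2.1: optimal Bob, Alice always calls K.
for mover, b, c, ev in ping_pong(bluff_q=1 / 3, call_k=1.0):
    print(f"{mover:5s} adjusts -> bluff {b:.0%}, call {c:.0%}, Bob EV {ev:+.2f} bb/100")
```

The first three steps reproduce the sequence in the text (16.67, 0, 16.67 bb/100). The fourth step goes one move further than the article and is only what the same formula implies if Alice then switches back to always calling.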
In practice, when we play real poker we're often making moderate adjustments to hide the fact that we're exploiting a leak. Let's say you're playing NLHE 6-max. You're on the button with two very tight players in the blinds. They are so tight that you believe you can open any two cards profitably. But if you think they will catch on and fight back more, you might be better off in the long run showing some moderation.
Your task is then to find a sweet spot that balances your desire to steal a lot with your desire to keep your opponents in a tight state where you can continue to steal a lot. So in the long run it might be better for you to steal something like 80%, and fold the absolute garbage hands like 83o, 62o, etc. They might be profitable opening hands in isolation, here and now, but if you open 100% of your hands, your profit from stealing on the button might decrease in the long run. That said, be aware that if we want to exploit a non-optimal strategy maximally, we have to make extreme adjustments, even if the mistake we're exploiting isn't extreme. We'll now illustrate this graphically:
2.3 Exploiting a non-extreme leak using a non-extreme adjustment

Now we'll illustrate some important concepts:
- Even if our opponent's leak isn't extreme, we can always profit by deviating from optimal play
- When we want to exploit an opponent's mistake maximally, we have to use an extreme adjustment
- However, against an observant opponent, it might be more profitable in the long run to make a moderate adjustment that hides our strategy change from our opponent

By "extreme" we here mean a maximum deviation from optimal play in one direction or the other. We have seen examples of this previously when we let Alice play like a calling station and always call with K. Bob then made an extreme strategy change by dropping all bluffs. This was obviously his most profitable adjustment, since his bluffs had zero chance of success. Now we'll let Alice have smaller leaks and we'll show that Bob's most profitable adjustment is still the most extreme adjustment:
2.4 Exploitive adjustment: Exploiting a moderate leak maximally

Now we let Alice have calling station tendencies, but not to the extreme. In the optimal strategy she is supposed to call 33% with K. Let's now assume she plays just a tad looser than that, calling 40%. This is per our definition a leak, but not an extreme leak. How should Bob adjust to exploit this moderate leak? Beginners sometimes misunderstand this problem and believe that the best adjustment to exploit a moderate leak is to make a moderate adjustment. This is wrong. The best way to exploit any leak, big or small, is to move as far away from optimal play as possible. This of course presupposes that our leaky opponent (or other players at the table) doesn't quickly adjust to counter-exploit us.
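The reason an extreme adjustment is best can be made explicit with a short derivation. It assumes the closed form below for Bob's EV in bb per game, with b as Bob's bluff frequency with Q and c as Alice's call frequency with K. The symbols b and c and the formula itself are not taken from the article; the formula is what one gets by summing the six equally likely deals under the rules in the introduction.

```latex
\[
\mathrm{EV}_{\text{Bob}}(b, c) = \frac{b + c - 3bc}{6},
\qquad
\frac{\partial\,\mathrm{EV}_{\text{Bob}}}{\partial b} = \frac{1 - 3c}{6}.
\]
% The slope does not depend on b, so for any fixed c the EV is a straight line
% in b, and its maximum over 0 <= b <= 1 sits at an endpoint:
\[
b^{*} =
\begin{cases}
1 & \text{if } c < \tfrac{1}{3},\\[2pt]
0 & \text{if } c > \tfrac{1}{3}.
\end{cases}
\]
% For the moderate leak c = 0.40:
\[
\frac{\partial\,\mathrm{EV}_{\text{Bob}}}{\partial b}
= \frac{1 - 1.2}{6} = -\frac{1}{30}\ \text{bb per game},
\]
% i.e. every additional 10 percentage points of bluffing costs Bob
% about 0.33 bb/100, so b = 0 is the maximally exploitive choice.
```

Only at c = 1/3 is the slope zero, which is the indifference that defines Alice's optimal calling frequency.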
We'll show this graphically. The strategy component Bob tweaks against an Alice who calls a bit too much is his bluffing percentage with Q. We start with 0%, gradually adjust up to 100% bluffing, and plot the resulting EV graph:
Bob's EV decreases linearly as a function of his bluffing percentage. If he plays optimally (marked on the graph) and bluffs 33% he makes 5.56 bb/100. If he bluffs 0% he makes 6.67 bb/100 (we can calculate this by plugging Alice's call% = 40% and Bob's bluff% = 0% into the spreadsheet we used earlier). And if he increases his bluffing to 100%, his EV drops to 3.33 bb/100. Bob's best strategy against an Alice with moderate calling station tendencies is therefore to deviate maximally from optimal play and stop bluffing altogether.

We can also argue mathematically for this adjustment. Bob bluffs 1 bb into a 2 bb pot (he's getting pot odds 2 : 1), so his bluff can't make money unless Alice folds more than 1/(2 + 1) = 1/3 of the time. Since Alice calls A every time and K 40% of the time, her total calling percentage with A and K combined is 0.5 x 100% (A) + 0.5 x 40% (K) = 70%, so she folds only 30% of the time. The odds against her folding are 70 : 30 = 2.33 : 1, which is worse than Bob's pot odds of 2 : 1.

But as we have said earlier, Bob might not want to drop all bluffs in practice, if he suspects Alice is observant and capable of adjusting. Bob then has 3 alternatives:
- Continue to play optimally and take his guaranteed 5.56 bb/100 profit
- Adjust to 0% bluffing and go for 6.67 bb/100 profit, hoping Alice won't adjust and exploit him instead (we saw earlier that Alice can now reduce his EV to 0 bb/100 if she stops calling with K altogether)
- Make a moderate adjustment (say, bluffing 20% instead of the optimal 33% or the maximally exploitive 0%) to make it less obvious for Alice that she's being exploited
If Bob chooses the last alternative, he is trying to find a sweet spot that maximizes his profit overall in the long run. He wants to exploit Alice, but at the same time he doesn't want her to adjust. Let's say that Bob after some consideration lands on 20% bluffing. Then he increases his EV from the optimal 5.56 bb/100 to 6.00 bb/100. This is pretty good.
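The numbers in this section (6.67, 6.00, 5.56 and 3.33 bb/100, and the 30% fold frequency) can be reproduced with a small sweep over Bob's bluffing percentage. This is a sketch under an assumption: the closed form (b + c - 3bc)/6 for Bob's EV is derived by summing the six equally likely deals and is not quoted from the article, and the function name is hypothetical.

```python
def bob_ev_bb100(bluff_q, call_k):
    # Closed form from summing the six equally likely deals (sketch assumption).
    return 100 * (bluff_q + call_k - 3 * bluff_q * call_k) / 6

call_k = 0.40  # Alice's moderate leak: she calls with K 40% instead of 33%

# Sweep Bob's bluff frequency with Q from never to always.
for bluff_q in (0.0, 0.20, 1 / 3, 0.50, 1.0):
    print(f"bluff {bluff_q:4.0%} -> {bob_ev_bb100(bluff_q, call_k):.2f} bb/100")

# Fold-frequency check for a bluff with Q: Alice holds A or K equally often.
fold_freq = 0.5 * 0.0 + 0.5 * (1 - call_k)  # A never folds, K folds 60%
required = 1 / (2 + 1)                       # risking 1 bb to win a 2 bb pot
print(f"Alice folds {fold_freq:.0%}; a bluff needs more than {required:.0%}")
```

The sweep is a straight line in the bluffing percentage, so the best value is always at one end of the range; the 20% row is the "moderate adjustment" alternative.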
Alice can now exploit this by dropping all calling with K, if she realizes what Bob is doing. The same logic applies. Bob's deviation from optimal play is a moderate one, but Alice's best response is an extreme one. If she stops calling with K altogether, Bob's EV becomes 3.33 bb/100.
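Alice's side of the argument can be checked the same way. The sketch below fixes Bob at 20% bluffing and compares a few calling frequencies for Alice with K. As before, the closed form for Bob's EV is an assumption of this sketch (the sum over the six equally likely deals), and the names are hypothetical.

```python
def bob_ev_bb100(bluff_q, call_k):
    # Closed form from summing the six equally likely deals (sketch assumption).
    return 100 * (bluff_q + call_k - 3 * bluff_q * call_k) / 6

bluff_q = 0.20  # Bob's "moderate" adjustment: under-bluffing relative to 33%

# Bob's EV for a few of Alice's possible calling frequencies with K.
results = {call_k: bob_ev_bb100(bluff_q, call_k) for call_k in (0.0, 1 / 3, 0.40, 1.0)}
for call_k, ev in results.items():
    print(f"Alice calls K {call_k:4.0%} -> Bob EV {ev:.2f} bb/100")

# The game is zero-sum, so Alice's best reply is whatever minimizes Bob's EV.
best_call = min(results, key=results.get)
print(f"Alice's best reply among these: call K {best_call:.0%}")
```

Against an under-bluffing Bob, every extra call with K costs Alice money, so her best reply is at the folding extreme.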
And we're back to the extreme exploit/counter-exploit game we discussed earlier in the article. If Bob doesn't want to play this way, he can simply stick with optimal play. But if he believes he adjusts better and quicker than Alice, he can increase his profits by trying to exploit her, prepared to adjust to her adjustments to his adjustments, ad nauseam.
3. Summary
In this article we have experimented with strategies in the AKQ game over 1/2 street with fixed-limit betting. We made an Excel spreadsheet that calculated Bob's EV in the game as a function of his and Alice's strategies. Then we let Alice make various mistakes that Bob tried to exploit.

If Alice has leaks, and if she never tries to adjust to Bob's adjustments, the game is simple for Bob. He moves as far away from optimal play as he can get, in order to exploit Alice's mistakes maximally. This is his most profitable play regardless of the size of Alice's mistake. The same applies to Alice if she wants to adjust to Bob's mistakes.

In practice the most profitable adjustment in the long run might be a moderate one. This disguises our attempt to exploit and makes our opponent less likely to counter-exploit us.

In Part 4 we'll look at another variation of the AKQ game. There we'll let the pot be of arbitrary size P, and we'll find the general solution for the AKQ game over 1/2 street with fixed-limit betting. We'll also discuss some of the effects of pot size for fixed-limit play in general.
