Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Tue Jun 25, 2019 4:10 am

All times are UTC




Post new topic Reply to topic  [ 13 posts ] 
Author Message
PostPosted: Thu Jan 08, 2015 10:11 pm 
Offline
Junior Member

Joined: Sat Nov 02, 2013 2:21 pm
Posts: 26
Heads-up limit hold’em poker is solved
by: Michael Bowling, Neil Burch, Michael Johanson, Oskari Tammelin

Abstract
Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold’em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR+, which is capable of solving extensive-form games orders of magnitude larger than previously possible.

http://www.sciencemag.org/content/347/6218/145.abstract


Top
 Profile  
 
PostPosted: Fri Jan 09, 2015 6:02 am 
Offline
Junior Member

Joined: Wed Jul 17, 2013 10:52 pm
Posts: 21
Also noticed this just now: http://poker.cs.ualberta.ca/open_cfr_plus.html

Oskari Tammelin's web page also has some brief notes / pseudocode: http://jeskola.net/ (click CFR at top)


Top
 Profile  
 
PostPosted: Fri Jan 09, 2015 8:00 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 557
https://pdf.yt/d/qv-O9AwQuV1Kjb04


Top
 Profile  
 
PostPosted: Fri Jan 09, 2015 2:08 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 438
"Actions that have appeared poor (with less than zero regret for not having been played) will be chosen immediately after proving useful."

Code:
cfr[player][a] = std::max(0.0, cfr[player][a] + cfu[a] - ev);


Brilliant. Hats off, gentlemen.


Top
 Profile  
 
PostPosted: Fri Jan 09, 2015 9:32 pm 
Offline
Junior Member

Joined: Sat Nov 02, 2013 2:21 pm
Posts: 26
Quote:
- poor convergence rate in monte carlo variations


Top
 Profile  
 
PostPosted: Fri Jan 09, 2015 10:50 pm 
Offline
Junior Member

Joined: Thu May 23, 2013 11:35 pm
Posts: 23
flopnflush wrote:
Quote:
- poor convergence rate in monte carlo variations

Where did you find this quote?


Top
 Profile  
 
PostPosted: Fri Jan 09, 2015 10:57 pm 
Offline
Junior Member

Joined: Sat Nov 02, 2013 2:21 pm
Posts: 26
algonoob wrote:
flopnflush wrote:
Quote:
- poor convergence rate in monte carlo variations

Where did you find this quote?

http://jeskola.net/cfr/demo/


Top
 Profile  
 
PostPosted: Sat Jan 10, 2015 12:45 am 
Offline
Junior Member

Joined: Wed Jul 17, 2013 10:52 pm
Posts: 21
Here's a brief paper by Oskari on CFR+ with pseudocode: http://arxiv.org/pdf/1407.5042v1

He says that CFR+ is meant for "vector-form" updates, and cites http://poker.cs.ualberta.ca/publication ... 12-pcs.pdf with more info about that. The pseudocode from both papers are very similar...

My current interpretation is that CFR+ works with public chance sampling (go through all hands for each player, but just one board). My concern is if it means you have to do Vanilla CFR iterations and go through all boards as well.

I'm curious to hear of any attempts to implement this. It'll take time to get some old PCS code working again :?


Top
 Profile  
 
PostPosted: Sat Jan 10, 2015 11:56 am 
Offline
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
Cool stuff!

nonpareil wrote:
Here's a brief paper by Oskari on CFR+ with pseudocode: http://arxiv.org/pdf/1407.5042v1

He says that CFR+ is meant for "vector-form" updates, and cites http://poker.cs.ualberta.ca/publication ... 12-pcs.pdf with more info about that. The pseudocode from both papers are very similar...

My current interpretation is that CFR+ works with public chance sampling (go through all hands for each player, but just one board). My concern is if it means you have to do Vanilla CFR iterations and go through all boards as well.

I'm curious to hear of any attempts to implement this. It'll take time to get some old PCS code working again :?


PCS should work better than monte carlo, but there will still be actions that "prove useful" due to variance and not because they're actually useful. Using PCS and only using CFR+ on the river could work though (no variance there).

Maybe it's possible to use some negative treshold instead of 0 to provide some buffer against variance in sampling versions.

i.e.

Code:
cfr[player][a] = std::max(treshold, cfr[player][a] + cfu[a] - ev);


Top
 Profile  
 
PostPosted: Wed Jan 14, 2015 11:58 pm 
Offline
Junior Member

Joined: Wed Jul 17, 2013 10:52 pm
Posts: 21
somehomelessguy wrote:
PCS should work better than monte carlo, but there will still be actions that "prove useful" due to variance and not because they're actually useful. Using PCS and only using CFR+ on the river could work though (no variance there).

Maybe it's possible to use some negative treshold instead of 0 to provide some buffer against variance in sampling versions.

i.e.

Code:
cfr[player][a] = std::max(treshold, cfr[player][a] + cfu[a] - ev);


Thanks for the response. I tried Pure CFR with several different negative thresholds, but it didn't converge faster in my tests.

I also tried a form of PCS that only uses the current strategy when training. It also performed worse when using a 0 threshold on the river compared to not using one at all. When using a card abstraction (and imperfect recall) there can be several board rundowns that result in the same bucket, so there is still variance on the river I think.

Has anyone seen improvement when trying something like CFR+ but with sampling? I feel like there should be some compromise between Vanilla CFR with no abstraction and one with buckets and sampling cards and actions.

Of course, there could be bugs in my implementations, my tests were not run for very long, maybe it does better with larger card abstractions, etc. I did get CFR+ working in toy games just fine though, based off Oskari's demo code.


Top
 Profile  
 
PostPosted: Tue Jan 20, 2015 3:26 pm 
Offline
Junior Member
User avatar

Joined: Sun Mar 16, 2014 3:36 am
Posts: 36
Location: Germany
Yes.

You can play against Cepheus the solved Heads-up limit hold’em PokerBot here

http://poker.srv.ualberta.ca/


Top
 Profile  
 
PostPosted: Wed Feb 11, 2015 11:44 am 
Offline
Regular Member
User avatar

Joined: Sat May 25, 2013 7:36 am
Posts: 73
I have a couple of questions regarding optimization

[1] In http://webdocs.cs.ualberta.ca/~bowling/papers/15science.pdf they state
Quote:
Finally, unlike with CFR, we have empirically observed that the exploitability of the players’ current strategies during the computation regularly approaches zero. Therefore, we can skip the step of computing and storing the average strategy, instead using the players’ current strategies as the CFR+ solution.


So I modified the pseudocode provided in http://arxiv.org/pdf/1407.5042.pdf as I understand it. (Attached to this post - See bottom). Do you agree with that?

[2] In http://webdocs.cs.ualberta.ca/~bowling/papers/15science.pdf they explain
Quote:
To address the memory challenge we store the average strategy and accumulated regrets using compression. We use fixed-point arithmetic by first multiplying all values by a scaling factor and truncating them to integers. The resulting integers are then ordered to maximize compression efficiency, with compression ratios around 13-to-1 on the regrets and 28-to-1 on the strategy. Overall, we require less than 11 TB of storage to store the regrets and 6 TB to store the average strategy during the computation, which is distributed across a cluster of computation nodes.

What might that scaling factor be? 1,000? 10,000? 100,000?

[3] The pseudocode uses a non-declared/ non-preinitialized vector m. What might that be? A temporary variable?

[4] The entire strategy is ~524 TB in size:
- Since the average strategy is not used -> 524 / 2 = 262 TB required
- Since we are using 4 Byte Int32 instead of 8 Byte Doubles -> 262 / 2 = 131 TB required
- Since storage allows a compression of factor 13 -> 131 / 13 = ~10 TB required

Correct? Thanks ahead


Attachments:
File comment: Updates pseudocode/ Questions
cfrmplus_improved.jpg
cfrmplus_improved.jpg [ 88.63 KiB | Viewed 25779 times ]
Top
 Profile  
 
PostPosted: Fri Mar 06, 2015 5:11 pm 
Offline
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
nonpareil wrote:
somehomelessguy wrote:
PCS should work better than monte carlo, but there will still be actions that "prove useful" due to variance and not because they're actually useful. Using PCS and only using CFR+ on the river could work though (no variance there).

Maybe it's possible to use some negative treshold instead of 0 to provide some buffer against variance in sampling versions.

i.e.

Code:
cfr[player][a] = std::max(treshold, cfr[player][a] + cfu[a] - ev);


Thanks for the response. I tried Pure CFR with several different negative thresholds, but it didn't converge faster in my tests.

I also tried a form of PCS that only uses the current strategy when training. It also performed worse when using a 0 threshold on the river compared to not using one at all. When using a card abstraction (and imperfect recall) there can be several board rundowns that result in the same bucket, so there is still variance on the river I think.

Has anyone seen improvement when trying something like CFR+ but with sampling? I feel like there should be some compromise between Vanilla CFR with no abstraction and one with buckets and sampling cards and actions.

Of course, there could be bugs in my implementations, my tests were not run for very long, maybe it does better with larger card abstractions, etc. I did get CFR+ working in toy games just fine though, based off Oskari's demo code.


I couldn't get the thresholding to work either. I did found sort of an improvement though (which could possibly be enhanced).

Weighing the regrets has been discussed here before and gives a slight boost. This seems to be related to the cfr+ phenomena, because weighing only the negative regrets gives faster convergence than weighing all regrets.

The boost is nothing like cfr+ over vanilla cfr however.

Using weight=iterations+1 in CSCFR and weighing negative regrets and avg strategy, it reached <5mbb/g exploitability 20% faster than weighing all regrets and avg strategy, and 36% faster than not weighing anything (in the flop subgame I tested).

The weight will likely depend on what sampling is used, and linear isn't necessarily optimal.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 13 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
cron
Powered by phpBB® Forum Software © phpBB Group