NO-LIMIT AVERAGE DIFFERENCE CHARTS

FINAL RANKING:

  1. BluffBot20 (Teppo Salonen)
  2. GS3 (Carnegie Mellon)
  3. Hyperborean07 (U of Alberta)
  4. SlideRule (Kevin Stebbing)
  5. Gomel1 (Igor Korshunov, Gomel State University)
  6. Gomel2
  7. Milano (Milano Polytechnic)
  8. Manitoba1 (University of Manitoba)
  9. PokeMinn (University of Minnesota)
  10. Manitoba2

The ranking is based on instant runoff bankroll. Because the difference between third place and second place was statistically insignificant after the first 20 duplicate matches, I ran 280 more duplicate matches between them.

The most fascinating thing from my perspective is that the more solid players (such as BluffBot20, Hyperborean07, GS3, and SlideRule) do not achieve as high an overall bankroll as more aggressive bots such as Gomel1 and Gomel2. (Note: I am using “solid” to mean less observably exploitable, and “aggressive” to mean winning more against weaker bots.) Also, if we had run the “2/3 truncated bankroll” that we have planned for limit, SlideRule would have won. Of course, such results should be taken with a grain of salt: the designers of BluffBot20, Hyperborean07, GS3, and SlideRule were building bots to win an equilibrium competition (not necessarily to exploit weak bots), so it is unfair to judge their performance (or at least the performance of their designers) in that fashion.

In instant runoff bankroll, we look at the average small bets/hand won or lost by each bot against all of the competitors. The one that lost the most is eliminated and finishes in last place. Then the averages are recalculated with the last-place finisher’s results removed. This process is repeated until a winner remains.
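
The elimination procedure described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the input layout (`results[a][b]` holding bot `a`'s average small bets/hand against bot `b`), the function name, and the three toy bots with their numbers are all made up, and it is not the script used for the competition.

```python
def instant_runoff(results):
    """Return bots ordered from first place to last place.

    results[a][b] is the average small bets/hand that bot `a` won
    against bot `b` (hypothetical input layout).
    """
    remaining = set(results)
    eliminated = []  # filled from last place upward
    while len(remaining) > 1:
        # Average winnings against the bots that are still in the field.
        averages = {
            bot: sum(results[bot][opp] for opp in remaining if opp != bot)
            / (len(remaining) - 1)
            for bot in remaining
        }
        loser = min(averages, key=averages.get)
        eliminated.append(loser)
        remaining.remove(loser)
    eliminated.extend(remaining)  # the last bot standing is the winner
    return eliminated[::-1]


# Made-up numbers, chosen so that total bankroll and instant runoff disagree.
toy = {
    "Solid": {"Aggro": 1.0, "Weak": 2.0},
    "Aggro": {"Solid": -1.0, "Weak": 10.0},
    "Weak": {"Solid": -2.0, "Aggro": -10.0},
}
print(instant_runoff(toy))  # ['Solid', 'Aggro', 'Weak']
```

In the toy data, "Aggro" has the highest average over the full field (4.5 against 1.5 for "Solid"), but once "Weak" is eliminated "Solid" wins, which is the same effect seen with the more aggressive bots in the real results.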

MATCH DETAILS

Here are the results from the first twenty matches of the no-limit competition. Each cell gives the average net small bets/hand won by the row player minus the average net small bets/hand won by the column player, followed by the standard deviation of this estimate. Red indicates the row player lost money heads up; green indicates the row player won money. All heads-up wins or losses are statistically different from 0, except SlideRule versus Hyperborean07.
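
As a quick sanity check on how to read a cell, the sketch below recomputes two entries of the All Bots table from its Average column. The dictionary and the helper name `cell` are mine, for illustration only; the numbers are copied from the table.

```python
# Average column of the "All Bots" table (small bets/hand).
average = {
    "BluffBot20NoLimit1": 1.577,
    "GS3NoLimit1": -3.046,
    "Hyperborean07NoLimit1": 7.063,
}


def cell(row, col):
    """Row bot's average minus column bot's average, rounded to 3 places."""
    return round(average[row] - average[col], 3)


print(cell("BluffBot20NoLimit1", "GS3NoLimit1"))            # 4.623
print(cell("BluffBot20NoLimit1", "Hyperborean07NoLimit1"))  # -5.486
```

Both values match the BluffBot20NoLimit1 row of the All Bots table. The standard deviations cannot be recovered this way, since they depend on the per-match data.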

The last player column in each table (the one just before Average) is the next player to be eliminated. Because a player is eliminated based on its average performance, it is important that this average be smaller than every other player’s average by a statistically significant amount. Therefore, I take the standard error of this difference.

I consider a result to be statistically significant if the difference between the two bots is at least two standard deviations away from even.
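
The two-standard-deviation rule can be written as a one-line check. The function name and the choice of "at least" (rather than "strictly more than") two standard deviations are assumptions on my part; the inputs are the GS3 minus Hyperborean07 entries from the two Top 3 tables.

```python
def is_significant(difference, std_dev, threshold=2.0):
    """True if the difference is at least `threshold` standard deviations from zero."""
    return abs(difference) >= threshold * std_dev


# GS3 minus Hyperborean07, as reported in the two Top 3 tables.
print(is_significant(-0.043, 0.146))  # after 20 duplicate matches: False
print(is_significant(0.113, 0.039))   # after 300 duplicate matches: True
```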

The results indicate that GomelNoLimit1 and GomelNoLimit2 are statistically very close in terms of their averages at the point where Gomel2 gets eliminated. However, if GomelNoLimit1 had been eliminated first, GomelNoLimit2 would have been eliminated immediately after, so this is a localized uncertainty. Also, they are one standard deviation apart. Nonetheless, if there is time, a comparison between these bots should be run.

The other statistically insignificant ordering is between Hyperborean07 and GS3. This one is much closer, at a third of a standard deviation. Thus, given time, more matches amongst the top three will be run.

Note that BluffBot20 is a clear winner regardless, dominating the field in terms of a lack of exploitability.

All Bots

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | ManitobaNoLimit1 | PokeMinnNoLimit1 | ManitobaNoLimit2 | Average
BluffBot20NoLimit1 | -- | 4.623 ± 0.155 | -5.486 ± 0.214 | -5.918 ± 0.196 | -11.114 ± 0.25 | -10.073 ± 0.209 | 1.62 ± 0.265 | -0.272 ± 0.214 | 13.587 ± 0.45 | 28.798 ± 0.403 | 1.577 ± 0.101
GS3NoLimit1 | -4.623 ± 0.155 | -- | -10.109 ± 0.274 | -10.54 ± 0.279 | -15.736 ± 0.294 | -14.696 ± 0.198 | -3.003 ± 0.329 | -4.894 ± 0.284 | 8.964 ± 0.457 | 24.175 ± 0.429 | -3.046 ± 0.17
Hyperborean07NoLimit1 | 5.486 ± 0.214 | 10.109 ± 0.274 | -- | -0.431 ± 0.18 | -5.627 ± 0.298 | -4.587 ± 0.307 | 7.106 ± 0.24 | 5.215 ± 0.32 | 19.073 ± 0.514 | 34.284 ± 0.402 | 7.063 ± 0.181
SlideRuleNoLimit1 | 5.918 ± 0.196 | 10.54 ± 0.279 | 0.431 ± 0.18 | -- | -5.196 ± 0.309 | -4.156 ± 0.309 | 7.538 ± 0.264 | 5.646 ± 0.301 | 19.504 ± 0.512 | 34.715 ± 0.404 | 7.494 ± 0.182
GomelNoLimit1 | 11.114 ± 0.25 | 15.736 ± 0.294 | 5.627 ± 0.298 | 5.196 ± 0.309 | -- | 1.04 ± 0.271 | 12.734 ± 0.346 | 10.842 ± 0.283 | 24.7 ± 0.485 | 39.911 ± 0.475 | 12.69 ± 0.218
GomelNoLimit2 | 10.073 ± 0.209 | 14.696 ± 0.198 | 4.587 ± 0.307 | 4.156 ± 0.309 | -1.04 ± 0.271 | -- | 11.693 ± 0.373 | 9.802 ± 0.226 | 23.66 ± 0.504 | 38.871 ± 0.496 | 11.65 ± 0.211
MilanoNoLimit1 | -1.62 ± 0.265 | 3.003 ± 0.329 | -7.106 ± 0.24 | -7.538 ± 0.264 | -12.734 ± 0.346 | -11.693 ± 0.373 | -- | -1.892 ± 0.409 | 11.967 ± 0.408 | 27.178 ± 0.324 | -0.044 ± 0.202
ManitobaNoLimit1 | 0.272 ± 0.214 | 4.894 ± 0.284 | -5.215 ± 0.32 | -5.646 ± 0.301 | -10.842 ± 0.283 | -9.802 ± 0.226 | 1.892 ± 0.409 | -- | 13.858 ± 0.557 | 29.069 ± 0.496 | 1.848 ± 0.241
PokeMinnNoLimit1 | -13.587 ± 0.45 | -8.964 ± 0.457 | -19.073 ± 0.514 | -19.504 ± 0.512 | -24.7 ± 0.485 | -23.66 ± 0.504 | -11.967 ± 0.408 | -13.858 ± 0.557 | -- | 15.211 ± 0.619 | -12.01 ± 0.411
ManitobaNoLimit2 | -28.798 ± 0.403 | -24.175 ± 0.429 | -34.284 ± 0.402 | -34.715 ± 0.404 | -39.911 ± 0.475 | -38.871 ± 0.496 | -27.178 ± 0.324 | -29.069 ± 0.496 | -15.211 ± 0.619 | -- | -27.221 ± 0.358

Note Gomel’s high total average. Also, Manitoba1 beat Manitoba2 by a significant margin, which would have created problems had this been a bankroll competition. The Manitoba team has assured me that there was no collusion: one of their bots was developed using an evolutionary algorithm, the other was hand-coded, and the two never played against one another. For future reference, if this were an online learning competition, two bots learning together in self-play, where one has learned to beat the other handily, would be considered collusion. But this is not an online learning competition, and I see no reason for drastic measures when Manitoba1 gained no increase in rank from Manitoba2.

Top 9

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | ManitobaNoLimit1 | PokeMinnNoLimit1 | Average
BluffBot20NoLimit1 | -- | 5.519 ± 0.168 | -3.716 ± 0.241 | -5.617 ± 0.233 | -6.271 ± 0.279 | -6.609 ± 0.201 | 4.894 ± 0.268 | 5.72 ± 0.231 | 19.264 ± 0.495 | 1.465 ± 0.105
GS3NoLimit1 | -5.519 ± 0.168 | -- | -9.235 ± 0.302 | -11.136 ± 0.309 | -11.79 ± 0.339 | -12.128 ± 0.183 | -0.625 ± 0.346 | 0.202 ± 0.33 | 13.745 ± 0.486 | -4.054 ± 0.182
Hyperborean07NoLimit1 | 3.716 ± 0.241 | 9.235 ± 0.302 | -- | -1.901 ± 0.219 | -2.555 ± 0.374 | -2.893 ± 0.317 | 8.61 ± 0.228 | 9.437 ± 0.334 | 22.98 ± 0.575 | 5.181 ± 0.211
SlideRuleNoLimit1 | 5.617 ± 0.233 | 11.136 ± 0.309 | 1.901 ± 0.219 | -- | -0.654 ± 0.357 | -0.992 ± 0.331 | 10.511 ± 0.254 | 11.337 ± 0.348 | 24.881 ± 0.544 | 7.082 ± 0.207
GomelNoLimit1 | 6.271 ± 0.279 | 11.79 ± 0.339 | 2.555 ± 0.374 | 0.654 ± 0.357 | -- | -0.338 ± 0.314 | 11.165 ± 0.367 | 11.991 ± 0.313 | 25.535 ± 0.52 | 7.736 ± 0.243
GomelNoLimit2 | 6.609 ± 0.201 | 12.128 ± 0.183 | 2.893 ± 0.317 | 0.992 ± 0.331 | 0.338 ± 0.314 | -- | 11.503 ± 0.366 | 12.33 ± 0.264 | 25.873 ± 0.569 | 8.074 ± 0.205
MilanoNoLimit1 | -4.894 ± 0.268 | 0.625 ± 0.346 | -8.61 ± 0.228 | -10.511 ± 0.254 | -11.165 ± 0.367 | -11.503 ± 0.366 | -- | 0.826 ± 0.362 | 14.37 ± 0.462 | -3.429 ± 0.208
ManitobaNoLimit1 | -5.72 ± 0.231 | -0.202 ± 0.33 | -9.437 ± 0.334 | -11.337 ± 0.348 | -11.991 ± 0.313 | -12.33 ± 0.264 | -0.826 ± 0.362 | -- | 13.543 ± 0.606 | -4.256 ± 0.243
PokeMinnNoLimit1 | -19.264 ± 0.495 | -13.745 ± 0.486 | -22.98 ± 0.575 | -24.881 ± 0.544 | -25.535 ± 0.52 | -25.873 ± 0.569 | -14.37 ± 0.462 | -13.543 ± 0.606 | -- | -17.799 ± 0.441

Because Manitoba2 was so weak, it was eliminated first, and therefore Manitoba1 gained no advantage from it. Manitoba1 beat PokeMinn heads-up and was also less exploitable overall, clearly dominating it.

Top 8

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | ManitobaNoLimit1 | Average
BluffBot20NoLimit1 | -- | 0.035 ± 0.15 | -2.504 ± 0.23 | -4.45 ± 0.198 | 1.482 ± 0.288 | 0.609 ± 0.193 | 7.146 ± 0.325 | 8.962 ± 0.226 | 1.41 ± 0.113
GS3NoLimit1 | -0.035 ± 0.15 | -- | -2.539 ± 0.257 | -4.485 ± 0.248 | 1.446 ± 0.317 | 0.574 ± 0.168 | 7.111 ± 0.362 | 8.926 ± 0.289 | 1.375 ± 0.159
Hyperborean07NoLimit1 | 2.504 ± 0.23 | 2.539 ± 0.257 | -- | -1.946 ± 0.23 | 3.986 ± 0.346 | 3.113 ± 0.242 | 9.65 ± 0.315 | 11.465 ± 0.322 | 3.914 ± 0.181
SlideRuleNoLimit1 | 4.45 ± 0.198 | 4.485 ± 0.248 | 1.946 ± 0.23 | -- | 5.932 ± 0.287 | 5.059 ± 0.251 | 11.596 ± 0.311 | 13.411 ± 0.338 | 5.86 ± 0.166
GomelNoLimit1 | -1.482 ± 0.288 | -1.446 ± 0.317 | -3.986 ± 0.346 | -5.932 ± 0.287 | -- | -0.873 ± 0.318 | 5.664 ± 0.337 | 7.48 ± 0.345 | -0.072 ± 0.232
GomelNoLimit2 | -0.609 ± 0.193 | -0.574 ± 0.168 | -3.113 ± 0.242 | -5.059 ± 0.251 | 0.873 ± 0.318 | -- | 6.537 ± 0.35 | 8.353 ± 0.31 | 0.801 ± 0.165
MilanoNoLimit1 | -7.146 ± 0.325 | -7.111 ± 0.362 | -9.65 ± 0.315 | -11.596 ± 0.311 | -5.664 ± 0.337 | -6.537 ± 0.35 | -- | 1.816 ± 0.334 | -5.736 ± 0.248
ManitobaNoLimit1 | -8.962 ± 0.226 | -8.926 ± 0.289 | -11.465 ± 0.322 | -13.411 ± 0.338 | -7.48 ± 0.345 | -8.353 ± 0.31 | -1.816 ± 0.334 | -- | -7.552 ± 0.22

Milano edged out Manitoba1 by a statistically significant margin. Milano beat Manitoba1 heads-up; however, the result is an example of the dependence upon which other bots were playing. If the field had only been BluffBot20, Gomel2, Milano, and Manitoba1, then Manitoba1 would have come out ahead of Milano, because Manitoba1 was less exploitable by Gomel2 and BluffBot20. However, this observation may be due to noise.

Top 7

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | Average
BluffBot20NoLimit1 | -- | 0.663 ± 0.169 | -1.551 ± 0.228 | -3.54 ± 0.209 | 3.558 ± 0.272 | 2.616 ± 0.157 | 9.215 ± 0.374 | 1.566 ± 0.121
GS3NoLimit1 | -0.663 ± 0.169 | -- | -2.214 ± 0.284 | -4.202 ± 0.286 | 2.895 ± 0.32 | 1.953 ± 0.188 | 8.552 ± 0.423 | 0.903 ± 0.188
Hyperborean07NoLimit1 | 1.551 ± 0.228 | 2.214 ± 0.284 | -- | -1.989 ± 0.235 | 5.109 ± 0.341 | 4.167 ± 0.226 | 10.766 ± 0.388 | 3.117 ± 0.187
SlideRuleNoLimit1 | 3.54 ± 0.209 | 4.202 ± 0.286 | 1.989 ± 0.235 | -- | 7.097 ± 0.295 | 6.155 ± 0.255 | 12.754 ± 0.339 | 5.105 ± 0.164
GomelNoLimit1 | -3.558 ± 0.272 | -2.895 ± 0.32 | -5.109 ± 0.341 | -7.097 ± 0.295 | -- | -0.942 ± 0.241 | 5.657 ± 0.372 | -1.992 ± 0.213
GomelNoLimit2 | -2.616 ± 0.157 | -1.953 ± 0.188 | -4.167 ± 0.226 | -6.155 ± 0.255 | 0.942 ± 0.241 | -- | 6.599 ± 0.373 | -1.05 ± 0.127
MilanoNoLimit1 | -9.215 ± 0.374 | -8.552 ± 0.423 | -10.766 ± 0.388 | -12.754 ± 0.339 | -5.657 ± 0.372 | -6.599 ± 0.373 | -- | -7.649 ± 0.293

After Manitoba’s and Minnesota’s entrants have been removed, the Gomel bots’ dominance in terms of bankroll is eliminated, and SlideRule’s ability to exploit even stronger bots shines through. This is the top two-thirds of the field, which is what we would likely use if we ran a truncated bankroll in no-limit next year.
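
To make the truncated bankroll outcome concrete, the sketch below simply ranks the Top 7 field by the Average column of its table. It assumes that a truncated bankroll means "drop the weakest third, then rank the rest by average bankroll", which is my reading and not a settled rule; the variable names are made up for the example.

```python
# Average column of the "Top 7" table (small bets/hand).
top7_average = {
    "BluffBot20NoLimit1": 1.566,
    "GS3NoLimit1": 0.903,
    "Hyperborean07NoLimit1": 3.117,
    "SlideRuleNoLimit1": 5.105,
    "GomelNoLimit1": -1.992,
    "GomelNoLimit2": -1.05,
    "MilanoNoLimit1": -7.649,
}

# Rank by average bankroll within the truncated field, best first.
ranking = sorted(top7_average, key=top7_average.get, reverse=True)
print(ranking[0])  # SlideRuleNoLimit1
```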

Top 6

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | Average
BluffBot20NoLimit1 | -- | 0.483 ± 0.17 | -1.19 ± 0.237 | -2.867 ± 0.195 | 5.256 ± 0.267 | 5.467 ± 0.201 | 1.192 ± 0.123
GS3NoLimit1 | -0.483 ± 0.17 | -- | -1.672 ± 0.27 | -3.35 ± 0.269 | 4.773 ± 0.306 | 4.984 ± 0.213 | 0.709 ± 0.168
Hyperborean07NoLimit1 | 1.19 ± 0.237 | 1.672 ± 0.27 | -- | -1.678 ± 0.204 | 6.446 ± 0.289 | 6.657 ± 0.232 | 2.381 ± 0.166
SlideRuleNoLimit1 | 2.867 ± 0.195 | 3.35 ± 0.269 | 1.678 ± 0.204 | -- | 8.124 ± 0.283 | 8.335 ± 0.212 | 4.059 ± 0.149
GomelNoLimit1 | -5.256 ± 0.267 | -4.773 ± 0.306 | -6.446 ± 0.289 | -8.124 ± 0.283 | -- | 0.211 ± 0.193 | -4.065 ± 0.192
GomelNoLimit2 | -5.467 ± 0.201 | -4.984 ± 0.213 | -6.657 ± 0.232 | -8.335 ± 0.212 | -0.211 ± 0.193 | -- | -4.276 ± 0.114

The Gomel bots are now at the bottom of the list, unable to exploit the more solid players. And yet, observe that as the players grow more solid (with lower maximum exploitability), their performance against the Gomel bots decreases. Also, Gomel2 did better against the top three bots, but because Gomel1 beat it heads-up, Gomel2 falls into sixth place. At this point, “better” becomes a subjective term: Gomel1 and Gomel2 are able to exploit weaker bots, BluffBot20 is the most solid and hardest to exploit, and SlideRule and Hyperborean07 take a middle ground.

Top 5

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | Average
BluffBot20NoLimit1 | -- | -0.087 ± 0.146 | -0.844 ± 0.156 | -1.873 ± 0.164 | 6.645 ± 0.282 | 0.768 ± 0.09
GS3NoLimit1 | 0.087 ± 0.146 | -- | -0.758 ± 0.219 | -1.786 ± 0.242 | 6.732 ± 0.313 | 0.855 ± 0.151
Hyperborean07NoLimit1 | 0.844 ± 0.156 | 0.758 ± 0.219 | -- | -1.028 ± 0.143 | 7.489 ± 0.308 | 1.613 ± 0.123
SlideRuleNoLimit1 | 1.873 ± 0.164 | 1.786 ± 0.242 | 1.028 ± 0.143 | -- | 8.518 ± 0.295 | 2.641 ± 0.127
GomelNoLimit1 | -6.645 ± 0.282 | -6.732 ± 0.313 | -7.489 ± 0.308 | -8.518 ± 0.295 | -- | -5.877 ± 0.223

Top 4

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | Average
BluffBot20NoLimit1 | -- | 0.24 ± 0.123 | 0.395 ± 0.097 | 0.67 ± 0.121 | 0.327 ± 0.06
GS3NoLimit1 | -0.24 ± 0.123 | -- | 0.155 ± 0.136 | 0.43 ± 0.148 | 0.086 ± 0.088
Hyperborean07NoLimit1 | -0.395 ± 0.097 | -0.155 ± 0.136 | -- | 0.275 ± 0.136 | -0.069 ± 0.074
SlideRuleNoLimit1 | -0.67 ± 0.121 | -0.43 ± 0.148 | -0.275 ± 0.136 | -- | -0.344 ± 0.087

SlideRule edged out Hyperborean07 head-to-head in a statistically insignificant (however morally significant) victory. However, SlideRule was crushed by BluffBot20 as well as GS3, making its average loss/hand against the rest of the top four higher than Hyperborean07’s or GS3’s.

Top 3 (After 20 Duplicate Matches)

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | Average
BluffBot20NoLimit1 | -- | 0.324 ± 0.114 | 0.281 ± 0.109 | 0.202 ± 0.056
GS3NoLimit1 | -0.324 ± 0.114 | -- | -0.043 ± 0.146 | -0.123 ± 0.079
Hyperborean07NoLimit1 | -0.281 ± 0.109 | 0.043 ± 0.146 | -- | -0.079 ± 0.077

Hyperborean07 and GS3 both lost to BluffBot20 heads-up; however, their relative performance was statistically very close. Thus, I ran all three for 300 matches (600,000 hands/series). This resulted in a statistically significant ranking.

Top 3 (300 Duplicate Matches)

Row bot | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | Average
BluffBot20NoLimit1 | -- | 0.267 ± 0.032 | 0.38 ± 0.033 | 0.216 ± 0.017
GS3NoLimit1 | -0.267 ± 0.032 | -- | 0.113 ± 0.039 | -0.051 ± 0.021
Hyperborean07NoLimit1 | -0.38 ± 0.033 | -0.113 ± 0.039 | -- | -0.164 ± 0.022
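
The tighter error bars in the 300-match table are roughly what one would expect if the matches are independent, since the standard deviation of an average shrinks with the square root of the number of matches. The sketch below checks this for the GS3 versus Hyperborean07 entry. The independence assumption and the helper name are mine, so treat it as a back-of-the-envelope check and not as the method used to produce the tables.

```python
import math


def scaled_std_dev(std_dev, matches_before, matches_after):
    """Expected standard deviation after more matches, assuming independent matches."""
    return std_dev * math.sqrt(matches_before / matches_after)


# GS3 vs Hyperborean07 had a standard deviation of 0.146 after 20 duplicate matches.
predicted = scaled_std_dev(0.146, 20, 300)
print(round(predicted, 3))  # 0.038
```

The prediction of about 0.038 is close to the 0.039 reported in the 300-match table.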