NO-LIMIT RESULTS

FINAL RANKING:

  1. BluffBot20 (Teppo Salonen)
  2. GS3 (Carnegie Mellon)
  3. Hyperborean07 (U of Alberta)
  4. SlideRule (Kevin Stebbing)
  5. Gomel1 (Igor Korshunov, Gomel State University)
  6. Gomel2
  7. Milano (Milano Polytechnic)
  8. Manitoba1 (University of Manitoba)
  9. PokeMinn (University of Minnesota)
  10. Manitoba2

The ranking is based on instant runoff bankroll.

From my perspective, the most fascinating result is that the more solid players (such as BluffBot20, Hyperborean07, GS3, and SlideRule) do not achieve as high an overall bankroll as more aggressive bots such as Gomel1 and Gomel2 (note: I am using “solid” to mean less observably exploitable, and “aggressive” to mean winning more against weaker bots). Also, if we had run the “2/3 truncated bankroll” that we have planned for limit, then SlideRule would have won. Of course, such results should be taken with a grain of salt, because neither BluffBot20 nor Hyperborean07 was trying to exploit weak bots, and so it is unfair to judge their performance (or at least the performance of their designers) in that fashion.

In instant runoff bankroll, we look at the average small bets/hand won or lost by each bot against all of the remaining competitors. The bot that lost the most is eliminated and finishes in last place. Then the averages are recalculated with the last-place finisher’s results removed. This process is repeated until a single winner remains.
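The elimination procedure described above can be sketched in a few lines of Python. This is only an illustration, not the script used to produce the official charts: the function names and the shortened bot names are my own, and the numbers are the heads-up results from the 20-match "All Bots" chart under MATCH DETAILS, stored as an upper triangle (the lower triangle is the same values with the sign flipped).

```python
BOTS = ["BluffBot20", "GS3", "Hyperborean07", "SlideRule", "Gomel1",
        "Gomel2", "Milano", "Manitoba1", "PokeMinn", "Manitoba2"]

# UPPER[i][k] is the small bets/hand that BOTS[i] won from BOTS[i + 1 + k]
# over the first twenty matches.
UPPER = [
    [0.166, 0.237, 0.576, 2.093, 2.885, 3.437, 0.475, 1.848, 2.471],
    [-0.079, 0.503, 3.161, 0.124, 1.875, 4.204, -42.055, 5.016],
    [-0.048, 6.657, 5.455, 6.795, 8.697, 14.051, 22.116],
    [11.596, 9.73, 10.337, 10.387, 15.637, 10.791],
    [3.184, 8.372, 11.45, 62.389, 52.325],
    [15.078, 11.907, 58.985, 40.256],
    [5.741, 12.719, 27.04],
    [18.817, 50.677],
    [34.299],
]


def build_results(bots, upper):
    """Expand the upper triangle into a full antisymmetric lookup table."""
    results = {bot: {} for bot in bots}
    for i, row in enumerate(upper):
        for k, won in enumerate(row):
            a, b = bots[i], bots[i + 1 + k]
            results[a][b] = won
            results[b][a] = -won
    return results


def instant_runoff(results):
    """Return the bots ordered from first place to last place."""
    remaining = list(results)
    eliminated = []
    while len(remaining) > 1:
        # Average winnings against the bots that are still in the field.
        averages = {
            bot: sum(results[bot][opp] for opp in remaining if opp != bot)
            / (len(remaining) - 1)
            for bot in remaining
        }
        loser = min(remaining, key=averages.get)  # lost the most
        eliminated.append(loser)
        remaining.remove(loser)
    return remaining + eliminated[::-1]


ranking = instant_runoff(build_results(BOTS, UPPER))
for place, bot in enumerate(ranking, start=1):
    print(place, bot)
```

On the 20-match data alone, this sketch places Hyperborean07 ahead of GS3. The final ranking has them the other way round because that pair was not statistically separable after twenty matches and was settled by the 300-match run-off.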

MATCH DETAILS

Here are the results from the first twenty matches of the no-limit competition. Each cell gives the small bets/hand won (if positive) or lost (if negative) by the program in the row from/to the program in the column, followed by the standard deviation of this estimate. Red indicates the row player lost money heads up, green indicates the row player won money.

In addition to these fairly readable charts, I have also written average difference charts that can be used to determine more accurately the statistical significance of where people placed. In particular, I used them to determine that the rank of Hyperborean07 over GS3 is not statistically significant, nor is the ranking of Gomel1 over Gomel2, but all other results seem statistically significant. I therefore ran the top three for 300 duplicate matches for each series, and afterwards GS3 was ahead of Hyperborean07 by a statistically significant margin.

All Bots

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | ManitobaNoLimit1 | PokeMinnNoLimit1 | ManitobaNoLimit2 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 2.093 ± 0.346 | 2.885 ± 0.306 | 3.437 ± 0.243 | 0.475 ± 0.153 | 1.848 ± 0.252 | 2.471 ± 0.138 | 1.577 ± 0.101
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 3.161 ± 0.597 | 0.124 ± 0.467 | 1.875 ± 0.377 | 4.204 ± 0.323 | -42.055 ± 0.606 | 5.016 ± 0.192 | -3.046 ± 0.17
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | 6.657 ± 0.493 | 5.455 ± 0.483 | 6.795 ± 0.551 | 8.697 ± 0.424 | 14.051 ± 0.723 | 22.116 ± 0.589 | 7.063 ± 0.181
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | 11.596 ± 0.295 | 9.73 ± 0.359 | 10.337 ± 0.595 | 10.387 ± 0.523 | 15.637 ± 0.685 | 10.791 ± 0.405 | 7.494 ± 0.182
GomelNoLimit1 | -2.093 ± 0.346 | -3.161 ± 0.597 | -6.657 ± 0.493 | -11.596 ± 0.295 | n/a | 3.184 ± 0.287 | 8.372 ± 0.705 | 11.45 ± 0.854 | 62.389 ± 1.264 | 52.325 ± 0.599 | 12.69 ± 0.218
GomelNoLimit2 | -2.885 ± 0.306 | -0.124 ± 0.467 | -5.455 ± 0.483 | -9.73 ± 0.359 | -3.184 ± 0.287 | n/a | 15.078 ± 0.83 | 11.907 ± 0.848 | 58.985 ± 0.892 | 40.256 ± 0.61 | 11.65 ± 0.211
MilanoNoLimit1 | -3.437 ± 0.243 | -1.875 ± 0.377 | -6.795 ± 0.551 | -10.337 ± 0.595 | -8.372 ± 0.705 | -15.078 ± 0.83 | n/a | 5.741 ± 0.675 | 12.719 ± 1.124 | 27.04 ± 0.736 | -0.044 ± 0.202
ManitobaNoLimit1 | -0.475 ± 0.153 | -4.204 ± 0.323 | -8.697 ± 0.424 | -10.387 ± 0.523 | -11.45 ± 0.854 | -11.907 ± 0.848 | -5.741 ± 0.675 | n/a | 18.817 ± 1.236 | 50.677 ± 0.91 | 1.848 ± 0.241
PokeMinnNoLimit1 | -1.848 ± 0.252 | 42.055 ± 0.606 | -14.051 ± 0.723 | -15.637 ± 0.685 | -62.389 ± 1.264 | -58.985 ± 0.892 | -12.719 ± 1.124 | -18.817 ± 1.236 | n/a | 34.299 ± 1.37 | -12.01 ± 0.411
ManitobaNoLimit2 | -2.471 ± 0.138 | -5.016 ± 0.192 | -22.116 ± 0.589 | -10.791 ± 0.405 | -52.325 ± 0.599 | -40.256 ± 0.61 | -27.04 ± 0.736 | -50.677 ± 0.91 | -34.299 ± 1.37 | n/a | -27.221 ± 0.358
Note the Gomel bots’ high overall averages. Also, Manitoba1 beat Manitoba2 by a significant margin, which would have created problems had this been a bankroll competition. The Manitoba team has assured me that no collusion took place: one of their bots was developed using an evolutionary algorithm, the other was hand-coded, and the two never played against one another. For future reference, if this were an online learning competition, two bots learning together in self-play, where one has learned to beat the other handily, would be considered collusion. But this is not an online learning competition, and I see no reason for drastic measures when Manitoba1 gained no increase in rank from Manitoba2.

Top 9

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | ManitobaNoLimit1 | PokeMinnNoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 2.093 ± 0.346 | 2.885 ± 0.306 | 3.437 ± 0.243 | 0.475 ± 0.153 | 1.848 ± 0.252 | 1.465 ± 0.105
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 3.161 ± 0.597 | 0.124 ± 0.467 | 1.875 ± 0.377 | 4.204 ± 0.323 | -42.055 ± 0.606 | -4.054 ± 0.182
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | 6.657 ± 0.493 | 5.455 ± 0.483 | 6.795 ± 0.551 | 8.697 ± 0.424 | 14.051 ± 0.723 | 5.181 ± 0.211
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | 11.596 ± 0.295 | 9.73 ± 0.359 | 10.337 ± 0.595 | 10.387 ± 0.523 | 15.637 ± 0.685 | 7.082 ± 0.207
GomelNoLimit1 | -2.093 ± 0.346 | -3.161 ± 0.597 | -6.657 ± 0.493 | -11.596 ± 0.295 | n/a | 3.184 ± 0.287 | 8.372 ± 0.705 | 11.45 ± 0.854 | 62.389 ± 1.264 | 7.736 ± 0.243
GomelNoLimit2 | -2.885 ± 0.306 | -0.124 ± 0.467 | -5.455 ± 0.483 | -9.73 ± 0.359 | -3.184 ± 0.287 | n/a | 15.078 ± 0.83 | 11.907 ± 0.848 | 58.985 ± 0.892 | 8.074 ± 0.205
MilanoNoLimit1 | -3.437 ± 0.243 | -1.875 ± 0.377 | -6.795 ± 0.551 | -10.337 ± 0.595 | -8.372 ± 0.705 | -15.078 ± 0.83 | n/a | 5.741 ± 0.675 | 12.719 ± 1.124 | -3.429 ± 0.208
ManitobaNoLimit1 | -0.475 ± 0.153 | -4.204 ± 0.323 | -8.697 ± 0.424 | -10.387 ± 0.523 | -11.45 ± 0.854 | -11.907 ± 0.848 | -5.741 ± 0.675 | n/a | 18.817 ± 1.236 | -4.256 ± 0.243
PokeMinnNoLimit1 | -1.848 ± 0.252 | 42.055 ± 0.606 | -14.051 ± 0.723 | -15.637 ± 0.685 | -62.389 ± 1.264 | -58.985 ± 0.892 | -12.719 ± 1.124 | -18.817 ± 1.236 | n/a | -17.799 ± 0.441

Because Manitoba2 was so weak, it was eliminated first, and therefore Manitoba1 gained no advantage from it. Manitoba1 beat PokeMinn heads-up and was less exploitable overall, clearly demonstrating dominance.

Top 8

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | ManitobaNoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 2.093 ± 0.346 | 2.885 ± 0.306 | 3.437 ± 0.243 | 0.475 ± 0.153 | 1.41 ± 0.113
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 3.161 ± 0.597 | 0.124 ± 0.467 | 1.875 ± 0.377 | 4.204 ± 0.323 | 1.375 ± 0.159
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | 6.657 ± 0.493 | 5.455 ± 0.483 | 6.795 ± 0.551 | 8.697 ± 0.424 | 3.914 ± 0.181
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | 11.596 ± 0.295 | 9.73 ± 0.359 | 10.337 ± 0.595 | 10.387 ± 0.523 | 5.86 ± 0.166
GomelNoLimit1 | -2.093 ± 0.346 | -3.161 ± 0.597 | -6.657 ± 0.493 | -11.596 ± 0.295 | n/a | 3.184 ± 0.287 | 8.372 ± 0.705 | 11.45 ± 0.854 | -0.072 ± 0.232
GomelNoLimit2 | -2.885 ± 0.306 | -0.124 ± 0.467 | -5.455 ± 0.483 | -9.73 ± 0.359 | -3.184 ± 0.287 | n/a | 15.078 ± 0.83 | 11.907 ± 0.848 | 0.801 ± 0.165
MilanoNoLimit1 | -3.437 ± 0.243 | -1.875 ± 0.377 | -6.795 ± 0.551 | -10.337 ± 0.595 | -8.372 ± 0.705 | -15.078 ± 0.83 | n/a | 5.741 ± 0.675 | -5.736 ± 0.248
ManitobaNoLimit1 | -0.475 ± 0.153 | -4.204 ± 0.323 | -8.697 ± 0.424 | -10.387 ± 0.523 | -11.45 ± 0.854 | -11.907 ± 0.848 | -5.741 ± 0.675 | n/a | -7.552 ± 0.22

Milano edged out Manitoba1 by a statistically significant margin. Milano also beat Manitoba1 heads-up; however, the result is an example of how the ranking depends on which other bots were playing. If it had only been BluffBot20, Gomel2, Milano, and Manitoba1, then Manitoba1 would have come out ahead of Milano, due to Manitoba1 being less exploitable by Gomel2 and BluffBot20. However, this observation may be due to noise.

Top 7

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | MilanoNoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 2.093 ± 0.346 | 2.885 ± 0.306 | 3.437 ± 0.243 | 1.566 ± 0.121
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 3.161 ± 0.597 | 0.124 ± 0.467 | 1.875 ± 0.377 | 0.903 ± 0.188
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | 6.657 ± 0.493 | 5.455 ± 0.483 | 6.795 ± 0.551 | 3.117 ± 0.187
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | 11.596 ± 0.295 | 9.73 ± 0.359 | 10.337 ± 0.595 | 5.105 ± 0.164
GomelNoLimit1 | -2.093 ± 0.346 | -3.161 ± 0.597 | -6.657 ± 0.493 | -11.596 ± 0.295 | n/a | 3.184 ± 0.287 | 8.372 ± 0.705 | -1.992 ± 0.213
GomelNoLimit2 | -2.885 ± 0.306 | -0.124 ± 0.467 | -5.455 ± 0.483 | -9.73 ± 0.359 | -3.184 ± 0.287 | n/a | 15.078 ± 0.83 | -1.05 ± 0.127
MilanoNoLimit1 | -3.437 ± 0.243 | -1.875 ± 0.377 | -6.795 ± 0.551 | -10.337 ± 0.595 | -8.372 ± 0.705 | -15.078 ± 0.83 | n/a | -7.649 ± 0.293

After Manitoba’s and Minnesota’s entrants have been removed, the Gomel bots’ dominance in terms of bankroll is eliminated, and SlideRule’s ability to exploit even stronger bots shines through. This is the top two-thirds, which is what we would likely go with if we ran a truncated bankroll in no-limit next year.
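To make the truncated-bankroll outcome concrete, here is a small Python sketch that ranks the seven remaining bots by their average against one another, using the heads-up values from the Top 7 chart. It assumes that a truncated bankroll simply means ordinary bankroll ranking restricted to this field; the exact rule is not spelled out here, and the function name and shortened bot names are mine.

```python
TOP7 = ["BluffBot20", "GS3", "Hyperborean07", "SlideRule",
        "Gomel1", "Gomel2", "Milano"]

# UPPER[i][k] is the small bets/hand that TOP7[i] won from TOP7[i + 1 + k].
UPPER = [
    [0.166, 0.237, 0.576, 2.093, 2.885, 3.437],
    [-0.079, 0.503, 3.161, 0.124, 1.875],
    [-0.048, 6.657, 5.455, 6.795],
    [11.596, 9.73, 10.337],
    [3.184, 8.372],
    [15.078],
]


def bankroll_averages(bots, upper):
    """Average small bets/hand of each bot against every other bot listed."""
    totals = {bot: 0.0 for bot in bots}
    for i, row in enumerate(upper):
        for k, won in enumerate(row):
            totals[bots[i]] += won          # row bot's winnings
            totals[bots[i + 1 + k]] -= won  # column bot's matching loss
    return {bot: total / (len(bots) - 1) for bot, total in totals.items()}


averages = bankroll_averages(TOP7, UPPER)
winner = max(averages, key=averages.get)
for bot, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bot:15s} {avg:7.3f}")
print("Truncated bankroll winner:", winner)
```

Under that assumption the sketch reproduces the Average column of the Top 7 chart, with SlideRule on top.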

Top 6

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | GomelNoLimit2 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 2.093 ± 0.346 | 2.885 ± 0.306 | 1.192 ± 0.123
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 3.161 ± 0.597 | 0.124 ± 0.467 | 0.709 ± 0.168
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | 6.657 ± 0.493 | 5.455 ± 0.483 | 2.381 ± 0.166
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | 11.596 ± 0.295 | 9.73 ± 0.359 | 4.059 ± 0.149
GomelNoLimit1 | -2.093 ± 0.346 | -3.161 ± 0.597 | -6.657 ± 0.493 | -11.596 ± 0.295 | n/a | 3.184 ± 0.287 | -4.065 ± 0.192
GomelNoLimit2 | -2.885 ± 0.306 | -0.124 ± 0.467 | -5.455 ± 0.483 | -9.73 ± 0.359 | -3.184 ± 0.287 | n/a | -4.276 ± 0.114

The Gomel bots are now at the bottom of the list, unable to exploit the more solid players. And yet, observe that as the players grow more solid (with lower maximum exploitability), their performance against the Gomel bots decreases. Also, Gomel2 did better against the top three bots, but because Gomel1 beat it heads-up, Gomel2 falls into sixth place. At this point, the “better” bot becomes a subjective term, with Gomel1 and Gomel2 being able to exploit weaker bots, BluffBot20 being the most solid and hardest to exploit, and SlideRule and Hyperborean07 taking a middle ground.

Top 5

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | GomelNoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 2.093 ± 0.346 | 0.768 ± 0.09
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 3.161 ± 0.597 | 0.855 ± 0.151
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | 6.657 ± 0.493 | 1.613 ± 0.123
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | 11.596 ± 0.295 | 2.641 ± 0.127
GomelNoLimit1 | -2.093 ± 0.346 | -3.161 ± 0.597 | -6.657 ± 0.493 | -11.596 ± 0.295 | n/a | -5.877 ± 0.223

Top 4

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | SlideRuleNoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.576 ± 0.102 | 0.327 ± 0.06
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | 0.503 ± 0.148 | 0.086 ± 0.088
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.048 ± 0.171 | -0.069 ± 0.074
SlideRuleNoLimit1 | -0.576 ± 0.102 | -0.503 ± 0.148 | 0.048 ± 0.171 | n/a | -0.344 ± 0.087

SlideRule edged out Hyperborean07 head-to-head in a statistically insignificant (though morally significant) victory. However, SlideRule was crushed by BluffBot20 as well as GS3, leaving its average small bets/hand in this group below both Hyperborean07’s and GS3’s.
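As a rough illustration of why the first result is called insignificant and the other two are not, one can divide each heads-up estimate by its standard deviation. The sketch below is not the average difference analysis mentioned earlier; the 1.96 cutoff (roughly a two-sided 5% level under a normal approximation) and the function names are assumptions of mine.

```python
def z_score(mean, std_dev):
    """Number of standard deviations separating the estimate from zero."""
    return mean / std_dev


def is_significant(mean, std_dev, threshold=1.96):
    """True if the estimate differs from zero by more than the threshold.

    The default threshold is an assumed cutoff, not one stated in the report.
    """
    return abs(z_score(mean, std_dev)) > threshold


# Heads-up cells from the Top 4 chart: (row bot, column bot) -> (mean, std dev).
cells = {
    ("SlideRule", "Hyperborean07"): (0.048, 0.171),
    ("BluffBot20", "SlideRule"): (0.576, 0.102),
    ("GS3", "SlideRule"): (0.503, 0.148),
}

for (row, col), (mean, std_dev) in cells.items():
    verdict = "significant" if is_significant(mean, std_dev) else "not significant"
    print(f"{row} vs. {col}: z = {z_score(mean, std_dev):.2f} ({verdict})")
```

With these inputs, SlideRule's edge over Hyperborean07 is well under one standard deviation, while its losses to BluffBot20 and GS3 are each more than three standard deviations from zero.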

Top 3 (After 20 Duplicate Matches)

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.166 ± 0.074 | 0.237 ± 0.08 | 0.202 ± 0.056
GS3NoLimit1 | -0.166 ± 0.074 | n/a | -0.079 ± 0.148 | -0.123 ± 0.079
Hyperborean07NoLimit1 | -0.237 ± 0.08 | 0.079 ± 0.148 | n/a | -0.079 ± 0.077

Hyperborean07 and GS3 both lost to BluffBot20 heads-up; however, their performance relative to each other was statistically very close. Thus, I ran all three for 300 duplicate matches (600,000 hands/series). This resulted in a statistically significant ranking.

Top 3 (300 Duplicate Matches)

Row vs. column | BluffBot20NoLimit1 | GS3NoLimit1 | Hyperborean07NoLimit1 | Average
BluffBot20NoLimit1 | n/a | 0.267 ± 0.032 | 0.38 ± 0.033 | 0.216 ± 0.017
GS3NoLimit1 | -0.267 ± 0.032 | n/a | 0.113 ± 0.039 | -0.051 ± 0.021
Hyperborean07NoLimit1 | -0.38 ± 0.033 | -0.113 ± 0.039 | n/a | -0.164 ± 0.022

GS3 did statistically significantly better than Hyperborean07 in the run-off. Not surprisingly, BluffBot20 still came out on top.