Loki has been tested for extended periods in more realistic settings against human opposition (and an occasional computer player). For this purpose, the program participates in an on-line poker game running on Internet Relay Chat (irc.poker.net). Human players connect to IRC and participate in games conducted by dedicated server programs. Bankroll statistics on each player are maintained, but no real money is at stake, which may make the results somewhat optimistic. There are three different games available for limit Texas Hold'em. At the first level, a player begins with $1000 and the betting scale is 10-20. A player who reaches $2000 (i.e. has won a net of $1000) is allowed to play in a second game where the betting scale is 20-40. At $5000 a player is allowed to play in a third game where the betting scale is 50-100. The competition becomes much stronger at each level, but it is more difficult to find opponents, so games at the higher levels are less common. Early versions of Loki participated in games with 2 to 12 players. Later the server was changed to allow only 2 to 10 players.
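The level structure described above can be summarized in a short sketch. The names `LEVELS` and `allowed_games` are hypothetical and are not part of the IRC server. The text does not say what happens when a bankroll falls below the starting $1000, so the sketch assumes the 10-20 game is always open.

```python
# Bankroll thresholds and betting scales taken from the description above.
# ASSUMPTION: the first-level game has no minimum bankroll (threshold 0),
# since the text only says that players begin with $1000.
LEVELS = [
    (0, (10, 20)),
    (2000, (20, 40)),
    (5000, (50, 100)),
]


def allowed_games(bankroll):
    """Return the betting scales a player with this bankroll may join."""
    return [scale for minimum, scale in LEVELS if bankroll >= minimum]


print(allowed_games(1000))
print(allowed_games(2500))
```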
Playing short-handed (2-4 players) emphasizes the need for strong opponent modeling. Typically, with many players (5 or more) the computer can win in the long run by playing the odds, usually because there are some bad players. With only a few opponents, we face many one-on-one situations where the non-mathematical elements of the game become more important. When those few players are strong (or colluding), Loki loses a large amount of money. It does not "give up" in a bad situation, so it continues to lose. Because short-handed games go by significantly faster, they would also contribute a disproportionate share of the data from this limited context. We therefore ignore results for games with 2 to 4 players, since they would distort the overall results. All the reported results are for games with 5 to 12 players (or 5 to 10 in the later sessions).
Because this is not a closed environment, the natural variance in these games is very high, and the results depend strongly on which players happen to be playing. Consequently, not enough information has been gathered to draw any safe conclusions.
Very early versions of Loki had mixed results on the IRC server, but played too few games to be conclusive. However, they appeared to play at about the same level as the average human participant in the open games, roughly breaking even over the course of about 12,000 hands. Versions with opponent modeling appeared to be much stronger: in one session of 8,894 games (5 to 12 players), a version using generic opponent modeling (GOM) achieved a winning rate of 0.07 small bets per hand (this places Loki comfortably in the top 10% of players who play the 10-20 games on the server). In a later session of 29,301 games (5 to 10 players), another version that used specific opponent modeling (SOM) achieved a winning rate of 0.08 small bets per hand. While the difference between the two modeling versions may not be significant, both win consistently and perform much better than the previous versions.
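To make the unit concrete: in a limit game the small bet is the lower number of the betting scale (10 in a 10-20 game), so a rate in small bets per hand can be converted to and from a bankroll change. The following is a minimal sketch with hypothetical helper names. It assumes that every hand in a session was played at a single betting scale, which the text does not state explicitly for the SOM session.

```python
def small_bets_per_hand(net_winnings, small_bet, hands):
    """Winning rate: net winnings expressed in small bets, per hand played."""
    if hands <= 0:
        raise ValueError("hands must be positive")
    return net_winnings / (small_bet * hands)


def to_dollars(rate, small_bet, hands):
    """Inverse conversion: bankroll change implied by a winning rate."""
    return rate * small_bet * hands


# Rates and hand counts reported above.
# ASSUMPTION: both sessions are treated as 10-20 games.
gom_total = to_dollars(0.07, 10, 8894)
som_total = to_dollars(0.08, 10, 29301)
print(round(gom_total, 2), round(som_total, 2))
```

Under that assumption the reported rates correspond to roughly $6,200 and $23,400 in play money. These figures are implied by the rates, not taken from the server's records.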
Recognizing that many human opponents were easily identifying when Loki had a strong or weak hand (occasional semi-bluffing did not add enough deception), we added some new deceptive strategies: pure bluffs (betting with the weakest hands on the river), balancing raises (occasionally raising instead of calling), and check-raising (following a check with a raise in the same round). Check-raising is normally used with the strongest hands, but to ensure that no information can reliably be gained from any particular action, we also use "fake" check-raises with mediocre hands. However, these are simply more expert rules (e.g. check-raise 60% of the time with three callers behind us when EHS' >= 0.92). We are interested in machine-dependent approaches to computer poker where Loki can discover for itself what the best strategy is in a situation (which makes it easier to introduce opponent modeling into such decisions). Hence we are not interested in the performance contribution of any particular strategy. However, the introduction of these advanced tactics probably explains why the following results are better. The two opponent modeling versions that use these features are GOM' and SOM'.
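The example rule quoted above (check-raise 60% of the time with three callers behind us when EHS' >= 0.92) is a randomized threshold rule, and can be sketched as follows. This is only an illustration of that one rule. The function name, the parameter names, and the reading of "three callers behind us" as exactly three are assumptions, and Loki's actual rule set is not reproduced here.

```python
import random


def should_check_raise(ehs_prime, callers_behind, rng=random):
    """Hypothetical encoding of the single example rule from the text.

    ehs_prime      -- the adjusted effective hand strength EHS' (0.0 to 1.0)
    callers_behind -- number of callers still to act behind us
    rng            -- source of randomness (anything with a random() method)
    """
    # ASSUMPTION: the rule applies only with exactly three callers behind.
    if callers_behind == 3 and ehs_prime >= 0.92:
        # Randomizing keeps the action from revealing the hand reliably.
        return rng.random() < 0.60
    return False


# Estimate how often the rule fires for a qualifying hand.
rng = random.Random(0)
trials = 10000
fired = sum(should_check_raise(0.95, 3, rng) for _ in range(trials))
print(fired / trials)
```

Passing the random source as a parameter makes the rule reproducible in experiments, while play at the table can use the default generator.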
We used these stronger features to see if we could find a noticeable performance difference between GOM' and SOM'. In 35,607 games, SOM' maintained a winning rate of 0.12 small bets per hand. In 36,299 games, GOM' maintained a winning rate of 0.10 small bets per hand. This is stronger evidence that specific opponent modeling is better. In fact, we believe that specific opponent modeling may not be worth as much against the weaker class of human players (in the 10-20 game) and may lead to a greater disparity in the higher level games.
In the stronger IRC game (20-40), earlier versions of the program without opponent modeling lost, averaging about -0.08 small bets per hand in 2,354 games. This is too small a sample size for conclusive results, but it strongly suggests that the program was a losing player overall in these games. Opponent modeling made a noticeable difference at this level; SOM averaged about +0.05 small bets per hand in 34,799 games. This was probably influenced by good results early on (before the human players had adjusted to the new style of Loki), so it is probably closer to a break-even player. However, this is noticeably better than the earlier version without opponent modeling. We have not tested GOM, GOM' or SOM' at this level.
A third form of competition was introduced strictly against other computer programs on the IRC server, called BotWarz. In BotWarz I, four programs participated, using three copies of each in a 12-player game. Two programs, R00lbot and Loki, were clearly dominant over the other two, Xbot and Replicat, with the more established R00lbot winning overall. In 39,786 hands, Loki averaged about +0.03 small bets per hand. It should be noted, however, that this competition is representative of only one type of game, where all the players are quite conservative. Replicat in particular performed much better in the open games against human opposition than in this closed experiment, in which it lost the most money. Also, despite the closed environment, the variance was still quite high.
There were also noticeable interdependencies between the different players. Late in the competition, Replicat dropped out and it became apparent that Loki may have been taking advantage of this player more than the other two. In a final session of 23,773 hands, Loki lost 0.03 small bets per hand.
After some changes, such as the introduction of the new pre-flop system (but prior to opponent modeling), Loki participated in BotWarz II. Again, there were 3 copies each of 4 different programs; this time the opponents were Prop, R00lbot, and Xbot. Prop only played roughly the first 9,000 hands and Xbot only played roughly the first 18,000 hands. In 10,103 games with all 12 players, Loki averaged a winning rate of 0.03 small bets per hand. Against only Xbot and R00lbot it lost at a rate of approximately 0.02 small bets per hand.
Finally, BotWarz III was played after the introduction of the 10-player limit. This time there were five programs, with two copies of each. The four opponents were USAbot, Xbot, R00lbot and Fishbot (USAbot and Fishbot are most similar in design to Xbot). The results had a much higher variance because, in addition to only having two copies, numerous players kept dropping out and coming back in. This time, Xbot easily won the most money overall while USAbot and Fishbot lost the most. In the first 19,000 hands of the tournament, both GOM and SOM approximately broke even.
For the latter part of the tournament (which was significantly longer), GOM and SOM were replaced by GOM' and SOM' (recall that they used additional expert rules for bluffing and check-raising). GOM' played 64,037 hands, but half of these were with 8 players. One quarter involved 6 players and the remaining quarter involved only 4 players. Over all the games, it broke even, but with all the players involved (3,840 hands) it achieved a winning rate of 0.05 small bets per hand. SOM' won about 0.01 small bets per hand over all the games, and 0.06 over the 3,840 games with all 10 players. The results of these tournaments suggest that a simple program with a decent betting strategy (expert rules) can play better than a program with many other strengths but a weak link in its betting strategy.
In addition to programs by hobbyists playing over IRC, there are numerous commercial programs available. However, we have not tested Loki against them because we have not found any with a programmable interface. Hence, it is not known if they are better or worse.
One final important method of evaluation we have not mentioned is the critique of expert human players. Experts can review the play of the computer and determine if certain decisions are "reasonable" under the circumstances, or are indicative of a serious weakness or misconception. Based on this opinion, it appears to be feasible to write a program that is much stronger than the average human player in a casino game, although Loki has not yet achieved that level.