The First International RoShamBo Programming Competition
---------------------------------------------------------

Traditional research in computer game playing has focused on chess, with
some impressive accomplishments.  In recent years, attention has shifted to
other game domains, offering many new challenges.  A game like poker has
several properties that have not been well studied, and the imperfect
information aspect of the game strikes at some of the fundamental problems
in computer science.

To play poker at the highest level, a player must understand the opponent's
style in order to maximize results.  There is no single best play in a given
situation.  For example, one would call a bet much more often against a
player who bluffs too frequently, but fold a marginal hand against an
opponent who never bluffs at all.  Unlike other games, opponent modeling
cannot simply be ignored in poker, and it will probably play a large role in
producing a world-class computer program.  While other aspects of poker may
eventually be well-solved, there is likely no "right way" to deduce an
opponent's decision process.  This particular problem can be investigated
with an even simpler game -- RoShamBo.

RoShamBo, also known as "Rock-Paper-Scissors", is a simple game known to
almost everyone.  On the surface it seems trivial, but that apparent
simplicity conceals enormous strategic complexity.  From a game-theoretic
point of view, the optimal strategy is simple and well-known -- a player
should just choose their action uniformly at random, one third of the time
for each.  But what happens when two intelligent players are actively trying
to defeat each other?

The First International RoShamBo Programming Competition was held in
September of 1999, attracting more than 50 entrants from 10 countries.  The
participants were required to write a C procedure that could play a
1000-turn match within one second.  The exact protocol was made explicit by
providing them with a program to conduct round-robin tournaments, along with
six simple example players (such as "beat the most frequent pick by the
opponent").  The complete rules of the contest, results, and original
reports can be found on the World Wide Web via:

   http://www.cs.ualberta.ca/~darse/rsbpc.html

Some of the participants in the competition were highly knowledgeable
researchers in Artificial Intelligence game-playing, and most of them were
surprised by the complexity involved in the seemingly simple task of writing
a strong Rock-Paper-Scissors program.  Many different approaches were used,
and the authors of all the top programs went through several versions as
they discovered the richness of this "trivial" problem.

To win a tournament, it is not enough to break even with every opponent, so
the optimal strategy is useless.  The competitors were told that some of the
players would be simple "dummy bots", which could be exploited with varying
degrees of success.  It was necessary to extract as many points as possible
in these matches, since the margin of victory would count in the final
standings.  However, the more important factor turned out to be the results
of the head-to-head matches between the entries themselves.

To win a match, a program must be able to predict an opponent's actions,
without being predictable itself.  To do this, the algorithm must detect
patterns and tendencies in the play of the opponent, and then employ an
appropriate counter-strategy.  RoShamBo (and its even simpler cousin, the
Penny-Matching game) is an example of a pure prediction game.
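For concreteness, here is a minimal sketch of the kind of simple example
player mentioned above ("beat the most frequent pick by the opponent").
The function name and calling convention are invented for this
illustration; the official tournament program used its own interface.

   /* Minimal sketch of a "beat the most frequent pick" example player.
    * Interface simplified for illustration.  0 = rock, 1 = paper,
    * 2 = scissors. */
   #include <stdlib.h>

   int beat_most_frequent(const int *opp_moves, int num_turns)
   {
       int count[3] = {0, 0, 0};
       int best = 0, i;

       if (num_turns == 0)
           return rand() % 3;            /* no information yet: play at random */

       for (i = 0; i < num_turns; i++)   /* tally the opponent's history */
           count[opp_moves[i]]++;

       for (i = 1; i < 3; i++)           /* find their most frequent action */
           if (count[i] > count[best])
               best = i;

       return (best + 1) % 3;            /* play the move that beats it */
   }

Even this simple bot is exploitable, of course: an opponent that deduces
the rule can stay one step ahead of it for the rest of the match.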
The difficulty lies in everything that is associated with opponent modeling,
or trying to outwit an adversary.  There is a lot of theory that can be
brought to bear on the problem, including but not limited to statistics,
advanced game theory (the "best-response dynamic in fictitious play"),
prediction models, information theory, encryption, and even philosophical
meta-theory.

So what is to be gained by playing this silly little kid's game?  Many other
problems deal with some form of context analysis or meta-reasoning (thinking
about thinking about ...).  To quote Jason Hutchens (author of MegaHAL,
which was probably the best pattern detector in the tournament):

   "A good predictive algorithm will be able to play RoShamBo very well,
   without being explicitly programmed to solve that task.  A few
   applications of such algorithms are data compression, grammatical
   inference, speech recognition, data mining, natural language
   understanding, syntactic pattern recognition, speech segmentation,
   machine translation, text generation, spelling correction, author
   identification, email classification, image recognition, stock market
   analysis, finding structure in data, analysis of DNA sequences,
   analysis of music, input methods for disabled users, and playing
   RoShamBo!"

Several other lessons were learned, and more than a few myths were
shattered.  In fact, the number of misconceptions surrounding the game came
as quite a surprise.  Here are two of the more common ones.

Myth: Random (Optimal) can't be beat.

The optimal strategy won't lose a match by a statistically significant
margin, but it also won't *win* a match, regardless of how predictable the
opponent is.  Try winning a chess tournament by drawing every game!
Moreover, the statement isn't even true in a more fundamental sense.
Opportunistic strategies can be theoretically better, having positive
expectation under more realistic assumptions.  People interested in advanced
game theory may enjoy the recent book "The Theory of Learning in Games" by
Fudenberg and Levine (MIT Press, 1998).

In the actual competition, Random (Optimal) finished in the middle of the
pack, as expected.  Although it didn't lose a match, it also had no chance
of winning the tournament.

Myth: Since all non-optimal strategies can, in theory, be exploited, the
result of a tournament will be a crap-shoot.  At the very least, the outcome
will be highly sensitive to the exact composition of players (algorithms) in
the tournament.

The premise is true, but the conclusion is false.  Any non-optimal algorithm
can be beaten, just by employing the same algorithm and "adding one" to the
action.  But complex algorithms are not vulnerable to such an attack.  In
general, they can only be beaten by an opponent who does a superior job of
analysis.

There are many levels of complexity for playing algorithms, which can differ
in the way they use history (context), in their perceptiveness of the
opponent's strategy, and in their defensive ability (hiding their own
strategy).  By and large, the more information a program processes, the
better it will play the game.

In hundreds of preliminary experiments leading up to the official event, the
distinctions between different classes of algorithms were obvious.  New
programs were constantly being added to or deleted from the mix, and the
degree of predictability and exploitability of the dummy bots varied widely
as they were tuned.  Regardless of the exact field of players, the top six
programs almost never changed position.
The second tier of programs, which were closer to each other in strength,
always changed positions amongst themselves, and the bottom four programs
never changed.  The results were remarkably robust, and changing the length
of a match down to 400 turns or up to 10000 turns had a negligible effect.

...

Results
-------

The winner of the First International RoShamBo Programming Competition was
_Dan Egnor_, for his program _"Iocaine Powder"_.  Dan wrote an incredibly
strong Rock-Paper-Scissors program, which simply dominated every aspect of
the competition.  Twenty-five independent tournaments were run for the Open
Competition, and Iocaine Powder won ALL of them, with an average margin of
victory of more than 1000 points.  In the six sets of 25 tournaments
conducted for the "Best of the Best" competition, Iocaine Powder finished
first every time.

In many ways, Dan's program was a generation ahead of its time.  It would
have been a worthy winner of the *second* RoShamBo competition, after all
the lessons from the first event had been learned, and the ideas shared.  By
co-operating with his fellow alumni from Caltech, he greatly improved a
previous version of his program, which was already strong enough to win the
competition!

The program name refers to a scene from the movie "The Princess Bride".  The
Man In Black has added iocaine powder, a deadly poison, to one of two
goblets of wine.  Vizzini the Sicilian must choose a goblet, and then each
will drink their wine.  It is a battle of wits to the death, as Vizzini must
deduce what level of trickery the Man In Black used in choosing the goblet
to poison.  If you don't know the outcome, rent the movie (it is terrific).
Vizzini begins his reasoning with:

   "Now, a clever man would put the poison into his own goblet, because he
   would know that only a great fool would reach for what he was given.  I
   am not a great fool, so I can clearly not choose the wine in front of
   you.  But you must have known I was not a great fool, you would have
   counted on it, so I can clearly not choose the wine in front of me..."

Like The Man In Black, Iocaine Powder was effectively able to use this type
of reasoning to outwit and defeat its adversaries.

There were essentially two competitions, differing in the way a match was
scored.  In the Open Competition every point counted, so it was important to
maximize one's score against the weaker opponents, and minimize the loss to
the stronger opposition.  In the "Best of the Best" tournaments, only the
match result (win, loss, or a draw) was used to determine the order of
finish -- the magnitude of a win did not matter.  A draw is defined by the
mathematical variance for a match, based on randomness (at the 95%
confidence interval this is +/- 10 points over 25 events, which was also the
observed maximum for Random (Optimal)).

In the following summary, "Open" is the finish in the Open Competition,
"BoB" is the rank in the "Best of the Best" series of tournaments, "Leng" is
the approximate length of the program in terms of C statements, and "Nat" is
the nationality of the program author.  A "*" indicates a dummy bot.  The
overall standing was based on the average rank.

Ratings were computed by Michael Gherrity for all programs released by the
authors after the tournament.  All ratings are based on 500 matches, and are
accurate to within +/- 30 points, using the Glicko rating system (now used
for chess ratings by the USCF).
Overall Standings

Rank  Program Name        Rating  Open  BoB  Leng  Nat  Author
  1   Iocaine Powder        2329     1    1   134  USA  Dan Egnor
  2   Phasenbott            2234     2    2    99  USA  Jakob Mandelson
  3   MegaHAL               2036     3    7   149  Aus  Jason Hutchens
  4   RussRocker4           2058     4    8   120  USA  Russ Williams
  5   Biopic                1979     6    6    80  Can  Jonathan Schaeffer
  6   RoShamBot              unr    13    4    91  USA  Perry Friedman
  7   Simple Modeller       2034     7   13   135  UK   Don Beal
  8   Robertot              1913     8   12    53  Ger  Andreas Junghanns
  9   RoShamBot_Off          unr     5   20    90  USA  Perry Friedman
 10   Boom                  1909    10   18   208  Net  Jack van Rijswijck
 11   Shofar                2018    17   11    98  USA  Rudi Cilibrasi
 12   Bugbrain              1948    14   15    51  Can  Sunir Shah
 13   ACT-R Lag2            1991    15   14    20  USA  Bothell, Lebiere, West
 14   Simple Predictor      1871     9   21    63  UK   Don Beal
 15   Majikthise            2063    25    5    62  Can  Markian Hlynka
 16   * De Bruijn           1932    28    3     5
 17   Granite               1781    11   23    97  Can  Aaron Davidson
 18   Vroomfondel           1995    24   10    62  Can  Markian Hlynka
 19   Marble                1800    12   24    95  Can  Aaron Davidson
 20   Context Predictor      unr    21   16    46  USA  Llew Mason
 21   * Pi                  1938    31    9     5
 22   ZQ Bot                1754    16   26    99  Can  Neil Burch
 23   JM's best player       unr    20   25    49  Spa  Jose Miguel
 24   Sweet Rocky           1695    18   30    36  Mex  Lourdes Pena
 25   Piedra                1729    19   31    23  Mex  Lourdes Pena
 26   Inconceivable          unr    33   17    23  USA  Bob Lord
 27   Meta-Meta-Random      1685    32   19     1  UK   Mark Bassett
 28   Mixed Strategy        1700    23   29    53  UK   Thad Frogley
 29   * Anti-rotn           1695    22   32    40
 30   PsychoRock             unr    26   28   308  Can  Gaspard Petit
 31   Benbot                 unr    27   27    56  USA  Ben Glasson
 32   * Foxtrot             1529    36   22     4
 33   QDScanner              unr    29   34   135  Fin  Mika Rasanen
 34   Abbot                  unr    30   33    16  Fin  Mikko Viljakainen
 35   GNADS                  unr    37   36    21  Can  Denis Papp
 36   Ramdu                  unr    35   43    18  USA  Randy Saint
 37   Drew's Bot             unr    34   46    97  USA  Andrew Prock
 38   Multi-strategy        1617    42   38    86  Can  Mark James
 39   Asterious              unr    47   35    16  Gre  Kastellanos Nikos
 40   * Flat                1406    40   44    16
 41   The Analyzer           unr    44   40    33  Fin  Sami Tirkkonen
 42   * Add-drift           1468    38   47     9
 43   Naivete                unr    46   39    86  USA  Brandon Stone
 44   Inocencio             1682    48   37    95  UK   Rafael Morales
 45   * Copy-drift          1428    39   50     9
 46   * Text                1393    41   52     5
 47   BoomRash               unr    52   41    37  USA  Roger Murray
 48   * Add-react           1324    43   51     9
 49   * Switch A Lot        1375    45   49     3
 50   Peterbot              1566    53   42    43  USA  Peter Baylie
 51   Beat Gambits           unr    51   45    13  Can  Don Papp
 52   Knucklehead           1523    49   48    30  Can  Sunir Shah
 53   Bait and Switch        unr    50   55    16  USA  Lee Daniel Crocker
 54   Tit for Tat            unr    54   53     1  Fin  Juha Syrjala
 55   Cheesebot              unr    55   54     4  USA  Joshua Schachter

Country entries:

   United States   17
   Canada          12
   United Kingdom   5
   Finland          4
   Mexico           2
   Australia        1
   Germany          1
   Greece           1
   Netherlands      1
   Spain            1
   -------------------
   Total (10)      45

   U of Alberta: 11

...

Programs and Authors
--------------------

The programs and ideas of selected participants will be discussed in some
detail.  The algorithms used for the dummy bots are also given, including
"De Bruijn", a program based on a fixed string that finished a remarkable
third in the "Best of the Best" competition.

Many of the entrants generously agreed to release the source code for their
programs, including most of the top 20 finishers.  A tournament program with
all of those algorithms is available via the above URL.  This will provide a
good test suite for developing the next generation of programs to compete in
the second competition, scheduled for early this summer.

Iocaine Powder, Dan Egnor

Dan is a former member of an ACM programming competition team for the
California Institute of Technology, and other members of that team played a
role in the development of Iocaine Powder.  His approach was based on an
algorithm mentioned to him by Rudi Cilibrasi (Shofar) some years ago.
Dan gave his "historybot" to other members of the Caltech ACM alumni, and
challenged them to do better.  Through open discussion and friendly
competition, the group improved their ideas and constructed stronger
programs.  Other contributors were Wei-Hwa Huang, who built a large test
suite of programs, and Joshua Schachter.  (Joshua must have been amused by
the surprising incompetence of his little Cheesebot, and he probably
submitted the cellar-dweller for that reason).

During its development phase, Iocaine Powder was overtaken by some other
programs, but Dan Egnor met every challenge.  The version he submitted 10
days before the deadline won every single test tournament for a week,
regardless of which other algorithms participated (including the very strong
MegaHAL program).  But he wasn't done yet!

Dan Egnor has released the source code to Iocaine Powder.  You can have a
look at the Rock-Paper-Scissors World Champion at:

   http://ofb.net/~egnor/iocaine.html

Dan's explanation of how Iocaine Powder works is appended to the end of this
article.

Phasenbott, Jakob Mandelson

Jakob is another Caltech alumnus who based his program on one of the earlier
versions of Iocaine Powder.  His refinements to the basic idea resulted in
significant improvements, and when he submitted Phasenbott, it became the
new favourite, surpassing its mentor by about 800 points in the Open
Competition.  However, its reign was short-lived, as Dan submitted the
newest version of Iocaine Powder the following day, which reclaimed the lead
by a similar margin.

There were many strong players entered in the competition, but these two
were in a class by themselves.  So what is their secret?  One key element is
the generalized approach to out-smarting the opponent.  This is probably the
most critical feature, as neither program is as strong as some other entries
in terms of pattern detection and statistical analysis.  Iocaine Powder and
Phasenbott implicitly deduce the opponent's prediction method, and counter
that, rather than just the actual moves played.  In a sense, they try to
beat whichever player is most predictable -- the opponent, or themselves!

MegaHAL, Jason Hutchens

Jason is a PhD student at the University of Western Australia, in Perth, and
his range of expertise includes many aspects of prediction models.  In 1996,
he was the winner of the Loebner Prize, which is a "Turing Test" contest,
where each program has a (written) conversation with judges, and tries to be
indistinguishable from humans.

He submitted his first RoShamBo program three days after the contest
announcement, using an "off the shelf" predictor, based on a finite context
Markov model.  He later upgraded it to use an infinite context Markov model
maintained as a ternary trie, and named it after his other championship
program, MegaHAL.  This was easily the strongest program received in the
early going, winning dozens of preliminary tournaments in the week prior to
the submission of Iocaine Powder.  It is probably still the best "context
sniffer" in the competition, but it was not as strong as Iocaine Powder at
deducing and staying ahead of the opponent's method of prediction.

Jason Hutchens has released the code to MegaHAL.  You can have a look at the
strongest pattern recognizer in the contest, and read Jason's explanation of
how it works at:

   http://ciips.ee.uwa.edu.au/~hutch/puzzles/roshambo/

Jason has also provided a shell routine, so humans can play against any of
the programs more conveniently.
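To make the context-matching idea concrete, here is a deliberately naive
sketch of a predictor in the same general family (this is not Jason's code,
which uses an infinite context Markov model stored in a ternary trie).  It
finds the longest recent suffix of the opponent's history that has occurred
before, and predicts that the move which followed that earlier occurrence
will be repeated; the length cap and function name are arbitrary choices
made for the illustration.

   /* Naive context matcher: find the longest suffix of the opponent's
    * history (up to MAX_CONTEXT moves) that appeared earlier, and predict
    * the move that followed it.  0 = rock, 1 = paper, 2 = scissors. */
   #include <stdlib.h>

   #define MAX_CONTEXT 20

   int predict_by_context(const int *opp, int n)
   {
       int best_len = 0, prediction = -1;
       int start, len;

       /* Try every earlier position as the end of a possible match. */
       for (start = 0; start < n - 1; start++) {
           len = 0;
           while (len <= start && len < MAX_CONTEXT &&
                  opp[start - len] == opp[n - 1 - len])
               len++;
           if (len > best_len) {
               best_len = len;
               prediction = opp[start + 1];  /* move that followed the match */
           }
       }
       /* With no usable history, fall back to a random guess. */
       return (prediction >= 0) ? prediction : rand() % 3;
   }

The quadratic scan above is only for clarity; to stay within the one-second
time limit, a real entry would maintain an incremental structure such as the
trie mentioned above, and would then play the move that beats the
prediction.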
RussRocker4, Russ Williams

Russ' program is another extremely good pattern recognizer, but it uses a
more straightforward application of integer arrays to store and process
statistics.  It is similar to the approach used by many contestants, but was
taken quite a bit further, with multi-dimensional arrays of up to 3^7 in
size.  This provided excellent contextual analysis, combined with the
program's other abilities.

Unfortunately, one common idea that he failed to incorporate was a
"bail-out strategy", for when the program is getting badly beaten.  In the
Open Competition, RussRocker4 took massive losses to the top two programs
and a large loss to RoShamBot.  Had he employed such a fail-safe, he could
have reduced those losses by more than 1000 points, and moved into third
place ahead of MegaHAL.  Although Russ did think of the idea, he felt it
"wasn't in the spirit of the competition".

Biopic, Jonathan Schaeffer

Dr. Schaeffer is the famous author of Chinook, the checkers program which
became the first computer program to win an official world championship
against humans in a game of skill.  Dr. Schaeffer heads up the computer
game-playing, analytical methods, and empirical studies (GAMES) group at the
University of Alberta, which is a world leader in that area of research.
Jonathan couldn't resist the opportunity to show up his students, and took
on the challenge of writing a RoShamBo program only a few days before the
deadline.  Biopic uses straightforward statistical and historical analysis,
combined with an expected value calculation for each of several plausible
opponent strategies.

RoShamBot, Perry Friedman

Perry graduated from Stanford University, and is now with SportsRocket.
His program, the RoShamBot, has been playing against humans on the internet
for years.  He is one of the "Tiltboys", a notorious group of gamblers well
known to the rec.gambling.poker community.  Among their many
"accomplishments", they deserve credit for popularizing both the name
"RoShamBo", and the game itself.

The RoShamBot uses good frequency stats and history analysis, and does an
accurate evaluation of the resulting distribution -- one of the few programs
to compute expected values correctly.  The program has an interesting
personality, in that it plays quite defensively to avoid giving away
information to the opponent.  It is designed to win a match in the long run,
rather than maximize the net score in the short term.  In fact, it cannot
score more than 500 points out of 1000, even against Always-Rock!

Naturally this greatly reduced its chances in the Open Competition, but it
was the early favourite for any type of "evolutionary" competition, because
it never lost a match.  Perry changed a parameter of his program to maximize
its score against weak opposition, and the resulting RoShamBot_Off performed
much better than the original in the Open Competition.

You can play the RoShamBot on the World Wide Web at SportsRocket:

   http://www.sportsrocket.com/cgi-bin/roshambo/roshambot.cgi

Simple Predictor and Simple Modeller, Don Beal

Dr. Beal is well known to the Artificial Intelligence game-playing community
for his work in computer chess.  His first submission was a remarkable
achievement in the "most for the least" category: he sent it only a few
hours after learning of the competition, and it is only 35 lines of code!
It does a standard pairwise analysis of the history of both players, but he
appears to get more mileage out of that approach.
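One plausible reading of a "pairwise analysis of the history of both
players" is sketched below (this is an illustration of the general idea
under assumed conventions, not Dr. Beal's code, and it omits the history
decay discussed next): for every past pair of simultaneous moves, tally what
the opponent played on the following turn, then predict the most frequent
follow-up to the pair that was just played.

   /* Pairwise follow-up counts: follow[my move][opp move][opp next move].
    * 0 = rock, 1 = paper, 2 = scissors. */
   #include <stdlib.h>

   int predict_pairwise(const int *mine, const int *opp, int n)
   {
       int follow[3][3][3] = {{{0}}};
       int i, best;

       if (n < 2)
           return rand() % 3;

       for (i = 0; i + 1 < n; i++)          /* build the follow-up counts */
           follow[mine[i]][opp[i]][opp[i + 1]]++;

       best = 0;                            /* most frequent follow-up to   */
       for (i = 1; i < 3; i++)              /* the pair from the last turn  */
           if (follow[mine[n - 1]][opp[n - 1]][i] >
               follow[mine[n - 1]][opp[n - 1]][best])
               best = i;

       return best;                         /* predicted next opponent move */
   }

As always, the prediction is then countered by playing (prediction + 1) % 3.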
Simple Predictor was a reliable benchmark during testing, as it only lost to
the strongest programs, and was always the "best of the rest".  It was also
one of the few programs with a history decay function to favour more recent
actions, which showed up in its results against the Add-drift and Copy-drift
dummy bots.

Later, he submitted the Simple Modeller, which takes into account the
opponent's attempt to predict it, again in his characteristically succinct
style.  His experience was demonstrated again after the competition, when
his General Modeller program finished almost exactly even with Iocaine
Powder, based purely on statistical methods.

Robertot, Andreas Junghanns

Andreas recently finished his PhD at the U of A (under the supervision of
Dr. Schaeffer), and he is a leading authority on single-agent search.  He
was the first person to submit an entry (named after his son Robert, who was
4 days old at the time), and it remained one of the best.

Robertot uses a voting scheme, where different lines of analysis make a
prediction, and give it a weight based on the statistical significance of
the observation.  If a particular pattern or tendency is highly aberrant,
the associated weight for that line of reasoning can give what amounts to
"veto power" for that prediction.  If none of the analysis leads to a result
that is more extreme than a random noise level, the program chooses randomly
(which often enables it to draw against very strong opposition).  The
implementation of the idea was crude, but it was still highly effective.

Boom, Jack van Rijswijck

Jack is a PhD student at the U of A.  He is the author of a Hex program and
an Awari program that are both ranked among the best in the world.  His
RoShamBo idea was somewhat similar to Robertot, but was much more rigorous
in design.  In fact, Jack spent a week doing the math before writing a
single line of code!

All of Boom's actions are based on a decision tree, using the computed
statistical significance for each observation.  It also does a thorough
analysis of bail-out strategies, recent success in the match, and uses "gear
shifting" to adapt to the opponent.  Boom played significantly better in
post-event testing, after changing a single parameter setting.

ACT-R Lag2, Dan Bothell, Christian Lebiere, and Robert West

These three researchers are associated with the Department of Psychology at
Carnegie Mellon University.  The ACT group studies the theory and
architecture of cognition.  ACT-R principles have been demonstrated in a
number of applications, one of which was Rock-Paper-Scissors.

This was another entry in the "most for the least" category, since it is
only 20 C-statements in length.  Despite its brevity, ACT-R does pairwise
context analysis of the opponent's history, uses a negative exponential
history decay function, and a normally distributed random noise function!
You can play the program on the World Wide Web at:

   http://act.psy.cmu.edu/ACT/ftp/models/RPS/index.html

"Dummy bots", Darse Billings (tournament organizer)

As explained in the contest announcement, some of the programs competing in
the tournament were intentionally weak.  This ensures that entrants try to
win matches and maximize their winning margin, since it is not enough to
simply avoid losing (as with the Random (Optimal) strategy).

Most of the dummy bots used a strategy that is oblivious to the opponent,
and did not try to win the match.  In general, they were more exploitable by
stronger algorithms, but could achieve a draw or even win against weaker
opposition.
The results against the dummy bots show that there is still room (and will
always be room) for better pattern detection in RoShamBo players.  The dummy
bots were designed to be fairly elusive, without being so hard to trace as
to be effectively random.  In some cases, they were tuned by introducing
some amount of random noise to lessen the obviousness of the pattern, so
that none of the dummy programs lost by massive scores.  The objective was
to have a wide range of exploitation among the entries, in order to identify
the strengths and weaknesses of each program.  However, if the entire set of
dummy bots is removed, the results of the competition are not significantly
different.

The Random (Optimal) program competed as Meta-Meta-Random.  Mark Bassett was
allowed to use this entry since he submitted it before it was ruled
ineligible.  He was certain it would finish in first place, but it only
broke even, drawing every match, as expected.

The "Pi bot" was based on the digits of pi.  Each digit was returned modulo
3, unless the digit was 0, in which case it was skipped.  In theory it was
possible to guess this algorithm, and thereby score close to 100% against
it.  Of course, it would be both uninteresting and futile to try to guess
all such silly algorithms, so no one was expected to detect the pattern.  Pi
performed better than Random (Optimal), suggesting that the digits of pi may
be more regular than a truly random sequence, in a manner similar to the De
Bruijn sequence.  This is interesting, because the digits of pi have passed
all known tests of pseudo-randomness (although those tests are not normally
performed on the earlier digits of pi).

Another of the several pre-determined sequences was called De Bruijn.  A De
Bruijn sequence is a cyclic string which contains all possible substrings of
a given length exactly once.  For example, "rrpspprss" is a ternary string
of length nine containing all substrings of length two.  The intention was
to use this as a sequence in a simple repeating cycle, since it would be
"statistically well-balanced".  Somewhat surprisingly, this type of string
finished second in one test tournament, and had better results against the
*stronger* programs!

A little bit of thought reveals why this fixed sequence is successful.  Any
string will appear to contain patterns, even if it is random.  The De Bruijn
sequence naturally gives the appearance of certain tendencies, but then
automatically switches away from repeating those same patterns.  If the
opponent is over-zealous in its exploitation of any perceived weakness, it
will be continuously deceived and punished.  The stronger the opponent is at
detecting subtle patterns (that don't actually exist), the more susceptible
it is to the decoy strategy that the De Bruijn string implicitly employs!
Other programs attempted to use a "bait and switch" idea, but without much
success, finishing near the bottom of the crosstable.

A simple repetition of the same De Bruijn string was vulnerable against the
stronger programs, for two reasons.  First, many entrants had a large
context length, and were simply able to see the repetition.  Second, the
sequence has a detectable deviance from random, because many of the longer
substrings never occur.  For example, a De Bruijn string of length nine
never plays the same move three times in a row (in fact, only one-third of
the length three substrings are ever used).  This type of weakness is
exploitable by even the most basic predictors, at least partially.
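As an illustration, the simple repeating-cycle version of such a dummy bot
can be sketched in a few lines, using the length-nine string quoted above
(the function name and interface are invented; the official bot used much
longer sequences, as described below).

   /* Fixed-sequence dummy bot: cycle through the De Bruijn string
    * "rrpspprss", in which every length-two substring appears exactly
    * once per cycle.  0 = rock, 1 = paper, 2 = scissors. */
   static const int debruijn9[9] = {0, 0, 1, 2, 1, 1, 0, 2, 2};  /* r r p s p p r s s */

   int debruijn_player(int turn)    /* turn = number of moves already played */
   {
       return debruijn9[turn % 9];
   }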
The same flaws exist for a cyclic string of length 27, but they are
considerably harder to find, in a statistical sense.  For the official
competition, the De Bruijn bot used a concatenation of 13 different
sequences of length 81 (containing all substrings of length four).  In a
triumph of ignorance over intelligence, De Bruijn finished third in the
"Best of the Best" series, losing only to Iocaine Powder and Phasenbott.
Its performance improved significantly in the later rounds, because it had
only drawn (or lost) against the weakest entrants!

Is the De Bruijn sequence a "magic bullet" for RoShamBo?  Probably not.
Some of its success may be due to a flaw in the way most programs handle
statistics.  If they only look for over-represented patterns, rather than
the equally important under-represented occurrences, then De Bruijn will be
a very good counter-strategy to any greedy exploitation algorithm.

Another pre-set sequence of actions was the Text bot, suggested by Perry
Friedman.  It is simply a block of English text whose ascii characters have
been converted to RoShamBo actions by taking their value modulo 3.  Since
natural language contains many recurring patterns and non-random
frequencies, about half of the contestants were able to beat this string by
a score in the range of 100-200.

Of the six dumb algorithms included in the original sample program, the one
that seemed to give decent predictors the most trouble was "Always
Switchin'".  This is a bit of a surprise, since it takes very little
analysis to discover that the opponent never repeats a move, and the simple
counter-strategy would net +500 points per 1000.  Even some of the strongest
programs did no better than +350.  This may be another indication that they
are looking for patterns directly, rather than for the *absence* of patterns
that should be present.

Switch-A-Lot had a 12% chance of repeating the previous move.  This made it
very susceptible to the pairwise frequency based predictors, including many
programs that did not finish very high in the standings.  The best exploiter
was the simple Anti-rotn bot, which scored close to the maximum.  In
contrast, five of the strongest head-to-head players, including Iocaine
Powder and Phasenbott, scored less than +100 points on average.

The Foxtrot sequence has a pattern that is easy to generate, but not easy to
detect.  It had the form:

   [ rand  prev+2  rand  prev+1  rand  prev+0 ]  repeating

where rand is a random move, and prev is the previous move.  As a result,
all of the odd numbered moves are random, but the even moves are completely
deterministic (albeit not in an obvious way).  It turned out to be more
susceptible to the direct-history algorithms like Iocaine Powder and Biopic
than to the strong pattern detectors like MegaHAL.  It lost to most programs
by only a small margin.

The Flat bot generated a distribution of moves that was more uniform than
expected for a random sequence.  This is a common weakness in human
generated sequences.  For example, if you ask someone to write down a
"random" series of 20 heads or tails, they will normally avoid having many
more of one than the other.  If you then ask them to write down the outcomes
of 20 actual coin tosses, you can usually identify the truly random sequence
just by looking for the one with the longest run of heads or tails.  The
Flat bot had an 80% chance of choosing the least frequent action in its
history, and only a 10% chance of selecting the most frequent.
This distortion causes other statistical anomalies that good predictive
algorithms can exploit, so most programs did reasonably well against the
Flat bot.  However, no one approached the maximum exploitability of -420
points, which was measured by playing it against the Anti-Flat algorithm.

The Copy-drift program biased its decision toward the opponent's previous
move, but also "changed gears" over time.  If a player always went rock, it
would mimic that move two-thirds of the time to start, but shift to favour
scissors, then paper, continuing to rotate its preference every 111 turns.
The intention was to have a strategy that was fairly easy to deduce, but
that gradually changed over time, in a predictable way.

The Add-drift player was similar, but based its decision on the sum of both
players' previous moves.  As with Copy-drift, it played a random move with
50% probability.  This pattern was less detectable to the majority of
programs, since they did not consider trends based on the moves of both
players.  Don Beal's Simple Predictor and Simple Modeller, which use a
history decay function to favour recent observations over older ones, had a
definite advantage against both of the "drift bots".

The Add-react dummy bot had a different mechanism for changing gears,
depending on its recent results in the match.  This behavior gave a more
chaotic image, and the noise level had to be reduced considerably before any
of the programs could defeat it by a significant margin.

The Anti-rotn bot was the only dummy bot that actually analyzed the
opponent's moves and tried to win.  It had a minimal use of history,
counting only the "rotations" in the opponent's move sequence.  If the
opponent repeated an action, went up by one, or went up by two, it was
counted as a 0, +1, or -1 rotation, respectively.  This means that it treats
the actual choice of rock, paper, or scissors as irrelevant, and considers
only the difference between each pair of moves.

The "Anti-rotten-bot" looked for both maximum and minimum occurrences in the
distribution, which many of the entries failed to do.  For example, if a
prediction algorithm believes the probability distribution for the
opponent's next action is 20-39-41 (20% rock, 39% paper, 41% scissors),
several programs counter only the maximum likelihood, and would play rock to
beat the expected scissors.  However, there is a much stronger opponent
tendency toward "not rock", and playing scissors would have a much higher
expectation.  In general, the expected value of each action "x" is simply:

   EV[x] = Pr[x+2] - Pr[x+1]

where "+" is understood to be modulo 3.  So in the 20-39-41 example,
EV[rock] = +2%, EV[paper] = -21%, and EV[scissors] = +19%.

The other feature built into the Anti-rotn bot was a simple fail-safe, set
at 4% of the total match length.  By playing randomly (optimal) if it fell
behind by 40 points in the match, it saved hundreds of points against
stronger opposition.  However, this approach does not prevent a match loss.
Even though a -40 result is less than the noise level for a single match
(+/- 50.6), many independent tournaments can be run to reduce the variance
to an arbitrarily small amount, making the average negative result decisive.

Some programs, such as Biopic, Robertot, and Boom, only deviate from random
when there appears to be a statistically significant weakness to exploit.
This enables them to draw against very strong opposition, who do not exhibit
such patterns.
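The arithmetic in the 20-39-41 example above is easy to verify with a few
throwaway lines of C (not part of any entry):

   /* Check the rule EV[x] = Pr[x+2] - Pr[x+1] (indices modulo 3) on the
    * 20-39-41 example.  0 = rock, 1 = paper, 2 = scissors. */
   #include <stdio.h>

   int main(void)
   {
       double pr[3] = {0.20, 0.39, 0.41};   /* predicted opponent distribution */
       double ev[3];
       int x, best = 0;

       for (x = 0; x < 3; x++) {
           /* action x beats (x+2)%3 and loses to (x+1)%3 */
           ev[x] = pr[(x + 2) % 3] - pr[(x + 1) % 3];
           if (ev[x] > ev[best])
               best = x;
       }
       printf("EV[rock]=%+.2f EV[paper]=%+.2f EV[scissors]=%+.2f -> play %d\n",
              ev[0], ev[1], ev[2], best);
       /* prints: EV[rock]=+0.02 EV[paper]=-0.21 EV[scissors]=+0.19 -> play 2 */
       return 0;
   }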
Interestingly, Iocaine Powder and Phasenbott were still able to defeat these
programs by a small margin, presumably by behaving predictably enough to
induce a pattern in their opponent's replies, which they could then exploit
for a net profit!

One final note about generality.  Several programs were based on the idea of
using a number of different strategies, and choosing the one that would have
been most successful in the past.  One of the smoothest systems for
adaptation was that of Mixed Strategy, by Thad Frogley.  The weakness of
this approach is that it is only as good as the best sub-strategy it
considers.  Combining different specific techniques is doomed to yield only
an incremental improvement in performance.  Overall, these programs did not
do very well.

In contrast, Iocaine Powder uses a much more general (and elegant) method
for analyzing different strategies, based on six levels of meta-reasoning,
each applied to its own strongest prediction method.  MegaHAL also takes a
generalized approach to the task of recognizing patterns.  The lesson is
clear: general techniques are much stronger than specific algorithms for
RoShamBo.

Conclusion
----------

The programming competition was more successful than we had hoped, thanks to
the participation of so many interested people.  Everyone had fun, and
learned something in the process.

In particular, it was a valuable exercise for studying the problem of
opponent modeling, in isolation from other complicating factors.  The most
successful ideas are more easily understood as a result, and can be readily
applied to more complex domains, like poker.

It will be interesting to see what the next generation of RoShamBo programs
is capable of, given the full disclosure of all the main contenders from the
first contest.  The _Second International RoShamBo Programming Competition_
is tentatively scheduled for May, 2000.  Please check the aforementioned
website for recent updates and details.

...

Iocaine Powder, as explained by Dan Egnor

   They were both poisoned.

Iocaine Powder is a heuristically designed compilation of strategies and
meta-strategies, entered in Darse Billings' excellent First International
RoShamBo Programming Competition.  You may use its source code freely; I ask
only that you give me credit for any derived works.  I attempt here to
explain how it works.

Meta-Strategy

RoShamBo strategies attempt to predict what the opponent will do.  Given a
successful prediction, it is easy to defeat the opponent (if you know they
will play rock, you play paper).  However, straightforward prediction will
often fail; the opponent may not be vulnerable to prediction, or worse, they
might have anticipated your predictive logic and played accordingly.

Iocaine Powder's meta-strategy expands any predictive algorithm P into six
possible strategies:

P.0: naive application
   Assume the opponent is vulnerable to prediction by P; predict their next
   move, and play to beat it.  If P predicts your opponent will play rock,
   play paper to cover rock.  This is the obvious application of P.

P.1: defeat second-guessing
   Assume the opponent thinks you will use P.0.  If P predicts rock, P.0
   would play paper to cover rock, but the opponent could anticipate this
   move and play scissors to cut paper.  Instead, you play rock to dull
   scissors.

P.2: defeat triple-guessing
   Assume the opponent thinks you will use P.1.
   Your opponent thinks you will play rock to dull the scissors they would
   have played to cut the paper you would have played to cover the rock P
   would have predicted, so they will play paper to cover your rock.  But
   you one-up them, playing scissors to cut their paper.

At this point, you should be getting weary of the endless chain.  "We could
second-guess each other forever," you say.  But no; because of the nature of
RoShamBo, P.3 recommends you play paper -- just like P.0!  So we're only
left with these three strategies, each of which will suggest a different
alternative.  (This may not seem useful to you, but have patience.)

P'.0: second-guess the opponent
   This strategy assumes the opponent uses P themselves against you.  Modify
   P (if necessary) to exchange the position of you and your opponent.  If
   P' predicts that you will play rock, you would expect your opponent to
   play paper, but instead you play scissors.

P'.1, P'.2: variations on a theme
   As with P.1 and P.2, these represent "rotations" of the basic idea,
   designed to counteract your opponent's second-guessing.

So, for even a single predictive algorithm P, we now have six possible
strategies.  One of them may be correct -- but that's little more useful
than saying "one of rock, scissors, or paper will be the correct next move".
We need a meta-strategy to decide between these six strategies.

Iocaine Powder's basic meta-strategy is simple: Use past performance to
judge future results.  The basic assumption made by this meta-strategy is
that an opponent will not quickly vary their strategy.  Either they will
play some predictive algorithm P, or they will play to defeat it, or use
some level of second-guessing; but whatever they do, they will do it
consistently.

To take advantage of this (assumed) predictability, at every round Iocaine
Powder measures how well each of the strategies under consideration (six for
every predictive algorithm!) would have done, if it had played them.  It
assigns each one a score, using the standard scoring scheme used by the
tournament: +1 point if the strategy would have won the hand, -1 if it would
have lost, 0 if it would have drawn.  Then, to actually choose a move,
Iocaine Powder simply picks the strategy variant with the highest score to
date.

The end result is that, for any given predictive technique, we will beat any
contestant that would be beaten by the technique, any contestant using the
technique directly, and any contestant designed to defeat the technique
directly.

Strategies

All the meta-strategy in the world isn't useful without some predictive
algorithms.  Iocaine Powder uses three predictors:

Random guess
   This "predictor" simply chooses a move at random.  I include this
   algorithm as a hedge; if someone is actually able to predict and defeat
   Iocaine Powder with any regularity, before long the random predictor will
   be ranked with the highest score (since nobody can defeat random!).  At
   that point, the meta-strategy will ensure that the program "cuts its
   losses" and starts playing randomly to avoid a devastating loss.  (Thanks
   to Jesse Rosenstock for discovering the necessity of such a hedge.)

Frequency analysis
   The frequency analyzer looks at the opponent's history, finds the move
   they have made most frequently in the past, and predicts that they will
   choose it.  While this scores a resounding defeat against "Good Ole
   Rock", it isn't very useful against more sophisticated opponents (which
   are usually quite symmetrical).  I include it mostly to defeat other
   competitors which use it as a predictive algorithm.
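The bookkeeping behind the meta-strategy described above can be sketched in
a few lines of C.  This is only an illustration of the +1/-1/0 scoring idea,
with invented names, and not Iocaine Powder's actual source (which is linked
earlier in the article).

   /* Running scores for the candidate strategies (e.g. P.0..P.2 and
    * P'.0..P'.2 for one predictor).  0 = rock, 1 = paper, 2 = scissors. */
   #define NUM_STRATEGIES 6

   static int score[NUM_STRATEGIES];

   /* +1 if move beats opp_move, -1 if it loses, 0 on a draw. */
   static int outcome(int move, int opp_move)
   {
       if (move == opp_move)
           return 0;
       return ((move - opp_move + 3) % 3 == 1) ? 1 : -1;
   }

   /* Call once per turn: candidate[i] is the move strategy i recommended
    * for the turn just played, opp_move is what the opponent actually did.
    * Returns the index of the strategy to trust for the next move. */
   int update_and_choose(const int candidate[NUM_STRATEGIES], int opp_move)
   {
       int i, best = 0;

       for (i = 0; i < NUM_STRATEGIES; i++)
           score[i] += outcome(candidate[i], opp_move);
       for (i = 1; i < NUM_STRATEGIES; i++)
           if (score[i] > score[best])
               best = i;
       return best;
   }

Iocaine additionally compares these scores over several different horizons,
as Dan describes under "Details" below.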
History matching
   This is easily the strongest predictor in Iocaine Powder's arsenal, and
   variants of this technique are widely used in other strong entries.  The
   version in Iocaine Powder looks for a sequence in the past matching the
   last few moves.  For example, if in the last three moves, we played paper
   against rock, scissors against scissors, and scissors against rock, the
   predictor will look for times in the past when the same three moves
   occurred.  (In fact, it looks for the longest match to recent history; a
   repeat of the last 30 moves is considered better than just the last 3
   moves.)  Once such a repeat is located, the history matcher examines the
   move our opponent made afterwards, and assumes they will make it again.
   (Thanks to Rudi Cilibrasi for introducing me to this algorithm, long
   ago.)

   Once history is established, this predictor easily defeats many weak
   contestants.  Perhaps more importantly, the application of meta-strategy
   allows Iocaine to outguess other strong opponents.

Details

If you look at Iocaine Powder's source code, you'll discover additional
features which I haven't treated in this simplified explanation.  For
example, the strategy arsenal includes several variations of frequency
analysis and history matching, each of which looks at a different amount of
history; in some cases, prediction using the last 5 moves is superior to
prediction using the last 500.  We run each algorithm with history sizes of
1000, 100, 10, 5, 2, and 1, and use the general meta-strategy to figure out
which one does best.

In fact, Iocaine even varies the horizon of its meta-strategy analyzer!
Strategies are compared over the last 1000, 100, 10, 5, 2, and 1 moves, and
a meta-meta-strategy decides which meta-strategy to use (based on which
picker performed best over the total history of the game).  This was
designed to defeat contestants which switch strategy, and to outfox variants
of the simpler, older version of Iocaine Powder.

Summary

One must remember, when participating in a contest of this type, that we are
not attempting to model natural phenomena or predict user actions; instead,
these programs are competing against hostile opponents who are trying very
hard not to be predictable.  Iocaine Powder doesn't use advanced statistical
techniques or Markov models (though it could perhaps be improved if it did),
but it is very devious.

It is, however, by no means the last word.  Iocaine may have defeated all
comers in the first tournament, but I have no doubt that my opponents will
rise to the challenge in upcoming events.

...

Tournament Crosstables
----------------------

Each of the following abridged crosstables is a compilation of 25
independent round-robin tournaments, averaged and presented as a single
event.  Each match was 1000 turns in length.  Complete crosstables are
available at the RoShamBo Programming Competition website.
Open Event:

    Player Name          total      1     2     3     4     5     6     7     8     9    10

  1 Iocaine Powder       13038      0   114    53   523   318    18    36    34   179     4
  2 Phasenbott           11995   -114     0    54   535   304    12    37    58   130    19
  3 MegaHAL              10703    -53   -54     0    86   205    -6    -5    23    63    15
  4 RussRocker4           9759   -523  -535   -86     0    68    -5     7    35    89    22
  5 RoShamBot_Off         8566   -318  -304  -205   -68     0   -46  -131    -7   -32    -6
  6 Biopic                8335    -18   -12     6     5    46     0    17    27    43    16
  7 Simple Modeller       7984    -36   -37     5    -7   131   -17     0     6   270    12
  8 Robertot              7335    -34   -58   -23   -35     7   -27    -6     0    27    -4
  9 Simple Predictor      6475   -179  -130   -63   -89    32   -43  -270   -27     0   -10
 10 Boom                  6442     -4   -19   -15   -22     6   -16   -12     4    10     0
 11 Granite               6070   -221  -230  -236  -144  -106  -114   -76   -53   -23   -50
 12 Marble                5993   -235  -227  -238  -146  -107  -112   -93   -50   -21   -45
 13 RoShamBot             5501    -72  -112    50   253     0     0    -8    44   109    20
 14 Bugbrain              5149    -16   -30    -7   -15    95   -13   -39     4   -22    15
 15 ACT-R Lag2            5077    -66   -60     4    -5    52   -35     0    10    53    20
 16 ZQ Bot                4941   -291  -267  -254  -164  -157   -90  -288   -89  -169   -56
 17 Shofar                4791     -8     0     8    -4    33   -10     3    -5    18    20
 18 Sweet Rocky           4589    -37   -39   -36   -45   -41   -38   -41   -38   -41   -40
 19 Piedra                4580    -39   -41   -33   -40   -30   -38   -35   -44   -44   -37
 20 JM's best player      4189   -214  -184  -190  -145   -85   -92   -56   -71     4   -22
 21 Context Predictor     3916   -337   -43   -35   -88   -38  -326    -8   -86    48    -2
 22 * Anti-rotn           3781    -60   -56   -58   -56   -64   -37   -56   -55   -43   -49
 23 Mixed Strategy        3069    -51   -65   -59   -55   -24   -47   -83   -46   -56   -25
 24 Vroomfondel           2539    -20    -9    20     5    19    -8     5    11    16    13
 25 Majikthise            2499    -13   -12     8     5    22     0    -1    -3    11    21
 26 PsychoRock            2172   -285  -340  -315  -277  -164   -93  -143   -49  -163   -36
 27 Benbot                1917   -227  -227  -188  -290  -237  -145  -123  -109   -50   -84
 28 * De Bruijn           1213    -23   -38    31    29    -6     7    38    14    72     4
 29 QDScanner             1122   -117  -126  -117  -129   -73  -122  -120  -102  -116  -116
 30 Abbot                  964   -110  -123  -102  -149  -108  -106   -34  -152   -44  -137
 31 * Pi                   282     -2     8    13    18    24    -9     7    16     0     4
 32 Meta-Meta-Random         5      4     4     0    -9    -1     7     2    -2     5    -1
 33 Inconceivable            4     -5    10     7    -3    -1    -7    -2     0    -5    -6
 34 Drew's Bot           -1009    -57   -50   -68   -64   -71   -51   -60   -48   -79   -26
 35 Ramdu                -1372   -309  -281  -250  -306  -279  -145  -239  -196  -278  -129
 36 * Foxtrot            -1710   -350  -380   -60  -179   -72  -375     8    10    13     1
 37 GNADS                -1872   -598  -570  -517  -706  -417     2   -81  -232   -69   -85
 38 * Add-drift          -1933   -145  -113  -195  -120  -100     8  -232   -19  -285   -24
 39 * Copy-drift         -2762   -180   -82   -72  -130   -78   -22  -245   -72  -259   -33
 40 * Flat               -3293   -203   -88  -135  -104  -169  -108   -98   -69   -81   -30
 41 * Text               -3853   -104   -87  -148   -70  -156  -120  -133  -174  -159  -109
 42 Multi-strategy       -3936   -250  -223  -218  -275  -354  -235  -198  -255  -203  -219
 43 * Add-react          -3993   -303  -320  -284  -341  -366   -17  -108   -73  -117   -93
 44 The Analyzer         -4255   -412  -326  -275  -407  -494  -191  -297  -518  -303  -334
 45 * Switch A Lot       -4513    -52   -82  -234  -131  -166   -91  -224   -96  -215  -137
 46 Naivete              -4565   -486  -556  -449  -533  -274  -568  -121  -459  -120  -478
 47 Asterious            -4945   -797  -813  -898  -760  -859   -23   -82  -121  -133   -98
 48 Inocencio            -5165   -232  -139  -231  -224  -213  -145  -451   -34  -437   -95
 49 Knucklehead          -7698   -563  -390  -382  -478  -361  -489  -249  -414  -269  -430
 50 Bait and Switch      -8155   -191  -200  -249  -257  -271  -223  -220  -189  -259  -131
 51 Beat Gambits        -11664   -644  -645  -612  -669  -541  -653  -491  -575  -466  -540
 52 BoomRash            -11856   -600  -518  -549  -565  -608  -615  -487  -429  -520  -550
 53 Peterbot            -12067   -902  -926  -897  -932  -862  -739  -522  -745  -581  -431
 54 Tit for Tat         -28374   -988  -985  -991  -994  -997  -995  -992  -952  -989  -995
 55 Cheesebot           -36006   -947  -980  -987  -989  -978  -980  -983  -972  -986  -955


Open Event Match results: (maximum = 108)

    Player Name          W   L   D  total
  1 Iocaine Powder      49   0   5   103
  2 Phasenbott          48   1   5   101
  3 Biopic              40   2  12    92
  4 RoShamBot           42   4   8    92
  5 MegaHAL             40   6   8    88
  6 RussRocker4         40   6   8    88
  7 Simple Modeller     37   4  13    87
  8 Shofar              34   1  19    87
  9 ACT-R Lag2          36   7  11    83
 10 Vroomfondel         31   2  21    83
 11 Bugbrain            34   8  12    80
 12 Robertot            35   9  10    80
 13 Majikthise          27   4  23    77
 14 * De Bruijn         27   4  23    77
 15 Boom                33  11  10    76
 16 RoShamBot_Off       35  13   6    76
 17 Context Predictor   31  12  11    73
 18 Simple Predictor    34  15   5    73
 19 * Pi                22   5  27    71
 20 Granite             32  18   4    68
 21 Marble              31  18   5    67
 22 ZQ Bot              31  21   2    64
 23 JM's best player    30  21   3    63
 24 Meta-Meta-Random     0   0  54    54
 25 Inconceivable        0   1  53    53
 26 Benbot              23  26   5    51
 27 PsychoRock          22  26   6    50
 28 Mixed Strategy      21  25   8    50
 29 Piedra              23  27   4    50
 30 Sweet Rocky         22  26   6    50
 31 * Foxtrot           10  16  28    48
 32 * Anti-rotn         18  27   9    45
 33 Abbot               18  28   8    44
 34 QDScanner           17  29   8    42
 35 Asterious           15  28  11    41
 36 GNADS               12  27  15    39
 37 Inocencio           17  35   2    36
 38 Multi-strategy      14  35   5    33
 39 Naivete             13  34   7    33
 40 The Analyzer        14  36   4    32
 41 BoomRash            14  36   4    32
 42 Peterbot            15  37   2    32
 43 Ramdu               14  37   3    31
 44 * Flat               6  32  16    28
 45 Beat Gambits        12  38   4    28
 46 Drew's Bot           8  35  11    27
 47 * Add-drift          1  28  25    27
 48 Knucklehead          8  36  10    26
 49 * Switch A Lot       6  35  13    25
 50 * Copy-drift         3  35  16    22
 51 * Add-react          4  36  14    22
 52 * Text               4  37  13    21
 53 Tit for Tat          3  39  12    18
 54 Cheesebot            6  42   6    18
 55 Bait and Switch      2  43   9    13


Round 2 Match results: (maximum = 62)

    Player Name          W   L   D  total
  1 Iocaine Powder      27   0   4    58
  2 Phasenbott          28   1   2    58
  3 RoShamBot           20   3   8    48
  4 Biopic              18   2  11    47
  5 Majikthise          17   2  12    46
  6 * De Bruijn         16   3  12    44
  7 Shofar              14   1  16    44
  8 RussRocker4         17   5   9    43
  9 Vroomfondel         15   3  13    43
 10 Simple Modeller     15   5  11    41
 11 MegaHAL             17   7   7    41
 12 ACT-R Lag2          15   7   9    39
 13 Bugbrain            12   7  12    36
 14 * Pi                10   5  16    36
 15 Robertot            12   8  11    35
 16 Context Predictor   14  11   6    34
 17 Inconceivable        0   0  31    31
 18 Boom                11  12   8    30
 19 Meta-Meta-Random     0   2  29    29
 20 RoShamBot_Off       12  14   5    29
 21 Simple Predictor    11  16   4    26
 22 * Foxtrot            6  13  12    24
 23 Granite              9  19   3    21
 24 Marble               8  19   4    20
 25 JM's best player     8  21   2    18
 26 ZQ Bot               7  20   4    18
 27 Benbot               5  23   3    13
 28 PsychoRock           4  24   3    11
 29 Mixed Strategy       2  23   6    10
 30 Sweet Rocky          2  25   4     8
 31 Piedra               2  27   2     6
 32 * Anti-rotn          1  27   3     5


Round 3 Match results: (maximum = 30)

    Player Name          W   L   D  total   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  1 Iocaine Powder      13   0   2    28    x  2  2  2  2  2  2  2  1  2  1  2  2  2  2  2
  2 Phasenbott          13   2   0    26    0  x  2  2  2  2  2  2  0  2  2  2  2  2  2  2
  3 RoShamBot            7   3   5    19    0  0  x  0  1  1  1  2  2  2  1  2  1  2  2  2
  4 * De Bruijn          7   3   5    19    0  0  2  x  1  1  2  2  1  2  2  1  2  2  1  0
  5 Biopic               5   2   8    18    0  0  1  1  x  1  1  2  1  1  1  2  2  2  1  2
  6 Majikthise           2   2  11    15    0  0  1  1  1  x  1  1  2  1  1  1  1  1  1  2
  7 Vroomfondel          2   3  10    14    0  0  1  0  1  1  x  1  2  1  1  1  2  1  1  1
  8 MegaHAL              4   5   6    14    0  0  0  0  0  1  1  x  1  2  1  2  1  1  2  2
  9 * Pi                 3   4   8    14    1  2  0  1  1  0  0  1  x  1  1  1  2  2  1  0
 10 RussRocker4          4   5   6    14    0  0  0  0  1  1  1  0  1  x  1  2  1  2  2  2
 11 Shofar               0   2  13    13    1  0  1  0  1  1  1  1  1  1  x  1  1  1  1  1
 12 Robertot             1   6   8    10    0  0  0  1  0  1  1  0  1  0  1  x  1  1  1  2
 13 Simple Modeller      1   6   8    10    0  0  1  0  0  1  0  1  0  1  1  1  x  1  2  1
 14 ACT-R Lag2           1   7   7     9    0  0  0  0  0  1  1  1  0  0  1  1  1  x  1  2
 15 Bugbrain             0   6   9     9    0  0  0  1  1  1  1  0  1  0  1  1  0  1  x  1
 16 Context Predictor    2   9   4     8    0  0  0  2  0  0  1  0  2  0  1  0  1  0  1  x

   * tie-break by number of match wins: MegaHAL and RussRocker4 advance;
     Vroomfondel and Pi do not


Round 4 Match results: (maximum = 14)

    Player Name          W   L   D  total   1  2  3  4  5  6  7  8
  1 Iocaine Powder       7   0   0    14    x  2  2  2  2  2  2  2
  2 Phasenbott           6   1   0    12    0  x  2  2  2  2  2  2
  3 * De Bruijn          5   2   0    10    0  0  x  2  2  2  2  2
  4 RoShamBot            2   3   2     6    0  0  0  x  1  1  2  2
  5 Majikthise           1   3   3     5    0  0  0  1  x  1  2  1
  6 Biopic               1   3   3     5    0  0  0  1  1  x  2  1
  7 MegaHAL              1   6   0     2    0  0  0  0  0  0  x  2
  8 RussRocker4          0   5   2     2    0  0  0  0  1  1  0  x


Round 5 Match results: (maximum = 6)
    Player Name          W   L   D  total   1  2  3  4
  1 Iocaine Powder       3   0   0     6    x  2  2  2
  2 Phasenbott           2   1   0     4    0  x  2  2
  3 * De Bruijn          1   2   0     2    0  0  x  2
  4 RoShamBot            0   3   0     0    0  0  0  x


Final Round Match results: (maximum = 2)

    Player Name          W   L   D  total   1  2
  1 Iocaine Powder       1   0   0     2    x  2
  2 Phasenbott           0   1   0     0    0  x