The First International RoShamBo Programming Competition

Results (Part 2)

[rock crushes scissors] [paper covers rock] [scissors cuts paper]


[goto Part 1]   last updated: October 4, 1999.


NEWSFLASH:

Dan Egnor has released the code to Iocaine Powder! You can have a look at the Rock-Paper-Scissors World Champion, and read Dan's explanation of how it works at:

http://ofb.net/~egnor/iocaine.html

Jason Hutchens has released the code to MegaHAL! You can have a look at the strongest pattern recognizer in the contest, and read Jason's explanation of how it works at:

http://ciips.ee.uwa.edu.au/~hutch/puzzles/roshambo/

Jason has also provided a shell routine, so humans can play against any of the programs more conveniently. Thanks Jason!

All entrants are encouraged to release their programs. I will be providing a new sample program that has many of the top algorithms, including most of the U of A submissions, plus all of the dummy bots. This will provide a good test suite for everyone to use in developing the second generation of programs to compete in the next competition.

However, I will not add any program without the author's expressed consent, so please send me an e-mail if you are willing to contribute your algorithm to the collection. It doesn't have to be a champion, either -- we want lots of different ideas, not just programs that performed well in the first contest.

The Second International RoShamBo Programming Competition will be held in a few months, perhaps as early as January, but no later than next summer. Stay tuned for an announcement, or send me e-mail if you want your name added to the mailing list.

[rock] [paper] [scissors]

Algorithms and Ideas

There were many different approaches used to write a RoShamBo player, and I know for a fact that the authors of all top ten programs went through several levels of thought and re-design as they discovered the richness of this "trivial" problem. I won't address any of these concepts in great detail, but I do hope to touch on some of them, at least in passing.

  Dummy bots:

As explained in the contest announcement, some of the programs competing in the tournament were intentionally weak. This ensures that programs try to win matches and maximize their winning margin, since it is not enough to simply avoid losing (as with the Random (Optimal) strategy).

Most of the dummy bots use a strategy that is oblivious to the opponent, and don't even try to win the match. In general, they will be more exploitable by stronger algorithms, and may achieve a draw or even win against weaker opposition.

The results against the dummy bots show that there is still room (and will always be room) for better pattern detection in RoShamBo players.

Please note that none of these algorithms (except Random (Optimal)) will be competing in the second programming competition, so there is no point in trying to detect any of these specific patterns in the future.

The dummy bots were designed to be fairly elusive, without being so hard to trace as to be effectively random. In some cases, they were tuned by introducing some amount of random noise to lessen the obviousness of the pattern, so that none of the dummy programs lost by massive scores. The objective was to have a wide range of exploitation among the entries, to identify the strengths and weaknesses of each program. However, when the entire set of dummy bots is removed, the results of the competition are not significantly different.

  Dummy Bot: Random (Optimal)

  Length (in C semi-colons):   1
  Type of history analysis:    none
  Level of meta-reasoning:     zero (or "all levels")
  theoretical exploitability:  0

In this competition, Random (Optimal) was called Meta-Meta-Random. I allowed Mark Bassett to keep this entry since he submitted it on the same day the competition was announced (before I ruled it ineligible), and he was certain it would finish in first place. Well, not exactly. It finished in the middle of the pack (27th), drawing every match, as expected.

In the "Best of the Best" competition, which cuts away half of the field each round, it dropped out in the round of 32. Naturally, it always has a 50% chance of making it to the next round. A slower progression would allow it to always stay alive, but it would still be uninteresting. :)

If time permits, I plan to run a true "evolutionary" scheme, where the number of copies of a program depends on its success in the previous iteration. In this format, Random (Optimal) will always keep its 1/n share of the population, and neither grow nor diminish over time. Again, that will not be enough to win a tournament.

  Dummy Bot: Pi

  Length (in C semi-colons):   5
  Type of history analysis:    none (fixed string)
  Level of meta-reasoning:     zero
  theoretical exploitability:  1000 out of 1000

One of several pre-determined sequences in the contest was based on the digits of pi. Each digit was returned mod 3, unless the digit was 0, in which case it was skipped. Assuming the opponent has knowledge of common mathematical constants, it was possible to guess this algorithm, and thereby score close to 100% against it. Of course, it would be both uninteresting and futile to try to guess all such silly algorithms, so no one was expected to detect the pattern.
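
For illustration, here is a minimal sketch of the idea (not the actual contest code); the move encoding (rock=0, paper=1, scissors=2) and the short hard-coded digit string are assumptions:

  /* Sketch of the Pi bot: successive digits of pi mod 3, skipping zeros. */
  #include <stdio.h>

  static const char pi_digits[] = "31415926535897932384626433832795";

  int pi_move(void)
  {
      static int i = 0;
      int d;
      do {                                            /* skip any 0 digits */
          d = pi_digits[i++ % (sizeof(pi_digits) - 1)] - '0';
      } while (d == 0);
      return d % 3;                                    /* 0=rock, 1=paper, 2=scissors */
  }

  int main(void)
  {
      int t;
      for (t = 0; t < 10; t++) { printf("%d ", pi_move()); }
      printf("\n");
      return 0;
  }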

Pi performed better than Random (Optimal), suggesting that the digits of pi may be more regular than a truly random sequence, in a manner similar to the De Bruijn sequence. This is interesting, because the digits of pi have passed all known tests of pseudo-randomness (although those tests are not normally performed on the earlier digits of pi).

  Dummy Bot: De Bruijn

  Length (in C semi-colons):   5
  Type of history analysis:    none (fixed string)
  Level of meta-reasoning:     zero
  theoretical exploitability:  500 out of 1000

A De Bruijn sequence is a cyclic string which contains all possible substrings of a given length exactly once. An example of a ternary De Bruijn string of length nine, containing all substrings of length two, would be "rrpspprss". I was looking for a sequence to repeat in a simple cycle, and tried this type of string because it would be "statistically well-balanced". Much to my surprise, a fixed string of length 27 finished second in one test tournament, and had better results against the stronger programs! This oblivious algorithm was particularly successful with match lengths of 100, when full-history programs had less opportunity to lock onto the cyclic repetition. A little bit of thought reveals why this fixed sequence is successful.

Every string will appear to contain patterns, even if it is random. The De Bruijn sequence naturally gives the appearance of certain tendencies, and then automatically switches away from repeating those same patterns. If the opponent is over-zealous in its exploitation of any perceived weakness, it will be continuously deceived and punished! The stronger the opponent is at detecting subtle patterns (that don't actually exist) the more susceptible it is to the decoy strategy that the De Bruijn string implicitly employs. Other programs attempted to use a "bait and switch" idea, but without much success. To borrow Perry Friedman's term, the De Bruijn string is a "master baiter". :)

A simple repetition of the same De Bruijn string was too vulnerable to the stronger programs, for two reasons. First, many programs will have a large context length, and will simply see the repetition. Second, the sequence has a detectable deviation from random, because many of the longer substrings never occur. Consider a De Bruijn string of length nine: it never plays the same move three times in a row (in fact, only one-third of the substrings of length three are ever used). This type of weakness is at least partially exploitable by even the most basic predictors. The same flaws exist for a cyclic string of length 27, but they are considerably harder to find, in a statistical sense.
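
For concreteness, a cyclic player of this (vulnerable) simple-repetition form, using the length-nine example string from above, might look like the sketch below; the 0/1/2 move encoding is assumed:

  /* Sketch of a cyclic De Bruijn player, using the length-9 example
     string "rrpspprss"; the real dummy bot used longer sequences. */
  static int debruijn_move(int turn)          /* turn = 0, 1, 2, ... */
  {
      static const char seq[] = "rrpspprss";
      switch (seq[turn % 9]) {
          case 'r':  return 0;
          case 'p':  return 1;
          default:   return 2;                /* 's' */
      }
  }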

For the official competition, the De Bruijn bot used a concatenation of 13 different sequences of length 81 (containing all substrings of length four). In a triumph of ignorance over intelligence, De Bruijn finished third in the "Best of the Best" series, losing only to Iocaine Powder and Phasenbott. Its performance improved significantly in the later rounds, because it only drew (or lost) to the weakest entrants!

Is the De Bruijn sequence a "magic bullet" for RoShamBo, in the same way that the ultra-simple Tit-for-Tat strategy was so successful for Prisoner's Dilemma? I don't think so. I suspect that its success may be due to a flaw in the way most programs handle statistics. If they only look for over-represented patterns, rather than the equally important under-represented occurrences, then De Bruijn will be the perfect counter-strategy (at least, as far as zero-history constant strings are concerned:).

  Dummy Bot: Text

  Length (in C semi-colons):   5
  Type of history analysis:    none (fixed string)
  Level of meta-reasoning:     zero
  theoretical exploitability:  unknown (empirical: ~200 out of 1000)

The Text bot was based on a suggestion by Perry Friedman. It is simply a block of English text whose ASCII characters have been converted to actions by taking their value mod 3. The text used was the original announcement of the RoShamBo competition (including whitespace). Since natural language contains recurring patterns and non-random frequencies, about half of the contestants were able to beat this string by a score in the range of 100-200.
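
A minimal sketch of the same idea, using a stand-in string rather than the actual announcement text:

  /* Sketch of the Text bot: ASCII value of each character, mod 3. */
  static int text_move(int turn)              /* turn = 0, 1, 2, ... */
  {
      static const char text[] =
          "The First International RoShamBo Programming Competition";
      return text[turn % (sizeof(text) - 1)] % 3;
  }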

  Dummy Bot: Switch A Lot

  Length (in C semi-colons):   3
  Type of history analysis:    its own last move
  Level of meta-reasoning:     zero
  theoretical exploitability:  320 out of 1000

Of all the dumb algorithms included in the original sample program, the one that seemed to give decent predictors the most trouble was "Always Switchin'". This is a bit of a surprise, since it takes very little analysis to discover that the opponent never repeats a move, and the simple counter-strategy would net +500 points per 1000. Even some of the strongest programs did no better than +350. This may be another indication that they are looking for patterns directly, rather than for the absence of patterns that should be present.
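
For reference, the simple counter-strategy is a single line: assuming the opponent truly never repeats its last move, playing that move shifted by two wins whenever the opponent goes up by one and draws whenever it goes up by two, so it never loses and nets the full +500 per 1000 when the two alternatives are equally likely.

  /* Hypothetical counter to "Always Switchin'": the opponent's next
     move is (last+1) or (last+2), so playing (last+2) wins or draws. */
  static int anti_switching_move(int opp_last)
  {
      return (opp_last + 2) % 3;
  }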

Switch A Lot had a 12% chance of repeating the previous move. This made it very susceptible to the pairwise frequency based predictors, including many programs that did not finish very high in the standings. The best exploiter was the simple Anti-rotn bot, which scored close to the maximum. In contrast, five of the strongest head-to-head players, including Iocaine Powder and Phasenbott, scored less than +100 points on average.
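
A sketch of a Switch A Lot style generator; the even split between the two switching moves is an assumption:

  /* Sketch of Switch A Lot: repeat the previous move 12% of the time,
     otherwise switch to one of the other two actions at random. */
  #include <stdlib.h>

  static int switchalot_move(int my_last)
  {
      if (random() % 100 < 12)
          return my_last;                        /* occasional repeat */
      return (my_last + 1 + random() % 2) % 3;   /* otherwise switch */
  }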

  Dummy Bot: Foxtrot

  Length (in C semi-colons):   4
  Type of history analysis:    its own last move
  Level of meta-reasoning:     zero
  theoretical exploitability:  500 out of 1000

The Foxtrot sequence has a pattern that is easy to generate, but is not easy to detect. It has the form:

[ rand   prev+2   rand   prev+1   rand   prev+0 ]

repeating, where rand is a random move, and prev is the previous move. So if rand was always 1 (paper), the sequence would be "prpspp" repeating. All of the odd-numbered moves are random, but the even-numbered moves are completely deterministic (albeit not in an obvious way).
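
A minimal sketch of the generator, assuming turns are counted from one and the usual 0/1/2 move encoding:

  /* Sketch of Foxtrot: random on odd turns; on even turns, the
     previous move plus 2, 1, 0 in rotation. */
  #include <stdlib.h>

  static int foxtrot_move(int turn, int my_last)   /* turn = 1, 2, 3, ... */
  {
      static const int shift[3] = { 2, 1, 0 };
      if (turn % 2 == 1)
          return random() % 3;                     /* odd turns: random */
      return (my_last + shift[(turn / 2 - 1) % 3]) % 3;
  }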

Foxtrot turned out to be more susceptible to the direct-history algorithms like Iocaine Powder and Biopic than to the strong pattern detectors like MegaHAL. It lost to four programs by more than 300 points, but only by a small margin to most of the others.

  Dummy Bot: Flat

  Length (in C semi-colons):   16
  Type of history analysis:    its own r/p/s frequencies
  Level of meta-reasoning:     zero
  theoretical exploitability:  420 out of 1000

The Flat bot generates a distribution of moves that is more uniform than would be expected of a truly random sequence. This is a common weakness in human-generated sequences. For example, if you ask someone to write down a "random" series of 20 heads or tails, they will normally avoid having many more of one than the other. If you then ask them to write down the outcomes of 20 actual coin tosses, you can usually identify the truly random sequence just by guessing that it is the one with the longest run of heads or tails.

If the frequency of one particular action was strictly less than the other two, the Flat bot had an 80% chance of choosing that action. If one was strictly greater than the other two, it had only a 10% chance of being selected. Naturally this distortion causes other statistical anomalies that predictive algorithms can exploit, so most programs did reasonably well against the Flat bot. However, no one approached the maximum exploitability of -420 points, which was measured by playing it against the Anti-Flat algorithm.
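
One possible reading of the rule is sketched below; how the residual probability is split and how ties are handled are guesses, since they are not stated above:

  /* Sketch of the Flat bot: 80% chance for a strictly least-frequent
     action, 10% for a strictly most-frequent one, random otherwise. */
  #include <stdlib.h>

  static int flat_move(const int freq[3])   /* freq[a] = times action a was played */
  {
      int a, lo = 0, hi = 0, r = random() % 100;
      for (a = 1; a < 3; a++) {
          if (freq[a] < freq[lo]) lo = a;
          if (freq[a] > freq[hi]) hi = a;
      }
      if (freq[lo] < freq[(lo+1)%3] && freq[lo] < freq[(lo+2)%3])   /* strict minimum */
          return (r < 80) ? lo : ((r < 90) ? (lo+1)%3 : (lo+2)%3);
      if (freq[hi] > freq[(hi+1)%3] && freq[hi] > freq[(hi+2)%3])   /* strict maximum */
          return (r < 10) ? hi : ((r < 55) ? (hi+1)%3 : (hi+2)%3);
      return random() % 3;                                          /* ties: random */
  }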

  Dummy Bot: Copy-drift

  Length (in C semi-colons):   9
  Type of history analysis:    opponent's last move
  Level of meta-reasoning:     zero
  theoretical exploitability:  500 out of 1000

This program biases its decision toward the opponent's previous move, but "changes gears" over time. If a player always goes rock, it copies that move two-thirds of the time to start, then shifts gears to favour scissors, then paper, continuing to rotate its preference every 111 turns. The intention was to have a strategy that was fairly easy to deduce, but that gradually changed over time.
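
A sketch of the idea, assuming (by analogy with Add-drift below) a 50% random component, which produces the stated two-thirds overall bias toward the favoured move; the exact gear-change bookkeeping is a guess:

  /* Sketch of Copy-drift: half the time play at random, otherwise copy
     the opponent's last move shifted by a gear that rotates every 111
     turns in the order rock -> scissors -> paper. */
  #include <stdlib.h>

  static int copydrift_move(int turn, int opp_last)   /* turn = 1, 2, 3, ... */
  {
      if (random() % 2)
          return random() % 3;
      return (opp_last + 2 * ((turn - 1) / 111)) % 3;
  }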

  Dummy Bot: Add-drift

  Length (in C semi-colons):   9
  Type of history analysis:    both players' last moves
  Level of meta-reasoning:     zero
  theoretical exploitability:  500 out of 1000

This program bases its decision on the sum of both players' previous moves, and shifts gears over time. As with Copy-drift, it played a random move with 50% probability. This pattern was less detectable to the majority of programs, since they did not consider trends based on the moves of both players.
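
Under the same assumptions as the Copy-drift sketch above, an Add-drift style generator might look like this (the gear period and direction are guesses):

  /* Sketch of Add-drift: half the time random, otherwise a move based
     on the sum of both players' previous moves plus a drifting gear. */
  #include <stdlib.h>

  static int adddrift_move(int turn, int my_last, int opp_last)
  {
      if (random() % 2)
          return random() % 3;
      return (my_last + opp_last + (turn - 1) / 111) % 3;
  }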

Don Beal's Simple Predictor and Simple Modeller, which use a simple history decay function to favour recent observations over older ones, had a definite advantage against both of the "drift bots".

  Dummy Bot: Add-react

  Length (in C semi-colons):   9
  Type of history analysis:    recent success (wins, losses, draws)
  Level of meta-reasoning:     zero
  theoretical exploitability:  800 out of 1000

This dummy bot also based its decisions on the sum of the previous moves by both players, but had a different mechanism for changing gears. If the current strategy incurred a net loss of three or more points in the last 20 turns, or more than a 10% loss over more than 20 turns, it would then change gears. This behavior gives a more chaotic image, and the noise level had to be reduced considerably before any of the programs could defeat it by a significant margin.
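
The gear-change test itself is easy to sketch; the exact bookkeeping of "the current strategy" is a guess, but the trigger described above amounts to something like:

  /* Sketch of the Add-react trigger: returns 1 when it is time to change
     gears, given the net score and turn count since the last change. */
  static int should_change_gear(int score_since_change, int turns_since_change)
  {
      if (turns_since_change <= 20)
          return score_since_change <= -3;                     /* lost 3+ points */
      return 10 * score_since_change < -turns_since_change;    /* > 10% net loss */
  }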

  Dummy Bot: Anti-rotn

  Length (in C semi-colons):   40
  Type of history analysis:    consecutive pairs of opponent actions
  Level of meta-reasoning:     one
  theoretical exploitability:  40 out of 1000

The Anti-rotn bot is the only dummy bot that actually analyzed the opponent's moves and tried to win. It has a minimal use of history, counting only the "rotations" in the opponent's move sequence. If the opponent repeated an action, went up by one, or went up by two, it was counted as a 0, +1, or -1 rotation, respectively (up by two being the same as down by one). This means that it treats the actual choice of rock, paper, or scissors as irrelevant, and considers only the difference between each pair of moves. This enables it to pick up a particular pattern faster than if it "split the signal" into three different categories (for example by treating "rp", "ps", and "sr" as three distinct patterns).
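
A sketch of the rotation count just described, assuming the opponent's moves are available in an array:

  /* Tally the rotations between consecutive opponent moves:
     index 0 = repeat, 1 = up by one, 2 = up by two (i.e. down by one). */
  static void count_rotations(const int opp[], int n, int rot[3])
  {
      int i;
      rot[0] = rot[1] = rot[2] = 0;
      for (i = 1; i < n; i++)
          rot[(opp[i] - opp[i-1] + 3) % 3]++;
  }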

The "Anti-rotten-bot" looks for both maximum and minimum occurrences in the distribution, which many of the entries failed to do. For example, if a prediction algorithm believes the probability distribution for the opponent's next action is 20-39-41 (20% rock, 39% paper, 41% scissors), several programs counter only the maximum likelyhood, and would play rock to beat the expected scissors. However, there is a much stronger opponent tendency toward "not rock", and playing scissors would have a much higher expectation.

In general, the expected value of each action "x" is simply:

EV[x] = Pr[x+2] - Pr[x+1]

where "+" is understood to be modulo 3. So in the 20-39-41 example,

EV[rock] = +2%,   EV[paper] = -21%,   and EV[scissors] = +19%.
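
In code, the same calculation might look like the sketch below; choosing the reply by maximum EV automatically accounts for both over- and under-represented actions. With the 20-39-41 distribution it returns scissors (+19%) rather than rock (+2%).

  /* Pick the reply with the highest expected value, given an estimated
     distribution over the opponent's next action (0=rock, 1=paper, 2=scissors). */
  static int best_reply(const double pr[3])
  {
      double ev[3];
      int a, best = 0;
      for (a = 0; a < 3; a++) {
          ev[a] = pr[(a + 2) % 3] - pr[(a + 1) % 3];   /* EV[x] = Pr[x+2] - Pr[x+1] */
          if (ev[a] > ev[best]) best = a;
      }
      return best;
  }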

The other feature built into the Anti-rotn bot is a simple fail-safe, set at 4% of the total match length. This single line of code:

if ( score < -trials/25 ) { return(random() % 3); }

would have saved many entries thousands of points against stronger opposition. The same approach would not prevent a match loss, however. Even though a -40 result is less than the noise level for a single match (+/- 50.6), many independent tournaments can be run to reduce the variance to a small amount, making the -40 expectation decisive.
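
(The +/- 50.6 figure is presumably a 95% confidence bound: against an unexploitable opponent, each turn scores +1, 0, or -1 with probability 1/3 each, so the variance per turn is 2/3. Over 1000 turns the standard deviation is sqrt(2000/3), about 25.8, and 1.96 * 25.8 gives the 50.6.)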

Some programs, such as Biopic, Robertot, and Boom, only deviate from random when there appears to be a statistically significant weakness to exploit. This enables them to draw against very strong opposition, who do not exhibit such patterns. Interestingly, Iocaine Powder and Phasenbott are still able to defeat these programs by a small margin, presumably by behaving predictably enough to induce a pattern in their opponent's replies, which they can then exploit for a net profit!

 

One final note about generality. Several programs were based on the idea of using a number of different strategies, and choosing the one that would have been most successful in the past. One of the smoothest systems for adaptation was that of Mixed Strategy by Thad Frogley.
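
For illustration, a minimal (hypothetical) sketch of that selection mechanism, not Thad Frogley's actual code: keep a running score of what each sub-strategy's suggestion would have earned, and play the move suggested by the current leader.

  /* Hypothetical "pick the best sub-strategy so far" selector. */
  #define NSTRATS 3

  static int strat_score[NSTRATS];                 /* running scores */

  static int payoff(int a, int b)                  /* +1 win, 0 draw, -1 loss */
  {
      return ((a - b + 4) % 3) - 1;
  }

  /* prev[s] = what strategy s suggested last turn; cur[s] = its suggestion now */
  static int choose_move(const int prev[NSTRATS], int opp_last, const int cur[NSTRATS])
  {
      int s, best = 0;
      for (s = 0; s < NSTRATS; s++) {
          strat_score[s] += payoff(prev[s], opp_last);
          if (strat_score[s] > strat_score[best]) best = s;
      }
      return cur[best];
  }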

The weakness of this approach is that it is only as good as the best substrategy it considers. Combining different specific techniques is doomed to yield only an incremental improvement in overall performance.

In contrast, Iocaine Powder uses a much more general (and elegant) method for analyzing different strategies, based on six levels of meta-reasoning, each applied to its own strongest prediction method. MegaHAL also takes a generalized approach to the task of recognizing patterns. The lesson is clear: general techniques are stronger than specific algorithms for RoShamBo.

[rock] [paper] [scissors]

Unofficial Super-Modified Class

There was no shortage of players in the "anything goes" category. In fact, there seemed to be almost as much interest in how to cheat the tournament system as there was in writing a good player! I would like to thank Mark James and Denis Papp for their suggestions on other ways a program might breach security. They are both in the computer game industry, so writing AI that cheats outrageously is part of their job description. :)

There were originally ten players in this category, but the "John Galt" algorithm withdrew, taking three other excellent programs with him.

    Player Name                  1     2     3     4     5     6  total
  1 The Matrix                   0  1000  1000  1000  1000  1000   5000
  2 Psychic Friends Network  -1000     0   998F  998   998   998   2992
  3 Fork Bot                 -1000  -998F    0  1000  1000  1000   1002
  4 Nostradamus              -1000  -998 -1000     0     5  1000  -1993
  5 Iocaine Powder           -1000  -998 -1000    -5     0    10  -2993
  6 Random (Optimal)         -1000  -998 -1000 -1000   -10     0  -4008

Random (Optimal) and Iocaine Powder you already know. They got stomped.

Nostradamus was written by Tim Dierks, a VP of Engineering at Certicom, who has a lot of expertise in cryptography. The program defeats the optimal player by reverse-engineering the internal state of the random() generator, which he states "was both easier and harder than I thought it would be". To be sporting, it then plays optimally against all other opponents.

Fork Bot was based on an idea that Dan Egnor came up with a few minutes after hearing about the contest. Since "library routines are allowed", his elegant solution was to spawn three processes with fork(), have each one make a different move, and then kill off the two that did not win. This was implemented by Andreas Junghanns in about 10 lines of code. Unfortunately, since all three moves lost to the Psychic Friends Network after the first turn, the program exited and the remainder of that match was declared forfeited.

The Psychic Friends Network is a truly hilarious piece of obfuscated C, written by Michael Schatz and company at RST Corporation. Among other things, it uses an auxiliary function to find good karma, consults horoscopes, cooks spaghetti and (mystic) pizza to go with various kinds of fruit, #defines democrats as communists, and undefines god. We're still trying to figure out exactly what it is doing with the stack frame, but we do know that it never scores less than +998 in a match, unless it is playing against a meta-meta-cheater.

Michael also gives credit to Frank Hill and T.J. Walls, who were the other minds behind Psychic Friends Network. Incidentally, they were part of the team from Reliable Software Technologies that recently discovered a major security flaw in the software used by several online gambling sites. For more details on this, check out: http://www.rstcorp.com/news/gambling.html

The Matrix was written by Darse Billings, who holds the prestigious title of "Student for Life", and recently started the PhD programme at the University of Alberta. His RoShamBo program defeated every opponent with a perfect score, based on the simple principle "There is no spoon".

Since The Matrix is also the tournament program, it has complete access to all other algorithms, data structures, and output routines, and is therefore unlikely to ever be overtaken. As a result, this category is hereby declared to be solved, and thus retired from future competitions.

[rock] [paper] [scissors]

This report was hastily prepared, in order to provide the results as soon as possible. The contest and other experiments will eventually be written up for publication, so please inform me of any errors or omissions. I also welcome more feedback from the participants, both on your ideas and on your personal background. Continue to check the web page for links to revised reports and new results (such as the evolutionary tournament format):

http://www.cs.ualberta.ca/~darse/rsbpc.html

    - Darse.

[goto the test suite]

[rock] [paper] [scissors]