The First Man-Machine Poker Championship

Humans Beat Poker Bot ... Barely

Posted: Wednesday, July 25, 2007 7:38 PM by Alan Boyle

The results are in from the great "Man vs. Machine" computer poker showdown in Vancouver, with the humans coming out on top by a narrow margin. But the main result of the exercise was mutual respect, on the part of the computer programmers as well as the poker pros.

The final 500-hand playoff went until past 11 p.m. PT Tuesday, and when the takes were totaled up, high-ranked poker players Phil "The Unabomber" Laak and Ali Eslami came out $570 ahead. Those results were combined with a too-close-to-call draw and a win for the University of Alberta's Polaris computer program on Monday, plus a win for the humans earlier Tuesday. That led Rutgers computer scientist Michael Littman, the showdown's arbiter, to declare Laak and Eslami the "clear winners."

For Laak, however, the outcome is more ambiguous. "The subtlety to the whole thing is, we won, not by a significant amount, and the bots are closing in," he told me today. "That's the true summary."

The $50,000 two-day showdown, conducted at the annual meeting of the Association for the Advancement of Artificial Intelligence, brought the kind of media hype that attended chess champion Garry Kasparov's faceoffs against IBM's Deep Blue in 1996 and 1997. Play-by-play commentaries were provided by Weblogs offered by the University of Alberta as well as the Poker Academy.

To minimize the luck of the draw, this week's games were set up so that the same cards were dealt in both matches, with the seats reversed: the hands that went to the human in one match (say, Laak's) went to the computer in the other match (Eslami's). Thus, the human in one game was playing the same cards that the computer was dealt in the other game.
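For readers who want to see the duplicate setup concretely, here is a minimal Python sketch. It is an illustration only, not the organizers' dealing software: the function names, the seeded shuffle and the simplified deal order (no burn cards) are all assumptions made for the example.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"


def shuffled_deck(seed):
    """Build a 52-card deck and shuffle it reproducibly from a seed."""
    deck = [rank + suit for rank in RANKS for suit in SUITS]
    random.Random(seed).shuffle(deck)
    return deck


def deal_heads_up(deck):
    """Take two hole-card hands and a five-card board off the top of the deck.

    Simplified: real dealing alternates cards and burns cards between streets.
    """
    return deck[0:2], deck[2:4], deck[4:9]


def duplicate_deal(seed):
    """Deal one hand for both tables, with the seats reversed at table 2."""
    seat_a, seat_b, board = deal_heads_up(shuffled_deck(seed))
    table_1 = {"human": seat_a, "bot": seat_b, "board": board}
    # The human at table 2 gets the cards the bot held at table 1.
    table_2 = {"human": seat_b, "bot": seat_a, "board": board}
    return table_1, table_2


table_1, table_2 = duplicate_deal(seed=2007)
print(table_1)
print(table_2)
```

Whatever good or bad cards the human gets at one table, the computer gets at the other, so a lucky run of cards should largely cancel out when the two tables are added together.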

The profits from each 500-hand match were combined to figure out who came out on top. In the final match, for example, Laak ended the game up $110, and Eslami was up $460, which adds up to the humans' $570 margin. There was a bonus system in place that earned the humans an extra $12,500 (U.S.) over the course of the four-match playoff. That comes on top of the $5,000 honorarium plus expenses that each player received for showing up.
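The scoring arithmetic for the final session can be written out in a few lines of Python. The function name is made up for this example, and the sketch covers only the sum; the rules for when a session counted as a draw are not spelled out here, so they are left out.

```python
def combined_human_profit(table_profits):
    """Add up the human side's profit across both tables of one session."""
    return sum(table_profits.values())


# Figures reported for the fourth and final 500-hand session.
final_match = {"Laak": 110, "Eslami": 460}
total = combined_human_profit(final_match)
print(f"Humans finish ${total} ahead")
```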

"I want to emphasize that this is peanuts," said University of Alberta computer scientist Jonathan Schaeffer, a leader of the Polaris team. "These guys usually play for a helluva lot more money."

Schaeffer said Laak (a tournament player who has a celebrity connection to actress Jennifer Tilly) and Eslami (who ranks among the world's top high-stakes cash players) fully deserved the win.

"This was an incredible win-win situation," he told me today. "It's the first time we validated that we were competitive with them. I'm not going to say that we're in their league - that would be silly."

Laak was similarly sportsmanlike toward the Polaris team. "Kudos to those guys," he said. "They solved checkers in the last month, now they're trying to solve poker. The University of Alberta must be very proud."

Both players said they felt exhausted at the finish. "It was an emotionally draining match," Eslami told me. They also felt fortunate to leave Vancouver with the win.

"I literally felt the same feeling that you would have if you beat 500 people in a tournament and won a million dollars," Laak said.

In a way, poker is a tougher challenge than chess because competitors have to make decisions based on incomplete information about the state of the play - and often pretend they know more than they do.

The humans actually played several variants of the Polaris program in the course of the match. The first and the last of the four sets were played against a variant dubbed Mr. Pink, which Laak described as a "careful, reasonable, disciplined, thoughtful player."

Then there was Agent Orange, the humans' opponent in the second set, the one that beat them so badly.

"Agent Orange was like a crazed, cocaine-driven maniac with an ax," Laak said.

Eslami had a more scientifically couched description of the differences: Mr. Pink, he said, was programmed to come close to playing the Nash equilibrium for the game of poker - that is, a strategy that an opponent cannot exploit, so that play tends to settle into a draw. In contrast, Agent Orange was programmed to adapt its strategy to the play of its opponents as the game went on.
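The equilibrium idea is easier to see in a much smaller game. The Python sketch below uses rock-paper-scissors as a stand-in, which is purely an assumption for illustration: poker's equilibrium is vastly larger, and nothing here reflects how the Polaris team actually computes its strategies. In this toy game, mixing the three moves evenly is the equilibrium strategy, and it breaks even on average no matter what the opponent does.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """Return +1 for a win, -1 for a loss and 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(my_mix, their_mix):
    """Average payoff when both players pick moves at random from their mixes."""
    return sum(
        my_mix[m] * their_mix[t] * payoff(m, t)
        for m in MOVES
        for t in MOVES
    )


# The equilibrium mix for this toy game: each move one third of the time.
UNIFORM = {move: Fraction(1, 3) for move in MOVES}

always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
always_paper = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}

print(expected_payoff(UNIFORM, always_rock))       # equilibrium mix vs. a fixed opponent
print(expected_payoff(always_rock, always_paper))  # a predictable player gets exploited
```

The trade-off is the one the players described: an equilibrium player is hard to beat, but it does not go out of its way to punish an opponent's mistakes, which is where an adaptive variant comes in.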

Laak said Agent Orange was also tweaked for more aggressive play, essentially by "thinking" that each pot was 7 percent bigger than it really was. "This strategy puzzled us," Laak said.
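One way to see why an inflated pot leads to more aggressive play is through standard pot-odds arithmetic. The Python sketch below is an assumption-laden illustration, not the Polaris code: it supposes a player calls whenever its chance of winning exceeds the break-even point, and the function name and the dollar amounts are invented for the example.

```python
def break_even_win_probability(pot, call_cost, pot_inflation=0.0):
    """Minimum chance of winning at which calling a bet breaks even.

    pot_inflation=0.07 models a player who treats every pot as
    7 percent bigger than it really is.
    """
    perceived_pot = pot * (1.0 + pot_inflation)
    return call_cost / (perceived_pot + call_cost)


# Hypothetical spot: $100 in the pot, $20 to call.
true_threshold = break_even_win_probability(pot=100, call_cost=20)
inflated_threshold = break_even_win_probability(pot=100, call_cost=20, pot_inflation=0.07)

print(f"Honest pot:   call with at least {true_threshold:.1%} to win")
print(f"Inflated pot: call with at least {inflated_threshold:.1%} to win")
```

With these made-up numbers, the break-even point drops from about 16.7 percent to about 15.7 percent, so the player that overvalues the pot stays in with slightly weaker hands, hand after hand.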

After their shellacking, Laak and Eslami got their heads together and decided to take an even more focused, deliberate approach to the third set on Tuesday afternoon. The programmers, meanwhile, went to a mixed strategy, selecting three software variants as tag teams for each of the human opponents.

In the end, it was the humans who were able to adapt to the bots. The humans won the third faceoff against the tag-team bots, and went on to beat Mr. Pink in the fourth and final round.

"The computer program is tricky," Schaeffer said. "It's hard to model. Its roots are in deep algorithms. Either consciously or subconsciously, [the humans] were able to figure out something and win."

The Polaris team as well as Laak and Eslami are all looking forward to a rematch in the next few months. By that time, Polaris may well become unbeatable - at least when it comes to Heads Up Limit Hold 'Em. "If it's not done, it's so close to done it's not even funny," Laak said. (As more than one commenter has noted below, no-limit games against multiple opponents are an entirely different proposition.)

Laak said he and Eslami gave Polaris' programmers some suggestions for making the bots better. "We actually told them the way you can beat us," he said. "If you could take Agent Orange, crank him down 50 percent, then have that guy play us randomly, so that each hand would be the new Agent Orange or Mr. Pink ... that might be the thing we can't beat."
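Laak's suggestion amounts to picking an opponent at random for every hand. A minimal Python sketch of that idea follows; the variant labels and function names are hypothetical, and what "crank him down 50 percent" would mean inside the software is left open here.

```python
import random

# Hypothetical labels for the two styles Laak describes.
VARIANTS = ("mr_pink", "agent_orange_toned_down")


def pick_variant(rng):
    """Choose which bot personality plays the next hand."""
    return rng.choice(VARIANTS)


def variant_schedule(num_hands, seed=0):
    """Draw an independent random variant for each hand of a session."""
    rng = random.Random(seed)
    return [pick_variant(rng) for _ in range(num_hands)]


schedule = variant_schedule(num_hands=500)
print(schedule[:10])
print({name: schedule.count(name) for name in VARIANTS})
```

Because the choice is made fresh on every hand, a human opponent cannot settle in against one style before the other shows up.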

Eslami said he encouraged the programmers to focus on the adaptive approach used by Agent Orange. "I think that's going to have application in broader society," he said.

After all, the whole point of this exercise is to create more intelligent software, not just craftier poker-playing bots. Eslami said artificial adaptive behavior could lead to "software that better understands the way humans think ... instead of being competitive and trying to oppose the person's next move, being cooperative and trying to help the person's next move."

The University of Alberta's Schaeffer echoed that view: "The challenge to us is how to get computers to reason and act intelligently in the absence of complete information. Poker is a game of what we call partial information. In this case, you don't know the opponent's cards. That doesn't sound like a big deal, but since you don't know what they have, you have to deal with probabilities."

The same challenges apply to making money in the stock market, where you have only partial information about the prospects for all the companies you could invest in ... or to buying a used car, where you have to sort through incomplete and sometimes misleading information as you negotiate a deal.

"What you're doing is, you're playing a game of poker," Schaeffer said. Next-generation software could help humans play those real-life games better - and, one can hope, more fairly.

So is Schaeffer disappointed to see Polaris lose? Not at all. "I'm personally pleased that we did not win the match," he said. If the computer had won, he said there would have been all sorts of unwarranted hand-wringing about human inadequacy. In Schaeffer's view, this really is a win-win situation.

"Even if we had won the match, nobody would have claimed victory," he told me. "Not winning the match avoided a lot of misrepresentation."