Martin Müller's Research Group - Research Areas

Our research group is interested in developing efficient search methods for hard problems. Many of the methods also involve machine learning techniques. Our main application areas are games.

Current Research Projects

Members of our group are working on these topics. The main contact(s) for each project are in bold.

Efficient algorithms in combinatorial game theory and MCGS, our Minimax-based Combinatorial Game Solver (Folkersen, Du, Müller).
Go Solver and EWS (Randall, Wei, Hayward, Müller)
Solving NoGo (Du, Wei, Müller)

Efficient algorithms in combinatorial game theory

We are working on efficient search algorithms for combinatorial games, within the MCGS system. Future extension may include for example using temperature, specialised algorithms for impartial games, using approximations to temperature, or heuristic local search. Also see our previous work on algorithms in CGT.

Go Solver

Ting-han Wei, Owen Randall, Ryan Hayward, Martin Müller.

Go on small boards up to 5x5 and 5x6 has been solved by Erik van der Werf. We are currently developing a strong small board Go solver that incorporates game-specific knowledge and our efficient solution to the graph history interaction problem. The goal is to solve 6x6 Go using both alpha-beta search and depth first proof number (df-pn) search. The strong solver may also leverage the improved safety solver for Go (also an ongoing project in the group), and machine learned heuristics. This project is closely related to previous projects on efficient search methods for Depth-first Proof Number Search, and on solving the Graph History Interaction (GHI) Problem.

Opening Book for Go Solver

Dat Tran, Ting-han Wei, Owen Randall, Ryan Hayward, Martin Müller.

Opening books have long been used to improve game-playing programs. They can also be used to speed up the exact solution to a game. For example, they have been used in the proof that checkers is a draw.

In August 2021, David J Wu ("lightvector"), the main author of the strong open source Go program KataGo, published a very detailed opening book for 7x7 Go that was created using this open source program. We would like to expand upon this work to improve the state of the art in Go solving.

To help the Go Solver effort, this project will build an opening book for 6x6. It could be either based on KataGo, or on a new network that can be trained to predict the cost of solving a given Go state.

Improved safety solver in Go

Owen Randall, Ting-han Wei, Ryan Hayward, Martin Müller.

Solving the outcome of Go board position using search requires the generation and evaluation of possible lines of play. The number of states to evaluate grows exponentially with the search depth. A static safety solver can speed up this process by eliminating or reducing the amount of search required to solve many Go boards by determining the winner of many states without search.

In Go, the player with more safe stones and territory wins in the end. We design algorithms that determine whether blocks of stones can be captured, and whether attacker stones can live in defender enclosed regions. We build an improved static safety solver that can further reduce the amount of search required by considering alternating play on intersection points which partition empty regions, as these points can be critical to the life of blocks of stones. We also plan to further improve static safety by creating a precomputed database of regions that records important properties of these regions which cannot be found statically.

Solving NoGo

This is Henry Du's MSc topic. See our publications:

H. Du and M. Müller. Solving Linear NoGo with Combinatorial Game Theory. Accepted, Computers and Games 2024.
H. Du, T.-h. Wei, and M. Müller. Solving NoGo on Small Rectangular Boards. Advances in Computer Games (ACG) 2023, LNCS 14528, p. 39-49.
Also see the NoGo page for earlier work on this game. That page is not kept up to date.

Previous (Recent) Research Themes and Research Projects

Heuristic search and learning with imperfect domain models
Integrating exact solution methods with Alpha Zero type architectures in two player games
Evaluation of deep RL learning against exact solutions in two player games

Open-source general game playing system based on MuZero (Completed)
Compress checkers tablebases with deep learning (Completed)
Improving teaching software for search and Alpha Zero (Completed)
Anomaly Detection for RL (Completed)
Do Video Games Need Planning?
DarkHex (Completed)
Using search for RL (Completed)
Evaluating Leela Chess Zero against Endgame Tablebases (Completed)
RL Planning and Batch RL (Completed)

Open-source general game playing system based on MuZero

Zeyi Wang.

MuZero is the State of the Art algorithm for general game playing supporting Go, Shogi, chess, and Atari games. However, DeepMind's original MuZero is not publicly available, and current open-source implementations are inefficient. We implement a distributed version of MuZero with more efficient sample generation and MCTS inference. This project will lay a solid foundation for future research of planning with a learned model.

This project belongs to theme 1.

Compress checkers tablebases with deep learning

Jiuqi Wang, Jonathan Schaeffer, Martin Müller.

After many years of research, the game of checkers was solved by a group led by Jonathan Schaeffer.

We believe that modern deep Rl approaches such as Alpha Zero would also do really well in learning how to play checkers. The main research question in this project is wether such a program can be used to greatly reduce the storage requirements for endgame tablebases in checkers.

After learning a strong neural network, the endgame tablebases could be analyzed and reduced to store only the exceptions.

An alternative approach would be to use supervised learning algorithms that focus directly on fixing the wrong examples in the tablebases, rather than strong overall checkers play.

This project was part of the theme Evaluation of deep RL learning against exact solutions in two player games

Improving teaching software for search and Alpha Zero

Henry Du, Ting-han Wei, Martin Müller.

Create more and improved demo programs for courses such as Cmput 455, which are simple yet functional. For example, an Alpha Zero demo, and a simple neural net for TicTacToe.

Anomaly Detection for RL

Hongming Zhang.

Abnormal states in deep reinforcement learning are states that are beyond the scope of an RL policy. Such states may make the RL system unsafe and impede its deployment in real scenarios.We propose a simple yet effective anomaly detection framework for deep RL algorithms that simultaneously considers random, adversarial and out-of-distribution~(OOD) state outliers. This simple unified anomaly detection paves the way towards deploying safe RL systems in real-world applications.

Do Video Games Need Planning?

Hongming Zhang.

Planning is any process that takes a model as input and produces or improves a policy for interacting with the modeled environment. Model-based deep reinforcement learning combines deep reinforcement learning with planning. It has shown powerful strength in board games and robotics. But for video games, planning seems not to be a necessary component. Though many researchers did a lot of trials, the superior and state-of-the-art performance on video games (e.g. Atari games, StarCraft, DOTA, etc.) are still made by model-free methods. Our work tries to answer the question from two parts. (1) Does planning benefit the learning process? (2) Does planning benefit the evaluation phase?

DarkHex

Bedir Tapkan, Ryan Hayward, Martin Müller.

Computational tools and better bounds for 3x4 DarkHex.

Dark Hex is an imperfect information version of the game Hex. In this project we are examining the strength of certain strategies; developing tools in different ways to understand the game and the ways to play better. There are multiple branches on the project; human generated strategies, their validation, completeness and verification, and AI generated policies (strategies) using state of the art techniques (CFR based approaches, AlphaZero etc.). We are hoping to have a complete report on this scarcely researched game and lay the groundwork for future research on the topic. For more information please check the github repository for the project.

github

Using search for RL

Farnaz Kohankhaki, MSc 2022, Kiarash Aghakasiri, MSc 2022, Ting-han Wei, Martin Müller.

An RL agent can use a model of the environment to do planning and search. The model may be imperfect and have some errors. In this project, we focus on capturing the uncertainty of the model and using it in search, specifically to inform variants of Monte Carlo tree search (MCTS). We use the learned uncertainty to develop modified algorithms for the main four steps of MCTS: selection, expansion, rollout, and backpropagation. In one setting, we assume that the model of the environment is given, but imperfect. The uncertainty of the model is not given and is being learned online. Experiments on MinAtar environments show promising results. Next, we will try to extend our work to the case where the model is not given, and the agent learns the model and the uncertainty of the model together, simultaneously and online.

Evaluating Leela Chess Zero against Endgame Tablebases

Project by Rejwana Haque, MSc 2021.

Leela Chess Zero (Lc0) is an open source AlphaZero for chess. We analyze its performance on exact solutions to chess endgames which are available in tablebases. Main research questions:

How does Lc0 learn?
Which mistakes does it make, and why?
Which can be solved by search?

Publications:

R. Haque. On the Road to Perfection? Evaluating LeelaChess Zero Against Endgame Tablebases. MSc thesis, University of Alberta, 2021.
R. Haque, T.-h. Wei and M. Müller. On the Road to Perfection? Evaluating LeelaChess Zero Against Endgame Tablebases. Advances in Computer Games (ACG 2021).

RL Planning and Batch RL

Project by Chenjun Xiao, PhD 2022.

We have been working on two sub-topics of reinforcement learning (RL): Planning and Batch RL. Planning refers to the problem of computing the optimal policy by interacting with a model of the environment. On this topic, we design a memory structure to exploit online value generalization in MCTS, and develop a new simulation-based planning algorithm that improves upon the worst case efficiency of UCT by taking advantage of maximum entropy value estimation. Batch RL refers to the problem of computing the optimal policy when the data available is fixed and obtained by interacting with the environment using some unknown behavior policy. On this topic, we establish some fundamental results that characterize the limits and possibilities of batch RL, and prove the exponential gap between passive and active data collection in batch RL.

Previous Research Themes

Theme: Heuristic search and learning with imperfect domain models

Most work in heuristic search has taken place in domains where the state space is completely known, at least in an implicit way. In reinforcement learning, both model-free and model-based approaches exist. We are interested in how to design efficient forward search algorithms and learning mechanisms in cases where the model of the state space is missing, incomplete, or partially wrong.

Theme: Integrating exact solution methods with Alpha Zero type architectures in two player games

Most exact game solvers have traditionally used relatively simple but fast heuristics to control their search. Alpha Zero architectures have been extremely successful for heuristic play, and the quality of their heuristics has far outweighed their orders of magnitude slower speed.

Much less is known about how effective these new machine-learned heuristics are for solvers. Can they overcome their severe speed disadvantages in a setting where complete search of all moves is required in order to guarantee that a proof is correct?

Theme: Evaluation of deep RL learning against exact solutions in two player games

Architectures such as AlphaGo and Alpha Zero have brought an unprecedented boost in the playing strength of game programs. But how close to perfect are they? As far as we know, the limits to further improvement have still not been reached. Judging from self-play data, even the best current programs still make mistakes.

Also See

Previous Research Projects. These are good research topics for me, but no one in my group is actively working on them right now. If you have an interest in one of those topics, please talk to me.
Publications
Software
Talks on planning and on Monte Carlo tree search in games.

Created: Jan 20, 1998 Last modified: Mar 5, 2025

Martin Müller