Our research group is interested in developing efficient search methods for hard problems. Many of the methods also involve machine learning techniques. Our main application areas are games.
Members of our group are recently or currently working on these topics. Many of them align with the broader themes linked above. The main contact(s) for each project are in bold.
Ting-han Wei, Owen Randall, Ryan Hayward, Martin Müller.
Go on small boards up to 5x5 and 5x6 has been solved by Erik van der Werf. We are currently developing a strong small board Go solver that incorporates game-specific knowledge and our efficient solution to the graph history interaction problem. The goal is to solve 6x6 Go using both alpha-beta search and depth first proof number (df-pn) search. The strong solver may also leverage the improved safety solver for Go (also an ongoing project in the group), and machine learned heuristics. This project is closely related to previous projects on efficient search methods for Depth-first Proof Number Search, and on solving the Graph History Interaction (GHI) Problem.
This project partially belongs to theme 2.
Opening books have long been used to improve game-playing programs. They can also be used to speed up the exact solution to a game. For example, they have been used in the proof that checkers is a draw.
In August 2021, David J Wu ("lightvector"), the main author of the strong open source Go program KataGo, published a very detailed opening book for 7x7 Go that was created using this open source program. We would like to expand upon this work to improve the state of the art in Go solving.
To help the Go Solver effort, this project will build an opening book for 6x6. It could be either based on KataGo, or on a new network that can be trained to predict the cost of solving a given Go state.
Owen Randall, Ting-han Wei, Ryan Hayward, Martin Müller.
Solving the outcome of Go board position using search requires the generation and evaluation of possible lines of play. The number of states to evaluate grows exponentially with the search depth. A static safety solver can speed up this process by eliminating or reducing the amount of search required to solve many Go boards by determining the winner of many states without search.
In Go, the player with more safe stones and territory wins in the end. We design algorithms that determine whether blocks of stones can be captured, and whether attacker stones can live in defender enclosed regions. We build an improved static safety solver that can further reduce the amount of search required by considering alternating play on intersection points which partition empty regions, as these points can be critical to the life of blocks of stones. We also plan to further improve static safety by creating a precomputed database of regions that records important properties of these regions which cannot be found statically.
MuZero is the State of the Art algorithm for general game playing supporting Go, Shogi, chess, and Atari games. However, DeepMind's original MuZero is not publicly available, and current open-source implementations are inefficient. We implement a distributed version of MuZero with more efficient sample generation and MCTS inference. This project will lay a solid foundation for future research of planning with a learned model.
This project belongs to theme 1.
Henry Du, Ting-han Wei, Martin Müller.
Create more and improved demo programs for courses such as Cmput 455, which are simple yet functional. For example, an Alpha Zero demo, and a simple neural net for TicTacToe.
Farnaz Kohankhaki, Kiarash Aghakasiri, Ting-han Wei, Martin Müller.
An RL agent can use a model of the environment to do planning and search. The model may be imperfect and have some errors. In this project, we focus on capturing the uncertainty of the model and using it in search, specifically to inform variants of Monte Carlo tree search (MCTS). We use the learned uncertainty to develop modified algorithms for the main four steps of MCTS: selection, expansion, rollout, and backpropagation. In one setting, we assume that the model of the environment is given, but imperfect. The uncertainty of the model is not given and is being learned online. Experiments on MinAtar environments show promising results. Next, we will try to extend our work to the case where the model is not given, and the agent learns the model and the uncertainty of the model together, simultaneously and online.
This project belongs to theme 1.
Abnormal states in deep reinforcement learning are states that are beyond the scope of an RL policy. Such states may make the RL system unsafe and impede its deployment in real scenarios.We propose a simple yet effective anomaly detection framework for deep RL algorithms that simultaneously considers random, adversarial and out-of-distribution~(OOD) state outliers. This simple unified anomaly detection paves the way towards deploying safe RL systems in real-world applications.
This project relates to theme 1.
Planning is any process that takes a model as input and produces or improves a policy for interacting with the modeled environment. Model-based deep reinforcement learning combines deep reinforcement learning with planning. It has shown powerful strength in board games and robotics. But for video games, planning seems not to be a necessary component. Though many researchers did a lot of trials, the superior and state-of-the-art performance on video games (e.g. Atari games, StarCraft, DOTA, etc.) are still made by model-free methods. Our work tries to answer the question from two parts. (1) Does planning benefit the learning process? (2) Does planning benefit the evaluation phase?
This project relates to theme 1.
Bedir Tapkan, Ryan Hayward, Martin Müller.
Computational tools and better bounds for 3x4 DarkHex.
Dark Hex is an imperfect information version of the game Hex. In this project we are examining the strength of certain strategies; developing tools in different ways to understand the game and the ways to play better. There are multiple branches on the project; human generated strategies, their validation, completeness and verification, and AI generated policies (strategies) using state of the art techniques (CFR based approaches, AlphaZero etc.). We are hoping to have a complete report on this scarcely researched game and lay the groundwork for future research on the topic. For more information please check the github repository for the project.
We have been working on two sub-topics of reinforcement learning (RL): Planning and Batch RL. Planning refers to the problem of computing the optimal policy by interacting with a model of the environment. On this topic, we design a memory structure to exploit online value generalization in MCTS, and develop a new simulation-based planning algorithm that improves upon the worst case efficiency of UCT by taking advantage of maximum entropy value estimation. Batch RL refers to the problem of computing the optimal policy when the data available is fixed and obtained by interacting with the environment using some unknown behavior policy. On this topic, we establish some fundamental results that characterize the limits and possibilities of batch RL, and prove the exponential gap between passive and active data collection in batch RL.
This project relates to theme 1.
Jiuqi Wang, Jonathan Schaeffer, Martin Müller.
After many years of research, the game of checkers was solved by a group led by Jonathan Schaeffer.
We believe that modern deep Rl approaches such as Alpha Zero would also do really well in learning how to play checkers. The main research question in this project is wether such a program can be used to greatly reduce the storage requirements for endgame tablebases in checkers.
After learning a strong neural network, the endgame tablebases could be analyzed and reduced to store only the exceptions.
An alternative approach would be to use supervised learning algorithms that focus directly on fixing the wrong examples in the tablebases, rather than strong overall checkers play.
This project is part of the theme Evaluation of deep RL learning against exact solutions in two player games
Most work in heuristic search has taken place in domains where the state space is completely known, at least in an implicit way. In reinforcement learning, both model-free and model-based approaches exist. We are interested in how to design efficient forward search algorithms and learning mechanisms in cases where the model of the state space is missing, incomplete, or partially wrong.
Most exact game solvers have traditionally used relatively simple but fast heuristics to control their search. Alpha Zero architectures have been extremely successful for heuristic play, and the quality of their heuristics has far outweighed their orders of magnitude slower speed.
Much less is known about how effective these new machine-learned heuristics are for solvers. Can they overcome their severe speed disadvantages in a setting where complete search of all moves is required in order to guarantee that a proof is correct?
Architectures such as AlphaGo and Alpha Zero have brought an unprecedented boost in the playing strength of game programs. But how close to perfect are they? As far as we know, the limits to further improvement have still not been reached. Judging from self-play data, even the best current programs still make mistakes.
We want to study ways of scaling up search algorithms for combinatorial games, for example by using approximations to temperature, or heuristic local searches. Also see our previous work on algorithms in CGT.
Created: Jan 20, 1998 Last modified: Apr 10, 2022Martin Müller