Sriram Srinivasan, Erik Talvitie, and Michael Bowling. Improving Exploration in UCT Using Local Manifolds. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI), 2015. To Appear
Monte-Carlo planning has been proven successful in many sequential decision-making settings, but it suffers from poor exploration when the rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. We show that this algorithm, like UCT, converges asymptotically to the optimal action. When the state space does not have a natural distance metric, we show how we can learn a local manifold from the transition graph of states in the near future. to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
@InProceedings(15aaai-mnnuct, Title = "Improving Exploration in {UCT} Using Local Manifolds", Author = "Sriram Srinivasan and Erik Talvitie and Michael Bowling", Booktitle = "Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI)", Year = "2015", Note = "To Appear", AcceptRate = "27%", AcceptNumbers = "531 of 1991" )