Sriram Srinivasan, Erik Talvitie, and Michael Bowling. Improving Exploration in UCT Using Local Manifolds. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI), 2015. To Appear
Monte-Carlo planning has been proven successful in many sequential decision-making settings, but it suffers from poor exploration when the rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. We show that this algorithm, like UCT, converges asymptotically to the optimal action. When the state space does not have a natural distance metric, we show how we can learn a local manifold from the transition graph of states in the near future. to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
@InProceedings(15aaai-mnnuct,
Title = "Improving Exploration in {UCT} Using Local Manifolds",
Author = "Sriram Srinivasan and Erik Talvitie and Michael Bowling",
Booktitle = "Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI)",
Year = "2015",
Note = "To Appear",
AcceptRate = "27%",
AcceptNumbers = "531 of 1991"
)