Improving Exploration in UCT Using Local Manifolds

Sriram Srinivasan, Erik Talvitie, and Michael Bowling. Improving Exploration in UCT Using Local Manifolds. In Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI), 2015. To Appear

Download

[PDF] 

Abstract

Monte-Carlo planning has been proven successful in many sequential decision-making settings, but it suffers from poor exploration when the rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. We show that this algorithm, like UCT, converges asymptotically to the optimal action. When the state space does not have a natural distance metric, we show how we can learn a local manifold from the transition graph of states in the near future. to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.

BibTeX

@InProceedings(15aaai-mnnuct,
  Title = "Improving Exploration in {UCT} Using Local Manifolds",
  Author = "Sriram Srinivasan and Erik Talvitie and Michael Bowling",
  Booktitle = "Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI)",
  Year = "2015",
  Note = "To Appear",
  AcceptRate = "27%",
  AcceptNumbers = "531 of 1991"
)

Generated by bib2html.pl (written by Patrick Riley) on Fri Feb 13, 2015 15:54:26