Compact, Convex Upper Bound Iteration for Approximate POMDP Planning

Tao Wang, Pascal Poupart, Michael Bowling, and Dale Schuurmans. Compact, Convex Upper Bound Iteration for Approximate POMDP Planning. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), pp. 1245–1251, 2006.

Download

[PDF] 

Abstract

Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision-making problems under uncertainty. Unfortunately, even approximate planning in POMDPs is known to be hard, and developing heuristic planners that deliver reasonable results in practice has proved a significant challenge. In this paper, we present a new approach to approximate value iteration for POMDP planning that is based on quadratic rather than piecewise-linear function approximators. Specifically, we approximate the optimal value function by a convex upper bound composed of a fixed number of quadratics, and optimize it at each stage by semidefinite programming. We demonstrate that our approach achieves approximation quality competitive with current techniques while maintaining a bounded-size representation of the function approximator. Moreover, an upper bound on the optimal value function can be preserved if required. Overall, the technique requires computation time and space that are only linear in the number of iterations (horizon time).

BibTeX

@InProceedings(06aaai-qppomdp,
  title = "Compact, Convex Upper Bound Iteration for Approximate {POMDP} Planning",
  author = "Tao Wang and Pascal Poupart and Michael Bowling and Dale Schuurmans",
  booktitle = "Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI)",
  year = "2006",
  pages = "1245--1251",
  AcceptRate = "30\%",
  AcceptNumbers = "236 of 774"
)

Generated by bib2html.pl (written by Patrick Riley) on Fri Feb 13, 2015 15:54:29