Sigma Point Policy Iteration

Michael Bowling, Alborz Geramifard, and David Wingate. Sigma Point Policy Iteration. In Proceedings of the Seventh International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pp. 379–386, 2008.

Download

[PDF] 

Abstract

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear value function approximation. These algorithms rely on policy-dependent expectations of the transition and reward functions, which require all experience to be remembered and iterated over for each new policy evaluated. We propose to summarize experience with a compact policy-independent Gaussian model. We show how this policy-independent model can be transformed into a policy-dependent form and used to perform policy evaluation. Because closed-form transformations are rarely available, we introduce an efficient sigma point approximation. We show that the resulting Sigma-Point Policy Iteration algorithm (SPPI) is mathematically equivalent to LSPI for tabular representations and empirically demonstrate comparable performance for approximate representations. However, the experience does not need to be saved or replayed, meaning that for even moderate amounts of experience, SPPI is an order of magnitude faster than LSPI.
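The sigma point approximation mentioned in the abstract belongs to the family of unscented transforms, which propagate a small deterministic set of points through a nonlinearity to estimate the moments of the transformed Gaussian. As a rough illustration only (not the paper's multivariate formulation), here is a minimal 1-D sketch; the scaling parameters `alpha` and `kappa` follow standard unscented-transform conventions and are assumptions, not values from the paper:

```python
import math

def unscented_transform_1d(mean, var, f, alpha=1.0, kappa=2.0):
    """Approximate the mean and variance of f(X) for X ~ N(mean, var)
    using three deterministically chosen sigma points (1-D sketch)."""
    n = 1  # state dimension in this toy example
    lam = alpha ** 2 * (n + kappa) - n          # standard scaling term
    spread = math.sqrt((n + lam) * var)          # distance of outer points
    points = [mean, mean + spread, mean - spread]
    weights = [lam / (n + lam),
               1.0 / (2.0 * (n + lam)),
               1.0 / (2.0 * (n + lam))]
    # Propagate each sigma point through the nonlinearity, then
    # recombine with the weights to estimate the output moments.
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

# For a linear map the transform is exact: f(x) = 2x + 1 applied to
# N(3, 4) gives mean 2*3 + 1 = 7 and variance 4 * 4 = 16.
m, v = unscented_transform_1d(3.0, 4.0, lambda x: 2 * x + 1)
```

The key property exploited here (and, per the abstract, in SPPI's transformation of the policy-independent Gaussian model) is that a handful of sigma points can stand in for the full distribution, avoiding any replay of stored experience.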

BibTeX

@InProceedings(08aamas-sppi,
  Title = "Sigma Point Policy Iteration",
  Author = "Michael Bowling and Alborz Geramifard and David Wingate",
  Booktitle = "Proceedings of the Seventh International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS)",
  Year = "2008",
  Pages = "379--386",
  AcceptRate = "22\%",
  AcceptNumbers = "142 of 640"
)

Generated by bib2html.pl (written by Patrick Riley) on Fri Feb 13, 2015 15:54:28