Tao Wang, Daniel Lizotte, Michael Bowling, and Dale Schuurmans. Stable Dual Dynamic Programming. In Advances in Neural Information Processing Systems 20 (NIPS), pp. 713–720, 2008.
Recently, we have introduced a novel approach to dynamic programming and reinforcement learning that is based on maintaining explicit representations of stationary distributions instead of value functions. In this paper, we investigate the convergence properties of these dual algorithms both theoretically and empirically, and show how they can be scaled up by incorporating function approximation.
@InProceedings(07nips-dualrl,
  Title = "Stable Dual Dynamic Programming",
  Author = "Tao Wang and Daniel Lizotte and Michael Bowling and Dale Schuurmans",
  Booktitle = "Advances in Neural Information Processing Systems 20 (NIPS)",
  Year = "2008",
  Pages = "713--720",
  AcceptRate = "22\%",
  AcceptNumbers = "217 of 975"
)