I've written a chapter on reinforcement learning in continuous spaces that is far more extensive than the following pages. It can be found in my publications and there is also a HTML version.

The aforementioned chapter covers value-approximation and policy-approximation techniques to solve (large or continuous) MDPs. It includes discussions on temporal-difference learning, policy-gradients, (natural) actor-critic methods, and direct policy-search techniques such as evolutionary algorithms.

The pages below on online temporal-difference methods will remain online as a reference. Some of the algorithms on these pages are not covered in the chapter.

Implementations are now available. See also this page.

This page aims to give a short introduction to some well known and less well known reinforcement learning algorithms.

This page intends to reflect my personal experience with and knowledge about these algorithms. Therefore, it is undoubtably incomplete. If you feel some of the information on this page is incorrect, or something important is missing, please contact me. This page will probably be a continuing work in progress, so the information may be subject change and restructuring.

This page mainly focuses on the different algorithms and is mainly aimed at researchers and developers that are interested in the different available options. All these algorithms can be used to solve Markov Decision Processes (MDPs) with arbitrary (bounded) reward and transition functions. For an introduction to reinforcement learning and the types of problems that fall into this category, I refer to the excellent book by Sutton and Barto.

All the algorithms on this page use some type of value functions. This means that algorithms that search directly in policy space are not included at this time. Also, all the algorithms are model free, which implies that for instance dynamic programming as a technique is missing. I may or may not include these later.

All the pages in this section have a section with selected relevant publications. Usually these sections only contain two or three papers were more information can be found. These lists are not complete. If you feel important references are missing at some points, please contact me and I'll be happy to comsider adding these. However, I aim to keep the number of references at a minimum to avoid giving pointers in too many directions at once.

This page intends to give an overview of some of the available options. It has been shown that in some cases the conventional algorithms do not reach the same performance as some of the newer options. Also, in ensemble methods, one may wish to include different types of algorithms and indeed we have shown combinations of agents that use different algorithms usually perform better than combinations of agents that use the same algorithm. As a final point, we believe it to be useful to investigate the properties of a variety of algorithms in order to facilitate the selection of a good algorithm for a problem, based on an analysis of the problem and the known characteristics of the algorithms instead of empirically trying many options or - even worse - just going with the best known algorithm.

**Selected relevant publications:**

- R.S. Sutton and A.G. Barto (1998), Reinforcement Learning: An Introduction, MIT Press, 1998.
- Marco Wiering and Hado van Hasselt (2008). "Ensemble Algorithms in Reinforcement Learning". In: IEEE Transactions, SMC Part B, special issue on Adaptive Dynamic Programming and Reinforcement Learning in Feedback Control. August 2008. pp. 930-936.

- Notation
- Using only state-action values:
- Also using state values:
- Using continuous actions: