Stone, P., Sutton, R.S. (2001).
Scaling reinforcement learning toward RoboCup soccer. Proceedings
of the 18th International Conference on Machine Learning, pp.
537-544.
Precup, D., Sutton, R.S., Dasgupta, S. (2001).
Off-policy temporal-difference learning with function approximation.
Proceedings of the 18th International Conference on Machine Learning.
Associated talk from 7/1/01.
Stone, P., Sutton, R.S., Singh, S.
(2001).
Reinforcement Learning for 3 vs. 2 Keepaway. In: RoboCup-2000:
Robot Soccer World Cup IV, P. Stone, T. Balch, and G.
Kraetzschmar, Eds., Springer Verlag. An earlier
version appeared in the Proceedings
of the RoboCup-2000 Workshop, Melbourne, Australia.
Sutton, R.S., McAllester, D., Singh, S., Mansour, Y. (2000).
Policy Gradient Methods for Reinforcement Learning with Function
Approximation.
Advances in Neural Information Processing Systems 12
(Proceedings of the 1999 conference), pp. 1057-1063.
MIT Press. An earlier version, as submitted May 1999, appeared as an
AT&T Labs Technical Report. Later we began but never finished
another paper, Comparing
Policy-Gradient Algorithms.
Precup, D., Sutton, R.S., Singh, S. (2000).
Eligibility Traces for Off-Policy Policy Evaluation.
Proceedings of the 17th International Conference on Machine
Learning, pp. 759-766. Morgan Kaufmann.
Sutton, R.S. (1999). Open
theoretical
questions in reinforcement learning. In Proceedings of the
Fourth European Conference on
Computational Learning Theory (Proceedings EuroCOLT'99), pp.
11-17, Fischer, P., Simon, H.U., Eds. Springer-Verlag.
Sutton, R.S. (1999). Reinforcement
Learning. In Rob Wilson and Frank Keil (Eds.) The MIT Encyclopedia of the
Cognitive Sciences,
MIT Press.
Sutton, R.S., Precup, D., Singh, S. (1999).
Between MDPs and semi-MDPs:
A Framework for Temporal Abstraction in Reinforcement Learning. Artificial
Intelligence 112:181-211. An earlier version appeared as
Technical Report 98-74, Department of
Computer
Science, University of Massachusetts, Amherst, MA 01003. April, 1998. Associated talk from 3/5/98.
Humorous excerpts
from the JAIR reviews recommending rejection of this paper.
Sutton, R.S., Singh, S., Precup, D.,
Ravindran, B. (1999).
Improved switching among temporally abstract actions.
Advances in Neural Information Processing Systems 11
(Proceedings of the 1998 conference),
MIT Press. Associated
talk from 12/2/98.
Moll, R., Barto, A.G., Perkins, T. J., Sutton, R.S. (1999).
Learning instance-independent value functions to enhance local search.
Advances in Neural Information Processing Systems 11
(Proceedings of the 1998 conference), pp. 1017-1023.
MIT Press.
Sutton, R. S., Reinforcement learning:
Past, present, and future. Extended abstract in Simulated Evolution and Learning,
McKay, B., Yao, X., Newton, C. S., Kim, J.-H., Furuhashi, T., Eds.
Published as Lecture Notes in
Computer Science 1585, pp. 195–197, Springer, 1999.
Sutton, R.S., Barto, A.G. (1998).
Reinforcement
Learning: An Introduction.
MIT Press.
Sutton, R.S., Precup, D., Singh, S. (1998).
Intra-option learning about temporally abstract actions.
Proceedings of the 15th International Conference on Machine
Learning,
pp. 556-564.
Morgan Kaufmann.
Precup, D., Sutton, R.S. (1998).
Multi-time models for temporally abstract planning.
Advances in Neural Information Processing Systems 10, MIT
Press.
McGovern, A., and Sutton, R.S. (1998).
Macro-actions in reinforcement learning: An empirical analysis.
Technical Report 98-70, University of Massachusetts, Department of
Computer Science.
Precup, D., Sutton, R.S., Singh, S. (1998).
Theoretical results on reinforcement learning with temporally abstract
options.
In Machine Learning: ECML-98, Proceedings of the 10th European
Conference on Machine Learning, Chemnitz, Germany, pp. 382-393.
Springer Verlag.
Santamaria, J.C., Sutton, R.S., Ram, A.
(1998).
Experiments with reinforcement learning in problems with continuous
state and action spaces,
Adaptive Behavior 6(2): 163-218.
An earlier version appeared
as
Technical Report UM-CS-1996-088, Department of Computer Science,
University of Massachusetts, Amherst, MA 01003.
The source code for all the experiments is also available.
McGovern, A., Precup, D., Ravindran, B.,
Singh, S., Sutton, R.S.
(1998). Hierarchical Optimal Control of
MDPs. Proceedings of the Tenth Yale Workshop on Adaptive and
Learning Systems,
pp. 186-191.
Barto, A. G., Sutton, R. S., Reinforcement
learning in artificial
intelligence. In Neural Network Models of Cognition, Donahoe, J.
W.,
Packard Dorsel, V., Eds., pp. 358–386, Elsevier, 1997.
Sutton, R.S. (1997).
On the significance of Markov decision
processes.
In W. Gerstner, A. Germond, M. Hasler, and J.-D. Nicoud (Eds.) Artificial
Neural Networks -- ICANN'97, pp. 273-282. Springer.
Precup, D., Sutton, R.S. (1997).
Multi-time models for
reinforcement learning. Proceedings of the
ICML'97 Workshop on
Modelling in Reinforcement Learning.
Precup, D., Sutton, R.S. (1997).
Exponentiated gradient methods
for
reinforcement learning.
Proceedings of the 14th International Conference on Machine
Learning,
pp. 272-277, Morgan Kaufmann.
McGovern, A., Sutton, R.S., Fagg,
A.H. (1997).
Roles of macro-actions in accelerating
reinforcement learning.
Proceedings of the 1997 Grace Hopper Celebration of Women in
Computing,
pp. 13-17.
Precup, D., Sutton, R.S., Singh, S.P. (1997).
Planning with closed-loop macro actions.
Working notes of the 1997 AAAI Fall Symposium on Model-directed
Autonomous
Systems, pp. 70-76.
Mehra, R.K., Ravichandran, B., Cabrera, J.B.D., Greve, D.N., Sutton,
R.S. (1997).
Towards self-learning
adaptive scheduling for ATM networks, Proceedings of the
36th Conference on Decision and Control, pp. 2393-2398, San Diego,
California USA.
Sutton, R.S. (1996).
Generalization in reinforcement
learning: Successful examples using sparse coarse
coding.
Advances in Neural Information Processing Systems 8
(Proceedings of the 1995 conference),
pp. 1038-1044, MIT Press. Erratum: there is a small error in the equations
of motion of the acrobot; for the correct equations, see the
RL textbook page on the acrobot (thanks to Barry Nichols for
spotting this).
Singh, S.P., Sutton, R.S. (1996).
Reinforcement learning with
replacing eligibility traces. Machine Learning 22:
123-158.
Precup, D., Sutton, R.S. (1996).
Empirical
comparison of gradient descent and exponentiated gradient descent in
supervised and reinforcement learning. Technical Report
UM-CS-1996-070, Department of Computer Science, University of
Massachusetts, Amherst, MA 01003.
Kuvayev, L., Sutton, R.S. (1996).
Model-based reinforcement
learning with an approximate, learned model.
Proceedings of the Ninth Yale Workshop on Adaptive and Learning
Systems,
pp. 101-105, Yale University, New Haven, CT. Some additional results
are in this earlier version of the
same paper.
Mehra, R.K., Ravichandran, B., Sutton, R.S. (1996). Adaptive
intelligent scheduling for ATM networks.
Proceedings of the Ninth Yale Workshop on Adaptive and Learning
Systems,
pp. 106-111, Yale University, New Haven, CT.
Sutton, R.S. (1995). On
the Virtues of Linear Learning and Trajectory Distributions. Proceedings of the Workshop on Value
Function Approximation, Machine Learning Conference.
Sutton, R.S. (1995).
TD models: Modeling the world at a
mixture of time scales.
Proceedings of the Twelfth International Conference on Machine
Learning,
pp. 531-539, Morgan Kaufmann.
Sutton, R.S., Singh, S.P. (1994).
On bias and step size in
temporal-difference learning.
Proceedings of the Eighth Yale Workshop on Adaptive and Learning
Systems, pp. 91-96, Yale University, New Haven, CT.
Sutton, R.S., Whitehead, S.D. (1993).
Online learning with random
representations.
Proceedings of the Tenth International Conference on Machine
Learning, pp. 314-321, Morgan Kaufmann.
Sutton, R.S. (1992).
Adapting bias by gradient
descent: An
incremental version of delta-bar-delta.
Proceedings of the Tenth National Conference on Artificial
Intelligence,
pp. 171-176, MIT Press.
Associated talk from 2/2/04.
Sutton, R.S.
(1992).
Gain adaptation beats least squares? Proceedings of the
Seventh Yale Workshop on Adaptive and Learning Systems, pp.
161-166, Yale University, New Haven, CT.
Sutton, R.S. (1992).
Machines that Learn and
Mimic the Brain.
In ACCESS, GTE's Journal of Science and Technology, 1992.
Reprinted in Stethoscope Quarterly, Spring.
Sutton, R.S. (1992).
Reinforcement learning
architectures.
Proceedings ISKIT'92 International Symposium on Neural Information
Processing,
Fukuoka, Japan.
Sutton, R.S. (Ed.) (1992). Reinforcement
Learning, book
version of a special issue of Machine Learning on
reinforcement learning (Volume 8, Numbers 3/4). Kluwer. Introduction: The challenge of
reinforcement learning.
Sanger, T.D., Sutton, R.S., Matheus, C.J. (1992).
Iterative construction of sparse polynomial approximations.
Advances in Neural Information Processing Systems 4, pp.
1064-1071, Morgan Kaufmann.
Gluck, M., Glauthier, P., Sutton, R.S. (1992).
Adaptation of cue-specific learning rates
in network models of human
category learning, Proceedings of the Fourteenth
Annual Conference of the Cognitive
Science Society, pp. 540-545, Erlbaum.
Sutton,
R.S. (1991).
Planning by incremental dynamic programming.
Proceedings of the Eighth International Workshop on Machine
Learning, pp. 353-357, Morgan Kaufmann.
Sutton, R.S. (1991).
Dyna, an integrated architecture for
learning, planning and reacting.
Working Notes of the 1991 AAAI Spring Symposium on
Integrated Intelligent Architectures and SIGART Bulletin 2,
pp. 160-163.
Sutton, R.S. (1991).
Integrated
modeling and control based on reinforcement
learning and dynamic programming.
In D. S. Touretzky (ed.), Advances in Neural
Information Processing Systems 3, pages 471-478.
Sutton, R.S. (1991).
Reinforcement learning
architectures for animats,
Proceedings of the First International Conference on Simulation
of Adaptive Behavior: From Animals to Animats, pp. 288-296.
Sutton, R.S., Matheus, C.J. (1991).
Learning polynomial functions
by feature construction. Proceedings of the Eighth
International Workshop on Machine Learning, pp. 208-212, Morgan
Kaufmann.
Sutton, R.S., Barto,
A.G., Williams, R. (1991).
Reinforcement learning is
direct adaptive optimal control, Proceedings of the American
Control Conference, pages 2143-2146. Also published in IEEE
Control
Systems Magazine 12, No. 2, 19-22.
Miller, W. T., Sutton, R. S.,
Werbos, P. J. (Eds.), Neural Networks for Control. MIT Press,
1991.
Sutton, R.S. (1990).
Integrated architectures for learning, planning, and reacting based on
approximating dynamic programming.
Proceedings of the Seventh International Conference on Machine
Learning, pp. 216-224, Morgan Kaufmann. Also appeared as
"Artificial intelligence by dynamic programming," in
Proceedings of the Sixth Yale Workshop on Adaptive and Learning Systems,
pp. 89-95.
Sutton, R.S. (1990).
First results with Dyna,
an integrated architecture for
learning, planning, and reacting. In Neural Networks for
Control,
Miller, T., Sutton, R.S., & Werbos, P., Eds., MIT Press.
Sutton,
R.S., Barto, A.G. (1990).
Time-derivative models of pavlovian reinforcement.
In Learning and Computational Neuroscience: Foundations of
Adaptive Networks, M. Gabriel and J. Moore, Eds., pp. 497-537. MIT
Press.
Barto, A.G., Sutton, R.S., Watkins, C.J.C.H. (1990). Learning and
sequential decision making. In Learning and Computational
Neuroscience: Foundations of Adaptive Networks, M. Gabriel and
J.W. Moore, Eds., pp. 539-602, MIT Press.
Barto, A.G., Sutton, R.S., & Watkins, C.
(1990).
Sequential
decision problems and neural networks.
In D. S. Touretzky (ed.), Advances in Neural
Information Processing Systems 2, pp. 686-693.
Whitehead, S., Sutton, R.S., & Ballard, D. (1990).
Advances in reinforcement learning
and their implications for intelligent control, Proceedings
of
the
Fifth IEEE International Symposium on Intelligent Control 1990,
pp. 1289-1297.
Franklin, J., Sutton, R.S., Anderson, C., Selfridge, O., &
Schwartz, D. (1990).
Connectionist learning control at GTE
laboratories, Intelligent
Control and Adaptive Systems, G. Rodriguez, Ed., Proc. SPIE 1196,
pp. 242-253.
Anderson, C., Franklin, J., & Sutton, R.S. (1990).
Learning a nonlinear model
of a manufacturing process using multilayer connectionist networks,
Proceedings of the
Fifth IEEE International Symposium on Intelligent Control 1990,
pp. 404-409.
Sutton, R.S. (1989).
Artificial intelligence as a control
problem: Comments on the
relationship between machine learning and intelligent control.
Appeared in Machine Learning in
a dynamic world.
Proceedings of the IEEE International Symposium on Intelligent Control
1988, pp. 500-507.
Sutton, R.S. (1989).
Implementation details of the
TD(lambda) procedure for the case
of vector predictions and backpropagation.
GTE Laboratories Technical Report TR87-509.1, as corrected August 1989.
GTE Laboratories, 40 Sylvan Road, Waltham, MA 02254.
Franklin, J., Sutton, R.S., & Anderson, C. (1989).
Application of connectionist learning
methods to manufacturing process monitoring, Proceedings of
the
IEEE International Symposium on Intelligent Control 1988, pp.
709-712.
Sutton, R.S. (1988).
Learning to predict by the methods of temporal differences.
Machine Learning 3: 9-44, erratum p. 377.
Sutton, R.S. (1988).
Convergence theory for a new kind
of
prediction learning,
Proceedings of the 1988 Workshop on Computational Learning Theory,
pp. 421-42.
Sutton, R.S. (1988).
NADALINE: A normalized adaptive
linear element that learns efficiently.
GTE Laboratories Technical Report TR88-509.4. GTE Laboratories, 40
Sylvan Road, Waltham, MA 02254.
Selfridge, O., Sutton, R.S., & Anderson, C. (1988).
Selected bibliography on connectionism,
In: Evolution, Learning, and Cognition,
Y.C. Lee (Ed.), pp. 391-403, World Scientific Publishing.
Sutton, R.S., & Barto, A.G. (1987).
A temporal-difference model of
classical conditioning,
Proceedings of the Ninth Annual Conference of the Cognitive
Science Society, pp. 355-378.
Sutton, R.S. (1986).
Two problems with backpropagation and
other
steepest-descent learning procedures for networks,
Proceedings of the Eighth Annual Conference of the Cognitive
Science Society, pp. 823-831.
Reprinted in Artificial Neural Networks: Concepts and
Theory, edited by P. Mehra and B. Wah, IEEE Computer Society
Press,
1992.
Moore, J., Desmond, J., Berthier, N., Blazis,
D., Sutton, R.S.,
& Barto, A.G. (1986). Simulation of
the classically conditioned
nictitating membrane
response by a neuron-like adaptive element: Response topography,
neuronal firing, and interstimulus intervals, Behavioural
Brain
Research 21: 143-154.
Sutton, R.S. (1985).
Learning distributed, searchable,
internal models,
Proceedings of the Distributed Artificial Intelligence Workshop,
pp. 287-289.
Sutton, R.S., & Pinette, B. (1985).
The learning of world models by
connectionist networks, Proceedings of the Seventh Annual
Conference of the Cognitive Science Society, pp. 54-64.
Barto, A.G. & Sutton, R.S. (1985).
Neural problem solving. In:
Synaptic
Modification, Neuron Selectivity, and Nervous System Organization,
W.B. Levy & J.A. Anderson (Eds.), pp. 123-152.
Lawrence Erlbaum.
Selfridge, O., Sutton, R.S., &
Barto, A.G. (1985).
Training and tracking in robotics,
Proceedings of the Ninth
International Joint Conference on Artificial Intelligence, pp.
670-672.
Moore, J., Desmond, J., Berthier, N., Blazis, D., Sutton, R.S., &
Barto, A.G. (1985).
Connectionist learning in real
time: Sutton-Barto
adaptive element and classical conditioning of the nictitating membrane
response, Proceedings of the Seventh Annual
Conference of the Cognitive Science Society, pp. 318-322.
Sutton, R.S. (1984).
Temporal credit assignment in
reinforcement learning. Ph.D.
dissertation, Department of Computer Science,
University of Massachusetts, Amherst, MA 01003.
Published as COINS Technical Report 84-2.
Barto, A.G., Sutton, R.S., & Anderson,
C. (1983).
Neuron-like adaptive
elements that can solve difficult learning control problems,
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:
834-846.
Sutton, R.S. (1982 - unpublished draft).
A theory of salience change dependent
on the relationship between discrepancies
on successive trials on which the stimulus is present.
Barto, A.G. & Sutton, R.S. (1982).
Simulation of anticipatory
responses in classical conditioning by a neuron-like adaptive element,
Behavioural Brain Research 4:221-235.
Barto, A.G., Anderson, C., &
Sutton, R.S. (1982). Synthesis of
nonlinear control
surfaces by a layered associative network, Biological
Cybernetics 43:175-185.
Barto, A.G., Sutton, R.S., & Anderson, C. (1982).
Spatial learning simulation
systems, Proceedings of the 10th IMACS World Congress on
Systems
Simulation and Scientific Computation, pp. 204-206.
Sutton, R.S. (1981). Adaptation of learning rate parameters.
In: Goal Seeking Components for
Adaptive Intelligence: An Initial Assessment,
by A. G. Barto and R. S. Sutton. Air Force Wright Aeronautical
Laboratories Technical Report AFWAL-TR-81-1070. Wright-Patterson Air
Force Base, Ohio 45433.
Sutton, R.S., & Barto, A.G.
(1981).
Toward a modern theory of
adaptive networks: Expectation and prediction, Psychological
Review 88:135-140. A Spanish translation by G.
Ruiz is to appear in the journal Estudios de Psicologia.
Sutton, R.S., & Barto,
A.G. (1981).
An adaptive network that
constructs and uses
an internal model of its world, Cognition and Brain Theory 4:217-246.
Barto, A.G., Sutton, R.S. (1981).
Goal seeking components for
adaptive intelligence: An initial
assessment.
Air Force Wright Aeronautical Laboratories Technical Report
AFWAL-TR-81-1070.
Wright-Patterson Air Force Base, Ohio 45433. 524 pages. (Appendix C is available separately.)
Barto, A.G., Sutton, R.S., &
Brouwer, P. (1981).
Associative search network:
A reinforcement learning associative memory, Biological
Cybernetics 40:201-211.
Barto, A.G. & Sutton, R.S. (1981).
Landmark learning: An
illustration of associative search,
Biological Cybernetics 42:1-8.
Sutton, R.S. (1978).
Single channel theory: A neuronal
theory of learning,
Brain Theory Newsletter 3, No. 3/4, pp. 72-75.
(earliest publication)
Sutton, R.S. (1978 -
unpublished).
A unified theory of expectation in
classical and instrumental
conditioning.
Bachelor's thesis, Stanford University.
Sutton, R.S. (1978 - unpublished).
Learning theory support for a
single channel theory of the brain.