Teaching
CMPUT 365: Introduction to Reinforcement Learning - F23
Lectures: Mon/Wed/Fri 13:00 - 13:50, SAB 4-36.
Link to course's page on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].
Instruction team:
- Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 10:00 - 12:00 (ATH 3-08)
- Anna Hakhverdyan
cmput365@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 3-50)
- David Szepesvari
cmput365@ualberta.ca
Office hours: Tue 13:00 - 15:00 (CSC 3-50)
- Bryan Chan
cmput365@ualberta.ca
Office hours: Wednesday 14:00 - 16:00 (CAB 3-13)
- Gábor Mihucz
cmput365@ualberta.ca
Office hours: Wednesday 9:15 - 11:15 (CAB 3-13)
- Marcos Menon José
cmput365@ualberta.ca
Office hours: Friday 10:00 - 12:00 (CAB 3-13)
Lecture slides:
See Syllabus for more information.
- Lecture 1: Course overview [pdf]
- Lecture 2: Background review I [pdf]
- Lecture 3: Background review II [pdf] Additional reading:
- Lecture 4: Bandits I [pdf]
- Lecture 5: Bandits II [pdf]
- Lecture 6: MDPs I [pdf]
- Lecture 7: MDPs II [pdf]
- Lecture 8: MDPs III [pdf]
- Lecture 9: Value Functions & Bellman Equations I [pdf]
- Lecture 10: Value Functions & Bellman Equations II [pdf]
- Lecture 11: Value Functions & Bellman Equations III [pdf]
- Lecture 12: Dynamic Programming I [pdf]
- Lecture 13: Dynamic Programming II [pdf]
- Lecture 14: Dynamic Programming III [pdf]
- Lecture 15: Monte Carlo Methods I [pdf]
- Lecture 16: Monte Carlo Methods II [pdf]
- Lecture 17: Monte Carlo Methods III [pdf]
- Lecture 18: Temporal-difference Learning for Prediction I [pdf]
- Lecture 19: Temporal-difference Learning for Prediction II [pdf]
- Lecture 20: Temporal-difference Learning for Prediction III [pdf]
- Lecture 21: Temporal-difference Learning for Control I [pdf]
- Lecture 22: Temporal-difference Learning for Control II [pdf]
- Lecture 23: Temporal-difference Learning for Control III [pdf]
- Lecture 24: Planning and Learning with Tabular Methods I [pdf]
- Lecture 25: Planning and Learning with Tabular Methods II [pdf]
- Lecture 26: Planning and Learning with Tabular Methods III [pdf]
- Lecture 27: RL with Function Approximation I [pdf]
- Lecture 28: RL with Function Approximation II [pdf]
- Lecture 29: RL with Function Approximation III [Recorded - Unlisted on YouTube]
- Lecture 30: Feature Construction for RL I [pdf]
- Lecture 31: Feature Construction for RL II [pdf]
- Lecture 32: Feature Construction for RL III [pdf]
- Lecture 33: Guest Lecture by Rich Sutton
- Lecture 34: Control with Function Approximation I [pdf]
- ...
CMPUT 655: Reinforcement Learning I - F23
Lectures: Fri 14:00 - 16:50, ETLC E2-001.
Link to course's page on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].
Instruction team:
- Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 14:00 - 16:00 (ATH 3-08)
- Anna Hakhverdyan
cmput655@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 3-50)
- David Szepesvari
cmput655@ualberta.ca
Office hours: Tue 13:00 - 15:00 (CSC 3-50)
- Bryan Chan
cmput655@ualberta.ca
Office hours: Wednesday 14:00 - 16:00 (CAB 3-13)
- Gábor Mihucz
cmput655@ualberta.ca
Office hours: Wednesday 9:15 - 11:15 (CAB 3-13)
Lecture slides:
See Syllabus for more information.
- Lecture 1: Course overview & background review [pdf]
  Additional reading:
  - Probabilities and Expectations by A. Rupam Mahmood
  - The Matrix Cookbook by K. B. Petersen and M. S. Pedersen
- Lecture 2: An introduction to sequential decision-making (Bandits) [pdf]
  Additional reading:
  - P. Auer, N. Cesa-Bianchi, P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2-3): 235-256 (2002) [link]
  - Blog posts with different presentations of the proof of UCB1's regret bound, by Ann He and Jeremy Kun
  - Lecture by Balaraman Ravindran going over the proof of UCB1's regret bound
- Lecture 3: Markov Decision Processes, and Value Functions [pdf]
  Additional reading:
  - T. Wang, D. Lizotte, M. Bowling, D. Schuurmans: Dual Representations for Dynamic Programming. [link]
- Lecture 4: Dynamic Programming and Monte Carlo Prediction [pdf]
- Lecture 5: Temporal-Difference Learning [pdf]
- Lecture 6: Multi-step TD, GVFs, and Planning [pdf]
  Additional reading:
  - R. S. Sutton, J. Modayil, M. Delp, T. Degris, P. M. Pilarski, A. White, D. Precup: Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. AAMAS 2011. [link]
  - L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning. ECML 2006. [link]
- Lecture 7: TD Learning with Function Approximation I [pdf]
- Lecture 8: TD Learning with Function Approximation II [pdf] and guest lecture by Andy Patterson
- Lecture 9: TD Learning with Function Approximation III and Eligibility Traces [pdf]
- Lecture 9.5: Algorithm Selection and Evaluation in RL [pdf]
- Lecture 10: Policy Gradient Methods [Recorded - Unlisted on YouTube]
- Lecture 11: Deep Reinforcement Learning I [pdf] and guest lecture by Rich Sutton
- ...