Teaching

Current

CMPUT 628: Deep Reinforcement Learning - W25

Past

CMPUT 365: Introduction to Reinforcement Learning - F24

Lectures: Mon/Wed/Fri 13:00 - 13:50, ESB 3-27.

Link to course's page on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 13:00 - 15:00 (ATH 3-08)
Prabhat Nagarajan
cmput365@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 2-50)
Lucas Cruz
cmput365@ualberta.ca
Office hours: Tue 10:00 - 12:00 (CAB 3-13)

Harshil Kotamreddy
cmput365@ualberta.ca
Office hours: Wed 10:00 - 12:00 (CAB 3-13)
Mohamed Mohamed
cmput365@ualberta.ca
Office hours: Thu 10:00 - 12:00 (CAB 3-13)
Marcos Menon José
cmput365@ualberta.ca
Office hours: Fri 10:00 - 12:00 (CAB 3-13)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview [pdf]
Lecture 2: Background review [pdf]

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood

Lecture 3: Bandits I [pdf]
Lecture 4: Bandits II [pdf]
Lecture 5: MDPs I [pdf]
Lecture 6: MDPs II [pdf]
Lecture 7: Value Functions & Bellman Equations I [pdf]
Lecture 8: Value Functions & Bellman Equations II [pdf]
Lecture 9: Value Functions & Bellman Equations III [pdf]
Lecture 10: Dynamic Programming I [pdf]
Lecture 11: Dynamic Programming II [pdf]
Lecture 12: Q&A: General Overview [pdf]
Lecture 13: Monte Carlo Methods I [pdf]
Lecture 14: Monte Carlo Methods II [pdf]
Lecture 15: Monte Carlo Methods III [pdf]
Lecture 16: Temporal-difference Learning for Prediction I [pdf]
Lecture 17: Temporal-difference Learning for Prediction II [pdf]
Lecture 18: Temporal-difference Learning for Prediction III [pdf]
Lecture 19: Temporal-difference Learning for Control I [pdf]
Lecture 20: Temporal-difference Learning for Control II [pdf]
Lecture 21: Planning and Learning with Tabular Methods I [pdf]
Lecture 22: Planning and Learning with Tabular Methods II [pdf]
Lecture 23: RL with Function Approximation I [pdf]
Lecture 24: RL with Function Approximation II [pdf]
Lecture 25: RL with Function Approximation III [pdf]
Lecture 26: Feature Construction for RL I [pdf]
Lecture 27: Feature Construction for RL II [pdf]
Lecture 28: Feature Construction for RL III [pdf]
Lecture 29: Control with Function Approximation I [pdf]
Lecture 30: Control with Function Approximation II [pdf]
Lecture 31: Control with Function Approximation III [pdf]
Lecture 32: Policy Gradient Methods I [pdf]
Lecture 33: Policy Gradient Methods II [pdf]
Lecture 34: Policy Gradient Methods III [pdf]
Lecture 35: Guest Lecture by Rich Sutton

CMPUT 365: Introduction to Reinforcement Learning - F23

Lectures: Mon/Wed/Fri 13:00 - 13:50, SAB 4-36.

Link to course's page on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 10:00 - 12:00 (ATH 3-08)
Anna Hakhverdyan
cmput365@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 3-50)
David Szepesvari
cmput365@ualberta.ca
Office hours: Tue 13:00 - 15:00 (CSC 3-50)

Bryan Chan
cmput365@ualberta.ca
Office hours: Wed 14:00 - 16:00 (CAB 3-13)
Gábor Mihucz
cmput365@ualberta.ca
Office hours: Wed 9:15 - 11:15 (CAB 3-13)
Marcos Menon José
cmput365@ualberta.ca
Office hours: Fri 10:00 - 12:00 (CAB 3-13)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview
Lecture 2: Background review I
Lecture 3: Background review II

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood

Lecture 4: Bandits I
Lecture 5: Bandits II
Lecture 6: MDPs I
Lecture 7: MDPs II
Lecture 8: MDPs III
Lecture 9: Value Functions & Bellman Equations I
Lecture 10: Value Functions & Bellman Equations II
Lecture 11: Value Functions & Bellman Equations III
Lecture 12: Dynamic Programming I
Lecture 13: Dynamic Programming II
Lecture 14: Dynamic Programming III
Lecture 15: Monte Carlo Methods I
Lecture 16: Monte Carlo Methods II
Lecture 17: Monte Carlo Methods III
Lecture 18: Temporal-difference Learning for Prediction I
Lecture 19: Temporal-difference Learning for Prediction II
Lecture 20: Temporal-difference Learning for Prediction III
Lecture 21: Temporal-difference Learning for Control I
Lecture 22: Temporal-difference Learning for Control II
Lecture 23: Temporal-difference Learning for Control III
Lecture 24: Planning and Learning with Tabular Methods I
Lecture 25: Planning and Learning with Tabular Methods II
Lecture 26: Planning and Learning with Tabular Methods III
Lecture 27: RL with Function Approximation I
Lecture 28: RL with Function Approximation II
Lecture 29: RL with Function Approximation III [Recorded - Unlisted on YouTube]
Lecture 30: Feature Construction for RL I
Lecture 31: Feature Construction for RL II
Lecture 32: Feature Construction for RL III
Lecture 33: Guest Lecture by Rich Sutton
Lecture 34: Control with Function Approximation I
Lecture 35: Control with Function Approximation II

CMPUT 655: Reinforcement Learning I - F23

Lectures: Fri 14:00 - 16:50, ETLC E2-001.

Link to course's page on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 14:00 - 16:00 (ATH 3-08)
Anna Hakhverdyan
cmput655@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 3-50)
David Szepesvari
cmput655@ualberta.ca
Office hours: Tue 13:00 - 15:00 (CSC 3-50)

Bryan Chan
cmput655@ualberta.ca
Office hours: Wed 14:00 - 16:00 (CAB 3-13)
Gábor Mihucz
cmput655@ualberta.ca
Office hours: Wed 9:15 - 11:15 (CAB 3-13)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview & background review [pdf]

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood
The Matrix Cookbook by K. B. Petersen and M. S. Pedersen

Lecture 2: An introduction to sequential decision-making (Bandits) [pdf]

Additional reading:

P. Auer, N. Cesa-Bianchi, P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2-3): 235-256 (2002) [link]
Blog posts with different presentations of the proof of UCB1's regret bound, by Ann He and Jeremy Kun
Lecture by Balaraman Ravindran going over the proof of UCB1's regret bound

Lecture 3: Markov Decision Processes, and Value Functions [pdf]

Additional reading:

T. Wang, D. Lizotte, M. Bowling, D. Schuurmans: Dual Representations for Dynamic Programming. [link]

Lecture 4: Dynamic Programming and Monte Carlo Prediction [pdf]
Lecture 5: Temporal-Difference Learning [pdf]
Lecture 6: Multi-step TD, GVFs, and Planning [pdf]

Additional reading:

R. S. Sutton, J. Modayil, M. Delp, T. Degris, P. M. Pilarski, A. White, D. Precup: Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. AAMAS 2011. [link]
L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning. ECML 2006. [link]

Lecture 7: TD Learning with Function Approximation I [pdf]
Lecture 8: TD Learning with Function Approximation II [pdf] and guest lecture by Andy Patterson
Lecture 9: TD Learning with Function Approximation III and Eligibility Traces [pdf]
Lecture 9.5: Algorithm Selection and Evaluation in RL [pdf]
Lecture 10: Policy Gradient Methods [Recorded - Unlisted on YouTube]
Lecture 11: Deep Reinforcement Learning I [pdf] and guest lecture by Rich Sutton
Lecture 12: Deep Reinforcement Learning II [pdf]