Teaching

Current

CMPUT 628: Deep Reinforcement Learning - W26

CMPUT 365: Introduction to Reinforcement Learning - W26

Lectures: Mon/Wed/Fri 13:00 - 13:50, TEL 150.

Links to the course's pages on Canvas and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: After class
Diego Gomez
cmput365@ualberta.ca
Office hours: TBD (TBD)
Parham Panahi
cmput365@ualberta.ca
Office hours: TBD (TBD)
Sai Shashank Gunuputi
cmput365@ualberta.ca
Office hours: TBD (TBD)
Siddarth Chandrasekar
cmput365@ualberta.ca
Office hours: TBD (TBD)
Lucas Cruz
cmput365@ualberta.ca
Office hours: TBD (TBD)
Aaron Ma
cmput365@ualberta.ca
Office hours: TBD (TBD)
Yuyang Wang
cmput365@ualberta.ca
Office hours: TBD (TBD)
Daria Brovchenko
cmput365@ualberta.ca
Office hours: TBD (TBD)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview [pdf]
Lecture 2: Background review [pdf]
Lectures 3-4: Bandits [pdf]
Lectures 5-6: MDPs [pdf]
Lectures 7-9: Value Functions and Bellman Equations [pdf]
Lectures 10-11: Dynamic Programming [pdf]
Lectures 12-14: Monte Carlo Methods [pdf]
Lectures 15-17: TD Learning for Prediction [pdf]
Lectures 18-19: TD Learning for Control [pdf]
Lectures 20-21: Planning and Learning with Tabular Methods [pdf]
Lectures 22-24: TD with Function Approximation [pdf]
Lectures 25-27: Feature Construction [pdf]
Lectures 28-30: Control with FA [pdf]
Lectures 31-33: Policy Gradient [pdf]
Lecture 34: Guest Lecture by Rich Sutton
Lecture 35: Q&A

Past

CMPUT 365: Introduction to Reinforcement Learning - F25

Lectures: Mon/Wed/Fri 13:00 - 13:50, ESB 3-27.

Links to the course's pages on Canvas and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: After class
Amirhossein Rajabpour
cmput365@ualberta.ca
Office hours: Thu 11:00 - 13:00 (UCOMM 2-138)
Bavish Kulur
cmput365@ualberta.ca
Office hours: Fri 15:00 - 17:00 (UCOMM 2-138)
Dikshant
cmput365@ualberta.ca
Office hours: Wed 09:00 - 11:00 (UCOMM 2-138)
Lucas Cruz
cmput365@ualberta.ca
Office hours: Mon 15:00 - 17:00 (UCOMM 2-138)
Siddarth Chandrasekar
cmput365@ualberta.ca
Office hours: Tue 13:00 - 15:00 (UCOMM 3-162)
Sai Shashank Gunuputi
cmput365@ualberta.ca
Office hours: Fri 10:00 - 12:00 (UCOMM 2-138)
Tian Tian
cmput365@ualberta.ca
Office hours: Tue 15:00 - 17:00 (UCOMM 2-138)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview
Lecture 2: Bandits I
Lecture 3: Background review [Recorded - Unlisted on YouTube]

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood

Lecture 4: Guest Lecture by Rich Sutton
Lecture 5: Bandits II
Lecture 6: MDPs I
Lecture 7: MDPs II
Lecture 8: Value Functions & Bellman Equations I
Lecture 9: Value Functions & Bellman Equations II
Lecture 10: Value Functions & Bellman Equations III
Lecture 11: Dynamic Programming I
Lecture 12: Dynamic Programming II
Lecture 13: Overview and Q&A
Lecture 14: Monte Carlo Methods I
Lecture 15: Monte Carlo Methods II
Lecture 16: Monte Carlo Methods III
Lecture 17: TD Learning for Prediction I
Lecture 18: TD Learning for Prediction II
Lecture 19: TD Learning for Prediction III
Lecture 20: TD Learning for Control I
Lecture 21: TD Learning for Control II
Lecture 22: Planning and Learning with Tabular Methods I
Lecture 23: Planning and Learning with Tabular Methods II
Lecture 24: On-policy Prediction with Function Approximation I
Lecture 25: On-policy Prediction with Function Approximation II
Lecture 26: On-policy Prediction with Function Approximation III
Lecture 27: Feature Construction for RL I
Lecture 28: Feature Construction for RL II
Lecture 29: Feature Construction for RL III
Lecture 30: Control with Function Approximation I
Lecture 31: Control with Function Approximation II
Lecture 32: Control with Function Approximation III
Lecture 33: Policy Gradient Methods I
Lecture 34: Policy Gradient Methods II
Lecture 35: Policy Gradient Methods III
Lecture 36: Overview and Q&A

CMPUT 628: Deep Reinforcement Learning - W25

CMPUT 365: Introduction to Reinforcement Learning - F24

Lectures: Mon/Wed/Fri 13:00 - 13:50, ESB 3-27.

Links to the course's pages on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 13:00 - 15:00 (ATH 3-08)
Prabhat Nagarajan
cmput365@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 2-50)
Lucas Cruz
cmput365@ualberta.ca
Office hours: Tue 10:00 - 12:00 (CAB 3-13)
Harshil Kotamreddy
cmput365@ualberta.ca
Office hours: Wed 10:00 - 12:00 (CAB 3-13)
Mohamed Mohamed
cmput365@ualberta.ca
Office hours: Thu 10:00 - 12:00 (CAB 3-13)
Marcos Menon José
cmput365@ualberta.ca
Office hours: Fri 10:00 - 12:00 (CAB 3-13)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview
Lecture 2: Background review

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood

Lecture 3: Bandits I
Lecture 4: Bandits II
Lecture 5: MDPs I
Lecture 6: MDPs II
Lecture 7: Value Functions & Bellman Equations I
Lecture 8: Value Functions & Bellman Equations II
Lecture 9: Value Functions & Bellman Equations III
Lecture 10: Dynamic Programming I
Lecture 11: Dynamic Programming II
Lecture 12: Q&A: General Overview
Lecture 13: Monte Carlo Methods I
Lecture 14: Monte Carlo Methods II
Lecture 15: Monte Carlo Methods III
Lecture 16: Temporal-difference Learning for Prediction I
Lecture 17: Temporal-difference Learning for Prediction II
Lecture 18: Temporal-difference Learning for Prediction III
Lecture 19: Temporal-difference Learning for Control I
Lecture 20: Temporal-difference Learning for Control II
Lecture 21: Planning and Learning with Tabular Methods I
Lecture 22: Planning and Learning with Tabular Methods II
Lecture 23: RL with Function Approximation I
Lecture 24: RL with Function Approximation II
Lecture 25: RL with Function Approximation III
Lecture 26: Feature Construction for RL I
Lecture 27: Feature Construction for RL II
Lecture 28: Feature Construction for RL III
Lecture 29: Control with Function Approximation I
Lecture 30: Control with Function Approximation II
Lecture 31: Control with Function Approximation III
Lecture 32: Policy Gradient Methods I
Lecture 33: Policy Gradient Methods II
Lecture 34: Policy Gradient Methods III
Lecture 35: Guest Lecture by Rich Sutton

CMPUT 365: Introduction to Reinforcement Learning - F23

Lectures: Mon/Wed/Fri 13:00 - 13:50, SAB 4-36.

Links to the course's pages on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 10:00 - 12:00 (ATH 3-08)
Anna Hakhverdyan
cmput365@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 3-50)
David Szepesvari
cmput365@ualberta.ca
Office hours: Tue 13:00 - 15:00 (CSC 3-50)
Bryan Chan
cmput365@ualberta.ca
Office hours: Wed 14:00 - 16:00 (CAB 3-13)
Gábor Mihucz
cmput365@ualberta.ca
Office hours: Wed 09:15 - 11:15 (CAB 3-13)
Marcos Menon José
cmput365@ualberta.ca
Office hours: Fri 10:00 - 12:00 (CAB 3-13)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview
Lecture 2: Background review I
Lecture 3: Background review II

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood

Lecture 4: Bandits I
Lecture 5: Bandits II
Lecture 6: MDPs I
Lecture 7: MDPs II
Lecture 8: MDPs III
Lecture 9: Value Functions & Bellman Equations I
Lecture 10: Value Functions & Bellman Equations II
Lecture 11: Value Functions & Bellman Equations III
Lecture 12: Dynamic Programming I
Lecture 13: Dynamic Programming II
Lecture 14: Dynamic Programming III
Lecture 15: Monte Carlo Methods I
Lecture 16: Monte Carlo Methods II
Lecture 17: Monte Carlo Methods III
Lecture 18: Temporal-difference Learning for Prediction I
Lecture 19: Temporal-difference Learning for Prediction II
Lecture 20: Temporal-difference Learning for Prediction III
Lecture 21: Temporal-difference Learning for Control I
Lecture 22: Temporal-difference Learning for Control II
Lecture 23: Temporal-difference Learning for Control III
Lecture 24: Planning and Learning with Tabular Methods I
Lecture 25: Planning and Learning with Tabular Methods II
Lecture 26: Planning and Learning with Tabular Methods III
Lecture 27: RL with Function Approximation I
Lecture 28: RL with Function Approximation II
Lecture 29: RL with Function Approximation III [Recorded - Unlisted on YouTube]
Lecture 30: Feature Construction for RL I
Lecture 31: Feature Construction for RL II
Lecture 32: Feature Construction for RL III
Lecture 33: Guest Lecture by Rich Sutton
Lecture 34: Control with Function Approximation I
Lecture 35: Control with Function Approximation II

CMPUT 655: Reinforcement Learning I - F23

Lectures: Fri 14:00 - 16:50, ETLC E2-001.

Links to the course's pages on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:

Marlos C. Machado, Instructor
machado@ualberta.ca
Office hours: Thu 14:00 - 16:00 (ATH 3-08)
Anna Hakhverdyan
cmput655@ualberta.ca
Office hours: Mon 11:00 - 13:00 (CSC 3-50)
David Szepesvari
cmput655@ualberta.ca
Office hours: Tue 13:00 - 15:00 (CSC 3-50)
Bryan Chan
cmput655@ualberta.ca
Office hours: Wed 14:00 - 16:00 (CAB 3-13)
Gábor Mihucz
cmput655@ualberta.ca
Office hours: Wed 09:15 - 11:15 (CAB 3-13)

Lecture slides:
See Syllabus for more information.

Lecture 1: Course overview & background review [pdf]

Additional reading:

Probabilities and Expectations by A. Rupam Mahmood
The Matrix Cookbook by K. B. Petersen and M. S. Pedersen

Lecture 2: An introduction to sequential decision-making (Bandits) [pdf]

Additional reading:

P. Auer, N. Cesa-Bianchi, P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2-3): 235-256 (2002) [link]
Blog posts with different presentations of the proof of UCB1's regret bound, by Ann He and Jeremy Kun
Lecture by Balaraman Ravindran going over the proof of UCB1's regret bound

Lecture 3: Markov Decision Processes and Value Functions [pdf]

Additional reading:

T. Wang, D. Lizotte, M. Bowling, D. Schuurmans: Dual Representations for Dynamic Programming. [link]

Lecture 4: Dynamic Programming and Monte Carlo Prediction [pdf]
Lecture 5: Temporal-Difference Learning [pdf]
Lecture 6: Multi-step TD, GVFs, and Planning [pdf]

Additional reading:

R. S. Sutton, J. Modayil, M. Delp, T. Degris, P. M. Pilarski, A. White, D. Precup: Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. AAMAS 2011. [link]
L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning. ECML 2006. [link]

Lecture 7: TD Learning with Function Approximation I [pdf]
Lecture 8: TD Learning with Function Approximation II [pdf] and guest lecture by Andy Patterson
Lecture 9: TD Learning with Function Approximation III and Eligibility Traces [pdf]
Lecture 9.5: Algorithm Selection and Evaluation in RL [pdf]
Lecture 10: Policy Gradient Methods [Recorded - Unlisted on YouTube]
Lecture 11: Deep Reinforcement Learning I [pdf] and guest lecture by Rich Sutton
Lecture 12: Deep Reinforcement Learning II [pdf]