# Teaching

# Current

CMPUT 365: Introduction to Reinforcement Learning - F24

**Lectures: Mon/Wed/Fri 13:00 - 13:50, ESB 3-27**.

Link to course's page on eClass and Coursera (Modules 1, 2, and 3).

Course syllabus [link].

**Instruction team:**

- Marlos C. Machado, **Instructor**

machado@ualberta.ca

Office hours: Thu 13:00 - 15:00 (ATH 3-08)

- Prabhat Nagarajan

cmput365@ualberta.ca

Office hours: Mon 11:00 - 13:00 (CSC 2-50)

- Lucas Cruz

cmput365@ualberta.ca

Office hours: Tue 10:00 - 12:00 (CAB 3-13)

- Harshil Kotamreddy

cmput365@ualberta.ca

Office hours: Wednesday 10:00 - 12:00 (CAB 3-13)

- Mohamed Mohamed

cmput365@ualberta.ca

Office hours: Thursday 10:00 - 12:00 (CAB 3-13)

- Marcos Menon José

cmput365@ualberta.ca

Office hours: Friday 10:00 - 12:00 (CAB 3-13)

**Lecture slides:**

See Syllabus for more information.

- Lecture 1: Course overview [pdf]
- Lecture 2: Background review [pdf]
- Lecture 3: Bandits I [pdf]
- Lecture 4: Bandits II [pdf]
- Lecture 5: MDPs I [pdf]
- Lecture 6: MDPs II [pdf]
- Lecture 7: Value Functions & Bellman Equations I [pdf]
- Lecture 8: Value Functions & Bellman Equations II [pdf]
- Lecture 9: Value Functions & Bellman Equations III [pdf]
- Lecture 10: Dynamic Programming I [pdf]
- Lecture 11: Dynamic Programming II [pdf]
- Lecture 12: Q&A: General Overview [pdf]
- Lecture 13: Monte Carlo Methods I [pdf]
- Lecture 14: Monte Carlo Methods II [pdf]
- Lecture 15: Monte Carlo Methods III [pdf]
- Lecture 16: Temporal-difference Learning for Prediction I [pdf]
- Lecture 17: Temporal-difference Learning for Prediction II [pdf]
- Lecture 18: Temporal-difference Learning for Prediction III [pdf]
- Lecture 19: Temporal-difference Learning for Control I [pdf]
- Lecture 20: Temporal-difference Learning for Control II [pdf]
- Lecture 21: Planning and Learning with Tabular Methods I [pdf]
- Lecture 22: Planning and Learning with Tabular Methods II [pdf]
- Lecture 23: RL with Function Approximation I [pdf]
- Lecture 24: RL with Function Approximation II [pdf]
- Lecture 25: RL with Function Approximation III [pdf]

# Past

CMPUT 365: Introduction to Reinforcement Learning - F23

**Lectures: Mon/Wed/Fri 13:00 - 13:50, SAB 4-36**.

Link to course's page on eClass and Coursera (Modules 1, 2, and 3).

Course syllabus [link].

**Instruction team:**

- Marlos C. Machado, **Instructor**

machado@ualberta.ca

Office hours: Thu 10:00 - 12:00 (ATH 3-08)

- Anna Hakhverdyan

cmput365@ualberta.ca

Office hours: Mon 11:00 - 13:00 (CSC 3-50)

- David Szepesvari

cmput365@ualberta.ca

Office hours: Tue 13:00 - 15:00 (CSC 3-50)

- Bryan Chan

cmput365@ualberta.ca

Office hours: Wednesday 14:00 - 16:00 (CAB 3-13)

- Gábor Mihucz

cmput365@ualberta.ca

Office hours: Wednesday 9:15 - 11:15 (CAB 3-13)

- Marcos Menon José

cmput365@ualberta.ca

Office hours: Friday 10:00 - 12:00 (CAB 3-13)

**Lecture slides:**

See Syllabus for more information.

- Lecture 1: Course overview
- Lecture 2: Background review I
- Lecture 3: Background review II
- Lecture 4: Bandits I
- Lecture 5: Bandits II
- Lecture 6: MDPs I
- Lecture 7: MDPs II
- Lecture 8: MDPs III
- Lecture 9: Value Functions & Bellman Equations I
- Lecture 10: Value Functions & Bellman Equations II
- Lecture 11: Value Functions & Bellman Equations III
- Lecture 12: Dynamic Programming I
- Lecture 13: Dynamic Programming II
- Lecture 14: Dynamic Programming III
- Lecture 15: Monte Carlo Methods I
- Lecture 16: Monte Carlo Methods II
- Lecture 17: Monte Carlo Methods III
- Lecture 18: Temporal-difference Learning for Prediction I
- Lecture 19: Temporal-difference Learning for Prediction II
- Lecture 20: Temporal-difference Learning for Prediction III
- Lecture 21: Temporal-difference Learning for Control I
- Lecture 22: Temporal-difference Learning for Control II
- Lecture 23: Temporal-difference Learning for Control III
- Lecture 24: Planning and Learning with Tabular Methods I
- Lecture 25: Planning and Learning with Tabular Methods II
- Lecture 26: Planning and Learning with Tabular Methods III
- Lecture 27: RL with Function Approximation I
- Lecture 28: RL with Function Approximation II
- Lecture 29: RL with Function Approximation III [Recorded - Unlisted on YouTube]
- Lecture 30: Feature Construction for RL I
- Lecture 31: Feature Construction for RL II
- Lecture 32: Feature Construction for RL III
- Lecture 33: *Guest Lecture by Rich Sutton*
- Lecture 34: Control with Function Approximation I
- Lecture 35: Control with Function Approximation II

CMPUT 655: Reinforcement Learning I - F23

**Lectures: Fri 14:00 - 16:50, ETLC E2-001**.

Link to course's page on eClass and Coursera (Modules 1, 2, and 3).

Course syllabus [link].

**Instruction team:**

- Marlos C. Machado, **Instructor**

machado@ualberta.ca

Office hours: Thu 14:00 - 16:00 (ATH 3-08)

- Anna Hakhverdyan

cmput655@ualberta.ca

Office hours: Mon 11:00 - 13:00 (CSC 3-50)

- David Szepesvari

cmput655@ualberta.ca

Office hours: Tue 13:00 - 15:00 (CSC 3-50)

- Bryan Chan

cmput655@ualberta.ca

Office hours: Wednesday 14:00 - 16:00 (CAB 3-13)

- Gábor Mihucz

cmput655@ualberta.ca

Office hours: Wednesday 9:15 - 11:15 (CAB 3-13)

**Lecture slides:**

See Syllabus for more information.

- Lecture 1: Course overview & background review [pdf] Additional reading:
  - Probabilities and Expectations by A. Rupam Mahmood
  - The Matrix Cookbook by K. B. Petersen and M. S. Pedersen
- Lecture 2: An introduction to sequential decision-making (Bandits) [pdf] Additional reading:
  - P. Auer, N. Cesa-Bianchi, P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2-3): 235-256 (2002) [link]
  - Blog posts with different presentations of the proof of UCB1's regret bound, by Ann He and Jeremy Kun
  - Lecture by Balaraman Ravindran going over the proof of UCB1's regret bound
- Lecture 3: Markov Decision Processes, and Value Functions [pdf] Additional reading:
  - T. Wang, D. Lizotte, M. Bowling, D. Schuurmans: Dual Representations for Dynamic Programming. [link]
- Lecture 4: Dynamic Programming and Monte Carlo Prediction [pdf]
- Lecture 5: Temporal-Difference Learning [pdf]
- Lecture 6: Multi-step TD, GVFs, and Planning [pdf] Additional reading:
  - R. S. Sutton, J. Modayil, M. Delp, T. Degris, P. M. Pilarski, A. White, D. Precup: Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. AAMAS 2011. [link]
  - L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning. ECML 2006. [link]
- Lecture 7: TD Learning with Function Approximation I [pdf]
- Lecture 8: TD Learning with Function Approximation II [pdf] *and guest lecture by Andy Patterson*
- Lecture 9: TD Learning with Function Approximation III and Eligibility Traces [pdf]
- Lecture 9.5: Algorithm Selection and Evaluation in RL [pdf]
- Lecture 10: Policy Gradient Methods [Recorded - Unlisted on YouTube]
- Lecture 11: Deep Reinforcement Learning I [pdf] *and guest lecture by Rich Sutton*
- Lecture 12: Deep Reinforcement Learning II [pdf]