Teaching

CMPUT 365: Introduction to Reinforcement Learning - F23

Lectures: Mon/Wed/Fri 13:00 - 13:50, SAB 4-36.

Links to the course's pages on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:
 
  • Marlos C. Machado, Instructor
    machado@ualberta.ca
    Office hours: Thu 10:00 - 12:00 (ATH 3-08)
     
  • Anna Hakhverdyan
    cmput365@ualberta.ca
    Office hours: Mon 11:00 - 13:00 (CSC 3-50)
     
  • David Szepesvari
    cmput365@ualberta.ca
    Office hours: Tue 13:00 - 15:00 (CSC 3-50)
     
  • Bryan Chan
    cmput365@ualberta.ca
    Office hours: Wed 14:00 - 16:00 (CAB 3-13)
     
  • Gábor Mihucz
    cmput365@ualberta.ca
    Office hours: Wed 9:15 - 11:15 (CAB 3-13)
     
  • Marcos Menon José
    cmput365@ualberta.ca
    Office hours: Fri 10:00 - 12:00 (CAB 3-13)

Lecture slides:
See the syllabus for more information.
 
  • Lecture 1: Course overview [pdf]
  • Lecture 2: Background review I [pdf]
  • Lecture 3: Background review II [pdf]
  • Lecture 4: Bandits I [pdf]
  • Lecture 5: Bandits II [pdf]
  • Lecture 6: MDPs I [pdf]
  • Lecture 7: MDPs II [pdf]
  • Lecture 8: MDPs III [pdf]
  • Lecture 9: Value Functions & Bellman Equations I [pdf]
  • Lecture 10: Value Functions & Bellman Equations II [pdf]
  • Lecture 11: Value Functions & Bellman Equations III [pdf]
  • Lecture 12: Dynamic Programming I [pdf]
  • Lecture 13: Dynamic Programming II [pdf]
  • Lecture 14: Dynamic Programming III [pdf]
  • Lecture 15: Monte Carlo Methods I [pdf]
  • Lecture 16: Monte Carlo Methods II [pdf]
  • Lecture 17: Monte Carlo Methods III [pdf]
  • Lecture 18: Temporal-difference Learning for Prediction I [pdf]
  • Lecture 19: Temporal-difference Learning for Prediction II [pdf]
  • Lecture 20: Temporal-difference Learning for Prediction III [pdf]
  • Lecture 21: Temporal-difference Learning for Control I [pdf]
  • Lecture 22: Temporal-difference Learning for Control II [pdf]
  • Lecture 23: Temporal-difference Learning for Control III [pdf]
  • Lecture 24: Planning and Learning with Tabular Methods I [pdf]
  • Lecture 25: Planning and Learning with Tabular Methods II [pdf]
  • Lecture 26: Planning and Learning with Tabular Methods III [pdf]
  • Lecture 27: RL with Function Approximation I [pdf]
  • Lecture 28: RL with Function Approximation II [pdf]
  • Lecture 29: RL with Function Approximation III [Recorded - Unlisted on YouTube]
  • Lecture 30: Feature Construction for RL I [pdf]
  • Lecture 31: Feature Construction for RL II [pdf]
  • Lecture 32: Feature Construction for RL III [pdf]
  • Lecture 33: Guest Lecture by Rich Sutton
  • Lecture 34: Control with Function Approximation I [pdf]
  • Lecture 35: Control with Function Approximation II [pdf]

CMPUT 655: Reinforcement Learning I - F23

Lectures: Fri 14:00 - 16:50, ETLC E2-001.

Links to the course's pages on eClass and Coursera (Modules 1, 2, and 3).
Course syllabus [link].

Instruction team:
 
  • Marlos C. Machado, Instructor
    machado@ualberta.ca
    Office hours: Thu 14:00 - 16:00 (ATH 3-08)
     
  • Anna Hakhverdyan
    cmput655@ualberta.ca
    Office hours: Mon 11:00 - 13:00 (CSC 3-50)
     
  • David Szepesvari
    cmput655@ualberta.ca
    Office hours: Tue 13:00 - 15:00 (CSC 3-50)
     
  • Bryan Chan
    cmput655@ualberta.ca
    Office hours: Wed 14:00 - 16:00 (CAB 3-13)
     
  • Gábor Mihucz
    cmput655@ualberta.ca
    Office hours: Wed 9:15 - 11:15 (CAB 3-13)

Lecture slides:
See the syllabus for more information.
 
  • Lecture 1: Course overview & background review [pdf]
  • Lecture 2: An introduction to sequential decision-making (Bandits) [pdf]
  • Additional reading:
    • P. Auer, N. Cesa-Bianchi, P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2-3): 235-256 (2002) [link]
    • Blog posts with different presentations of the proof of UCB1's regret bound, by Ann He and Jeremy Kun
    • Lecture by Balaraman Ravindran going over the proof of UCB1's regret bound
  • Lecture 3: Markov Decision Processes, and Value Functions [pdf]
  • Additional reading:
    • T. Wang, D. Lizotte, M. Bowling, D. Schuurmans: Dual Representations for Dynamic Programming. [link]
  • Lecture 4: Dynamic Programming and Monte Carlo Prediction [pdf]
  • Lecture 5: Temporal-Difference Learning [pdf]
  • Lecture 6: Multi-step TD, GVFs, and Planning [pdf]
  • Additional reading:
    • R. S. Sutton, J. Modayil, M. Delp, T. Degris, P. M. Pilarski, A. White, D. Precup: Horde: A Scalable Real-time Architecture for Learning Knowledge from Unsupervised Sensorimotor Interaction. AAMAS 2011. [link]
    • L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning. ECML 2006. [link]
  • Lecture 7: TD Learning with Function Approximation I [pdf]
  • Lecture 8: TD Learning with Function Approximation II [pdf], and guest lecture by Andy Patterson
  • Lecture 9: TD Learning with Function Approximation III and Eligibility Traces [pdf]
  • Lecture 9.5: Algorithm Selection and Evaluation in RL [pdf]
  • Lecture 10: Policy Gradient Methods [Recorded - Unlisted on YouTube]
  • Lecture 11: Deep Reinforcement Learning I [pdf], and guest lecture by Rich Sutton
  • Lecture 12: Deep Reinforcement Learning II [pdf]
