CS 786 - Machine Learning: Lectures

Fall 2001
Department of Computer Science
University of Waterloo

Instructor: Dale Schuurmans, DC 1310, x3005, dale@cs.uwaterloo.ca


Room: DC 3313
Time: Mon 2:30-5:00

Lectures

Lecture 0 Introduction
Fundamental types of learning problems
Examples of learning problems
Primary task: learning a function from examples

Part 1: Learning to make approximate predictions ("regression")

Lecture 1 Learning real predictors 1
Loss functions, linear regression
Generalized linear regression
Learning local representations
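
A minimal sketch of the least-squares approach from Lecture 1, assuming
NumPy is available; the function names are illustrative, not course code:

    import numpy as np

    def fit_linear_regression(X, y):
        # Append a constant column so the model learns an intercept term.
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        # Least-squares solution to Xb w = y, i.e. minimize squared loss.
        w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return w

    def predict(w, X):
        Xb = np.hstack([X, np.ones((X.shape[0], 1))])
        return Xb @ w

Replacing the columns of X with fixed basis-function values gives
generalized linear regression with the same solver.
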
Lecture 2 Learning real predictors 2
Neural networks
Heuristic search: gradient descent
Backpropagation, local minima, matching loss functions
Regularization
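
A toy illustration of gradient descent with backpropagation for a
one-hidden-layer network, again assuming NumPy; a sketch only, with a
linear output unit so that squared loss is the matching loss:

    import numpy as np

    def train_mlp(X, y, hidden=5, lr=0.1, epochs=1000, seed=0):
        # One hidden layer of sigmoid units, linear output, squared loss.
        rng = np.random.RandomState(seed)
        n, d = X.shape
        W1 = rng.randn(d, hidden) * 0.1
        W2 = rng.randn(hidden, 1) * 0.1
        sig = lambda z: 1.0 / (1.0 + np.exp(-z))
        y = y.reshape(-1, 1)
        for _ in range(epochs):
            H = sig(X @ W1)                # forward pass
            out = H @ W2
            err = out - y                  # d(loss)/d(out) for squared loss
            gW2 = H.T @ err / n            # backpropagate through W2
            gW1 = X.T @ ((err @ W2.T) * H * (1 - H)) / n
            W2 -= lr * gW2                 # gradient descent steps
            W1 -= lr * gW1
        return W1, W2

Adding a weight-decay term such as lam * W to each gradient would be one
simple form of regularization; gradient descent may still stop at a
local minimum.
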
Lecture 3 Learning theory 1
Statistics of learning, decomposition of expected hypothesis error
Bias and variance
Learning curves, overfitting curves
Model selection: penalization, cross validation, metric distance
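
A minimal sketch of k-fold cross validation for model selection; fit and
predict stand for any hypothetical training and prediction routines:

    import numpy as np

    def cross_val_error(fit, predict, X, y, k=5):
        # Hold out each of k folds once, train on the rest,
        # and average the held-out squared error.
        idx = np.arange(len(y))
        errs = []
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            model = fit(X[train], y[train])
            errs.append(np.mean((predict(model, X[fold]) - y[fold]) ** 2))
        return float(np.mean(errs))

Choosing the model with the lowest cross-validation error is one guard
against overfitting to the training sample.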

Part 2: Learning to make exact predictions ("classification")

Lecture 4 Learning real classifiers
Linear discriminants, maximum margin classifiers
Norms, soft margins
Dual maximum margin methods, support vectors
Generalized linear discriminants, kernels, support vector machines (SVMs)
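
A sketch of the dual-form decision rule behind SVMs; the multipliers
alpha and the offset b would come from solving the dual quadratic
program, which is not shown here:

    import numpy as np

    def rbf_kernel(x, z, sigma=1.0):
        # Gaussian kernel: an implicit, very high-dimensional feature map.
        return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

    def svm_decision(x, support_X, support_y, alpha, b, kernel=rbf_kernel):
        # Only the support vectors (alpha_i > 0) contribute to the sum.
        s = sum(a * yi * kernel(xi, x)
                for a, yi, xi in zip(alpha, support_y, support_X))
        return np.sign(s + b)
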
Lecture 5 Learning propositional classifiers
Boolean formulae, decision trees, linear discriminants, neural networks
Minimizing error, minimizing size: rationale
Consistent versus noisy cases
Computational complexity, greedy heuristics, approximations
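
A sketch of the greedy heuristic used by decision-tree learners: choose
the boolean feature whose split most reduces label entropy (information
gain); names are illustrative:

    import numpy as np

    def entropy(y):
        # Empirical entropy of a 0/1 label vector.
        p = np.mean(y)
        if p == 0.0 or p == 1.0:
            return 0.0
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def best_split(X, y):
        base, best, best_gain = entropy(y), None, 0.0
        for j in range(X.shape[1]):
            on = X[:, j] == 1
            if on.all() or (~on).all():
                continue            # split would be degenerate
            gain = base - (np.mean(on) * entropy(y[on])
                           + np.mean(~on) * entropy(y[~on]))
            if gain > best_gain:
                best, best_gain = j, gain
        return best

Recursing on each side of the chosen split grows the tree; greed trades
the intractable search for a smallest consistent tree for a fast
approximation.
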
Lecture 6 Learning theory 2
Worst-case analysis: expected error, tail probabilities, PAC learning
VC dimension, upper bounds, lower bounds
Uniform convergence
Fat-shattering dimension for regression
Data-dependent estimates of generalization error for SVMs
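
One standard PAC sample-complexity bound, stated here for a finite
hypothesis class H and a consistent learner (one common form; constants
vary by source):

    \[ m \;\ge\; \frac{1}{\epsilon}\Bigl(\ln|H| + \ln\frac{1}{\delta}\Bigr) \]

examples suffice so that, with probability at least 1 - delta, every
hypothesis in H that is consistent with the training sample has true
error at most epsilon.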

Assignment 1: 25% of final grade. Covers Lectures 1-6.
Due in class, Monday, Oct 29.

Part 3: Learning with probability models

Lecture 7 Probability models
Joint and conditional models, Bayes rule
Optimal classification and prediction
Naive Bayes classification
Multivariate Gaussian prediction
Bayesian networks
Efficient marginalization and conditioning in trees
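
A minimal naive Bayes classifier for binary features, assuming NumPy;
Laplace smoothing is used to avoid zero probability estimates:

    import numpy as np

    def train_naive_bayes(X, y):
        # Estimate log P(c) and log P(x_j = 1 | c) for each class c.
        models = {}
        for c in np.unique(y):
            Xc = X[y == c]
            prior = len(Xc) / len(X)
            cond = (Xc.sum(axis=0) + 1.0) / (len(Xc) + 2.0)
            models[c] = (np.log(prior), np.log(cond), np.log(1 - cond))
        return models

    def classify(models, x):
        # Optimal prediction under the naive independence assumption:
        # argmax_c  log P(c) + sum_j log P(x_j | c).
        def score(c):
            lp, lc, lnc = models[c]
            return lp + np.sum(x * lc + (1 - x) * lnc)
        return max(models, key=score)
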
Lecture 8 Maximum likelihood learning
Maximum likelihood
Bernoulli, Gaussian, multivariate Gaussian, Bayesian networks
Maximizing joint versus conditional likelihood
Maximum likelihood with missing components: EM
EM increases likelihood
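
A sketch of EM for a two-component, unit-variance Gaussian mixture in
one dimension; the component label plays the role of the missing data:

    import numpy as np

    def em_mixture(x, iters=50):
        mu = np.array([x.min(), x.max()])        # crude initialization
        pi = np.array([0.5, 0.5])
        for _ in range(iters):
            # E-step: posterior responsibility of each component
            # (the shared Gaussian normalizer cancels out).
            dens = pi * np.exp(-0.5 * (x[:, None] - mu) ** 2)
            r = dens / dens.sum(axis=1, keepdims=True)
            # M-step: re-estimate the parameters; each full EM step
            # cannot decrease the likelihood.
            mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
            pi = r.mean(axis=0)
        return pi, mu
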
Lecture 9 Bayesian learning
Bayesian learning on joint versus conditional models
Prior, posterior, prediction, MAP approximation
Conjugate priors, Beta-Bernoulli, Gaussian-Gaussian
Bayesian learning of Bayesian networks
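
A small illustration of conjugacy: with a Beta(a, b) prior on a
Bernoulli parameter, the posterior is again Beta, so prediction and the
MAP estimate are closed-form:

    def beta_bernoulli(a, b, data):
        # data is a sequence of 0/1 outcomes; the posterior is
        # Beta(a + heads, b + tails).
        heads = sum(data)
        tails = len(data) - heads
        a_post, b_post = a + heads, b + tails
        predictive = a_post / (a_post + b_post)    # P(next outcome = 1)
        # Posterior mode, assuming a_post > 1 and b_post > 1.
        map_estimate = (a_post - 1) / (a_post + b_post - 2)
        return predictive, map_estimate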

Assignment 2: 25% of final grade. Covers Lectures 7-9.
Due in class, Monday, Nov 26.

Part 4: Other function learning techniques

Lecture 10 Ensemble learning methods
Bayesian model averaging
Bagging
Boosting, relationship to SVMs, generalization theory
Stacking
Convergent versus divergent ensemble methods
Kernel methods, Gaussian processes
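
A minimal sketch of bagging; fit and predict stand for any hypothetical
base learner returning +/-1 labels:

    import numpy as np

    def bagged_predict(fit, predict, X, y, X_test, rounds=25, seed=0):
        rng = np.random.RandomState(seed)
        votes = []
        for _ in range(rounds):
            # Train each member on a bootstrap resample (with replacement).
            idx = rng.randint(0, len(y), size=len(y))
            model = fit(X[idx], y[idx])
            votes.append(predict(model, X_test))
        # Majority vote; ties (sign 0) would need a tie-breaking rule.
        return np.sign(np.mean(votes, axis=0))

Boosting differs in reweighting the data toward current mistakes rather
than resampling uniformly.
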
Lecture 11 On-line learning
Approximating the best expert, relative loss bound
Approximating the best linear discriminant, Perceptron and Winnow
Worst-case loss bound analysis
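
Sketches of the two mistake-driven on-line learners named above; the
perceptron makes additive updates, winnow multiplicative ones:

    import numpy as np

    def perceptron(X, y, epochs=10):
        # Additive updates on mistakes; labels in {-1, +1}.
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, t in zip(X, y):
                if t * (w @ x) <= 0:
                    w += t * x
        return w

    def winnow(X, y, epochs=10, alpha=2.0):
        # Multiplicative updates over 0/1 features; labels in {0, 1}.
        # Its mistake bound grows only logarithmically with the number
        # of irrelevant features.
        n = X.shape[1]
        w = np.ones(n)
        for _ in range(epochs):
            for x, t in zip(X, y):
                pred = 1 if w @ x >= n / 2.0 else 0
                if pred != t:
                    w *= alpha ** ((t - pred) * x)   # promote or demote
        return w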