Fall 2001
Department of Computer Science
University of Waterloo
Instructor: Dale Schuurmans, DC1310, x3005, dale@cs.uwaterloo.ca
| Lecture 0 | Introduction |
| | Fundamental types of learning problems |
| | Examples of learning problems |
| | Primary task: learning a function from examples |
Part 1: Learning to make approximate predictions ("regression")
| Lecture 1 | Learning real predictors 1 |
| | Loss functions, linear regression (sketch below) |
| | Generalized linear regression |
| | Learning local representations |
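For orientation, a minimal sketch of the least-squares fit named in this lecture's topics, assuming numpy is available; the toy data and variable names are illustrative:

    import numpy as np

    def least_squares_fit(X, y):
        """Return weights w minimizing the squared loss ||X w - y||^2."""
        # Solves the normal equations X^T X w = X^T y via a least-squares solver.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    # Toy usage: fit y ~ w0 + w1*x.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 2.9, 5.1, 7.2])
    X = np.column_stack([np.ones_like(x), x])   # prepend a bias column
    w = least_squares_fit(X, y)                 # roughly [0.93, 2.08]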
| Lecture 2 | Learning real predictors 2 |
| | Neural networks |
| | Heuristic search: gradient descent |
| | Backpropagation, local minima, matching loss functions (sketch below) |
| | Regularization |
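A minimal sketch of gradient descent with backpropagation for a one-hidden-layer network under squared loss; numpy is assumed, and the step size and variable names are illustrative choices, not anything prescribed by the lecture:

    import numpy as np

    def backprop_step(params, x, y, lr=0.01):
        """One gradient-descent step for a 1-hidden-layer net with tanh units
        and squared loss; x is a 1-d input vector, y a scalar target."""
        W1, b1, w2, b2 = params
        # Forward pass.
        h = np.tanh(W1 @ x + b1)               # hidden activations
        y_hat = w2 @ h + b2                    # scalar prediction
        # Backward pass (chain rule).
        delta = y_hat - y                      # d loss / d y_hat for 0.5*(y_hat - y)^2
        grad_w2 = delta * h
        grad_b2 = delta
        grad_h = delta * w2
        grad_pre = grad_h * (1.0 - h ** 2)     # back through tanh
        grad_W1 = np.outer(grad_pre, x)
        grad_b1 = grad_pre
        # Gradient-descent update.
        return (W1 - lr * grad_W1, b1 - lr * grad_b1,
                w2 - lr * grad_w2, b2 - lr * grad_b2)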
| Lecture 3 | Learning theory 1 |
| | Statistics of learning, decomposition of expected hypothesis error |
| | Bias and variance |
| | Learning curves, overfitting curves |
| | Model selection: penalization, cross validation, metric distance (cross-validation sketch below) |
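A minimal sketch of k-fold cross validation as a model-selection criterion; numpy is assumed, and `fit` and `loss` are placeholder callables supplied by the user:

    import numpy as np

    def cross_val_score(fit, loss, X, y, k=5):
        """Average held-out loss of a learner over k folds.
        `fit(X, y)` returns a predictor f; `loss(f, X, y)` returns a scalar."""
        folds = np.array_split(np.random.permutation(len(y)), k)
        scores = []
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(y)), held_out)
            f = fit(X[train], y[train])
            scores.append(loss(f, X[held_out], y[held_out]))
        return np.mean(scores)

    # Model selection: choose the model class (e.g. polynomial degree)
    # with the smallest cross-validation score.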
Part 2: Learning to make exact predictions ("classification")
| Lecture 4 | Learning real classifiers |
| | Linear discriminants, maximum margin classifiers |
| | Norms, soft margins |
| | Dual maximum margin methods, support vectors |
| | Generalized linear discriminants, kernels, support vector machines (SVMs) (sketch below) |
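The lecture develops the dual quadratic-programming view of maximum margin classifiers; purely as a sketch of the same soft-margin objective, here is a stochastic subgradient descent on the hinge loss (numpy assumed; constants are illustrative):

    import numpy as np

    def svm_subgradient(X, y, C=1.0, lr=0.01, epochs=100):
        """Soft-margin linear SVM trained by subgradient descent on
        0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)), with y_i in {-1, +1}.
        The dual/QP formulation covered in lecture targets the same optimum."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for i in np.random.permutation(n):
                margin = y[i] * (X[i] @ w + b)
                if margin < 1:                       # point violates the margin
                    w -= lr * (w - C * y[i] * X[i])
                    b += lr * C * y[i]
                else:
                    w -= lr * w                      # only the regularizer acts
        return w, b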
| Lecture 5 | Learning propositional classifiers |
| | Boolean formulae, decision trees, linear discriminants, neural networks |
| | Minimizing error, minimizing size: rationale |
| | Consistent vs. noisy case |
| | Computational complexity, greedy heuristics, approximations (stump sketch below) |
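A minimal sketch of the greedy split selection behind decision-tree heuristics: an exhaustive search for the best single-feature threshold (a "decision stump"), which tree learners apply recursively; numpy assumed, labels taken as +1/-1:

    import numpy as np

    def best_stump(X, y):
        """Greedy one-split decision stump: pick the (feature, threshold, sign)
        triple minimizing training classification error, with y in {-1, +1}."""
        n, d = X.shape
        best = (None, None, 1, n + 1)            # (feature, threshold, sign, errors)
        for j in range(d):
            for t in np.unique(X[:, j]):
                pred = np.where(X[:, j] <= t, -1, 1)
                for sign in (1, -1):
                    errors = np.sum(sign * pred != y)
                    if errors < best[3]:
                        best = (j, t, sign, errors)
        return best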
| Lecture 6 | Learning theory 2 |
| | Worst-case analysis: expected error, tail probabilities, PAC learning (example bound below) |
| | VC dimension, upper bounds, lower bounds |
| | Uniform convergence |
| | Fat-shattering dimension for regression |
| | Data-dependent estimates of generalization error for SVMs |
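As a concrete instance of the tail-probability style of argument in this lecture, the standard Hoeffding bound and its union-bound consequence for a finite hypothesis class (the VC-dimension results replace the ln|H| term with a combinatorial complexity term):

    % Hoeffding's inequality for a fixed hypothesis h with true error err(h)
    % and empirical error \widehat{err}(h) on m i.i.d. examples:
    \Pr\bigl[\,|\widehat{err}(h) - err(h)| \ge \epsilon\,\bigr] \le 2e^{-2m\epsilon^{2}}
    % Union bound over a finite class H: with probability at least 1 - \delta,
    % simultaneously for every h \in H,
    err(h) \le \widehat{err}(h) + \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}}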
| Assignment 1: | 25% of final grade. Covers Lectures 1-6. |
| | Due in class, Monday, Oct 29. |
Part 3: Learning with probability models
| Lecture 7 | Probability models |
| | Joint and conditional models, Bayes rule |
| | Optimal classification and prediction |
| | Naive Bayes classification (sketch below) |
| | Multivariate Gaussian prediction |
| | Bayesian networks |
| | Efficient marginalization and conditioning in trees |
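A minimal sketch of naive Bayes classification via Bayes rule, assuming binary features and numpy; the data-structure choices (dicts keyed by class label) are illustrative:

    import numpy as np

    def naive_bayes_predict(x, class_priors, cond_probs):
        """Classify a binary feature vector x (numpy 0/1 array) by Bayes rule
        under the naive independence assumption P(x|c) = prod_j P(x_j|c).
        class_priors[c] = P(c); cond_probs[c][j] = P(x_j = 1 | c)."""
        best_class, best_log_post = None, -np.inf
        for c, prior in class_priors.items():
            p = cond_probs[c]
            # log P(c) + sum_j log P(x_j | c), proportional to the log posterior
            log_post = np.log(prior) + np.sum(
                x * np.log(p) + (1 - x) * np.log(1 - p))
            if log_post > best_log_post:
                best_class, best_log_post = c, log_post
        return best_class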
| Lecture 8 | Maximum likelihood learning |
| | Maximum likelihood: Bernoulli, Gaussian, multivariate Gaussian, Bayesian networks (worked example below) |
| | Maximizing joint versus conditional likelihood |
| | Maximum likelihood with missing components (EM) |
| | EM increases likelihood |
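As a worked example of the maximum-likelihood recipe, the Bernoulli case; the Gaussian and Bayesian-network cases in this lecture follow the same pattern of maximizing the log likelihood:

    % Bernoulli likelihood for data x_1, ..., x_m \in \{0,1\} with k = \sum_i x_i:
    L(\theta) = \theta^{k}(1-\theta)^{m-k}, \qquad
    \log L(\theta) = k\log\theta + (m-k)\log(1-\theta)
    % Setting the derivative to zero:
    \frac{d}{d\theta}\log L(\theta) = \frac{k}{\theta} - \frac{m-k}{1-\theta} = 0
    \;\;\Longrightarrow\;\; \hat{\theta}_{ML} = \frac{k}{m}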
| Lecture 9 | Bayesian learning |
| | Bayesian learning on joint versus conditional models |
| | Prior, posterior, prediction, MAP approximation |
| | Conjugate priors, Beta-Bernoulli, Gaussian-Gaussian (worked example below) |
| | Bayesian learning of Bayesian networks |
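A worked instance of conjugacy for the Beta-Bernoulli pair listed above, where a and b are the prior hyperparameters and k is the number of ones observed in m trials:

    % Prior Beta(a,b) on the Bernoulli parameter \theta; data D with k ones in m trials.
    p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}, \qquad
    p(\theta \mid D) \propto \theta^{a+k-1}(1-\theta)^{b+m-k-1}
      \;=\; \mathrm{Beta}(a+k,\; b+m-k)
    % Posterior predictive probability and MAP estimate:
    \Pr(x_{new}=1 \mid D) = \frac{a+k}{a+b+m}, \qquad
    \hat{\theta}_{MAP} = \frac{a+k-1}{a+b+m-2}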
| Assignment 2: | 25% of final grade. Covers Lectures 7-9. |
| | Due in class, Monday, Nov 26. |
Part 4: Other function learning techniques
| Lecture 10 | Ensemble learning methods |
| | Bayesian model averaging |
| | Bagging (sketch below) |
| | Boosting, relationship to SVMs, generalization theory |
| | Stacking |
| | Convergent versus divergent ensemble methods |
| | Kernel methods, Gaussian processes |
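A minimal sketch of bagging, assuming numpy; `fit` is a placeholder learner that returns a callable predictor:

    import numpy as np

    def bagged_predict(fit, X, y, x_new, n_models=25):
        """Bagging: train `fit` on bootstrap resamples of (X, y) and average
        the resulting predictions at x_new."""
        n = len(y)
        preds = []
        for _ in range(n_models):
            idx = np.random.randint(0, n, size=n)   # sample n points with replacement
            model = fit(X[idx], y[idx])
            preds.append(model(x_new))
        # For regression, average; for classification, a majority vote is used instead.
        return np.mean(preds)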
| Lecture 11 | On-line learning |
| | Approximating the best expert, relative loss bound |
| | Approximating the best linear discriminant: Perceptron and Winnow (sketches below) |
| | Worst-case loss bound analysis |
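Minimal sketches of the Perceptron (additive updates) and Winnow (multiplicative updates) mentioned above; numpy is assumed, and the Winnow threshold and learning rate are illustrative choices rather than the lecture's specific settings:

    import numpy as np

    def perceptron(X, y, epochs=10):
        """Classic Perceptron: additive mistake-driven updates, y_i in {-1, +1}."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in range(len(y)):
                if y[i] * (w @ X[i]) <= 0:     # mistake
                    w += y[i] * X[i]           # additive update
        return w

    def winnow(X, y, eta=2.0, epochs=10):
        """Winnow: multiplicative updates; X assumed boolean (0/1), y in {-1, +1}."""
        d = X.shape[1]
        w, threshold = np.ones(d), d / 2.0
        for _ in range(epochs):
            for i in range(len(y)):
                pred = 1 if w @ X[i] >= threshold else -1
                if pred != y[i]:
                    w *= eta ** (y[i] * X[i])  # promote or demote active features
        return w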