Fall 2001
Department of Computer Science
University of Waterloo
Instructor: Dale Schuurmans, DC1310, x3005, dale@cs.uwaterloo.ca
Lecture 0: Introduction
  Fundamental types of learning problems
  Examples of learning problems
  Primary task: learning a function from examples
Part 1: Learning to make approximate predictions ("regression")
Lecture 1: Learning real predictors 1
  Loss functions, linear regression
  Generalized linear regression
  Learning local representations
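To make the linear-regression topic concrete, here is a minimal illustrative sketch (not course code) of fitting a 1-D least-squares line by the closed-form solution; the function names and toy data are my own:

```python
# Least-squares linear regression for 1-D inputs with an intercept:
# minimize sum_i (y_i - (w0 + w1 * x_i))^2 via the closed-form solution.

def fit_linear(xs, ys):
    """Fit y ~ w0 + w1*x by minimizing squared loss."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx            # slope: covariance / variance
    w0 = my - w1 * mx         # intercept passes through the means
    return w0, w1

# toy data generated exactly by y = 1 + 2x
w0, w1 = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
```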
Lecture 2: Learning real predictors 2
  Neural networks
  Heuristic search: gradient descent
  Backpropagation, local minima, matching loss functions
  Regularization
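A minimal sketch of gradient descent with a matching loss, assuming a single sigmoid unit: cross-entropy is the loss matched to the sigmoid transfer, which makes the per-example gradient simply (p - y) * x. Toy data and hyperparameters are my own:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, lr=0.5, epochs=2000):
    """Stochastic gradient descent on cross-entropy, the matching loss
    for a sigmoid unit; the matched pairing gives gradient (p - y) * x."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# linearly separable toy data: label 1 iff x > 0
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = train(data)
```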
Lecture 3: Learning theory 1
  Statistics of learning, decomposition of expected hypothesis error
  Bias and variance
  Learning curves, overfitting curves
  Model selection: penalization, cross validation, metric distance
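The cross-validation idea for model selection can be sketched generically (illustrative only; the helper names and the trivial "mean model" example are my own):

```python
import random

def k_fold_score(data, fit, loss, k=5, seed=0):
    """Estimate generalization error by k-fold cross validation:
    train on k-1 folds, evaluate on the held-out fold, average."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        test = folds[i]
        model = fit(train)
        scores.append(sum(loss(model, ex) for ex in test) / len(test))
    return sum(scores) / k

# toy use: the "model" is just the training mean, scored by squared error
fit = lambda d: sum(d) / len(d)
loss = lambda m, x: (x - m) ** 2
err = k_fold_score([1.0] * 10, fit, loss)   # constant data -> zero error
```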
Part 2: Learning to make exact predictions ("classification")
Lecture 4: Learning real classifiers
  Linear discriminants, maximum margin classifiers
  Norms, soft margins
  Dual maximum margin methods, support vectors
  Generalized linear discriminants, kernels, support vector machines (SVMs)
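The kernel idea behind generalized linear discriminants and SVMs can be illustrated with one of the standard kernels (a sketch, with toy data of my own choosing):

```python
def poly_kernel(x, z, d=2):
    """Degree-d polynomial kernel: k(x, z) = (1 + x.z)^d.
    Equals an inner product in a higher-dimensional feature space
    without ever constructing that space explicitly."""
    return (1 + sum(a * b for a, b in zip(x, z))) ** d

# Gram matrix for a small sample; dual methods like SVMs only need K
X = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
K = [[poly_kernel(x, z) for z in X] for x in X]
```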
Lecture 5: Learning propositional classifiers
  Boolean formulae, decision trees, linear discriminants, neural networks
  Minimizing error, minimizing size: rationale
  Consistent vs. noisy case
  Computational complexity, greedy heuristics, approximations
Lecture 6: Learning theory 2
  Worst-case analysis: expected error, tail probabilities, PAC learning
  VC dimension, upper bounds, lower bounds
  Uniform convergence
  Fat-shattering dimension for regression
  Data-dependent estimates of generalization error for SVMs
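One of the simplest PAC results in this area, the sample-size bound for a finite hypothesis class in the realizable (consistent) case, can be evaluated directly; the numbers below are an arbitrary worked example:

```python
import math

def pac_sample_size(h_size, eps, delta):
    """Standard PAC bound for a finite hypothesis class, realizable case:
    m >= (1/eps) * (ln|H| + ln(1/delta)) examples suffice so that, with
    probability >= 1 - delta, every consistent hypothesis has true
    error at most eps."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# e.g. |H| = 2^20 hypotheses, accuracy 0.05, confidence 0.99
m = pac_sample_size(h_size=2 ** 20, eps=0.05, delta=0.01)
```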
Assignment 1: 25% of final grade. Covers Lectures 1-6.
Due in class, Monday, Oct 29.
Part 3: Learning with probability models
Lecture 7: Probability models
  Joint and conditional models, Bayes rule
  Optimal classification and prediction
  Naive Bayes classification
  Multivariate Gaussian prediction
  Bayesian networks
  Efficient marginalization and conditioning in trees
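Naive Bayes classification can be sketched in a few lines, assuming binary features and add-one smoothing (illustrative code, with toy data of my own):

```python
import math
from collections import defaultdict

def train_nb(examples, n_feats):
    """Count-based training for naive Bayes with binary features."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: [0] * n_feats)   # count of feature=1 per class
    for feats, y in examples:
        class_counts[y] += 1
        for i, f in enumerate(feats):
            feat_counts[y][i] += f
    return class_counts, feat_counts, len(examples)

def predict_nb(model, feats):
    """argmax_y log P(y) + sum_i log P(f_i | y), features assumed
    conditionally independent given the class; add-one smoothing."""
    class_counts, feat_counts, n = model
    best, best_score = None, float("-inf")
    for y, cy in class_counts.items():
        score = math.log(cy / n)
        for i, f in enumerate(feats):
            p1 = (feat_counts[y][i] + 1) / (cy + 2)
            score += math.log(p1 if f else 1.0 - p1)
        if score > best_score:
            best, best_score = y, score
    return best

examples = [((1, 0), "a")] * 3 + [((0, 1), "b")] * 3
model = train_nb(examples, n_feats=2)
```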
Lecture 8: Maximum likelihood learning
  Maximum likelihood: Bernoulli, Gaussian, multivariate Gaussian, Bayesian networks
  Maximizing joint versus conditional likelihood
  Maximum likelihood with missing components: EM
  EM increases likelihood
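The simplest maximum likelihood estimators listed here have closed forms (a sketch; note ML gives the biased, divide-by-n variance):

```python
def mle_bernoulli(xs):
    """ML estimate of a Bernoulli parameter: the empirical frequency of 1s."""
    return sum(xs) / len(xs)

def mle_gaussian(xs):
    """ML estimates for a univariate Gaussian: the sample mean and the
    biased (divide-by-n) sample variance that maximum likelihood yields."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var
```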
Lecture 9: Bayesian learning
  Bayesian learning on joint versus conditional models
  Prior, posterior, prediction, MAP approximation
  Conjugate priors, Beta-Bernoulli, Gaussian-Gaussian
  Bayesian learning of Bayesian networks
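The Beta-Bernoulli conjugate pair makes the Bayesian update a matter of bookkeeping: a Beta(a, b) prior plus observed counts gives a Beta posterior with the counts added in. A minimal sketch:

```python
def beta_bernoulli_update(a, b, xs):
    """Conjugate update: Beta(a, b) prior on a Bernoulli parameter;
    after observing xs (0/1 outcomes) the posterior is
    Beta(a + #ones, b + #zeros)."""
    ones = sum(xs)
    return a + ones, b + len(xs) - ones

def predictive_p1(a, b):
    """Posterior predictive probability that the next outcome is 1."""
    return a / (a + b)

# uniform Beta(1, 1) prior, then observe 1, 1, 0, 1  ->  Beta(4, 2)
a, b = beta_bernoulli_update(1, 1, [1, 1, 0, 1])
```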
Assignment 2: 25% of final grade. Covers Lectures 7-9.
Due in class, Monday, Nov 26.
Part 4: Other function learning techniques
Lecture 10: Ensemble learning methods
  Bayesian model averaging
  Bagging
  Boosting, relationship to SVMs, generalization theory
  Stacking
  Convergent versus divergent ensemble methods
  Kernel methods, Gaussian processes
Lecture 11: On-line learning
  Approximating the best expert, relative loss bounds
  Approximating the best linear discriminant: Perceptron and Winnow
  Worst-case loss bound analysis
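The Perceptron algorithm itself fits in a few lines: on every mistake, add y*x to the weight vector (labels in {-1, +1}); its mistake bound on separable data depends on the margin. A minimal sketch with toy data of my own:

```python
def perceptron(data, epochs=10):
    """Perceptron updates: whenever y * (w . x) <= 0, set w <- w + y*x.
    Converges on linearly separable data (mistake bound ~ (R/margin)^2)."""
    d = len(data[0][0])
    w = [0.0] * (d + 1)                      # last coordinate acts as the bias
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]
            if y * sum(wi * xi for wi, xi in zip(w, xb)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, xb)]
    return w

def classify(w, x):
    xb = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else -1

# separable toy data: label is the sign of the second coordinate
data = [((0.0, 1.0), 1), ((1.0, 2.0), 1), ((0.0, -1.0), -1), ((-1.0, -2.0), -1)]
w = perceptron(data)
```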