Fall 2001
Department of Computer Science
University of Waterloo
Instructor: Dale Schuurmans, DC1310, x3005, dale@cs.uwaterloo.ca
| Lecture 0 | Introduction |
| | Fundamental types of learning problems |
| | Examples of learning problems |
| | Primary task: learning a function from examples |
Part 1: Learning to make approximate predictions ("regression")
| Lecture 1 | Learning real predictors 1 |
| | Loss functions, linear regression (sketch below) |
| | Generalized linear regression |
| | Learning local representations |
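For orientation, a minimal sketch of the least-squares fit named in this lecture's topics, assuming numpy is available; the toy data and variable names are illustrative:

    import numpy as np

    def least_squares_fit(X, y):
        """Return weights w minimizing the squared loss ||X w - y||^2."""
        # Solves the normal equations X^T X w = X^T y via a least-squares solver.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    # Toy usage: fit y ~ w0 + w1*x.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 2.9, 5.1, 7.2])
    X = np.column_stack([np.ones_like(x), x])   # prepend a bias column
    w = least_squares_fit(X, y)                 # roughly [0.93, 2.08]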
| Lecture 2 | Learning real predictors 2 |
| | Neural networks |
| | Heuristic search: gradient descent |
| | Backpropagation, local minima, matching loss functions (sketch below) |
| | Regularization |
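A minimal sketch of gradient descent with backpropagation for a one-hidden-layer network under squared loss; numpy is assumed, and the step size and variable names are illustrative choices, not anything prescribed by the lecture:

    import numpy as np

    def backprop_step(params, x, y, lr=0.01):
        """One gradient-descent step for a 1-hidden-layer net with tanh units
        and squared loss; x is a 1-d input vector, y a scalar target."""
        W1, b1, w2, b2 = params
        # Forward pass.
        h = np.tanh(W1 @ x + b1)               # hidden activations
        y_hat = w2 @ h + b2                    # scalar prediction
        # Backward pass (chain rule).
        delta = y_hat - y                      # d loss / d y_hat for 0.5*(y_hat - y)^2
        grad_w2 = delta * h
        grad_b2 = delta
        grad_h = delta * w2
        grad_pre = grad_h * (1.0 - h ** 2)     # back through tanh
        grad_W1 = np.outer(grad_pre, x)
        grad_b1 = grad_pre
        # Gradient-descent update.
        return (W1 - lr * grad_W1, b1 - lr * grad_b1,
                w2 - lr * grad_w2, b2 - lr * grad_b2)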
| Lecture 3 | Learning theory 1 |
| | Statistics of learning, decomposition of expected hypothesis error |
| | Bias and variance |
| | Learning curves, overfitting curves |
| | Model selection: penalization, cross validation, metric distance (cross-validation sketch below) |
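A minimal sketch of k-fold cross validation as a model-selection criterion; numpy is assumed, and `fit` and `loss` are placeholder callables supplied by the user:

    import numpy as np

    def cross_val_score(fit, loss, X, y, k=5):
        """Average held-out loss of a learner over k folds.
        `fit(X, y)` returns a predictor f; `loss(f, X, y)` returns a scalar."""
        folds = np.array_split(np.random.permutation(len(y)), k)
        scores = []
        for held_out in folds:
            train = np.setdiff1d(np.arange(len(y)), held_out)
            f = fit(X[train], y[train])
            scores.append(loss(f, X[held_out], y[held_out]))
        return np.mean(scores)

    # Model selection: choose the model class (e.g. polynomial degree)
    # with the smallest cross-validation score.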
Part 2: Learning to make exact predictions ("classification")
| Lecture 4 | Learning real classifiers |
| | Linear discriminants, maximum margin classifiers |
| | Norms, soft margins |
| | Dual maximum margin methods, support vectors |
| | Generalized linear discriminants, kernels, support vector machines (SVMs) (sketch below) |
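The lecture develops the dual quadratic-programming view of maximum margin classifiers; purely as a sketch of the same soft-margin objective, here is a stochastic subgradient descent on the hinge loss (numpy assumed; constants are illustrative):

    import numpy as np

    def svm_subgradient(X, y, C=1.0, lr=0.01, epochs=100):
        """Soft-margin linear SVM trained by subgradient descent on
        0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b)), with y_i in {-1, +1}.
        The dual/QP formulation covered in lecture targets the same optimum."""
        n, d = X.shape
        w, b = np.zeros(d), 0.0
        for _ in range(epochs):
            for i in np.random.permutation(n):
                margin = y[i] * (X[i] @ w + b)
                if margin < 1:                       # point violates the margin
                    w -= lr * (w - C * y[i] * X[i])
                    b += lr * C * y[i]
                else:
                    w -= lr * w                      # only the regularizer acts
        return w, b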
| Lecture 5 | Learning propositional classifiers |
| | Boolean formulae, decision trees, linear discriminants, neural networks |
| | Minimizing error, minimizing size: rationale |
| | Consistent vs. noisy case |
| | Computational complexity, greedy heuristics, approximations (stump sketch below) |
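A minimal sketch of the greedy split selection behind decision-tree heuristics: an exhaustive search for the best single-feature threshold (a "decision stump"), which tree learners apply recursively; numpy assumed, labels taken as +1/-1:

    import numpy as np

    def best_stump(X, y):
        """Greedy one-split decision stump: pick the (feature, threshold, sign)
        triple minimizing training classification error, with y in {-1, +1}."""
        n, d = X.shape
        best = (None, None, 1, n + 1)            # (feature, threshold, sign, errors)
        for j in range(d):
            for t in np.unique(X[:, j]):
                pred = np.where(X[:, j] <= t, -1, 1)
                for sign in (1, -1):
                    errors = np.sum(sign * pred != y)
                    if errors < best[3]:
                        best = (j, t, sign, errors)
        return best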
| Lecture 6 | Learning theory 2 |
| | Worst-case analysis: expected error, tail probabilities, PAC learning (example bound below) |
| | VC dimension, upper bounds, lower bounds |
| | Uniform convergence |
| | Fat-shattering dimension for regression |
| | Data-dependent estimates of generalization error for SVMs |
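As a concrete instance of the tail-probability style of argument in this lecture, the standard Hoeffding bound and its union-bound consequence for a finite hypothesis class (the VC-dimension results replace the ln|H| term with a combinatorial complexity term):

    % Hoeffding's inequality for a fixed hypothesis h with true error err(h)
    % and empirical error \widehat{err}(h) on m i.i.d. examples:
    \Pr\bigl[\,|\widehat{err}(h) - err(h)| \ge \epsilon\,\bigr] \le 2e^{-2m\epsilon^{2}}
    % Union bound over a finite class H: with probability at least 1 - \delta,
    % simultaneously for every h \in H,
    err(h) \le \widehat{err}(h) + \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}}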
| Assignment 1: | 25% of final grade. Covers Lectures 1-6. |
| | Due in class, Monday, Oct 29. |
Part 3: Learning with probability models
| Lecture 7 | Probability models |
| | Joint and conditional models, Bayes rule |
| | Optimal classification and prediction |
| | Naive Bayes classification (sketch below) |
| | Multivariate Gaussian prediction |
| | Bayesian networks |
| | Efficient marginalization and conditioning in trees |
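A minimal sketch of naive Bayes classification via Bayes rule, assuming binary features and numpy; the data-structure choices (dicts keyed by class label) are illustrative:

    import numpy as np

    def naive_bayes_predict(x, class_priors, cond_probs):
        """Classify a binary feature vector x (numpy 0/1 array) by Bayes rule
        under the naive independence assumption P(x|c) = prod_j P(x_j|c).
        class_priors[c] = P(c); cond_probs[c][j] = P(x_j = 1 | c)."""
        best_class, best_log_post = None, -np.inf
        for c, prior in class_priors.items():
            p = cond_probs[c]
            # log P(c) + sum_j log P(x_j | c), proportional to the log posterior
            log_post = np.log(prior) + np.sum(
                x * np.log(p) + (1 - x) * np.log(1 - p))
            if log_post > best_log_post:
                best_class, best_log_post = c, log_post
        return best_class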
| Lecture 8 | Maximum likelihood learning |
| | Maximum likelihood: Bernoulli, Gaussian, multivariate Gaussian, Bayesian networks (worked example below) |
| | Maximizing joint versus conditional likelihood |
| | Maximum likelihood with missing components (EM) |
| | EM increases likelihood |
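As a worked example of the maximum-likelihood recipe, the Bernoulli case; the Gaussian and Bayesian-network cases in this lecture follow the same pattern of maximizing the log likelihood:

    % Bernoulli likelihood for data x_1, ..., x_m \in \{0,1\} with k = \sum_i x_i:
    L(\theta) = \theta^{k}(1-\theta)^{m-k}, \qquad
    \log L(\theta) = k\log\theta + (m-k)\log(1-\theta)
    % Setting the derivative to zero:
    \frac{d}{d\theta}\log L(\theta) = \frac{k}{\theta} - \frac{m-k}{1-\theta} = 0
    \;\;\Longrightarrow\;\; \hat{\theta}_{ML} = \frac{k}{m}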
| Lecture 9 | Bayesian learning |
| | Bayesian learning on joint versus conditional models |
| | Prior, posterior, prediction, MAP approximation |
| | Conjugate priors, Beta-Bernoulli, Gaussian-Gaussian (worked example below) |
| | Bayesian learning of Bayesian networks |
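A worked instance of conjugacy for the Beta-Bernoulli pair listed above, where a and b are the prior hyperparameters and k is the number of ones observed in m trials:

    % Prior Beta(a,b) on the Bernoulli parameter \theta; data D with k ones in m trials.
    p(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}, \qquad
    p(\theta \mid D) \propto \theta^{a+k-1}(1-\theta)^{b+m-k-1}
      \;=\; \mathrm{Beta}(a+k,\; b+m-k)
    % Posterior predictive probability and MAP estimate:
    \Pr(x_{new}=1 \mid D) = \frac{a+k}{a+b+m}, \qquad
    \hat{\theta}_{MAP} = \frac{a+k-1}{a+b+m-2}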
| Assignment 2: | 25% of final grade. Covers Lectures 7-9. |
| | Due in class, Monday, Nov 26. |
Part 4: Other function learning techniques
| Lecture 10 | Ensemble learning methods |
| | Bayesian model averaging |
| | Bagging (sketch below) |
| | Boosting, relationship to SVMs, generalization theory |
| | Stacking |
| | Convergent versus divergent ensemble methods |
| | Kernel methods, Gaussian processes |
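A minimal sketch of bagging, assuming numpy; `fit` is a placeholder learner that returns a callable predictor:

    import numpy as np

    def bagged_predict(fit, X, y, x_new, n_models=25):
        """Bagging: train `fit` on bootstrap resamples of (X, y) and average
        the resulting predictions at x_new."""
        n = len(y)
        preds = []
        for _ in range(n_models):
            idx = np.random.randint(0, n, size=n)   # sample n points with replacement
            model = fit(X[idx], y[idx])
            preds.append(model(x_new))
        # For regression, average; for classification, a majority vote is used instead.
        return np.mean(preds)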
| Lecture 11 | On-line learning |
| | Approximating the best expert, relative loss bound |
| | Approximating the best linear discriminant: Perceptron and Winnow (sketches below) |
| | Worst-case loss bound analysis |
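Minimal sketches of the Perceptron (additive updates) and Winnow (multiplicative updates) mentioned above; numpy is assumed, and the Winnow threshold and learning rate are illustrative choices rather than the lecture's specific settings:

    import numpy as np

    def perceptron(X, y, epochs=10):
        """Classic Perceptron: additive mistake-driven updates, y_i in {-1, +1}."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in range(len(y)):
                if y[i] * (w @ X[i]) <= 0:     # mistake
                    w += y[i] * X[i]           # additive update
        return w

    def winnow(X, y, eta=2.0, epochs=10):
        """Winnow: multiplicative updates; X assumed boolean (0/1), y in {-1, +1}."""
        d = X.shape[1]
        w, threshold = np.ones(d), d / 2.0
        for _ in range(epochs):
            for i in range(len(y)):
                pred = 1 if w @ X[i] >= threshold else -1
                if pred != y[i]:
                    w *= eta ** (y[i] * X[i])  # promote or demote active features
        return w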