CS 486/686 - Introduction to Artificial Intelligence

Spring 2003
School of Computer Science
University of Waterloo


Instructors: Relu Patrascu DC2127, x3299 rpatrasc@cs.uwaterloo.ca
Dale Schuurmans DC1310, x3005 dale@cs.uwaterloo.ca

Course text

Supplementary readings

These books will be placed on reserve in the Davis Center Library for 3 hour loans.

Some background readings on Cognitive Science

These are all really interesting books about the innate intelligence of the human species---read any of them if you get the chance.



Study guide


Lecture 1: Introduction to AI

Concepts: Systems that reason, know, interpret, behave, learn.

Readings: Russell & Norvig (Chaps 1, 2), Dean, Allen, Aloimonos (Chap 1).


Part 1: Reasoning

Lecture 2: Automating reasoning: formal inference

Concepts: Propositional language (atomic & compound propositions, logical symbols).
Formal inference rules, formal inference system.
Derivation (proof), closure, monotonicity, contradiction, tautology, question answering.
Examples of formal inference systems: natural deduction, resolution (conjunctive normal form).
Nonstandard inference systems: relevance logic.
Conflict resolution, nonmonotonicity.

Readings: Burris (Chap 1, 2), Dean, Allen, Aloimonos (Chap 3), Russell & Norvig (Sect 7.4-7.5).
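As a concrete illustration (not from the course materials), the resolution rule and its saturation loop can be sketched in a few lines of Python; the clause representation here, frozensets of literals with negation written "~", is an assumption of the example:

```python
# Propositional resolution on CNF clauses; each clause is a frozenset of
# literals, with a negated literal written "~P".

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents of two clauses (one per complementary pair)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

def resolution_refutes(clauses):
    """Saturate under resolution; True iff the empty clause is derivable,
    i.e. the clause set is contradictory."""
    known = set(clauses)
    while True:
        new = set()
        for a in known:
            for b in known:
                if a == b:
                    continue
                for r in resolve(a, b):
                    if not r:            # derived the empty clause
                        return True
                    new.add(r)
        if new <= known:                 # closure reached, no refutation
            return False
        known |= new

# KB = {P, P -> Q, ~Q} is inconsistent; resolution derives the empty clause.
kb = [frozenset({"P"}), frozenset({"~P", "Q"}), frozenset({"~Q"})]
print(resolution_refutes(kb))  # True
```

Question answering by refutation works the same way: to ask whether KB entails Q, add the negation of Q and check for a derivation of the empty clause.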

Lecture 3: Correct and exhaustive reasoning

Concepts: Interpretation (basic value assignments to atomic propositions, e.g. truth assignments).
Evaluating compound propositions (recursive evaluation using truth tables of logical symbols).
Satisfiability, unsatisfiability (inconsistency), falsifiability, unfalsifiability (validity), entailment.
Correctness of a formal inference system w.r.t. evaluations (soundness).
Exhaustiveness of a formal inference system w.r.t. evaluations (completeness).
Resolution system: correctness and refutation-exhaustiveness w.r.t. truth evaluations.
Alternative approach to question answering: direct search for satisfying assignments (SAT).

Readings: Burris (Chap 2), Dean, Allen, Aloimonos (Chap 3), Russell & Norvig (Chap 7, 9*), Genesereth & Nilsson (Chap 3*, 4*). (* = ignoring material on first order variables and quantification)
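The "direct search for satisfying assignments" alternative can be sketched by brute-force enumeration of truth assignments (an illustrative toy, again assuming literals written with "~" for negation):

```python
from itertools import product

def satisfiable(clauses, variables):
    """Naive SAT: enumerate every truth assignment; return a satisfying
    assignment, or None if the clause set is unsatisfiable."""
    def holds(lit, assign):
        return not assign[lit[1:]] if lit.startswith("~") else assign[lit]
    for values in product([True, False], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(holds(l, assign) for l in clause) for clause in clauses):
            return assign
    return None

# Question answering via SAT: KB |= Q iff KB together with ~Q is unsatisfiable.
kb = [["P"], ["~P", "Q"]]
print(satisfiable(kb + [["~Q"]], ["P", "Q"]))  # None, so KB entails Q
```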

Lecture 4: Constraint satisfaction search

Concepts: Constraint satisfaction problems: Cartesian product search space + constraints (e.g. SAT search).
Unsystematic (incomplete) search strategies: global random search, local random walk.
Speeding up unsystematic search: heuristic function, local greedy search (hill climbing), simulated annealing.
Systematic (complete) search: enumeration.
Speeding up systematic search: backtracking in partial assignments (pruning), variable and value selection heuristics.

Readings: Russell & Norvig (Chap 5), Dean, Allen, Aloimonos (Sect 4.4).
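Systematic search with backtracking in partial assignments can be sketched as follows; the toy problem (3-colouring a triangle graph) and the constraint encoding are assumptions of the example:

```python
def different(a, b):
    """Constraint a != b, vacuously satisfied until both are assigned."""
    return lambda asg: a not in asg or b not in asg or asg[a] != asg[b]

def backtrack(assignment, variables, domains, constraints):
    """Depth-first search over partial assignments, pruning any branch
    that already violates a constraint."""
    if len(assignment) == len(variables):
        return dict(assignment)
    var = next(v for v in variables if v not in assignment)   # variable selection
    for value in domains[var]:                                # value selection
        assignment[var] = value
        if all(c(assignment) for c in constraints):           # prune early
            result = backtrack(assignment, variables, domains, constraints)
            if result:
                return result
        del assignment[var]                                   # backtrack
    return None

domains = {v: ["red", "green", "blue"] for v in "ABC"}
constraints = [different("A", "B"), different("B", "C"), different("A", "C")]
solution = backtrack({}, list("ABC"), domains, constraints)
```

The variable- and value-selection heuristics from lecture slot in at the two marked lines; here both are just "first untried".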

Lecture 5: Problem solving search

Concepts: Search space generated by objects and operators on objects (implicit graph).
Basic search strategies: depth first search, breadth first search, iterative deepening search.
Heuristic search: heuristic evaluation function, best first search.
Additional criteria: shortest solution sequence.
Extra pruning information: action preconditions, redundant action sequences (revisiting states).
Heuristic best first search: admissible heuristic, A* search, IDA* search.
Forward versus backward search (i.e. initial towards goal vs. goal back to initial).

Readings: Russell & Norvig (Sect 3.2-3.7, 4.1-4.2), Dean, Allen, Aloimonos (Sect 4.1-4.3).
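Heuristic best first search with an admissible heuristic (A*) can be sketched on a small example; the 3x3 grid, unit step costs, and Manhattan-distance heuristic are illustrative assumptions:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Best-first search on f = g + h; with an admissible h this returns
    a least-cost path from start to goal."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):   # prune revisited states
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None

def grid_neighbors(node):
    """4-connected moves on a 3x3 grid, each of cost 1."""
    x, y = node
    return [((x + dx, y + dy), 1)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < 3 and 0 <= y + dy < 3]

def manhattan(n):
    return abs(2 - n[0]) + abs(2 - n[1])   # admissible for goal (2, 2)

path = a_star((0, 0), (2, 2), grid_neighbors, manhattan)
```

Setting h to zero everywhere recovers uniform-cost (breadth-first-like) search, which is one way to see why an admissible heuristic only speeds things up.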

Lecture 6: Automated planning

Concepts: States and actions with represented structure instead of just atomic units.
State representation: truth assignment to atomic propositions.
Goal representation: conjunctive propositional formula.
Action representation: preconditions = conjunction, effects = conjunction + frame assumption.
State progression (strongest postconditions) vs. goal regression (weakest preconditions).
Applying forward vs. backward problem solving search, analyzing branching factors.

Readings: Dean, Allen, Aloimonos (Chap 7), Russell & Norvig (Chap 11).

Lecture 7: Planning algorithms

Concepts:
Exploiting structure in search: heuristics, approximate divide and conquer.
Partial order plans, searching with partial order plans (POP algorithm).

Readings: Weld, AI Magazine 15(4),
Also see: "Recent advances in AI planning" by Weld for a survey of more recent developments.

Lecture 8: General first order representation

Concepts: First order language primitives: constants, functions, predicates, logical connectives; variables, quantifiers.
First order language composites: terms, ground terms, open terms;
atomic formulae, compound formulae, quantified formulae;
atomic sentence, compound sentence, quantified sentence;
free variable, bound variable.
Variable substitution: term and formula specialization, formula unification, most general unifier.
Formal inference rules: specialization (factoring) and resolution.
First order conjunctive normal form.

Readings: Genesereth & Nilsson (Chap 4), Russell & Norvig (Chap 8, 9).
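Formula unification can be sketched in Python; the representation (compound terms as tuples, variables as strings beginning with "?") is an assumption of the example, and the occurs check is omitted for brevity:

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    """Follow variable bindings in substitution s to a representative."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(x, y, s=None):
    """Return a most general unifier of x and y (as a substitution dict),
    or None if they do not unify. No occurs check (a known omission)."""
    if s is None:
        s = {}
    x, y = walk(x, s), walk(y, s)
    if x == y:
        return s
    if is_var(x):
        return {**s, x: y}
    if is_var(y):
        return {**s, y: x}
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):          # unify argument lists left to right
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None

# Knows(John, ?x) unifies with Knows(?y, Mother(?y)):
# binds ?y to John and ?x to Mother(?y).
mgu = unify(("Knows", "John", "?x"), ("Knows", "?y", ("Mother", "?y")))
```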

Lecture 9: Planning in logic, First order inference

Concepts: Primitives: situations, actions, fluents, special "do" function.
Specifying initial situation and goal condition.
Specifying actions: preconditions, after effects, effect axiom.
Using resolution to implement planning, using variable substitutions to extract plan.
Blocks world example.
Frame problem, successor state axiomatization.

Correct & exhaustive first order reasoning:
First order semantics: interpretation I=(D,C,F,R).
D = domain, C maps constants, F maps function symbols, R maps predicates.
Evaluating ground sentences without variables and quantifiers.
Variable assignment V maps variables.
Evaluating formulae with variables and quantifiers.
Interpretation satisfies, falsifies a sentence.
Sentence meaning: restriction on satisfying interpretations (possible states of affairs).
Sentence satisfiable, falsifiable, unsatisfiable (inconsistent), unfalsifiable (valid).
Sentence semantically entails another sentence.
Correctness (soundness) of resolution+factoring w.r.t. interpretations.
Exhaustiveness (completeness) of resolution+factoring w.r.t. interpretations.

Readings: Russell & Norvig (Sect 10.3), Genesereth & Nilsson (Sect 2.3, 4.10), Burris (Chap 4, esp. 4.10).


Part 2: Knowing

Reading *: Knowledge representation

Concepts: Knowledge engineering process (5 steps), ontology.
General purpose (common sense) knowledge.

Readings: Russell & Norvig (Chap 10).


Part 3: Interpreting

Lecture 10: Introduction to interpretation systems

Concepts: Interpreting: plausible inference of hidden semantic structure from observable inputs.
Specifying plausible inference: distinct from logical inference: monotonicity vs. nonmonotonicity.
AI applications: natural language interpretation, artificial perception (vision, hearing, etc.), expert systems.
Interpretation systems: represent knowledge about semantic structures, observables, and the relationship between them.
Interpretation systems: encode principles of evidence combination.
Languages for specifying plausible inference: rule based systems, default "logic", fuzzy "logic", Dempster-Shafer theory, probability theory.
Representation: forward generative models (distribution over hidden truths, conditional distribution over observables).
Principle of evidence combination: Bayesian inference.

Readings: Dean, Allen, Aloimonos (Sect 3.7), Russell & Norvig (Sect 14.7).

Lecture 11: Probability modelling

Concepts: Probability: distribution over possible states of affairs (e.g. assignments to propositional variables).
Joint distribution over a set of random variables.
Representation, simulation, evaluation (probability of a complete configuration).
Inference: marginalization, conditioning, completion.
Naive (enumerative) inference algorithms.
Structured joint distributions: independence, conditional independence.

Readings: Russell & Norvig (Chap 14), Dean, Allen, Aloimonos (Sect 8.2).
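Naive (enumerative) inference over a full joint distribution can be sketched directly; the two-variable joint below (Cavity, Toothache) and its numbers are illustrative assumptions:

```python
# Full joint over (Cavity, Toothache) as a table: event tuple -> probability.
joint = {
    (True, True): 0.12, (True, False): 0.08,
    (False, True): 0.08, (False, False): 0.72,
}

def marginal(joint, index, value):
    """P(variable at position `index` == value), summing out the rest."""
    return sum(p for event, p in joint.items() if event[index] == value)

def conditional(joint, qi, qv, ei, ev):
    """P(var qi == qv | var ei == ev) by the ratio of two enumerative sums."""
    num = sum(p for e, p in joint.items() if e[qi] == qv and e[ei] == ev)
    return num / marginal(joint, ei, ev)

p_cavity = marginal(joint, 0, True)                 # ~ 0.2
p_cavity_given_ache = conditional(joint, 0, True, 1, True)  # ~ 0.6
```

The cost of this approach is the point of the lecture: the table, and every sum, is exponential in the number of variables, which motivates the structured representations of the next lecture.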

Lecture 12: Structured probability models

Concepts: Representing structured joint distributions: Bayesian networks.
Bayes nets: variables, arcs, conditional probability tables, conditional independence assumptions.
Basic representation, simulation, and evaluation.
Naive inference: marginalization, conditioning, completion.
Application to interpretation problems: generative models, interpretation via Bayesian inference.

Readings: Russell & Norvig (Chap 14, 15), Dean, Allen, Aloimonos (Sect 8.3).

Lecture 13: Efficient probabilistic inference

Concepts: Tree structured Bayesian networks.
Factor graphs (variables, functions (C.P.T.s), arcs).
Mapping Bayesian networks to factor graphs.
Efficient message passing algorithm on tree structured factor graphs.
Application to: marginalization, conditioning, completion.

Readings: Dean, Allen, Aloimonos (Sect 8.3), Russell & Norvig (Sect 14.4).

Lecture 14: Inference in complex models

Concepts: General Bayesian networks (not trees---i.e. networks that contain loops).
Computing marginals and conditionals (even approximately) is NP-hard in general Bayes nets.
Exact methods for coping with loops: combining variables, choosing a good variable elimination order.
Monte Carlo approximation for marginals: forward sampling.
Monte Carlo approximation for conditionals: logic sampling, importance sampling (likelihood weighting).

Readings: Russell & Norvig (Sect 14.5), Dean, Allen, Aloimonos (Sect 8.3).
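Likelihood weighting can be sketched on a two-node net Rain -> WetGrass; the CPT numbers are illustrative assumptions, not from the course:

```python
import random

P_RAIN = 0.3
P_WET_GIVEN = {True: 0.9, False: 0.1}   # P(WetGrass=true | Rain)

def likelihood_weighting(n_samples, seed=0):
    """Estimate P(Rain=true | WetGrass=true): sample the non-evidence
    variable forward, weight each sample by the likelihood of the evidence."""
    rng = random.Random(seed)
    w_true = w_all = 0.0
    for _ in range(n_samples):
        rain = rng.random() < P_RAIN     # forward-sample Rain from its prior
        weight = P_WET_GIVEN[rain]       # evidence WetGrass=true is not sampled
        w_all += weight
        if rain:
            w_true += weight
    return w_true / w_all

est = likelihood_weighting(100_000)
# Exact answer by Bayes rule: 0.3*0.9 / (0.3*0.9 + 0.7*0.1) = 0.27/0.34 ~ 0.794
```

Unlike logic sampling, no sample is ever rejected; the weights do the work of conditioning, which is why this scales better when the evidence is unlikely.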

Lecture 15: Interpreting senses (perception)

Concepts: External sensory modalities: sight, sound, touch, smell, taste.
Computer vision: tasks: recognition (shapes, symbols, objects, scenes), motion, stereo.
Computer vision: approaches: explicit bottom up processing vs. generative probability models.
Implementing interpreters using probabilistic inference.

Readings: Russell & Norvig (Chap 24), Dean, Allen, Aloimonos (Chap 9).

Lecture 16: Interpreting natural language

Concepts: Mapping strings of words (characters) to meaning.
Representing meaning (semantics): logic.
Syntax: legal input strings.
Words: morphology.
Word classes: noun, verb, adjective, adverb, preposition, article, pronoun, connective.
Types of word classes: open/closed, function/content words.
Nested phrase structure (syntax), head words, phrase types (NP, VP, AdjP, AdvP).
Context free grammars: parsing.
Need for additional "features" to coordinate subtrees.
Semantic interpretation: involves plausible inference to resolve ambiguities.
Parsing ambiguity, co-reference ambiguity, semantic ambiguity.

Readings: Russell & Norvig (Chap 22, 23), Dean, Allen, Aloimonos (Sect 10.1, 10.6-10.9).
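Context free parsing can be illustrated with one standard tabular recognizer (CYK, which assumes a grammar in Chomsky normal form); the toy grammar and lexicon here are assumptions of the example:

```python
# Binary rules of a tiny CNF grammar, plus a lexicon mapping words to
# preterminal categories.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "saw": {"V"}}

def cyk(words):
    """CYK recognition: table[i][j] holds every category that can derive
    words[i:j]; the sentence parses iff S spans the whole input."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # try every split point
                for lhs, rules in GRAMMAR.items():
                    for b, c in rules:
                        if b in table[i][k] and c in table[k][j]:
                            table[i][j].add(lhs)
    return "S" in table[0][n]

print(cyk("the dog saw the cat".split()))  # True
```

Keeping back-pointers instead of bare categories would recover the parse trees themselves, including the multiple trees that signal parsing ambiguity.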


Part 4: Behaving

Lecture 17: Optimal behavior: Decision theory

Concepts: Earlier lectures 6 & 7 (planning) had assumed deterministic actions.
Nondeterministic actions (i.e. uncertainty in state dynamics) require conditional plans: tree of actions for each possible outcome.
Behavior policy ("controller"): maps states to actions.
Uncertainty in state dynamics: assume stochastic state transition model.
Assume can completely identify current state (i.e. "fully accessible," or "full information").
Assume quantitative goals: reward values over possible states.
Single step problem: optimize expected immediate reward, tradeoff between certainty and reward.
Multi-step problem: compute optimal policy, tradeoff short term vs. long term reward ("Markov decision process").
Simple case: acyclic state transition model: utility function, compute optimal policy with dynamic programming.

Readings: Russell & Norvig (Chap 12, Sect 16.1), Dean, Allen, Aloimonos (Sect 8.4).

Lecture 18: Optimal sequential decision making

Concepts: General case: cyclic state transition model.
Value of a state: expected discounted reward over an infinite horizon.
Policy evaluation: compute value function for a fixed policy.
Compute optimal policy using policy iteration.
Compute optimal policy using value iteration.

Readings: Russell & Norvig (Chap 17).
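Value iteration can be sketched on a tiny two-state MDP; the transition model, rewards, and discount factor below are illustrative assumptions:

```python
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["stay", "go"]
# T[s][a] = list of (next_state, probability); R[s] = reward received in s.
T = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.9), ("s0", 0.1)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s1", 1.0)]},
}
R = {"s0": 0.0, "s1": 1.0}

def value_iteration(eps=1e-8):
    """Iterate the Bellman optimality backup until values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        V2 = {s: R[s] + GAMMA * max(sum(p * V[s2] for s2, p in T[s][a])
                                    for a in ACTIONS)
              for s in STATES}
        if max(abs(V2[s] - V[s]) for s in STATES) < eps:
            return V2
        V = V2

V = value_iteration()
# Greedy policy w.r.t. the converged values.
policy = {s: max(ACTIONS, key=lambda a: sum(p * V[s2] for s2, p in T[s][a]))
          for s in STATES}
```

Here s1 is absorbing with reward 1, so its value converges to 1/(1-GAMMA) = 10, and the greedy policy at s0 is "go". Policy iteration alternates the same two steps (evaluation, then greedy improvement) on whole policies.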

Lecture 19: Optimal behavior: Game theory

Concepts: Quantitative goals: reward values over possible states.
Uncertainty in state dynamics: assume adversarial state transition model.
Case 1: Assume can completely identify current state (i.e. "fully accessible," or "full information"), e.g., chess, checkers, go.
Single step problem: maximize minimum immediate reward (maxi-min).
Multi-step problem: compute optimal maxi-min policy, tradeoff short term vs. long term reward.
Simple case: acyclic state transition model: utility function, compute optimal policy with dynamic programming (minimax algorithm).
Alpha-beta pruning.
Case 2: Combined stochastic and disjunctive state transition models, e.g. backgammon.
Single step problem: maximize expected minimum reward (maxi-expecti-min).
Multi-step problem: compute optimal maxi-expecti-min policy, tradeoff short term vs. long term reward.
Simple case: acyclic state transition model: utility function, compute optimal policy with dynamic programming (expectiminimax algorithm).

Readings: Russell & Norvig (Sect 6.1, 6.2, 6.5).
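The minimax algorithm with alpha-beta pruning can be sketched on an explicit game tree (the tree encoding, numbers as utilities and lists as internal nodes, is an assumption of the example):

```python
def alphabeta(node, maximizing, alpha=-float("inf"), beta=float("inf")):
    """Minimax value with alpha-beta pruning; a node is either a number
    (a leaf utility) or a list of child nodes."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        v = -float("inf")
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:
                break                    # prune remaining children
        return v
    v = float("inf")
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta))
        beta = min(beta, v)
        if alpha >= beta:
            break                        # prune remaining children
    return v

# A standard depth-2 example: the root value is 3, and the second subtree
# is cut off after its first leaf (2 <= alpha = 3).
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))  # 3
```

Handling chance nodes the same way, with an expectation in place of min or max at those nodes, gives the expectiminimax algorithm mentioned above.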

Lecture 20: Scaling up: Partial observability

Concepts: Decision theory: what if there is uncertainty about the current state?
Uncertainty in state requires policy with memory: actions might depend on state history (i.e. disambiguate current state by previous sequence of partial observations).
Distinguish perceptual state, information state, complete world state.
Value of information.
Encode history in current information state, optimal behavior policy maps: information state -> action.
Game theory:
Imperfect information games (partial information games), e.g. poker, hearts.
Optimal strategies are randomized.
Normal form of a 2 person, zero sum game: payoff matrix.
Saddle point equilibrium: the optimal policy is deterministic iff the payoff matrix has a saddle point (as it always does in a full information game).
Partial information game matrix may not have saddle point, optimal strategies are probability distributions over deterministic policies.
Von Neumann minimax theorem: for any payoff matrix there is an optimal distribution for each player, and these are in equilibrium.

Readings: Russell & Norvig (Sect 16.6), Dean, Allen, Aloimonos (Sect 8.4).

Reading *: Scaling up / Robotics and control

Concepts: Precomputing optimal behavior strategy too expensive for large spaces.
Instead, compute the behavior strategy by on-line search (i.e. interleave "thinking" and acting).
Use heuristic function to estimate total state utility, and use depth bounded search.
Text gives some general tools for building agents that behave successfully in complex, uncertain domains.

Readings: Russell & Norvig (Sect 6.4, Sect 25.7).


Part 5: Learning

Lecture 21: Types of learning problems

Concepts: Systems that improve with experience.
Learning a function from examples: classification vs. regression.
Empirical error minimization within a class H: overfitting, underfitting.
Types: batch vs. online, complete vs. partial vs. pointwise feedback, passive vs. active, acausal vs. causal, stationary vs. nonstationary.

Readings: Dean, Allen, Aloimonos (Sect 5.1, 5.2), Russell & Norvig (Sect 18.1, 18.2, 18.5), Hastie, Tibshirani, Friedman (Chap 1).

Lecture 22: Function learning algorithms

Concepts: Parameterized function classes: e.g. linear functions.
Prediction error measures: squared (L2), absolute (L1), minimax (Linfty).
Algorithms for minimizing training error: L2 - linear equations, L1 and Linfty - linear programs.
Robustness to outliers.
Generalized linear functions: polynomials, splines, radial basis functions, Fourier basis.
Feedforward neural networks.
Training generalized linear functions (same as linear).
Training a neural network: empirical error minimization, gradient descent, backpropagation.

Readings: Russell & Norvig (Chap 20), Dean, Allen, Aloimonos (Sect 5.5, 5.6, 5.8), Hastie, Tibshirani, Friedman (Chaps 2, 3, 11).
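Empirical error minimization by gradient descent can be sketched for the simplest case, a 1-D linear model y ~ w*x + b under squared error; the data, learning rate, and iteration count are illustrative assumptions:

```python
# Toy training set, generated exactly by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w = b = 0.0
lr = 0.05
for _ in range(5000):
    # Gradients of mean squared error w.r.t. w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b
# Converges toward w = 2, b = 1 on this noiseless data.
```

For squared error on a linear model the same minimizer is available in closed form (the linear equations mentioned above); gradient descent is shown because it is the piece that generalizes to neural networks, where backpropagation computes these gradients layer by layer.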

Lecture 23: Generalization theory / Overfitting

Concepts: Over-fitting/under-fitting.
Distributions of training and testing error (least squares).
Expected train error, test error; optimal error.
Fundamental expected error inequalities.
Learning curves, over-fitting curves.
Automated complexity control: complexity penalization, hold-out cross-validation, metric-based methods.

Readings: Hastie, Tibshirani, Friedman (Sects 2.9, 5.1-5.5).

Lecture 24: Course review

Concepts: What will (and will not) be covered on the final exam.

Readings: See above.