[120 total points]
Refer to the web page for policies regarding collaboration, due dates, and extensions.
In many classification tasks, naive Bayes is either competitive with, or is, the best Bayesian-network model... which is surprising, given that the naive Bayes model is so trivial that it essentially ignores dependencies between features! This seems to be an argument against structure learning, until one realizes that most structure-learning methods try to model the joint distribution, which does not necessarily correspond to a good estimate of the class-conditional distribution.
Tree-Augmented Naive Bayes (TAN) is a model that attempts to optimize this class-conditional distribution directly. It augments naive Bayes by adding arcs between features, such that each feature has as its parents the class node and (at most) one other feature. (If you remove the class node, the remaining feature-feature arcs form a tree structure.)
The following is an example of a TAN model. (Note that the graph restricted to the evidence variables (W, X, Y, Z) forms a tree.)
The algorithm for learning TAN models is a variant of the Chow-Liu algorithm for learning tree-structured Bayes nets. Let C represent the class variable, and {X_{i}}_{i=1}^{n} be the features (non-class variables).
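The standard Chow-Liu-style TAN procedure (Friedman, Geiger & Goldszmidt, 1997) has four steps: compute the conditional mutual information I(X_i; X_j | C) for every pair of features, build a maximum-weight spanning tree over the features using those weights, pick a root and direct all tree edges away from it, and finally add C as a parent of every feature. The assignment asks for a MATLAB submission (tanstruct.m); as an illustration only, here is a stdlib-Python sketch of that procedure (function names `tan_structure` and `cond_mutual_info` are mine, not part of the assignment):

```python
from collections import Counter
from itertools import combinations
from math import log

def cond_mutual_info(xi, xj, c):
    """Empirical conditional mutual information I(Xi; Xj | C)."""
    n = len(c)
    p_c = Counter(c)
    p_ic = Counter(zip(xi, c))
    p_jc = Counter(zip(xj, c))
    p_ijc = Counter(zip(xi, xj, c))
    mi = 0.0
    for (a, b, k), n_ijc in p_ijc.items():
        # sum over observed triples: p(a,b,k) * log[ p(a,b|k) / (p(a|k) p(b|k)) ]
        mi += (n_ijc / n) * log((n_ijc * p_c[k]) / (p_ic[(a, k)] * p_jc[(b, k)]))
    return mi

def tan_structure(X, y):
    """Return the directed feature-feature arcs of a TAN model.

    X: list of feature columns (each a sequence of discrete values)
    y: class labels. In the full TAN model every feature additionally
    has the class as a parent (not listed here). Arcs are
    (parent_index, child_index) pairs.
    """
    n = len(X)
    # 1. Pairwise conditional mutual information as edge weights.
    w = {(i, j): cond_mutual_info(X[i], X[j], y)
         for i, j in combinations(range(n), 2)}
    # 2.-3. Maximum-weight spanning tree (Prim's algorithm), with every
    # edge directed away from the (arbitrarily chosen) root, feature 0.
    in_tree, arcs = {0}, []
    while len(in_tree) < n:
        i, j = max(((i, j) for i in in_tree
                    for j in range(n) if j not in in_tree),
                   key=lambda e: w[min(e), max(e)])
        arcs.append((i, j))
        in_tree.add(j)
    return arcs
```

On a toy data set where X_1 duplicates X_0, the tree correctly attaches X_1 to X_0, since I(X_0; X_1 | C) is maximal.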
(a: 15 points) Implement the above algorithm for learning the structure of a TAN model, and submit your code as tanstruct.m.
(b: 5 points)
Here, we consider the task of breast cancer typing, that is, classifying a tumor as either malignant or benign; see breast.csv in the zip file. Learn a TAN structure for this data, and draw the structure (directed acyclic graph) produced in your writeup.
(c: 10 points) Classification
In this question you will compare the classification accuracy of naive Bayes and TAN. First, randomly withhold 183 records as a test set. Then, using a training set of size m, for m = 100, 200, 300, 400, 500, ..., train both models and report their accuracy on the test set. (You may want to consider several such training sets.)
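The experimental protocol above can be sketched as a simple loop: hold out a fixed test set, then sweep the training-set size m. The sketch below uses only naive Bayes (the TAN classifier would slot in the same way) and synthetic stand-in data, since I am not assuming anything about the contents of breast.csv; the helper names `train_nb` and `predict_nb` are mine.

```python
import random
from collections import defaultdict
from math import log

def train_nb(rows, labels, alpha=1.0):
    """Bernoulli naive Bayes with Laplace smoothing: P(c) and P(x_i = 1 | c)."""
    counts = defaultdict(int)                     # class counts
    ones = defaultdict(lambda: defaultdict(int))  # per-class feature-on counts
    for x, c in zip(rows, labels):
        counts[c] += 1
        for i, v in enumerate(x):
            ones[c][i] += v
    n_feat, total = len(rows[0]), len(rows)
    return {c: (log(counts[c] / total),
                [(ones[c][i] + alpha) / (counts[c] + 2 * alpha)
                 for i in range(n_feat)])
            for c in counts}

def predict_nb(model, x):
    def score(c):
        prior, theta = model[c]
        return prior + sum(log(t if v else 1 - t) for v, t in zip(x, theta))
    return max(model, key=score)

# Protocol from part (c): withhold a fixed test set of 183 records, then
# sweep the training-set size m. Synthetic data stands in for breast.csv.
random.seed(0)
data = [([random.randint(0, 1) for _ in range(5)], c)
        for c in (0, 1) for _ in range(400)]
# make feature 0 informative about the class (matches it 90% of the time)
data = [([c if random.random() < 0.9 else 1 - c] + x[1:], c) for x, c in data]
random.shuffle(data)
test, pool = data[:183], data[183:]
for m in (100, 200, 300, 400, 500):
    rows, labels = zip(*pool[:m])
    model = train_nb(list(rows), list(labels))
    acc = sum(predict_nb(model, x) == c for x, c in test) / len(test)
    print(f"m={m}: accuracy {acc:.3f}")
```

Averaging over several random draws of the training set, as the question suggests, reduces the variance of each point on the resulting learning curve.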
Consider a factor produced as a product of some of the CPDs in a Bayesian network B, say ν(W) = ∏_{X∈U} P(X | Pa_{X}), where U is a subset of the variables of B and W is the set of all variables appearing in these CPDs.
(a: 5 points) Show that ν is a conditional probability in some network. More precisely, construct another Bayesian network B' and a disjoint partition W = Y ∪ Z such that ν(W) = P_{B'}(Y | Z).
(b: 5 points) Show that the intermediate factors produced by the variable elimination algorithm are also conditional probabilities in some network.
[10 points] Markov Networks & Factorization
Consider a distribution P over four binary random variables {X_{1}, X_{2}, X_{3}, X_{4}} that assigns probability 1/8 to each of the following eight configurations:
(0,0,0,0) (1,0,0,0) (1,1,0,0) (1,1,1,0)
(0,0,0,1) (0,0,1,1) (0,1,1,1) (1,1,1,1)
and probability 0 to the other 8 configurations.
The distribution P satisfies the global Markov property with respect to the graph H = X_{1} - X_{2} - X_{3} - X_{4} - X_{1}. (Note this graph is a cycle.) For example, consider the independence claim (X_{1} ⊥ X_{3} | X_{2}, X_{4}). For the assignment X_{2} = x_{2}^{1}, X_{4} = x_{4}^{0}, only assignments where X_{1} = x_{1}^{1} receive positive probability. Thus, P(x_{1}^{1} | x_{2}^{1}, x_{4}^{0}) = 1, and X_{1} is trivially independent of X_{3} in this conditional distribution. A similar analysis applies to all other cases, so the global Markov assumptions hold. However, the distribution P does not factorize according to H. Give a proof of this. (Hint: Use a proof by contradiction.)
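The worked example above, and in fact the full claim (X_{1} ⊥ X_{3} | X_{2}, X_{4}), can be checked mechanically by enumerating all sixteen assignments. A small sanity-check sketch (the variable names are mine):

```python
from itertools import product

# The eight configurations of (X1, X2, X3, X4) with probability 1/8 each.
support = {(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
           (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1)}
P = {x: (1 / 8 if x in support else 0.0) for x in product((0, 1), repeat=4)}

# The example claim: conditioned on X2 = 1, X4 = 0, only X1 = 1 has
# positive probability, so X1 is trivially independent of X3 there.
context = [x for x in support if x[1] == 1 and x[3] == 0]
assert all(x[0] == 1 for x in context)

# Exhaustive check of (X1 ⊥ X3 | X2, X4): in every context (x2, x4),
# P(x1, x3 | x2, x4) must equal P(x1 | x2, x4) * P(x3 | x2, x4).
for x2, x4 in product((0, 1), repeat=2):
    z = sum(P[(a, x2, b, x4)] for a in (0, 1) for b in (0, 1))
    for x1, x3 in product((0, 1), repeat=2):
        joint = P[(x1, x2, x3, x4)] / z
        m1 = sum(P[(x1, x2, b, x4)] for b in (0, 1)) / z
        m3 = sum(P[(a, x2, x3, x4)] for a in (0, 1)) / z
        assert abs(joint - m1 * m3) < 1e-12
print("X1 is independent of X3 given X2, X4 under P")
```

The same enumeration with the roles of (X_{1}, X_{3}) and (X_{2}, X_{4}) swapped covers the other conditional independence required by the cycle; the non-factorization part of the question, of course, still needs the requested proof.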
(b: 5 points) Build a cluster graph for this Bayesian net, and determine the message passing scheme and initial potential of each cluster (use a cluster that contains variable "C" in it as the root).
(d: 5 points)
Answer the following queries using the calibrated cluster tree.
Use values for the Bayes model parameters as follows:
[a=.25, b=.5, c=.65, d=.55, e=.4, f=.4, g=.5, h=.3, i=.8, j=.4, k=.5].

[Question removed: no time to cover Markov Network Approximations]