Many of today's artificial intelligence tools inherently involve probabilistic reasoning. Bayesian belief nets have become the standard tool for these applications. Much of this current work provides effective ways to learn these systems from data (finding both the structure/model and the parameters that are best, where "best" can be generative -- best model of the world -- or discriminative -- best classifier).
The papers below are organized along two dimensions: generative vs. discriminative learning, applied to either the parameters or the structure of the network.
Note this page is NOT being maintained... see PapersDB ref for the up-to-date list.
Russell Greiner and Wei Zhou. Conditional Independence Structures and Graphical Models, Toronto, September 1999.
Wei Zhou and Russell Greiner. Conditional Independence Structures and Graphical Models, Toronto, September 1999.
Bayesian belief nets (BNs) are often used for classification tasks --- typically to return the most likely ``class label'' for each instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function (viz., likelihood, rather than classification accuracy), typically by first learning an appropriate graphical structure, then finding the maximal likelihood parameters for that structure. As these parameters may not maximize the classification accuracy, ``discriminative learners'' follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, and shows how it relates to logistic regression, which corresponds to finding the optimal CL parameters for a naive-Bayes structure. After analyzing its inherent (sample and computational) complexity, we then present a general algorithm for this task, ELR, which applies to arbitrary BN structures and which works effectively even when the training data is only partial. This paper analyses this approach, presenting empirical evidence that it works better than the standard ``generative'' approach in a variety of situations, especially in the common situation where the BN-structure is incorrect.
The empirical data used in our experiments is here.
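For concreteness, the two objective functions contrasted above can be written as follows (a standard formulation consistent with the abstract, not an excerpt from the paper):

% Generative log-likelihood vs. discriminative conditional
% log-likelihood, for labeled training instances (e_i, c_i)
% and CPtable parameters \Theta:
\begin{align*}
  LL(\Theta)  &= \sum_i \log P_\Theta(c_i, e_i)
      && \text{(generative: fit the joint distribution)} \\
  CLL(\Theta) &= \sum_i \log P_\Theta(c_i \mid e_i)
      && \text{(discriminative: fit only the classifier)}
\end{align*}

Maximizing LL has a closed-form solution for complete data (observed frequency estimates), while maximizing CLL generally requires iterative gradient-based search, which is what motivates an algorithm like ELR.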
Greiner and Zhou [1] presented ELR, a discriminative parameter-learning algorithm that maximizes conditional likelihood (CL) for a fixed Bayesian Belief Network (BN) structure, and demonstrated that it often produces classifiers that are more accurate than the ones produced using the generative approach (OFE), which finds maximal likelihood parameters. This is especially true when learning parameters for incorrect structures, such as Naïve Bayes (NB). In searching for algorithms to learn better BN classifiers, this paper uses ELR to learn parameters of more nearly correct BN structures, e.g., of a general Bayesian network (GBN) produced by a structure-learning algorithm [2]. While OFE typically produces more accurate classifiers with GBN (vs. NB), we show that ELR does not, when the training data is not sufficient for the GBN structure learner to produce a good model. Our empirical studies also suggest that the better the BN structure is, the less advantage ELR has over OFE, for classification purposes. ELR learning on NB (i.e., with little structural knowledge) still performs about the same as OFE on GBN in classification accuracy, over a large number of standard benchmark datasets.
The empirical data used in our experiments is here.
Bayesian belief nets (BNs) are often used for classification tasks --- typically to return the most likely class label for each specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function --- viz., likelihood, rather than classification accuracy --- typically by first learning an appropriate graphical structure, then finding the parameters for that structure that maximize the likelihood of the data. As these parameters may not maximize the classification accuracy, ``discriminative parameter learners'' follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, shows how it extends standard logistic regression, and analyzes its inherent sample and computational complexity. We then present a general algorithm for this task, ELR, that applies to arbitrary BN structures and that works effectively even when given incomplete training data. Unfortunately, ELR is not guaranteed to find the parameters that optimize conditional likelihood; moreover, even the optimal-CL parameters need not have minimal classification error. This paper therefore presents empirical evidence that ELR produces effective classifiers, often superior to the ones produced by the standard ``generative'' algorithms, especially in common situations where the given BN-structure is incorrect.
The empirical data used in our experiments is here.
Auxiliary information is available on the ELR webpage.
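As a rough illustration of discriminative parameter learning (a minimal sketch, not the authors' ELR implementation: it assumes a naive-Bayes structure with complete binary data, the special case where maximizing conditional likelihood coincides with logistic regression; all names here are illustrative):

import numpy as np

def fit_cl_parameters(X, y, lr=0.1, iters=2000):
    """Gradient ascent on the conditional log-likelihood
    sum_i log P(y_i | x_i) for a naive-Bayes-structured classifier
    with binary features X (shape n x d) and labels y in {0, 1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # P(y=1 | x) under current parameters
        w += lr * (X.T @ (y - p)) / n           # CLL gradient w.r.t. weights
        b += lr * np.mean(y - p)                # CLL gradient w.r.t. bias
    return w, b

The generative alternative (OFE) would instead just count: estimate P(x_j | y) and P(y) from observed frequencies, with no iteration.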
We are interested in the problem of learning the dependency structure of a belief net, which involves a trade-off between simplicity and goodness of fit to the training data. We describe the results of an empirical comparison of three standard model selection criteria --- viz., a Minimum Description Length criterion (MDL), Akaike's Information Criterion (AIC) and a Cross-Validation criterion --- applied to this problem. Our results suggest that AIC and cross-validation are both good criteria for avoiding overfitting, but MDL does not work well in this context.
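A sketch of how these three criteria are typically computed (generic textbook forms; the paper's exact MDL coding scheme may differ, and the helper functions fit and loglik are assumptions, not from the paper):

import numpy as np

def aic(loglik, k):
    # Akaike's Information Criterion: constant penalty per free parameter.
    return -2.0 * loglik + 2.0 * k

def mdl(loglik, k, n):
    # Two-part MDL (equivalently BIC): penalty grows with sample size n.
    return -2.0 * loglik + k * np.log(n)

def cv_score(structure, data, fit, loglik, folds=5, seed=0):
    """Cross-validation: average held-out log-likelihood, re-fitting
    the parameters of `structure` on each training split."""
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(data)), folds)
    scores = []
    for f in range(folds):
        train = np.concatenate([splits[g] for g in range(folds) if g != f])
        params = fit(structure, data[train])
        scores.append(loglik(structure, params, data[splits[f]]))
    return float(np.mean(scores))

Lower AIC/MDL is better; a higher CV score is better. The simplicity-vs-fit trade-off the abstract describes shows up directly in the penalty terms: more edges in the net mean more free parameters k.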
Bayesian belief nets (BNs) are often used for classification tasks, typically to return the most likely class label for a specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function --- viz., likelihood, rather than classification accuracy --- typically by first using some model selection criterion to identify an appropriate graphical structure, then finding good parameters for that structure. This paper considers a number of possible criteria for selecting the best structure, both generative (i.e., based on likelihood; BIC, BDe) and discriminative (i.e., Conditional BIC (CBIC), resubstitution Classification Error (CE) and Bias²+Variance (BV)). We empirically compare these criteria against a variety of different correct BN structures, both real-world and synthetic, over a range of complexities. We also explore different ways to set the parameters, dealing with two issues: (1) Should we seek the parameters that maximize likelihood versus the ones that maximize conditional likelihood? (2) Should we use (i) the entire training sample first to learn the best parameters and then to evaluate the models, versus (ii) only a partition for parameter estimation and another partition for evaluation (cross-validation)? Our results show that the discriminative BV model selection criterion is one of the best measures for identifying the optimal structure, while the discriminative CBIC performs poorly; that we should use the parameters that maximize likelihood; and that it is typically better to use cross-validation here.
See also webpage. Appears in IAAI'06.
See also Explaining Naive Bayes Classifications.
A Bayesian Belief Network (BN) is a model of a joint distribution
over a finite set of variables, with a DAG structure to represent the
immediate dependencies between the variables, and a set of parameters
(aka CPTables) to represent the local conditional probabilities of a
node, given each assignment to its parents. In many situations, the
parameters are themselves treated as random variables --- reflecting
the uncertainty remaining after drawing on knowledge of domain experts
and/or observing data generated by the network. A distribution over the
CPtable parameters induces a distribution for the response the BN will
return to any ``What is P(H | E)?'' query. This paper investigates the
distribution of this response, shows that it is asymptotically normal,
and derives closed-form expressions for its mean and asymptotic
variance. We show that this computation has the same complexity as
simply computing the (mean value of the) response --- i.e., O(n *
exp(w)), where n is the number of variables and w is the effective
tree width. We also provide empirical evidence showing
that the error-bars computed from our estimates are fairly accurate in
practice, over a wide range of belief net structures and queries.
See also Webpage.
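The paper derives closed-form expressions; as an independent sanity check, the same response distribution can be approximated by brute-force Monte Carlo over the CPtable posterior (a sketch assuming Dirichlet-distributed CPtable rows; `query` stands in for any exact inference routine and is an assumption, not part of the paper):

import numpy as np

def response_distribution(dirichlet_counts, query, n_samples=1000, seed=0):
    """Approximate the mean and variance of the answer to a
    'What is P(H | E)?' query when each CPtable row is Dirichlet
    distributed.  dirichlet_counts maps node -> array of per-row
    Dirichlet parameters; query(cpts) returns P(H | E) for one
    fixed parameterization."""
    rng = np.random.default_rng(seed)
    responses = []
    for _ in range(n_samples):
        # Draw one complete parameterization of the network,
        # sampling every CPtable row independently.
        cpts = {node: np.vstack([rng.dirichlet(row) for row in rows])
                for node, rows in dirichlet_counts.items()}
        responses.append(query(cpts))
    responses = np.asarray(responses)
    return responses.mean(), responses.var()

The abstract's point is that the closed-form route costs no more than a single inference pass, O(n * exp(w)), whereas this sampling check multiplies that cost by the number of samples.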
This paper applies Probabilistic Relational Models (PRMs) to the Collaborative Filtering task, focussing on the EachMovie data set. We first learn a standard PRM, and show that its performance is competitive with the best known techniques. We then define a hierarchical PRM, which extends standard PRMs by dynamically refining classes into hierarchies, which improves the expressiveness as well as the context sensitivity of the PRM. Finally, we show that hierarchical PRMs achieve state-of-the-art results on this dataset.
In this paper, we empirically evaluate algorithms for learning four
types of Bayesian network (BN) classifiers -- Naïve Bayes,
tree-augmented Naïve Bayes (TANs), BN-augmented
Naïve Bayes (BANs) and general BNs (GBNs), where the GBNs and
BANs are learned using two variants of a conditional-independence-based
BN-learning algorithm. Based on their performance, we then
define a new type of classifier. Experimental results show the
resulting classifiers, learned using the proposed learning algorithms,
are competitive with (or superior to) the best classifiers, based on
both Bayesian networks and other formalisms, and that the computational
time for learning and using these classifiers is relatively small.
These results argue that BN classifiers deserve more attention in the
machine learning and data mining communities.
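For reference, the simplest of these four classifiers can be stated in a few lines (a minimal Naïve Bayes with discrete features and Laplace smoothing; an illustrative baseline, not the conditional-independence-based learner evaluated in the paper):

import numpy as np

def train_naive_bayes(X, y, n_values, alpha=1.0):
    """Fit the class prior and per-feature conditional tables with
    Laplace smoothing.  X holds integer-coded discrete features;
    n_values[j] is the number of values feature j can take."""
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    tables = []  # tables[j][c, v] = P(X_j = v | class = c)
    for j, nv in enumerate(n_values):
        t = np.empty((len(classes), nv))
        for ci, c in enumerate(classes):
            col = X[y == c, j]
            t[ci] = [((col == v).sum() + alpha) / (len(col) + alpha * nv)
                     for v in range(nv)]
        tables.append(t)
    return classes, prior, tables

def predict(x, classes, prior, tables):
    # Most likely class label: argmax_c log P(c) + sum_j log P(x_j | c).
    scores = np.log(prior)
    for j, t in enumerate(tables):
        scores = scores + np.log(t[:, x[j]])
    return classes[np.argmax(scores)]

The TAN and BAN learners the abstract mentions extend exactly this scheme: they keep the class-to-feature arcs but add arcs among the features, so each feature's table conditions on the class plus that feature's extra parents.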