Results related to Belief Nets

Many of today's artificial intelligence tools inherently involve probabilistic reasoning. Bayesian Belief Nets have become the standard tool for these applications. Much of our current work provides effective ways to learn these systems from data -- finding both the structure/model and the parameters that are best, where "best" can be generative (best model of the world) or discriminative (best classifier).


  • Parameters: Generative Learning | Discriminative Learning
  • Structure: Generative Learning | Discriminative Learning

We also provide other systems for...
  • Information about Bayesian belief nets in general
  • Work related to UNDIRECTED models (random fields)

Note this page is NOT being maintained... see PapersDB ref for the up-to-date list.


Learning Accurate BN

Learning Bayesian Nets that Perform Well

A Bayesian net (BN) is more than a succinct way to encode a probabilistic distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. Many algorithms for learning BNs, however, attempt to optimize some other criterion (e.g., based on Kullback-Leibler divergence or the Bayesian Information Criterion), which is independent of the distribution of queries that are posed. This paper takes the ``performance criterion'' seriously, and considers the challenge of computing the BN whose performance --- read ``accuracy over the distribution of queries'' --- is optimal. We show that many aspects of this learning task are more difficult than the corresponding subtasks in the standard model, and then present an important subclass of queries that greatly simplifies our learning task.
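
To make the contrast concrete, a minimal formulation (the notation below is introduced here for illustration, not quoted from the paper): a query-based learner scores a candidate network B by its expected error over the query distribution, rather than by how well B fits the training sample:

\[
  \mathrm{err}(B) \;=\; \mathbb{E}_{(H,E)\sim q}\Bigl[\bigl(P_B(H \mid E) - P^{*}(H \mid E)\bigr)^{2}\Bigr]
  \qquad\text{vs.}\qquad
  LL(B) \;=\; \sum_{i}\log P_B(d_i),
\]

where q is the distribution of queries, P^* the true conditional probabilities, and d_i the training instances; a KL- or BIC-based learner optimizes (a penalized form of) the right-hand score, independent of q.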

Learning Accurate Belief Nets using Explicitly-Labeled Queries

Russell Greiner and Wei Zhou
Conditional Independence Structures and Graphical Models,  Toronto, September 1999.


Learning Accurate Belief Nets using Implicitly-Labeled Queries

Wei Zhou and Russell Greiner
Conditional Independence Structures and Graphical Models,  Toronto, September 1999.


Exploiting Common SubRelations: Learning One Belief Net for Many Classification Tasks (with Wei Zhou)
Paper presented at Learning Relations workshop, Stanford, 1999?  
Structural Extension to Logistic Regression

Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers

See also Webpage

Bayesian belief nets (BNs) are often used for classification tasks --- typically to return the most likely ``class label'' for each instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function (viz., likelihood, rather than classification accuracy), typically by first learning an appropriate graphical structure, then finding the maximal likelihood parameters for that structure. As these parameters may not maximize the classification accuracy, ``discriminative learners'' follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, and shows how it relates to logistic regression, which corresponds to finding the optimal CL parameters for a naive-Bayes structure. After analyzing its inherent (sample and computational) complexity, we then present a general algorithm for this task, ELR, which applies to arbitrary BN structures and which works effectively even when the training data is only partial. This paper analyzes this approach, presenting empirical evidence that it works better than the standard ``generative'' approach in a variety of situations, especially in the common situation where the BN-structure is incorrect.
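
For orientation, the discriminative objective and its logistic-regression connection can be written as follows (notation introduced here for illustration, not taken verbatim from the paper):

\[
  CL(\Theta) \;=\; \sum_{i=1}^{m} \log P_{\Theta}(c_i \mid \mathbf{e}_i),
  \qquad
  P_{\Theta}(C{=}1 \mid \mathbf{e}) \;=\; \sigma\Bigl(w_0 + \sum_{j} w_j\, e_j\Bigr)
  \;\text{ for a naive-Bayes structure},
\]

where \sigma(z) = 1/(1+e^{-z}) and, for binary attributes, each w_j is a log-odds ratio of CPtable entries; maximizing CL over the naive-Bayes parameters is thus exactly logistic regression.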

The empirical data used in our experiments is here.
 


Discriminative Parameter Learning of General Bayesian Network Classifiers

Greiner and Zhou [1] presented ELR, a discriminative parameter-learning algorithm that maximizes conditional likelihood (CL) for a fixed Bayesian Belief Network (BN) structure, and demonstrated that it often produces classifiers that are more accurate than the ones produced using the generative approach (OFE), which finds maximal likelihood parameters. This is especially true when learning parameters for incorrect structures, such as Naïve Bayes (NB). In searching for algorithms to learn better BN classifiers, this paper uses ELR to learn parameters of more nearly correct BN structures, e.g., a general Bayesian network (GBN) produced by a structure-learning algorithm [2]. While OFE typically produces more accurate classifiers with GBN (vs. NB), we show that ELR does not, when the training data is not sufficient for the GBN structure learner to produce a good model. Our empirical studies also suggest that the better the BN structure is, the smaller the advantage ELR has over OFE for classification purposes. ELR learning on NB (i.e., with little structural knowledge) still performs about the same as OFE on GBN in classification accuracy, over a large number of standard benchmark datasets.
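
The generative baseline (OFE) referred to above is essentially frequency counting. A minimal sketch, assuming complete discrete data; the helper name and the one-pseudo-count smoothing are choices made here, not specified by the paper:

    import numpy as np

    def ofe_cpt(data, child, parents, card):
        """Observed Frequency Estimate for one CPtable P(child | parents):
        maximum-likelihood counts from complete data, with a Laplace prior
        of one pseudo-count per cell (a smoothing choice made here).

        data    : int array, shape (n_instances, n_variables)
        child   : column index of the child variable
        parents : list of column indices of its parents
        card    : list of cardinalities, one per variable
        """
        shape = [card[p] for p in parents] + [card[child]]
        counts = np.ones(shape)                    # Laplace smoothing
        for row in data:
            idx = tuple(row[p] for p in parents) + (row[child],)
            counts[idx] += 1
        return counts / counts.sum(axis=-1, keepdims=True)

ELR instead starts from such estimates and then performs gradient ascent on the conditional likelihood of the class labels.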

The empirical data used in our experiments is here.


Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers

Bayesian belief nets (BNs) are often used for classification tasks --- typically to return the most likely class label for each specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function --- viz., likelihood, rather than classification accuracy --- typically by first learning an appropriate graphical structure, then finding the parameters for that structure that maximize the likelihood of the data. As these parameters may not maximize the classification accuracy, ``discriminative parameter learners'' follow the alternative approach of seeking the parameters that maximize conditional likelihood (CL), over the distribution of instances the BN will have to classify. This paper first formally specifies this task, shows how it extends standard logistic regression, and analyzes its inherent sample and computational complexity. We then present a general algorithm for this task, ELR, that applies to arbitrary BN structures and that works effectively even when given incomplete training data. Unfortunately, ELR is not guaranteed to find the parameters that optimize conditional likelihood; moreover, even the optimal-CL parameters need not have minimal classification error. This paper therefore presents empirical evidence that ELR produces effective classifiers, often superior to the ones produced by the standard ``generative'' algorithms, especially in common situations where the given BN-structure is incorrect.
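
A sketch of the gradient such a climber can follow, for a fixed structure whose CPtable entries are reparameterized through a softmax, \theta_{d|f} = \exp(\beta_{d|f}) / \sum_{d'} \exp(\beta_{d'|f}) (a standard derivation written in our own notation, not a quotation from the paper):

\[
  \frac{\partial \log P(c \mid \mathbf{e})}{\partial \beta_{d|f}}
  \;=\;
  \bigl( P(d, f \mid c, \mathbf{e}) - P(d, f \mid \mathbf{e}) \bigr)
  \;-\;
  \theta_{d|f}\,\bigl( P(f \mid c, \mathbf{e}) - P(f \mid \mathbf{e}) \bigr),
\]

where d ranges over the values of a node, f over assignments to its parents, and every probability on the right is obtained by ordinary BN inference in the current network; ELR-style learners ascend this gradient, with the caveat noted above that only a local optimum is guaranteed.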

The empirical data used in our experiments is here.

Auxiliary information is ELR webpage.


Selection Criteria (Generative)

Model Selection Criteria for Learning Belief Nets: An Empirical Comparison

We are interested in the problem of learning the dependency structure of a belief net, which involves a trade-off between simplicity and goodness of fit to the training data. We describe the results of an empirical comparison of three standard model selection criteria --- viz., a Minimum Description Length criterion (MDL), Akaike's Information Criterion (AIC) and a Cross-Validation criterion --- applied to this problem. Our results suggest that AIC and cross-validation are both good criteria for avoiding overfitting, but MDL does not work well in this context.
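
For reference, the two penalized criteria have the familiar forms below (standard textbook definitions; the paper's exact constants and conventions may differ slightly):

\[
  \mathrm{AIC}(B) \;=\; -2\log L(\hat{\Theta}_B) + 2k,
  \qquad
  \mathrm{MDL}(B) \;\approx\; -\log L(\hat{\Theta}_B) + \frac{k}{2}\log m,
\]

where L(\hat{\Theta}_B) is the maximized likelihood of the training data, k the number of free CPtable parameters of the structure, and m the number of training instances; the cross-validation criterion instead estimates held-out log-likelihood directly.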

  • Short version: in "Selecting and Combining Models" workshop
  • Technical report
  • Data Used

Selection Criteria (Discriminative)

Discriminative Model Selection for Belief Net Structures

Bayesian belief nets (BNs) are often used for classification tasks, typically to return the most likely class label for a specified instance. Many BN-learners, however, attempt to find the BN that maximizes a different objective function --- viz., likelihood, rather than classification accuracy --- typically by first using some model selection criterion to identify an appropriate graphical structure, then finding good parameters for that structure. This paper considers a number of possible criteria for selecting the best structure, both generative (i.e., based on likelihood: BIC, BDe) and discriminative (i.e., Conditional BIC (CBIC), resubstitution Classification Error (CE) and Bias^2+Variance (BV)). We empirically compare these criteria against a variety of different correct BN structures, both real-world and synthetic, over a range of complexities. We also explore different ways to set the parameters, dealing with two issues: (1) Should we seek the parameters that maximize likelihood versus the ones that maximize conditional likelihood? (2) Should we use (i) the entire training sample, first to learn the best parameters and then to evaluate the models, versus (ii) one partition for parameter estimation and another partition for evaluation (cross-validation)? Our results show that the discriminative BV model selection criterion is one of the best measures for identifying the optimal structure, while the discriminative CBIC performs poorly; that we should use the parameters that maximize likelihood; and that it is typically better to use cross-validation here.
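
Two of these criteria have simple closed forms worth keeping in mind (notation ours; CBIC is, on a plausible reading, the same penalty applied to the conditional rather than the joint log-likelihood):

\[
  \mathrm{BIC}(G) \;=\; \log L(\hat{\Theta}_G) - \frac{d}{2}\log m,
  \qquad
  \mathrm{CE}(G) \;=\; \frac{1}{m}\sum_{i=1}^{m}
  \mathbf{1}\Bigl[\arg\max_{c} P_{\hat{\Theta}_G}(c \mid \mathbf{e}_i) \neq c_i\Bigr],
\]

where G is the candidate structure, d its number of free parameters, and m the number of training instances; CE is the resubstitution error, i.e., it is measured on the same sample used to fit the parameters.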

See also webpage.

Using Information-Theory to Learn Bayesian Nets

Learning Bayesian Networks from Data: An Information-Theory Based Approach (doc)

This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn belief networks, requiring only a polynomial number of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct, as well as empirical evidence (from real-world applications and simulation tests) demonstrating that these systems work efficiently and reliably in practice.
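
The building block of such algorithms is a conditional-independence test driven by conditional mutual information. A minimal sketch, assuming discrete data in a numpy array; the threshold eps and the function names are illustrative choices made here, not the paper's exact test:

    import numpy as np
    from collections import Counter

    def cond_mutual_info(data, x, y, z):
        """Empirical conditional mutual information I(X; Y | Z) from a matrix
        of discrete instances (rows) by variables (columns); x, y are column
        indices, z is a list of column indices for the conditioning set."""
        n = len(data)
        xyz, xz, yz, zc = Counter(), Counter(), Counter(), Counter()
        for row in data:
            zz = tuple(row[j] for j in z)
            xyz[(row[x], row[y], zz)] += 1
            xz[(row[x], zz)] += 1
            yz[(row[y], zz)] += 1
            zc[zz] += 1
        return sum((c / n) * np.log2(c * zc[zz] / (xz[(a, zz)] * yz[(b, zz)]))
                   for (a, b, zz), c in xyz.items())

    def seem_independent(data, x, y, z, eps=0.01):
        """Declare X and Y conditionally independent given Z when the
        estimated conditional mutual information falls below a threshold
        (eps is an illustrative value, not the paper's)."""
        return cond_mutual_info(data, x, y, z) < eps

The three phases of the framework (drafting, thickening, thinning) then add and remove edges according to the outcomes of such tests, ordering the tests so that only polynomially many are needed in typical cases.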

Explaining Naive Bayes Classifiers

Visual Explanation of Evidence in Additive Classifiers

Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naive Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier decisions, visualization of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.
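
The decomposition ExplainD visualizes is simply the additive structure of such classifiers. A small illustrative sketch (function and variable names are ours, not taken from the ExplainD code):

    import numpy as np

    def evidence_contributions(weights, bias, x):
        """For an additive classifier whose decision value is
        bias + sum_j weights[j] * x[j], return the per-feature terms
        (the bars an ExplainD-style display shows for or against the
        predicted class) together with the total decision value."""
        contrib = weights * x
        return contrib, bias + contrib.sum()

    # For a naive Bayes classifier over binary features, the same decomposition
    # applies to the class log-odds: bias = log P(+)/P(-), and
    # weights[j] = log P(x_j | +) / P(x_j | -) for the observed value x_j.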

Appears in IAAI'06

See also Explaining Naive Bayes Classifications

ErrorBars

Bayesian Error-Bars for Belief Net Inference

A Bayesian Belief Network (BN) is a model of a joint distribution over a finite set of variables, with a DAG structure to represent the immediate dependencies between the variables, and a set of parameters (a.k.a. CPtables) to represent the local conditional probabilities of a node, given each assignment to its parents. In many situations, the parameters are themselves treated as random variables --- reflecting the uncertainty remaining after drawing on knowledge of domain experts and/or observing data generated by the network. A distribution over the CPtable parameters induces a distribution for the response the BN will return to any ``What is P(H | E)?'' query. This paper investigates the distribution of this response, shows that it is asymptotically normal, and derives closed-form expressions for its mean and asymptotic variance. We show that this computation has the same complexity as simply computing the (mean value of the) response --- i.e., O(n * exp(w)), where n is the number of variables and w is the effective tree width. We also provide empirical evidence showing that the error-bars computed from our estimates are fairly accurate in practice, over a wide range of belief net structures and queries.
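
The flavor of the result can be conveyed by the first-order (delta-method) approximation below, written in our own notation; the paper's closed-form expressions specialize this to the posterior distributions over the CPtable parameters:

\[
  \mathrm{Var}\bigl[f(\Theta)\bigr] \;\approx\; \nabla f(\bar{\Theta})^{\top}\,\Sigma\,\nabla f(\bar{\Theta}),
\]

where f(\Theta) denotes the query response P(H | E) viewed as a function of all CPtable parameters, \bar{\Theta} is their mean and \Sigma their covariance; asymptotic normality of f(\Theta) then yields the error bars, and the gradient can be obtained with the same inference machinery used to compute the response itself.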

See also Webpage.


Probabilistic Relational Models

Hierarchical Probabilistic Relational Models for Collaborative Filtering

This paper applies Probabilistic Relational Models (PRMs) to the Collaborative Filtering task, focusing on the EachMovie data set. We first learn a standard PRM, and show that its performance is competitive with the best known techniques. We then define a hierarchical PRM, which extends standard PRMs by dynamically refining classes into hierarchies, improving both the expressiveness and the context sensitivity of the PRM. Finally, we show that hierarchical PRMs achieve state-of-the-art results on this dataset.


Comparing Bayesian Network Classifiers

Comparing Bayesian Network Classifiers

In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers -- Naïve Bayes, tree-augmented Naïve Bayes (TANs), BN-augmented Naïve Bayes (BANs) and general BNs (GBNs), where the GBNs and BANs are learned using two variants of a conditional-independence based BN-learning algorithm. Based on their performance, we then define a new type of classifier. Experimental results show that the resulting classifiers, learned using the proposed learning algorithms, are competitive with (or superior to) the best classifiers, based on both Bayesian networks and other formalisms, and that the computational time for learning and using these classifiers is relatively small. These results argue that BN classifiers deserve more attention in the machine learning and data mining communities.
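
The four families differ only in how much structure they allow among the attributes; in the usual formulation (summarized here, not quoted from the paper):

\[
  P_{\mathrm{NB}}(c,\mathbf{e}) \;=\; P(c)\prod_{j} P(e_j \mid c),
  \qquad
  P_{\mathrm{TAN}}(c,\mathbf{e}) \;=\; P(c)\prod_{j} P\bigl(e_j \mid c,\, e_{\pi(j)}\bigr),
\]

where \pi(j) is the single attribute parent assigned to e_j by a tree over the attributes (empty for the root); a BAN allows a general DAG over the attributes, still with the class as a parent of every attribute, and a GBN drops even that restriction, treating the class as an ordinary node.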


Comparing Bayesian Network Classifiers #2

Learning Bayesian Belief Network Classifiers: Algorithms and System

This paper investigates methods for learning predictive classifiers based on Bayesian belief networks (BN) -- primarily unrestricted Bayesian networks and Bayesian multi-nets. We present our algorithms for learning these classifiers, and discuss how these methods address the overfitting problem and provide a natural method for feature subset selection. Using a set of standard classification problems, we empirically evaluate the performance of various BN-based classifiers. The results show that the proposed BN and Bayes multi-net classifiers are competitive with (or superior to) the best known classifiers, based on both BNs and other formalisms, and that the computational time for learning and using these classifiers is relatively small. These results argue that BN-based classifiers deserve more attention in the data mining community.
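
In the usual formulation of a Bayesian multi-net classifier (summarized here for orientation), one network B_c is learned from the instances of each class c, and a new instance e is labeled by

\[
  c^{*}(\mathbf{e}) \;=\; \arg\max_{c}\; P(c)\, P_{B_c}(\mathbf{e}),
\]

so that each class may have its own (in)dependence structure among the attributes, rather than all classes sharing a single augmented network.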

