Summary of our Information Theory based Method

(Download the one page summary in compressed postscript format, 18KB)

(Download a detailed technical report: report98.ps.gz, 764KB or report98.pdf, 321KB)

1 Introduction

Belief network learning algorithms can be grouped into two categories: network scoring based algorithms and conditional independence (CI) test based algorithms. Some of these algorithms only deal with a special case where node (attribute) ordering is known (i.e., the parent nodes of a node must appear earlier in the order).

Our algorithm belongs to the second category and it does not require node ordering. Unlike other CI based algorithms that must use exponential number of CI tests (we call them qualitative CI methods), our algorithm uses mutual information calculation as the quantitative CI test to avoid the exponential complexity on CI tests. This algorithm takes a database as input and constructs the belief network structure as output. Our algorithm also assumes that the database attributes have discrete values and there are no missing values in all the records.

2 The Algorithm

This algorithm extends Chow and Liu's tree construction algorithm [Chow and Liu, 1968] to belief network construction by using a three phase mechanism. The three phases are: drafting, thickening and thinning. In the first phase, this algorithm computes the mutual information of each pair of nodes as a measure of closeness and creates a draft based on this information. In the second phase, the algorithm adds edges when the pairs of nodes cannot be d-separated [Pearl, 1988]. In the third phase, each edge of the current graph is examined using CI tests and removed if the two nodes of the edge can be d-separated. An edge orientation procedure is also conducted if the node ordering is not known.

In the general case, this algorithm requires CI tests O(N4) times. Under an assumption slightly stronger than DAG-Isomorph [Pearl, 1988], it is able to construct the exact network of the underlying model. It can also orient most of the edges correctly. In the special case where node ordering is given, our algorithm requires only O(N2) times of CI tests [Cheng et al., 1997]. It also guarantees that the perfect map [Pearl, 1988] of underlying model is generated given that the underlying model is DAG-Isomorphic. Other forms of background knowledge such as partial node ordering and direct cause and effect relationships can also be used in the algorithm.

3 Experimental Results

We tested our algorithm on the ALARM network data, which is the widely accepted benchmark and has 37 attributes and 10,000 records. The true structure has 46 arcs. We used all three versions of the network that have been used by other researchers. The results are shown in the following table.

Datasets

with ordering

without ordering

  m.e. e.e. m.e. e.e. m.o. w.o.
dataset1

0

0

1

1

4

0

dataset2

1

0

1

0

4

1

dataset3

0

0

1

0

5

0

(1. m.e. and e.e. stand for missing edges and extra edges; m.o. and w.o. stand for missing orientation and wrongly oriented.   2. Dataset 1 has the underlying belief network described in the web page of Norsys Software Corp. Dataset 2 has the underlying belief network used by David Heckerman. Dataset 3 is generated by Gregory F. Cooper and Edward Herskovits. We generated Dataset 1 and Dataset 2 using a Monte Carlo technique.)

These experiments have been conducted on a 90MHz PC and the data sets are stored in a MS-access database. The running time is about 20 to 30 minutes. More than 95% of the running time is consumed on database queries.

4 Conclusion

By using the three phase construction mechanism and mutual information tests, our algorithm can avoid exponential number of CI tests and high order CI tests. Experimental results show that this algorithm is very efficient and reliable. A belief network learning system PowerConstructor which is based on this method in now available for free download.

References

[Pearl, 1988] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, 1988.

[Chow and Liu, 1968] C.K. Chow and C.N. Liu. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14:462-467, 1968.

[Cheng et al., 1997] J. Cheng, D.A. Bell and W. Liu. An algorithm for Bayesian belief network construction from data, Proceedings of AI & STAT'97, Ft. Lauderdale, Florida, 1997.

[Cheng et al., 1997] J. Cheng, D.A. Bell and W. Liu. Learning belief networks from data: an information theory based approach. Proceedings of the Sixth ACM International Conference on Information and Knowledge Management, 1997.