Jie Cheng's Research
Bayesian Belief Network Induction from Databases (BNIDB)
Brief Description
Bayesian belief-network is a powerful common knowledge representation and reasoning tool for partial beliefs under conditions of uncertainty. A Bayesian belief-network is a directed acyclic graph (DAG) with conditional probability for each node. The DAG structure contains nodes representing domain variables and arcs between nodes representing probabilistic dependencies.
The aim of this project is to develop efficient algorithms to construct belief networks from large databases and promote such algorithms to be used into real world situations as a data mining and automated modeling tool.
Algorithms
Two three-phase construction algorithms have been developed to deal with the situation where node ordering is given and not given respectively. Both of the algorithms are based on information theory and conditional independence (CI) tests. See a one page summary.
The algorithm that requires node ordering uses ordinary CI test method (qualitative CI test method) and requires CI tests only O(N2) times where N is the number of nodes. This is correct under the assumption of DAG-faithful. This algorithm is described in my AI&STAT'97 paper.
The algorithm that does not require node ordering uses quantitative CI test method and requires CI tests O(N4) times. This is correct under an assumption slightly stronger than DAG-faithful. This algorithm is described in my CIKM'97 paper.
Both algorithms and our PowerConstructor system are presented in detail in a recent Technical Report (report98.ps.gz, 764KB or report98.pdf, 321KB).
The application of these algorithms in data mining (classification tasks) is described in Comparing Bayesian Network Classifiers and Learning Bayesian Belief Network Classifiers: Algorithms and System.
Publications
Cheng, J. and Greiner, R., Learning Bayesian Belief Network Classifiers: Algorithms and System. (proceedings of the fourteenth Canadian conference on artificial intelligence, 2001) AI'2001
Cheng, J. and Greiner, R., Comparing Bayesian Network Classifiers. (proceedings of the fifteenth conference on uncertainty in artificial intelligence, 1999) UAI'99
Cheng, J., Bell, DA, Liu, W., Learning belief networks from data: an information theory based approach. (proceedings of the Sixth ACM International Conference on Information and Knowledge Management, 1997) CIKM'97
Cheng, J., Bell, DA, Liu, W., An algorithm for Bayesian network construction from data. (proceedings of the 6th International Workshop on Artificial Intelligence and Statistics, 1997) AI&STAT'97
Cheng, J., Bell, DA, Liu, W., A belief network learning algorithm based on information theory. (poster of IJCAI'97)
System
Belief Network Related Software (including BN PowerConstructor, BN PowerPredictor and Data PreProcessor).
The algorithms have been tested on the ALARM network data, which is the widely accepted benchmark and has 37 attributes and 10,000 records. The true structure has 46 arcs. We used all three versions of the network previously used by other researchers. On all these data sets, we got very encouraging results with or without the node ordering. Using only 1,000 records of the ALARM data sets, the algorithms can still generate good results. Some smaller data sets have also been used to test my algorithms. The algorithms have also been used successfully on a real world telecommunication fault diagnosis data set which has 34 attributes.
All the data sets used in the following experiments are created by myself using a Monte Carlo technique. The probability distributions and real structures are described in the Network Library of Norsys Software Corp. All the experiments were conducted on my Pentium II 300MHz PC. A sample database that contains a 'Chest clinic' data set (1,000 cases) and a 'ALARM' data set (10,000 cases) is included in the PowerConstructor download.
Experiment 1:
1000 cases of the 'Chest clinic' network (used by Lauritzen et al., also called 'Asia' network) which has 8 attributes and 8 arcs.
Running time: Without ordering, 1.4 seconds; with ordering, 1.1 seconds.Experiment 2:
1000, 3000, 10000 cases of the 'ALARM' network (used by Beinlich et al.) which has 37 attributes and 46 arcs.
Running time: Without ordering, 28", 71" and 204" respectively; with ordering, 25", 63", 180" respectively.
After analyzing the experimental results from different domains, we found that the system running time is roughly linear to the number of cases and O(N2) to the number of attributes, no matter node ordering is given or not. Among the running time, 99.3% of it is consumed by database engine for query processing. We believe high performance database servers or new database engines designed for data mining should be able to speed up the learning process greatly.
Deductive Database System (DDB)
Brief Description
Deductive databases (DDB) extend relational databases by providing knowledge manipulation ability. In recent years, researchers have established a sound theoretical foundation of DDB but practicable systems are still in the early stage of development. This is partly due to the difficulties of developing large commercial deductive database systems and transferring relational database users to DDB systems. We developed a deductive reasoning add-in module which can be easily integrated with relational databases. Our approach has two main advantages: first, users can get deduction functionality without changing their database systems and query languages; second, the deduction module is efficient and easy to implement.
Publications
Cheng, J., Bell, DA, Liu, W., A practical approach to knowledge representation and reasoning in relational databases, proceedings of 1996 IEEE International Conference on Tools with Artificial Intelligence.
Huai, JP., Cheng, J., DDBS1.0: A Deductive Database System with two query languages, IASTED International Conference Artificial intelligence, expert systems and neural networks, 1994.
Huai, JP., Cheng, J., Default Database and Its Beliefs Maintenance System, ICARCV'94 Third International Conference on automation, robotics and computer vision, 1994.
Systems
Deductive Database System for windows (using Visual-Basic 3.0 Pro. and MS-Access Basic 2.0.)
Deductive Database System 1.0 (using C and ORACLE).
![]()
This page was last updated on 21 April, 2001.