Theory Revision

Large-scale expert systems have found widespread use. However, developers have found that the cost of maintaining a knowledge base, over its lifetime, can be as high as the initial cost of its development. One response is to use machine learning techniques to correct the knowledge base as problems emerge. Unfortunately, standard induction methods seem ill-suited to this task, as they are designed to use training data to construct a knowledge base from scratch, and the rate at which training data is generated by field users is typically too low to support regeneration of the knowledge base each time a revision is needed. As domain knowledge is available (in the form of the knowledge base), it makes sense to take advantage of this information (even if imperfect) to bias the induction process. Techniques for theory revision are suited for the task of knowledge maintenance, since they use training data to improve an initial domain theory.

This page overviews the theory revision project, within the kme programme at Siemens Corporate Research. The "Implementation" papers describe the systems we developed for this task. These systems, in effect, hill-climb to a theory whose accuracy is a local optimum. The "Theory" papers explain why we seek local optima, by proving that no efficient system can always find the global optimum.

• Implementation
• Theory Revision in Fault Hierarchies ("Delta" System, accuracy)

• See also US Patent 5987445: "Delta learning system for using expert advice to revise diagnostic expert system fault hierarchies" (Awarded 16 Nov 1999).
• An Optimized Theory Revision Module ("DeltaDelta" System, efficiency)

• See also US Patent 5787232: Efficient data-driven theory revision system (Awarded 28 July 1998).
• Theoretic Analysis (Sample/Computational Complexity)
• The Complexity of Theory Revision ("pure" theory: adding/deleting antecedents/rules)
• The Complexity of Revising Logic Programs ("impure" theory: reordering antecedents/rules)

• Fault Hierarchies

Theory Revision in Fault Hierarchies

The Fifth International Workshop on Principles of Diagnosis (Dx94), New York, October 1994.

The fault hierarchy representation is widely used in expert systems for the diagnosis of complex mechanical devices. On the assumption that an appropriate bias for a knowledge representation language is also an appropriate bias for learning in this domain, we have developed a theory revision method that operates directly on a fault hierarchy. This domain presents several challenges: A typical training instance is missing most feature values, and the pattern of missing features is significant, rather than merely an effect of noise. Moreover, features that are present may be corrupted. Finally, the accuracy of a candidate theory is measured by considering both the sequence of tests required to arrive at a diagnosis, and its agreement with the diagnostic endpoints provided by an expert. We describe the DELTA algorithm for theory revision of fault hierarchies, its application in knowledge base maintenance, and report on experiments with a diagnostic system in current use.

• (updated version)

• Optimized

An Optimized Theory Revision Module

In progress

Essentially all theory revision systems use a set of theory-to-theory transformations { t_k } to hill-climb from a given initial theory to a new theory whose empirical accuracy, over a given set of labeled training instances { q_j }, is a local optimum. At the heart of each such process is a "evaluator", which compares the accuracy of the current theory KB with that of each of its "neighbors" N(KB) = { t_k(KB) }, each formed by applying one transformation to the KB, with the goal of determining which neighbor has the highest score. One obvious way to implement such an evaluator involves simply running each individual neighbor theory KB_k = t_k(KB) on each instance q_j; this corresponds to the "wrapper" model [JKP'93]. In practice, however, this approach can be prohibitively slow. We therefore built an alternative system DD, which employs a smarter evaluator that can quickly compute the score of a transformed theory t_k(KB), relative to the current theory KB, by "looking inside" KB and reasoning about the effects of the t_k transformation. This paper presents data that shows DD runs around 35 times faster than the naive wrapped system Delta, attaining the same accuracy. We also discuss DD's source of power, and generalize from this specific implementation to suggest other situations to which these ideas may be applicable.

Complex - Pure

TheComplexity of Theory Revision (ps)

Artificial Intelligence Journal, 107:2 (February 1999), 175-217.

A knowledge-based expert system uses its database (a.k.a. its "theory") to produce answers to the queries it receives. Unfortunately, these systems will produce inaccurate answers if their theories include erroneous information. Standard "theory revision" systems use a given set of "labeled queries" (each a query paired with its correct answer) to incrementally transform the given theory, by adding and/or deleting either rules and/or antecedents, into a related one that is as accurate as possible. This paper uses both sample and computational complexity arguments to analyze this process. After formally defining the theory revision task, we use the observation that we have access to only a limited number of labeled queries to constrain the space of new theories that can be considered; in particular, we describe when a polynomial number of such queries will be sufficient to obtain the information required to identify a theory whose accuracy is close to optimal with high probability. We then consider the computational complexity of finding the best theory within this space, and prove that, unless P=NP, no polynomial time algorithm can identify this near-optimal theory, even given the exact distribution of queries. We also present conditions under which a polynomial-time algorithm can produce a theory whose (in)accuracy is even close (ie, within a particular polynomial factor) to optimal. These results justify the standard practice of using these types of transformations to hill-climb to a locally-optimal theory, based on a given set of labeled samples.

Expanded from

• The Complexity of Theory Revision, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI95), Montreal, August 1995.

• Complex - Impure

The Complexity of Revising Logic Programs (ps)

Journal of Logic Programming, 40:2-3 (Aug-Sept 1999), pp. 273-298.

A rule-based program will return a set of answers to each query. An impure program, which includes the Prolog cut "!" and not(.) operators, can return different answers if its rules are re-ordered. There are also many reasoning systems that return only the first answer found for each query; these first answers, too, depend on the rule order, even in pure rule-based systems. A theory revision algorithm, seeking a revised rule-base whose expected accuracy, over the distribution of queries, is optimal, should therefore consider modifying the order of the rules. This paper first shows that a polynomial number of training labeled queries'' (each a query paired with its correct answer) provides the distribution information necessary to identify the optimal ordering. It then proves, however, that the task of determining which ordering is optimal, once given this distributional information, is intractable even in trivial situations; eg, even if each query is an atomic literal, we are seeking only a perfect'' theory, and the rule base is propositional. We also prove that this task is not even approximable: Unless P=NP, no polynomial time algorithm can produce an ordering of an n-rule theory whose accuracy is within n^{\gamma} of optimal, for some \gamma > 0. We next prove similar hardness, and non-approximatability, results for the related tasks of determining, in these impure contexts, (1) the optimal ordering of the antecedents; (2) the optimal set of new rules to add; and (3) the optimal set of existing rules to delete.

Expanded from