Handling Missing Information

Most learning algorithms use completely specified training instances to produce classifiers that can classify completely specified (but unlabeled) test instances. In practice, however, instances are often incomplete: attribute values in the training instances, the test instances, or both may be "blocked" for various reasons. This research investigates learning in these contexts:

  • Learning classifiers that can effectively handle blocked test data
  • Learners that can exploit training data that omits irrelevant values
  • Learning "active classifiers" that can (at a cost) obtain values of blocked attributes

  • Blocked Test Data

    Learning to Classify Incomplete Examples

    Dale Schuurmans and Russell Greiner

    Computational Learning Theory and Natural Learning Systems: Making Learning Systems Practical, MIT Press, 1996.

    An inductive inference system uses labeled training examples to learn a classification function for labeling subsequent unlabeled performance examples. Most theoretical analyses assume that both training and performance examples are complete, in that the value of every attribute is known to both learner and classifier. Real-world data, however, is usually incomplete. This paper addresses this discrepancy by formally analyzing the task of learning to classify incompletely specified performance examples, based on training examples that are, respectively, completely and incompletely specified. This formalism requires an extension of the classical notion of concept definition, called a "default concept definition" (dcd), whose classification behavior can be nonmonotonic. We first present a formal account of these dcds and show that they are similar, but not identical, to important existing ideas from both learnability and AI knowledge representation formalisms. We next define a generalization of Valiant's probabilistic model of instance presentation that allows attribute values to be hidden from the classifier. Finally, we use this model to develop an extension to the PAC learning framework that can learn dcds from examples, and prove a number of learnability results, both positive and negative, which collectively suggest the appropriate ways of treating missing attribute values.
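
    The dcd idea can be made concrete with a small sketch (ours, not the paper's; all names and rules here are hypothetical). A dcd is modeled as an ordered list of rules, a blocked attribute is simply absent from an example, and revealing an attribute value can flip the assigned label, which is the nonmonotonic behavior noted above:

        def classify(rules, example):
            """Return the label of the first rule whose conditions are all
            visible in the example and satisfied; None if no rule applies."""
            for conditions, label in rules:
                if all(example.get(a) == v for a, v in conditions.items()):
                    return label
            return None

        # Hypothetical dcd: a specific rule overrides the default for birds.
        rules = [
            ({"bird": 1, "penguin": 1}, "cannot-fly"),  # most specific first
            ({"bird": 1},               "can-fly"),     # default for birds
        ]

        print(classify(rules, {"bird": 1}))                # can-fly
        print(classify(rules, {"bird": 1, "penguin": 1}))  # cannot-fly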


    Learning Default Concepts

    Dale Schuurmans and Russell Greiner

    Proceedings of the Tenth Canadian Conference on Artificial Intelligence (CSCSI-94), Banff, May 1994.

    Classical concepts, based on necessary and sufficient defining conditions, cannot classify logically insufficient object descriptions. Many reasoning systems avoid this limitation by using "default concepts" to classify incompletely described objects. This paper addresses the task of learning such default concepts from observational data. We first model the underlying performance task --- classifying incomplete examples --- as a probabilistic process that passes random test examples through a "blocker" that can hide object attributes from the classifier. We then address the task of learning accurate default concepts from random training examples. After surveying the learning techniques that have been proposed for this task in the machine learning and knowledge representation literatures, and investigating their relative merits, we present a more data-efficient learning technique, developed from well-known statistical principles. Finally, we extend Valiant's PAC-learning framework to this context and obtain a number of useful learnability results.
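
    The blocker in this model can be sketched in a few lines (our illustration, with hypothetical names; the paper's model allows more general blockers than the independent one shown here). A blocker takes a completely specified example and hides some attribute values before the classifier sees it; one simple choice hides each value independently with probability p:

        import random

        def block(example, p, rng):
            """Hide each attribute value independently with probability p;
            hidden attributes are simply dropped from the example."""
            return {a: v for a, v in example.items() if rng.random() >= p}

        rng = random.Random(0)
        complete = {"x1": 1, "x2": 0, "x3": 1}
        print(block(complete, p=0.5, rng=rng))  # e.g. {'x1': 1, 'x2': 0}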


  • Training Data That Omits Irrelevant Values

    Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data

    Russell Greiner, Adam Grove and Alexander Kogan

    Artificial Intelligence, 97:1-2 (December 1997), pp. 345--380.

    Most learning algorithms work best when their training data contain completely specified labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only irrelevant attribute values, i.e., values that are not needed to classify an instance, given the values of the other unblocked attributes. We first motivate and formalize this model of "superfluous-value blocking," and then demonstrate that these omissions can be useful, by proving that certain classes that seem hard to learn in the general PAC model --- viz., decision trees and DNF formulae --- are trivial to learn in this setting. We also show that this model can be extended to deal with (1) theory revision (i.e., modifying an existing formula); (2) blockers that occasionally include superfluous values or exclude required values; and (3) other corruptions of the training data.

    Expanded from

  • Exploiting the Omission of Irrelevant Data, Proceedings of the Thirteenth International Conference on Machine Learning (ICML96), Bari, Italy, July 1996.
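
    The claim above that DNF formulae become easy to learn under superfluous-value blocking can be illustrated with a short sketch (ours, not the paper's; the target formula is made up). If the blocker hides exactly the irrelevant values, the visible part of each positive example is a satisfied term of the target DNF, so a learner can simply collect those visible parts:

        def learn_dnf(blocked_positives):
            """Take the visible part of each blocked positive example,
            verbatim, as one term of the hypothesis DNF."""
            terms = []
            for visible in blocked_positives:
                term = frozenset(visible.items())
                if term not in terms:
                    terms.append(term)
            return terms

        def satisfies(example, terms):
            return any(all(example.get(a) == v for a, v in t) for t in terms)

        # Hypothetical target: (x1 AND NOT x2) OR x3.  Each blocked positive
        # shows only the literals of a term that made it positive.
        positives = [{"x1": 1, "x2": 0}, {"x3": 1}]
        h = learn_dnf(positives)
        print(satisfies({"x1": 1, "x2": 0, "x3": 0}, h))  # True
        print(satisfies({"x1": 0, "x2": 0, "x3": 0}, h))  # False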

  • Learning Active Classifiers

    Learning Cost-Sensitive Active Classifiers

    Russell Greiner, Adam Grove and Dan Roth

    Artificial Intelligence, 139:2, pp. 137--174, Sept 2002.

    Most classification algorithms are "passive", in that they assign a class label to each instance based only on the description given, even if that description is incomplete. By contrast, an active classifier can --- at some cost --- obtain the values of some unspecified attributes, before deciding upon a class label. This can be useful, for instance, when deciding whether to gather information relevant to a medical procedure or experiment. The expected utility of using an active classifier depends on both the cost required to obtain the values of additional attributes and the penalty incurred if the classifier outputs the wrong classification. This paper analyzes the problem of learning optimal active classifiers, using a variant of the probably-approximately-correct (PAC) model. After defining the framework, we show that this task can be achieved efficiently when the active classifier is allowed to perform only (at most) a constant number of tests. We then show that, in more general environments, this task of learning optimal active classifiers is often intractable.


    Expanded from

  • Learning Active Classifiers, Proceedings of the Thirteenth International Conference on Machine Learning (ICML96), Bari, Italy, July 1996.
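
    The cost/penalty trade-off described above can be sketched with a one-step lookahead rule (our illustration, with made-up probabilities and costs, not the paper's algorithm): pay the cost of observing one more attribute exactly when the expected misclassification penalty it saves exceeds that cost:

        def expected_penalty(p_positive, penalty):
            """Classify passively as the likelier class; the expected
            penalty is the chance of being wrong times the penalty."""
            return min(p_positive, 1 - p_positive) * penalty

        def should_test(p_pos_if_1, p_pos_if_0, p_attr_1, cost, penalty):
            """One-step lookahead: run the test iff the expected penalty
            after seeing its outcome, plus its cost, beats acting now."""
            p_pos = p_attr_1 * p_pos_if_1 + (1 - p_attr_1) * p_pos_if_0
            after = (p_attr_1 * expected_penalty(p_pos_if_1, penalty)
                     + (1 - p_attr_1) * expected_penalty(p_pos_if_0, penalty))
            return after + cost < expected_penalty(p_pos, penalty)

        # A test that nearly resolves the label is worth a small cost.
        print(should_test(p_pos_if_1=0.95, p_pos_if_0=0.05,
                          p_attr_1=0.5, cost=1.0, penalty=10.0))  # True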

