Handling Missing Information
Most learning algorithms use completely specified training
instances to produce classifiers that can classify completely specified
(but unlabeled) test instances. In practice, however, the instances
are often incomplete: the training instances, the test instances, or both
may be "blocked" for various reasons. This research investigates
learning in these contexts.
Blocked Test Data
An inductive inference system uses labeled training examples to learn a
classification function for labeling subsequent unlabeled performance examples.
Most theoretical analyses assume that both training and performance examples
are complete, in that the value of every attribute is known to both learner
and classifier. Real-world data, however, is usually incomplete. This paper
addresses this discrepancy by formally analyzing the task of learning to
classify incompletely specified performance examples, based on completely-
and incompletely-specified training examples, respectively. This formalism
requires an extension of the classical notion of concept definition, called
"default concept definition" (dcd), whose classification behavior can be
nonmonotonic. We first present a formal account of these dcds and show
that they are similar, but not identical, to important existing ideas from
both learnability and AI knowledge representation formalisms. We next define
a generalization of Valiant's probabilistic model of instance presentation
that allows attribute values to be hidden from the classifier. Finally,
we use this model to develop an extension to the PAC learning framework
that can learn dcds from examples, and prove a number of learnability results,
both positive and negative, which collectively suggest the appropriate
ways of treating missing attribute values.
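As a concrete illustration of the instance-presentation model described above, here is a minimal Python sketch, assuming a uniform random blocker and a toy conjunctive target concept; both are illustrative assumptions, not the paper's formal definitions.

    import random

    def draw_instance(n_attrs):
        # Draw a random Boolean instance; a stand-in for sampling from
        # Valiant's (unknown) instance distribution.
        return [random.choice([0, 1]) for _ in range(n_attrs)]

    def blocker(instance, p_hide=0.3):
        # Hide each attribute value independently with probability p_hide
        # (shown as None). The formal model permits arbitrary blocking
        # processes; this uniform blocker is just one concrete example.
        return [None if random.random() < p_hide else v for v in instance]

    def target(instance):
        # Toy target concept: x0 AND x1 (an assumption for illustration).
        return instance[0] == 1 and instance[1] == 1

    # The classifier sees only the blocked view of each labeled instance:
    full = draw_instance(5)
    partial = blocker(full)
    print(full, "->", partial, "label:", target(full))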
Proceedings of the Tenth Canadian Conference on Artificial
Intelligence (CSCSI-94), Banff, May 1994.
Classical concepts, based on necessary and sufficient defining conditions,
cannot classify logically insufficient object descriptions. Many reasoning
systems avoid this limitation by using "default concepts" to classify incompletely
described objects. This paper addresses the task of learning such default
concepts from observational data. We first model the underlying performance
task --- classifying incomplete examples --- as a probabilistic process
that passes random test examples through a "blocker" that can hide object
attributes from the classifier. We then address the task of learning accurate
default concepts from random training examples. After surveying the learning
techniques that have been proposed for this task in the machine learning
and knowledge representation literatures, and investigating their relative
merits, we present a more data-efficient learning technique, developed
from well-known statistical principles. Finally, we extend Valiant's PAC-learning
framework to this context and obtain a number of useful learnability results.
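A minimal sketch, assuming a decision-list representation, of how a default concept can classify incomplete descriptions nonmonotonically; the bird/penguin rules are a hypothetical example, not the paper's formalism.

    # A default concept as an ordered list of (conditions, label) rules:
    # the first rule whose conditions all hold among the *known* attribute
    # values fires. More specific exceptions are listed first.
    RULES = [
        ({"bird": 1, "penguin": 1}, False),  # exception: penguins don't fly
        ({"bird": 1}, True),                 # default: birds fly
    ]

    def classify(partial):
        # partial: dict of known attribute values; absent attrs are blocked.
        for conditions, label in RULES:
            if all(partial.get(a) == v for a, v in conditions.items()):
                return label
        return None  # no default applies; abstain

    print(classify({"bird": 1}))                # True
    print(classify({"bird": 1, "penguin": 1}))  # False: more information
                                                # flips the label, so the
                                                # behavior is nonmonotonic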
Training Data Omits Irrelevant Values
Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data
Russell Greiner, Adam Grove and Alexander Kogan
Most learning algorithms work most effectively when their training data
contain completely specified labeled samples. In many diagnostic
tasks, however, the data will include the values of only some of the attributes;
we model this as a blocking process that hides the values of those
attributes from the learner. While blockers that remove the values of critical
attributes can handicap a learner, this paper instead focuses on blockers
that remove only irrelevant attribute values, i.e., values that are
not needed to classify an instance, given the values of the other
unblocked attributes. We first motivate and formalize this model of
"superfluous-value blocking," and then demonstrate that these omissions
can be useful, by proving that certain classes that seem hard to learn
in the general PAC model --- viz., decision trees and DNF formulae ---
are trivial to learn in this setting. We also show that this model can
be extended to deal with (1) theory revision (i.e., modifying an existing
formula); (2) blockers that occasionally include superfluous values or
exclude required values; and (3) other corruptions of the training data.
Expanded from
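To see why DNF formulae become easy to learn under superfluous-value blocking, consider this hypothetical Python sketch: if a blocked positive example retains exactly the values needed to establish its label, those values spell out a term of the target DNF, so learning reduces to collecting terms.

    def term_of(blocked_positive):
        # The unblocked values of a blocked positive example form an
        # implicant (term) of the target DNF under this blocking model.
        # blocked_positive: dict attr -> 0/1 for the unblocked values.
        return frozenset(blocked_positive.items())

    def learn_dnf(blocked_positives):
        # Collect the distinct terms revealed by blocked positive examples.
        return {term_of(ex) for ex in blocked_positives}

    def classify(dnf, example):
        return any(all(example.get(a) == v for a, v in term) for term in dnf)

    # Hypothetical target (x1 AND x2) OR (NOT x3); its blocked positives
    # reveal its two terms directly:
    positives = [{"x1": 1, "x2": 1}, {"x3": 0}]
    dnf = learn_dnf(positives)
    print(classify(dnf, {"x1": 1, "x2": 1, "x3": 1}))  # True
    print(classify(dnf, {"x1": 0, "x2": 1, "x3": 0}))  # True (second term)
    print(classify(dnf, {"x1": 0, "x2": 1, "x3": 1}))  # False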
Learning Active Classifiers
Russell Greiner, Adam Grove and Dan Roth
Artificial Intelligence, 139:2, pp. 137--174, Sept 2002.
Most classification algorithms are "passive", in that they assign a
class label to each instance based only on the description given, even
if that description is incomplete.
By contrast, an active
classifier can --- at some cost --- obtain the values of
some unspecified attributes, before deciding upon a class label.
This can be useful, for instance, when deciding whether to gather information
relevant to a medical procedure or experiment.
The expected utility of using an active classifier depends on both the
cost required to obtain the values of additional attributes and the
penalty incurred if the classifier outputs the wrong classification.
This paper analyzes the problem of learning optimal active
classifiers, using a variant of the probably-approximately-correct
(PAC) model. After defining the framework, we show that this task can
be achieved efficiently when the active classifier is allowed to
perform only (at most) a constant number of tests. We then show that,
in more general environments,
this task of learning optimal active classifiers
is often intractable.
Expanded from
Learning Active Classifiers, Proceedings of the Thirteenth International
Conference on Machine Learning (ICML-96), Bari, Italy, July 1996.
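The trade-off at the heart of this task can be sketched as a one-step expected-utility comparison: classify now, or pay a test cost to observe one more attribute first. This is a minimal sketch with made-up probabilities and costs, not the paper's algorithm.

    def expected_penalty(p_positive, penalty):
        # Best immediate guess errs with probability min(p, 1 - p).
        return min(p_positive, 1.0 - p_positive) * penalty

    def should_test(p_positive, p_pos_if_1, p_attr_is_1, test_cost, penalty):
        # Compare classifying now against paying test_cost to observe one
        # attribute and then classifying. p_pos_if_1 is the posterior given
        # the attribute is 1; the posterior given 0 follows by total
        # probability (assumes p_attr_is_1 < 1).
        p1 = p_attr_is_1
        p_pos_if_0 = (p_positive - p1 * p_pos_if_1) / (1.0 - p1)
        after_test = (test_cost
                      + p1 * expected_penalty(p_pos_if_1, penalty)
                      + (1 - p1) * expected_penalty(p_pos_if_0, penalty))
        return after_test < expected_penalty(p_positive, penalty)

    # A cheap, sufficiently informative test is worth performing first:
    print(should_test(0.5, 0.9, 0.5, test_cost=1.0, penalty=10.0))  # True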