------------------------------------------------------------------------------
                                 ABSTRACTS
                        Russ Greiner's recent papers
                                [from Aug'90]
------------------------------------------------------------------------------
Available from public ftp site:
    ftp://scr.siemens.com/pub/learning/Papers/greiner/
(or see http://www.cs.toronto.edu/~greiner/ )

{In reverse chronological order}

------------------------------------------------------------------------------
33. Exploiting the Omission of Irrelevant Data
    Russell Greiner, Adam Grove and Alexander Kogan

Most learning algorithms work most effectively when their training data
contain *completely specified* labeled samples. In many diagnostic tasks,
however, the data will include the values of only some of the attributes; we
model this as a *blocking* process that hides the values of those attributes
from the learner. While blockers that remove the values of critical
attributes can handicap a learner, this paper instead focuses on blockers
that remove only *irrelevant* attribute values, ie, values that are *not
needed to classify an instance*, given the values of the other unblocked
attributes. We first motivate and formalize this model of
``superfluous-value blocking,'' and then demonstrate that these omissions
can be useful, by proving that certain classes that seem hard to learn in
the general PAC model --- viz., decision trees and DNF formulae --- are
trivial to learn in this setting. We also show that this model can be
extended to deal with (1) theory revision (ie, modifying an existing
formula); (2) blockers that occasionally include superfluous values or
exclude required values; and (3) other corruptions of the training data.

(to appear, Artificial Intelligence, 1997.) { superfluous-journal.ps }
[Expanded from a paper in "Proceedings of the Thirteenth International
Conference on Machine Learning (IMLC96)", Bari, Italy, July 1996
{ exploit-omit-imlc96.ps }; see also R. Bharat Rao, Russell Greiner and Tom
Hancock, "Exploiting the Absence of Irrelevant Information: What You Don't
Know *Can* Help You", Proceedings of the AAAI Fall Symposium on `Relevance',
New Orleans, November 1994 { superfluous.ps }.]
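
To make the blocking model concrete: if the target concept is a decision
tree, an idealized superfluous-value blocker reveals exactly the attribute
values tested on the path that classifies each instance, and hides the rest.
A minimal Python sketch under that assumption (the tree encoding and names
are ours for illustration, not the paper's):

    def superfluous_blocker(tree, example):
        """Hide exactly the irrelevant values: reveal only the attributes
        tested on the root-to-leaf path that classifies `example`.
        `tree` is either a class label (leaf) or a pair
        (attribute, {value: subtree})."""
        visible = {}
        node = tree
        while isinstance(node, tuple):       # descend until we reach a leaf
            attr, children = node
            visible[attr] = example[attr]    # needed for the decision: unblocked
            node = children[example[attr]]
        return visible, node                 # blocked sample and its label

Each blocked training sample thus displays one root-to-leaf path of the
target tree, which suggests why decision trees become easy to learn in this
setting.

------------------------------------------------------------------------------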
32. Computational Learning Theory and Natural Learning Systems --- Volume IV:
    Making Learning Systems Practical
    R. Greiner, T. Petsche, and S. Hanson (eds)

From the introduction: This is the fourth and final volume of papers from a
series of workshops called "Computational Learning Theory and `Natural'
Learning Systems." The purpose of the workshops was to explore the emerging
intersection of theoretical learning research and natural learning systems.
The workshops drew researchers from three historically distinct styles of
learning research: computational learning theory, neural networks, and
machine learning (a subfield of AI). [...] The editors divide the twenty-one
contributions into four sections. The first three cover critical problem
areas: (1) scaling up from small problems to realistic ones with large input
dimensions, (2) increasing efficiency and robustness of learning methods,
and (3) developing strategies to obtain good generalization from limited or
small data samples. The fourth section discusses examples of real-world
learning systems.

(available from MIT Press, 1997 (ISBN 0-262-57118-8)
{http://www-mitpress.mit.edu/mitp/recent-books/comp/greop.html} )

------------------------------------------------------------------------------
31. Learning Bayesian Nets that Perform Well
    R. Greiner, A. Grove, D. Schuurmans

A Bayesian net (BN) is more than a succinct way to encode a probabilistic
distribution; it also corresponds to a function used to answer queries. A BN
can therefore be evaluated by the accuracy of the answers it returns. Many
algorithms for learning BNs, however, attempt to optimize another criterion
(usually likelihood, possibly augmented with a regularizing term), which is
independent of the distribution of queries that are posed. This paper takes
the ``performance criteria'' seriously, and considers the challenge of
computing the BN whose performance --- read ``accuracy over the distribution
of queries'' --- is optimal. We show that many aspects of this learning task
are more difficult than the corresponding subtasks in the standard model.

(appears in "Proceedings of the Thirteenth Conference on Uncertainty in
Artificial Intelligence (UAI97)", Providence, Aug 1997)
{ ggs-bnpw-uai97.ps }

------------------------------------------------------------------------------
30. Why Experimentation can be better than `Perfect Guidance'
    T. Scheffer, R. Greiner and C. Darken

Many problems correspond to the classical control task of determining the
appropriate control action to take, given some (sequence of) observations.
One standard approach to learning these control rules, called *behavior
cloning*, involves watching a perfect operator operate a plant, and then
trying to emulate its behavior. In the *experimental learning* approach, by
contrast, the learner first guesses an initial observation-to-action policy
and tries it out. If this policy performs sub-optimally, the learner can
modify it to produce a new policy, and recur. This paper discusses the
relative effectiveness of these two approaches, especially in the presence
of perceptual aliasing, showing in particular that the experimental learner
can often learn more effectively than the cloning one.

(appears in "Proceedings of the Fourteenth International Conference on
Machine Learning (IMLC-97)", Nashville, July 1997)
{ sgd-experiment-ml97.ps }

------------------------------------------------------------------------------
29. The Complexity of Revising Logic Programs
    Russell Greiner

A rule-based program will return a set of answers to each query. An *impure*
program, which includes the Prolog cut ``!'' and ``not(.)'' operators, can
return different answers if its rules are re-ordered. There are also many
reasoning systems that return only the *first* answer found for each query;
these first answers, too, depend on the rule order, even in pure rule-based
systems. A theory revision algorithm, seeking a revised rule-base whose
*expected accuracy*, over the distribution of queries, is optimal, should
therefore consider modifying the order of the rules. This paper first shows
that a polynomial number of training ``labeled queries'' (each a query
paired with its correct answer) provides the distribution information
necessary to identify the optimal ordering. It then proves, however, that
the task of determining which ordering is optimal, once given this
distributional information, is intractable even in trivial situations; eg,
even if each query is an atomic literal, we are seeking only a ``perfect''
theory, and the rule base is propositional. We also prove that this task is
not even approximable: Unless $P=NP$, no polynomial time algorithm can
produce an ordering of an $n$-rule theory whose accuracy is within
$n^{\gamma}$ of optimal, for some $\gamma > 0$.
We next prove similar hardness, and non-approximability, results for the
related tasks of determining, in these impure contexts, (1) the optimal
*ordering of the antecedents*; (2) the optimal set of *new rules to add*;
and (3) the optimal set of *existing rules to delete*.

(accepted, The Journal of Logic Programming, 1997.) { impure-journal.ps }
[Expanded from R. Greiner, "The Challenge of Revising an Impure Theory",
Proceedings of the Twelfth International Conference on Machine Learning
(ML95), Lake Tahoe, July 1995 { impure-ml95.ps }.]

------------------------------------------------------------------------------
28. Learning Active Classifiers
    Russell Greiner, Adam Grove and Dan Roth

Many classification algorithms are ``passive'', in that they assign a
class-label to each instance based only on the description given, even if
that description is incomplete. In contrast, an *active* classifier can ---
at some cost --- obtain the values of missing attributes, before deciding
upon a class label. The expected utility of using an active classifier
depends on both the cost required to obtain the additional attribute values
and the penalty incurred if it outputs the wrong classification. This paper
considers the problem of *learning* near-optimal active classifiers, using a
variant of the probably-approximately-correct (PAC) model. After defining
the framework --- which is perhaps the main contribution of this paper ---
we describe a situation where this task can be achieved efficiently, but
then show that the task is often intractable.

(appeared in "Proceedings of the Thirteenth International Conference on
Machine Learning (IMLC96)", Bari, Italy, July 1996)
{ active-class-imlc96.ps }

------------------------------------------------------------------------------
27. Practical PAC Learning
    Dale Schuurmans and Russell Greiner

We present new strategies for ``probably approximately correct'' (pac)
learning that use fewer training examples than previous pac-learning
techniques, while maintaining exactly the same worst-case guarantees. We
achieve this improvement by using *sequential* learning procedures that
observe training examples one-at-a-time, and decide ``on-line'' whether to
halt and return a hypothesis, or continue training. We first provide a
theoretical worst-case upper-bound on the average number of training
examples these procedures will use, and show that it is often better than
the fixed number required by previous ``batch'' learning algorithms.
However, the efficiency of our approach is primarily established by a series
of experiments which show a sequential learning algorithm that uses many
times fewer training examples in practice than previous techniques. These
results are robust to changes in the target concept, domain distribution,
etc., and moreover, the reductions are obtained with little (additional)
computational overhead. Overall, these results demonstrate how pac-learning
can be achieved far more efficiently in practice than previously thought,
even in the worst case, countering the claim that pac-learning can never be
feasibly achieved in real applications.

(appeared in "Proceedings of the Fourteenth International Joint Conference
on Artificial Intelligence (IJCAI95)", Montreal, August 1995)
{ pracpac-ijcai95.ps }
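
One way to see how a sequential learner can stop early is the standard
conversion of a mistake-bound ("on-line") learner into a sequential PAC
learner: halt as soon as the current hypothesis survives enough consecutive
examples without error. A minimal sketch of that conversion, not the paper's
algorithm; the online_learner interface here is hypothetical, and the
learner is assumed to be conservative (it changes its hypothesis only on
mistakes):

    import math

    def sequential_pac_learn(examples, online_learner, eps, delta, mistake_bound):
        """Halt once the current hypothesis survives k consecutive examples;
        k is set so each of the <= mistake_bound + 1 hypotheses that might
        ever be current gets failure budget delta / (mistake_bound + 1)."""
        k = math.ceil(math.log((mistake_bound + 1) / delta) / eps)
        streak = 0
        for x, y in examples:                  # examples arrive one at a time
            if online_learner.predict(x) == y:
                streak += 1
                if streak >= k:                # hypothesis looks eps-good: stop
                    return online_learner.hypothesis()
            else:
                online_learner.update(x, y)    # mistake: revise, reset streak
                streak = 0
        return online_learner.hypothesis()     # ran out of data

A hypothesis with error above eps survives k examples with probability below
(1-eps)^k <= delta/(mistake_bound+1), so a union bound over the at most
mistake_bound+1 candidate hypotheses yields the usual (eps, delta) guarantee.

------------------------------------------------------------------------------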
26. The Challenge of Revising an Impure Theory
    Russell Greiner

A *pure* rule-based program will return a set of answers to each query; and
will return the same answer set even if the rules are re-ordered. To work
effectively, however, many real-world programs are ``impure'', ie, include
the Prolog cut ``!'' and not(.) operators. The answers returned by such
programs *do* depend on the rule ordering. There are also many reasoning
systems that return only the *first* answer found for each query; these
first answers, too, depend on the rule order, even in *pure* rule-based
systems. A theory revision algorithm, seeking the revised rule-based system
that works best in such impure contexts, should therefore consider modifying
the order of the rules, in addition to adding and deleting rules and
antecedents. Such revision algorithms apply a sequence of modifications,
seeking a rule-base whose expected accuracy, over the distribution of
queries, is optimal. This paper first shows that a polynomial number of
training labeled queries (each a query coupled with its correct answer) is
sufficient to provide the distribution information necessary to identify the
optimal ordering. It then proves, however, that the task of determining
which ordering is optimal, once given this information, is intractable even
in trivial situations; eg, even if each query is an atomic literal, we are
seeking only a ``perfect'' theory, and the knowledge base is propositional.
We also prove that this task is not even approximable: Unless P=NP, no
polynomial time algorithm can produce an ordering whose accuracy is within
$n^{\epsilon}$ of optimal, for some $\epsilon > 0$, where $n$ is the number
of rules. We also prove similar hardness, and non-approximability, results
for the related tasks of determining, in these impure contexts, (1) the
optimal ordering of the antecedents; (2) the optimal set of rules to add or
(3) to delete; and (4) the optimal priority values for a set of defaults.

(appeared in "Proceedings of the Twelfth International Conference on Machine
Learning (ML95)", Lake Tahoe, July 1995)
{ impure-ml95.ps -- see also impure-journal.ps }

------------------------------------------------------------------------------
25. Sequential PAC Learning
    Dale Schuurmans and Russell Greiner

A "PAC-learning algorithm" uses labeled samples to produce a hypothesis
whose misclassification rate is at most $\epsilon$, with probability at
least $1-\delta$. While standard PAC-algorithms work in "batch" mode,
producing a hypothesis after examining a *fixed* number of labeled samples,
this report considers *sequential* PAC-algorithms, which observe one labeled
sample at a time, and can decide to stop, and return a hypothesis, after
seeing any set of samples. We first consider learning when the learner knows
the sample distribution, providing first bounds on the "expected sample
size" of such algorithms (ie, expected number of samples until the algorithm
terminates), then necessary and sufficient conditions under which one can
learn $\epsilon$-good hypotheses *with certainty* (ie, with $\delta = 0$),
and finally "truncated sequential learners", which require only (at most) a
specified number of samples to produce an appropriate
$(\epsilon,\delta)$-good hypothesis. We next consider distribution
*independent* learning, and provide bounds in terms of the concept class's
mistake bound and then its VC-dimension.
While our theoretical results prove that the worst-case expected sample
complexity for sequential learners scales in essentially the same manner as
the sample complexity of fixed-sample-size learners in most situations
(meaning essentially that the same classes can be PAC-learned in both
models), we present empirical evidence that shows that a sequential learner
can require *significantly* fewer samples in practice, often by orders of
magnitude.

(appeared in "Proceedings of the Eighth Annual Conference on Computational
Learning Theory (COLT95)", Santa Cruz, July 1995.)
{ seqpac-colt95.ps }

------------------------------------------------------------------------------
24. The Complexity of Theory Revision
    Russell Greiner

A knowledge-based expert system uses its database (a.k.a. its "theory") to
produce answers to the queries it receives. Unfortunately, these systems
will produce inaccurate answers if their theories include erroneous
information. Standard "theory revision" systems use a given set of "labeled
queries" (each a query paired with its correct answer) to incrementally
transform the given theory, by adding and/or deleting rules and/or
antecedents, into a related one that is as accurate as possible. This paper
uses both sample and computational complexity arguments to analyze this
process. After formally defining the theory revision task, we use the
observation that we have access to only a limited number of labeled queries
to constrain the space of new theories that can be considered; in
particular, we describe when a polynomial number of such queries will be
sufficient to obtain the information required to identify a theory whose
accuracy is close to optimal with high probability. We then consider the
computational complexity of finding the best theory within this space, and
prove that, unless $P=NP$, no polynomial time algorithm can identify this
near-optimal theory, even given the exact distribution of queries. We also
present conditions under which no polynomial-time algorithm can produce a
theory whose (in)accuracy is even close (ie, within a particular polynomial
factor) to optimal. These results justify the standard practice of using
these types of transformations to hill-climb to a locally-optimal theory,
based on a given set of labeled samples.

(appeared in "Proceedings of the Fourteenth International Joint Conference
on Artificial Intelligence (IJCAI95)", Montreal, 1995)
{ comp-tr-ijcai95.ps; comp-tr-AIJ.ps is the current version of the paper
accepted, subject to revision, by the "Artificial Intelligence" journal }

------------------------------------------------------------------------------
23. Learning to Select Useful Landmarks
    Russell Greiner and Ramana Isukapalli

To navigate effectively, an autonomous agent must be able to quickly and
accurately determine its current location. Given an initial estimate of its
position (perhaps based on dead-reckoning) and an image taken of a known
environment, our agent first attempts to locate a set of landmarks
(real-world objects at known locations), then uses their angular separation
to obtain an improved estimate of its current position. Unfortunately, some
landmarks may not be visible, or worse, may be confused with other
landmarks, resulting in both time wasted in searching for invisible
landmarks, and in further errors in the agent's estimate of its position.
To address these problems, we propose a method that uses previous
experiences to learn a selection function that, given the set of landmarks
that might be visible, returns the subset which can probably be found
correctly, and so provides an accurate registration of the agent's position.
We use statistical techniques to prove that the learned selection function
is, with high probability, effectively at a local optimum in the space of
such functions. This report also presents empirical evidence, using
real-world data, that demonstrates the effectiveness of our approach.

(appeared in "Proceedings of the Annual National Conference on Artificial
Intelligence (AAAI-94)", Seattle, August 1994)
{ useful-lms-aaai.ps }

---------------- ---------------- ---------------- ----------------
23a. Learning to Select Useful Landmarks
     Russell Greiner and Ramana Isukapalli

(A revised journal-length version of 23 above. Appeared in "IEEE
Transactions on Systems, Man and Cybernetics", June 1996, pp. 437--449.)
{ useful-lms-smc.ps }

------------------------------------------------------------------------------
22. Theory Revision in Fault Hierarchies
    Pat Langley, George Drastal, R. Bharat Rao and Russell Greiner

The fault hierarchy representation is widely used in expert systems for the
diagnosis of complex mechanical devices. On the assumption that an
appropriate bias for a knowledge representation language is also an
appropriate bias for learning in this domain, we have developed a theory
revision method that operates directly on a fault hierarchy. This domain
presents several challenges: A typical training instance is missing most
feature values, and the pattern of missing features is significant, rather
than merely an effect of noise. Moreover, features that are present may be
corrupted. Finally, the accuracy of a candidate theory is measured by
considering both the sequence of tests required to arrive at a diagnosis,
and its agreement with the diagnostic endpoints provided by an expert. We
describe the DELTA algorithm for theory revision of fault hierarchies, its
application in knowledge base maintenance, and report on experiments with a
diagnostic system in current use.

(appeared in "The Fifth International Workshop on Principles of Diagnosis
(Dx94)", New York, October 1994.)
{ th-rev.ps } [see also th-rev-update.ps]

------------------------------------------------------------------------------
21. Learning Default Concepts
    Dale Schuurmans and Russell Greiner

Classical concepts, based on necessary and sufficient defining conditions,
cannot classify logically insufficient object descriptions. Many reasoning
systems avoid this limitation by using "default concepts" to classify
incompletely described objects. This paper addresses the task of learning
such *default concepts* from observational data. We first model the
underlying performance task --- classifying incomplete examples --- as a
probabilistic process that passes random test examples through a "blocker"
that can hide object attributes from the classifier. We then address the
task of learning accurate default concepts from random training examples.
After surveying the learning techniques that have been proposed for this
task in the machine learning and knowledge representation literatures, and
investigating their relative merits, we present a more data-efficient
learning technique, developed from well-known statistical principles.
Finally, we extend Valiant's PAC-learning framework to this context and
obtain a number of useful learnability results.

(appeared in "Proceedings of the Tenth Canadian Conference on Artificial
Intelligence (CSCSI-94)", Banff, May 1994.)
{ default.ps }
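
The blocker model above is easy to state operationally: each test example
passes through a process that may hide attribute values, and a default
concept must still commit to a label. A toy Python sketch of the performance
task only (not the paper's learning technique; the rule representation here
is our own):

    import random

    def block(example, p_hide=0.3, rng=random):
        """The blocker: independently hide each attribute value with
        probability p_hide (None marks a hidden value)."""
        return {a: (None if rng.random() < p_hide else v)
                for a, v in example.items()}

    def default_classify(blocked, rules, default_label):
        """A toy default concept: fire the first rule whose conditions are
        all visible and satisfied; otherwise fall back on the default.
        Revealing more attributes can change the label, so classification
        is nonmonotonic, a behavior that entry 19 below formalizes."""
        for conditions, label in rules:
            if all(blocked.get(a) == v for a, v in conditions.items()):
                return label
        return default_label

------------------------------------------------------------------------------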
(appeared in "Proceedings of the Tenth Canadian Conference on Artificial Intelligence (CSCSI-94)", Banff, May 1994.) { default.ps } ------------------------------------------------------------------------------ 20. Fast Distribution-Specific Learning Dale Schuurmans and Russell Greiner PAC-learning results are often criticized for demanding impractically large training samples. The common wisdom is that these large samples follow from the worst case nature of the analysis, and therefore PAC-learning, though desirable, must not be a practical goal. We however consider an alternative view: perhaps these large sample sizes are due to the presumed learning strategies which make inefficient use of the available training data. To demonstrate this, we consider *sequential* learning strategies that autonomously decide when to stop training based on observing training examples as they arrive. We show that for distribution specific learning these algorithms require far fewer training examples (on average) than existing fixed sample size approaches, and are able to learn with *certainty* not just high probability. In fact, a simple sequential strategy is *optimally* efficient in many cases. (appears in "Computational Learning Theory and Natural Learning Systems: Making Learning Systems Practical", MIT Press, 1997.) { fast-disp-spec.ps } ------------------------------------------------------------------------------ 19. Learning to Classify Incomplete Examples Dale Schuurmans and Russell Greiner An inductive inference system uses labeled training examples to learn a classification function for labeling subsequent unlabeled performance examples. Most theoretical analyses assume that both training and performance examples are complete, in that the value of every attribute is known to both learner and classifier. Real-world data, however, is usually incomplete. This paper addresses this discrepancy by formally analyzing the task of learning to classify incompletely specified performance examples, based on completely- and incompletely-specified training examples, respectively. This formalism requires an extension of the classical notion of concept definition, called "default concept definition" (dcd), whose classification behavior can be nonmonotonic. We first present a formal account of these dcds and show that they are similar, but not identical, to important existing ideas from both learnability and AI knowledge representation formalisms. We next define a generalization of Valiant's probabilistic model of instance presentation that allows attribute values to be hidden from the classifier. Finally, we use this model to develop an extension to the PAC learning framework that can learn dcds from examples, and prove a number of learnability results, both positive and negative, which collectively suggest the appropriate ways of treating missing attribute values. (appears in "Computational Learning Theory and Natural Learning Systems: Making Learning Systems Practical", MIT Press, 1997.) { missing.ps } ------------------------------------------------------------------------------ 18. Adaptive Derivation Processes R. Greiner Many reasoning systems must reach conclusions based on stored information; we can often model this as deriving logical conclusions from a given knowledge base of facts. We of course prefer derivation systems that draw all and only the correct conclusions, and that reach these conclusions as quickly as possible. 
Unfortunately, a sound and complete derivation process can be intractable,
if not undecidable, in the worst case. This position paper discusses the
general challenge of producing a derivation process that is as effective as
possible, and argues for using a (cautious) adaptive derivation process
here.

(appeared in "Workshop on Knowledge Compilation and Speedup Learning",
Amherst, Massachusetts, June 1993.)
{ adaptive.ps }

------------------------------------------------------------------------------
16. Probably Approximately Optimal Satisficing Strategies
    Russell Greiner and Pekka Orponen

A "satisficing search problem" consists of a set of probabilistic
experiments to be performed in some order, seeking a satisfying
configuration of successes and failures. The expected cost of performing the
experiments depends both on the success probability of the individual
experiments, and on the "search strategy", which specifies the order in
which the experiments are to be performed. A strategy is optimal if its
expected cost is minimal. Earlier work has provided "optimizing functions"
that find optimal strategies for certain classes of search problems, when
given the probabilities. This paper extends those results by addressing the
complexities of identifying an approximately optimal strategy when the
probability values are not known, within a fully general model. We present
an algorithm, PAO, that first obtains estimates of the probabilities by
observing a computed number of trials of each undetermined experiment, and
then uses these estimates (and the proper optimizing function) to identify a
strategy whose cost is, with high probability, close to optimal. We also
show that PAO can gather the necessary statistics while solving relevant
problems.

(appeared in "Artificial Intelligence", 82:1--2, April 1996, pp. 21--44.)
{ pao.ps }

------------------------------------------------------------------------------
15. Book Review: Building Large Knowledge-Based Systems: Representation and
    Inference in the CYC Project
    Russell Greiner and Charles Elkan

The book under review here, "Building Large Knowledge-Based Systems:
Representation and Inference in the Cyc Project", describes progress so far
in an attempt to build a system that is intended to exhibit general
common-sense reasoning ability. This review first discusses aspects of the
Cyc system, with a focus on important decisions made in designing its
knowledge representation language, and on how claims about the performance
of the system might be validated. The review then turns to the book itself,
discussing both its merits and its faults.

(appeared in "Artificial Intelligence", 61:1 (1993) 41--52.)
{ cyc.ps }

------------------------------------------------------------------------------
14. Learning an Optimally Accurate Representational System
    Russell Greiner and Dale Schuurmans

A default theory can sanction different, mutually incompatible, answers to
certain queries. We can identify each such theory with a set of related
credulous theories, each of which produces but a single response to each
query, by imposing a total ordering on the defaults. Our goal is to identify
the credulous theory with optimal "expected accuracy", averaged over the
natural distribution of queries in the domain. There are two obvious
complications: First, the expected accuracy of a theory depends on the query
distribution, which is usually not known. Second, the task of identifying
the optimal theory, even given that distribution information, is
intractable.
This paper presents a method, OptAcc, that side-steps these problems by
using a set of samples to estimate the unknown distribution, and by
hill-climbing to a local optimum. In particular, given any error and
confidence parameters $\epsilon, \delta > 0$, OptAcc produces a theory whose
expected accuracy is, with probability at least $1-\delta$, within
$\epsilon$ of a local optimum.

(appeared in "ECAI Workshop on Theoretical Foundations of Knowledge
Representation and Reasoning", Springer Verlag (LNAI series), 1993.)
{ accuracy-ecai.ps }
[Expanded from the workshop proceedings, Vienna, August 1992.]
{ accurate.ps }

------------------------------------------------------------------------------
13. A Statistical Approach to Solving the EBL Utility Problem
    Russell Greiner and Igor Jurisica

Many "learning from experience" systems use information extracted from
problem solving experiences to modify a performance element PE, forming a
new one PE' that can solve these and similar problems more efficiently.
However, as transformations that improve performance on one set of problems
can degrade performance on other sets, the new PE' is not always better than
the original PE; this depends on the distribution of problems. We therefore
seek the performance element whose expected performance, over this
distribution, is optimal. Unfortunately, the actual distribution, which is
needed to determine which element is optimal, is usually not known.
Moreover, the task of finding the optimal element, even knowing the
distribution, is intractable for most interesting spaces of elements. This
paper presents a method, PALO, that side-steps these problems by using a set
of samples to estimate the unknown distribution, and by using a set of
transformations to hill-climb to a local optimum: Given any parameters
$\epsilon, \delta > 0$, PALO produces an element PE' whose expected
performance is, with probability $\geq 1-\delta$, within $\epsilon$ of a
local optimum. This process is based on a mathematically rigorous form of
utility analysis: in particular, it uses statistical techniques to determine
whether the result of a proposed transformation will be better than the
original system. We also present an efficient way of implementing this
learning system in the context of a general class of performance elements,
and include empirical evidence that this approach can work effectively.

(appeared in "Proceedings of the Annual National Conference on Artificial
Intelligence (AAAI-92)", San Jose, July 1992.)
{ utility.ps }

------------------------------------------------------------------------------
12. A Correction to the Algorithm in Reiter's Theory of Diagnosis
    Russell Greiner, Barbara A. Smith and R. W. Wilkerson

Reiter [1987] has developed a general theory of diagnosis based on first
principles. His algorithm computes all diagnoses which explain the
differences between the predicted and observed behavior of a given system.
Unfortunately, Reiter's description of the algorithm is incorrect in that
some diagnoses can be missed under certain conditions. This note presents a
revised algorithm and a proof of its correctness.

(appeared in "Artificial Intelligence", 41:1 (79--88), November 1989;
reprinted in "Readings in Model-based Diagnosis", edited by W. Hamscher,
J. de Kleer and L. Console, Morgan Kaufmann, 1992.)
{ correct.ps }
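
In Reiter's theory, each diagnosis is a minimal hitting set of the
collection of conflict sets, and the algorithm in question computes them via
a pruned HS-tree; the correction repairs a pruning step that could discard
valid diagnoses. The specification the corrected algorithm must meet fits in
a few lines of Python (brute force, for illustration only, not the HS-tree
construction itself):

    from itertools import chain, combinations

    def minimal_hitting_sets(conflicts):
        """Return all minimal hitting sets of a collection of conflict sets;
        in Reiter's theory these are exactly the diagnoses."""
        universe = sorted(set(chain.from_iterable(conflicts)))

        def hits(s):
            return all(s & c for c in conflicts)

        found = []
        for r in range(len(universe) + 1):     # smallest candidates first,
            for cand in map(set, combinations(universe, r)):
                if hits(cand) and not any(h <= cand for h in found):
                    found.append(cand)         # so every survivor is minimal
        return found

    # e.g. minimal_hitting_sets([{"A1", "A2"}, {"A2", "A3"}])
    #   -> [{'A2'}, {'A1', 'A3'}]

------------------------------------------------------------------------------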
11. Learning Efficient Query Processing Strategies
    Russell Greiner

A query processor QP uses the rules in a rule base to reduce a given query
to a series of attempted retrievals from a database of facts. The QP's
"expected cost" is the time it requires to find an answer, averaged over its
anticipated set of queries. This cost depends on the QP's "strategy", which
specifies the order in which it considers the possible rules and retrievals.
This paper provides two related learning algorithms, PIB and PAO, for
improving the QP's strategy, ie, for producing new strategies with lower
expected costs. Each algorithm first monitors the QP's operations over a set
of queries, observing how often each path of rules leads to a sufficient set
of successful retrievals, and then uses these statistics to suggest a new
strategy. PIB hill-climbs to strategies that are, with high probability,
successively better; and PAO produces a new strategy that is probably
approximately optimal. We describe how to implement both learning systems
unobtrusively, discuss their inherent time and space complexities, and use
methods from mathematical statistics to prove their correctness. We also
discuss additional applications of these approaches to several other
database tasks.

(appeared in "Eleventh Symposium on Principles of Database Systems
(PODS-92)", San Diego, June 1992)
{ qp.ps }

------------------------------------------------------------------------------
10. Probabilistic Hill-Climbing: Theory and Applications
    Russell Greiner

Many learning tasks search through a space of possible performance elements,
seeking an element whose expected utility (ie, utility averaged over a
distribution P) is high. As the task of finding the globally optimal element
is usually intractable, many practical learning systems use hill-climbing to
find a local optimum. Unfortunately, even this is difficult, as the
distribution P is typically unknown. This paper addresses the task of
approximating this hill-climbing search when the utility function can only
be estimated by sampling. We present and prove correct an algorithm that,
with provably high probability, returns an element arbitrarily close to a
local optimum. We then demonstrate the generality of this algorithm by
sketching three meaningful applications that find an element whose
efficiency, accuracy or completeness (respectively) is nearly optimal. These
results suggest approaches to solving the utility problem from
explanation-based learning, the multiple extension problem from
nonmonotonic reasoning and the tractability/accuracy tradeoff problem from
knowledge representation.

(appeared in "Proceedings of the Ninth Canadian Conference on Artificial
Intelligence (CSCSI-92)", Vancouver, May 1992)
{ phc-applic.ps }

---------------- ---------------- ---------------- ----------------
10a. PALO: A Probabilistic Hill-Climbing Algorithm
     Russell Greiner

Many learning systems search through a space of possible performance
elements, seeking an element whose expected utility, over the distribution
of problems, is high. As the task of finding the globally optimal element is
usually intractable, many practical learning systems use hill-climbing to
find a local optimum. Unfortunately, even this is problematic as the
underlying distribution of problems, required to determine an element's
expected utility and hence essential for comparing different elements, is
typically unknown.
This paper addresses the task of approximating this hill-climbing search
when the utility function can only be estimated by sampling. We present an
algorithm that returns an element that is, with provably high probability,
essentially a local optimum. We then demonstrate the generality of this
algorithm by sketching three meaningful applications that respectively find
an element whose efficiency, accuracy or completeness is nearly optimal.
These results suggest approaches to solving the utility problem from
explanation-based learning, the multiple extension problem from
nonmonotonic reasoning and the tractability/completeness tradeoff problem
from knowledge representation.

(A revised journal-length version of 10 above. Appeared in "Artificial
Intelligence", 84:1--2, July 1996, pp. 177--204.)
{ palo-aij.ps }

------------------------------------------------------------------------------
9. A Formal Analysis of Solution Caching
   Vinay Chaudhri and Russell Greiner

Many knowledge representation systems and deductive databases store and
maintain the conclusions found during a derivation process, in a form that
allows these conclusions to be used during subsequent derivations. As this
approach, called "solution caching", allows the system to avoid repeating
these derivations, it can reduce the system's overall cost for query
answering. Unfortunately, as there is a cost for storing these conclusions,
it is not always desirable to cache every solution found --- this depends on
whether the savings achieved by performing fewer inference steps for these
future queries exceed the storage overhead incurred. This paper formally
characterizes this tradeoff, and presents an efficient algorithm that
produces an optimal caching strategy: i.e., given an inference graph of a
knowledge base, the anticipated number of queries at each node, and various
parameters, it determines which nodes in the inference graph should cache
their solutions.

(appeared in "Proceedings of the Ninth Canadian Conference on Artificial
Intelligence (CSCSI-92)", Vancouver, May 1992)
{ cache.ps }

------------------------------------------------------------------------------
8. Learning Useful Horn Approximations
   Russell Greiner and Dale Schuurmans

While the task of answering queries from an arbitrary propositional theory
is intractable in general, it can typically be performed efficiently if the
theory is Horn. This suggests that it may be more efficient to answer
queries using a "Horn approximation"; ie, a Horn theory that is semantically
similar to the original theory. The utility of any such approximation
depends on how often it produces answers to the queries that the system
actually encounters; we therefore seek an approximation whose expected
"coverage" is maximal. Unfortunately, there are several obstacles to
achieving this goal in practice: (i) the optimal approximation depends on
the query distribution, which is typically not known a priori; (ii)
identifying the optimal approximation is intractable, even given the query
distribution; and (iii) the optimal approximation might be too large to
guarantee tractable inference. This paper presents an approach that
overcomes (or side-steps) each of these obstacles. We define a learning
process, AdComp, that uses observed queries to estimate the query
distribution "online", and then uses these estimates to hill-climb,
efficiently, in the space of size-bounded Horn approximations, until
reaching one that is, with provably high probability, effectively at a local
optimum.

(appeared in "Proceedings of the Third International Conference on Knowledge
Representation and Reasoning (KR92)", Boston, October 1992.)
[Expanded from "Learning Near-Optimal Horn Approximations", in "AAAI Spring
Symposium on Knowledge Assimilation", Stanford, March 1992.]
{ horn.ps }
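
The tractability claim that motivates Horn approximations is easy to
exhibit: entailment of an atom from a definite Horn theory can be decided by
forward chaining in polynomial time. A minimal sketch (our own toy encoding,
not the AdComp algorithm):

    def horn_entails(rules, facts, query):
        """Decide whether a definite Horn theory entails an atom, by forward
        chaining; `rules` is a list of (body, head) pairs, where each body
        is a set of atoms."""
        derived = set(facts)
        changed = True
        while changed:
            changed = False
            for body, head in rules:
                if head not in derived and all(b in derived for b in body):
                    derived.add(head)          # rule fires: head is entailed
                    changed = True
        return query in derived

    # e.g. horn_entails([({"a", "b"}, "c"), ({"c"}, "d")], {"a", "b"}, "d")
    #   -> True

Each atom is added at most once, so the loop runs polynomially many times;
that is the efficiency an approximation is meant to buy.

------------------------------------------------------------------------------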
(appeared in "Proceedings of the Third International Conference on Knowledge Representation and Reasoning (KR92)", Boston, October 1992.) [Expanded from "Learning Near-Optimal Horn Approximations", in "AAAI Spring Symposium on Knowledge Assimilation", Stanford, March 1992.] { horn.ps } ------------------------------------------------------------------------------ 7. Probabilistic Hill-Climbing William W. Cohen, Russell Greiner and Dale Schuurmans Many learning tasks involve searching through a discrete space of performance elements, seeking an element whose future utility is expected to be high. As the task of finding the global optimum is often intractable, many practical learning systems use simple forms of hill-climbing to find a locally optimal element. However, hill-climbing can be complicated by the fact that the utility value of a performance element can depend on the distribution of problems, which typically is unknown. This paper formulates the problem of performing hill-climbing search in settings where the required utility values can only be estimated on the basis of their performance on random test cases. We present and prove correct an algorithm that returns a performance element that is arbitrarily close to a local optimum with arbitrarily high probability. (appeared in "Computational Learning Theory and Natural Learning Systems", Volume 2, MIT Press, 1993.) { clnl-phc.ps } ------------------------------------------------------------------------------ 6. Measuring and Improving the Effectiveness of Representations Russell Greiner and Charles Elkan This report discusses what it means to claim that a representation is an effective encoding of knowledge. We first present dimensions of merit for evaluating representations, based on the view that usefulness is an external property, and is necessarily relative to a specified task. We then provide methods (based on results of mathematical statistics) for reliably measuring effectiveness empirically, and hence for comparing different representations. We also discuss weak but guaranteed methods of improving inadequate representations. Our results are an application of the ideas of formal learning theory to concrete knowledge representation formalisms. (appeared in "Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI91)", August 1991) { ijcai-repn.ps } ------------------------------------------------------------------------------ 5. Probably Approximately Optimal Derivation Strategies Russell Greiner and Pekka Orponen An inference graph can have many "derivation strategies", each a particular ordering of the steps involved in reducing a given query to a sequence of database retrievals. An "optimal strategy" for a given distribution of queries is a complete strategy whose "expected cost" is minimal, where the expected cost depends on the conditional probabilities that each requested retrieval succeeds, given that a member of this class of queries is posed. This paper describes the PAO algorithm that first uses a set of training examples to approximate these probability values, and then uses these estimates to produce a "probably approximately optimal" strategy --- i.e., given any $\epsilon, \delta>0$, PAO produces a strategy whose cost is within $\epsilon$ of the cost of the optimal strategy, with probability greater than $1-\delta$. 
4. Classical and Logic-Based Dynamic Observers
   P.E. Caines, R. Greiner and S. Wang

This report formulates the state estimation problem for a finite partially
observed input-state-output automaton in terms of an "observer automaton"
whose nodes each correspond to the set of states consistent with a
particular sequence of observations. It next provides several complexity
results about these observers --- bounding their convergence time and size.
The final part shows how to encode these observers in predicate calculus,
and illustrates some of the advantages of this approach.

(appeared in "IMA Journal on Control and Information", 8 (45--80), 1991.)

------------------------------------------------------------------------------
3. Finding Optimal Derivation Strategies in Redundant Knowledge Bases
   Russell Greiner

A backward chaining process uses a collection of rules to reduce a given
goal to a sequence of data-base retrievals. A "derivation strategy" is an
ordering on these steps, specifying when to use each rule and when to
perform each retrieval. Given the costs of reductions and retrievals, and
the *a priori* likelihood that each particular retrieval will succeed, one
can compute the expected cost of any strategy, for answering a specific
query from a given knowledge base. Smith [ControlBC] presents an algorithm
that finds the minimal cost strategy in time (essentially) linear in the
number of rules, for any disjunctive, irredundant knowledge base. This paper
proves that the addition of redundancies renders this task NP-hard. Many
Explanation-Based Learning systems work by adding in redundancies; this
shows the complexities inherent in their task.

(appeared in "Artificial Intelligence", 50:1 (1991) 95--115.)
{ redund.ps }

------------------------------------------------------------------------------
2. A Distributed Plan Verifier
   Yan Xiao and Russell Greiner

This paper describes the architecture of an efficient plan verifier that can
first detect faults in a planner's plans, and then use these observed errors
to identify possible problems in the planner's knowledge base and suggest
appropriate corrections.

(appeared in "Proceedings of Third UNB Artificial Intelligence Workshop",
Oct 1990.)
{ distrib.ps }

------------------------------------------------------------------------------
1. On the Sample Complexity of Finding Good Search Strategies
   Russell Greiner and Pekka Orponen

A satisficing search problem consists of a set of probabilistic experiments
to be performed in some order, without repetitions, until a satisfying
configuration of successes and failures has been reached. The cost of
performing the experiments depends on the order chosen. Earlier work has
concentrated on finding optimal search strategies in special cases of this
model, such as search trees and and-or graphs, when the cost function and
the success probabilities for the experiments are given. In contrast, we
study the complexity of "learning" an approximately optimal search strategy
when some of the success probabilities are not known at the outset.
Working in the fully general model, we show that if $n$ is the number of
unknown probabilities, and $C$ is the maximum cost of performing all the
experiments, then $2(\frac{nC}{\epsilon})^2 \ln \frac{2n}{\delta}$ trials of
each undetermined experiment are sufficient to identify, with confidence
$1-\delta$, a search strategy whose cost is within $\epsilon$ of the
optimal.

(appeared in "Proceedings of the Third Annual Workshop on Computational
Learning Theory", Aug 1990.)
{ colt_search.ps }
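
The bound quoted above is directly computable; a small helper (hypothetical
name, written only to make the quadratic dependence on $nC/\epsilon$
concrete):

    import math

    def trials_needed(n, C, eps, delta):
        """Trials of each undetermined experiment sufficient, per the bound
        in the abstract, to find (with confidence 1-delta) a strategy whose
        cost is within eps of optimal."""
        return math.ceil(2 * (n * C / eps) ** 2 * math.log(2 * n / delta))

    # For example, n=5 experiments, C=10, eps=1, delta=0.05 gives
    # trials_needed(5, 10.0, 1.0, 0.05) == 26492 trials per experiment.

------------------------------------------------------------------------------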