Learning Effective Performance Systems

Many learning systems search through a space of possible performance elements, seeking an element whose expected utility, over the distribution of problems, is high. As the task of finding the globally optimal element is often intractable, many practical learning systems instead hill-climb to a local optimum. Even this is problematic as the learner typically does not know the underlying distribution of problems, which is essential for computing the utility of an element, and hence for determining which element is optimal. This research addresses the task of approximating this hill-climbing search when the utility function can only be estimated by sampling. The paper presents a general algorithm, PALO, that returns an element that is, with provably high probability, essentially a local optimum; see picture. That material also summarize how PALO has been to applied to various tasks. More detailed applications include

• finding an optimally efficient search strategy, perhaps for ordering the rules in a rule-base: Efficient Strategy
• finding an optimally accurate theory (wrt set of default rules): Accurate Theory
• finding an optimally categorical (Horn) approximation to a given theory: Categorical Approximation
• finding an optimally effective set of landmarks for robot navigation: Effective Landmark Set

• The following papers contain related work
• Measuring the Effectiveness of a Representation
• Probably Approximately Optimal Search Strategies

• PALO Overview

A Probabilistic Hill-Climbing Algorithm

Artificial Intelligence, 84:1--2 (July 1996), p. 177--204.

Many learning systems search through a space of possible performance elements, seeking an element whose expected utility, over the distribution of problems, is high. As the task of finding the globally optimal element is usually intractable, many practical learning systems use hill-climbing to find a local optimum. Unfortunately, even this is problematic as the underlying distribution of problems, required to determine an element's expected utility and hence essential for comparing different elements, is typically unknown. This paper addresses the task of approximating this hill-climbing search when the utility function can only be estimated by sampling. We present an algorithm that returns an element that is, with provably high probability, essentially a local optimum. We then demonstrate the generality of this algorithm by sketching three meaningful applications, that respectively find an element whose efficiency, accuracy or completeness is nearly optimal. These results suggest approaches to solving the utility problem from explanation-based learning, the multiple extension problem from nonmonotonic reasoning and the tractability/completeness tradeoff problem from knowledge representation.

Expanded from

• "Probabilistic Hill-Climbing: Theory and Applications", Proceedings of the Ninth Canadian Conference on Artificial Intelligence (CSCSI-92), Vancouver, May 1992.

• Efficient Strategy

A Statistical Approach to Solving the EBL Utility Problem

Proceedings of the Annual National Conference on Artificial Intelligence (AAAI-92), San Jose, July 1992.

Many "learning from experience" systems use information extracted from problem solving experiences to modify a performance element PE, forming a new one PE' that can solve these and similar problems more efficiently. However, as transformations that improve performance on one set of problems can degrade performance on other sets, the new PE' is not always better than the original PE; this depends on the distribution of problems. We therefore seek the the performance element whose expected performance, over this distribution, is optimal. Unfortunately, the actual distribution, which is needed to determine which element is optimal, is usually not known. Moreover, the task of finding the optimal element, even knowing the distribution, is intractable for most interesting spaces of elements. This paper presents a method, PALO, that side-steps these problems by using a set of samples to estimate the unknown distribution, and by using a set of transformations to hill-climb to a local optimum: Given any parameters $\epsilon, \delta > 0$, PALO produces an element PE' whose expected performance is, with probability $\geq 1-\delta$, within $\epsilon$ of a local optimum. This process is based on a mathematically rigorous form of utility analysis: in particular, it uses statistical techniques to determine whether the result of a proposed transformation will be better than the original system. We also present an efficient way of implementing this learning system in the context of a general class of performance elements; and include empirical evidence that this approach can work effectively.

Short version:

• Adaptive Derivation Processes (appeared in Workshop on Knowledge Compilation and Speedup Learning, Amherst, Massachusetts, June 1993.

• the proof that computing the optimal strategy is intractable appears in

Finding Optimal Derivation Strategies in Redundant Knowledge Bases

Artificial Intelligence, 50:1 (1991) 95-115.

A backward chaining process uses a collection of rules to reduce a given goal to a sequence of data-base retrievals. A "derivation strategy" is an ordering on these steps, specifying when to use each rule and when to perform each retrieval. Given the costs of reductions and retrievals, and the a priori likelihood that each particular retrieval will succeed, one can compute the expected cost of any strategy, for answering a specific query from a given knowledge base. Smith \cite{ControlBC} presents an algorithm that finds the minimal cost strategy in time (essentially) linear in the number of rules, for any disjunctive, irredundant knowledge base. This paper proves that the addition of redundancies renders this task NP-hard. Many Explanation-Based Learning systems work by adding in redundancies; this shows the complexities inherent in their task.
for a database slant to these results:

Learning Efficient Query Processing Strategies (pdf)

Proceedings of the Eleventh Symposium on Principles of Database Systems (PODS-92), San Diego, June 1992.

A query processor QP uses the rules in a rule base to reduce a given query to a series of attempted retrievals from a database of facts. The QP's "expected cost" is the average time it requires to find an answer, averaged over its anticipated set of queries. This cost depends on the QP's "strategy", which specifies the order in which it considers the possible rules and retrievals. This paper provides two related learning algorithms, PIB and PAO, for improving the QP's strategy, ie, for producing new strategies with lower expected costs. Each algorithm first monitors the QP's operations over a set of queries, observing how often each path of rules leads to a sufficient set of successful retrievals, and then uses these statistics to suggest a new strategy. PIB hill-climbs to strategies that are, with high probability, successively better; and PAO produces a new strategy that probably is approximately optimal. We describe how to implement both learning systems unobtrusively, discuss their inherent time and space complexities, and use methods from mathematical statistics to prove their correctness. We also discuss additional applications of these approaches to several other database tasks.

Accurate Theory

Learning an Optimally Accurate Representational System

ECAI Workshop on Theoretical Foundations of Knowledge Representation and Reasoning, Springer Verlag (LNAI series), 1993.

A default theory can sanction different, mutually incompatible, answers to certain queries. We can identify each such theory with a set of related credulous theories, each of which produces but a single response to each query, by imposing a total ordering on the defaults. Our goal is to identify the credulous theory with optimal "expected accuracy" averaged over the natural distribution of queries in the domain. There are two obvious complications: First, the expected accuracy of a theory depends on the query distribution, which is usually not known. Second, the task of identifying the optimal theory, even given that distribution information, is intractable. This paper presents a method, OptAcc, that side-steps these problems by using a set of samples to estimate the unknown distribution, and by hill-climbing to a local optimum. In particular, given any error and confidence parameters \epsilon, \delta>0, OptAcc produces a theory whose expected accuracy is, with probability at least 1-\delta, within \epsilon of a local optimum.

Categorical Horn Approximation

Learning Useful Horn Approximations

Russell Greiner and Dale Schuurmans

Proceedings of the Third International Conference on Knowledge Representation and Reasoning (KR92), p. 383--392, Boston, October 1992.

While the task of answering queries from an arbitrary propositional theory is intractable in general, it can typically be performed efficiently if the theory is Horn. This suggests that it may be more efficient to answer queries using a "Horn approximation"; ie, a horn theory that is semantically similar to the original theory. The utility of any such approximation depends on how often it produces answers to the queries that the system actually encounters; we therefore seek an approximation whose expected "coverage" is maximal. Unfortunately, there are several obstacles to achieving this goal in practice: (i) The optimal approximation depends on the query distribution, which is typically not known a priori; (ii) identifying the optimal approximation is intractable, even given the query distribution; and (iii) the optimal approximation might be too large to guarantee tractable inference. This paper presents an approach that overcomes (or side-steps) each of these obstacles. We define a learning process, AdComp, that uses observed queries to estimate the query distribution "online", and then uses these estimates to hill-climb, efficiently, in the space of size-bounded Horn approximations, until reaching one that is, with provably high probability, effectively at a local optimum.

Effective Landmark Set

Learning to Select Useful Landmarks

IEEE Transactions on Systems, Man and Cybernetics, Part B, June 1996, p 437-449.

To navigate effectively, an autonomous agent must be able to quickly and accurately determine its current location. Given an initial estimate of its position (perhaps based on dead-reckoning) and an image taken of a known environment, our agent first attempts to locate a set of landmarks (real-world objects at known locations), then uses their angular separation to obtain an improved estimate of its current position. Unfortunately, some landmarks may not be visible, or worse, may be confused with other landmarks, resulting in both time wasted in searching for invisible landmarks, and in further errors in the agent's estimate of its position. To address these problems, we propose a method that uses previous experiences to learn a selection function that, given the set of landmarks that might be visible, returns the subset which can probably be found correctly, and so provide an accurate registration of the agent's position. We use statistical techniques to prove that the learned selection function is, with high probability, effectively at a local optimal in the space of such functions. This report also presents empirical evidence, using real-world data, that demonstrate the effectiveness of our approach.

Expanded from ...

• Learning to Select Useful Landmarks, Proceedings of the Annual National Conference on Artificial Intelligence (AAAI-94), Seattle, August 1994.

• Measuring the Effectiveness of a Representation

Measuring and Improving the Effectiveness of Representations

Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI91), August 1991.

This report discusses what it means to claim that a representation is an effective encoding of knowledge. We first present dimensions of merit for evaluating representations, based on the view that usefulness is an external property, and is necessarily relative to a specified task. We then provide methods (based on results of mathematical statistics) for reliably measuring effectiveness empirically, and hence for comparing different representations. We also discuss weak but guaranteed methods of improving inadequate representations. Our results are an application of the ideas of formal learning theory to concrete knowledge representation formalisms.

Probably Approximately Optimal

Probably Approximately Optimal Satisficing Strategies  (pdf)

Artificial Intelligence, 82:1-2, April 1996 .

A "satisficing search problem" consists of a set of probabilistic experiments to be performed in some order, seeking a satisfying configuration of successes and failures. The expected cost of performing the experiments depends both on the success probability of the individual experiments, and on the "search strategy", which specifies the order in which the experiments are to be performed. A strategy is optimal if its expected cost is minimal. Earlier work has provided "optimizing functions" that find optimal strategies for certain classes of search problems, when given the probabilities. This paper extends those results by addressing the complexities of identifying an approximately optimal strategy when the probability values are not known, within a fully general model. We present an algorithm PAO that first obtains estimates of the probabilities by observing a computed number of trials of each undetermined experiment, and then uses these estimates (and the proper optimization function) to identify a strategy whose cost is, with high probability, close to optimal. We also show that PAO can gather the necessary statistics while solving relevant problems.

Expanded from ...

• Probably Approximately Optimal Derivation Strategies, Proceedings of the Second International Conference on Knowledge Representation and Reasoning (KR91), April 1991.  (pdf)
• On the Sample Complexity of Finding Good Search Strategies, Proceedings of the Third Annual Workshop on Computational Learning Theory, Aug 1990.