Efficient Interpretation

Algorithms for efficiently interpreting some (visual) scene.
See Brain Tumor Analysis Project for applications of imaging techniques to brain tumors; Efficient Inference for general comments about efficient inference; and Effective Performance for general results on effective performance (PALO).


Efficient Interpretation

Efficient Interpretation Policies

Ramana Isukapalli and Russell Greiner
Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Seattle, 2001.

Many imaging systems seek a good interpretation of the scene presented, i.e., a plausible (perhaps optimal) mapping from aspects of the scene to real-world objects. This paper addresses the issue of finding such likely mappings efficiently. In general, an "(interpretation) policy" specifies when to apply which "imaging operators", which can range from low-level edge-detectors and region-growers through high-level token-combination rules and expectation-driven object-detectors. Given the costs of these operators and the distribution of possible images, we can determine both the expected cost and expected accuracy of any such policy. Our task is to find a maximally effective policy, typically one with sufficient accuracy whose cost is minimal. We explore this framework in several contexts, including the eigenface approach to face recognition. Our results show, in particular, that policies which select the operators that maximize information gain per unit cost work more effectively than other policies, including ones that, at each stage, simply try to establish the putative most-likely interpretation.
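The selection rule named in this abstract, choosing the operator with the highest information gain per unit cost, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: interpretations are a small discrete set with a prior, each operator is described by a cost and a table of P(outcome | interpretation), and the operator names and numbers are invented.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior(prior, likelihood, outcome):
    """Bayes update, where likelihood[h][o] = P(outcome o | interpretation h)."""
    joint = [prior[h] * likelihood[h][outcome] for h in range(len(prior))]
    z = sum(joint)
    return [j / z for j in joint] if z > 0 else list(prior)

def expected_gain(prior, likelihood):
    """Expected reduction in entropy from applying one operator."""
    gain = entropy(prior)
    for o in range(len(likelihood[0])):
        p_o = sum(prior[h] * likelihood[h][o] for h in range(len(prior)))
        if p_o > 0:
            gain -= p_o * entropy(posterior(prior, likelihood, o))
    return gain

def pick_operator(prior, operators):
    """Name of the operator with the highest expected gain per unit cost."""
    def score(name):
        op = operators[name]
        return expected_gain(prior, op["likelihood"]) / op["cost"]
    return max(operators, key=score)

# Hypothetical operators: a cheap, noisy test and a costly, exact one.
prior = [0.5, 0.5]
operators = {
    "cheap_noisy": {"cost": 1.0, "likelihood": [[0.7, 0.3], [0.3, 0.7]]},
    "costly_exact": {"cost": 10.0, "likelihood": [[1.0, 0.0], [0.0, 1.0]]},
}
best = pick_operator(prior, operators)
```

With these made-up numbers the exact operator removes a full bit of uncertainty but costs ten units, so the noisy operator scores higher per unit cost; lowering the exact operator's cost to 5.0 reverses the choice.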


Efficient Car Recognition

Efficient Car Recognition Policies

Ramana Isukapalli and Russell Greiner
Proceedings of ICRA, Seoul, 2001.

Many tasks require an imaging system to identify an object, such as the type of a car; in many cases, it is critical to make this identification quickly, as well as accurately. This paper addresses the challenges of producing recognition systems that consider both of these objectives. In general, a "(recognition) policy" specifies when to apply which "imaging operators", which can range from low-level edge-detectors and region-growers through high-level token-combination rules and expectation-driven object-detectors. Given the costs of these operators and the distribution of possible images, we can determine both the expected cost and expected accuracy of any such policy. Our task is to find a maximally effective policy, typically one with sufficient accuracy whose cost is minimal. We compare various ways to produce such policies in general, and show that policies that select the operators that maximize information gain per unit cost work effectively.
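As a rough illustration of how the expected cost and expected accuracy of a policy follow from operator costs and an image distribution, consider the sketch below. The car types, features, operators, costs, and probabilities are all hypothetical, and the operators are assumed to be deterministic for simplicity; none of this is drawn from the paper.

```python
def evaluate_policy(policy, images, operators):
    """Return (expected_cost, expected_accuracy) of a policy.

    images:    list of (probability, true_label, features) triples.
    operators: name -> (cost, function(features) -> observation).
    policy:    function(history) -> ("apply", op_name) or ("answer", label),
               where history is a tuple of (op_name, observation) pairs.
    """
    exp_cost, exp_acc = 0.0, 0.0
    for prob, label, features in images:
        history, cost = (), 0.0
        while True:
            action, arg = policy(history)
            if action == "answer":
                exp_acc += prob * (arg == label)
                break
            op_cost, op_fn = operators[arg]
            cost += op_cost
            history += ((arg, op_fn(features)),)
        exp_cost += prob * cost
    return exp_cost, exp_acc

# Hypothetical operators and image distribution.
operators = {
    "height": (1.0, lambda f: f["tall"]),
    "bed": (3.0, lambda f: f["bed"]),
}
images = [
    (0.5, "sedan", {"tall": False, "bed": False}),
    (0.3, "suv", {"tall": True, "bed": False}),
    (0.2, "truck", {"tall": True, "bed": True}),
]

def staged_policy(history):
    """Apply the cheap operator first; use the costly one only when needed."""
    seen = dict(history)
    if "height" not in seen:
        return ("apply", "height")
    if not seen["height"]:
        return ("answer", "sedan")
    if "bed" not in seen:
        return ("apply", "bed")
    return ("answer", "truck" if seen["bed"] else "suv")

def exhaustive_policy(history):
    """Always apply every operator before answering."""
    seen = dict(history)
    for name in ("height", "bed"):
        if name not in seen:
            return ("apply", name)
    if not seen["height"]:
        return ("answer", "sedan")
    return ("answer", "truck" if seen["bed"] else "suv")

staged = evaluate_policy(staged_policy, images, operators)
exhaustive = evaluate_policy(exhaustive_policy, images, operators)
```

In this toy setting both policies answer every image correctly, but the staged policy skips the costly operator on half of the images, so its expected cost is lower.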


DP for Face Recognition

Use of Off-line Dynamic Programming for Efficient Image Interpretation

Ramana Isukapalli and Russell Greiner
Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, 2003.

An interpretation system finds the likely mappings from portions of an image to real-world objects. An interpretation policy specifies when to apply which imaging operator, to which portion of the image, during every stage of interpretation. Earlier results compared a number of policies and demonstrated that policies that select the operators maximizing information gain per unit cost worked most effectively. However, those policies are myopic: they rank the operators based only on their immediate rewards. This can lead to inferior overall results: it may be better to use a relatively expensive operator first, if that operator provides information that will significantly reduce the cost of the subsequent operators.

This suggests using some lookahead process to compute the quality for operators non-myopically. Unfortunately, this is prohibitively expensive for most domains, especially for domains that have a large number of complex states. We therefore use ideas from reinforcement learning to compute the utility of each operator sequence. In particular, our system first uses dynamic programming, over abstract simplifications of interpretation states, to precompute the utility of each relevant sequence. It does this off-line, over a training sample of images. At run time, our interpretation system uses these estimates to decide when to use which imaging operator. Our empirical results, in the challenging real-world domain of face recognition, demonstrate that this approach works more effectively than myopic approaches.
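The contrast between myopic selection and precomputed, non-myopic utilities can be sketched on a toy problem. This is not the paper's system: the abstract state is assumed here to be simply the set of operators already applied, the cost model is hand-written and deterministic rather than estimated from training images, and the operator names are invented.

```python
from functools import lru_cache

OPERATORS = ("locate_face", "match_eyes", "match_mouth")  # hypothetical names

def cost(state, op):
    """Hypothetical cost model: matching is cheaper once the face is located."""
    if op == "locate_face":
        return 4.0
    return 0.5 if "locate_face" in state else 3.5

def is_goal(state):
    """Hypothetical stopping rule: both feature matchers have been applied."""
    return {"match_eyes", "match_mouth"} <= state

@lru_cache(maxsize=None)
def value(state):
    """Minimal cost-to-go from an abstract state, computed once and cached."""
    if is_goal(state):
        return 0.0
    return min(cost(state, op) + value(state | {op})
               for op in OPERATORS if op not in state)

def best_operator(state):
    """Run-time choice using the precomputed cost-to-go values."""
    return min((op for op in OPERATORS if op not in state),
               key=lambda op: cost(state, op) + value(state | {op}))

def myopic_operator(state):
    """Baseline: choose the operator with the lowest immediate cost."""
    return min((op for op in OPERATORS if op not in state),
               key=lambda op: cost(state, op))

def total_cost(choose, state=frozenset()):
    """Total cost of following a selection rule until the goal is reached."""
    total = 0.0
    while not is_goal(state):
        op = choose(state)
        total += cost(state, op)
        state = state | {op}
    return total

dp_cost = total_cost(best_operator)
myopic_cost = total_cost(myopic_operator)
```

Under these assumed costs the myopic rule avoids the expensive localization step and pays full price for both matchers, while the lookahead rule pays for localization first and recovers the expense on the later operators.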

