Mixture Using Variance

Some of today's best classification results are obtained by combining base classifiers to produce a single label for an instance. This paper considers a number of ways of learning a set of simple belief networks, each either naïve Bayes or tree-augmented naïve Bayes, then combining their responses to produce a single answer to a query, using estimates of their respective variances. Our results on UCI datasets show that these ``mixture-using-variance'' Bayesian classifiers (MUVs) work effectively, especially when the different classifiers are learned using balanced bootstrap samples and their results are combined using James-Stein shrinkage. We also show further improvement by learning a set of these composite MUVs, and then, for each query, returning the answer produced by the single MUV that has the smallest total variance -- a form of instance-specific model selection.

Budgeted Learning
Aloak Kapoor; Dan Lizotte; Omid Madani

There is often a cost associated with acquiring training data. We consider the situation where the learner, with a fixed budget, can only see the data that it has `purchased' during training. Our learner is allowed to sequentially specify which particular test to run on which specific individual, until exhausting the budget. One obvious option is the simple "round robin" approach: run all tests on each individual, for as many individuals as possible. (So if the budget is $200 and there are 10 tests, each costing $5, this method would run all 10 tests on 4 individuals.) Alternatively, the learner could decide which test to run on which individual sequentially, based on the results of the prior tests --- eg, first run test#1 on patient#1 then, if that returns "+", run test#2 on patient#8 (otherwise run test#7 on patient#5), and so forth, until spending all $200. In either case, after collecting $200-worth of data, the learner uses the accumulated data to produce a classifier.
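The round-robin allocation above can be sketched in a few lines; the function name and the uniform cost model are ours, for illustration only:

```python
def round_robin(budget, test_costs):
    """Run every test on each successive individual until the budget runs out.

    Returns the number of individuals fully tested and the budget left over.
    """
    cost_per_individual = sum(test_costs)          # cost of running all tests once
    individuals = budget // cost_per_individual    # complete individuals affordable
    remaining = budget - individuals * cost_per_individual
    return individuals, remaining

# With a $200 budget and 10 tests at $5 each, as in the example above:
print(round_robin(200, [5] * 10))  # -> (4, 0): all 10 tests on 4 individuals
```

The sequential alternatives (including biased-robin) differ precisely in that the next (test, individual) purchase can depend on the outcomes already observed, which this fixed schedule cannot express.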
Our preliminary results confirm that the standard "round robin" approach performs significantly worse than many alternative algorithms, in that these alternatives lead to better classifiers; our novel "biased-robin" algorithm appears the best. We plan to continue investigating this budgeted learning task, both theoretically (to better understand the challenges of this task, towards determining which algorithms are likely to be effective) and empirically, exploring diverse application domains, from medical- and bio-informatics to software engineering (deciding which tests to run to debug a class of software systems).

Learning Belief Net Classifiers
Wei Zhou; Yuhong Guo; Jie Cheng; Bin Shen; Xiaoyuan Su

While (Bayesian) belief networks (BNs) are generative models, capable of modeling a joint probability distribution over a set of variables, they are typically used discriminatively for some classification task --- eg, to predict the probability of some disease, given some evidence about the patient. We are currently exploring ways to learn an effective BN-based classifier from a data sample. In general, learning a BN involves first finding an appropriate structure (which identifies which variables depend on which others), then finding the parameters for this structure. Our existing results deal with each of these separately: the ELR algorithm, which uses gradient descent to find the parameters that optimize conditional likelihood, and a "Bias^2 + Variance" approach that identifies a good structure. We plan to explore ways to combine these approaches, as well as experiment with other approaches, such as maximizing margins.

Collaborative Filtering, using Hierarchical Probabilistic Relational Models

Personalized recommender systems, which recommend specific products (eg books, movies) to individuals, have become very prevalent --- witness the success of widely used systems like Amazon.com's book recommender and Yahoo!'s LAUNCHcast music recommender.
The challenge faced by these systems is predicting what each individual will want. Many of them use collaborative filtering (CF): if person P1 likes many of the same things that person P2 likes, and P2 liked object O7 (say "StarWars"), then we might guess that P1 will also like O7. A pure CF system is unable to use information about P1 (eg, "teenage male") or about O7 (eg, "SciFi, Action"). By contrast, a content-based recommender could use such information, which would allow it to find rules of the form "teenage males typically like SciFi movies"; however, such a system would be unable to explicitly use information about similar users. We are developing a system that can use both types of
information --- collaborative and content --- to make predictions about a new user.
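The pure-CF heuristic just described (use P2's opinion of O7 in proportion to how often P2 agrees with P1) can be sketched as follows. This is only an illustration of the baseline idea, not our hPRM model; all names and the toy ratings are hypothetical.

```python
def predict_like(target, item, ratings):
    """Pure collaborative filtering: weight each other user's opinion of `item`
    by how often they agreed with `target` on items both have rated."""
    score, weight = 0.0, 0.0
    for user, likes in ratings.items():
        if user == target or item not in likes:
            continue  # skip the target and users who never rated `item`
        shared = (set(likes) & set(ratings[target])) - {item}
        if not shared:
            continue  # no common history, so no basis for agreement
        agreement = sum(likes[o] == ratings[target][o] for o in shared) / len(shared)
        score += agreement * (1 if likes[item] else -1)
        weight += agreement
    return score / weight > 0 if weight else None

ratings = {
    "P1": {"O1": True, "O2": True, "O3": False},
    "P2": {"O1": True, "O2": True, "O3": False, "O7": True},  # agrees with P1
    "P3": {"O1": False, "O2": False, "O7": False},            # disagrees with P1
}
print(predict_like("P1", "O7", ratings))  # -> True: P2's tastes match P1's
```

Note that nothing here can exploit attributes of P1 or O7, which is exactly the limitation the combined collaborative-plus-content approach is meant to address.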
Our preliminary results show that standard PRMs are competitive with
the best recommenders, and that hPRMs strictly dominate PRMs. We plan to
continue developing ideas that extend our current hPRM, towards obtaining
world-class performance for this task, then to apply these ideas to other
tasks, such as learning patterns in microarrays.
Robert Price; Gerald Haeubl; Paul Messinger
Customer interfaces, whether physical retail outlets or online
storefronts, help shoppers to search the assortment of products sold by
the store. The design of an effective interface requires careful
tradeoffs amongst both products and navigation structures. Today's
customer interfaces typically make these tradeoffs with a view to creating
a single interface that is identical for all shoppers, guided by what is
deemed to be most suitable for the average customer. Naturally, such an
interface tends to be suboptimal for almost all shoppers.
Electronic shopping environments, however, offer the opportunity to create
personalized customer interfaces with a unique store layout for each
individual customer, tailored to his or her needs, preferences, and
interests. We show how such personalized interfaces can be constructed
automatically and in real time using a model of the shopper-interface
interaction and a record of the customer's past buying behaviour. This
technology will tend to be most useful in the presence of (1) a large
product assortment, (2) multi-item purchases, and (3) a substantial amount
of preference heterogeneity among shoppers.
The full paper introduces a sufficient set of features for representing a
personalized customer interface, explains how the interface can be viewed
as a dynamic language created between the user and the system, and shows
how formal decision-theoretic techniques can be used to optimize this
language. We illustrate the proposed technology in a typical repeated
multi-item purchase setting (grocery shopping), and demonstrate how our
approach can be used to optimize the online grocery shopping experience
for consumers through minimizing, on a user-by-user basis, the effort each
requires to fill his or her weekly shopping basket.
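One way to make the effort-minimization concrete is the following toy sketch (ours, not the paper's actual decision-theoretic formulation): given a customer's estimated purchase probabilities, choose the k items to surface on the front page so as to minimize the expected number of clicks. All names and numbers are hypothetical.

```python
def personalize_front_page(purchase_prob, k, shortcut_cost=1, deep_cost=4):
    """Choose the k items whose promotion most reduces expected clicks.

    Expected effort = sum over items of P(buy item) * clicks to reach it;
    promoting an item cuts its cost from `deep_cost` to `shortcut_cost`, so
    with uniform costs the optimal choice is the k highest-probability items.
    """
    promoted = set(sorted(purchase_prob, key=purchase_prob.get, reverse=True)[:k])
    effort = sum(p * (shortcut_cost if item in promoted else deep_cost)
                 for item, p in purchase_prob.items())
    return promoted, effort

# A hypothetical weekly-shopper profile: milk and bread are near-certain buys.
probs = {"milk": 0.9, "bread": 0.8, "caviar": 0.05, "tofu": 0.3}
promoted, effort = personalize_front_page(probs, k=2)
print(promoted)  # -> {'milk', 'bread'}
```

Because the purchase probabilities differ from shopper to shopper, the resulting front page differs too, which is the personalization the abstract describes.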