Computing Science 466/551
Topics in Machine Learning
Suggested Research Projects
* * * DRAFT [ALWAYS being revised!!] * * *
There is a lot of latitude in choosing a project
but I reserve the right to veto or alter a proposed project if I think it
falls too far outside the scope of the class,
or is too ambitious, or too unambitious.
(For further suggestions, see also my proposed
project ideas,
or possible 652 projects,
or Projects for Mitchell's Class.)
Titles of previous projects:
(2000, 2001, 2002, 2003, 2005, 2008)
New proposals -- Sept'2009
Slides from
Nasimeh Asgarian's
presentation
Slides from
Barnabas Poczos' presentation
From R Goebel + O Zaiane:
(slides)
We have several Android HTC phones, and have completed the first phase
of a project to instrument the open-source Android
operating system to gather gesture use on the HTC touch-screen phones.
The overall idea is to build user logs of how gestures are used
(gestures in general, and gesture-specific actions within a variety of
Android phone applications), and then apply some simple classification
ML techniques to the data, to try to build some abstractions of how
users use gestures on the Android phones.
Nathan Taylor completed this first phase, including developing some
software extensions to Android, and an Android Market app for relaying
the logs from the phones to a central server. A next phase would be
to try some simple classification algorithms on example logs, and then
summarize where we might look for more data.
Mike Ritter at Google is aware of the project and is supportive (he is
in charge of the Android to HTC phone project at Google).
In summary: a simple ML application to gesture use on Android touch-screen
smart phones; we have development software, some progress as a
starting point, and the Google phones.
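As a flavour of what the "simple classification" phase might look like, here is a minimal sketch: a k-nearest-neighbour classifier over two invented gesture features (duration and path length). The feature names and values are hypothetical; the real Android logs would need their own feature extraction.

```python
from collections import Counter
import math

# Hypothetical gesture-log records: (duration_ms, path_length_px, label).
# These features and values are invented for illustration only.
train = [
    (120,  40, "tap"), (110,  35, "tap"), (130,  50, "tap"),
    (400, 600, "swipe"), (450, 650, "swipe"), (380, 580, "swipe"),
]

def classify(duration, length, k=3):
    """k-nearest-neighbour vote over the logged gestures."""
    dists = sorted(
        (math.hypot(duration - d, length - l), lab) for d, l, lab in train
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]
```

A richer version would replace the toy feature tuples with features extracted from the actual instrumentation logs.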
From E Ludvig
...
In one of my collaborative research
projects, we collect data from pigeons searching for food in a large
enclosure. The pigeons are videotaped from above and then the location
of their searching is hand coded by an undergraduate watching the
video. Distinguishing searching behaviour from non-searching behaviour
is a relatively difficult discrimination, and it is not 100% clear how
people do it, even if we can reliably train undergrads to do so.
Automating this classification from real-time video would be a
significant achievement and a very useful piece of technology (both in
terms of labour and reliability). I think this problem could easily be
cast as a classification problem where the goal would be to separate
out the searching behaviour, or even an unsupervised one, where the
goal could be finding related clusters of pigeon behaviour.
We have a bunch of scored videos (~25 or so) to share that could serve
as initial project data, and we can generate more as needed.
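As a starting point, searching-vs-not could be cast as per-frame classification. The sketch below uses a hand-set rule on two invented features (speed and turning rate) as a stand-in for a learned classifier; none of the numbers come from the actual videos.

```python
# Hypothetical per-frame features derived from the overhead video:
# mean speed (px/s) and turning rate (deg/s).  Searching pigeons are
# assumed to move slowly with frequent turns; features, thresholds,
# and labels are all illustrative, not taken from the real coded data.
frames = [
    {"speed": 5.0,  "turn": 80.0, "label": "search"},
    {"speed": 4.0,  "turn": 95.0, "label": "search"},
    {"speed": 40.0, "turn": 10.0, "label": "walk"},
    {"speed": 55.0, "turn":  5.0, "label": "walk"},
]

def predict(speed, turn, speed_thresh=20.0, turn_thresh=30.0):
    """Rule-based stand-in for a learned per-frame classifier."""
    return "search" if speed < speed_thresh and turn > turn_thresh else "walk"

accuracy = sum(predict(f["speed"], f["turn"]) == f["label"]
               for f in frames) / len(frames)
```

The project proper would replace this rule with a classifier trained on the hand-coded videos (and, ideally, features computed directly from the video frames).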
From Istvan Szita
Tetris is a well-studied game: although it is NP-hard and every
strategy eventually fails with probability one, the best
controllers are able to make on the order of 100 million steps. This seems
like a lot, but it turns out that these controllers are pretty weak in tough
situations. Consider the restricted version of Tetris where only "S"- and
"Z"-shaped pieces fall. This version is much, much harder than
the original: it is bound to end within 70,000 steps, no matter how good
one's strategy is. Humans are able to make around 1,000 steps, but the
best machine controllers cannot do more than 400. The reason seems to be
that the representation is not good enough: all existing machine
controllers use 20-30 simple hand-coded features to describe the game
state.
The goal of the project is to improve the results on the restricted
Tetris game by trying out various representation
search / feature selection / feature construction methods. The
S/Z-Tetris is an ideal testbed for representation search: (1) data
generation and fitness evaluation is quick and cheap, (2) the states are
fully observable, that is, the information we're looking for is there
for sure; (3) raw observations cannot be used directly to get good
controllers; (4) hand-coded features are too weak.
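One minimal form of feature construction on this testbed: start from a few standard hand-coded Tetris features of the column heights and automatically add their pairwise products. The base features below are common Tetris features, used here only as an illustrative starting representation.

```python
# Feature-construction sketch for an S/Z-Tetris controller.  The board
# is summarized as a list of column heights; the base features are
# standard hand-coded Tetris features, offered only as a starting point.
def base_features(heights):
    return {
        "max_height": max(heights),
        "sum_height": sum(heights),
        "bumpiness": sum(abs(a - b) for a, b in zip(heights, heights[1:])),
    }

def constructed_features(heights):
    """Add pairwise products of base features -- one simple form of
    automatic feature construction the project could generalize."""
    feats = base_features(heights)
    names = sorted(feats)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            feats[a + "*" + b] = feats[a] * feats[b]
    return feats
```

A representation-search method would then select or weight such constructed features by their effect on controller fitness.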
Relevant literature:
[background on the math of Tetris]
[state of the art, survey of Tetris-solvers]
From R Greiner
There are several open THEORETICAL problems related to active/budgeted learning:
- "sample" complexity of learning, wrt the number of probes (not full
instances) required to learn
- dealing with "noisy" teachers -- ie, when the response to a probe might
not be accurate
- discriminative learning given an arbitrary belief net structure (not just
naive Bayes)
- ...
Earlier stuff.
WARNING: the remaining material is EMBARRASSINGLY out of date!
Please just glance through, just to get an idea of things that might be
interesting types of directions.
APPLICATION PULL:
- Try to model something related to YOUR areas -- wrt poultry,
business models, ...
- Robust portfolio optimization with return predictability (Yuxi Li)
Investigate an extension of modern portfolio theory that
considers parameter (return and variance) and model uncertainties,
with new approaches for return prediction.
The techniques involved in the project would be
quadratic programming and Kalman filtering.
It would basically be based on the following two papers:
- Automated brain cancer prognosis (Matt Brown)
Use graphical models (conditional random fields, Markov random
fields, belief nets) to predict survival times (or related variables)
of patients with brain cancer based on patient history, magnetic
resonance imaging (MRI) scans of patient brains, and histological
data. Our
Brain Tumour Analysis Project
maintains a database with all the data you would need for this project.
- NetFlix Prize
(B Hoehn)
Well... Netflix DONE!
- Apply HPRMs to the dataset (could use
external data to fill in
movie characteristics, but the method would need to be modified to not
have user characteristics).
- We're currently applying matrix factorization to the entire matrix
(which is extremely sparse) and using the approximated result to look up
predictions - maybe for each query we could identify a more dense, much
much smaller submatrix and perform matrix factorization on it instead.
- The paper "Improving Accuracy of Recommender
System by Clustering Items Based on Stability of User Similarity"
claims that users may have
very similar preferences on some categories of movies but be very
different on other categories. When generating a prediction for a query,
we should first identify the category of the movie (ie, the cluster of
movies it belongs to) and then compute other users' similarities
over this category (rather than over all movies).
Not sure their
method of finding movie clusters is practical on the Netflix dataset, but
the overall idea sounds cool.
One modification that could easily be tried is the following:
when calculating the user similarities for a query, multiply the ratings
in common (ratings by users who rated the query movie on other movies the
query user rated) by the similarities (scaled to be nonnegative) of the
movies in common to the query movie.
- There are other methods for doing matrix factorization (eg, Max-Margin
MF) which we've yet to implement, and which might perform better than,
or comparably to, our current methods.
- We were never able to get good results from Naive Bayes (I remember
RMSEs in the neighbourhood of 1.06), although a very simple Bayesian-based
predictor gets an RMSE of 0.997. So there are probably major enhancements
that could be added to get much better results with an NB approach.
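For concreteness, here is a minimal SGD matrix-factorization sketch of the kind discussed above, run on a tiny dense example; the rank, learning rate, regularization, and ratings are illustrative, not the settings used on the actual Netflix data.

```python
import random

# Minimal stochastic-gradient matrix factorization: learn user factors P
# and item factors Q so that P[u] . Q[i] approximates rating r(u, i).
# All hyperparameters here are illustrative.
def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Toy 2x2 ratings "matrix" as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)]
P, Q = factorize(ratings, n_users=2, n_items=2)
pred = sum(P[0][f] * Q[0][f] for f in range(2))   # reconstruct r(0, 0)
```

The dense-submatrix idea above would simply run this same routine on a small, carefully chosen submatrix per query instead of on the full sparse matrix.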
Another field, Protein-Protein Interaction, may be more complicated,
as it relates to pairs of proteins (rather than just a property of a single
protein);
moreover, we anticipate the abstracts in
BIND may be useful.
- One goal of the PolyomX
project is patient-specific treatments. To this end, they are collecting
comprehensive molecular and clinical data from patients undergoing cancer
treatment, including SNPs, microarray, metabonomics (urinalysis), ... together
with patient treatment information and outcome. The machine learning challenge,
then, is to identify which factors determine whether or not a treatment
will be effective for a specific patient.
We now have various data sets that can be examined:
- SNPs of patients who have prostate cancer, vs control
- SNPs of prostate cancer patients who had adverse reaction to treatments,
vs control
- microarray of patients with breast cancer who responded to treatments, vs control
- ...
We have some ideas for building a biclustering algorithm for analyzing
microarray data, and need help in further designing and validating this approach.
(We also have lots of clinical data!)
- Related to WebIC:
The vision of the WebIC project is to
have a computer application monitor users and proactively suggest
resources from the Internet that would help the user with his or her
current activity. We have completed a working prototype that uses
machine learning techniques to implement various functions. Using
this prototype we have gathered a significant data set from 140 users
working on day to day tasks. We have a few general ideas about how to
improve the performance of WebIC but they would need some clever
people to implement. We have sketched a few of these ideas below:
Session Segmentation
Users work on many different
tasks from day to day. The user might want information on C++ member
function pointers while working on one project and then want
information on user interface guidelines when working on a different
project. Systems that can identify when the user switches tasks
could improve the accuracy of the resources they recommend.
We have data that records which
sites the user has visited, what actions the user took and the end
of his or her session. Could you figure out features and a machine
learning algorithm to identify the end of a session on unlabeled
data? Could you figure out how to identify the beginning of a
session?
Considerable infrastructure already exists to support this project.
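A minimal version of the segmentation idea: learn a single time-gap threshold that best separates within-session from between-session page transitions. The gap values and labels below are invented; the real WebIC logs would supply them.

```python
# Session-segmentation sketch: find the time-gap threshold that best
# separates "same session" transitions from "new session" transitions.
def learn_gap_threshold(gaps, labels):
    """Try each midpoint between distinct gap values; keep the split
    with the fewest misclassified transitions (label 1 = new session)."""
    best_t, best_err = None, len(gaps) + 1
    pts = sorted(set(gaps))
    for lo, hi in zip(pts, pts[1:]):
        t = (lo + hi) / 2.0
        err = sum((g > t) != bool(y) for g, y in zip(gaps, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

gaps = [5, 12, 30, 8, 1800, 2400, 3600]   # seconds between page views
labels = [0, 0, 0, 0, 1, 1, 1]            # 1 = a new session starts here
threshold = learn_gap_threshold(gaps, labels)
```

A real segmenter would combine the gap with other features (site change, action type), but even this one-feature baseline gives something to beat.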
Session Structure Identification
We have extensive logs of users' browsing behaviour. This
includes both the sites the user visits and the actions the user
takes at those sites (following a link, backing up to previous page,
etc.). We hypothesize that the user's actions at the sites the user
visits can tell us about the attitude of the user towards the site.
A user who repeatedly visits sites about chromoly 4130 aircraft
tubing is probably interested in this topic.
Can you find patterns in this dataset that would indicate
which sites interest the user?
Could you craft a set of features and suggest a machine
learning paradigm to identify sites that interest users?
Search Query Construction:
- We often send queries to search engines.
We have data to suggest which queries are best, but
we need a paradigm for learning these queries.
The problem is that we want to learn a
specific sequence of words, as the order of words in a query can
change the outcome of the query.
We also want to learn the behaviour of
search engines in response to these queries.
- Related to the WWW
- Organizing pages within a website...
(example)
- Search Result Clustering:
Build a web search engine that organizes/clusters the websites, like
Vivisimo,
NorthernLight,
seeking a clustering that is likely to help the user find the information s/he
needs.
(The system would have first learned which clustering-parameters are best.)
- Build a tool that will help a webmaster to reconfigure his/her web-site,
to help future users "get where they want to go", quickly.
This requires first learning characteristics of users and their navigational
patterns.
- PhysioNet: the research resource for complex physiologic signals
Warning: you should have a background in signal processing before
considering these challenge problems.
- wrt Games
- The iNRIIS
project (w/V Bulitko)
has a number of intriguing "real world" problems related to classifying
forestry data
(that is, finding those pixels in the image that belong to a certain tree type).
Eg, compare "two-level" vs "one-level" tree detection (as well as comparing
different classifier techniques if time permits).
Two-level (and higher-level) classification means
that the classifier gets as input the labeling results from a lower-level
classifier. Questions to answer: whether or not this improves
classification performance, and how much and why.
(While this does not
necessarily have to be in the context of tree detection,
we have labeled images for this case, so you can compare your results against
the existing system (which uses template matching and hierarchical Markov
models), and your results may eventually get incorporated into the system!)
-
Active image interpretation: say you want to detect multiple objects in an
image, or a single object that may have multiple parts. The way you order
the sequence of detectors may affect detection accuracy and the time it
takes to detect the object. Learn the best way to sequence the detectors
(perhaps as a function of the features of the image).
See
Effective Interpretation Systems, 2000
- datamining of consumer data
Personalized Customer Interface Projects
When users come back to the same application or web-site repeatedly,
there is the potential to modify or personalize the
interface to improve its usefulness to the user. For instance, in a
grocery shopping domain, the interface could be optimized to make the
type of products the user prefers easier to shop for. There are
obvious categories such as economy vs. luxury, organic vs. inorganic,
etc. One can also imagine subtler categories. Perhaps the consumer
doesn't like anything with beans in it.
We are interested in modeling the temporal dynamics of
consumer purchases. What do the past purchases and their inter-purchase
times tell us about the probability of a purchase today?
Can you integrate the purchase quantities into this problem?
- text classification
- face recognition
- ....
EDUCATIONAL:
AIxploratorium
: eg, helping to design webpages that communicate the often subtle ideas
related to machine learning -- eg, neural nets, reinforcement learning, PAC
learning, ...
TECHNOLOGY PUSH
- The Mixture
using Variance paper describes a way to use "variance" values
to combine different Bayesian-net-based
classifiers, learned from different subsets of instances.
- Can this analysis be applied to combine different subsets of features?
- In general, can we use this analysis for selecting the features
of a single classifier... eg, select only the features that contribute
least to the total variance?
- Can we use this analysis to determine the chance of "Good decision, bad outcome"?
And perhaps for imbalanced data?
See KDD Cup 2006.
- Can this help in active learning ... related to Active
Learning with Statistical Models
This technology provides a way to compute the variance of a class,
produced by a classifier based on a Bayesian net.
Are there related ways to produce the variance of the response from other
types of classifiers -- eg, nearest neighbor, or decision trees, or linear
separators (eg, SVM), ...?
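One generic answer to this question, sketched below: a bootstrap ensemble gives a variance estimate for any base learner, not just Bayesian nets. The base learner here is a toy threshold classifier and the data are illustrative.

```python
import random

# Bootstrap-variance sketch: retrain a base learner on resampled data
# and take the spread of its predictions as a variance estimate.
def train_stump(data):
    """Toy base learner: threshold at the midpoint of the class means."""
    m0 = sum(x for x, y in data if y == 0) / max(1, sum(1 for _, y in data if y == 0))
    m1 = sum(x for x, y in data if y == 1) / max(1, sum(1 for _, y in data if y == 1))
    return (m0 + m1) / 2.0

def bootstrap_variance(data, x, n_models=30, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        t = train_stump(sample)
        preds.append(1.0 if x > t else 0.0)
    mean = sum(preds) / n_models
    return mean * (1.0 - mean)   # variance of the 0/1 prediction

data = [(0.0, 0), (0.2, 0), (0.9, 1), (1.0, 1)]
v_easy = bootstrap_variance(data, 0.0)   # far from the decision boundary
v_hard = bootstrap_variance(data, 0.5)   # near the decision boundary
```

For active learning, one would probe the instances where this variance is highest.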
- There are lots of projects related to Active/budgeted learning:
* Does active learning really work?
Carefully re-analyse the results in the literature, to see if they hold up...
Ie, replicate the tests to see if the claims are as general as the papers paint them.
* how to actively learn a discriminative classifier from relational non-iid
data (eg, for segmenting an image)?
* how to actively learn a generative model;
...
- Combining experts to improve ROC curves
The well-known "mixture of experts" approach is designed to improve
the likelihood (of future predictions).
When data is imbalanced, or if there are different penalty values for
different types of mistakes (eg false positive vs false negative),
it is appropriate to measure performance
based on ROC curves.
How to combine predictions from different experts, when the goal is to improve ROC values?
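A minimal version of this question: blend two experts' scores with a convex weight chosen to maximize AUC (the probability that a random positive outranks a random negative). The expert scores below are illustrative.

```python
# ROC-combination sketch: pick the convex weight on two experts' scores
# that maximizes AUC on held-out labels.  All scores are invented.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_blend(s1, s2, labels, steps=11):
    best_w, best_auc = 0.0, -1.0
    for i in range(steps):
        w = i / (steps - 1)
        blended = [w * a + (1 - w) * b for a, b in zip(s1, s2)]
        a = auc(blended, labels)
        if a > best_auc:
            best_w, best_auc = w, a
    return best_w, best_auc

labels = [1, 1, 0, 0]
expert1 = [0.9, 0.4, 0.6, 0.1]   # good on some cases
expert2 = [0.5, 0.8, 0.3, 0.7]   # good on others
w, blended_auc = best_blend(expert1, expert2, labels)
```

Here each expert alone has AUC 0.75, but a blend ranks all positives above all negatives; the research question is how to do this combination in a principled, general way.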
- Put the Cost-Curve/ROC-Curve tool through its paces.
Eg, take some famous study, such as "does pruning improve decision trees",
and redo it, using cost curves instead of accuracy for evaluation.
-
Most classifiers are designed to optimize accuracy.
For some tasks, it is important to optimize some other function, such as
F-measure.
How to modify existing algorithms for learning some structure
(Decision Tree learning? Linear classifiers? SVM? ...)
to optimize such a measure?
Eg... how to design a learner to return an SVM model that has the highest
possible accuracy (over training data), subject to constraint that the
precision must be at least 90%?
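Short of modifying the SVM training itself, one baseline for this constraint is to tune the decision threshold on the model's scores: keep only thresholds meeting the precision constraint, then pick the one with the most correct predictions. The scores and labels below are illustrative.

```python
# Precision-constrained threshold tuning: among thresholds whose
# precision is >= min_precision, return the one maximizing accuracy.
def best_threshold(scores, labels, min_precision=0.9):
    best_t, best_correct = None, -1
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        if tp + fp == 0 or tp / (tp + fp) < min_precision:
            continue   # constraint violated (or no positive predictions)
        correct = sum(p == y for p, y in zip(preds, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t
```

The open question above is harder: building the constraint into the learning of the model itself, rather than post-hoc thresholding.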
- Learning BeliefNets from imbalanced data
Most algorithms for learning belief nets try to maximize
the likelihood of the data.
This is problematic when the data is imbalanced.
What is the best way to learn a belief net structure that will lead to an effective
classifier, from imbalanced data?
- Explore techniques to avoid overfitting and address "false discovery rate"
problems
Cross-validation, Permutation tests
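A permutation test can be sketched in a few lines: shuffle the labels many times and count how often a chance labeling matches or beats the observed group difference. The data below are invented.

```python
import random

# Permutation-test sketch for the false-discovery concern above.
def mean_diff(values, labels):
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def permutation_p(values, labels, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = abs(mean_diff(values, labels))
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)                 # break any real association
        if abs(mean_diff(values, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)          # add-one-smoothed p-value

values = [5.1, 4.9, 5.2, 1.0, 1.2, 0.9]
labels = [1, 1, 1, 0, 0, 0]
p = permutation_p(values, labels)
```

With only six examples the smallest achievable p is about 0.1, which itself illustrates the multiple-testing danger on tiny datasets.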
-
Explore the effects of boolean combinations of features (say simple
conjunctions and disjunctions) on classification performance.
Researchers have observed that this technique of building more
complicated features often improves accuracy significantly.
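A minimal sketch of this feature-construction step: augment each example with all pairwise conjunctions and disjunctions of its binary features. The feature names are invented.

```python
from itertools import combinations

# Boolean feature construction: add every pairwise AND and OR of the
# original binary features as new features.
def expand(features):
    """features: dict name -> bool; returns an augmented copy."""
    out = dict(features)
    for a, b in combinations(sorted(features), 2):
        out[a + "&" + b] = features[a] and features[b]
        out[a + "|" + b] = features[a] or features[b]
    return out

x = expand({"rainy": True, "cold": False, "windy": True})
```

The study suggested above would compare a classifier's accuracy on the original versus the expanded feature set (and watch for the overfitting this quadratic blow-up invites).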
- Learn where to restart a search algorithm
Consider a "rapid restart" algorithm, that starts a hill-climber from several
places within a search space.
After several previous climbs, can we learn to characterize which locations
are most likely to be fruitful?
(Note we can also learn from previous search problems, as well as the current
search problem.)
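A toy version of this idea: treat past climbs as (start, final-value) pairs, score candidate restart points with a nearest-neighbour regressor, and restart from the most promising one. The 1-D search space and history below are invented.

```python
# Restart-selection sketch: past climbs give (start, final_value) pairs;
# a k-nearest-neighbour regressor over start points then scores
# candidate restart locations.
def predicted_value(history, x, k=2):
    nearest = sorted(history, key=lambda p: abs(p[0] - x))[:k]
    return sum(v for _, v in nearest) / k

def pick_restart(history, candidates):
    return max(candidates, key=lambda x: predicted_value(history, x))

history = [(0.1, 3.0), (0.2, 3.5), (0.8, 9.0), (0.9, 8.5)]
start = pick_restart(history, candidates=[0.15, 0.5, 0.85])
```

As noted above, the history could also pool starts from previous search problems, not just the current one.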
- Different approaches to learning decision trees:
Axis Parallel or Oblique Partitions
- D. Heath, S. Kasif, and S. Salzberg. Induction of Oblique Decision
Trees. Proceedings of the Thirteenth International Joint Conference on
Artificial Intelligence, pp. 1002-1007, 1993.
- S. K. Murthy, S. Kasif, S. Salzberg, and R. Beigel. OC1: Randomized
Induction of Oblique Decision Trees. Proceedings of the Eleventh
National Conference on Artificial Intelligence, pp. 322-327), 1993.
- Different approaches to learning decision trees: "decision stumps"
versus complete trees
- Rob Holte, Very Simple Classification Rules Perform Well on Most
Commonly Used Datasets, Machine Learning, 11(63-91), 1993
- D. Dobkin, D. Gunopoulos, and S. Kasif,
Computing optimal shallow decision trees, Proc. International
Workshop on Mathematics in Artificial Intelligence, 1996.
- J. Ross Quinlan, C4.5: Programs for machine learning, Morgan
Kaufmann, 1993,
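For reference, a decision stump in the spirit of Holte's 1R result can be learned by trying every (feature, threshold, sign) split and keeping the most accurate one; the toy dataset below is invented.

```python
# Decision-stump learner: exhaustively try every single-feature
# threshold split and keep the one with the best training accuracy.
def learn_stump(X, y):
    best = None  # (accuracy, feature, threshold, sign)
    n_feats = len(X[0])
    for f in range(n_feats):
        for t in sorted({row[f] for row in X}):
            for sign in (1, 0):
                preds = [sign if row[f] >= t else 1 - sign for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, sign)
    return best

X = [[1.0, 7.0], [2.0, 6.5], [8.0, 7.2], [9.0, 6.8]]
y = [0, 0, 1, 1]
acc, feat, thresh, sign = learn_stump(X, y)
```

Comparing such one-split rules against full trees on standard datasets is exactly the kind of study the Holte paper above invites.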
- Using Prior Knowledge (Theory Revision)
- (Belief revision vs theory revision)
- M. Dalal, Investigations into a theory
of knowledge base revision: Preliminary report, Proc. AAAI, pages 475--479,
1988.
- N. Friedman and J. Halpern, Belief
revision: A critique, Proc. 6th Inter. Conf. on Principles of Knowledge
Representation and Reasoning, pp. 421-- 431, 1996.
- P. Langley, G. Drastal, B. Rao and
R. Greiner, Theory Revision in Fault Hierarchies, Proc. Fifth International
Workshop on Principles of Diagnosis (DX-94), 1994,
- R. Greiner, The complexity of theory
revision, Artificial Intelligence, 1999.
- R. Greiner, The Complexity of Revising
Logic Programs, Journal of Logic Programming, 40:2-3, (Aug-Sept 1999),
p. 273--298.
- Craw/Sleeman, Automating the Refinement of Knowledge-Based Systems,
Proceedings of ECAI 90, 1990.
- Mooney, A Preliminary PAC Analysis of Theory Revision,
Third Annual Workshop on Computational Learning Theory and Natural
Learning Systems (CLNL-92), 1994.
- Baffes/Mooney, Symbolic Revision of Theories with M-of-N Rules,
IJCAI93, 1993.
- Wogulis/Pazzani, A Methodology for Evaluating Theory Revision
Systems: Results with Audrey II, IJCAI, 1993.
- Getting gradient descent to work
- adjusting learning rate, using line-search...
- conjugate gradient? vs "momentum"
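The momentum variant can be sketched in a few lines; here it minimizes the 1-D quadratic f(x) = (x - 3)^2, with illustrative step size and momentum coefficient.

```python
# Gradient descent with momentum on f(x) = (x - 3)^2, whose gradient
# is 2(x - 3).  Step size and momentum coefficient are illustrative.
def descend(grad, x0, lr=0.1, momentum=0.9, steps=200):
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # velocity accumulates gradients
        x += v
    return x

x_star = descend(lambda x: 2 * (x - 3), x0=0.0)
```

A project could compare this against plain gradient descent, line-search step-size rules, and conjugate gradient on less forgiving objectives.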
- Relations between decision tree algorithms and boosting/weak-learning:
- Models/algorithms for learning probabilistic functions:
- Ways to learn Belief Net Parameters (CPtables) -- Discriminative vs Generative:
- APN: Binder/Koller/Russell/Kanazawa,
Adaptive probabilistic networks with hidden variables,
Machine Learning, 29:213--244, 1997.
- Jordan, Why the logistic function? A tutorial discussion on
probabilities and neural networks, 1995,
citeseer.nj.nec.com/jordan95why.html
- R. Greiner, A. Grove and D. Schuurmans,
Learning Bayesian Nets that Perform Well, Proc. Thirteenth Conference
on Uncertainty in Artificial Intelligence (UAI97), August 1997.
- Ways to learn Belief network (structure):
- Heckerman - Max Likelihood
- Spirtes, Conditional Independence
- Bayesian Models of Belief Nets
- T. Van Allen, R. Greiner and P. Hooper, Bayesian Error-Bars for
Belief Net Inference, Proc. Seventeenth Conference on Uncertainty in
Artificial Intelligence (UAI-01), Aug 2001.
- Heckerman, D., Geiger, D., and Chickering, D., Learning Bayesian
networks: The combination of knowledge and statistical data, Machine Learning,
20(3):197--243.
- N. Friedman and D. Koller, "Being Bayesian about Network Structure"
- Hidden Markov Models
- Probabilistic Relational Models
See PRM Home Page, in general...
Tutorial
-
N. Friedman, L. Getoor, D. Koller, and A. Pfeffer, Learning Probabilistic
Relational Models
- L. Getoor, N. Friedman, D. Koller, and A. Pfeffer, Learning
Probabilistic Relational Models (book chapter)
- L. Getoor, N. Friedman, D. Koller, and B. Taskar, Learning Probabilistic
Models of Relational Structure
- E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller, Rich
Probabilistic Models for Gene Expressions
- Online portfolio selection:
- Extensions to VC-dimension: bounds that depend on the "amount"
of separation achieved (aka the "margin"):
- Extensions to VC-dimension: bounds that depend on the input
distribution and "error shells":
- Extensions to VC-dimension: bounds based on metric entropy:
- Exploring an unknown environment:
- How to handle missing (training) data.
- D. Schuurmans and R. Greiner. Learning
default concepts, Proc. Tenth Canadian Conference on Artificial Intelligence
(CSCSI-94), 1994.
- R. Greiner, A. Grove, and A. Kogan,
Knowing what doesn't matter: Exploiting the omission of irrelevant
data, Artificial Intelligence, December 1997.
- Jordan/Jacobs, "EM and friends"
- On-Line Learning, vs Batch Learning
- Formal Analysis: Gold-style, PAC, Bayesian, ...
- Considering costs when performing
experiments:
- R. Greiner, A. Grove and D. Roth:
Learning Active Classifiers, Proc. Thirteenth International Conference
on Machine Learning (IMLC96), July, 1996.
- Turney, ... ICML
- Comparing different learners
- Neural Nets vs Decision Trees
- Comparing Ensembles of Learners
- Boosting vs Bagging
- Dealing with Error-Correcting Classifiers
- Reinforcement Learning, MDPs,
- Support Vector Machines (Vapnik),
Margins, ...
- Understanding Minimum Description
Length (NFL?)
- Inductive Logic Programming