Computing Science 466/551
Topics in Machine Learning
Suggested Research Projects
* * * DRAFT [ALWAYS being revised!!] * * *
There is a lot of latitude in choosing a project
but I reserve the right to veto or alter a proposed project if I think it
falls too far outside the scope of the class,
or is too ambitious, or too unambitious.
(For further suggestions, see also my proposed
project ideas,
or possible 652 projects,
or Projects for Mitchell's Class.)
Titles of previous projects:
(2000, 2001, 2002, 2003, 2005, 2008)
New proposals -- Sept'2009
Slides from
Nasimeh Asgarian's
presentation
Slides from
Barnabas Poczos' presentation
From R Goebel + O Zaiane:
(slides)
We have several Android HTC phones, and have completed the first phase
of a project to instrument the open-source Android
operating system to gather gesture use on the HTC touch-screen phones.
The overall idea is to build user logs of how gestures are used
(gestures in general, and gesture-specific actions within a variety of
Android phone applications), and then apply some simple classification
ML techniques to the data, to try to build some abstractions of how
users use gestures on the Android phones.
Nathan Taylor completed this first phase, including developing some
software extensions to Android, and an Android Market app for relaying
the logs from the phones to a central server. A next phase would be
to try some simple classification algorithms on example logs, and then
summarize where we might look for more data.
Mike Ritter at Google is aware of the project and is supportive (he is
in charge of the Android to HTC phone project at Google).
In summary: a simple ML application to gesture use on Android touch-screen
smart phones; we have development software, some progress as a
starting point, and the Google phones.
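As a flavour of what the "simple classification" phase might look like, here is a minimal sketch: a k-nearest-neighbour classifier over two invented gesture features (duration and path length). The feature names and values are hypothetical; the real Android logs would need their own feature extraction.

```python
from collections import Counter
import math

# Hypothetical gesture-log records: (duration_ms, path_length_px, label).
# These features and values are invented for illustration only.
train = [
    (120,  40, "tap"), (110,  35, "tap"), (130,  50, "tap"),
    (400, 600, "swipe"), (450, 650, "swipe"), (380, 580, "swipe"),
]

def classify(duration, length, k=3):
    """k-nearest-neighbour vote over the logged gestures."""
    dists = sorted(
        (math.hypot(duration - d, length - l), lab) for d, l, lab in train
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]
```

A richer version would replace the toy feature tuples with features extracted from the actual instrumentation logs.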
From E Ludvig
...
In one of my collaborative research
projects, we collect data from pigeons searching for food in a large
enclosure. The pigeons are videotaped from above and then the location
of their searching is hand coded by an undergraduate watching the
video. Distinguishing searching behaviour from non-searching behaviour
is a relatively difficult discrimination, and it is not 100% clear how
people do it, even if we can reliably train undergrads to do so.
Automating this classification from real-time video would be a
significant achievement and a very useful piece of technology (both in
terms of labour and reliability). I think this problem could easily be
cast as a classification problem where the goal would be to separate
out the searching behaviour, or even an unsupervised one, where the
goal could be finding related clusters of pigeon behaviour.
We have a bunch of scored videos (~25 or so) to share that could serve
as initial project data, and we can generate more as needed.
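As a starting point, searching-vs-not could be cast as per-frame classification. The sketch below uses a hand-set rule on two invented features (speed and turning rate) as a stand-in for a learned classifier; none of the numbers come from the actual videos.

```python
# Hypothetical per-frame features derived from the overhead video:
# mean speed (px/s) and turning rate (deg/s).  Searching pigeons are
# assumed to move slowly with frequent turns; features, thresholds,
# and labels are all illustrative, not taken from the real coded data.
frames = [
    {"speed": 5.0,  "turn": 80.0, "label": "search"},
    {"speed": 4.0,  "turn": 95.0, "label": "search"},
    {"speed": 40.0, "turn": 10.0, "label": "walk"},
    {"speed": 55.0, "turn":  5.0, "label": "walk"},
]

def predict(speed, turn, speed_thresh=20.0, turn_thresh=30.0):
    """Rule-based stand-in for a learned per-frame classifier."""
    return "search" if speed < speed_thresh and turn > turn_thresh else "walk"

accuracy = sum(predict(f["speed"], f["turn"]) == f["label"]
               for f in frames) / len(frames)
```

The project proper would replace this rule with a classifier trained on the hand-coded videos (and, ideally, features computed directly from the video frames).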
From Istvan Szita
Tetris is a well-studied game: although it is NP-hard and every
strategy eventually fails with probability one, the best
controllers are able to make on the order of 100 million steps. This seems
like a lot, but it turns out that these controllers are pretty weak in tough
situations. Consider the restricted version of Tetris where only "S"- and
"Z"-shaped pieces fall. This version is much, much harder than
the original: it is bound to end within 70,000 steps, no matter how good
one's strategy is. Humans are able to make around 1,000 steps, but the
best machine controllers cannot do more than 400. The reason seems to be
that the representation is not good enough: all existing machine
controllers use 20-30 simple hand-coded features to describe the game
state.
The goal of the project is to improve the results on the restricted
Tetris game by trying out various representation
search / feature selection / feature construction methods. The
S/Z-Tetris is an ideal testbed for representation search: (1) data
generation and fitness evaluation is quick and cheap, (2) the states are
fully observable, that is, the information we're looking for is there
for sure; (3) raw observations cannot be used directly to get good
controllers; (4) hand-coded features are too weak.
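One minimal form of feature construction on this testbed: start from a few standard hand-coded Tetris features of the column heights and automatically add their pairwise products. The base features below are common Tetris features, used here only as an illustrative starting representation.

```python
# Feature-construction sketch for an S/Z-Tetris controller.  The board
# is summarized as a list of column heights; the base features are
# standard hand-coded Tetris features, offered only as a starting point.
def base_features(heights):
    return {
        "max_height": max(heights),
        "sum_height": sum(heights),
        "bumpiness": sum(abs(a - b) for a, b in zip(heights, heights[1:])),
    }

def constructed_features(heights):
    """Add pairwise products of base features -- one simple form of
    automatic feature construction the project could generalize."""
    feats = base_features(heights)
    names = sorted(feats)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            feats[a + "*" + b] = feats[a] * feats[b]
    return feats
```

A representation-search method would then select or weight such constructed features by their effect on controller fitness.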
Relevant literature:
[background on the math of Tetris]
[state of the art, survey of Tetris-solvers]
From R Greiner
There are several open THEORETICAL problems related to active/budgeted learning:
- "sample" complexity of learning, wrt the number of probes (not full
instances) required to learn
- dealing with "noisy" teachers -- ie, when the response to a probe might
not be accurate
- discriminative learning given an arbitrary belief net structure (not just
naive Bayes)
- ...
Earlier stuff.
WARNING: the remaining material is EMBARRASSINGLY out of date!
Please just glance through, just to get an idea of things that might be
interesting types of directions.
APPLICATION PULL:
- Try to model something related to YOUR areas -- wrt poultry,
business models, ...
- Robust portfolio optimization with return predictability (Yuxi Li)
Investigate an extension of modern portfolio theory that
considers parameter (return and variance) and model uncertainties,
with new approaches for return prediction.
The techniques involved in the project would be
quadratic programming and Kalman filtering.
It would basically be based on the following two papers:
- Automated brain cancer prognosis (Matt Brown)
Use graphical models (conditional random fields, Markov random
fields, belief nets) to predict survival times (or related variables)
of patients with brain cancer based on patient history, magnetic
resonance imaging (MRI) scans of patient brains, and histological
data. Our
Brain Tumour Analysis Project
maintains a database with all the data you would need for this project.
- NetFlix Prize
(B Hoehn)
Well... Netflix DONE!
- Apply HPRMs to the dataset (could use
external data to fill in
movie characteristics, but the method would need to be modified to not
have user characteristics).
- We're currently applying matrix factorization to the entire matrix
(which is extremely sparse) and using the approximated result to look up
predictions - maybe for each query we could identify a more dense, much
much smaller submatrix and perform matrix factorization on it instead.
- The paper "Improving Accuracy of Recommender
System by Clustering Items Based on Stability of User Similarity"
claims that users may have
very similar preferences on some categories of movies but be very
different on other categories. When generating a prediction for a query,
we should first identify the category of the movie (ie, the cluster of
movies it belongs to) and then compute other users' similarities
over this category (rather than over all movies).
Not sure their
method of finding movie clusters is practical on the Netflix dataset, but
the overall idea sounds cool.
One modification that could easily be tried is the following:
when calculating the user similarities for a query, multiply the ratings
in common (ratings by users who rated the query movie on other movies the
query user rated) by the similarities (scaled to be nonnegative) of the
movies in common to the query movie.
- There are other methods for doing matrix factorization (eg, Max-Margin
MF) which we've yet to implement, and which might perform better than,
or comparably to, our current methods.
- We were never able to get good results from Naive Bayes (I remember
RMSEs in the neighbourhood of 1.06), although a very simple Bayesian-based
predictor gets an RMSE of 0.997. So there are probably major enhancements
that could be added to get much better results with an NB approach.
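For concreteness, here is a minimal SGD matrix-factorization sketch of the kind discussed above, run on a tiny dense example; the rank, learning rate, regularization, and ratings are illustrative, not the settings used on the actual Netflix data.

```python
import random

# Minimal stochastic-gradient matrix factorization: learn user factors P
# and item factors Q so that P[u] . Q[i] approximates rating r(u, i).
# All hyperparameters here are illustrative.
def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02,
              epochs=200, seed=0):
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Toy 2x2 ratings "matrix" as (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)]
P, Q = factorize(ratings, n_users=2, n_items=2)
pred = sum(P[0][f] * Q[0][f] for f in range(2))   # reconstruct r(0, 0)
```

The dense-submatrix idea above would simply run this same routine on a small, carefully chosen submatrix per query instead of on the full sparse matrix.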
Another field, Protein-Protein Interaction, may be more complicated,
as it relates to pairs of proteins (rather than just a property of a single
protein);
moreover, we anticipate the abstracts in
BIND may be useful.
- One goal of the PolyomX
project is patient-specific treatments. To this end, they are collecting
comprehensive molecular and clinical data from patients undergoing cancer
treatment, including SNPs, microarray, metabonomics (urinalysis), ... together
with patient treatment information and outcome. The machine learning challenge,
then, is to identify which factors determine whether or not a treatment
will be effective for a specific patient.
We now have various data sets that can be examined:
- SNPs of patients who have prostate cancer, vs control
- SNPs of prostate cancer patients who had adverse reaction to treatments,
vs control
- microarray of patients with breast cancer who responded to treatments, vs control
- ...
We have some ideas for building a biclustering algorithm for analyzing
microarray data, and need help in further designing and validating this approach.
(We also have lots of clinical data!)
- Related to WebIC:
The vision of the WebIC project is to
have a computer application monitor users and proactively suggest
resources from the Internet that would help the user with his or her
current activity. We have completed a working prototype that uses
machine learning techniques to implement various functions. Using
this prototype we have gathered a significant data set from 140 users
working on day to day tasks. We have a few general ideas about how to
improve the performance of WebIC but they would need some clever
people to implement. We have sketched a few of these ideas below:
Session Segmentation
Users work on many different
tasks from day to day. The user might want information on C++ member
function pointers while working on one project and then want
information on user interface guidelines when working on a different
project. Systems that can identify when the user switches tasks
could improve the accuracy of the resources they recommend.
We have data that records which
sites the user has visited, what actions the user took and the end
of his or her session. Could you figure out features and a machine
learning algorithm to identify the end of a session on unlabeled
data? Could you figure out how to identify the beginning of a
session?
Considerable infrastructure already exists to support this project.
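A minimal version of the segmentation idea: learn a single time-gap threshold that best separates within-session from between-session page transitions. The gap values and labels below are invented; the real WebIC logs would supply them.

```python
# Session-segmentation sketch: find the time-gap threshold that best
# separates "same session" transitions from "new session" transitions.
def learn_gap_threshold(gaps, labels):
    """Try each midpoint between distinct gap values; keep the split
    with the fewest misclassified transitions (label 1 = new session)."""
    best_t, best_err = None, len(gaps) + 1
    pts = sorted(set(gaps))
    for lo, hi in zip(pts, pts[1:]):
        t = (lo + hi) / 2.0
        err = sum((g > t) != bool(y) for g, y in zip(gaps, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

gaps = [5, 12, 30, 8, 1800, 2400, 3600]   # seconds between page views
labels = [0, 0, 0, 0, 1, 1, 1]            # 1 = a new session starts here
threshold = learn_gap_threshold(gaps, labels)
```

A real segmenter would combine the gap with other features (site change, action type), but even this one-feature baseline gives something to beat.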
Session Structure Identification
We have extensive logs of users' browsing behaviour. This
includes both the sites the user visits and the actions the user
takes at those sites (following a link, backing up to previous page,
etc.). We hypothesize that the user's actions at the sites the user
visits can tell us about the attitude of the user towards the site.
A user who repeatedly visits sites about chromoly 4130 aircraft
tubing is probably interested in this topic.
Can you find patterns in this dataset that would indicate
which sites interest the user?
Could you craft a set of features and suggest a machine
learning paradigm to identify sites that interest users?
Search Query Construction:
- We often send queries to search engines.
We have data to suggest which queries are best, but
we need a paradigm for learning these queries.
The problem is that we want to learn a
specific sequence of words, as the order of words in a query can
change the outcome of the query.
We also want to learn the behaviour of
search engines in response to these queries.
- Related to the WWW
- Organizing pages within a website...
(example)
- Search Result Clustering:
Build a web search engine that organizes/clusters the websites, like
Vivisimo,
NorthernLight,
seeking a clustering that is likely to help the user find the information s/he
needs.
(The system would have first learned which clustering-parameters are best.)
- Build a tool that will help a webmaster to reconfigure his/her web-site,
to help future users "get where they want to go", quickly.
This requires first learning characteristics of users and their navigational
patterns.
- PhysioNet: the research resource for complex physiologic signals
Warning: you should have a background in signal processing before
considering these challenge problems.
- wrt Games
- The iNRIIS
project (w/V Bulitko)
has a number of intriguing "real world" problems related to classifying
forestry data
(that is, finding those pixels in the image that belong to a certain tree type).
Eg, compare "two-level" vs "one-level" tree detection (as well as comparing
different classifier techniques if time permits).
Two-level (and higher-level) classification means
that the classifier gets as input the labeling results from a lower-level
classifier. Questions to answer: whether or not this improves
classification performance, and how much and why.
(While this does not
necessarily have to be in the context of tree detection,
we have labeled images for this case, so you can compare your results against
the existing system (which uses template matching and hierarchical Markov
models), and your results may eventually get incorporated into the system!)
-
Active image interpretation: say you want to detect multiple objects in an
image, or a single object that may have multiple parts. The way you order
the sequence of detectors may affect detection accuracy and the time it
takes to detect the object. Learn the best way to sequence the detectors
(perhaps as a function of the features of the image).
See
Effective Interpretation Systems, 2000
- datamining of consumer data
Personalized Customer Interface Projects
When users come back to the same application or web-site repeatedly,
there is the potential to modify or personalize the
interface to improve its usefulness to the user. For instance, in a
grocery shopping domain, the interface could be optimized to make the
type of products the user prefers easier to shop for. There are
obvious categories such as economy vs. luxury, organic vs. inorganic,
etc. One can also imagine subtler categories. Perhaps the consumer
doesn't like anything with beans in it.
We are interested in modeling the temporal dynamics of
consumer purchases. What do the past purchases and their inter-purchase
times tell us about the probability of a purchase today?
Can you integrate the purchase quantities into this problem?
- text classification
- face recognition
- ....
EDUCATIONAL:
AIxploratorium
: eg, helping to design webpages that communicate the often subtle ideas
related to machine learning -- eg, neural nets, reinforcement learning, PAC
learning, ...
TECHNOLOGY PUSH
- The Mixture
using Variance paper describes a way to use "variance" values
to combine different Bayesian-net-based
classifiers, learned from different subsets of instances.
- Can this analysis be applied to combine different subsets of features?
- In general, can we use this analysis for selecting the features
of a single classifier... eg, select only the features that contribute
least to the total variance?
- Can we use this analysis to determine the chance of "Good decision, bad outcome"?
And perhaps for imbalanced data?
See KDD Cup 2006.
- Can this help in active learning ... related to Active
Learning with Statistical Models
This technology provides a way to compute the variance of a class,
produced by a classifier based on a Bayesian net.
Are there related ways to produce the variance of the response from other
types of classifiers -- eg, nearest neighbor, or decision trees, or linear
separators (eg, SVM), ...?
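One generic answer to this question, sketched below: a bootstrap ensemble gives a variance estimate for any base learner, not just Bayesian nets. The base learner here is a toy threshold classifier and the data are illustrative.

```python
import random

# Bootstrap-variance sketch: retrain a base learner on resampled data
# and take the spread of its predictions as a variance estimate.
def train_stump(data):
    """Toy base learner: threshold at the midpoint of the class means."""
    m0 = sum(x for x, y in data if y == 0) / max(1, sum(1 for _, y in data if y == 0))
    m1 = sum(x for x, y in data if y == 1) / max(1, sum(1 for _, y in data if y == 1))
    return (m0 + m1) / 2.0

def bootstrap_variance(data, x, n_models=30, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        t = train_stump(sample)
        preds.append(1.0 if x > t else 0.0)
    mean = sum(preds) / n_models
    return mean * (1.0 - mean)   # variance of the 0/1 prediction

data = [(0.0, 0), (0.2, 0), (0.9, 1), (1.0, 1)]
v_easy = bootstrap_variance(data, 0.0)   # far from the decision boundary
v_hard = bootstrap_variance(data, 0.5)   # near the decision boundary
```

For active learning, one would probe the instances where this variance is highest.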
- There are lots of projects related to Active/budgeted learning:
* Does active learning really work?
Carefully re-analyse the results in the literature, to see if they hold up...
Ie, replicate the tests to see if the claims are as general as the papers paint them.
* how to actively learn a discriminative classifier from relational non-iid
data (eg, for segmenting an image)?
* how to actively learn a generative model;
...
- Combining experts to improve ROC curves
The well-known "mixture of experts" approach is designed to improve
the likelihood (of future predictions).
When data is imbalanced, or if there are different penalty values for
different types of mistakes (eg false positive vs false negative),
it is appropriate to measure performance
based on ROC curves.
How to combine predictions from different experts, when the goal is to improve ROC values?
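A minimal version of this question: blend two experts' scores with a convex weight chosen to maximize AUC (the probability that a random positive outranks a random negative). The expert scores below are illustrative.

```python
# ROC-combination sketch: pick the convex weight on two experts' scores
# that maximizes AUC on held-out labels.  All scores are invented.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def best_blend(s1, s2, labels, steps=11):
    best_w, best_auc = 0.0, -1.0
    for i in range(steps):
        w = i / (steps - 1)
        blended = [w * a + (1 - w) * b for a, b in zip(s1, s2)]
        a = auc(blended, labels)
        if a > best_auc:
            best_w, best_auc = w, a
    return best_w, best_auc

labels = [1, 1, 0, 0]
expert1 = [0.9, 0.4, 0.6, 0.1]   # good on some cases
expert2 = [0.5, 0.8, 0.3, 0.7]   # good on others
w, blended_auc = best_blend(expert1, expert2, labels)
```

Here each expert alone has AUC 0.75, but a blend ranks all positives above all negatives; the research question is how to do this combination in a principled, general way.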
- Put the Cost-Curve/ROC-Curve tool through its paces.
Eg, take some famous study, such as "does pruning improve decision trees",
and redo it, using cost curves instead of accuracy for evaluation.
-
Most classifiers are designed to optimize accuracy.
For some tasks, it is important to optimize some other function, such as
F-measure.
How to modify existing algorithms for learning some structure
(Decision Tree learning? Linear classifiers? SVM? ...)
to optimize such a measure?
Eg... how to design a learner to return an SVM model that has the highest
possible accuracy (over training data), subject to constraint that the
precision must be at least 90%?
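Short of modifying the SVM training itself, one baseline for this constraint is to tune the decision threshold on the model's scores: keep only thresholds meeting the precision constraint, then pick the one with the most correct predictions. The scores and labels below are illustrative.

```python
# Precision-constrained threshold tuning: among thresholds whose
# precision is >= min_precision, return the one maximizing accuracy.
def best_threshold(scores, labels, min_precision=0.9):
    best_t, best_correct = None, -1
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(p and y for p, y in zip(preds, labels))
        fp = sum(p and not y for p, y in zip(preds, labels))
        if tp + fp == 0 or tp / (tp + fp) < min_precision:
            continue   # constraint violated (or no positive predictions)
        correct = sum(p == y for p, y in zip(preds, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t
```

The open question above is harder: building the constraint into the learning of the model itself, rather than post-hoc thresholding.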
- Learning BeliefNets from imbalanced data
Most algorithms for learning belief nets try to maximize
the likelihood of the data.
This is problematic when the data is imbalanced.
What is the best way to learn a belief net structure that will lead to an effective
classifier, from imbalanced data?
- Explore techniques to avoid overfitting and address "false discovery rate"
problems
Cross-validation, Permutation tests
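A permutation test can be sketched in a few lines: shuffle the labels many times and count how often a chance labeling matches or beats the observed group difference. The data below are invented.

```python
import random

# Permutation-test sketch for the false-discovery concern above.
def mean_diff(values, labels):
    g1 = [v for v, y in zip(values, labels) if y == 1]
    g0 = [v for v, y in zip(values, labels) if y == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def permutation_p(values, labels, n_perm=1000, seed=0):
    rng = random.Random(seed)
    observed = abs(mean_diff(values, labels))
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)                 # break any real association
        if abs(mean_diff(values, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)          # add-one-smoothed p-value

values = [5.1, 4.9, 5.2, 1.0, 1.2, 0.9]
labels = [1, 1, 1, 0, 0, 0]
p = permutation_p(values, labels)
```

With only six examples the smallest achievable p is about 0.1, which itself illustrates the multiple-testing danger on tiny datasets.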
-
Explore the effects of boolean combinations of features (say simple
conjunctions and disjunctions) on classification performance.
Researchers have observed that this technique of building more
complicated features often improves accuracy significantly.
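A minimal sketch of this feature-construction step: augment each example with all pairwise conjunctions and disjunctions of its binary features. The feature names are invented.

```python
from itertools import combinations

# Boolean feature construction: add every pairwise AND and OR of the
# original binary features as new features.
def expand(features):
    """features: dict name -> bool; returns an augmented copy."""
    out = dict(features)
    for a, b in combinations(sorted(features), 2):
        out[a + "&" + b] = features[a] and features[b]
        out[a + "|" + b] = features[a] or features[b]
    return out

x = expand({"rainy": True, "cold": False, "windy": True})
```

The study suggested above would compare a classifier's accuracy on the original versus the expanded feature set (and watch for the overfitting this quadratic blow-up invites).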
- Learn where to restart a search algorithm
Consider a "rapid restart" algorithm, that starts a hill-climber from several
places within a search space.
After several previous climbs, can we learn to characterize which locations
are most likely to be fruitful?
(Note we can also learn from previous search problems, as well as the current
search problem.)
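A toy version of this idea: treat past climbs as (start, final-value) pairs, score candidate restart points with a nearest-neighbour regressor, and restart from the most promising one. The 1-D search space and history below are invented.

```python
# Restart-selection sketch: past climbs give (start, final_value) pairs;
# a k-nearest-neighbour regressor over start points then scores
# candidate restart locations.
def predicted_value(history, x, k=2):
    nearest = sorted(history, key=lambda p: abs(p[0] - x))[:k]
    return sum(v for _, v in nearest) / k

def pick_restart(history, candidates):
    return max(candidates, key=lambda x: predicted_value(history, x))

history = [(0.1, 3.0), (0.2, 3.5), (0.8, 9.0), (0.9, 8.5)]
start = pick_restart(history, candidates=[0.15, 0.5, 0.85])
```

As noted above, the history could also pool starts from previous search problems, not just the current one.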
- Different approaches to learning decision trees:
Axis Parallel or Oblique Partitions
- D. Heath, S. Kasif, and S. Salzberg. Induction of Oblique Decision
Trees. Proceedings of the Thirteenth International Joint Conference on
Artificial Intelligence, pp. 1002-1007, 1993.
- S. K. Murthy, S. Kasif, S. Salzberg, and R. Beigel. OC1: Randomized
Induction of Oblique Decision Trees. Proceedings of the Eleventh
National Conference on Artificial Intelligence, pp. 322-327), 1993.
- Different approaches to learning decision trees: "decision stumps"
versus complete trees
- Rob Holte, Very Simple Classification Rules Perform Well on Most
Commonly Used Datasets, Machine Learning, 11(63-91), 1993
- D. Dobkin, D. Gunopoulos, and S. Kasif,
Computing optimal shallow decision trees, Proc. International
Workshop on Mathematics in Artificial Intelligence, 1996.
- J. Ross Quinlan, C4.5: Programs for machine learning, Morgan
Kaufmann, 1993,
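For reference, a decision stump in the spirit of Holte's 1R result can be learned by trying every (feature, threshold, sign) split and keeping the most accurate one; the toy dataset below is invented.

```python
# Decision-stump learner: exhaustively try every single-feature
# threshold split and keep the one with the best training accuracy.
def learn_stump(X, y):
    best = None  # (accuracy, feature, threshold, sign)
    n_feats = len(X[0])
    for f in range(n_feats):
        for t in sorted({row[f] for row in X}):
            for sign in (1, 0):
                preds = [sign if row[f] >= t else 1 - sign for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, sign)
    return best

X = [[1.0, 7.0], [2.0, 6.5], [8.0, 7.2], [9.0, 6.8]]
y = [0, 0, 1, 1]
acc, feat, thresh, sign = learn_stump(X, y)
```

Comparing such one-split rules against full trees on standard datasets is exactly the kind of study the Holte paper above invites.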
- Using Prior Knowledge (Theory Revision)
- (Belief revision vs theory revision)
- M. Dalal, Investigations into a theory
of knowledge base revision: Preliminary report, Proc. AAAI, pages 475--479,
1988.
- N. Friedman and J. Halpern, Belief
revision: A critique, Proc. 6th Inter. Conf. on Principles of Knowledge
Representation and Reasoning, pp. 421-- 431, 1996.
- P. Langley, G. Drastal, B. Rao and
R. Greiner, Theory Revision in Fault Hierarchies, Proc. Fifth International
Workshop on Principles of Diagnosis (DX-94), 1994,
- R. Greiner, The complexity of theory
revision, Artificial Intelligence, 1999.
- R. Greiner, The Complexity of Revising
Logic Programs, Journal of Logic Programming, 40:2-3, (Aug-Sept 1999),
p. 273--298.
- Craw/Sleeman, Automating the Refinement of Knowledge-Based Systems,
Proceedings of ECAI 90, 1990.
- Mooney, A Preliminary PAC Analysis of Theory Revision,
Third Annual Workshop on Computational Learning Theory and Natural
Learning Systems (CLNL-92), 1994.
- Baffes/Mooney, Symbolic Revision of Theories with M-of-N Rules,
IJCAI93, 1993.
- Wogulis/Pazzani, A Methodology for Evaluating Theory Revision
Systems: Results with Audrey II, IJCAI, 1993.
- Getting gradient descent to work
- adjusting learning rate, using line-search...
- conjugate gradient? vs "momentum"
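The momentum variant can be sketched in a few lines; here it minimizes the 1-D quadratic f(x) = (x - 3)^2, with illustrative step size and momentum coefficient.

```python
# Gradient descent with momentum on f(x) = (x - 3)^2, whose gradient
# is 2(x - 3).  Step size and momentum coefficient are illustrative.
def descend(grad, x0, lr=0.1, momentum=0.9, steps=200):
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)   # velocity accumulates gradients
        x += v
    return x

x_star = descend(lambda x: 2 * (x - 3), x0=0.0)
```

A project could compare this against plain gradient descent, line-search step-size rules, and conjugate gradient on less forgiving objectives.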
- Relations between decision tree algorithms and boosting/weak-learning:
- Models/algorithms for learning probabilistic functions:
- Ways to learn Belief Net Parameters (CPtables) -- Discriminative vs Generative:
- APN: Binder/Koller/Russell/Kanazawa,
Adaptive probabilistic networks with hidden variables,
Machine Learning, 29:213--244, 1997.
- Jordan, Why the logistic function? A tutorial discussion on
probabilities and neural networks, 1995,
citeseer.nj.nec.com/jordan95why.html
- R. Greiner, A. Grove and D. Schuurmans,
Learning Bayesian Nets that Perform Well, Proc. Thirteenth Conference
on Uncertainty in Artificial Intelligence (UAI97), August 1997.
- Ways to learn Belief network (structure):
- Heckerman - Max Likelihood
- Spirtes, Conditional Independence
- Bayesian Models of Belief Nets
- T. Van Allen, R. Greiner and P. Hooper, Bayesian Error-Bars for
Belief Net Inference, Proc. Seventeenth Conference on Uncertainty in
Artificial Intelligence (UAI-01), Aug 2001.
- Heckerman, D., Geiger, D., and Chickering, D., Learning Bayesian
networks: The combination of knowledge and statistical data, Machine Learning,
20(3):197--243.
- N. Friedman and D. Koller, "Being Bayesian about Network Structure"
- Hidden Markov Models
- Probabilistic Relational Models
See PRM Home Page, in general...
Tutorial
-
N. Friedman, L. Getoor, D. Koller, and A. Pfeffer, Learning Probabilistic
Relational Models
- L. Getoor, N. Friedman, D. Koller, and A. Pfeffer, Learning
Probabilistic Relational Models (book chapter)
- L. Getoor, N. Friedman, D. Koller, and B. Taskar, Learning Probabilistic
Models of Relational Structure
- E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller, Rich
Probabilistic Models for Gene Expressions
- Online portfolio selection:
- Extensions to VC-dimension: bounds that depend on the "amount"
of separation achieved (aka the "margin"):
- Extensions to VC-dimension: bounds that depend on the input
distribution and "error shells":
- Extensions to VC-dimension: bounds based on metric entropy:
- Exploring an unknown environment:
- How to handle missing (training) data.
- D. Schuurmans and R. Greiner. Learning
default concepts, Proc. Tenth Canadian Conference on Artificial Intelligence
(CSCSI-94), 1994.
- R. Greiner, A. Grove, and A. Kogan,
Knowing what doesn't matter: Exploiting the omission of irrelevant
data, Artificial Intelligence, December 1997.
- Jordan/Jacobs, "EM and friends"
- On-Line Learning, vs Batch Learning
- Formal Analysis: Gold-style, PAC, Bayesian, ...
- Considering costs when performing
experiments:
- R. Greiner, A. Grove and D. Roth:
Learning Active Classifiers, Proc. Thirteenth International Conference
on Machine Learning (IMLC96), July, 1996.
- Turney, ... ICML
- Comparing different learners
- Neural Nets vs Decision Trees
- Comparing Ensembles of Learners
- Boosting vs Bagging
- Dealing with Error-Correcting Classifiers
- Reinforcement Learning, MDPs,
- Support Vector Machines (Vapnik),
Margins, ...
- Understanding Minimum Description
Length (NFL?)
- Inductive Logic Programming