Computing Science 466/551
Introduction to Machine Learning
Information about Research Projects
For your project, you should investigate some
interesting aspect of machine learning. This should include
- a broad, thorough literature review, giving an overview of the general topic;
- Ex1: techniques for learning motifs in
DNA
- Ex2: ways to cope with missing data
- a deeper discussion of some specific
subtopic
- Ex1: using hidden Markov models to learn
probabilistic motifs
- Ex2: statistically motivated ways to
handle independently blocked attribute values
- an analysis of several systems for this task -- theoretical
and/or empirical (preferably based on your implementation, or at least your
runs on various data)
- Ex1: an empirical comparison of several gene-finding tools, on novel
datasets
- Ex2: an empirical, and perhaps theoretical, analysis of several specific
techniques, on novel data
As the examples above show, the project can
involve either an "application pull" -- seeking ways to solve some specific
problem (Ex1); or a "technology push" -- exploring ways of coping with some
specific technical challenge (Ex2).
Note that this investigation may begin by
reading two or more recent, related papers from conferences/journals on artificial
intelligence, where the papers are related because they tackle the same problem
using different approaches; or because they employ similar techniques to solve different
problems; or because one is a follow-up to the other; or because they take opposing
points of view on some problem; etc.
Evaluation Criteria.
Your project will be evaluated based on
- Apparent effort.
- Clarity. Does your paper demonstrate
a good understanding and give a good analysis of the underlying challenge?
Is it well-written and well-organized? Does it make good use of examples
to illustrate the problem and the solutions?
- Originality. Does your paper
show independent thought: does it identify unsolved problems
and propose initial solutions, or propose new approaches to existing
problems?
Rough Guidelines
75% | Content of Written Report | Understanding of basic idea; implementation; evaluation of ideas
15% | Form of Written Report | Clarity of presentation, ...
10% | Verbal Presentations, I + II | Conciseness, preparation, appropriate content
Required Components of Write-Up.
- Clear statement of problem being addressed
- Motivation -- i.e., Why is this
problem interesting and challenging?
- Necessary background material, review
of previous work and limitations of previous work.
- Clear statement of technical solutions
used to solve the problem, how successful they were, and why they were successful.
- use clearly defined terms. It is fine to use intuitions to motivate
an idea, but thereafter include precise statements of your claim(s),
then support these claims with either theorems or meaningful empirical results.
- Identification of remaining problems
and future research directions.
- in the form of a NIPS paper
(8 pages maximum in NIPS format, including references;
see NIPS Style)
But DO include your name!
It should answer the questions:
What was the problem you worked on,
why is it important,
what did you learn, and
why are these results important?
Try to make your report EASY to read.
Format/Style
In a nutshell, your write-up tells a story, in a clear fashion. Your paper
should work to establish some explicitly-stated "falsifiable conclusion"
(which should, of course, be related to learning...) Every section, paragraph,
figure, table, ... should contribute to establishing this specific
claim. Towards enforcing this, your first section should include an overview,
outlining the contents of the paper. You may also want to begin each section
with an overview, indicating both what will be included here, and also connecting
this to the central theme. As an example, suppose you are claiming that algX
works effectively at taskY (eg, algX=="Support Vector Machines", taskY=="detecting
patterns in heart rhythms"). Here, it makes sense to describe algX, and perhaps
its precursors, and to contrast algX with other related algorithms. (Note
this contrast is typically in the form "algQ does BLAH; our algX differs
by doing the subBLAH differently, our report proves that this is an improvement";
etc.) You should also discuss the effects of changing the settings for various
parameters. Similarly, you should precisely define taskY, and perhaps contrast
it with other related tasks. You should then provide evidence to establish
the claim -- either empirical or theoretical.
If your report contrasts algQ with algR, you should explain why that is
relevant. Or if it digresses to consider some taskW, again explain why this
is included. (If you simply want to include such analysis -- perhaps to indicate
that you had read an article -- you may include it in an appendix, possibly
labeled as "not completely irrelevant asides" :-) )
Your report should contain precisely-defined terms; do not be afraid of
using mathematical notation! Similarly, if you use comparisons, be sure to
specify the details; eg, state "algX is an improvement over algZ", rather
than just "algX is an improvement".
You should include simple illustrative examples! One that conveys the basic
ideas, to help the reader understand the various points.
Be sure to re-read your report! Imagine this topic was new to you... would
you understand the material presented? You may assume your reader knows only
material presented in 466/551; if you use any other terms, be sure they are
defined. You should also explain why you are including that term -- ie, how
does it relate to the overall theme of the paper.
Don't make your reader guess at your meanings!
Be sure to label figures/tables. (Eg, if you write "10%", is this 10% error,
or 10% accuracy?)
NOTES:
- REQUIREMENTS:
Each report must include a simple, specific example that provides the I/O, shows
how the output is related to the input, specifies the desired/achieved properties
of the output, and illustrates the basic terms used.
It must also include an outline of the paper (probably at the end of Section 1)
that specifies the goals of the research and gives an overview of the paper.
- There is often a large number of parameters that can be adjusted; typically too
many to exhaustively try every combination. Here, you should still explore
the space. First, you can argue that some specific parameters appear largely
irrelevant -- here, by considering a few specific settings for n-1 parameters,
and for each such setting, varying the remaining parameter. If the result
for each setting does not change, we can typically ignore that extra parameter.
Second, you can in general look for correlations among the parameters. Etc.
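The one-at-a-time strategy above can be sketched in code. This is a minimal, hypothetical harness: the parameter names and the `train_and_score` function are made up, standing in for your actual learner and evaluation.

```python
# Hypothetical sketch of the one-at-a-time parameter check described above:
# fix a few settings of the other parameters and vary one parameter at a time.
import itertools

def train_and_score(lr, depth, reg):
    # Placeholder score; replace with "train the learner, measure performance".
    # Note it happens to ignore `lr` entirely, so `lr` will look irrelevant.
    return round(0.80 + 0.05 * (depth > 2) + 0.001 * reg, 3)

grid = {"lr": [0.01, 0.1], "depth": [2, 4], "reg": [0.0, 1.0]}

def sweep(param, grid):
    """For each fixed setting of the other n-1 parameters, vary `param`."""
    others = {k: v for k, v in grid.items() if k != param}
    results = {}
    for combo in itertools.product(*others.values()):
        fixed = dict(zip(others.keys(), combo))
        results[combo] = [train_and_score(**fixed, **{param: v})
                          for v in grid[param]]
    return results

# If the scores barely change as `param` varies, for every fixed setting of
# the others, that parameter is likely irrelevant and can be dropped from
# the exhaustive search.
res = sweep("lr", grid)
```

Here `res` maps each fixed setting of the other parameters to the scores obtained as the swept parameter varies; near-constant score lists are the signal that the parameter can be ignored.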
Logistics
- I encourage people to work in teams of
three or four; note that all Grads in a team will receive the same grade,
as will all Undergrads.
- You should ALSO hand in a 9th page that quickly summarizes who did
what for this project -- wrt conceptualizing the problem, implementing the algorithms,
running experiments, proving theorems, writing the final document, preparing
the presentations, etc etc etc.
This can be a simple table (perhaps summarizing which team member did what percent of each -- in round figures!)
or whatever is easiest for your team.
Timing
- Decide on topics, and teams, by around 5th week of term (~5/Feb; ~6/Oct).
Each team must email to
me
a one-page, PLAIN-TEXT
document
that provides:
- Title of your project
+ Proposed topic
- Team members (also specify Undergrad/Grad, and major, as well as email address)
- What you plan to do, or avoid doing
- Papers you plan to read/critique
- Code you plan to write (or download)
- Experiments you plan to perform
- Rough outline of who will do what; and
estimate of time requirements
- When your team can meet with me, for
our bi-weekly meeting. (Suggest 3 times.)
See my schedule
for times to avoid.
- Bi-weekly meetings with me thereafter.
Issues for First Meeting [week of 11-15/Feb; 12-16/Oct]
- Make sure the task and evaluation criteria are both well-defined
- If the task involves data:
Do you have it? How will you get it?
- Do you have the resources you need?
- General questions?
Issues for Second Meeting [week of 3-7/Mar; 26-30/Oct]
- Follow-up on "Issues for First Meeting":
- Well defined task, data, resources, ...
- Prepare for Presentation#1: Lay of the Land
- General questions?
Issues for Third Meeting [week of 17-24/Mar; 9-13/Nov]
- Continue preparing for Presentation#1
- General questions?
Issues for Fourth Meeting [week of ~1/Apr; 23-27/Nov]
- Begin preparation for Presentation#2
- General questions?
- If your project is an empirical study, see below.
- Presentation#1
will be around Week#8.
- Presentation#2 will be during the last
week of the class.
- The final write-up (discussing empirical
results, etc etc etc) is due two weeks after the end of the course.
To hand-in your reports...
Hand me a hard-copy of your write-up (or
put it in my mailbox, or under my door, or ...)
Create a webpage containing pointers to
- the write-up [eg, *.ps, *.pdf, *.doc,
or whatever]
- any other files you'd like me to have
eg, data, algorithms, charts, ...
Be sure each is labeled appropriately.
Note "PLAIN-TEXT" means just regular text,
which is NOT *.doc, NOT *.rtf, ...
Also: this plain-text should be email-ed to me; I do not want just
a hard-copy.
wrt Empirical Studies
Many people are considering empirical studies -- eg, "application pulls".
Here, the learning challenge is
how to use some "experiences" to improve "performance" on some
"performance task".
Your proposal should therefore include the following information:
The Learning Task
- a PERFORMANCE TASK [eg, playing poker, driving a car, ...]
Notice this is independent of learning -- eg, one could build a NON-LEARNING
performance system that does this.
- a well-defined, objectively-measured PERFORMANCE CRITERION, for evaluating a
performance system
[eg, % of hands won, or "average number of miles driven before accident".
Note: "kinda seems good" is not objective :-) ]
- the type of "EXPERIENCES" the learner will use, both the "instances" and how
they are "labeled"
[eg, the instances could be the poker hands played against a random opponent,
or perhaps against a series of progressively cleverer opponents, or ... The
"label" could perhaps be a numeric score indicating the quality of each action
for a given hand (eg, a label of -1 for "raising" on 4h8d, or "0" for folding,
etc) or it could be feedback after a SEQUENCE of actions (eg, this sequence
of moves led to a lost game; or the plant was fined at time 7, or ...); or
... ]
- how you will evaluate the LEARNER
Typically this will be in terms of a learning curve: after experiences S,
learner L will produce the performance system L(S). We can then measure the
true error rate for L(S) on new data. We can also consider, eg, how much L
improved the performance, L(S) - L({}), or typically measure the EXPECTED value
of L(S) - L({}) over a range of training sets of size m = |S|.
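The learning-curve measurement above can be sketched concretely. The toy data and learner here are made up purely for illustration; a real study would substitute your actual learner and held-out data.

```python
# Hypothetical sketch of a learning curve: train learner L on growing prefixes
# of the experience set S, and measure the error of the resulting performance
# system L(S) on held-out data.

# Toy data: the label depends only on x mod 10 (1 when x % 10 < 7, else 0).
data = [(i, 1 if i % 10 < 7 else 0) for i in range(200)]
train, test = data[:100], data[100:]

def learner(experiences):
    """Toy learner L: memorizes the label seen for each value of x mod 10."""
    table = {x % 10: y for x, y in experiences}
    return lambda x: table.get(x % 10, 0)  # the performance system L(S)

def error_rate(classifier, examples):
    return sum(classifier(x) != y for x, y in examples) / len(examples)

# Learning curve: true error of L(S) as |S| = m grows.
curve = [(m, error_rate(learner(train[:m]), test))
         for m in (5, 10, 25, 50, 100)]
```

Plotting `curve` (error against m) gives the learning curve; comparing the curves of two learners on the same task shows which one learns faster or converges to a better result.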
You should also include statistical tests to show whether your claims are
statistically valid.
Here, be sure to explicitly state which test you ran, and the confidence
--- as in "... this result is significant based on a paired t-test, p < 0.05."
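A paired t-test of this kind can be sketched as follows. The per-fold accuracies are made-up numbers, and the computation is done by hand to make it explicit; in practice `scipy.stats.ttest_rel` performs the same test and also reports the exact p-value.

```python
# Hypothetical paired t-test on per-fold accuracies of two learners A1 and A2,
# evaluated on the SAME 10 cross-validation folds (accuracies are made up).
from math import sqrt
from statistics import mean, stdev

a1 = [0.81, 0.84, 0.79, 0.85, 0.83, 0.80, 0.86, 0.82, 0.84, 0.81]
a2 = [0.78, 0.80, 0.77, 0.81, 0.79, 0.78, 0.83, 0.79, 0.80, 0.78]

# Paired test: work with the per-fold differences.
diffs = [x - y for x, y in zip(a1, a2)]
t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Two-tailed critical value of Student's t at p = 0.05 with df = 9.
T_CRIT = 2.262
significant = abs(t) > T_CRIT
```

Pairing by fold matters: it removes the fold-to-fold variation that both learners share, so the test is on the differences rather than on two independent samples.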
Notice the LearningTask is independent of "implementation details":
- the actual type of performance system (eg, decision tree, rule set,
neural net, ...); and
- the actual learner involved (eg, reinforcement learning, consistency
filtering, backpropagation, ...)
This is intentional, as it means you can compare different learning algorithms
over the same task. The other parts of the proposal should
- suggest the actual experiments that will be run, typically comparing
different learners for some learning task. (Eg, Neural Nets vs Decision Trees,
or Reinforcement learning vs ILP, or ...)
- describe the literature you plan to read, as well as logistical issues, etc.
NOTE: Just building a single learner for a learning task is typically not
interesting; I am much more interested in claims of the form LearningAlgorithm
A1 did better than A2 at some task, in that A1's learning curve is steeper,
or converges to a better result, or ...
Other comments
- Don't be afraid to use formulas!
- Think of how the material should be organized.
- Be sure to distinguish LEARNER from CLASSIFIER
- If you don't use some aspect, you don't have to discuss it!
- Abstract != overview
The abstract should say what the results are!
- Think of your target audience...
provide the information that they need to know,
and only information that they do not already know.
- See Hints about preparing PowerPoint documents.