Computing
Science 651: Probabilistic Graphical Models
Research
Project
* * * DRAFT * * *
Your class project is an opportunity
for you to explore an interesting multivariate analysis problem of your
choice, typically in the context of a real-world data set.
This should include
- a broad, thorough literature review, surveying the general topic;
  - Ex1: techniques for learning motifs in DNA
  - Ex2: ways to learn structure, given missing data
- a deeper discussion of some specific subtopic;
  - Ex1: using hidden Markov models to learn probabilistic motifs
  - Ex2: statistically motivated ways to model blocked attribute values
- an analysis of several systems for this task -- theoretical and/or empirical (preferably based on your implementation, or at least your runs on various data)
  - Ex1: an empirical comparison of several gene-finding tools, on novel datasets
  - Ex2: an empirical, and perhaps theoretical, analysis of several specific techniques, on novel data
As the examples above show, the project can involve either an "application pull" -- seeking ways to solve some specific problem (Ex1) -- or a "technology push" -- exploring ways of coping with some specific technical challenge (Ex2).
This investigation may begin by reading two or more recent, related papers from conferences/journals on artificial intelligence, where the papers are related by tackling the same problem but using different approaches; or by employing similar techniques to solve different problems; or because one is a follow-up to the other; or because they take opposing positions on some problem; etc.
See topics.
Evaluation Criteria.
Each project will involve
- Deciding on some specific topic
- Reading the relevant literature
- Implementing some of the ideas and gathering experimental data, and/or providing some new theoretical insights
- Writing an original 8-page paper
  - critically overviewing the literature
  - presenting your results
  [Due about two weeks after the last day of class.]
There are 3 ways to hand-in your final project report:
- Slip the hard-copy under my door (Ath 359)
- If it is in *.pdf, you may EMAIL the file to your coach
(me for the RGx teams; Dr M Brown for the MBx teams).
Please include "Cmput651 Project" in the subject line.
- You put a copy on a webpage, then EMAIL that URL to your coach.
Or you can do more than one of the above...
N.b., do NOT send me *.doc,... files!
See RequiredComponents
- Giving two presentations to the class (perhaps during lab time).
  - The first, around week#9 [~10/Mar], will summarize the area of your project -- giving a "lay of the land"; see Guide.
    Each member of the audience will fill out this Feedback form.
  - The second will summarize your results. It will be given during the last week of the semester. (Again, see Guide.)
    Each member of the audience will fill out this Feedback form.
  Each presentation will be 15 minutes: a 12-minute talk and 3 minutes for questions.
Your project will be evaluated based on
- Apparent effort.
- Clarity. Does your paper
demonstrate a good understanding and give a good analysis of the
underlying challenge? Is it well-written and
well-organized? Does it make good use of examples to illustrate
the problem and the solutions?
- Originality. Does
your paper show independent thought and consideration, identify any
unsolved problems and propose any initial solutions, or propose any new
approaches to existing problems?
75% | Content of Written Report | Understanding of basic idea; implementation; evaluation of ideas
15% | Form of Written Report | Clarity of presentation, ...
10% | Verbal Presentations, I + II | Conciseness, preparation, appropriate content
Required Components of WriteUp.
- Clear statement of the problem being addressed
- Motivation -- i.e., why is this problem interesting and challenging?
- Necessary background material, review of previous work, and limitations of previous work.
- Clear statement of the technical solutions used to solve the problem, how successful they were, and why they were successful.
  - Use clearly defined terms. It is fine to use intuitions to motivate an idea, but thereafter include precise statements of your claim(s), then support these claims with either theorems or meaningful empirical results.
- Identification of remaining problems and future research directions.
- In the form of a NIPS paper (8 pages maximum in NIPS format, including references; see NIPS Style).
  But DO include your name!
It should answer the questions:
What was the problem you worked on,
why is it important,
what did you learn, and
why are these results important.
Try to make your report EASY to read.
Format/Style
In a nutshell, your write-up tells a story, in a clear fashion. Your
paper
should work to establish some explicitly-stated "falsifiable
conclusion" (which should, of course, be related to learning...)
Every section, paragraph, figure, table, ... should contribute
to establishing this specific claim. Towards enforcing this, your first
section should include an overview, outlining the contents of the
paper. You may also want to begin each section with an overview,
indicating both what will be included here, and also connecting
this to the central theme. As an example, suppose you are claiming that
algX
works effectively at taskY (eg, algX=="Support Vector Machines",
taskY=="detecting
patterns in heart rhythms"). Here, it makes sense to describe algX, and
perhaps
its precursors, and to contrast algX with other related algorithms.
(Note
this contrast is typically in the form "algQ does BLAH; our algX
differs
by doing the subBLAH differently, our report proves that this is an
improvement";
etc.) You should also discuss the effects of changing the settings for
various
parameters. Similarly, you should precisely define taskY, and perhaps
contrast
it with other related tasks. You should then provide evidence to
establish
the claim -- either empirical or theoretical.
If your report contrasts algQ with algR, you should explain why that
is
relevant. Or if it digresses to consider some taskW, again explain why
this
is included. (If you simply want to include such analysis -- perhaps to
indicate
that you had read an article -- you may include it in an appendix,
possibly
labeled as "not completely irrelevant asides" :-) )
Your report should contain precisely-defined terms; do not be afraid
of
using mathematical notation! Similarly, if you use comparisons, be sure
to
specify the details; eg, state "algX is an improvement over algZ",
rather than just "algX is an improvement".
You should include simple illustrative examples! One that conveys the basic ideas will help the reader understand the various points.
Be sure to re-read your report! Imagine this topic was new to you...
would
you understand the material presented? You may assume your reader knows
only
material presented in Cmput651; if you use any other terms, be sure they
are
defined. You should also explain why you are including that term -- ie, how it relates to the overall theme of the paper.
Don't make your reader guess at your meanings!
Be sure to label figures/tables. (Eg, if you write "10%", is this
10% error,
or 10% accuracy?)
NOTES:
- REQUIREMENTS:
  Each report must include a simple, specific example: providing the I/O, showing how the output is related to the input, specifying the desired/achieved properties of the output, and illustrating the basic terms used.
  It must also include an outline of the paper (probably at the end of Section 1) that specifies the goals of the research and overviews the paper.
- There is often a large number of parameters that can be adjusted -- typically too many to exhaustively try every combination. Here, you should still explore the space. First, you can argue that some specific parameters appear largely irrelevant: consider a few specific settings of n-1 parameters and, for each such setting, vary the remaining parameter. If the result does not change for any of these settings, we can typically ignore that extra parameter. Second, you can in general look for correlations among the parameters. Etc. (A minimal sketch of this one-parameter-at-a-time style of exploration appears below.)
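For concreteness, here is a minimal sketch of the "hold the other parameters fixed, vary one" idea; run_experiment, the parameter names, and the candidate values are hypothetical placeholders for whatever your own system exposes.

    # Hedged sketch: one-parameter-at-a-time sensitivity check.
    # run_experiment() and all parameter names are hypothetical placeholders.

    def run_experiment(params):
        """Stand-in for your actual experiment; replace with code that
        trains/evaluates your system and returns a numeric score."""
        return 0.0   # dummy value so the sketch runs end-to-end

    # A few fixed settings for the other (n-1) parameters ...
    base_settings = [
        {"learning_rate": 0.1,  "max_iter": 100},
        {"learning_rate": 0.01, "max_iter": 500},
    ]
    # ... and a sweep over the single parameter under scrutiny.
    candidate_values = [2, 4, 8, 16]            # eg, number of hidden states

    for base in base_settings:
        scores = []
        for v in candidate_values:
            params = dict(base, n_states=v)     # vary only the parameter of interest
            scores.append(run_experiment(params))
        spread = max(scores) - min(scores)
        print(base, "-> scores:", scores, " spread:", spread)
        # If the spread is negligible for every base setting, the swept parameter
        # is probably irrelevant and can be fixed in later experiments.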
Logistics
- By default, each project will be done by a team of one or two students. Note that all members of a team will receive the same grade for their joint project.
Timing
(TimeTable)
- Decide on topics, and teams, by around the 5th week of term (~5/Oct).
  Each team must email me a one-page, PLAIN-TEXT document that provides:
  - Title of your project + proposed topic
  - Team members (also specify Undergrad/Grad and major, as well as email address)
  - What you plan to do, or avoid doing
  - Papers you plan to read/critique
  - Code you plan to write (or download)
  - Experiments you plan to perform
  - Rough outline of who will do what, and an estimate of time requirements
  - When your team can meet with me, for our bi-weekly meeting. (Suggest 3 times.)
    See my schedule for times to avoid.
- Bi-weekly meetings with me thereafter.
- If your project is an empirical study, see below.
- Presentation#1 will be around Week#8.
- Presentation#2 will be during the last week of the class.
- The final write-up (discussing empirical
results, etc etc etc) is due two weeks after the end of the course.
To hand in your reports... Hand me a hard-copy of your write-up (or put it in my mailbox, or under my door, or ...).
Create a webpage containing pointers to
- the write-up [eg, *.ps, *.pdf, or whatever]
- any other files you'd like me to have -- eg, data, algorithms, charts, ...
Be sure each is labeled appropriately.
wrt Empirical Studies
Many people are considering empirical studies -- eg, "application pulls".
Here, the learning challenge is
how to use some "experiences" to improve "performance" on
some "performance task".
Your proposal should therefore include the following information:
The Underlying Task
- a PERFORMANCE TASK [eg, playing poker, driving a car, ...]
(Notice this is independent of learning -- eg, one could build a
NON-LEARNING performance system that does this.)
- a well-defined, objectively-measured PERFORMANCE CRITERION, for evaluating a performance system
  [eg, % of hands won, or "average number of miles driven before an accident".
  Note: "kinda seems good" is not objective :-) ]
- the type of "EXPERIENCES" the learner will use, both the "instances" and how they are "labeled"
  [eg, the instances could be the poker hands played against a random opponent, or perhaps against a series of progressively cleverer opponents, or ...
  The "label" could perhaps be a numeric score indicating the quality of each action for a given hand (eg, a label of -1 for "raising" on 4h8d, or "0" for folding, etc), or it could be feedback after a SEQUENCE of actions (eg, this sequence of moves led to a lost game; or the plant was fined at time 7, or ...); or ...]
- how you will evaluate the LEARNER.
  Typically this will be in terms of a learning curve: after experiences S, learner L will produce the performance system L(S). We can then measure the true error rate of L(S) on new data. We can also consider, eg, how much L improved the performance, L(S) - L({}), or, more typically, measure the EXPECTED value of L(S) - L({}) over a range of training sets of size m = |S|.
  (A minimal sketch of estimating such a learning curve appears just after this list.)
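For concreteness, here is a hedged sketch of estimating a learning curve, ie the expected test error of L(S) as a function of the training-set size m = |S|. The decision-tree learner and the synthetic data (via scikit-learn) are only stand-ins for whatever learner and experiences your project actually uses.

    # Hedged sketch: estimate E[ error of L(S) ] over training sets of size m.
    # The learner and the data below are placeholders, not a prescribed choice.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_test, y_test = X[1500:], y[1500:]        # held-out data for the "true" error
    X_pool, y_pool = X[:1500], y[:1500]        # pool of training experiences S

    sizes = [25, 50, 100, 200, 400, 800]       # values of m = |S|
    n_trials = 20                              # average over many draws of S

    for m in sizes:
        errors = []
        for _ in range(n_trials):
            idx = rng.choice(len(X_pool), size=m, replace=False)
            model = DecisionTreeClassifier().fit(X_pool[idx], y_pool[idx])   # L(S)
            errors.append(np.mean(model.predict(X_test) != y_test))
        # The mean below estimates the expected error of L(S) at this m;
        # plotting it against m gives the learning curve.
        print(f"m={m:4d}   mean test error = {np.mean(errors):.3f}")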
Notice the LearningTask is independent of "implementation details":
- the actual type of performance system (eg, decision tree, rule set, neural net, ...); and
- the actual learner involved (eg, reinforcement learning, consistency filtering, backpropagation, ...)
This is intentional, as it means you can compare different learning algorithms over the same task. The other parts of the proposal should
- suggest the actual experiments that will be run, typically comparing different learners on some learning task (eg, Neural Nets vs Decision Trees, or Reinforcement Learning vs ILP, or ...)
- state what literature you plan to read, as well as logistic issues, etc etc etc.
NOTE: Just building a single learner for a learning task is typically not interesting; I am much more interested in claims of the form: LearningAlgorithm A1 did better than A2 at some task, in that A1's learning curve is steeper, or converges to a better result, or ...
Other comments (2008)
- Ask the prof if you need computational resources -- we typically have
access to a cluster or two... :-)
- Don't be afraid to use formulas!
- Think of how the material should be organized.
- If you don't use some part, you don't have to discuss it.
- Abstract != overview
The abstract should say what the results were
- If you are dealing with a learning task, be sure to distinguish the LEARNER from the CLASSIFIER.
  (A toy sketch of this distinction appears at the end of these notes.)
- Think of your target audience... provide the information they need to know, and only the information they do not already know.
- See Hints about preparing PowerPoint documents.
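To illustrate the LEARNER vs CLASSIFIER distinction mentioned above, here is a toy sketch (not taken from any course material); the one-dimensional threshold rule and the tiny data set are made up purely for illustration.

    # Toy sketch: a LEARNER maps a training set to a CLASSIFIER;
    # a CLASSIFIER maps a single instance to a label.  All names/data are made up.
    from typing import Callable, List, Tuple

    Instance = float
    Label = int
    Classifier = Callable[[Instance], Label]

    def learner(train: List[Tuple[Instance, Label]]) -> Classifier:
        """LEARNER: consumes experiences, produces a performance system."""
        pos = [x for x, y in train if y == 1]
        neg = [x for x, y in train if y == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2   # toy rule

        def classifier(x: Instance) -> Label:
            """CLASSIFIER: the learned system; no learning happens here."""
            return 1 if x > threshold else 0

        return classifier

    clf = learner([(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)])   # run the LEARNER once
    print(clf(0.3), clf(0.7))                  # apply the CLASSIFIER to new instances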