Computing
Science 651: Probabilistic Graphical Models
Research
Project
* * * DRAFT * * *
Your class project is an opportunity
for you to explore an interesting multivariate analysis problem of your
choice, typically in the context of a real-world data set.
This should include
- a broad, thorough literature review, surveying the general topic;
  - Ex1: techniques for learning motifs in DNA
  - Ex2: ways to learn structure, given missing data
- a deeper discussion of some specific subtopic;
  - Ex1: using hidden Markov models to learn probabilistic motifs
  - Ex2: statistically motivated ways to model blocked attribute values
- an analysis of several systems for this task -- theoretical and/or empirical (preferably based on your implementation, or at least your runs on various data)
  - Ex1: an empirical comparison of several gene-finding tools, on novel datasets
  - Ex2: an empirical, and perhaps theoretical, analysis of several specific techniques, on novel data
As the examples above show, the project can involve either an "application pull" -- seeking ways to solve some specific problem (Ex1) -- or a "technology push" -- exploring ways of coping with some specific technical challenge (Ex2).
This investigation may begin by reading two or more recent, related papers from conferences/journals on artificial intelligence, where the papers are related by tackling the same problem but using different approaches; or by employing similar techniques to solve different problems; or because one is a follow-up to the other; or because they take opposing positions on some problem; etc.
See topics.
Evaluation Criteria.
Each project will involve
- Deciding on some specific topic
- Reading the relevant literature
- Implementing some of the ideas and gathering experimental data, and/or providing some new theoretical insights
- Writing an original 8-page paper
  - critically overviewing the literature
  - presenting your results
  [Due about two weeks after the last day of class.]
There are 3 ways to hand-in your final project report:
- Slip the hard-copy under my door (Ath 359)
- If it is in *.pdf, you may EMAIL the file to your coach
(me for the RGx teams; Dr M Brown for the MBx teams).
Please include "Cmput651 Project" in the subject line.
- You put a copy on a webpage, then EMAIL that URL to your coach.
Or you can do more than one of the above...
N.b., do NOT send me *.doc,... files!
See RequiredComponents
- Giving two presentations to the class (perhaps during lab time).
  - The first, around week#9 [~10/Mar], will summarize the area of your project -- giving a "lay of the land"; see Guide.
    Each member of the audience will fill out this Feedback form.
  - The second will summarize your results. It will be given during the last week of the semester. (Again, see Guide.)
    Each member of the audience will fill out this Feedback form.
  Each presentation will be 15 minutes: a 12-minute talk and 3 minutes for questions.
Your project will be evaluated based on
- Apparent effort.
- Clarity. Does your paper
demonstrate a good understanding and give a good analysis of the
underlying challenge? Is it well-written and
well-organized? Does it make good use of examples to illustrate
the problem and the solutions?
- Originality. Does
your paper show independent thought and consideration, identify any
unsolved problems and propose any initial solutions, or propose any new
approaches to existing problems?
75% | Content of Written Report | Understanding of basic idea; implementation; evaluation of ideas
15% | Form of Written Report | Clarity of presentation, ...
10% | Verbal Presentations, I + II | Conciseness, preparation, appropriate content
Required Components of WriteUp.
- Clear statement of the problem being addressed
- Motivation -- i.e., why is this problem interesting and challenging?
- Necessary background material, review of previous work, and limitations of previous work.
- Clear statement of the technical solutions used to solve the problem, how successful they were, and why they were successful.
  - Use clearly defined terms. It is fine to use intuitions to motivate an idea, but thereafter include precise statements of your claim(s), then support these claims with either theorems or meaningful empirical results.
- Identification of remaining problems and future research directions.
- In the form of a NIPS paper (8 pages maximum in NIPS format, including references; see NIPS Style).
  But DO include your name!
It should answer the questions:
What was the problem you worked on,
why is it important,
what did you learn, and
why are these results important.
Try to make your report EASY to read.
Format/Style
In a nutshell, your write-up tells a story, in a clear fashion. Your
paper
should work to establish some explicitly-stated "falsifiable
conclusion" (which should, of course, be related to learning...)
Every section, paragraph, figure, table, ... should contribute
to establishing this specific claim. Towards enforcing this, your first
section should include an overview, outlining the contents of the
paper. You may also want to begin each section with an overview,
indicating both what will be included here, and also connecting
this to the central theme. As an example, suppose you are claiming that
algX
works effectively at taskY (eg, algX=="Support Vector Machines",
taskY=="detecting
patterns in heart rhythms"). Here, it makes sense to describe algX, and
perhaps
its precursors, and to contrast algX with other related algorithms.
(Note
this contrast is typically in the form "algQ does BLAH; our algX
differs
by doing the subBLAH differently, our report proves that this is an
improvement";
etc.) You should also discuss the effects of changing the settings for
various
parameters. Similarly, you should precisely define taskY, and perhaps
contrast
it with other related tasks. You should then provide evidence to
establish
the claim -- either empirical or theoretical.
If your report contrasts algQ with algR, you should explain why that
is
relevant. Or if it digresses to consider some taskW, again explain why
this
is included. (If you simply want to include such analysis -- perhaps to
indicate
that you had read an article -- you may include it in an appendix,
possibly
labeled as "not completely irrelevant asides" :-) )
Your report should contain precisely-defined terms; do not be afraid
of
using mathematical notation! Similarly, if you use comparisons, be sure
to
specify the details; eg, state "algX is an improvement over algZ",
rather than just "algX is an improvement".
You should include simple illustrative examples! One that conveys the basic ideas will help the reader understand the various points.
Be sure to re-read your report! Imagine this topic was new to you...
would
you understand the material presented? You may assume your reader knows
only
material presented in Cmput651; if you use any other terms, be sure they
are
defined. You should also explain why you are including that term -- ie, how it relates to the overall theme of the paper.
Don't make your reader guess at your meanings!
Be sure to label figures/tables. (Eg, if you write "10%", is this
10% error,
or 10% accuracy?)
NOTES:
- REQUIREMENTS:
  Each report must include a simple, specific example: providing the I/O, showing how the output is related to the input, specifying the desired/achieved properties of the output, and illustrating the basic terms used.
  It must also include an outline of the paper (probably at the end of Section 1) that specifies the goals of the research and overviews the paper.
- There is often a large number of parameters that can be adjusted -- typically too many to exhaustively try every combination. Here, you should still explore the space. First, you can argue that some specific parameters appear largely irrelevant: consider a few specific settings of n-1 parameters and, for each such setting, vary the remaining parameter. If the result does not change for any of these settings, we can typically ignore that extra parameter. Second, you can in general look for correlations among the parameters. Etc. (A minimal sketch of this one-parameter-at-a-time style of exploration appears below.)
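For concreteness, here is a minimal sketch of the "hold the other parameters fixed, vary one" idea; run_experiment, the parameter names, and the candidate values are hypothetical placeholders for whatever your own system exposes.

    # Hedged sketch: one-parameter-at-a-time sensitivity check.
    # run_experiment() and all parameter names are hypothetical placeholders.

    def run_experiment(params):
        """Stand-in for your actual experiment; replace with code that
        trains/evaluates your system and returns a numeric score."""
        return 0.0   # dummy value so the sketch runs end-to-end

    # A few fixed settings for the other (n-1) parameters ...
    base_settings = [
        {"learning_rate": 0.1,  "max_iter": 100},
        {"learning_rate": 0.01, "max_iter": 500},
    ]
    # ... and a sweep over the single parameter under scrutiny.
    candidate_values = [2, 4, 8, 16]            # eg, number of hidden states

    for base in base_settings:
        scores = []
        for v in candidate_values:
            params = dict(base, n_states=v)     # vary only the parameter of interest
            scores.append(run_experiment(params))
        spread = max(scores) - min(scores)
        print(base, "-> scores:", scores, " spread:", spread)
        # If the spread is negligible for every base setting, the swept parameter
        # is probably irrelevant and can be fixed in later experiments.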
Logistics
- By default, each project will be done by a team of one or two students. Note that all members of a team will receive the same grade for their joint project.
Timing
(TimeTable)
- Decide on topics, and teams, by around the 5th week of term (~5/Oct).
  Each team must email me a one-page, PLAIN-TEXT document that provides:
  - Title of your project + proposed topic
  - Team members (also specify Undergrad/Grad and major, as well as email address)
  - What you plan to do, or avoid doing
  - Papers you plan to read/critique
  - Code you plan to write (or download)
  - Experiments you plan to perform
  - Rough outline of who will do what, and an estimate of time requirements
  - When your team can meet with me, for our bi-weekly meeting. (Suggest 3 times.)
    See my schedule for times to avoid.
- Bi-weekly meetings with me thereafter.
- If your project is an empirical study, see below.
- Presentation#1 will be around Week#8.
- Presentation#2 will be during the last week of the class.
- The final write-up (discussing empirical
results, etc etc etc) is due two weeks after the end of the course.
To hand in your reports... Hand me a hard-copy of your write-up (or put it in my mailbox, or under my door, or ...).
Create a webpage containing pointers to
- the write-up [eg, *.ps, *.pdf, or whatever]
- any other files you'd like me to have -- eg, data, algorithms, charts, ...
Be sure each is labeled appropriately.
wrt Empirical Studies
Many people are considering empirical studies -- eg, "application pulls".
Here, the learning challenge is
how to use some "experiences" to improve "performance" on
some "performance task".
Your proposal should therefore include the following information:
The Underlying Task
- a PERFORMANCE TASK [eg, playing poker, driving a car, ...]
(Notice this is independent of learning -- eg, one could build a
NON-LEARNING performance system that does this.)
- a well-defined, objectively-measured PERFORMANCE CRITERION, for evaluating a performance system
  [eg, % of hands won, or "average number of miles driven before an accident".
  Note: "kinda seems good" is not objective :-) ]
- the type of "EXPERIENCES" the learner will use, both the "instances" and how they are "labeled"
  [eg, the instances could be the poker hands played against a random opponent, or perhaps against a series of progressively cleverer opponents, or ...
  The "label" could perhaps be a numeric score indicating the quality of each action for a given hand (eg, a label of -1 for "raising" on 4h8d, or "0" for folding, etc), or it could be feedback after a SEQUENCE of actions (eg, this sequence of moves led to a lost game; or the plant was fined at time 7, or ...); or ...]
- how you will evaluate the LEARNER.
  Typically this will be in terms of a learning curve: after experiences S, learner L will produce the performance system L(S). We can then measure the true error rate of L(S) on new data. We can also consider, eg, how much L improved the performance, L(S) - L({}), or, more typically, measure the EXPECTED value of L(S) - L({}) over a range of training sets of size m = |S|.
  (A minimal sketch of estimating such a learning curve appears just after this list.)
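For concreteness, here is a hedged sketch of estimating a learning curve, ie the expected test error of L(S) as a function of the training-set size m = |S|. The decision-tree learner and the synthetic data (via scikit-learn) are only stand-ins for whatever learner and experiences your project actually uses.

    # Hedged sketch: estimate E[ error of L(S) ] over training sets of size m.
    # The learner and the data below are placeholders, not a prescribed choice.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_test, y_test = X[1500:], y[1500:]        # held-out data for the "true" error
    X_pool, y_pool = X[:1500], y[:1500]        # pool of training experiences S

    sizes = [25, 50, 100, 200, 400, 800]       # values of m = |S|
    n_trials = 20                              # average over many draws of S

    for m in sizes:
        errors = []
        for _ in range(n_trials):
            idx = rng.choice(len(X_pool), size=m, replace=False)
            model = DecisionTreeClassifier().fit(X_pool[idx], y_pool[idx])   # L(S)
            errors.append(np.mean(model.predict(X_test) != y_test))
        # The mean below estimates the expected error of L(S) at this m;
        # plotting it against m gives the learning curve.
        print(f"m={m:4d}   mean test error = {np.mean(errors):.3f}")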
Notice the LearningTask is independent of "implementation details":
- the actual type of performance system (eg, decision tree, rule set, neural net, ...); and
- the actual learner involved (eg, reinforcement learning, consistency filtering, backpropagation, ...)
This is intentional, as it means you can compare different learning algorithms over the same task. The other parts of the proposal should
- suggest the actual experiments that will be run, typically comparing different learners on some learning task (eg, Neural Nets vs Decision Trees, or Reinforcement Learning vs ILP, or ...)
- state what literature you plan to read, as well as logistic issues, etc etc etc.
NOTE: Just building a single learner for a learning task is typically not interesting; I am much more interested in claims of the form: LearningAlgorithm A1 did better than A2 at some task, in that A1's learning curve is steeper, or converges to a better result, or ...
Other comments (2008)
- Ask the prof if you need computational resources -- we typically have
access to a cluster or two... :-)
- Don't be afraid to use formulas!
- Think of how the material should be organized.
- If you don't use some part, you don't have to discuss it.
- Abstract != overview
The abstract should say what the results were
- If you are dealing with a learning task, be sure to distinguish the LEARNER from the CLASSIFIER.
  (A toy sketch of this distinction appears at the end of these notes.)
- Think of your target audience... provide the information they need to know, and only the information they do not already know.
- See Hints about preparing PowerPoint documents.
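To illustrate the LEARNER vs CLASSIFIER distinction mentioned above, here is a toy sketch (not taken from any course material); the one-dimensional threshold rule and the tiny data set are made up purely for illustration.

    # Toy sketch: a LEARNER maps a training set to a CLASSIFIER;
    # a CLASSIFIER maps a single instance to a label.  All names/data are made up.
    from typing import Callable, List, Tuple

    Instance = float
    Label = int
    Classifier = Callable[[Instance], Label]

    def learner(train: List[Tuple[Instance, Label]]) -> Classifier:
        """LEARNER: consumes experiences, produces a performance system."""
        pos = [x for x, y in train if y == 1]
        neg = [x for x, y in train if y == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2   # toy rule

        def classifier(x: Instance) -> Label:
            """CLASSIFIER: the learned system; no learning happens here."""
            return 1 if x > threshold else 0

        return classifier

    clf = learner([(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)])   # run the LEARNER once
    print(clf(0.3), clf(0.7))                  # apply the CLASSIFIER to new instances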