List of proposed Projects (Winter 2007):

Project | May Lead to MSc | Taken by | Brief Description
Conference Ranking | Yes | | Conferences are currently ranked subjectively by reputation. The project consists of devising a metric to rank conferences based on the citations of their papers, the reputation of the authors publishing in them, etc. This information could be extracted from scholar.google or CiteSeer, in addition to the public database DBLP.
Plan Protein Localization | Yes | | This project consists of using an associative classifier to build a model that identifies whether a protein is intracellular or extracellular, and comparing the accuracy of the classifier with results obtained with SVM and AdaBoost. In addition to features such as protein composition and frequent subsequences, the location of frequent subsequences should also be considered in the model, and the accuracy compared again.
Asymmetric Parallel Frequent Itemset Mining | Yes | Scott B. | Typically, a parallel data mining program runs the same program on all processors. This project consists of identifying features of the transactional data that would indicate the most appropriate algorithm to run on each processor, given its data partition.
Emerging Sequences | Yes | Pouria P. | Comparing sets of sequences is relevant to many applications. This project consists of implementing algorithms to identify contrasting sequences among sets of sequences.
Mammography Classification | Yes | | An associative classifier should be used to classify mammograms into cancerous and normal cases. An interface that receives mammograms and simulates the diagnosis is to be built.
Feature Space Conversion for SVM | Possible | | The associative classifier can be used to transform the feature space of a training set. Would this transformation help SVM in its prediction? The project is to test this hypothesis.
Question-Answer Recommender System | Yes | | In the context of a search engine, one could send a question instead of keywords. The idea is to expand the question, retrieve pertinent documents using search engines, then summarise these resources and provide the concise summaries as recommended answers to the original question.
Spatial Outlier Detection with Constraints | Yes | Jeff B. & John M. | DBCluC is a variation of DBSCAN that clusters data while considering constraints. Can we make a similar change to the definitions used in LOF for outlier detection, in order to discover spatial outliers given physical constraints?
Classification Explanation Generation for Associative Classifiers | Possible | | When a classifier predicts a class for a new object, a straightforward question is "how was the conclusion drawn?" The project is to build a UI that would explain the results of an associative classifier.
Interface for Constraint-based Frequent Itemset Mining | Possible | Ananth V. | The project is to build a user interface allowing the expression of constraints to be pushed into frequent itemset mining.
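
As a rough illustration of what "pushing" a constraint into frequent itemset mining can mean (the last project in the list), the sketch below runs a small level-wise, Apriori-style miner and checks an anti-monotone constraint during candidate generation rather than filtering results afterwards. Everything in it is an assumption made for the example: the function name frequent_itemsets, the item prices, the toy transactions, and the "total price at most 5" constraint are hypothetical, and the project itself does not prescribe a particular algorithm or constraint language.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, constraint=lambda itemset: True):
    """Level-wise mining with an anti-monotone constraint pushed into candidate generation.

    The constraint is assumed to be anti-monotone: if an itemset violates it,
    every superset violates it too, so the itemset can be dropped early.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})

    # Level 1: keep only single items that already satisfy the constraint.
    level = [frozenset([item]) for item in items if constraint(frozenset([item]))]
    result = {}

    while level:
        # Count the support of each candidate at the current level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)

        # Join frequent itemsets of size k-1 into candidates of size k.
        k = len(next(iter(level))) + 1
        candidates = set()
        for a, b in combinations(frequent, 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune if any (k-1)-subset is infrequent or the constraint fails.
            subsets_ok = all(frozenset(s) in frequent for s in combinations(union, k - 1))
            if subsets_ok and constraint(union):
                candidates.add(union)
        level = list(candidates)

    return result


# Hypothetical toy data: item prices and four market-basket transactions.
PRICES = {"bread": 2, "milk": 1, "beer": 6, "chips": 3}
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk", "beer"},
    {"milk", "beer", "chips"},
    {"bread", "milk", "chips"},
]


def cheap(itemset):
    """Hypothetical anti-monotone constraint: total price of the itemset is at most 5."""
    return sum(PRICES[item] for item in itemset) <= 5


found = frequent_itemsets(TRANSACTIONS, min_support=2, constraint=cheap)
for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With this toy data, "beer" is discarded at the first level because its price alone exceeds the budget, so no itemset containing it is ever counted. A user interface of the kind proposed would presumably let the user express a predicate like cheap without writing code.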