Russell Greiner: Presentations

## Presentation Abstracts (R Greiner)- Budgeted Learning of Effective Classifiers
- Patient-Specific Survival Prediction
- Explaining the Gene Signature Anomaly: Estimating the Overlap of Two Ranked Lists
- WebIC: An Effective "Complete-Web" Recommender System
- Proteome Analyst: A Web-based tool for Predicting Properties of Proteins
- Using Machine Learning to Predict Protein Roles in Signaling Pathways
- A Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers
- Quantifying the Uncertainty of a Belief Net Response
Tutorials (not research material... ):
- Introduction to Machine Learning
- Towards Patient-Specific Treatment: Medical Applications of Machine Learning
- Introduction to Bayesian Belief Nets
- How to Write Effectively
- How to Prepare and Deliver Platform and Poster Presentations
Recent Presentations
## Budgeted Learning of Effective ClassifiersResearchers often use clinical trials to collect the data needed to evaluate some hypothesis, or produce a classifier. During this process, they have to pay the cost of performing each test. Many studies will run a comprehensive battery of tests on each subject, for as many subjects as their budget will allow -- ie, "round robin" (RR). We consider a more general model, where the researcher can sequentially decide which single test to perform on which specific individual; again subject to spending only the available funds. Our goal here is to use these funds most effectively, to collect the data that allows us to learn the most accurate classifier. We first explore the simplified
Joint work with Aloak Kapoor, Dan Lizotte and Omid Madani.
## Patient-Specific Survival Prediction: An Effective Way to Estimate the Survival Distributions of Individual Patients
An accurate model of a patient’s individual survival distribution
can help determine the appropriate treatment and care of terminal patients.
The common practice of estimating such survival distributions uses only
population averages for (say) the site and stage of cancer;
however, this is not very precise,
as it ignores many important individual differences among patients.
This paper describes a novel technique, PSSP (patient-specific survival prediction),
for estimating a patient’s Joint work with Chun-Nam Yu, Vickie Baracos,
Roman Eisner,
Ping Jin,
Bret Hoehn,
Karim Abou-Moustafa,
Haipeng Li
## Explaining the Gene Signature Anomaly: Estimating the Overlap of Two Ranked Lists
Recent advances in high-throughput technologies, such as genome-wide SNP analysis and microarray gene expression profiling,
have led to a multitude of ranked lists,
where the features (SNPs, genes) are sorted based on their individual correlation with a phenotype.
Many studies will then return the top This paper formally investigates the overlap of the top ranked features in two lists whose elements are ranked by their respective Pearson correlation coefficients with the phenotype outcome. We show that our model is able to accurately predict the expected overlap between two ranked lists, based on reasonable assumptions. This finding explains why the overlap in a pair of gene signatures (for same phenotype) should be relatively small, for purely statistical reasons, given today's typical sample size. Joint work with Babak Damavandi,
Chun-Nam Yu
## WebIC: An Effective "Complete-Web" Recommender SystemMany web recommendation systems direct users to webpages, from a single website, that other similar users have visited. By contrast, our WebIC web recommendation system is designed to locate "information content (IC) pages" --- pages the current user needs to see to complete her task --- from essentially anywhere on the web. WebIC first extracts the "browsing properties" of each word encountered in the user's current click-stream --- eg, how often each word appears in the title of a page in this sequence, or in the "anchor" of a link that was followed, etc. It then uses a user- and site-independent model, learned from a set of annotated web logs acquired in a user study, to determine which of these words is likely to appear in an IC page. We discuss how to use these IC-words to find IC-pages, and demonstrate empirically that this browsing-based approach works effectively.
Joint work with Tingshao Zhu, Gerald Haeubl and Bob Price
## Proteome Analyst: A web-based tool for Predicting Properties of ProteinsProteome Analyst (PA) is a publicly available, high-throughput, web-based system for predicting various properties of each protein in an entire proteome. Using machine-learned classifiers, PA can predict, for example, the GeneQuiz general function and Gene Ontology (GO) molecular function of a protein. In addition, PA is one of the most accurate and most comprehensive systems for predicting subcellular localization, the location within a cell where a protein performs its main function. Two other capabilities of PA are notable. First, PA can create a custom classifier to predict a new property, without requiring any programming, based on labeled training data (i.e. a set of examples, each with the correct classification label) provided by a user. PA has been used to create custom classifiers for potassium-ion channel proteins and other general function ontologies. Second, PA provides a sophisticated explanation feature that shows why one prediction is chosen over another. The PA system produces a linear classifier, which is amenable to a graphical and interactive approach to explanations for its predictions; transparent predictions increase the user's confidence in, and understanding of, PA. Finally, we also present a similar technique that predicts which proteins will participate in a known (signaling) pathway, and how. Joint work with
Paul Lu, Duane Szafron, David S. Wishart,
Alona Fyshe, Brandon Pearcy, Brett Poulin, Roman Eisner, Danny Ngo,
Nicholas Lamb, Jordan Patterson, Kurt McMillan, Kevin Jewell
## Using Machine Learning to Predict Protein Roles in Signaling PathwaysIn general, each cell signaling pathway involves many proteins, each with one or more specific roles. As they are essential components of cell activity, iti s important to understand how these proteins work -- and in particular, to determine which of a species' proteins participate in each role. Experimentally determining this mapping of proteins to roles is difficult and time consuming. Fortunately, many pathways are similar across species, so we may be able to use known pathway information of one species to understand the corresponding pathway of another. We present an automatic approach, Predict Signaling Pathway (PSP), that uses the signaling pathways in well-studied species to predict the roles of proteins in less-studied species. We use a machine learning approach to create a predictor that achieves a generalization F-measure of 78.2% when applied to 11 different pathways across 14 different species. We also show our approach is very effective in predicting the pathways that have not yet been experimentally studied completely. Joint work with
Babak Bostan, Paul Lu, Duane Szafron,
## A Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers
Bayesian belief nets (BNs) are often used for classification tasks ---
typically to return the most likely class label for each specified instance.
Many BN-learners, however, attempt to find the BN that maximizes a different
objective function --- viz., likelihood, rather than classification accuracy
--- typically by first learning an appropriate graphical structure, then
finding the maximal likelihood parameters for that structure. As these
parameters may not maximize the classification accuracy, ``discriminative
learners'' follow the alternative approach of seeking the parameters that
maximize Joint work with Wei Zhou, Xiaoyuan Su and Bin Shen
## Quantifying the Uncertainty of a Belief Net Response
A Bayesian Belief Network (BN) models a joint distribution over a set of n variables,
using a DAG structure to represent the immediate dependencies
between the variables, and a set of parameters (aka "CPTables") to represent
the local conditional probabilities of a node, given each assignment to its
parents.
In many situations, these parameters are themselves random variables
--- this may reflect the uncertainty of the domain expert,
or may come from a training sample used to estimate the parameter values.
The distribution over these "CPtable variables" induces a distribution over the response the BN
will return to any "What is w is the effective tree width.
Finally, we provide empirical evidence that a Beta approximation works much better than
the normal distribution, especially for small sample sizes,
and that our algorithm works effectively in practice,
over a range of belief net structures, sample sizes and queries.
Joint work with Tim Van Allen, Ajit Singh and Peter Hooper.
## Tutorial: Introduction to Machine LearningMachine Learning (ML) basically involves finding patterns in data. This presentation focusses on patterns that can lead to effective classifiers. It first providing a number of real-world deployed applications based on ML, and motivates why this process is essential for such tasks. It first provides a number of real-world deployed applications based on ML, and motivates why ML is essential for such tasks. It then overviews three simple examples of learnable classifiers (linear separators, artificial neural nets and decision trees), and quickly suggests some of the important statistical foundations that underlie this field. ## Towards Patient-Specific Treatment: Medical Applications of Machine LearningPatient-specific treatment requires determining which treatment has best chance of success for anindividual patient,
based on all available information.
As this typically depends on many patient characteristics,
finding a single biomarker is not sufficient;
nor is it enough to find the set of top biomarkers,
as the best treatment depends on how multiple factors collectively
relate to the outcome,
perhaps combined using a classifier such as a decision tree.
In many situations, these "best treatment" classifiers are not known initially.
Fortunately, there is often a corpus of historical data, which includes both
descriptions of previous patients, as well as the treatment outcomes. The field of Machine Learning
(ML) provides tools to help here -- tools that can "learn" which treatment is
most effective for a given patient, based on his/her specific symptoms.
This presentation introduces the relevant ideas, using real-world medical examples -- starting with a way to help predict which breast cancer patients are likely to suffer a relapse, based on the subcellular location of certain adhesion proteins, as well as the standard "Adjuvant!" features. We use this real example to introduce some basic ML technologies (like linear separators, support vector machines, and decision trees), as well as some important foundational ideas -- including ways to statistically validate the resulting classifier. En route, we explain how this machine learning approach differs from the standard biostatistical framework. This part concludes by explaining the decision tree learned for this task, and showing that it is effective -- capable of correctly predicting the outcome for 80% of the patients. We then explain how the same methodology can be used to (1) find patterns over a patient's metabolic profiles (extracted from an NMR analysis of his/her urine sample) to predict whether s/he will become cachexic, or is at risk for colon cancer; (2) predict whether a breast cancer patient is ER+ vs ER-, by analyzing a microarray of her tumor biopsy; (3) find patterns over multiple Single Nucleotide Polyomorphisms (SNPs) that can predict breast cancer; (4) locate a tumor within an magnetic resonance image of a brain and predict how that tumor will grow; (5) predict a patient's psychiatric disorder from an fMRI scan; and (6) help patients manage their Type I diabetes. We also discuss some foundational work, providing an effective way (a) to predict how long an specific patient will survive, and (b) to efficiently design experiments over multiple sub-populations. Joint work with various colleagues in the Cross Cancer Institute, the UofA Medical School and UofA Computing Science, including M Pasdar, J Mackey, S Damaraju, K Graham (and others of the PolyomX project), D Wishart, V Baracos, and A Murtha (and others of the Brain Tumor Analyst Project), M Brown (+ S Dursun, A Greenshaw), E Ryan. ## Tutorial: Introduction to Bayesian Belief Nets
Many tasks require building and using a model ---
eg, to relate a patient's disease state
with the possible symptoms, underlying causes,
and effects of various treatments.
These relationships are often
Slides: (Intro)
(Topics)
Presentations
How to Prepare and Deliver
Platform Presentations (+ Posters)
This presentation focuses on preparing, and delivering, platform presentations: describing what material to present, and how to show this material effectively, and then some ideas relevant to the presentation itself: both before and during. The main ideas are to view the presentation as telling a well-structured story, that is relevant (and helpful) for the intended audience. If time permits, we will also discuss how to prepare and present posters; we will that many of the same ideas apply here, as well. How to Write Effectively
Slides |