Budgeted Learning

Learning tasks typically begin with a data sample: e.g., symptoms and test results for a set of patients, together with their clinical outcomes. By contrast, many real-world studies begin with no actual data, but instead with a budget: funds that can be used to collect the relevant information. For example, one study has allocated $30 thousand to develop a system to diagnose cancer, based on a battery of patient tests, each with its own (known) cost and (unknown) discriminative power. Given our goal of identifying the most accurate classifier, what is the best way to spend the $30 thousand? Should we indiscriminately run every test on every patient until the budget is exhausted? Or should we selectively, and dynamically, determine which tests to run on which patients? We call this task budgeted learning.
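
The data-collection loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure used in any of the papers listed here: the names `collect`, `round_robin_policy` and `run_test` are hypothetical, test costs are assumed to be known up front, and a "policy" is assumed to be any function that picks the next (patient, test) probe from the history so far. The round-robin policy shown corresponds to the "run every test on every patient" option; a selective policy would plug into the same loop.

```python
def round_robin_policy(n_patients, tests):
    """Hypothetical baseline: run every test on patient 0, then patient 1, ..."""
    def policy(history, remaining, costs):
        # The number of probes bought so far determines the next one.
        patient, k = divmod(len(history), len(tests))
        if patient >= n_patients:
            return None  # every test has been run on every patient
        test = tests[k]
        # Stop as soon as the next probe in the cycle is unaffordable.
        return (patient, test) if costs[test] <= remaining else None
    return policy


def collect(costs, budget, policy, run_test):
    """Buy probes one at a time until the policy stops or the budget runs out.

    costs:    dict mapping test name -> known cost of running that test
    policy:   function (history, remaining, costs) -> (patient, test) or None
    run_test: function (patient, test) -> observed result (assumed interface)
    """
    history, remaining = [], budget
    while True:
        choice = policy(history, remaining, costs)
        if choice is None:
            break
        patient, test = choice
        if costs[test] > remaining:
            break  # never overspend, whatever the policy asks for
        remaining -= costs[test]
        history.append((patient, test, run_test(patient, test)))
    return history, remaining
```

A selective policy would differ only in how it chooses the next probe from `history`; the budget accounting stays the same.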

Our initial work on this task studied the theoretical foundations of budgeted learning and proved several important results, such as NP-hardness. Our other work builds on the theory and provides algorithms for budgeted learning of a passive (Naive Bayes) classifier. Finally, our most recent extensions consider both learning and classifying under a budget, and thus budgeted-learn a bounded active classifier. As budgeted learning is a sequential decision problem, we also provide empirical results demonstrating that the obvious Reinforcement Learning techniques do not perform particularly well on this high-dimensional and complex task, and are typically bested by our simpler, heuristic policies. A list of all these works and additional experiments is given below. (See also Open Problems.)

	Title	Authors	Summary	Appears In	Links
1	Active Model Selection	Omid Madani, Dan Lizotte, Russell Greiner	Explores the budgeted multi-armed bandit task.	UAI 2004	Details, or Paper
2	Reinforcement Learning for Active Model Selection	Aloak Kapoor, Russell Greiner	Compares RL to heuristic spending policies.	UBDM 2005 (KDD Workshop)	Details, or Paper
3	Budgeted Learning of Naive-Bayes Classifiers	Dan Lizotte, Omid Madani, Russell Greiner	Provides effective algorithms for budgeted learning a passive classifier.	UAI 2003	Details, or Paper
4	Learning and Classifying under Hard Budgets	Aloak Kapoor, Russell Greiner	Considers budgeted learning a bounded active classifier.	ECML 2005	Details, or Paper*
5	Using Value of Information to Learn and Classify under Hard Budgets	Russell Greiner	Short abstract summarizing budgeted learning results, in context of Value-of-Information.	VOI 2005 (NIPS Workshop)	Paper
6	Budgeted Learning of Naive Bayes Classifiers	Dan Lizotte	MSc dissertation: everything known about 1 and 3.		Dissertation
7	Learning and Classifying under Hard Budgets	Aloak Kapoor	MSc dissertation: everything known about 2 and 4.		Dissertation
8	Budgeted Distribution Learning in Parametric Models	Liuyang (Spike) Li, Barnabas Poczos, Csaba Szepesvári, Russell Greiner	Learning the parameters for a belief net, to minimize expected KL divergence	ICML 2010	Paper
9	Actively Learning Generative Model	Liuyang (Spike) Li	MSc dissertation: everything known about 8.		Dissertation
10	Online Learning with Costly Features and Labels	N Zolghadr, G Bartok, R Greiner, C Szepesvari, A Gyorgy	On-line analysis of how many probes are needed	NIPS 2013	Paper
11	Probe Efficient Learning	Navid Zolghadr	MSc dissertation: everything known about 10.		Dissertation
12	The Budgeted Biomarker Discovery Problem: A Variant of Association Studies	S Khan, R Greiner	Efficiently identifying biomarkers	MAIHA 2014 (AAAI Workshop)	Paper
13	Budgeted Transcript Discovery: A Framework For Joint Exploration And Validation Studies	S Khan, R Greiner	Efficiently identifying biomarkers and verifying them	BIBM 2014	Paper
14	The Budgeted Biomarker Discovery Problem	Sheehan Khan	PhD dissertation: everything about 12 and 13, plus more.		Dissertation

Open Problems

Coins (Active Model Selection)
- Better heuristics -- e.g., a better BiasedRobin, with reordering
- Extend the formal analysis:
  - hardness of the task with unit costs and unimodal (beta) distributions
  - approximability of other heuristics
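
For readers unfamiliar with the coins task, a minimal sketch follows. It rests on several assumptions that are not taken from the papers: each coin has an unknown head probability with a uniform Beta(1,1) prior, each flip costs one unit of budget, and the learner finally reports the coin with the highest posterior mean. BiasedRobin is read here as a play-the-winner rule (keep flipping the same coin after a head, move to the next coin after a tail); that reading, and the names `run_policy`, `round_robin` and `biased_robin`, are illustrative only.

```python
import random


def round_robin(coin, outcome, n):
    """Always move on to the next coin, regardless of the outcome."""
    return (coin + 1) % n


def biased_robin(coin, outcome, n):
    """Assumed play-the-winner rule: stay on a head, advance on a tail."""
    return coin if outcome else (coin + 1) % n


def run_policy(true_probs, budget, policy, rng):
    """Spend `budget` unit-cost flips, then report the apparently best coin.

    true_probs: hidden head probabilities, used only to simulate flips
    policy:     function (coin, outcome, n) -> index of the next coin to flip
    rng:        a random.Random instance, passed in for reproducibility
    """
    n = len(true_probs)
    heads = [1] * n  # Beta(1,1) prior: one pseudo-head per coin
    tails = [1] * n  # ... and one pseudo-tail per coin
    coin = 0
    for _ in range(budget):
        outcome = rng.random() < true_probs[coin]
        if outcome:
            heads[coin] += 1
        else:
            tails[coin] += 1
        coin = policy(coin, outcome, n)
    # Posterior mean of Beta(heads, tails) is heads / (heads + tails).
    means = [h / (h + t) for h, t in zip(heads, tails)]
    return max(range(n), key=means.__getitem__)
```

Calling `run_policy(probs, budget, biased_robin, random.Random(seed))` over many seeds, and counting how often the truly best coin is returned, gives one rough way to compare heuristics of this kind.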
Budgeted Learning of Classifiers
- Better (heuristic) algorithms
- Extend the formal analysis:
  - Hardness of the task for a Naive Bayes classifier
    with unit costs and unimodal (beta) distributions
  - PAC bounds on "sample" size, with respect to "probes".
    What if we get to select an ENTIRE feature vector at unit cost?
Budgeted Learning of Bounded Active Classifiers
- Better (heuristic?) algorithms
  - Given the Naive Bayes assumption, is there a more efficient exact algorithm?
- Extend the formal analysis: