Prognostic Datasets

Dataset | P1: Breast Cancer (1) | P2: AML (Outcome) (2) | P3: Central Nervous System | P4: Prostate (Outcome) (3)
---|---|---|---|---
Number of Samples | 76 | 15 | 60 | 21
Number of Genes/Attributes | 23625 | 7129 | 7129 | 12600
Class Distribution | relapse (32) / non-relapse (44) | fail (8) / success (7) | fail (39) / survive (21) | recurrent (8) / non-recurrent (13)
Base % | 57.9% | 53.3% | 65% | 61.9%
Original Result | 73% | 78% | 90% |
Naive Bayes | 63.18% | 46.67% | 63.33% | 47.62%
SVM | 67.11% | 53.33% | 65% | 47.62%
PAM (4), δ = 0 | 67.11% | 53.33% | 65% | 47.62%
BiC(Plaid (5), NB) | 63.16% | 73.33% | 65% | 71.43%
BiC(Plaid, NB): # of biclusters | 1 | 19 | 1 | 6
BiC(Plaid, SVM) | 63.16% | 60% | 65% | 61.9%
BiC(Plaid, SVM): # of biclusters | 1 | 10 | 1 | 2
BiC(RoBiC, NB) | 88.16 ± 10.11% | 80 ± 18.2% | 95 ± 7.5% | 76.19 ± 22.8%
BiC(RoBiC, NB): # of biclusters | 3 | 30 | 2 | 1
BiC(RoBiC, SVM) | 90.79 ± 7.6% | 80 ± 18.2% | 95 ± 7.5% | 85.71 ± 12.0%
BiC(RoBiC, SVM): # of biclusters (6) | 2 | 16 | 2 | 13
Permutation Test (1000): average / # above BiC(RoBiC, SVM) value | 54.08% | 51.66% | 60.5% | 56.77%

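The "Base %" row matches the accuracy of always predicting the most frequent class, computed from the counts in the "Class Distribution" row. The short sketch below reproduces that arithmetic; the function and variable names are illustrative and do not come from the original page.

```python
# Class counts copied from the "Class Distribution" row of the table.
class_counts = {
    "P1": {"relapse": 32, "non-relapse": 44},
    "P2": {"fail": 8, "success": 7},
    "P3": {"fail": 39, "survive": 21},
    "P4": {"recurrent": 8, "non-recurrent": 13},
}


def majority_baseline(counts):
    """Accuracy (in percent) of always predicting the most frequent class."""
    total = sum(counts.values())
    return 100.0 * max(counts.values()) / total


# Hypothetical name: one baseline percentage per dataset.
baselines = {name: majority_baseline(c) for name, c in class_counts.items()}

for name, pct in baselines.items():
    print(f"{name}: {pct:.1f}%")
```

Any classifier in the table that scores at or below its dataset's baseline is doing no better than this trivial majority-class rule.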
(1) We removed patients #9 and #10, which had many missing genes (12,632), and then removed every gene whose value was missing for at least one of the remaining patients. This removed 856 of the 24,481 genes, leaving 23,625.
(2) This dataset is for predicting the clinical outcome for AML samples.
(3) This dataset is for predicting the clinical outcome (recurrent/non-recurrent).
(4) PAM is one of the standard algorithms for learning classifiers for microarray data. We found empirically that setting δ = 0 worked best for the Breast Cancer dataset; we continued to use this setting for the other datasets.
(5) Plaid is a biclustering algorithm for microarray data.
(6) RoBiC bicluster characteristics can be found here.
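The two-step filtering described in note (1) can be sketched as follows. This is a minimal illustration on toy data, assuming missing values are represented as `None` in a samples-by-genes list of lists; the function name, the data layout, and the toy values are all hypothetical and are not taken from the original pipeline.

```python
def drop_samples_then_incomplete_genes(rows, sample_ids, drop_ids):
    """Drop the listed samples, then drop genes with any missing value.

    rows       -- list of samples, each a list of gene values (None = missing)
    sample_ids -- identifier for each row, in the same order
    drop_ids   -- identifiers of the samples to remove first
    """
    # Step 1: remove the samples that have too many missing values.
    kept = [row for sid, row in zip(sample_ids, rows) if sid not in drop_ids]

    # Step 2: keep only genes observed in every remaining sample.
    n_genes = len(rows[0])
    complete = [all(row[g] is not None for row in kept) for g in range(n_genes)]
    cleaned = [[v for v, ok in zip(row, complete) if ok] for row in kept]
    return cleaned, complete


# Toy data: 4 samples x 4 genes; sample 2 is mostly missing.
rows = [
    [1.0, None, 3.0, 4.0],
    [None, None, None, 1.0],
    [2.0, 5.0, None, 6.0],
    [7.0, 8.0, 9.0, 1.5],
]
cleaned, complete = drop_samples_then_incomplete_genes(
    rows, sample_ids=[1, 2, 3, 4], drop_ids={2}
)

# Gene count reported in note (1): 24,481 original genes minus 856 removed.
remaining_genes = 24481 - 856
```

The order of the two steps matters: removing the sparse samples first means that fewer genes are discarded in the second step, since a gene is only dropped if it is missing in one of the samples that remain.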