Prognostic Datasets  

Dataset  P_{1}: Breast Cancer ^{(1)}  P_{2}: AML (Outcome) ^{(2)}  P_{3}: Central Nervous System  P_{4}: Prostate (Outcome) ^{(3)}

Number of Samples  76  15  60  21  
Number of Genes/Attributes  23625  7129  7129  12600  
Class Distribution  relapse (32) / nonrelapse (44)  fail (8) / success (7)  fail (39) / survive (21)  recurrent (8) / nonrecurrent (13)

Base %  57.9%  53.3%  65%  61.9%  
Original Result  73%  78%  90%  
Naive Bayes  63.18%  46.67%  63.33%  47.62%
SVM  67.11%  53.33%  65%  47.62%  
PAM ^{(4)} (δ = 0)  67.11%  53.33%  65%  47.62%
BiC(Plaid ^{(5)}, NB) (# of biclusters)  63.16% (1)  73.33% (19)  65% (1)  71.43% (6)

BiC(Plaid, SVM) (# of biclusters)  63.16% (1)  60% (10)  65% (1)  61.9% (2)

BiC(RoBiC, NB) (# of biclusters)  88.16 ± 10.11% (3)  80 ± 18.2% (30)  95 ± 7.5% (2)  76.19 ± 22.8% (1)

BiC(RoBiC, SVM) (# of biclusters) ^{(6)}  90.79 ± 7.6% (2)  80 ± 18.2% (16)  95 ± 7.5% (2)  85.71 ± 12.0% (13)

Permutation Test (1000) Average, # above BiC(RoBiC,SVM) value  54.08%  51.66%  60.5%  56.77%
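
The Base % and Permutation Test rows both give chance-level reference points for the accuracies above. The Base % values are consistent with always predicting the larger class (for example, 44/76 ≈ 57.9% for P_{1}). The source does not spell out its permutation procedure, so the following is only a minimal sketch of a generic label-permutation test: shuffle the class labels, re-score, and report the average score together with how many shuffles score above the observed value. The function names and the `evaluate` callback are hypothetical, and the toy scorer merely stands in for retraining and cross-validating a real classifier.

```python
import random
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most common class."""
    return max(Counter(labels).values()) / len(labels)


def permutation_test(labels, evaluate, observed, n_permutations=1000, seed=0):
    """Generic label-permutation test (a sketch, not the authors' procedure).

    evaluate: hypothetical callable mapping a list of labels to an accuracy
              in [0, 1], e.g. cross-validated accuracy after retraining.
    Returns (average permuted accuracy, number of permutations whose
    accuracy is strictly above `observed`).
    """
    rng = random.Random(seed)
    shuffled = list(labels)  # copy so the caller's labels stay untouched
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        scores.append(evaluate(shuffled))
    average = sum(scores) / len(scores)
    n_above = sum(score > observed for score in scores)
    return average, n_above


# Class distribution of P_1 from the table: relapse (32) / nonrelapse (44).
p1_labels = ["relapse"] * 32 + ["nonrelapse"] * 44
print(f"Base %: {100 * majority_baseline(p1_labels):.1f}")  # Base %: 57.9

# Toy stand-in scorer (assumption): fixed predictions compared against
# whatever labels it is given. A real run would retrain the classifier.
fixed_predictions = list(p1_labels)


def toy_evaluate(labels):
    hits = sum(p == y for p, y in zip(fixed_predictions, labels))
    return hits / len(labels)


# 0.9079 is the BiC(RoBiC, SVM) accuracy reported for P_1 in the table.
average, n_above = permutation_test(p1_labels, toy_evaluate, observed=0.9079)
print(f"average permuted accuracy: {100 * average:.2f}%, # above: {n_above}")
```

If the permuted scores cluster near the Base % while few or none exceed the observed accuracy, the observed result is unlikely to be explained by chance alone.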

^{(1)} We removed patients #9 and #10, which had many missing gene values (12,632), and then removed every gene whose value was missing for at least one of the remaining patients. This removed 856 of the 24,481 genes.
^{(2)} This dataset is for predicting the clinical outcome for AML samples.
^{(3)} This dataset is for predicting the clinical outcome (recurrent/nonrecurrent).
^{(4)} PAM is one of the standard algorithms for learning classifiers for microarray data. We found empirically that setting δ = 0 worked best for the Breast Cancer dataset; we continued to use this setting for the other datasets.
^{(5)} Plaid is a biclustering algorithm for microarray data.
^{(6)} RoBiC bicluster characteristics can be found here.
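
The preprocessing described in footnote (1) can be sketched as two filtering passes: drop the listed patients first, then drop every gene that is still missing for any remaining patient. The order matters, because filtering genes before removing the sparse patients would likely discard many more genes. This is a minimal sketch under assumptions: the function name, the dict-of-lists layout, and the use of `None` for a missing value are illustrative and not taken from the source.

```python
def drop_patients_then_incomplete_genes(expression, patients_to_drop):
    """Remove patients, then remove genes with any remaining missing value.

    expression: dict mapping patient id -> list of gene values, where None
                marks a missing value (hypothetical layout).
    Returns (filtered expression dict, indices of the genes that were kept).
    """
    kept = {
        pid: values
        for pid, values in expression.items()
        if pid not in patients_to_drop
    }
    n_genes = len(next(iter(kept.values())))
    # A gene survives only if every remaining patient has a value for it.
    complete_genes = [
        g for g in range(n_genes)
        if all(values[g] is not None for values in kept.values())
    ]
    filtered = {
        pid: [values[g] for g in complete_genes]
        for pid, values in kept.items()
    }
    return filtered, complete_genes


# Toy data (made up): patients 9 and 10 are mostly missing.
toy = {
    8: [1.0, 2.0, None, 4.0, 5.0],
    9: [None, None, None, 1.0, None],
    10: [None, 3.0, None, None, None],
    11: [0.5, 1.5, 2.5, 3.5, 4.5],
}
filtered, kept_genes = drop_patients_then_incomplete_genes(toy, {9, 10})
print(kept_genes)  # [0, 1, 3, 4]
print(filtered)    # {8: [1.0, 2.0, 4.0, 5.0], 11: [0.5, 1.5, 3.5, 4.5]}
```

On the real data, footnote (1) reports that this step removes 856 of 24,481 genes, leaving the 23,625 listed for P_{1} in the table.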