Summary of RoBiC Data/Results - Prognostic Datasets

See also Diagnostic datasets.
Data taken from the Kent Ridge Bio-medical Data Set Repository.


Prognostic Datasets

| | P1: BreastCancer (1) | P2: AML (Outcome) (2) | P3: Central Nervous System | P4: Prostate (Outcome) (3) |
|---|---|---|---|---|
| Number of Samples | 76 | 15 | 60 | 21 |
| Number of Genes/Attributes | 23625 | 7129 | 7129 | 12600 |
| Class Distribution | relapse (32) / non-relapse (44) | fail (8) / success (7) | fail (39) / survive (21) | recurrent (8) / non-recurrent (13) |
| Base % | 57.9% | 53.3% | 65% | 61.9% |
| Original Result | 73% | | 78% | 90% |
| Naive Bayes | 63.18% | 46.67% | 63.33% | 47.62% |
| SVM | 67.11% | 53.33% | 65% | 47.62% |
| PAM (4), δ = 0 | 67.11% | 53.33% | 65% | 47.62% |
| BiC(Plaid (5), NB) | 63.16% | 73.33% | 65% | 71.43% |
| # of biclusters | 1 | 19 | 1 | 6 |
| BiC(Plaid, SVM) | 63.16% | 60% | 65% | 61.9% |
| # of biclusters | 1 | 10 | 1 | 2 |
| BiC(RoBiC, NB) | 88.16 ± 10.11% | 80 ± 18.2% | 95 ± 7.5% | 76.19 ± 22.8% |
| # of biclusters | 3 | 30 | 2 | 1 |
| BiC(RoBiC, SVM) | 90.79 ± 7.6% | 80 ± 18.2% | 95 ± 7.5% | 85.71 ± 12.0% |
| # of biclusters (6) | 2 | 16 | 2 | 13 |
| Permutation Test (1000): Average | 54.08% | 51.66% | 60.5% | 56.77% |
| Permutation Test (1000): # above BiC(RoBiC, SVM) value | 0 / 1000 | 27 / 1000 | 0 / 1000 | 22 / 1000 |
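The "Base %" row matches the accuracy of always predicting the larger class, and the permutation counts can be read as a fraction of the 1000 runs. The sketch below recomputes both from the numbers in the table. It assumes that the permutation row counts label-shuffled runs that scored above the BiC(RoBiC,SVM) accuracy. The names `base_rate` and `fraction_above` are illustrative and not part of RoBiC.

```python
# Class counts copied from the "Class Distribution" row of the table.
CLASS_COUNTS = {
    "P1 BreastCancer": {"relapse": 32, "non-relapse": 44},
    "P2 AML": {"fail": 8, "success": 7},
    "P3 Central Nervous System": {"fail": 39, "survive": 21},
    "P4 Prostate": {"recurrent": 8, "non-recurrent": 13},
}

# Runs (out of 1000) reported as scoring above BiC(RoBiC,SVM), from the table.
RUNS_ABOVE = {
    "P1 BreastCancer": 0,
    "P2 AML": 27,
    "P3 Central Nervous System": 0,
    "P4 Prostate": 22,
}
N_PERMUTATIONS = 1000


def base_rate(counts):
    """Accuracy of always predicting the most frequent class."""
    return max(counts.values()) / sum(counts.values())


def fraction_above(n_above, n_permutations=N_PERMUTATIONS):
    """Share of permutation runs that scored above the observed accuracy."""
    return n_above / n_permutations


if __name__ == "__main__":
    for name, counts in CLASS_COUNTS.items():
        print(
            f"{name}: base = {100 * base_rate(counts):.1f}%, "
            f"fraction of permutations above = {fraction_above(RUNS_ABOVE[name]):.3f}"
        )
```

Read this way, a small fraction (for example 27 of 1000 for P2) suggests the reported accuracy is unlikely to arise from shuffled labels alone.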

(1) We removed patients #9 and #10, which had many missing gene values (12,632), and then removed every gene whose value was missing for at least one of the remaining patients. This removed 856 of the 24,481 genes, leaving 23,625.
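The two-step filtering in note (1) can be sketched as follows. This is a minimal illustration on toy data, not the code used to prepare the dataset. The function name `drop_incomplete`, the dictionary layout, and the use of `None` for a missing value are all assumptions.

```python
def drop_incomplete(expression, drop_patients):
    """Remove the given patients, then every gene with a missing value.

    expression: dict mapping patient id -> list of gene values (None = missing).
    Returns (filtered_expression, kept_gene_indices).
    """
    # Step 1: drop the patients with too many missing values.
    remaining = {
        pid: values for pid, values in expression.items() if pid not in drop_patients
    }
    n_genes = len(next(iter(remaining.values())))
    # Step 2: keep only genes observed for every remaining patient.
    kept = [
        g for g in range(n_genes)
        if all(values[g] is not None for values in remaining.values())
    ]
    filtered = {pid: [values[g] for g in kept] for pid, values in remaining.items()}
    return filtered, kept


# Toy data: patients 9 and 10 are mostly missing; gene 2 is missing for patient 8.
toy = {
    8: [0.1, 0.4, None, 0.9],
    9: [None, None, 0.2, None],
    10: [None, 0.3, None, None],
    11: [0.5, 0.2, 0.7, 0.1],
}
filtered, kept = drop_incomplete(toy, drop_patients={9, 10})
print(kept)
```

Dropping the sparse patients first matters: filtering genes before removing them would discard every gene those patients lack.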

(2) This dataset is for predicting the clinical outcome for AML samples.

(3) This dataset is for predicting the clinical outcome (recurrent/non-recurrent).

(4) PAM is one of the standard algorithms for learning classifiers for microarray data.
We found empirically that setting δ = 0 worked best on the Breast Cancer dataset, and we used the same setting for the other datasets.
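To illustrate what δ = 0 implies, the sketch below assumes that PAM here means the nearest shrunken centroids method, in which each gene's standardized centroid difference is soft-thresholded by δ. Under that assumption δ = 0 applies no shrinkage, so no gene is removed from the classifier. The function name and the example scores are hypothetical.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    magnitude = max(abs(d) - delta, 0.0)
    return magnitude if d >= 0 else -magnitude


# Hypothetical standardized centroid differences for four genes.
scores = [2.1, -0.4, 0.05, -1.7]

no_shrinkage = [soft_threshold(d, 0.0) for d in scores]    # delta = 0: unchanged
with_shrinkage = [soft_threshold(d, 0.5) for d in scores]  # small scores drop to 0

print(no_shrinkage)
print(with_shrinkage)
```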

(5) Plaid is a biclustering algorithm for microarray data. 

(6) RoBiC bicluster characteristics can be found here.

