Other Approaches to Finding Classifiers Based on BiClusters
This page presents other approaches that we have tested. Table 2 shows the results for each approach on the
breast cancer data.
Approach 1: The current BiC(RoBiC,...) algorithm
finds the biclusters using both the training set and the test set.
Another possibility is to find the biclusters using the training set only.
Then for each of the biclusters, B_{1}, …, B_{K},
build a classifier, C_{1}, …, C_{K} respectively, based
on the expression values of the genes in that bicluster. Then, when
a new sample arrives (from the test set), run each classifier C ∈ {C_{1},
…, C_{K}} on it to determine whether the sample belongs to
the corresponding bicluster. This produces a bit-vector for each new sample. The
rest of this model is the same as the RoBiC bicluster classifier.
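The per-bicluster classifiers and the resulting bit-vector can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which classifier is used for C_{1}, …, C_{K}, so a nearest-centroid rule stands in for it, and the names fit_membership_classifier and bit_vector, along with the toy data, are hypothetical.

```python
from math import dist

def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_membership_classifier(X_train, genes, members):
    """Build one classifier C_k for a bicluster found on the training set.

    genes   -- column indices of the genes in the bicluster
    members -- set of training-row indices of the samples in the bicluster
    Assumption: a nearest-centroid rule stands in for the unspecified
    classifier, and at least one training sample lies outside the bicluster.
    """
    inside = [[X_train[i][g] for g in genes] for i in members]
    outside = [[X_train[i][g] for g in genes]
               for i in range(len(X_train)) if i not in members]
    c_in, c_out = centroid(inside), centroid(outside)

    def predict(sample):
        v = [sample[g] for g in genes]
        return 1 if dist(v, c_in) < dist(v, c_out) else 0

    return predict

def bit_vector(sample, classifiers):
    """Run every bicluster classifier on one new sample."""
    return [clf(sample) for clf in classifiers]

# Toy training matrix: 4 samples (rows) by 3 genes (columns).
X_train = [[5.0, 5.1, 0.0],
           [4.9, 5.0, 0.1],
           [0.1, 0.0, 3.0],
           [0.0, 0.2, 3.1]]
# Two hypothetical biclusters, each given as (genes, member samples).
biclusters = [([0, 1], {0, 1}), ([2], {2, 3})]
classifiers = [fit_membership_classifier(X_train, g, m) for g, m in biclusters]
new_sample_bits = bit_vector([4.8, 5.2, 0.2], classifiers)
```

Each new sample is thus mapped to a K-element bit-vector, which would then be passed to the remaining stages of the model.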
Approach 2: This approach is similar to Approach 1 but
with the addition of feature selection. That is,
find the biclusters for the training set.
Use feature selection to find a subset of genes for each bicluster,
then build the classifiers based only on these selected genes.
Use the genes selected for each bicluster, or the union of all selected genes, to predict
whether a new patient belongs to each bicluster.
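A minimal sketch of the feature-selection step, under assumptions: the text does not name the selection method, so genes are ranked here by the absolute difference between their mean expression inside and outside the bicluster, and the top m are kept. The function name select_genes, the parameter m, and the toy data are all hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def select_genes(X_train, genes, members, m):
    """Keep the m genes of a bicluster that best separate members from non-members.

    Assumption: the score is |mean inside - mean outside|, a simple stand-in
    for whatever feature-selection method is actually used. At least one
    training sample must lie outside the bicluster.
    """
    scores = {}
    for g in genes:
        inside = [X_train[i][g] for i in members]
        outside = [X_train[i][g] for i in range(len(X_train)) if i not in members]
        scores[g] = abs(mean(inside) - mean(outside))
    return sorted(genes, key=lambda g: scores[g], reverse=True)[:m]

# Toy training matrix: 4 samples (rows) by 3 genes (columns).
X_train = [[5.0, 1.0, 0.0],
           [5.2, 1.1, 0.1],
           [0.0, 0.9, 3.0],
           [0.2, 1.0, 3.1]]
# Hypothetical biclusters given as (genes, member samples).
biclusters = [([0, 1, 2], {0, 1}), ([1, 2], {2, 3})]
selected = [select_genes(X_train, g, s, m=2) for g, s in biclusters]
# Union of the selected genes across all biclusters.
selected_union = sorted(set().union(*selected))
```

The per-bicluster lists in selected correspond to building each classifier on its own genes, while selected_union corresponds to the alternative of using one shared gene set.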
Approach 3: In RoBiC, we use the biclusters to reduce
the dimensionality of the data matrix. Instead, we can use the union of
the genes in the found biclusters to reduce the dimensionality. That
is, we learn a classifier based only on these genes in the union of all
biclusters.
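As a rough illustration of this reduction, the sketch below collects the union of the genes over all biclusters and keeps only those columns of the data matrix. The names union_of_genes and reduce_matrix and the toy data are hypothetical; any classifier could then be trained on the reduced matrix.

```python
def union_of_genes(biclusters):
    """Sorted union of the gene indices appearing in any bicluster."""
    return sorted(set().union(*(genes for genes, _ in biclusters)))

def reduce_matrix(X, genes):
    """Keep only the given gene columns of a samples-by-genes matrix."""
    return [[row[g] for g in genes] for row in X]

# Toy data: 2 samples by 5 genes, and two hypothetical biclusters
# given as (genes, member samples).
X = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [6.0, 7.0, 8.0, 9.0, 10.0]]
biclusters = [([0, 3], {0}), ([3, 4], {1})]
kept = union_of_genes(biclusters)
X_reduced = reduce_matrix(X, kept)
```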
Approach 4: For finding the subset of samples, instead
of looking at the elements at the beginning of the vector, look at the
elements at the start and end of the vector. That
is, find samples from index {1,…,j_{1}} and {j_{2},…,p}
in α (by best line approximation, for example). As mentioned earlier,
when we find the β vector and sort it, most of the values at the
beginning of the sorted vector correspond to one class of
samples, and most of the values at the end of the sorted vector
correspond to the other class.
Then we can build the classifier matrix, R, such that each
element r_{jk} of the matrix is:
1 if the j^{th} sample is in the k^{th} bicluster, and j ∈ {1,…,j_{1}},
-1 if the j^{th} sample is in the k^{th} bicluster, and j ∈ {j_{2},…,p},
0 otherwise.
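The construction of R can be sketched as below, with several assumptions made explicit: each bicluster supplies one score per sample, the scores are sorted in ascending order, the cutoffs j_{1} and j_{2} are 1-based positions in the sorted order and are given as inputs (the best line approximation that would choose them is not reproduced), and R is laid out with one row per sample and one column per bicluster. The function name signed_membership_matrix and the toy scores are hypothetical.

```python
def signed_membership_matrix(score_vectors, cutoffs):
    """Build R with entries +1, -1 or 0 (rows: samples, columns: biclusters).

    score_vectors -- one list of p per-sample scores for each bicluster
    cutoffs       -- one (j1, j2) pair per bicluster, 1-based positions in
                     the sorted order, with j1 < j2 (assumed given)
    """
    p = len(score_vectors[0])
    R = [[0] * len(score_vectors) for _ in range(p)]
    for k, (scores, (j1, j2)) in enumerate(zip(score_vectors, cutoffs)):
        # Sample indices ordered by ascending score (assumed sort direction).
        order = sorted(range(p), key=lambda s: scores[s])
        for s in order[:j1]:        # sorted positions 1..j1
            R[s][k] = 1
        for s in order[j2 - 1:]:    # sorted positions j2..p
            R[s][k] = -1
    return R

# One hypothetical bicluster over p = 5 samples.
scores = [[0.9, -0.8, 0.1, -0.7, 0.8]]
R = signed_membership_matrix(scores, cutoffs=[(2, 4)])
```

In this toy case the two lowest-scoring samples receive 1, the two highest-scoring samples receive -1, and the middle sample receives 0.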
Approach 5: In addition to all of these approaches, we
also tried 1-dimensional clustering algorithms combined with classification
algorithms.
Here, we report the best prediction accuracy that we found,
based on K-means clustering (K = 2).
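For illustration only, here is a small K-means sketch with K = 2. It assumes that "1-dimensional" means clustering along the sample dimension alone (in contrast to biclustering), and it uses a deterministic initialization chosen for this example. The function name two_means and the toy data are hypothetical, and the way the cluster labels are combined with a classifier is not specified in the text, so that step is omitted.

```python
from math import dist

def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def two_means(X, iters=100):
    """Cluster the rows of X into two groups with Lloyd's algorithm.

    Assumed initialization: the first sample and the sample farthest from it.
    Returns one label (0 or 1) per sample.
    """
    c0 = X[0]
    c1 = max(X, key=lambda x: dist(x, c0))
    labels = None
    for _ in range(iters):
        new_labels = [0 if dist(x, c0) <= dist(x, c1) else 1 for x in X]
        if new_labels == labels:    # assignments stopped changing
            break
        labels = new_labels
        group0 = [x for x, lab in zip(X, labels) if lab == 0]
        group1 = [x for x, lab in zip(X, labels) if lab == 1]
        if group0:
            c0 = mean_vector(group0)
        if group1:
            c1 = mean_vector(group1)
    return labels

# Toy data: 4 samples by 2 genes, forming two well-separated groups.
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
cluster_labels = two_means(X)
```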
Approach No.   | Breast Cancer (accuracy)
1              | 56.25%
2              | 73.75%
3              | 67.11%
4              | 76%
5              | 65.79%
BiC(RoBiC,SVM) | 90.79% ± 7.6
Table 2: Summary of the results for the different approaches on the breast cancer data.
SVD Approaches: For a description of the SVD approaches, click here.