Other Approaches to Finding Classifiers Based on Biclusters

This page presents the other approaches that we tested. Table 2 shows the results of each approach on the breast cancer data.

Approach 1: The current BiC(RoBiC,...) algorithm finds the biclusters using both the training set and the test set. Another possibility is to find the biclusters using the training set only. Then, for each of the biclusters B₁, …, B_K, build a classifier, C₁, …, C_K respectively, based on the expression values of the genes in that bicluster. When a new sample comes in (from the test set), run each classifier C ∈ {C₁, …, C_K} on it to determine whether the sample belongs to the corresponding bicluster. This produces a bit-vector of length K for each new sample. The rest of this model is the same as the RoBiC bicluster classifier.
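
The bit-vector step of Approach 1 can be sketched as follows. This is only an illustration under stated assumptions: the page does not say which classifier is built for each bicluster, so a nearest-centroid rule stands in for it, and the function names, the toy matrix, and the two biclusters are all hypothetical.

```python
from statistics import mean

def fit_membership_classifier(train, genes, members):
    """Nearest-centroid stand-in (an assumption, not the method from the page).

    Builds one centroid from the training samples inside the bicluster and one
    from the samples outside it, both restricted to the bicluster's genes.
    Assumes the bicluster has at least one member and one non-member.
    """
    inside = [[train[s][g] for g in genes] for s in members]
    outside = [[train[s][g] for g in genes]
               for s in range(len(train)) if s not in members]
    c_in = [mean(col) for col in zip(*inside)]
    c_out = [mean(col) for col in zip(*outside)]

    def classify(sample):
        x = [sample[g] for g in genes]
        d_in = sum((a - b) ** 2 for a, b in zip(x, c_in))
        d_out = sum((a - b) ** 2 for a, b in zip(x, c_out))
        return 1 if d_in <= d_out else 0  # 1 = "belongs to this bicluster"

    return classify

def bit_vector(sample, classifiers):
    """One bit per bicluster classifier C_1, ..., C_K."""
    return [c(sample) for c in classifiers]

# Toy training matrix: rows = samples, columns = genes (hypothetical values).
train = [
    [5.0, 5.1, 0.2, 0.1],
    [4.9, 5.2, 0.1, 0.3],
    [0.2, 0.1, 4.8, 5.0],
    [0.1, 0.3, 5.1, 4.9],
]
# Biclusters found on the training set only: (gene indices, sample indices).
biclusters = [((0, 1), {0, 1}), ((2, 3), {2, 3})]
classifiers = [fit_membership_classifier(train, g, m) for g, m in biclusters]

new_sample = [5.0, 4.8, 0.0, 0.2]  # a test sample, never used for biclustering
bits = bit_vector(new_sample, classifiers)
print(bits)
```

The resulting bit-vectors would then be passed to the final classifier in the same way as in the RoBiC bicluster classifier.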

Approach 2: This approach is similar to Approach 1, with the addition of feature selection. That is, find the biclusters using the training set, use feature selection to find a subset of genes for each bicluster, and then build the classifiers based only on these selected genes. Use either the selected genes of each bicluster or the union of all selected genes to predict whether a new patient belongs to each bicluster.
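
A minimal sketch of the per-bicluster feature selection in Approach 2 follows. The page does not name the feature selection method, so the score used here (absolute difference between the mean expression inside and outside the bicluster) is an assumption chosen for brevity; `select_genes`, `top_n`, and the toy data are hypothetical.

```python
from statistics import mean

def select_genes(train, genes, members, top_n):
    """Keep the top_n genes of a bicluster, ranked by
    |mean expression inside - mean expression outside| (an assumed score)."""
    non_members = [s for s in range(len(train)) if s not in members]

    def score(g):
        return abs(mean(train[s][g] for s in members)
                   - mean(train[s][g] for s in non_members))

    return sorted(genes, key=score, reverse=True)[:top_n]

# Toy training matrix: rows = samples, columns = genes (hypothetical values).
train = [
    [5.0, 1.0, 3.0, 0.0],
    [5.2, 1.2, 3.0, 0.2],
    [0.2, 1.0, 2.0, 4.0],
    [0.0, 1.2, 2.0, 4.2],
]
# Biclusters from the training set: (gene indices, sample indices).
biclusters = [((0, 1, 2), {0, 1}), ((1, 2, 3), {2, 3})]

selected = [select_genes(train, g, m, top_n=2) for g, m in biclusters]
union = sorted(set().union(*selected))
print(selected)  # genes kept for each bicluster
print(union)     # union of the selected genes
```

Each membership classifier would then be trained on either its own entry of `selected` or on `union`, matching the two options described above.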

Approach 3: In RoBiC, we use the biclusters to reduce the dimensionality of the data matrix. Instead, we can use the union of the genes in the biclusters found to reduce the dimensionality. That is, we learn a classifier based only on the genes in the union of all biclusters.
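
The dimensionality reduction in Approach 3 amounts to a column selection, sketched below. The function name and the toy data are hypothetical, and the classifier that would be trained on the reduced matrix is left out because the page does not tie this approach to a specific one.

```python
def reduce_to_bicluster_genes(matrix, biclusters):
    """Keep only the columns (genes) that appear in at least one bicluster."""
    keep = sorted(set().union(*(genes for genes, _ in biclusters)))
    reduced = [[row[g] for g in keep] for row in matrix]
    return keep, reduced

# Toy data matrix: rows = samples, columns = genes (hypothetical values).
matrix = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
    [11.0, 12.0, 13.0, 14.0, 15.0],
]
# Biclusters as (gene indices, sample indices).
biclusters = [((0, 3), {0}), ((3, 4), {1, 2})]

keep, reduced = reduce_to_bicluster_genes(matrix, biclusters)
print(keep)     # gene indices that survive the reduction
print(reduced)  # samples x kept genes
```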

Approach 4: To find the subset of samples, instead of looking only at the elements at the beginning of the vector, look at the elements at both the start and the end of the vector. That is, find the samples at indices {1,…,j₁} and {j₂,…,p} in α (by best line approximation, for example). As mentioned earlier, when we find the β vector and sort it, most of the values at the beginning of the sorted vector correspond to one class of samples, and most of the values at the end of the sorted vector correspond to the other class.
Then we can build the classifier matrix, R, such that each element r_jk of the matrix is:
- 1 if the j^th sample is in the k^th bicluster and j ∈ {1,…,j₁},
- -1 if the j^th sample is in the k^th bicluster and j ∈ {j₂,…,p},
- 0 otherwise.
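
The construction of R can be sketched as below, one column per bicluster. Several details are assumptions: the vector is sorted in ascending order, the cut positions j₁ and j₂ are passed in directly rather than found by best line approximation, and the function name and score values are hypothetical.

```python
def signed_membership_column(scores, j1, j2):
    """Return one column of R for a single bicluster.

    scores: the bicluster's per-sample vector (one value per sample).
    j1, j2: 1-based cut positions in the sorted order, with j1 < j2.
    Samples at sorted positions 1..j1 get +1, positions j2..p get -1,
    and everything in between gets 0.
    """
    p = len(scores)
    order = sorted(range(p), key=lambda s: scores[s])  # ascending (assumed)
    column = [0] * p
    for pos, s in enumerate(order, start=1):
        if pos <= j1:
            column[s] = 1
        elif pos >= j2:
            column[s] = -1
    return column

# Hypothetical per-sample vectors for K = 2 biclusters over p = 6 samples,
# each paired with its cut positions (j1, j2).
per_bicluster = [
    ([0.9, -0.8, 0.05, -0.7, 0.8, 0.0], 2, 5),
    ([0.1, 0.2, -0.9, 0.0, 0.95, -0.1], 1, 6),
]
columns = [signed_membership_column(v, j1, j2) for v, j1, j2 in per_bicluster]
R = [list(row) for row in zip(*columns)]  # rows = samples, columns = biclusters
print(R)
```

Compared with a 0/1 bit-vector, the sign keeps track of which end of the sorted vector a sample came from.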

Approach 5: In addition to these approaches, we also tried 1-dimensional clustering algorithms combined with classification algorithms. Here, we report the best prediction accuracy that we found, which was based on K-means clustering (K = 2).
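
For reference, the clustering step of Approach 5 could look like the sketch below, a plain Lloyd-style K-means with K = 2 over samples. The page does not describe how the clusters were combined with the classifier, so only the clustering is shown. The initialisation (first K points) is an assumption made to keep the example deterministic, and the data are hypothetical.

```python
def kmeans(points, k=2, iters=100):
    """Basic K-means. Returns (labels, centroids).

    Initialises the centroids with the first k points (an assumption made for
    determinism; random restarts would be more usual in practice).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy expression profiles: rows = samples, columns = genes (hypothetical).
points = [[5.0, 5.1], [0.1, 0.2], [4.9, 5.0], [0.2, 0.0]]
labels, centroids = kmeans(points, k=2)
print(labels)
```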

Approach          Accuracy (Breast Cancer)
1                 56.25%
2                 73.75%
3                 67.11%
4                 76%
5                 65.79%
BiC(RoBiC,SVM)    90.79% ± 7.6

Table 2: Summary of the Results for Different Approaches on Breast Cancer data.

SVD Approaches: For a description of the SVD approaches, click here.