# Other Hinge Functions

RoBiC finds a subset of genes and a subset of patients that share a similar pattern within a given n × p (gene × patient) matrix M.
It first finds two vectors, α and β, with |α| = n and |β| = p, such that α·s·βᵀ is the best rank-one approximation of M. (Here, α is the gene vector and β is the patient vector.) It then sorts the values of α (resp., β), producing α(s) and β(s). See the top row of Figure 0 below.
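As a concrete illustration of this first step, here is a minimal sketch (not the authors' code) that computes the best rank-one approximation via the SVD, assuming numpy; the leading singular triple (s, α, β) gives the optimal α·s·βᵀ:

```python
import numpy as np

def rank_one_approx(M):
    """Best rank-one approximation M ~ s * alpha @ beta.T via the SVD.

    alpha (length n) plays the role of the gene vector and beta
    (length p) the patient vector; s is the leading singular value.
    (Illustrative sketch only, not the original RoBiC implementation.)
    """
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    alpha, s, beta = U[:, 0], S[0], Vt[0, :]
    return alpha, s, beta

# Toy example: a matrix that is exactly rank one.
M = np.outer([3.0, 2.0, 1.0], [1.0, 2.0])
alpha, s, beta = rank_one_approx(M)
approx = s * np.outer(alpha, beta)
print(np.allclose(approx, M))  # rank-one data is recovered exactly

# RoBiC then works with the sorted vectors alpha_s and beta_s.
alpha_s = np.sort(np.abs(alpha))[::-1]
beta_s = np.sort(np.abs(beta))[::-1]
```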

Finally, RoBiC identifies the bicluster with the best prefixes, based on some "hinge" function that separates the genes {g1, ..., gk} from the remaining {gk+1, ..., gn}, and the patients {s1, ..., sm} from {sm+1, ..., sp}. The challenge is finding the appropriate hinge.

The current RoBiC hinge function finds the best-fitting pair of lines:
For each i = 1..n,
• find the best-fit line through positions 1..i and the best-fit line through positions i+1..n
• let err(i) be the sum of the two residual errors
Then select k = argminᵢ { err(i) }.
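A minimal sketch of this two-line hinge, assuming least-squares line fits via `numpy.polyfit` (the original implementation details are not given in the text, so the function and variable names here are our own):

```python
import numpy as np

def two_line_hinge(v):
    """RoBiC-style hinge (sketch): for each split i, fit one least-squares
    line to v[0:i] and another to v[i:], and pick the split minimising the
    summed squared residuals. Returns k, the size of the selected prefix.
    In RoBiC, v would be the sorted vector alpha_s or beta_s.
    """
    n = len(v)
    x = np.arange(n)

    def sse(xs, ys):
        # Sum of squared residuals of the best-fit line.
        if len(ys) < 2:
            return 0.0
        coef = np.polyfit(xs, ys, 1)
        resid = ys - np.polyval(coef, xs)
        return float(resid @ resid)

    errs = [sse(x[:i], v[:i]) + sse(x[i:], v[i:]) for i in range(2, n - 1)]
    return int(np.argmin(errs)) + 2  # k = argmin_i err(i)

# Two clearly different slopes joined at index 5.
v = np.array([10.0, 9.0, 8.0, 7.0, 6.0, 0.5, 0.4, 0.3, 0.2, 0.1])
print(two_line_hinge(v))  # -> 5
```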

See the bottom row of Figure 0.

Figure 0: [Top] Sorted α(s) (resp., β(s), β(s)) values, corresponding to genes (resp., patients, patients), with the "best-fit two lines" superimposed.
[Bottom] Error values for each position. All figures are from the first bicluster; the left two columns are from the Breast Cancer P1 dataset, while the far right is from the Lung Cancer D3 dataset.

This page considers a number of variants of this basic hinge function.
Table 1 provides the prediction accuracy of the different models on the breast cancer data.
We considered three classes of hinge functions: the two models in Class A are based on partial matrices, the two models in Class B are based on simple statistical measures, and the five models in Class C all use the component values of the actual eigenvectors.
In all models, we sort the matrix M according to the values in the α(s) and β(s) vectors.

• Model A1: In this model, we use the original α and β from matrix M. We increase i from 1 to n for genes (500 at a time, because of the large number of genes in the microarray data) and j from 1 to p for patients, to obtain the partial matrix Mpart and the partial vectors αpart and βpart. For each (i, j) we calculate the residual error between Mpart and s·αpart·βpartᵀ. This produces a residual matrix:

 E(i,j) = M(1..i, 1..j) − s·α(1..i)·β(1..j)ᵀ.
We used three different residual-error calculations:
• (a): the largest eigenvalue of the matrix E(i,j) (this is based on the Matlab norm, LEV);
• (b): the sum of the squares of the elements of the residual matrix E;
• (c): the second-largest eigenvalue of the matrix E.
The goal was to find some i and j such that the residual error for the i × j sub-matrix is much smaller than for the next sub-matrix ((i+1) × j or i × (j+1)).
Figure 1 shows the residual error for the breast cancer data using Model A1(a). It is clear from the figure that we cannot find such i and j easily.
• Model A2: Similar to Model A1, we increase i from 1 to n for genes (500 at a time) and j from 1 to p, to obtain the partial matrix Mpart. But at each step we calculate the best rank-one approximation to this matrix, i.e. αpart and βpart for the partial matrix, and then calculate the residual error. Figure 2 shows the residual error for the breast cancer data using Model A2(a). We could not find such i and j from this figure either.
• We also considered variations of Models A1 and A2 where, instead of looking at the largest eigenvalue of the residual matrix, we looked at its second-largest eigenvalue (divided by the size of the matrix). Figure 3 shows the residual error for the breast cancer data using Model A1(c).
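The A1-style scan over partial matrices can be sketched as follows, using error measure (b), the sum of squared residual elements (for measure (a), the spectral norm, one would use `np.linalg.norm(E, 2)` instead). The step size and data here are illustrative only, and the singular value s is written explicitly:

```python
import numpy as np

def a1_residual_grid(M, alpha, s, beta, gene_step=2):
    """Model A1-style scan (sketch): for growing prefixes (i, j) of the
    sorted matrix, form the residual E = M[:i,:j] - s*alpha[:i]*beta[:j]^T
    and record one error value per sub-matrix, here the sum of squared
    elements (error (b)). gene_step mimics stepping 500 genes at a time
    on real microarray data; 2 keeps this toy example small.
    """
    n, p = M.shape
    errs = {}
    for i in range(gene_step, n + 1, gene_step):
        for j in range(1, p + 1):
            E = M[:i, :j] - s * np.outer(alpha[:i], beta[:j])
            errs[(i, j)] = float(np.sum(E ** 2))
    return errs

# On exactly rank-one data, every sub-matrix residual vanishes.
rng = np.random.default_rng(0)
M = np.outer(rng.normal(size=6), rng.normal(size=4))
U, S, Vt = np.linalg.svd(M, full_matrices=False)
errs = a1_residual_grid(M, U[:, 0], S[0], Vt[0], gene_step=2)
print(max(errs.values()) < 1e-12)
```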

In the next two sets of models, instead of looking at partial matrices, we examined the components of the vectors to find the biclusters. These approaches are similar to the algorithm in the paper, but with different hinge functions for finding the biclusters.
• Model B1: Find the separation points for genes (resp., patients) by first finding the average of the top 3 values in α(s) (resp., β(s)), then choosing the genes (resp., patients) whose values are higher than half of this average.
• Model B2: Find the separation point for genes, i, by first finding the mean and standard deviation (μ and σ) of the second half of the genes in α(s), then choosing the genes whose values are more than μ + 10σ. Find the separation point for patients, j, such that
 ‖M(1..i, j) − s·α(1..i)·β(j)‖
is minimized.
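Model B1's threshold rule is simple enough to sketch directly; the function name and data below are illustrative, not taken from the original system:

```python
import numpy as np

def b1_select(v_sorted):
    """Model B1 (sketch): average the top 3 values of the sorted vector
    and keep every entry exceeding half of that average."""
    threshold = np.mean(v_sorted[:3]) / 2.0
    return np.flatnonzero(v_sorted > threshold)

# Sorted (descending) alpha_s with a clear gap after the first four genes:
# top-3 average is 8.5, so the threshold is 4.25.
alpha_s = np.array([9.0, 8.5, 8.0, 7.0, 1.0, 0.5, 0.2])
print(b1_select(alpha_s))  # -> [0 1 2 3]
```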

We can describe the hinge functions in Models C1-C5 using the function:
Hinge( PolyDegree, NumPoly, sign, patientFirst, avoidOutliers )
where
• PolyDegree ∈ {1, 2} is the degree of the polynomial used: 1 (for a line) or 2 (for a parabola).
• NumPoly ∈ {1, 2} is the number of polynomials used in the hinge function: 1 (a single line, over 1..i) or 2 (two lines, over 1..i and i+1..n).
• sign ∈ {+, −} is + if the hinge function uses the absolute values in the vectors, or − if it uses the original values,
• patientFirst ∈ {true, false} is true if the algorithm first finds the subset of patients and subtracts off their values before finding the subset of genes from the remainder; it is false if the algorithm finds the subsets of patients and genes at the same time. See Model C3 for details.
• avoidOutliers ∈ {true, false} is true if, instead of fitting the second polynomial over (i+1)..n, it considers only (i+1)..(i+10), to avoid any outliers at the end. See Model C4 for more details.
Note that the RoBiC system uses Hinge(1, 2, −, false, false). Here are descriptions of the other models:
• Model C1 = Hinge(1, 2, +, false, false) differs from RoBiC only by using the absolute values of the α and β vectors, i.e. |α(s)| and |β(s)|, when finding the best two lines that fit α(s) and β(s).
• Model C2 = Hinge(1, 1, ±, false, false) differs from RoBiC only by approximating the values in the patient and gene vectors with a single best-fit line. That is, in β(s) (the sorted patient vector), find the best-fitting line for the values β(s)1 to β(s)j, which has error
 ej = error(best line for (β(s)1, …, β(s)j)).
Then choose the separation point for patients, j, such that ej is minimized; the patients from index 1 to j are in the current bicluster. Find the genes' separation point, i, using the same procedure.
We also tried this approach using the absolute values in the vectors.
• Model C3 = Hinge(1, {1,2}, ±, true, false) finds the subset of patients first and then the subset of genes. We tried this approach using both the exact values and the absolute values in α(s) and β(s), and approximating the vectors with either two lines or a single line. Here are the details of this model:
It fits the values in the patient vector with a single best line (or two lines), using the exact values (or absolute values) in α(s) and β(s). This gives j, the separation point for patients, so the patients from index 1 to j are in the current bicluster. Subtract their values from the data matrix, then recompute the α vector from this remainder matrix and choose the subset of genes from this α, using the same function that found the subset of patients. Finally, subtract the values of the genes in the current bicluster from the matrix and repeat the process on the remainder.
• Model C4 = Hinge(1, 2, ±, false, true) differs from RoBiC only by finding the best two lines in α(s) from index 1 to i and from i+1 to i+10; similarly for β(s), it finds the best two lines from 1 to j and from j+1 to j+10. We hoped this would help avoid outliers.
• Model C5 = Hinge(2, 2, −, false, false) differs from RoBiC only by using the best two parabolas that fit the values in α(s) and β(s).
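The split-search part of this Hinge(PolyDegree, NumPoly, sign, patientFirst, avoidOutliers) family can be sketched as one parameterized function (patientFirst is omitted, since it changes the outer loop over the matrix rather than the split search itself); the function and variable names are our own, not from the original system:

```python
import numpy as np

def hinge(v_sorted, poly_degree=1, num_poly=2, use_abs=False,
          avoid_outliers=False):
    """Sketch of the Hinge family applied to one sorted vector.
    Returns k, the chosen prefix length. RoBiC itself corresponds to
    hinge(v, poly_degree=1, num_poly=2, use_abs=False,
          avoid_outliers=False)."""
    v = np.abs(v_sorted) if use_abs else np.asarray(v_sorted, float)
    n = len(v)
    x = np.arange(n)
    need = poly_degree + 1  # points needed for a unique polynomial fit

    def fit_err(xs, ys):
        # Sum of squared residuals of the best-fit polynomial.
        if len(ys) <= need:
            return 0.0
        coef = np.polyfit(xs, ys, poly_degree)
        resid = ys - np.polyval(coef, xs)
        return float(resid @ resid)

    best_k, best_err = None, np.inf
    for i in range(need, n - 1):
        err = fit_err(x[:i], v[:i])
        if num_poly == 2:
            # avoidOutliers: fit the second piece only to the next 10 points.
            hi = min(i + 10, n) if avoid_outliers else n
            err += fit_err(x[i:hi], v[i:hi])
        if err < best_err:
            best_k, best_err = i, err
    return best_k

# Two clearly different slopes joined at index 5.
v = np.array([10.0, 9.0, 8.0, 7.0, 6.0, 0.5, 0.4, 0.3, 0.2, 0.1])
print(hinge(v, poly_degree=1, num_poly=2))  # -> 5
```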

Table 1 shows the prediction accuracy for each model on the breast cancer data, using an SVM classifier with 5-fold cross-validation. Models A1 and A2 are not presented in this table, because we could not find separation points for patients and genes. While Model C3(b) does best on this particular dataset, it was not the best on the others.
| Model | Hinge Parameters | Breast Cancer |
|-------|------------------|---------------|
| B1 | | 69.74% |
| B2 | | 53.94% |
| C1 | Hinge(1, 2, +, false, false) | 75.00% |
| C2(a) | Hinge(1, 1, -, false, false) | 86.84% |
| C2(b) | Hinge(1, 1, +, false, false) | 86.84% |
| C3(a) | Hinge(1, 1, -, true, false) | 67.11% |
| C3(b) | Hinge(1, 1, +, true, false) | 96.05% |
| C3(c) | Hinge(1, 2, -, true, false) | 69.73% |
| C3(d) | Hinge(1, 2, +, true, false) | 69.73% |
| C4(a) | Hinge(1, 2, -, false, true) | 77.63% |
| C4(b) | Hinge(1, 2, +, false, true) | 75.00% |
| C5 | Hinge(2, 2, -, false, false) | 78.95% |
| BiC(RoBiC, SVM) | Hinge(1, 2, -, false, false) | 90.79% ± 7.6 |
Table 1: Summary of the Results for Different Models.