Implementation and Timing

The RoBiC code, which finds the biclusters in M, is written in Matlab. A Java program then converts the resulting matrix R into Weka's ARFF format. Finally, we use Weka, a collection of machine-learning algorithms for data-mining tasks, to build the classifier on the transformed data and to evaluate the results.
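The conversion step can be pictured with a short Java sketch. It makes two assumptions: R has one row per sample and one column per bicluster, and each sample carries a nominal class label. The class `ArffSketch`, the method `toArff`, and the attribute names `bic0`, `bic1`, ... are hypothetical and are not taken from the original script. The sketch relies only on the standard ARFF layout (`@RELATION`, `@ATTRIBUTE`, `@DATA`).

```java
import java.util.Locale;

// Hypothetical sketch: write a samples-by-biclusters matrix R plus class labels as ARFF text.
class ArffSketch {

    /**
     * r[i][j] is assumed to be the value of bicluster j for sample i; labels[i] is the
     * class of sample i and must be one of classValues. Attribute names bic0, bic1, ...
     * are illustrative, not taken from the original code.
     */
    static String toArff(String relation, double[][] r, String[] labels, String[] classValues) {
        if (r.length != labels.length) {
            throw new IllegalArgumentException("one label per row of R is required");
        }
        int nCols = r.length == 0 ? 0 : r[0].length;
        StringBuilder sb = new StringBuilder();
        sb.append("@RELATION ").append(relation).append("\n\n");
        for (int j = 0; j < nCols; j++) {
            sb.append("@ATTRIBUTE bic").append(j).append(" NUMERIC\n");
        }
        // The class is a nominal attribute listing every allowed label.
        sb.append("@ATTRIBUTE class {").append(String.join(",", classValues)).append("}\n\n");
        sb.append("@DATA\n");
        for (int i = 0; i < r.length; i++) {
            for (int j = 0; j < nCols; j++) {
                // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
                sb.append(String.format(Locale.ROOT, "%.6f", r[i][j])).append(',');
            }
            sb.append(labels[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] r = {{0.12, -1.5}, {2.0, 0.3}}; // toy values, not real data
        String[] classes = {"yes", "no"};
        System.out.print(toArff("biclusters", r, new String[] {"yes", "no"}, classes));
    }
}
```

Writing the returned string to a file with an `.arff` extension gives Weka a dataset with one numeric attribute per bicluster and a nominal class attribute.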

We ran our BiC(RoBiC, ...) system on a Pentium 4 machine with 1.3 GB of memory. Computing SVD(M,1) is relatively fast, taking only a few seconds in Matlab. The slowest step is computing the minimum-error fit of two lines to the genes' vector, which is needed to find the subset of genes in each bicluster. Altogether, finding each bicluster in a data matrix with about 100 samples and 20,000 genes takes approximately 1 minute. The subsequent Weka computation took about 30 minutes. Most of that time went to feature selection, because selection is performed ``in-fold'', that is, repeated within each cross-validation fold.
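We read SVD(M,1) as the leading singular triple (sigma, u, v) of M, that is, its best rank-one approximation sigma * u * v^T. That reading is our assumption. Matlab computes this triple directly, but the idea can be sketched in Java with power iteration. Power iteration is a swapped-in technique and not necessarily what Matlab uses internally. The names `RankOneSvd`, `maxIter`, and `tol` are hypothetical.

```java
import java.util.Arrays;

// Sketch: leading singular triple of M by power iteration (an illustrative stand-in for Matlab's routine).
class RankOneSvd {
    final double sigma; // largest singular value
    final double[] u;   // left singular vector, one entry per row of M
    final double[] v;   // right singular vector, one entry per column of M

    private RankOneSvd(double sigma, double[] u, double[] v) {
        this.sigma = sigma;
        this.u = u;
        this.v = v;
    }

    static RankOneSvd compute(double[][] m, int maxIter, double tol) {
        int cols = m[0].length;
        // Uniform start vector; this fails only if it is orthogonal to the true v.
        double[] v = new double[cols];
        Arrays.fill(v, 1.0 / Math.sqrt(cols));
        double sigma = 0.0;
        for (int it = 0; it < maxIter; it++) {
            double[] u = multiply(m, v);              // u = M v
            if (normalize(u) == 0.0) break;           // M v = 0: nothing more to learn
            double[] next = multiplyTransposed(m, u); // next = M^T u
            double newSigma = normalize(next);
            if (newSigma == 0.0) break;
            v = next;
            boolean converged = Math.abs(newSigma - sigma) <= tol * newSigma;
            sigma = newSigma;
            if (converged) break;
        }
        // Recompute u and sigma from the final v so the returned triple is consistent.
        double[] u = multiply(m, v);
        sigma = normalize(u);
        return new RankOneSvd(sigma, u, v);
    }

    private static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) out[i] += m[i][j] * v[j];
        }
        return out;
    }

    private static double[] multiplyTransposed(double[][] m, double[] u) {
        double[] out = new double[m[0].length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < out.length; j++) out[j] += m[i][j] * u[i];
        }
        return out;
    }

    /** Scales x to unit length in place and returns its original norm (a zero vector is left unchanged). */
    private static double normalize(double[] x) {
        double s = 0.0;
        for (double xi : x) s += xi * xi;
        double norm = Math.sqrt(s);
        if (norm > 0.0) {
            for (int i = 0; i < x.length; i++) x[i] /= norm;
        }
        return norm;
    }

    public static void main(String[] args) {
        double[][] m = {{3, 0, 4}, {6, 0, 8}}; // rank-one toy matrix
        RankOneSvd s = compute(m, 100, 1e-12);
        System.out.println("sigma = " + s.sigma + ", v = " + Arrays.toString(s.v));
    }
}
```

Each iteration costs two matrix-vector products. For a matrix of the size mentioned above, that is about 2 × 100 × 20,000 = 4 × 10^6 multiply-adds per iteration, which fits with this step taking only seconds.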
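The two-line step is described only briefly here, so the following Java sketch fixes one plausible interpretation. It rests on these assumptions:

- The entries of the genes' vector are sorted in decreasing order and plotted against their rank.
- For every breakpoint k, one least-squares line is fit to the first k values and an independent one to the rest.
- The breakpoint with the smallest total squared error wins.

How that breakpoint then determines the bicluster's gene subset is not shown. `TwoLineFit`, `bestBreakpoint`, and `sortedDescending` are hypothetical names.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a minimum-error two-line fit to a sorted vector.
 * Assumptions (not confirmed by the text): values sorted in decreasing order,
 * x = rank (0, 1, 2, ...), an independent least-squares line on each side of
 * the breakpoint, and error = total sum of squared residuals.
 */
class TwoLineFit {
    private static final int X = 0, Y = 1, XX = 2, XY = 3, YY = 4;

    /** Returns a copy of v sorted from largest to smallest. */
    static double[] sortedDescending(double[] v) {
        double[] s = v.clone();
        Arrays.sort(s);
        for (int i = 0, j = s.length - 1; i < j; i++, j--) {
            double t = s[i];
            s[i] = s[j];
            s[j] = t;
        }
        return s;
    }

    /** Returns k minimising SSE(y[0..k-1]) + SSE(y[k..n-1]); each segment keeps at least 2 points. */
    static int bestBreakpoint(double[] y) {
        int n = y.length;
        if (n < 4) throw new IllegalArgumentException("need at least 4 values");
        // p[s][t] = sum of statistic s over indices 0..t-1, so any segment sum is O(1).
        double[][] p = new double[5][n + 1];
        for (int i = 0; i < n; i++) {
            double x = i;
            p[X][i + 1] = p[X][i] + x;
            p[Y][i + 1] = p[Y][i] + y[i];
            p[XX][i + 1] = p[XX][i] + x * x;
            p[XY][i + 1] = p[XY][i] + x * y[i];
            p[YY][i + 1] = p[YY][i] + y[i] * y[i];
        }
        int best = -1;
        double bestErr = Double.POSITIVE_INFINITY;
        for (int k = 2; k <= n - 2; k++) {
            double err = segmentSse(p, 0, k) + segmentSse(p, k, n);
            if (err < bestErr) {
                bestErr = err;
                best = k;
            }
        }
        return best;
    }

    /** Residual sum of squares of the least-squares line through points lo..hi-1. */
    private static double segmentSse(double[][] p, int lo, int hi) {
        double cnt = hi - lo;
        double sx = p[X][hi] - p[X][lo];
        double sy = p[Y][hi] - p[Y][lo];
        double cxx = (p[XX][hi] - p[XX][lo]) - sx * sx / cnt; // centred sums
        double cxy = (p[XY][hi] - p[XY][lo]) - sx * sy / cnt;
        double cyy = (p[YY][hi] - p[YY][lo]) - sy * sy / cnt;
        double sse = cxx > 0 ? cyy - cxy * cxy / cxx : cyy;
        return Math.max(sse, 0.0); // clamp tiny negative round-off
    }

    public static void main(String[] args) {
        double[] y = sortedDescending(new double[] {1, 10, 1, 6, 2, 1, 8, 4, 1, 1});
        System.out.println("breakpoint k = " + bestBreakpoint(y));
    }
}
```

Prefix sums of x, y, x², xy, and y² let each candidate breakpoint be scored in constant time, so the scan is linear after an O(n log n) sort. Refitting both lines from scratch at every breakpoint would instead cost on the order of n² ≈ 4 × 10^8 point visits for 20,000 genes. The text does not say which variant the Matlab code used.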
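``In-fold'' feature selection means that features are chosen anew inside every cross-validation fold, using only that fold's training samples, so the held-out samples never influence which features are kept. The Java sketch below shows why this multiplies the selection cost by the number of folds k. Several parts of it are illustrative assumptions, not the Weka evaluators or classifiers used in the experiments:

- the scoring rule (absolute difference of class means);
- the nearest-centroid classifier;
- the fold assignment (sample i held out in fold i mod k);
- all names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of in-fold feature selection in k-fold cross-validation.
 * Features are re-ranked on each fold's training rows only, so selection runs k times.
 * The score (|difference of class means|) and the nearest-centroid classifier are
 * illustrative stand-ins, not the Weka components used in the experiments.
 */
class InFoldCv {

    /** Cross-validated accuracy; y holds 0/1 labels and sample i is tested in fold i % k. */
    static double accuracy(double[][] x, int[] y, int k, int nSelect) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int i = 0; i < x.length; i++) {
                if (i % k == fold) test.add(i); else train.add(i);
            }
            int[] feats = topFeatures(x, y, train, nSelect);      // selection sees training rows only
            double[][] centroid = classMeans(x, y, train, feats); // "train" the classifier
            for (int i : test) {
                if (nearestCentroid(x[i], feats, centroid) == y[i]) correct++;
            }
        }
        return (double) correct / x.length;
    }

    /** Indices of the nSelect features whose class means differ most on the given rows. */
    static int[] topFeatures(double[][] x, int[] y, List<Integer> rows, int nSelect) {
        int d = x[0].length;
        int[] all = new int[d];
        Integer[] order = new Integer[d];
        for (int j = 0; j < d; j++) {
            all[j] = j;
            order[j] = j;
        }
        double[][] means = classMeans(x, y, rows, all);
        // Sort feature indices by decreasing |mean of class 0 - mean of class 1|.
        Arrays.sort(order, (a, b) -> Double.compare(
                Math.abs(means[0][b] - means[1][b]), Math.abs(means[0][a] - means[1][a])));
        int[] chosen = new int[Math.min(nSelect, d)];
        for (int j = 0; j < chosen.length; j++) chosen[j] = order[j];
        return chosen;
    }

    /** means[c][t] = mean of feature feats[t] over the given rows of class c (both classes must occur). */
    static double[][] classMeans(double[][] x, int[] y, List<Integer> rows, int[] feats) {
        double[][] sum = new double[2][feats.length];
        int[] count = new int[2];
        for (int i : rows) {
            count[y[i]]++;
            for (int t = 0; t < feats.length; t++) sum[y[i]][t] += x[i][feats[t]];
        }
        for (int c = 0; c < 2; c++) {
            for (int t = 0; t < feats.length; t++) sum[c][t] /= count[c];
        }
        return sum;
    }

    /** Predicts the class whose centroid is closest (squared Euclidean) on the selected features. */
    static int nearestCentroid(double[] row, int[] feats, double[][] centroid) {
        double[] dist = new double[2];
        for (int c = 0; c < 2; c++) {
            for (int t = 0; t < feats.length; t++) {
                double diff = row[feats[t]] - centroid[c][t];
                dist[c] += diff * diff;
            }
        }
        return dist[1] < dist[0] ? 1 : 0;
    }

    public static void main(String[] args) {
        int n = 8;
        double[][] x = new double[n][2];
        int[] y = new int[n];
        for (int i = 0; i < n; i++) {
            y[i] = i % 2;
            x[i][0] = 10.0 * y[i]; // informative toy feature
            x[i][1] = (i / 2) % 2; // uninformative toy feature
        }
        System.out.println("accuracy = " + accuracy(x, y, 4, 1));
    }
}
```

With k = 10 folds, for example, the feature ranking runs ten times instead of once, on top of training and testing the classifier in each fold.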