Data Format for the RoBiC System
Our RoBiC system accepts data in the same format as the Plaid biclustering system: three files that share a common base name and have the extensions .row, .col and .dat, e.g. MyData.row, MyData.col and MyData.dat.
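As an illustration of the three-file convention, here is a minimal Python sketch that writes the files for a genes-by-samples matrix held in memory, applying the per-file rules listed below. The function name `write_robic` is hypothetical and is not part of RoBiC or Plaid. Two rules are assumptions of this sketch rather than statements of the format: class labels must contain no whitespace (so the first space or tab on a .col line unambiguously ends the class), and infinite values are rejected along with missing ones.

```python
import math


def write_robic(base, genes, samples, matrix):
    """Write <base>.row, <base>.col and <base>.dat (hypothetical helper).

    genes:   n gene names, one per line of <base>.row
    samples: p (class, sample_name) pairs, one per line of <base>.col
    matrix:  n rows of p numbers each, one row per line of <base>.dat
    """
    n, p = len(genes), len(samples)
    if len(matrix) != n or any(len(row) != p for row in matrix):
        raise ValueError(f"data must be an {n} x {p} matrix")

    col_lines = []
    for cls, name in samples:
        # Assumption: the class label has no whitespace, because the first
        # space or tab on a .col line separates the class from the sample name.
        if not cls or any(ch.isspace() for ch in cls):
            raise ValueError(f"class label {cls!r} must be non-empty, without whitespace")
        col_lines.append(f"{cls}\t{name}")

    # Names may contain spaces or tabs, but not line breaks,
    # and each line is limited to 999 characters.
    for line in list(genes) + col_lines:
        if "\n" in line or "\r" in line or len(line) > 999:
            raise ValueError(f"invalid name line: {line[:40]!r}")

    dat_lines = []
    for row in matrix:
        values = [float(x) for x in row]
        # Missing values (NaN) are not allowed; rejecting infinities as well
        # is an assumption of this sketch.
        if not all(math.isfinite(v) for v in values):
            raise ValueError("missing or non-finite value in data matrix")
        # repr() gives the shortest string that round-trips the float exactly;
        # numbers are separated by tabs, never commas.
        dat_lines.append("\t".join(repr(v) for v in values))

    for ext, lines in ((".row", genes), (".col", col_lines), (".dat", dat_lines)):
        with open(base + ext, "w", encoding="utf-8", newline="\n") as f:
            f.writelines(line + "\n" for line in lines)
```

For example, `write_robic("MyData", ["gene A", "gene B"], [("tumor", "s1"), ("normal", "s2")], [[1.0, 2.5], [-0.5, 3.0]])` produces MyData.row, MyData.col and MyData.dat in the current directory.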
- The file MyData.row contains the names of the genes to be analyzed, one gene per line. Gene names may contain spaces or tabs and can be up to 999 characters long.
- The file MyData.col contains both the class and the name of each sample, one sample per line. Each line should contain first the class of the sample, followed by a space or tab, and then the name of the sample itself. Each line can be up to 999 characters long.
The biclustering part of RoBiC does not perform any computation with the gene (*.row) or sample (*.col) names: the class names are used only in the subsequent classification-learning part, and the gene and sample names are used only to label the output. Two or more gene (or sample) names may be identical.
- The file MyData.dat should contain an n by p matrix of data values, where n is the number of gene names (lines in MyData.row) and p is the number of sample names (lines in MyData.col). The file must contain exactly n rows of data, one row per line, and each row must contain exactly p numbers separated by spaces or tabs (not commas). Missing values are not allowed.
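The rules in the list above can be checked before a data set is submitted. The following Python sketch reads the three files and verifies that they agree with one another; it is a hypothetical checker (`read_robic` and `_read_lines` are illustrative names), not RoBiC's own parser. It assumes UTF-8 text, ignores trailing blank lines (an assumption, since editors often leave one), splits each .col line at its first run of spaces or tabs, and does not test names for uniqueness because duplicates are permitted.

```python
import math


def _read_lines(path):
    """Return a file's lines without line endings, enforcing the 999-character limit."""
    with open(path, encoding="utf-8") as f:
        lines = [ln.rstrip("\r\n") for ln in f]
    # Assumption: trailing blank lines (e.g. left by an editor) are not data.
    while lines and not lines[-1].strip():
        lines.pop()
    for i, ln in enumerate(lines, 1):
        if len(ln) > 999:
            raise ValueError(f"{path}:{i}: line longer than 999 characters")
    return lines


def read_robic(base):
    """Read <base>.row, <base>.col and <base>.dat and check them against each other."""
    genes = _read_lines(base + ".row")  # duplicate names are allowed: no uniqueness check

    samples = []
    for i, ln in enumerate(_read_lines(base + ".col"), 1):
        # Split at the first run of spaces/tabs: the class comes first, then the name.
        parts = ln.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"{base}.col:{i}: expected '<class> <sample name>'")
        samples.append((parts[0], parts[1]))

    n, p = len(genes), len(samples)
    matrix = []
    for i, ln in enumerate(_read_lines(base + ".dat"), 1):
        if "," in ln:
            raise ValueError(f"{base}.dat:{i}: use spaces or tabs between numbers, not commas")
        fields = ln.split()
        if len(fields) != p:
            raise ValueError(f"{base}.dat:{i}: expected {p} numbers, found {len(fields)}")
        try:
            row = [float(x) for x in fields]
        except ValueError:
            raise ValueError(f"{base}.dat:{i}: non-numeric (possibly missing) value") from None
        if not all(math.isfinite(v) for v in row):
            raise ValueError(f"{base}.dat:{i}: missing or non-finite value")
        matrix.append(row)

    if len(matrix) != n:
        raise ValueError(f"{base}.dat has {len(matrix)} rows but {base}.row lists {n} genes")
    return genes, samples, matrix
```

Calling `read_robic("MyData")` returns the gene names, the (class, sample name) pairs and the n by p matrix, or raises `ValueError` naming the first offending file and line.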
If necessary, we will also accept data in other formats. For example, we have already written code to translate from the .arff (Weka) format.
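To show what such a translation involves, here is a minimal Python sketch of a converter for a simple ARFF file. It is not the RoBiC authors' translator, and `arff_to_robic` is a hypothetical name. It makes several assumptions about the ARFF layout, each noted in the code: every instance is one sample, every attribute except the last is a numeric gene, the last attribute is the nominal class, the data section is dense and comma-separated without quoting, and attribute names contain no spaces. Because ARFF instances have no names, placeholder sample names (`sample1`, `sample2`, ...) are generated; they are hypothetical labels, not part of either format.

```python
NUMERIC_TYPES = {"numeric", "real", "integer"}


def arff_to_robic(arff_text, base):
    """Convert a simple dense ARFF file (instances = samples) to <base>.row/.col/.dat."""
    attrs, rows, in_data = [], [], False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if in_data:
            # Assumption: dense rows, comma-separated, no quoted values.
            rows.append([v.strip() for v in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # Assumption: attribute names are unquoted and contain no spaces.
            _, name, typ = line.split(None, 2)
            attrs.append((name, typ.strip()))
        elif line.lower().startswith("@data"):
            in_data = True
        # @relation and any other header lines are ignored.

    if len(attrs) < 2:
        raise ValueError("expected at least one gene attribute and a class attribute")
    # Assumption: the last attribute is the class; all others are numeric genes.
    gene_attrs = attrs[:-1]
    for name, typ in gene_attrs:
        if typ.lower() not in NUMERIC_TYPES:
            raise ValueError(f"attribute {name!r} is not numeric ({typ})")
    genes = [name for name, _ in gene_attrs]

    samples, columns = [], []
    for j, values in enumerate(rows, 1):
        if len(values) != len(attrs):
            raise ValueError(f"data row {j} has {len(values)} values, expected {len(attrs)}")
        if "?" in values:
            raise ValueError(f"data row {j} has a missing value ('?'), which RoBiC does not allow")
        # ARFF instances carry no names, so hypothetical names sample1, sample2, ... are used.
        samples.append((values[-1], f"sample{j}"))
        columns.append([float(v) for v in values[:-1]])

    # Each ARFF instance is one sample, i.e. one column of the RoBiC matrix: transpose.
    matrix = [[col[i] for col in columns] for i in range(len(genes))]

    outputs = {
        ".row": genes,
        ".col": [f"{cls}\t{name}" for cls, name in samples],
        ".dat": ["\t".join(repr(v) for v in row) for row in matrix],
    }
    for ext, lines in outputs.items():
        with open(base + ext, "w", encoding="utf-8", newline="\n") as f:
            f.writelines(line + "\n" for line in lines)
    return genes, samples, matrix
```

The transpose is the key design point: Weka conventionally stores one sample per instance, whereas the .dat file stores one gene per line. A real converter would also need to handle quoted names, sparse rows and string or date attributes, which this sketch deliberately rejects or ignores.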