Clustered Gene Selection: Supplementary Materials

Department of Computing Science, University of Alberta

Guohui Lin's Homepage

Clustered Gene Selection (CGS): Supplementary Materials

References

- Z. Cai, R. Goebel, M. R. Salavatipour, and G.-H. Lin. Selecting Dissimilar Genes for Multi-Class Classification: an Application in Cancer Subtyping. BMC Bioinformatics. 8(2007): 206.
- Z. Cai, R. Goebel, M. Salavatipour, Y. Shi, L. Xu, and G.-H. Lin. Selecting Genes with Dissimilar Discrimination Strength for Sample Class Prediction. In Proceedings of the Fifth Asia-Pacific Bioinformatics Conference (APBC 2007). Hong Kong, 15-17 January 2007. Pages 81-90. (Acceptance rate: 35/104)
- Z. Cai, L. Xu, Y. Shi, M. R. Salavatipour, R. Goebel, and G.-H. Lin. Using Gene Clustering to Identify Discriminatory Genes with Higher Classification Accuracy. In Proceedings of IEEE The 6th Symposium on Bioinformatics and Bioengineering (IEEE BIBE 2006). Washington D.C., USA, October 16-18, 2006. Pages 235-242. (Acceptance rate: 33 full papers and 17 short papers out of 81 submissions.)

Release Notes

September 20, 2006: Here is the submitted APBC 2007 version. The final version is much shorter due to the page limit.
On March 23, 2006, the following supplementary materials were released:
- Eight microarray datasets (CAR, GLIOMA, LUNG, DLBCL, MLL, PROSTATE, LEU, SRBCT) in MATLAB format (zipped, 15.1MB). These are the datasets we used in the experiments. You might also want to check the original datasets.
- Results (zipped, 39KB) and plots (zipped, 170KB) of the 5-fold cross validation classification accuracies for Cho's, F-test, GS1, GS2, CGS-Cho's, CGS-F-test, CGS-GS1, and CGS-GS2 gene selection methods on all eight datasets, in which Pearson's correlation coefficient was used as the distance measure in the K-Means clustering algorithm.
- Results (zipped, 24KB) and plots (zipped, 156KB) of the leave-one-out cross validation classification accuracies for Cho's, F-test, GS1, GS2, CGS-Cho's, CGS-F-test, CGS-GS1, and CGS-GS2 gene selection methods on all eight datasets, in which Pearson's correlation coefficient was used as the distance measure in the K-Means clustering algorithm.
- Results (zipped, 38KB) and plots (zipped, 170KB) of the 5-fold cross validation classification accuracies for Cho's, F-test, GS1, GS2, CGS-Cho's, CGS-F-test, CGS-GS1, and CGS-GS2 gene selection methods on all the eight datasets, in which the Euclidean distance was used in the K-Means clustering algorithm.
- Results (zipped, 24KB) and plots (zipped, 156KB) of the leave-one-out cross validation classification accuracies for Cho's, F-test, GS1, GS2, CGS-Cho's, CGS-F-test, CGS-GS1, and CGS-GS2 gene selection methods on all eight datasets, in which the Euclidean distance was used in the K-Means clustering algorithm.
- Results and plots using different combinations of K (80, 90, 100, 110, 120, 130, 140, 150) and C (1, 2, 3, 4, 5) on the GLIOMA dataset (zipped, 17KB).
- Quality of the Euclidean distance and Pearson's correlation coefficient as distance measures, tested on the GLIOMA and CAR datasets (zipped, 11KB).
- Feature genes selected by the CGS-Cho's method, collected in 500 subsets of 80 genes in the 5-fold cross validation (zipped, 6KB).
- Results and plots of the CGS-based and non-CGS-based gene selection methods, combined with the KNN classifier and the SVM classifier, in 5-fold and leave-one-out (LOO) cross validations, on all eight datasets (zipped, 14KB).
- Appendix of the 24 common feature genes selected by CGS-Cho's, CGS-F-test, CGS-GS1, and CGS-GS2 (zipped, 67KB).
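
For readers comparing the Euclidean and Pearson-based results listed above, the following minimal Python sketch illustrates how the two distance measures can differ on the same expression profiles. It is only an illustration, not the code used in the experiments: the function names and the toy profiles are hypothetical, and converting the correlation coefficient r into a distance as 1 - r is an assumed (though common) convention that this page does not specify.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """Distance 1 - r, where r is Pearson's correlation coefficient.

    The 1 - r conversion is an assumption made for this sketch.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Hypothetical expression profiles of three genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same trend as gene_a, larger scale
gene_c = [4.0, 3.0, 2.0, 1.0]      # opposite trend to gene_a

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "euclidean:", round(euclidean_distance(gene_a, other), 3),
          "pearson:", round(pearson_distance(gene_a, other), 3))
```

In this toy example the Pearson-based distance treats gene_a and gene_b as near-identical because they share the same trend, while the Euclidean distance places gene_a closer to gene_c because their magnitudes are similar. This kind of disagreement is one reason a clustering step can produce different gene clusters under the two measures.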

Acknowledgments

This research is partially supported by AHFMR, AICML, CFI, iCORE, NSERC, and the University of Alberta.

Feedback

Please email ghlin[at]cs[dot]ualberta[dot]ca for any additional questions you might have.