Using cluster analysis to improve gene selection in the formation of discriminant rules for the prediction of disease outcomes
Identifying genes that are expressed at different levels across a known number of distinct disease phenotypes contributes significantly to constructing a discriminant rule (classifier) that predicts the class of origin of an unclassified tissue sample. Existing methods for detecting differentially expressed genes are based mainly on multiple hypothesis tests. Clustering-based approaches work either on gene-specific summary statistics or on reduced forms of gene-expression profiles. Progress on clustering-based approaches that work on full profiling data has been limited because of methodological barriers to assessing differential expression between tissue classes from the identified clusters of genes. In this paper, we adopt a clustering-based approach that works on full gene-expression profiles and draws inference on differential expression using weighted contrasts of mixed effects. Using a real published gene-expression data set, we show that the proposed approach can provide a list of marker genes that improves the prediction of disease outcomes. Comparisons with existing methods on simulated data are also provided.
Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013