On the Identification of Correlated Differential Features for Supervised Classification of High-Dimensional Data
MetadataShow full item record
Many real problems in supervised classification involve high-dimensional feature data measured for individuals of known origin from two or more classes. When the dimension of the feature vector is very large relative to the number of individuals, it presents formidable challenges to construct a discriminant rule (classifier) for assigning an unclassified individual to one of the known classes. One way to handle this high-dimensional problem is to identify highly relevant differential features for constructing a classifier. Here a new approach is considered, where a mixture model with random effects is used firstly to partition the features into clusters and then the relevance of each feature variable for differentiating the classes is formally tested and ranked using cluster-specific contrasts of mixed effects. Finally, a non-parametric clustering approach is adopted to identify networks of differential features that are highly correlated. The method is illustrated using a publicly available data set in cancer research for the discovery of correlated biomarkers relevant to the cancer diagnosis and prognosis.
Data Science: Innovative Developments in Data Analysis and Clustering
Medical and Health Sciences not elsewhere classified