Selection of Statistical Features Based on Mutual Information for Classification of Human Coding and Non-coding DNA Sequences
The classification of human gene sequences into exons and introns is an important but difficult problem. We study the discriminative power of 22 statistical features in terms of their mutual information (MI). By performing correlation analysis, we identify a set of features that have high MI values and are at the same time complementary in their information content. Using this set, which consists of the three SZ features, the AMI feature, and the first stop codon feature, we achieve classification accuracy as high as 92%.
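As an illustration of the MI criterion mentioned in the abstract, the following is a minimal sketch of how the mutual information between one discretized feature and the coding/non-coding label could be estimated from counts. The function name `mutual_information`, the binning of feature values into discrete symbols, and the plug-in (frequency-based) estimator are assumptions made for this example; the abstract does not state which estimator or discretization the authors used.

```python
import math
from collections import Counter


def mutual_information(feature_bins, labels):
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations.

    feature_bins: discretized feature values, one per sequence (assumed binning).
    labels: class labels, e.g. 1 for coding and 0 for non-coding.
    """
    if len(feature_bins) != len(labels):
        raise ValueError("feature_bins and labels must have the same length")
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))  # counts of (x, y) pairs
    count_x = Counter(feature_bins)             # marginal counts of x
    count_y = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += p_xy * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# A feature identical to a balanced binary label carries 1 bit of information.
informative = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# A feature independent of the label carries none.
uninformative = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A feature with higher estimated MI separates the two classes better on its own. Selecting complementary features additionally requires comparing the features with one another, which this sketch does not attempt.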
Proceedings of the 17th International Conference on Pattern Recognition
Copyright 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.