Effective Statistical Features for Coding and Non-coding DNA Sequence Classification for Yeast, C. elegans and Human
Author(s)
Liew, AWC
Wu, Y
Yan, H
Yang, M
Year published
2005
Abstract
This study performs a quantitative evaluation of the different coding features in terms of their information content for the classification of coding and non-coding regions for three species. Our study indicated that coding features that are effective for yeast or C. elegans are generally not very effective for human, which has a short average exon length. By performing a correlation analysis, we identified a subset of human coding features with high discriminative power, but complementary in their information content. For this subset, a classification accuracy of up to 90% was obtained using a simple kNN classifier.
Journal Title
International Journal of Bioinformatics Research and Applications
Volume
1
Issue
2
Copyright Statement
© 2005 Inderscience Publishers. Please refer to the journal's website for access to the definitive, published version.
Subject
Mathematical sciences
Biological sciences
Information and computing sciences