A Combination of Feature Extraction Methods with an Ensemble of Different Classifiers for Protein Structural Class Prediction Problem

Loading...
Thumbnail Image
File version

Accepted Manuscript (AM)

Author(s)
Dehzangi, Abdollah
Paliwal, Kuldip
Sharma, Alok
Dehzangi, Omid
Sattar, Abdul
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
2013
Size
File type(s)
Location
License
Abstract

Better understanding of structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on general tertiary structure of a protein which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it still remains as an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for 15 most promising attributes that are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers namely, Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM) we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.

Journal Title

IEEE/ACM transactions on computational biology and bioinformatics

Conference Title
Book Title
Edition
Volume

10

Issue

3

Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement

© 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Item Access Status
Note
Access the data
Related item(s)
Subject

Mathematical sciences

Biological sciences

Information and computing sciences

Persistent link to this record
Citation
Collections