Detecting proline and non-proline cis-isomers in protein structures from sequences using deep residual ensemble learning
File version
Author(s)
Hanson, Jack
Heffernan, Rhys
Paliwal, Kuldip
Yang, Yuedong
Zhou, Yaoqi
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
Size
File type(s)
Location
License
Abstract
It has been long established that cis conformations of amino acid residues play many biologically important roles despite their rare occurrence in protein structure. Because of this rarity, few methods have been developed for predicting cis isomers from protein sequences, most of which are based on outdated datasets and lack the means for independent testing. In this work, using a database of >10000 high-resolution protein structures, we update the statistics of cis isomers and develop a sequence-based prediction technique using an ensemble of residual convolutional and long short-term memory bidirectional recurrent neural networks that allow learning from the whole protein sequence. We show that ensembling eight neural network models yields maximum Matthews correlation coefficient values of approximately 0.35 for cis-Pro isomers and 0.1 for cis-nonPro residues. The method should be useful for prioritizing functionally important residues in cis isomers for experimental validations and improving the sampling of rare protein conformations for ab initio protein structure prediction.
Journal Title
Journal of Chemical Information and Modeling
Conference Title
Book Title
Edition
Volume
58
Issue
9
Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
Item Access Status
Note
Access the data
Related item(s)
Subject
Medicinal and biomolecular chemistry
Theoretical and computational chemistry
Theoretical and computational chemistry not elsewhere classified