Detecting proline and non-proline cis-isomers in protein structures from sequences using deep residual ensemble learning

Author(s)
Singh, Jaswinder
Hanson, Jack
Heffernan, Rhys
Paliwal, Kuldip
Yang, Yuedong
Zhou, Yaoqi
Date
2018
Abstract

It has long been established that cis conformations of amino acid residues play many biologically important roles despite their rare occurrence in protein structures. Because of this rarity, few methods have been developed for predicting cis isomers from protein sequences, and most of them are based on outdated datasets and lack the means for independent testing. In this work, using a database of >10000 high-resolution protein structures, we update the statistics of cis isomers and develop a sequence-based prediction technique using an ensemble of residual convolutional and long short-term memory bidirectional recurrent neural networks that allow learning from the whole protein sequence. We show that ensembling eight neural network models yields maximum Matthews correlation coefficient values of approximately 0.35 for cis-Pro isomers and 0.1 for cis-nonPro residues. The method should be useful for prioritizing functionally important residues in cis isomers for experimental validation and for improving the sampling of rare protein conformations in ab initio protein structure prediction.
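
For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal Python sketch of how per-residue probabilities from several models could be combined and scored with the Matthews correlation coefficient (MCC). It is an illustration only: the abstract does not state how the eight models are combined or how the maximum MCC is obtained, so the simple averaging in `ensemble_mean` and the threshold scan in `max_mcc` are assumptions, and all function names are hypothetical.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = cis, 0 = trans)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def ensemble_mean(prob_lists):
    """Average per-residue probabilities across models (assumed combination rule)."""
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def max_mcc(y_true, probs, thresholds):
    """Scan decision thresholds and return (best_mcc, best_threshold)."""
    best = (-1.0, None)
    for th in thresholds:
        y_pred = [1 if p >= th else 0 for p in probs]
        score = mcc(y_true, y_pred)
        if score > best[0]:
            best = (score, th)
    return best
```

MCC is a natural choice for a problem like this one because cis residues are rare: a predictor that labels every residue as trans would score high accuracy but an MCC of 0.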

Journal Title

Journal of Chemical Information and Modeling

Volume

58

Issue

9

Subject

Medicinal and biomolecular chemistry

Theoretical and computational chemistry

Theoretical and computational chemistry not elsewhere classified
