To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map
File version
Submitted Manuscript (SM)
Author(s)
Chen, S
Sun, Zhe
Lin, Lihua
Liu, Zifeng
Liu, Xun
Chong, Yutian
Lu, Yutong
Zhao, Huiying
Yang, Yuedong
Abstract
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent three-dimensional (3D) structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. The online server and the source code are available at http://biomed.nscc-gz.cn and https://github.com/biomed-AI/SPROF, respectively.
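The two quantities the abstract relies on, the 2D map of pairwise residue distances and the sequence recovery rate, can be illustrated with a minimal sketch. This is not the SPROF implementation: the function names `distance_map` and `sequence_recovery` are hypothetical, the use of one representative atom per residue (for example the C-alpha atom) is an assumption, and recovery is taken here simply as the fraction of positions where the predicted residue equals the native one.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of Euclidean distances between residues.

    coords: list of (x, y, z) tuples, one per residue. Which atom represents
    a residue is an assumption of this sketch, not taken from the abstract.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue matches the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(p == q for p, q in zip(predicted, native))
    return matches / len(native)


# Toy example: three residues with made-up coordinates.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dmap = distance_map(coords)
recovery = sequence_recovery("ACDE", "ACDF")
```

For the toy coordinates, the map is a 3 x 3 symmetric matrix with zeros on the diagonal, and three of the four positions in the toy sequences match. A model operating on such a map sees every residue pair at once, which is how distant-in-sequence but close-in-space contacts become visible to it.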
Journal Title
Journal of Chemical Information and Modeling
Volume
60
Issue
1
Rights Statement
This document is the Pre-Print: unedited Author’s version of a Published Work that appeared in final form in Journal of Chemical Information and Modeling, copyright 2020 American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see DOI
Subject
Medicinal and biomolecular chemistry
Theoretical and computational chemistry
Theory of computation
Science & Technology
Life Sciences & Biomedicine
Physical Sciences
Technology
Chemistry, Medicinal
Citation
Chen, S; Sun, Z; Lin, L; Liu, Z; Liu, X; Chong, Y; Lu, Y; Zhao, H; Yang, Y, To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map, Journal of Chemical Information and Modeling, 2020, 60 (1), pp. 391-399