SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model
File version
Version of Record (VoR)
Author(s)
Litfin, Thomas
Singh, Jaswinder
Paliwal, Kuldip
Zhou, Yaoqi
Abstract
Motivation: Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction, and many methods have been developed for this task. However, most of them rely on evolutionary information derived from protein sequences, which may not exist for many proteins because they lack naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor that takes the output of the pre-trained language model ESM-1b as input and combines it with a large training set and an ensemble of residual neural networks.
Results: The proposed method significantly improves on the single-sequence-based predictor SSCpred, with a 15% improvement in F1-score on the independent CASP14-FM test set. It also outperforms the evolutionary-profile-based methods trRosetta and SPOT-Contact, with 48.7% and 48.5% improvements in F1-score, respectively, on proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods and is useful for large-scale prediction.
Journal Title
Bioinformatics
Volume
38
Issue
7
Rights Statement
© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Subject
Biological sciences
Information and computing sciences
Mathematical sciences
Science & Technology
Life Sciences & Biomedicine
Technology
Physical Sciences
Biochemical Research Methods
Citation
Singh, J; Litfin, T; Singh, J; Paliwal, K; Zhou, Y, SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model, Bioinformatics, 2022, 38 (7), pp. 1888-1894