SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model

File version

Version of Record (VoR)

Author(s)
Singh, Jaspreet
Litfin, Thomas
Singh, Jaswinder
Paliwal, Kuldip
Zhou, Yaoqi
Date
2022
Abstract

Motivation: Accurate prediction of protein contact maps is essential for accurate prediction of protein structure and function. As a result, many methods have been developed for contact-map prediction. However, most of these methods rely on evolutionary information derived from protein sequences, which may not exist for many proteins because they lack naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor that takes the output of the pre-trained language model ESM-1b as input and combines it with a large training set and an ensemble of residual neural networks.

Results: The proposed method significantly improves on the single-sequence-based predictor SSCpred, with a 15% improvement in F1-score on the independent CASP14-FM test set. It also outperforms the evolutionary-profile-based methods trRosetta and SPOT-Contact, with 48.7% and 48.5% respective improvements in F1-score on proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, making it useful for large-scale prediction.
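For readers unfamiliar with the F1-score quoted in the abstract, the following is a minimal sketch of how a binary contact map can be derived from residue coordinates and how a predicted map can be scored against it. It is not the authors' evaluation code. The 8 Å distance cutoff, the `min_sep` sequence-separation parameter, and the function names `contact_map` and `f1_score` are assumptions made for illustration; the record does not state which definitions the paper uses.

```python
import math


def contact_map(coords, threshold=8.0, min_sep=1):
    """Return a symmetric boolean contact map.

    coords    : list of (x, y, z) tuples, one per residue.
    threshold : distance cutoff in angstroms (8.0 is an assumed value).
    min_sep   : minimum sequence separation |i - j| for a pair to count
                (assumed; min_sep=1 only excludes the diagonal).
    """
    n = len(coords)
    return [
        [
            abs(i - j) >= min_sep and math.dist(coords[i], coords[j]) < threshold
            for j in range(n)
        ]
        for i in range(n)
    ]


def f1_score(pred, true):
    """F1-score over the upper triangle (i < j) of two boolean maps."""
    n = len(true)
    tp = fp = fn = 0
    for i in range(n):
        for j in range(i + 1, n):
            if pred[i][j] and true[i][j]:
                tp += 1  # contact predicted and present
            elif pred[i][j]:
                fp += 1  # contact predicted but absent
            elif true[i][j]:
                fn += 1  # contact present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy example: four residues on a line, 5 angstroms apart.
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
    true = contact_map(coords)
    # A hypothetical prediction: copy the true map, then miss one contact
    # and add one spurious contact.
    pred = [row[:] for row in true]
    pred[0][1] = pred[1][0] = False
    pred[0][3] = pred[3][0] = True
    print(f1_score(pred, true))
```

F1 is the harmonic mean of precision and recall, so it penalizes both missed contacts and spurious ones. Only the upper triangle is scored here because the map is symmetric.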

Journal Title

Bioinformatics

Volume

38

Issue

7

Rights Statement

© The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Subject

Biological sciences

Information and computing sciences

Mathematical sciences

Science & Technology

Life Sciences & Biomedicine

Technology

Physical Sciences

Biochemical Research Methods

Citation

Singh, J; Litfin, T; Singh, J; Paliwal, K; Zhou, Y, SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model, Bioinformatics, 2022, 38 (7), pp. 1888-1894
