Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning

Author(s)
Yuan, Qianmu
Chen, Sheng
Wang, Yu
Zhao, Huiying
Yang, Yuedong
Date
2022
Abstract

More than one-third of the proteins in the Protein Data Bank contain metal ions. Correct identification of metal ion-binding residues is important for understanding protein functions and designing novel drugs. Because metal ions are small and highly versatile, it remains challenging to computationally predict their binding sites from protein sequence. Existing sequence-based methods have low accuracy because they lack structural information, and they are time-consuming because they rely on multiple sequence alignments. Here, we propose LMetalSite, an alignment-free sequence-based predictor for the binding sites of the four metal ions most frequently seen in BioLiP (Zn2+, Ca2+, Mg2+ and Mn2+). LMetalSite leverages a pretrained language model to rapidly generate informative sequence representations and employs a transformer to capture long-range dependencies. Multi-task learning is adopted to compensate for the scarcity of training data and to capture the intrinsic similarities between different metal ions. LMetalSite was shown to surpass state-of-the-art structure-based methods by more than 19.7, 14.4, 36.8 and 12.6% in area under the precision-recall curve (AUPR) on the four independent test sets, respectively. Further analyses indicated that the self-attention modules are effective at learning the structural contexts of residues from protein sequence. We provide the data sets, source code and trained models of LMetalSite at https://github.com/biomed-AI/LMetalSite.
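The abstract credits the transformer's self-attention with capturing long-range dependencies between residues. As a rough illustration only, the sketch below implements single-head scaled dot-product self-attention in plain Python; it is not LMetalSite's code, and the embeddings and projection matrices are made-up toy values standing in for language-model features and learned weights.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.

    x holds one embedding row per residue (L x d). Every residue scores every
    other residue, so sequence distance does not limit who can attend to whom.
    Returns (outputs, weights); weights[i][j] is the attention of residue i to j.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wk[0]))
    scores = [[sum(qa * kb for qa, kb in zip(q[i], k[j])) / scale for j in range(len(x))]
              for i in range(len(x))]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Toy example: residues 0 and 2 have identical (hypothetical) embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
```

In the toy run, residue 0 attends to residue 2 as strongly as to itself even though they are not adjacent, which is the kind of non-local context the abstract attributes to the self-attention modules.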
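The multi-task setup described in the abstract can be pictured as shared per-residue features feeding one output head per metal ion. The sketch below assumes linear logistic heads and that each training protein is labeled for a single ion; the names `IONS`, `predict` and `multitask_loss` are hypothetical and are not taken from the LMetalSite repository.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical ion labels; one binary output head per metal ion.
IONS = ("ZN", "CA", "MG", "MN")


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(features: List[List[float]], head: List[float]) -> List[float]:
    """Per-residue binding probabilities from a linear head (last weight = bias)."""
    *w, b = head
    return [sigmoid(sum(f * wi for f, wi in zip(feat, w)) + b) for feat in features]


def bce(probs: List[float], labels: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over residues, clipped to avoid log(0)."""
    return -sum(y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
                for p, y in zip(probs, labels)) / len(labels)


def multitask_loss(batch: List[Tuple[List[List[float]], List[int], str]],
                   heads: Dict[str, List[float]]) -> float:
    """Average loss over proteins, each scored only by the head of its own ion.

    In a full model, `features` would come from one shared encoder, so the
    losses of all four ions would also update that shared part.
    """
    return sum(bce(predict(feats, heads[ion]), labels)
               for feats, labels, ion in batch) / len(batch)


# Toy usage: two proteins with 2-dimensional residue features, zero-initialised heads.
heads = {ion: [0.0, 0.0, 0.0] for ion in IONS}
batch = [
    ([[0.2, 1.0], [0.5, -0.3]], [1, 0], "ZN"),
    ([[1.0, 0.1], [0.0, 0.4], [0.3, 0.3]], [0, 0, 1], "CA"),
]
loss = multitask_loss(batch, heads)
```

With all-zero heads every residue gets probability 0.5, so the loss equals ln 2. Sharing the encoder while keeping ion-specific heads is one common way to let data-poor ions benefit from the others, which matches the motivation stated in the abstract.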
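The reported gains are measured in AUPR. One widely used estimator is step-wise average precision: rank residues by predicted score, record the precision at each true binding residue, and average over all binding residues. The sketch below assumes that estimator; whether the authors used exactly this variant (rather than, say, trapezoidal interpolation) is not stated here.

```python
from typing import List


def average_precision(scores: List[float], labels: List[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Residues are ranked by predicted score (highest first); precision is
    taken at each binding residue and averaged over all binding residues.
    Ties keep their input order, a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPR is undefined without positive residues")
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos


# Toy usage: binding residues ranked 1st and 3rd.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Here the precisions at the two binding residues are 1 and 2/3, giving an average precision of 5/6. Because binding residues are rare, AUPR is more sensitive than ROC-based metrics to false positives near the top of the ranking.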

Journal Title

Briefings in Bioinformatics

Subject

Bioinformatics and computational biology

Science & Technology

Life Sciences & Biomedicine

Biochemical Research Methods

Mathematical & Computational Biology

Biochemistry & Molecular Biology

Citation

Yuan, Q; Chen, S; Wang, Y; Zhao, H; Yang, Y, Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning, Briefings in Bioinformatics, 2022, bbac444
