Fast and accurate protein intrinsic disorder prediction by using a pretrained language model

Author(s)
Song, Yidong
Yuan, Qianmu
Chen, Sheng
Chen, Ken
Zhou, Yaoqi
Yang, Yuedong
Date
2023
Abstract

Determining intrinsically disordered regions of proteins is essential for elucidating protein biological functions and the mechanisms of their associated diseases. As the gap between the number of experimentally determined protein structures and the number of protein sequences continues to grow exponentially, there is a need for an accurate and computationally efficient disorder predictor. However, current single-sequence-based methods have low accuracy, while evolutionary profile-based methods are computationally intensive. Here, we propose LMDisorder, a fast and accurate protein disorder predictor that uses embeddings generated by unsupervised pretrained language models as features. LMDisorder performs best among all single-sequence-based methods and is comparable to or better than another language-model-based technique on four independent test sets. Furthermore, LMDisorder performs as well as or better than the state-of-the-art profile-based technique SPOT-Disorder2. In addition, the high computational efficiency of LMDisorder enabled a proteome-scale analysis of the human proteome, showing that proteins with high predicted disorder content are associated with specific biological functions. The datasets, source code, and trained model are available at https://github.com/biomed-AI/LMDisorder.
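
The abstract refers to the "predicted disorder content" of a protein. As a minimal illustration, assuming a predictor returns one disorder score per residue in the range 0 to 1, the disorder content can be taken as the fraction of residues whose score reaches a cutoff. The function name, the example scores, and the 0.5 cutoff below are hypothetical and are not taken from the LMDisorder code or paper.

```python
from typing import Sequence


def disorder_content(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of residues predicted as disordered.

    `scores` holds one predicted disorder score per residue.
    `threshold` is an assumed cutoff, not a value from the paper.
    """
    if len(scores) == 0:
        raise ValueError("scores must contain at least one residue")
    # Count residues whose score is at or above the cutoff.
    disordered = sum(1 for s in scores if s >= threshold)
    return disordered / len(scores)


# Hypothetical per-residue scores for a six-residue sequence.
example_scores = [0.9, 0.8, 0.4, 0.1, 0.6, 0.2]
content = disorder_content(example_scores)
print(f"Disorder content: {content:.2f}")
```

In a proteome-scale setting, a value like this would be computed once per protein and then used to rank or group proteins before any functional analysis.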

Journal Title

Briefings in Bioinformatics

Volume

24

Issue

4

Subject

Biochemistry and cell biology

Bioinformatics and computational biology

Genetics

Science & Technology

Life Sciences & Biomedicine

Biochemical Research Methods

Mathematical & Computational Biology

Biochemistry & Molecular Biology

Citation

Song, Y; Yuan, Q; Chen, S; Chen, K; Zhou, Y; Yang, Y, Fast and accurate protein intrinsic disorder prediction by using a pretrained language model, Briefings in Bioinformatics, 2023, 24 (4)
