Fast and accurate protein intrinsic disorder prediction by using a pretrained language model
Author(s)
Song, Yidong
Yuan, Qianmu
Chen, Sheng
Chen, Ken
Zhou, Yaoqi
Yang, Yuedong
Abstract
Determining intrinsically disordered regions of proteins is essential for elucidating protein biological functions and the mechanisms of their associated diseases. As the gap between the number of experimentally determined protein structures and the number of protein sequences continues to grow exponentially, there is a need for an accurate and computationally efficient disorder predictor. However, current single-sequence-based methods have low accuracy, while evolutionary profile-based methods are computationally intensive. Here, we propose LMDisorder, a fast and accurate protein disorder predictor that uses embeddings generated by unsupervised pretrained language models as features. We show that LMDisorder performs best among all single-sequence-based methods and is comparable to or better than another language-model-based technique on four independent test sets. Furthermore, LMDisorder performs as well as or better than the state-of-the-art profile-based technique SPOT-Disorder2. In addition, the high computational efficiency of LMDisorder enabled proteome-scale analysis of human proteins, showing that proteins with high predicted disorder content are associated with specific biological functions. The datasets, the source code, and the trained model are available at https://github.com/biomed-AI/LMDisorder.
Journal Title
Briefings in Bioinformatics
Volume
24
Issue
4
Subject
Biochemistry and cell biology
Bioinformatics and computational biology
Genetics
Science & Technology
Life Sciences & Biomedicine
Biochemical Research Methods
Mathematical & Computational Biology
Biochemistry & Molecular Biology
Citation
Song, Y; Yuan, Q; Chen, S; Chen, K; Zhou, Y; Yang, Y, Fast and accurate protein intrinsic disorder prediction by using a pretrained language model, Briefings in Bioinformatics, 2023, 24 (4)