Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting
Author(s)
Chen, Jianwen
Lu, Yutong
Chen, Zhiguang
Xiao, Nong
Yang, Yuedong
Abstract
Accurately predicting the impact of point mutations on protein stability plays a crucial role in protein design and engineering. In this study, we propose a novel method, BoostDDG, that predicts stability changes upon point mutations from protein sequences using extreme gradient boosting. We extracted a comprehensive set of features from evolutionary information and predicted structures, and performed feature selection by sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. We found that 14 features from six groups yielded the highest Pearson correlation coefficient (PCC) of 0.535, consistent with the PCC of 0.540 obtained on an independent test set. Our method consistently outperformed other sequence-based methods on three precompiled test sets and on 7363 variants of two proteins (PTEN and TPMT). These results highlight BoostDDG as a powerful tool for predicting stability changes upon point mutations from protein sequences.
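The abstract names three ingredients: sequential forward selection, cross-validation in which homologous proteins are kept together, and PCC as the selection criterion. The sketch below shows how these could fit together in principle; it is not the authors' implementation. To stay dependency-free, a small k-nearest-neighbour regressor stands in for extreme gradient boosting, and the data, cluster labels, and all function names (`pearson`, `grouped_folds`, `cv_pcc`, `forward_select`, `make_toy_data`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def grouped_folds(groups, k):
    """Yield (train, test) index lists; a group never spans train and test."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    folds = [[] for _ in range(k)]
    # Place the largest groups first, each into the currently smallest fold.
    for g in sorted(members, key=lambda g: -len(members[g])):
        min(folds, key=len).extend(members[g])
    for f in range(k):
        if folds[f]:
            train = [i for j in range(k) if j != f for i in folds[j]]
            yield train, folds[f]


def knn_predict(X, y, train, test, features, k=5):
    """Stand-in regressor (NOT gradient boosting): mean target of k nearest rows."""
    preds = []
    for t in test:
        dists = sorted(
            (sum((X[t][f] - X[i][f]) ** 2 for f in features), i) for i in train
        )
        nearest = [y[i] for _, i in dists[:k]]
        preds.append(sum(nearest) / len(nearest))
    return preds


def cv_pcc(X, y, groups, features, n_folds=5):
    """PCC between targets and pooled out-of-fold predictions."""
    pred = [0.0] * len(y)
    for train, test in grouped_folds(groups, n_folds):
        for i, p in zip(test, knn_predict(X, y, train, test, features)):
            pred[i] = p
    return pearson(pred, y)


def forward_select(n_features, score_fn, min_gain=1e-3):
    """Greedily add the feature that raises the score most; stop when gain stalls."""
    selected, best = [], float("-inf")
    remaining = set(range(n_features))
    while remaining:
        scores = {f: score_fn(selected + [f]) for f in sorted(remaining)}
        f_best = max(scores, key=scores.get)
        if scores[f_best] - best < min_gain:
            break
        selected.append(f_best)
        best = scores[f_best]
        remaining.remove(f_best)
    return selected, best


def make_toy_data(seed=0, n=60, n_features=4):
    """Synthetic data: the target depends on features 0 and 2 only."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [2 * row[0] + row[2] + 0.1 * rng.gauss(0, 1) for row in X]
    groups = [i // 5 for i in range(n)]  # hypothetical homologue clusters
    return X, y, groups


if __name__ == "__main__":
    X, y, groups = make_toy_data()
    chosen, score = forward_select(4, lambda fs: cv_pcc(X, y, groups, fs))
    print("selected features:", chosen, "cross-validated PCC:", round(score, 3))
```

Splitting by cluster rather than by individual variant is what keeps close homologues of a test protein out of the training folds, which is the overfitting safeguard the abstract describes.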
Journal Title
Journal of Chemical Information and Modeling
Volume
60
Issue
4
Subject
Medicinal and biomolecular chemistry
Theoretical and computational chemistry
Science & Technology
Life Sciences & Biomedicine
Physical Sciences
Chemistry, Medicinal
Citation
Lv, X; Chen, J; Lu, Y; Chen, Z; Xiao, N; Yang, Y, Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting, Journal of Chemical Information and Modeling, 2020, 60 (4), pp. 2388-2395