Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting

Author(s)
Lv, Xuan
Chen, Jianwen
Lu, Yutong
Chen, Zhiguang
Xiao, Nong
Yang, Yuedong
Date
2020
Abstract

Accurately predicting the impact of point mutations on protein stability plays a crucial role in protein design and engineering. In this study, we propose a novel method (BoostDDG) that predicts stability changes upon point mutations from protein sequences using extreme gradient boosting. We extracted features comprehensively from evolutionary information and predicted structures, and performed feature selection by sequential forward selection. The features and parameters were optimized by homologue-based cross-validation to avoid overfitting. We found that 14 features from six groups led to the highest Pearson correlation coefficient (PCC) of 0.535, which is consistent with the 0.540 obtained on an independent test. Our method consistently outperformed other sequence-based methods on three precompiled test sets and on 7363 variants of two proteins (PTEN and TPMT). These results show that BoostDDG is a powerful tool for predicting stability changes upon point mutations from protein sequences.
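
The selection procedure named in the abstract (sequential forward selection, scored by PCC under homologue-based cross-validation) can be illustrated with a minimal sketch. This is not the BoostDDG implementation. All function names and the synthetic data are hypothetical. The homologue cluster labels are assumed to be given. A simple additive least-squares model stands in for extreme gradient boosting so that the example needs only the Python standard library.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def group_folds(groups, k):
    """Give every sample of a homologue cluster the same fold index, so
    related proteins never appear on both sides of a train/test split."""
    fold_of = {g: i % k for i, g in enumerate(sorted(set(groups)))}
    return [fold_of[g] for g in groups]

def additive_fit_predict(X_train, y_train, X_test, cols):
    """Stand-in model (ASSUMPTION, replaces gradient boosting):
    the sum of independent univariate least-squares fits."""
    n = len(y_train)
    my = sum(y_train) / n
    terms = []
    for c in cols:
        xs = [row[c] for row in X_train]
        mx = sum(xs) / n
        var = sum((v - mx) ** 2 for v in xs)
        cov = sum((v - mx) * (t - my) for v, t in zip(xs, y_train))
        terms.append((c, mx, cov / var if var > 0 else 0.0))
    return [my + sum(s * (row[c] - mx) for c, mx, s in terms) for row in X_test]

def cv_pcc(X, y, folds, cols, fit_predict=additive_fit_predict):
    """PCC between the targets and pooled out-of-fold predictions."""
    preds = [0.0] * len(y)
    for f in sorted(set(folds)):
        train = [i for i, g in enumerate(folds) if g != f]
        test = [i for i, g in enumerate(folds) if g == f]
        out = fit_predict([X[i] for i in train], [y[i] for i in train],
                          [X[i] for i in test], cols)
        for i, p in zip(test, out):
            preds[i] = p
    return pearson(preds, y)

def forward_select(X, y, folds, max_features):
    """Greedily add the feature that most improves the cross-validated PCC;
    stop when no candidate improves it or max_features is reached."""
    selected, best = [], -math.inf
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        score, col = max((cv_pcc(X, y, folds, selected + [c]), c)
                         for c in remaining)
        if score <= best:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best

# Synthetic demonstration data (hypothetical): the target depends on
# columns 0 and 2, while column 1 is pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [2 * r[0] - r[2] + 0.1 * rng.gauss(0, 1) for r in X]
groups = ["cluster%02d" % (i // 5) for i in range(60)]  # 12 clusters of 5
folds = group_folds(groups, k=5)
selected, best = forward_select(X, y, folds, max_features=3)
print("selected columns:", selected, "cross-validated PCC:", round(best, 3))
```

Splitting by cluster rather than by individual variant is the design choice that matters here: if homologous proteins fall on both sides of a split, the cross-validated PCC can overstate how well the selected features generalize. Any regressor with a fit-then-predict interface could be passed as `fit_predict` in place of the stand-in model.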

Journal Title

Journal of Chemical Information and Modeling

Volume

60

Issue

4

Subject

Medicinal and biomolecular chemistry

Theoretical and computational chemistry

Science & Technology

Life Sciences & Biomedicine

Physical Sciences

Chemistry, Medicinal

Citation

Lv, X; Chen, J; Lu, Y; Chen, Z; Xiao, N; Yang, Y, Accurately Predicting Mutation-Caused Stability Changes from Protein Sequences Using Extreme Gradient Boosting, Journal of Chemical Information and Modeling, 2020, 60 (4), pp. 2388-2395
