Structure-based prediction of protein-peptide binding regions using Random Forest
Embargoed until: 2018-09-01
MetadataShow full item record
Motivation: Protein–peptide interactions are one of the most important biological interactions and play crucial role in many diseases including cancer. Therefore, knowledge of these interactions provides invaluable insights into all cellular processes, functional mechanisms, and drug discovery. Protein–peptide interactions can be analyzed by studying the structures of protein–peptide complexes. However, only a small portion has known complex structures and experimental determination of protein–peptide interaction is costly and inefficient. Thus, predicting peptide-binding sites computationally will be useful to improve efficiency and cost effectiveness of experimental studies. Here, we established a machine learning method called SPRINT-Str (Structure-based prediction of protein–Peptide Residue-level Interaction) to use structural information for predicting protein–peptide binding residues. These predicted binding residues are then employed to infer the peptide-binding site by a clustering algorithm. Results: SPRINT-Str achieves robust and consistent results for prediction of protein–peptide binding regions in terms of residues and sites. Matthews’ Correlation Coefficient (MCC) for 10-fold cross validation and independent test set are 0.27 and 0.293, respectively, as well as 0.775 and 0.782, respectively for area under the curve. The prediction outperforms other state-of-the-art methods, including our previously developed sequence-based method. A further spatial neighbor clustering of predicted binding residues leads to prediction of binding sites at 20–116% higher coverage than the next best method at all precision levels in the test set. The application of SPRINT-Str to protein binding with DNA, RNA and carbohydrate confirms the method‘s capability of separating peptide-binding sites from other functional sites. More importantly, similar performance in prediction of binding residues and sites is obtained when experimentally determined structures are replaced by unbound structures or quality model structures built from homologs, indicating its wide applicability.
© 2017 Oxford University Press. This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Bioinformatics following peer review. The definitive publisher-authenticated version Structure-based prediction of protein– peptide binding regions using Random Forest, Bioinformatics, 1-8, 2017 is available online at: DOI:10.1093/bioinformatics/btx614
This publication has been entered into Griffith Research Online as an Advanced Online Version.
Information and Computing Sciences not elsewhere classified
Biological Sciences not elsewhere classified