SEGEM: A Fast and Accurate Automated Protein Backbone Structure Modeling Method for Cryo-EM
Author(s)
Chen, S
Zhang, S
Li, X
Liu, Y
Yang, Y
Location
Houston, TX, USA
Abstract
Cryo-electron microscopy (cryo-EM) has been widely used in protein structure determination, but automatically building an accurate protein backbone structure from a cryo-EM density map remains a challenge. A typical pipeline for automatically building a structure model from a cryo-EM map first predicts Cα sites and then assigns them to the protein sequence, a combinatorial optimization task of extremely high computational complexity. Here we propose SEGEM, a fast and accurate automated protein backbone structure modeling method for cryo-EM. We employed 3D convolutional neural networks to predict Cα sites together with their amino acid types from cryo-EM maps, and developed a highly parallel pipeline to assign the Cα sites, with their predicted amino acid types, to the protein sequence. We tested SEGEM on three benchmark datasets, where it significantly outperformed several state-of-the-art prediction methods, including MAINMAST, C-CNN and DeepTracer. In the extended version of our method, SEGEM++, we combined SEGEM with the protein structure prediction algorithm AlphaFold2. SEGEM++ can identify whether AlphaFold2 has folded a good structure and rectify incorrectly folded regions through protein threading on the cryo-EM map. On our curated dataset of hard targets, where AlphaFold2-predicted structures obtained an average RMSD of 7.87 Å and a GDT-TS score of 0.652 when superimposed on the native structure, SEGEM++ achieved a significantly better average RMSD of 2.46 Å and an average GDT-TS score of 0.676. Furthermore, with our highly parallel pipeline on a 30-core CPU, both SEGEM and SEGEM++ finished structure modeling within 10 minutes on average on our test datasets, indicating their potential for high-throughput, automated, accurate backbone structure modeling for cryo-EM.
Conference Title
Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021
Subject
Clinical sciences
Proteomics and metabolomics
Citation
Chen, S; Zhang, S; Li, X; Liu, Y; Yang, Y, SEGEM: A Fast and Accurate Automated Protein Backbone Structure Modeling Method for Cryo-EM, Proceedings - 2021 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2021, 2021, pp. 24-31