SPOT-1D-Single: Improving the Single-Sequence-Based Prediction of Protein Secondary Structure, Backbone Angles, Solvent Accessibility and Half-Sphere Exposures using a Large Training Set and Ensembled Deep Learning

File version

Accepted Manuscript (AM)

Author(s)
Singh, Jaspreet
Litfin, Thomas
Paliwal, Kuldip
Singh, Jaswinder
Hanumanthappa, Anil Kumar
Zhou, Yaoqi
Date
2021
Abstract

MOTIVATION: Knowledge of protein secondary structure and other one-dimensional structural properties is essential for accurate protein structure and function prediction. As a result, many methods have been developed for predicting these one-dimensional structural properties. However, most methods rely on evolutionary information, which may not exist for many proteins due to a lack of sequence homologs. Moreover, obtaining evolutionary information is computationally intensive because the library of protein sequences continues to expand exponentially. Here we developed a new single-sequence method called SPOT-1D-Single, based on a large training dataset of 39120 proteins deposited prior to 2016 and an ensemble of hybrid long short-term memory (LSTM) bidirectional neural network and convolutional neural network models.

RESULTS: We showed that SPOT-1D-Single consistently improves over SPIDER3-Single and ProteinUnet in predicting secondary structure, solvent accessibility, contact number, and backbone angles on all seven independent test sets (TEST2018, SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, CASP12, and CASP13 free-modeling targets). For example, the accuracy of predicted three-state secondary structure ranges from 72.12-74.28% for SPOT-1D-Single, compared to 69.1-72.6% for SPIDER3-Single and 70.6-73% for ProteinUnet. SPOT-1D-Single also predicts three-state (SS3) and eight-state (SS8) secondary structure with 6.24% and 6.98% better accuracy, respectively, than SPOT-1D on SPOT-2018 proteins with no homologs (Neff=1). The new method's improvement over existing techniques is due to a larger training set combined with ensemble learning.

AVAILABILITY: A standalone version of SPOT-1D-Single is available at https://github.com/jas-preet/SPOT-1D-Single. Direct predictions can also be made at https://sparks-lab.org/server/spot-1d-single. The datasets used in this research can also be downloaded from GitHub.

Journal Title

Bioinformatics

Funder(s)

ARC

Grant identifier(s)

DP210101875

Rights Statement

© 2021 Oxford University Press. This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Bioinformatics following peer review. The definitive publisher-authenticated version SPOT-1D-Single: Improving the Single-Sequence-Based Prediction of Protein Secondary Structure, Backbone Angles, Solvent Accessibility and Half-Sphere Exposures using a Large Training Set and Ensembled Deep Learning, Bioinformatics, 2021 is available online at: https://doi.org/10.1093/bioinformatics/btab316.

Note

This publication has been entered as an advanced online version in Griffith Research Online.

Subject

Mathematical sciences

Biological sciences

Information and computing sciences

Citation

Singh, J; Litfin, T; Paliwal, K; Singh, J; Hanumanthappa, AK; Zhou, Y, SPOT-1D-Single: Improving the Single-Sequence-Based Prediction of Protein Secondary Structure, Backbone Angles, Solvent Accessibility and Half-Sphere Exposures using a Large Training Set and Ensembled Deep Learning, Bioinformatics, 2021
