Simplified deep learning models for protein backbone angle prediction
Primary Supervisor
Sattar, Abdul
Other Supervisors
Newton, Muhammad A
Abstract
Protein structure prediction (PSP) is crucial for biomedical and biological research as it allows protein structures to be predicted accurately based solely on their amino acid (AA) sequences. However, it remains a significant challenge in bioinformatics, particularly for drug design. While experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR), cryo-electron microscopy (cryo-EM), and small-angle X-ray scattering (SAXS) can determine protein structures, they are time-consuming, expensive, and often impractical. Recent advances in machine learning have significantly improved PSP. One notable development is the remarkable success of AlphaFold in the Critical Assessment of Structure Prediction (CASP) competition. However, most recent methods rely heavily on extensive computational resources and large amounts of memory. Interestingly, simpler methods have shown the potential to produce better results. This thesis aims to develop streamlined deep learning methods for PSP, with a specific focus on accurate backbone angle prediction (BAP). Backbone angles matter because protein folding is predominantly influenced by them. Throughout this work, we address various challenges associated with accurate BAP. Our findings illustrate that efficient deep learning approaches in PSP significantly enhance the accuracy of predicted protein structures. This thesis addresses four key challenges associated with protein backbone angle prediction. The first challenge involves exploring feature interactions and neural networks to strike a balance between correlated features and complex neural networks, which is essential for improving accuracy. The thesis introduces the "Simpler Angle Predictor (SAP)" approach, which uses simplified deep neural network (DNN) models to enhance the accuracy of protein BAP while ensuring interpretability and resilience against noise.
The second challenge is achieving generalisation within specific classes of training examples. Deep learning methods often sacrifice accuracy for generalisation. To address this, predicted secondary structure (SS) knowledge is explicitly integrated into BAP by training separate DNN models for each SS category. This approach limits generalisation to specific classes of training examples and uses specialised knowledge to compensate for the loss of generalisation. The method, called "Simpler Angle Predictor for SSs (SAP4SS)", builds upon SAP and shows improved performance. The third challenge is maximising the potential of informative features. Hidden Markov Model (HMM) profiles have been used in bioinformatics, including in BAP methods, to extract patterns from protein sequences. However, their full potential in BAP has not yet been realised. This thesis proposes a simple DNN model that effectively uses HMM profiles along with other features, surpassing the performance of previous methods. The fourth challenge is using ensemble models to maximise the predictive capabilities of the BAP model. Ensemble models, which combine different models and inputs, have been successful in various areas. This thesis introduces four ensemble models that use existing BAP methods as first-level predictors and employ simple fully connected neural networks (FCNNs) as second-level predictors. This ensemble approach outperforms previous BAP methods.
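The idea of training a separate model per predicted SS category can be illustrated with a minimal sketch of the prediction step. This is not the thesis's implementation: the function name `route_by_secondary_structure`, the three-state labels H, E and C, and the fixed-angle stand-in predictors are all assumptions made for illustration, and the angle values are placeholders rather than results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees
Predictor = Callable[[Sequence[float]], Angles]


def route_by_secondary_structure(
    features: Sequence[Sequence[float]],
    ss_labels: Sequence[str],
    models: Dict[str, Predictor],
) -> List[Angles]:
    """Predict (phi, psi) for each residue using the model of its predicted SS class."""
    if len(features) != len(ss_labels):
        raise ValueError("features and ss_labels must have the same length")
    # Each residue is handled only by the predictor assigned to its SS label.
    return [models[label](row) for row, label in zip(features, ss_labels)]


# Hypothetical stand-ins: real predictors would be trained DNNs, one per class.
toy_models: Dict[str, Predictor] = {
    "H": lambda row: (-60.0, -45.0),
    "E": lambda row: (-120.0, 130.0),
    "C": lambda row: (-70.0, 140.0),
}

toy_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # one feature row per residue
toy_labels = ["H", "E", "C"]  # predicted SS class per residue
predictions = route_by_secondary_structure(toy_features, toy_labels, toy_models)
```

Routing on the predicted label means each model only needs to fit residues of one SS class, which is the trade between generalisation and specialised knowledge that the abstract describes.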
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Info & Comm Tech
Rights Statement
The author owns the copyright in this thesis, unless stated otherwise.
Subject
Protein Structure Prediction (PSP)
protein backbone angle prediction
simplified deep learning models