Simplified deep learning models for protein backbone angle prediction

Files
Mataeimoghadam_Fereshteh_Final Thesis_Redacted.pdf
Embargoed until 2025-02-14
Primary Supervisor

Sattar, Abdul

Other Supervisors

Newton, Muhammad A

Date
2024-02-14
Abstract

Protein structure prediction (PSP) is crucial for biomedical and biological research because it aims to predict protein structures accurately from their amino acid (AA) sequences alone. However, PSP remains a significant challenge in bioinformatics, particularly for drug design. Experimental methods such as X-ray crystallography, Nuclear Magnetic Resonance (NMR), cryo-electron microscopy (cryo-EM), and small-angle X-ray scattering (SAXS) can determine protein structures, but they are time-consuming, expensive, and often impractical. Recent advances in machine learning have significantly improved PSP, most notably through the remarkable success of AlphaFold in the Critical Assessment of Structure Prediction (CASP) competition. However, most recent methods rely heavily on extensive computational resources and large amounts of memory. Interestingly, simpler methods have shown the potential to produce better results.

This thesis aims to develop streamlined deep learning methods for PSP, with a specific focus on accurate backbone angle prediction (BAP). Because backbone angles describe the conformation of the protein backbone, they predominantly influence how a protein folds. Our findings show that efficient deep learning approaches significantly improve the accuracy of protein structure prediction. The thesis addresses four key challenges in protein BAP.

The first challenge is to explore feature interactions and neural network designs so as to strike a balance between correlated features and complex neural networks, which is essential for improving accuracy. The thesis introduces the "Simpler Angle Predictor (SAP)" approach, which uses simplified deep neural network (DNN) models to improve the accuracy of protein BAP while remaining interpretable and resilient to noise.
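
The abstract does not describe SAP's architecture, inputs, or angle encoding. As a rough illustration of what a "simplified DNN" for BAP can look like, the sketch below trains a one-hidden-layer fully connected network with plain NumPy gradient descent on synthetic data. The feature count, hidden size, learning rate, synthetic target, and the (sin, cos) output encoding (a common choice in BAP work because angles wrap around at ±180°) are all assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def angle_to_sincos(deg):
    """Encode angles in degrees as (sin, cos) pairs so that -180 and 180 coincide."""
    rad = np.radians(deg)
    return np.column_stack([np.sin(rad), np.cos(rad)])

def sincos_to_angle(sc):
    """Decode (sin, cos) pairs back to angles in [-180, 180] degrees."""
    return np.degrees(np.arctan2(sc[:, 0], sc[:, 1]))

def angular_mae(pred_deg, true_deg):
    """Mean absolute error that respects the periodicity of angles."""
    d = np.abs(pred_deg - true_deg) % 360.0
    return float(np.mean(np.minimum(d, 360.0 - d)))

def init_fcnn(n_in, n_hidden, n_out, rng):
    """Hypothetical one-hidden-layer FCNN with scaled random weights."""
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(p, X):
    h = np.maximum(0.0, X @ p["W1"] + p["b1"])  # ReLU hidden layer
    return h, h @ p["W2"] + p["b2"]             # linear output layer

def mse(p, X, Y):
    return float(np.mean((forward(p, X)[1] - Y) ** 2))

def train_step(p, X, Y, lr):
    """One full-batch gradient-descent step on mean squared error."""
    h, out = forward(p, X)
    d_out = 2.0 * (out - Y) / Y.size          # gradient of the mean over all outputs
    d_h = (d_out @ p["W2"].T) * (h > 0)       # back through ReLU (uses W2 before update)
    p["W2"] -= lr * (h.T @ d_out)
    p["b2"] -= lr * d_out.sum(axis=0)
    p["W1"] -= lr * (X.T @ d_h)
    p["b1"] -= lr * d_h.sum(axis=0)

def demo(seed=0, steps=500, lr=0.05):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 10))                       # synthetic: 200 residues x 10 features
    true_deg = np.degrees(np.arctan2(X[:, 0], X[:, 1]))  # synthetic target angle
    Y = angle_to_sincos(true_deg)
    params = init_fcnn(X.shape[1], 16, 2, rng)
    before = mse(params, X, Y)
    for _ in range(steps):
        train_step(params, X, Y, lr)
    after = mse(params, X, Y)
    pred_deg = sincos_to_angle(forward(params, X)[1])
    return before, after, angular_mae(pred_deg, true_deg), pred_deg
```

Calling `demo()` returns the training loss before and after training, the angular MAE on the synthetic data, and the decoded angle predictions. A real predictor would use per-residue sequence features and evaluate on held-out proteins.
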
The second challenge is achieving generalisation within specific classes of training examples. Deep learning methods often sacrifice accuracy for generalisation. To address this, predicted secondary structure (SS) information is explicitly integrated into BAP by training a separate DNN model for each SS category. This restricts each model's generalisation to the training examples of one SS category and uses the specialised SS knowledge to compensate for the reduced generalisation. The resulting method, "Simpler Angle Predictor for SSs (SAP4SS)", builds on SAP and shows improved performance.

The third challenge is maximising the potential of informative features. Hidden Markov Model (HMM) profiles have been used in bioinformatics, including in BAP methods, to extract patterns from protein sequences, but their full potential in BAP has not yet been realised. This thesis proposes a simple DNN model that effectively uses HMM profiles together with other features and outperforms previous methods.

The fourth challenge is maximising the predictive capability of BAP through ensemble models. Ensemble models, which combine different models and inputs, have been successful in various areas. This thesis introduces four ensemble models that use existing BAP methods as first-level predictors and simple fully connected neural networks (FCNNs) as second-level predictors. This ensemble approach outperforms previous BAP methods.
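
The two-level ensemble described above can be sketched as stacking: the outputs of several first-level predictors become the input features of a second-level model. In this sketch the first-level predictors are hypothetical noisy copies of a synthetic true angle, and, to keep the example short and dependency-free, an ordinary least-squares linear model stands in for the simple FCNN that the thesis uses at the second level. Function names such as `fit_second_level` are hypothetical.

```python
import numpy as np

def wrap(deg):
    """Wrap angles into [-180, 180) degrees."""
    return (deg + 180.0) % 360.0 - 180.0

def encode(deg):
    """Encode angles in degrees as (sin, cos) pairs."""
    rad = np.radians(deg)
    return np.column_stack([np.sin(rad), np.cos(rad)])

def decode(sc):
    """Decode (sin, cos) pairs back to angles in degrees."""
    return np.degrees(np.arctan2(sc[:, 0], sc[:, 1]))

def stack_features(base_preds):
    """Concatenate every first-level prediction's (sin, cos) encoding plus a bias column."""
    n = len(base_preds[0])
    return np.hstack([encode(p) for p in base_preds] + [np.ones((n, 1))])

def fit_second_level(base_preds, true_deg):
    # Least-squares linear map: a stand-in for the FCNN second-level predictor.
    W, *_ = np.linalg.lstsq(stack_features(base_preds), encode(true_deg), rcond=None)
    return W

def demo(seed=0, n=500):
    rng = np.random.default_rng(seed)
    true_deg = rng.uniform(-180.0, 180.0, n)
    # Hypothetical first-level predictors: the true angle plus noise of different sizes.
    base_preds = [wrap(true_deg + rng.normal(0.0, s, n)) for s in (20.0, 30.0, 40.0)]
    W = fit_second_level(base_preds, true_deg)
    stacked_out = stack_features(base_preds) @ W   # raw second-level (sin, cos) output
    target = encode(true_deg)
    base_mse = [float(np.mean((encode(p) - target) ** 2)) for p in base_preds]
    stacked_mse = float(np.mean((stacked_out - target) ** 2))
    return base_mse, stacked_mse, decode(stacked_out)
```

Because each first-level (sin, cos) column is itself a feasible solution of the least-squares fit, the stacked training error cannot exceed that of the best single predictor. In a real pipeline the second level should be fitted on first-level predictions for held-out proteins, so that any gain is not simply overfitting.
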

Thesis Type

Thesis (PhD Doctorate)

Degree Program

Doctor of Philosophy (PhD)

School

School of Info & Comm Tech

Rights Statement

The author owns the copyright in this thesis, unless stated otherwise.

Subject

Protein Structure Prediction (PSP)

protein backbone angle prediction

simplified deep learning models
