Distance Maps for Prediction of Protein Structures and Protein-ligand Affinity

Files
Rahman_Julia_Final Thesis.pdf
Embargoed until 2025-07-24
Primary Supervisor

Sattar, Abdul

Other Supervisors

Newton, Muhammad A

Date
2024-07-24
Abstract

Understanding the three-dimensional structures of proteins and their interactions with drug molecules, or ligands, is crucial for rational drug design. Knowing the structure of a disease-linked protein makes it possible to design specific ligands that bind to the protein and modulate its function to achieve a therapeutic effect. Building on significant recent advances in the field, particularly DeepMind's AlphaFold, this research explores distance-map-based methods to improve both the determination of protein 3D structures and the prediction of protein-ligand binding affinity. Both are crucial for developing effective therapeutic agents.

Protein Structure Prediction (PSP) remains a persistent challenge in biological research because of the intricate and unique three-dimensional structures of proteins. Experimental methods are limited by high cost, long turnaround times, and occasional failure, which has left a large gap between the vast amount of protein sequence data and the number of known structures. Computational methods have risen to prominence because they can infer the tertiary structure of a protein from its amino acid sequence by searching for the most energetically favourable configuration. Among these computational strategies, scoring functions built on contact maps and distance maps have proven particularly effective for PSP. A contact map records whether or not each pair of amino acids in a protein is in contact, making it a Boolean simplification of the more detailed distance map, which gives the actual spatial distance between each pair. The accuracy of predicted protein structures depends strongly on the accuracy of these distance-map-based energy functions.
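
The relationship between a distance map and a contact map can be illustrated with a short sketch, which is not taken from the thesis. It assumes one representative atom per residue (commonly Cβ, or Cα for glycine) and a CASP-style contact definition: a distance below 8 Å between residues at least six positions apart in sequence. The function names `distance_map` and `contact_map`, the toy coordinates, and the default parameters are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """Real-valued distance map: Euclidean distance (in Å) between every pair
    of residue representative atoms (assumed here to be one atom per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_map(
    dmap: Sequence[Sequence[float]],
    threshold: float = 8.0,   # assumed CASP-style contact cutoff in Å
    min_separation: int = 6,  # ignore pairs closer than this in sequence
) -> List[List[bool]]:
    """Boolean contact map obtained by thresholding a distance map.

    A pair (i, j) is a contact when the residues are at least `min_separation`
    positions apart in sequence and their distance is below `threshold`.
    """
    n = len(dmap)
    return [
        [abs(i - j) >= min_separation and dmap[i][j] < threshold for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy coordinates, not a real protein: seven residues on a line, plus an
    # eighth residue folded back near the start of the chain.
    coords = [(3.8 * i, 0.0, 0.0) for i in range(7)] + [(0.0, 5.0, 0.0)]
    cmap = contact_map(distance_map(coords))
    print(cmap[0][7], cmap[2][7])
```

Raising `min_separation` to 24 keeps only long-range contacts, the category CASP treats as separations of at least 24 residues. Thresholding discards the actual distance values, which is why the contact map is the coarser of the two representations.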

Progress in real-valued distance map prediction still faces challenges; this thesis addresses them and improves accuracy through several techniques. First, we introduce a simplified real-valued distance predictor with a lightweight and inexpensive feature set, which shows substantial improvement over existing predictors. Second, recognizing the importance of contact maps in PSP and their role as a Boolean representation of distance maps, we develop an ensemble method that combines the outputs of three distance predictors to obtain more accurate long-range contacts. Third, we design a meta-ensemble distance predictor that captures short- and long-range distances simultaneously: the full distance range is divided into three intervals, each predicted by a separate model, and a meta layer combines the results of these models to produce the final output. Finally, a real-valued-to-binned distance converter named Skewed Conversion (SC) allows predicted real-valued distances to be used in both real-valued and binned distance-based search strategies. SC improves the accuracy and quality of predicted protein structures by exploiting the benefits of bin probabilities.
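
The abstract does not describe how Skewed Conversion shapes its bin probabilities, so the sketch below is explicitly not SC. It shows only the general idea of real-valued-to-binned conversion, using a symmetric Gaussian as a stand-in: the predicted distance becomes the mean of a normal distribution, and each distance bin receives the probability mass that falls inside it. The function names, the bin edges, and the spread `sigma` are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of the normal distribution N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def distance_to_bin_probs(
    d: float, edges: Sequence[float], sigma: float = 1.0
) -> List[float]:
    """Spread one real-valued distance d (in Å) over distance bins.

    Stand-in technique: a symmetric Gaussian centred on d (NOT the thesis's
    Skewed Conversion). `edges` are increasing interior bin boundaries, giving
    len(edges) + 1 bins: (-inf, edges[0]), [edges[0], edges[1]), ...,
    [edges[-1], +inf). Each bin gets the Gaussian mass inside it, so the
    returned probabilities sum to 1.
    """
    bounds = [-math.inf, *edges, math.inf]
    probs = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        upper = 1.0 if math.isinf(hi) else normal_cdf(hi, d, sigma)
        lower = 0.0 if math.isinf(lo) else normal_cdf(lo, d, sigma)
        probs.append(upper - lower)
    return probs


if __name__ == "__main__":
    probs = distance_to_bin_probs(7.0, edges=[4.0, 6.0, 8.0, 10.0, 12.0])
    print([round(p, 3) for p in probs])
```

With a predicted distance of 7 Å and `sigma = 1.0`, the [6, 8) Å bin receives about 68% of the mass and its neighbours share the rest. A skewed distribution, as the name of SC suggests, would instead allocate this mass asymmetrically around the predicted value.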

In the drug discovery process, accurately predicting how strongly a drug (ligand) binds to its target protein (its binding affinity) can greatly improve the efficiency of developing new medications, saving both time and money. However, precise affinity prediction remains a significant open challenge. This thesis introduces a novel approach that uses atomic-level distance-based features to capture the details of protein-ligand interactions more faithfully. The aim is to increase prediction accuracy and to give clearer insight into the complex dynamics of these crucial molecular interactions.
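
The abstract does not specify which atomic-level distance features the thesis uses. One simple and widely used style of descriptor counts protein-ligand atom pairs by element type within distance shells, and the sketch below illustrates that idea under stated assumptions. The function `pair_count_features`, the shell cutoffs of 4, 8, and 12 Å, and the toy atoms are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

# (element symbol, (x, y, z) in Å); a hypothetical minimal atom representation
Atom = Tuple[str, Tuple[float, float, float]]


def pair_count_features(
    protein_atoms: Sequence[Atom],
    ligand_atoms: Sequence[Atom],
    cutoffs: Sequence[float] = (4.0, 8.0, 12.0),  # assumed shell boundaries
) -> Dict[Tuple[str, str, int], int]:
    """Count protein-ligand atom pairs per (protein element, ligand element, shell).

    Shell k covers distances in [cutoffs[k-1], cutoffs[k]), with shell 0 starting
    at 0 Å. Pairs at or beyond the last cutoff are ignored.
    """
    features: Counter = Counter()
    for p_elem, p_xyz in protein_atoms:
        for l_elem, l_xyz in ligand_atoms:
            d = math.dist(p_xyz, l_xyz)
            for k, cutoff in enumerate(cutoffs):
                if d < cutoff:
                    features[(p_elem, l_elem, k)] += 1
                    break  # each pair falls into exactly one shell
    return dict(features)


if __name__ == "__main__":
    protein = [("C", (0.0, 0.0, 0.0)), ("N", (5.0, 0.0, 0.0)), ("O", (20.0, 0.0, 0.0))]
    ligand = [("C", (1.0, 0.0, 0.0))]
    print(pair_count_features(protein, ligand))
```

A fixed ordering of all (element, element, shell) keys turns the dictionary into a feature vector that a learned model can take as input. Finer shells keep more distance information at the cost of a larger, sparser vector.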

Across both the PSP and affinity prediction problems, the thesis emphasizes the central role of distance maps. The proposed methods, built on various deep learning architectures implemented in Python, are rigorously tested on standard datasets and consistently outperform state-of-the-art approaches.

Thesis Type

Thesis (PhD Doctorate)

Degree Program

Doctor of Philosophy

School

School of Info & Comm Tech

Rights Statement

The author owns the copyright in this thesis, unless stated otherwise.

Subject

protein structures

protein-ligand affinity

distance maps
