Protein structure prediction from inaccurate and sparse NMR data using an enhanced genetic algorithm

Author(s)
Islam, Md Lisul
Shatabda, Swakkhar
Rashid, Mahmood A
Khan, MGM
Rahman, M Sohel
Year published
2019
Abstract
Nuclear Magnetic Resonance spectroscopy (commonly known as NMR spectroscopy) is used to generate approximate and partial distances between pairs of atoms of the native structure of a protein. By solving the Euclidean distance geometry problem on the partial distances obtained from NMR spectroscopy, we can predict the three-dimensional (3D) structure of a protein. In this paper, a new genetic algorithm is proposed to efficiently address the Euclidean distance geometry problem and build the 3D structure of a given protein from sparse NMR data. Our genetic algorithm uses (i) a greedy mutation and crossover operator to intensify the search; (ii) a twin removal technique to diversify the population; (iii) a random restart method to recover from stagnation; and (iv) a compaction factor to reduce the search space. By drastically reducing the search space, our approach improves the quality of the search. We tested our algorithm on a set of standard benchmarks. Experimentally, we show that our enhanced genetic algorithm significantly outperforms traditional genetic algorithms and a previously proposed state-of-the-art method. Our method is capable of producing structures that are very close to the native structures, and hence experimental biologists could adopt it to determine more accurate protein structures from NMR data.
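As a rough illustration of the problem the abstract describes, the sketch below scores a candidate 3D structure against sparse, inexact distance bounds and removes duplicate ("twin") individuals from a population. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the (i, j, lower, upper) constraint format, the penalty formula, and the rounding-based twin test are all hypothetical choices made for this example.

```python
import math

# Assumed representation (not from the paper): a structure is a list of
# (x, y, z) tuples, one per atom; a constraint is (i, j, lower, upper),
# giving approximate distance bounds between atoms i and j.

def bound_violation(coords, constraints):
    """Sum of the amounts by which pairwise distances fall outside their bounds.

    A value of 0.0 means every listed distance lies within its bounds.
    """
    total = 0.0
    for i, j, lower, upper in constraints:
        d = math.dist(coords[i], coords[j])
        # Penalize distances below the lower bound or above the upper bound.
        total += max(0.0, lower - d) + max(0.0, d - upper)
    return total

def remove_twins(population, decimals=3):
    """Keep one copy of each individual, comparing coordinates after rounding.

    This is one simple way to define a "twin"; the paper may use another.
    """
    seen = set()
    unique = []
    for coords in population:
        key = tuple(round(v, decimals) for atom in coords for v in atom)
        if key not in seen:
            seen.add(key)
            unique.append(coords)
    return unique

# Usage: two atoms 5 units apart, checked against a bound of [6, 7].
structure = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
violation = bound_violation(structure, [(0, 1, 6.0, 7.0)])
```

A genetic algorithm of the kind described would minimize a score like `bound_violation` over candidate structures, applying a step like `remove_twins` between generations to keep the population diverse.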
Journal Title
Computational Biology and Chemistry
Volume
79
Copyright Statement
© 2019 Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (http://creativecommons.org/licenses/by-nc-nd/4.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, providing that the work is properly cited.
Subject
Chemical sciences
Biological sciences
Protein structure prediction
Sparse data
Molecular distance geometry
Nuclear magnetic resonance spectroscopy
Genetic algorithms