Enhanced Feature Extraction from Evolutionary Profiles for Protein Fold Recognition

Author(s)
Primary Supervisor
Paliwal, Kuldip
Other Supervisors
So, Stephen
Year published
2016
Abstract
Proteins are biological macromolecules that play important roles in almost all biological reactions. The function of a protein depends on the shape it folds into, which in turn depends on the protein's amino acid sequence. Experimental approaches for determining a protein's 3D structure are expensive and time-consuming, so computational methods for determining the structure from the amino acid sequence are desired. Methods for directly computing the 3D structure of a protein exist; however, they are impractical for large proteins and high-resolution models because of the large search space. Instead of trying to find the 3D structure directly from first principles, the primary structure can be compared to proteins with known 3D structure. A 'fold' is a way of classifying proteins with the same major secondary structures in the same arrangement and with the same topological connections. Protein Fold Recognition (PFR) is an important step towards determining a protein's structure, simplifying the protein structure prediction problem. This is a multi-class classification problem solvable using machine learning techniques.
The PFR problem has been widely studied in the past, with feature extraction approaches including counts of amino acids and pairs of amino acids, physicochemical information, evolutionary information from the Position Specific Scoring Matrix (PSSM), and structural information from the predicted secondary structure. These approaches do work, but with limited success. Current state-of-the-art features use information from the PSSM as well as the predicted secondary structure.
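As an illustration of the simplest feature family named in the abstract (counts of amino acids and of pairs of amino acids), the following is a minimal sketch and not the method developed in the thesis. The function names, the normalisation to fractions, and the restriction to the 20 standard residues are assumptions made for this example; PSSM-based and secondary-structure features are not shown.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order (an assumption of this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Fraction of each standard amino acid in the sequence (20 values)."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_features(sequence):
    """Fraction of each ordered pair of adjacent amino acids (400 values)."""
    seq = sequence.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(pairs[k] for k in keys) or 1  # avoid division by zero
    return [pairs[k] / total for k in keys]


def sequence_features(sequence):
    """Concatenate both feature sets into one 420-dimensional vector."""
    return composition_features(sequence) + dipeptide_features(sequence)
```

A vector such as `sequence_features("MKVLAAGIVGLL")` could then be passed to any multi-class classifier, with one class per fold.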
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
Griffith School of Engineering
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Item Access Status
Public
Subject
Proteins
protein amino acid sequence
Protein Fold Recognition (PFR)
Position Specific Scoring Matrix (PSSM)