
dc.contributor.advisor: Liew, Wee-Chung
dc.contributor.advisor: Zhou, Yaoqi
dc.contributor.author: Taherzadeh, Ghazaleh
dc.date.accessioned: 2018-06-13T04:58:30Z
dc.date.available: 2018-06-13T04:58:30Z
dc.date.issued: 2018-05
dc.identifier.doi: 10.25904/1912/1478
dc.identifier.uri: http://hdl.handle.net/10072/376837
dc.description.abstract: Given the overwhelming volume of genomic data, determining the functions of previously unseen proteins is one of the most challenging problems. While protein functions can often be inferred from homologs with known functions in other species, not all proteins have homologs whose functions have been determined. Proteins perform their functional roles through interactions with other biologically active molecules. Thus, the first step in identifying a protein's function through its interactions is to detect its potential binding sites. Moreover, protein functions may change when proteins undergo modifications. Experimental determination of functions for millions of new proteins is not practical due to the vast number of possible functions to be tested. Thus, it is highly desirable to have computational tools to prioritize possible functions for new proteins. In this thesis, we proposed machine learning-based methods for predicting putative binding sites of proteins interacting with small molecules, specifically peptides and carbohydrates, in addition to predicting putative sites of post-translational modifications (PTMs). The main contributions of our methods lie in three aspects. First, we proposed the first predictive model to predict protein-peptide binding sites without knowledge of the protein structure (Taherzadeh et al. 2016). The method was further improved by using experimental structures. Its performance remains robust even when unbound structures or high-quality model structures built from homologs are employed, indicating the wide applicability of the method (Taherzadeh et al. 2017). Second, we established the first publicly available tool for predicting carbohydrate binding sites in the absence of protein structures (Taherzadeh et al. 2016). The accuracy of this method is confirmed by its prediction of more binding residues in carbohydrate-binding proteins than in non-binding proteins in the human proteome and by its successful application to the 1000 Genomes Project. Third, we proposed a method for predicting post-translational modification (PTM) sites of lysine malonylation (Taherzadeh et al.). This predictive model, built from M. musculus proteins, achieved comparable performance when tested on H. sapiens proteins. All of the aforementioned methods were thoroughly assessed by cross-validation and on independent test sets after removing homologous sequences. Consistent performance on cross-validation and independent datasets confirmed the accuracy and robustness of the predictive methods. All methods significantly outperform existing techniques.
dc.language: English
dc.language.iso: en
dc.publisher: Griffith University
dc.publisher.place: Brisbane
dc.subject.keywords: Protein function prediction
dc.subject.keywords: Machine learning
dc.subject.keywords: Peptides
dc.subject.keywords: Carbohydrates
dc.subject.keywords: Lysine malonylation
dc.title: Protein Function Prediction by Machine Learning
dc.type: Griffith thesis
gro.faculty: Science, Environment, Engineering and Technology
gro.rights.copyright: The author owns the copyright in this thesis, unless stated otherwise.
gro.hasfulltext: Full Text
dc.contributor.otheradvisor: Yang, Yuedong
gro.thesis.degreelevel: Thesis (PhD Doctorate)
gro.thesis.degreeprogram: Doctor of Philosophy (PhD)
gro.department: School of Info & Comm Tech
gro.griffith.author: Taherzadeh, Ghazaleh

