dc.contributor.advisor: Zhou, Yaoqi
dc.contributor.author: Litfin, Thomas
dc.date.accessioned: 2020-07-17T00:55:18Z
dc.date.available: 2020-07-17T00:55:18Z
dc.date.issued: 2020-07-06
dc.identifier.doi: 10.25904/1912/29
dc.identifier.uri: http://hdl.handle.net/10072/395548
dc.description.abstract: This thesis outlines the development of several tools for the prediction of protein molecular functions. The limited data available for training represents a challenge for developing accurate, generalizable models. To combat this challenge, we established several molecular-function predictors based on sequence- and structure-based inference from weakly annotated data. The core of the approach is driven by a geometric alignment between 3D structures as a proxy for functional relatedness. We also exploited a large dataset annotated for a related task to inform a model via transfer learning. In Chapter 2, SPOT-Ligand 2 was established for predicting protein-ligand interactions. The method employs weakly annotated ligand-binding sequences to improve virtual screening performance by 93% (top 1% enrichment factor) when compared to a baseline using complex structures only. In Chapter 3, SPOT-peptide was developed for identifying peptide-binding domains and peptide-binding sites. The method is an implementation of a structure-based homology modelling pipeline augmented by local measures of interface complementarity. This is the first method devoted specifically to the identification of peptide-binding domains at genome scale, and it outperformed a simple sequence-based baseline by 30% as measured by the Matthews correlation coefficient (MCC). Binding-site MCC was also improved by 20% compared with the next best method from the literature. In Chapter 4, the SPalign-based structure homology framework was further validated in a prospective study involving the characterization of Bacillus subtilis YesU as a carbohydrate-binding protein. In Chapter 5, we developed a deep learning model called SPOT-MoRF for the identification of short peptide segments that undergo a disorder-to-order transition when binding a functional partner. The deep learning model was facilitated by a transfer learning framework which allowed an MCC improvement of 40% compared with a model trained directly from random initialization. These tools are made available as web services at http://sparks-lab.org/ and should facilitate the annotation of protein function at various levels of resolution.
dc.language: English
dc.language.iso: en
dc.publisher: Griffith University
dc.publisher.place: Brisbane
dc.subject.keywords: protein molecular functions
dc.subject.keywords: predicting
dc.subject.keywords: tool development
dc.title: Data-driven prediction of molecular function
dc.type: Griffith thesis
gro.faculty: Science, Environment, Engineering and Technology
gro.rights.copyright: The author owns the copyright in this thesis, unless stated otherwise.
gro.hasfulltext: Full Text
dc.contributor.otheradvisor: Zhan, Jian
gro.identifier.gurtID: 000000012599
gro.thesis.degreelevel: Thesis (PhD Doctorate)
gro.thesis.degreeprogram: Doctor of Philosophy (PhD)
gro.department: School of Info & Comm Tech
gro.griffith.author: Litfin, Tom

