|dc.description.abstract||In this thesis we tackle the protein structure prediction subproblems listed previously by
applying state-of-the-art deep learning techniques.
The work in Chapter 2 presents the method SPIDER. In this method, state-of-the-art
deep learning is applied iteratively to the task of predicting backbone torsion angles φ and
ψ, and dihedral angles θ and τ, using evolutionary-derived sequence profiles and
physicochemical properties of amino acid residues. This work is the first method for the
sequence-based prediction of θ and τ angles.
Chapter 3 presents the method SPIDER2. This method takes the state-of-the-art
iterative deep learning applied in SPIDER and extends it to the prediction of three-state
secondary structure, solvent accessible surface area, and φ, ψ, θ, and τ angles, and achieves
the best reported prediction accuracies for all of them (at the date of publication).
Chapter 4 builds further on the work done in the previous chapters, adding
the prediction of half sphere exposure (both Cα and Cβ based) and contact numbers to
SPIDER2, in a method called SPIDER2-HSE.
In Chapter 5, Long Short-Term Memory Bidirectional Recurrent Neural Networks (LSTM-BRNNs) were
applied to the prediction of three-state secondary structure, solvent accessible surface area,
φ, ψ, θ, and τ angles, as well as half sphere exposure and contact numbers. Previous
methods used for these predictions (including SPIDER2) were typically window based.
That is to say, the input data made available to the model for a given residue
comprises information for only that residue and a number of residues on either side in the sequence (in the range of 10-20 residues on each side). The use of LSTM-BRNNs in
this method allows SPIDER3 to better learn both long- and short-term interactions within
proteins. This advancement again led to the best reported accuracies for all predicted
structural properties.
In Chapter 6, the LSTM-BRNN model used in SPIDER3 is applied to the prediction
of the same structural properties, plus eight-state secondary
structure, using only single-sequence inputs. That is, structural properties were predicted
without using any evolutionary information. The result is a method that provides not
only the best reported single-sequence secondary structure and solvent accessible surface
area predictions, but also the first reported method for the single-sequence-based prediction of
half sphere exposure, contact numbers, and φ, ψ, θ, and τ angles. This study is important
as most proteins have few homologous sequences and their evolutionary profiles are
inaccurate and time-consuming to calculate. This single-sequence-based technique allows for
fast genome-scale screening analysis of protein one-dimensional structural properties.||en_US