Digerati – A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins

Author(s)
Li, Fuyi
Guo, Xudong
Bi, Yue
Jia, Runchang
Pitt, Miranda E
Pan, Shirui
Li, Shuqin
Gasser, Robin B
Coin, Lachlan JM
Song, Jiangning
Date
2023
Abstract

The genome of Mycobacterium tuberculosis contains a relatively high percentage (10%) of genes that are poorly characterised because of their highly repetitive nature and high GC content. Some of these genes encode proteins of the PE/PPE family, which are thought to be involved in host-pathogen interactions, virulence, and disease pathogenicity. Members of this family are genetically divergent and challenging to both identify and classify using conventional computational tools. Thus, advanced in silico methods are needed to efficiently identify proteins of this family for subsequent functional annotation. In this study, we developed the first deep learning-based approach, termed Digerati, for the rapid and accurate identification of PE and PPE family proteins. Digerati was built upon a multipath parallel hybrid deep learning framework, which combines multi-layer convolutional neural networks with bidirectional long short-term memory and a self-attention module to effectively learn the higher-order feature representations of PE/PPE proteins. Empirical studies demonstrated that Digerati achieved significantly better performance (∼18–20%) than alignment-based approaches, including BLASTP, PHMMER, and HHsuite, in both prediction accuracy and speed. Digerati is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE/PPE family members. The webserver and source code of Digerati are publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/Digerati/.
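
The abstract names a self-attention module as one ingredient of the framework. The sketch below illustrates only that ingredient, in plain Python, and is not Digerati's implementation. The one-hot encoding, the identity query/key/value projections, the toy sequence, and all function names are assumptions made for illustration; the published model relies on learned convolutional and recurrent layers that are not reproduced here.

```python
import math

# Standard 20 amino acids. This alphabet and the one-hot encoding are
# illustrative assumptions, not Digerati's documented input features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot rows."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        if residue in index:  # unknown residues stay all-zero
            row[index[residue]] = 1.0
        rows.append(row)
    return rows


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Returns (weights, outputs): weights[i][j] is how much position i attends
    to position j, and outputs[i] is the attention-weighted mix of all rows.
    A trained model would apply learned projections instead (assumption).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    outputs = []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in features
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, features)) for d in range(dim)]
        )
    return weights, outputs


if __name__ == "__main__":
    toy_sequence = "MSFVIA"  # hypothetical toy input, not a real PE/PPE protein
    weights, outputs = self_attention(one_hot(toy_sequence))
    print(len(weights), len(outputs[0]))
```

Each row of the returned weights sums to one, so every output position is a convex combination of the input rows. In a hybrid model of the kind the abstract describes, an operation like this would presumably act on features produced by the convolutional and recurrent layers rather than on raw one-hot vectors.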

Journal Title

Computers in Biology and Medicine

Volume

163

Subject

Bioinformatics and computational biology

Health services and systems

Applied computing

Science & Technology

Life Sciences & Biomedicine

Technology

Biology

Computer Science, Interdisciplinary Applications

Citation

Li, F; Guo, X; Bi, Y; Jia, R; Pitt, ME; Pan, S; Li, S; Gasser, RB; Coin, LJM; Song, J, Digerati – A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins, Computers in Biology and Medicine, 2023, 163, pp. 107155
