PLANNER: a multi-scale deep language model for the origins of replication site prediction

Author(s)
Wang, C
He, Z
Jia, R
Pan, S
Coin, LJ
Song, J
Li, F
Date
2024
Abstract

Origins of replication sites (ORIs) are genomic regions where DNA replication is initiated, and they play pivotal roles in fundamental biological processes such as cell division, gene expression regulation, and DNA integrity. Accurate identification of ORIs is essential for understanding cell replication, gene expression, and mutation-related diseases. However, experimental approaches for ORI identification are often expensive and time-consuming, which has made computational methods increasingly popular. In this study, we present PLANNER (DeeP LeArNiNg prEdictor for ORI), a novel approach for species-specific and cell-specific prediction of eukaryotic ORIs. PLANNER takes multi-scale k-tuple sequences as input and applies transfer learning and ensemble learning strategies to the pre-trained DNABERT model to train accurate predictive models. Extensive empirical tests show that PLANNER outperforms state-of-the-art approaches, including iOri-Euk, Stack-ORI, and ORI-Deep, both within specific cell types and across different cell types. Furthermore, an interpretable analysis mechanism provides insights into the learned patterns, linking the discovery of important sequence determinants to a comprehensive analysis of their biological functions. To facilitate the widespread utilisation of PLANNER, we developed an online webserver and local stand-alone software, available at http://planner.unimelb-biotools.cloud.edu.au/ and https://github.com/CongWang3/PLANNER, respectively.
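
The multi-scale k-tuple input and the ensemble step described above can be illustrated with a minimal sketch. It is not taken from the PLANNER code base: the function names (`kmer_tokens`, `multi_scale_tokens`, `soft_vote`), the choice of k = 3, 4, 5, 6 (the sizes for which pre-trained DNABERT models were released), and the use of simple probability averaging as the ensemble rule are all assumptions made for illustration. The abstract does not state which scales or which ensemble rule PLANNER actually uses.

```python
from typing import Dict, List, Sequence


def kmer_tokens(seq: str, k: int) -> List[str]:
    """Split a DNA sequence into overlapping k-tuples with a stride of 1."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # A sequence shorter than k yields an empty token list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def multi_scale_tokens(
    seq: str, ks: Sequence[int] = (3, 4, 5, 6)  # assumed scales, not from the paper
) -> Dict[int, List[str]]:
    """Tokenise the same sequence at several k-tuple sizes."""
    return {k: kmer_tokens(seq, k) for k in ks}


def soft_vote(probabilities: Sequence[float]) -> float:
    """Average per-model ORI probabilities (one assumed ensemble rule)."""
    if not probabilities:
        raise ValueError("at least one probability is required")
    return sum(probabilities) / len(probabilities)


if __name__ == "__main__":
    views = multi_scale_tokens("ACGTAC")
    for k, tokens in views.items():
        print(k, " ".join(tokens))
    # Hypothetical scores from two scale-specific models.
    print(soft_vote([0.2, 0.8]))
```

In a setup like this, each token list would feed a separate model fine-tuned for that k, and the per-model scores would then be combined into a single prediction.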

Journal Title

IEEE Journal of Biomedical and Health Informatics

Note

This publication has been entered in Griffith Research Online as an advance online version.

Subject

Bioinformatics and computational biology

Citation

Wang, C; He, Z; Jia, R; Pan, S; Coin, LJ; Song, J; Li, F, PLANNER: a multi-scale deep language model for the origins of replication site prediction, IEEE Journal of Biomedical and Health Informatics, 2024
