Show simple item record

dc.contributor.author: Zhou, Xiang
dc.contributor.author: Chai, Hua
dc.contributor.author: Zhao, Huiying
dc.contributor.author: Luo, Ching-Hsing
dc.contributor.author: Yang, Yuedong
dc.date.accessioned: 2020-10-28T00:22:47Z
dc.date.available: 2020-10-28T00:22:47Z
dc.date.issued: 2020
dc.identifier.issn: 2047-217X
dc.identifier.doi: 10.1093/gigascience/giaa076
dc.identifier.uri: http://hdl.handle.net/10072/398779
dc.description.abstract: Background: Gene expression plays a key intermediate role in linking molecular features at the DNA level to phenotype. However, owing to various experimental limitations, RNA-seq data are missing in many samples for which high-quality DNA methylation data exist. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data. Many methods have been developed for this purpose. A common limitation of these methods is that they mainly focus on a single cancer dataset and do not fully utilize information from large pan-cancer datasets. Results: Here, we have developed a novel method, TDimpute, to impute missing gene expression data from DNA methylation data through a transfer learning-based neural network. In this method, the pan-cancer dataset from The Cancer Genome Atlas (TCGA) was used to train a general model, which was then fine-tuned on the specific cancer dataset. By testing on 16 cancer datasets, we found that our method significantly outperforms other state-of-the-art methods in imputation accuracy, with a 7-11% improvement under different missing rates. The imputed gene expression was further shown to be useful for downstream analyses, including the identification of both methylation-driving and prognosis-related genes, clustering analysis, and survival analysis on the TCGA dataset. More importantly, an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) project indicated that our method is useful for general purposes. Conclusions: TDimpute is an effective method for RNA-seq imputation with limited training samples.
dc.description.peerreviewed: Yes
dc.language: English
dc.language.iso: eng
dc.publisher: Oxford University Press
dc.relation.ispartofpagefrom: giaa076
dc.relation.ispartofissue: 7
dc.relation.ispartofjournal: GigaScience
dc.relation.ispartofvolume: 9
dc.subject.fieldofresearch: Biological Sciences
dc.subject.fieldofresearchcode: 06
dc.subject.keywords: Science & Technology
dc.subject.keywords: Life Sciences & Biomedicine
dc.subject.keywords: Biology
dc.subject.keywords: Multidisciplinary Sciences
dc.subject.keywords: Life Sciences & Biomedicine - Other Topics
dc.title: Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning-based neural network
dc.type: Journal article
dc.type.description: C1 - Articles
dcterms.bibliographicCitation: Zhou, X; Chai, H; Zhao, H; Luo, C-H; Yang, Y, Imputing missing RNA-sequencing data from DNA methylation by using a transfer learning-based neural network, GigaScience, 2020, 9 (7), pp. giaa076
dcterms.dateAccepted: 2020-06-24
dcterms.license: http://creativecommons.org/licenses/by/4.0/
dc.date.updated: 2020-10-28T00:19:00Z
dc.description.version: Version of Record (VoR)
gro.rights.copyright: © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
gro.hasfulltext: Full Text
gro.griffith.author: Yang, Yuedong


This item appears in the following Collection(s)

  • Journal articles
    Contains articles published by Griffith authors in scholarly journals.
