Imputing spatial transcriptomics through gene network constructed from protein language model

File version

Version of Record (VoR)

Author(s)
Zeng, Yuansong
Song, Yujie
Zhang, Chengyang
Li, Haoxuan
Zhao, Yongkang
Yu, Weijiang
Zhang, Shiqi
Zhang, Hongyu
Dai, Zhiming
Yang, Yuedong
Date
2024
Abstract

Image-based spatial transcriptomic sequencing technologies have enabled the measurement of gene expression at single-cell resolution, but only for a limited number of genes. Current computational approaches attempt to overcome this limitation by imputing missing genes, but they face challenges in prediction accuracy and in identifying cell populations because they neglect gene-gene relationships. In this context, we present stImpute, a method that imputes spatial transcriptomics data from reference scRNA-seq data using a gene network constructed from the protein language model ESM-2. Specifically, stImpute employs an autoencoder to create gene expression embeddings for both spatial transcriptomics and scRNA-seq data, which are used to identify the nearest neighboring cells between the scRNA-seq and spatial transcriptomics datasets. Based on these neighboring cells, gene expression in the spatial transcriptomics cells is imputed through a graph neural network, where nodes are genes and edges are based on cosine similarity between the ESM-2 embeddings of the gene-encoding proteins. The gene prediction uncertainty is further measured through a deep learning model. stImpute was shown to consistently outperform state-of-the-art methods across multiple datasets in both imputation and clustering. stImpute also demonstrates robustness, producing consistent results that are insensitive to model parameters.
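
The two similarity steps named in the abstract (matching cells by embedding, and linking genes by cosine similarity of protein embeddings) can be illustrated with a minimal NumPy sketch. This is not the stImpute implementation: the function names, the top-k edge rule, the Euclidean distance used for cell matching, and the random arrays standing in for autoencoder and ESM-2 embeddings are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(emb):
    """Pairwise cosine similarity between the rows of `emb`."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def build_gene_graph(protein_emb, k=2):
    """Symmetric boolean adjacency linking each gene to its k most similar genes.

    Assumption: edges are chosen by a top-k rule; the abstract only states
    that edges are based on cosine similarity between protein embeddings.
    """
    sim = cosine_similarity_matrix(protein_emb)
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the ranking
    n = sim.shape[0]
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbors.ravel()] = True
    return adj | adj.T  # make the graph undirected


def nearest_reference_cells(spatial_emb, reference_emb, k=3):
    """Indices of the k closest reference cells for each spatial cell.

    Assumption: squared Euclidean distance in the embedding space.
    """
    diff = spatial_emb[:, None, :] - reference_emb[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    return np.argsort(dist2, axis=1)[:, :k]


# Hypothetical stand-ins for real embeddings (random, for illustration only).
rng = np.random.default_rng(0)
protein_emb = rng.normal(size=(6, 16))     # 6 genes, 16-dim protein embeddings
spatial_emb = rng.normal(size=(4, 8))      # 4 spatial cells, 8-dim embeddings
reference_emb = rng.normal(size=(10, 8))   # 10 scRNA-seq reference cells

gene_adj = build_gene_graph(protein_emb, k=2)
matches = nearest_reference_cells(spatial_emb, reference_emb, k=3)
```

In a full pipeline, an adjacency matrix like `gene_adj` would define the edges of the graph neural network, and the matched reference cells would supply the expression values it propagates. Those stages are omitted here.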

Journal Title

Communications Biology

Volume

7

Rights Statement

© The Author(s) 2024. This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Citation

Zeng, Y; Song, Y; Zhang, C; Li, H; Zhao, Y; Yu, W; Zhang, S; Zhang, H; Dai, Z; Yang, Y, Imputing spatial transcriptomics through gene network constructed from protein language model, Communications Biology, 2024, 7, pp. 1271
