Pesquisa | Biblioteca Virtual em Saúde

Unraveling Spatial Domain Characterization in Spatially Resolved Transcriptomics with Robust Graph Contrastive Clustering.

Zhang, Yingxi; Yu, Zhuohan; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 2024 Jul 16.

Artigo em Inglês | MEDLINE | ID: mdl-39012523

RESUMO

MOTIVATION: Spatial transcriptomics can quantify gene expression and its spatial distribution in tissues, thus revealing molecular mechanisms of cellular interactions underlying tissue heterogeneity, tissue regeneration, and spatially localized disease mechanisms. However, existing spatial clustering methods often fail to exploit the full potential of spatial information, resulting in inaccurate identification of spatial domains. RESULTS: In this paper, we develop a deep graph contrastive clustering framework, stDGCC, that accurately uncovers underlying spatial domains via explicitly modeling spatial information and gene expression profiles from spatial transcriptomics data. The stDGCC framework proposes a spatially informed graph node embedding model to preserve the topological information of spots and to learn the informative and discriminative characterization of spatial transcriptomics data through self-supervised contrastive learning. By simultaneously optimizing the contrastive learning loss, reconstruction loss, and Kullback-Leibler (KL) divergence loss, stDGCC achieves joint optimization of feature learning and topology structure preservation in an end-to-end manner. We validate the effectiveness of stDGCC on various spatial transcriptomics datasets acquired from different platforms, each with varying spatial resolutions. Our extensive experiments demonstrate the superiority of stDGCC over various state-of-the-art clustering methods in accurately identifying cellular-level biological structures. AVAILABILITY: Code and data are available from https://github.com/TimE9527/stDGCC and https://figshare.com/projects/stDGCC/186525. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Distribution-Agnostic Deep Learning Enables Accurate Single-Cell Data Recovery and Transcriptional Regulation Interpretation.

Su, Yanchi; Yu, Zhuohan; Yang, Yuning; Wong, Ka-Chun; Li, Xiangtao.

Adv Sci (Weinh) ; 11(16): e2307280, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38380499

RESUMO

Single-cell RNA sequencing (scRNA-seq) is a robust method for studying gene expression at the single-cell level, but accurately quantifying genetic material is often hindered by limited mRNA capture, resulting in many missing expression values. Existing imputation methods rely on strict data assumptions, limiting their broader application, and lack reliable supervision, leading to biased signal recovery. To address these challenges, authors developed Bis, a distribution-agnostic deep learning model for accurately recovering missing sing-cell gene expression from multiple platforms. Bis is an optimal transport-based autoencoder model that can capture the intricate distribution of scRNA-seq data while addressing the characteristic sparsity by regularizing the cellular embedding space. Additionally, they propose a module using bulk RNA-seq data to guide reconstruction and ensure expression consistency. Experimental results show Bis outperforms other models across simulated and real datasets, showcasing superiority in various downstream analyses including batch effect removal, clustering, differential expression analysis, and trajectory inference. Moreover, Bis successfully restores gene expression levels in rare cell subsets in a tumor-matched peripheral blood dataset, revealing developmental characteristics of cytokine-induced natural killer cells within a head and neck squamous cell carcinoma microenvironment.

Assuntos

Aprendizado Profundo , Análise de Célula Única , Análise de Célula Única/métodos , Humanos , Análise de Sequência de RNA/métodos , Perfilação da Expressão Gênica/métodos

scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering.

Wang, Yunhe; Yu, Zhuohan; Li, Shaochuan; Bian, Chuang; Liang, Yanchun; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 39(2)2023 02 14.

Artigo em Inglês | MEDLINE | ID: mdl-36734596

RESUMO

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data that facilitates accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on a deep autoencoder and the ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders are not sufficient to learn the latent representations of scRNA-seq data, and obtaining consensus partitions from these feature representations remains under-explored. RESULTS: To address this challenge, we propose a single-cell deep clustering model via a dual denoising autoencoder with bipartite graph ensemble clustering called scBGEDA, to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network is proposed to project the data into a compressed low-dimensional space and that can learn feature representation via explicit modeling of synergistic optimization of the zero-inflated negative binomial reconstruction loss and denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedded space by means of a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets, and also demonstrated its scalability to large-scale scRNA-seq datasets. Moreover, scBGEDA was able to identify cell-type specific marker genes and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into identifying cell types and characterizing the scRNA-seq data from different perspectives. AVAILABILITY AND IMPLEMENTATION: The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Algoritmos , Perfilação da Expressão Gênica , Análise de Sequência de RNA/métodos , Perfilação da Expressão Gênica/métodos , Software , Análise de Célula Única/métodos , Análise por Conglomerados

Topological identification and interpretation for single-cell gene regulation elucidation across multiple platforms using scMGCA.

Yu, Zhuohan; Su, Yanchi; Lu, Yifu; Yang, Yuning; Wang, Fuzhou; Zhang, Shixiong; Chang, Yi; Wong, Ka-Chun; Li, Xiangtao.

Nat Commun ; 14(1): 400, 2023 01 25.

Artigo em Inglês | MEDLINE | ID: mdl-36697410

RESUMO

Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from challenges related to dimensionality, and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.

Assuntos

Carcinoma Ductal Pancreático , Neoplasias Pancreáticas , Humanos , Perfilação da Expressão Gênica/métodos , Neoplasias Pancreáticas/genética , Regulação da Expressão Gênica , Transcriptoma , Carcinoma Ductal Pancreático/genética , Análise de Célula Única/métodos

GMHCC: high-throughput analysis of biomolecular data using graph-based multiple hierarchical consensus clustering.

Lu, Yifu; Yu, Zhuohan; Wang, Yunhe; Ma, Zhiqiang; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 38(11): 3020-3028, 2022 05 26.

Artigo em Inglês | MEDLINE | ID: mdl-35451457

RESUMO

MOTIVATION: Thanks to the development of high-throughput sequencing technologies, massive amounts of various biomolecular data have been accumulated to revolutionize the study of genomics and molecular biology. One of the main challenges in analyzing this biomolecular data is to cluster their subtypes into subpopulations to facilitate subsequent downstream analysis. Recently, many clustering methods have been developed to address the biomolecular data. However, the computational methods often suffer from many limitations such as high dimensionality, data heterogeneity and noise. RESULTS: In our study, we develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method with an unsupervised graph-based feature ranking (FR) and a graph-based linking method to explore the multiple hierarchical information of the underlying partitions of the consensus clustering for multiple types of biomolecular data. Indeed, we first propose to use a graph-based unsupervised FR model to measure each feature by building a graph over pairwise features and then providing each feature with a rank. Subsequently, to maintain the diversity and robustness of basic partitions (BPs), we propose multiple diverse feature subsets to generate several BPs and then explore the hierarchical structures of the multiple BPs by refining the global consensus function. Finally, we develop a new graph-based linking method, which explicitly considers the relationships between clusters to generate the final partition. Experiments on multiple types of biomolecular data including 35 cancer gene expression datasets and eight single-cell RNA-seq datasets validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms. AVAILABILITY AND IMPLEMENTATION: The source code is available at GitHub: https://github.com/yifuLu/GMHCC. The software and the supporting data can be downloaded from: https://figshare.com/articles/software/GMHCC/17111291. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Assuntos

Algoritmos , Software , Consenso , Análise por Conglomerados , Sequenciamento de Nucleotídeos em Larga Escala , Análise de Célula Única

Elucidating transcriptomic profiles from single-cell RNA sequencing data using nature-inspired compressed sensing.

Yu, Zhuohan; Bian, Chuang; Liu, Genggeng; Zhang, Shixiong; Wong, Ka-Chun; Li, Xiangtao.

Brief Bioinform ; 22(5)2021 09 02.

Artigo em Inglês | MEDLINE | ID: mdl-33855366

RESUMO

Gene-expression profiling can define the cell state and gene-expression pattern of cells at the genetic level in a high-throughput manner. With the development of transcriptome techniques, processing high-dimensional genetic data has become a major challenge in expression profiling. Thanks to the recent widespread use of matrix decomposition methods in bioinformatics, a computational framework based on compressed sensing was adopted to reduce dimensionality. However, compressed sensing requires an optimization strategy to learn the modular dictionaries and activity levels from the low-dimensional random composite measurements to reconstruct the high-dimensional gene-expression data. Considering this, here we introduce and compare four compressed sensing frameworks coming from nature-inspired optimization algorithms (CSCS, ABCCS, BACS and FACS) to improve the quality of the decompression process. Several experiments establish that the three proposed methods outperform benchmark methods on nine different datasets, especially the FACS method. We illustrate therefore, the robustness and convergence of FACS in various aspects; notably, time complexity and parameter analyses highlight properties of our proposed FACS. Furthermore, differential gene-expression analysis, cell-type clustering, gene ontology enrichment and pathology analysis are conducted, which bring novel insights into cell-type identification and characterization mechanisms from different perspectives. All algorithms are written in Python and available at https://github.com/Philyzh8/Nature-inspired-CS.

Assuntos

Algoritmos , Biologia Computacional/métodos , Perfilação da Expressão Gênica/métodos , RNA-Seq/métodos , Análise de Célula Única/métodos , Transcriptoma , Animais , Análise por Conglomerados , Redes Reguladoras de Genes/genética , Humanos , Anotação de Sequência Molecular/métodos , Reprodutibilidade dos Testes , Transdução de Sinais/genética , Fatores de Tempo

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA