Search | BVS Violência e Saúde

stAA: adversarial graph autoencoder for spatial clustering task of spatially resolved transcriptomics.

Fang, Zhaoyu; Liu, Teng; Zheng, Ruiqing; A, Jin; Yin, Mingzhu; Li, Min.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38189544

ABSTRACT

With the development of spatially resolved transcriptomics technologies, it is now possible to explore the gene expression profiles of single cells while preserving their spatial context. Spatial clustering plays a key role in spatial transcriptome data analysis. In the past 2 years, several graph neural network-based methods have emerged, which significantly improved the accuracy of spatial clustering. However, accurately identifying the boundaries of spatial domains remains a challenging task. In this article, we propose stAA, an adversarial variational graph autoencoder, to identify spatial domains. stAA generates cell embeddings by leveraging gene expression and spatial information using graph neural networks and constrains the distribution of cell embeddings toward a prior distribution using the Wasserstein distance. The adversarial training process helps the cell embeddings capture spatial domain information better and makes them more robust. Moreover, stAA incorporates global graph information into cell embeddings using labels generated by pre-clustering. Our experimental results show that stAA outperforms the state-of-the-art methods and achieves better clustering results across different profiling platforms and various resolutions. We also conducted numerous biological analyses and found that stAA can identify fine-grained structures in tissues, recognize different functional subtypes within tumors and accurately identify developmental trajectories.
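As a rough illustration of the quantity mentioned in this abstract, the sketch below computes the empirical 1-Wasserstein distance between two one-dimensional samples. It is not the stAA implementation: the abstract does not say which prior is used or how the distance is estimated, so the standard normal prior, the sample sizes, and the function name `wasserstein_1d` are all assumptions made for the example. stAA itself works on multi-dimensional embeddings through adversarial training.

```python
import random


def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    # In one dimension the optimal transport plan pairs up sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(0)
# Assumed prior: standard normal (not stated in the abstract).
prior = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Hypothetical embedding coordinates that follow the prior.
matched = [rng.gauss(0.0, 1.0) for _ in range(1000)]
# Hypothetical embedding coordinates that have drifted away from it.
shifted = [rng.gauss(2.0, 1.0) for _ in range(1000)]

print("matched vs prior:", wasserstein_1d(matched, prior))
print("shifted vs prior:", wasserstein_1d(shifted, prior))
```

The distance is small when the embedding sample resembles the prior and grows as the two distributions separate, which is the property a regularizer of this kind relies on.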

Subjects

Gene Expression Profiling, Transcriptome, Cluster Analysis, Computer Neural Networks

Graph deep learning enabled spatial domains identification for spatial transcriptomics.

Liu, Teng; Fang, Zhao-Yu; Li, Xin; Zhang, Li-Ning; Cao, Dong-Sheng; Yin, Ming-Zhu.

Brief Bioinform ; 24(3)2023 05 19.

Article in English | MEDLINE | ID: mdl-37080761

ABSTRACT

Advances in spatially resolved transcriptomics (ST) technologies help biologists comprehensively understand organ function and tissue microenvironment. Accurate spatial domain identification is the foundation for delineating genome heterogeneity and cellular interaction. Motivated by this perspective, a graph deep learning (GDL) based spatial clustering approach is constructed in this paper. First, a deep graph infomax module with an embedded residual gated graph convolutional neural network is used to process the gene expression profiles and spatial positions in ST. Then, a Bayesian Gaussian mixture model is applied to the latent embeddings to generate spatial domains. Experiments indicate that the presented method outperforms other state-of-the-art GDL-enabled techniques on multiple ST datasets. The code and datasets used in this manuscript are available at https://github.com/narutoten520/SCGDL.

Subjects

Deep Learning, Transcriptome, Bayes Theorem, Gene Expression Profiling, Cell Communication

scMAE: a masked autoencoder for single-cell RNA-seq clustering.

Fang, Zhaoyu; Zheng, Ruiqing; Li, Min.

Bioinformatics ; 40(1)2024 01 02.

Article in English | MEDLINE | ID: mdl-38230824

ABSTRACT

MOTIVATION: Single-cell RNA sequencing has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes. RESULTS: Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations. AVAILABILITY AND IMPLEMENTATION: The source code of scMAE is available at: https://zenodo.org/records/10465991.
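The masking idea described in this abstract can be sketched as a data-preparation step. This is a minimal sketch, not the scMAE code: the abstract does not specify how expression values are perturbed, so replacing masked entries with a fixed `fill_value`, the `mask_rate` parameter, and the function name `mask_expression` are assumptions made for illustration.

```python
import random


def mask_expression(cell, mask_rate, rng, fill_value=0.0):
    """Return (corrupted_cell, mask); mask[i] == 1 marks a perturbed gene."""
    # Each gene is masked independently with probability mask_rate.
    mask = [1 if rng.random() < mask_rate else 0 for _ in cell]
    # Assumed perturbation: overwrite masked genes with fill_value.
    corrupted = [fill_value if m else x for x, m in zip(cell, mask)]
    return corrupted, mask


rng = random.Random(0)
cell = [3.2, 0.0, 1.5, 7.1, 0.4, 2.8]  # toy expression vector for one cell
corrupted, mask = mask_expression(cell, mask_rate=0.3, rng=rng)

# A model trained on such pairs would have two targets:
#   reconstruction target -> the original `cell`
#   mask-prediction target -> the binary `mask`
print(corrupted)
print(mask)
```

Under these assumptions, the encoder sees only `corrupted`, while the two targets correspond to the reconstruction and mask-prediction objectives named in the abstract.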

Subjects

Gene Expression Profiling, Single-Cell Gene Expression Analysis, Gene Expression Profiling/methods, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Cluster Analysis, Algorithms

Assembling spatial clustering framework for heterogeneous spatial transcriptomics data with GRAPHDeep.

Liu, Teng; Fang, Zhaoyu; Li, Xin; Zhang, Lining; Cao, Dong-Sheng; Li, Min; Yin, Mingzhu.

Bioinformatics ; 40(1)2024 01 02.

Article in English | MEDLINE | ID: mdl-38243703

ABSTRACT

MOTIVATION: Spatial clustering is essential and challenging for spatial transcriptomics' data analysis to unravel tissue microenvironment and biological function. Graph neural networks are promising to address gene expression profiles and spatial location information in spatial transcriptomics to generate latent representations. However, choosing an appropriate graph deep learning module and graph neural network necessitates further exploration and investigation. RESULTS: In this article, we present GRAPHDeep to assemble a spatial clustering framework for heterogeneous spatial transcriptomics data. Through integrating 2 graph deep learning modules and 20 graph neural networks, the most appropriate combination is decided for each dataset. The constructed spatial clustering method is compared with state-of-the-art algorithms to demonstrate its effectiveness and superiority. The significant new findings include: (i) the number of genes or proteins of spatial omics data is quite crucial in spatial clustering algorithms; (ii) the variational graph autoencoder is more suitable for spatial clustering tasks than deep graph infomax module; (iii) UniMP, SAGE, SuperGAT, GATv2, GCN, and TAG are the recommended graph neural networks for spatial clustering tasks; and (iv) the used graph neural network in the existent spatial clustering frameworks is not the best candidate. This study could be regarded as desirable guidance for choosing an appropriate graph neural network for spatial clustering. AVAILABILITY AND IMPLEMENTATION: The source code of GRAPHDeep is available at https://github.com/narutoten520/GRAPHDeep. The studied spatial omics data are available at https://zenodo.org/record/8141084.

Subjects

Algorithms, Gene Expression Profiling, Computer Neural Networks, Software, Cluster Analysis

REBET: a method to determine the number of cell clusters based on batch effect removal.

Fang, Zhao-Yu; Lin, Cui-Xiang; Xu, Yun-Pei; Li, Hong-Dong; Xu, Qing-Song.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34131702

ABSTRACT

In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches. When data are obtained from different labs or protocols, batch effects occur. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. In this method, cells are first partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the batch-corrected cells are mixed. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperforms existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).
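The three steps listed in this abstract amount to a search over candidate k values. The sketch below shows only that control flow, under stated assumptions: `partition`, `remove_batch_effect`, and `arnmi` are hypothetical callables supplied by the caller, and their signatures are not taken from the REBET paper. The stand-in functions in the usage part are toy placeholders, not real clustering, batch correction, or ARNMI.

```python
def select_number_of_clusters(cells, k_values, partition, remove_batch_effect, arnmi):
    """Return (best_k, scores), choosing the k with the lowest ARNMI score."""
    scores = {}
    for k in k_values:
        labels = partition(cells, k)                   # step 1: split cells into k clusters
        corrected = remove_batch_effect(cells, labels)  # step 2: treat clusters as batches
        scores[k] = arnmi(corrected, labels)            # step 3: score the mixing quality
    best_k = min(scores, key=scores.get)                # lowest score wins
    return best_k, scores


# Toy stand-ins (hypothetical) so the control flow can be exercised.
def toy_partition(cells, k):
    return [i % k for i in range(len(cells))]


def toy_remove_batch_effect(cells, labels):
    return list(cells)


def toy_arnmi(corrected, labels):
    # Pretend the score is minimized at three clusters.
    return abs(len(set(labels)) - 3)


best_k, scores = select_number_of_clusters(
    list(range(10)), range(2, 6), toy_partition, toy_remove_batch_effect, toy_arnmi
)
print(best_k, scores)
```

Passing the three steps in as functions keeps the selection loop independent of any particular clustering or batch-correction method.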

Subjects

Cluster Analysis, RNA-Seq/methods, Single-Cell Analysis/methods, Algorithms, Genetic Databases, Reproducibility of Results

A comprehensive overview of graph neural network-based approaches to clustering for spatial transcriptomics.

Liu, Teng; Fang, Zhao-Yu; Zhang, Zongbo; Yu, Yongxiang; Li, Min; Yin, Ming-Zhu.

Comput Struct Biotechnol J ; 23: 106-128, 2024 Dec.

Article in English | MEDLINE | ID: mdl-38089467

ABSTRACT

Spatial transcriptomics technologies enable researchers to accurately quantify and localize messenger ribonucleic acid (mRNA) transcripts at a high resolution while preserving their spatial context. The identification of spatial domains, or the task of spatial clustering, plays a crucial role in investigating data on spatial transcriptomes. One promising approach for classifying spatial domains involves the use of graph neural networks (GNNs) by leveraging gene expressions, spatial locations, and histological images. This study provides a comprehensive overview of the most recent GNN-based spatial clustering methods for the analysis of spatial transcriptomics data. We extensively evaluated the performance of current methods on prevalent datasets of spatial transcriptomics by considering their accuracy of clustering, robustness, data stabilization, relevant requirements, computational efficiency, and memory use. To this end, we explored 60 clustering scenarios by extending the essential frameworks of spatial clustering for the selection of the GNNs, algorithms of downstream clustering, principal component analysis (PCA)-based reduction, and refined methods of correction. We comparatively analyzed the performance of the methods in terms of spatial clustering to identify their limitations and outline future directions of research in the field. Our survey yielded novel insights, and provided motivation for further investigating spatial transcriptomics.

RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest.

Zhao, Yuan; Fang, Zhao-Yu; Lin, Cui-Xiang; Deng, Chao; Xu, Yun-Pei; Li, Hong-Dong.

Front Genet ; 12: 665843, 2021.

Article in English | MEDLINE | ID: mdl-34386033

ABSTRACT

In recent years, the application of single cell RNA-seq (scRNA-seq) has become more and more popular in fields such as biology and medical research. Analyzing scRNA-seq data can discover complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods to analyze scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy. Therefore, it is important to select genes with cell type specificity, which in combination with clustering methods can improve cell type identification. Here, we proposed RFCell, a supervised gene selection method, which is based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells using the genes selected by these methods. We found that the gene selection performance of RFCell was better than other gene selection methods.

Intron Retention as a Mode for RNA-Seq Data Analysis.

Zheng, Jian-Tao; Lin, Cui-Xiang; Fang, Zhao-Yu; Li, Hong-Dong.

Front Genet ; 11: 586, 2020.

Article in English | MEDLINE | ID: mdl-32733531

ABSTRACT

Intron retention (IR) is an alternative splicing mode whereby introns, rather than being spliced out as usual, are retained in mature mRNAs. It was previously considered a consequence of mis-splicing and received very limited attention. Only recently has IR become of interest for transcriptomic data analysis owing to its recognized roles in gene expression regulation and associations with complex diseases. In this article, we first review the function of IR in regulating gene expression in a number of biological processes, such as neuron differentiation and activation of CD4+ T cells. Next, we briefly review its association with diseases, such as Alzheimer's disease and cancers. Then, we describe state-of-the-art methods for IR detection, including RNA-seq analysis tools IRFinder and iREAD, highlighting their underlying principles and discussing their advantages and limitations. Finally, we discuss the challenges for IR detection and potential ways in which IR detection methods could be improved.
