Búsqueda | Biblioteca Virtual en Salud

scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering.

Wang, Yunhe; Yu, Zhuohan; Li, Shaochuan; Bian, Chuang; Liang, Yanchun; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 39(2)2023 02 14.

Artículo en Inglés | MEDLINE | ID: mdl-36734596

RESUMEN

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data that facilitates accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on a deep autoencoder and the ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders are not sufficient to learn the latent representations of scRNA-seq data, and obtaining consensus partitions from these feature representations remains under-explored. RESULTS: To address this challenge, we propose a single-cell deep clustering model via a dual denoising autoencoder with bipartite graph ensemble clustering called scBGEDA, to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network is proposed to project the data into a compressed low-dimensional space and that can learn feature representation via explicit modeling of synergistic optimization of the zero-inflated negative binomial reconstruction loss and denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedded space by means of a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets, and also demonstrated its scalability to large-scale scRNA-seq datasets. Moreover, scBGEDA was able to identify cell-type specific marker genes and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into identifying cell types and characterizing the scRNA-seq data from different perspectives. AVAILABILITY AND IMPLEMENTATION: The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Asunto(s)

Algoritmos , Perfilación de la Expresión Génica , Análisis de Secuencia de ARN/métodos , Perfilación de la Expresión Génica/métodos , Programas Informáticos , Análisis de la Célula Individual/métodos , Análisis por Conglomerados

Elucidating transcriptomic profiles from single-cell RNA sequencing data using nature-inspired compressed sensing.

Yu, Zhuohan; Bian, Chuang; Liu, Genggeng; Zhang, Shixiong; Wong, Ka-Chun; Li, Xiangtao.

Brief Bioinform ; 22(5)2021 09 02.

Artículo en Inglés | MEDLINE | ID: mdl-33855366

RESUMEN

Gene-expression profiling can define the cell state and gene-expression pattern of cells at the genetic level in a high-throughput manner. With the development of transcriptome techniques, processing high-dimensional genetic data has become a major challenge in expression profiling. Thanks to the recent widespread use of matrix decomposition methods in bioinformatics, a computational framework based on compressed sensing was adopted to reduce dimensionality. However, compressed sensing requires an optimization strategy to learn the modular dictionaries and activity levels from the low-dimensional random composite measurements to reconstruct the high-dimensional gene-expression data. Considering this, here we introduce and compare four compressed sensing frameworks coming from nature-inspired optimization algorithms (CSCS, ABCCS, BACS and FACS) to improve the quality of the decompression process. Several experiments establish that the three proposed methods outperform benchmark methods on nine different datasets, especially the FACS method. We illustrate therefore, the robustness and convergence of FACS in various aspects; notably, time complexity and parameter analyses highlight properties of our proposed FACS. Furthermore, differential gene-expression analysis, cell-type clustering, gene ontology enrichment and pathology analysis are conducted, which bring novel insights into cell-type identification and characterization mechanisms from different perspectives. All algorithms are written in Python and available at https://github.com/Philyzh8/Nature-inspired-CS.

Asunto(s)

Algoritmos , Biología Computacional/métodos , Perfilación de la Expresión Génica/métodos , RNA-Seq/métodos , Análisis de la Célula Individual/métodos , Transcriptoma , Animales , Análisis por Conglomerados , Redes Reguladoras de Genes/genética , Humanos , Anotación de Secuencia Molecular/métodos , Reproducibilidad de los Resultados , Transducción de Señal/genética , Factores de Tiempo

GMHCC: high-throughput analysis of biomolecular data using graph-based multiple hierarchical consensus clustering.

Lu, Yifu; Yu, Zhuohan; Wang, Yunhe; Ma, Zhiqiang; Wong, Ka-Chun; Li, Xiangtao.

Bioinformatics ; 38(11): 3020-3028, 2022 05 26.

Artículo en Inglés | MEDLINE | ID: mdl-35451457

RESUMEN

MOTIVATION: Thanks to the development of high-throughput sequencing technologies, massive amounts of various biomolecular data have been accumulated to revolutionize the study of genomics and molecular biology. One of the main challenges in analyzing this biomolecular data is to cluster their subtypes into subpopulations to facilitate subsequent downstream analysis. Recently, many clustering methods have been developed to address the biomolecular data. However, the computational methods often suffer from many limitations such as high dimensionality, data heterogeneity and noise. RESULTS: In our study, we develop a novel Graph-based Multiple Hierarchical Consensus Clustering (GMHCC) method with an unsupervised graph-based feature ranking (FR) and a graph-based linking method to explore the multiple hierarchical information of the underlying partitions of the consensus clustering for multiple types of biomolecular data. Indeed, we first propose to use a graph-based unsupervised FR model to measure each feature by building a graph over pairwise features and then providing each feature with a rank. Subsequently, to maintain the diversity and robustness of basic partitions (BPs), we propose multiple diverse feature subsets to generate several BPs and then explore the hierarchical structures of the multiple BPs by refining the global consensus function. Finally, we develop a new graph-based linking method, which explicitly considers the relationships between clusters to generate the final partition. Experiments on multiple types of biomolecular data including 35 cancer gene expression datasets and eight single-cell RNA-seq datasets validate the effectiveness of our method over several state-of-the-art consensus clustering approaches. Furthermore, differential gene analysis, gene ontology enrichment analysis and KEGG pathway analysis are conducted, providing novel insights into cell developmental lineages and characterization mechanisms. AVAILABILITY AND IMPLEMENTATION: The source code is available at GitHub: https://github.com/yifuLu/GMHCC. The software and the supporting data can be downloaded from: https://figshare.com/articles/software/GMHCC/17111291. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Asunto(s)

Algoritmos , Programas Informáticos , Consenso , Análisis por Conglomerados , Secuenciación de Nucleótidos de Alto Rendimiento , Análisis de la Célula Individual

Distribution-Agnostic Deep Learning Enables Accurate Single-Cell Data Recovery and Transcriptional Regulation Interpretation.

Su, Yanchi; Yu, Zhuohan; Yang, Yuning; Wong, Ka-Chun; Li, Xiangtao.

Adv Sci (Weinh) ; 11(16): e2307280, 2024 Apr.

Artículo en Inglés | MEDLINE | ID: mdl-38380499

RESUMEN

Single-cell RNA sequencing (scRNA-seq) is a robust method for studying gene expression at the single-cell level, but accurately quantifying genetic material is often hindered by limited mRNA capture, resulting in many missing expression values. Existing imputation methods rely on strict data assumptions, limiting their broader application, and lack reliable supervision, leading to biased signal recovery. To address these challenges, authors developed Bis, a distribution-agnostic deep learning model for accurately recovering missing sing-cell gene expression from multiple platforms. Bis is an optimal transport-based autoencoder model that can capture the intricate distribution of scRNA-seq data while addressing the characteristic sparsity by regularizing the cellular embedding space. Additionally, they propose a module using bulk RNA-seq data to guide reconstruction and ensure expression consistency. Experimental results show Bis outperforms other models across simulated and real datasets, showcasing superiority in various downstream analyses including batch effect removal, clustering, differential expression analysis, and trajectory inference. Moreover, Bis successfully restores gene expression levels in rare cell subsets in a tumor-matched peripheral blood dataset, revealing developmental characteristics of cytokine-induced natural killer cells within a head and neck squamous cell carcinoma microenvironment.

Asunto(s)

Aprendizaje Profundo , Análisis de la Célula Individual , Análisis de la Célula Individual/métodos , Humanos , Análisis de Secuencia de ARN/métodos , Perfilación de la Expresión Génica/métodos

Topological identification and interpretation for single-cell gene regulation elucidation across multiple platforms using scMGCA.

Yu, Zhuohan; Su, Yanchi; Lu, Yifu; Yang, Yuning; Wang, Fuzhou; Zhang, Shixiong; Chang, Yi; Wong, Ka-Chun; Li, Xiangtao.

Nat Commun ; 14(1): 400, 2023 01 25.

Artículo en Inglés | MEDLINE | ID: mdl-36697410

RESUMEN

Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from challenges related to dimensionality, and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.

Asunto(s)

Carcinoma Ductal Pancreático , Neoplasias Pancreáticas , Humanos , Perfilación de la Expresión Génica/métodos , Neoplasias Pancreáticas/genética , Regulación de la Expresión Génica , Transcriptoma , Carcinoma Ductal Pancreático/genética , Análisis de la Célula Individual/métodos

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

Detalles de la búsqueda