1.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39487083

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) offers unprecedented insights into transcriptome-wide gene expression at the single-cell level. Cell clustering has been long established in the analysis of scRNA-seq data to identify the groups of cells with similar expression profiles. However, cell clustering is technically challenging, as raw scRNA-seq data have various analytical issues, including high dimensionality and dropout values. Existing research has developed deep learning models, such as graph machine learning models and contrastive learning-based models, for cell clustering using scRNA-seq data and has summarized the unsupervised learning of cell clustering into a human-interpretable format. While advances in cell clustering have been profound, we are no closer to finding a simple yet effective framework for learning high-quality representations necessary for robust clustering. In this study, we propose scSimGCL, a novel framework based on the graph contrastive learning paradigm for self-supervised pretraining of graph neural networks. This framework facilitates the generation of high-quality representations crucial for cell clustering. Our scSimGCL incorporates cell-cell graph structure and contrastive learning to enhance the performance of cell clustering. Extensive experimental results on simulated and real scRNA-seq datasets suggest the superiority of the proposed scSimGCL. Moreover, clustering assignment analysis confirms the general applicability of scSimGCL, including state-of-the-art clustering algorithms. Further, ablation study and hyperparameter analysis suggest the efficacy of our network architecture with the robustness of decisions in the self-supervised learning setting. The proposed scSimGCL can serve as a robust framework for practitioners developing tools for cell clustering. The source code of scSimGCL is publicly available at https://github.com/zhangzh1328/scSimGCL.


Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , RNA-Seq/methods , Sequence Analysis, RNA/methods , Algorithms , Machine Learning , Computational Biology/methods , Neural Networks, Computer , Gene Expression Profiling/methods , Software , Transcriptome , Single-Cell Gene Expression Analysis
2.
Brief Bioinform ; 25(6)2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39356327

ABSTRACT

Single-cell cross-modal joint clustering has been extensively utilized to investigate the tumor microenvironment. Although numerous approaches have been suggested, accurate clustering remains the main challenge. First, the gene expression matrix frequently contains numerous missing values due to measurement limitations. The majority of existing clustering methods treat it as a typical multi-modal dataset without further processing. Few methods conduct recovery before clustering and do not sufficiently engage with the underlying research, leading to suboptimal outcomes. Additionally, the existing cross-modal information fusion strategy does not ensure consistency of representations across different modes, potentially leading to the integration of conflicting information, which could degrade performance. To address these challenges, we propose the 'Recover then Aggregate' strategy and introduce the Unified Cross-Modal Deep Clustering model. Specifically, we have developed a data augmentation technique based on neighborhood similarity, iteratively imposing rank constraints on the Laplacian matrix, thus updating the similarity matrix and recovering dropout events. Concurrently, we integrate cross-modal features and employ contrastive learning to align modality-specific representations with consistent ones, enhancing the effective integration of diverse modal information. Comprehensive experiments on five real-world multi-modal datasets have demonstrated this method's superior effectiveness in single-cell clustering tasks.


Subjects
Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Humans , Algorithms , Tumor Microenvironment , Computational Biology/methods
3.
Cell Syst ; 15(10): 969-981.e6, 2024 Oct 16.
Article in English | MEDLINE | ID: mdl-39378875

ABSTRACT

Spatially resolved transcriptomics (SRT) combines gene expression profiles with the physical locations of cells in their native states but suffers from unpredictable spatial noise due to cell damage during cryosectioning and exposure to reagents for staining and mRNA release. To address this noise, we developed SpotGF, an algorithm for denoising SRT data using optimal transport-based gene filtering. SpotGF quantifies diffusion patterns numerically, distinguishing widespread expression genes from aggregated expression genes and filtering out the former as noise. Unlike conventional denoising methods, SpotGF preserves raw sequencing data, thereby avoiding false positives that can arise from imputation. Additionally, SpotGF demonstrates superior performance in cell clustering, identifying potential marker genes, and annotating cell types. Overall, SpotGF has the potential to become a crucial preprocessing step in the downstream analysis of SRT data. The SpotGF software is freely available at GitHub. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Algorithms , Gene Expression Profiling , Gene Expression Profiling/methods , Cryoultramicrotomy , Glycine max/genetics , Plant Roots/genetics , Gene Expression Regulation, Plant , Up-Regulation , Plant Vascular Bundle , Plant Cells , Humans , Colorectal Neoplasms/pathology , Arabidopsis/genetics
4.
Genome Biol ; 25(1): 241, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39252099

ABSTRACT

Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement especially in dimension reduction, cell clustering, and cell-cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell transcriptomics. DcjComm detects functional modules to explore expression patterns and performs dimension reduction and clustering to discover cellular identities by the non-negative matrix factorization-based joint learning model. DcjComm then infers cell-cell communication by integrating ligand-receptor pairs, transcription factors, and target genes. DcjComm demonstrates superior performance compared to state-of-the-art methods.


Subjects
Cell Communication , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Cluster Analysis , Gene Expression Profiling/methods , Humans , Computational Biology/methods
5.
BMC Bioinformatics ; 25(Suppl 2): 292, 2024 Sep 05.
Article in English | MEDLINE | ID: mdl-39237886

ABSTRACT

BACKGROUND: With the advance in single-cell RNA sequencing (scRNA-seq) technology, deriving inherent biological system information from expression profiles at a single-cell resolution has become possible. It is known that network modeling by estimating the associations between genes can better reveal dynamic changes in biological systems. However, accurately constructing a single-cell network (SCN) to capture the network architecture of each cell and further explore cell-to-cell heterogeneity remains challenging. RESULTS: We introduce SINUM, a method for constructing the SIngle-cell Network Using Mutual information, which estimates mutual information between any two genes from scRNA-seq data to determine whether they are dependent or independent in a specific cell. Experiments on various scRNA-seq datasets with different cell numbers, based on eight performance indexes (e.g., adjusted Rand index and F-measure index), validated the accuracy and robustness of SINUM in cell type identification, where it outperformed the state-of-the-art SCN inference method. Additionally, the SINUM SCNs exhibit high overlap with the human interactome and possess the scale-free property. CONCLUSIONS: SINUM presents a view of biological systems at the network level to detect cell-type marker genes/gene pairs and investigate time-dependent changes in gene associations during embryo development. Codes for SINUM are freely available at https://github.com/SysMednet/SINUM.
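As a rough illustration of the quantity the abstract above relies on, the following is a minimal sketch of a plug-in mutual information estimate between two genes. It is not SINUM's cell-specific estimator: the function name `mutual_information`, the equal-width binning, and the `bins` parameter are assumptions made only for this example.

```python
import math
from collections import Counter


def mutual_information(x, y, bins=2):
    """Plug-in mutual information (in nats) between two expression vectors.

    Each vector is discretized into `bins` equal-width bins. The binning
    scheme is an assumption of this sketch, not the SINUM procedure.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            return [0] * len(values)
        width = (hi - lo) / bins
        # The maximum value would land in bin `bins`, so clamp it.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    dx, dy = discretize(x), discretize(y)
    n = len(x)
    joint = Counter(zip(dx, dy))
    px, py = Counter(dx), Counter(dy)

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (px[a] * py[b]).
        mi += p_ab * math.log(count * n / (px[a] * py[b]))
    return mi
```

Two genes that always switch on together give a value of log 2 with two bins, while two genes whose binned values are independent give 0; a dependence threshold on this value would be a further design choice.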


Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Gene Regulatory Networks , RNA-Seq/methods , Algorithms , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis
6.
Methods Mol Biol ; 2812: 155-168, 2024.
Article in English | MEDLINE | ID: mdl-39068361

ABSTRACT

This chapter shows how to apply the Asymmetric Within-Sample Transformation to single-cell RNA-Seq data, paired with a preceding dropout imputation. The asymmetric transformation is a special winsorization that flattens low-expressed intensities and preserves highly expressed gene levels. Before a standard hierarchical clustering algorithm is run, an intermediate step removes noninformative genes according to a threshold applied to a per-gene entropy estimate. Following the clustering, a time-intensive algorithm is shown to uncover the molecular features associated with each cluster. This step implements a resampling algorithm to generate a random baseline against which significantly up- or downregulated genes are measured. To this end, we adopt a GLM as implemented in the DESeq2 package. We render the results graphically. While the tools are standard heat maps, we introduce some data scaling to clarify the reliability of the results.
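The two preprocessing ideas in the abstract above, a one-sided winsorization and an entropy-based gene filter, can be sketched as follows. This is only an illustration under stated assumptions: the chapter's actual transformation and entropy estimator are not given in the abstract, so the fixed `floor`, the histogram entropy with `bins=4`, and the names `flatten_low`, `entropy` and `filter_genes` are all hypothetical.

```python
import math
from collections import Counter


def flatten_low(values, floor):
    """One-sided winsorization: raise values below `floor` to `floor`.

    Values above the floor are left untouched, so high expression is preserved.
    """
    return [max(v, floor) for v in values]


def entropy(values, bins=4):
    """Shannon entropy (in nats) of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant gene carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def filter_genes(matrix, floor, min_entropy):
    """Flatten each gene's values, then keep genes with entropy >= min_entropy.

    `matrix` maps a gene name to its list of expression values across cells.
    """
    kept = {}
    for gene, values in matrix.items():
        flat = flatten_low(values, floor)
        if entropy(flat) >= min_entropy:
            kept[gene] = flat
    return kept
```

With a floor of 2, a gene expressed only at levels 0 and 1 becomes constant and is dropped, while a gene that alternates between low and high levels keeps its high values and passes the filter.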


Subjects
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Gene Expression Profiling/methods , Software , Computational Biology/methods , RNA-Seq/methods
7.
Comput Biol Med ; 179: 108921, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39059210

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) sequences individual cells, whose expression reflects the overall characteristics of each cell, facilitating research at the cellular level. However, problems such as dimensionality reduction of massive data, technical noise, and visualization of single-cell type clustering make scRNA-seq data difficult to analyze and process. In this paper, we propose a new single-cell data analysis model using a denoising autoencoder and multi-type graph neural networks (scDMG), which learns cell-cell topology information and a latent representation of scRNA-seq data. scDMG introduces the zero-inflated negative binomial (ZINB) model into a denoising autoencoder (DAE) to perform dimensionality reduction and denoising on the raw data. scDMG integrates multiple types of graph neural networks as the encoder to further train on the preprocessed data, which better handles various types of scRNA-seq datasets, resolves dropout events in scRNA-seq data, and enables preliminary classification of scRNA-seq data. By applying the t-SNE and PCA algorithms to the trained data and invoking the Louvain algorithm, scDMG achieves better dimensionality reduction and clustering optimization. Compared with other mainstream scRNA-seq clustering algorithms, scDMG performs better on various clustering performance metrics and shows better scalability and shorter runtime.
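The ZINB model named in the abstract above can be made concrete with a small likelihood sketch. It uses the common mean/dispersion parameterization (`mu`, `theta`) plus a dropout probability `pi`. These parameter names and the helper functions are assumptions for illustration, and in a model such as scDMG the parameters would be predicted by the network rather than fixed by hand.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial.

    Parameterized by mean `mu` > 0 and dispersion `theta` > 0.
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated negative binomial.

    With probability `pi` (0 <= pi < 1) the count is a structural zero;
    otherwise it is drawn from the negative binomial.
    """
    if x == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log1p(-pi) + nb_log_pmf(x, mu, theta)


def zinb_nll(counts, mu, theta, pi):
    """Negative log-likelihood of counts under shared ZINB parameters."""
    return -sum(zinb_log_pmf(x, mu, theta, pi) for x in counts)
```

The extra mass at zero is what lets the model account for dropout events: with `mu=3`, `theta=2`, `pi=0.3`, the probability of a zero rises from 0.16 under the plain negative binomial to 0.412.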


Subjects
Neural Networks, Computer , Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Sequence Analysis, RNA/methods , Algorithms
8.
Methods Mol Biol ; 2757: 383-445, 2024.
Article in English | MEDLINE | ID: mdl-38668977

ABSTRACT

The emergence and development of single-cell RNA sequencing (scRNA-seq) techniques enable researchers to perform large-scale analysis of the transcriptomic profiling at cell-specific resolution. Unsupervised clustering of scRNA-seq data is central for most studies, which is essential to identify novel cell types and their gene expression logics. Although an increasing number of algorithms and tools are available for scRNA-seq analysis, a practical guide for users to navigate the landscape remains underrepresented. This chapter presents an overview of the scRNA-seq data analysis pipeline, quality control, batch effect correction, data standardization, cell clustering and visualization, cluster correlation analysis, and marker gene identification. Taking the two broadly used analysis packages, i.e., Scanpy and MetaCell, as examples, we provide a hands-on guideline and comparison regarding the best practices for the above essential analysis steps and data visualization. Additionally, we compare both packages and algorithms using a scRNA-seq dataset of the ctenophore Mnemiopsis leidyi, which is representative of one of the earliest animal lineages, critical to understanding the origin and evolution of animal novelties. This pipeline can also be helpful for analyses of other taxa, especially prebilaterian animals, where these tools are under development (e.g., placozoan and Porifera).


Subjects
Algorithms , Gene Expression Profiling , Single-Cell Analysis , Software , Single-Cell Analysis/methods , Animals , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Computational Biology/methods , Cluster Analysis , Transcriptome/genetics
9.
BMC Bioinformatics ; 25(1): 164, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664601

ABSTRACT

Multimodal integration combines information from different sources or modalities to gain a more comprehensive understanding of a phenomenon. The challenges in multi-omics data analysis lie in the complexity, high dimensionality, and heterogeneity of the data, which demands sophisticated computational tools and visualization methods for proper interpretation and visualization of multi-omics data. In this paper, we propose a novel method, termed Orthogonal Multimodality Integration and Clustering (OMIC), for analyzing CITE-seq. Our approach enables researchers to integrate multiple sources of information while accounting for the dependence among them. We demonstrate the effectiveness of our approach using CITE-seq data sets for cell clustering. Our results show that our approach outperforms existing methods in terms of accuracy, computational efficiency, and interpretability. We conclude that our proposed OMIC method provides a powerful tool for multimodal data analysis that greatly improves the feasibility and reliability of integrated data.


Subjects
Single-Cell Analysis , Cluster Analysis , Single-Cell Analysis/methods , Computational Biology/methods , Humans , Algorithms
10.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38493338

ABSTRACT

In recent years, there has been a growing trend in the realm of parallel clustering analysis for single-cell RNA-seq (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) data. However, prevailing methods often treat these two data modalities as equals, neglecting the fact that the scRNA mode holds significantly richer information than the scATAC mode. This disregard prevents the model from benefiting from the insights derived from multiple modalities, compromising the overall clustering performance. To this end, we propose scEMC, an effective multi-modal clustering model for parallel scRNA and scATAC data. Concretely, we have devised a skip aggregation network to simultaneously learn global structural information among cells and integrate data from diverse modalities. To safeguard the quality of the integrated cell representation against the influence of sparse scATAC data, we connect the scRNA data with the aggregated representation via a skip connection. Moreover, to effectively fit the real distribution of cells, we introduce a Zero-Inflated Negative Binomial-based denoising autoencoder that accommodates corrupted data containing synthetic noise, together with a joint optimization module that employs multiple losses. Extensive experiments underscore the effectiveness of our model. This work contributes significantly to the ongoing exploration of cell subpopulations and tumor microenvironments, and the code of our work will be public at https://github.com/DayuHuu/scEMC.


Subjects
Chromatin , RNA, Small Cytoplasmic , Single-Cell Gene Expression Analysis , Cluster Analysis , Learning , RNA, Small Cytoplasmic/genetics , Transposases , Sequence Analysis, RNA , Gene Expression Profiling
11.
J Biomech ; 162: 111909, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38118308

ABSTRACT

The properties of organs, tissues, organoids, and other systems of cells are influenced by the spatial localization and distribution of their elements. Here, we used networks to describe distributions of cells on a surface where the small-world coefficient (SW) of the networks was varied between SW~1 (random uniform distributions) and SW~10 (clustered distributions). The small-world coefficient is a topological measure of graphs: networks with SW>1 are topologically biased to transmit information. For each system configuration, we then determined the total energy U as the sum of the energies that describe cell-cell interactions, approximated by a harmonic potential. The graph of energy (U) across the configuration space of the networks (SW) is the energy landscape: it indicates which configuration a system of cells will likely assume over time. We found that, depending on the model parameters, the energy landscapes of 2D distributions of cells may be of different types: from type I to type IV. Type I and type II systems have a high probability of evolving into random distributions. Type III and type IV systems have a higher probability of forming clustered architectures. A great many simulations indicated that cultures of cells with high initial density and limited sensing range could evolve into clustered configurations with enhanced topological characteristics. Moreover, the stronger the binding between cells, the greater the likelihood that they will assume configurations characterized by finite values of SW. The results of this work are relevant for those working in the fields of tissue engineering, regenerative medicine, the formation of in vitro models, and the analysis of neurodegenerative diseases.
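The total energy U described in the abstract above, a sum of harmonic pair energies, can be sketched for a 2D configuration as follows. The spring constant `k`, the rest distance `r0`, the hard `sensing_range` cutoff, and the function name `total_energy` are assumptions for this example. The abstract does not give the paper's actual parameters or interaction rule.

```python
import math
from itertools import combinations


def total_energy(positions, k=1.0, r0=1.0, sensing_range=math.inf):
    """Total harmonic interaction energy of cells at 2D `positions`.

    Each pair of cells closer than `sensing_range` contributes
    0.5 * k * (r - r0)**2, where r is their Euclidean distance.
    """
    u = 0.0
    for (x1, y1), (x2, y2) in combinations(positions, 2):
        r = math.hypot(x2 - x1, y2 - y1)
        if r <= sensing_range:
            u += 0.5 * k * (r - r0) ** 2
    return u
```

Evaluating this function over many configurations that differ in their small-world coefficient would trace out an energy landscape of the kind the abstract describes. A larger `k` corresponds to stronger binding between cells.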


Subjects
Cells , Energy Metabolism , Cells/metabolism
12.
Brief Bioinform ; 25(1)2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38145950

ABSTRACT

Single-cell sequencing technology has provided unprecedented opportunities for comprehensively deciphering cell heterogeneity. Nevertheless, the high dimensionality and intricate nature of cell heterogeneity have presented substantial challenges to computational methods. Numerous novel clustering methods have been proposed to address this issue. However, none of these methods achieves consistently better performance under different biological scenarios. In this study, we developed CAKE, a novel and scalable self-supervised clustering method, which consists of a contrastive learning model with a mixture neighborhood augmentation for cell representation learning, and a self-Knowledge Distiller model for the refinement of clustering results. These designs provide more condensed and cluster-friendly cell representations and improve the clustering performance in terms of accuracy and robustness. Furthermore, in addition to accurately identifying the major cell types, CAKE could also find more biologically meaningful cell subgroups and rare cell types. Comprehensive experiments on real single-cell RNA sequencing datasets demonstrated the superiority of CAKE in visualization and clustering over other comparison methods, and indicated its broad applicability in the field of cell heterogeneity analysis. Contact: Ruiqing Zheng (rqzheng@csu.edu.cn).


Subjects
Algorithms , Learning , Cluster Analysis , Sequence Analysis, RNA
13.
Brief Bioinform ; 24(6)2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37935617

ABSTRACT

Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.


Subjects
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression
14.
BMC Genomics ; 24(1): 725, 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38036964

ABSTRACT

In recent single-cell omics studies, both the differential activity of transcription factors regulating cell fate determination and differential genome activation have been tested for utility as descriptors of cell types. Naturally, genome accessibility and gene expression are interlinked. To understand the variability in genomic feature activation in the GABAergic neurons of different spatial origins, we have mapped accessible chromatin regions and mRNA expression in single cells derived from the developing mouse central nervous system (CNS). We first defined a reference set of open chromatin regions for scATAC-seq read quantitation across samples, allowing direct comparison of chromatin accessibility between brain regions and cell types. Second, we integrated the scATAC-seq and scRNA-seq data to form a unified resource of the transcriptome and chromatin accessibility landscape for the cell types in the di- and telencephalon, midbrain and anterior hindbrain of the E14.5 mouse embryo. Importantly, we implemented resolution optimization at the clustering step and automated the cell typing step. We show a high level of concordance between the cell clustering based on chromatin accessibility and that based on the transcriptome in the analyzed neuronal lineages, indicating that both genome and transcriptome features can be used for cell type definition. Hierarchical clustering by similarity in accessible chromatin reveals that genomic feature activation correlates with neurotransmitter phenotype, selector gene expression, cell differentiation stage and neuromere origins.


Subjects
Chromatin , Transcription Factors , Animals , Mice , Chromatin/genetics , Cell Differentiation/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Genome , Brain/metabolism , Single-Cell Analysis
15.
J Theor Biol ; 575: 111646, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37852358

ABSTRACT

This paper presents a numerical method for modelling cell migration and aggregation due to chemotaxis where the cell is attracted towards the direction in which the concentration of a chemical signal is increasing. In the model presented here, each cell is represented by a system of springs connected together at node points on the cell's membrane and on the boundary of the cell's nucleus. The nodes located on a cell's membrane are subject to a force which is proportional to the gradient of the concentration of the chemical signal which mimics the behaviour of the chemical receptors in the cell's membrane. In particular, the model developed here will consider what happens when two (or more) cells collide and how their membranes connect to each other to form clusters of cells. The methods described in this paper will be illustrated with a number of typical examples simulating cells moving in response to a chemical signal and how they combine to form clusters.


Subjects
Chemotaxis , Models, Biological , Chemotaxis/physiology , Cell Movement/physiology , Models, Theoretical , Cluster Analysis
16.
Brief Bioinform ; 24(6)2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37769630

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or fully utilizing the relationships between cells, leaving a considerable scope for improving the clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that fully incorporates nonlinear dimension reduction, negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells, respectively.


Subjects
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods
17.
Cureus ; 15(8): e43244, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37692623

ABSTRACT

BACKGROUND: In vitro studies with human fetal islets of different gestational ages (GA) would be a great tool to generate information on the developmental process of the islets, as this would help to recontextualize diabetes research and clinical practice. Pancreatic islets from human cadavers and other animal species are extensively researched to explore their suitability for the islet transplantation procedure, one of the upcoming treatment strategies for insulin-dependent diabetes mellitus. Although human fetal islets are also considered for islet transplantation, ethical issues and limited knowledge constrain their use. The fetal islets could be explored to address the information lacunae on the maturity process of pancreatic islets and the endocrine-exocrine signaling mechanisms. AIM: This study aimed to assess the feasibility of isolating viable islets and to study the cytoarchitecture of the fetal pancreas of GA 22-29 weeks, not reported otherwise. METHODOLOGY: Pancreases obtained from aborted fetuses of GA 22-29 weeks were subjected to collagenase digestion and were further cultured to determine their viability in vitro. The parameters assessed were the expression of markers for endocrine cell lineages and insulin release in response to glucose challenge. RESULTS: Islets were viable in vitro and were shown to maintain cues for post-digestion re-aggregation and expansion in culture. The immunofluorescent staining showed islets of varying sizes and homogeneous cell clusters aggregating to form heterogeneous cell clusters, otherwise not reported for this GA. On stimulation with different concentrations of glucose (2.8 and 28 mM), the fetal islets in culture exhibited insulin release, and this response confirmed their viability in vitro. CONCLUSION: Our findings showed that viable islets could be isolated and cultured in vitro for further in-depth studies to explore their proliferative potential as well as for the identification of pancreatic progenitors, a good strategy to take forward.

18.
Genome Biol ; 24(1): 212, 2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37730638

ABSTRACT

BACKGROUND: Single-cell sequencing provides detailed insights into biological processes including cell differentiation and identity. While providing deep cell-specific information, the method suffers from technical constraints, most notably a limited number of expressed genes per cell, which leads to suboptimal clustering and cell type identification. RESULTS: Here, we present DISCERN, a novel deep generative network that precisely reconstructs missing single-cell gene expression using a reference dataset. DISCERN outperforms competing algorithms in expression inference resulting in greatly improved cell clustering, cell type and activity detection, and insights into the cellular regulation of disease. We show that DISCERN is robust against differences between batches and is able to keep biological differences between batches, which is a common problem for imputation and batch correction algorithms. We use DISCERN to detect two unseen COVID-19-associated T cell types, cytotoxic CD4+ and CD8+ Tc2 T helper cells, with a potential role in adverse disease outcome. We utilize T cell fraction information of patient blood to classify mild or severe COVID-19 with an AUROC of 80% that can serve as a biomarker of disease stage. DISCERN can be easily integrated into existing single-cell sequencing workflow. CONCLUSIONS: Thus, DISCERN is a flexible tool for reconstructing missing single-cell gene expression using a reference dataset and can easily be applied to a variety of data sets yielding novel insights, e.g., into disease mechanisms.


Subjects
COVID-19 , Humans , COVID-19/genetics , Algorithms , Cell Cycle , Cell Differentiation , Cluster Analysis
19.
Front Immunol ; 14: 1194745, 2023.
Article in English | MEDLINE | ID: mdl-37609075

ABSTRACT

Background: Robust immune cell gene expression signatures are central to the analysis of single cell studies. Nearly all known sets of immune cell signatures have been derived by making use of only single gene expression datasets. Utilizing the power of multiple integrated datasets could lead to high-quality immune cell signatures which could be used as superior inputs to machine learning-based cell type classification approaches. Results: We established a novel workflow for the discovery of immune cell type signatures based primarily on gene-versus-gene expression similarity. It leverages multiple datasets, here seven single cell expression datasets from six different cancer types and resulted in eleven immune cell type-specific gene expression signatures. We used these to train random forest classifiers for immune cell type assignment for single-cell RNA-seq datasets. We obtained similar or better prediction results compared to commonly used methods for cell type assignment in independent benchmarking datasets. Our gene signature set yields higher prediction scores than other published immune cell type gene sets in random forest-based cell type classification. We further demonstrate how our approach helps to avoid bias in downstream statistical analyses by re-analysis of a published IFN stimulation experiment. Discussion and conclusion: We demonstrated the quality of our immune cell signatures and their strong performance in a random forest-based cell typing approach. We argue that classifying cells based on our comparably slim sets of genes accompanied by a random forest-based approach not only matches or outperforms widely used published approaches. It also facilitates unbiased downstream statistical analyses of differential gene expression between cell types for significantly more genes compared to previous cell classification algorithms.


Subjects
Algorithms , Random Forest , Benchmarking , Machine Learning , Gene Expression
20.
Brief Bioinform ; 24(3)2023 May 19.
Article in English | MEDLINE | ID: mdl-37185897

ABSTRACT

Single-cell RNA-seq analysis has become a powerful tool to analyse the transcriptomes of individual cells. In turn, it has fostered the possibility of screening thousands of single cells in parallel. Thus, contrary to the traditional bulk measurements that only paint a macroscopic picture, gene measurements at the cell level aid researchers in studying different tissues and organs at various stages. However, accurate clustering methods for such high-dimensional data remain scarce, and clustering is a persistent challenge in this domain. Of late, several methods and techniques have been put forward to address this issue. In this article, we propose a novel framework for clustering large-scale single-cell data and subsequently identifying the rare-cell sub-populations. To handle such sparse, high-dimensional data, we leverage PaCMAP (Pairwise Controlled Manifold Approximation), a feature extraction algorithm that preserves both the local and the global structures of the data, and a Gaussian Mixture Model to cluster single-cell data. Subsequently, we exploit Edited Nearest Neighbours sampling and Isolation Forest/One-class Support Vector Machine to identify rare-cell sub-populations. The performance of the proposed method is validated using publicly available datasets with varying numbers of cell types and rare-cell sub-populations. On several benchmark datasets, the proposed method outperforms the existing state-of-the-art methods. The proposed method successfully identifies cell types that constitute populations ranging from 0.1 to 8% with F1-scores of 0.91 ± 0.09. The source code is available at https://github.com/scrab017/RarPG.


Subjects
Single-Cell Gene Expression Analysis , Unsupervised Machine Learning , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods