ABSTRACT
MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering that recovers an accurate number of clusters, reflecting the intrinsic biology of large-scale scRNA-seq data, remains challenging. RESULTS: Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model that couples an Autoencoder (AE) with a Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of the AE and the DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes. AVAILABILITY AND IMPLEMENTATION: scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.
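The idea that a Dirichlet process prior lets the number of clusters adapt to the data can be illustrated with a Chinese restaurant process (CRP), the standard sequential view of that prior. The sketch below is only an illustration under stated assumptions: it is not scDAC's implementation, it contains no autoencoder or joint optimization, and the function name `chinese_restaurant_process` and the concentration parameter `alpha` are hypothetical names chosen for this example.

```python
import random


def chinese_restaurant_process(n_cells, alpha, seed=0):
    """Sample cluster labels for n_cells items from a CRP prior.

    Illustrative only: alpha (> 0) is an assumed concentration parameter.
    Item i joins existing cluster k with probability size_k / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    """
    rng = random.Random(seed)
    labels = []          # cluster label of each item, in arrival order
    cluster_sizes = []   # current number of items in each cluster
    for _ in range(n_cells):
        # Unnormalized weights: existing cluster sizes, then alpha for "new".
        weights = cluster_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(cluster_sizes):
            cluster_sizes.append(1)   # open a new cluster
        else:
            cluster_sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, cluster_sizes


if __name__ == "__main__":
    # The number of clusters is not fixed in advance; it emerges from sampling.
    for alpha in (0.5, 5.0):
        _, sizes = chinese_restaurant_process(1000, alpha)
        print(f"alpha={alpha}: {len(sizes)} clusters")
```

In a full mixture model the cluster assignment probabilities would also depend on how well each cell's latent representation fits each component; this sketch shows only the prior that leaves the cluster count open.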
Subjects
Single-Cell Analysis; Transcriptome; Single-Cell Analysis/methods; Cluster Analysis; Transcriptome/genetics; Humans; Algorithms; Sequence Analysis, RNA/methods; Gene Expression Profiling/methods; Software
ABSTRACT
Integrating single-cell datasets produced by multiple omics technologies is essential for defining cellular heterogeneity. Mosaic integration, in which different datasets share only some of the measured modalities, poses major challenges, particularly regarding modality alignment and batch effect removal. Here, we present a deep probabilistic framework for the mosaic integration and knowledge transfer (MIDAS) of single-cell multimodal data. MIDAS simultaneously achieves dimensionality reduction, imputation and batch correction of mosaic data by using self-supervised modality alignment and information-theoretic latent disentanglement. We demonstrate its superiority over 19 other methods, and its reliability, by evaluating its performance in trimodal and mosaic integration tasks. We also constructed a single-cell trimodal atlas of human peripheral blood mononuclear cells and tailored transfer learning and reciprocal reference mapping schemes to enable flexible and accurate knowledge transfer from the atlas to new data. Applications in mosaic integration, pseudotime analysis and cross-tissue knowledge transfer on bone marrow mosaic datasets demonstrate the versatility and superiority of MIDAS. MIDAS is available at https://github.com/labomics/midas.
Subjects
Single-Cell Analysis; Single-Cell Analysis/methods; Humans; Leukocytes, Mononuclear/cytology; Software; Computational Biology/methods; Reproducibility of Results
ABSTRACT
Intestinal bacterial strains play crucial roles in maintaining host health. Researchers have increasingly recognized the importance of strain-level analysis in metagenomic studies. Many analysis tools and several cutting-edge sequencing techniques, such as single-cell sequencing, have been proposed to decipher strains in metagenomes. However, strain-level complexity is far from being well characterized to date. As an indicator of strain-level complexity, metagenomic single-nucleotide polymorphisms (SNPs) have been used to disentangle conspecific strains, and many SNP-based tools have been developed to identify strains in metagenomes. However, the sequencing depth sufficient for SNP and strain-level analysis remains unclear. We conducted ultra-deep sequencing of the human gut microbiome and constructed an unbiased framework to perform reliable SNP analysis. SNP profiles of the human gut metagenome were obtained by ultra-deep sequencing. SNPs identified from conventional and ultra-deep sequencing data were thoroughly compared, and the relationship between SNP identification and sequencing depth was investigated. The results show that commonly used shallow-depth sequencing is incapable of supporting systematic metagenomic SNP discovery. In contrast, ultra-deep sequencing could detect more functionally important SNPs, which leads to reliable downstream analyses and novel discoveries. We also constructed a machine learning model to guide researchers in determining the optimal sequencing depth for their projects (SNPsnp, https://github.com/labomics/SNPsnp). To conclude, the SNP profiles based on ultra-deep sequencing data extend current knowledge of metagenomics and highlight the importance of evaluating sequencing depth before starting SNP analysis. This study provides new ideas and references for future strain-level investigations.
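Why shallow sequencing can miss SNPs can be seen from a simple binomial calculation. The sketch below is a back-of-the-envelope illustration, not the framework or the machine learning model described in the abstract. It assumes independent reads, no sequencing errors, a single site, and a caller that requires a minimum number of variant-supporting reads. The function names and the parameters `allele_freq`, `min_alt_reads` and `target` are hypothetical.

```python
from math import comb


def detection_probability(depth, allele_freq, min_alt_reads=2):
    """P(at least min_alt_reads variant reads) at one site.

    Assumes the variant read count is Binomial(depth, allele_freq),
    i.e. independent reads and no sequencing error (a simplification).
    """
    # Probability of seeing fewer than min_alt_reads variant reads.
    upper = min(min_alt_reads, depth + 1)
    p_below = sum(
        comb(depth, k) * allele_freq**k * (1 - allele_freq) ** (depth - k)
        for k in range(upper)
    )
    return 1.0 - p_below


def minimal_depth(allele_freq, target=0.95, min_alt_reads=2, max_depth=100_000):
    """Smallest depth whose detection probability reaches target, else None."""
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, allele_freq, min_alt_reads) >= target:
            return depth
    return None


if __name__ == "__main__":
    # Rarer strain-specific alleles need markedly deeper coverage.
    for freq in (0.2, 0.05, 0.01):
        print(f"allele frequency {freq}: depth >= {minimal_depth(freq)}")
```

Under these assumptions the required depth grows roughly in inverse proportion to the allele frequency, which is consistent with the abstract's point that low-abundance variants are out of reach for shallow sequencing.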
ABSTRACT
Liver cirrhosis (LC) has been associated with gut microbes. However, the strain diversity of species and its association with LC have received little attention. Here, we constructed a computational framework to study strain heterogeneity in the gut microbiome of patients with LC. Only Faecalibacterium prausnitzii shows different single-nucleotide polymorphism (SNP) patterns between the LC and healthy control (HC) groups. Strain diversity analysis revealed that although most F. prausnitzii genomes are depleted in the LC group relative to the HC group at the strain level, a subgroup of 19 F. prausnitzii strains showed no sensitivity to LC, which is inconsistent with the species-level result. The functional differences between this subgroup and other strains may involve short-chain fatty acid production and chlorine-related pathways. These findings demonstrate functional differences among F. prausnitzii subgroups, which extend current knowledge about strain heterogeneity and relationships between F. prausnitzii and LC at the strain level. IMPORTANCE Most metagenomic studies focus on microbes at the species level, thus ignoring the different effects that different strains of the same species have on the host. In this study, we explored microbial differences at the strain level in the intestines of patients with liver cirrhosis and of healthy people. Previous studies have shown that the species Faecalibacterium prausnitzii has a lower abundance in patients with liver cirrhosis than in healthy people. However, we found multiple F. prausnitzii strains that do not decrease in abundance in patients with liver cirrhosis. Selecting appropriate strains as indicators distinguishes disease samples from control samples more sensitively than using the entire species as an indicator. We clustered multiple F. prausnitzii strains and discussed the functional differences among the clusters. Our findings suggest that more attention should be paid to metagenomic studies at the strain level.