ABSTRACT
Droplet-based single-cell sequencing techniques rely on the fundamental assumption that each droplet encapsulates a single cell, enabling omics profiling of individual cells. However, multiplets, in which two or more cells are encapsulated within a single droplet, are inevitable and can lead to spurious cell type annotations and obscure true biological findings. The problem is exacerbated in single-cell multiomics settings, where integrating cross-modality information for clustering can inadvertently promote the aggregation of multiplet clusters and increase the risk of erroneous cell type annotations. Here, we propose a compound Poisson model-based framework for multiplet detection in single-cell multiomics data. Using experimental cell hashing results as the ground truth for multiplet status, we conducted trimodal DOGMA-seq experiments and generated 17 benchmarking datasets from two tissues, comprising a total of 280,123 droplets. We demonstrated that the proposed method is an essential tool for integrating cross-modality multiplet signals, effectively eliminating multiplet clusters in single-cell multiomics data, a task at which the benchmarked single-omics methods proved inadequate.
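Why multiplets are inevitable can be illustrated with the textbook Poisson model of droplet loading: if cells enter droplets independently, the number of cells per droplet follows a Poisson distribution with mean λ, and the fraction of non-empty droplets holding two or more cells is (1 − e^(−λ) − λe^(−λ)) / (1 − e^(−λ)). The sketch below assumes exactly that simple model; it is not the compound Poisson detection framework proposed in this work, and the function names and λ values are hypothetical.

```python
import math


def droplet_occupancy(lam: float, max_cells: int = 10) -> list[float]:
    """P(k cells in a droplet) for k = 0..max_cells, assuming Poisson(lam) loading."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(max_cells + 1)]


def multiplet_fraction(lam: float) -> float:
    """Fraction of non-empty droplets that contain two or more cells."""
    p0 = math.exp(-lam)          # empty droplets
    p1 = lam * math.exp(-lam)    # singlets
    return (1.0 - p0 - p1) / (1.0 - p0)


if __name__ == "__main__":
    # Loading more cells per droplet (larger lam) raises the multiplet rate.
    for lam in (0.05, 0.1, 0.2):
        print(f"lambda={lam}: multiplet fraction={multiplet_fraction(lam):.4f}")
```

Under these assumptions, a mean loading of λ = 0.1 already makes roughly 5% of cell-containing droplets multiplets, and the rate climbs as loading increases, so throughput cannot be raised without accepting more multiplets.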
Subjects
Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Animals, Cluster Analysis, Algorithms, Mice, Poisson Distribution, Multiomics
ABSTRACT
The recently developed TEA-seq and the similar DOGMA-seq single-cell trimodal omics assays provide unprecedented opportunities for understanding cell biology, but independent evaluation is lacking. We explore the utility of DOGMA-seq compared with the bimodal CITE-seq assay in activated and stimulated human peripheral blood T cells. We find that single-cell trimodal omics measurements after digitonin (DIG) permeabilization were generally better than those after an alternative "low-loss lysis" (LLL) permeabilization condition. Next, we find that, compared with CITE-seq, DOGMA-seq with optimized DIG permeabilization provides more information through its ATAC library, although its mRNA and cell surface protein libraries are of slightly lower quality.
Subjects
Benchmarking, High-Throughput Nucleotide Sequencing, Gene Library, High-Throughput Nucleotide Sequencing/methods, Humans, RNA, Messenger/genetics, Sequence Analysis, DNA/methods, Single-Cell Analysis
ABSTRACT
Identifying and removing multiplets is essential to improving the scalability and reliability of single-cell RNA sequencing (scRNA-seq), because multiplets create artificial cell types in the dataset. We propose GMM-Demux, a Gaussian mixture model-based multiplet identification method. GMM-Demux accurately identifies and removes multiplets using sample barcoding data, including cell hashing and MULTI-seq. GMM-Demux uses a droplet formation model to authenticate putative cell types discovered in a scRNA-seq dataset. We generated two in-house cell-hashing datasets and compared GMM-Demux against three state-of-the-art sample barcoding classifiers. We show that GMM-Demux is stable and highly accurate, and that it recognizes 9 multiplet-induced fake cell types in a PBMC dataset.
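One limitation shared by any sample-barcoding approach can be sketched with simple probability. Assuming cells are drawn independently from k hashed samples in proportions p_i, an n-cell multiplet carries only one hashtag, and therefore looks like a singlet to hashing, exactly when all of its cells come from the same sample, which happens with probability Σ p_i^n. This is a generic illustration, not GMM-Demux's code or its droplet formation model; the function name and inputs are hypothetical.

```python
def same_sample_multiplet_fraction(proportions: list[float], n_cells: int = 2) -> float:
    """Probability that all n_cells in a multiplet share one sample barcode.

    Assumes cells are drawn independently, with sample i chosen with
    probability proportional to proportions[i]. Such multiplets carry a
    single hashtag and cannot be flagged by hashing alone.
    """
    total = sum(proportions)
    normalized = [p / total for p in proportions]
    return sum(p**n_cells for p in normalized)


if __name__ == "__main__":
    # Four equally sized samples: doublets, then triplets.
    for n in (2, 3):
        frac = same_sample_multiplet_fraction([1, 1, 1, 1], n_cells=n)
        print(f"{n}-cell multiplets hidden from hashing: {frac:.4f}")
```

With four equally sized samples, one in four doublets falls in this same-sample category, so hashing alone leaves a sizable share of multiplets unflagged; pooling more samples, or balancing their sizes, reduces that share.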