ABSTRACT
RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI's posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.
Subjects
RNA, Transcriptome, RNA/genetics, Learning
ABSTRACT
Detecting differentially expressed genes is important for characterizing subpopulations of cells. In scRNA-seq data, however, nuisance variation due to technical factors like sequencing depth and RNA capture efficiency obscures the underlying biological signal. Deep generative models have been extensively applied to scRNA-seq data, with a special focus on embedding cells into a low-dimensional latent space and correcting for batch effects. However, little attention has been paid to the problem of utilizing the uncertainty from the deep generative model for differential expression (DE). Furthermore, the existing approaches do not allow for controlling for effect size or the false discovery rate (FDR). Here, we present lvm-DE, a generic Bayesian approach for performing DE predictions from a fitted deep generative model, while controlling the FDR. We apply the lvm-DE framework to scVI and scSphere, two deep generative models. The resulting approaches outperform state-of-the-art methods at estimating the log fold change in gene expression levels as well as detecting differentially expressed genes between subpopulations of cells.
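The abstract does not spell out how lvm-DE controls the FDR. As a rough illustration only, one common way to control a Bayesian (posterior expected) FDR is to rank genes by their posterior probability of being differentially expressed and keep the largest set whose expected proportion of false discoveries stays under the target. The function name `select_de_genes`, its parameters, and the example probabilities below are hypothetical and are not taken from lvm-DE.

```python
def select_de_genes(de_probs, fdr_target=0.05):
    """Return indices of genes called DE, keeping the posterior expected
    false discovery proportion at or below fdr_target.

    de_probs[g] is assumed to be the posterior probability that gene g
    is differentially expressed (a hypothetical model output).
    """
    # Rank genes from most to least likely to be DE.
    order = sorted(range(len(de_probs)), key=lambda g: de_probs[g], reverse=True)
    expected_false = 0.0
    best = 0
    for k, g in enumerate(order, start=1):
        # Each selected gene contributes (1 - p) expected false discoveries.
        expected_false += 1.0 - de_probs[g]
        if expected_false / k <= fdr_target:
            best = k
    return sorted(order[:best])


# Hypothetical posterior DE probabilities for five genes.
example_probs = [0.99, 0.2, 0.97, 0.6, 0.95]
called = select_de_genes(example_probs, fdr_target=0.05)
```

With these made-up probabilities, the three most confident genes are kept, because adding the fourth would push the expected false discovery proportion above 5%.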
Subjects
RNA, Single-Cell Analysis, Bayes Theorem, RNA Sequence Analysis/methods, Single-Cell Analysis/methods, Gene Expression Profiling/methods
Subjects
Ecosystem, Genomics, Computational Biology, Single-Cell Analysis, Data Analysis
ABSTRACT
Single-cell ATAC sequencing (scATAC-seq) is a powerful and increasingly popular technique to explore the regulatory landscape of heterogeneous cellular populations. However, the high noise levels, degree of sparsity, and scale of the generated data make its analysis challenging. Here, we present PeakVI, a probabilistic framework that leverages deep neural networks to analyze scATAC-seq data. PeakVI fits an informative latent space that preserves biological heterogeneity while correcting batch effects and accounting for technical effects, such as library size and region-specific biases. In addition, PeakVI provides a technique for identifying differential accessibility at a single-region resolution, which can be used for cell-type annotation as well as identification of key cis-regulatory elements. We use public datasets to demonstrate that PeakVI is scalable, stable, robust to low-quality data, and outperforms current analysis methods on a range of critical analysis tasks. PeakVI is publicly available and implemented in the scvi-tools framework.
Subjects
Chromatin, Nucleic Acid Regulatory Sequences, Chromatin/genetics, Gene Library
ABSTRACT
Spatial transcriptomic technologies promise to resolve cellular wiring diagrams of tissues in health and disease, but comprehensive mapping of cell types in situ remains a challenge. Here we present cell2location, a Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single-cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. We assessed cell2location in three different tissues and show improved mapping of fine-grained cell types. In the mouse brain, we discovered fine regional astrocyte subtypes across the thalamus and hypothalamus. In the human lymph node, we spatially mapped a rare pre-germinal center B cell population. In the human gut, we resolved fine immune cell populations in lymphoid follicles. Collectively, our results present cell2location as a versatile analysis tool for mapping tissue architectures in a comprehensive manner.
Subjects
Single-Cell Analysis, Transcriptome, Animals, Bayes Theorem, Mice, Single-Cell Analysis/methods, Transcriptome/genetics
ABSTRACT
Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
Subjects
Datasets as Topic/standards, Deep Learning, Organ Specificity, Single-Cell Analysis/standards, Animals, COVID-19/pathology, Humans, Mice, Reference Standards, SARS-CoV-2/pathogenicity
ABSTRACT
The paired measurement of RNA and surface proteins in single cells with cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) is a promising approach to connect transcriptional variation with cell phenotypes and functions. However, combining these paired views into a unified representation of cell state is made challenging by the unique technical characteristics of each measurement. Here we present Total Variational Inference (totalVI; https://scvi-tools.org), a framework for end-to-end joint analysis of CITE-seq data that probabilistically represents the data as a composite of biological and technical factors, including protein background and batch effects. To evaluate totalVI's performance, we profiled immune cells from murine spleen and lymph nodes with CITE-seq, measuring over 100 surface proteins. We demonstrate that totalVI provides a cohesive solution for common analysis tasks such as dimensionality reduction, the integration of datasets with different measured proteins, estimation of correlations between molecules and differential expression testing.
Subjects
Lymph Nodes/metabolism, Proteins/analysis, Single-Cell Analysis/methods, Spleen/metabolism, Transcriptome/genetics, Animals, Cultured Cells, Data Analysis, Female, High-Throughput Screening Assays/methods, Lymph Nodes/cytology, Mice, Inbred C57BL Mice, RNA/analysis, RNA/genetics, Spleen/cytology
ABSTRACT
Generative models provide a well-established statistical framework for evaluating uncertainty and deriving conclusions from large data sets especially in the presence of noise, sparsity, and bias. Initially developed for computer vision and natural language processing, these models have been shown to effectively summarize the complexity that underlies many types of data and enable a range of applications including supervised learning tasks, such as assigning labels to images; unsupervised learning tasks, such as dimensionality reduction; and out-of-sample generation, such as de novo image synthesis. With this early success, the power of generative models is now being increasingly leveraged in molecular biology, with applications ranging from designing new molecules with properties of interest to identifying deleterious mutations in our genomes and to dissecting transcriptional variability between single cells. In this review, we provide a brief overview of the technical notions behind generative models and their implementation with deep learning techniques. We then describe several different ways in which these models can be utilized in practice, using several recent applications in molecular biology as examples.
Subjects
Deep Learning, Statistical Models, Molecular Biology, Biomedical Research, Decision Making, Computer Neural Networks
ABSTRACT
MOTIVATION: Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable. RESULTS: We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications. AVAILABILITY AND IMPLEMENTATION: The factor model is available in the scVI package hosted at https://github.com/YosefLab/scVI/. CONTACT: v@nxn.se. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
RNA-Seq, Single-Cell Analysis, Bayes Theorem, RNA Sequence Analysis, Software, Exome Sequencing
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
Single-cell RNA sequencing studies of differentiating systems have raised fundamental questions regarding the discrete versus continuous nature of both differentiation and cell fate. Here we present Palantir, an algorithm that models trajectories of differentiating cells by treating cell fate as a probabilistic process and leverages entropy to measure cell plasticity along the trajectory. Palantir generates a high-resolution pseudo-time ordering of cells and, for each cell state, assigns a probability of differentiating into each terminal state. We apply our algorithm to human bone marrow single-cell RNA sequencing data and detect important landmarks of hematopoietic differentiation. Palantir's resolution enables the identification of key transcription factors that drive lineage fate choice and closely track when cells lose plasticity. We show that Palantir outperforms existing algorithms in identifying cell lineages and recapitulating gene expression trends during differentiation, is generalizable to diverse tissue types, and is well-suited to resolving less-studied differentiating systems.
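The idea of using entropy to measure plasticity can be sketched as follows, assuming each cell already has a vector of probabilities of reaching each terminal state. This is a minimal illustration of Shannon entropy over fate probabilities, not Palantir's own implementation; the function name `fate_entropy` and the example probability vectors are hypothetical.

```python
import math


def fate_entropy(probs):
    """Shannon entropy (in nats) of one cell's terminal-state probabilities.

    High entropy means the cell could still reach several fates (plastic);
    entropy near zero means it is effectively committed to one fate.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    normalized = [p / total for p in probs]
    # Terms with p == 0 contribute nothing to the entropy and are skipped.
    return -sum(p * math.log(p) for p in normalized if p > 0)


# Hypothetical cells with three possible terminal states.
progenitor = [1 / 3, 1 / 3, 1 / 3]   # uncommitted: maximal entropy, log(3)
committed = [0.98, 0.01, 0.01]       # nearly committed: entropy close to 0

progenitor_entropy = fate_entropy(progenitor)
committed_entropy = fate_entropy(committed)
```

Under this sketch, entropy falls along a trajectory as the fate probabilities concentrate on a single terminal state, which is the sense in which it can track loss of plasticity.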
Subjects
Algorithms, Cell Differentiation/genetics, Cell Lineage/genetics, RNA Sequence Analysis/statistics & numerical data, Single-Cell Analysis/statistics & numerical data, Animals, Biotechnology, Bone Marrow Cells/cytology, Bone Marrow Cells/metabolism, Erythropoiesis/genetics, Developmental Gene Expression Regulation, Hematopoiesis/genetics, Humans, Markov Chains, Mice, Biological Models, Statistical Models
ABSTRACT
Carbapenem-resistant Enterobacteriaceae (CRE) organisms have emerged to become a major global public health threat among antimicrobial resistant bacterial human pathogens. Little is known about how CREs emerge. One characteristic phenotype of CREs is heteroresistance, which is clinically associated with treatment failure in patients given a carbapenem. Through in vitro whole-transcriptome analysis we tracked gene expression over time in two different strains (BR7, BR21) of heteroresistant KPC-producing Klebsiella pneumoniae, first exposed to a bactericidal concentration of imipenem followed by growth in drug-free medium. In both strains, the immediate response was dominated by a shift in expression of genes involved in glycolysis toward those involved in catabolic pathways. This response was followed by global dampening of transcriptional changes involving protein translation, folding and transport, and decreased expression of genes encoding critical junctures of lipopolysaccharide biosynthesis. The emerged high-level carbapenem-resistant BR21 subpopulation had a prophage (IS1) disrupting ompK36 associated with irreversible OmpK36 porin loss. On the other hand, OmpK36 loss in BR7 was reversible. The acquisition of high-level carbapenem resistance by the two heteroresistant strains was associated with distinct and shared stepwise transcriptional programs. Carbapenem heteroresistance may emerge from the most adaptive subpopulation among a population of cells undergoing a complex set of stress-adaptive responses.