Results 1 - 20 of 358
1.
Genome Biol ; 25(1): 181, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978088

ABSTRACT

Single-cell multiomic analysis of the epigenome, transcriptome, and proteome allows for comprehensive characterization of the molecular circuitry that underpins cell identity and state. However, the holistic interpretation of such datasets presents a challenge given a paucity of approaches for systematic, joint evaluation of different modalities. Here, we present Panpipes, a set of computational workflows designed to automate multimodal single-cell and spatial transcriptomic analyses by incorporating widely used Python-based tools to perform quality control, preprocessing, integration, clustering, and reference mapping at scale. Panpipes allows reliable and customizable analysis and evaluation of individual and integrated modalities, thereby empowering decision-making before downstream investigations.


Subject(s)
Single-Cell Analysis , Software , Transcriptome , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Humans , Workflow
2.
Nature ; 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38987588

ABSTRACT

Chronic hepatitis B virus (HBV) infection affects 300 million patients worldwide (refs. 1,2), in whom virus-specific CD8 T cells by still ill-defined mechanisms lose their function and cannot eliminate HBV-infected hepatocytes (refs. 3-7). Here we demonstrate that a liver immune rheostat renders virus-specific CD8 T cells refractory to activation and leads to their loss of effector functions. In preclinical models of persistent infection with hepatotropic viruses such as HBV, dysfunctional virus-specific CXCR6+ CD8 T cells accumulated in the liver and, as a characteristic hallmark, showed enhanced transcriptional activity of cAMP-responsive element modulator (CREM) distinct from T cell exhaustion. In patients with chronic hepatitis B, circulating and intrahepatic HBV-specific CXCR6+ CD8 T cells with enhanced CREM expression and transcriptional activity were detected at a frequency of 12-22% of HBV-specific CD8 T cells. Knocking out the inhibitory CREM/ICER isoform in T cells, however, failed to rescue T cell immunity. This indicates that CREM activity was a consequence, rather than the cause, of loss in T cell function, further supported by the observation of enhanced phosphorylation of protein kinase A (PKA) which is upstream of CREM. Indeed, we found that enhanced cAMP-PKA-signalling from increased T cell adenylyl cyclase activity augmented CREM activity and curbed T cell activation and effector function in persistent hepatic infection. Mechanistically, CD8 T cells recognizing their antigen on hepatocytes established close and extensive contact with liver sinusoidal endothelial cells, thereby enhancing adenylyl cyclase-cAMP-PKA signalling in T cells. In these hepatic CD8 T cells, which recognize their antigen on hepatocytes, phosphorylation of key signalling kinases of the T cell receptor signalling pathway was impaired, which rendered them refractory to activation. Thus, close contact with liver sinusoidal endothelial cells curbs the activation and effector function of HBV-specific CD8 T cells that target hepatocytes expressing viral antigens by means of the adenylyl cyclase-cAMP-PKA axis in an immune rheostat-like fashion.

3.
Nat Commun ; 15(1): 5577, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38956082

ABSTRACT

Recent advances in single-cell immune profiling have enabled the simultaneous measurement of transcriptome and T cell receptor (TCR) sequences, offering great potential for studying immune responses at the cellular level. However, integrating these diverse modalities across datasets is challenging due to their unique data characteristics and technical variations. Here, to address this, we develop the multimodal generative model mvTCR to fuse modality-specific information across transcriptome and TCR into a shared representation. Our analysis demonstrates the added value of multimodal over unimodal approaches to capture antigen specificity. Notably, we use mvTCR to distinguish T cell subpopulations binding to SARS-CoV-2 antigens from bystander cells. Furthermore, when combined with reference mapping approaches, mvTCR can map newly generated datasets to extensive T cell references, facilitating knowledge transfer. In summary, we envision mvTCR to enable a scalable analysis of multimodal immune profiling data and advance our understanding of immune responses.


Subject(s)
COVID-19 , T-Cell Antigen Receptors , SARS-CoV-2 , Single-Cell Analysis , Transcriptome , T-Cell Antigen Receptors/metabolism , T-Cell Antigen Receptors/genetics , T-Cell Antigen Receptors/immunology , Single-Cell Analysis/methods , Humans , SARS-CoV-2/immunology , SARS-CoV-2/genetics , COVID-19/immunology , COVID-19/virology , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Gene Expression Profiling/methods , Viral Antigens/immunology , Viral Antigens/genetics
4.
Nat Methods ; 21(7): 1196-1205, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38871986

ABSTRACT

Single-cell RNA sequencing allows us to model cellular state dynamics and fate decisions using expression similarity or RNA velocity to reconstruct state-change trajectories; however, trajectory inference does not incorporate valuable time point information or utilize additional modalities, whereas methods that address these different data views cannot be combined or do not scale. Here we present CellRank 2, a versatile and scalable framework to study cellular fate using multiview single-cell data of up to millions of cells in a unified fashion. CellRank 2 consistently recovers terminal states and fate probabilities across data modalities in human hematopoiesis and endodermal development. Our framework also allows combining transitions within and across experimental time points, a feature we use to recover genes promoting medullary thymic epithelial cell formation during pharyngeal endoderm development. Moreover, we enable estimating cell-specific transcription and degradation rates from metabolic-labeling data, which we apply to an intestinal organoid system to delineate differentiation trajectories and pinpoint regulatory strategies.


Subject(s)
Cell Differentiation , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Endoderm/cytology , Endoderm/metabolism , Hematopoiesis , Cell Lineage , RNA Sequence Analysis/methods , Organoids/metabolism , Organoids/cytology
5.
Genes (Basel) ; 15(6), 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38927741

ABSTRACT

Bronchopulmonary dysplasia (BPD) is a chronic lung disease commonly affecting premature infants, with limited therapeutic options and increased long-term consequences. Adrenomedullin (Adm), a proangiogenic peptide hormone, has been found to protect rodents against experimental BPD. This study aims to elucidate the molecular and cellular mechanisms through which Adm influences BPD pathogenesis using a lipopolysaccharide (LPS)-induced model of experimental BPD in mice. Bulk RNA sequencing of Adm-sufficient (wild-type or Adm+/+) and Adm-haplodeficient (Adm+/-) mouse lungs, integrated with single-cell RNA sequencing data, revealed distinct gene expression patterns and cell type alterations associated with Adm deficiency and LPS exposure. Notably, computational integration with cell atlas data revealed that Adm-haplodeficient mouse lungs exhibited gene expression signatures characteristic of increased inflammation, natural killer (NK) cell frequency, and decreased endothelial cell and type II pneumocyte frequency. Furthermore, in silico human BPD patient data analysis supported our cell type frequency finding, highlighting elevated NK cells in BPD infants. These results underscore the protective role of Adm in experimental BPD and emphasize that it is a potential therapeutic target for BPD infants with an inflammatory phenotype.


Subject(s)
Adrenomedullin , Bronchopulmonary Dysplasia , Adrenomedullin/genetics , Adrenomedullin/metabolism , Bronchopulmonary Dysplasia/genetics , Bronchopulmonary Dysplasia/pathology , Bronchopulmonary Dysplasia/metabolism , Animals , Mice , Humans , RNA Sequence Analysis/methods , Animal Disease Models , Lipopolysaccharides , Lung/metabolism , Lung/pathology , Natural Killer Cells/metabolism , Natural Killer Cells/immunology , Transcriptome
6.
Genome Med ; 16(1): 80, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38862979

ABSTRACT

The study of immunology, traditionally reliant on proteomics to evaluate individual immune cells, has been revolutionized by single-cell RNA sequencing. Computational immunologists play a crucial role in analysing these datasets, moving beyond traditional protein marker identification to encompass a more detailed view of cellular phenotypes and their functional roles. Recent technological advancements allow the simultaneous measurements of multiple cellular components (transcriptome, proteome, chromatin, epigenetic modifications and metabolites) within single cells, including in spatial contexts within tissues. This has led to the generation of complex multiscale datasets that can include multimodal measurements from the same cells or a mix of paired and unpaired modalities. Modern machine learning (ML) techniques allow for the integration of multiple "omics" data without the need for extensive independent modelling of each modality. This review focuses on recent advancements in ML integrative approaches applied to immunological studies. We highlight the importance of these methods in creating a unified representation of multiscale data collections, particularly for single-cell and spatial profiling technologies. Finally, we discuss the challenges of these holistic approaches and how they will be instrumental in the development of a common coordinate framework for multiscale studies, thereby accelerating research and enabling discoveries in the computational immunology field.


Subject(s)
Computational Biology , Machine Learning , Humans , Computational Biology/methods , Single-Cell Analysis/methods , Allergy and Immunology , Animals , Immunoinformatics
7.
Bioinformatics ; 40(Supplement_1): i548-i557, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940138

ABSTRACT

SUMMARY: Spatial omics technologies are increasingly leveraged to characterize how disease disrupts tissue organization and cellular niches. While multiple methods to analyze spatial variation within a sample have been published, statistical and computational approaches to compare cell spatial organization across samples or conditions are mostly lacking. We present GraphCompass, a comprehensive set of omics-adapted graph analysis methods to quantitatively evaluate and compare the spatial arrangement of cells in samples representing diverse biological conditions. GraphCompass builds upon the Squidpy spatial omics toolbox and encompasses various statistical approaches to perform cross-condition analyses at the level of individual cell types, niches, and samples. Additionally, GraphCompass provides custom visualization functions that enable effective communication of results. We demonstrate how GraphCompass can be used to address key biological questions, such as how cellular organization and tissue architecture differ across various disease states and which spatial patterns correlate with a given pathological condition. GraphCompass can be applied to various popular omics techniques, including, but not limited to, spatial proteomics (e.g. MIBI-TOF), spot-based transcriptomics (e.g. 10× Genomics Visium), and single-cell resolved transcriptomics (e.g. Stereo-seq). In this work, we showcase the capabilities of GraphCompass through its application to three different studies that may also serve as benchmark datasets for further method development. With its easy-to-use implementation, extensive documentation, and comprehensive tutorials, GraphCompass is accessible to biologists with varying levels of computational expertise. By facilitating comparative analyses of cell spatial organization, GraphCompass promises to be a valuable asset in advancing our understanding of tissue function in health and disease.


Subject(s)
Software , Humans , Proteomics/methods , Computational Biology/methods , Genomics/methods , Animals , Transcriptome , Single-Cell Analysis/methods
8.
Cell ; 187(10): 2343-2358, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38729109

ABSTRACT

As the number of single-cell datasets continues to grow rapidly, workflows that map new data to well-curated reference atlases offer enormous promise for the biological community. In this perspective, we discuss key computational challenges and opportunities for single-cell reference-mapping algorithms. We discuss how mapping algorithms will enable the integration of diverse datasets across disease states, molecular modalities, genetic perturbations, and diverse species and will eventually replace manual and laborious unsupervised clustering pipelines.


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Data Analysis , Animals , Cluster Analysis
9.
Nat Comput Sci ; 4(5): 367-378, 2024 May.
Article in English | MEDLINE | ID: mdl-38730184

ABSTRACT

Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of ~20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 µs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.


Subject(s)
Drug Discovery , Machine Learning , Molecular Dynamics Simulation , Proteins , Ligands , Drug Discovery/methods , Proteins/chemistry , Proteins/metabolism , Quantum Theory
10.
J Hepatol ; 2024 Apr 05.
Article in English | MEDLINE | ID: mdl-38583492

ABSTRACT

BACKGROUND & AIMS: Polyploidy in hepatocytes has been proposed as a genetic mechanism to buffer against transcriptional dysregulation. Here, we aim to demonstrate the role of polyploidy in modulating gene regulatory networks in hepatocytes during ageing. METHODS: We performed single-nucleus RNA sequencing in hepatocyte nuclei of different ploidy levels isolated from young and old wild-type mice. Changes in the gene expression and regulatory network were compared to three independent strains that were haploinsufficient for HNF4A, CEBPA or CTCF, representing non-deleterious perturbations. Phenotypic characteristics of the liver section were additionally evaluated histologically, whereas the genomic allele composition of hepatocytes was analysed by BaseScope. RESULTS: We observed that ageing in wild-type mice results in nuclei polyploidy and a marked increase in steatosis. Haploinsufficiency of liver-specific master regulators (HNF4A or CEBPA) results in the enrichment of hepatocytes with tetraploid nuclei at a young age, affecting the genomic regulatory network, and dramatically suppressing ageing-related steatosis tissue wide. Notably, these phenotypes are not the result of subtle disruption to liver-specific transcriptional networks, since haploinsufficiency in the CTCF insulator protein resulted in the same phenotype. Further quantification of genotypes of tetraploid hepatocytes in young and old HNF4A-haploinsufficient mice revealed that during ageing, tetraploid hepatocytes lead to the selection of wild-type alleles, restoring non-deleterious genetic perturbations. CONCLUSIONS: Our results suggest a model whereby polyploidisation leads to fundamentally different cell states. Polyploid conversion enables pleiotropic buffering against age-related decline via non-random allelic segregation to restore a wild-type genome. IMPACT AND IMPLICATIONS: The functional role of hepatocyte polyploidisation during ageing is poorly understood. Using single-nucleus RNA sequencing and BaseScope approaches, we have studied ploidy dynamics during ageing in murine livers with non-deleterious genetic perturbations. We have identified that hepatocytes present different cellular states and the ability to buffer ageing-associated dysfunctions. Tetraploid nuclei exhibit robust transcriptional networks and are better adapted to genomically overcome perturbations. Novel therapeutic interventions aimed at attenuating age-related changes in tissue function could be exploited by manipulation of ploidy dynamics during chronic liver conditions.

11.
Genome Biol ; 25(1): 109, 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671451

ABSTRACT

Single-cell multiplexing techniques (cell hashing and genetic multiplexing) combine multiple samples, optimizing sample processing and reducing costs. Cell hashing conjugates antibody-tags or chemical-oligonucleotides to cell membranes, while genetic multiplexing allows genetically diverse samples to be mixed and relies on aggregation of RNA reads at known genomic coordinates. We develop hadge (hashing deconvolution combined with genotype information), a Nextflow pipeline that combines 12 methods to perform both hashing- and genotype-based deconvolution. We propose a joint deconvolution strategy combining best-performing methods and demonstrate how this approach leads to the recovery of previously discarded cells in a nuclei hashing of fresh-frozen brain tissue.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Brain/metabolism , Brain/cytology , Software , Genotype
12.
Res Sq ; 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38645152

ABSTRACT

With the growing number of single-cell analysis tools, benchmarks are increasingly important to guide analysis and method development. However, a lack of standardisation and extensibility in current benchmarks limits their usability, longevity, and relevance to the community. We present Open Problems, a living, extensible, community-guided benchmarking platform including 10 current single-cell tasks that we envision will raise standards for the selection, evaluation, and development of methods in single-cell analysis.

13.
Nat Commun ; 15(1): 2866, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38570482

ABSTRACT

Traumatic brain injury leads to a highly orchestrated immune- and glial cell response partially responsible for long-lasting disability and the development of secondary neurodegenerative diseases. A holistic understanding of the mechanisms controlling the responses of specific cell types and their crosstalk is required to develop an efficient strategy for better regeneration. Here, we combine spatial and single-cell transcriptomics to chart the transcriptomic signature of the injured male murine cerebral cortex, and identify specific states of different glial cells contributing to this signature. Interestingly, distinct glial cells share a large fraction of injury-regulated genes, including inflammatory programs downstream of the innate immune-associated pathways Cxcr3 and Tlr1/2. Systemic manipulation of these pathways decreases the reactivity state of glial cells associated with poor regeneration. The functional relevance of the discovered shared signature of glial cells highlights the importance of our resource enabling comprehensive analysis of early events after brain injury.


Subject(s)
Brain Injuries , Stab Wounds , Animals , Mice , Male , Glial Fibrillary Acidic Protein/metabolism , Neuroglia/metabolism , Brain Injuries/metabolism , Cerebral Cortex/metabolism , Stab Wounds/complications , Stab Wounds/metabolism
14.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38485697

ABSTRACT

SUMMARY: Accurate clustering of mixed data, encompassing binary, categorical, and continuous variables, is vital for effective patient stratification in clinical questionnaire analysis. To address this need, we present longmixr, a comprehensive R package providing a robust framework for clustering mixed longitudinal data using finite mixture modeling techniques. By incorporating consensus clustering, longmixr ensures reliable and stable clustering results. Moreover, the package includes a detailed vignette that facilitates cluster exploration and visualization. AVAILABILITY AND IMPLEMENTATION: The R package is freely available at https://cran.r-project.org/package=longmixr with detailed documentation, including a case vignette, at https://cellmapslab.github.io/longmixr/.


Subject(s)
Software , Humans , Cross-Sectional Studies , Cluster Analysis , Surveys and Questionnaires
15.
Nat Methods ; 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38509327

ABSTRACT

Spatially resolved omics technologies are transforming our understanding of biological tissues. However, the handling of uni- and multimodal spatial omics datasets remains a challenge owing to large data volumes, heterogeneity of data types and the lack of flexible, spatially aware data structures. Here we introduce SpatialData, a framework that establishes a unified and extensible multiplatform file-format, lazy representation of larger-than-memory data, transformations and alignment to common coordinate systems. SpatialData facilitates spatial annotations and cross-modal aggregation and analysis, the utility of which is illustrated in the context of multiple vignettes, including integrative analysis on a multimodal Xenium and Visium breast cancer study.

16.
Eur Respir J ; 63(2), 2024 Feb.
Article in English | MEDLINE | ID: mdl-38212077

ABSTRACT

BACKGROUND: Fibroblast-to-myofibroblast conversion is a major driver of tissue remodelling in organ fibrosis. Distinct lineages of fibroblasts support homeostatic tissue niche functions, yet their specific activation states and phenotypic trajectories during injury and repair have remained unclear. METHODS: We combined spatial transcriptomics, multiplexed immunostainings, longitudinal single-cell RNA-sequencing and genetic lineage tracing to study fibroblast fates during mouse lung regeneration. Our findings were validated in idiopathic pulmonary fibrosis patient tissues in situ as well as in cell differentiation and invasion assays using patient lung fibroblasts. Cell differentiation and invasion assays established a function of SFRP1 in regulating human lung fibroblast invasion in response to transforming growth factor (TGF)β1. MEASUREMENTS AND MAIN RESULTS: We discovered a transitional fibroblast state characterised by high Sfrp1 expression, derived from both Tcf21-Cre lineage positive and negative cells. Sfrp1+ cells appeared early after injury in peribronchiolar, adventitial and alveolar locations and preceded the emergence of myofibroblasts. We identified lineage-specific paracrine signals and inferred converging transcriptional trajectories towards Sfrp1+ transitional fibroblasts and Cthrc1+ myofibroblasts. TGFβ1 downregulated SFRP1 in noninvasive transitional cells and induced their switch to an invasive CTHRC1+ myofibroblast identity. Finally, using loss-of-function studies we showed that SFRP1 modulates TGFβ1-induced fibroblast invasion and RHOA pathway activity. CONCLUSIONS: Our study reveals the convergence of spatially and transcriptionally distinct fibroblast lineages into transcriptionally uniform myofibroblasts and identifies SFRP1 as a modulator of TGFβ1-driven fibroblast phenotypes in fibrogenesis. These findings are relevant in the context of therapeutic interventions that aim at limiting or reversing fibroblast foci formation.


Subject(s)
Idiopathic Pulmonary Fibrosis , Myofibroblasts , Mice , Animals , Humans , Myofibroblasts/metabolism , Fibroblasts/metabolism , Lung/metabolism , Idiopathic Pulmonary Fibrosis/metabolism , Cell Differentiation , Transforming Growth Factor beta1/metabolism , Extracellular Matrix Proteins/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism
17.
Nat Methods ; 21(1): 28-31, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38049697

ABSTRACT

Single-cell ATAC sequencing coverage in regulatory regions is typically binarized as an indicator of open chromatin. Here we show that binarization is an unnecessary step that does not improve goodness of fit, clustering, cell type identification or batch integration. Fragment counts, but not read counts, should instead be modeled, which preserves quantitative regulatory information. These results have immediate implications for single-cell ATAC sequencing analysis.


Subject(s)
Chromatin Immunoprecipitation Sequencing , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Chromatin/genetics , Single-Cell Analysis
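
To make the binarization point in the single-cell ATAC abstract above (item 17) concrete, here is a minimal sketch in plain Python. The per-peak values are made up for illustration, and the read-to-fragment conversion (round an odd read count up to the next even number, then halve) is an assumption based on paired-end sequencing typically yielding two reads per fragment; it is not stated in the abstract.

```python
def binarize(counts):
    """Collapse counts to a 0/1 open-chromatin indicator."""
    return [1 if c > 0 else 0 for c in counts]


def reads_to_fragments(read_counts):
    """Approximate fragment counts from paired-end read counts.

    Assumption (not from the abstract): each fragment contributes two
    reads, so odd read counts are rounded up to the next even number
    before halving.
    """
    return [(c + 1) // 2 for c in read_counts]


# Hypothetical read counts for one cell across five peaks.
reads = [0, 1, 2, 4, 7]

binary = binarize(reads)
fragments = reads_to_fragments(reads)

print(binary)     # [0, 1, 1, 1, 1]
print(fragments)  # [0, 1, 1, 2, 4]
```

In this toy case the binarized vector cannot distinguish a peak with one fragment from a peak with four, which is the kind of quantitative information the abstract argues should be kept.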
18.
Article in English | MEDLINE | ID: mdl-38086412

ABSTRACT

BACKGROUND: In optical coherence tomography (OCT) scans of patients with inherited retinal diseases (IRDs), the measurement of the thickness of the outer nuclear layer (ONL) has been well established as a surrogate marker for photoreceptor preservation. Current automatic segmentation tools fail in OCT segmentation in IRDs, and manual segmentation is time-consuming. METHODS AND MATERIAL: Patients with IRD and an available OCT scan were screened for the present study. Additionally, OCT scans of patients without retinal disease were included to provide training data for artificial intelligence (AI). We trained a U-net-based model on healthy patients and applied a domain adaptation technique to the IRD patients' scans. RESULTS: We established an AI-based image segmentation algorithm that reliably segments the ONL in OCT scans of IRD patients. In a test dataset, the Dice score of the algorithm was 98.7%. Furthermore, we generated thickness maps of the full retinal thickness and the ONL for each patient. CONCLUSION: Accurate segmentation of anatomical layers on OCT scans plays a crucial role in predictive models linking retinal structure to visual function. Our algorithm for segmentation of OCT images could provide the basis for further studies on IRDs.
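
The Dice score reported in the abstract above is the standard overlap measure 2|A and B| / (|A| + |B|) between a predicted and a reference mask. The sketch below computes it in plain Python; the two flattened masks are hypothetical toy data, not values from the study.

```python
def dice_score(pred, truth):
    """Dice coefficient 2 * |A and B| / (|A| + |B|) for two binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical flattened masks: 1 = pixel labelled as ONL, 0 = background.
predicted = [0, 1, 1, 1, 0, 0]
reference = [0, 1, 1, 0, 0, 0]

print(dice_score(predicted, reference))  # 0.8
```

A score of 1.0 means the masks coincide exactly; the 98.7% in the abstract corresponds to 0.987 on this scale.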

19.
bioRxiv ; 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-37961672

ABSTRACT

Integration of single-cell RNA-sequencing (scRNA-seq) datasets has become a standard part of the analysis, with conditional variational autoencoders (cVAE) being among the most popular approaches. Increasingly, researchers are asking to map cells across challenging cases such as cross-organs, species, or organoids and primary tissue, as well as different scRNA-seq protocols, including single-cell and single-nuclei. Current computational methods struggle to harmonize datasets with such substantial differences, driven by technical or biological variation. Here, we propose to address these challenges for the popular cVAE-based approaches by introducing and comparing a series of regularization constraints. The two commonly used strategies for increasing batch correction in cVAEs, that is, Kullback-Leibler divergence (KL) regularization strength tuning and adversarial learning, suffer from substantial loss of biological information. Therefore, we adapt, implement, and assess alternative regularization strategies for cVAEs and investigate how they improve batch effect removal or better preserve biological variation, enabling us to propose an optimal cVAE-based integration strategy for complex systems. We show that using a VampPrior instead of the commonly used Gaussian prior not only improves the preservation of biological variation but also, unexpectedly, batch correction. Moreover, we show that our implementation of cycle-consistency loss leads to significantly better biological preservation than adversarial learning implemented in the previously proposed GLUE model. Additionally, we do not recommend relying only on the KL regularization strength tuning for increasing batch correction, as it removes both biological and batch information without discriminating between the two. Based on our findings, we propose a new model that combines VampPrior and cycle-consistency loss. We show that using it for datasets with substantial batch effects improves downstream interpretation of cell states and biological conditions. To ease the use of the newly proposed model, we make it available in the scvi-tools package as an external model named sysVI. Moreover, in the future, these regularization techniques could be added to other established cVAE-based models to improve the integration of datasets with substantial batch effects.
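
As a rough illustration of the KL regularization strength tuning discussed in the abstract above, the sketch below uses the textbook closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, scaled by a weight. This is an assumption about the generic cVAE objective, not the sysVI or scvi-tools implementation; `vae_loss`, `kl_weight` and the numbers are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


def vae_loss(reconstruction_loss, mu, log_var, kl_weight=1.0):
    """Reconstruction loss plus a weighted KL term; kl_weight is the tuning knob."""
    return reconstruction_loss + kl_weight * kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one cell with a 2-dimensional latent space.
mu = [1.0, -1.0]
log_var = [0.0, 0.0]

print(kl_to_standard_normal(mu, log_var))          # 1.0
print(vae_loss(10.0, mu, log_var, kl_weight=5.0))  # 15.0
```

The KL term pulls every latent dimension towards the prior regardless of what it encodes, which is consistent with the abstract's remark that raising its weight removes biological and batch information alike.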

20.
Nat Methods ; 21(1): 50-59, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37735568

ABSTRACT

RNA velocity has been rapidly adopted to guide interpretation of transcriptional dynamics in snapshot single-cell data; however, current approaches for estimating RNA velocity lack effective strategies for quantifying uncertainty and determining the overall applicability to the system of interest. Here, we present veloVI (velocity variational inference), a deep generative modeling framework for estimating RNA velocity. veloVI learns a gene-specific dynamical model of RNA metabolism and provides a transcriptome-wide quantification of velocity uncertainty. We show that veloVI compares favorably to previous approaches with respect to goodness of fit, consistency across transcriptionally similar cells and stability across preprocessing pipelines for quantifying RNA abundance. Further, we demonstrate that veloVI's posterior velocity uncertainty can be used to assess whether velocity analysis is appropriate for a given dataset. Finally, we highlight veloVI as a flexible framework for modeling transcriptional dynamics by adapting the underlying dynamical model to use time-dependent transcription rates.


Subject(s)
RNA , Transcriptome , RNA/genetics , Learning
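
For readers unfamiliar with the dynamical model of RNA metabolism mentioned in the veloVI abstract above (item 20), the sketch below simulates the splicing kinetics commonly used in RNA velocity: du/dt = alpha - beta * u and ds/dt = beta * u - gamma * s, where u and s are unspliced and spliced abundance. This is an assumed, constant-rate toy version with made-up parameters; veloVI itself is a deep generative model and its implementation is not shown here.

```python
def velocity(u, s, beta, gamma):
    """Spliced-RNA velocity ds/dt = beta * u - gamma * s."""
    return beta * u - gamma * s


def euler_step(u, s, alpha, beta, gamma, dt):
    """Advance unspliced (u) and spliced (s) abundance by one Euler step."""
    du = alpha - beta * u
    ds = velocity(u, s, beta, gamma)
    return u + dt * du, s + dt * ds


# Hypothetical transcription, splicing and degradation rates for one gene.
alpha, beta, gamma = 2.0, 1.0, 0.5

u, s = 0.0, 0.0
for _ in range(5000):
    u, s = euler_step(u, s, alpha, beta, gamma, dt=0.01)

# The trajectory approaches the steady state u = alpha / beta, s = alpha / gamma,
# where the velocity vanishes.
print(round(u, 3), round(s, 3))  # 2.0 4.0
```

Replacing the constant `alpha` with a function of time would give a time-dependent transcription rate of the kind the abstract refers to.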