ABSTRACT
The enteric nervous system (ENS) coordinates diverse functions in the intestine but has eluded comprehensive molecular characterization because of the rarity and diversity of cells. Here we develop two methods to profile the ENS of adult mice and humans at single-cell resolution: RAISIN RNA-seq for profiling intact nuclei with ribosome-bound mRNA and MIRACL-seq for label-free enrichment of rare cell types by droplet-based profiling. The 1,187,535 nuclei in our mouse atlas include 5,068 neurons from the ileum and colon, revealing extraordinary neuron diversity. We highlight circadian expression changes in enteric neurons, show that disease-related genes are dysregulated with aging, and identify differences between the ileum and proximal/distal colon. In humans, we profile 436,202 nuclei, recovering 1,445 neurons, and identify conserved and species-specific transcriptional programs and putative neuro-epithelial, neuro-stromal, and neuro-immune interactions. The human ENS expresses risk genes for neuropathic, inflammatory, and extra-intestinal diseases, suggesting neuronal contributions to disease.
Subject(s)
Enteric Nervous System/cytology; Enteric Nervous System/metabolism; Gene Expression Regulation, Developmental/genetics; Neurons/metabolism; Nissl Bodies/metabolism; RNA, Messenger/metabolism; Single-Cell Analysis/methods; Aging/genetics; Aging/metabolism; Animals; Circadian Clocks/genetics; Colon/cytology; Colon/metabolism; Endoplasmic Reticulum, Rough/genetics; Endoplasmic Reticulum, Rough/metabolism; Endoplasmic Reticulum, Rough/ultrastructure; Epithelial Cells/metabolism; Female; Genetic Predisposition to Disease/genetics; Humans; Ileum/cytology; Ileum/metabolism; Inflammation/genetics; Inflammation/metabolism; Intestinal Diseases/genetics; Intestinal Diseases/metabolism; Male; Mice; Mice, Inbred C57BL; Mice, Transgenic; Microscopy, Electron, Transmission; Nervous System Diseases/genetics; Nervous System Diseases/metabolism; Neuroglia/cytology; Neuroglia/metabolism; Neurons/cytology; Nissl Bodies/genetics; Nissl Bodies/ultrastructure; RNA, Messenger/genetics; RNA-Seq; Ribosomes/metabolism; Ribosomes/ultrastructure; Stromal Cells/metabolism
ABSTRACT
Understanding the genetic and molecular drivers of phenotypic heterogeneity across individuals is central to biology. As new technologies enable fine-grained and spatially resolved molecular profiling, we need new computational approaches to integrate data from the same organ across different individuals into a consistent reference and to construct maps of molecular and cellular organization at histological and anatomical scales. Here, we review previous efforts and discuss challenges involved in establishing such a common coordinate framework, the underlying map of tissues and organs. We focus on strategies to handle anatomical variation across individuals and highlight the need for new technologies and analytical methods spanning multiple hierarchical scales of spatial resolution.
Subject(s)
Anatomic Variation; Diagnostic Imaging/standards; Physical Examination/standards; Diagnostic Imaging/methods; Humans; Physical Examination/methods; Reference Standards
ABSTRACT
Cis-regulatory elements (CREs), such as promoters and enhancers, are DNA sequences that regulate the expression of genes. The activity of a CRE is influenced by the order, composition, and spacing of sequence motifs that are bound by proteins called transcription factors (TFs). Synthetic CREs with specific properties are needed for biomanufacturing as well as for many therapeutic applications including cell and gene therapy. Here, we present regLM, a framework to design synthetic CREs with desired properties, such as high, low, or cell type-specific activity, using autoregressive language models in conjunction with supervised sequence-to-function models. We used our framework to design synthetic yeast promoters and cell type-specific human enhancers. We demonstrate that the synthetic CREs generated by our approach are not only predicted to have the desired functionality but also contain biological features similar to experimentally validated CREs. regLM thus facilitates the design of realistic regulatory DNA elements while providing insights into the cis-regulatory code.
Subject(s)
Enhancer Elements, Genetic; Promoter Regions, Genetic; Humans; Transcription Factors/metabolism; Transcription Factors/genetics; Saccharomyces cerevisiae/genetics; Models, Genetic; Regulatory Sequences, Nucleic Acid
ABSTRACT
As a data-driven science, genomics largely utilizes machine learning to capture dependencies in data and derive novel biological hypotheses. However, the ability to extract new insights from the exponentially increasing volume of genomics data requires more expressive machine learning models. By effectively leveraging large data sets, deep learning has transformed fields such as computer vision and natural language processing. Now, it is becoming the method of choice for many genomics modelling tasks, including predicting the impact of genetic variation on gene regulatory mechanisms such as DNA accessibility and splicing.
Subject(s)
Deep Learning; Genomics/methods; Models, Genetic; Neural Networks, Computer; Base Sequence; Computer Simulation; Humans; Supervised Machine Learning; Unsupervised Machine Learning
ABSTRACT
MOTIVATION: Research on epigenetic modifications and other chromatin features at genomic regulatory elements elucidates essential biological mechanisms including the regulation of gene expression. Despite the growing number of epigenetic datasets, new tools are still needed to discover novel distinctive patterns of heterogeneous epigenetic signals at regulatory elements. RESULTS: We introduce ChromDMM, a product Dirichlet-multinomial mixture model for clustering genomic regions that are characterized by multiple chromatin features. ChromDMM extends the mixture model framework by profile shifting and flipping that can probabilistically account for inaccuracies in the position and strand-orientation of the genomic regions. Owing to hyper-parameter optimization, ChromDMM can also regularize the smoothness of the epigenetic profiles across the consecutive genomic regions. With simulated data, we demonstrate that ChromDMM clusters, shifts and strand-orients the profiles more accurately than previous methods. With ENCODE data, we show that the clustering of enhancer regions in the human genome reveals distinct patterns in several chromatin features. We further validate the enhancer clusters by their enrichment for transcriptional regulatory factor binding sites. AVAILABILITY AND IMPLEMENTATION: ChromDMM is implemented as an R package and is available at https://github.com/MariaOsmala/ChromDMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
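As a rough illustration of the product Dirichlet-multinomial mixture idea named above, the following Python sketch scores one genomic region against each cluster and returns its posterior cluster probabilities. It is not the ChromDMM implementation: it omits the shifting, flipping, and hyper-parameter optimization described in the abstract, and the function names, the two toy clusters, and all parameter values are assumptions made for illustration only.

```python
import math

def dm_log_pmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

def responsibilities(region, weights, alphas):
    """Posterior cluster probabilities for one region.

    region  : list of count vectors, one per chromatin feature
    weights : mixture weight of each cluster
    alphas  : alphas[k][m] holds the Dirichlet parameters of cluster k, feature m
    """
    # "Product" model: features are treated as independent given the cluster,
    # so their log-probabilities add.
    log_post = [
        math.log(w) + sum(dm_log_pmf(x, a) for x, a in zip(region, alphas_k))
        for w, alphas_k in zip(weights, alphas)
    ]
    top = max(log_post)  # subtract the maximum for numerical stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical toy data: two features, three bins each, two clusters.
region = [[8, 1, 1], [0, 2, 9]]
weights = [0.5, 0.5]
alphas = [
    [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]],  # cluster 0
    [[1.0, 1.0, 5.0], [5.0, 1.0, 1.0]],  # cluster 1 (mirror image of cluster 0)
]
print(responsibilities(region, weights, alphas))
```

In a full mixture-model fit these responsibilities would form the E-step of an EM loop, with the Dirichlet parameters re-estimated in the M-step.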
Subject(s)
Epigenomics; Genome, Human; Humans; Cluster Analysis; Chromatin/genetics; Epigenesis, Genetic
ABSTRACT
Spatial and molecular characteristics determine tissue function, yet high-resolution methods to capture both concurrently are lacking. Here, we developed high-definition spatial transcriptomics, which captures RNA from histological tissue sections on a dense, spatially barcoded bead array. Each experiment recovers several hundred thousand transcript-coupled spatial barcodes at 2-µm resolution, as demonstrated in mouse brain and primary breast cancer. This opens the way to high-resolution spatial analysis of cells and tissues.
Subject(s)
Gene Expression Profiling; Transcriptome; Animals; Breast Neoplasms/pathology; Female; Humans; Mice; Olfactory Bulb/cytology; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Tissue Array Analysis
ABSTRACT
Genome-wide association studies (GWAS) identify genetic variants associated with traits or diseases. GWAS never directly link variants to regulatory mechanisms. Instead, the functional annotation of variants is typically inferred by post hoc analyses. A specific class of deep learning-based methods allows for the prediction of regulatory effects per variant on several cell type-specific chromatin features. We here describe "DeepWAS", a new approach that integrates these regulatory effect predictions of single variants into a multivariate GWAS setting. Thereby, single variants associated with a trait or disease are directly coupled to their impact on a chromatin feature in a cell type. Up to 61 regulatory SNPs, called dSNPs, were associated with multiple sclerosis (MS, 4,888 cases and 10,395 controls), major depressive disorder (MDD, 1,475 cases and 2,144 controls), and height (5,974 individuals). These variants were mainly non-coding and reached at least nominal significance in classical GWAS. The prediction accuracy was higher for DeepWAS than for classical GWAS models for 91% of the genome-wide significant, MS-specific dSNPs. DSNPs were enriched in public or cohort-matched expression and methylation quantitative trait loci and we demonstrated the potential of DeepWAS to generate testable functional hypotheses based on genotype data alone. DeepWAS is available at https://github.com/cellmapslab/DeepWAS.
Subject(s)
Deep Learning; Genetic Association Studies; Multivariate Analysis; Genome-Wide Association Study; Humans; Polymorphism, Single Nucleotide; Quantitative Trait Loci
ABSTRACT
In many human diseases, associated genetic changes tend to occur within noncoding regions, whose effect might be related to transcriptional control. A central goal in human genetics is to understand the function of such noncoding regions: given a region that is statistically associated with changes in gene expression (expression quantitative trait locus [eQTL]), does it in fact play a regulatory role? And if so, how is this role "coded" in its sequence? These questions were the subject of the Critical Assessment of Genome Interpretation eQTL challenge. Participants were given a set of sequences that flank eQTLs in humans and were asked to predict whether these are capable of regulating transcription (as evaluated by massively parallel reporter assays), and whether this capability changes between alternative alleles. Here, we report lessons learned from this community effort. By inspecting predictive properties in isolation, and conducting meta-analysis over the competing methods, we find that using chromatin accessibility and transcription factor binding as features in an ensemble of classifiers or regression models leads to the most accurate results. We then characterize the loci that are harder to predict, putting the spotlight on areas of weakness, which we expect to be the subject of future studies.
Subject(s)
Computational Biology/methods; Gene Expression; Gene Expression Regulation; Genetic Predisposition to Disease; Humans; Quantitative Trait Loci
Subject(s)
Coronavirus Infections/pathology; Myocarditis/diagnosis; Myocardium/metabolism; Peptidyl-Dipeptidase A/genetics; Pneumonia, Viral/pathology; Angiotensin-Converting Enzyme 2; Betacoronavirus/isolation & purification; Betacoronavirus/pathogenicity; COVID-19; Cardiomyopathy, Dilated/genetics; Cardiomyopathy, Dilated/pathology; Cardiomyopathy, Hypertrophic/genetics; Cardiomyopathy, Hypertrophic/pathology; Cathepsin L/genetics; Cathepsin L/metabolism; Coronavirus Infections/complications; Coronavirus Infections/virology; Heart Ventricles/metabolism; Humans; Myocarditis/etiology; Pandemics; Peptidyl-Dipeptidase A/chemistry; Peptidyl-Dipeptidase A/metabolism; Pneumonia, Viral/complications; Pneumonia, Viral/virology; Protease Inhibitors/pharmacology; RNA-Seq; SARS-CoV-2; Serine Endopeptidases/genetics; Serine Endopeptidases/metabolism; Up-Regulation/drug effects
ABSTRACT
In healthy skin, a cutaneous immune system maintains the balance between tolerance towards innocuous environmental antigens and immune responses against pathological agents. In atopic dermatitis (AD), barrier and immune dysfunction result in chronic tissue inflammation. Our understanding of the skin tissue ecosystem in AD remains incomplete with regard to the hallmarks of pathological barrier formation, and cellular state and clonal composition of disease-promoting cells. Here, we generated a multi-modal cell census of 310,691 cells spanning 86 cell subsets from whole skin tissue of 19 adult individuals, including non-lesional and lesional skin from 11 AD patients, and integrated it with 396,321 cells from four studies into a comprehensive human skin cell atlas in health and disease. Reconstruction of human keratinocyte differentiation from basal to cornified layers revealed a disrupted cornification trajectory in AD. This disrupted epithelial differentiation was associated with signals from a unique immune and stromal multicellular community comprised of MMP12 + dendritic cells (DCs), mature migratory DCs, cycling ILCs, NK cells, inflammatory CCL19 + IL4I1 + fibroblasts, and clonally expanded IL13 + IL22 + IL26 + T cells with overlapping type 2 and type 17 characteristics. Cell subsets within this immune and stromal multicellular community were connected by multiple inter-cellular positive feedback loops predicted to impact community assembly and maintenance. AD GWAS gene expression was enriched both in disrupted cornified keratinocytes and in cell subsets from the lesional immune and stromal multicellular community including IL13 + IL22 + IL26 + T cells and ILCs, suggesting that epithelial or immune dysfunction in the context of the observed cellular communication network can initiate and then converge towards AD. 
Our work highlights specific, disease-associated cell subsets and interactions as potential targets in progression and resolution of chronic inflammation.
ABSTRACT
Multimodal measurements of single-cell profiles are proving increasingly useful for characterizing cell states and regulatory mechanisms. In the present study, we developed PHAGE-ATAC (Assay for Transposase-Accessible Chromatin), a massively parallel droplet-based method that uses phage displaying, engineered, camelid single-domain antibodies ('nanobodies') for simultaneous single-cell measurements of protein levels and chromatin accessibility profiles, and mitochondrial DNA-based clonal tracing. We use PHAGE-ATAC for multimodal analysis in primary human immune cells, sample multiplexing, intracellular protein analysis and the detection of SARS-CoV-2 spike protein in human cell populations. Finally, we construct a synthetic high-complexity phage library for selection of antigen-specific nanobodies that bind cells of particular molecular profiles, opening an avenue for protein detection, cell characterization and screening with single-cell genomics.
Subject(s)
Bacteriophages; COVID-19; Bacteriophages/genetics; Chromatin/genetics; Humans; SARS-CoV-2; Single-Cell Analysis/methods; Spike Glycoprotein, Coronavirus
ABSTRACT
Understanding gene function and regulation in homeostasis and disease requires knowledge of the cellular and tissue contexts in which genes are expressed. Here, we applied four single-nucleus RNA sequencing methods to eight diverse, archived, frozen tissue types from 16 donors and 25 samples, generating a cross-tissue atlas of 209,126 nuclei profiles, which we integrated across tissues, donors, and laboratory methods with a conditional variational autoencoder. Using the resulting cross-tissue atlas, we highlight shared and tissue-specific features of tissue-resident cell populations; identify cell types that might contribute to neuromuscular, metabolic, and immune components of monogenic diseases and the biological processes involved in their pathology; and determine cell types and gene modules that might underlie disease mechanisms for complex traits analyzed by genome-wide association studies.
Subject(s)
Cell Nucleus; Disease; RNA-Seq; Biomarkers; Cell Nucleus/genetics; Disease/genetics; Genome-Wide Association Study; Humans; Organ Specificity; Phenotype; RNA-Seq/methods
ABSTRACT
The critical functions of the human liver are coordinated through the interactions of hepatic parenchymal and non-parenchymal cells. Recent advances in single-cell transcriptional approaches have enabled an examination of the human liver with unprecedented resolution. However, dissociation-related cell perturbation can limit the ability to fully capture the human liver's parenchymal cell fraction, which limits the ability to comprehensively profile this organ. Here, we report the transcriptional landscape of 73,295 cells from the human liver using matched single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq). The addition of snRNA-seq enabled the characterization of interzonal hepatocytes at a single-cell resolution, revealed the presence of rare subtypes of liver mesenchymal cells, and facilitated the detection of cholangiocyte progenitors that had only been observed during in vitro differentiation experiments. However, T and B lymphocytes and natural killer cells were only distinguishable using scRNA-seq, highlighting the importance of applying both technologies to obtain a complete map of tissue-resident cell types. We validated the distinct spatial distribution of the hepatocyte, cholangiocyte, and mesenchymal cell populations by an independent spatial transcriptomics data set and immunohistochemistry. Conclusion: Our study provides a systematic comparison of the transcriptomes captured by scRNA-seq and snRNA-seq and delivers a high-resolution map of the parenchymal cell populations in the healthy human liver.
Subject(s)
Liver; Single-Cell Analysis; Cell Nucleus/genetics; Humans; Sequence Analysis, RNA; Transcriptome/genetics
ABSTRACT
Angiotensin-converting enzyme 2 (ACE2) and accessory proteases (TMPRSS2 and CTSL) are needed for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cellular entry, and their expression may shed light on viral tropism and impact across the body. We assessed the cell-type-specific expression of ACE2, TMPRSS2 and CTSL across 107 single-cell RNA-sequencing studies from different tissues. ACE2, TMPRSS2 and CTSL are coexpressed in specific subsets of respiratory epithelial cells in the nasal passages, airways and alveoli, and in cells from other organs associated with coronavirus disease 2019 (COVID-19) transmission or pathology. We performed a meta-analysis of 31 lung single-cell RNA-sequencing studies with 1,320,896 cells from 377 nasal, airway and lung parenchyma samples from 228 individuals. This revealed cell-type-specific associations of age, sex and smoking with expression levels of ACE2, TMPRSS2 and CTSL. Expression of entry factors increased with age and in males, including in airway secretory cells and alveolar type 2 cells. Expression programs shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues included genes that may mediate viral entry, key immune functions and epithelial-macrophage cross-talk, such as genes involved in the interleukin-6, interleukin-1, tumor necrosis factor and complement pathways. Cell-type-specific expression patterns may contribute to the pathogenesis of COVID-19, and our work highlights putative molecular pathways for therapeutic intervention.
Subject(s)
COVID-19/epidemiology; COVID-19/genetics; Host-Pathogen Interactions/genetics; SARS-CoV-2/physiology; Sequence Analysis, RNA/statistics & numerical data; Single-Cell Analysis/statistics & numerical data; Virus Internalization; Adult; Aged; Aged, 80 and over; Alveolar Epithelial Cells/metabolism; Alveolar Epithelial Cells/virology; Angiotensin-Converting Enzyme 2/genetics; Angiotensin-Converting Enzyme 2/metabolism; COVID-19/pathology; COVID-19/virology; Cathepsin L/genetics; Cathepsin L/metabolism; Datasets as Topic/statistics & numerical data; Demography; Female; Gene Expression Profiling/statistics & numerical data; Humans; Lung/metabolism; Lung/virology; Male; Middle Aged; Organ Specificity/genetics; Respiratory System/metabolism; Respiratory System/virology; Sequence Analysis, RNA/methods; Serine Endopeptidases/genetics; Serine Endopeptidases/metabolism; Single-Cell Analysis/methods
ABSTRACT
Coronavirus disease 2019 (COVID-19) is a global pandemic caused by a novel severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). SARS-CoV-2 infection of host cells occurs predominantly via binding of the viral surface spike protein to the human angiotensin-converting enzyme 2 (ACE2) receptor. Hypertension and pre-existing cardiovascular disease are risk factors for morbidity from COVID-19, and it remains uncertain whether the use of angiotensin converting enzyme inhibitors (ACEi) or angiotensin receptor blockers (ARB) impacts infection and disease. Here, we aim to shed light on this question by assessing ACE2 expression in normal and diseased human myocardial samples profiled by bulk and single nucleus RNA-seq.
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and nonlinear gene-gene dependencies are captured. Our method scales linearly with the number of cells and can, therefore, be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
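To make the noise model mentioned above more concrete, the sketch below evaluates the log-probability of a single count under a negative binomial with optional zero-inflation, using the common mean/dispersion parameterization. It is only an illustration of the likelihood form, not DCA's code: in an autoencoder of this kind the per-gene parameters would be network outputs, whereas here `mu`, `theta`, and `pi` are hypothetical fixed values, and the function names are assumptions.

```python
import math

def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )

def zinb_log_pmf(x, mu, theta, pi):
    """Zero-inflated negative binomial: with probability pi the count is a
    structural zero (dropout), otherwise it is drawn from the NB.
    Requires 0 <= pi < 1."""
    if x == 0:
        # A zero can come from dropout or from the NB itself.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_log_pmf(x, mu, theta)

# Hypothetical parameters for one gene in one cell.
mu, theta, pi = 2.0, 1.5, 0.3
for count in (0, 1, 5):
    print(count, zinb_log_pmf(count, mu, theta, pi))
```

Setting `pi` to 0 recovers the plain negative binomial, which corresponds to the "without zero-inflation" variant. The negative of this log-probability, summed over genes and cells, is the kind of quantity a count autoencoder would minimize as its reconstruction loss.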
Subject(s)
Computational Biology/methods; Gene Expression Profiling/methods; RNA/genetics; Sequence Analysis, RNA/methods; Animals; Blood Cells; Caenorhabditis elegans/genetics; Gene Expression Regulation/genetics; Leukocytes, Mononuclear; Models, Statistical; Phenotype; RNA/analysis; RNA, Small Cytoplasmic/genetics; Single-Cell Analysis/methods
ABSTRACT
Epigenetic processes, including DNA methylation (DNAm), are among the mechanisms allowing integration of genetic and environmental factors to shape cellular function. While many studies have investigated either environmental or genetic contributions to DNAm, few have assessed their integrated effects. Here we examine the relative contributions of prenatal environmental factors and genotype on DNA methylation in neonatal blood at variably methylated regions (VMRs) in 4 independent cohorts (overall n = 2365). We use Akaike's information criterion to test which factors best explain variability of methylation in the cohort-specific VMRs: several prenatal environmental factors (E), genotypes in cis (G), or their additive (G + E) or interaction (GxE) effects. Genetic and environmental factors in combination best explain DNAm at the majority of VMRs. The CpGs best explained by either G, G + E or GxE are functionally distinct. The enrichment of genetic variants from GxE models in GWAS for complex disorders supports their importance for disease risk.
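The model-selection step described above can be sketched in a few lines of Python. Akaike's information criterion is AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood; the model with the lowest AIC is preferred. The log-likelihoods and parameter counts below are invented for a single hypothetical VMR and are not taken from the study, and `rank_models` is an illustrative helper name.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def rank_models(fits):
    """fits maps a model name to (maximized log-likelihood, parameter count).
    Returns the name of the lowest-AIC model and the full score table."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical fits for one VMR (values are illustrative only).
fits = {
    "G":   (-120.0, 3),  # genotype only
    "E":   (-125.0, 3),  # prenatal environment only
    "G+E": (-117.5, 4),  # additive effects
    "GxE": (-117.2, 5),  # additive effects plus interaction term
}
best, scores = rank_models(fits)
print(best, scores)
```

In this toy case the interaction model fits slightly better than the additive one, but the penalty for its extra parameter outweighs the gain, so the additive model is selected.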