ABSTRACT
Synovial tissue inflammation is a hallmark of rheumatoid arthritis (RA). Recent work has identified prominent pathogenic cell states in inflamed RA synovial tissue, such as T peripheral helper cells; however, the epigenetic regulation of these states has yet to be defined. Here, we examine genome-wide open chromatin at single-cell resolution in 30 synovial tissue samples, including 12 samples with transcriptional data in multimodal experiments. We identify 24 chromatin classes and predict their associated transcription factors, including a CD8+ GZMK+ class associated with EOMES and a lining fibroblast class associated with AP-1. By integrating with an RA tissue transcriptional atlas, we propose that these chromatin classes represent 'superstates' corresponding to multiple transcriptional cell states. Finally, we demonstrate the utility of this RA tissue chromatin atlas through the associations between disease phenotypes and chromatin class abundance, as well as the nomination of classes mediating the effects of putatively causal RA genetic variants.
Subject(s)
Rheumatoid Arthritis, Chromatin, Synovial Membrane, Rheumatoid Arthritis/genetics, Rheumatoid Arthritis/metabolism, Rheumatoid Arthritis/pathology, Rheumatoid Arthritis/immunology, Humans, Chromatin/metabolism, Chromatin/genetics, Synovial Membrane/metabolism, Synovial Membrane/pathology, T-Box Domain Proteins/metabolism, T-Box Domain Proteins/genetics, Genetic Epigenesis, Single-Cell Analysis, Transcription Factors/metabolism, Transcription Factors/genetics, Fibroblasts/metabolism, Transcription Factor AP-1/metabolism, Transcription Factor AP-1/genetics, Genetic Transcription, CD8-Positive T-Lymphocytes/immunology, CD8-Positive T-Lymphocytes/metabolism
ABSTRACT
T-cells recognize antigens and induce specialized gene expression programs (GEPs) enabling functions including proliferation, cytotoxicity, and cytokine production. Traditionally, different classes of helper T-cells express mutually exclusive responses - for example, Th1, Th2, and Th17 programs. However, new single-cell RNA sequencing (scRNA-Seq) experiments have revealed a continuum of T-cell states without discrete clusters corresponding to these subsets, implying the need for new analytical frameworks. Here, we advance the characterization of T-cells with T-CellAnnoTator (TCAT), a pipeline that simultaneously quantifies pre-defined GEPs capturing activation states and cellular subsets. From 1,700,000 T-cells from 700 individuals across 38 tissues and five diverse disease contexts, we discover 46 reproducible GEPs reflecting the known core functions of T-cells including proliferation, cytotoxicity, exhaustion, and T helper effector states. We experimentally characterize several novel activation programs and apply TCAT to describe T-cell activation and exhaustion in Covid-19 and cancer, providing insight into T-cell function in these diseases.
ABSTRACT
Translating genome-wide association study (GWAS) loci into causal variants and genes requires accurate cell-type-specific enhancer-gene maps from disease-relevant tissues. Building enhancer-gene maps is essential but challenging with current experimental methods in primary human tissues. Here we developed a nonparametric statistical method, SCENT (single-cell enhancer target gene mapping), that models association between enhancer chromatin accessibility and gene expression in single-cell or nucleus multimodal RNA sequencing and ATAC sequencing data. We applied SCENT to 9 multimodal datasets including >120,000 single cells or nuclei and created 23 cell-type-specific enhancer-gene maps. These maps were highly enriched for causal variants in expression quantitative trait loci and GWAS for 1,143 diseases and traits. We identified likely causal genes for both common and rare diseases and linked somatic mutation hotspots to target genes. We demonstrate that application of SCENT to multimodal data from disease-relevant human tissue enables the scalable construction of accurate cell-type-specific enhancer-gene maps, essential for defining noncoding variant function.
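As a rough illustration of what a nonparametric test of peak-gene association can look like, the sketch below bootstraps cells to assess whether a gene's expression differs between cells where a peak is accessible and cells where it is not. This is not the SCENT implementation: the statistic (a log rate ratio, which equals the slope of a Poisson regression with a single binary covariate), the function names, and the simulated data are all assumptions made for the example, and no covariates are modeled.

```python
import numpy as np

def log_rate_ratio(expr, acc):
    # log(mean expression in accessible cells / mean in inaccessible cells);
    # this is the Poisson-regression slope when the only covariate is binary.
    m1 = expr[acc == 1].mean()
    m0 = expr[acc == 0].mean()
    return np.log((m1 + 1e-9) / (m0 + 1e-9))

def bootstrap_test(expr, acc, n_boot=1000, seed=0):
    """Resample cells with replacement and return (estimate, two-sided p-value)."""
    rng = np.random.default_rng(seed)
    n = len(expr)
    obs = log_rate_ratio(expr, acc)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boots[b] = log_rate_ratio(expr[idx], acc[idx])
    # Fraction of bootstrap estimates on the far side of zero, doubled;
    # the p-value is floored at 1/n_boot because it cannot be resolved below that.
    p = 2 * min((boots <= 0).mean(), (boots >= 0).mean())
    return obs, min(1.0, max(p, 1.0 / n_boot))

# Simulated example (hypothetical data): one linked and one unlinked peak-gene pair.
rng = np.random.default_rng(1)
acc = rng.binomial(1, 0.3, size=1000)            # binarized peak accessibility
expr_linked = rng.poisson(np.exp(0.5 * acc))     # expression depends on the peak
expr_unlinked = rng.poisson(1.0, size=1000)      # expression independent of the peak

est_linked, p_linked = bootstrap_test(expr_linked, acc)
est_unlinked, p_unlinked = bootstrap_test(expr_unlinked, acc)
print(est_linked, p_linked, est_unlinked, p_unlinked)
```

The bootstrap avoids relying on a parametric standard error, which is the sense in which such a test is nonparametric; a real analysis would also need to handle covariates and multiple testing.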
Subject(s)
Genome-Wide Association Study, Nucleic Acid Regulatory Sequences, Humans, Alleles, Genome-Wide Association Study/methods, Chromosome Mapping, Phenotype, Chromatin/genetics, Single Nucleotide Polymorphism, Genetic Predisposition to Disease/genetics
ABSTRACT
Rheumatoid arthritis is a prototypical autoimmune disease that causes joint inflammation and destruction [1]. There is currently no cure for rheumatoid arthritis, and the effectiveness of treatments varies across patients, suggesting an undefined pathogenic diversity [1,2]. Here, to deconstruct the cell states and pathways that characterize this pathogenic heterogeneity, we profiled the full spectrum of cells in inflamed synovium from patients with rheumatoid arthritis. We used multi-modal single-cell RNA-sequencing and surface protein data coupled with histology of synovial tissue from 79 donors to build a single-cell atlas of rheumatoid arthritis synovial tissue that includes more than 314,000 cells. We stratified tissues into six groups, referred to as cell-type abundance phenotypes (CTAPs), each characterized by selectively enriched cell states. These CTAPs demonstrate the diversity of synovial inflammation in rheumatoid arthritis, ranging from samples enriched for T and B cells to those largely lacking lymphocytes. Disease-relevant cell states, cytokines, risk genes, histology and serology metrics are associated with particular CTAPs. CTAPs are dynamic and can predict treatment response, highlighting the clinical utility of classifying rheumatoid arthritis synovial phenotypes. This comprehensive atlas and molecular, tissue-based stratification of rheumatoid arthritis synovial tissue reveal new insights into rheumatoid arthritis pathology and heterogeneity that could inform novel targeted treatments.
Subject(s)
Rheumatoid Arthritis, Humans, Rheumatoid Arthritis/complications, Rheumatoid Arthritis/genetics, Rheumatoid Arthritis/immunology, Rheumatoid Arthritis/pathology, Cytokines/metabolism, Inflammation/complications, Inflammation/genetics, Inflammation/immunology, Inflammation/pathology, Synovial Membrane/pathology, T-Lymphocytes/immunology, B-Lymphocytes/immunology, Genetic Predisposition to Disease/genetics, Phenotype, Single-Cell Gene Expression Analysis
ABSTRACT
In autoimmune diseases such as rheumatoid arthritis, the immune system attacks the body's own cells. Developing a precise understanding of the cell states where noncoding autoimmune risk variants impart causal mechanisms is critical to developing curative therapies. Here, to identify noncoding regions with accessible chromatin that associate with cell-state-defining gene expression patterns, we leveraged multimodal single-nucleus RNA and assay for transposase-accessible chromatin (ATAC) sequencing data across 28,674 cells from the inflamed synovial tissue of 12 donors. Specifically, we used a multivariate Poisson model to predict peak accessibility from single-nucleus RNA sequencing principal components. For 14 autoimmune diseases, we discovered that cell-state-dependent ('dynamic') chromatin accessibility peaks in immune cell types were enriched for heritability, compared with cell-state-invariant ('cs-invariant') peaks. These dynamic peaks marked regulatory elements associated with T peripheral helper, regulatory T, dendritic and STAT1+CXCL10+ myeloid cell states. We argue that dynamic regulatory elements can help identify precise cell states enriched for disease-critical genetic variation.
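To make the modeling step concrete, the following minimal sketch fits a log-link Poisson regression of one peak's accessibility counts on RNA principal components and compares it with an intercept-only model. It is an illustration under stated assumptions, not the authors' code: the Newton solver, the function names, the likelihood-ratio comparison, and the simulated data are all hypothetical, and the study's actual covariates and significance procedure are not reproduced.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-8):
    """Fit a log-link Poisson regression by Newton's method.
    X: (n_cells, n_features) design matrix whose column 0 is the intercept.
    y: (n_cells,) non-negative accessibility counts for one peak."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-8)        # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                # score vector
        hess = X.T @ (X * mu[:, None])       # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def poisson_loglik(X, y, beta):
    eta = X @ beta
    return np.sum(y * eta - np.exp(eta))     # log(y!) constant dropped

# Simulated example (hypothetical): 2000 cells, 3 RNA principal components.
rng = np.random.default_rng(0)
n_cells, n_pcs = 2000, 3
pcs = rng.normal(size=(n_cells, n_pcs))
X = np.column_stack([np.ones(n_cells), pcs])
true_beta = np.array([-1.0, 0.5, 0.0, -0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson(X, y)
null_beta = np.array([np.log(y.mean())])
lr_stat = 2 * (poisson_loglik(X, y, beta_hat)
               - poisson_loglik(X[:, :1], y, null_beta))
print(beta_hat, lr_stat)
```

In this toy setting, a peak whose likelihood-ratio statistic is large would be a candidate cell-state-dependent ('dynamic') peak, while a peak explained equally well by the intercept alone would be a candidate cs-invariant peak.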
Subject(s)
Autoimmune Diseases, Chromatin, Humans, Chromatin/genetics, Nucleic Acid Regulatory Sequences/genetics, Chromosomes, Autoimmune Diseases/genetics, Human Genome
ABSTRACT
Synovial tissue inflammation is the hallmark of rheumatoid arthritis (RA). Recent work has identified prominent pathogenic cell states in inflamed RA synovial tissue, such as T peripheral helper cells; however, the epigenetic regulation of these states has yet to be defined. We measured genome-wide open chromatin at single-cell resolution from 30 synovial tissue samples, including 12 samples with transcriptional data in multimodal experiments. We identified 24 chromatin classes and predicted their associated transcription factors, including a CD8+ GZMK+ class associated with EOMES and a lining fibroblast class associated with AP-1. By integrating an RA tissue transcriptional atlas, we found that the chromatin classes represented 'superstates' corresponding to multiple transcriptional cell states. Finally, we demonstrated the utility of this RA tissue chromatin atlas through the associations between disease phenotypes and chromatin class abundance as well as the nomination of classes mediating the effects of putatively causal RA genetic variants.
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs). We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type-disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.
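The idea of scoring each cell for excess expression of disease-associated genes can be sketched as follows. This is a simplified illustration, not the scDRS implementation: it compares the mean expression of a disease gene set against randomly drawn control gene sets of the same size, with no matching of control genes on expression level and no gene weights. The function name, parameters, and simulated data are assumptions made for the example.

```python
import numpy as np

def disease_scores(expr, disease_genes, n_ctrl=200, seed=0):
    """expr: (n_cells, n_genes) normalized expression matrix.
    disease_genes: integer indices of the disease gene set.
    Returns a per-cell z-score and an empirical per-cell p-value."""
    rng = np.random.default_rng(seed)
    n_cells, n_genes = expr.shape
    raw = expr[:, disease_genes].mean(axis=1)
    ctrl = np.empty((n_cells, n_ctrl))
    for k in range(n_ctrl):
        # Control set: random genes, same size as the disease set (unmatched).
        genes = rng.choice(n_genes, size=len(disease_genes), replace=False)
        ctrl[:, k] = expr[:, genes].mean(axis=1)
    p = (1 + (ctrl >= raw[:, None]).sum(axis=1)) / (1 + n_ctrl)
    z = (raw - ctrl.mean(axis=1)) / (ctrl.std(axis=1) + 1e-12)
    return z, p

# Simulated example (hypothetical): the first 50 cells over-express genes 0..19.
rng = np.random.default_rng(0)
expr = rng.normal(size=(300, 500))
expr[:50, :20] += 1.5
z, p = disease_scores(expr, np.arange(20))
print(z[:50].mean(), z[50:].mean(), np.median(p[:50]))
```

Because the score is computed per cell, high-scoring cells can be examined without reference to any cell-type annotation, which is the property the abstract emphasizes.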
Subject(s)
Genome-Wide Association Study, Single-Cell Analysis, Gene Expression Profiling/methods, Multifactorial Inheritance/genetics, RNA-Seq, Single-Cell Analysis/methods, Triglycerides
ABSTRACT
Neuronal stimulation induced by the brain-derived neurotrophic factor (BDNF) triggers gene expression, which is crucial for neuronal survival, differentiation, synaptic plasticity, memory formation, and neurocognitive health. However, its role in chromatin regulation is unclear. Here, using temporal profiling of chromatin accessibility and transcription in mouse primary cortical neurons upon either BDNF stimulation or depolarization (KCl), we identify features that define BDNF-specific chromatin-to-gene expression programs. Enhancer activation is an early event in the regulatory control of BDNF-treated neurons, where the bZIP motif-binding Fos protein pioneered chromatin opening and cooperated with co-regulatory transcription factors (Homeobox, EGRs, and CTCF) to induce transcription. Deleting cis-regulatory sequences affects BDNF-mediated Arc expression, a regulator of synaptic plasticity. BDNF-induced accessible regions are linked to preferential exon usage by neurodevelopmental disorder-related genes and the heritability of neuronal complex traits, which were validated in human iPSC-derived neurons. Thus, we provide a comprehensive view of BDNF-mediated genome regulatory features using comparative genomic approaches to dissect mammalian neuronal stimulation.
Subject(s)
Brain-Derived Neurotrophic Factor, Chromatin, Animals, Brain-Derived Neurotrophic Factor/genetics, Brain-Derived Neurotrophic Factor/metabolism, Brain-Derived Neurotrophic Factor/pharmacology, Chromatin/genetics, Chromatin/metabolism, Humans, Mammals/genetics, Mice, Neurons/metabolism, Transcription Factors/metabolism
ABSTRACT
Recent advances in single-cell technologies and integration algorithms make it possible to construct comprehensive reference atlases encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony ( https://github.com/immunogenomics/symphony ), an algorithm for building large-scale, integrated reference atlases in a convenient, portable format that enables efficient query mapping within seconds. Symphony localizes query cells within a stable low-dimensional reference embedding, facilitating reproducible downstream transfer of reference-defined annotations to the query. We demonstrate the power of Symphony in multiple real-world datasets, including (1) mapping a multi-donor, multi-species query to predict pancreatic cell types, (2) localizing query cells along a developmental trajectory of fetal liver hematopoiesis, and (3) inferring surface protein expression with a multimodal CITE-seq atlas of memory T cells.
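The general pattern of reference mapping, namely placing query cells in a fixed reference embedding and then transferring labels from nearby reference cells, can be sketched in a few lines. This example does not use the Symphony package or its API, and it omits any batch correction of the query. It assumes a plain PCA embedding and majority-vote k-nearest-neighbor label transfer; all function names and the simulated data are hypothetical.

```python
import numpy as np

def build_reference(ref_expr, n_pcs=5):
    """Store what is needed to map new cells: gene means, PCA loadings, embedding."""
    mean = ref_expr.mean(axis=0)
    _, _, vt = np.linalg.svd(ref_expr - mean, full_matrices=False)
    loadings = vt[:n_pcs].T
    return {"mean": mean, "loadings": loadings,
            "embedding": (ref_expr - mean) @ loadings}

def map_query(ref, query_expr):
    # Project with the reference's own mean and loadings, so the reference
    # embedding stays unchanged no matter which query is mapped.
    return (query_expr - ref["mean"]) @ ref["loadings"]

def knn_transfer(ref_emb, ref_labels, query_emb, k=5):
    """Majority vote among the k nearest reference cells (integer labels)."""
    d = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in ref_labels[nn]])

# Simulated example (hypothetical): two cell types that differ in 10 of 50 genes.
rng = np.random.default_rng(0)
n_genes = 50
centers = np.zeros((2, n_genes))
centers[1, :10] = 2.0
ref_labels = rng.integers(0, 2, size=200)
ref_expr = centers[ref_labels] + rng.normal(size=(200, n_genes))
query_labels = rng.integers(0, 2, size=40)
query_expr = centers[query_labels] + rng.normal(size=(40, n_genes))

ref = build_reference(ref_expr)
pred = knn_transfer(ref["embedding"], ref_labels, map_query(ref, query_expr))
accuracy = (pred == query_labels).mean()
print(accuracy)
```

Keeping the reference embedding fixed is what makes downstream annotation transfer reproducible across queries; only the small dictionary of means and loadings has to be shipped with the reference.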
Subject(s)
Genome, Single-Cell Analysis, Software, Algorithms, Computational Biology, Humans
ABSTRACT
To estimate a study design's power to detect differential abundance, we require a framework that simulates many multi-sample single-cell datasets. However, current simulation methods are challenging for large-scale power analyses because they are computationally resource intensive and do not support easy simulation of multi-sample datasets. Current methods also lack modeling of important inter-sample variation, such as the variation in the frequency of cell states between samples that is observed in single-cell data. Thus, we developed single-cell POwer Simulation Tool (scPOST) to address these limitations and help investigators quickly simulate multi-sample single-cell datasets. Users may explore a range of effect sizes and study design choices (such as increasing the number of samples or cells per sample) to determine their effect on power, and thus choose the optimal study design for their planned experiments.
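A simulation-based power estimate for differential abundance can be sketched as below. This is not scPOST: it tracks a single cell state, draws each sample's true frequency from a Beta distribution to represent inter-sample variation, and tests the group difference with a Welch-type statistic under a normal approximation. The function name, every parameter, and all default values are assumptions chosen for illustration.

```python
import math
import numpy as np

def simulate_power(n_per_group=10, cells_per_sample=500, base_freq=0.10,
                   fold_change=1.5, concentration=50.0, n_sim=500,
                   alpha=0.05, seed=0):
    """Fraction of simulated studies in which the case/control difference in
    cell-state frequency is detected at level alpha."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        props = []
        for freq in (base_freq, base_freq * fold_change):
            # Per-sample frequency varies around the group mean (Beta model);
            # smaller `concentration` means more inter-sample variation.
            p = rng.beta(freq * concentration, (1 - freq) * concentration,
                         size=n_per_group)
            counts = rng.binomial(cells_per_sample, p)
            props.append(counts / cells_per_sample)
        a, b = props
        se = math.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        z = (b.mean() - a.mean()) / se if se > 0 else 0.0
        pval = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal approximation
        hits += pval < alpha
    return hits / n_sim

power_small = simulate_power(n_per_group=5)
power_large = simulate_power(n_per_group=20)
false_positive_rate = simulate_power(fold_change=1.0)
print(power_small, power_large, false_positive_rate)
```

Sweeping `n_per_group`, `cells_per_sample`, or `fold_change` in a loop gives the kind of power curve that can guide a study design; note that the normal approximation is slightly anti-conservative for small groups.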
Subject(s)
Research Design, Computer Simulation
ABSTRACT
Deciphering the interplay between chromatin accessibility and transcription factor (TF) binding is fundamental to understanding transcriptional regulation, control of cellular states, and the establishment of new phenotypes. Recent genome-wide chromatin accessibility profiling studies have provided catalogs of putative open regions, where TFs can recognize their motifs and regulate gene expression programs. Here, we present motif enrichment in differential elements of accessibility (MEDEA), a computational tool that analyzes high-throughput chromatin accessibility genomic data to identify cell-type-specific accessible regions and lineage-specific motifs associated with TF binding therein. To benchmark MEDEA, we used a panel of reference cell lines profiled by ENCODE and curated by the ENCODE Project Consortium for the ENCODE-DREAM Challenge. By comparing results with RNA-seq data, ChIP-seq peaks, and DNase-seq footprints, we show that MEDEA improves the detection of motifs associated with known lineage specifiers. We then applied MEDEA to 610 ENCODE DNase-seq data sets, where it revealed significant motifs even when absolute enrichment was low and where it identified novel regulators, such as NRF1 in kidney development. Finally, we show that MEDEA performs well on both bulk and single-cell ATAC-seq data. MEDEA is publicly available as part of our Glossary-GENRE suite for motif enrichment analysis.
Subject(s)
Chromatin/metabolism, Transcriptional Regulatory Elements, DNA Sequence Analysis/methods, Transcription Factors/metabolism, Binding Sites, Cell Line, Cell Lineage/genetics, DNA/chemistry, Humans, Nucleotide Motifs
ABSTRACT
Combinatorial interactions among transcription factors (TFs) play essential roles in generating gene expression specificity and diversity in metazoans. Using yeast 2-hybrid (Y2H) assays on nearly all sequence-specific Drosophila TFs, we identified 1,983 protein-protein interactions (PPIs), more than doubling the number of currently known PPIs among Drosophila TFs. For quality assessment, we validated a subset of our interactions using MITOMI and bimolecular fluorescence complementation assays. We combined our interactome with prior PPI data to generate an integrated Drosophila TF-TF binary interaction network. Our analysis of ChIP-seq data, integrating PPI and gene expression information, uncovered different modes by which interacting TFs are recruited to DNA. We further demonstrate the utility of our Drosophila interactome in shedding light on human TF-TF interactions. This study reveals how TFs interact to bind regulatory elements in vivo and serves as a resource of Drosophila TF-TF binary PPIs for understanding tissue-specific gene regulation.
Subject(s)
Drosophila melanogaster/metabolism, Transcription Factors/metabolism, Animals, Binding Sites, DNA/chemistry, DNA/metabolism, Gene Expression Regulation, Fluorescence Microscopy, Protein Interaction Maps/genetics, Transcriptional Regulatory Elements, Saccharomyces cerevisiae/metabolism, Transcription Factors/genetics, Two-Hybrid System Techniques
ABSTRACT
Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggests the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs.
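Why a matched background matters for 8-mer enrichment can be shown with a small sketch. It does not reproduce GENRE or the TF-8mer glossary: the only matching criterion here is GC content (an assumption; the abstract does not list the features GENRE matches on), the greedy matching routine is hypothetical, and the planted 8-mer and sequences are simulated.

```python
import random

def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def match_background(foreground, candidates):
    """Greedy matching: for each foreground sequence, take the unused candidate
    whose GC content is closest (each candidate is used at most once)."""
    pool = list(candidates)
    matched = []
    for seq in foreground:
        target = gc_content(seq)
        best = min(range(len(pool)),
                   key=lambda i: abs(gc_content(pool[i]) - target))
        matched.append(pool.pop(best))
    return matched

def kmer_fraction(seqs, kmer):
    """Fraction of sequences containing the k-mer at least once."""
    return sum(kmer in s for s in seqs) / len(seqs)

def random_seq(length, gc, rng):
    return "".join(rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                   for _ in range(length))

def mean_gc(seqs):
    return sum(gc_content(s) for s in seqs) / len(seqs)

# Simulated example (hypothetical): GC-rich foreground with a planted 8-mer.
rng = random.Random(0)
motif = "ATGACTCA"
foreground = [random_seq(100, 0.6, rng) for _ in range(50)]
for i in range(30):                      # plant the motif in 30 of 50 sequences
    pos = rng.randrange(100 - len(motif) + 1)
    s = foreground[i]
    foreground[i] = s[:pos] + motif + s[pos + len(motif):]
candidates = [random_seq(100, rng.uniform(0.3, 0.7), rng) for _ in range(500)]

background = match_background(foreground, candidates)
fg_frac = kmer_fraction(foreground, motif)
bg_frac = kmer_fraction(background, motif)
print(mean_gc(foreground), mean_gc(background), mean_gc(candidates))
print(fg_frac, bg_frac)
```

Matching removes base-composition differences between the two sets, so an 8-mer that is more frequent in the foreground is less likely to be an artifact of GC bias.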