ABSTRACT
Treatment of cancer has been revolutionized by immune checkpoint blockade therapies. Despite the high rate of response in advanced melanoma, the majority of patients succumb to disease. To identify factors associated with success or failure of checkpoint therapy, we profiled transcriptomes of 16,291 individual immune cells from 48 tumor samples of melanoma patients treated with checkpoint inhibitors. Two distinct states of CD8+ T cells were defined by clustering and associated with patient tumor regression or progression. A single transcription factor, TCF7, was visualized within CD8+ T cells in fixed tumor samples and predicted positive clinical outcome in an independent cohort of checkpoint-treated patients. We delineated the epigenetic landscape and clonality of these T cell states and demonstrated enhanced antitumor immunity by targeting novel combinations of factors in exhausted cells. Our study of immune cell transcriptomes from tumors demonstrates a strategy for identifying predictors, mechanisms, and targets for enhancing checkpoint immunotherapy.
Subjects
CD8-Positive T-Lymphocytes/immunology, Immunotherapy/methods, Melanoma/immunology, Transcriptome, Animals, Humanized Monoclonal Antibodies/immunology, Humanized Monoclonal Antibodies/pharmacology, CD Antigens/immunology, Immunological Antineoplastic Agents/immunology, Immunological Antineoplastic Agents/pharmacology, Apyrase/antagonists & inhibitors, Apyrase/immunology, Tumor Cell Line, Humans, Leukocyte Common Antigens/antagonists & inhibitors, Leukocyte Common Antigens/immunology, Melanoma/therapy, Mice, Inbred BALB C Mice, Inbred C57BL Mice, T Cell Transcription Factor 1/metabolism
ABSTRACT
Leukemia stem cells (LSCs) have the capacity to self-renew and propagate disease upon serial transplantation in animal models, and elimination of this cell population is required for curative therapies. Here, we describe a series of pooled, in vivo RNAi screens to identify essential transcription factors (TFs) in a murine model of acute myeloid leukemia (AML) with genetically and phenotypically defined LSCs. These screens reveal the heterodimeric, circadian rhythm TFs Clock and Bmal1 as genes required for the growth of AML cells in vitro and in vivo. Disruption of canonical circadian pathway components produces anti-leukemic effects, including impaired proliferation, enhanced myeloid differentiation, and depletion of LSCs. We find that both normal and malignant hematopoietic cells harbor an intact clock with robust circadian oscillations, and genetic knockout models reveal a leukemia-specific dependence on the pathway. Our findings establish a role for the core circadian clock genes in AML.
Subjects
ARNTL Transcription Factors/genetics, CLOCK Proteins/genetics, Acute Myeloid Leukemia/genetics, Acute Myeloid Leukemia/pathology, Neoplastic Stem Cells/pathology, Animals, Circadian Rhythm, Animal Disease Models, Gene Knockout Techniques, Hematopoiesis, Humans, Acute Myeloid Leukemia/metabolism, Mice, Inbred C57BL Mice, Neoplastic Stem Cells/metabolism, RNA Interference, Small Interfering RNA/metabolism
ABSTRACT
Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.
Subjects
Genomics, Machine Learning, Genetic Models, Nucleic Acid Regulatory Sequences, DNA/chemical synthesis, DNA/genetics, DNA/metabolism, Nucleic Acid Regulatory Sequences/genetics, Transcription Factors/metabolism
ABSTRACT
Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.
Subjects
Genetic Drift, Genetic Models, Biological Evolution, DNA, Molecular Evolution, Gene Expression Regulation, Mutation/genetics, Phenotype, Saccharomyces cerevisiae/genetics
ABSTRACT
Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here we describe the high-throughput, functional assessment of phosphorylation sites through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally resolved phosphoproteomics. Using T cell activation as a model, we observe hundreds of unstudied phosphorylation sites that modulate NFAT transcriptional activity. We identify the phosphorylation-mediated nuclear localization of PHLPP1, which promotes NFAT but inhibits NFκB activity. We also find that specific phosphosite mutants can alter gene expression in subtle yet distinct patterns, demonstrating the potential for fine-tuning transcriptional responses. Overall, base editor screening of PTM sites provides a powerful platform to dissect PTM function within signaling pathways.
Subjects
Post-Translational Protein Processing, Phosphorylation, Humans, NFATC Transcription Factors/metabolism, NFATC Transcription Factors/genetics, Signal Transduction, HEK293 Cells, Proteomics/methods, High-Throughput Screening Assays/methods, T-Lymphocytes/metabolism, Jurkat Cells, NF-kappa B/metabolism
ABSTRACT
BACKGROUND: FCGR2A binds antibody-antigen complexes to regulate the abundance of circulating and deposited complexes along with downstream immune and autoimmune responses. Although the abundance of FCGR2A may be critical in immune-mediated diseases, little is known about whether its surface expression is regulated through cis genomic elements and non-coding variants. In the current study, we aimed to characterize the regulation of FCGR2A expression, the impact of genetic variation and its association with autoimmune disease. METHODS: We applied CRISPR-based interference and editing to scrutinize 1.7 Mb of open chromatin surrounding the FCGR2A gene to identify regulatory elements. Relevant transcription factors (TFs) binding to these regions were defined through public databases. Genetic variants affecting regulation were identified using luciferase reporter assays and were verified in a cohort of 1,996 genotyped healthy individuals using flow cytometry. RESULTS: We identified a complex proximal region and five distal enhancers regulating FCGR2A. The proximal region, which split into subregions upstream and downstream of the transcription start site, was enriched in binding of inflammation-regulated TFs and harbored a variant associated with FCGR2A expression in primary myeloid cells. One distal enhancer region was occupied by CCCTC-binding factor (CTCF), whose binding site was disrupted by a rare genetic variant, altering gene expression. CONCLUSIONS: The FCGR2A gene is regulated by multiple proximal and distal genomic regions, with links to autoimmune disease. These findings may open up novel therapeutic avenues where fine-tuning of FCGR2A levels may constitute a part of treatment strategies for immune-mediated diseases.
Subjects
Autoimmune Diseases, Genetic Enhancer Elements, IgG Receptors, Autoimmune Diseases/genetics, Binding Sites, Genomics, Genotype, Humans, IgG Receptors/genetics
ABSTRACT
BACKGROUND: Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark. RESULTS: BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups. Applied to single cell ATAC-seq, BROCKMAN readily distinguished cell types, treatments, batch effects, experimental artifacts, and cycling cells. We show that each variable component in the k-mer landscape reflects a set of co-varying TFs, which are often known to physically interact. For example, in K562 cells, AP-1 TFs were a central determinant of variability in chromatin accessibility through their variable expression levels and diverse interactions with other TFs. We provide a theoretical basis for why cooperative TF binding - and any associated epigenomic mark - is inherently more variable than non-cooperative binding. CONCLUSIONS: BROCKMAN and related approaches will help researchers gain a mechanistic understanding of the trans determinants of chromatin variability between cells, treatments, and individuals.
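The representation-then-decomposition idea can be sketched in a few lines. The sketch assumes each sample is given as a list of mark-associated sequences (for scATAC-seq, the accessible fragments of one cell). Each sample becomes a vector of k-mer frequencies. The centered sample-by-k-mer matrix is then reduced to its leading component by power iteration. The function names, the value of k, and the use of a plain PCA-style decomposition are illustrative assumptions, not BROCKMAN's actual implementation.

```python
from itertools import product
import math


def kmer_frequencies(seqs, k=2):
    """Return the frequency of every DNA k-mer across one sample's sequences."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if word in counts:  # skip words containing N or other symbols
                counts[word] += 1
                total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def first_component_scores(matrix, iters=100):
    """Project samples onto the leading principal axis of the column-centered matrix."""
    n, m = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in matrix]
    # Start from the centered sample with the largest norm, a vector in the row space.
    v = list(max(X, key=lambda r: sum(x * x for x in r)))
    for _ in range(iters):
        w = [sum(r[j] * v[j] for j in range(m)) for r in X]            # X v
        u = [sum(X[i][j] * w[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = math.sqrt(sum(x * x for x in u)) or 1.0
        v = [x / norm for x in u]
    return [sum(r[j] * v[j] for j in range(m)) for r in X]


samples = [["ATATATAT"], ["ATATATAA"], ["GCGCGCGC"], ["GCGCGCGG"]]
scores = first_component_scores([kmer_frequencies(s) for s in samples])
print(scores)  # AT-rich and GC-rich samples fall on opposite sides of zero
```

In this toy case, the leading component separates samples by their dominant dinucleotides. In real data, a component corresponds to a set of co-varying TF motifs.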
Subjects
Epigenomics/methods, Transcription Factors/metabolism, Binding Sites, Humans
ABSTRACT
Identifying genes in the genomic context is central to a cell's ability to interpret the genome. Yet, in general, the signals used to define eukaryotic genes are poorly described. Here, we derived simple classifiers that identify where transcription will initiate and terminate using nucleic acid sequence features detectable by the yeast cell, which we integrate into a Unified Model (UM) that models transcription as a whole. The cis-elements that denote where transcription initiates function primarily through nucleosome depletion, and, using a synthetic promoter system, we show that most of these elements are sufficient to initiate transcription in vivo. Hrp1 binding sites are the major characteristic of terminators; these binding sites are often clustered in terminator regions and can terminate transcription bidirectionally. The UM predicts global transcript structure by modeling transcription of the genome using a hidden Markov model whose emissions are the outputs of the initiation and termination classifiers. We validated the novel predictions of the UM with available RNA-seq data and tested it further by directly comparing the transcript structure predicted by the model to the transcription generated by the cell for synthetic DNA segments of random design. We show that the UM identifies transcription start sites more accurately than the initiation classifier alone, indicating that the relative arrangement of promoter and terminator elements influences their function. Our model presents a concrete description of how the cell defines transcript units, explains the existence of nongenic transcripts, and provides insight into genome evolution.
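The decoding step can be illustrated with a toy four-state HMM decoded by the Viterbi algorithm. The four states are intergenic, initiation site, transcribed and termination site. This sketch makes several assumptions. The per-position outputs of the initiation and termination classifiers are used directly as emission probabilities. The transition probabilities are arbitrary placeholders. The path is assumed to start in the intergenic state. The published UM's states and parameters may differ.

```python
import math

# States: 0 = intergenic, 1 = initiation site, 2 = transcribed, 3 = termination site.
# Hypothetical transition probabilities (from_state -> {to_state: probability}).
TRANSITIONS = {
    0: {0: 0.9, 1: 0.1},
    1: {2: 1.0},
    2: {2: 0.9, 3: 0.1},
    3: {0: 1.0},
}


def safe_log(x):
    return math.log(x) if x > 0 else -math.inf


def emission(state, init_p, term_p):
    """Classifier outputs treated as emission probabilities (an assumption of this sketch)."""
    return {0: 1 - init_p, 1: init_p, 2: 1 - term_p, 3: term_p}[state]


def viterbi(init_scores, term_scores):
    """Return the most likely state path given per-position classifier outputs."""
    n = len(init_scores)
    best = [{0: safe_log(emission(0, init_scores[0], term_scores[0]))}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for prev, prev_score in best[i - 1].items():
            for state, p in TRANSITIONS[prev].items():
                score = (prev_score + safe_log(p)
                         + safe_log(emission(state, init_scores[i], term_scores[i])))
                if score > best[i].get(state, -math.inf):
                    best[i][state] = score
                    back[i][state] = prev
    state = max(best[-1], key=best[-1].get)
    path = [state]
    for i in range(n - 1, 0, -1):  # follow back-pointers to the first position
        state = back[i][state]
        path.append(state)
    return path[::-1]


init = [0.001, 0.001, 0.999, 0.001, 0.001, 0.001, 0.001, 0.001]
term = [0.001, 0.001, 0.001, 0.001, 0.001, 0.999, 0.001, 0.001]
print(viterbi(init, term))  # [0, 0, 1, 2, 2, 3, 0, 0]
```

In the example, the transcript unit is called between the initiation and termination peaks. This reflects the UM's point that the arrangement of promoter and terminator signals, not either signal alone, defines the unit.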
Subjects
Fungal DNA/genetics, Genetic Models, Saccharomyces cerevisiae/genetics, Transcription Initiation Site, Genetic Transcription, Binding Sites, Computer Simulation, Fungal Genes, Fungal Genome, Nucleosomes/genetics, Genetic Promoter Regions, Reproducibility of Results, Saccharomyces cerevisiae/metabolism
ABSTRACT
The yeast Saccharomyces cerevisiae is a prevalent system for the analysis of transcriptional networks. As a result, multiple DNA-binding sequence specificities (motifs) have been derived for most yeast transcription factors (TFs). However, motifs from different studies are often inconsistent with each other, making subsequent analyses complicated and confusing. Here, we have created YeTFaSCo (The Yeast Transcription Factor Specificity Compendium, http://yetfasco.ccbr.utoronto.ca/), an extensive collection of S. cerevisiae TF specificities. YeTFaSCo differs from related databases by being more comprehensive (including 1709 motifs for 256 proteins or protein complexes), and by evaluating the motifs using multiple objective quality metrics. The metrics include correlation between motif matches and ChIP-chip data, gene expression patterns, and GO terms, as well as motif agreement between different studies. YeTFaSCo also features an index of 'expert-curated' motifs, each associated with a confidence assessment. In addition, the database website features tools for motif analysis, including a sequence scanning function and precomputed genome-browser tracks of motif occurrences across the entire yeast genome. Users can also search the database for motifs that are similar to a query motif.
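The following sketch shows what a sequence-scanning function typically does. It scores every window of a sequence on both strands against a position weight matrix (PWM) of log-odds values and reports windows at or above a threshold. The toy "GATAA" motif, the uniform background, the pseudocount and the threshold are hypothetical. YeTFaSCo's own scanner and score cutoffs are not described in the abstract. The sequence is assumed to contain only uppercase A, C, G and T.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(freqs, background=0.25, pseudocount=0.01):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for col in freqs:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in "ACGT"})
    return pwm


def scan(seq, pwm, threshold):
    """Return a list of (start, strand, score) for windows scoring >= threshold."""
    w = len(pwm)
    rc = seq.translate(COMPLEMENT)[::-1]
    hits = []
    for i in range(len(seq) - w + 1):
        fwd = sum(pwm[j][seq[i + j]] for j in range(w))
        # The reverse-complement window covering the same bases starts at len(seq) - i - w.
        k = len(seq) - i - w
        rev = sum(pwm[j][rc[k + j]] for j in range(w))
        for strand, score in (("+", fwd), ("-", rev)):
            if score >= threshold:
                hits.append((i, strand, round(score, 2)))
    return hits


# Hypothetical motif: 9 counts for the consensus base, 1 for each other base.
freqs = [{b: (9 if b == c else 1) for b in "ACGT"} for c in "GATAA"]
hits = scan("CCGATAACCTTATCCC", log_odds_pwm(freqs), threshold=6.0)
print(hits)  # hits at position 2 (+ strand) and position 9 (- strand)
```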
Subjects
Genetic Databases, Nucleotide Motifs, Transcriptional Regulatory Elements, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/genetics, Transcription Factors/metabolism, Binding Sites, Chromatin Immunoprecipitation, Fungal DNA/chemistry, Gene Expression Profiling, Fungal Genome, Internet, Genetic Promoter Regions, DNA Sequence Analysis
ABSTRACT
DNA libraries are critical components of many biological assays. These libraries are often kept in plasmids that are amplified in E. coli to generate sufficient material for an experiment. Library uniformity is critical for ensuring that every element in the library is tested similarly and is thought to be influenced by the culture approach used during library amplification. We tested five commonly used culturing methods for their ability to uniformly amplify plasmid libraries: liquid, semisolid agar, cell spreader-spread plates with high or low colony density, and bead-spread plates. Each approach was evaluated with two library types: a random 80-mer library, representing high complexity and low coverage of similar sequence lengths, and a human TF ORF library, representing low complexity and high coverage of diverse sequence lengths. We found that no method was better than liquid culture, which produced relatively uniform libraries regardless of library type. However, when libraries were transformed with high coverage, the culturing method had minimal impact on uniformity or amplification bias. Plating libraries was the worst approach by almost every measure for both library types and, counterintuitively, produced the strongest biases against long sequence representation. Semisolid agar amplified most elements of the library uniformly but also included outliers with orders of magnitude higher abundance. For amplifying DNA libraries, liquid culture, the simplest method, appears to be best.
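Library uniformity can be summarized from per-element read counts after sequencing the amplified library. The abstract does not say which metrics were used, so the two below are assumptions chosen for illustration. The first is the Gini coefficient of element abundances, which is 0 for a perfectly even library. The second is the ratio between the 90th and 10th percentile abundances, a common skew measure for pooled libraries.

```python
import math


def gini(counts):
    """Gini coefficient of abundances: 0 = perfectly uniform, near 1 = dominated by a few elements."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with x sorted ascending and i = 1..n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def skew_ratio(counts, upper=0.9, lower=0.1):
    """Ratio of upper to lower percentile abundance, using nearest-rank percentiles."""
    xs = sorted(counts)

    def pct(q):
        return xs[min(len(xs) - 1, max(0, math.ceil(q * len(xs)) - 1))]

    low = pct(lower)
    return pct(upper) / low if low else float("inf")


print(gini([5, 5, 5, 5]), gini([0, 0, 0, 10]))  # 0.0 0.75
print(skew_ratio(list(range(1, 11))))            # 9.0
```

A few elements with orders-of-magnitude higher abundance, as reported for semisolid agar, would raise both metrics even if most elements are evenly represented.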
Subjects
Gene Amplification, Plasmids, Plasmids/genetics, Humans, Transcription Factors/genetics, Escherichia coli/genetics, Culture Techniques/methods
ABSTRACT
Genomes encode genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extremely high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.
Subjects
Saccharomyces cerevisiae, Transcriptome, Humans, Saccharomyces cerevisiae/genetics, Genome, DNA, Chromatin
ABSTRACT
Genetic variants associated with autoimmune diseases are highly enriched within putative cis-regulatory regions of CD4+ T cells, suggesting that they alter disease risk via changes in gene regulation. However, very few genetic variants have been shown to affect T cell gene expression or function. We tested >18,000 autoimmune disease-associated variants for allele-specific expression using massively parallel reporter assays in primary human CD4+ T cells. The 545 expression-modulating variants (emVars) identified greatly enrich for likely causal variants. We provide evidence that many emVars are mediated by common upstream regulatory conduits, and that putative target genes of primary T cell emVars are highly enriched within a lymphocyte activation network. Using bulk and single-cell CRISPR-interference screens, we confirm that emVar-containing T cell cis-regulatory elements modulate both known and novel target genes that regulate T cell proliferation, providing plausible mechanisms by which these variants alter autoimmune disease risk.
ABSTRACT
A systematic evaluation of how model architectures and training strategies impact genomics model performance is needed. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. All top-performing models used neural networks but diverged in architectures and training strategies. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide models into modular building blocks. We tested all possible combinations for the top three models, further improving their performance. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets, demonstrating the progress that can be driven by gold-standard genomics datasets.
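The combinatorial step can be pictured as a Cartesian product over interchangeable building blocks. The block categories and names below are hypothetical placeholders, not the actual Prix Fixe modules. The point is only that mixing the blocks of three models gives 3**k candidate configurations for k block types.

```python
from itertools import product

# Hypothetical building blocks contributed by three top-performing models (A, B, C).
BLOCKS = {
    "data_processor": ["A_proc", "B_proc", "C_proc"],
    "first_layers": ["A_first", "B_first", "C_first"],
    "core_layers": ["A_core", "B_core", "C_core"],
    "final_layers": ["A_final", "B_final", "C_final"],
}


def all_combinations(blocks):
    """Return every configuration that takes exactly one option per block type."""
    names = list(blocks)
    return [dict(zip(names, choice)) for choice in product(*(blocks[n] for n in names))]


configs = all_combinations(BLOCKS)
print(len(configs))  # 81 = 3 ** 4
```

Each configuration would then be trained and benchmarked. Splitting models into blocks is what makes such a combinatorial comparison possible.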
ABSTRACT
Genome-wide association studies (GWASs) have uncovered hundreds of autoimmune disease-associated loci; however, the causal genetic variants within each locus are mostly unknown. Here, we perform high-throughput allele-specific reporter assays to prioritize disease-associated variants for five autoimmune diseases. By examining variants that both promote allele-specific reporter expression and are located in accessible chromatin, we identify 60 putatively causal variants that enrich for statistically fine-mapped variants by up to 57.8-fold. We introduced the risk allele of a prioritized variant (rs72928038) into a human T cell line and deleted the orthologous sequence in mice, both resulting in reduced BACH2 expression. Naive CD8 T cells from mice containing the deletion had reduced expression of genes that suppress activation and maintain stemness and, upon acute viral infection, displayed greater propensity to become effector T cells. Our results represent an example of an effective approach for prioritizing variants and studying their physiologically relevant effects.
Subjects
Autoimmune Diseases, Genome-Wide Association Study, Alleles, Animals, Autoimmune Diseases/genetics, Genetic Predisposition to Disease, Genome-Wide Association Study/methods, Mice, Single Nucleotide Polymorphism/genetics, Nucleic Acid Regulatory Sequences, T-Lymphocytes
ABSTRACT
Genome-wide association studies of Systemic Lupus Erythematosus (SLE) nominate 3073 genetic variants at 91 risk loci. To systematically screen these variants for allelic transcriptional enhancer activity, we construct a massively parallel reporter assay (MPRA) library comprising 12,396 DNA oligonucleotides containing the genomic context around every allele of each SLE variant. Transfection into the Epstein-Barr virus-transformed B cell line GM12878 reveals 482 variants with enhancer activity, with 51 variants showing genotype-dependent (allelic) enhancer activity at 27 risk loci. Comparison of MPRA results in GM12878 and Jurkat T cell lines highlights shared and unique allelic transcriptional regulatory mechanisms at SLE risk loci. In-depth analysis of allelic transcription factor (TF) binding at and around allelic variants identifies one class of TFs whose DNA-binding motif tends to be directly altered by the risk variant and a second class of TFs that bind allelically without direct alteration of their motif by the variant. Collectively, our approach provides a blueprint for the discovery of allelic gene regulation at risk loci for any disease and offers insight into the transcriptional regulatory mechanisms underlying SLE.
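Allelic enhancer activity in an MPRA is often summarized per variant by comparing the RNA/DNA ratio of the two alleles across replicates. The sketch below computes a per-replicate log2 allelic skew and a paired t statistic. The counts, the pseudocount and the choice of a simple paired t statistic are assumptions. The study's actual statistical model is not described in the abstract.

```python
import math
from statistics import mean, stdev


def log2_activity(rna, dna, pseudocount=1.0):
    """log2(RNA/DNA) for one allele in one replicate; the pseudocount avoids division by zero."""
    return math.log2((rna + pseudocount) / (dna + pseudocount))


def allelic_skew(ref, alt):
    """ref, alt: lists of (rna, dna) count pairs, one pair per replicate.

    Returns the mean log2(alt/ref) activity difference and a paired t statistic.
    """
    diffs = [log2_activity(*a) - log2_activity(*r) for r, a in zip(ref, alt)]
    m = mean(diffs)
    sd = stdev(diffs)  # requires at least two replicates
    if sd == 0:
        return m, (math.copysign(math.inf, m) if m else 0.0)
    return m, m / (sd / math.sqrt(len(diffs)))


ref = [(100, 100), (110, 100), (90, 100)]
alt = [(200, 100), (230, 100), (170, 100)]
skew, t = allelic_skew(ref, alt)
print(round(skew, 2), round(t, 1))  # roughly 2-fold higher activity for the alternative allele
```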
Subjects
Alleles, Genetic Predisposition to Disease/genetics, Systemic Lupus Erythematosus/genetics, B-Lymphocytes, Cell Line, Chromatin, Gene Expression Regulation, Genome-Wide Association Study, Genotype, Human Herpesvirus 4, Humans, Quantitative Trait Loci, Synaptogyrins/genetics, T-Lymphocytes
ABSTRACT
Improved methods are needed to model CRISPR screen data for interrogation of genetic elements that alter reporter gene expression readout. We create MAUDE (Mean Alterations Using Discrete Expression) for quantifying the impact of guide RNAs on a target gene's expression in a pooled, sorting-based expression screen. MAUDE quantifies guide-level effects by modeling the distribution of cells across sorting expression bins. It then combines guides to estimate the statistical significance and effect size of targeted genetic elements. We demonstrate that MAUDE outperforms previous approaches and provide experimental design guidelines to best leverage MAUDE, which is available at https://github.com/Carldeboer/MAUDE.
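The guide-level idea can be sketched as a maximum-likelihood fit. Assume that cells carrying a guide have expression drawn from a normal distribution with unit variance and an unknown mean shift mu, in units of the unperturbed expression distribution. The expected fraction of that guide's cells in each sorting bin is then a difference of normal CDFs at the bin edges. This simplification, the bin edges and the grid search are assumptions made for illustration. The MAUDE package's own model may differ and should be consulted for real analyses.

```python
import math


def normal_cdf(x):
    """Standard normal CDF; math.erf handles +/-inf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bin_probabilities(mu, edges):
    """Fraction of cells expected in each bin if expression ~ Normal(mu, 1)."""
    return [normal_cdf(hi - mu) - normal_cdf(lo - mu) for lo, hi in zip(edges[:-1], edges[1:])]


def estimate_mean_shift(bin_counts, edges, grid_step=0.01, limit=3.0):
    """Grid-search maximum-likelihood estimate of one guide's expression shift mu."""
    best_mu, best_ll = 0.0, -math.inf
    steps = int(round(limit / grid_step))
    for s in range(-steps, steps + 1):
        mu = s * grid_step
        probs = bin_probabilities(mu, edges)
        # Multinomial log-likelihood (constant terms dropped).
        ll = sum(c * math.log(max(p, 1e-300)) for c, p in zip(bin_counts, probs))
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return best_mu


# Hypothetical sort with four bins; edges are in standardized expression units.
EDGES = [-math.inf, -1.0, 0.0, 1.0, math.inf]
print(estimate_mean_shift([668, 2417, 3829, 3085], EDGES))  # close to 0.5
```

Per-guide estimates like this could then be aggregated over all guides targeting an element. That element-level step is what the abstract describes as combining guides.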
Subjects
Clustered Regularly Interspaced Short Palindromic Repeats, Gene Expression, Genetic Techniques, Kinetoplastida Guide RNA, Software, Algorithms, CRISPR-Cas Systems, Genetic Models
ABSTRACT
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF's specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.
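Simple arithmetic shows why fully random sequences contain functional binding sites by chance. Assuming uniform, independent base composition, a specific non-palindromic k-mer is expected to occur 2(L - k + 1)/4^k times in a random sequence of length L when both strands are counted. The 80-bp length and the exact-match criterion below are illustrative assumptions. Real TF binding also tolerates mismatches, which makes sites more common than this estimate.

```python
import math


def expected_site_count(seq_len, motif_len, both_strands=True):
    """Expected occurrences of one exact, non-palindromic k-mer in a uniform random sequence."""
    positions = max(0, seq_len - motif_len + 1)
    per_strand = positions / 4 ** motif_len
    return 2 * per_strand if both_strands else per_strand


def fraction_with_site(seq_len, motif_len):
    """Approximate fraction of random sequences with >= 1 site (Poisson approximation)."""
    return 1 - math.exp(-expected_site_count(seq_len, motif_len))


for k in (6, 8, 10):
    print(k, expected_site_count(80, k), fraction_with_site(80, k))
```

Under these assumptions, about 3.6% of random 80-bp sequences contain a given 6-mer. Across 100 million random promoters, that is roughly 3.6 million sequences, enough to estimate that site's effect with many independent examples.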
Subjects
Eukaryota/genetics, Gene Expression Regulation, Logic, Genetic Promoter Regions, Binding Sites, DNA/metabolism, Reporter Genes, Genetic Models, Saccharomyces cerevisiae/genetics, Transcription Factors/metabolism
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.