1.
Nat Methods ; 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38684783

ABSTRACT

Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here we describe the high-throughput, functional assessment of phosphorylation sites through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally resolved phosphoproteomics. Using T cell activation as a model, we observe hundreds of unstudied phosphorylation sites that modulate NFAT transcriptional activity. We identify the phosphorylation-mediated nuclear localization of PHLPP1, which promotes NFAT but inhibits NFκB activity. We also find that specific phosphosite mutants can alter gene expression in subtle yet distinct patterns, demonstrating the potential for fine-tuning transcriptional responses. Overall, base editor screening of PTM sites provides a powerful platform to dissect PTM function within signaling pathways.

2.
Nat Struct Mol Biol ; 31(3): 559-567, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38448573

ABSTRACT

Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.


Subjects
Saccharomyces cerevisiae , Transcriptome , Humans , Saccharomyces cerevisiae/genetics , Genome , DNA , Chromatin
4.
bioRxiv ; 2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38405704

ABSTRACT

Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. While some benchmarks produced similar results across the top-performing models, others differed substantially. All top-performing models used neural networks, but diverged in architectures and novel training strategies, tailored to genomics sequence data. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide any given model into logically equivalent building blocks. We tested all possible combinations for the top three models and observed performance improvements for each. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets. Overall, we demonstrate that high-quality gold-standard genomics datasets can drive significant progress in model development.

5.
Nature ; 625(7993): 41-50, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38093018

ABSTRACT

Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.


Subjects
Genomics , Machine Learning , Genetic Models , Nucleic Acid Regulatory Sequences , DNA/chemical synthesis , DNA/genetics , DNA/metabolism , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism
6.
bioRxiv ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014346

ABSTRACT

Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here, we describe "signaling-to-transcription network" mapping through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally-resolved phosphoproteomics. Using T cell activation as a model, we observe hundreds of unstudied phosphorylation sites that modulate NFAT transcriptional activity. We identify the phosphorylation-mediated nuclear localization of the phosphatase PHLPP1 which promotes NFAT but inhibits NFκB activity. We also find that specific phosphosite mutants can alter gene expression in subtle yet distinct patterns, demonstrating the potential for fine-tuning transcriptional responses. Overall, base editor screening of PTM sites provides a powerful platform to dissect PTM function within signaling pathways.

7.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37490428

ABSTRACT

MOTIVATION: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar. RESULTS: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.org team in the DREAM 2022 challenge of predicting gene expression from gigantic parallel reporter assays. Using published data, here, we demonstrate that LegNet outperforms existing models and accurately predicts gene expression per se as well as the effects of single-nucleotide variants. Furthermore, we show how LegNet can be used in a diffusion network manner for the rational design of promoter sequences yielding the desired expression level. AVAILABILITY AND IMPLEMENTATION: https://github.com/autosome-ru/LegNet. The GitHub repository includes Jupyter Notebook tutorials and Python scripts under the MIT license to reproduce the results presented in the study.


Subjects
Deep Learning , Nucleic Acid Regulatory Sequences , DNA , Genetic Promoter Regions , Software
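
The abstract above describes treating sequence-to-expression regression as a soft classification task. A minimal sketch of that general idea follows, assuming integer bin centers and linear interpolation between the two nearest bins; the function names `soft_label` and `expected_value` are hypothetical, and this is not taken from the LegNet repository, which may define its bins and class probabilities differently.

```python
import math


def soft_label(y, n_bins):
    """Spread a continuous target y over integer bin centers 0..n_bins-1.

    Assumption for illustration: mass is split linearly between the two
    bins that bracket y, so the expected bin index equals y.
    """
    y = min(max(y, 0.0), n_bins - 1.0)  # clamp into the bin range
    lo = int(math.floor(y))
    hi = min(lo + 1, n_bins - 1)
    w_hi = y - lo                        # weight on the upper bin
    probs = [0.0] * n_bins
    probs[lo] += 1.0 - w_hi
    probs[hi] += w_hi
    return probs


def expected_value(probs):
    """Turn predicted bin probabilities back into a scalar expression value."""
    return sum(i * p for i, p in enumerate(probs))


label = soft_label(3.25, 18)
recovered = expected_value(label)
```

A classifier trained against such soft labels outputs one probability per bin, and `expected_value` maps that distribution back to a single predicted expression level.
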
8.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37208164

ABSTRACT

SUMMARY: Generate Indexes for Libraries (GIL) is a software tool for generating primers to be used in the production of multiplexed sequencing libraries. GIL can be customized in numerous ways to meet user specifications, including length, sequencing modality, color balancing, and compatibility with existing primers, and produces ordering and demultiplexing-ready outputs. AVAILABILITY AND IMPLEMENTATION: GIL is written in Python and is freely available on GitHub under the MIT license: https://github.com/de-Boer-Lab/GIL and can be accessed as a web-application implemented in Streamlit at https://dbl-gil.streamlitapp.com.


Subjects
DNA Primers , Software
9.
Nat Genet ; 54(5): 603-612, 2022 05.
Article in English | MEDLINE | ID: mdl-35513721

ABSTRACT

Genome-wide association studies (GWASs) have uncovered hundreds of autoimmune disease-associated loci; however, the causal genetic variants within each locus are mostly unknown. Here, we perform high-throughput allele-specific reporter assays to prioritize disease-associated variants for five autoimmune diseases. By examining variants that both promote allele-specific reporter expression and are located in accessible chromatin, we identify 60 putatively causal variants that enrich for statistically fine-mapped variants by up to 57.8-fold. We introduced the risk allele of a prioritized variant (rs72928038) into a human T cell line and deleted the orthologous sequence in mice, both resulting in reduced BACH2 expression. Naive CD8 T cells from mice containing the deletion had reduced expression of genes that suppress activation and maintain stemness and, upon acute viral infection, displayed greater propensity to become effector T cells. Our results represent an example of an effective approach for prioritizing variants and studying their physiologically relevant effects.


Subjects
Autoimmune Diseases , Genome-Wide Association Study , Alleles , Animals , Autoimmune Diseases/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Mice , Single Nucleotide Polymorphism/genetics , Nucleic Acid Regulatory Sequences , T-Lymphocytes
10.
Nature ; 603(7901): 455-463, 2022 03.
Article in English | MEDLINE | ID: mdl-35264797

ABSTRACT

Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.


Subjects
Genetic Drift , Genetic Models , Biological Evolution , DNA , Molecular Evolution , Gene Expression Regulation , Mutation/genetics , Phenotype , Saccharomyces cerevisiae/genetics
11.
Hum Mol Genet ; 31(12): 1946-1961, 2022 06 22.
Article in English | MEDLINE | ID: mdl-34970970

ABSTRACT

BACKGROUND: FCGR2A binds antibody-antigen complexes to regulate the abundance of circulating and deposited complexes along with downstream immune and autoimmune responses. Although the abundance of FCGR2A may be critical in immune-mediated diseases, little is known about whether its surface expression is regulated through cis genomic elements and non-coding variants. In the current study, we aimed to characterize the regulation of FCGR2A expression, the impact of genetic variation and its association with autoimmune disease. METHODS: We applied CRISPR-based interference and editing to scrutinize 1.7 Mb of open chromatin surrounding the FCGR2A gene to identify regulatory elements. Relevant transcription factors (TFs) binding to these regions were defined through public databases. Genetic variants affecting regulation were identified using luciferase reporter assays and were verified in a cohort of 1996 genotyped healthy individuals using flow cytometry. RESULTS: We identified a complex proximal region and five distal enhancers regulating FCGR2A. The proximal region split into subregions upstream and downstream of the transcription start site, was enriched in binding of inflammation-regulated TFs, and harbored a variant associated with FCGR2A expression in primary myeloid cells. One distal enhancer region was occupied by CCCTC-binding factor (CTCF) whose binding site was disrupted by a rare genetic variant, altering gene expression. CONCLUSIONS: The FCGR2A gene is regulated by multiple proximal and distal genomic regions, with links to autoimmune disease. These findings may open up novel therapeutic avenues where fine-tuning of FCGR2A levels may constitute a part of treatment strategies for immune-mediated diseases.


Subjects
Autoimmune Diseases , Genetic Enhancer Elements , IgG Receptors , Autoimmune Diseases/genetics , Binding Sites , Genomics , Genotype , Humans , IgG Receptors/genetics
12.
Nat Commun ; 12(1): 1611, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33712590

ABSTRACT

Genome-wide association studies of Systemic Lupus Erythematosus (SLE) nominate 3073 genetic variants at 91 risk loci. To systematically screen these variants for allelic transcriptional enhancer activity, we construct a massively parallel reporter assay (MPRA) library comprising 12,396 DNA oligonucleotides containing the genomic context around every allele of each SLE variant. Transfection into the Epstein-Barr virus-transformed B cell line GM12878 reveals 482 variants with enhancer activity, with 51 variants showing genotype-dependent (allelic) enhancer activity at 27 risk loci. Comparison of MPRA results in GM12878 and Jurkat T cell lines highlights shared and unique allelic transcriptional regulatory mechanisms at SLE risk loci. In-depth analysis of allelic transcription factor (TF) binding at and around allelic variants identifies one class of TFs whose DNA-binding motif tends to be directly altered by the risk variant and a second class of TFs that bind allelically without direct alteration of their motif by the variant. Collectively, our approach provides a blueprint for the discovery of allelic gene regulation at risk loci for any disease and offers insight into the transcriptional regulatory mechanisms underlying SLE.


Subjects
Alleles , Genetic Predisposition to Disease/genetics , Systemic Lupus Erythematosus/genetics , B-Lymphocytes , Cell Line , Chromatin , Gene Expression Regulation , Genome-Wide Association Study , Genotype , Human Herpesvirus 4 , Humans , Quantitative Trait Loci , Synaptogyrins/genetics , T-Lymphocytes
13.
Nat Biotechnol ; 38(10): 1211, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32792646

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

14.
Genome Biol ; 21(1): 134, 2020 06 03.
Article in English | MEDLINE | ID: mdl-32493396

ABSTRACT

Improved methods are needed to model CRISPR screen data for interrogation of genetic elements that alter reporter gene expression readout. We create MAUDE (Mean Alterations Using Discrete Expression) for quantifying the impact of guide RNAs on a target gene's expression in a pooled, sorting-based expression screen. MAUDE quantifies guide-level effects by modeling the distribution of cells across sorting expression bins. It then combines guides to estimate the statistical significance and effect size of targeted genetic elements. We demonstrate that MAUDE outperforms previous approaches and provide experimental design guidelines to best leverage MAUDE, which is available on https://github.com/Carldeboer/MAUDE.


Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Gene Expression , Genetic Techniques , Kinetoplastid Guide RNA , Software , Algorithms , CRISPR-Cas Systems , Genetic Models
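
The abstract above says guide-level effects are quantified by modeling the distribution of cells across sorting expression bins. One way to picture that idea is the sketch below, which assumes expression is normally distributed with unit variance, that bin boundaries are given in z-score units, and that a simple grid search is good enough; `bin_log_likelihood` and `estimate_mean` are hypothetical names, and this is an illustration rather than the MAUDE implementation.

```python
import math
from statistics import NormalDist


def bin_log_likelihood(mu, bins, counts):
    """Log-likelihood of per-bin counts if expression ~ Normal(mu, 1)."""
    dist = NormalDist(mu, 1.0)
    total = 0.0
    for (lo, hi), n in zip(bins, counts):
        p = dist.cdf(hi) - dist.cdf(lo)          # expected fraction in this bin
        total += n * math.log(max(p, 1e-300))    # guard against log(0)
    return total


def estimate_mean(bins, counts, grid_step=0.01, span=3.0):
    """Grid-search the mean shift that best explains the observed bin counts."""
    steps = int(round(2 * span / grid_step))
    candidates = [-span + i * grid_step for i in range(steps + 1)]
    return max(candidates, key=lambda mu: bin_log_likelihood(mu, bins, counts))


# Four hypothetical sorting bins, in z-score units of the control distribution.
bins = [(-10.0, -1.0), (-1.0, 0.0), (0.0, 1.0), (1.0, 10.0)]
shifted = estimate_mean(bins, [5, 15, 40, 40])    # guide enriched in high bins
unshifted = estimate_mean(bins, [10, 40, 40, 10]) # guide resembling controls
```

A guide whose counts pile up in the high-expression bins yields a positive estimated mean, while symmetric counts yield an estimate near zero.
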
15.
Nat Commun ; 11(1): 1237, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32144282

ABSTRACT

Genome-wide association studies have associated thousands of genetic variants with complex traits and diseases, but pinpointing the causal variant(s) among those in tight linkage disequilibrium with each associated variant remains a major challenge. Here, we use seven experimental assays to characterize all common variants at the multiple disease-associated TNFAIP3 locus in five disease-relevant immune cell lines, based on a set of features related to regulatory potential. Trait/disease-associated variants are enriched among SNPs prioritized based on either: (1) residing within CRISPRi-sensitive regulatory regions, or (2) localizing in a chromatin accessible region while displaying allele-specific reporter activity. Of the 15 trait/disease-associated haplotypes at TNFAIP3, 9 have at least one variant meeting one or both of these criteria, 5 of which are further supported by genetic fine-mapping. Our work provides a comprehensive strategy to characterize genetic variation at important disease-associated loci, and aids in the effort to identify trait causal genetic variants.


Subjects
Autoimmune Diseases/genetics , Genetic Loci/genetics , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Tumor Necrosis Factor alpha-Induced Protein 3/genetics , Tumor Cell Line , Genetic Predisposition to Disease , Genetic Variation/immunology , Haplotypes/genetics , Haplotypes/immunology , Humans , Linkage Disequilibrium , Multifactorial Inheritance/immunology , Proof of Concept Study
16.
Nat Biotechnol ; 38(1): 56-65, 2020 01.
Article in English | MEDLINE | ID: mdl-31792407

ABSTRACT

How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF's specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.


Subjects
Eukaryota/genetics , Gene Expression Regulation , Logic , Genetic Promoter Regions , Binding Sites , DNA/metabolism , Reporter Genes , Genetic Models , Saccharomyces cerevisiae/genetics , Transcription Factors/metabolism
18.
Cell Rep ; 25(11): 2992-3005.e5, 2018 12 11.
Article in English | MEDLINE | ID: mdl-30540934

ABSTRACT

Long-term hematopoietic stem cells (LT-HSCs) maintain hematopoietic output throughout an animal's lifespan. However, with age, the balance is disrupted, and LT-HSCs produce a myeloid-biased output, resulting in poor immune responses to infectious challenge and the development of myeloid leukemias. Here, we show that young and aged LT-HSCs respond differently to inflammatory stress, such that aged LT-HSCs produce a cell-intrinsic, myeloid-biased expression program. Using single-cell RNA sequencing (scRNA-seq), we identify a myeloid-biased subset within the LT-HSC population (mLT-HSCs) that is prevalent among aged LT-HSCs. We identify CD61 as a marker of mLT-HSCs and show that CD61-high LT-HSCs are uniquely primed to respond to acute inflammatory challenge. We predict that several transcription factors regulate the mLT-HSCs gene program and show that Klf5, Ikzf1, and Stat3 play an important role in age-related inflammatory myeloid bias. We have therefore identified and isolated an LT-HSC subset that regulates myeloid versus lymphoid balance under inflammatory challenge and with age.


Subjects
Aging/pathology , Hematopoietic Stem Cells/metabolism , Inflammation/pathology , Animals , Biomarkers/metabolism , Inflammation/genetics , Ligands , Inbred C57BL Mice , Biological Models , Myeloid Cells/metabolism , Toll-Like Receptors/metabolism , Genetic Transcription
19.
Cell ; 175(4): 998-1013.e20, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30388456

ABSTRACT

Treatment of cancer has been revolutionized by immune checkpoint blockade therapies. Despite the high rate of response in advanced melanoma, the majority of patients succumb to disease. To identify factors associated with success or failure of checkpoint therapy, we profiled transcriptomes of 16,291 individual immune cells from 48 tumor samples of melanoma patients treated with checkpoint inhibitors. Two distinct states of CD8+ T cells were defined by clustering and associated with patient tumor regression or progression. A single transcription factor, TCF7, was visualized within CD8+ T cells in fixed tumor samples and predicted positive clinical outcome in an independent cohort of checkpoint-treated patients. We delineated the epigenetic landscape and clonality of these T cell states and demonstrated enhanced antitumor immunity by targeting novel combinations of factors in exhausted cells. Our study of immune cell transcriptomes from tumors demonstrates a strategy for identifying predictors, mechanisms, and targets for enhancing checkpoint immunotherapy.


Subjects
CD8-Positive T-Lymphocytes/immunology , Immunotherapy/methods , Melanoma/immunology , Transcriptome , Animals , Humanized Monoclonal Antibodies/immunology , Humanized Monoclonal Antibodies/pharmacology , CD Antigens/immunology , Immunological Antineoplastic Agents/immunology , Immunological Antineoplastic Agents/pharmacology , Apyrase/antagonists & inhibitors , Apyrase/immunology , Tumor Cell Line , Humans , Leukocyte Common Antigens/antagonists & inhibitors , Leukocyte Common Antigens/immunology , Melanoma/therapy , Mice , Inbred BALB C Mice , Inbred C57BL Mice , T Cell Transcription Factor 1/metabolism
20.
BMC Bioinformatics ; 19(1): 253, 2018 07 03.
Article in English | MEDLINE | ID: mdl-29970004

ABSTRACT

BACKGROUND: Variation in chromatin organization across single cells can help shed important light on the mechanisms controlling gene expression, but scale, noise, and sparsity pose significant challenges for interpretation of single cell chromatin data. Here, we develop BROCKMAN (Brockman Representation Of Chromatin by K-mers in Mark-Associated Nucleotides), an approach to infer variation in transcription factor (TF) activity across samples through unsupervised analysis of the variation in DNA sequences associated with an epigenomic mark. RESULTS: BROCKMAN represents each sample as a vector of epigenomic-mark-associated DNA word frequencies, and decomposes the resulting matrix to find hidden structure in the data, followed by unsupervised grouping of samples and identification of the TFs that distinguish groups. Applied to single cell ATAC-seq, BROCKMAN readily distinguished cell types, treatments, batch effects, experimental artifacts, and cycling cells. We show that each variable component in the k-mer landscape reflects a set of co-varying TFs, which are often known to physically interact. For example, in K562 cells, AP-1 TFs were a central determinant of variability in chromatin accessibility through their variable expression levels and diverse interactions with other TFs. We provide a theoretical basis for why cooperative TF binding - and any associated epigenomic mark - is inherently more variable than non-cooperative binding. CONCLUSIONS: BROCKMAN and related approaches will help gain a mechanistic understanding of the trans determinants of chromatin variability between cells, treatments, and individuals.


Subjects
Epigenomics/methods , Transcription Factors/metabolism , Binding Sites , Humans
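
The abstract above describes representing each sample as a vector of DNA word (k-mer) frequencies. The sketch below illustrates only that first step, assuming plain contiguous k-mers over the alphabet ACGT and a fixed lexicographic vocabulary; `kmer_frequencies` is a hypothetical helper, not part of the BROCKMAN software, and the matrix decomposition and any k-mer conventions the tool actually uses are omitted.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequences, k):
    """Return the normalized k-mer frequency vector for one sample.

    `sequences` are the DNA sequences associated with an epigenomic mark in
    that sample (for example, accessible regions). K-mers containing
    characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed vocabulary order so vectors from different samples line up.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[w] / total if total else 0.0 for w in vocab]


sample_vector = kmer_frequencies(["ACGT"], 2)
```

Stacking one such vector per sample gives the samples-by-k-mers matrix that a decomposition method could then be applied to.
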