Results 1 - 20 of 27
1.
ACS Synth Biol ; 13(8): 2328-2334, 2024 Aug 16.
Article in English | MEDLINE | ID: mdl-39038190

ABSTRACT

DNA libraries are critical components of many biological assays. These libraries are often kept in plasmids that are amplified in E. coli to generate sufficient material for an experiment. Library uniformity is critical for ensuring that every element in the library is tested similarly and is thought to be influenced by the culture approach used during library amplification. We tested five commonly used culturing methods for their ability to uniformly amplify plasmid libraries: liquid, semisolid agar, cell spreader-spread plates with high or low colony density, and bead-spread plates. Each approach was evaluated with two library types: a random 80-mer library, representing high complexity and low coverage of similar sequence lengths, and a human TF ORF library, representing low complexity and high coverage of diverse sequence lengths. We found that no method was better than liquid culture, which produced relatively uniform libraries regardless of library type. However, when libraries were transformed with high coverage, the culturing method had minimal impact on uniformity or amplification bias. Plating libraries was the worst approach by almost every measure for both library types and, counterintuitively, produced the strongest biases against long sequence representation. Semisolid agar amplified most elements of the library uniformly but also included outliers with orders of magnitude higher abundance. For amplifying DNA libraries, liquid culture, the simplest method, appears to be best.


Subject(s)
Gene Amplification; Plasmids; Plasmids/genetics; Humans; Transcription Factors/genetics; Escherichia coli/genetics; Culture Techniques/methods
2.
Nat Methods ; 21(6): 1033-1043, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38684783

ABSTRACT

Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here we describe the high-throughput, functional assessment of phosphorylation sites through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally resolved phosphoproteomics. Using T cell activation as a model, we observe hundreds of unstudied phosphorylation sites that modulate NFAT transcriptional activity. We identify the phosphorylation-mediated nuclear localization of PHLPP1, which promotes NFAT but inhibits NFκB activity. We also find that specific phosphosite mutants can alter gene expression in subtle yet distinct patterns, demonstrating the potential for fine-tuning transcriptional responses. Overall, base editor screening of PTM sites provides a powerful platform to dissect PTM function within signaling pathways.


Subject(s)
Protein Processing, Post-Translational; Phosphorylation; Humans; NFATC Transcription Factors/metabolism; NFATC Transcription Factors/genetics; Signal Transduction; HEK293 Cells; Proteomics/methods; High-Throughput Screening Assays/methods; T-Lymphocytes/metabolism; Jurkat Cells; NF-kappa B/metabolism
3.
Nat Struct Mol Biol ; 31(3): 559-567, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38448573

ABSTRACT

Genomes encode for genes and non-coding DNA, both capable of transcriptional activity. However, unlike canonical genes, many transcripts from non-coding DNA have limited evidence of conservation or function. Here, to determine how much biological noise is expected from non-genic sequences, we quantify the regulatory activity of evolutionarily naive DNA using RNA-seq in yeast and computational predictions in humans. In yeast, more than 99% of naive DNA bases were transcribed. Unlike the evolved transcriptome, naive transcripts frequently overlapped with opposite sense transcripts, suggesting selection favored coherent gene structures in the yeast genome. In humans, regulation-associated chromatin activity is predicted to be common in naive dinucleotide-content-matched randomized DNA. Here, naive and evolved DNA have similar co-occurrence and cell-type specificity of chromatin marks, challenging these as indicators of selection. However, in both yeast and humans, extreme high activities were rare in naive DNA, suggesting they result from selection. Overall, basal regulatory activity seems to be the default, which selection can hone to evolve a function or, if detrimental, repress.


Subject(s)
Saccharomyces cerevisiae; Transcriptome; Humans; Saccharomyces cerevisiae/genetics; Genome; DNA; Chromatin
4.
bioRxiv ; 2024 Feb 17.
Article in English | MEDLINE | ID: mdl-38405704

ABSTRACT

Neural networks have emerged as immensely powerful tools in predicting functional genomic regions, notably evidenced by recent successes in deciphering gene regulatory logic. However, a systematic evaluation of how model architectures and training strategies impact genomics model performance is lacking. To address this gap, we held a DREAM Challenge where competitors trained models on a dataset of millions of random promoter DNA sequences and corresponding expression levels, experimentally determined in yeast, to best capture the relationship between regulatory DNA and gene expression. For a robust evaluation of the models, we designed a comprehensive suite of benchmarks encompassing various sequence types. While some benchmarks produced similar results across the top-performing models, others differed substantially. All top-performing models used neural networks, but diverged in architectures and novel training strategies, tailored to genomics sequence data. To dissect how architectural and training choices impact performance, we developed the Prix Fixe framework to divide any given model into logically equivalent building blocks. We tested all possible combinations for the top three models and observed performance improvements for each. The DREAM Challenge models not only achieved state-of-the-art results on our comprehensive yeast dataset but also consistently surpassed existing benchmarks on Drosophila and human genomic datasets. Overall, we demonstrate that high-quality gold-standard genomics datasets can drive significant progress in model development.

6.
Nature ; 625(7993): 41-50, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38093018

ABSTRACT

Gene expression is regulated by transcription factors that work together to read cis-regulatory DNA sequences. The 'cis-regulatory code' - how cells interpret DNA sequences to determine when, where and how much genes should be expressed - has proven to be exceedingly complex. Recently, advances in the scale and resolution of functional genomics assays and machine learning have enabled substantial progress towards deciphering this code. However, the cis-regulatory code will probably never be solved if models are trained only on genomic sequences; regions of homology can easily lead to overestimation of predictive performance, and our genome is too short and has insufficient sequence diversity to learn all relevant parameters. Fortunately, randomly synthesized DNA sequences enable testing a far larger sequence space than exists in our genomes, and designed DNA sequences enable targeted queries to maximally improve the models. As the same biochemical principles are used to interpret DNA regardless of its source, models trained on these synthetic data can predict genomic activity, often better than genome-trained models. Here we provide an outlook on the field, and propose a roadmap towards solving the cis-regulatory code by a combination of machine learning and massively parallel assays using synthetic DNA.


Subject(s)
Genomics; Machine Learning; Models, Genetic; Regulatory Sequences, Nucleic Acid; DNA/chemical synthesis; DNA/genetics; DNA/metabolism; Regulatory Sequences, Nucleic Acid/genetics; Transcription Factors/metabolism
7.
bioRxiv ; 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-38014346

ABSTRACT

Signaling pathways that drive gene expression are typically depicted as having a dozen or so landmark phosphorylation and transcriptional events. In reality, thousands of dynamic post-translational modifications (PTMs) orchestrate nearly every cellular function, and we lack technologies to find causal links between these vast biochemical pathways and genetic circuits at scale. Here, we describe "signaling-to-transcription network" mapping through the development of PTM-centric base editing coupled to phenotypic screens, directed by temporally-resolved phosphoproteomics. Using T cell activation as a model, we observe hundreds of unstudied phosphorylation sites that modulate NFAT transcriptional activity. We identify the phosphorylation-mediated nuclear localization of the phosphatase PHLPP1 which promotes NFAT but inhibits NFκB activity. We also find that specific phosphosite mutants can alter gene expression in subtle yet distinct patterns, demonstrating the potential for fine-tuning transcriptional responses. Overall, base editor screening of PTM sites provides a powerful platform to dissect PTM function within signaling pathways.

8.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37490428

ABSTRACT

MOTIVATION: The increasing volume of data from high-throughput experiments including parallel reporter assays facilitates the development of complex deep-learning approaches for modeling DNA regulatory grammar. RESULTS: Here, we introduce LegNet, an EfficientNetV2-inspired convolutional network for modeling short gene regulatory regions. By approaching the sequence-to-expression regression problem as a soft classification task, LegNet secured first place for the autosome.org team in the DREAM 2022 challenge of predicting gene expression from gigantic parallel reporter assays. Using published data, here, we demonstrate that LegNet outperforms existing models and accurately predicts gene expression per se as well as the effects of single-nucleotide variants. Furthermore, we show how LegNet can be used in a diffusion network manner for the rational design of promoter sequences yielding the desired expression level. AVAILABILITY AND IMPLEMENTATION: https://github.com/autosome-ru/LegNet. The GitHub repository includes Jupyter Notebook tutorials and Python scripts under the MIT license to reproduce the results presented in the study.


Subject(s)
Deep Learning; Regulatory Sequences, Nucleic Acid; DNA; Promoter Regions, Genetic; Software
9.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37208164

ABSTRACT

SUMMARY: Generate Indexes for Libraries (GIL) is a software tool for generating primers to be used in the production of multiplexed sequencing libraries. GIL can be customized in numerous ways to meet user specifications, including length, sequencing modality, color balancing, and compatibility with existing primers, and produces ordering and demultiplexing-ready outputs. AVAILABILITY AND IMPLEMENTATION: GIL is written in Python and is freely available on GitHub under the MIT license: https://github.com/de-Boer-Lab/GIL and can be accessed as a web-application implemented in Streamlit at https://dbl-gil.streamlitapp.com.


Subject(s)
DNA Primers; Software
10.
Nat Genet ; 54(5): 603-612, 2022 05.
Article in English | MEDLINE | ID: mdl-35513721

ABSTRACT

Genome-wide association studies (GWASs) have uncovered hundreds of autoimmune disease-associated loci; however, the causal genetic variants within each locus are mostly unknown. Here, we perform high-throughput allele-specific reporter assays to prioritize disease-associated variants for five autoimmune diseases. By examining variants that both promote allele-specific reporter expression and are located in accessible chromatin, we identify 60 putatively causal variants that enrich for statistically fine-mapped variants by up to 57.8-fold. We introduced the risk allele of a prioritized variant (rs72928038) into a human T cell line and deleted the orthologous sequence in mice, both resulting in reduced BACH2 expression. Naive CD8 T cells from mice containing the deletion had reduced expression of genes that suppress activation and maintain stemness and, upon acute viral infection, displayed greater propensity to become effector T cells. Our results represent an example of an effective approach for prioritizing variants and studying their physiologically relevant effects.


Subject(s)
Autoimmune Diseases; Genome-Wide Association Study; Alleles; Animals; Autoimmune Diseases/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Mice; Polymorphism, Single Nucleotide/genetics; Regulatory Sequences, Nucleic Acid; T-Lymphocytes
11.
Nature ; 603(7901): 455-463, 2022 03.
Article in English | MEDLINE | ID: mdl-35264797

ABSTRACT

Mutations in non-coding regulatory DNA sequences can alter gene expression, organismal phenotype and fitness [1-3]. Constructing complete fitness landscapes, in which DNA sequences are mapped to fitness, is a long-standing goal in biology, but has remained elusive because it is challenging to generalize reliably to vast sequence spaces [4-6]. Here we build sequence-to-expression models that capture fitness landscapes and use them to decipher principles of regulatory evolution. Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance, and enable sequence design for expression engineering. Using our models, we study expression divergence under genetic drift and strong-selection weak-mutation regimes to find that regulatory evolution is rapid and subject to diminishing returns epistasis; that conflicting expression objectives in different environments constrain expression adaptation; and that stabilizing selection on gene expression leads to the moderation of regulatory complexity. We present an approach for using such models to detect signatures of selection on expression from natural variation in regulatory sequences and use it to discover an instance of convergent regulatory evolution. We assess mutational robustness, finding that regulatory mutation effect sizes follow a power law, characterize regulatory evolvability, visualize promoter fitness landscapes, discover evolvability archetypes and illustrate the mutational robustness of natural regulatory sequence populations. Our work provides a general framework for designing regulatory sequences and addressing fundamental questions in regulatory evolution.


Subject(s)
Genetic Drift; Models, Genetic; Biological Evolution; DNA; Evolution, Molecular; Gene Expression Regulation; Mutation/genetics; Phenotype; Saccharomyces cerevisiae/genetics
12.
Hum Mol Genet ; 31(12): 1946-1961, 2022 06 22.
Article in English | MEDLINE | ID: mdl-34970970

ABSTRACT

BACKGROUND: FCGR2A binds antibody-antigen complexes to regulate the abundance of circulating and deposited complexes along with downstream immune and autoimmune responses. Although the abundance of FCGR2A may be critical in immune-mediated diseases, little is known about whether its surface expression is regulated through cis genomic elements and non-coding variants. In the current study, we aimed to characterize the regulation of FCGR2A expression, the impact of genetic variation and its association with autoimmune disease. METHODS: We applied CRISPR-based interference and editing to scrutinize 1.7 Mb of open chromatin surrounding the FCGR2A gene to identify regulatory elements. Relevant transcription factors (TFs) binding to these regions were defined through public databases. Genetic variants affecting regulation were identified using luciferase reporter assays and were verified in a cohort of 1996 genotyped healthy individuals using flow cytometry. RESULTS: We identified a complex proximal region and five distal enhancers regulating FCGR2A. The proximal region split into subregions upstream and downstream of the transcription start site, was enriched in binding of inflammation-regulated TFs, and harbored a variant associated with FCGR2A expression in primary myeloid cells. One distal enhancer region was occupied by CCCTC-binding factor (CTCF) whose binding site was disrupted by a rare genetic variant, altering gene expression. CONCLUSIONS: The FCGR2A gene is regulated by multiple proximal and distal genomic regions, with links to autoimmune disease. These findings may open up novel therapeutic avenues where fine-tuning of FCGR2A levels may constitute a part of treatment strategies for immune-mediated diseases.


Subject(s)
Autoimmune Diseases; Enhancer Elements, Genetic; Receptors, IgG; Autoimmune Diseases/genetics; Binding Sites; Genomics; Genotype; Humans; Receptors, IgG/genetics
13.
Nat Commun ; 12(1): 1611, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33712590

ABSTRACT

Genome-wide association studies of Systemic Lupus Erythematosus (SLE) nominate 3073 genetic variants at 91 risk loci. To systematically screen these variants for allelic transcriptional enhancer activity, we construct a massively parallel reporter assay (MPRA) library comprising 12,396 DNA oligonucleotides containing the genomic context around every allele of each SLE variant. Transfection into the Epstein-Barr virus-transformed B cell line GM12878 reveals 482 variants with enhancer activity, with 51 variants showing genotype-dependent (allelic) enhancer activity at 27 risk loci. Comparison of MPRA results in GM12878 and Jurkat T cell lines highlights shared and unique allelic transcriptional regulatory mechanisms at SLE risk loci. In-depth analysis of allelic transcription factor (TF) binding at and around allelic variants identifies one class of TFs whose DNA-binding motif tends to be directly altered by the risk variant and a second class of TFs that bind allelically without direct alteration of their motif by the variant. Collectively, our approach provides a blueprint for the discovery of allelic gene regulation at risk loci for any disease and offers insight into the transcriptional regulatory mechanisms underlying SLE.


Subject(s)
Alleles; Genetic Predisposition to Disease/genetics; Lupus Erythematosus, Systemic/genetics; B-Lymphocytes; Cell Line; Chromatin; Gene Expression Regulation; Genome-Wide Association Study; Genotype; Herpesvirus 4, Human; Humans; Quantitative Trait Loci; Synaptogyrins/genetics; T-Lymphocytes
14.
Nat Biotechnol ; 38(10): 1211, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32792646

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

15.
Genome Biol ; 21(1): 134, 2020 06 03.
Article in English | MEDLINE | ID: mdl-32493396

ABSTRACT

Improved methods are needed to model CRISPR screen data for interrogation of genetic elements that alter reporter gene expression readout. We create MAUDE (Mean Alterations Using Discrete Expression) for quantifying the impact of guide RNAs on a target gene's expression in a pooled, sorting-based expression screen. MAUDE quantifies guide-level effects by modeling the distribution of cells across sorting expression bins. It then combines guides to estimate the statistical significance and effect size of targeted genetic elements. We demonstrate that MAUDE outperforms previous approaches and provide experimental design guidelines to best leverage MAUDE, which is available on https://github.com/Carldeboer/MAUDE.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats; Gene Expression; Genetic Techniques; RNA, Guide, Kinetoplastida; Software; Algorithms; CRISPR-Cas Systems; Models, Genetic
16.
Nat Commun ; 11(1): 1237, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32144282

ABSTRACT

Genome-wide association studies have associated thousands of genetic variants with complex traits and diseases, but pinpointing the causal variant(s) among those in tight linkage disequilibrium with each associated variant remains a major challenge. Here, we use seven experimental assays to characterize all common variants at the multiple disease-associated TNFAIP3 locus in five disease-relevant immune cell lines, based on a set of features related to regulatory potential. Trait/disease-associated variants are enriched among SNPs prioritized based on either: (1) residing within CRISPRi-sensitive regulatory regions, or (2) localizing in a chromatin accessible region while displaying allele-specific reporter activity. Of the 15 trait/disease-associated haplotypes at TNFAIP3, 9 have at least one variant meeting one or both of these criteria, 5 of which are further supported by genetic fine-mapping. Our work provides a comprehensive strategy to characterize genetic variation at important disease-associated loci, and aids in the effort to identify trait causal genetic variants.


Subject(s)
Autoimmune Diseases/genetics; Genetic Loci/genetics; Genome-Wide Association Study/methods; Multifactorial Inheritance/genetics; Tumor Necrosis Factor alpha-Induced Protein 3/genetics; Cell Line, Tumor; Genetic Predisposition to Disease; Genetic Variation/immunology; Haplotypes/genetics; Haplotypes/immunology; Humans; Linkage Disequilibrium; Multifactorial Inheritance/immunology; Proof of Concept Study
17.
Nat Biotechnol ; 38(1): 56-65, 2020 01.
Article in English | MEDLINE | ID: mdl-31792407

ABSTRACT

How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF's specificity, activity and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation.


Subject(s)
Eukaryota/genetics; Gene Expression Regulation; Logic; Promoter Regions, Genetic; Binding Sites; DNA/metabolism; Genes, Reporter; Models, Genetic; Saccharomyces cerevisiae/genetics; Transcription Factors/metabolism
19.
Cell Rep ; 25(11): 2992-3005.e5, 2018 12 11.
Article in English | MEDLINE | ID: mdl-30540934

ABSTRACT

Long-term hematopoietic stem cells (LT-HSCs) maintain hematopoietic output throughout an animal's lifespan. However, with age, the balance is disrupted, and LT-HSCs produce a myeloid-biased output, resulting in poor immune responses to infectious challenge and the development of myeloid leukemias. Here, we show that young and aged LT-HSCs respond differently to inflammatory stress, such that aged LT-HSCs produce a cell-intrinsic, myeloid-biased expression program. Using single-cell RNA sequencing (scRNA-seq), we identify a myeloid-biased subset within the LT-HSC population (mLT-HSCs) that is prevalent among aged LT-HSCs. We identify CD61 as a marker of mLT-HSCs and show that CD61-high LT-HSCs are uniquely primed to respond to acute inflammatory challenge. We predict that several transcription factors regulate the mLT-HSCs gene program and show that Klf5, Ikzf1, and Stat3 play an important role in age-related inflammatory myeloid bias. We have therefore identified and isolated an LT-HSC subset that regulates myeloid versus lymphoid balance under inflammatory challenge and with age.


Subject(s)
Aging/pathology; Hematopoietic Stem Cells/metabolism; Inflammation/pathology; Animals; Biomarkers/metabolism; Inflammation/genetics; Ligands; Mice, Inbred C57BL; Models, Biological; Myeloid Cells/metabolism; Toll-Like Receptors/metabolism; Transcription, Genetic
20.
Cell ; 175(4): 998-1013.e20, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30388456

ABSTRACT

Treatment of cancer has been revolutionized by immune checkpoint blockade therapies. Despite the high rate of response in advanced melanoma, the majority of patients succumb to disease. To identify factors associated with success or failure of checkpoint therapy, we profiled transcriptomes of 16,291 individual immune cells from 48 tumor samples of melanoma patients treated with checkpoint inhibitors. Two distinct states of CD8+ T cells were defined by clustering and associated with patient tumor regression or progression. A single transcription factor, TCF7, was visualized within CD8+ T cells in fixed tumor samples and predicted positive clinical outcome in an independent cohort of checkpoint-treated patients. We delineated the epigenetic landscape and clonality of these T cell states and demonstrated enhanced antitumor immunity by targeting novel combinations of factors in exhausted cells. Our study of immune cell transcriptomes from tumors demonstrates a strategy for identifying predictors, mechanisms, and targets for enhancing checkpoint immunotherapy.


Subject(s)
CD8-Positive T-Lymphocytes/immunology; Immunotherapy/methods; Melanoma/immunology; Transcriptome; Animals; Antibodies, Monoclonal, Humanized/immunology; Antibodies, Monoclonal, Humanized/pharmacology; Antigens, CD/immunology; Antineoplastic Agents, Immunological/immunology; Antineoplastic Agents, Immunological/pharmacology; Apyrase/antagonists & inhibitors; Apyrase/immunology; Cell Line, Tumor; Humans; Leukocyte Common Antigens/antagonists & inhibitors; Leukocyte Common Antigens/immunology; Melanoma/therapy; Mice; Mice, Inbred BALB C; Mice, Inbred C57BL; T Cell Transcription Factor 1/metabolism