1.
Science ; 381(6664): eadd1250, 2023 09 22.
Article in English | MEDLINE | ID: mdl-37733848

ABSTRACT

Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.


Subject(s)
Gene Expression Regulation; Microsatellite Repeats; Transcription Factors; Eukaryotic Cells; Transcription Factors/chemistry; Transcription Factors/genetics; Protein Binding; Humans; Animals; Saccharomyces cerevisiae; Protein Domains; Protein Conformation
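The statement in the abstract above that many weakly preferred microstates raise TF density can be illustrated with a generic partition-function calculation. This is a minimal sketch under textbook assumptions, not the model used in the paper; the function name and all free-energy values are hypothetical.

```python
import math

RT_KCAL = 0.0019872 * 298.15  # RT in kcal/mol at 25 C


def effective_delta_g(microstate_dgs, rt=RT_KCAL):
    """Free energy of the 'bound in any microstate' macrostate.

    Sums the Boltzmann weights exp(-dG_i / RT) of the individual
    microstates and converts the sum back into a free energy.
    """
    partition_sum = sum(math.exp(-dg / rt) for dg in microstate_dgs)
    return -rt * math.log(partition_sum)


# Hypothetical numbers: one consensus site alone, then the same site
# flanked by 20 weakly preferred repeat registers.
site_only = effective_delta_g([-10.0])
with_repeat = effective_delta_g([-10.0] + [-8.5] * 20)
```

In this toy setting each extra microstate adds a Boltzmann weight, so `with_repeat` is more negative than `site_only` even though no single repeat register binds as tightly as the consensus site.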
2.
bioRxiv ; 2023 May 11.
Article in English | MEDLINE | ID: mdl-37214836

ABSTRACT

Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models, trained on in vivo TF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diverse in vitro assays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.
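The abstract's remark that biophysical models "predict occupancy based on TF concentration and affinity" can be made concrete with the standard two-state binding isotherm. This is a textbook sketch, not the Affinity Distillation method itself; the function names and default temperature are assumptions chosen for illustration.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal, temp_k=298.15):
    """Dissociation constant (in M, relative to a 1 M standard state)
    from a binding free energy in kcal/mol (negative = favorable)."""
    return math.exp(delta_g_kcal / (R_KCAL * temp_k))


def occupancy(tf_conc_m, kd_m):
    """Equilibrium probability that a single site is bound (two-state model)."""
    return tf_conc_m / (tf_conc_m + kd_m)


def fold_change_in_kd(ddg_kcal, temp_k=298.15):
    """Fold change in Kd caused by a free-energy change ddG (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))
```

Under this model a site is half occupied when the TF concentration equals its Kd, and a sequence change that costs free energy (positive ddG) raises Kd by the factor returned by `fold_change_in_kd`.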

3.
Bioinformatics ; 38(9): 2397-2403, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35238376

ABSTRACT

MOTIVATION: Deep-learning models, such as convolutional neural networks, are able to accurately map biological sequences to associated functional readouts and properties by learning predictive de novo representations. In silico saturation mutagenesis (ISM) is a popular feature attribution technique for inferring contributions of all characters in an input sequence to the model's predicted output. The main drawback of ISM is its runtime, as it involves multiple forward propagations of all possible mutations of each character in the input sequence through the trained model to predict the effects on the output. RESULTS: We present fastISM, an algorithm that speeds up ISM by a factor of over 10× for commonly used convolutional neural network architectures. fastISM is based on the observations that the majority of computation in ISM is spent in convolutional layers, and a single mutation only disrupts a limited region of intermediate layers, rendering most computation redundant. fastISM reduces the gap between backpropagation-based feature attribution methods and ISM. It far surpasses the runtime of backpropagation-based methods on multi-output architectures, making it feasible to run ISM on a large number of sequences. AVAILABILITY AND IMPLEMENTATION: An easy-to-use Keras/TensorFlow 2 implementation of fastISM is available at https://github.com/kundajelab/fastISM. fastISM can be installed using pip install fastism. A hands-on tutorial can be found at https://colab.research.google.com/github/kundajelab/fastISM/blob/master/notebooks/colab/DeepSEA.ipynb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Neural Networks, Computer; Mutagenesis; Mutation
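The runtime problem described in the abstract above is easiest to see in a naive version of in silico saturation mutagenesis. The sketch below is that naive baseline, not the fastISM algorithm; `toy_model` is a hypothetical stand-in for a trained network and simply counts occurrences of an assumed motif.

```python
ALPHABET = "ACGT"


def toy_model(seq, motif="GATA"):
    """Hypothetical scoring function: number of exact motif occurrences."""
    width = len(motif)
    hits = sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)
    return float(hits)


def saturation_mutagenesis(seq, model):
    """Naive ISM: score every single-base substitution against the reference.

    Returns one dict per position mapping each base to
    (mutant score - reference score); the reference base maps to 0.0.
    """
    ref_score = model(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in ALPHABET:
            if alt == ref_base:
                row[alt] = 0.0
            else:
                mutant = seq[:i] + alt + seq[i + 1:]
                row[alt] = model(mutant) - ref_score
        effects.append(row)
    return effects


effects = saturation_mutagenesis("CCGATACC", toy_model)
```

For a sequence of length L this makes 3L + 1 full model evaluations, each of which recomputes everything from scratch; that repeated work is the cost the abstract says fastISM reduces.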
4.
Genome Res ; 32(3): 512-523, 2022 03.
Article in English | MEDLINE | ID: mdl-35042722

ABSTRACT

The intrinsic DNA sequence preferences and cell type-specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type-specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species-specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.


Subject(s)
Neural Networks, Computer; Transcription Factors; Binding Sites; Chromatin Immunoprecipitation Sequencing; Computational Biology/methods; Protein Binding; Transcription Factors/metabolism
5.
Nat Genet ; 53(3): 354-366, 2021 03.
Article in English | MEDLINE | ID: mdl-33603233

ABSTRACT

The arrangement (syntax) of transcription factor (TF) binding motifs is an important part of the cis-regulatory code, yet remains elusive. We introduce a deep learning model, BPNet, that uses DNA sequence to predict base-resolution chromatin immunoprecipitation (ChIP)-nexus binding profiles of pluripotency TFs. We develop interpretation tools to learn predictive motif representations and identify soft syntax rules for cooperative TF binding interactions. Strikingly, Nanog preferentially binds with helical periodicity, and TFs often cooperate in a directional manner, which we validate using clustered regularly interspaced short palindromic repeat (CRISPR)-induced point mutations. Our model represents a powerful general approach to uncover the motifs and syntax of cis-regulatory sequences in genomics data.


Subject(s)
Computational Biology/methods; Nucleotide Motifs; Transcription Factors/metabolism; Animals; Binding Sites; Chromatin Immunoprecipitation; Clustered Regularly Interspaced Short Palindromic Repeats; Deep Learning; Mice; Mouse Embryonic Stem Cells/physiology; Nanog Homeobox Protein/metabolism; Neural Networks, Computer; Octamer Transcription Factor-3/metabolism; Reproducibility of Results; SOXB1 Transcription Factors/metabolism
6.
Bioinformatics ; 35(14): i173-i182, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510661

ABSTRACT

SUMMARY: Support Vector Machines with gapped k-mer kernels (gkm-SVMs) have been used to learn predictive models of regulatory DNA sequence. However, interpreting predictive sequence patterns learned by gkm-SVMs can be challenging. Existing interpretation methods such as deltaSVM, in silico mutagenesis (ISM) or SHAP either do not scale well or make limiting assumptions about the model that can produce misleading results when the gkm kernel is combined with nonlinear kernels. Here, we propose GkmExplain: a computationally efficient feature attribution method for interpreting predictive sequence patterns from gkm-SVM models that has theoretical connections to the method of Integrated Gradients. Using simulated regulatory DNA sequences, we show that GkmExplain identifies predictive patterns with high accuracy while avoiding pitfalls of deltaSVM and ISM and being orders of magnitude more computationally efficient than SHAP. By applying GkmExplain and a recently developed motif discovery method called TF-MoDISco to gkm-SVM models trained on in vivo transcription factor (TF) binding data, we recover consolidated, non-redundant TF motifs. Mutation impact scores derived using GkmExplain consistently outperform deltaSVM and ISM at identifying regulatory genetic variants from gkm-SVM models of chromatin accessibility in lymphoblastoid cell lines. AVAILABILITY AND IMPLEMENTATION: Code and example notebooks to reproduce results are at https://github.com/kundajelab/gkmexplain. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromatin; Protein Binding; Support Vector Machine
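For readers unfamiliar with the gapped k-mer features mentioned in the abstract above, the following sketch counts them directly and compares two sequences by an unnormalized inner product. It is an illustration of the feature space only, under assumed toy parameters (`l=4`, `k=3`); it is neither the gkm-SVM implementation nor GkmExplain, and practical gkm kernels are typically normalized and computed far more efficiently.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=3):
    """Count gapped k-mers: every length-l window with k informative
    positions kept and the remaining l - k positions masked as '.'."""
    counts = Counter()
    for start in range(len(seq) - l + 1):
        word = seq[start:start + l]
        for keep in combinations(range(l), k):
            kept = set(keep)
            gapped = "".join(c if i in kept else "." for i, c in enumerate(word))
            counts[gapped] += 1
    return counts


def gkm_similarity(seq_a, seq_b, l=4, k=3):
    """Unnormalized inner product of the two gapped k-mer count vectors."""
    a = gapped_kmer_counts(seq_a, l, k)
    b = gapped_kmer_counts(seq_b, l, k)
    return sum(a[key] * b[key] for key in a.keys() & b.keys())
```

Because a single mismatch removes only some of the gapped patterns derived from a window, two sequences that differ at one base still share features, which is what makes this representation tolerant of small sequence variation.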
7.
PLoS One ; 14(6): e0218073, 2019.
Article in English | MEDLINE | ID: mdl-31206543

ABSTRACT

The relationship between noncoding DNA sequence and gene expression is not well-understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.


Subject(s)
DNA/genetics; Genes, Reporter/genetics; Polymorphism, Single Nucleotide/genetics; RNA, Untranslated/genetics; Regulatory Sequences, Nucleic Acid/genetics; Alleles; Biological Assay/methods; Cell Line, Tumor; Chromatin/genetics; Genome, Human/genetics; Hep G2 Cells; High-Throughput Nucleotide Sequencing/methods; Humans; K562 Cells; Neural Networks, Computer; Sequence Analysis, DNA/methods; Software
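The abstract above reports agreement between predictions and measurements as a Spearman correlation. As a reminder of what that statistic computes, here is a minimal dependency-free sketch: rank both vectors (averaging ranks for ties) and take the Pearson correlation of the ranks. The function names are illustrative and no data from the study are reproduced.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks matter, the statistic is insensitive to any monotonic rescaling of predicted or measured activity, which suits noisy assay readouts.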
9.
J R Soc Interface ; 15(141), 2018 04.
Article in English | MEDLINE | ID: mdl-29618526

ABSTRACT

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.


Subject(s)
Biomedical Research/trends; Biomedical Technology/trends; Deep Learning/trends; Algorithms; Biomedical Research/methods; Decision Making; Delivery of Health Care/methods; Delivery of Health Care/trends; Disease/genetics; Drug Design; Electronic Health Records/trends; Humans; Terminology as Topic
10.
Circ Res ; 116(5): 804-15, 2015 Feb 27.
Article in English | MEDLINE | ID: mdl-25477501

ABSTRACT

RATIONALE: Neonatal mice have the capacity to regenerate their hearts in response to injury, but this potential is lost after the first week of life. The transcriptional changes that underpin mammalian cardiac regeneration have not been fully characterized at the molecular level. OBJECTIVE: The objectives of our study were to determine whether myocytes revert the transcriptional phenotype to a less differentiated state during regeneration and to systematically interrogate the transcriptional data to identify and validate potential regulators of this process. METHODS AND RESULTS: We derived a core transcriptional signature of injury-induced cardiac myocyte (CM) regeneration in mouse by comparing global transcriptional programs in a dynamic model of in vitro and in vivo CM differentiation, an in vitro CM explant model, as well as a neonatal heart resection model. The regenerating mouse heart revealed a transcriptional reversion of CM differentiation processes, including reactivation of latent developmental programs similar to those observed during destabilization of a mature CM phenotype in the explant model. We identified potential upstream regulators of the core network, including interleukin 13, which induced CM cell cycle entry and STAT6/STAT3 signaling in vitro. We demonstrate that STAT3/periostin and STAT6 signaling are critical mediators of interleukin 13 signaling in CMs. These downstream signaling molecules are also modulated in the regenerating mouse heart. CONCLUSIONS: Our work reveals new insights into the transcriptional regulation of mammalian cardiac regeneration and provides the founding circuitry for identifying potential regulators for stimulating heart regeneration.


Subject(s)
Myocytes, Cardiac/metabolism; Regeneration/physiology; Transcription, Genetic; Animals; Animals, Newborn; Cell Adhesion Molecules/physiology; Cell Cycle; Cell Dedifferentiation/genetics; Cell Differentiation; Cells, Cultured; Culture Media, Serum-Free; DNA Replication; Gene Expression Regulation, Developmental; Gene Regulatory Networks; Heart Ventricles/cytology; Interleukin-13/pharmacology; Interleukin-13/physiology; Interleukin-13 Receptor alpha1 Subunit/antagonists & inhibitors; Interleukin-13 Receptor alpha1 Subunit/genetics; Interleukin-4 Receptor alpha Subunit/antagonists & inhibitors; Interleukin-4 Receptor alpha Subunit/genetics; Mice; Muscle Development; Myocytes, Cardiac/drug effects; RNA Interference; RNA, Small Interfering/pharmacology; Rats; Rats, Sprague-Dawley; STAT3 Transcription Factor/physiology; STAT6 Transcription Factor/physiology; Sequence Alignment; Transcription Factors/physiology; Transcriptome
11.
Cell ; 151(1): 206-20, 2012 Sep 28.
Article in English | MEDLINE | ID: mdl-22981692

ABSTRACT

Heart development is exquisitely sensitive to the precise temporal regulation of thousands of genes that govern developmental decisions during differentiation. However, we currently lack a detailed understanding of how chromatin and gene expression patterns are coordinated during developmental transitions in the cardiac lineage. Here, we interrogated the transcriptome and several histone modifications across the genome during defined stages of cardiac differentiation. We find distinct chromatin patterns that are coordinated with stage-specific expression of functionally related genes, including many human disease-associated genes. Moreover, we discover a novel preactivation chromatin pattern at the promoters of genes associated with heart development and cardiac function. We further identify stage-specific distal enhancer elements and find enriched DNA binding motifs within these regions that predict sets of transcription factors that orchestrate cardiac differentiation. Together, these findings form a basis for understanding developmentally regulated chromatin transitions during lineage commitment and the molecular etiology of congenital heart disease.


Subject(s)
Epigenesis, Genetic; Gene Regulatory Networks; Myocardium/cytology; Animals; Cell Differentiation; Chromatin/metabolism; Embryonic Stem Cells/metabolism; Enhancer Elements, Genetic; Heart/embryology; Humans; Mice; Transcription Factors/metabolism; Transcriptome