Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 8 de 8
Filtrar
Más filtros












Base de datos
Intervalo de año de publicación
1.
Nat Genet ; 2024 Sep 16.
Artículo en Inglés | MEDLINE | ID: mdl-39284975

RESUMEN

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with orthogonal experimental data, providing insights into generalization but offering limited insights into their decision-making process. Existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we present cis-regulatory element model explanations (CREME), an in silico perturbation toolkit that interprets the rules of gene regulation learned by a genomic DNN. Applying CREME to Enformer, a state-of-the-art DNN, we identify cis-regulatory elements that enhance or silence gene expression and characterize their complex interactions. CREME can provide interpretations across multiple scales of genomic organization, from cis-regulatory elements to fine-mapped functional sequence elements within them, offering high-resolution insights into the regulatory architecture of the genome. CREME provides a powerful toolkit for translating the predictions of genomic DNNs into mechanistic insights of gene regulation.

2.
bioRxiv ; 2024 Mar 20.
Artículo en Inglés | MEDLINE | ID: mdl-37461616

RESUMEN

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we introduce CREME, an in silico perturbation toolkit that interrogates large-scale DNNs to uncover rules of gene regulation that it learns. Using CREME, we investigate Enformer, a prominent DNN in gene expression prediction, revealing cis-regulatory elements (CREs) that directly enhance or silence target genes. We explore the intricate complexity of higher-order CRE interactions, the relationship between CRE distance from transcription start sites on gene expression, as well as the biochemical features of enhancers and silencers learned by Enformer. Moreover, we demonstrate the flexibility of CREME to efficiently uncover a higher-resolution view of functional sequence elements within CREs. This work demonstrates how CREME can be employed to translate the powerful predictions of large-scale DNNs to study open questions in gene regulation.

4.
Genome Biol ; 24(1): 105, 2023 05 05.
Artículo en Inglés | MEDLINE | ID: mdl-37143118

RESUMEN

Deep neural networks (DNNs) hold promise for functional genomics prediction, but their generalization capability may be limited by the amount of available data. To address this, we propose EvoAug, a suite of evolution-inspired augmentations that enhance the training of genomic DNNs by increasing genetic variation. Random transformation of DNA sequences can potentially alter their function in unknown ways, so we employ a fine-tuning procedure using the original non-transformed data to preserve functional integrity. Our results demonstrate that EvoAug substantially improves the generalization and interpretability of established DNNs across prominent regulatory genomics prediction tasks, offering a robust solution for genomic DNNs.


Asunto(s)
Genómica , Redes Neurales de la Computación , Genómica/métodos
5.
Nat Cell Biol ; 25(2): 298-308, 2023 02.
Artículo en Inglés | MEDLINE | ID: mdl-36658219

RESUMEN

The EWS-FLI1 fusion oncoprotein deregulates transcription to initiate the paediatric cancer Ewing sarcoma. Here we used a domain-focused CRISPR screen to implicate the transcriptional repressor ETV6 as a unique dependency in this tumour. Using biochemical assays and epigenomics, we show that ETV6 competes with EWS-FLI1 for binding to select DNA elements enriched for short GGAA repeat sequences. Upon inactivating ETV6, EWS-FLI1 overtakes and hyper-activates these cis-elements to promote mesenchymal differentiation, with SOX11 being a key downstream target. We show that squelching of ETV6 with a dominant-interfering peptide phenocopies these effects and suppresses Ewing sarcoma growth in vivo. These findings reveal targeting of ETV6 as a strategy for neutralizing the EWS-FLI1 oncoprotein by reprogramming of genomic occupancy.


Asunto(s)
Sarcoma de Ewing , Niño , Humanos , Sarcoma de Ewing/genética , Sarcoma de Ewing/metabolismo , Sarcoma de Ewing/patología , Línea Celular Tumoral , Regulación Neoplásica de la Expresión Génica , Proteína EWS de Unión a ARN/genética , Proteína EWS de Unión a ARN/metabolismo , Proteína Proto-Oncogénica c-fli-1/genética , Proteína Proto-Oncogénica c-fli-1/metabolismo , Proteínas de Fusión Oncogénica/genética , Proteínas de Fusión Oncogénica/metabolismo
6.
Proc Mach Learn Res ; 200: 131-149, 2022 Nov.
Artículo en Inglés | MEDLINE | ID: mdl-37205975

RESUMEN

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.

7.
Nat Mach Intell ; 4(12): 1088-1100, 2022 Dec.
Artículo en Inglés | MEDLINE | ID: mdl-37324054

RESUMEN

Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression. As new models continue to emerge with different architectures and training configurations, a major bottleneck is forming due to the lack of ability to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here we introduce a unified evaluation framework and use it to compare various binary and quantitative models trained to predict chromatin accessibility data. We highlight various modeling choices that affect generalization performance, including a downstream application of predicting variant effects. In addition, we introduce a robustness metric that can be used to enhance model selection and improve variant effect predictions. Our empirical study largely supports that quantitative modeling of epigenomic profiles leads to better generalizability and interpretability.

8.
Bioinformatics ; 37(24): 4727-4736, 2021 12 11.
Artículo en Inglés | MEDLINE | ID: mdl-34382072

RESUMEN

MOTIVATION: Quantification of isoform abundance has been extensively studied at the mature RNA level using RNA-seq but not at the level of precursor RNAs using nascent RNA sequencing. RESULTS: We address this problem with a new computational method called Deconvolution of Expression for Nascent RNA-sequencing data (DENR), which models nascent RNA-sequencing read-counts as a mixture of user-provided isoforms. The baseline algorithm is enhanced by machine-learning predictions of active transcription start sites and an adjustment for the typical 'shape profile' of read-counts along a transcription unit. We show that DENR outperforms simple read-count-based methods for estimating gene and isoform abundances, and that transcription of multiple pre-RNA isoforms per gene is widespread, with frequent differences between cell types. In addition, we provide evidence that a majority of human isoform diversity derives from primary transcription rather than from post-transcriptional processes. AVAILABILITY AND IMPLEMENTATION: DENR and nascentRNASim are freely available at https://github.com/CshlSiepelLab/DENR (version v1.0.0) and https://github.com/CshlSiepelLab/nascentRNASim (version v0.3.0). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Asunto(s)
Isoformas de ARN , ARN , Humanos , Isoformas de ARN/genética , Programas Informáticos , Isoformas de Proteínas/genética , Análisis de Secuencia de ARN/métodos , Factores Eucarióticos de Iniciación/genética
SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA
...