Search | PAHO/WHO Uruguay

1.

Learning single-cell chromatin accessibility profiles using meta-analytic marker genes.

Kawaguchi, Risa Karakida; Tang, Ziqi; Fischer, Stephan; Rajesh, Chandana; Tripathy, Rohit; Koo, Peter K; Gillis, Jesse.

Brief Bioinform; 24(1), 2023 Jan 19.

Article in English | MEDLINE | ID: mdl-36549922

ABSTRACT

MOTIVATION: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for a sparse scATAC-seq annotation across the data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable to or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.

Subject(s)

Chromatin; Epigenesis, Genetic; Animals; Mice; Chromatin/genetics; Regulatory Sequences, Nucleic Acid; Neural Networks, Computer
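
The abstract of record 1 reports that simple aggregation of redundant marker genes can match machine-learning classifiers for cell-type annotation. The following is a minimal sketch of what such an aggregation could look like, not the authors' pipeline: the function name, the use of a mean over marker genes, the gene-activity matrix, and the toy marker sets are all assumptions made for illustration.

```python
import numpy as np

def annotate_by_marker_aggregation(activity, gene_names, marker_sets):
    """Label each cell with the cell type whose marker genes have the highest mean activity.

    activity    : (n_cells, n_genes) array of gene-activity scores (hypothetical input)
    gene_names  : list of gene names matching the columns of `activity`
    marker_sets : dict mapping a cell-type label to a list of marker gene names
    """
    activity = np.asarray(activity, dtype=float)
    gene_index = {g: i for i, g in enumerate(gene_names)}
    labels = list(marker_sets)
    scores = np.zeros((activity.shape[0], len(labels)))
    for j, label in enumerate(labels):
        # Keep only the markers that are present in this dataset.
        cols = [gene_index[g] for g in marker_sets[label] if g in gene_index]
        if cols:
            scores[:, j] = activity[:, cols].mean(axis=1)
    best = scores.argmax(axis=1)
    return [labels[k] for k in best], scores

# Toy data: three cells, four genes; the marker sets are illustrative only.
genes = ["Gad1", "Gad2", "Slc17a7", "Neurod6"]
markers = {"inhibitory": ["Gad1", "Gad2"], "excitatory": ["Slc17a7", "Neurod6"]}
cells = [[1, 0, 0, 0],
         [0, 0, 1, 1],
         [0, 1, 0, 0]]
predicted, score_matrix = annotate_by_marker_aggregation(cells, genes, markers)
```

Averaging over several markers per cell type is what makes the score tolerant of dropout: a cell that misses one marker can still be recovered through the others.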

2.

EvoAug-TF: extending evolution-inspired data augmentations for genomic deep learning to TensorFlow.

Yu, Yiyang; Muthukumar, Shivani; Koo, Peter K.

Bioinformatics; 40(3), 2024 Mar 04.

Article in English | MEDLINE | ID: mdl-38366935

ABSTRACT

SUMMARY: Deep neural networks (DNNs) have been widely applied to predict the molecular functions of the non-coding genome. DNNs are data hungry and thus require many training examples to fit data well. However, functional genomics experiments typically generate limited amounts of data, constrained by the activity levels of the molecular function under study inside the cell. Recently, EvoAug was introduced to train a genomic DNN with evolution-inspired augmentations. EvoAug-trained DNNs have demonstrated improved generalization and interpretability with attribution analysis. However, EvoAug only supports PyTorch-based models, which limits its applications to a broad class of genomic DNNs based on TensorFlow. Here, we extend EvoAug's functionality to TensorFlow in a new package we call EvoAug-TF. Through a systematic benchmark, we find that EvoAug-TF yields performance comparable to that of the original EvoAug package. AVAILABILITY AND IMPLEMENTATION: EvoAug-TF is freely available for users and is distributed under an open-source MIT license. Researchers can access the open-source code on GitHub (https://github.com/p-koo/evoaug-tf). The pre-compiled package is provided via PyPI (https://pypi.org/project/evoaug-tf) with in-depth documentation on ReadTheDocs (https://evoaug-tf.readthedocs.io). The scripts for reproducing the results are available at https://github.com/p-koo/evoaug-tf_analysis.

Subject(s)

Deep Learning; Genomics/methods; Genome; Software; Neural Networks, Computer
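
To make "evolution-inspired augmentations" in record 2 concrete, here is a small NumPy sketch of two sequence transformations in that spirit: random point mutation and reverse-complementation of one-hot DNA. It deliberately does not use or imitate the EvoAug-TF API, which is documented at the ReadTheDocs link above. The function names, the A, C, G, T channel order, and the mutation fraction are assumptions chosen for illustration.

```python
import numpy as np

def random_mutation(x, mutate_frac, rng):
    """Replace a random fraction of positions in a one-hot (L, 4) sequence with random nucleotides."""
    x = x.copy()
    length = x.shape[0]
    n_mut = int(round(mutate_frac * length))
    positions = rng.choice(length, size=n_mut, replace=False)
    new_bases = rng.integers(0, 4, size=n_mut)
    x[positions] = 0               # clear the chosen positions
    x[positions, new_bases] = 1    # write one randomly drawn base per position
    return x

def reverse_complement(x):
    """Reverse-complement a one-hot (L, 4) sequence, assuming channel order A, C, G, T."""
    # Reversing the length axis reverses the strand; reversing the channel
    # axis swaps A with T and C with G.
    return x[::-1, ::-1].copy()

rng = np.random.default_rng(0)
seq = np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=20)]  # random 20-nt sequence
aug = random_mutation(seq, mutate_frac=0.1, rng=rng)
rc = reverse_complement(seq)
```

Because a drawn base can coincide with the original one, the number of positions that actually change is at most `mutate_frac * L`.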

3.

ZBED2 is an antagonist of interferon regulatory factor 1 and modifies cell identity in pancreatic cancer.

Somerville, Tim D D; Xu, Yali; Wu, Xiaoli S; Maia-Silva, Diogo; Hur, Stella K; de Almeida, Larissa M N; Preall, Jonathan B; Koo, Peter K; Vakoc, Christopher R.

Proc Natl Acad Sci U S A; 117(21): 11471-11482, 2020 May 26.

Article in English | MEDLINE | ID: mdl-32385160

ABSTRACT

Lineage plasticity is a prominent feature of pancreatic ductal adenocarcinoma (PDA) cells, which can occur via deregulation of lineage-specifying transcription factors. Here, we show that the zinc finger protein ZBED2 is aberrantly expressed in PDA and alters tumor cell identity in this disease. Unexpectedly, our epigenomic experiments reveal that ZBED2 is a sequence-specific transcriptional repressor of IFN-stimulated genes, which occurs through antagonism of IFN regulatory factor 1 (IRF1)-mediated transcriptional activation at cooccupied promoter elements. Consequently, ZBED2 attenuates the transcriptional output and growth arrest phenotypes downstream of IFN signaling in multiple PDA cell line models. We also found that ZBED2 is preferentially expressed in the squamous molecular subtype of human PDA, in association with inferior patient survival outcomes. Consistent with this observation, we show that ZBED2 can repress the pancreatic progenitor transcriptional program, enhance motility, and promote invasion in PDA cells. Collectively, our findings suggest that high ZBED2 expression is acquired during PDA progression to suppress the IFN response pathway and to promote lineage plasticity in this disease.

Subject(s)

Carcinoma, Pancreatic Ductal/pathology; DNA-Binding Proteins/metabolism; Interferon Regulatory Factor-1/metabolism; Pancreatic Neoplasms/pathology; Transcription Factors/metabolism; Animals; Carcinoma, Pancreatic Ductal/genetics; Carcinoma, Pancreatic Ductal/metabolism; Carcinoma, Pancreatic Ductal/mortality; Cell Line, Tumor; Cell Proliferation/drug effects; Chromatin Immunoprecipitation; DNA-Binding Proteins/genetics; Gene Expression Regulation, Neoplastic; Humans; Interferon Regulatory Factor-1/genetics; Interferon-gamma/pharmacology; Mice; Pancreatic Neoplasms/genetics; Pancreatic Neoplasms/metabolism; Pancreatic Neoplasms/mortality; Promoter Regions, Genetic; Survival Analysis; Transcription Factors/genetics

4.

Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks.

Koo, Peter K; Majdandzic, Antonio; Ploenzke, Matthew; Anand, Praveen; Paul, Steffan B.

PLoS Comput Biol; 17(5): e1008925, 2021 May.

Article in English | MEDLINE | ID: mdl-33983921

ABSTRACT

Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses of putative patterns and their interactions with other patterns, as well as map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network we call ResidualBind and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias.

Subject(s)

Deep Learning; Genomics; Neural Networks, Computer; Computational Biology/methods; Humans
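
The population-level effect size described in record 4 can be illustrated by embedding a putative pattern into many background sequences and averaging the change in model output. The sketch below is written under stated assumptions rather than taken from the paper's code: `toy_model` is a hypothetical stand-in for a trained network, the backgrounds are uniform random RNA sequences, and the motif `UGCAUG` is an arbitrary choice for the example.

```python
import numpy as np

ALPHABET = "ACGU"

def one_hot(seq):
    """One-hot encode an RNA string into an array of shape (len(seq), 4)."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    return np.eye(4)[idx]

def toy_model(x):
    """Hypothetical stand-in for a trained network.

    Scores each sequence in x (n, L, 4) by its best sliding-window match
    to UGCAUG, scaled to the range [0, 1].
    """
    motif = one_hot("UGCAUG")
    k = motif.shape[0]
    windows = np.stack([x[:, i:i + k, :] for i in range(x.shape[1] - k + 1)], axis=1)
    matches = (windows * motif).sum(axis=(2, 3))  # matching positions per window
    return matches.max(axis=1) / k

def global_importance(model_fn, pattern, position, n_samples, seq_len, rng):
    """Mean change in model output when `pattern` is embedded in random backgrounds."""
    background = np.eye(4)[rng.integers(0, 4, size=(n_samples, seq_len))]
    embedded = background.copy()
    embedded[:, position:position + len(pattern), :] = one_hot(pattern)
    return float(np.mean(model_fn(embedded) - model_fn(background)))

gi_motif = global_importance(toy_model, "UGCAUG", 17, 200, 40, np.random.default_rng(0))
gi_control = global_importance(toy_model, "AAAAAA", 17, 200, 40, np.random.default_rng(0))
```

Comparing `gi_motif` with `gi_control` mirrors the hypothesis-testing use described in the abstract: the same intervention is applied with a candidate pattern and with a control pattern, and the difference in mean effect is the quantity of interest.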

5.

Representation learning of genomic sequence motifs with convolutional neural networks.

Koo, Peter K; Eddy, Sean R.

PLoS Comput Biol; 15(12): e1007560, 2019 Dec.

Article in English | MEDLINE | ID: mdl-31856220

ABSTRACT

Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs (assembling partial features into whole features in deeper layers) tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.

Subject(s)

Genomics/statistics & numerical data; Neural Networks, Computer; Amino Acid Motifs; Binding Sites/genetics; Computational Biology; Computer Simulation; DNA/genetics; Databases, Genetic/statistics & numerical data; Deep Learning/statistics & numerical data; Genome, Human; Humans; Transcription Factors/chemistry; Transcription Factors/genetics; Transcription Factors/metabolism

6.

A demonstration of unsupervised machine learning in species delimitation.

Derkarabetian, Shahan; Castillo, Stephanie; Koo, Peter K; Ovchinnikov, Sergey; Hedin, Marshal.

Mol Phylogenet Evol; 139: 106562, 2019 Oct.

Article in English | MEDLINE | ID: mdl-31323334

ABSTRACT

One major challenge to delimiting species with genetic data is successfully differentiating population structure from species-level divergence, an issue exacerbated in taxa inhabiting naturally fragmented habitats. Many fields of science are now using machine learning, and in evolutionary biology supervised machine learning has recently been used to infer species boundaries. These supervised methods require training data with associated labels. Conversely, unsupervised machine learning (UML) uses inherent data structure and does not require user-specified training labels, potentially providing more objectivity in species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three UML approaches (random forests, variational autoencoders, t-distributed stochastic neighbor embedding) for species delimitation in an arachnid taxon with high population genetic structure (Opiliones, Laniatores, Metanonychus). We find that UML approaches successfully cluster samples according to species-level divergences and not high levels of population structure, while model-based validation methods severely over-split putative species. UML offers intuitive data visualization in two-dimensional space, the ability to accommodate various data types, and has potential in many areas of systematic and evolutionary biology. We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa with diverse biological characteristics.

Subject(s)

Unsupervised Machine Learning; Animals; Arachnida/classification; Arachnida/genetics; Cluster Analysis; Phylogeny; Polymorphism, Single Nucleotide; Principal Component Analysis

7.

Improved Determination of Subnuclear Position Enabled by Three-Dimensional Membrane Reconstruction.

Zhao, Yao; Schreiner, Sarah M; Koo, Peter K; Colombi, Paolo; King, Megan C; Mochrie, Simon G J.

Biophys J; 111(1): 19-24, 2016 Jul 12.

Article in English | MEDLINE | ID: mdl-27410730

ABSTRACT

Many aspects of chromatin biology are influenced by the nuclear compartment in which a locus resides, from transcriptional regulation to DNA repair. Further, the dynamic and variable localization of a particular locus across cell populations and over time makes analysis of a large number of cells critical. As a consequence, robust and automatable methods to measure the position of individual loci within the nuclear volume in populations of cells are necessary to support quantitative analysis of nuclear position. Here, we describe a three-dimensional membrane reconstruction approach that uses fluorescently tagged nuclear envelope or endoplasmic reticulum membrane marker proteins to precisely map the nuclear volume. This approach is robust to a variety of nuclear shapes, providing greater biological accuracy than alternative methods that enforce nuclear circularity, while also describing nuclear position in all three dimensions. By combining this method with established approaches to reconstruct the position of diffraction-limited chromatin markers (in this case, lac Operator arrays bound by lacI-GFP), the distribution of loci positions within the nuclear volume with respect to the nuclear periphery can be quantitatively obtained. This stand-alone image analysis pipeline should be of broad practical utility for individuals interested in various aspects of chromatin biology, while also providing, to our knowledge, a new conceptual framework for investigators who study organelle shape.

Subject(s)

Imaging, Three-Dimensional; Nuclear Envelope/metabolism; Animals; Endoplasmic Reticulum/metabolism; Fluorescent Dyes/metabolism; Mice; Models, Biological; NIH 3T3 Cells; Schizosaccharomyces/cytology

8.

Extracting Diffusive States of Rho GTPase in Live Cells: Towards In Vivo Biochemistry.

Koo, Peter K; Weitzman, Matthew; Sabanaygam, Chandran R; van Golen, Kenneth L; Mochrie, Simon G J.

PLoS Comput Biol; 11(10): e1004297, 2015 Oct.

Article in English | MEDLINE | ID: mdl-26512894

ABSTRACT

Resolving distinct biochemical interaction states when analyzing the trajectories of diffusing proteins in live cells on an individual basis remains challenging because of the limited statistics provided by the relatively short trajectories available experimentally. Here, we introduce a novel, machine-learning based classification methodology, which we call perturbation expectation-maximization (pEM), that simultaneously analyzes a population of protein trajectories to uncover the system of diffusive behaviors which collectively result from distinct biochemical interactions. We validate the performance of pEM in silico and demonstrate that pEM is capable of uncovering the proper number of underlying diffusive states with an accurate characterization of their diffusion properties. We then apply pEM to experimental protein trajectories of Rho GTPases, an integral regulator of cytoskeletal dynamics and cellular homeostasis, in vivo via single particle tracking photo-activated localization microscopy. Remarkably, pEM uncovers 6 distinct diffusive states conserved across various Rho GTPase family members. The variability across family members in the propensities for each diffusive state reveals non-redundant roles in the activation states of RhoA and RhoC. In a resting cell, our results support a model where RhoA is constantly cycling between activation states, with an imbalance of rates favoring an inactive state. RhoC, on the other hand, remains predominantly inactive.

Subject(s)

Diffusion; Models, Biological; Models, Chemical; Molecular Imaging/methods; Subcellular Fractions/chemistry; rho GTP-Binding Proteins/chemistry; Computer Simulation; Machine Learning; Models, Statistical

9.

Interpreting Cis-Regulatory Interactions from Large-Scale Deep Neural Networks for Genomics.

Toneyan, Shushan; Koo, Peter K.

bioRxiv; 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-37461616

ABSTRACT

The rise of large-scale, sequence-based deep neural networks (DNNs) for predicting gene expression has introduced challenges in their evaluation and interpretation. Current evaluations align DNN predictions with experimental perturbation assays, which provides insights into the generalization capabilities within the studied loci but offers a limited perspective of what drives their predictions. Moreover, existing model explainability tools focus mainly on motif analysis, which becomes complex when interpreting longer sequences. Here we introduce CREME, an in silico perturbation toolkit that interrogates large-scale DNNs to uncover the rules of gene regulation that they learn. Using CREME, we investigate Enformer, a prominent DNN in gene expression prediction, revealing cis-regulatory elements (CREs) that directly enhance or silence target genes. We explore the intricate complexity of higher-order CRE interactions, the relationship between CRE distance from transcription start sites and gene expression, as well as the biochemical features of enhancers and silencers learned by Enformer. Moreover, we demonstrate the flexibility of CREME to efficiently uncover a higher-resolution view of functional sequence elements within CREs. This work demonstrates how CREME can be employed to translate the powerful predictions of large-scale DNNs to study open questions in gene regulation.

10.

Evaluating the representational power of pre-trained DNA language models for regulatory genomics.

Tang, Ziqi; Koo, Peter K.

bioRxiv; 2024 Mar 03.

Article in English | MEDLINE | ID: mdl-38464101

ABSTRACT

The emergence of genomic language models (gLMs) offers an unsupervised approach to learn a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown pre-trained gLMs can be leveraged to improve prediction performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that current gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major limitation with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.

11.

EvoAug-TF: Extending evolution-inspired data augmentations for genomic deep learning to TensorFlow.

Yu, Yiyang; Muthukumar, Shivani; Koo, Peter K.

bioRxiv; 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-38293144

ABSTRACT

Deep neural networks (DNNs) have been widely applied to predict the molecular functions of regulatory regions in the non-coding genome. DNNs are data hungry and thus require many training examples to fit data well. However, functional genomics experiments typically generate limited amounts of data, constrained by the activity levels of the molecular function under study inside the cell. Recently, EvoAug was introduced to train a genomic DNN with evolution-inspired augmentations. EvoAug-trained DNNs have demonstrated improved generalization and interpretability with attribution analysis. However, EvoAug only supports PyTorch-based models, which limits its applications to a broad class of genomic DNNs based on TensorFlow. Here, we extend EvoAug's functionality to TensorFlow in a new package we call EvoAug-TF. Through a systematic benchmark, we find that EvoAug-TF yields performance comparable to that of the original EvoAug package. Availability: EvoAug-TF is freely available for users and is distributed under an open-source MIT license. Researchers can access the open-source code on GitHub (https://github.com/p-koo/evoaug-tf). The pre-compiled package is provided via PyPI (https://pypi.org/project/evoaug-tf) with in-depth documentation on ReadTheDocs (https://evoaug-tf.readthedocs.io). The scripts for reproducing the results are available at https://github.com/p-koo/evoaug-tf_analysis.

12.

Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models.

Seitz, Evan E; McCandlish, David M; Kinney, Justin B; Koo, Peter K.

bioRxiv; 2024 Mar 02.

Article in English | MEDLINE | ID: mdl-38013993

ABSTRACT

Deep neural networks (DNNs) have greatly advanced the ability to predict genome function from sequence. Interpreting genomic DNNs in terms of biological mechanisms, however, remains difficult. Here we introduce SQUID, a genomic DNN interpretability framework based on surrogate modeling. SQUID approximates genomic DNNs in user-specified regions of sequence space using surrogate models, i.e., simpler models that are mechanistically interpretable. Importantly, SQUID removes the confounding effects that nonlinearities and heteroscedastic noise in functional genomics data can have on model interpretation. Benchmarking analysis on multiple genomic DNNs shows that SQUID, when compared to established interpretability methods, identifies motifs that are more consistent across genomic loci and yields improved single-nucleotide variant-effect predictions. SQUID also supports surrogate models that quantify epistatic interactions within and between cis-regulatory elements. SQUID thus advances the ability to mechanistically interpret genomic DNNs.

13.

Interpretably deep learning amyloid nucleation by massive experimental quantification of random sequences.

Thompson, Mike; Martín, Mariano; Sanmartín Olmo, Trinidad; Rajesh, Chandana; Koo, Peter K; Bolognesi, Benedetta; Lehner, Ben.

bioRxiv; 2024 Jul 17.

Article in English | MEDLINE | ID: mdl-39071305

ABSTRACT

Insoluble amyloid aggregates are the hallmarks of more than fifty human diseases, including the most common neurodegenerative disorders. The process by which soluble proteins nucleate to form amyloid fibrils is, however, quite poorly characterized. Relatively few sequences are known that form amyloids with high propensity and this data shortage likely limits our capacity to understand, predict, engineer, and prevent the formation of amyloid fibrils. Here we quantify the nucleation of amyloids at an unprecedented scale and use the data to train a deep learning model of amyloid nucleation. In total, we quantify the nucleation rates of >100,000 20-amino-acid-long peptides. This large and diverse dataset allows us to train CANYA, a convolution-attention hybrid neural network. CANYA is fast and outperforms existing methods with stable performance across diverse prediction tasks. Interpretability analyses reveal CANYA's decision-making process and learned grammar, providing mechanistic insights into amyloid nucleation. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict amyloid nucleation.

14.

Correcting gradient-based interpretations of deep neural networks for genomics.

Majdandzic, Antonio; Rajesh, Chandana; Koo, Peter K.

Genome Biol; 24(1): 109, 2023 May 09.

Article in English | MEDLINE | ID: mdl-37161475

ABSTRACT

Post hoc attribution methods can provide insights into the learned patterns from deep neural networks (DNNs) trained on high-throughput functional genomics data. However, in practice, their resultant attribution maps can be challenging to interpret due to spurious importance scores for seemingly arbitrary nucleotides. Here, we identify a previously overlooked attribution noise source that arises from how DNNs handle one-hot encoded DNA. We demonstrate this noise is pervasive across various genomic DNNs and introduce a statistical correction that effectively reduces it, leading to more reliable attribution maps. Our approach represents a promising step towards gaining meaningful insights from DNNs in regulatory genomics.

Subject(s)

Genomics; Learning; Neural Networks, Computer; Nucleotides
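
The abstract of record 14 does not spell out its statistical correction, so the sketch below should be read as an assumption about what a correction of this kind can look like, not as the authors' implementation. One-hot encoded DNA always sums to one at each position, so a component of the gradient that is constant across the four nucleotide channels carries no information about which base is preferred; subtracting the per-position mean removes it. The function name and array layout are hypothetical.

```python
import numpy as np

def correct_gradients(grads):
    """Subtract the per-position mean over the nucleotide axis.

    grads: gradient-based attribution array of shape (..., L, 4).
    Returns an array of the same shape whose four channels sum to zero
    at every position. This is an assumed form of the correction.
    """
    grads = np.asarray(grads, dtype=float)
    return grads - grads.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
raw = rng.normal(size=(2, 10, 4))                 # stand-in for raw gradients
offsets = rng.normal(size=(2, 10, 1))             # arbitrary per-position shifts
corrected = correct_gradients(raw)
corrected_shifted = correct_gradients(raw + offsets)
```

In this toy example `corrected` and `corrected_shifted` agree, which shows that the correction is insensitive to any per-position offset shared by all four channels.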

15.

EvoAug: improving generalization and interpretability of genomic deep neural networks with evolution-inspired data augmentations.

Lee, Nicholas Keone; Tang, Ziqi; Toneyan, Shushan; Koo, Peter K.

Genome Biol; 24(1): 105, 2023 May 05.

Article in English | MEDLINE | ID: mdl-37143118

ABSTRACT

Deep neural networks (DNNs) hold promise for functional genomics prediction, but their generalization capability may be limited by the amount of available data. To address this, we propose EvoAug, a suite of evolution-inspired augmentations that enhance the training of genomic DNNs by increasing genetic variation. Random transformation of DNA sequences can potentially alter their function in unknown ways, so we employ a fine-tuning procedure using the original non-transformed data to preserve functional integrity. Our results demonstrate that EvoAug substantially improves the generalization and interpretability of established DNNs across prominent regulatory genomics prediction tasks, offering a robust solution for genomic DNNs.

Subject(s)

Genomics; Neural Networks, Computer; Genomics/methods

16.

Light and temperature regulate m⁶A-RNA modification to regulate growth in plants.

Artz, Oliver; Ackermann, Amanda; Taylor, Laura; Koo, Peter K; Pedmale, Ullas V.

bioRxiv; 2023 Jan 17.

Article in English | MEDLINE | ID: mdl-36711495

ABSTRACT

N6-methyladenosine (m6A) is a highly dynamic, abundant mRNA modification and an excellent potential mechanism for fine-tuning gene expression. Plants adapt to their surrounding light and temperature environment using complex gene regulatory networks. The role of m6A in controlling gene expression in response to variable environmental conditions has so far been unexplored. Here, we map the transcriptome-wide m6A landscape under various light and temperature environments. Identified m6A modifications show a highly specific spatial distribution along transcripts, with enrichment occurring in 5'UTR regions and around transcriptional end sites. We show that the position of m6A modifications on transcripts might influence cellular transcript localization and that the presence of m6A modifications is associated with alternative polyadenylation, a process which results in multiple RNA isoforms with varying 3'UTR lengths. RNAs with m6A modifications exhibit a preference for shorter 3'UTRs. These shorter 3'UTR regions might directly influence transcript abundance and localization by including or excluding cis-regulatory elements. We propose that environmental stimuli might change the m6A landscape of plants as one possible way of fine-tuning gene regulation through alternative polyadenylation and transcript localization.

17.

ResidualBind: Uncovering Sequence-Structure Preferences of RNA-Binding Proteins with Deep Neural Networks.

Koo, Peter K; Ploenzke, Matt; Anand, Praveen; Paul, Steffan; Majdandzic, Antonio.

Methods Mol Biol; 2586: 197-215, 2023.

Article in English | MEDLINE | ID: mdl-36705906

ABSTRACT

Deep neural networks have demonstrated improved performance at predicting sequence specificities of DNA- and RNA-binding proteins. However, it remains unclear why they perform better than previous methods that rely on k-mers and position weight matrices. Here, we highlight a recent deep learning-based software package, called ResidualBind, that analyzes RNA-protein interactions using only RNA sequence as an input feature and performs global importance analysis for model interpretability. We discuss practical considerations for model interpretability to uncover learned sequence motifs and their secondary structure preferences.

Subject(s)

Neural Networks, Computer; RNA; RNA/genetics; RNA-Binding Proteins/metabolism; DNA/metabolism; Position-Specific Scoring Matrices; Protein Binding

18.

ChampKit: A framework for rapid evaluation of deep neural networks for patch-based histopathology classification.

Kaczmarzyk, Jakub R; Gupta, Rajarsi; Kurc, Tahsin M; Abousamra, Shahira; Saltz, Joel H; Koo, Peter K.

Comput Methods Programs Biomed; 239: 107631, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37271050

ABSTRACT

BACKGROUND AND OBJECTIVE: Histopathology is the gold standard for diagnosis of many cancers. Recent advances in computer vision, specifically deep learning, have facilitated the analysis of histopathology images for many tasks, including the detection of immune cells and microsatellite instability. However, it remains difficult to identify optimal models and training configurations for different histopathology classification tasks due to the abundance of available architectures and the lack of systematic evaluations. Our objective in this work is to present a software tool that addresses this need and enables robust, systematic evaluation of neural network models for patch classification in histology in a lightweight, easy-to-use package for both algorithm developers and biomedical researchers. METHODS: Here we present ChampKit (Comprehensive Histopathology Assessment of Model Predictions toolKit): an extensible, fully reproducible evaluation toolkit that is a one-stop shop to train and evaluate deep neural networks for patch classification. ChampKit curates a broad range of public datasets. It enables training and evaluation of models supported by timm directly from the command line, without the need for users to write any code. External models are enabled through a straightforward API and minimal coding. As a result, ChampKit facilitates the evaluation of existing and new models and deep learning architectures on pathology datasets, making it more accessible to the broader scientific community. To demonstrate the utility of ChampKit, we establish baseline performance for a subset of possible models that could be employed with ChampKit, focusing on several popular deep learning models, namely ResNet18, ResNet50, and R26-ViT, a hybrid vision transformer. In addition, we compare each model trained either from random weight initialization or with transfer learning from ImageNet pretrained models. For ResNet18, we also consider transfer learning from a self-supervised pretrained model. RESULTS: The main result of this paper is the ChampKit software. Using ChampKit, we were able to systematically evaluate multiple neural networks across six datasets. We observed mixed results when evaluating the benefits of pretraining versus random initialization, with no clear benefit except in the low data regime, where transfer learning was found to be beneficial. Surprisingly, we found that transfer learning from self-supervised weights rarely improved performance, which is counter to other areas of computer vision. CONCLUSIONS: Choosing the right model for a given digital pathology dataset is nontrivial. ChampKit provides a valuable tool to fill this gap by enabling the evaluation of hundreds of existing (or user-defined) deep learning models across a variety of pathology tasks. Source code and data for the tool are freely accessible at https://github.com/SBU-BMI/champkit.

Subject(s)

Neoplasms; Neural Networks, Computer; Humans; Algorithms; Software; Histological Techniques

19.

ETV6 dependency in Ewing sarcoma by antagonism of EWS-FLI1-mediated enhancer activation.

Gao, Yuan; He, Xue-Yan; Wu, Xiaoli S; Huang, Yu-Han; Toneyan, Shushan; Ha, Taehoon; Ipsaro, Jonathan J; Koo, Peter K; Joshua-Tor, Leemor; Bailey, Kelly M; Egeblad, Mikala; Vakoc, Christopher R.

Nat Cell Biol; 25(2): 298-308, 2023 Feb.

Article in English | MEDLINE | ID: mdl-36658219

ABSTRACT

The EWS-FLI1 fusion oncoprotein deregulates transcription to initiate the paediatric cancer Ewing sarcoma. Here we used a domain-focused CRISPR screen to implicate the transcriptional repressor ETV6 as a unique dependency in this tumour. Using biochemical assays and epigenomics, we show that ETV6 competes with EWS-FLI1 for binding to select DNA elements enriched for short GGAA repeat sequences. Upon inactivating ETV6, EWS-FLI1 overtakes and hyper-activates these cis-elements to promote mesenchymal differentiation, with SOX11 being a key downstream target. We show that squelching of ETV6 with a dominant-interfering peptide phenocopies these effects and suppresses Ewing sarcoma growth in vivo. These findings reveal targeting of ETV6 as a strategy for neutralizing the EWS-FLI1 oncoprotein by reprogramming of genomic occupancy.

Subject(s)

Sarcoma, Ewing; Child; Humans; Sarcoma, Ewing/genetics; Sarcoma, Ewing/metabolism; Sarcoma, Ewing/pathology; Cell Line, Tumor; Gene Expression Regulation, Neoplastic; RNA-Binding Protein EWS/genetics; RNA-Binding Protein EWS/metabolism; Proto-Oncogene Protein c-fli-1/genetics; Proto-Oncogene Protein c-fli-1/metabolism; Oncogene Proteins, Fusion/genetics; Oncogene Proteins, Fusion/metabolism

20.

Evaluating deep learning for predicting epigenomic profiles.

Toneyan, Shushan; Tang, Ziqi; Koo, Peter K.

Nat Mach Intell; 4(12): 1088-1100, 2022 Dec.

Article in English | MEDLINE | ID: mdl-37324054

ABSTRACT

Deep learning has been successful at predicting epigenomic profiles from DNA sequences. Most approaches frame this task as a binary classification relying on peak callers to define functional activity. Recently, quantitative models have emerged to directly predict the experimental coverage values as a regression. As new models continue to emerge with different architectures and training configurations, a major bottleneck is forming due to the lack of ability to fairly assess the novelty of proposed models and their utility for downstream biological discovery. Here we introduce a unified evaluation framework and use it to compare various binary and quantitative models trained to predict chromatin accessibility data. We highlight various modeling choices that affect generalization performance, including a downstream application of predicting variant effects. In addition, we introduce a robustness metric that can be used to enhance model selection and improve variant effect predictions. Our empirical study largely supports that quantitative modeling of epigenomic profiles leads to better generalizability and interpretability.
