Search | BVS Enfermagem Research Portal

Learning single-cell chromatin accessibility profiles using meta-analytic marker genes.

Kawaguchi, Risa Karakida; Tang, Ziqi; Fischer, Stephan; Rajesh, Chandana; Tripathy, Rohit; Koo, Peter K; Gillis, Jesse.

Brief Bioinform; 24(1), 2023 Jan 19.

Article in English | MEDLINE | ID: mdl-36549922

ABSTRACT

MOTIVATION: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for a sparse scATAC-seq annotation across the data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable to or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.
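The "simple aggregation of marker genes" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes a cells-by-genes gene-activity matrix has already been derived from the scATAC-seq data, scores each cell by the mean activity over each cell type's marker set, and assigns the highest-scoring type. The function name, the toy matrix, and the marker lists are hypothetical and chosen only for illustration.

```python
import numpy as np

def annotate_by_marker_aggregation(activity, gene_names, marker_sets):
    """Label each cell with the type whose marker genes have the highest mean activity.

    activity    : (n_cells, n_genes) array of gene-activity values (assumed input)
    gene_names  : list of gene names, one per column of `activity`
    marker_sets : dict mapping cell type -> list of marker gene names (hypothetical)
    """
    gene_index = {g: i for i, g in enumerate(gene_names)}
    cell_types = sorted(marker_sets)
    scores = np.zeros((activity.shape[0], len(cell_types)))
    for j, cell_type in enumerate(cell_types):
        # Keep only markers that are present in the matrix.
        cols = [gene_index[g] for g in marker_sets[cell_type] if g in gene_index]
        if cols:
            # Aggregating over several redundant markers dampens per-gene dropout.
            scores[:, j] = activity[:, cols].mean(axis=1)
    labels = [cell_types[k] for k in scores.argmax(axis=1)]
    return labels, scores

# Toy data: three cells, four genes (illustrative values only).
gene_names = ["Gad1", "Gad2", "Slc17a7", "Neurod6"]
marker_sets = {
    "excitatory": ["Slc17a7", "Neurod6"],
    "inhibitory": ["Gad1", "Gad2"],
}
activity = np.array(
    [[1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 1, 0, 0]],
    dtype=float,
)
labels, scores = annotate_by_marker_aggregation(activity, gene_names, marker_sets)
print(labels)
```

In the toy data, the first and third cells each show signal at only one of the two inhibitory markers, yet both are still assigned to the inhibitory type. This mirrors the abstract's point that redundant markers help with sparse data.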

Subjects

Chromatin; Epigenesis, Genetic; Animals; Mice; Chromatin/genetics; Regulatory Sequences, Nucleic Acid; Neural Networks, Computer

Interpretably deep learning amyloid nucleation by massive experimental quantification of random sequences.

Thompson, Mike; Martín, Mariano; Sanmartín Olmo, Trinidad; Rajesh, Chandana; Koo, Peter K; Bolognesi, Benedetta; Lehner, Ben.

bioRxiv; 2024 Jul 17.

Article in English | MEDLINE | ID: mdl-39071305

ABSTRACT

Insoluble amyloid aggregates are the hallmarks of more than fifty human diseases, including the most common neurodegenerative disorders. The process by which soluble proteins nucleate to form amyloid fibrils is, however, quite poorly characterized. Relatively few sequences are known that form amyloids with high propensity and this data shortage likely limits our capacity to understand, predict, engineer, and prevent the formation of amyloid fibrils. Here we quantify the nucleation of amyloids at an unprecedented scale and use the data to train a deep learning model of amyloid nucleation. In total, we quantify the nucleation rates of >100,000 20-amino-acid-long peptides. This large and diverse dataset allows us to train CANYA, a convolution-attention hybrid neural network. CANYA is fast and outperforms existing methods with stable performance across diverse prediction tasks. Interpretability analyses reveal CANYA's decision-making process and learned grammar, providing mechanistic insights into amyloid nucleation. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict amyloid nucleation.

Correcting gradient-based interpretations of deep neural networks for genomics.

Majdandzic, Antonio; Rajesh, Chandana; Koo, Peter K.

Genome Biol; 24(1): 109, 2023 May 09.

Article in English | MEDLINE | ID: mdl-37161475

ABSTRACT

Post hoc attribution methods can provide insights into the learned patterns from deep neural networks (DNNs) trained on high-throughput functional genomics data. However, in practice, their resultant attribution maps can be challenging to interpret due to spurious importance scores for seemingly arbitrary nucleotides. Here, we identify a previously overlooked attribution noise source that arises from how DNNs handle one-hot encoded DNA. We demonstrate this noise is pervasive across various genomic DNNs and introduce a statistical correction that effectively reduces it, leading to more reliable attribution maps. Our approach represents a promising step towards gaining meaningful insights from DNNs in regulatory genomics.
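The abstract does not spell out the statistical correction. One form consistent with its description is sketched here, under the assumption that the noise comes from a gradient component shared across all four nucleotide channels at a position. One-hot encoded DNA always sums to one at each position, so that shared component corresponds to a direction the data never moves in. Subtracting the per-position mean across channels removes it. The function name and the random stand-in gradients are hypothetical.

```python
import numpy as np

def correct_gradients(grad):
    """Subtract the per-position mean across the nucleotide axis.

    grad : (..., L, 4) array of input gradients for one-hot DNA
           (last axis = A, C, G, T). This is an assumed layout.
    """
    return grad - grad.mean(axis=-1, keepdims=True)

# Stand-in gradients for a length-10 sequence (random values, not model output).
rng = np.random.default_rng(0)
grad = rng.normal(size=(10, 4))
corrected = correct_gradients(grad)

# After correction, the four channel values at each position sum to zero,
# while differences between nucleotides at a position are unchanged.
print(np.allclose(corrected.sum(axis=-1), 0.0))
```

Because only a per-position constant is removed, the relative ranking of nucleotides at each position is preserved. What changes is the absolute score assigned to the observed nucleotide.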

Subjects

Genomics; Learning; Neural Networks, Computer; Nucleotides

Selecting deep neural networks that yield consistent attribution-based interpretations for genomics.

Majdandzic, Antonio; Rajesh, Chandana; Tang, Amber; Toneyan, Shushan; Labelson, Ethan; Tripathy, Rohit; Koo, Peter K.

Proc Mach Learn Res; 200: 131-149, 2022 Nov.

Article in English | MEDLINE | ID: mdl-37205975

ABSTRACT

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.
