Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Brief Bioinform ; 24(1)2023 01 19.
Artigo em Inglês | MEDLINE | ID: mdl-36549922

RESUMO

MOTIVATION: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for a sparse scATAC-seq annotation across the data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.


Assuntos
Cromatina , Epigênese Genética , Animais , Camundongos , Cromatina/genética , Sequências Reguladoras de Ácido Nucleico , Redes Neurais de Computação
2.
bioRxiv ; 2024 Oct 01.
Artigo em Inglês | MEDLINE | ID: mdl-39071305

RESUMO

Protein aggregation is a pathological hallmark of more than fifty human diseases and a major problem for biotechnology. Methods have been proposed to predict aggregation from sequence, but these have been trained and evaluated on small and biased experimental datasets. Here we directly address this data shortage by experimentally quantifying the amyloid nucleation of >100,000 protein sequences. This unprecedented dataset reveals the limited performance of existing computational methods and allows us to train CANYA, a convolution-attention hybrid neural network that accurately predicts amyloid nucleation from sequence. We adapt genomic neural network interpretability analyses to reveal CANYA's decision-making process and learned grammar. Our results illustrate the power of massive experimental analysis of random sequence-spaces and provide an interpretable and robust neural network model to predict amyloid nucleation.

3.
Genome Biol ; 24(1): 109, 2023 05 09.
Artigo em Inglês | MEDLINE | ID: mdl-37161475

RESUMO

Post hoc attribution methods can provide insights into the learned patterns from deep neural networks (DNNs) trained on high-throughput functional genomics data. However, in practice, their resultant attribution maps can be challenging to interpret due to spurious importance scores for seemingly arbitrary nucleotides. Here, we identify a previously overlooked attribution noise source that arises from how DNNs handle one-hot encoded DNA. We demonstrate this noise is pervasive across various genomic DNNs and introduce a statistical correction that effectively reduces it, leading to more reliable attribution maps. Our approach represents a promising step towards gaining meaningful insights from DNNs in regulatory genomics.


Assuntos
Genômica , Aprendizagem , Redes Neurais de Computação , Nucleotídeos
4.
Proc Mach Learn Res ; 200: 131-149, 2022 Nov.
Artigo em Inglês | MEDLINE | ID: mdl-37205975

RESUMO

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA