Search | BVS Enfermería Search Portal

Learning single-cell chromatin accessibility profiles using meta-analytic marker genes.

Kawaguchi, Risa Karakida; Tang, Ziqi; Fischer, Stephan; Rajesh, Chandana; Tripathy, Rohit; Koo, Peter K; Gillis, Jesse.

Brief Bioinform; 24(1), 2023 Jan 19.

Article in English | MEDLINE | ID: mdl-36549922

ABSTRACT

MOTIVATION: Single-cell assay for transposase accessible chromatin using sequencing (scATAC-seq) is a valuable resource to learn cis-regulatory elements such as cell-type specific enhancers and transcription factor binding sites. However, cell-type identification of scATAC-seq data is known to be challenging due to the heterogeneity derived from different protocols and the high dropout rate. RESULTS: In this study, we perform a systematic comparison of seven scATAC-seq datasets of mouse brain to benchmark the efficacy of neuronal cell-type annotation from gene sets. We find that redundant marker genes give a dramatic improvement for sparse scATAC-seq annotation across data collected from different studies. Interestingly, simple aggregation of such marker genes achieves performance comparable to or higher than that of machine-learning classifiers, suggesting its potential for downstream applications. Based on our results, we reannotated all scATAC-seq data for detailed cell types using robust marker genes. Their meta scATAC-seq profiles are publicly available at https://gillisweb.cshl.edu/Meta_scATAC. Furthermore, we trained a deep neural network to predict chromatin accessibility from only DNA sequence and identified key motifs enriched for each neuronal subtype. Those predicted profiles are visualized together in our database as a valuable resource to explore cell-type specific epigenetic regulation in a sequence-dependent and -independent manner.

Subject(s)

Chromatin; Epigenesis, Genetic; Animals; Mice; Chromatin/genetics; Regulatory Sequences, Nucleic Acid; Neural Networks, Computer
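
The "simple aggregation of marker genes" mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the use of a mean over marker genes, and the toy gene and cell-type names are all assumptions made for illustration. The sketch assumes a cell-by-gene matrix of accessibility-derived gene activity scores is already available.

```python
import numpy as np


def annotate_by_marker_aggregation(activity, gene_names, marker_sets):
    """Label each cell by the marker set with the highest mean activity.

    activity    : (n_cells, n_genes) array of gene-level accessibility scores
    gene_names  : list of n_genes gene symbols (column order of `activity`)
    marker_sets : dict mapping a cell-type label to a list of marker genes

    Hypothetical helper for illustration; averaging is an assumed
    aggregation rule, not one stated in the abstract.
    """
    gene_index = {gene: i for i, gene in enumerate(gene_names)}
    labels = sorted(marker_sets)
    scores = np.zeros((activity.shape[0], len(labels)))
    for j, label in enumerate(labels):
        # Keep only markers that are present in the matrix.
        cols = [gene_index[g] for g in marker_sets[label] if g in gene_index]
        if cols:
            scores[:, j] = activity[:, cols].mean(axis=1)
    best = scores.argmax(axis=1)  # index of the top-scoring label per cell
    return [labels[k] for k in best], scores


# Toy data with placeholder gene and cell-type names.
toy_activity = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],  # sparse cell: only one marker detected
])
toy_genes = ["geneA", "geneB", "geneC", "geneD"]
toy_markers = {"type1": ["geneA", "geneB"], "type2": ["geneC", "geneD"]}

toy_labels, toy_scores = annotate_by_marker_aggregation(
    toy_activity, toy_genes, toy_markers
)
print(toy_labels)
```

The third toy cell shows why redundancy may help with sparse data: with several markers per cell type, a cell can still be scored when some markers drop out.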

An explainable graph neural network approach for integrating multi-omics data with prior knowledge to identify biomarkers from interacting biological domains.

Tripathy, Rohit K; Frohock, Zachary; Wang, Hong; Cary, Gregory A; Keegan, Stephen; Carter, Gregory W; Li, Yi.

bioRxiv; 2024 Aug 26.

Article in English | MEDLINE | ID: mdl-39253523

ABSTRACT

The rapid growth of multi-omics datasets, in addition to the wealth of existing biological prior knowledge, necessitates the development of effective methods for their integration. Such methods are essential for building predictive models and identifying disease-related molecular markers. We propose a framework for supervised integration of multi-omics data with biological priors represented as knowledge graphs. Our framework is based on the use of graph neural networks (GNNs) to model the relationships among features from high-dimensional 'omics data and set transformers to integrate low-dimensional representations of 'omics features. Furthermore, our framework incorporates explainability methods to elucidate important biomarkers and extract interaction relationships between biological quantities of interest. We demonstrate the effectiveness of our approach by applying it to Alzheimer's disease (AD) multi-omics data from the ROSMAP cohort, showing that the integration of transcriptomics and proteomics data with AD biological domain network priors improves the prediction accuracy of AD status and highlights robust AD biomarkers.

Selecting deep neural networks that yield consistent attribution-based interpretations for genomics.

Majdandzic, Antonio; Rajesh, Chandana; Tang, Amber; Toneyan, Shushan; Labelson, Ethan; Tripathy, Rohit; Koo, Peter K.

Proc Mach Learn Res; 200: 131-149, 2022 Nov.

Article in English | MEDLINE | ID: mdl-37205975

ABSTRACT

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance of a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.
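
The abstract above does not say how the consistency metrics are computed or how they are combined with validation performance. As one possible reading of "multivariate model selection", the sketch below keeps the models that are not dominated on two scores at once (a Pareto front). The function name, the model names, and the scores are hypothetical, and the Pareto rule is an assumption rather than the authors' method.

```python
def pareto_front(models):
    """Return names of models not dominated on (performance, consistency).

    models : list of (name, performance, consistency) tuples, where higher
             is better for both scores.

    A model is dominated if another model is at least as good on both
    scores and strictly better on at least one. Assumed selection rule,
    for illustration only.
    """
    front = []
    for name, perf, cons in models:
        dominated = any(
            (p >= perf and c >= cons) and (p > perf or c > cons)
            for _, p, c in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (name, validation performance, consistency).
candidates = [
    ("model_a", 0.90, 0.40),
    ("model_b", 0.88, 0.70),
    ("model_c", 0.85, 0.60),  # worse than model_b on both scores
    ("model_d", 0.91, 0.30),
]
print(pareto_front(candidates))
```

Selecting on validation performance alone would pick model_d here, which has the lowest consistency of the four; considering both scores keeps model_b as a candidate as well.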
