Pesquisa | BVS Violência e Saúde

1.

Automatic cell-type harmonization and integration across Human Cell Atlas datasets.

Xu, Chuan; Prete, Martin; Webb, Simone; Jardine, Laura; Stewart, Benjamin J; Hoo, Regina; He, Peng; Meyer, Kerstin B; Teichmann, Sarah A.

Cell ; 186(26): 5876-5891.e20, 2023 12 21.

Artigo em Inglês | MEDLINE | ID: mdl-38134877

RESUMO

Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with â¼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.

Assuntos

Perfilação da Expressão Gênica , Transcriptoma , Humanos , Bases de Dados Factuais , Análise de Célula Única

2.

Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations.

Qin, Peng; Lu, Hongwei; Du, Huilong; Wang, Hao; Chen, Weilan; Chen, Zhuo; He, Qiang; Ou, Shujun; Zhang, Hongyu; Li, Xuanzhao; Li, Xiuxiu; Li, Yan; Liao, Yi; Gao, Qiang; Tu, Bin; Yuan, Hua; Ma, Bingtian; Wang, Yuping; Qian, Yangwen; Fan, Shijun; Li, Weitao; Wang, Jing; He, Min; Yin, Junjie; Li, Ting; Jiang, Ning; Chen, Xuewei; Liang, Chengzhi; Li, Shigui.

Cell ; 184(13): 3542-3558.e16, 2021 06 24.

Artigo em Inglês | MEDLINE | ID: mdl-34051138

RESUMO

Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.

Assuntos

Ecótipo , Variação Genética , Genoma de Planta , Oryza/genética , Adaptação Fisiológica/genética , Agricultura , Domesticação , Perfilação da Expressão Gênica , Regulação da Expressão Gênica de Plantas , Genes de Plantas , Variação Estrutural do Genoma , Anotação de Sequência Molecular , Fenótipo

3.

Pan-Genome of Wild and Cultivated Soybeans.

Liu, Yucheng; Du, Huilong; Li, Pengcheng; Shen, Yanting; Peng, Hua; Liu, Shulin; Zhou, Guo-An; Zhang, Haikuan; Liu, Zhi; Shi, Miao; Huang, Xuehui; Li, Yan; Zhang, Min; Wang, Zheng; Zhu, Baoge; Han, Bin; Liang, Chengzhi; Tian, Zhixi.

Cell ; 182(1): 162-176.e13, 2020 07 09.

Artigo em Inglês | MEDLINE | ID: mdl-32553274

RESUMO

Soybean is one of the most important vegetable oil and protein feed crops. To capture the entire genomic diversity, it is needed to construct a complete high-quality pan-genome from diverse soybean accessions. In this study, we performed individual de novo genome assemblies for 26 representative soybeans that were selected from 2,898 deeply sequenced accessions. Using these assembled genomes together with three previously reported genomes, we constructed a graph-based genome and performed pan-genome analysis, which identified numerous genetic variations that cannot be detected by direct mapping of short sequence reads onto a single reference genome. The structural variations from the 2,898 accessions that were genotyped based on the graph-based genome and the RNA sequencing (RNA-seq) data from the representative 26 accessions helped to link genetic variations to candidate genes that are responsible for important traits. This pan-genome resource will promote evolutionary and functional genomics studies in soybean.

Assuntos

Genoma de Planta , Glycine max/crescimento & desenvolvimento , Glycine max/genética , Sequência de Bases , Cromossomos de Plantas/genética , Domesticação , Ecótipo , Duplicação Gênica , Regulação da Expressão Gênica de Plantas , Fusão Gênica , Geografia , Anotação de Sequência Molecular , Filogenia , Polimorfismo de Nucleotídeo Único/genética , Poliploidia

4.

Ornaments for efficient allele-specific expression estimation with bias correction.

Adduri, Abhinav; Kim, Seyoung.

Am J Hum Genet ; 2024 Jul 18.

Artigo em Inglês | MEDLINE | ID: mdl-39047729

RESUMO

Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and ambiguous mapping bias by probabilistically assigning reads to multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments improves the accuracy of WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at gene level.

5.

Link prediction using low-dimensional node embeddings: The measurement problem.

Menand, Nicolas; Seshadhri, C.

Proc Natl Acad Sci U S A ; 121(8): e2312527121, 2024 Feb 20.

Artigo em Inglês | MEDLINE | ID: mdl-38363864

RESUMO

Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors. These vectors can be used for a multitude of downstream ML tasks. We study one of the most important such task, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that the performance is measured by the AUC (area under the curve), which suffers biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations show poor scores. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.

6.

Renormalization group analysis of the Anderson model on random regular graphs.

Vanoni, Carlo; Altshuler, Boris L; Kravtsov, Vladimir E; Scardicchio, Antonello.

Proc Natl Acad Sci U S A ; 121(29): e2401955121, 2024 Jul 16.

Artigo em Inglês | MEDLINE | ID: mdl-38990943

RESUMO

We present a renormalization group (RG) analysis of the problem of Anderson localization on a random regular graph (RRG) which generalizes the RG of Abrahams, Anderson, Licciardello, and Ramakrishnan to infinite-dimensional graphs. The RG equations necessarily involve two parameters (one being the changing connectivity of subtrees), but we show that the one-parameter scaling hypothesis is recovered for sufficiently large system sizes for both eigenstates and spectrum observables. We also explain the nonmonotonic behavior of dynamical and spectral quantities as a function of the system size for values of disorder close to the transition, by identifying two terms in the beta function of the running fractal dimension of different signs and functional dependence. Our theory provides a simple and coherent explanation for the unusual scaling behavior observed in numerical data of the Anderson model on RRG and of many-body localization.

7.

Homophily modulates double descent generalization in graph convolution networks.

Shi, Cheng; Pan, Liming; Hu, Hong; Dokmanic, Ivan.

Proc Natl Acad Sci U S A ; 121(8): e2309504121, 2024 Feb 20.

Artigo em Inglês | MEDLINE | ID: mdl-38346190

RESUMO

Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental observations of "transductive" double descent in key networks and datasets, we use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. Our results illuminate the nuances of learning on homophilic versus heterophilic data and predict double descent whose existence in GNNs has been questioned by recent work. We show how risk is shaped by the interplay between the graph noise, feature noise, and the number of training labels. Our findings apply beyond stylized models, capturing qualitative trends in real-world GNNs and datasets. As a case in point, we use our analytic insights to improve performance of state-of-the-art graph convolution networks on heterophilic datasets.

8.

Tree-based QTL mapping with expected local genetic relatedness matrices.

Link, Vivian; Schraiber, Joshua G; Fan, Caoqi; Dinh, Bryan; Mancuso, Nicholas; Chiang, Charleston W K; Edge, Michael D.

Am J Hum Genet ; 110(12): 2077-2091, 2023 Dec 07.

Artigo em Inglês | MEDLINE | ID: mdl-38065072

RESUMO

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.

Assuntos

Genética Populacional , Estudo de Associação Genômica Ampla , Locos de Características Quantitativas , Humanos , Mapeamento Cromossômico/métodos , Modelos Genéticos , Fenótipo , Locos de Características Quantitativas/genética , Havaiano Nativo ou Outro Ilhéu do Pacífico/genética

9.

ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler.

Zhang, Yufang; Chu, Yanyi; Lin, Shenggeng; Xiong, Yi; Wei, Dong-Qing.

Brief Bioinform ; 25(2)2024 Jan 22.

Artigo em Inglês | MEDLINE | ID: mdl-38517693

RESUMO

Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNA and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduced a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs homogenous graph convolutional network with regular graph structure (ReHoGCN) encompassing disease similarity network, miRNA similarity network and known MDA network and then was tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processes and diminish training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both homogenous graph convolutional network and heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly reveals steadily degree distribution of a graph does play an important role in enhancement of model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on the MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.

Assuntos

MicroRNAs , Neoplasias da Próstata , Humanos , Masculino , Algoritmos , Biologia Computacional/métodos , Bases de Dados Genéticas , MicroRNAs/genética , Estudos Prospectivos , Neoplasias da Próstata/genética , Feminino

10.

A multi-view graph contrastive learning framework for deciphering spatially resolved transcriptomics data.

Zhang, Lei; Liang, Shu; Wan, Lin.

Brief Bioinform ; 25(4)2024 May 23.

Artigo em Inglês | MEDLINE | ID: mdl-38801701

RESUMO

Spatially resolved transcriptomics data are being used in a revolutionary way to decipher the spatial pattern of gene expression and the spatial architecture of cell types. Much work has been done to exploit the genomic spatial architectures of cells. Such work is based on the common assumption that gene expression profiles of spatially adjacent spots are more similar than those of more distant spots. However, related work might not consider the nonlocal spatial co-expression dependency, which can better characterize the tissue architectures. Therefore, we propose MuCoST, a Multi-view graph Contrastive learning framework for deciphering complex Spatially resolved Transcriptomic architectures with dual scale structural dependency. To achieve this, we employ spot dependency augmentation by fusing gene expression correlation and spatial location proximity, thereby enabling MuCoST to model both nonlocal spatial co-expression dependency and spatially adjacent dependency. We benchmark MuCoST on four datasets, and we compare it with other state-of-the-art spatial domain identification methods. We demonstrate that MuCoST achieves the highest accuracy on spatial domain identification from various datasets. In particular, MuCoST accurately deciphers subtle biological textures and elaborates the variation of spatially functional patterns.

Assuntos

Perfilação da Expressão Gênica , Transcriptoma , Perfilação da Expressão Gênica/métodos , Humanos , Algoritmos , Aprendizado de Máquina , Biologia Computacional/métodos

11.

Accurately deciphering spatial domains for spatially resolved transcriptomics with stCluster.

Wang, Tao; Shu, Han; Hu, Jialu; Wang, Yongtian; Chen, Jing; Peng, Jiajie; Shang, Xuequn.

Brief Bioinform ; 25(4)2024 May 23.

Artigo em Inglês | MEDLINE | ID: mdl-38975895

RESUMO

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.

Assuntos

Perfilação da Expressão Gênica , Transcriptoma , Perfilação da Expressão Gênica/métodos , Biologia Computacional/métodos , Algoritmos , Humanos , Animais , Software , Aprendizado de Máquina

12.

SEGCECO: Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication.

Vasighizaker, Akram; Hora, Sheena; Zeng, Raymond; Rueda, Luis.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38605638

RESUMO

Recent advances in single-cell RNA sequencing technology have eased analyses of signaling networks of cells. Recently, cell-cell interaction has been studied based on various link prediction approaches on graph-structured data. These approaches have assumptions about the likelihood of node interaction, thus showing high performance for only some specific networks. Subgraph-based methods have solved this problem and outperformed other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profile of individual cells. High-dimensional and sparse single-cell RNA-seq data make converting the data into a graphical format a daunting task. We successfully overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell communication network is built using a cell-cell similarity matrix which is learned from gene expression data. We performed experiments on six datasets extracted from the human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches, and the state-of-the-art method for link prediction, WLNM, with 0.99 ROC and 99% prediction accuracy. The datasets can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133 and the code is publicly available at Github https://github.com/sheenahora/SEGCECO and Code Ocean https://codeocean.com/capsule/8244724/tree.

Assuntos

Comunicação Celular , Transdução de Sinais , Humanos , Animais , Camundongos , Comunicação Celular/genética , Aprendizagem , Redes Neurais de Computação , Expressão Gênica

13.

MS-BACL: enhancing metabolic stability prediction through bond graph augmentation and contrastive learning.

Wang, Tao; Li, Zhen; Zhuo, Linlin; Chen, Yifan; Fu, Xiangzheng; Zou, Quan.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38555479

RESUMO

MOTIVATION: Accurately predicting molecular metabolic stability is of great significance to drug research and development, ensuring drug safety and effectiveness. Existing deep learning methods, especially graph neural networks, can reveal the molecular structure of drugs and thus efficiently predict the metabolic stability of molecules. However, most of these methods focus on the message passing between adjacent atoms in the molecular graph, ignoring the relationship between bonds. This makes it difficult for these methods to estimate accurate molecular representations, thereby being limited in molecular metabolic stability prediction tasks. RESULTS: We propose the MS-BACL model based on bond graph augmentation technology and contrastive learning strategy, which can efficiently and reliably predict the metabolic stability of molecules. To our knowledge, this is the first time that bond-to-bond relationships in molecular graph structures have been considered in the task of metabolic stability prediction. We build a bond graph based on 'atom-bond-atom', and the model can simultaneously capture the information of atoms and bonds during the message propagation process. This enhances the model's ability to reveal the internal structure of the molecule, thereby improving the structural representation of the molecule. Furthermore, we perform contrastive learning training based on the molecular graph and its bond graph to learn the final molecular representation. Multiple sets of experimental results on public datasets show that the proposed MS-BACL model outperforms the state-of-the-art model. AVAILABILITY AND IMPLEMENTATION: The code and data are publicly available at https://github.com/taowang11/MS.

Assuntos

Redes Neurais de Computação

14.

Attention-guided variational graph autoencoders reveal heterogeneity in spatial transcriptomics.

Lei, Lixin; Han, Kaitai; Wang, Zijun; Shi, Chaojing; Wang, Zhenghui; Dai, Ruoyan; Zhang, Zhiwei; Wang, Mengqiu; Guo, Qianjin.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38627939

RESUMO

The latest breakthroughs in spatially resolved transcriptomics technology offer comprehensive opportunities to delve into gene expression patterns within the tissue microenvironment. However, the precise identification of spatial domains within tissues remains challenging. In this study, we introduce AttentionVGAE (AVGN), which integrates slice images, spatial information and raw gene expression while calibrating low-quality gene expression. By combining the variational graph autoencoder with multi-head attention blocks (MHA blocks), AVGN captures spatial relationships in tissue gene expression, adaptively focusing on key features and alleviating the need for prior knowledge of cluster numbers, thereby achieving superior clustering performance. Particularly, AVGN attempts to balance the model's attention focus on local and global structures by utilizing MHA blocks, an aspect that current graph neural networks have not extensively addressed. Benchmark testing demonstrates its significant efficacy in elucidating tissue anatomy and interpreting tumor heterogeneity, indicating its potential in advancing spatial transcriptomics research and understanding complex biological phenomena.

Assuntos

Benchmarking , Perfilação da Expressão Gênica , Análise por Conglomerados , Redes Neurais de Computação

15.

Mix-Key: graph mixup with key structures for molecular property prediction.

Jiang, Tianyi; Wang, Zeyu; Yu, Wenchao; Wang, Jinhuan; Yu, Shanqing; Bao, Xiaoze; Wei, Bin; Xuan, Qi.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38706318

RESUMO

Molecular property prediction faces the challenge of limited labeled data as it necessitates a series of specialized experiments to annotate target molecules. Data augmentation techniques can effectively address the issue of data scarcity. In recent years, Mixup has achieved significant success in traditional domains such as image processing. However, its application in molecular property prediction is relatively limited due to the irregular, non-Euclidean nature of graphs and the fact that minor variations in molecular structures can lead to alterations in their properties. To address these challenges, we propose a novel data augmentation method called Mix-Key tailored for molecular property prediction. Mix-Key aims to capture crucial features of molecular graphs, focusing separately on the molecular scaffolds and functional groups. By generating isomers that are relatively invariant to the scaffolds or functional groups, we effectively preserve the core information of molecules. Additionally, to capture interactive information between the scaffolds and functional groups while ensuring correlation between the original and augmented graphs, we introduce molecular fingerprint similarity and node similarity. Through these steps, Mix-Key determines the mixup ratio between the original graph and two isomers, thus generating more informative augmented molecular graphs. We extensively validate our approach on molecular datasets of different scales with several Graph Neural Network architectures. The results demonstrate that Mix-Key consistently outperforms other data augmentation methods in enhancing molecular property prediction on several datasets.

Assuntos

Algoritmos , Estrutura Molecular , Biologia Computacional/métodos , Software

16.

NIPT-PG: empowering non-invasive prenatal testing to learn from population genomics through an incremental pan-genomic approach.

Xue, Zhengfa; Zhou, Aifen; Zhu, Xiaoyan; Li, Linxuan; Zhu, Huanhuan; Jin, Xin; Wang, Jiayin.

Brief Bioinform ; 25(4)2024 May 23.

Artigo em Inglês | MEDLINE | ID: mdl-38836702

RESUMO

Non-invasive prenatal testing (NIPT) is a quite popular approach for detecting fetal genomic aneuploidies. However, due to the limitations on sequencing read length and coverage, NIPT suffers a bottleneck on further improving performance and conducting earlier detection. The errors mainly come from reference biases and population polymorphism. To break this bottleneck, we proposed NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from tested population. Subsequently, we proposed a sequence-to-graph alignment method, which considers the read mis-match rates during the mapping process, and an indexing method using hash indexing and adjacency lists to accelerate the read alignment process. Finally, by integrating multi-source aligned read and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing data sets from pregnant women. Results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective for fetal chromosomal aneuploidies detection. NIPT-PG may have broad applications in clinical testing, and its detection results can serve as a reference for false positive samples approaching the critical threshold.

Assuntos

Aneuploidia , Teste Pré-Natal não Invasivo , Humanos , Feminino , Gravidez , Teste Pré-Natal não Invasivo/métodos , Algoritmos , Genômica/métodos , Diagnóstico Pré-Natal/métodos , Análise de Sequência de DNA/métodos

17.

HyGAnno: hybrid graph neural network-based cell type annotation for single-cell ATAC sequencing data.

Zhang, Weihang; Cui, Yang; Liu, Bowen; Loza, Martin; Park, Sung-Joon; Nakai, Kenta.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38581422

RESUMO

Reliable cell type annotations are crucial for investigating cellular heterogeneity in single-cell omics data. Although various computational approaches have been proposed for single-cell RNA sequencing (scRNA-seq) annotation, high-quality cell labels are still lacking in single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, because of extreme sparsity and inconsistent chromatin accessibility between datasets. Here, we present a novel automated cell annotation method that transfers cell type information from a well-labeled scRNA-seq reference to an unlabeled scATAC-seq target, via a parallel graph neural network, in a semi-supervised manner. Unlike existing methods that utilize only gene expression or gene activity features, HyGAnno leverages genome-wide accessibility peak features to facilitate the training process. In addition, HyGAnno reconstructs a reference-target cell graph to detect cells with low prediction reliability, according to their specific graph connectivity patterns. HyGAnno was assessed across various datasets, showcasing its strengths in precise cell annotation, generating interpretable cell embeddings, robustness to noisy reference data and adaptability to tumor tissues.

Assuntos

Cromatina , Redes Neurais de Computação , Reprodutibilidade dos Testes

18.

Prior knowledge-guided multilevel graph neural network for tumor risk prediction and interpretation via multi-omics data integration.

Yan, Hongxi; Weng, Dawei; Li, Dongguo; Gu, Yu; Ma, Wenji; Liu, Qingjie.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38670157

RESUMO

The interrelation and complementary nature of multi-omics data can provide valuable insights into the intricate molecular mechanisms underlying diseases. However, challenges such as limited sample size, high data dimensionality and differences in omics modalities pose significant obstacles to fully harnessing the potential of these data. The prior knowledge such as gene regulatory network and pathway information harbors useful gene-gene interaction and gene functional module information. To effectively integrate multi-omics data and make full use of the prior knowledge, here, we propose a Multilevel-graph neural network (GNN): a hierarchically designed deep learning algorithm that sequentially leverages multi-omics data, gene regulatory networks and pathway information to extract features and enhance accuracy in predicting survival risk. Our method achieved better accuracy compared with existing methods. Furthermore, key factors nonlinearly associated with the tumor pathogenesis are prioritized by employing two interpretation algorithms (i.e. GNN-Explainer and IGscore) for neural networks, at gene and pathway level, respectively. The top genes and pathways exhibit strong associations with disease in survival analyses, many of which such as SEC61G and CYP27B1 are previously reported in the literature.

Assuntos

Algoritmos , Redes Reguladoras de Genes , Neoplasias , Redes Neurais de Computação , Humanos , Neoplasias/genética , Biologia Computacional/métodos , Aprendizado Profundo , Genômica/métodos , Multiômica

19.

scENCORE: leveraging single-cell epigenetic data to predict chromatin conformation using graph embedding.

Duan, Ziheng; Xu, Siwei; Sai Srinivasan, Shushrruth; Hwang, Ahyeon; Lee, Che Yu; Yue, Feng; Gerstein, Mark; Luan, Yu; Girgenti, Matthew; Zhang, Jing.

Brief Bioinform ; 25(2)2024 Jan 22.

Artigo em Inglês | MEDLINE | ID: mdl-38493342

RESUMO

Dynamic compartmentalization of eukaryotic DNA into active and repressed states enables diverse transcriptional programs to arise from a single genetic blueprint, whereas its dysregulation can be strongly linked to a broad spectrum of diseases. While single-cell Hi-C experiments allow for chromosome conformation profiling across many cells, they are still expensive and not widely available for most labs. Here, we propose an alternate approach, scENCORE, to computationally reconstruct chromatin compartments from the more affordable and widely accessible single-cell epigenetic data. First, scENCORE constructs a long-range epigenetic correlation graph to mimic chromatin interaction frequencies, where nodes and edges represent genome bins and their correlations. Then, it learns the node embeddings to cluster genome regions into A/B compartments and aligns different graphs to quantify chromatin conformation changes across conditions. Benchmarking using cell-type-matched Hi-C experiments demonstrates that scENCORE can robustly reconstruct A/B compartments in a cell-type-specific manner. Furthermore, our chromatin confirmation switching studies highlight substantial compartment-switching events that may introduce substantial regulatory and transcriptional changes in psychiatric disease. In summary, scENCORE allows accurate and cost-effective A/B compartment reconstruction to delineate higher-order chromatin structure heterogeneity in complex tissues.

Assuntos

Cromatina , Cromossomos , Cromatina/genética , DNA , Conformação Molecular , Epigênese Genética

20.

KDGene: knowledge graph completion for disease gene prediction using interactional tensor decomposition.

Wang, Xinyan; Yang, Kuo; Jia, Ting; Gu, Fanghui; Wang, Chongyu; Xu, Kuan; Shu, Zixin; Xia, Jianan; Zhu, Qiang; Zhou, Xuezhong.

Brief Bioinform ; 25(3)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38605639

RESUMO

The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating their effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, from which the results are promising to provide valuable references for further wet experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.

Assuntos

Disciplinas das Ciências Biológicas , Reconhecimento Automatizado de Padrão , Algoritmos , Aprendizado de Máquina , Semântica

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA