Search | BVS (Virtual Health Library) Search Portal

1.

Automatic cell-type harmonization and integration across Human Cell Atlas datasets.

Xu, Chuan; Prete, Martin; Webb, Simone; Jardine, Laura; Stewart, Benjamin J; Hoo, Regina; He, Peng; Meyer, Kerstin B; Teichmann, Sarah A.

Cell ; 186(26): 5876-5891.e20, 2023 Dec 21.

Article in English | MEDLINE | ID: mdl-38134877

ABSTRACT

Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ~3.7 million cells and various machine learning models for automatic cell annotation across human tissues.

Subject(s)

Gene Expression Profiling , Transcriptome , Humans , Databases, Factual , Single-Cell Analysis

2.

Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations.

Qin, Peng; Lu, Hongwei; Du, Huilong; Wang, Hao; Chen, Weilan; Chen, Zhuo; He, Qiang; Ou, Shujun; Zhang, Hongyu; Li, Xuanzhao; Li, Xiuxiu; Li, Yan; Liao, Yi; Gao, Qiang; Tu, Bin; Yuan, Hua; Ma, Bingtian; Wang, Yuping; Qian, Yangwen; Fan, Shijun; Li, Weitao; Wang, Jing; He, Min; Yin, Junjie; Li, Ting; Jiang, Ning; Chen, Xuewei; Liang, Chengzhi; Li, Shigui.

Cell ; 184(13): 3542-3558.e16, 2021 Jun 24.

Article in English | MEDLINE | ID: mdl-34051138

ABSTRACT

Structural variations (SVs) and gene copy number variations (gCNVs) have contributed to crop evolution, domestication, and improvement. Here, we assembled 31 high-quality genomes of genetically diverse rice accessions. Coupling with two existing assemblies, we developed pan-genome-scale genomic resources including a graph-based genome, providing access to rice genomic variations. Specifically, we discovered 171,072 SVs and 25,549 gCNVs and used an Oryza glaberrima assembly to infer the derived states of SVs in the Oryza sativa population. Our analyses of SV formation mechanisms, impacts on gene expression, and distributions among subpopulations illustrate the utility of these resources for understanding how SVs and gCNVs shaped rice environmental adaptation and domestication. Our graph-based genome enabled genome-wide association study (GWAS)-based identification of phenotype-associated genetic variations undetectable when using only SNPs and a single reference assembly. Our work provides rich population-scale resources paired with easy-to-access tools to facilitate rice breeding as well as plant functional genomics and evolutionary biology research.

Subject(s)

Ecotype , Genetic Variation , Genome, Plant , Oryza/genetics , Adaptation, Physiological/genetics , Agriculture , Domestication , Gene Expression Profiling , Gene Expression Regulation, Plant , Genes, Plant , Genomic Structural Variation , Molecular Sequence Annotation , Phenotype

3.

Pan-Genome of Wild and Cultivated Soybeans.

Liu, Yucheng; Du, Huilong; Li, Pengcheng; Shen, Yanting; Peng, Hua; Liu, Shulin; Zhou, Guo-An; Zhang, Haikuan; Liu, Zhi; Shi, Miao; Huang, Xuehui; Li, Yan; Zhang, Min; Wang, Zheng; Zhu, Baoge; Han, Bin; Liang, Chengzhi; Tian, Zhixi.

Cell ; 182(1): 162-176.e13, 2020 Jul 09.

Article in English | MEDLINE | ID: mdl-32553274

ABSTRACT

Soybean is one of the most important vegetable oil and protein feed crops. To capture the entire genomic diversity, it is needed to construct a complete high-quality pan-genome from diverse soybean accessions. In this study, we performed individual de novo genome assemblies for 26 representative soybeans that were selected from 2,898 deeply sequenced accessions. Using these assembled genomes together with three previously reported genomes, we constructed a graph-based genome and performed pan-genome analysis, which identified numerous genetic variations that cannot be detected by direct mapping of short sequence reads onto a single reference genome. The structural variations from the 2,898 accessions that were genotyped based on the graph-based genome and the RNA sequencing (RNA-seq) data from the representative 26 accessions helped to link genetic variations to candidate genes that are responsible for important traits. This pan-genome resource will promote evolutionary and functional genomics studies in soybean.

Subject(s)

Genome, Plant , Glycine max/growth & development , Glycine max/genetics , Base Sequence , Chromosomes, Plant/genetics , Domestication , Ecotype , Gene Duplication , Gene Expression Regulation, Plant , Gene Fusion , Geography , Molecular Sequence Annotation , Phylogeny , Polymorphism, Single Nucleotide/genetics , Polyploidy

4.

Ornaments for efficient allele-specific expression estimation with bias correction.

Adduri, Abhinav; Kim, Seyoung.

Am J Hum Genet ; 111(8): 1770-1781, 2024 Aug 08.

Article in English | MEDLINE | ID: mdl-39047729

ABSTRACT

Allele-specific expression plays a crucial role in unraveling various biological mechanisms, including genomic imprinting and gene expression controlled by cis-regulatory variants. However, existing methods for quantification from RNA-sequencing (RNA-seq) reads do not adequately and efficiently remove various allele-specific read mapping biases, such as reference bias arising from reads containing the alternative allele that do not map to the reference transcriptome or ambiguous mapping bias caused by reads containing the reference allele that map differently from reads containing the alternative allele. We present Ornaments, a computational tool for rapid and accurate estimation of allele-specific transcript expression at unphased heterozygous loci from RNA-seq reads while correcting for allele-specific read mapping biases. Ornaments removes reference bias by mapping reads to a personalized transcriptome and ambiguous mapping bias by probabilistically assigning reads to multiple transcripts and variant loci they map to. Ornaments is a lightweight extension of kallisto, a popular tool for fast RNA-seq quantification, that improves the efficiency and accuracy of WASP, a popular tool for bias correction in allele-specific read mapping. In experiments with simulated and human lymphoblastoid cell-line RNA-seq reads with the genomes of the 1000 Genomes Project, we demonstrate that Ornaments improves the accuracy of WASP and kallisto, is nearly as efficient as kallisto, and is an order of magnitude faster than WASP per sample, with the additional cost of constructing a personalized index for multiple samples. Additionally, we show that Ornaments finds imprinted transcripts with higher sensitivity than WASP, which detects imprinted signals only at gene level.

Subject(s)

Alleles , Humans , Transcriptome/genetics , Genomic Imprinting , Sequence Analysis, RNA/methods , Software , Gene Expression Profiling/methods

5.

Human adolescent brain similarity development is different for paralimbic versus neocortical zones.

Dorfschmidt, Lena; Vása, Frantisek; White, Simon R; Romero-García, Rafael; Kitzbichler, Manfred G; Alexander-Bloch, Aaron; Cieslak, Matthew; Mehta, Kahini; Satterthwaite, Theodore D; Bethlehem, Richard A I; Seidlitz, Jakob; Vértes, Petra E; Bullmore, Edward T.

Proc Natl Acad Sci U S A ; 121(33): e2314074121, 2024 Aug 13.

Article in English | MEDLINE | ID: mdl-39121162

ABSTRACT

Adolescent development of human brain structural and functional networks is increasingly recognized as fundamental to emergence of typical and atypical adult cognitive and emotional processes. We analyzed multimodal magnetic resonance imaging (MRI) data collected from N [Formula: see text] 300 healthy adolescents (51% female; 14 to 26 y) each scanned repeatedly in an accelerated longitudinal design, to provide an analyzable dataset of 469 structural scans and 448 functional MRI scans. We estimated the morphometric similarity between each possible pair of 358 cortical areas on a feature vector comprising six macro- and microstructural MRI metrics, resulting in a morphometric similarity network (MSN) for each scan. Over the course of adolescence, we found that morphometric similarity increased in paralimbic cortical areas, e.g., insula and cingulate cortex, but generally decreased in neocortical areas, and these results were replicated in an independent developmental MRI cohort (N [Formula: see text] 304). Increasing hubness of paralimbic nodes in MSNs was associated with increased strength of coupling between their morphometric similarity and functional connectivity. Decreasing hubness of neocortical nodes in MSNs was associated with reduced strength of structure-function coupling and increasingly diverse functional connections in the corresponding fMRI networks. Neocortical areas became more structurally differentiated and more functionally integrative in a metabolically expensive process linked to cortical thinning and myelination, whereas paralimbic areas specialized for affective and interoceptive functions became less differentiated, as hypothetically predicted by a developmental transition from periallocortical to proisocortical organization of the cortex. Cytoarchitectonically distinct zones of the human cortex undergo distinct neurodevelopmental programs during typical adolescence.

Subject(s)

Magnetic Resonance Imaging , Neocortex , Humans , Adolescent , Female , Male , Neocortex/diagnostic imaging , Neocortex/growth & development , Neocortex/physiology , Adult , Young Adult , Brain Mapping/methods , Adolescent Development/physiology , Nerve Net/physiology , Nerve Net/diagnostic imaging , Nerve Net/growth & development , Brain/diagnostic imaging , Brain/growth & development , Brain/physiology
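
As an illustration of the morphometric similarity network (MSN) idea in the abstract of entry 5, the following minimal Python sketch computes a region-by-region similarity matrix from per-region feature vectors. It assumes that similarity is the Pearson correlation between z-scored feature vectors and that "hubness" can be approximated by the mean off-diagonal similarity of a node; both choices, the function names, and the random toy data (5 regions, 6 features) are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def zscore_columns(rows):
    """Standardize each feature (column) across regions to mean 0, sd 1."""
    n = len(rows)
    standardized_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        standardized_cols.append([(v - mean) / sd for v in col])
    # Transpose back to one row per region.
    return [list(r) for r in zip(*standardized_cols)]


def pearson(a, b):
    """Pearson correlation between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def morphometric_similarity(features):
    """Return a symmetric region-by-region similarity matrix."""
    z = zscore_columns(features)
    n = len(z)
    msn = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            msn[i][j] = msn[j][i] = pearson(z[i], z[j])
    return msn


def nodal_strength(msn):
    """Mean off-diagonal similarity per node (a simple hubness proxy)."""
    n = len(msn)
    return [(sum(row) - row[i]) / (n - 1) for i, row in enumerate(msn)]


# Hypothetical toy data: 5 regions, each with 6 MRI-derived features.
random.seed(0)
toy_features = [[random.gauss(0, 1) for _ in range(6)] for _ in range(5)]
msn = morphometric_similarity(toy_features)
strength = nodal_strength(msn)
```

With real data, `features` would hold one row per cortical area (358 in the study) and one column per MRI metric, and the matrix would be computed once per scan.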

6.

Link prediction using low-dimensional node embeddings: The measurement problem.

Menand, Nicolas; Seshadhri, C.

Proc Natl Acad Sci U S A ; 121(8): e2312527121, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38363864

ABSTRACT

Graph representation learning is a fundamental technique for machine learning (ML) on complex networks. Given an input network, these methods represent the vertices by low-dimensional real-valued vectors. These vectors can be used for a multitude of downstream ML tasks. We study one of the most important such tasks, link prediction. Much of the recent literature on graph representation learning has shown remarkable success in link prediction. On closer investigation, we observe that the performance is measured by the AUC (area under the curve), which suffers from biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations show poor scores. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
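
To make the contrast between a global AUC and a vertex-centric score concrete, here is a minimal Python sketch that scores candidate links by dot product and evaluates them in both ways. The per-vertex precision@k below is a simplified stand-in chosen for illustration; it is not the paper's VCMPR@k definition, which the abstract does not spell out. The toy embeddings, the graph, and all function names are hypothetical.

```python
def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_auc(emb, neighbors):
    """AUC over all vertex pairs: P(score of true edge > score of non-edge)."""
    pos, neg = [], []
    n = len(emb)
    for u in range(n):
        for v in range(u + 1, n):
            score = dot(emb[u], emb[v])
            (pos if v in neighbors[u] else neg).append(score)
    # Count wins, with ties contributing one half.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(emb, neighbors, v, k):
    """Fraction of the k highest-scoring candidates for v that are true neighbors."""
    scored = [(dot(emb[v], emb[u]), u) for u in range(len(emb)) if u != v]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by index
    top = [u for _, u in scored[:k]]
    return sum(1 for u in top if u in neighbors[v]) / k


# Hypothetical toy graph: two disjoint edges, 0-1 and 2-3.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
neighbors = {0: {1}, 1: {0}, 2: {3}, 3: {2}}

auc = pairwise_auc(embeddings, neighbors)
p_at_1 = precision_at_k(embeddings, neighbors, v=0, k=1)
p_at_2 = precision_at_k(embeddings, neighbors, v=0, k=2)
```

On this tiny example both measures agree; the abstract's point is that on large sparse graphs the two can diverge sharply, which is why a per-vertex view is worth computing alongside the AUC.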

7.

Renormalization group analysis of the Anderson model on random regular graphs.

Vanoni, Carlo; Altshuler, Boris L; Kravtsov, Vladimir E; Scardicchio, Antonello.

Proc Natl Acad Sci U S A ; 121(29): e2401955121, 2024 Jul 16.

Article in English | MEDLINE | ID: mdl-38990943

ABSTRACT

We present a renormalization group (RG) analysis of the problem of Anderson localization on a random regular graph (RRG) which generalizes the RG of Abrahams, Anderson, Licciardello, and Ramakrishnan to infinite-dimensional graphs. The RG equations necessarily involve two parameters (one being the changing connectivity of subtrees), but we show that the one-parameter scaling hypothesis is recovered for sufficiently large system sizes for both eigenstates and spectrum observables. We also explain the nonmonotonic behavior of dynamical and spectral quantities as a function of the system size for values of disorder close to the transition, by identifying two terms in the beta function of the running fractal dimension of different signs and functional dependence. Our theory provides a simple and coherent explanation for the unusual scaling behavior observed in numerical data of the Anderson model on RRG and of many-body localization.

8.

Message-Passing Monte Carlo: Generating low-discrepancy point sets via graph neural networks.

Rusch, T Konstantin; Kirk, Nathan; Bronstein, Michael M; Lemieux, Christiane; Rus, Daniela.

Proc Natl Acad Sci U S A ; 121(40): e2409913121, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39325425

ABSTRACT

Discrepancy is a well-known measure for the irregularity of the distribution of a point set. Point sets with small discrepancy are called low discrepancy and are known to efficiently fill the space in a uniform manner. Low-discrepancy points play a central role in many problems in science and engineering, including numerical integration, computer vision, machine perception, computer graphics, machine learning, and simulation. In this work, we present a machine learning approach to generate a new class of low-discrepancy point sets named Message-Passing Monte Carlo (MPMC) points. Motivated by the geometric nature of generating low-discrepancy point sets, we leverage tools from Geometric Deep Learning and base our model on graph neural networks. We further provide an extension of our framework to higher dimensions, which flexibly allows the generation of custom-made points that emphasize the uniformity in specific dimensions that are primarily important for the particular problem at hand. Finally, we demonstrate that our proposed model achieves state-of-the-art performance superior to previous methods by a significant margin. In fact, MPMC points are empirically shown to be either optimal or near-optimal with respect to the discrepancy for low dimension and small number of points, i.e., for which the optimal discrepancy can be determined.
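
For readers unfamiliar with discrepancy, the following minimal Python sketch evaluates one common variant, the L2-star discrepancy, using Warnock's closed-form formula, and compares a small Halton point set with a deliberately clumped one. The choice of the L2-star variant is an assumption made for illustration (the abstract does not say which discrepancy the authors optimize), and the function names and example point sets are hypothetical. This does not implement the MPMC model itself.

```python
import math


def l2_star_discrepancy(points):
    """L2-star discrepancy of points in [0, 1]^d via Warnock's formula."""
    n = len(points)
    d = len(points[0])
    term1 = 3.0 ** (-d)
    term2 = (2.0 ** (1 - d) / n) * sum(
        math.prod(1.0 - x * x for x in p) for p in points
    )
    term3 = sum(
        math.prod(1.0 - max(a, b) for a, b in zip(p, q))
        for p in points
        for q in points
    ) / (n * n)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, term1 - term2 + term3))


def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv


# 16 points of the 2D Halton sequence (bases 2 and 3) versus 16 copies of one point.
halton = [(radical_inverse(i, 2), radical_inverse(i, 3)) for i in range(1, 17)]
clumped = [(0.1, 0.1)] * 16

d_halton = l2_star_discrepancy(halton)
d_clumped = l2_star_discrepancy(clumped)
```

A lower value indicates a more uniform fill of the unit cube, so `d_halton` comes out well below `d_clumped`.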

9.

Homophily modulates double descent generalization in graph convolution networks.

Shi, Cheng; Pan, Liming; Hu, Hong; Dokmanic, Ivan.

Proc Natl Acad Sci U S A ; 121(8): e2309504121, 2024 Feb 20.

Article in English | MEDLINE | ID: mdl-38346190

ABSTRACT

Graph neural networks (GNNs) excel in modeling relational data such as biological, social, and transportation networks, but the underpinnings of their success are not well understood. Traditional complexity measures from statistical learning theory fail to account for observed phenomena like the double descent or the impact of relational semantics on generalization error. Motivated by experimental observations of "transductive" double descent in key networks and datasets, we use analytical tools from statistical physics and random matrix theory to precisely characterize generalization in simple graph convolution networks on the contextual stochastic block model. Our results illuminate the nuances of learning on homophilic versus heterophilic data and predict double descent whose existence in GNNs has been questioned by recent work. We show how risk is shaped by the interplay between the graph noise, feature noise, and the number of training labels. Our findings apply beyond stylized models, capturing qualitative trends in real-world GNNs and datasets. As a case in point, we use our analytic insights to improve performance of state-of-the-art graph convolution networks on heterophilic datasets.

10.

Tree-based QTL mapping with expected local genetic relatedness matrices.

Link, Vivian; Schraiber, Joshua G; Fan, Caoqi; Dinh, Bryan; Mancuso, Nicholas; Chiang, Charleston W K; Edge, Michael D.

Am J Hum Genet ; 110(12): 2077-2091, 2023 Dec 07.

Article in English | MEDLINE | ID: mdl-38065072

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.

Subject(s)

Genetics, Population , Genome-Wide Association Study , Quantitative Trait Loci , Humans , Chromosome Mapping/methods , Models, Genetic , Phenotype , Quantitative Trait Loci/genetics , Native Hawaiian or Other Pacific Islander/genetics

11.

ReHoGCNES-MDA: prediction of miRNA-disease associations using homogenous graph convolutional networks based on regular graph with random edge sampler.

Zhang, Yufang; Chu, Yanyi; Lin, Shenggeng; Xiong, Yi; Wei, Dong-Qing.

Brief Bioinform ; 25(2), 2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38517693

ABSTRACT

Numerous investigations increasingly indicate the significance of microRNA (miRNA) in human diseases. Hence, unearthing associations between miRNA and diseases can contribute to precise diagnosis and efficacious remediation of medical conditions. The detection of miRNA-disease linkages via computational techniques utilizing biological information has emerged as a cost-effective and highly efficient approach. Here, we introduced a computational framework named ReHoGCNES, designed for prospective miRNA-disease association prediction (ReHoGCNES-MDA). This method constructs homogenous graph convolutional network with regular graph structure (ReHoGCN) encompassing disease similarity network, miRNA similarity network and known MDA network and then was tested on four experimental tasks. A random edge sampler strategy was utilized to expedite processes and diminish training complexity. Experimental results demonstrate that the proposed ReHoGCNES-MDA method outperforms both homogenous graph convolutional network and heterogeneous graph convolutional network with non-regular graph structure in all four tasks, which implicitly reveals steadily degree distribution of a graph does play an important role in enhancement of model performance. Besides, ReHoGCNES-MDA is superior to several machine learning algorithms and state-of-the-art methods on the MDA prediction. Furthermore, three case studies were conducted to further demonstrate the predictive ability of ReHoGCNES. Consequently, 93.3% (breast neoplasms), 90% (prostate neoplasms) and 93.3% (prostate neoplasms) of the top 30 forecasted miRNAs were validated by public databases. Hence, ReHoGCNES-MDA might serve as a dependable and beneficial model for predicting possible MDAs.

Subject(s)

MicroRNAs , Prostatic Neoplasms , Humans , Male , Algorithms , Computational Biology/methods , Databases, Genetic , MicroRNAs/genetics , Prospective Studies , Prostatic Neoplasms/genetics , Female

12.

GGN-GO: geometric graph networks for predicting protein function by multi-scale structure features.

Mi, Jia; Wang, Han; Li, Jing; Sun, Jinghong; Li, Chang; Wan, Jing; Zeng, Yuan; Gao, Jingyang.

Brief Bioinform ; 25(6), 2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39487084

ABSTRACT

Recent advances in high-throughput sequencing have led to an explosion of genomic and transcriptomic data, offering a wealth of protein sequence information. However, the functions of most proteins remain unannotated. Traditional experimental methods for annotation of protein functions are costly and time-consuming. Current deep learning methods typically rely on Graph Convolutional Networks to propagate features between protein residues. However, these methods fail to capture fine atomic-level geometric structural features and cannot directly compute or propagate structural features (such as distances, directions, and angles) when transmitting features, often simplifying them to scalars. Additionally, difficulties in capturing long-range dependencies limit the model's ability to identify key nodes (residues). To address these challenges, we propose a geometric graph network (GGN-GO) for predicting protein function that enriches feature extraction by capturing multi-scale geometric structural features at the atomic and residue levels. We use a geometric vector perceptron to convert these features into vector representations and aggregate them with node features for better understanding and propagation in the network. Moreover, we introduce a graph attention pooling layer that captures key node information by adaptively aggregating local functional motifs, while contrastive learning enhances graph representation discriminability through random noise and different views. The experimental results show that GGN-GO outperforms six comparative methods in tasks with the most labels for both experimentally validated and predicted protein structures. Furthermore, GGN-GO identifies functional residues corresponding to those experimentally confirmed, showcasing its interpretability and the ability to pinpoint key protein regions. The code and data are available at: https://github.com/MiJia-ID/GGN-GO.

Subject(s)

Computational Biology , Proteins , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Computational Biology/methods , Neural Networks, Computer , Algorithms , Deep Learning , Databases, Protein

13.

A multi-view graph contrastive learning framework for deciphering spatially resolved transcriptomics data.

Zhang, Lei; Liang, Shu; Wan, Lin.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38801701

ABSTRACT

Spatially resolved transcriptomics data are being used in a revolutionary way to decipher the spatial pattern of gene expression and the spatial architecture of cell types. Much work has been done to exploit the genomic spatial architectures of cells. Such work is based on the common assumption that gene expression profiles of spatially adjacent spots are more similar than those of more distant spots. However, related work might not consider the nonlocal spatial co-expression dependency, which can better characterize the tissue architectures. Therefore, we propose MuCoST, a Multi-view graph Contrastive learning framework for deciphering complex Spatially resolved Transcriptomic architectures with dual scale structural dependency. To achieve this, we employ spot dependency augmentation by fusing gene expression correlation and spatial location proximity, thereby enabling MuCoST to model both nonlocal spatial co-expression dependency and spatially adjacent dependency. We benchmark MuCoST on four datasets, and we compare it with other state-of-the-art spatial domain identification methods. We demonstrate that MuCoST achieves the highest accuracy on spatial domain identification from various datasets. In particular, MuCoST accurately deciphers subtle biological textures and elaborates the variation of spatially functional patterns.

Subject(s)

Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Humans , Algorithms , Machine Learning , Computational Biology/methods

14.

Attention-guided variational graph autoencoders reveal heterogeneity in spatial transcriptomics.

Lei, Lixin; Han, Kaitai; Wang, Zijun; Shi, Chaojing; Wang, Zhenghui; Dai, Ruoyan; Zhang, Zhiwei; Wang, Mengqiu; Guo, Qianjin.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38627939

ABSTRACT

The latest breakthroughs in spatially resolved transcriptomics technology offer comprehensive opportunities to delve into gene expression patterns within the tissue microenvironment. However, the precise identification of spatial domains within tissues remains challenging. In this study, we introduce AttentionVGAE (AVGN), which integrates slice images, spatial information and raw gene expression while calibrating low-quality gene expression. By combining the variational graph autoencoder with multi-head attention blocks (MHA blocks), AVGN captures spatial relationships in tissue gene expression, adaptively focusing on key features and alleviating the need for prior knowledge of cluster numbers, thereby achieving superior clustering performance. Particularly, AVGN attempts to balance the model's attention focus on local and global structures by utilizing MHA blocks, an aspect that current graph neural networks have not extensively addressed. Benchmark testing demonstrates its significant efficacy in elucidating tissue anatomy and interpreting tumor heterogeneity, indicating its potential in advancing spatial transcriptomics research and understanding complex biological phenomena.

Subject(s)

Benchmarking , Gene Expression Profiling , Cluster Analysis , Neural Networks, Computer

15.

SEGCECO: Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication.

Vasighizaker, Akram; Hora, Sheena; Zeng, Raymond; Rueda, Luis.

Brief Bioinform ; 25(3), 2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38605638

ABSTRACT

Recent advances in single-cell RNA sequencing technology have eased analyses of signaling networks of cells. Recently, cell-cell interaction has been studied based on various link prediction approaches on graph-structured data. These approaches have assumptions about the likelihood of node interaction, thus showing high performance for only some specific networks. Subgraph-based methods have solved this problem and outperformed other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, called Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures the latent and explicit attributes of undirected, attributed graphs constructed from the gene expression profile of individual cells. High-dimensional and sparse single-cell RNA-seq data make converting the data into a graphical format a daunting task. We successfully overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell communication network is built using a cell-cell similarity matrix which is learned from gene expression data. We performed experiments on six datasets extracted from the human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches, and the state-of-the-art method for link prediction, WLNM, with 0.99 ROC and 99% prediction accuracy. The datasets can be found at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84133 and the code is publicly available at Github https://github.com/sheenahora/SEGCECO and Code Ocean https://codeocean.com/capsule/8244724/tree.

Subject(s)

Cell Communication , Signal Transduction , Humans , Animals , Mice , Cell Communication/genetics , Learning , Neural Networks, Computer , Gene Expression

16.

NIPT-PG: empowering non-invasive prenatal testing to learn from population genomics through an incremental pan-genomic approach.

Xue, Zhengfa; Zhou, Aifen; Zhu, Xiaoyan; Li, Linxuan; Zhu, Huanhuan; Jin, Xin; Wang, Jiayin.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38836702

ABSTRACT

Non-invasive prenatal testing (NIPT) is a quite popular approach for detecting fetal genomic aneuploidies. However, due to the limitations on sequencing read length and coverage, NIPT suffers a bottleneck on further improving performance and conducting earlier detection. The errors mainly come from reference biases and population polymorphism. To break this bottleneck, we proposed NIPT-PG, which enables the NIPT algorithm to learn from population data. A pan-genome model is introduced to incorporate variant and polymorphic loci information from tested population. Subsequently, we proposed a sequence-to-graph alignment method, which considers the read mis-match rates during the mapping process, and an indexing method using hash indexing and adjacency lists to accelerate the read alignment process. Finally, by integrating multi-source aligned read and polymorphic sites across the pan-genome, NIPT-PG obtains a more accurate z-score, thereby improving the accuracy of chromosomal aneuploidy detection. We tested NIPT-PG on two simulated datasets and 745 real-world cell-free DNA sequencing data sets from pregnant women. Results demonstrate that NIPT-PG outperforms the standard z-score test. Furthermore, combining experimental and theoretical analyses, we demonstrate the probably approximately correct learnability of NIPT-PG. In summary, NIPT-PG provides a new perspective for fetal chromosomal aneuploidies detection. NIPT-PG may have broad applications in clinical testing, and its detection results can serve as a reference for false positive samples approaching the critical threshold.

Subject(s)

Aneuploidy , Noninvasive Prenatal Testing , Humans , Female , Pregnancy , Noninvasive Prenatal Testing/methods , Algorithms , Genomics/methods , Prenatal Diagnosis/methods , Sequence Analysis, DNA/methods
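
The abstract of entry 16 uses the standard z-score test as its baseline. A minimal Python sketch of that baseline, under stated assumptions, is given here: the fraction of reads mapped to a chromosome in the test sample is compared with the mean and standard deviation of the same fraction in a euploid reference set. The read fractions, the cutoff of 3, and the function names are hypothetical values chosen for illustration; this is not the NIPT-PG algorithm.

```python
import statistics


def chromosome_z_score(test_fraction, reference_fractions):
    """z-score of a sample's chromosome read fraction against euploid references."""
    mean = statistics.mean(reference_fractions)
    sd = statistics.stdev(reference_fractions)  # sample standard deviation
    return (test_fraction - mean) / sd


def flag_trisomy(z, threshold=3.0):
    """Flag a sample whose z-score exceeds the (assumed) cutoff."""
    return z > threshold


# Hypothetical fraction of reads on chromosome 21 in six euploid reference samples.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

z_elevated = chromosome_z_score(0.0136, reference)  # clearly elevated fraction
z_typical = chromosome_z_score(0.0131, reference)   # within the reference spread
```

Samples whose z-score lies close to the cutoff are the borderline cases that the abstract says NIPT-PG results could help interpret.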

17.

Accurately deciphering spatial domains for spatially resolved transcriptomics with stCluster.

Wang, Tao; Shu, Han; Hu, Jialu; Wang, Yongtian; Chen, Jing; Peng, Jiajie; Shang, Xuequn.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-38975895

ABSTRACT

Spatial transcriptomics provides valuable insights into gene expression within the native tissue context, effectively merging molecular data with spatial information to uncover intricate cellular relationships and tissue organizations. In this context, deciphering cellular spatial domains becomes essential for revealing complex cellular dynamics and tissue structures. However, current methods encounter challenges in seamlessly integrating gene expression data with spatial information, resulting in less informative representations of spots and suboptimal accuracy in spatial domain identification. We introduce stCluster, a novel method that integrates graph contrastive learning with multi-task learning to refine informative representations for spatial transcriptomic data, consequently improving spatial domain identification. stCluster first leverages graph contrastive learning technology to obtain discriminative representations capable of recognizing spatially coherent patterns. Through jointly optimizing multiple tasks, stCluster further fine-tunes the representations to be able to capture complex relationships between gene expression and spatial organization. Benchmarked against six state-of-the-art methods, the experimental results reveal its proficiency in accurately identifying complex spatial domains across various datasets and platforms, spanning tissue, organ, and embryo levels. Moreover, stCluster can effectively denoise the spatial gene expression patterns and enhance the spatial trajectory inference. The source code of stCluster is freely available at https://github.com/hannshu/stCluster.

Subject(s)

Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Computational Biology/methods , Algorithms , Humans , Animals , Software , Machine Learning

18.

scMGATGRN: a multiview graph attention network-based method for inferring gene regulatory networks from single-cell transcriptomic data.

Yuan, Lin; Zhao, Ling; Jiang, Yufeng; Shen, Zhen; Zhang, Qinhu; Zhang, Ming; Zheng, Chun-Hou; Huang, De-Shuang.

Brief Bioinform ; 25(6), 2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39417321

ABSTRACT

The gene regulatory network (GRN) plays a vital role in understanding the structure and dynamics of cellular systems, revealing complex regulatory relationships, and exploring disease mechanisms. Recently, deep learning (DL)-based methods have been proposed to infer GRNs from single-cell transcriptomic data and achieved impressive performance. However, these methods do not fully utilize graph topological information and high-order neighbor information from multiple receptive fields. To overcome those limitations, we propose a novel model based on multiview graph attention network, namely, scMGATGRN, to infer GRNs. scMGATGRN mainly consists of GAT, multiview, and view-level attention mechanism. GAT can extract essential features of the gene regulatory network. The multiview model can simultaneously utilize local feature information and high-order neighbor feature information of nodes in the gene regulatory network. The view-level attention mechanism dynamically adjusts the relative importance of node embedding representations and efficiently aggregates node embedding representations from two views. To verify the effectiveness of scMGATGRN, we compared its performance with 10 methods (five shallow learning algorithms and five state-of-the-art DL-based methods) on seven benchmark single-cell RNA sequencing (scRNA-seq) datasets from five cell lines (two in human and three in mouse) with four different kinds of ground-truth networks. The experimental results not only show that scMGATGRN outperforms competing methods but also demonstrate the potential of this model in inferring GRNs. The code and data of scMGATGRN are made freely available on GitHub (https://github.com/nathanyl/scMGATGRN).

Subject(s)

Gene Regulatory Networks , Single-Cell Analysis , Transcriptome , Single-Cell Analysis/methods , Humans , Computational Biology/methods , Algorithms , Deep Learning , Gene Expression Profiling/methods , Mice

19.

Graph contrastive learning as a versatile foundation for advanced scRNA-seq data analysis.

Zhang, Zhenhao; Liu, Yuxi; Xiao, Meichen; Wang, Kun; Huang, Yu; Bian, Jiang; Yang, Ruolin; Li, Fuyi.

Brief Bioinform ; 25(6), 2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39487083

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) offers unprecedented insights into transcriptome-wide gene expression at the single-cell level. Cell clustering has been long established in the analysis of scRNA-seq data to identify the groups of cells with similar expression profiles. However, cell clustering is technically challenging, as raw scRNA-seq data have various analytical issues, including high dimensionality and dropout values. Existing research has developed deep learning models, such as graph machine learning models and contrastive learning-based models, for cell clustering using scRNA-seq data and has summarized the unsupervised learning of cell clustering into a human-interpretable format. While advances in cell clustering have been profound, we are no closer to finding a simple yet effective framework for learning high-quality representations necessary for robust clustering. In this study, we propose scSimGCL, a novel framework based on the graph contrastive learning paradigm for self-supervised pretraining of graph neural networks. This framework facilitates the generation of high-quality representations crucial for cell clustering. Our scSimGCL incorporates cell-cell graph structure and contrastive learning to enhance the performance of cell clustering. Extensive experimental results on simulated and real scRNA-seq datasets suggest the superiority of the proposed scSimGCL. Moreover, clustering assignment analysis confirms the general applicability of scSimGCL, including state-of-the-art clustering algorithms. Further, ablation study and hyperparameter analysis suggest the efficacy of our network architecture with the robustness of decisions in the self-supervised learning setting. The proposed scSimGCL can serve as a robust framework for practitioners developing tools for cell clustering. The source code of scSimGCL is publicly available at https://github.com/zhangzh1328/scSimGCL.

Subject(s)

Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Cluster Analysis , RNA-Seq/methods , Sequence Analysis, RNA/methods , Algorithms , Machine Learning , Computational Biology/methods , Neural Networks, Computer , Gene Expression Profiling/methods , Software , Transcriptome , Single-Cell Gene Expression Analysis

20.

HTINet2: herb-target prediction via knowledge graph embedding and residual-like graph neural network.

Duan, Pengbo; Yang, Kuo; Su, Xin; Fan, Shuyue; Dong, Xin; Zhang, Fenghui; Li, Xianan; Xing, Xiaoyan; Zhu, Qiang; Yu, Jian; Zhou, Xuezhong.

Brief Bioinform ; 25(5), 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39175133

ABSTRACT

Target identification is one of the crucial tasks in drug research and development, as it aids in uncovering the action mechanism of herbs/drugs and discovering new therapeutic targets. Although multiple algorithms of herb target prediction have been proposed, due to the incompleteness of clinical knowledge and the limitation of unsupervised models, accurate identification for herb targets still faces huge challenges of data and models. To address this, we proposed a deep learning-based target prediction framework termed HTINet2, which designed three key modules, namely, traditional Chinese medicine (TCM) and clinical knowledge graph embedding, residual graph representation learning, and supervised target prediction. In the first module, we constructed a large-scale knowledge graph that covers the TCM properties and clinical treatment knowledge of herbs, and designed a component of deep knowledge embedding to learn the deep knowledge embedding of herbs and targets. In the remaining two modules, we designed a residual-like graph convolution network to capture the deep interactions among herbs and targets, and a Bayesian personalized ranking loss to conduct supervised training and target prediction. Finally, we designed comprehensive experiments, of which comparison with baselines indicated the excellent performance of HTINet2 (HR@10 increased by 122.7% and NDCG@10 by 35.7%), ablation experiments illustrated the positive effect of our designed modules of HTINet2, and case study demonstrated the reliability of the predicted targets of Artemisia annua and Coptis chinensis based on the knowledge base, literature, and molecular docking.

Subject(s)

Drugs, Chinese Herbal , Medicine, Chinese Traditional , Neural Networks, Computer , Drugs, Chinese Herbal/chemistry , Drugs, Chinese Herbal/pharmacology , Algorithms , Humans , Deep Learning , Bayes Theorem
