Search | BVS Infectious and Parasitic Diseases

1.

The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation.

Whittington, James C R; Muller, Timothy H; Mark, Shirley; Chen, Guifen; Barry, Caswell; Burgess, Neil; Behrens, Timothy E J.

Cell ; 183(5): 1249-1263.e23, 2020 11 25.

Article in English | MEDLINE | ID: mdl-33181068

ABSTRACT

The hippocampal-entorhinal system is important for spatial and relational memory tasks. We formally link these domains, provide a mechanistic understanding of the hippocampal role in generalization, and offer unifying principles underlying many entorhinal and hippocampal cell types. We propose that medial entorhinal cells form a basis describing structural knowledge, and that hippocampal cells link this basis with sensory representations. Adopting these principles, we introduce the Tolman-Eichenbaum machine (TEM). After learning, TEM entorhinal cells display diverse properties resembling apparently bespoke spatial responses, such as grid, band, border, and object-vector cells. TEM hippocampal cells include place and landmark cells that remap between environments. Crucially, TEM also aligns with empirically recorded representations in complex non-spatial tasks. TEM also predicts that hippocampal remapping is not random, as previously believed; rather, structural knowledge is preserved across environments. We confirm this structural transfer over remapping in simultaneously recorded place and grid cells.

Subjects

Entorhinal Cortex/physiology, Generalization, Psychological, Hippocampus/physiology, Memory/physiology, Models, Neurological, Animals, Knowledge, Place Cells/cytology, Sensation, Task Performance and Analysis

2.

Human Representation Learning.

Radulescu, Angela; Shin, Yeon Soon; Niv, Yael.

Annu Rev Neurosci ; 44: 253-273, 2021 07 08.

Article in English | MEDLINE | ID: mdl-33730510

ABSTRACT

The central theme of this review is the dynamic interaction between information selection and learning. We pose a fundamental question about this interaction: How do we learn what features of our experiences are worth learning about? In humans, this process depends on attention and memory, two cognitive functions that together constrain representations of the world to features that are relevant for goal attainment. Recent evidence suggests that the representations shaped by attention and memory are themselves inferred from experience with each task. We review this evidence and place it in the context of work that has explicitly characterized representation learning as statistical inference. We discuss how inference can be scaled to real-world decisions by approximating beliefs based on a small number of experiences. Finally, we highlight some implications of this inference process for human decision-making in social environments.

Subjects

Cognition, Learning, Attention, Humans

3.

Partial order relation-based gene ontology embedding improves protein function prediction.

Li, Wenjing; Wang, Bin; Dai, Jin; Kou, Yan; Chen, Xiaojun; Pan, Yi; Hu, Shuangwei; Xu, Zhenjiang Zech.

Brief Bioinform ; 25(2)2024 Jan 22.

Article in English | MEDLINE | ID: mdl-38446740

ABSTRACT

Protein annotation has long been a challenging task in computational biology. Gene Ontology (GO) has become one of the most popular frameworks to describe protein functions and their relationships. Prediction of a protein annotation with proper GO terms demands high-quality GO term representation learning, which aims to learn a low-dimensional dense vector representation with accompanying semantic meaning for each functional label, also known as embedding. However, existing GO term embedding methods, which mainly take into account ancestral co-occurrence information, have yet to capture the full topological information in the GO directed acyclic graph (DAG). In this study, we propose a novel GO term representation learning method, PO2Vec, to utilize the partial order relationships to improve the GO term representations. Extensive evaluations show that PO2Vec achieves better outcomes than existing embedding methods in a variety of downstream biological tasks. Based on PO2Vec, we further developed a new protein function prediction method, PO2GO, which demonstrates superior performance as measured by multiple metrics and annotation specificity, as well as few-shot prediction capability in the benchmarks. These results suggest that the high-quality representation of GO structure is critical for diverse biological tasks including computational protein annotation.
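
In a GO DAG, one natural reading of the "partial order relationships" is the set of ancestor-descendant pairs implied by the is_a edges. As an illustration only (this is not PO2Vec's embedding procedure), a minimal Python sketch that enumerates those pairs by transitive closure, assuming a hypothetical toy DAG stored as a term-to-parents dictionary:

```python
# Hypothetical toy GO-like DAG: each term maps to its direct parents (is_a edges).
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # a term may have several parents in a DAG
    "D": ["C"],
}


def partial_order_pairs(parents):
    """Return every (descendant, ancestor) pair implied by the DAG,
    i.e. the transitive closure of the child -> parent relation."""
    memo = {}

    def ancestors(term):
        # Memoized depth-first walk up the DAG.
        if term not in memo:
            acc = set()
            for p in parents[term]:
                acc.add(p)
                acc |= ancestors(p)
            memo[term] = acc
        return memo[term]

    return {(t, a) for t in parents for a in ancestors(t)}


pairs = partial_order_pairs(TOY_PARENTS)
```

Each pair (t, a) means t lies below a in the partial order; sibling terms such as A and B are incomparable, so no pair links them in either direction.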

Subjects

Benchmarking, Computational Biology, Gene Ontology, Learning, Molecular Sequence Annotation

4.

scCRT: a contrastive-based dimensionality reduction model for scRNA-seq trajectory inference.

Shi, Yuchen; Wan, Jian; Zhang, Xin; Liang, Tingting; Yin, Yuyu.

Brief Bioinform ; 25(3)2024 Mar 27.

Article in English | MEDLINE | ID: mdl-38701412

ABSTRACT

Trajectory inference is a crucial task in single-cell RNA-sequencing downstream analysis, which can reveal the dynamic processes of biological development, including cell differentiation. Dimensionality reduction is an important step in the trajectory inference process. However, most existing trajectory methods rely on cell features derived from traditional dimensionality reduction methods, such as principal component analysis and uniform manifold approximation and projection. These methods are not specifically designed for trajectory inference and fail to fully leverage prior information from upstream analysis, limiting their performance. Here, we introduce scCRT, a novel dimensionality reduction model for trajectory inference. To utilize prior information to learn accurate cell representations, scCRT integrates two feature learning components: a cell-level pairwise module and a cluster-level contrastive module. The cell-level module focuses on learning accurate cell representations in a reduced-dimensionality space while maintaining the cell-cell positional relationships in the original space. The cluster-level contrastive module uses prior cell state information to aggregate similar cells, preventing excessive dispersion in the low-dimensional space. Experimental findings from 54 real and 81 synthetic datasets, totaling 135 datasets, highlighted the superior performance of scCRT compared with commonly used trajectory inference methods. Additionally, an ablation study revealed that both cell-level and cluster-level modules enhance the model's ability to learn accurate cell features, facilitating cell lineage inference. The source code of scCRT is available at https://github.com/yuchen21-web/scCRT-for-scRNA-seq.
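
The cell-level module aims to keep cell-cell positional relationships from the original space in the reduced space. The abstract does not give scCRT's objective, so as a generic stand-in (not scCRT's loss) the sketch below measures how well a low-dimensional embedding preserves pairwise Euclidean distances, a stress-style criterion; the toy coordinates are hypothetical:

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_stress(high, low):
    """Mean squared difference between pairwise cell-cell distances in the
    original space (`high`) and in the reduced space (`low`).
    A value of 0 means every pairwise distance is preserved exactly."""
    pairs = list(combinations(range(len(high)), 2))
    total = 0.0
    for i, j in pairs:
        d_high = euclidean(high[i], high[j])
        d_low = euclidean(low[i], low[j])
        total += (d_high - d_low) ** 2
    return total / len(pairs)


# Three hypothetical cells in a 3-D expression space, embedded in 1-D two ways.
high = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
faithful = [(0.0,), (5.0,), (10.0,)]   # keeps distances 5, 10, 5
collapsed = [(0.0,), (5.0,), (5.0,)]   # merges the last two cells
```

The faithful embedding scores 0, while collapsing two cells onto one point is penalized; a contrastive, cluster-level term like the one the abstract describes would be added on top of such a term, not replace it.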

Subjects

Algorithms, Single-Cell Gene Expression Analysis, Computational Biology/methods, RNA-Seq/methods, Single-Cell Gene Expression Analysis/methods, Software

5.

Multiview representation learning for identification of novel cancer genes and their causative biological mechanisms.

Yang, Jianye; Fu, Haitao; Xue, Feiyang; Li, Menglu; Wu, Yuyang; Yu, Zhanhui; Luo, Haohui; Gong, Jing; Niu, Xiaohui; Zhang, Wen.

Brief Bioinform ; 25(5)2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39210506

ABSTRACT

Tumorigenesis arises from the dysfunction of cancer genes, leading to uncontrolled cell proliferation through various mechanisms. Establishing a complete cancer gene catalogue will make precision oncology possible. Although existing methods based on graph neural networks (GNN) are effective in identifying cancer genes, they fall short in effectively integrating data from multiple views and interpreting predictive outcomes. To address these shortcomings, an interpretable representation learning framework IMVRL-GCN is proposed to capture both shared and specific representations from multiview data, offering significant insights into the identification of cancer genes. Experimental results demonstrate that IMVRL-GCN outperforms state-of-the-art cancer gene identification methods and several baselines. Furthermore, IMVRL-GCN is employed to identify a total of 74 high-confidence novel cancer genes, and multiview data analysis highlights the pivotal roles of shared, mutation-specific, and structure-specific representations in discriminating distinctive cancer genes. Exploration of the mechanisms behind their discriminative capabilities suggests that shared representations are strongly associated with gene functions, while mutation-specific and structure-specific representations are linked to mutagenic propensity and functional synergy, respectively. Finally, our in-depth analyses of these candidates suggest potential insights for individualized treatments: afatinib could counteract many mutation-driven risks, and targeting interactions with cancer gene SRC is a reasonable strategy to mitigate interaction-induced risks for NR3C1, RXRA, HNF4A, and SP1.

Subjects

Neoplasms, Humans, Neoplasms/genetics, Computational Biology/methods, Neural Networks, Computer, Mutation, Genes, Neoplasm, Hepatocyte Nuclear Factor 4/genetics, Machine Learning

6.

PTBGRP: predicting phage-bacteria interactions with graph representation learning on microbial heterogeneous information network.

Pan, Jie; You, Zhuhong; You, Wencai; Zhao, Tian; Feng, Chenlu; Zhang, Xuexia; Ren, Fengzhi; Ma, Sanxing; Wu, Fan; Wang, Shiwei; Sun, Yanmei.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37742053

ABSTRACT

Identifying potential bacteriophage (phage) candidates to treat bacterial infections plays an essential role in the research of human pathogens. Computational approaches are recognized as a valid way to predict bacteria and their target phages. However, most current methods only use lower-order biological information and do not consider higher-order connectivity patterns, which can help improve predictive accuracy. Therefore, we developed a novel microbial heterogeneous interaction network (MHIN)-based model called PTBGRP to predict new phages for bacterial hosts. Specifically, PTBGRP first constructs an MHIN by integrating phage-bacteria interaction (PBI) and six bacteria-bacteria interaction networks with their biological attributes. Then, different representation learning methods are deployed to extract higher-level biological features and lower-level topological features from MHIN. Finally, PTBGRP employs a deep neural network as the classifier to predict unknown PBI pairs based on the fused biological information. Experimental results demonstrated that PTBGRP achieves the best performance on the corresponding ESKAPE pathogens and PBI dataset when compared with state-of-the-art methods. In addition, case studies of Klebsiella pneumoniae and Staphylococcus aureus further indicate that the consideration of rich heterogeneous information enables PTBGRP to accurately predict PBI from a more comprehensive perspective. The webserver of the PTBGRP predictor is freely available at http://120.77.11.78/PTBGRP/.

Subjects

Bacteriophages, Staphylococcal Infections, Humans, Learning, Bacteria, Neural Networks, Computer

7.

WSGMB: weight signed graph neural network for microbial biomarker identification.

Pan, Shuheng; Jiang, Xinyi; Zhang, Kai.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38084923

ABSTRACT

The stability of the gut microenvironment is inextricably linked to human health, with the onset of many diseases accompanied by dysbiosis of the gut microbiota. It has been reported that there are differences in the microbial community composition between patients and healthy individuals, and many microbes are considered potential biomarkers. Accurately identifying these biomarkers can lead to more precise and reliable clinical decision-making. To improve the accuracy of microbial biomarker identification, this study introduces WSGMB, a computational framework that uses the relative abundance of microbial taxa and health status as inputs. This method has two main contributions: (1) viewing the microbial co-occurrence network as a weighted signed graph and applying graph convolutional neural network techniques for graph classification; (2) designing a new architecture to compute the role transitions of each microbial taxon between health and disease networks, thereby identifying disease-related microbial biomarkers. The weighted signed graph neural network enhances the quality of graph embeddings; quantifying the importance of microbes in different co-occurrence networks better identifies those microbes critical to health. Microbes are ranked according to their importance change scores, and when this score exceeds a set threshold, the microbe is considered a biomarker. This framework's identification performance is validated by comparing the biomarkers identified by WSGMB with actual microbial biomarkers associated with specific diseases from public literature databases. The study tests the proposed computational framework using actual microbial community data from colorectal cancer and Crohn's disease samples. It also compares WSGMB with the most advanced microbial biomarker identification methods. The results show that the WSGMB method outperforms similar approaches in the accuracy of microbial biomarker identification.
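
The final selection step described above (rank taxa by importance change score, keep those above a threshold) can be sketched as follows. How WSGMB computes per-network importance is not reproduced: as an assumption, the change score here is the absolute difference between a taxon's importance in the disease and health networks, and all taxon names, scores and the threshold are hypothetical:

```python
def select_biomarkers(healthy_importance, disease_importance, threshold):
    """Rank taxa by importance change score (descending) and return those
    whose score exceeds `threshold`.

    Assumption: change score = |importance in disease network
    - importance in health network|; WSGMB's own scoring may differ."""
    scores = {
        taxon: abs(disease_importance[taxon] - healthy_importance[taxon])
        for taxon in healthy_importance
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [taxon for taxon, score in ranked if score > threshold]


# Hypothetical importance values for four taxa in the two co-occurrence networks.
healthy = {"taxon_1": 0.10, "taxon_2": 0.40, "taxon_3": 0.30, "taxon_4": 0.25}
disease = {"taxon_1": 0.70, "taxon_2": 0.35, "taxon_3": 0.05, "taxon_4": 0.30}
```

Taking the absolute value treats a taxon that loses importance in disease the same as one that gains it; a signed score would be the alternative if only one direction mattered.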

Subjects

Crohn Disease, Gastrointestinal Microbiome, Microbiota, Humans, Neural Networks, Computer, Biomarkers

8.

Accurate TCR-pMHC interaction prediction using a BERT-based transfer learning method.

Zhang, Jiawei; Ma, Wang; Yao, Hui.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38040492

ABSTRACT

Accurate prediction of TCR-pMHC binding is important for the development of cancer immunotherapies, especially TCR-based agents. Existing algorithms often experience diminished performance when dealing with unseen epitopes, primarily due to the complexity of TCR-pMHC recognition patterns and the scarcity of available training data. We have developed a novel BERT-based deep learning model, 'TCR Antigen Binding Recognition', named TABR-BERT. Leveraging BERT's potent representation learning capabilities, TABR-BERT effectively captures essential information regarding TCR-pMHC interactions from TCR sequences, antigen epitope sequences and epitope-MHC binding. By transferring this knowledge to predict TCR-pMHC recognition, TABR-BERT achieved better results than existing methods in benchmark tests, particularly for unseen epitopes.

Subjects

Algorithms, Receptors, Antigen, T-Cell, Receptors, Antigen, T-Cell/genetics, Protein Binding, Epitopes/metabolism, Machine Learning

9.

SMG: self-supervised masked graph learning for cancer gene identification.

Cui, Yan; Wang, Zhikang; Wang, Xiaoyu; Zhang, Yiwen; Zhang, Ying; Pan, Tong; Zhang, Zhe; Li, Shanshan; Guo, Yuming; Akutsu, Tatsuya; Song, Jiangning.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37950905

ABSTRACT

Cancer genomics is dedicated to elucidating the genes and pathways that contribute to cancer progression and development. Identifying cancer genes (CGs) associated with the initiation and progression of cancer is critical for characterizing molecular-level mechanisms in cancer research. In recent years, the growing availability of high-throughput molecular data and advancements in deep learning technologies have enabled the modelling of complex interactions and topological information within genomic data. Nevertheless, because of the limited labelled data, pinpointing CGs from a multitude of potential mutations remains an exceptionally challenging task. To address this, we propose a novel deep learning framework, termed self-supervised masked graph learning (SMG), which comprises SMG reconstruction (pretext task) and task-specific fine-tuning (downstream task). In the pretext task, the nodes of multi-omic featured protein-protein interaction (PPI) networks are randomly substituted with a defined mask token. The PPI networks are then reconstructed using the graph neural network (GNN)-based autoencoder, which explores the node correlations in a self-prediction manner. In the downstream tasks, the pre-trained GNN encoder embeds the input networks into feature graphs, whereas a task-specific layer proceeds with the final prediction. To assess the performance of the proposed SMG method, benchmarking experiments are performed on three node-level tasks (identification of CGs, essential genes and healthy driver genes) and one graph-level task (identification of disease subnetwork) across eight PPI networks. Benchmarking experiments and performance comparison with existing state-of-the-art methods demonstrate the superiority of SMG on multi-omic feature engineering.
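
The SMG pretext task replaces randomly chosen node features of a PPI network with a mask token and then asks the GNN autoencoder to reconstruct them. A minimal sketch of just the masking step, assuming node features are plain lists and using a fixed mask vector for simplicity (masked graph autoencoders often learn the token instead; the abstract does not say which SMG uses). The function name, mask rate and seed are hypothetical:

```python
import random


def mask_nodes(features, mask_rate, mask_token, seed=0):
    """Replace a random subset of node feature vectors with `mask_token`.

    Returns (masked_features, masked_indices). A reconstruction loss would
    then be computed only at `masked_indices`. The input list is not modified."""
    rng = random.Random(seed)
    n_mask = max(1, int(round(mask_rate * len(features))))
    masked_indices = sorted(rng.sample(range(len(features)), n_mask))
    masked = [list(f) for f in features]  # copy so the originals stay intact
    for i in masked_indices:
        masked[i] = list(mask_token)
    return masked, masked_indices


# Ten hypothetical PPI nodes with 3-dimensional multi-omic features.
feats = [[float(i)] * 3 for i in range(10)]
token = [-1.0, -1.0, -1.0]
masked, idx = mask_nodes(feats, 0.3, token, seed=42)
```

Keeping the original features untouched matters because they are the reconstruction targets for the masked positions.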

Subjects

Neoplasms, Oncogenes, Mutation, Benchmarking, Genes, Essential, Genomics, Neoplasms/genetics

10.

A social theory-enhanced graph representation learning framework for multitask prediction of drug-drug interactions.

Feng, Yue-Hua; Zhang, Shao-Wu; Feng, Yi-Yang; Zhang, Qing-Qing; Shi, Ming-Hui; Shi, Jian-Yu.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36642408

ABSTRACT

Current machine learning-based methods have achieved inspiring predictions in the scenarios of mono-type and multi-type drug-drug interactions (DDIs), but they all ignore enhancive and depressive pharmacological changes triggered by DDIs. In addition, these pharmacological changes are asymmetric since the roles of two drugs in an interaction are different. More importantly, these pharmacological changes imply significant topological patterns among DDIs. To address the above issues, we first leverage Balance theory and Status theory in social networks to reveal the topological patterns among directed pharmacological DDIs, which are modeled as a signed and directed network. Then, we design a novel graph representation learning model named SGRL-DDI (social theory-enhanced graph representation learning for DDI) to realize the multitask prediction of DDIs. The SGRL-DDI model can capture the task-joint information by integrating relation graph convolutional networks with Balance and Status patterns. Moreover, we utilize task-specific deep neural networks to perform two tasks, namely the prediction of enhancive/depressive DDIs and the prediction of directed DDIs. Based on DDI entries collected from DrugBank, the superiority of our model is demonstrated by comparison with other state-of-the-art methods. Furthermore, the ablation study verifies that Balance and Status patterns help characterize directed pharmacological DDIs, and that jointly learning the two tasks provides better DDI representations than learning each task individually. Finally, we demonstrate the practical effectiveness of our model by a version-dependent test, in which 88.47% and 81.38% of the DDIs among newly added entries in the latest release of DrugBank are validated in the two prediction tasks, respectively.
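
Balance theory, which SGRL-DDI borrows from social-network analysis, classifies a triangle of signed edges as balanced when the product of its three signs is positive. A minimal sketch counting balanced and unbalanced triangles in an undirected signed graph; the drug names and signs are hypothetical (+1 read as enhancive, -1 as depressive, purely for illustration), and neither the directed Status theory patterns nor the relation GCN of SGRL-DDI is modeled:

```python
from itertools import combinations


def triad_balance(signed_edges):
    """Count (balanced, unbalanced) triangles in an undirected signed graph.

    `signed_edges` maps frozenset({u, v}) -> +1 or -1. Under structural
    balance theory a triangle is balanced when the product of its three
    edge signs is positive."""
    nodes = sorted({n for edge in signed_edges for n in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset({a, b}), frozenset({b, c}), frozenset({a, c})]
        if not all(e in signed_edges for e in edges):
            continue  # not a closed triangle
        product = 1
        for e in edges:
            product *= signed_edges[e]
        if product > 0:
            balanced += 1
        else:
            unbalanced += 1
    return balanced, unbalanced


# Hypothetical signed interactions among four drugs.
edges = {
    frozenset({"D1", "D2"}): +1,
    frozenset({"D2", "D3"}): -1,
    frozenset({"D1", "D3"}): -1,  # triangle D1-D2-D3: (+)(-)(-) = + -> balanced
    frozenset({"D3", "D4"}): +1,
    frozenset({"D2", "D4"}): +1,  # triangle D2-D3-D4: (-)(+)(+) = - -> unbalanced
}
```

Counts like these are the kind of topological pattern the abstract says SGRL-DDI injects into its representations; the model itself works on the directed, signed network rather than on raw counts.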

Subjects

Machine Learning, Neural Networks, Computer, Drug Interactions

11.

Tree visualizations of protein sequence embedding space enable improved functional clustering of diverse protein superfamilies.

Yeung, Wayland; Zhou, Zhongliang; Mathew, Liju; Gravel, Nathan; Taujale, Rahil; O'Boyle, Brady; Salcedo, Mariah; Venkat, Aarya; Lanzilotta, William; Li, Sheng; Kannan, Natarajan.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36642409

ABSTRACT

Protein language models, trained on millions of biologically observed sequences, generate feature-rich numerical representations of protein sequences. These representations, called sequence embeddings, can infer structure-functional properties, despite protein language models being trained on primary sequence alone. While sequence embeddings have been applied toward tasks such as structure and function prediction, applications toward alignment-free sequence classification have been hindered by the lack of studies to derive, quantify and evaluate relationships between protein sequence embeddings. Here, we develop workflows and visualization methods for the classification of protein families using sequence embedding derived from protein language models. A benchmark of manifold visualization methods reveals that Neighbor Joining (NJ) embedding trees are highly effective in capturing global structure while achieving similar performance in capturing local structure compared with popular dimensionality reduction techniques such as t-SNE and UMAP. The statistical significance of hierarchical clusters on a tree is evaluated by resampling embeddings using a variational autoencoder (VAE). We demonstrate the application of our methods in the classification of two well-studied enzyme superfamilies, phosphatases and protein kinases. Our embedding-based classifications remain consistent with and extend upon previously published sequence alignment-based classifications. We also propose a new hierarchical classification for the S-Adenosyl-L-Methionine (SAM) enzyme superfamily which has been difficult to classify using traditional alignment-based approaches. Beyond applications in sequence classification, our results further suggest NJ trees are a promising general method for visualizing high-dimensional data sets.
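
Neighbor Joining (NJ), which the abstract uses to build embedding trees, repeatedly joins the pair of leaves that minimizes the Q-criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). A minimal Python sketch of that pair-selection step under the standard Saitou-Nei formulation, using the five-taxon distance matrix from the common textbook example rather than real protein embeddings (in the paper's setting the distances would come from sequence embeddings, a detail the abstract does not specify):

```python
def nj_pick_pair(d):
    """One Neighbor Joining step on a symmetric distance matrix `d`.

    Computes Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k]
    for every pair i < j and returns (i, j, Q) for the smallest Q."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[2]:
                best = (i, j, q)
    return best


# Textbook five-taxon example (taxa a, b, c, d, e).
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
```

For this matrix the smallest Q is -50, for taxa a and b; a full NJ run would join them into a new node, recompute distances to it, and repeat until the tree is resolved.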

Subjects

Amino Acid Sequence, Proteins, Cluster Analysis, Proteins/chemistry, Sequence Alignment

12.

Spatially aware self-representation learning for tissue structure characterization and spatial functional genes identification.

Zhang, Chuanchao; Li, Xinxing; Huang, Wendong; Wang, Lequn; Shi, Qianqian.

Brief Bioinform ; 24(4)2023 07 20.

Article in English | MEDLINE | ID: mdl-37253698

ABSTRACT

Spatially resolved transcriptomics (SRT) enables the comprehensive characterization of transcriptomic profiles in the context of tissue microenvironments. Unveiling spatial transcriptional heterogeneity requires effectively incorporating spatial information to account for the substantial spatial correlation of expression measurements. Here, we develop a computational method, SpaSRL (spatially aware self-representation learning), which flexibly enhances and decodes spatial transcriptional signals to simultaneously achieve spatial domain detection and spatial functional genes identification. This novel, tunable, spatially aware strategy of SpaSRL not only balances spatial and transcriptional coherence for the two tasks, but can also transfer the spatial correlation constraint between them based on a unified model. In addition, this joint analysis by SpaSRL deciphers accurate and fine-grained tissue structures and ensures the effective extraction of biologically informative genes underlying spatial architecture. We verified the superiority of SpaSRL on spatial domain detection, spatial functional genes identification and data denoising using multiple SRT datasets obtained by different platforms and tissue sections. Our results illustrate SpaSRL's utility in flexible integration of spatial information and novel discovery of biological insights from spatial transcriptomic datasets.

Subjects

Gene Expression Profiling, Learning, Transcriptome

13.

GSRF-DTI: a framework for drug-target interaction prediction based on a drug-target pair network and representation learning on a large graph.

Zhu, Yongdi; Ning, Chunhui; Zhang, Naiqian; Wang, Mingyi; Zhang, Yusen.

BMC Biol ; 22(1): 156, 2024 Jul 18.

Article in English | MEDLINE | ID: mdl-39020316

ABSTRACT

BACKGROUND: Identification of potential drug-target interactions (DTIs) with high accuracy is a key step in drug discovery and repositioning, especially concerning specific drug targets. Traditional experimental methods for identifying the DTIs are arduous, time-intensive, and financially burdensome. In addition, robust computational methods have been developed for predicting the DTIs and are widely applied in drug discovery research. However, advancing more precise algorithms for predicting DTIs is essential to meet the stringent standards demanded by drug discovery. RESULTS: We proposed a novel method called GSRF-DTI, which integrates networks with a deep learning algorithm to identify DTIs. Firstly, GSRF-DTI learned the embedding representation of drugs and targets by integrating multiple drug association information and target association information, respectively. Then, GSRF-DTI considered the influence of drug-target pair (DTP) association on DTI prediction to construct a drug-target pair network (DTP-NET). Next, we utilized GraphSAGE on DTP-NET to learn the potential features of the network and applied random forest (RF) to predict the DTIs. Furthermore, we conducted ablation experiments to validate the necessity of integrating different types of network features for identifying DTIs. It is worth noting that GSRF-DTI proposed three novel DTIs. CONCLUSIONS: GSRF-DTI not only considered the influence of the interaction relationship between drug and target but also considered the impact of DTP association relationship on DTI prediction. We initially use GraphSAGE to aggregate the neighbor information of nodes for better identification. Experimental analysis on Luo's dataset and the newly constructed dataset revealed that the GSRF-DTI framework outperformed several state-of-the-art methods significantly.
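
GSRF-DTI applies GraphSAGE to DTP-NET before a random forest makes the final prediction. The sketch below shows one GraphSAGE layer with the mean aggregator, h_v' = ReLU(W [h_v ; mean of neighbor h_u]), on a hypothetical three-node graph whose nodes stand in for drug-target pairs. Weights are fixed by hand instead of learned, the per-layer L2 normalization of the original GraphSAGE formulation is omitted, and neither GSRF-DTI's feature construction nor its random forest step is reproduced:

```python
def relu(x):
    return [max(0.0, v) for v in x]


def matvec(w, x):
    """Multiply matrix `w` (list of rows) by vector `x`."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sage_mean_layer(features, neighbors, weight):
    """One GraphSAGE layer with the mean aggregator:
    h_v' = ReLU(W @ concat(h_v, mean of h_u over neighbors u of v))."""
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            agg = [sum(features[u][k] for u in nbrs) / len(nbrs)
                   for k in range(len(h_v))]
        else:
            agg = [0.0] * len(h_v)  # isolated node: nothing to aggregate
        out[v] = relu(matvec(weight, h_v + agg))  # list + list = concatenation
    return out


# Hypothetical drug-target-pair nodes with 2-D features.
features = {"p1": [1, 0], "p2": [0, 1], "p3": [1, 1]}
neighbors = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}
# 2 x 4 weight: output k = self feature k + aggregated feature k.
W = [[1, 0, 1, 0],
     [0, 1, 0, 1]]
out = sage_mean_layer(features, neighbors, W)
```

The resulting per-node vectors are what a downstream classifier (a random forest, in GSRF-DTI's case) would consume.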

Subjects

Drug Discovery, Drug Discovery/methods, Deep Learning, Computational Biology/methods, Algorithms, Pharmaceutical Preparations

14.

Integration of scRNA-seq data by disentangled representation learning with condition domain adaptation.

Liu, Renjing; Qian, Kun; He, Xinwei; Li, Hongwei.

BMC Bioinformatics ; 25(1): 116, 2024 Mar 16.

Article in English | MEDLINE | ID: mdl-38493095

ABSTRACT

BACKGROUND: The integration of single-cell RNA sequencing data from multiple experimental batches and diverse biological conditions holds significant importance in the study of cellular heterogeneity. RESULTS: To expedite the exploration of systematic disparities under various biological contexts, we propose a scRNA-seq integration method called scDisco, which involves a domain-adaptive decoupling representation learning strategy for the integration of dissimilar single-cell RNA data. It constructs a condition-specific domain-adaptive network founded on variational autoencoders. scDisco not only effectively reduces batch effects but also successfully disentangles biological effects and condition-specific effects, and further augments condition-specific representations through condition-specific Domain-Specific Batch Normalization layers. This enhancement enables the identification of genes specific to particular conditions. The effectiveness and robustness of scDisco as an integration method were analyzed using both simulated and real datasets, and the results demonstrate that scDisco can yield high-quality visualizations and quantitative outcomes. Furthermore, scDisco has been validated using real datasets, affirming its proficiency in cell clustering quality, retaining batch-specific cell types and identifying condition-specific genes. CONCLUSION: scDisco is an effective integration method based on variational autoencoders, which improves analytical tasks of reducing batch effects, cell clustering, retaining batch-specific cell types and identifying condition-specific genes.
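
scDisco adds condition-specific Domain-Specific Batch Normalization (DSBN) layers. The general DSBN idea is that each domain, here each biological condition, is normalized with its own batch statistics rather than shared ones. A minimal pure-Python sketch of that normalization on a single feature, with hypothetical values and condition labels; the learnable per-condition scale and shift, and how scDisco wires these layers into its variational autoencoder, are omitted:

```python
import math
import statistics


def domain_specific_batch_norm(values, domains, eps=1e-5):
    """Normalize a 1-D feature separately within each domain (condition),
    using that domain's own mean and population variance."""
    groups = {}
    for v, d in zip(values, domains):
        groups.setdefault(d, []).append(v)
    stats = {
        d: (statistics.fmean(vs), statistics.pvariance(vs))
        for d, vs in groups.items()
    }
    return [
        (v - stats[d][0]) / math.sqrt(stats[d][1] + eps)
        for v, d in zip(values, domains)
    ]


# Hypothetical expression of one gene in two conditions with different scales.
values = [1.0, 3.0, 10.0, 30.0]
conditions = ["ctrl", "ctrl", "stim", "stim"]
normalized = domain_specific_batch_norm(values, conditions)
```

With pooled statistics both ctrl values would fall far below the shared mean simply because stim has a larger scale; per-condition statistics map each condition to roughly [-1, 1].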

Subjects

Learning, Single-Cell Gene Expression Analysis, Cluster Analysis, RNA, Single-Cell Analysis, Sequence Analysis, RNA, Gene Expression Profiling, Algorithms

15.

Effective type label-based synergistic representation learning for biomedical event trigger detection.

Hao, Anran; Yuan, Haohan; Hui, Siu Cheung; Su, Jian.

BMC Bioinformatics ; 25(1): 251, 2024 Jul 31.

Article in English | MEDLINE | ID: mdl-39085787

ABSTRACT

BACKGROUND: Detecting event triggers in biomedical texts, which contain domain knowledge and context-dependent terms, is more challenging than in general-domain texts. Most state-of-the-art models rely mainly on external resources such as linguistic tools and knowledge bases to improve system performance. However, they lack effective mechanisms to obtain semantic clues from label specification and sentence context. Given its success in image classification, label representation learning is a promising approach to enhancing biomedical event trigger detection models by leveraging the rich semantics of pre-defined event type labels. RESULTS: In this paper, we propose the Biomedical Label-based Synergistic representation Learning (BioLSL) model, which effectively utilizes event type labels by learning their correlation with trigger words and enriches the representation contextually. The BioLSL model consists of three modules. Firstly, the Domain-specific Joint Encoding module employs a transformer-based, domain-specific pre-trained architecture to jointly encode input sentences and pre-defined event type labels. Secondly, the Label-based Synergistic Representation Learning module learns the semantic relationships between input texts and event type labels, and generates a Label-Trigger Aware Representation (LTAR) and a Label-Context Aware Representation (LCAR) for enhanced semantic representations. Finally, the Trigger Classification module makes structured predictions, where each label is predicted with respect to its neighbours. We conduct experiments on three benchmark BioNLP datasets, namely MLEE, GE09, and GE11, to evaluate our proposed BioLSL model. Results show that BioLSL has achieved state-of-the-art performance, outperforming the baseline models. CONCLUSIONS: The proposed BioLSL model demonstrates good performance for biomedical event trigger detection without using any external resources. This suggests that label representation learning and context-aware enhancement are promising directions for improving the task. The key enhancement is that BioLSL effectively learns to construct semantic linkages between the event mentions and type labels, which provide the latent information of label-trigger and label-context relationships in biomedical texts. Moreover, additional experiments on BioLSL show that it performs exceptionally well with limited training data in data-scarce scenarios.

Subjects

Semantics, Natural Language Processing, Machine Learning, Data Mining/methods, Algorithms

16.

Learning self-supervised molecular representations for drug-drug interaction prediction.

Kpanou, Rogia; Dallaire, Patrick; Rousseau, Elsa; Corbeil, Jacques.

BMC Bioinformatics ; 25(1): 47, 2024 Jan 30.

Article in English | MEDLINE | ID: mdl-38291362

ABSTRACT

Drug-drug interactions (DDI) are a critical concern in healthcare due to their potential to cause adverse effects and compromise patient safety. Supervised machine learning models for DDI prediction need to be optimized to learn abstract, transferable features, and generalize to larger chemical spaces, primarily due to the scarcity of high-quality labeled DDI data. Inspired by recent advances in computer vision, we present SMR-DDI, a self-supervised framework that leverages contrastive learning to embed drugs into a scaffold-based feature space. Molecular scaffolds represent the core structural motifs that drive pharmacological activities, making them valuable for learning informative representations. Specifically, we pre-trained SMR-DDI on a large-scale unlabeled molecular dataset. We generated augmented views for each molecule via SMILES enumeration and optimized the embedding process through contrastive loss minimization between views. This enables the model to capture relevant and robust molecular features while reducing noise. We then transfer the learned representations for the downstream prediction of DDI. Experiments show that the new feature space has comparable expressivity to state-of-the-art molecular representations and achieved competitive DDI prediction results while training on less data. Additional investigations also revealed that pre-training on more extensive and diverse unlabeled molecular datasets improved the model's capability to embed molecules more effectively. Our results highlight contrastive learning as a promising approach for DDI prediction that can identify potentially hazardous drug combinations using only structural information.
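
SMR-DDI minimizes a contrastive loss between augmented views of each molecule. The abstract does not name the exact loss, so as an assumption the sketch below uses the common InfoNCE form with cosine similarity and a temperature: view_a[i] and view_b[i] are embeddings of two SMILES enumerations of molecule i, and every other molecule in the batch acts as a negative. Generating the SMILES enumerations and the encoder itself are out of scope here, and the toy embeddings are hypothetical:

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE loss averaged over anchors view_a[i].

    The positive for anchor i is view_b[i]; all view_b[j] with j != i are
    negatives. Lower loss means matching views are more similar than
    non-matching ones."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax of the positive
    return total / n


# Two hypothetical molecules; second view either matches or is swapped.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
```

Minimizing this loss pulls the two views of the same molecule together and pushes other molecules apart, which is the property the abstract relies on when transferring the learned features to DDI prediction.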

Subjects

Drug-Related Side Effects and Adverse Reactions , Humans , Drug Interactions , Supervised Machine Learning
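
The SMR-DDI abstract above describes pre-training by minimizing a contrastive loss between two augmented (SMILES-enumerated) views of each molecule. As an illustration only, the sketch below implements the widely used NT-Xent (normalized temperature-scaled cross-entropy) loss from SimCLR in plain Python. The abstract does not state that SMR-DDI uses this exact loss, and the function names, toy 2-D embeddings, and temperature of 0.5 are assumptions, not the authors' code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def nt_xent(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.5) -> float:
    """NT-Xent loss (assumed SimCLR-style, not necessarily SMR-DDI's exact loss).

    view_a[i] and view_b[i] embed two augmented views of molecule i (the positive
    pair); every other embedding in the batch serves as a negative.
    """
    n = len(view_a)
    assert n == len(view_b) and n > 1
    z = list(view_a) + list(view_b)  # 2n embeddings
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of z[i]
        pos = cosine(z[i], z[j]) / temperature
        # Denominator runs over all other embeddings, positive included.
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos  # equals -log(exp(pos) / sum(exp(logits)))
    return total / (2 * n)


a = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
print(nt_xent(a, a), nt_xent(a, swapped))
```

Matched views yield a lower loss than mismatched ones: with identical views the loss is log(2 + e^2) - 2 ≈ 0.24, while swapping the partners raises it to log(2 + e^2) ≈ 2.24. Minimizing this quantity pulls the two views of the same molecule together in embedding space.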

17.

SAFER: sub-hypergraph attention-based neural network for predicting effective responses to dose combinations.

Tang, Yi-Ching; Li, Rongbin; Tang, Jing; Zheng, W Jim; Jiang, Xiaoqian.

BMC Bioinformatics ; 25(1): 250, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39080535

ABSTRACT

BACKGROUND: The potential benefits of drug combination synergy in cancer medicine are significant, yet the risks must be carefully managed due to the possibility of increased toxicity. Although artificial intelligence applications have demonstrated notable success in predicting drug combination synergy, several key challenges persist: (1) Existing models often predict average synergy values across a restricted range of testing dosages, neglecting crucial dose amounts and the mechanisms of action of the drugs involved. (2) Many graph-based models rely on static protein-protein interactions, failing to adapt to dynamic and higher-order relationships. These limitations constrain the applicability of current methods. RESULTS: We introduce SAFER, a Sub-hypergraph Attention-based graph model, addressing these issues by incorporating complex relationships among biological knowledge networks and considering dosing effects on subject-specific networks. SAFER outperformed previous models on the benchmark and the independent test set. The analysis of subgraph attention weight for the lung cancer cell line highlighted JAK-STAT signaling pathway, PRDM12, ZNF781, and CDC5L that have been implicated in lung fibrosis. CONCLUSIONS: SAFER presents an interpretable framework designed to identify drug-responsive signals. Tailored for comprehending dose effects on subject-specific molecular contexts, our model uniquely captures dose-level drug combination responses. This capability unlocks previously inaccessible avenues of investigation compared to earlier models. Furthermore, the SAFER framework can be leveraged by future inquiries to investigate molecular networks that uniquely characterize individual patients and can be applied to prioritize personalized effective treatment based on safe dose combinations.

Subjects

Neural Networks, Computer , Humans , Cell Line, Tumor , Drug Synergism , Lung Neoplasms/drug therapy , Lung Neoplasms/metabolism , Dose-Response Relationship, Drug , Signal Transduction/drug effects , Antineoplastic Agents/pharmacology

18.

Identifying subgroups of eating behavior traits unrelated to obesity using functional connectivity and feature representation learning.

Choi, Hyoungshin; Byeon, Kyoungseob; Lee, Jong-Eun; Hong, Seok-Jun; Park, Bo-Yong; Park, Hyunjin.

Hum Brain Mapp ; 45(1): e26581, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38224537

ABSTRACT

Eating behavior is highly heterogeneous across individuals and cannot be fully explained using only the degree of obesity. We utilized unsupervised machine learning and functional connectivity measures to explore the heterogeneity of eating behaviors measured by a self-assessment instrument using 424 healthy adults (mean ± standard deviation [SD] age = 47.07 ± 18.89 years; 67% female). We generated low-dimensional representations of functional connectivity using resting-state functional magnetic resonance imaging and estimated latent features using the feature representation capabilities of an autoencoder by nonlinearly compressing the functional connectivity information. The clustering approaches applied to latent features identified three distinct subgroups. The subgroups exhibited different levels of hunger traits, while their body mass indices were comparable. The results were replicated in an independent dataset consisting of 212 participants (mean ± SD age = 38.97 ± 19.80 years; 35% female). The model interpretation technique of integrated gradients revealed that the between-group differences in the integrated gradient maps were associated with functional reorganization in heteromodal association and limbic cortices and reward-related subcortical structures such as the accumbens, amygdala, and caudate. The cognitive decoding analysis revealed that these systems are associated with reward- and emotion-related systems. Our findings provide insights into the macroscopic brain organization of eating behavior-related subgroups independent of obesity.

Subjects

Magnetic Resonance Imaging , Obesity , Adult , Humans , Female , Middle Aged , Aged , Young Adult , Male , Magnetic Resonance Imaging/methods , Brain/diagnostic imaging , Brain Mapping/methods , Feeding Behavior

19.

Compressed representation of brain genetic transcription.

Ruffle, James K; Watkins, Henry; Gray, Robert J; Hyare, Harpreet; Thiebaut de Schotten, Michel; Nachev, Parashkev.

Hum Brain Mapp ; 45(11): e26795, 2024 Aug 01.

Article in English | MEDLINE | ID: mdl-39045881

ABSTRACT

The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space. The task is especially challenging with high-dimensional data, such as gene expression, where the joint complexity of anatomical and transcriptional patterns demands maximum compression. The established practice is to use standard principal component analysis (PCA), whose computational felicity is offset by limited expressivity, especially at great compression ratios. Employing whole-brain, voxel-wise Allen Brain Atlas transcription data, here we systematically compare compressed representations based on the most widely supported linear and non-linear methods-PCA, kernel PCA, non-negative matrix factorisation (NMF), t-stochastic neighbour embedding (t-SNE), uniform manifold approximation and projection (UMAP), and deep auto-encoding-quantifying reconstruction fidelity, anatomical coherence, and predictive utility across signalling, microstructural, and metabolic targets, drawn from large-scale open-source MRI and PET data. We show that deep auto-encoders yield superior representations across all metrics of performance and target domains, supporting their use as the reference standard for representing transcription patterns in the human brain.

Subjects

Brain , Magnetic Resonance Imaging , Transcription, Genetic , Humans , Brain/diagnostic imaging , Brain/metabolism , Transcription, Genetic/physiology , Positron-Emission Tomography , Image Processing, Computer-Assisted/methods , Principal Component Analysis , Data Compression/methods , Atlases as Topic
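
The Ruffle et al. abstract above compares compressed representations by, among other criteria, reconstruction fidelity. As a minimal sketch of that criterion for the PCA baseline only, the code below keeps the top principal component (found by power iteration) and reports the mean squared reconstruction error. The rank-1 choice, function names, and toy data are assumptions for illustration; this is not the study's pipeline, data, or metric definition, and the auto-encoders the study favours are not shown.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def center_columns(x: Matrix) -> Tuple[Matrix, List[float]]:
    """Subtract each column's mean so PCA operates on centred data."""
    n = len(x)
    mean = [sum(col) / n for col in zip(*x)]
    return [[a - m for a, m in zip(row, mean)] for row in x], mean


def top_component(xc: Matrix, iters: int = 200, seed: int = 0) -> List[float]:
    """Leading principal direction of centred data via power iteration on X^T X."""
    d = len(xc[0])
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start avoids an orthogonal guess
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in xc]  # X v
        w = [sum(row[k] * s for row, s in zip(xc, scores)) for k in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:  # zero-variance data: any unit vector is a valid direction
            break
        v = [a / norm for a in w]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def rank1_reconstruction_mse(x: Matrix) -> float:
    """Mean squared error after encoding to one component and decoding back."""
    xc, _ = center_columns(x)
    v = top_component(xc)
    sq_err = 0.0
    for row in xc:
        score = sum(a * b for a, b in zip(row, v))  # 1-D compressed code
        recon = [score * b for b in v]  # decode back to the full space
        sq_err += sum((a - r) ** 2 for a, r in zip(row, recon))
    return sq_err / (len(xc) * len(xc[0]))


line = [[t, 2 * t + 1, 3 * t - 2] for t in (0.0, 1.0, 2.0, 5.0)]
cross = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(rank1_reconstruction_mse(line), rank1_reconstruction_mse(cross))
```

On data lying exactly on a line, one component reconstructs it with essentially zero error; on the isotropic four-point cross, one component captures only half the variance, leaving an error of 0.25 per entry. Higher compression ratios on richer data widen this gap, which is where the abstract reports linear methods losing expressivity.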

20.

Learning representations for gene ontology terms by jointly encoding graph structure and textual node descriptors.

Zhao, Lingling; Sun, Huiting; Cao, Xinyi; Wen, Naifeng; Wang, Junjie; Wang, Chunyu.

Brief Bioinform ; 23(5), 2022 09 20.

Article in English | MEDLINE | ID: mdl-35901452

ABSTRACT

Measuring the semantic similarity between Gene Ontology (GO) terms is a fundamental step in numerous functional bioinformatics applications. To fully exploit the metadata of GO terms, word embedding-based methods have been proposed recently to map GO terms to low-dimensional feature vectors. However, these representation methods commonly overlook the key information hidden in the whole GO structure and the relationship between GO terms. In this paper, we propose a novel representation model for GO terms, named GT2Vec, which jointly considers the GO graph structure obtained by graph contrastive learning and the semantic description of GO terms based on BERT encoders. Our method is evaluated on a protein similarity task on a collection of benchmark datasets. The experimental results demonstrate the effectiveness of using a joint encoding graph structure and textual node descriptors to learn vector representations for GO terms.

Subjects

Computational Biology , Semantics , Computational Biology/methods , Gene Ontology , Metadata
