Search | BVS - MINISTÉRIO DA SAÚDE

Heterogeneous biomedical entity representation learning for gene-disease association prediction.

Meng, Zhaohan; Liu, Siwei; Liang, Shangsong; Jani, Bhautesh; Meng, Zaiqiao.

Brief Bioinform; 25(5), 2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39154194

ABSTRACT

Understanding the genetic basis of disease is a fundamental aspect of medical research, as genes are the classic units of heredity and play a crucial role in biological function. Identifying associations between genes and diseases is critical for diagnosis, prevention, prognosis, and drug development. Genes that encode proteins with similar sequences are often implicated in related diseases, as proteins causing identical or similar diseases tend to show limited variation in their sequences. Establishing gene-disease associations (GDAs) requires time-consuming and expensive experiments on a large number of potential candidate genes. Although methods have been proposed to predict GDAs using traditional machine learning algorithms and graph neural networks, these approaches struggle to capture the deep semantic information within genes and diseases and depend on the training data. To alleviate these issues, we propose a novel GDA prediction model named FusionGDA, which uses a pre-training phase with a fusion module to enrich the gene and disease semantic representations encoded by pre-trained language models. The fusion module generates multi-modal representations that carry rich semantic information about two heterogeneous biomedical entities: protein sequences and disease descriptions. A pooling aggregation strategy then compresses the dimensions of the multi-modal representation. In addition, the pre-training phase leverages a contrastive learning loss to extract potential gene and disease features by training on a large public GDA dataset. To rigorously evaluate the effectiveness of FusionGDA, we conduct comprehensive experiments on five datasets and compare our proposed model with five competitive baseline models on the DisGeNet-Eval dataset. Notably, our case study further demonstrates the ability of FusionGDA to discover hidden associations effectively.
The complete code and datasets of our experiments are available at https://github.com/ZhaohanM/FusionGDA.
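
The pooling and contrastive pre-training steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact loss or pooling operator, so the InfoNCE-style loss, the mean pooling, the cosine similarity, and the temperature value below are all assumptions chosen for illustration, and the function names are hypothetical rather than taken from the FusionGDA repository.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into a single vector."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(gene_vecs, disease_vecs, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed choice, not the paper's).

    gene_vecs[i] and disease_vecs[i] form a known associated pair; every
    other disease in the batch serves as a negative for gene i.
    """
    total = 0.0
    for i, gene in enumerate(gene_vecs):
        logits = [cosine(gene, dis) / temperature for dis in disease_vecs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(gene_vecs)


# Toy batch: two genes and two diseases with 2-dimensional embeddings.
genes = [[1.0, 0.0], [0.0, 1.0]]
matched_diseases = [[1.0, 0.0], [0.0, 1.0]]
swapped_diseases = [[0.0, 1.0], [1.0, 0.0]]

loss_matched = info_nce(genes, matched_diseases)
loss_swapped = info_nce(genes, swapped_diseases)
```

In this toy batch the loss is lower when each gene embedding lies close to its paired disease embedding than when the pairs are swapped, which is the property a contrastive objective of this kind is meant to reward.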

Subjects

Machine Learning; Humans; Computational Biology/methods; Genetic Predisposition to Disease; Semantics; Algorithms; Genetic Association Studies; Neural Networks, Computer

Sequence pre-training-based graph neural network for predicting lncRNA-miRNA associations.

Wang, Zixiao; Liang, Shiyang; Liu, Siwei; Meng, Zhaohan; Wang, Jingjie; Liang, Shangsong.

Brief Bioinform; 24(5), 2023 Sep 20.

Article in English | MEDLINE | ID: mdl-37651605

ABSTRACT

MicroRNAs (miRNAs) silence genes by binding to messenger RNAs, whereas long non-coding RNAs (lncRNAs) act as competitive endogenous RNAs (ceRNAs) that can relieve miRNA silencing effects and upregulate target gene expression. The ceRNA association between lncRNAs and miRNAs has been a research hotspot due to its medical importance, but it is challenging to verify experimentally. In this paper, we propose a novel deep learning scheme, the sequence pre-training-based graph neural network (SPGNN), that combines pre-training and fine-tuning stages to predict lncRNA-miRNA associations from RNA sequences and the existing interactions represented as a graph. In the pre-training stage, we use a sequence-to-vector technique to generate pre-trained embeddings from the sequences of all RNAs. In the fine-tuning stage, we use a graph neural network to learn node representations from the heterogeneous graph constructed from lncRNA-miRNA association information. We evaluate SPGNN on our newly collected animal lncRNA-miRNA association dataset and demonstrate that combining the k-mer technique and the Doc2vec model for pre-training with the Simple Graph Convolution Network for fine-tuning is effective in predicting lncRNA-miRNA associations. Our approach outperforms state-of-the-art baselines across various evaluation metrics. We also conduct an ablation study and hyperparameter analysis to verify the effectiveness of each component and parameter of our scheme. The complete code and dataset are available on GitHub: https://github.com/zixwang/SPGNN.
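
Two of the building blocks named in the abstract, k-mer tokenization and Simple Graph Convolution propagation, can be sketched in a few lines. This is only an illustration under stated assumptions: the value of k, the stride of 1, the number of hops, and the function names are hypothetical, the Doc2vec embedding step is omitted (the toy feature vectors stand in for it), and the trainable linear classifier that normally follows the propagation is left out.

```python
import math


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers with stride 1 (assumed)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def normalized_adjacency(edges, num_nodes):
    """Build S = D^(-1/2) (A + I) D^(-1/2) for an undirected graph."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        adj[i][i] = 1.0  # self-loops
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    degree = [sum(row) for row in adj]
    return [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def sgc_propagate(features, edges, hops=2):
    """Apply the parameter-free propagation S^hops X to node features."""
    s = normalized_adjacency(edges, len(features))
    out = features
    for _ in range(hops):
        out = [
            [sum(s[i][n] * out[n][d] for n in range(len(out)))
             for d in range(len(out[0]))]
            for i in range(len(out))
        ]
    return out


# Tokens that a sequence-to-vector model would consume.
tokens = kmers("AUGGCU", k=3)

# Toy graph: node 0 is an lncRNA, node 1 is a miRNA, joined by one known
# association. The 1-dimensional features stand in for pre-trained embeddings.
smoothed = sgc_propagate([[1.0], [0.0]], edges=[(0, 1)], hops=2)
```

After propagation, each node's feature is a degree-normalized average over its neighborhood, so the two associated nodes end up with similar representations. A linear classifier trained on these smoothed features would complete a Simple Graph Convolution model.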

Subjects

MicroRNAs; RNA, Long Noncoding; Animals; MicroRNAs/genetics; RNA, Long Noncoding/genetics; Benchmarking; Neural Networks, Computer; RNA, Messenger
