Search | Biblioteca Virtual em Saúde

A probabilistic knowledge graph for target identification.

Liu, Chang; Xiao, Kaimin; Yu, Cuinan; Lei, Yipin; Lyu, Kangbo; Tian, Tingzhong; Zhao, Dan; Zhou, Fengfeng; Tang, Haidong; Zeng, Jianyang.

PLoS Comput Biol ; 20(4): e1011945, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38578805

ABSTRACT

Early identification of safe and efficacious disease targets is crucial to alleviating the tremendous cost of drug discovery projects. However, existing experimental methods for identifying new targets are generally labor-intensive and failure-prone. On the other hand, computational approaches, especially machine learning-based frameworks, have shown remarkable application potential in drug discovery. In this work, we propose Progeni, a novel machine learning-based framework for target identification. In addition to fully exploiting the known heterogeneous biological networks from various sources, Progeni integrates literature evidence about the relations between biological entities to construct a probabilistic knowledge graph. Graph neural networks are then employed in Progeni to learn the feature embeddings of biological entities to facilitate the identification of biologically relevant target candidates. A comprehensive evaluation of Progeni demonstrated its superior predictive power over the baseline methods on the target identification task. In addition, our extensive tests showed that Progeni exhibited high robustness to the negative effect of exposure bias, a common phenomenon in recommendation systems, and effectively identified new targets that can be strongly supported by the literature. Moreover, our wet lab experiments successfully validated the biological significance of the top target candidates predicted by Progeni for melanoma and colorectal cancer. All these results suggested that Progeni can identify biologically effective targets and thus provide a powerful and useful tool for advancing the drug discovery process.
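The idea of a probabilistic knowledge graph feeding a graph neural network can be illustrated with a minimal sketch. It assumes each edge carries a probability in [0, 1] (for example, a literature-derived confidence) and performs one round of probability-weighted neighbor averaging. This is not Progeni's implementation: the function name `propagate`, the node names, the 50/50 mixing rule, and the absence of learned parameters are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (source node, target node, edge probability in [0, 1]); hypothetical format
Edge = Tuple[str, str, float]


def propagate(
    embeddings: Dict[str, List[float]], edges: List[Edge]
) -> Dict[str, List[float]]:
    """One round of probability-weighted mean aggregation over incoming edges.

    Each node's new vector is the average of its old vector and the
    probability-weighted mean of its in-neighbors. Nodes with no incoming
    edges keep their old vector. No learned weights are involved.
    """
    dim = len(next(iter(embeddings.values())))
    sums = {node: [0.0] * dim for node in embeddings}
    weight_totals = {node: 0.0 for node in embeddings}

    for src, dst, prob in edges:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"edge probability out of range: {prob}")
        for i in range(dim):
            sums[dst][i] += prob * embeddings[src][i]
        weight_totals[dst] += prob

    updated: Dict[str, List[float]] = {}
    for node, vec in embeddings.items():
        total = weight_totals[node]
        if total == 0.0:
            updated[node] = list(vec)
        else:
            neighbor_mean = [s / total for s in sums[node]]
            updated[node] = [0.5 * (a + b) for a, b in zip(vec, neighbor_mean)]
    return updated


# Hypothetical toy graph: two genes linked to one disease with different
# evidence strengths.
toy_embeddings = {"geneA": [1.0, 0.0], "geneB": [0.0, 1.0], "disease": [0.0, 0.0]}
toy_edges = [("geneA", "disease", 0.9), ("geneB", "disease", 0.1)]
toy_updated = propagate(toy_embeddings, toy_edges)
```

In this toy graph the disease node moves mostly toward `geneA`, the neighbor with the stronger evidence, which is the intuition behind weighting edges by probability.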

Subjects

Computational Biology; Drug Discovery; Machine Learning; Neural Networks, Computer; Humans; Computational Biology/methods; Drug Discovery/methods; Algorithms; Melanoma; Probability; Colorectal Neoplasms

MRNDR: Multihead Attention-Based Recommendation Network for Drug Repurposing.

Feng, Xin; Ma, Zhansen; Yu, Cuinan; Xin, Ruihao.

J Chem Inf Model ; 64(7): 2654-2669, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38373300

ABSTRACT

As is well-known, the process of developing new drugs is extremely expensive, whereas drug repurposing represents a promising approach to augment the efficiency of new drug development. While this method can indeed spare us from expensive drug toxicity and safety experiments, it still demands a substantial amount of time to carry out precise efficacy experiments for specific diseases, thereby consuming a significant quantity of resources. Therefore, if we can prescreen potential other indications for selected drugs, it could result in substantial cost savings. In light of this, this paper introduces a drug repurposing recommendation model called MRNDR, which stands for Multi-head attention-based Recommendation Network for Drug Repurposing. This model serves as a prediction tool for drug-disease relationships, leveraging the multihead self-attention mechanism that demonstrates robust generalization capabilities. These capabilities stem not only from our extensive million-level training data set, BioRE (Biology Recommended Entity data), but also from the utilization of the WRDS (Weighted Representation Distance Score) algorithm proposed by us. The MRNDR model has achieved new state-of-the-art results on the GP-KG public data set, with an MRR (Mean Reciprocal Rank) score of 0.308 and a Hits@10 score of 0.628. This represents significant improvements of 4.7% (MRR) and 18.1% (Hits@10) over the current best-performing models. Additionally, to further validate the practical utility of the model, we examined results recommended by MRNDR that were not present in the training data set. Some of these recommendations have undergone clinical trials, as evidenced by their presence on ClinicalTrials.gov and the China Clinical Trials Center, indirectly confirming the applicability of MRNDR. The MRNDR model can predict the reusability of candidate drugs, reducing the need for manual expert assessments and enabling efficient drug repurposing.
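The two ranking metrics reported above, MRR and Hits@10, can be computed as in the following sketch. It assumes that for each test query the 1-based rank of the true answer is already known. The ranks in the example are invented for illustration and are not taken from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of queries whose true answer appears in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct disease for five drug queries.
example_ranks = [1, 2, 4, 10, 50]
mrr = mean_reciprocal_rank(example_ranks)   # (1 + 0.5 + 0.25 + 0.1 + 0.02) / 5
hits10 = hits_at_k(example_ranks, k=10)     # 4 of 5 ranks are <= 10
```

MRR rewards placing the correct answer near the top of the list, while Hits@10 only asks whether it appears among the first ten candidates, so the two metrics can move independently.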

Subjects

Drug Repositioning; Drug-Related Side Effects and Adverse Reactions; Humans; Drug Repositioning/methods; Algorithms

SDBA: Score Domain-Based Attention for DNA N4-Methylcytosine Site Prediction from Multiperspectives.

Xin, Ruihao; Zhang, Fan; Zheng, Jiaxin; Zhang, Yangyi; Yu, Cuinan; Feng, Xin.

J Chem Inf Model ; 64(7): 2839-2853, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-37646411

ABSTRACT

In tasks related to DNA sequence classification, choosing the appropriate encoding methods is challenging. Some of the methods encode sequences based on prior knowledge that limits the ability of the model to obtain multiperspective information from the sequences. We introduced a new trainable ensemble method based on the attention mechanism SDBA, which stands for Score Domain-Based Attention. Unlike other methods, we fed the task-independent encoding results into the models and dynamically ensembled features from different perspectives using the SDBA mechanism. This approach allows the model to acquire and weight sequence features voluntarily. SDBA is conceptually general and empirically powerful. It has achieved new state-of-the-art results on the benchmark data sets associated with DNA N4-methylcytosine site prediction.
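The general idea of ensembling several encodings with attention weights can be sketched as follows. This is a generic softmax-weighted combination, not the SDBA mechanism itself: the function names, the fixed scores, and the toy feature vectors are assumptions, and in a trainable model the scores would be produced by learned parameters rather than supplied by hand.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_ensemble(
    features: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Combine equal-length feature vectors using softmax(scores) as weights."""
    if len(features) != len(scores):
        raise ValueError("need exactly one score per feature vector")
    weights = softmax(scores)
    dim = len(features[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, features)) for i in range(dim)
    ]


# Two hypothetical encodings of the same sequence, with equal scores.
combined = attention_ensemble([[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0])
```

With equal scores the result is the plain average of the encodings; raising one score shifts the combined vector toward that encoding.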

Subjects

Cytosine; DNA; DNA/chemistry; Cytosine/analogs & derivatives

Improving target-disease association prediction through a graph neural network with credibility information.

Liu, Chang; Yu, Cuinan; Lei, Yipin; Lyu, Kangbo; Tian, Tingzhong; Li, Qianhao; Zhao, Dan; Zhou, Fengfeng; Zeng, Jianyang.

Pac Symp Biocomput ; 28: 157-168, 2023.

Article in English | MEDLINE | ID: mdl-36540973

ABSTRACT

Identifying effective target-disease associations (TDAs) can alleviate the tremendous cost incurred by clinical failures of drug development. Although many machine learning models have been proposed to predict potential novel TDAs rapidly, their credibility is not guaranteed, thus requiring extensive experimental validation. In addition, it is generally challenging for current models to predict meaningful associations for entities with less information, hence limiting the application potential of these models in guiding future research. Based on recent advances in utilizing graph neural networks to extract features from heterogeneous biological data, we develop CreaTDA, an end-to-end deep learning-based framework that effectively learns latent feature representations of targets and diseases to facilitate TDA prediction. We also propose a novel way of encoding credibility information obtained from literature to enhance the performance of TDA prediction and predict more novel TDAs with real evidence support from previous studies. Compared with state-of-the-art baseline methods, CreaTDA achieves substantially better prediction performance on the whole TDA network and its sparse sub-networks containing the proteins associated with few known diseases. Our results demonstrate that CreaTDA can provide a powerful and helpful tool for identifying novel target-disease associations, thereby facilitating drug discovery.

Subjects

Computational Biology; Neural Networks, Computer; Humans; Computational Biology/methods; Machine Learning; Drug Discovery; Proteins

IDDLncLoc: Subcellular Localization of LncRNAs Based on a Framework for Imbalanced Data Distributions.

Wang, Yan; Zhu, Xiaopeng; Yang, Lili; Hu, Xuemei; He, Kai; Yu, Cuinan; Jiao, Shaoqing; Chen, Jiali; Guo, Rui; Yang, Sen.

Interdiscip Sci ; 14(2): 409-420, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35192174

ABSTRACT

Long non-coding RNAs (lncRNAs) play a crucial role in many life processes of the cell, such as genetic markers, RNA splicing, signaling, and protein regulation. Considering that identifying an lncRNA's localization in the cell through experimental methods is complicated, hard to reproduce, and expensive, we propose a novel method named IDDLncLoc in this paper, which adopts an ensemble model to solve the problem of subcellular localization. In the proposed model, dinucleotide-based auto-cross covariance features, k-mer nucleotide composition features, and composition, transition, and distribution features are introduced to encode a raw RNA sequence into a vector. To screen out reliable features, feature selection through binomial distribution and recursive feature elimination are employed. Furthermore, oversampling in mini-batches, random sampling, and stacking ensemble strategies are customized to overcome the problem of data imbalance on the benchmark dataset. Finally, compared with the latest methods, IDDLncLoc achieves an accuracy of 94.96% on the benchmark dataset, which is 2.59% higher than the best method, and the results further demonstrate that IDDLncLoc performs well on the subcellular localization of lncRNA. In addition, a user-friendly web server is available at http://lncloc.club .
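Of the feature types named above, k-mer nucleotide composition is the simplest to illustrate. The sketch below counts overlapping k-mers and normalizes them to frequencies. It is not IDDLncLoc's code: the function name, the default k = 2, the conversion of T to U, and the choice to skip windows containing other characters are assumptions.

```python
from itertools import product
from typing import Dict


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGU") -> Dict[str, float]:
    """Return the relative frequency of every k-mer over the given alphabet.

    Overlapping windows are counted. Windows containing characters outside
    the alphabet (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i : i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: count / total for kmer, count in counts.items()}


# Hypothetical short sequence; windows are AU, UG, GA, AU, UG.
features = kmer_composition("AUGAUG", k=2)
```

The result is a fixed-length vector (4^k entries) regardless of sequence length, which is what allows sequences of different lengths to be fed to the same classifier.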

Subjects

RNA, Long Noncoding; Base Sequence; Computational Biology/methods; Nucleotides; Proteins/genetics; RNA, Long Noncoding/genetics; RNA, Long Noncoding/metabolism
