Search | BVS Doenças Infecciosas e Parasitárias

An extensive benchmark study on biomedical text generation and mining with ChatGPT.

Chen, Qijie; Sun, Haotong; Liu, Haoyang; Jiang, Yinghui; Ran, Ting; Jin, Xurui; Xiao, Xianglu; Lin, Zhimin; Chen, Hongming; Niu, Zhangmin.

Bioinformatics ; 39(9), 2023 Sep 02.

Article in English | MEDLINE | ID: mdl-37682111

ABSTRACT

MOTIVATION: In recent years, the development of natural language processing (NLP) technologies and deep learning hardware has led to significant improvements in large language models (LLMs). ChatGPT, the state-of-the-art LLM built on GPT-3.5 and GPT-4, shows excellent capabilities in general language understanding and reasoning. Researchers have also tested the GPT models on a variety of NLP-related tasks and benchmarks and obtained excellent results. Given this strong performance in everyday chat, researchers have begun to explore the capacity of ChatGPT in areas of expertise that require professional education for humans; we are interested in the biomedical domain. RESULTS: To evaluate the performance of ChatGPT on biomedical tasks, this article presents a comprehensive benchmark study on the use of ChatGPT for biomedical corpora, including article abstracts, clinical trial descriptions, biomedical questions, and so on. Typical NLP tasks such as named entity recognition, relation extraction, sentence similarity, question answering, and document classification are included. Overall, ChatGPT achieved a BLURB score of 58.50, while the state-of-the-art model had a score of 84.30. Through a series of experiments, we demonstrated the effectiveness and versatility of ChatGPT in biomedical text understanding, reasoning and generation, as well as the limitations of ChatGPT built on GPT-3.5. AVAILABILITY AND IMPLEMENTATION: All the datasets are available from the BLURB benchmark at https://microsoft.github.io/BLURB/index.html. The prompts are described in the article.
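As a rough illustration of how a single benchmark-level score such as the one quoted above can be aggregated, the sketch below macro-averages per-dataset scores within each task type and then across task types. This follows the general macro-averaging idea used by multi-task benchmarks; the exact BLURB aggregation should be checked against the benchmark site. The task names and numbers are hypothetical and are not results from the article.

```python
from statistics import mean


def macro_average(task_scores: dict[str, list[float]]) -> float:
    """Average dataset scores within each task, then average across tasks."""
    per_task = [mean(scores) for scores in task_scores.values()]
    return mean(per_task)


# Hypothetical per-dataset scores, for illustration only.
example_scores = {
    "named_entity_recognition": [80.0, 60.0],
    "relation_extraction": [50.0],
    "question_answering": [70.0, 90.0],
}

overall = macro_average(example_scores)
print(f"macro-averaged score: {overall:.2f}")
```

Averaging within a task first keeps a task with many datasets from dominating the overall score.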

PharmKG: a dedicated knowledge graph benchmark for bomedical data mining.

Zheng, Shuangjia; Rao, Jiahua; Song, Ying; Zhang, Jixian; Xiao, Xianglu; Fang, Evandro Fei; Yang, Yuedong; Niu, Zhangming.

Brief Bioinform ; 22(4), 2021 Jul 20.

Article in English | MEDLINE | ID: mdl-33341877

ABSTRACT

Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we established a comprehensive KG system for the biomedical field in an attempt to bridge the gap. Here, we introduced PharmKG, a multi-relational, attributed biomedical KG, composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is annotated with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. For baselines, we offered nine state-of-the-art KG embedding (KGE) approaches and a new biological, intuitive, graph neural network-based KGE method that uses a combination of both global network structure and heterogeneous domain features. Based on the proposed benchmark, we conducted extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discussed our observations across various downstream biological tasks and provided insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.
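To make the idea of KG embedding concrete, here is a minimal sketch of TransE-style triple scoring, a widely used KGE technique. The abstract does not say which nine baselines were evaluated, so TransE is used here purely as an illustrative assumption. The entity names, relation name and 2-dimensional vectors are hypothetical and are not taken from PharmKG.

```python
import math


def transe_score(head: list[float], relation: list[float], tail: list[float]) -> float:
    """Negative L2 distance ||h + r - t||; values closer to zero mean more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical toy embeddings, for illustration only.
entity_vecs = {
    "drug_A": [0.1, 0.2],
    "gene_B": [0.4, 0.6],
    "disease_C": [0.9, -0.5],
}
relation_vecs = {"targets": [0.3, 0.4]}


def rank_tails(head: str, relation: str, candidates: list[str]) -> list[str]:
    """Order candidate tail entities from most to least plausible."""
    h, r = entity_vecs[head], relation_vecs[relation]
    return sorted(candidates, key=lambda c: transe_score(h, r, entity_vecs[c]), reverse=True)


ranked = rank_tails("drug_A", "targets", ["disease_C", "gene_B"])
print(ranked)
```

Link-prediction metrics for KGE models are typically derived from such rankings, by checking where the true tail entity lands among the candidates.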

Subjects

Biomedical Research , Data Mining , Neural Networks, Computer , Semantics , Software , Benchmarking , Humans

PharmaBench: Enhancing ADMET benchmarks with large language models.

Niu, Zhangming; Xiao, Xianglu; Wu, Wenfan; Cai, Qiwei; Jiang, Yinghui; Jin, Wangzhen; Wang, Minhao; Yang, Guojian; Kong, Lingkang; Jin, Xurui; Yang, Guang; Chen, Hongming.

Sci Data ; 11(1): 985, 2024 Sep 10.

Article in English | MEDLINE | ID: mdl-39256394

ABSTRACT

Accurately predicting ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties early in drug development is essential for selecting compounds with optimal pharmacokinetics and minimal toxicity. Existing ADMET-related benchmark sets are limited in utility due to their small dataset sizes and the lack of representation of compounds used in drug discovery projects. These shortcomings hinder their application in model building for drug discovery. To address this issue, we propose a multi-agent data mining system based on Large Language Models that effectively identifies experimental conditions within 14,401 bioassays. This approach facilitates merging entries from different sources, culminating in the creation of PharmaBench. Additionally, we have developed a data processing workflow to integrate data from various sources, resulting in 156,618 raw entries. Through this workflow, we constructed PharmaBench, a comprehensive benchmark set for ADMET properties, which comprises eleven ADMET datasets and 52,482 entries. This benchmark set is designed to serve as an open-source dataset for the development of AI models relevant to drug discovery projects.
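The merging step mentioned above can be pictured with a small sketch: once the experimental condition of each entry is known, entries from different sources that share a compound and a condition can be grouped and combined. This is an assumption-laden illustration and not the PharmaBench workflow; the field names (`smiles`, `condition`, `value`), the text normalisation and the use of the median are all hypothetical choices.

```python
from collections import defaultdict
from statistics import median


def merge_entries(entries: list[dict]) -> dict[tuple[str, str], float]:
    """Group entries by (compound, normalised condition) and take the median value."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for entry in entries:
        # Normalise the condition text so that trivially different spellings match.
        key = (entry["smiles"], entry["condition"].strip().lower())
        grouped[key].append(entry["value"])
    return {key: median(values) for key, values in grouped.items()}


# Hypothetical raw entries from two sources, for illustration only.
raw_entries = [
    {"smiles": "CCO", "condition": "pH 7.4", "value": 1.0, "source": "A"},
    {"smiles": "CCO", "condition": " ph 7.4 ", "value": 3.0, "source": "B"},
    {"smiles": "CCO", "condition": "pH 6.5", "value": 5.0, "source": "B"},
]

merged = merge_entries(raw_entries)
print(merged)
```

Keeping the condition in the grouping key prevents measurements taken under different experimental setups from being averaged together.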

Subjects

Benchmarking , Drug Discovery , Data Mining , Pharmacokinetics , Pharmaceutical Preparations , Humans

Can generative AI replace immunofluorescent staining processes? A comparison study of synthetically generated cellpainting images from brightfield.

Xing, Xiaodan; Murdoch, Siofra; Tang, Chunling; Papanastasiou, Giorgos; Cross-Zamirski, Jan; Guo, Yunzhe; Xiao, Xianglu; Schönlieb, Carola-Bibiane; Wang, Yinhai; Yang, Guang.

Comput Biol Med ; 182: 109102, 2024 Sep 09.

Article in English | MEDLINE | ID: mdl-39255659

ABSTRACT

Cell imaging assays utilising fluorescence stains are essential for observing sub-cellular organelles and their responses to perturbations. Immunofluorescent staining is routinely performed in labs; however, recent innovations in generative AI are challenging the idea of wet-lab immunofluorescence (IF) staining. This is especially true when the availability and cost of specific fluorescence dyes are a problem for some labs. Furthermore, the staining process takes time, leads to inter- and intra-technician variability, and hinders downstream image and data analysis as well as the reusability of image data for other projects. Recent studies in the literature have shown the use of generative AI algorithms to generate synthetic IF images from brightfield (BF) images. Therefore, in this study, we benchmark and compare five models from three types of IF generation backbones (CNN, GAN, and diffusion models) using a publicly available dataset. This paper not only serves as a comparative study to determine the best-performing model but also proposes a comprehensive analysis pipeline for evaluating the efficacy of generators in IF image synthesis. We highlight the potential of deep learning-based generators for IF image synthesis, while also discussing potential issues and future research directions. Although generative AI shows promise in simplifying cell phenotyping using only BF images with IF staining, further research and validation are needed to address the key challenges of model generalisability, batch effects, feature relevance and computational costs.

Pharmacophoric-constrained heterogeneous graph transformer model for molecular property prediction.

Jiang, Yinghui; Jin, Shuting; Jin, Xurui; Xiao, Xianglu; Wu, Wenfan; Liu, Xiangrong; Zhang, Qiang; Zeng, Xiangxiang; Yang, Guang; Niu, Zhangming.

Commun Chem ; 6(1): 60, 2023 Apr 03.

Article in English | MEDLINE | ID: mdl-37012352

ABSTRACT

Informative representation of molecules is a crucial prerequisite in AI-driven drug design and discovery. Pharmacophore information, including functional groups and chemical reactions, can indicate molecular properties but has not been fully exploited by prior atom-based molecular graph representations. To obtain a more informative representation of molecules for better molecular property prediction, we propose the Pharmacophoric-constrained Heterogeneous Graph Transformer (PharmHGT). We design a pharmacophoric-constrained multi-view molecular representation graph, enabling PharmHGT to extract vital chemical information from functional substructures and chemical reactions. Extensive downstream experiments demonstrate that PharmHGT achieves remarkably superior performance over the state-of-the-art models on molecular property prediction (up to 1.55% better in ROC-AUC and 0.272 better in RMSE than the best baseline model). The ablation study and case study show that our proposed molecular graph representation method and heterogeneous graph transformer model can better capture the pharmacophoric structure and chemical information features. Further visualization studies also indicated a better representation capacity achieved by our model.
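For readers unfamiliar with the two metrics cited above, the following sketch computes ROC-AUC (for classification; higher is better) and RMSE (for regression; lower is better) from first principles. It is a generic illustration with hypothetical toy data, not the evaluation code used for PharmHGT.

```python
import math


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data, for illustration only.
auc_value = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
rmse_value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(auc_value, rmse_value)
```

The pairwise ROC-AUC computation is quadratic in the number of samples, which is fine for small examples; rank-based formulations are the usual choice for large datasets.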

Amelioration of Alzheimer's disease pathology by mitophagy inducers identified via machine learning and a cross-species workflow.

Xie, Chenglong; Zhuang, Xu-Xu; Niu, Zhangming; Ai, Ruixue; Lautrup, Sofie; Zheng, Shuangjia; Jiang, Yinghui; Han, Ruiyu; Gupta, Tanima Sen; Cao, Shuqin; Lagartos-Donate, Maria Jose; Cai, Cui-Zan; Xie, Li-Ming; Caponio, Domenica; Wang, Wen-Wen; Schmauck-Medina, Tomas; Zhang, Jianying; Wang, He-Ling; Lou, Guofeng; Xiao, Xianglu; Zheng, Wenhua; Palikaras, Konstantinos; Yang, Guang; Caldwell, Kim A; Caldwell, Guy A; Shen, Han-Ming; Nilsen, Hilde; Lu, Jia-Hong; Fang, Evandro F.

Nat Biomed Eng ; 6(1): 76-93, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34992270

ABSTRACT

A reduced removal of dysfunctional mitochondria is common to aging and age-related neurodegenerative pathologies such as Alzheimer's disease (AD). Strategies for treating such impaired mitophagy would benefit from the identification of mitophagy modulators. Here we report the combined use of unsupervised machine learning (involving vector representations of molecular structures, pharmacophore fingerprinting and conformer fingerprinting) and a cross-species approach for the screening and experimental validation of new mitophagy-inducing compounds. From a library of naturally occurring compounds, the workflow allowed us to identify 18 small molecules, and among them two potent mitophagy inducers (Kaempferol and Rhapontigenin). In nematode and rodent models of AD, we show that both mitophagy inducers increased the survival and functionality of glutamatergic and cholinergic neurons, abrogated amyloid-β and tau pathologies, and improved the animals' memory. Our findings suggest the existence of a conserved mechanism of memory loss across the AD models, this mechanism being mediated by defective mitophagy. The computational-experimental screening and validation workflow might help uncover potent mitophagy modulators that stimulate neuronal health and brain homeostasis.

Subjects

Alzheimer Disease , Mitophagy , Alzheimer Disease/drug therapy , Alzheimer Disease/pathology , Amyloid beta-Peptides , Animals , Machine Learning , Mitophagy/physiology , Workflow
