Results 1 - 3 of 3
1.
IEEE J Biomed Health Inform ; 28(8): 5007-5019, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38568768

ABSTRACT

In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. Two case studies on extracting pathway information from literature of non-small cell lung cancer and Alzheimer's disease further demonstrate the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
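The abstract does not spell out pathCLIP's training objective beyond "image-text contrastive learning". A common objective of that kind is the symmetric InfoNCE loss popularized by CLIP. The sketch below assumes that loss, not the authors' code. It also assumes row i of `image_emb` (a pathway-figure snippet) pairs with row i of `text_emb` (a gene or gene-relation description), and borrows CLIP's default temperature of 0.07. All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (assumed CLIP-style objective, not pathCLIP's code).

    Row i of image_emb is treated as the positive match for row i of text_emb;
    every other row in the batch serves as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature        # (N, N) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # Image -> text: each row should put its probability mass on the diagonal.
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text -> image: each column should put its probability mass on the diagonal.
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_i2t + loss_t2i) / 2)
```

Minimizing this loss pulls matched snippet/description embeddings together and pushes mismatched pairs apart. That is what produces "coordinated embeddings" across the two modalities. The image and text encoders that would produce these embeddings are outside the sketch.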


Subject(s)
Alzheimer Disease, Humans, Alzheimer Disease/genetics, Alzheimer Disease/diagnostic imaging, Data Mining/methods, Computational Biology/methods, Machine Learning, Image Processing, Computer-Assisted/methods, Carcinoma, Non-Small-Cell Lung/genetics, Carcinoma, Non-Small-Cell Lung/diagnostic imaging
2.
bioRxiv ; 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38328046

ABSTRACT

Background: Understanding complex biological pathways, including gene-gene interactions and gene regulatory networks, is critical for exploring disease mechanisms and for drug development. Manual literature curation of biological pathways is useful but cannot keep up with the exponential growth of the literature. Large-scale language models (LLMs), notable for their vast parameter counts and training on extensive text corpora, have great potential for automated text mining of biological pathways. Method: This study assesses the effectiveness of 21 LLMs, including both API-based and open-source models. The evaluation focused on two key aspects: gene regulatory relations (specifically, 'activation', 'inhibition', and 'phosphorylation') and KEGG pathway component recognition. Model performance was analyzed using statistical metrics such as precision, recall, F1 score, and the Jaccard similarity index. Results: Our results indicated a significant disparity in model performance. Among the API-based models, ChatGPT-4 and Claude-Pro performed best, with F1 scores of 0.4448 and 0.4386, respectively, for gene regulatory relation prediction, and Jaccard similarity indices of 0.2778 and 0.2657, respectively, for KEGG pathway prediction. Open-source models lagged behind their API-based counterparts; Falcon-180b-chat and llama1-7b led with the highest performance in gene regulatory relations (F1 of 0.2787 and 0.1923, respectively) and KEGG pathway recognition (Jaccard similarity index of 0.2237 and 0.2207, respectively). Conclusion: LLMs are valuable in biomedical research, especially in gene network analysis and pathway mapping. However, their effectiveness varies, necessitating careful model selection. This work also provides a case study and insight into using LLMs as knowledge graphs.
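The metrics named above can be computed on sets of predictions compared with a reference set. The sketch below assumes one representation: gene regulatory relations as (source gene, relation type, target gene) triples, and KEGG pathway components as plain names. The study's exact matching rules, such as directionality and gene-synonym handling, are not stated in the abstract. The `evaluate` helper and the toy gene labels `G1`...`G4` are hypothetical.

```python
from typing import Dict, Hashable, Set


def evaluate(predicted: Set[Hashable], truth: Set[Hashable]) -> Dict[str, float]:
    """Set-based precision, recall, F1, and Jaccard index (illustrative sketch)."""
    overlap = len(predicted & truth)                 # true positives
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    union = predicted | truth
    jaccard = overlap / len(union) if union else 0.0  # |A ∩ B| / |A ∪ B|
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Toy, made-up relations (not taken from KEGG).
truth = {("G1", "activation", "G2"),
         ("G2", "inhibition", "G3"),
         ("G3", "phosphorylation", "G4")}
predicted = {("G1", "activation", "G2"),
             ("G2", "activation", "G3")}   # wrong relation type: counts as a miss

print({k: round(v, 3) for k, v in evaluate(predicted, truth).items()})
# {'precision': 0.5, 'recall': 0.333, 'f1': 0.4, 'jaccard': 0.25}
```

Treating a triple with the wrong relation type as a non-match is a strict choice made for this sketch. A looser evaluation could score gene pairs and relation types separately.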

3.
bioRxiv ; 2023 Dec 23.
Article in English | MEDLINE | ID: mdl-38187653

ABSTRACT

ChatGPT has demonstrated its potential as a surrogate knowledge graph. Trained on extensive data sources, including open-access publications, peer-reviewed research articles, and biomedical websites, ChatGPT can extract information on gene relationships and biological pathways. However, a major challenge is model hallucination, i.e., high false-positive rates. To assess and address this challenge, we systematically evaluated ChatGPT's capacity for predicting gene relationships using GPT-3.5-turbo and GPT-4. Benchmarking against the KEGG Pathway Database as the ground truth, we experimented with diverse prompting strategies, targeting gene relationships of activation, inhibition, and phosphorylation. We introduced an innovative iterative prompt refinement technique: prompt efficacy was assessed using metrics such as F1 score, precision, and recall, and GPT-4 was then re-engaged to suggest improved prompts. A refined prompt that combines a specialized role with explanatory text significantly enhanced performance. Going beyond pairwise gene relationships, we also deciphered complex gene interplays, such as gene interaction chains and pathways pertinent to diseases like non-small cell lung cancer. Direct prompts showed limited success, but "least-to-most" prompting exhibited significant potential for such network construction. The methods in this study may be applicable to other bioinformatics prediction problems.
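The iterative prompt refinement loop described above can be sketched as follows. Each round scores a prompt by F1 against the reference relations, then asks the model for a better prompt. This is a minimal sketch under stated assumptions. `predict` and `suggest` are hypothetical callables standing in for LLM calls; no real OpenAI API is used here. The toy stubs and the "keep the best prompt so far" rule are illustrative choices, not the authors' procedure.

```python
from typing import Callable, Hashable, Set, Tuple


def f1_score(predicted: Set[Hashable], truth: Set[Hashable]) -> float:
    """F1 between predicted and reference relation sets."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def refine_prompt(initial_prompt: str,
                  predict: Callable[[str], Set[Hashable]],   # hypothetical LLM wrapper
                  suggest: Callable[[str, float], str],      # hypothetical "improve this prompt" call
                  truth: Set[Hashable],
                  rounds: int = 3) -> Tuple[str, float]:
    """Greedy loop: keep whichever prompt has scored the highest F1 so far."""
    best_prompt = initial_prompt
    best_f1 = f1_score(predict(best_prompt), truth)
    for _ in range(rounds):
        candidate = suggest(best_prompt, best_f1)   # feed the score back to the model
        score = f1_score(predict(candidate), truth)
        if score > best_f1:
            best_prompt, best_f1 = candidate, score
    return best_prompt, best_f1


# Toy stubs so the sketch runs offline; the gene labels are made up.
truth = {("G1", "activation", "G2"), ("G2", "inhibition", "G3")}

def toy_predict(prompt: str) -> Set[Hashable]:
    # Pretend a role-based prompt recovers every reference relation.
    if "role" in prompt:
        return set(truth)
    return {("G1", "activation", "G2"), ("G9", "activation", "G8")}

def toy_suggest(prompt: str, score: float) -> str:
    # Pretend the model proposes adding a specialized role.
    return prompt + " Adopt the role of a pathway curator."

best, score = refine_prompt("List gene relations.", toy_predict, toy_suggest, truth)
print(best, score)
```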
