Results 1 - 20 of 223
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38609331

ABSTRACT

Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein-protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD's compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models' performances on the PEDD. This paper's outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.


Subjects
Drug Discovery, Natural Language Processing, Signal Transduction
2.
Brief Bioinform ; 24(1)2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36528805

ABSTRACT

In recent years, knowledge graphs (KGs) have gained a great deal of popularity as a tool for storing relationships between entities and for performing higher level reasoning. KGs in biomedicine and clinical practice aim to provide an elegant solution for diagnosing and treating complex diseases more efficiently and flexibly. Here, we provide a systematic review to characterize the state-of-the-art of KGs in the area of complex disease research. We cover the following topics: (1) knowledge sources, (2) entity extraction methods, (3) relation extraction methods and (4) the application of KGs in complex diseases. As a result, we offer a complete picture of the domain. Finally, we discuss the challenges in the field by identifying gaps and opportunities for further research and propose potential research directions of KGs for complex disease diagnosis and treatment.


Subjects
Pattern Recognition, Automated
3.
Methods ; 226: 9-18, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38604412

ABSTRACT

Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules or by machine learning. In this paper, we propose an n-ary relation extraction method based on the pre-trained BERT model to construct Binding events, in order to capture the semantic information about an event's context and its participants. The experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task with F1 scores of 63.14% and 59.40%, respectively. This demonstrates that, by significantly improving performance on Binding events, the overall performance of the pipelined event extraction approach comes close to or even exceeds that of current joint learning methods.


Subjects
Data Mining, Machine Learning, Data Mining/methods, Humans, Semantics, Natural Language Processing, Algorithms
4.
Methods ; 231: 8-14, 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39241919

ABSTRACT

Biomedical event causal relation extraction (BECRE), a subtask of biomedical information extraction, aims to extract event causal relation facts from unstructured biomedical texts and plays an essential role in many downstream tasks. Existing work has two main problems: (i) shallow features alone are of limited help in establishing potential relationships between biomedical events; (ii) using traditional oversampling to address the data imbalance of BECRE tasks ignores the need for data diversity. This paper proposes a novel biomedical event causal relation extraction method that addresses these problems using deep knowledge fusion and RoBERTa-based data augmentation. To address the first problem, we fuse deep knowledge, including structural event representations and entity relation paths, to establish potential semantic connections between biomedical events. We use a graph convolutional neural network (GCN) and the predicated tensor model to acquire structural event representations, and entity relation paths are encoded based on external knowledge bases (GTD, CDR, CHR, GDA and UMLS). We introduce a triplet attention mechanism to fuse structural event representation and entity relation path information. To address the second problem, this paper proposes a RoBERTa-based data augmentation method: some words of the biomedical text, excluding biomedical events, are masked proportionally and randomly, and a pre-trained RoBERTa model then generates data instances for the imbalanced BECRE dataset. Extensive experimental results on Hahn-Powell's and BioCause datasets confirm that the proposed method achieves state-of-the-art performance compared to current advances.
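
The masking step of the augmentation described above can be illustrated with a minimal sketch. It covers only the selection of tokens to mask while protecting event tokens; the function name `mask_non_event_tokens`, the `mask_ratio` parameter, the `<mask>` placeholder and the example sentence are all assumptions for illustration, and the step in which a pre-trained RoBERTa model fills the masked positions is not reproduced here.

```python
import random


def mask_non_event_tokens(tokens, event_indices, mask_ratio=0.15,
                          mask_token="<mask>", seed=None):
    """Randomly mask a proportion of tokens, never touching event tokens.

    Hypothetical helper: the abstract does not specify the ratio or the
    sampling scheme, so uniform sampling without replacement is assumed.
    """
    rng = random.Random(seed)
    protected = set(event_indices)
    # Only tokens that are not part of a biomedical event may be masked.
    candidates = [i for i in range(len(tokens)) if i not in protected]
    n_mask = int(len(candidates) * mask_ratio)
    chosen = set(rng.sample(candidates, n_mask))
    return [mask_token if i in chosen else tok for i, tok in enumerate(tokens)]


# Illustrative sentence; indices 1 and 3 are treated as event triggers.
tokens = "IL-2 binding induces phosphorylation of STAT5 in T cells".split()
masked = mask_non_event_tokens(tokens, event_indices=[1, 3],
                               mask_ratio=0.5, seed=0)
print(" ".join(masked))
```

In a full pipeline each masked sequence would then be passed to a masked language model, and the filled-in variants would be added to the minority class of the training set.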

5.
BMC Bioinformatics ; 25(1): 112, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486137

ABSTRACT

BACKGROUND: The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large, often manually or semi-manually, annotated corpora vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster annotated corpora creation, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool to annotate biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations also integrating automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, functionalities often overlooked by off-the-shelf annotation tools. RESULTS: We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron performances in terms of time and number of clicks to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performances. CONCLUSIONS: MetaTron stands out as one of the few annotation tools targeting the biomedical domain supporting the annotation of relations, and fully customizable with documents in several formats-PDF included, as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a Docker image locally deployable.


Subjects
Power, Psychological, Semantics, PubMed
6.
BMC Bioinformatics ; 25(1): 101, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448845

ABSTRACT

PURPOSE: The expansion of research across various disciplines has led to a substantial increase in published papers and journals, highlighting the necessity for reliable text mining platforms for database construction and knowledge acquisition. This abstract introduces GPDMiner (Gene, Protein, and Disease Miner), a platform designed for the biomedical domain, addressing the challenges posed by the growing volume of academic papers. METHODS: GPDMiner is a text mining platform that utilizes advanced information retrieval techniques. It operates by searching PubMed for specific queries, extracting and analyzing information relevant to the biomedical field. This system is designed to discern and illustrate relationships between biomedical entities obtained from automated information extraction. RESULTS: The implementation of GPDMiner demonstrates its efficacy in navigating the extensive corpus of biomedical literature. It efficiently retrieves, extracts, and analyzes information, highlighting significant connections between genes, proteins, and diseases. The platform also allows users to save their analytical outcomes in various formats, including Excel and images. CONCLUSION: GPDMiner offers a notable additional functionality among the array of text mining tools available for the biomedical field. This tool presents an effective solution for researchers to navigate and extract relevant information from the vast unstructured texts found in biomedical literature, thereby providing distinctive capabilities that set it apart from existing methodologies. Its application is expected to greatly benefit researchers in this domain, enhancing their capacity for knowledge discovery and data management.


Subjects
Data Management, Data Mining, Databases, Factual, Knowledge Discovery, PubMed
7.
Brief Bioinform ; 23(5)2022 Sep 20.
Article in English | MEDLINE | ID: mdl-35849818

ABSTRACT

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g. protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then, we present a first-of-its-kind biomedical relation extraction dataset (BioRED) with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information. We assess the utility of BioRED by benchmarking several existing state-of-the-art methods, including Bidirectional Encoder Representations from Transformers (BERT)-based models, on the NER and RE tasks. Our results show that while existing approaches can reach high performance on the NER task (F-score of 89.3%), there is much room for improvement for the RE task, especially when extracting novel relations (F-score of 47.7%). Our experiments also demonstrate that such a rich dataset can successfully facilitate the development of more accurate, efficient and robust RE systems for biomedicine. Availability: The BioRED dataset and annotation guidelines are freely available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/.


Subjects
Algorithms, Data Mining, Proteins, PubMed
8.
J Biomed Inform ; 156: 104676, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38876451

ABSTRACT

Biomedical relation extraction has long been considered a challenging task due to the specialization and complexity of biomedical texts. Syntactic knowledge has been widely employed in existing research to enhance relation extraction, providing guidance for the semantic understanding and text representation of models. However, the utilization of syntactic knowledge in most studies is not exhaustive, and there is often a lack of fine-grained noise reduction, leading to confusion in relation classification. In this paper, we propose an attention generator that comprehensively considers both syntactic dependency type information and syntactic position information to distinguish the importance of different dependency connections. Additionally, we integrate positional information, dependency type information, and word representations together to introduce location-enhanced syntactic knowledge for guiding our biomedical relation extraction. Experimental results on three widely used English benchmark datasets in the biomedical domain consistently outperform a range of baseline models, demonstrating that our approach not only makes full use of syntactic knowledge but also effectively reduces the impact of noisy words.


Subjects
Natural Language Processing, Semantics, Data Mining/methods, Algorithms, Humans
9.
J Biomed Inform ; 153: 104630, 2024 May.
Article in English | MEDLINE | ID: mdl-38548007

ABSTRACT

OBJECTIVE: To develop a soft prompt-based learning architecture for large language models (LLMs), examine prompt-tuning using frozen/unfrozen LLMs, and assess their abilities in transfer learning and few-shot learning. METHODS: We developed a soft prompt-based learning architecture and compared 4 strategies including (1) fine-tuning without prompts; (2) hard-prompting with unfrozen LLMs; (3) soft-prompting with unfrozen LLMs; and (4) soft-prompting with frozen LLMs. We evaluated GatorTron, a clinical LLM with up to 8.9 billion parameters, and compared GatorTron with 4 existing transformer models for clinical concept and relation extraction on 2 benchmark datasets for adverse drug events and social determinants of health (SDoH). We evaluated the few-shot learning ability and generalizability for cross-institution applications. RESULTS AND CONCLUSION: When LLMs are unfrozen, GatorTron-3.9B with soft prompting achieves the best strict F1-scores of 0.9118 and 0.8604 for concept extraction, outperforming the traditional fine-tuning and hard prompt-based models by 0.6∼3.1% and 1.2∼2.9%, respectively; GatorTron-345M with soft prompting achieves the best F1-scores of 0.8332 and 0.7488 for end-to-end relation extraction, outperforming the other two models by 0.2∼2% and 0.6∼11.7%, respectively. When LLMs are frozen, small LLMs show a large gap relative to unfrozen models; scaling LLMs up to billions of parameters makes frozen LLMs competitive with unfrozen models. Soft prompting with a frozen GatorTron-8.9B model achieved the best performance for cross-institution evaluation.
We demonstrate that (1) machines can learn soft prompts better than hard prompts composed by humans, (2) frozen LLMs have good few-shot learning ability and generalizability for cross-institution applications, (3) frozen LLMs reduce computing cost to 2.5∼6% of previous methods using unfrozen LLMs, and (4) frozen LLMs require large models (e.g., over several billions of parameters) for good performance.


Subjects
Natural Language Processing, Humans, Machine Learning, Data Mining/methods, Algorithms, Social Determinants of Health, Drug-Related Side Effects and Adverse Reactions
10.
J Biomed Inform ; 155: 104658, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38782169

ABSTRACT

OBJECTIVE: Relation extraction is an essential task in the field of biomedical literature mining and offers significant benefits for various downstream applications, including database curation, drug repurposing, and literature-based discovery. The broad-coverage natural language processing (NLP) tool SemRep has established a solid baseline for extracting subject-predicate-object triples from biomedical text and has served as the backbone of the Semantic MEDLINE Database (SemMedDB), a PubMed-scale repository of semantic triples. While SemRep achieves reasonable precision (0.69), its recall is relatively low (0.42). In this study, we aimed to enhance SemRep using a relation classification approach, in order to eventually increase the size and the utility of SemMedDB. METHODS: We combined and extended existing SemRep evaluation datasets to generate training data. We leveraged the pre-trained PubMedBERT model, enhancing it through additional contrastive pre-training and fine-tuning. We experimented with three entity representations: mentions, semantic types, and semantic groups. We evaluated the model performance on a portion of the SemRep Gold Standard dataset and compared it to SemRep performance. We also assessed the effect of the model on a larger set of 12K randomly selected PubMed abstracts. RESULTS: Our results show that the best model yields a precision of 0.62, recall of 0.81, and F1 score of 0.70. Assessment on 12K abstracts shows that the model could double the size of SemMedDB, when applied to entire PubMed. We also manually assessed the quality of 506 triples predicted by the model that SemRep had not previously identified, and found that 67% of these triples were correct. CONCLUSION: These findings underscore the promise of our model in achieving a more comprehensive coverage of relationships mentioned in biomedical literature, thereby showing its potential in enhancing various downstream applications of biomedical literature mining. 
Data and code related to this study are available at https://github.com/Michelle-Mings/SemRep_RelationClassification.
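
The precision and recall figures quoted above can be related to the reported F1 score with a short calculation. This is a minimal sketch using the standard harmonic-mean definition of F1; the F1 value computed for SemRep itself is derived here from its quoted precision and recall and is not stated in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# SemRep baseline as quoted: precision 0.69, recall 0.42 (F1 derived here).
semrep_f1 = f1_score(0.69, 0.42)
# Best model as quoted: precision 0.62, recall 0.81 (reported F1: 0.70).
model_f1 = f1_score(0.62, 0.81)

print(f"SemRep F1 = {semrep_f1:.2f}, model F1 = {model_f1:.2f}")
# prints: SemRep F1 = 0.52, model F1 = 0.70
```

The calculation shows the trade-off described in the abstract: the model gives up some precision, but the large recall gain raises the harmonic mean.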


Subjects
Data Mining, Natural Language Processing, Semantics, Data Mining/methods, MEDLINE, PubMed, Algorithms, Humans, Databases, Factual
11.
J Biomed Inform ; 157: 104717, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39209087

ABSTRACT

BACKGROUND AND OBJECTIVE: Biomedical relation extraction aims to reveal the relation between entities in medical texts. Currently, the relation extraction models that have attracted much attention mainly fine-tune pre-trained language models (PLMs) or add template-based prompt learning, which limits the ability of the model to deal with grammatical dependencies. Graph convolutional networks (GCNs) can play an important role in processing syntactic dependencies in biomedical texts. METHODS: In this work, we propose a biomedical relation extraction model that uses GCN-enhanced prompt learning to overcome limitations in handling syntactic dependencies and achieve good performance. Specifically, the model combines prompt learning with GCNs by integrating the syntactic dependency information produced by the GCNs into the prompt learning model and predicting the labels corresponding to the [MASK] tokens for relation extraction. RESULTS: Our model achieved F1 scores of 85.57%, 80.15%, 95.10%, and 84.11% in the biomedical relation extraction datasets GAD, ChemProt, PGR, and DDI, respectively, all of which outperform some existing baseline models. CONCLUSIONS: In this paper, we propose enhancing prompt learning through GCNs, integrating syntactic information into biomedical relation extraction tasks. Experimental results show that our proposed method achieves excellent performance in the biomedical relation extraction task.


Subjects
Natural Language Processing, Neural Networks, Computer, Algorithms, Humans, Data Mining/methods, Machine Learning
12.
J Biomed Inform ; : 104733, 2024 Oct 03.
Article in English | MEDLINE | ID: mdl-39368528

ABSTRACT

Electronic Health Records (EHRs) contain various valuable medical entities and their relationships. Although the extraction of biomedical relationships has achieved good results in the mining of electronic health records and the construction of biomedical knowledge bases, there are still some problems. There may be implied complex associations between entities and relationships in overlapping triplets, and ignoring these interactions may lead to a decrease in the accuracy of entity extraction. To address this issue, a joint extraction model for medical entity relations based on a relation attention mechanism is proposed. The relation extraction module identifies candidate relationships within a sentence. The attention mechanism based on these relationships assigns weights to contextual words in the sentence that are associated with different relationships. Additionally, it extracts the subject and object entities. Under a specific relationship, entity vector representations are utilized to construct a global entity matching matrix based on Biaffine transformations. This matrix is designed to enhance the semantic dependencies and relational representations between entities, enabling triplet extraction. This allows the two subtasks of named entity recognition and relation extraction to be interrelated, fully utilizing contextual information within the sentence, and effectively addresses the issue of overlapping triplets. Experimental results on the CMeIE Chinese medical relation extraction dataset and the Baidu2019 Chinese dataset confirm that our approach achieves a higher F1 score than all cutting-edge baselines. Moreover, it offers substantial performance improvements in intricate situations involving diverse overlapping patterns, multitudes of triplets, and cross-sentence triplets.

13.
J Biomed Inform ; 151: 104603, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38331081

ABSTRACT

BACKGROUND: An adverse drug event (ADE) is any unfavorable effect that occurs due to the use of a drug. Extracting ADEs from unstructured clinical notes is essential to biomedical text extraction research because it helps with pharmacovigilance and patient medication studies. OBJECTIVE: From the considerable amount of clinical narrative text, natural language processing (NLP) researchers have developed methods for extracting ADEs and their related attributes. This work presents a systematic review of current methods. METHODOLOGY: Two biomedical databases have been searched from June 2022 until December 2023 for relevant publications regarding this review, namely the databases PubMed and Medline. Similarly, we searched the multi-disciplinary databases IEEE Xplore, Scopus, ScienceDirect, and the ACL Anthology. We adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 statement guidelines and recommendations for reporting systematic reviews in conducting this review. Initially, we obtained 5,537 articles from the search results from the various databases between 2015 and 2023. Based on predefined inclusion and exclusion criteria for article selection, 100 publications have undergone full-text review, of which we consider 82 for our analysis. RESULTS: We determined the general pattern for extracting ADEs from clinical notes, with named entity recognition (NER) and relation extraction (RE) being the dual tasks considered. Researchers that tackled both NER and RE simultaneously have approached ADE extraction as a "pipeline extraction" problem (n = 22), as a "joint task extraction" problem (n = 7), and as a "multi-task learning" problem (n = 6), while others have tackled only NER (n = 27) or RE (n = 20). 
We further grouped the reviews based on the approaches for data extraction, namely rule-based (n = 8), machine learning (n = 11), deep learning (n = 32), comparison of two or more approaches (n = 11), hybrid (n = 12) and large language models (n = 8). The most used datasets are MADE 1.0, TAC 2017 and n2c2 2018. CONCLUSION: Extracting ADEs is crucial, especially for pharmacovigilance studies and patient medications. This survey showcases advances in ADE extraction research, approaches, datasets, and state-of-the-art performance in them. Challenges and future research directions are highlighted. We hope this review will guide researchers in gaining background knowledge and developing more innovative ways to address the challenges.
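
The category counts reported above can be cross-checked against the 82 publications retained for analysis. The snippet below is a simple consistency check using only the numbers quoted in the abstract; the dictionary keys are shortened labels chosen for illustration.

```python
# Counts by task formulation, as quoted in the abstract.
task_counts = {
    "pipeline extraction": 22,
    "joint task extraction": 7,
    "multi-task learning": 6,
    "NER only": 27,
    "RE only": 20,
}

# Counts by extraction approach, as quoted in the abstract.
approach_counts = {
    "rule-based": 8,
    "machine learning": 11,
    "deep learning": 32,
    "comparison of approaches": 11,
    "hybrid": 12,
    "large language models": 8,
}

included_publications = 82

print(sum(task_counts.values()), sum(approach_counts.values()))
# prints: 82 82
```

Both groupings sum to 82, so each partitions the full set of included publications.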


Subjects
Drug-Related Side Effects and Adverse Reactions, Natural Language Processing, Humans, Data Mining/methods, Machine Learning, Pharmacovigilance, Electronic Health Records, Databases, Factual, Deep Learning
14.
J Biomed Inform ; 152: 104621, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38447600

ABSTRACT

OBJECTIVE: The primary objective of this review is to investigate the effectiveness of machine learning and deep learning methodologies in the context of extracting adverse drug events (ADEs) from clinical benchmark datasets. We conduct an in-depth analysis, aiming to compare the merits and drawbacks of both machine learning and deep learning techniques, particularly within the framework of named-entity recognition (NER) and relation classification (RC) tasks related to ADE extraction. Additionally, our focus extends to the examination of specific features and their impact on the overall performance of these methodologies. In a broader perspective, our research extends to ADE extraction from various sources, including biomedical literature, social media data, and drug labels, removing the limitation to exclusively machine learning or deep learning methods. METHODS: We conducted an extensive literature review on PubMed using the query "(((machine learning [Medical Subject Headings (MeSH) Terms]) OR (deep learning [MeSH Terms])) AND (adverse drug event [MeSH Terms])) AND (extraction)", and supplemented this with a snowballing approach to review 275 references sourced from retrieved articles. RESULTS: In our analysis, we included twelve articles for review. For the NER task, deep learning models outperformed machine learning models. In the RC task, gradient Boosting, multilayer perceptron and random forest models excelled. The Bidirectional Encoder Representations from Transformers (BERT) model consistently achieved the best performance in the end-to-end task. Future efforts in the end-to-end task should prioritize improving NER accuracy, especially for 'ADE' and 'Reason'. CONCLUSION: These findings hold significant implications for advancing the field of ADE extraction and pharmacovigilance, ultimately contributing to improved drug safety monitoring and healthcare outcomes.


Subjects
Deep Learning, Drug-Related Side Effects and Adverse Reactions, Humans, Artificial Intelligence, Pharmacovigilance, Benchmarking, Natural Language Processing
15.
BMC Med Inform Decis Mak ; 24(1): 205, 2024 Jul 24.
Article in English | MEDLINE | ID: mdl-39049015

ABSTRACT

BACKGROUND: Biomedical Relation Extraction (RE) is essential for uncovering complex relationships between biomedical entities within text. However, training RE classifiers is challenging in low-resource biomedical applications with few labeled examples. METHODS: We explore the potential of Shortest Dependency Paths (SDPs) to aid biomedical RE, especially in situations with limited labeled examples. In this study, we suggest various approaches to employ SDPs when creating word and sentence representations under supervised, semi-supervised, and in-context-learning settings. RESULTS: Through experiments on three benchmark biomedical text datasets, we find that incorporating SDP-based representations enhances the performance of RE classifiers. The improvement is especially notable when working with small amounts of labeled data. CONCLUSION: SDPs offer valuable insights into the complex sentence structure found in many biomedical text passages. Our study introduces several straightforward techniques that, as demonstrated experimentally, effectively enhance the accuracy of RE classifiers.
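
To make the notion of a shortest dependency path concrete, the sketch below finds the path between two entity tokens with a breadth-first search over an undirected dependency graph. The dependency edges are written by hand for an illustrative sentence and are an assumption, not parser output; the function name `shortest_dependency_path` is likewise hypothetical and is not taken from the study.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the token path from source to target, or None if unreachable.

    `edges` is a list of (head, dependent) pairs; direction is ignored so
    the path can travel both up and down the dependency tree.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, set()).add(dependent)
        graph.setdefault(dependent, set()).add(head)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hand-written edges for: "Aspirin inhibits the production of prostaglandins"
edges = [
    ("inhibits", "Aspirin"),
    ("inhibits", "production"),
    ("production", "the"),
    ("production", "prostaglandins"),
    ("prostaglandins", "of"),
]
print(shortest_dependency_path(edges, "Aspirin", "prostaglandins"))
# prints: ['Aspirin', 'inhibits', 'production', 'prostaglandins']
```

The path drops the function words "the" and "of" and keeps the verb linking the two entities, which is the kind of compact signal an SDP-based representation can pass to a classifier.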


Subjects
Data Mining, Natural Language Processing, Humans, Data Mining/methods, Machine Learning
16.
BMC Med Inform Decis Mak ; 24(1): 218, 2024 Jul 31.
Article in English | MEDLINE | ID: mdl-39085892

ABSTRACT

BACKGROUND: Most Chinese joint entity and relation extraction tasks in medicine involve numerous nested entities, overlapping relations, and other challenging extraction issues. In response to these problems, some traditional methods decompose the joint extraction task into multiple steps or multiple modules, resulting in local dependency in the meantime. METHODS: To alleviate this issue, we propose a joint extraction model of Chinese medical entities and relations based on RoBERTa and single-module global pointer, namely RSGP, which formulates joint extraction as a global pointer linking problem. Considering the uniqueness of Chinese language structure, we introduce the RoBERTa-wwm pre-trained language model at the encoding layer to obtain a better embedding representation. Then, we represent the input sentence as a third-order tensor and score each position in the tensor to prepare for the subsequent process of decoding the triples. In the end, we design a novel single-module global pointer decoding approach to alleviate the generation of redundant information. Specifically, we analyze the decoding process of single character entities individually, improving the time and space performance of RSGP to some extent. RESULTS: In order to verify the effectiveness of our model in extracting Chinese medical entities and relations, we carry out the experiments on the public dataset, CMeIE. Experimental results show that RSGP performs significantly better on the joint extraction of Chinese medical entities and relations, and achieves state-of-the-art results compared with baseline models. CONCLUSION: The proposed RSGP can effectively extract entities and relations from Chinese medical texts and help to realize the structure of Chinese medical texts, so as to provide high-quality data support for the construction of Chinese medical knowledge graphs.


Subjects
Natural Language Processing, Humans, China, Data Mining, East Asian People
17.
J Med Syst ; 48(1): 81, 2024 Aug 31.
Article in English | MEDLINE | ID: mdl-39214943

ABSTRACT

Artificial intelligence (AI) has become a crucial element of modern technology, especially in the healthcare sector, as is apparent from the continuous development of large language models (LLMs), which are utilized in various domains, including medicine. However, when it comes to using these LLMs in the medical domain, there is a need for an evaluation platform to determine their suitability and drive future development efforts. To that end, this study develops a comprehensive Multi-Criteria Decision Making (MCDM) approach specifically designed to evaluate medical LLMs. The success of AI, particularly LLMs, in the healthcare domain depends on their efficacy, safety, and ethical compliance. Therefore, it is essential to have a robust evaluation framework for their integration into medical contexts. This study proposes using the Fuzzy-Weighted Zero-InConsistency (FWZIC) method extended to the p, q-quasirung orthopair fuzzy set (p, q-QROFS) for weighting evaluation criteria. This extension enables the handling of uncertainties inherent in medical decision-making processes. The approach accommodates the imprecise and multifaceted nature of real-world medical data and criteria by incorporating fuzzy logic principles. The Multi-Attributive Ideal-Real Comparative Analysis (MAIRCA) method is employed to assess the medical LLMs used in the case study of this research. The results show that the "Medical Relation Extraction" criterion, with its sub-levels, had greater importance (0.504) than "Clinical Concept Extraction" (0.495). Among the 6 alternatives evaluated, (A4) "GatorTron S 10B" ranked 1st, whereas (A1) "GatorTron 90B" ranked 6th. The implications of this study extend beyond academic discourse, directly impacting healthcare practices and patient outcomes. The proposed framework can help healthcare professionals make more informed decisions regarding the adoption and utilization of LLMs in medical settings.
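
To make the ranking step concrete, the following is a minimal sketch of a crisp (non-fuzzy) MAIRCA ranking as the method is commonly described: each alternative gets an equal prior, the theoretical rating of a cell is prior times criterion weight, the real rating scales that by the min-max normalized score, and alternatives are ranked by their total gap (smaller is better). The two weights are the ones reported in the abstract; the score matrix and the function name `mairca_rank` are hypothetical, and the fuzzy p, q-QROFS weighting stage is not reproduced here.

```python
from typing import List, Sequence, Tuple


def mairca_rank(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> Tuple[List[float], List[int]]:
    """Return (total gap per alternative, alternative indices sorted best-first)."""
    m = len(matrix)          # number of alternatives
    n = len(weights)         # number of criteria
    prior = 1.0 / m          # equal preference for every alternative
    gaps = []
    for row in matrix:
        total_gap = 0.0
        for j in range(n):
            column = [r[j] for r in matrix]
            lo, hi = min(column), max(column)
            theoretical = prior * weights[j]
            if hi == lo:
                norm = 1.0                       # criterion does not discriminate
            elif benefit[j]:
                norm = (row[j] - lo) / (hi - lo)  # larger is better
            else:
                norm = (hi - row[j]) / (hi - lo)  # smaller is better
            real = theoretical * norm
            total_gap += theoretical - real
        gaps.append(total_gap)
    ranking = sorted(range(m), key=lambda i: gaps[i])
    return gaps, ranking


# Weights from the abstract: relation extraction 0.504, concept extraction 0.495.
weights = [0.504, 0.495]
# Hypothetical scores for three made-up models (rows) on the two criteria.
scores = [
    [0.90, 0.88],
    [0.85, 0.92],
    [0.80, 0.80],
]
gaps, ranking = mairca_rank(scores, weights, benefit=[True, True])
print(ranking)  # indices of the alternatives, best first
```

In this toy matrix the third alternative is worst on both criteria, so its gap equals the full sum of its theoretical ratings.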


Subjects
Artificial Intelligence, Fuzzy Logic, Humans, Decision Making
18.
Entropy (Basel) ; 26(5)2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38785601

ABSTRACT

Supervised learning methods excel in traditional relation extraction tasks. However, the quality and scale of the training data heavily influence their performance. Few-shot relation extraction is gradually becoming a research hotspot whose objective is to learn and extract semantic relationships between entities with only a limited number of annotated samples. In recent years, numerous studies have employed prototypical networks for few-shot relation extraction. However, these methods often suffer from overfitting of the relation classes, making it challenging to generalize effectively to new relationships. Therefore, this paper seeks to utilize a diffusion model for data augmentation to address the overfitting issue of prototypical networks. We propose a diffusion model-enhanced prototypical network framework. Specifically, we design and train a controllable conditional relation generation diffusion model on the relation extraction dataset, which can generate the corresponding instance representation according to the relation description. Building upon the trained diffusion model, we further present a pseudo-sample-enhanced prototypical network, which is able to provide more accurate representations for prototype classes, thereby alleviating overfitting and better generalizing to unseen relation classes. Additionally, we introduce a pseudo-sample-aware attention mechanism to enhance the model's adaptability to pseudo-sample data through a cross-entropy loss, further improving the model's performance. A series of experiments are conducted to prove our method's effectiveness. The results indicate that our proposed approach significantly outperforms existing methods, particularly in low-resource one-shot environments. Further ablation analyses underscore the necessity of each module in the model. As far as we know, this is the first research to employ a diffusion model for enhancing the prototypical network through data augmentation in few-shot relation extraction.
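
As a rough illustration of the idea of enriching prototypes with generated samples, the sketch below builds each class prototype as a weighted mean of the real support embeddings and a set of pseudo-sample embeddings, then assigns a query to the nearest prototype by Euclidean distance. It does not implement the diffusion model: the pseudo-samples are simply passed in as vectors. The fixed `pseudo_weight` is a hypothetical stand-in for the paper's pseudo-sample-aware attention, and all embeddings and relation labels are made up.

```python
import math
from typing import Dict, List, Sequence


def prototype(
    support: Sequence[Sequence[float]],
    pseudo: Sequence[Sequence[float]] = (),
    pseudo_weight: float = 0.5,
) -> List[float]:
    """Weighted mean of real support vectors (weight 1) and pseudo-samples."""
    dim = len(support[0])
    total = [0.0] * dim
    weight_sum = 0.0
    for vec in support:
        for d in range(dim):
            total[d] += vec[d]
        weight_sum += 1.0
    for vec in pseudo:
        for d in range(dim):
            total[d] += pseudo_weight * vec[d]
        weight_sum += pseudo_weight
    return [t / weight_sum for t in total]


def classify(query: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the label whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# One-shot setting: a single real support vector per relation class,
# plus hypothetical generated pseudo-samples for each class.
prototypes = {
    "interacts": prototype([[1.0, 0.0]], pseudo=[[0.8, 0.2], [0.9, 0.1]]),
    "inhibits": prototype([[0.0, 1.0]], pseudo=[[0.1, 0.9]]),
}
predicted = classify([0.7, 0.3], prototypes)
print(predicted)
```

With a single support example per class, the prototype would otherwise be that one vector; the pseudo-samples pull it toward a broader estimate of the class.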

19.
Entropy (Basel) ; 26(3)2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38539722

ABSTRACT

Recent years have seen a rise in interest in document-level relation extraction, which is defined as extracting all relations between entities in multiple sentences of a document. Typically, there are multiple mentions corresponding to a single entity in this context. Previous research predominantly employed a holistic representation for each entity to predict relations, but this approach often overlooks valuable information contained in fine-grained entity mentions. We contend that relation prediction and inference should be grounded in specific entity mentions rather than abstract entity concepts. To address this, our paper proposes a two-stage mention-level framework based on an enhanced heterogeneous graph attention network for document-level relation extraction. Our framework employs two different strategies to model intra-sentential and inter-sentential relations between fine-grained entity mentions, yielding local mention representations for intra-sentential relation prediction and global mention representations for inter-sentential relation prediction. For inter-sentential relation prediction and inference, we propose an enhanced heterogeneous graph attention network to better model the long-distance semantic relationships and design an entity-coreference path-based inference strategy to conduct relation inference. Moreover, we introduce a novel cross-entropy-based multilabel focal loss function to address the class imbalance problem and multilabel prediction simultaneously. Comprehensive experiments have been conducted to verify the effectiveness of our framework. Experimental results show that our approach significantly outperforms the existing methods.
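
The abstract does not give the form of its multilabel focal loss, so the sketch below shows only the standard sigmoid focal loss applied independently to each relation label, which is one common way to combine class-imbalance handling with multilabel prediction. The function name and the `gamma` and `alpha` defaults are assumptions for illustration, not the paper's settings.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_focal_loss(
    logits: Sequence[float],
    targets: Sequence[int],
    gamma: float = 2.0,
    alpha: float = 0.25,
) -> float:
    """Mean per-label focal loss; targets are 0/1, one entry per relation label."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        p_t = p if y == 1 else 1.0 - p            # probability of the true outcome
        a_t = alpha if y == 1 else 1.0 - alpha    # class-balancing factor
        p_t = max(p_t, 1e-12)                     # guard against log(0)
        # (1 - p_t)^gamma down-weights labels the model already gets right.
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(logits)


# An entity pair with three candidate relation labels, two of them true.
loss = multilabel_focal_loss([2.0, -1.0, 0.5], [1, 0, 1])
print(loss)
```

Setting `gamma=0` and `alpha=0.5` reduces this to half the ordinary binary cross-entropy, which is a quick sanity check on the implementation.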

20.
BMC Bioinformatics ; 24(1): 412, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37915001

ABSTRACT

BACKGROUND: The PubMed archive contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up-to-date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains that would normally go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that there are only a few LBD tools that provide a functional web interface, and that the available tools are limited in one or more of the following ways: (1) they identify a relationship but not the type of relationship, (2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, (3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or (4) they are specific for a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that improves on all of these issues. RESULTS: We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and finding associations related to cancer. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches. CONCLUSIONS: SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM is generalized for any domain, can perform searches with many thousands of C term concepts, and moves beyond the simple identification of an existence of a relationship; many relationships are given relationship type labels from our knowledge graph.
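
The A-B-C search described above can be sketched in a few lines. This is not SKiM's code: it assumes that "statistically significant" means a one-sided Fisher's exact test on document co-occurrence counts, which the abstract does not state, and the function names, the `alpha` threshold, and the toy corpus are hypothetical. The sketch first keeps the B terms that co-occur significantly with the A term, then looks for C terms that co-occur significantly with those B terms.

```python
from math import comb
from typing import List, Sequence, Set, Tuple


def fisher_right_tail(k: int, n_x: int, n_y: int, total: int) -> float:
    """P(X >= k) when X is hypergeometric: overlap of n_x and n_y docs out of total."""
    upper = min(n_x, n_y)
    denom = comb(total, n_y)
    return sum(comb(n_x, i) * comb(total - n_x, n_y - i) for i in range(k, upper + 1)) / denom


def cooccurrence_p(docs: Sequence[Set[str]], x: str, y: str) -> float:
    """One-sided p-value that x and y share more documents than chance predicts."""
    n_x = sum(x in d for d in docs)
    n_y = sum(y in d for d in docs)
    n_xy = sum(x in d and y in d for d in docs)
    return fisher_right_tail(n_xy, n_x, n_y, len(docs))


def abc_search(
    docs: Sequence[Set[str]],
    a_term: str,
    b_terms: Sequence[str],
    c_terms: Sequence[str],
    alpha: float = 0.05,
) -> List[Tuple[str, str, str]]:
    """Return (A, B, C) triples where both the A-B and B-C links are significant."""
    links = []
    for b in b_terms:
        if cooccurrence_p(docs, a_term, b) >= alpha:
            continue  # B is not significantly linked to A; skip its C terms
        for c in c_terms:
            if cooccurrence_p(docs, b, c) < alpha:
                links.append((a_term, b, c))
    return links


# Hypothetical 20-document corpus; each document is the set of terms it mentions.
docs = (
    [{"fish oil", "blood viscosity"}] * 4
    + [{"blood viscosity", "raynaud disease"}] * 4
    + [{"aspirin", "migraine"}] * 6
    + [{"aspirin"}] * 6
)
links = abc_search(
    docs,
    a_term="fish oil",
    b_terms=["blood viscosity", "aspirin"],
    c_terms=["raynaud disease", "migraine"],
)
print(links)
```

Note that "fish oil" and "raynaud disease" never appear in the same toy document; the link between them is recovered only through the shared B term, which is the point of the A-B-C pattern.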


Subjects
Algorithms, Neoplasms, Humans, PubMed, Knowledge, Knowledge Discovery