Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 54
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
J Biomed Inform ; 131: 104120, 2022 07.
Artigo em Inglês | MEDLINE | ID: mdl-35709900

RESUMO

OBJECTIVE: Develop a novel methodology to create a comprehensive knowledge graph (SuppKG) to represent a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology. MATERIALS AND METHODS: We created SemRepDS (an extension of an NLP tool, SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which are then compared with an existing DSI database and also evaluated by medical professionals for mechanistic plausibility. RESULTS: SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model (significantly outperformed other machine learning and BERT models) obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible. DISCUSSION: With added DS terminology to the UMLS, SemRepDS has the capability to find more DS-specific semantic relationships from PubMed than SemRep. The utility of the resulting SuppKG was demonstrated using discovery patterns to find novel DSIs. CONCLUSION: For the domain with limited coverage in the traditional terminology (e.g., UMLS), we demonstrated an approach to leverage domain terminology and improve existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Even this study focuses on DSI, the method may be adapted to other domains.


Assuntos
Processamento de Linguagem Natural , Unified Medical Language System , Suplementos Nutricionais , PubMed , Semântica
2.
J Biomed Inform ; 115: 103696, 2021 03.
Artigo em Inglês | MEDLINE | ID: mdl-33571675

RESUMO

OBJECTIVE: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. METHODS: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from PubMed and other COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative and accurate subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant. We used this subset to construct a knowledge graph, and applied five state-of-the-art, neural knowledge graph completion algorithms (i.e., TransE, RotatE, DistMult, ComplEx, and STELP) to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. RESULTS: Accuracy classifier based on PubMedBERT achieved the best performance (F1 = 0.854) in identifying accurate semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1 = 0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as others that have not yet been studied. Discovery patterns enabled identification of additional candidate drugs and generation of plausible hypotheses regarding the links between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (i.e., paclitaxel, SB 203580, alpha 2-antiplasmin, metoclopramide, and oxymatrine) and the mechanistic explanations for their potential use are further discussed. CONCLUSION: We showed that a LBD approach can be feasible not only for discovering drug candidates for COVID-19, but also for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions. Source code and data are available at https://github.com/kilicogluh/lbd-covid.


Assuntos
Tratamento Farmacológico da COVID-19 , Reposicionamento de Medicamentos , Descoberta do Conhecimento , Algoritmos , Antivirais/uso terapêutico , COVID-19/virologia , Humanos , Redes Neurais de Computação , SARS-CoV-2/isolamento & purificação
3.
BMC Bioinformatics ; 21(1): 188, 2020 May 14.
Artigo em Inglês | MEDLINE | ID: mdl-32410573

RESUMO

BACKGROUND: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. RESULTS: A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F 1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F 1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F 1 score. The recall and the F 1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level. CONCLUSIONS: SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.


Assuntos
Algoritmos , Armazenamento e Recuperação da Informação , Semântica , Humanos , Processamento de Linguagem Natural , PubMed , Unified Medical Language System
4.
J Biomed Inform ; 98: 103275, 2019 10.
Artigo em Inglês | MEDLINE | ID: mdl-31473364

RESUMO

BACKGROUND: With the substantial growth in the biomedical research literature, a larger number of claims are published daily, some of which seemingly disagree with or contradict prior claims on the same topics. Resolving such contradictions is critical to advancing our understanding of human disease and developing effective treatments. Automated text analysis techniques can facilitate such analysis by extracting claims from the literature, flagging those that are potentially contradictory, and identifying any study characteristics that may explain such contradictions. METHODS: Using SemMedDB, our own PubMed-scale repository of semantic predications (subject-relation-object triples), we identified apparent contradictions in the biomedical research literature and developed a categorization of contextual characteristics that explain such contradictions. Clinically relevant semantic predications relating to 20 diseases and involving opposing predicate pairs (e.g., an intervention treats or causes a disease) were retrieved from SemMedDB. After addressing inference, uncertainty, generic concepts, and NLP errors through automatic and manual filtering steps, a set of apparent contradictions were identified and characterized. RESULTS: We retrieved 117,676 predication instances from 62,360 PubMed abstracts (Jan 1980-Dec 2016). From these instances, automatic filtering steps generated 2236 candidate contradictory pairs. Through manual analysis, we determined that 58 of these pairs (2.6%) were apparent contradictions. We identified five main categories of contextual characteristics that explain these contradictions: (a) internal to the patient, (b) external to the patient, (c) endogenous/exogenous, (d) known controversy, and (e) contradictions in literature. Categories (a) and (b) were subcategorized further (e.g., species, dosage) and accounted for the bulk of the contradictory information. CONCLUSIONS: Semantic predications, by accounting for lexical variability, and SemMedDB, owing to its literature scale, can support identification and elucidation of potentially contradictory claims across the biomedical domain. Further filtering and classification steps are needed to distinguish among them the true contradictory claims. The ability to detect contradictions automatically can facilitate important biomedical knowledge management tasks, such as tracking and verifying scientific claims, summarizing research on a given topic, identifying knowledge gaps, and assessing evidence for systematic reviews, with potential benefits to the scientific community. Future work will focus on automating these steps for fully automatic recognition of contradictions from the biomedical research literature.


Assuntos
Pesquisa Biomédica , Processamento de Linguagem Natural , Publicações , Semântica , Pesquisa Biomédica/normas , Pesquisa Biomédica/estatística & dados numéricos , Humanos , Armazenamento e Recuperação da Informação , PubMed , Publicações/normas , Publicações/estatística & dados numéricos , Reprodutibilidade dos Testes
5.
BMC Bioinformatics ; 17: 163, 2016 Apr 14.
Artigo em Inglês | MEDLINE | ID: mdl-27080229

RESUMO

BACKGROUND: Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. RESULTS: We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved a F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50 % of the changes involving uninformative relations being replaced by more specific and informative ones, while 35 % of the changes had no effect, and only 15 % were negative. We estimate that anaphora resolution results in changes in about 1.5 % of approximately 82 million semantic relations extracted from the entire PubMed. CONCLUSIONS: Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.


Assuntos
Ontologias Biológicas , Bases de Dados Factuais , Processamento de Linguagem Natural , Linguística , Semântica
6.
J Biomed Inform ; 60: 23-37, 2016 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-26732995

RESUMO

Findings from information-seeking behavior research can inform application development. In this report we provide a system description of Spark, an application based on findings from Serendipitous Knowledge Discovery studies and data structures known as semantic predications. Background information and the previously published IF-SKD model (outlining Serendipitous Knowledge Discovery in online environments) illustrate the potential use of information-seeking behavior in application design. A detailed overview of the Spark system illustrates how methodologies in design and retrieval functionality enable production of semantic predication graphs tailored to evoke Serendipitous Knowledge Discovery in users.


Assuntos
Comportamento de Busca de Informação , Bases de Conhecimento , Aplicações da Informática Médica , Software , Internet , Modelos Teóricos , PubMed , Semântica , Interface Usuário-Computador
7.
J Biomed Inform ; 60: 14-22, 2016 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-26774763

RESUMO

UNLABELLED: Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. OBJECTIVE: To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. MATERIALS AND METHODS: We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. RESULTS: The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p=0.62). CONCLUSIONS: The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts.


Assuntos
Sistemas de Apoio a Decisões Clínicas , Armazenamento e Recuperação da Informação/métodos , Aprendizado de Máquina Supervisionado , Algoritmos , Teorema de Bayes , Humanos , Idioma , MEDLINE , Processamento de Linguagem Natural , Semântica , Terminologia como Assunto , Unified Medical Language System
8.
J Biomed Inform ; 52: 457-67, 2014 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-25016293

RESUMO

OBJECTIVE: The amount of information for clinicians and clinical researchers is growing exponentially. Text summarization reduces information as an attempt to enable users to find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recent published research on summarization of textual documents in the biomedical domain. MATERIALS AND METHODS: MEDLINE (2000 to October 2013), IEEE Digital Library, and the ACM digital library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information is derived from selected articles on five dimensions: input, purpose, output, method and evaluation. RESULTS: Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and a hybrid technique comprising of statistical, Natural language processing and machine learning (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation. DISCUSSION: This is the first systematic review of text summarization in the biomedical domain. The study identified research gaps and provides recommendations for guiding future research on biomedical text summarization. CONCLUSION: Recent research has focused on a hybrid technique comprising statistical, language processing and machine learning techniques. Further research is needed on the application and evaluation of text summarization in real research or patient care settings.


Assuntos
Inteligência Artificial , Armazenamento e Recuperação da Informação/métodos , Processamento de Linguagem Natural , Indexação e Redação de Resumos , Humanos , MEDLINE
9.
J Biomed Inform ; 49: 134-47, 2014 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-24448204

RESUMO

In this study we report on potential drug-drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug-drug interaction schemas, based on relationships extracted from SemMedDB. In the first schema, Drug1 and Drug2 interact through Drug1's effect on some gene, which in turn affects Drug2. In the second, Drug1 affects Gene1, while Drug2 affects Gene2. Gene1 and Gene2, together, then have an effect on some biological function. After checking each drug pair from the medication lists of each of 22 patients, we found 19 known and 62 unknown drug-drug interactions using both schemas. For example, our results suggest that the interaction of Lisinopril, an ACE inhibitor commonly prescribed for hypertension, and the antidepressant sertraline can potentially increase the likelihood and possibly the severity of psoriasis. We also assessed the relationships extracted by SemRep from a linguistic perspective and found that the precision of SemRep was 0.58 for 300 randomly selected sentences from MEDLINE. Our study demonstrates that the use of structured knowledge in the form of relationships from the biomedical literature can support the discovery of potential drug-drug interactions occurring in patient clinical data. Moreover, SemMedDB provides a good knowledge resource for expanding the range of drugs, genes, and biological functions considered as elements in various drug-drug interaction pathways.


Assuntos
Interações Medicamentosas , Semântica , Inibidores da Enzima Conversora de Angiotensina/administração & dosagem , Inibidores da Enzima Conversora de Angiotensina/efeitos adversos , Humanos , Lisinopril/administração & dosagem , Lisinopril/efeitos adversos , Inibidores Seletivos de Recaptação de Serotonina/administração & dosagem , Inibidores Seletivos de Recaptação de Serotonina/efeitos adversos , Sertralina/administração & dosagem , Sertralina/efeitos adversos
10.
Sci Rep ; 14(1): 8693, 2024 04 15.
Artigo em Inglês | MEDLINE | ID: mdl-38622164

RESUMO

Non-pharmaceutical interventions (NPI) have great potential to improve cognitive function but limited investigation to discover NPI repurposing for Alzheimer's Disease (AD). This is the first study to develop an innovative framework to extract and represent NPI information from biomedical literature in a knowledge graph (KG), and train link prediction models to repurpose novel NPIs for AD prevention. We constructed a comprehensive KG, called ADInt, by extracting NPI information from biomedical literature. We used the previously-created SuppKG and NPI lexicon to identify NPI entities. Four KG embedding models (i.e., TransE, RotatE, DistMult and ComplEX) and two novel graph convolutional network models (i.e., R-GCN and CompGCN) were trained and compared to learn the representation of ADInt. Models were evaluated and compared on two test sets (time slice and clinical trial ground truth) and the best performing model was used to predict novel NPIs for AD. Discovery patterns were applied to generate mechanistic pathways for high scoring candidates. The ADInt has 162,212 nodes and 1,017,284 edges. R-GCN performed best in time slice (MR = 5.2054, Hits@10 = 0.8496) and clinical trial ground truth (MR = 3.4996, Hits@10 = 0.9192) test sets. After evaluation by domain experts, 10 novel dietary supplements and 10 complementary and integrative health were proposed from the score table calculated by R-GCN. Among proposed novel NPIs, we found plausible mechanistic pathways for photodynamic therapy and Choerospondias axillaris to prevent AD, and validated psychotherapy and manual therapy techniques using real-world data analysis. The proposed framework shows potential for discovering new NPIs for AD prevention and understanding their mechanistic pathways.


Assuntos
Doença de Alzheimer , Humanos , Doença de Alzheimer/tratamento farmacológico , Aprendizagem
11.
BMC Bioinformatics ; 14: 182, 2013 Jun 07.
Artigo em Inglês | MEDLINE | ID: mdl-23742159

RESUMO

BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to the Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS: For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). While compared to the reference standard from MeSH headings, the results for recall, precision and F-score were 0.64, 0.65, and 0.65 respectively.


Assuntos
Mineração de Dados/métodos , PubMed , Algoritmos , Pesquisa Biomédica , Análise por Conglomerados , Medical Subject Headings , Semântica
12.
Bioinformatics ; 28(23): 3158-60, 2012 Dec 01.
Artigo em Inglês | MEDLINE | ID: mdl-23044550

RESUMO

SUMMARY: Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject-predicate-object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support. AVAILABILITY AND IMPLEMENTATION: The SemMedDB repository is available as a MySQL database for non-commercial use at http://skr3.nlm.nih.gov/SemMedDB. An UMLS Metathesaurus license is required. CONTACT: kilicogluh@mail.nih.gov.


Assuntos
Bases de Dados Factuais , PubMed , Semântica , Algoritmos , Mineração de Dados , Humanos , Unified Medical Language System
13.
medRxiv ; 2023 May 21.
Artigo em Inglês | MEDLINE | ID: mdl-37292731

RESUMO

Recently, computational drug repurposing has emerged as a promising method for identifying new pharmaceutical interventions (PI) for Alzheimer's Disease (AD). Non-pharmaceutical interventions (NPI), such as Vitamin E and Music therapy, have great potential to improve cognitive function and slow the progression of AD, but have largely been unexplored. This study predicts novel NPIs for AD through link prediction on our developed biomedical knowledge graph. We constructed a comprehensive knowledge graph containing AD concepts and various potential interventions, called ADInt, by integrating a dietary supplement domain knowledge graph, SuppKG, with semantic relations from SemMedDB database. Four knowledge graph embedding models (TransE, RotatE, DistMult and ComplEX) and two graph convolutional network models (R-GCN and CompGCN) were compared to learn the representation of ADInt. R-GCN outperformed other models by evaluating on the time slice test set and the clinical trial test set and was used to generate the score tables of the link prediction task. Discovery patterns were applied to generate mechanism pathways for high scoring triples. Our ADInt had 162,213 nodes and 1,017,319 edges. The graph convolutional network model, R-GCN, performed best in both the Time Slicing test set (MR = 7.099, MRR = 0.5007, Hits@1 = 0.4112, Hits@3 = 0.5058, Hits@10 = 0.6804) and the Clinical Trials test set (MR = 1.731, MRR = 0.8582, Hits@1 = 0.7906, Hits@3 = 0.9033, Hits@10 = 0.9848). Among high scoring triples in the link prediction results, we found the plausible mechanism pathways of (Photodynamic therapy, PREVENTS, Alzheimer's Disease) and (Choerospondias axillaris, PREVENTS, Alzheimer's Disease) by discovery patterns and discussed them further. In conclusion, we presented a novel methodology to extend an existing knowledge graph and discover NPIs (dietary supplements (DS) and complementary and integrative health (CIH)) for AD. We used discovery patterns to find mechanisms for predicted triples to solve the poor interpretability of artificial neural networks. Our method can potentially be applied to other clinical problems, such as discovering drug adverse reactions and drug-drug interactions.

14.
BMC Med Inform Decis Mak ; 12: 41, 2012 May 23.
Artigo em Inglês | MEDLINE | ID: mdl-22621674

RESUMO

BACKGROUND: PubMed data potentially can provide decision support information, but PubMed was not exclusively designed to be a point-of-care tool. Natural language processing applications that summarize PubMed citations hold promise for extracting decision support information. The objective of this study was to evaluate the efficiency of a text summarization application called Semantic MEDLINE, enhanced with a novel dynamic summarization method, in identifying decision support data. METHODS: We downloaded PubMed citations addressing the prevention and drug treatment of four disease topics. We then processed the citations with Semantic MEDLINE, enhanced with the dynamic summarization method. We also processed the citations with a conventional summarization method, as well as with a baseline procedure. We evaluated the results using clinician-vetted reference standards built from recommendations in a commercial decision support product, DynaMed. RESULTS: For the drug treatment data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.848 and 0.377, while conventional summarization produced 0.583 average recall and 0.712 average precision, and the baseline method yielded average recall and precision values of 0.252 and 0.277. For the prevention data, Semantic MEDLINE enhanced with dynamic summarization achieved average recall and precision scores of 0.655 and 0.329. The baseline technique resulted in recall and precision scores of 0.269 and 0.247. No conventional Semantic MEDLINE method accommodating summarization for prevention exists. CONCLUSION: Semantic MEDLINE with dynamic summarization outperformed conventional summarization in terms of recall, and outperformed the baseline method in both recall and precision. This new approach to text summarization demonstrates potential in identifying decision support data for multiple needs.


Assuntos
Algoritmos , Técnicas de Apoio para a Decisão , Armazenamento e Recuperação da Informação/métodos , Semântica , Diabetes Mellitus Tipo 2/tratamento farmacológico , Diabetes Mellitus Tipo 2/prevenção & controle , Insuficiência Cardíaca/tratamento farmacológico , Insuficiência Cardíaca/prevenção & controle , Humanos , Hipertensão/tratamento farmacológico , Hipertensão/prevenção & controle , MEDLINE , Processamento de Linguagem Natural , Pneumonia Pneumocócica/tratamento farmacológico , PubMed
15.
BMC Bioinformatics ; 12: 486, 2011 Dec 20.
Artigo em Inglês | MEDLINE | ID: mdl-22185221

RESUMO

BACKGROUND: Semantic relations increasingly underpin biomedical text mining and knowledge discovery applications. The success of such practical applications crucially depends on the quality of extracted relations, which can be assessed against a gold standard reference. Most such references in biomedical text mining focus on narrow subdomains and adopt different semantic representations, rendering them difficult to use for benchmarking independently developed relation extraction systems. In this article, we present a multi-phase gold standard annotation study, in which we annotated 500 sentences randomly selected from MEDLINE abstracts on a wide range of biomedical topics with 1371 semantic predications. The UMLS Metathesaurus served as the main source for conceptual information and the UMLS Semantic Network for relational information. We measured interannotator agreement and analyzed the annotations closely to identify some of the challenges in annotating biomedical text with relations based on an ontology or a terminology. RESULTS: We obtain fair to moderate interannotator agreement in the practice phase (0.378-0.475). With improved guidelines and additional semantic equivalence criteria, the agreement increases by 12% (0.415 to 0.536) in the main annotation phase. In addition, we find that agreement increases to 0.688 when the agreement calculation is limited to those predications that are based only on the explicitly provided UMLS concepts and relations. CONCLUSIONS: While interannotator agreement in the practice phase confirms that conceptual annotation is a challenging task, the increasing agreement in the main annotation phase points out that an acceptable level of agreement can be achieved in multiple iterations, by setting stricter guidelines and establishing semantic equivalence criteria. Mapping text to ontological concepts emerges as the main challenge in conceptual annotation. Annotating predications involving biomolecular entities and processes is particularly challenging. While the resulting gold standard is mainly intended to serve as a test collection for our semantic interpreter, we believe that the lessons learned are applicable generally.


Assuntos
Mineração de Dados/normas , Humanos , MEDLINE , Semântica , Unified Medical Language System , Estados Unidos , Vocabulário Controlado
16.
J Biomed Inform ; 44(5): 830-8, 2011 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-21575741

RESUMO

Automatic summarization has been proposed to help manage the results of biomedical information retrieval systems. Semantic MEDLINE, for example, summarizes semantic predications representing assertions in MEDLINE citations. Results are presented as a graph which maintains links to the original citations. Graphs summarizing more than 500 citations are hard to read and navigate, however. We exploit graph theory for focusing these large graphs. The method is based on degree centrality, which measures connectedness in a graph. Four categories of clinical concepts related to treatment of disease were identified and presented as a summary of input text. A baseline was created using term frequency of occurrence. The system was evaluated on summaries for treatment of five diseases compared to a reference standard produced manually by two physicians. The results showed that recall for system results was 72%, precision was 73%, and F-score was 0.72. The system F-score was considerably higher than that for the baseline (0.47).


Assuntos
Armazenamento e Recuperação da Informação/métodos , Semântica , Algoritmos , Humanos , MEDLINE , Processamento de Linguagem Natural , Unified Medical Language System
17.
ArXiv ; 2021 Feb 05.
Artigo em Inglês | MEDLINE | ID: mdl-33564698

RESUMO

OBJECTIVE: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. METHODS: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from both PubMed and COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant, and used this subset to construct a knowledge graph. Five SOTA, neural knowledge graph completion algorithms were used to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. RESULTS: Accuracy classifier based on PubMedBERT achieved the best performance (F1= 0.854) in classifying semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1=0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as some candidate drugs that have not yet been studied. Discovery patterns enabled generation of plausible hypotheses regarding the relationships between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (paclitaxel, SB 203580, alpha 2-antiplasmin, pyrrolidine dithiocarbamate, and butylated hydroxytoluene) with their mechanistic explanations were further discussed. CONCLUSION: We show that an LBD approach can be feasible for discovering drug candidates for COVID-19, and for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions.

18.
J Med Libr Assoc ; 98(4): 273-81, 2010 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-20936065

RESUMO

OBJECTIVE: This paper examines the development and evaluation of an automatic summarization system in the domain of molecular genetics. The system is a potential component of an advanced biomedical information management application called Semantic MEDLINE and could assist librarians in developing secondary databases of genetic information extracted from the primary literature. METHODS: An existing summarization system was modified for identifying biomedical text relevant to the genetic etiology of disease. The summarization system was evaluated on the task of identifying data describing genes associated with bladder cancer in MEDLINE citations. A gold standard was produced using records from Genetics Home Reference and Online Mendelian Inheritance in Man. Genes in text found by the system were compared to the gold standard. Recall, precision, and F-measure were calculated. RESULTS: The system achieved recall of 46%, and precision of 88% (F-measure=0.61) by taking Gene References into Function (GeneRIFs) into account. CONCLUSION: The new summarization schema for genetic etiology has potential as a component in Semantic MEDLINE to support the work of data curators.


Assuntos
Bases de Dados Genéticas , Armazenamento e Recuperação da Informação/métodos , MEDLINE , Processamento de Linguagem Natural , Semântica , Terminologia como Assunto , Genética Médica , Humanos , Descritores , Estados Unidos
19.
Stud Health Technol Inform ; 160(Pt 1): 709-13, 2010.
Artigo em Inglês | MEDLINE | ID: mdl-20841778

RESUMO

Clinical practice guidelines are used to disseminate best practice to clinicians. Successful guidelines depend on literature that is both relevant to the questions posed and based on high quality research in accordance with evidence-based medicine. Meeting these standards requires extensive manual review. We describe a system that combines symbolic semantic processing with a statistical method for selecting both relevant and high quality studies. We focused on a cardiovascular risk factor guideline, and the overall performance of the system was 56% recall, 91% precision (F0.5-score 0.81). If quality of the evidence is not taken into account, performance drops to 62% recall, 79% precision (F0.5-score 0.75). We suggest that this system can potentially improve the efficiency of the literature review process in guideline development.


Assuntos
Mineração de Dados/métodos , Bases de Dados Bibliográficas , Medicina Baseada em Evidências/normas , Processamento de Linguagem Natural , Guias de Prática Clínica como Assunto , Garantia da Qualidade dos Cuidados de Saúde/métodos , Canadá
20.
J Biomed Inform ; 42(5): 801-13, 2009 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-19022398

RESUMO

As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for 53 diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p<0.01) and the increase in the overall score of clinical usefulness was 0.39 (p<0.05).


Assuntos
Processamento Eletrônico de Dados/métodos , Medicina Baseada em Evidências/métodos , Informática Médica/métodos , Processamento de Linguagem Natural , Inteligência Artificial , Doença , Tratamento Farmacológico , Humanos , Armazenamento e Recuperação da Informação/métodos , Internet , MEDLINE , Semântica , Interface Usuário-Computador , Vocabulário Controlado
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA