1.
Sci Rep ; 14(1): 8693, 2024 04 15.
Article in English | MEDLINE | ID: mdl-38622164

ABSTRACT

Non-pharmaceutical interventions (NPI) have great potential to improve cognitive function, but repurposing NPIs for Alzheimer's Disease (AD) has received limited investigation. This is the first study to develop an innovative framework to extract and represent NPI information from biomedical literature in a knowledge graph (KG), and train link prediction models to repurpose novel NPIs for AD prevention. We constructed a comprehensive KG, called ADInt, by extracting NPI information from biomedical literature. We used the previously-created SuppKG and NPI lexicon to identify NPI entities. Four KG embedding models (i.e., TransE, RotatE, DistMult and ComplEx) and two novel graph convolutional network models (i.e., R-GCN and CompGCN) were trained and compared to learn the representation of ADInt. Models were evaluated and compared on two test sets (time slice and clinical trial ground truth) and the best performing model was used to predict novel NPIs for AD. Discovery patterns were applied to generate mechanistic pathways for high scoring candidates. The ADInt has 162,212 nodes and 1,017,284 edges. R-GCN performed best in time slice (MR = 5.2054, Hits@10 = 0.8496) and clinical trial ground truth (MR = 3.4996, Hits@10 = 0.9192) test sets. After evaluation by domain experts, 10 novel dietary supplements and 10 complementary and integrative health interventions were proposed from the score table calculated by R-GCN. Among proposed novel NPIs, we found plausible mechanistic pathways for photodynamic therapy and Choerospondias axillaris to prevent AD, and validated psychotherapy and manual therapy techniques using real-world data analysis. The proposed framework shows potential for discovering new NPIs for AD prevention and understanding their mechanistic pathways.


Subjects
Alzheimer Disease , Humans , Alzheimer Disease/drug therapy , Learning
2.
medRxiv ; 2023 May 21.
Article in English | MEDLINE | ID: mdl-37292731

ABSTRACT

Recently, computational drug repurposing has emerged as a promising method for identifying new pharmaceutical interventions (PI) for Alzheimer's Disease (AD). Non-pharmaceutical interventions (NPI), such as vitamin E and music therapy, have great potential to improve cognitive function and slow the progression of AD, but have largely been unexplored. This study predicts novel NPIs for AD through link prediction on our developed biomedical knowledge graph. We constructed a comprehensive knowledge graph containing AD concepts and various potential interventions, called ADInt, by integrating a dietary supplement domain knowledge graph, SuppKG, with semantic relations from the SemMedDB database. Four knowledge graph embedding models (TransE, RotatE, DistMult and ComplEx) and two graph convolutional network models (R-GCN and CompGCN) were compared to learn the representation of ADInt. R-GCN outperformed the other models on the time slice test set and the clinical trial test set and was used to generate the score tables of the link prediction task. Discovery patterns were applied to generate mechanism pathways for high scoring triples. Our ADInt had 162,213 nodes and 1,017,319 edges. The graph convolutional network model, R-GCN, performed best in both the Time Slicing test set (MR = 7.099, MRR = 0.5007, Hits@1 = 0.4112, Hits@3 = 0.5058, Hits@10 = 0.6804) and the Clinical Trials test set (MR = 1.731, MRR = 0.8582, Hits@1 = 0.7906, Hits@3 = 0.9033, Hits@10 = 0.9848). Among high scoring triples in the link prediction results, we found the plausible mechanism pathways of (Photodynamic therapy, PREVENTS, Alzheimer's Disease) and (Choerospondias axillaris, PREVENTS, Alzheimer's Disease) by discovery patterns and discussed them further. In conclusion, we presented a novel methodology to extend an existing knowledge graph and discover NPIs (dietary supplements (DS) and complementary and integrative health (CIH)) for AD. We used discovery patterns to find mechanisms for predicted triples to solve the poor interpretability of artificial neural networks. Our method can potentially be applied to other clinical problems, such as discovering drug adverse reactions and drug-drug interactions.
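The rank-based metrics quoted in this abstract (MR, MRR, Hits@k) can be illustrated with a minimal sketch. It assumes each test triple has already been assigned the 1-based rank of its true entity among all candidates; the function name `rank_metrics` and the example ranks are made up for illustration and are not taken from the study.

```python
def rank_metrics(ranks, ks=(1, 3, 10)):
    """Summarise 1-based ranks of the correct entity over a set of test triples."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                    # mean rank (lower is better)
        "MRR": sum(1.0 / r for r in ranks) / n,  # mean reciprocal rank (higher is better)
    }
    for k in ks:
        # Hits@k: fraction of test triples whose true entity is ranked in the top k
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical ranks for five test triples (illustrative only).
example_ranks = [1, 2, 4, 10, 25]
print(rank_metrics(example_ranks))
```

Because MR is an average of ranks, a few badly ranked triples can dominate it, which is one reason MRR and Hits@k are usually reported alongside it.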

3.
J Biomed Inform ; 131: 104120, 2022 07.
Article in English | MEDLINE | ID: mdl-35709900

ABSTRACT

OBJECTIVE: Develop a novel methodology to create a comprehensive knowledge graph (SuppKG) to represent a domain with limited coverage in the Unified Medical Language System (UMLS), specifically dietary supplement (DS) information for discovering drug-supplement interactions (DSI), by leveraging biomedical natural language processing (NLP) technologies and a DS domain terminology. MATERIALS AND METHODS: We created SemRepDS (an extension of an NLP tool, SemRep), capable of extracting semantic relations from abstracts by leveraging a DS-specific terminology (iDISK) containing 28,884 DS terms not found in the UMLS. PubMed abstracts were processed using SemRepDS to generate semantic relations, which were then filtered using a PubMedBERT model to remove incorrect relations before generating SuppKG. Two discovery pathways were applied to SuppKG to identify potential DSIs, which were then compared with an existing DSI database and also evaluated by medical professionals for mechanistic plausibility. RESULTS: SemRepDS returned 158.5% more DS entities and 206.9% more DS relations than SemRep. The fine-tuned PubMedBERT model, which significantly outperformed other machine learning and BERT models, obtained an F1 score of 0.8605 and removed 43.86% of semantic relations, improving the precision of the relations by 26.4% over pre-filtering. SuppKG consists of 56,635 nodes and 595,222 directed edges with 2,928 DS-specific nodes and 164,738 edges. Manual review of findings identified 182 of 250 (72.8%) proposed DS-Gene-Drug and 77 of 100 (77%) proposed DS-Gene1-Function-Gene2-Drug pathways to be mechanistically plausible. DISCUSSION: With added DS terminology to the UMLS, SemRepDS has the capability to find more DS-specific semantic relationships from PubMed than SemRep. The utility of the resulting SuppKG was demonstrated using discovery patterns to find novel DSIs. CONCLUSION: For the domain with limited coverage in the traditional terminology (e.g., UMLS), we demonstrated an approach to leverage domain terminology and improve existing NLP tools to generate a more comprehensive knowledge graph for the downstream task. Even though this study focuses on DSI, the method may be adapted to other domains.
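The DS-Gene-Drug discovery pathway mentioned above can be pictured as a two-hop traversal over subject-predicate-object triples. The following is only a rough sketch under stated assumptions: the edge direction (supplement to gene, gene to drug), the function name `ds_gene_drug_paths`, and all node names are hypothetical placeholders, not the SuppKG schema or the authors' code.

```python
from collections import defaultdict


def ds_gene_drug_paths(triples, supplements, genes, drugs):
    """Return (supplement, pred1, gene, pred2, drug) chains found in the triples."""
    outgoing = defaultdict(list)
    for subj, pred, obj in triples:
        outgoing[subj].append((pred, obj))

    paths = []
    for ds in sorted(supplements):
        for pred1, gene in outgoing.get(ds, []):
            if gene not in genes:
                continue  # first hop must land on a gene
            for pred2, drug in outgoing.get(gene, []):
                if drug in drugs:  # second hop must land on a drug
                    paths.append((ds, pred1, gene, pred2, drug))
    return paths


# Placeholder triples; none of these relations is a real biomedical claim.
toy_triples = [
    ("supplement_A", "INHIBITS", "GENE_X"),
    ("GENE_X", "INTERACTS_WITH", "drug_B"),
    ("supplement_A", "AFFECTS", "disease_Z"),
]
print(ds_gene_drug_paths(toy_triples, {"supplement_A"}, {"GENE_X"}, {"drug_B"}))
```

Each returned chain is only a candidate; in the study, candidates were compared with an existing DSI database and reviewed by medical professionals.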


Subjects
Natural Language Processing , Unified Medical Language System , Dietary Supplements , PubMed , Semantics
4.
J Biomed Inform ; 115: 103696, 2021 03.
Article in English | MEDLINE | ID: mdl-33571675

ABSTRACT

OBJECTIVE: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. METHODS: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from PubMed and other COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative and accurate subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant. We used this subset to construct a knowledge graph, and applied five state-of-the-art, neural knowledge graph completion algorithms (i.e., TransE, RotatE, DistMult, ComplEx, and STELP) to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. RESULTS: Accuracy classifier based on PubMedBERT achieved the best performance (F1 = 0.854) in identifying accurate semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1 = 0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as others that have not yet been studied. Discovery patterns enabled identification of additional candidate drugs and generation of plausible hypotheses regarding the links between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (i.e., paclitaxel, SB 203580, alpha 2-antiplasmin, metoclopramide, and oxymatrine) and the mechanistic explanations for their potential use are further discussed. CONCLUSION: We showed that an LBD approach can be feasible not only for discovering drug candidates for COVID-19, but also for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions. Source code and data are available at https://github.com/kilicogluh/lbd-covid.
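To give a sense of how a knowledge graph completion model such as TransE ranks candidates, here is a minimal sketch of the standard TransE scoring idea: a triple (head, relation, tail) is plausible when head + relation lies close to tail in embedding space. The two-dimensional vectors and candidate names below are invented for illustration; they are not learned embeddings, and this is not the code from the linked repository.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail; higher is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Order candidate tail entities from most to least plausible."""
    return sorted(
        candidates,
        key=lambda name: transe_score(head, relation, candidates[name]),
        reverse=True,
    )


# Toy 2-d embeddings (hypothetical values).
head_vec = [0.0, 1.0]       # e.g. a drug entity
relation_vec = [1.0, 0.0]   # e.g. a TREATS-like relation
candidate_tails = {
    "candidate_a": [1.0, 1.0],  # exactly head + relation
    "candidate_b": [3.0, 1.0],
    "candidate_c": [1.0, 0.0],
}
print(rank_tails(head_vec, relation_vec, candidate_tails))
```

The position of the true entity in such a ranking is what feeds rank-based measures like Hits@1.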


Subjects
COVID-19 Drug Treatment , Drug Repositioning , Knowledge Discovery , Algorithms , Antiviral Agents/therapeutic use , COVID-19/virology , Humans , Neural Networks, Computer , SARS-CoV-2/isolation & purification
5.
ArXiv ; 2021 Feb 05.
Article in English | MEDLINE | ID: mdl-33564698

ABSTRACT

OBJECTIVE: To discover candidate drugs to repurpose for COVID-19 using literature-derived knowledge and knowledge graph completion methods. METHODS: We propose a novel, integrative, and neural network-based literature-based discovery (LBD) approach to identify drug candidates from both PubMed and COVID-19-focused research literature. Our approach relies on semantic triples extracted using SemRep (via SemMedDB). We identified an informative subset of semantic triples using filtering rules and an accuracy classifier developed on a BERT variant, and used this subset to construct a knowledge graph. Five state-of-the-art neural knowledge graph completion algorithms were used to predict drug repurposing candidates. The models were trained and assessed using a time slicing approach and the predicted drugs were compared with a list of drugs reported in the literature and evaluated in clinical trials. These models were complemented by a discovery pattern-based approach. RESULTS: Accuracy classifier based on PubMedBERT achieved the best performance (F1 = 0.854) in classifying semantic predications. Among five knowledge graph completion models, TransE outperformed others (MR = 0.923, Hits@1 = 0.417). Some known drugs linked to COVID-19 in the literature were identified, as well as some candidate drugs that have not yet been studied. Discovery patterns enabled generation of plausible hypotheses regarding the relationships between the candidate drugs and COVID-19. Among them, five highly ranked and novel drugs (paclitaxel, SB 203580, alpha 2-antiplasmin, pyrrolidine dithiocarbamate, and butylated hydroxytoluene) with their mechanistic explanations were further discussed. CONCLUSION: We show that an LBD approach can be feasible for discovering drug candidates for COVID-19, and for generating mechanistic explanations. Our approach can be generalized to other diseases as well as to other clinical questions.
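The time slicing evaluation mentioned in this abstract can be sketched as a split by publication year: triples known up to a cut-off are used for training, and triples that first appear afterwards serve as the test set. The abstract does not give the cut-off or the data layout, so the year, the `(triple, year)` pairing, and the function name `time_slice` are all assumptions made for illustration.

```python
def time_slice(dated_triples, cutoff_year):
    """Split (triple, year) pairs into training triples and later, previously unseen test triples."""
    train = [triple for triple, year in dated_triples if year <= cutoff_year]
    known = set(train)
    # Only triples that were not already known before the cut-off count as "discoveries".
    test = [triple for triple, year in dated_triples
            if year > cutoff_year and triple not in known]
    return train, test


# Placeholder data; the entities and years are invented.
toy_data = [
    (("drug_A", "TREATS", "disease_X"), 2015),
    (("drug_B", "INHIBITS", "GENE_Y"), 2018),
    (("drug_A", "TREATS", "disease_X"), 2020),  # already known, so not a test item
    (("drug_B", "TREATS", "disease_X"), 2020),
]
train_set, test_set = time_slice(toy_data, cutoff_year=2019)
print(train_set, test_set)
```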

6.
BMC Bioinformatics ; 21(1): 188, 2020 May 14.
Article in English | MEDLINE | ID: mdl-32410573

ABSTRACT

BACKGROUND: In the era of information overload, natural language processing (NLP) techniques are increasingly needed to support advanced biomedical information management and discovery applications. In this paper, we present an in-depth description of SemRep, an NLP system that extracts semantic relations from PubMed abstracts using linguistic principles and UMLS domain knowledge. We also evaluate SemRep on two datasets. In one evaluation, we use a manually annotated test collection and perform a comprehensive error analysis. In another evaluation, we assess SemRep's performance on the CDR dataset, a standard benchmark corpus annotated with causal chemical-disease relationships. RESULTS: A strict evaluation of SemRep on our manually annotated dataset yields 0.55 precision, 0.34 recall, and 0.42 F1 score. A relaxed evaluation, which more accurately characterizes SemRep performance, yields 0.69 precision, 0.42 recall, and 0.52 F1 score. An error analysis reveals named entity recognition/normalization as the largest source of errors (26.9%), followed by argument identification (14%) and trigger detection errors (12.5%). The evaluation on the CDR corpus yields 0.90 precision, 0.24 recall, and 0.38 F1 score. The recall and the F1 score increase to 0.35 and 0.50, respectively, when the evaluation on this corpus is limited to sentence-bound relationships, which represents a fairer evaluation, as SemRep operates at the sentence level. CONCLUSIONS: SemRep is a broad-coverage, interpretable, strong baseline system for extracting semantic relations from biomedical text. It also underpins SemMedDB, a literature-scale knowledge graph based on semantic relations. Through SemMedDB, SemRep has had significant impact in the scientific community, supporting a variety of clinical and translational applications, including clinical decision making, medical diagnosis, drug repurposing, literature-based discovery and hypothesis generation, and contributing to improved health outcomes. In ongoing development, we are redesigning SemRep to increase its modularity and flexibility, and addressing weaknesses identified in the error analysis.
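As a quick check on how the precision, recall, and F1 figures above relate, the sketch below recomputes F1 as the harmonic mean of precision and recall for the three evaluations where the abstract states all three numbers. The inputs are the rounded values from the abstract, so agreement is only expected to two decimal places; the dictionary labels are informal names chosen here.

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
reported = {
    "strict": (0.55, 0.34, 0.42),
    "relaxed": (0.69, 0.42, 0.52),
    "CDR corpus": (0.90, 0.24, 0.38),
}
for name, (p, r, stated) in reported.items():
    print(f"{name}: recomputed F1 = {f1(p, r):.2f}, reported F1 = {stated:.2f}")
```

The CDR result shows why a high precision alone does not yield a high F1: the harmonic mean is pulled strongly toward the lower of the two values.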


Subjects
Algorithms , Information Storage and Retrieval , Semantics , Humans , Natural Language Processing , PubMed , Unified Medical Language System
7.
J Biomed Inform ; 98: 103275, 2019 10.
Article in English | MEDLINE | ID: mdl-31473364

ABSTRACT

BACKGROUND: With the substantial growth in the biomedical research literature, a larger number of claims are published daily, some of which seemingly disagree with or contradict prior claims on the same topics. Resolving such contradictions is critical to advancing our understanding of human disease and developing effective treatments. Automated text analysis techniques can facilitate such analysis by extracting claims from the literature, flagging those that are potentially contradictory, and identifying any study characteristics that may explain such contradictions. METHODS: Using SemMedDB, our own PubMed-scale repository of semantic predications (subject-relation-object triples), we identified apparent contradictions in the biomedical research literature and developed a categorization of contextual characteristics that explain such contradictions. Clinically relevant semantic predications relating to 20 diseases and involving opposing predicate pairs (e.g., an intervention treats or causes a disease) were retrieved from SemMedDB. After addressing inference, uncertainty, generic concepts, and NLP errors through automatic and manual filtering steps, a set of apparent contradictions was identified and characterized. RESULTS: We retrieved 117,676 predication instances from 62,360 PubMed abstracts (Jan 1980-Dec 2016). From these instances, automatic filtering steps generated 2236 candidate contradictory pairs. Through manual analysis, we determined that 58 of these pairs (2.6%) were apparent contradictions. We identified five main categories of contextual characteristics that explain these contradictions: (a) internal to the patient, (b) external to the patient, (c) endogenous/exogenous, (d) known controversy, and (e) contradictions in literature. Categories (a) and (b) were subcategorized further (e.g., species, dosage) and accounted for the bulk of the contradictory information. CONCLUSIONS: Semantic predications, by accounting for lexical variability, and SemMedDB, owing to its literature scale, can support identification and elucidation of potentially contradictory claims across the biomedical domain. Further filtering and classification steps are needed to distinguish the true contradictory claims among them. The ability to detect contradictions automatically can facilitate important biomedical knowledge management tasks, such as tracking and verifying scientific claims, summarizing research on a given topic, identifying knowledge gaps, and assessing evidence for systematic reviews, with potential benefits to the scientific community. Future work will focus on automating these steps for fully automatic recognition of contradictions from the biomedical research literature.
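The retrieval step described in the methods (finding the same subject and object linked by opposing predicates) can be sketched as a simple grouping over predications. This is an illustration only: the single opposing pair TREATS/CAUSES is taken from the abstract's example, while the function name, the tuple layout, and the placeholder concepts are assumptions, and none of the later filtering for inference, uncertainty, or NLP errors is reproduced.

```python
from collections import defaultdict

# Example opposing pair from the abstract ("an intervention treats or causes a disease").
OPPOSING_PAIRS = {("TREATS", "CAUSES")}


def candidate_contradictions(predications, opposing_pairs=OPPOSING_PAIRS):
    """Return (subject, predicate_a, predicate_b, object) for pairs asserted both ways."""
    predicates_seen = defaultdict(set)
    for subj, pred, obj in predications:
        predicates_seen[(subj, obj)].add(pred)

    candidates = []
    for (subj, obj), preds in sorted(predicates_seen.items()):
        for pred_a, pred_b in sorted(opposing_pairs):
            if pred_a in preds and pred_b in preds:
                candidates.append((subj, pred_a, pred_b, obj))
    return candidates


# Placeholder predications; these are not claims from the literature.
toy_predications = [
    ("intervention_A", "TREATS", "disease_X"),
    ("intervention_A", "CAUSES", "disease_X"),
    ("intervention_B", "TREATS", "disease_X"),
]
print(candidate_contradictions(toy_predications))
```

As the results above indicate, most such candidates are not real contradictions (58 of 2236 survived manual analysis), so this grouping is only the first filter.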


Subjects
Biomedical Research , Natural Language Processing , Publications , Semantics , Biomedical Research/standards , Biomedical Research/statistics & numerical data , Humans , Information Storage and Retrieval , PubMed , Publications/standards , Publications/statistics & numerical data , Reproducibility of Results
8.
J Biomed Semantics ; 9(1): 25, 2018 12 27.
Article in English | MEDLINE | ID: mdl-30587224

ABSTRACT

BACKGROUND: Structured electronic health records are a rich resource for identifying novel correlations, such as co-morbidities and adverse drug reactions. For drug development and better understanding of biomedical phenomena, such correlations need to be supported by viable hypotheses about the mechanisms involved, which can then form the basis of experimental investigations. METHODS: In this study, we demonstrate the use of discovery browsing, a literature-based discovery method, to generate plausible hypotheses elucidating correlations identified from structured clinical data. The method is supported by the Semantic MEDLINE web application, which pinpoints interesting concepts and relevant MEDLINE citations, which are used to build a coherent hypothesis. RESULTS: Discovery browsing revealed a plausible explanation for the correlation between epilepsy and inflammatory bowel disease that was found in an earlier population study. The generated hypothesis involves interleukin-1 beta (IL-1 beta) and glutamate, and suggests that the influence of IL-1 beta on glutamate levels is involved in the etiology of both epilepsy and inflammatory bowel disease. CONCLUSIONS: The approach presented in this paper can supplement population-based correlation studies by enabling the scientist to identify literature that may justify the novel patterns identified in such studies and can underpin basic biomedical research that can lead to improved treatments and better healthcare outcomes.


Subjects
Data Mining , Epilepsy/metabolism , Glutamic Acid/metabolism , Inflammatory Bowel Diseases/metabolism , Interleukin-1beta/metabolism , Brain/metabolism , Humans , MEDLINE , Semantics
9.
ILAR J ; 58(1): 80-89, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28838071

ABSTRACT

Informatics methodologies exploit computer-assisted techniques to help biomedical researchers manage large amounts of information. In this paper, we focus on the biomedical research literature (MEDLINE). We first provide an overview of some text mining techniques that offer assistance in research by identifying biomedical entities (e.g., genes, substances, and diseases) and relations between them in text. We then discuss Semantic MEDLINE, an application that integrates PubMed document retrieval, concept and relation identification, and visualization, thus enabling a user to explore concepts and relations from within a set of retrieved citations. Semantic MEDLINE provides a roadmap through content and helps users discern patterns in large numbers of retrieved citations. We illustrate its use with an informatics method we call "discovery browsing," which provides a principled way of navigating through selected aspects of some biomedical research area. The method supports an iterative process that accommodates learning and hypothesis formation in which a user is provided with high level connections before delving into details. As a use case, we examine current developments in basic research on mechanisms of Alzheimer's disease. Out of the nearly 90 000 citations returned by the PubMed query "Alzheimer's disease," discovery browsing led us to 73 citations on sortilin and that disorder. We provide a synopsis of the basic research reported in 15 of these. There is widespread consensus among researchers working with a range of animal models and human cells that increased sortilin expression and decreased receptor expression are associated with amyloid beta and/or amyloid precursor protein.


Subjects
Data Mining/methods , Information Storage and Retrieval , MEDLINE , Humans , Semantics
10.
BMC Bioinformatics ; 17: 163, 2016 Apr 14.
Article in English | MEDLINE | ID: mdl-27080229

ABSTRACT

BACKGROUND: Entity coreference is common in biomedical literature and it can affect text understanding systems that rely on accurate identification of named entities, such as relation extraction and automatic summarization. Coreference resolution is a foundational yet challenging natural language processing task which, if performed successfully, is likely to enhance such systems significantly. In this paper, we propose a semantically oriented, rule-based method to resolve sortal anaphora, a specific type of coreference that forms the majority of coreference instances in biomedical literature. The method addresses all entity types and relies on linguistic components of SemRep, a broad-coverage biomedical relation extraction system. It has been incorporated into SemRep, extending its core semantic interpretation capability from sentence level to discourse level. RESULTS: We evaluated our sortal anaphora resolution method in several ways. The first evaluation specifically focused on sortal anaphora relations. Our methodology achieved an F1 score of 59.6 on the test portion of a manually annotated corpus of 320 Medline abstracts, a 4-fold improvement over the baseline method. Investigating the impact of sortal anaphora resolution on relation extraction, we found that the overall effect was positive, with 50% of the changes involving uninformative relations being replaced by more specific and informative ones, while 35% of the changes had no effect, and only 15% were negative. We estimate that anaphora resolution results in changes in about 1.5% of approximately 82 million semantic relations extracted from the entire PubMed. CONCLUSIONS: Our results demonstrate that a heavily semantic approach to sortal anaphora resolution is largely effective for biomedical literature. Our evaluation and error analysis highlight some areas for further improvements, such as coordination processing and intra-sentential antecedent selection.


Subjects
Biological Ontologies , Databases, Factual , Natural Language Processing , Linguistics , Semantics
11.
J Biomed Inform ; 60: 14-22, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26774763

ABSTRACT

Most patient care questions raised by clinicians can be answered by online clinical knowledge resources. However, important barriers still challenge the use of these resources at the point of care. OBJECTIVE: To design and assess a method for extracting clinically useful sentences from synthesized online clinical resources that represent the most clinically useful information for directly answering clinicians' information needs. MATERIALS AND METHODS: We developed a Kernel-based Bayesian Network classification model based on different domain-specific feature types extracted from sentences in a gold standard composed of 18 UpToDate documents. These features included UMLS concepts and their semantic groups, semantic predications extracted by SemRep, patient population identified by a pattern-based natural language processing (NLP) algorithm, and cue words extracted by a feature selection technique. Algorithm performance was measured in terms of precision, recall, and F-measure. RESULTS: The feature-rich approach yielded an F-measure of 74% versus 37% for a feature co-occurrence method (p<0.001). Excluding predication, population, semantic concept or text-based features reduced the F-measure to 62%, 66%, 58% and 69% respectively (p<0.01). The classifier applied to Medline sentences reached an F-measure of 73%, which is equivalent to the performance of the classifier on UpToDate sentences (p=0.62). CONCLUSIONS: The feature-rich approach significantly outperformed general baseline methods. This approach significantly outperformed classifiers based on a single type of feature. Different types of semantic features provided a unique contribution to overall classification performance. The classifier's model and features used for UpToDate generalized well to Medline abstracts.


Subjects
Decision Support Systems, Clinical , Information Storage and Retrieval/methods , Supervised Machine Learning , Algorithms , Bayes Theorem , Humans , Language , MEDLINE , Natural Language Processing , Semantics , Terminology as Topic , Unified Medical Language System
12.
J Biomed Inform ; 60: 23-37, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26732995

ABSTRACT

Findings from information-seeking behavior research can inform application development. In this report we provide a system description of Spark, an application based on findings from Serendipitous Knowledge Discovery studies and data structures known as semantic predications. Background information and the previously published IF-SKD model (outlining Serendipitous Knowledge Discovery in online environments) illustrate the potential use of information-seeking behavior in application design. A detailed overview of the Spark system illustrates how methodologies in design and retrieval functionality enable production of semantic predication graphs tailored to evoke Serendipitous Knowledge Discovery in users.


Subjects
Information Seeking Behavior , Knowledge Bases , Medical Informatics Applications , Software , Internet , Models, Theoretical , PubMed , Semantics , User-Computer Interface
13.
J Biomed Semantics ; 6: 25, 2015.
Article in English | MEDLINE | ID: mdl-25992264

ABSTRACT

OBJECTIVE: Mild traumatic brain injury (mTBI) has high prevalence in the military, among athletes, and in the general population worldwide (largely due to falls). Consequences can include a range of neuropsychological disorders. Unfortunately, such neural injury often goes undiagnosed due to the difficulty in identifying symptoms, so the discovery of an effective biomarker would greatly assist diagnosis; however, no single biomarker has been identified. We identify several body substances as potential components of a panel of biomarkers to support the diagnosis of mild traumatic brain injury. METHODS: Our approach to diagnostic biomarker discovery combines ideas and techniques from systems medicine, natural language processing, and graph theory. We create a molecular interaction network that represents neural injury and is composed of relationships automatically extracted from the literature. We retrieve citations related to neurological injury and extract relationships (semantic predications) that contain potential biomarkers. After linking all relationships together to create a network representing neural injury, we filter the network by relationship frequency and concept connectivity to reduce the set to a manageable size of higher interest substances. RESULTS: 99,437 relevant citations yielded 26,441 unique relations. 18,085 of these contained a potential biomarker as subject or object with a total of 6246 unique concepts. After filtering by graph metrics, the set was reduced to 1021 relationships with 49 unique concepts, including 17 potential biomarkers. CONCLUSION: We created a network of relationships containing substances derived from 99,437 citations and filtered using graph metrics to provide a set of 17 potential biomarkers. We discuss the interaction of several of these (glutamate, glucose, and lactate) as the basis for more effective diagnosis than is currently possible. This method provides an opportunity to focus the effort of wet bench research on those substances with the highest potential as biomarkers for mTBI.
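The filtering step described in the methods (by relationship frequency and concept connectivity) might look roughly like the sketch below. The abstract does not state the thresholds or how connectivity was measured, so the choice of node degree as the connectivity measure, the requirement that both endpoints pass, the function name `filter_network`, and the toy data are all assumptions.

```python
from collections import Counter


def filter_network(relations, min_frequency, min_degree):
    """Keep relations seen at least min_frequency times whose endpoints are well connected."""
    frequency = Counter(relations)
    frequent = {rel for rel, count in frequency.items() if count >= min_frequency}

    # Degree = number of distinct frequent relations a concept takes part in.
    degree = Counter()
    for subj, _, obj in frequent:
        degree[subj] += 1
        degree[obj] += 1

    return sorted(
        rel for rel in frequent
        if degree[rel[0]] >= min_degree and degree[rel[2]] >= min_degree
    )


# Placeholder relation instances (one entry per extracted mention).
toy_relations = (
    [("A", "AFFECTS", "B")] * 3
    + [("B", "AFFECTS", "C")] * 2
    + [("A", "AFFECTS", "C")] * 2
    + [("C", "AFFECTS", "D")] * 2
    + [("D", "AFFECTS", "E")]
)
print(filter_network(toy_relations, min_frequency=2, min_degree=2))
```

In this toy run the single-mention relation is dropped by the frequency filter, and the relation ending in the poorly connected concept D is dropped by the connectivity filter.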

14.
AMIA Annu Symp Proc ; 2015: 727-36, 2015.
Article in English | MEDLINE | ID: mdl-26958208

ABSTRACT

Orthographic and grammatical errors are a common feature of informal texts written by lay people. Health-related questions asked by consumers are a case in point. Automatic interpretation of consumer health questions is hampered by such errors. In this paper, we propose a method that combines techniques based on edit distance and frequency counts with a contextual similarity-based method for detecting and correcting orthographic errors, including misspellings, word breaks, and punctuation errors. We evaluate our method on a set of spell-corrected questions extracted from the NLM collection of consumer health questions. Our method achieves an F1 score of 0.61, compared to an informed baseline of 0.29, achieved using ESpell, a spelling correction system developed for biomedical queries. Our results show that orthographic similarity is most relevant in spelling error correction in consumer health questions and that frequency and contextual information are complementary to orthographic features.
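Two of the ingredients named in this abstract, edit distance and frequency counts, can be combined in a very small candidate-ranking sketch. It uses plain Levenshtein distance and breaks ties by corpus frequency; the contextual similarity component, word-break handling, and punctuation handling from the paper are not reproduced, and the vocabulary counts and the `max_distance` threshold are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def best_correction(token, vocabulary_counts, max_distance=2):
    """Pick the closest vocabulary word; prefer the more frequent word on ties."""
    scored = [
        (edit_distance(token, word), -count, word)
        for word, count in vocabulary_counts.items()
    ]
    scored = [item for item in scored if item[0] <= max_distance]
    return min(scored)[2] if scored else token


# Hypothetical vocabulary with made-up frequency counts.
vocabulary = {"diabetes": 50, "diabetic": 20, "dialysis": 10}
print(best_correction("diabetis", vocabulary))
```

A token with no vocabulary word within `max_distance` is returned unchanged, which keeps rare but correct terms from being overwritten.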


Subjects
Algorithms , Consumer Health Informatics , Language , Natural Language Processing , Datasets as Topic , Humans , Information Storage and Retrieval
15.
AMIA Annu Symp Proc ; 2015: 2015-24, 2015.
Article in English | MEDLINE | ID: mdl-26958301

ABSTRACT

OBJECTIVE: In a previous study, we investigated a sentence classification model that uses semantic features to extract clinically useful sentences from UpToDate, a synthesized clinical evidence resource. In the present study, we assess the generalizability of the sentence classifier to Medline abstracts. METHODS: We applied the classification model to an independent gold standard of high quality clinical studies from Medline. Then, the classifier trained on UpToDate sentences was optimized by retraining it with Medline abstracts and adding a sentence location feature. RESULTS: The previous classifier yielded an F-measure of 58% on Medline versus 67% on UpToDate. Retraining the classifier on Medline improved the F-measure to 68%, and to 76% (p<0.01) after adding the sentence location feature. CONCLUSIONS: The classifier's model and input features generalized to Medline abstracts, but the classifier needed to be retrained on Medline to achieve equivalent performance. Sentence location provided an additional contribution to the overall classification performance.


Subjects
MEDLINE , Semantics , Humans , Machine Learning
16.
Cancer Inform ; 13(Suppl 1): 103-11, 2014.
Article in English | MEDLINE | ID: mdl-25392688

ABSTRACT

In this study, we report on the performance of an automated approach to the discovery of potential prostate cancer drugs from the biomedical literature. We used the semantic relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations using SemRep, to extract potential relationships using knowledge of cancer drug pathways. Two cancer drug pathway schemas were constructed using these relationships extracted from SemMedDB. Through both pathway schemas, we found drugs already used for prostate cancer therapy and drugs not currently listed as prostate cancer medications. Our study demonstrates that the appropriate linking of relevant structured semantic relationships stored in SemMedDB can support the discovery of potential prostate cancer drugs.

17.
J Biomed Inform ; 52: 457-67, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25016293

ABSTRACT

OBJECTIVE: The amount of information for clinicians and clinical researchers is growing exponentially. Text summarization reduces information in an attempt to enable users to find and understand relevant source texts more quickly and effortlessly. In recent years, substantial research has been conducted to develop and evaluate various summarization techniques in the biomedical domain. The goal of this study was to systematically review recent published research on summarization of textual documents in the biomedical domain. MATERIALS AND METHODS: MEDLINE (2000 to October 2013), IEEE Digital Library, and the ACM digital library were searched. Investigators independently screened and abstracted studies that examined text summarization techniques in the biomedical domain. Information was derived from selected articles on five dimensions: input, purpose, output, method and evaluation. RESULTS: Of 10,786 studies retrieved, 34 (0.3%) met the inclusion criteria. Natural language processing (17; 50%) and a hybrid technique comprising statistical, natural language processing and machine learning methods (15; 44%) were the most common summarization approaches. Most studies (28; 82%) conducted an intrinsic evaluation. DISCUSSION: This is the first systematic review of text summarization in the biomedical domain. The study identified research gaps and provides recommendations for guiding future research on biomedical text summarization. CONCLUSION: Recent research has focused on a hybrid technique comprising statistical, language processing and machine learning techniques. Further research is needed on the application and evaluation of text summarization in real research or patient care settings.


Subjects
Artificial Intelligence , Information Storage and Retrieval/methods , Natural Language Processing , Abstracting and Indexing , Humans , MEDLINE
18.
J Biomed Inform ; 49: 134-47, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24448204

ABSTRACT

In this study, we report on potential drug-drug interactions between drugs occurring in patient clinical data. Results are based on relationships in SemMedDB, a database of structured knowledge extracted from all MEDLINE citations (titles and abstracts) using SemRep. The core of our methodology is to construct two potential drug-drug interaction schemas, based on relationships extracted from SemMedDB. In the first schema, Drug1 and Drug2 interact through Drug1's effect on some gene, which in turn affects Drug2. In the second, Drug1 affects Gene1, while Drug2 affects Gene2. Gene1 and Gene2, together, then have an effect on some biological function. After checking each drug pair from the medication lists of each of 22 patients, we found 19 known and 62 unknown drug-drug interactions using both schemas. For example, our results suggest that the interaction of lisinopril, an ACE inhibitor commonly prescribed for hypertension, and the antidepressant sertraline can potentially increase the likelihood and possibly the severity of psoriasis. We also assessed the relationships extracted by SemRep from a linguistic perspective and found that the precision of SemRep was 0.58 for 300 randomly selected sentences from MEDLINE. Our study demonstrates that the use of structured knowledge in the form of relationships from the biomedical literature can support the discovery of potential drug-drug interactions occurring in patient clinical data. Moreover, SemMedDB provides a good knowledge resource for expanding the range of drugs, genes, and biological functions considered as elements in various drug-drug interaction pathways.
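
The two interaction schemas described in this abstract can be sketched as simple lookups over subject-predicate-object triples. The sketch below is only an illustration under stated assumptions: the triples, the entity names (`drugA`, `gene1`, `function1`), the predicate labels, and the helper functions are all hypothetical, and it does not reproduce the authors' queries or the actual SemMedDB table layout.

```python
from collections import defaultdict

# Hypothetical toy predications (subject, predicate, object).
# These are illustrative placeholders, not rows from SemMedDB.
PREDICATIONS = [
    ("drugA", "INHIBITS", "gene1"),
    ("gene1", "INTERACTS_WITH", "drugB"),
    ("drugB", "STIMULATES", "gene2"),
    ("gene1", "AFFECTS", "function1"),
    ("gene2", "AFFECTS", "function1"),
]
GENES = {"gene1", "gene2"}        # assumed semantic typing of entities
FUNCTIONS = {"function1"}


def build_index(predications):
    """Map each subject to the set of objects it is linked to."""
    index = defaultdict(set)
    for subject, _predicate, obj in predications:
        index[subject].add(obj)
    return index


def schema_one(drug1, drug2, index):
    """Schema 1: drug1 affects some gene, which in turn affects drug2."""
    return sorted(g for g in index[drug1] & GENES if drug2 in index[g])


def schema_two(drug1, drug2, index):
    """Schema 2: drug1 affects gene1, drug2 affects gene2,
    and both genes affect the same biological function."""
    hits = []
    for g1 in sorted(index[drug1] & GENES):
        for g2 in sorted(index[drug2] & GENES):
            if g1 == g2:
                continue  # the schema names two distinct genes
            for func in sorted(index[g1] & index[g2] & FUNCTIONS):
                hits.append((g1, g2, func))
    return hits


index = build_index(PREDICATIONS)
print(schema_one("drugA", "drugB", index))
print(schema_two("drugA", "drugB", index))
```

The sketch ignores predicate types and direction of effect; a real pipeline would presumably restrict each hop to specific predicates and semantic types.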


Subjects
Drug Interactions , Semantics , Angiotensin-Converting Enzyme Inhibitors/administration & dosage , Angiotensin-Converting Enzyme Inhibitors/adverse effects , Humans , Lisinopril/administration & dosage , Lisinopril/adverse effects , Selective Serotonin Reuptake Inhibitors/administration & dosage , Selective Serotonin Reuptake Inhibitors/adverse effects , Sertraline/administration & dosage , Sertraline/adverse effects
19.
AMIA Annu Symp Proc ; 2014: 442-8, 2014.
Article in English | MEDLINE | ID: mdl-25954348

ABSTRACT

Adverse drug events account for two million combined injuries, hospitalizations, or deaths each year. Furthermore, there are few comprehensive, up-to-date, and free sources of drug information. Clinical decision support systems may significantly mitigate the number of adverse drug events. However, these systems depend on up-to-date, comprehensive, and codified data to serve as input. The DailyMed website, a resource managed by the FDA and NLM, contains all currently approved drugs. We used a semantic natural language processing approach that successfully extracted information for adverse drug events, at-risk conditions, and susceptible populations from black box warning labels on this site. The precision, recall, and F-score were 94%, 52%, and 0.67 for adverse drug events; 80%, 53%, and 0.64 for conditions; and 95%, 44%, and 0.61 for populations. Overall performance was 90% precision, 51% recall, and 0.65 F-score. Information extracted can be stored in a structured format and may support clinical decision support systems.
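
The F-scores reported in this abstract can be checked against the precision and recall figures. The abstract does not state which F-measure was used, so the sketch below assumes the balanced F1 (the harmonic mean of precision and recall); the function name and the dictionary layout are illustrative only.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall (assumed measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) as given in the abstract
reported = {
    "adverse drug events": (0.94, 0.52, 0.67),
    "conditions": (0.80, 0.53, 0.64),
    "populations": (0.95, 0.44, 0.61),
    "overall": (0.90, 0.51, 0.65),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: computed {f_score(p, r):.2f}, reported {f:.2f}")
```

Under this assumption the computed values match the reported ones for adverse drug events, conditions, and overall performance. For populations the rounded inputs give 0.60 rather than 0.61, which may simply reflect rounding of the published precision and recall.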


Subjects
Drug Labeling , Drug-Related Side Effects and Adverse Reactions , Natural Language Processing , Prescription Drugs/adverse effects , Feasibility Studies , Humans , Internet , Semantics , United States , United States Food and Drug Administration
20.
AMIA Annu Symp Proc ; 2014: 1018-27, 2014.
Article in English | MEDLINE | ID: mdl-25954411

ABSTRACT

We present a method for automatically classifying consumer health questions. Our thirteen question types are designed to aid in the automatic retrieval of medical answers from consumer health resources. To our knowledge, this is the first machine learning-based method specifically for classifying consumer health questions. We demonstrate how previous approaches to medical question classification are insufficient to achieve high accuracy on this task. Additionally, we describe, manually annotate, and automatically classify three important question elements that improve question classification over previous techniques. Our results and analysis illustrate the difficulty of the task and the future directions that are necessary to achieve high-performing consumer health question classification.


Subjects
Consumer Health Information , Information Storage and Retrieval/classification , Natural Language Processing , Humans , Information Seeking Behavior/classification