Results 1 - 9 of 9
1.
Database (Oxford) ; 2019, 2019 Jan 01.
Article in English | MEDLINE | ID: mdl-31603193

ABSTRACT

Knowledge of the molecular interactions of biological and chemical entities and their involvement in biological processes or clinical phenotypes is important for data interpretation. Unfortunately, this knowledge is mostly embedded in the literature in a form that is unavailable to automated data analysis procedures. Biological expression language (BEL) is a syntax that allows the structured representation of a broad range of biological relationships. It is used in various settings to extract such knowledge and transform it into BEL networks. To support the tedious and time-intensive extraction work of curators with automated methods, we developed the BEL track within the framework of the BioCreative challenges. Within the BEL track, we provide training data and an evaluation environment to encourage the text mining community to tackle the automatic extraction of complex BEL relationships. In BioCreative VI (2017), the 2015 BEL track was repeated with new test data. Although only minor improvements in text snippet retrieval for given statements were achieved during this second iteration of the BEL task, BEL statement extraction from provided sentences improved significantly. The best-performing system reached an F-score of 32% for the extraction of complete BEL statements, which increased to 49% when the named entities were given. This time, besides rule-based systems, new methods involving hierarchical sequence labeling and neural networks were applied to BEL statement extraction.
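The BEL statements targeted by the track link a subject term and an object term through a relationship keyword, with entities written as namespace:name inside functions such as p() for protein abundance. The Python sketch below shows a minimal split of a simple statement into that triple. It assumes a hypothetical subset of relationship keywords, non-nested terms and unquoted entity names, and it is not the track's official parser or evaluation code.

```python
import re
from typing import NamedTuple

# Hypothetical subset of BEL relationship keywords (BEL defines more).
RELATIONS = ("directlyIncreases", "directlyDecreases", "increases", "decreases",
             "positiveCorrelation", "negativeCorrelation", "association")

# "<subject term> <relationship> <object term>" for simple statements.
STATEMENT_RE = re.compile(
    r"^(?P<subj>.+?)\s+(?P<rel>" + "|".join(RELATIONS) + r")\s+(?P<obj>.+)$")

# namespace:name pairs such as HGNC:AKT1 (unquoted names only).
ENTITY_RE = re.compile(r"([A-Z]+):([A-Za-z0-9_]+)")


class BelTriple(NamedTuple):
    subject: str
    relation: str
    obj: str


def parse_bel(statement: str) -> BelTriple:
    """Split a simple BEL statement into subject, relationship and object."""
    match = STATEMENT_RE.match(statement.strip())
    if match is None:
        raise ValueError(f"not a simple BEL statement: {statement!r}")
    return BelTriple(match["subj"], match["rel"], match["obj"])


def entities(term: str) -> list:
    """Return the (namespace, name) pairs mentioned in a BEL term."""
    return ENTITY_RE.findall(term)


triple = parse_bel("p(HGNC:AKT1) directlyIncreases p(HGNC:MTOR)")
print(triple.relation, entities(triple.subject), entities(triple.obj))
```

Splitting on the relationship keyword keeps the sketch short; full BEL terms can nest (for example, activity functions wrapping abundances) and need a real grammar-based parser.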


Subjects
Data Mining , Databases, Factual , Neural Networks, Computer , Vocabulary, Controlled
2.
Database (Oxford) ; 2018, 2018 Jan 01.
Article in English | MEDLINE | ID: mdl-30295724

ABSTRACT

Relation extraction is an important task in natural language processing. In this paper, we describe our approach to BioCreative VI Task 5: text mining chemical-protein interactions. We investigate multiple deep neural network (DNN) models, including convolutional neural networks, recurrent neural networks (RNNs) and attention-based RNNs (ATT-RNNs), to extract chemical-protein relations. Our experimental results indicate that ATT-RNN models outperform the same models without attention, and that the attention-based gated recurrent unit (ATT-GRU) achieves the best micro-averaged F1 score among the tested DNNs, 0.527 on the test set. In addition, the word-level attention weights show that the attention mechanism is effective at selecting the most important trigger words when trained with semantic relation labels, without the need for semantic parsing or feature engineering. The source code of this work is available at https://github.com/ohnlp/att-chemprot.
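The attention-based RNNs described in this abstract weight each token's hidden state before the relation is classified. The following Python sketch illustrates one common word-level attention formulation: each hidden state h_t is scored as the dot product of a context vector with tanh(h_t), the scores are normalized with softmax, and the sentence vector is the weighted sum of the hidden states. The formulation, the 2-dimensional hidden states and the context vector are illustrative assumptions, not the att-chemprot implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, context):
    """Word-level attention: score each hidden state h_t as context . tanh(h_t),
    normalize the scores with softmax, and return (weights, weighted sum of h_t)."""
    scores = [sum(u * math.tanh(h) for u, h in zip(context, state))
              for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, hidden_states))
              for d in range(dim)]
    return weights, pooled


# Hypothetical 2-d hidden states for the tokens of one sentence; in a real
# model they come from a GRU or LSTM and `context` is a learned parameter.
tokens = ["aspirin", "inhibits", "COX-1"]
states = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.1]]
weights, sentence_vector = attention_pool(states, context=[1.0, 1.0])
print(max(zip(weights, tokens))[1])  # token with the highest attention weight
```

Inspecting `weights` is what makes the trigger-word analysis possible: the token with the largest weight contributes most to the pooled sentence vector.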


Subjects
Algorithms , Databases, Chemical , Databases, Protein , Neural Networks, Computer , Proteins/chemistry
3.
J Womens Health (Larchmt) ; 27(5): 569-574, 2018 May.
Article in English | MEDLINE | ID: mdl-29297754

ABSTRACT

BACKGROUND: A clinical decision support system (CDSS) for cervical cancer screening identifies patients due for routine cervical cancer screening. Yet high-risk patients who require more frequent screening or earlier follow-up to address past abnormal results are not identified. We aimed to assess a complex CDSS that incorporates national guidelines for high-risk patient screening and abnormal result management, including its implementation to identify patients overdue for testing and the outcome of sending a targeted recommendation for follow-up. MATERIALS AND METHODS: At three primary care clinics affiliated with an academic medical center, a reminder recommending an appointment for Papanicolaou (Pap) testing or Pap and human papillomavirus cotesting was sent to high-risk women aged 18 through 65 years (intervention group) whom the CDSS identified as overdue for testing. Historical control patients, who did not receive a reminder, were identified by the CDSS 1 year before the date when reminders were sent to the intervention group. Test completion rates were compared between the intervention and control groups using a generalized estimating equation extension. RESULTS: Across the three sites, the average completion rate of recommended follow-up testing was significantly higher in the intervention group (23.7%, 61/257) than in the control group (3.3%, 17/516) (p < 0.001). CONCLUSIONS: A CDSS with enhanced capabilities to identify high-risk women due for cervical cancer testing beyond routine screening intervals, combined with subsequent patient notification, has the potential to decrease cervical precancer and cancer by improving adherence to guideline-compliant follow-up and needed treatment.


Subjects
Decision Support Systems, Clinical , Early Detection of Cancer/statistics & numerical data , Mass Screening , Papanicolaou Test/statistics & numerical data , Patient Compliance/statistics & numerical data , Reminder Systems/statistics & numerical data , Uterine Cervical Neoplasms/diagnosis , Vaginal Smears/statistics & numerical data , Adult , Aged , Female , Humans , Middle Aged , Socioeconomic Factors , Uterine Cervical Neoplasms/prevention & control
4.
Database (Oxford) ; 2017, 2017 Jan 01.
Article in English | MEDLINE | ID: mdl-31725862

ABSTRACT

The recent movement towards open data in the biomedical domain has generated a large number of publicly accessible datasets. The Big Data to Knowledge data indexing project, biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE), has gathered these datasets in a one-stop portal that aims to facilitate their reuse and thereby accelerate scientific advances. However, as the number of stored and indexed biomedical datasets increases, it becomes increasingly challenging to retrieve the datasets relevant to researchers' queries. In this article, we propose an information retrieval (IR) system to tackle this problem and implement it for the bioCADDIE Dataset Retrieval Challenge. The system leverages the unstructured text of each dataset, including its title and description, and uses a state-of-the-art IR model, medical named entity extraction techniques, query expansion with deep learning-based word embeddings and a re-ranking strategy to enhance retrieval performance. In empirical experiments, we compared the proposed system with 11 baseline systems using the bioCADDIE Dataset Retrieval Challenge datasets. The experimental results show that the proposed system outperforms the other systems in terms of inferred average precision and inferred normalized discounted cumulative gain, implying that it is a viable option for biomedical dataset retrieval. Database URL: https://github.com/yanshanwang/biocaddie2016mayodata.
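Query expansion with word embeddings adds terms whose vectors lie close to the original query terms. A minimal Python sketch follows, assuming toy hand-written vectors, cosine similarity as the closeness measure, and hypothetical parameters `k` (neighbors added per term) and `min_sim` (similarity cutoff). The bioCADDIE system's actual embeddings, expansion rules and re-ranking strategy are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(terms, embeddings, k=2, min_sim=0.5):
    """Append up to k nearest embedding neighbors per query term whose cosine
    similarity is at least min_sim (k and min_sim are hypothetical settings)."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        candidates = sorted(
            ((cosine(embeddings[term], vec), word)
             for word, vec in embeddings.items() if word not in expanded),
            reverse=True,
        )
        for sim, word in candidates[:k]:
            if sim >= min_sim:
                expanded.append(word)
    return expanded


# Toy 2-d vectors standing in for trained word embeddings.
toy_embeddings = {
    "cancer": [1.0, 0.1],
    "tumor": [0.9, 0.2],
    "neoplasm": [0.8, 0.15],
    "protein": [0.0, 1.0],
}
print(expand_query(["cancer"], toy_embeddings))
```

The similarity cutoff keeps unrelated vocabulary (here, "protein") out of the expanded query even when fewer than `k` close neighbors exist.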

5.
Appl Clin Inform ; 8(1): 124-136, 2017 Feb 08.
Article in English | MEDLINE | ID: mdl-28174820

ABSTRACT

BACKGROUND: The 2013 American College of Cardiology/American Heart Association Guidelines for the Treatment of Blood Cholesterol emphasize treatment based on cardiovascular risk. However, finding time in a primary care visit to manually calculate cardiovascular risk and prescribe risk-based treatment is challenging. We developed an informatics-based clinical decision support tool, MayoExpertAdvisor, to deliver automated cardiovascular risk scores and guideline-based treatment recommendations based on patient-specific data in the electronic health record (EHR). OBJECTIVE: To assess the impact of our clinical decision support tool on the efficiency and accuracy of clinician calculation of cardiovascular risk and its effect on the delivery of guideline-consistent treatment recommendations. METHODS: Clinicians were asked to review the EHRs of selected patients. We evaluated the time and the number of clicks and keystrokes needed to calculate cardiovascular risk and provide a treatment recommendation with and without our clinical decision support tool. We also compared the treatment recommendations that clinicians arrived at with and without our tool to those recommended by the guidelines. RESULTS: With MayoExpertAdvisor, clinicians saved 3 minutes and 38 seconds in completing both tasks, used 94 fewer clicks and 23 fewer keystrokes, and improved accuracy from a baseline of 60.61% to 100% for both the risk score calculation and the guideline-consistent treatment recommendation. CONCLUSION: Informatics solutions can greatly improve the efficiency and accuracy of individualized treatment recommendations and have the potential to increase guideline compliance.


Subjects
Anticholesteremic Agents/therapeutic use , Cholesterol/metabolism , Decision Support Systems, Clinical , Anticholesteremic Agents/pharmacology , Cardiovascular Diseases/therapy , Electronic Health Records , Primary Health Care , Risk Factors , Surveys and Questionnaires
6.
Article in English | MEDLINE | ID: mdl-27173525

ABSTRACT

Biological expression language (BEL) is one of the main formal representation models of biological networks. The literature has been the primary source of information for curating biological networks in BEL. Identifying relevant articles and the corresponding evidence statements for curating and validating BEL statements remains a challenge. In this paper, we describe BELTracker, a tool that retrieves and ranks evidence sentences from PubMed abstracts and full-text articles for a given BEL statement (per the requirements of the 2015 BioCreative V BEL task). The system comprises three main components: (i) translation of a given BEL statement into an information retrieval (IR) query, (ii) retrieval of relevant PubMed citations and (iii) finding and ranking the evidence sentences in those citations. BELTracker combines multiple approaches based on traditional IR, machine learning and heuristics to accomplish the task. The system identified and ranked at least one fully relevant evidence sentence in the top 10 retrieved sentences for 72 of the 97 BEL statements in the test set. When evaluated by the task organizers under three criteria (full, relaxed and context), BELTracker achieved precisions of 0.392, 0.532 and 0.615, respectively. Our team at Mayo Clinic was the only participant in this task. BELTracker is publicly available as a RESTful API. Database URL: http://www.openbionlp.org:8080/BelTracker/finder/Given_BEL_Statement.
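Components (i) and (iii) of this pipeline can be illustrated with a deliberately simple Python sketch: entity names are pulled out of the BEL statement to form a Boolean query, and candidate sentences are ranked by how many of those names they mention. The regular expression, the AND-only query and the overlap score are hypothetical choices for illustration; BELTracker itself combines traditional IR, machine learning and heuristics.

```python
import re

# Captures the name part of namespace:name entities such as HGNC:AKT1.
ENTITY_NAME_RE = re.compile(r"[A-Z]+:([A-Za-z0-9_]+)")


def bel_to_query(statement):
    """Hypothetical translation: AND together the distinct entity names."""
    names = []
    for name in ENTITY_NAME_RE.findall(statement):
        if name not in names:
            names.append(name)
    return " AND ".join(names)


def rank_sentences(statement, sentences, top_k=10):
    """Rank sentences by how many entity names they mention (case-insensitive),
    drop sentences that mention none, and keep the top_k."""
    names = {name.lower() for name in ENTITY_NAME_RE.findall(statement)}

    def overlap(sentence):
        tokens = set(re.findall(r"[a-z0-9_]+", sentence.lower()))
        return len(names & tokens)

    scored = [(overlap(s), s) for s in sentences]
    # sorted() is stable (also with reverse=True), so ties keep input order.
    ranked = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: pair[0], reverse=True)
    return [s for _, s in ranked[:top_k]]


statement = "p(HGNC:AKT1) increases p(HGNC:MTOR)"
print(bel_to_query(statement))
print(rank_sentences(statement, ["mTOR is a kinase.",
                                 "AKT1 phosphorylates and activates mTOR."]))
```

Counting mentioned entities is only a crude proxy for evidence relevance; it cannot tell whether a sentence actually asserts the relationship, which is where the learned components of a system like BELTracker come in.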


Subjects
Computational Biology/methods , Data Mining/methods , Internet , Natural Language Processing , Software , Data Curation , Semantics
7.
Stud Health Technol Inform ; 216: 539-43, 2015.
Article in English | MEDLINE | ID: mdl-26262109

ABSTRACT

Relation extraction typically involves extracting relations between two or more entities occurring within a single sentence or across multiple sentences. In this study, we investigated the significance of extracting information from multiple sentences, specifically in the context of drug-disease relation discovery. We used multiple resources, such as Semantic Medline, a literature-based resource, and Medline search (for filtering spurious results), and inferred 8,772 potential drug-disease pairs. Our analysis revealed that 6,450 (73.5%) of the 8,772 potential drug-disease relations did not occur in a single sentence. Moreover, only 537 of the drug-disease pairs matched the curated gold standard in the Comparative Toxicogenomics Database (CTD), a trusted resource for drug-disease relations. Of these 537 pairs, 407 (75.8%) occur across multiple sentences. Our analysis revealed that the drug-disease pairs inferred from Semantic Medline or retrieved from CTD could be extracted from multiple sentences in the literature. This highlights the need for discourse-level analysis when extracting relations from the biomedical literature.
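The distinction between single-sentence and multi-sentence relations can be made concrete with a naive co-occurrence check. In this Python sketch, splitting sentences on terminal punctuation and matching entity names as case-insensitive substrings are simplifying assumptions; the study itself inferred its pairs with Semantic Medline and Medline search rather than with code like this.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cooccurrence_level(text, drug, disease):
    """Classify a drug-disease pair in a text as co-occurring within a single
    sentence, only across different sentences, or not at all."""
    sentences = [s.lower() for s in split_sentences(text)]
    drug, disease = drug.lower(), disease.lower()
    if any(drug in s and disease in s for s in sentences):
        return "single-sentence"
    if any(drug in s for s in sentences) and any(disease in s for s in sentences):
        return "multi-sentence"
    return "absent"


abstract = ("Metformin was given to the cohort. "
            "Patients showed improved diabetes control.")
print(cooccurrence_level(abstract, "metformin", "diabetes"))
```

A sentence-level extractor would miss every pair in the "multi-sentence" category, which is the gap the abstract's 73.5% figure quantifies.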


Subjects
Data Mining/methods , Drug-Related Side Effects and Adverse Reactions/classification , MEDLINE , Natural Language Processing , Periodicals as Topic/classification , Semantics , Humans , Machine Learning , Needs Assessment , Science
8.
Stud Health Technol Inform ; 216: 1033-4, 2015.
Article in English | MEDLINE | ID: mdl-26262333

ABSTRACT

In clinical NLP, one major barrier to adopting crowdsourcing for annotation is the confidentiality of protected health information (PHI) in clinical narratives. In this paper, we investigated a frequency-based approach to extracting sentences without PHI. Our approach is based on the assumption that frequently occurring sentences tend to contain no PHI. Both manual and automatic evaluations of 500 of the 7.9 million sentences with a frequency higher than one found no PHI among them. These promising results suggest that such sentences could be released to obtain sentence-level NLP annotations via crowdsourcing.
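The frequency-based filter amounts to counting identical sentences across the corpus and keeping those seen more than once. A minimal Python sketch, assuming the sentences are already split and are compared after stripping surrounding whitespace (the paper's exact normalization is not described in the abstract); the hypothetical default `min_count=2` corresponds to "frequency higher than one".

```python
from collections import Counter


def frequent_sentences(sentences, min_count=2):
    """Return the distinct (whitespace-stripped) sentences that occur at least
    min_count times, in sorted order; min_count=2 means frequency above one."""
    counts = Counter(s.strip() for s in sentences if s.strip())
    return sorted(s for s, c in counts.items() if c >= min_count)


# Hypothetical sentences from clinical notes; the one naming a patient occurs once.
notes = [
    "No acute distress.",
    "Patient John Smith seen today.",
    "No acute distress.",
    "Lungs clear to auscultation.",
    "Lungs clear to auscultation.",
]
print(frequent_sentences(notes))
```

The filter relies on the abstract's stated assumption rather than guaranteeing de-identification, which is why the authors still evaluated a sample of the retained sentences for PHI.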


Subjects
Crowdsourcing/methods , Data Interpretation, Statistical , Electronic Health Records/classification , Machine Learning , Natural Language Processing , Semantics , Language , Minnesota , Pattern Recognition, Automated/methods , Terminology as Topic , Vocabulary, Controlled
9.
J Am Med Inform Assoc ; 20(5): 836-42, 2013.
Article in English | MEDLINE | ID: mdl-23558168

ABSTRACT

BACKGROUND: Mayo Clinic developed temporal information detection systems for the 2012 i2b2 Natural Language Processing Challenge. OBJECTIVE: To construct automated systems for EVENT/TIMEX3 extraction and temporal link (TLINK) identification from clinical text. MATERIALS AND METHODS: The i2b2 organizers provided 190 annotated discharge summaries as the training set and 120 discharge summaries as the test set. Our EVENT system used a conditional random field classifier with a variety of features, including lexical information, natural language elements and medical ontology. The TIMEX3 system employed a rule-based method using regular expression pattern matching and systematic reasoning to determine normalized values. The TLINK system employed both rule-based reasoning and machine learning. All three systems were built in the Apache Unstructured Information Management Architecture (UIMA) framework. RESULTS: Our TIMEX3 system performed the best among the challenge teams (F-measure of 0.900, value accuracy of 0.731). The EVENT system produced an F-measure of 0.870, and the TLINK system an F-measure of 0.537. CONCLUSIONS: Our TIMEX3 system demonstrated the good capability of regular expression rules to extract and normalize time information. The EVENT and TLINK machine learning systems required well-defined feature sets to perform well. We could also leverage expert knowledge as part of the machine learning features to further improve TLINK identification performance.
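Rule-based TIMEX3 extraction pairs regular expressions with normalization rules that map each match to a standard value. The Python sketch below handles only two hypothetical date patterns (MM/DD/YYYY, assuming the US month-first convention, and YYYY-MM-DD) and normalizes them to ISO 8601 dates. It does not validate month or day ranges or handle relative expressions and durations, and it is not the Mayo Clinic system.

```python
import re

# Each rule pairs a pattern with a function returning (year, month, day) strings.
DATE_RULES = [
    # MM/DD/YYYY, assuming the US month-first convention
    (re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b"), lambda m: (m[3], m[1], m[2])),
    # YYYY-MM-DD, already in ISO order
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"), lambda m: (m[1], m[2], m[3])),
]


def extract_dates(text):
    """Return (matched text, normalized ISO 8601 value) pairs in text order."""
    found = []
    for pattern, to_fields in DATE_RULES:
        for m in pattern.finditer(text):
            year, month, day = to_fields(m)
            value = f"{int(year):04d}-{int(month):02d}-{int(day):02d}"
            found.append((m.start(), m.group(0), value))
    return [(span, value) for _, span, value in sorted(found)]


print(extract_dates("Admitted 03/15/2012; discharged 2012-03-20."))
```

Keeping each pattern next to its normalization function is what makes such rule sets easy to extend: a new expression type adds one entry rather than changing the extraction loop.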


Subjects
Artificial Intelligence , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Patient Discharge Summaries , Humans , Time