1.
J Biomed Inform ; 75S: S62-S70, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28455151

ABSTRACT

The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records have been collected, de-identified, annotated and shared with the scientific community. One hundred ten researchers organized in twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score over 0.80. The top-performing system employed an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task turned out to be generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. Those cases proved to be very challenging for most of the systems. Further research is required before the task can be considered solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to the task of symptom severity classification.
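
The metric named in this abstract can be sketched as follows. This is a minimal illustration, not the official scoring script: it assumes severity is an ordinal label from 0 to 3, that the error for each gold class is normalized by the largest error possible for that class, and that the final score is one minus the macro-average. The function name is hypothetical.

```python
from collections import defaultdict


def inverse_normalized_macro_mae(gold, pred, labels=(0, 1, 2, 3)):
    """Return 1 - (macro-averaged, per-class normalized MAE).

    Assumption: labels are ordinal integers; the 0-3 scale is illustrative.
    """
    lo, hi = min(labels), max(labels)
    errors = defaultdict(list)
    for g, p in zip(gold, pred):
        errors[g].append(abs(g - p))

    per_class = []
    for label, errs in errors.items():
        # Largest error a system could make on a record with this gold label.
        max_err = max(label - lo, hi - label)
        per_class.append(sum(errs) / len(errs) / max_err)

    # Macro-average: every gold class counts equally, whatever its size.
    return 1.0 - sum(per_class) / len(per_class)


score = inverse_normalized_macro_mae([0, 0, 3], [0, 3, 3])
```

Macro-averaging keeps a rare severity class from being swamped by frequent ones, which matters when a few record types are much harder than the rest.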


Subjects
Electronic Health Records , Mental Disorders , Nervous System Diseases , Humans , Natural Language Processing , Severity of Illness Index
2.
J Biomed Inform ; 75S: S4-S18, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28614702

ABSTRACT

The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks: Track 1.A was a "sight unseen" task, where nine teams ran existing de-identification systems, without any modifications or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system for this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, where 15 teams had two months to train their systems on the new data, then test them on an unannotated test set. The best-performing system from this track scored an F1 of 0.914. The scores for Track 1.A show that unmodified existing systems do not generalize well to new data without the benefit of training data. The scores for Track 1.B are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.


Subjects
Data Anonymization , Medical Records , Mental Disorders , Data Mining , Electronic Health Records , Humans
3.
J Biomed Inform ; 72: 23-32, 2017 08.
Article in English | MEDLINE | ID: mdl-28663072

ABSTRACT

Coronary Artery Disease (CAD) is not only the most common form of heart disease, but also the leading cause of death in both men and women (Coronary Artery Disease: MedlinePlus, 2015). We present a system that is able to automatically predict whether patients develop coronary artery disease based on their narrative medical histories, i.e., clinical free text. Although the free text in medical records has been used in several studies for identifying risk factors of coronary artery disease, to the best of our knowledge our work marks the first attempt at automatically predicting development of CAD. We tackle this task on a small corpus of diabetic patients. The size of this corpus makes it important to limit the number of features in order to avoid overfitting. We propose an ontology-guided approach to feature extraction, and compare it with two classic feature selection techniques. Our system achieves state-of-the-art performance, with an F1 score of 77.4%.


Subjects
Coronary Artery Disease , Narration , Natural Language Processing , Controlled Vocabulary , Female , Forecasting , Humans , Male , Prognosis
4.
J Biomed Inform ; 72: 60-66, 2017 08.
Article in English | MEDLINE | ID: mdl-28684255

ABSTRACT

In medical practice, doctors detail patients' care plans in discharge summaries written as unstructured free text, which contain, among other things, medication names and prescription information. Extracting prescriptions from discharge summaries is challenging due to the way these documents are written. Handwritten rules and medical gazetteers have proven to be useful for this purpose but come with limitations on performance, scalability, and generalizability. We instead present a machine learning approach to extract and organize medication names and prescription information into individual entries. Our approach utilizes word embeddings and tackles the task in two extraction steps, both of which are treated as sequence labeling problems. When evaluated on the 2009 i2b2 Challenge official benchmark set, the proposed approach achieves a horizontal phrase-level F1-measure of 0.864, which to the best of our knowledge represents an improvement over the current state-of-the-art.


Subjects
Information Storage and Retrieval , Machine Learning , Natural Language Processing , Prescriptions , Humans , Patient Discharge Summaries
7.
J Am Med Inform Assoc ; 27(1): 3-12, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31584655

ABSTRACT

OBJECTIVE: This article summarizes the preparation, organization, evaluation, and results of Track 2 of the 2018 National NLP Clinical Challenges shared task. Track 2 focused on extraction of adverse drug events (ADEs) from clinical records and evaluated 3 tasks: concept extraction, relation classification, and end-to-end systems. We perform an analysis of the results to identify the state of the art in these tasks, learn from it, and build on it. MATERIALS AND METHODS: For all tasks, teams were given raw text of narrative discharge summaries, and in all the tasks, participants proposed deep learning-based methods with hand-designed features. In the concept extraction task, participants used sequence labelling models (bidirectional long short-term memory being the most popular), whereas in the relation classification task, they also experimented with instance-based classifiers (namely support vector machines and rules). Ensemble methods were also popular. RESULTS: A total of 28 teams participated in task 1, with 21 teams in tasks 2 and 3. The best performing systems set a high performance bar with F1 scores of 0.9418 for concept extraction, 0.9630 for relation classification, and 0.8905 for end-to-end. However, the results were much lower for concepts and relations of Reasons and ADEs. These were often missed because local context is insufficient to identify them. CONCLUSIONS: This challenge shows that clinical concept extraction and relation classification systems have a high performance for many concept types, but significant improvement is still required for ADEs and Reasons. Incorporating the larger context or outside knowledge will likely improve the performance of future systems.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Support Vector Machine , Datasets as Topic , Humans , Patient Discharge Summaries
8.
Stud Health Technol Inform ; 264: 218-222, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437917

ABSTRACT

De-identification aims to remove 18 categories of protected health information from electronic health records. Ideally, de-identification systems should be reliable and generalizable. Previous research has focused on improving performance but has not examined generalizability. This paper investigates both performance and generalizability. To improve on current state-of-the-art performance based on long short-term memory (LSTM) units, we introduce a system that uses gated recurrent units (GRUs) and deep contextualized word representations, neither of which has previously been applied to de-identification. We measure performance and generalizability of each system using the 2014 i2b2/UTHealth and 2016 CEGS N-GRID de-identification datasets. We show that deep contextualized word representations improve state-of-the-art performance, while the benefit of replacing LSTM units with GRUs is not significant. The generalizability of de-identification systems improved significantly with deep contextualized word representations; in addition, the LSTM-based system is more generalizable than the GRU-based system.


Subjects
Data Anonymization , Electronic Health Records
9.
J Am Med Inform Assoc ; 26(11): 1163-1171, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31562516

ABSTRACT

OBJECTIVE: Track 1 of the 2018 National NLP Clinical Challenges shared tasks focused on identifying which patients in a corpus of longitudinal medical records meet and do not meet identified selection criteria. MATERIALS AND METHODS: To address this challenge, we annotated American English clinical narratives for 288 patients according to whether they met these criteria. We chose criteria from existing clinical trials that represented a variety of natural language processing tasks, including concept extraction, temporal reasoning, and inference. RESULTS: A total of 47 teams participated in this shared task, with 224 participants in total. The participants represented 18 countries, and the teams submitted 109 total system outputs. The best-performing system achieved a micro F1 score of 0.91 using a rule-based approach. The top 10 teams used rule-based and hybrid systems to approach the problems. DISCUSSION: Clinical narratives are open to interpretation, particularly in cases where the selection criterion may be underspecified. This leaves room for annotators to use domain knowledge and intuition in selecting patients, which may lead to error in system outputs. However, teams who consulted medical professionals while building their systems were more likely to have high recall for patients, which is preferable for patient selection systems. CONCLUSIONS: There is not yet a one-size-fits-all solution for natural language processing systems approaching this task. Future research in this area can examine criteria requiring even more complex inferences, temporal reasoning, and domain knowledge.
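
As a rough illustration of the micro F1 score reported here, the sketch below pools true positives, false positives, and false negatives for the "met" label across all criteria before computing precision and recall. The data layout and the function name are assumptions made for the example; the challenge's official scorer may count things differently.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-criterion boolean labels.

    gold, pred: dicts mapping a criterion name to a list of booleans,
    one per patient (True = criterion met). Hypothetical layout.
    """
    tp = fp = fn = 0
    for criterion, gold_labels in gold.items():
        for g, p in zip(gold_labels, pred[criterion]):
            if g and p:
                tp += 1
            elif p and not g:
                fp += 1
            elif g and not p:
                fn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Criterion names below are invented for the example.
gold = {"criterion_a": [True, False], "criterion_b": [True, True]}
pred = {"criterion_a": [True, True], "criterion_b": [False, True]}
f1 = micro_f1(gold, pred)
```

Because counts are pooled before averaging, criteria with many positive patients weigh more heavily than rare ones; a macro average would instead score each criterion separately and then average.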


Subjects
Clinical Trials as Topic/methods , Data Mining/methods , Machine Learning , Natural Language Processing , Patient Selection , Datasets as Topic , Humans
10.
Stud Health Technol Inform ; 264: 388-392, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437951

ABSTRACT

Prescription information and adverse drug reactions (ADR) are two components of detailed medication instructions that can benefit many aspects of clinical research. Automatic extraction of this information from free-text narratives via Information Extraction (IE) can open it up to downstream uses. IE is commonly tackled by supervised Natural Language Processing (NLP) systems which rely on annotated training data. However, training data generation is manual, time-consuming, and labor-intensive. It is desirable to develop automatic methods for augmenting manually labeled data. We propose pseudo-data generation as one such automatic method. Pseudo-data are synthetic data generated by combining elements of existing labeled data. We propose and evaluate two sets of pseudo-data generation methods: knowledge-driven methods based on gazetteers and data-driven methods based on deep learning. We use the resulting pseudo-data to improve medication and ADR extraction. Data-driven pseudo-data are suitable for concept categories with high semantic regularities and short textual spans. Knowledge-driven pseudo-data are effective for concept categories with longer textual spans, assuming the knowledge base offers good coverage of these concepts. Combining the knowledge- and data-driven pseudo-data achieves significant performance improvement on medication names and ADRs over baselines limited to the use of available labeled data.
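
One way to picture knowledge-driven pseudo-data is gazetteer substitution: take a labeled sentence and swap each annotated span for another entry of the same category. The sketch below is only an assumed illustration of that idea, not the method evaluated in the paper; the BIO tag names, the gazetteer contents, and the function name are all hypothetical.

```python
import random


def make_pseudo_examples(tokens, tags, gazetteer, n=3, seed=0):
    """Create n synthetic sentences by replacing labeled spans.

    tokens, tags: parallel lists using BIO tags such as "B-DRUG"/"I-DRUG".
    gazetteer: dict mapping a category to a list of replacement phrases.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        new_tokens, new_tags = [], []
        i = 0
        while i < len(tokens):
            tag = tags[i]
            if tag.startswith("B-") and tag[2:] in gazetteer:
                category = tag[2:]
                # Find the end of the original span so it can be skipped.
                j = i + 1
                while j < len(tokens) and tags[j] == "I-" + category:
                    j += 1
                replacement = rng.choice(gazetteer[category]).split()
                new_tokens.extend(replacement)
                new_tags.append("B-" + category)
                new_tags.extend(["I-" + category] * (len(replacement) - 1))
                i = j
            else:
                # Tokens outside known categories are copied unchanged.
                new_tokens.append(tokens[i])
                new_tags.append(tag)
                i += 1
        examples.append((new_tokens, new_tags))
    return examples


tokens = ["Start", "aspirin", "81", "mg", "daily"]
tags = ["O", "B-DRUG", "B-DOSE", "I-DOSE", "O"]
gazetteer = {"DRUG": ["metoprolol tartrate", "lisinopril"]}
pseudo = make_pseudo_examples(tokens, tags, gazetteer)
```

Substitution of this kind only helps when the gazetteer covers the category well, which matches the caveat about knowledge base coverage in the abstract.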


Subjects
Drug-Related Side Effects and Adverse Reactions , Natural Language Processing , Drug Prescriptions , Information Storage and Retrieval , Knowledge Bases , Semantics
11.
Yearb Med Inform ; 27(1): 184-192, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30157522

ABSTRACT

OBJECTIVES: To review the latest scientific challenges organized in clinical Natural Language Processing (NLP) by highlighting the tasks, the most effective methodologies used, the data, and the sharing strategies. METHODS: We harvested the literature by using Google Scholar and PubMed Central to retrieve all shared tasks organized since 2015 on clinical NLP problems on English data. RESULTS: We surveyed 17 shared tasks. We grouped the data into four types (synthetic, drug labels, social data, and clinical data) which are correlated with size and sensitivity. We found named entity recognition and classification to be the most common tasks. Most of the methods used to tackle the shared tasks have been data-driven. There is homogeneity in the methods used to tackle the named entity recognition tasks, while more diverse solutions are investigated for relation extraction, multi-class classification, and information retrieval problems. CONCLUSIONS: There is a clear trend in using data-driven methods to tackle problems in clinical NLP. The availability of more and varied data from different institutions will undoubtedly lead to bigger advances in the field, for the benefit of healthcare as a whole.


Subjects
Electronic Health Records , Natural Language Processing , Adverse Drug Reaction Reporting Systems , Confidentiality , Drug Labeling , Social Media
12.
AMIA Annu Symp Proc ; 2018: 1534-1543, 2018.
Article in English | MEDLINE | ID: mdl-30815199

ABSTRACT

Prescription information is an important component of electronic health records (EHRs). This information contains detailed medication instructions that are crucial for patients' well-being and is often detailed in the narrative portions of EHRs. As a result, narratives of EHRs need to be processed with natural language processing (NLP) methods that can extract medication and prescription information from free text. However, automatic methods for medication and prescription extraction from narratives face two major challenges: (1) dictionaries can fall short even when identifying well-defined and syntactically consistent categories of medication entities, (2) some categories of medication entities are sparse, and at the same time lexically (and syntactically) diverse. In this paper, we describe FABLE, a system for automatically extracting prescription information from discharge summaries. FABLE utilizes unannotated data to enhance annotated training data: it performs semi-supervised extraction of medication information using pseudo-labels with Conditional Random Fields (CRFs) to improve its understanding of incomplete, sparse, and diverse medication entities. When evaluated against the official benchmark set from the 2009 i2b2 Shared Task and Workshop on Medication Extraction, FABLE achieves a horizontal phrase-level F1-measure of 0.878, giving state-of-the-art performance and significantly improving on nearly all entity categories.


Subjects
Drug Prescriptions , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Datasets as Topic , Humans , Patient Discharge Summaries , Supervised Machine Learning
13.
PLoS One ; 11(6): e0157989, 2016.
Article in English | MEDLINE | ID: mdl-27331905

ABSTRACT

Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever-expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being mentioned only once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
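
The imbalance figures quoted in this abstract (share of usage held by the top 5% of names, share of names mentioned once) are simple to compute from a list of extracted mentions. The sketch below shows one assumed way to do so on toy data; the function name and the rounding rule for the size of the top group are illustrative choices, not the authors' procedure.

```python
from collections import Counter


def usage_concentration(mentions, top_fraction=0.05):
    """Summarize how unevenly mentions are spread across resource names.

    mentions: non-empty list of resource names, one entry per mention.
    Returns (number of top names, their share of all mentions,
             share of names mentioned exactly once).
    """
    counts = Counter(mentions)
    ranked = sorted(counts.values(), reverse=True)
    # Keep at least one name in the top group for very small inputs.
    top_n = max(1, round(len(ranked) * top_fraction))
    top_share = sum(ranked[:top_n]) / sum(ranked)
    singleton_share = sum(1 for c in ranked if c == 1) / len(ranked)
    return top_n, top_share, singleton_share


# Toy data: four names, ten mentions in total.
mentions = ["BLAST"] * 6 + ["R"] * 2 + ["ToolX", "ToolY"]
top_n, top_share, singleton_share = usage_concentration(mentions, top_fraction=0.25)
```

On the toy input, the single most-mentioned name holds 60% of mentions and half of the names appear only once, the same kind of long-tailed pattern the abstract describes at corpus scale.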


Subjects
Computational Biology/methods , Data Mining , Genetic Databases , PubMed , Software , Biology , Cluster Analysis , Medicine , Theoretical Models , Periodicals as Topic , Reproducibility of Results , Time Factors
14.
J Am Med Inform Assoc ; 20(5): 859-66, 2013.
Article in English | MEDLINE | ID: mdl-23605114

ABSTRACT

OBJECTIVE: Identification of clinical events (eg, problems, tests, treatments) and associated temporal expressions (eg, dates and times) are key tasks in extracting and managing data from electronic health records. As part of the i2b2 2012 Natural Language Processing for Clinical Data challenge, we developed and evaluated a system to automatically extract temporal expressions and events from clinical narratives. The extracted temporal expressions were additionally normalized by assigning type, value, and modifier. MATERIALS AND METHODS: The system combines rule-based and machine learning approaches that rely on morphological, lexical, syntactic, semantic, and domain-specific features. Rule-based components were designed to handle the recognition and normalization of temporal expressions, while conditional random fields models were trained for event and temporal recognition. RESULTS: The system achieved micro F scores of 90% for the extraction of temporal expressions and 87% for clinical event extraction. The normalization component for temporal expressions achieved accuracies of 84.73% (expression's type), 70.44% (value), and 82.75% (modifier). DISCUSSION: Compared to the initial agreement between human annotators (87-89%), the system provided comparable performance for both event and temporal expression mining. While (lenient) identification of such mentions is achievable, finding the exact boundaries proved challenging. CONCLUSIONS: The system provides a state-of-the-art method that can be used to support automated identification of mentions of clinical events and temporal expressions in narratives either to support the manual review process or as a part of a large-scale processing of electronic health databases.
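
To make the rule-based side of such a system concrete, the sketch below recognizes one kind of temporal expression (numeric dates) with a regular expression and normalizes each match to a type and an ISO-style value. It is a toy example under stated assumptions, not the system described above: the single pattern, the month/day/year reading of the date, and the output field names are all illustrative.

```python
import re

# Matches dates such as 3/7/2012 or 03/10/2012 (assumed month/day/year order).
DATE_PATTERN = re.compile(
    r"\b(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})\b"
)


def extract_dates(text):
    """Return recognized date expressions with a normalized value."""
    results = []
    for m in DATE_PATTERN.finditer(text):
        value = f"{m['year']}-{int(m['month']):02d}-{int(m['day']):02d}"
        results.append(
            {
                "span": m.span(),     # character offsets in the input text
                "text": m.group(0),   # surface form as written
                "type": "DATE",       # illustrative type label
                "value": value,       # normalized YYYY-MM-DD value
            }
        )
    return results


found = extract_dates("Admitted on 3/7/2012 and discharged 03/10/2012.")
```

A pattern like this gives exact boundaries for well-formed dates, but relative expressions ("two days after admission") need context that rules of this kind do not capture, which is where learned models typically take over.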


Subjects
Artificial Intelligence , Electronic Health Records , Information Storage and Retrieval/methods , Humans , Natural Language Processing , Time , Translational Biomedical Research