1.
J Biomed Inform; 144: 104432, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37356640

ABSTRACT

BACKGROUND: An accurate medication history, which is foundational to quality medical care, requires an understanding of the medication change events documented in clinical notes. However, extracting medication changes without the necessary clinical context is insufficient for real-world applications. METHODS: To address this need, Track 1 of the 2022 National NLP Clinical Challenges focused on extracting the context of medication changes documented in clinical notes, using the Contextualized Medication Event Dataset. Track 1 consisted of 3 subtasks: extracting medication mentions from clinical notes (NER), determining whether a medication change is being discussed (Event), and determining the action, negation, temporality, certainty, and actor for any change events (Context). Teams could take part in any one or more of the subtasks. RESULTS: A total of 32 teams with participants from 19 countries submitted 211 systems across all subtasks. Most teams formulated NER as a token classification task and Event and Context as multi-class classification tasks, using transformer-based large language models. Overall, NER performance was high across submitted systems. However, performance on Event and Context was much lower, often because of change events stated indirectly with no clear action verb, events whose interpretation required textual clues farther from the mention, and medication mentions with multiple change events. CONCLUSIONS: This shared task showed that while NLP research on medication extraction is relatively mature, understanding the contextual information surrounding medication events in clinical notes is still an open problem that requires further research to support real-world clinical applications.


Subjects
Electronic Health Records; Natural Language Processing; Humans; Language
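Record 1 notes that most teams framed NER as token classification. Under that framing a model assigns one label per token, and those labels must then be decoded back into medication mentions. The sketch below assumes a BIO tagging scheme and a hypothetical `Drug` label; the abstract does not say which scheme or label set the teams used, and `bio_to_spans` is an illustrative helper, not part of any challenge toolkit.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str, str]]:
    """Decode BIO tags into (start, end_exclusive, label, text) spans.

    Assumes a BIO scheme with tags such as "B-Drug", "I-Drug", and "O" (hypothetical labels).
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # a sentinel "O" flushes a span that ends the sentence
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            if start is not None:
                spans.append((start, i, label, " ".join(tokens[start:i])))
                start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray I- tag with no open span of the same label is treated as a span start.
                start, label = i, tag[2:]
    return spans


tokens = ["Increase", "insulin", "glargine", "to", "20", "units"]
tags = ["O", "B-Drug", "I-Drug", "O", "O", "O"]
print(bio_to_spans(tokens, tags))  # [(1, 3, 'Drug', 'insulin glargine')]
```

Treating a stray `I-` tag as the start of a new span is one common lenient convention; a stricter decoder could discard such tags instead.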
2.
J Biomed Inform; 142: 104346, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37061012

ABSTRACT

Daily progress notes are a common note type in the electronic health record (EHR), in which healthcare providers document the patient's daily progress and treatment plans. The EHR is designed to document all the care provided to patients, but it also enables note bloat: extraneous information that distracts from the diagnoses and treatment plans. Natural language processing (NLP) in the EHR is a growing field, with most methods focused on information extraction; few tasks use NLP methods for downstream diagnostic decision support. We introduced the 2022 National NLP Clinical Challenge (N2C2) Track 3: Progress Note Understanding - Assessment and Plan Reasoning as one step towards a new suite of tasks. The Assessment and Plan Reasoning task focuses on the most critical components of progress notes, the Assessment and Plan subsections, where health problems and diagnoses are contained. The goal of the task was to develop and evaluate NLP systems that automatically predict the causal relation between the overall status of the patient, described in the Assessment section, and each component of the Plan section, which contains the diagnoses and treatment plans. Identifying and prioritizing diagnoses in this way is a first step in diagnostic decision support, helping to find the most relevant information in long documents such as daily progress notes. We present the results of the 2022 N2C2 Track 3 and describe the data, evaluation, participation, and system performance.


Subjects
Electronic Health Records; Information Storage and Retrieval; Humans; Natural Language Processing; Health Personnel
3.
J Biomed Inform; 139: 104302, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36754129

ABSTRACT

An accurate and detailed account of patient medications, including medication changes within the patient timeline, is essential for healthcare providers to provide appropriate patient care. Healthcare providers or the patients themselves may initiate changes to patient medication. Medication changes take many forms, including prescribed medication and associated dosage modification. These changes provide information about the overall health of the patient and the rationale that led to the current care. Future care can then build on the resulting state of the patient. This work explores the automatic extraction of medication change information from free-text clinical notes. The Contextual Medication Event Dataset (CMED) is a corpus of clinical notes with annotations that characterize medication changes through multiple change-related attributes, including the type of change (start, stop, increase, etc.), initiator of the change, temporality, change likelihood, and negation. Using CMED, we identify medication mentions in clinical text and propose three novel high-performing BERT-based systems that resolve the annotated medication change characteristics. We demonstrate that our proposed systems improve medication change classification performance over the initial work exploring CMED.


Subjects
Language; Natural Language Processing; Humans; Narration
4.
J Biomed Inform; 110: 103552, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32890727

ABSTRACT

Adverse drug events (ADEs) are unintended incidents that involve taking a medication. ADEs pose significant health and financial problems worldwide. Information about ADEs can inform health care and improve patient safety. However, much of this information is buried in narrative text and must be extracted with Natural Language Processing techniques before computerized methods can use it. On drug labels, ADEs appear in different sections, such as descriptions of the drug's active components or, more prominently, descriptions of studied side effects. Extracting them automatically could be useful for triaging and processing drug reports. In this paper, we present three base methods: a Conditional Random Field (CRF), a bidirectional Long Short-Term Memory unit with a CRF layer (biLSTM+CRF), and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. We also present several ensembles of the CRF and biLSTM+CRF methods for extracting ADEs and their Reason from FDA drug labels. We show that all three methods perform well on our task, and that combining the models through different ensemble methods can improve results, increasing recall for the majority class and improving precision for all other classes. We also show the potential of framing ADE extraction from drug labels as a multi-class classification task on the Reason, or type, of ADE.


Subjects
Deep Learning; Drug-Related Side Effects and Adverse Reactions; Pharmaceutical Preparations; Drug Labeling; Humans; Natural Language Processing
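Record 4 reports that ensembling the CRF and biLSTM+CRF models raised recall for the majority class and precision for the other classes, but the abstract does not name the ensemble methods. One simple way to trade recall against precision, shown here as an assumption rather than the authors' method, is to vote on predicted spans: a threshold of one vote yields the union of all models' spans (favoring recall), while requiring every model's vote yields their intersection (favoring precision). The `ADE` label and the span offsets are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label); offsets and labels are hypothetical


def vote_spans(predictions: Iterable[Set[Span]], min_votes: int) -> Set[Span]:
    """Keep spans that at least `min_votes` models predicted with identical offsets and label."""
    counts: Counter = Counter()
    for model_spans in predictions:
        counts.update(model_spans)  # a set gives each model at most one vote per span
    return {span for span, votes in counts.items() if votes >= min_votes}


crf = {(0, 2, "ADE"), (5, 7, "ADE")}
lstm_crf = {(0, 2, "ADE"), (9, 10, "ADE")}
print(sorted(vote_spans([crf, lstm_crf], min_votes=1)))  # union: [(0, 2, 'ADE'), (5, 7, 'ADE'), (9, 10, 'ADE')]
print(vote_spans([crf, lstm_crf], min_votes=2))  # intersection: {(0, 2, 'ADE')}
```

With more than two models, intermediate thresholds (for example a simple majority) sit between these two extremes.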
5.
Brief Bioinform; 18(1): 160-178, 2017 Jan.
Article in English | MEDLINE | ID: mdl-26851224

ABSTRACT

Research on extracting biomedical relations has received growing attention recently, with numerous biological and clinical applications including those in pharmacogenomics, clinical trial screening and adverse drug reaction detection. The ability to accurately capture both semantic and syntactic structures in text expressing these relations becomes increasingly critical to enable deep understanding of scientific papers and clinical narratives. Shared task challenges have been organized by both bioinformatics and clinical informatics communities to assess and advance the state-of-the-art research. Significant progress has been made in algorithm development and resource construction. In particular, graph-based approaches bridge semantics and syntax, often achieving the best performance in shared tasks. However, a number of problems at the frontiers of biomedical relation extraction continue to pose interesting challenges and present opportunities for great improvement and fruitful research. In this article, we place biomedical relation extraction against the backdrop of its versatile applications, present a gentle introduction to its general pipeline and shared resources, review the current state-of-the-art in methodology advancement, discuss limitations and point out several promising future directions.


Subjects
Semantics; Algorithms; Computational Biology; Data Mining; Humans
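Record 5 credits graph-based approaches with bridging semantics and syntax. A widely used graph feature in relation extraction, included here only as an illustration and not taken from the review itself, is the shortest path between two entity mentions in a dependency parse. The sketch assumes the parse is already available as (head, dependent) token-index pairs; the example sentence and its edges are hand-written and hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def shortest_dependency_path(edges: List[Tuple[int, int]], source: int, target: int) -> Optional[List[int]]:
    """Breadth-first search over an undirected view of (head, dependent) edges."""
    graph: Dict[int, List[int]] = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    previous: Dict[int, Optional[int]] = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back from target to source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # the two tokens are not connected


# Hypothetical parse of "Aspirin strongly inhibits COX-1" (token indices 0..3).
edges = [(2, 0), (2, 1), (2, 3)]  # inhibits -> Aspirin, strongly, COX-1
print(shortest_dependency_path(edges, source=0, target=3))  # [0, 2, 3]
```

The tokens on the returned path (here the verb "inhibits") are what a relation classifier would typically use as features, since they tend to carry the predicate linking the two entities.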
7.
J Biomed Inform; 75S: S4-S18, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28614702

ABSTRACT

The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records and was divided into two sub-tracks. Track 1.A was a "sight unseen" task, in which nine teams ran existing de-identification systems, without any modifications or training, on 600 new records to gauge how well the systems generalize to new data. The best-performing system for this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, in which 15 teams had two months to train their systems on the new data and then tested them on an unannotated test set. The best-performing system from this track scored an F1 of 0.914. The Track 1.A scores show that unmodified existing systems do not generalize well to new data without the benefit of training data. The Track 1.B scores are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.


Subjects
Data Anonymization; Medical Records; Mental Disorders; Data Mining; Electronic Health Records; Humans
8.
J Biomed Inform; 72: 23-32, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28663072

ABSTRACT

Coronary Artery Disease (CAD) is not only the most common form of heart disease, but also the leading cause of death in both men and women (Coronary Artery Disease: MedlinePlus, 2015). We present a system that is able to automatically predict whether patients develop coronary artery disease based on their narrative medical histories, i.e., clinical free text. Although the free text in medical records has been used in several studies for identifying risk factors of coronary artery disease, to the best of our knowledge our work marks the first attempt at automatically predicting development of CAD. We tackle this task on a small corpus of diabetic patients. The size of this corpus makes it important to limit the number of features in order to avoid overfitting. We propose an ontology-guided approach to feature extraction, and compare it with two classic feature selection techniques. Our system achieves state-of-the-art performance of 77.4% F1 score.


Subjects
Coronary Artery Disease; Narration; Natural Language Processing; Vocabulary, Controlled; Female; Forecasting; Humans; Male; Prognosis
9.
J Biomed Inform; 72: 60-66, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28684255

ABSTRACT

In medical practice, doctors detail patients' care plans in discharge summaries written as unstructured free text, which, among other things, contain medication names and prescription information. Extracting prescriptions from discharge summaries is challenging because of the way these documents are written. Handwritten rules and medical gazetteers have proven useful for this purpose but are limited in performance, scalability, and generalizability. We instead present a machine learning approach that extracts medication names and prescription information and organizes them into individual entries. Our approach uses word embeddings and tackles the task in two extraction steps, both treated as sequence labeling problems. When evaluated on the official benchmark set of the 2009 i2b2 Challenge, the proposed approach achieves a horizontal phrase-level F1-measure of 0.864, which to the best of our knowledge improves on the current state of the art.


Subjects
Information Storage and Retrieval; Machine Learning; Natural Language Processing; Prescriptions; Humans; Patient Discharge Summaries
10.
J Biomed Inform; 75S: S62-S70, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28455151

ABSTRACT

The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records were collected, de-identified, annotated, and shared with the scientific community. One hundred and ten researchers in twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score above 0.80. The top-performing system used an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task proved generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. These cases proved very challenging for most of the systems, and further research is required before the task can be considered solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to symptom severity classification.


Subjects
Electronic Health Records; Mental Disorders; Nervous System Diseases; Humans; Natural Language Processing; Severity of Illness Index
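Record 10 scores systems with an inverse normalized macro-averaged mean absolute error. The abstract does not give the formula, so the sketch below rests on stated assumptions: absolute errors are averaged within each gold severity class, those class means are averaged, the result is divided by the largest possible error on an assumed 0 to 3 ordinal scale (for example absent, mild, moderate, severe), and the score is inverted so that 1.0 is perfect. The official scorer may normalize differently.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def inverse_normalized_macro_mae(gold: Sequence[int], pred: Sequence[int], max_label: int = 3) -> float:
    """Score in [0, 1]; 1.0 means every prediction matches its gold severity (assumed 0..max_label scale)."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    # Group absolute errors by gold class so that rare classes weigh as much as common ones.
    errors_by_class: Dict[int, List[int]] = defaultdict(list)
    for g, p in zip(gold, pred):
        errors_by_class[g].append(abs(g - p))
    macro_mae = sum(sum(errs) / len(errs) for errs in errors_by_class.values()) / len(errors_by_class)
    # Assumed normalization: divide by the largest possible error, then invert.
    return 1.0 - macro_mae / max_label


print(inverse_normalized_macro_mae([0, 0, 3], [3, 0, 3]))  # 0.75
```

In the example, class 0 has a mean error of 1.5 and class 3 has none, so the macro-averaged error is 0.75 and the score is 1 - 0.75/3. Macro-averaging means a system cannot score well simply by predicting the most frequent severity for every record.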
12.
J Biomed Inform; 58 Suppl: S20-S29, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26319540

ABSTRACT

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information was automatically replaced with realistic surrogates, which were then reviewed and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.


Subjects
Computer Security; Data Mining/methods; Electronic Health Records/organization & administration; Narration; Natural Language Processing; Pattern Recognition, Automated/methods; Cohort Studies; Confidentiality; Documentation/methods; Vocabulary, Controlled
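Record 12 states that annotated PHI was automatically replaced with realistic surrogates before manual review. A minimal sketch of that automatic step, assuming non-overlapping character-offset annotations and a hypothetical fixed surrogate per PHI category, replaces spans from the end of the text backwards so that earlier offsets remain valid after each substitution. A production pipeline would also need surrogates that stay consistent across a patient's longitudinal records, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple


def replace_with_surrogates(text: str, phi_spans: List[Tuple[int, int, str]], surrogates: Dict[str, str]) -> str:
    """Substitute each (start, end_exclusive, category) PHI span with the surrogate for its category.

    Assumes spans do not overlap; category names and surrogates are hypothetical.
    """
    # Work right to left: replacing a later span never shifts the offsets of earlier spans.
    for start, end, category in sorted(phi_spans, reverse=True):
        text = text[:start] + surrogates[category] + text[end:]
    return text


note = "John Doe visited on 2091-04-12."
spans = [(0, 8, "NAME"), (20, 30, "DATE")]
print(replace_with_surrogates(note, spans, {"NAME": "Jane Roe", "DATE": "2088-07-03"}))
# Jane Roe visited on 2088-07-03.
```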
13.
J Biomed Inform; 58 Suppl: S78-S91, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26004790

ABSTRACT

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, coronary artery disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 296 patients for risk factors and the times they were present. We designed the annotation task to balance annotation load and time against quality, so as to generate a gold standard corpus that can benefit a clinically relevant task. We applied light annotation procedures and determined the gold standard using majority voting. On average, the agreement of annotators with the gold standard was above 0.95, indicating high reliability. The resulting document-level annotations for each record in each longitudinal EMR in this corpus can support studies of the progression of heart disease risk factors in the included patients over time. These annotations were used in the Risk Factor track of the 2014 i2b2/UTHealth shared task. Participating systems achieved a mean micro-averaged F1 measure of 0.815 and a maximum F1 measure of 0.928 for identifying these risk factors in patient records.


Subjects
Coronary Artery Disease/epidemiology; Diabetes Complications/epidemiology; Documentation/methods; Electronic Health Records/organization & administration; Narration; Natural Language Processing; Aged; Boston/epidemiology; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Coronary Artery Disease/diagnosis; Data Mining/methods; Diabetes Complications/diagnosis; Female; Humans; Incidence; Longitudinal Studies; Male; Middle Aged; New York/epidemiology; Pattern Recognition, Automated/methods; Risk Assessment/methods; Vocabulary, Controlled
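Record 13 determined the gold standard by majority voting over light annotations. A minimal sketch of that step for a single document-level decision, assuming each annotator supplies one label such as the hypothetical `present` or `absent`, accepts a label only when it wins a strict majority and otherwise returns `None`, leaving the item for adjudication. The abstract does not say how ties were handled.

```python
from collections import Counter
from typing import List, Optional


def majority_label(annotations: List[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, or None if no label has one."""
    if not annotations:
        return None
    label, votes = Counter(annotations).most_common(1)[0]
    # Requiring a strict majority avoids silently breaking ties in favor of an arbitrary label.
    return label if votes * 2 > len(annotations) else None


print(majority_label(["present", "present", "absent"]))  # present
print(majority_label(["present", "absent"]))  # None
```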
14.
J Biomed Inform; 58 Suppl: S92-S102, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26241355

ABSTRACT

Automated phenotype identification plays a critical role in cohort selection and bioinformatics data mining. Natural Language Processing (NLP)-informed classification techniques can robustly identify phenotypes in unstructured medical notes. In this paper, we systematically assess the effect of naive, lexically normalized, and semantic feature spaces on classifier performance for obesity, atherosclerotic cardiovascular disease (CAD), hyperlipidemia, hypertension, and diabetes. We train support vector machines (SVMs) using individual feature spaces, as well as combinations of these feature spaces, on two small training corpora (730 and 790 documents) and on a combined training corpus (1520 documents). We assess the importance of feature spaces and training data size for SVM model performance. We show that including semantically informed features does not yield a statistically significant improvement in performance for these models. Adding training data has weak effects of mixed statistical significance across disease classes, suggesting that larger corpora are not necessary to achieve relatively high performance with these models.


Subjects
Cardiovascular Diseases/diagnosis; Diabetes Mellitus/diagnosis; Diagnosis, Computer-Assisted/methods; Electronic Health Records/statistics & numerical data; Natural Language Processing; Obesity/diagnosis; Cardiovascular Diseases/epidemiology; Data Mining/methods; Decision Support Systems, Clinical/organization & administration; Humans; New York; Pattern Recognition, Automated/methods; Phenotype; Reproducibility of Results; Sensitivity and Specificity; Support Vector Machine
15.
J Biomed Inform; 58 Suppl: S11-S19, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26225918

ABSTRACT

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track, focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details that, while benign on their own in separate records, can identify patients when combined across longitudinal records. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than those covered by the Health Insurance Portability and Accountability Act (HIPAA), which was the focus of the de-identification shared task organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on its best-performing system output. Three of the 10 systems achieved F1 scores over .90, and seven of the top 10 scored over .75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.


Subjects
Computer Security; Data Mining/methods; Electronic Health Records/organization & administration; Narration; Natural Language Processing; Pattern Recognition, Automated/methods; Cohort Studies; Confidentiality; Longitudinal Studies; Vocabulary, Controlled
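Records 12 and 15 report F1 scores for de-identification, and Record 12 specifies entity-based micro-averaged evaluation. A minimal sketch, assuming an entity counts as correct only when its document, start and end offsets, and PHI type all match a gold entity exactly, pools true positives across all documents and types before computing precision, recall, and F1. The token-based scoring also mentioned in Record 12, and any relaxed-matching variants, are not covered here; the document IDs and PHI types are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[str, int, int, str]  # (document_id, start, end_exclusive, phi_type)


def micro_prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level micro-averaged precision, recall, and F1 under exact matching."""
    tp = len(gold & pred)  # pooled over every document and PHI type
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {("d1", 0, 8, "NAME"), ("d1", 20, 30, "DATE"), ("d2", 5, 9, "AGE")}
pred = {("d1", 0, 8, "NAME"), ("d1", 20, 30, "DATE"), ("d2", 5, 10, "AGE")}  # AGE span is off by one
print(micro_prf(gold, pred))  # precision, recall, and F1 are each 2/3 here
```

Under exact matching, a boundary error such as the off-by-one AGE span counts as both a false positive and a false negative, which is one reason exact-match scores run lower than relaxed ones.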
16.
J Biomed Inform; 58 Suppl: S6-S10, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26433122

ABSTRACT

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in the first record but develop it by the last record; and patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for this corpus and provides an overview of novel research uses for it. This corpus is the only annotated corpus of longitudinal clinical narratives currently available to the general research community.


Subjects
Coronary Artery Disease/epidemiology; Data Mining/methods; Diabetes Complications/epidemiology; Electronic Health Records/organization & administration; Narration; Natural Language Processing; Aged; Boston/epidemiology; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Coronary Artery Disease/diagnosis; Diabetes Complications/diagnosis; Female; Humans; Incidence; Male; Middle Aged; Risk Assessment/methods; Vocabulary, Controlled
17.
J Biomed Inform; 58 Suppl: S67-S77, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26210362

ABSTRACT

The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track, and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.


Subjects
Coronary Artery Disease/epidemiology; Data Mining/methods; Diabetes Complications/epidemiology; Electronic Health Records/organization & administration; Narration; Natural Language Processing; Aged; Boston/epidemiology; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Coronary Artery Disease/diagnosis; Diabetes Complications/diagnosis; Female; Humans; Incidence; Longitudinal Studies; Male; Middle Aged; Pattern Recognition, Automated/methods; Risk Assessment/methods; Vocabulary, Controlled
18.
J Biomed Inform; 58 Suppl: S189-S196, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26210361

ABSTRACT

OBJECTIVE: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, to develop a better understanding of the adoption issues that might be associated with state-of-the-art clinical NLP systems. This paper reports the ease-of-adoption assessment methods we developed for this track and the results of evaluating five clinical NLP system submissions. MATERIALS AND METHODS: A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along three dimensions: communication effectiveness (i.e., how effectively a system communicates its designed objectives to its intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. RESULTS: Overall, the ease-of-adoption ratings that the five systems received are unsatisfactory. Some of the systems proved rather difficult to install, and some failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output were -0.35 and -0.53, respectively, indicating that this group of users generally found the systems extremely difficult to work with. While the ratings provided by the expert evaluators were higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that the experts also struggled considerably. DISCUSSION: The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems has considerable room for improvement. Remedies suggested by the evaluators included (1) more detailed, operating-system-specific usage instructions; (2) more pertinent onscreen feedback for easier diagnosis of problems; (3) screen walk-throughs in the usage instructions, so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging required prerequisites within software distributions, so that prospective adopters do not have to obtain each third-party component on their own.


Subjects
Attitude to Computers; Data Mining/statistics & numerical data; Electronic Health Records/statistics & numerical data; Natural Language Processing; Pattern Recognition, Automated/methods; Software; Data Mining/methods; Humans; Middle Aged; User-Computer Interface