1.
J Biomed Inform ; 75S: S4-S18, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28614702

ABSTRACT

The 2016 CEGS N-GRID shared tasks for clinical records contained three tracks. Track 1 focused on de-identification of a new corpus of 1000 psychiatric intake records. This track tackled de-identification in two sub-tracks. Track 1.A was a "sight unseen" task, in which nine teams ran existing de-identification systems, without any modification or training, on 600 new records in order to gauge how well systems generalize to new data. The best-performing system in this track scored an F1 of 0.799. Track 1.B was a traditional Natural Language Processing (NLP) shared task on de-identification, in which 15 teams had two months to train their systems on the new data and then tested them on an unannotated test set. The best-performing system in this track scored an F1 of 0.914. The Track 1.A scores show that unmodified existing systems do not generalize well to new data without the benefit of training data. The Track 1.B scores are slightly lower than those of the 2014 de-identification shared task (which was almost identical to 2016 Track 1.B), indicating that these new psychiatric records pose a more difficult challenge to NLP systems. Overall, de-identification is still not a solved problem, though it is important to the future of clinical NLP.


Subjects
Data Anonymization , Medical Records , Mental Disorders , Data Mining , Electronic Health Records , Humans
2.
J Biomed Inform ; 75S: S62-S70, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28455151

ABSTRACT

The second track of the CEGS N-GRID 2016 natural language processing shared tasks focused on predicting symptom severity from neuropsychiatric clinical records. For the first time, initial psychiatric evaluation records have been collected, de-identified, annotated and shared with the scientific community. One hundred ten researchers organized in twenty-four teams participated in this track and submitted sixty-five system runs for evaluation. The top ten teams each achieved an inverse normalized macro-averaged mean absolute error score over 0.80. The top-performing system employed an ensemble of six different machine learning-based classifiers to achieve a score of 0.86. The task proved to be generally easy, with the exception of two specific classes of records: records with very few but crucial positive valence signals, and records describing patients predominantly affected by negative rather than positive valence. Those cases proved to be very challenging for most of the systems. Further research is required before the task can be considered solved. Overall, the results of this track demonstrate the effectiveness of data-driven approaches to symptom severity classification.


Subjects
Electronic Health Records , Mental Disorders , Nervous System Diseases , Humans , Natural Language Processing , Severity of Illness Index
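The Track 2 abstract above ranks systems by an inverse normalized macro-averaged mean absolute error (MAE). The Python sketch below shows one way such a score can be computed; it is not the organizers' scoring script. It assumes severity labels are integers on an ordinal 0-3 scale (for example absent, mild, moderate, severe) and assumes the normalization is 1 - MAE^M / (number of classes - 1), so that a perfect system scores 1.0. The function names are hypothetical.

```python
from collections import defaultdict


def macro_mae(gold, pred):
    """Macro-averaged MAE: mean absolute error per gold class, averaged over classes."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    errors_by_class = defaultdict(list)
    for g, p in zip(gold, pred):
        errors_by_class[g].append(abs(g - p))
    # Each gold class contributes equally, however many records it has.
    per_class = [sum(errs) / len(errs) for errs in errors_by_class.values()]
    return sum(per_class) / len(per_class)


def inverse_normalized_macro_mae(gold, pred, num_classes=4):
    """ASSUMED normalization: 1 - MAE^M / (num_classes - 1), so 1.0 is a perfect score."""
    max_error = num_classes - 1  # largest possible distance on a 0..num_classes-1 scale
    return 1.0 - macro_mae(gold, pred) / max_error


# Illustrative labels on an assumed 0-3 severity scale.
gold = [0, 1, 2, 3, 3]
pred = [0, 2, 2, 3, 1]
print(round(inverse_normalized_macro_mae(gold, pred), 4))  # prints 0.8333
```

Macro-averaging computes the MAE separately for each gold class and then averages, so a system that always predicts the most frequent severity level is still penalized on the rarer levels.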
3.
J Biomed Inform ; 58 Suppl: S78-S91, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26004790

ABSTRACT

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on identifying risk factors for heart disease (specifically, Coronary Artery Disease) in clinical narratives. For this track, we used a "light" annotation paradigm to annotate a set of 1304 longitudinal medical records describing 296 patients for risk factors and the times they were present. We designed the annotation task for this track with the goal of balancing annotation load and time against quality, so as to generate a gold standard corpus that can benefit a clinically relevant task. We applied light annotation procedures and determined the gold standard using majority voting. On average, the agreement of annotators with the gold standard was above 0.95, indicating high reliability. The resulting document-level annotations for each record in each longitudinal EMR in this corpus provide information that can support studies of the progression of heart disease risk factors in the included patients over time. These annotations were used in the Risk Factor track of the 2014 i2b2/UTHealth shared task. Participating systems achieved a mean micro-averaged F1 measure of 0.815 and a maximum F1 measure of 0.928 for identifying these risk factors in patient records.


Subjects
Coronary Artery Disease/epidemiology , Diabetes Complications/epidemiology , Documentation/methods , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Aged , Boston/epidemiology , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Data Mining/methods , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , New York/epidemiology , Pattern Recognition, Automated/methods , Risk Assessment/methods , Vocabulary, Controlled
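The risk factor annotation abstract above determines the gold standard by majority voting and reports annotator agreement with that gold standard. A minimal Python sketch of both steps follows. It assumes each annotator's document-level output is a set of (record ID, risk factor label) pairs, and it assumes agreement is measured as F1 against the gold standard, since the abstract does not name the agreement measure. The record IDs and label strings are illustrative only.

```python
from collections import Counter


def majority_gold(annotations):
    """Keep a (record_id, label) pair if strictly more than half of the annotators marked it.

    Each annotator is a set, so a pair is counted at most once per annotator.
    """
    counts = Counter(pair for annotator in annotations for pair in annotator)
    threshold = len(annotations) / 2
    return {pair for pair, n in counts.items() if n > threshold}


def f1(predicted, gold):
    """Set-based F1 between one annotator's pairs and the gold standard (assumed measure)."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical document-level annotations from three annotators.
a1 = {("r1", "HYPERTENSION"), ("r1", "SMOKER"), ("r2", "OBESE")}
a2 = {("r1", "HYPERTENSION"), ("r2", "OBESE")}
a3 = {("r1", "HYPERTENSION"), ("r1", "SMOKER"), ("r2", "DIABETES")}

gold = majority_gold([a1, a2, a3])
for name, ann in [("a1", a1), ("a2", a2), ("a3", a3)]:
    print(name, round(f1(ann, gold), 3))
```

With three annotators, a pair needs at least two votes; the single "DIABETES" mark from the third annotator is therefore dropped, which lowers that annotator's agreement score.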
4.
J Biomed Inform ; 58 Suppl: S20-S29, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26319540

ABSTRACT

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated protected health information was automatically replaced with realistic surrogates, which were then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.


Subjects
Computer Security , Data Mining/methods , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Pattern Recognition, Automated/methods , Cohort Studies , Confidentiality , Documentation/methods , Vocabulary, Controlled
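The de-identification abstract above reports entity-based micro-averaged F-measures. The Python sketch below assumes strict matching (a system PHI mention counts as correct only when its document, character offsets, and PHI type all equal those of a gold mention) and pools true positives, false positives, and false negatives over all documents before computing precision and recall, which is what micro-averaging means here. The `Phi` tuple and the offsets in the example are hypothetical, and the official i2b2 evaluation script may apply different matching rules.

```python
from typing import Iterable, NamedTuple


class Phi(NamedTuple):
    """One PHI mention; the fields are assumptions for this sketch."""
    doc_id: str
    start: int     # character offset, inclusive
    end: int       # character offset, exclusive
    phi_type: str  # e.g. "NAME", "DATE"


def micro_f1(system: Iterable[Phi], gold: Iterable[Phi]) -> float:
    """Strict entity-based F1 with TP/FP/FN pooled across all documents."""
    sys_set, gold_set = set(system), set(gold)
    tp = len(sys_set & gold_set)
    fp = len(sys_set - gold_set)  # predicted mentions with no exact gold match
    fn = len(gold_set - sys_set)  # gold mentions the system missed or mis-bounded
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [
    Phi("d1", 0, 10, "NAME"),
    Phi("d1", 20, 30, "DATE"),
    Phi("d2", 5, 12, "HOSPITAL"),
    Phi("d2", 40, 45, "AGE"),
]
system = [
    Phi("d1", 0, 10, "NAME"),
    Phi("d1", 20, 28, "DATE"),  # span too short, so it is both an FP and an FN
    Phi("d2", 5, 12, "HOSPITAL"),
]
print(round(micro_f1(system, gold), 4))  # prints 0.5714
```

Because counts are pooled, documents with many PHI mentions weigh more than they would in a macro average, which computes F1 per document and then averages those scores.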
5.
J Biomed Inform ; 58 Suppl: S11-S19, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26225918

ABSTRACT

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track, which focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives draws particular attention to details that are benign on their own in separate records but that, in combination across longitudinal records, can lead to the identification of patients. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than those covered by the Health Insurance Portability and Accountability Act, which was the focus of the de-identification shared task organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on its best-performing system output. Three of the 10 systems achieved F1 scores over 0.90, and seven of the top 10 scored over 0.75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.


Subjects
Computer Security , Data Mining/methods , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Pattern Recognition, Automated/methods , Cohort Studies , Confidentiality , Longitudinal Studies , Vocabulary, Controlled
6.
J Biomed Inform ; 58 Suppl: S67-S77, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26210362

ABSTRACT

The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules, and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.


Subjects
Coronary Artery Disease/epidemiology , Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Aged , Boston/epidemiology , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Pattern Recognition, Automated/methods , Risk Assessment/methods , Vocabulary, Controlled
7.
J Biomed Inform ; 58 Suppl: S6-S10, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26433122

ABSTRACT

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in the first record but develop it by the last record; and patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for this corpus and provides an overview of novel research uses for this corpus. This corpus is the only annotated corpus of longitudinal clinical narratives currently available to the general research community.


Subjects
Coronary Artery Disease/epidemiology , Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Aged , Boston/epidemiology , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Diabetes Complications/diagnosis , Female , Humans , Incidence , Male , Middle Aged , Risk Assessment/methods , Vocabulary, Controlled
8.
J Biomed Inform ; 58 Suppl: S189-S196, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26210361

ABSTRACT

OBJECTIVE: In recognition of potential barriers that may inhibit the widespread adoption of biomedical software, the 2014 i2b2 Challenge introduced a special track, Track 3 - Software Usability Assessment, in order to develop a better understanding of the adoption issues that might be associated with state-of-the-art clinical NLP systems. This paper reports the ease-of-adoption assessment methods we developed for this track and the results of evaluating five clinical NLP system submissions. MATERIALS AND METHODS: A team of human evaluators performed a series of scripted adoptability test tasks with each of the participating systems. The evaluation team consisted of four "expert evaluators" with training in computer science and eight "end user evaluators" with mixed backgrounds in medicine, nursing, pharmacy, and health informatics. We assessed how easy it is to adopt the submitted systems along three dimensions: communication effectiveness (i.e., how effectively a system communicates its designed objectives to its intended audience), effort required to install, and effort required to use. We used a formal software usability testing tool, TURF, to record the evaluators' interactions with the systems and 'think-aloud' data revealing their thought processes when installing and using the systems and when resolving unexpected issues. RESULTS: Overall, the ease-of-adoption ratings that the five systems received are unsatisfactory. Installation of some of the systems proved to be rather difficult, and some systems failed to adequately communicate their designed objectives to intended adopters. Further, the average ratings provided by the end user evaluators on ease of use and ease of interpreting output are -0.35 and -0.53, respectively, indicating that this group of users generally deemed the systems extremely difficult to work with. While the ratings provided by the expert evaluators are higher, 0.6 and 0.45, respectively, these ratings are still low, indicating that the experts also experienced considerable struggles. DISCUSSION: The results of the Track 3 evaluation show that the adoptability of the five participating clinical NLP systems leaves considerable room for improvement. Remedy strategies suggested by the evaluators included (1) more detailed, operating-system-specific use instructions; (2) provision of more pertinent onscreen feedback for easier diagnosis of problems; (3) including screen walk-throughs in use instructions so users know what to expect and what might have gone wrong; (4) avoiding jargon and acronyms in materials intended for end users; and (5) packaging required prerequisites within software distributions so that prospective adopters do not have to obtain each third-party component on their own.


Subjects
Attitude to Computers , Data Mining/statistics & numerical data , Electronic Health Records/statistics & numerical data , Natural Language Processing , Pattern Recognition, Automated/methods , Software , Data Mining/methods , Humans , Middle Aged , User-Computer Interface
12.
J Am Med Inform Assoc ; 27(1): 3-12, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31584655

ABSTRACT

OBJECTIVE: This article summarizes the preparation, organization, evaluation, and results of Track 2 of the 2018 National NLP Clinical Challenges shared task. Track 2 focused on extraction of adverse drug events (ADEs) from clinical records and evaluated 3 tasks: concept extraction, relation classification, and end-to-end systems. We perform an analysis of the results to identify the state of the art in these tasks, learn from it, and build on it. MATERIALS AND METHODS: For all tasks, teams were given the raw text of narrative discharge summaries, and in all the tasks, participants proposed deep learning-based methods with hand-designed features. In the concept extraction task, participants used sequence labelling models (bidirectional long short-term memory being the most popular), whereas in the relation classification task, they also experimented with instance-based classifiers (namely support vector machines and rules). Ensemble methods were also popular. RESULTS: A total of 28 teams participated in task 1, with 21 teams in tasks 2 and 3. The best-performing systems set a high performance bar with F1 scores of 0.9418 for concept extraction, 0.9630 for relation classification, and 0.8905 for end-to-end. However, the results were much lower for concepts and relations of Reasons and ADEs. These were often missed because local context is insufficient to identify them. CONCLUSIONS: This challenge shows that clinical concept extraction and relation classification systems perform well for many concept types, but significant improvement is still required for ADEs and Reasons. Incorporating the larger context or outside knowledge will likely improve the performance of future systems.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Support Vector Machine , Datasets as Topic , Humans , Patient Discharge Summaries
13.
J Am Med Inform Assoc ; 26(11): 1163-1171, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31562516

ABSTRACT

OBJECTIVE: Track 1 of the 2018 National NLP Clinical Challenges shared tasks focused on identifying which patients in a corpus of longitudinal medical records meet and do not meet identified selection criteria. MATERIALS AND METHODS: To address this challenge, we annotated American English clinical narratives for 288 patients according to whether they met these criteria. We chose criteria from existing clinical trials that represented a variety of natural language processing tasks, including concept extraction, temporal reasoning, and inference. RESULTS: A total of 47 teams participated in this shared task, with 224 participants in total. The participants represented 18 countries, and the teams submitted 109 system outputs in total. The best-performing system achieved a micro F1 score of 0.91 using a rule-based approach. The top 10 teams used rule-based and hybrid systems to approach the problems. DISCUSSION: Clinical narratives are open to interpretation, particularly in cases where the selection criterion may be underspecified. This leaves room for annotators to use domain knowledge and intuition in selecting patients, which may lead to errors in system outputs. However, teams who consulted medical professionals while building their systems were more likely to have high recall for patients, which is preferable for patient selection systems. CONCLUSIONS: There is not yet a one-size-fits-all solution for natural language processing systems approaching this task. Future research in this area could examine criteria requiring even more complex inferences, temporal reasoning, and domain knowledge.


Subjects
Clinical Trials as Topic/methods , Data Mining/methods , Machine Learning , Natural Language Processing , Patient Selection , Datasets as Topic , Humans
16.
J Am Med Inform Assoc ; 26(11): 1161-1162, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31613362