ReCAP: Feasibility and Accuracy of Extracting Cancer Stage Information From Narrative Electronic Health Record Data.

Warner, Jeremy L; Levy, Mia A; Neuss, Michael N

Affiliations

Warner JL; Vanderbilt University; and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN jeremy.warner@vanderbilt.edu.
Levy MA; Vanderbilt University; and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN.
Neuss MN; Vanderbilt University; and Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN.

J Oncol Pract; 12(2): 157-8; e169-7, 2016 Feb.

Article in English | MEDLINE | ID: mdl-26306621

ABSTRACT

PURPOSE:

Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs). Such documentation results in tedious and time-consuming abstraction efforts by tumor registrars and other secondary users. This information may be amenable to extraction by automated methods.

METHODS:

We developed a natural language processing algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when the EHR contained discordant statements. These methods were developed in a training set of patients with lung cancer, independently validated in a test set of patients with lung cancer, and compared with the gold standard of Vanderbilt Cancer Registry-determined stage (when available).
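
The abstract does not describe the extraction patterns or the discordance-resolution rules themselves. The following is a minimal Python sketch of the general idea under stated assumptions: the regular expression `STAGE_PATTERN`, the helpers `extract_stages` and `resolve_stage`, and the majority-vote rule with its tie-breakers are all hypothetical illustrations, not the authors' algorithm.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical pattern for phrases such as "stage IIIA", "Stage 4", "stage IV".
# Roman numerals are tried longest-first so "III" is not read as "I".
STAGE_PATTERN = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])([ABC])?\b", re.IGNORECASE)

# Normalize Arabic and Roman numerals to one Roman form.
NUMERAL_MAP = {"1": "I", "2": "II", "3": "III", "4": "IV",
               "I": "I", "II": "II", "III": "III", "IV": "IV"}
STAGE_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}


def extract_stages(text: str) -> list[str]:
    """Return every normalized stage statement (e.g. 'IIIA') found in one note."""
    stages = []
    for numeral, substage in STAGE_PATTERN.findall(text):
        stages.append(NUMERAL_MAP[numeral.upper()] + substage.upper())
    return stages


def resolve_stage(notes: Iterable[str]) -> Optional[str]:
    """Pick one stage per patient when notes disagree (hypothetical rule).

    Assumed rule: the most frequently stated stage wins; ties go to the
    higher stage group, then to the more specific (substaged) statement.
    """
    counts = Counter(s for note in notes for s in extract_stages(note))
    if not counts:
        return None  # no stage statement found in this patient's notes
    return max(counts, key=lambda s: (counts[s], STAGE_ORDER[s.rstrip("ABC")], len(s)))


if __name__ == "__main__":
    notes = [
        "CT consistent with stage IIIA NSCLC.",
        "Discussed stage IIIA disease.",
        "Outside records list stage 4.",
    ]
    print(resolve_stage(notes))  # prints IIIA
```

A frequency vote is only one plausible way to handle discordance; a real system might instead weight statements by note type, author, or proximity to the diagnosis date.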

RESULTS:

In the combined data set of 2,323 patients (training set, n = 1,103; validation set, n = 1,220), 751,880 documents were analyzed. A stage statement was extracted from 2,239 (96.4%) patient EHRs (median, 24 documents per patient). Stage discordance was common, affecting 83.6% of these EHRs. Nevertheless, the accuracy of algorithmically derived stage was high in the validation set when notes generated within 14 weeks of diagnosis were included (κ = 0.906; 95% CI, 0.873 to 0.939).
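
Agreement with registry stage is reported as Cohen's κ, which corrects observed agreement p_o for the agreement p_e expected by chance: κ = (p_o - p_e) / (1 - p_e). The minimal Python sketch below computes unweighted κ for two label sequences; the abstract does not state whether a weighted variant was used or how the confidence interval was derived, so both are left out, and the function name `cohens_kappa` and the sample labels are illustrative only.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Illustrative (made-up) labels: algorithm vs. registry stage for six patients.
    algorithm = ["I", "II", "III", "IV", "IV", "II"]
    registry = ["I", "II", "III", "IV", "III", "II"]
    print(round(cohens_kappa(algorithm, registry), 4))  # prints 0.7778
```

Here 5 of 6 labels agree (p_o = 0.833), but the marginals alone would produce p_e = 0.25, so κ = 7/9, noticeably lower than raw agreement.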

CONCLUSION:

Accurate stage determination can be achieved through automated methods applied to narrative text, despite the frequent presence of discordance in such data. Our results also indicate that stage can be captured automatically in a shorter time frame than the 6-month window used by cancer registries, as early as 5 weeks from diagnosis. These methods may be generalizable to large narrative cancer data sets.

Subjects

Electronic Health Records; Natural Language Processing; Neoplasm Staging; Neoplasms/diagnosis; Neoplasms/mortality; Algorithms; Datasets as Topic; Humans; Neoplasm Metastasis; Prognosis; Reproducibility of Results

Database: MEDLINE | Main subject: Natural Language Processing / Electronic Health Records / Neoplasm Staging / Neoplasms | Study type: Prognostic_studies | Limits: Humans | Language: English | Publication year: 2016 | Document type: Article