A Transparent and Adaptable Method to Extract Colonoscopy and Pathology Data Using Natural Language Processing.

Fevrier, Helene B; Liu, Liyan; Herrinton, Lisa J; Li, Dan

Affiliation

Fevrier HB; Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.
Liu L; Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.
Herrinton LJ; Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA. Lisa.Herrinton@Kp.org.
Li D; Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA.

J Med Syst ; 44(9): 151, 2020 Jul 31.

Article in English | MEDLINE | ID: mdl-32737597

ABSTRACT

Key variables recorded as text in colonoscopy and pathology reports have been extracted using natural language processing (NLP) tools that were not easily adaptable to new settings. We aimed to develop a reliable NLP tool with broad adaptability. During 1996-2016, Kaiser Permanente Northern California performed 401,566 colonoscopies with linked pathology. We randomly sampled 1000 linked reports into a Training Set and developed an NLP tool using SAS® PERL regular expressions. The NLP tool captured five colonoscopy and pathology variables: type, size, and location of polyps; extent of procedure; and quality of bowel preparation. We used a Validation Set (N = 3000) to confirm the variables' classifications using manual chart review as the reference. Performance of the NLP tool was assessed using the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Cohen's κ. Cohen's κ ranged from 93 to 99%. The sensitivity and specificity ranged from 95 to 100% across all categories. For categories with prevalence exceeding 10%, the PPV ranged from 97% to 100% except for adequate quality of preparation (prevalence 92%), for which the PPV was 65%. For categories with prevalence below 10%, the PPVs ranged from 62% to 100%. NPVs ranged from 94% to 100% except for the "complete" extent of procedure, for which the NPV was 73%. Using information from a large community-based population, we developed a transparent and adaptable NLP tool for extracting five colonoscopy and pathology variables. The tool can be readily tested in other healthcare settings.
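The performance measures named in the abstract can all be derived from a 2x2 table comparing the NLP classification against manual chart review. The following is a minimal sketch in Python, not the authors' SAS implementation; the `Confusion` class, the `performance` function, and the example counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 counts for one category (hypothetical container, not from the paper)."""
    tp: int  # NLP positive, chart review positive
    fp: int  # NLP positive, chart review negative
    fn: int  # NLP negative, chart review positive
    tn: int  # NLP negative, chart review negative


def performance(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV, NPV, and Cohen's kappa.

    Assumes every row and column total is non-zero; otherwise a
    ZeroDivisionError is raised.
    """
    n = c.tp + c.fp + c.fn + c.tn
    # Observed agreement between the tool and the reference.
    p_observed = (c.tp + c.tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_expected = (
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp)
    ) / (n * n)
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "kappa": (p_observed - p_expected) / (1 - p_expected),
    }


# Illustrative counts only; these are NOT results from the study.
example = performance(Confusion(tp=90, fp=5, fn=10, tn=895))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because PPV and NPV depend on prevalence while sensitivity and specificity do not, a category can have high sensitivity and specificity and still show a low PPV or NPV.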

Subjects

Colonoscopy; Natural Language Processing; Data Analysis; Delivery of Health Care; Humans; Predictive Value of Tests; Sensitivity and Specificity

Keywords

Colonoscopy; Natural language processing; Pathology report

Full text: 1 | Databases: MEDLINE | Main subject: Natural Language Processing / Colonoscopy | Study type: Diagnostic_studies / Prognostic_studies / Risk_factors_studies | Limits: Humans | Language: En | Journal: J Med Syst | Publication year: 2020 | Document type: Article | Country of affiliation: United States