Moving Biosurveillance Beyond Coded Data Using AI for Symptom Detection From Physician Notes: Retrospective Cohort Study.

McMurry, Andrew J; Zipursky, Amy R; Geva, Alon; Olson, Karen L; Jones, James R; Ignatov, Vladimir; Miller, Timothy A; Mandl, Kenneth D

Affiliation

McMurry AJ; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Zipursky AR; Department of Pediatrics, Harvard Medical School, Boston, MA, United States.
Geva A; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Olson KL; Division of Pediatric Emergency Medicine, Department of Pediatrics, The Hospital for Sick Children, Toronto, ON, Canada.
Jones JR; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.
Ignatov V; Division of Critical Care Medicine, Department of Anesthesiology, Critical Care, and Pain Medicine, Boston Children's Hospital, Boston, MA, United States.
Miller TA; Department of Anaesthesia, Harvard Medical School, Boston, MA, United States.
Mandl KD; Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, United States.

J Med Internet Res; 26: e53367, 2024 Apr 04.

Article in English | MEDLINE | ID: mdl-38573752

ABSTRACT

BACKGROUND:

Real-time surveillance of emerging infectious diseases necessitates a dynamically evolving, computable case definition, which frequently incorporates symptom-related criteria. For symptom detection, both population health monitoring platforms and research initiatives primarily depend on structured data extracted from electronic health records.

OBJECTIVE:

This study sought to validate and test an artificial intelligence (AI)-based natural language processing (NLP) pipeline for detecting COVID-19 symptoms from physician notes in pediatric patients. We specifically study patients presenting to the emergency department (ED) who can be sentinel cases in an outbreak.

METHODS:

Subjects in this retrospective cohort study are patients who are 21 years of age and younger, who presented to a pediatric ED at a large academic children's hospital between March 1, 2020, and May 31, 2022. The ED notes for all patients were processed with an NLP pipeline tuned to detect the mention of 11 COVID-19 symptoms based on Centers for Disease Control and Prevention (CDC) criteria. For a gold standard, 3 subject matter experts labeled 226 ED notes and had strong agreement (F1-score=0.986; positive predictive value [PPV]=0.972; and sensitivity=1.0). F1-score, PPV, and sensitivity were used to compare the performance of both NLP and the International Classification of Diseases, 10th Revision (ICD-10) coding to the gold standard chart review. As a formative use case, variations in symptom patterns were measured across SARS-CoV-2 variant eras.
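The three metrics named above are related by a simple formula: the F1-score is the harmonic mean of PPV and sensitivity. The following is a minimal sketch of those definitions, not the authors' evaluation code; the `ConfusionCounts` class, the function names, and the counts in the usage lines are all hypothetical and introduced only for illustration. Specificity is included because the results also report it. The only figures taken from the abstract are the reported gold-standard PPV (0.972) and sensitivity (1.0), which reproduce the reported F1-score of 0.986 when rounded to three decimals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for counts against a gold-standard chart review."""
    tp: int  # symptom labeled present and detected
    fp: int  # symptom labeled absent but detected
    fn: int  # symptom labeled present but missed
    tn: int  # symptom labeled absent and not detected


def ppv(c: ConfusionCounts) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def sensitivity(c: ConfusionCounts) -> float:
    """Sensitivity (recall): TP / (TP + FN)."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def specificity(c: ConfusionCounts) -> float:
    """Specificity: TN / (TN + FP)."""
    denom = c.tn + c.fp
    return c.tn / denom if denom else 0.0


def f1_from_ppv_sensitivity(ppv_value: float, sens_value: float) -> float:
    """F1-score as the harmonic mean of PPV and sensitivity."""
    denom = ppv_value + sens_value
    return 2 * ppv_value * sens_value / denom if denom else 0.0


# Consistency check using the agreement figures reported in the abstract.
reported_f1 = round(f1_from_ppv_sensitivity(0.972, 1.0), 3)

# Illustrative counts only; these are NOT data from the study.
example = ConfusionCounts(tp=8, fp=2, fn=2, tn=8)
example_f1 = f1_from_ppv_sensitivity(ppv(example), sensitivity(example))
```

Because F1 combines only PPV and sensitivity, it ignores true negatives, which is why a method can have high specificity and still a low F1-score.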

RESULTS:

There were 85,678 ED encounters during the study period, of which 4% (n=3420) involved patients with COVID-19. NLP was more accurate than ICD-10 codes at identifying encounters with patients who had any of the COVID-19 symptoms (F1-score=0.796 vs F1-score=0.451). NLP accuracy was higher for positive symptoms (sensitivity=0.930) than ICD-10 (sensitivity=0.300). However, ICD-10 accuracy was higher for negative symptoms (specificity=0.994) than NLP (specificity=0.917). Congestion or runny nose showed the largest difference in accuracy (NLP F1-score=0.828 and ICD-10 F1-score=0.042). For encounters with patients with COVID-19, prevalence estimates of each NLP symptom differed across variant eras. Patients with COVID-19 were more likely to have each NLP symptom detected than patients without the disease. Effect sizes (odds ratios) varied across pandemic eras.

CONCLUSIONS:

This study establishes the value of AI-based NLP as a highly effective tool for real-time COVID-19 symptom detection in pediatric patients, outperforming traditional ICD-10 methods. It also reveals the evolving nature of symptom prevalence across different virus variants, underscoring the need for dynamic, technology-driven approaches in infectious disease surveillance.

Subjects

Biosurveillance; COVID-19; Physicians; SARS-CoV-2; United States; Humans; Child; Artificial Intelligence; Retrospective Studies; COVID-19/diagnosis; COVID-19/epidemiology

Keywords

AI; COVID-19; SARS-CoV-2; adolescent; adolescents; artificial intelligence; child; children; clinical note; clinical notes; detect; detection; diagnose; diagnosis; diagnostic; diagnostics; documentation; emergency; infectious; natural language processing; paediatric; paediatrics; pediatric; pediatrics; pipeline; pipelines; public health, biosurveillance; pulmonary; respiratory; surveillance; symptom; symptoms; teen; teenager; teenagers; teens; urgent; youth

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Physicians / Biosurveillance / SARS-CoV-2 / COVID-19 Limits: Child / Humans Country/Region as subject: North America Language: En Journal: J Med Internet Res Journal subject: MEDICAL INFORMATICS Year of publication: 2024 Document type: Article Country of affiliation: United States
