Leveraging automated approaches to categorize birth defects from abstracted birth hospitalization data.
Newton, Suzanne M; Distler, Samantha; Woodworth, Kate R; Chang, Daniel; Roth, Nicole M; Board, Amy; Hutcherson, Hailee; Cragan, Janet D; Gilboa, Suzanne M; Tong, Van T.
  • Newton SM; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Distler S; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Woodworth KR; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Chang D; Eagle Global Scientific, LLC, San Antonio, Texas, USA.
  • Roth NM; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Board A; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Hutcherson H; G2S Corporation, San Antonio, Texas, USA.
  • Cragan JD; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Gilboa SM; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
  • Tong VT; Division of Birth Defects and Infant Disorders, Centers for Disease Control and Prevention, Atlanta, Georgia, USA.
Birth Defects Res ; 116(1): e2267, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37932954
ABSTRACT

BACKGROUND:

The Surveillance for Emerging Threats to Pregnant People and Infants Network (SET-NET) collects data abstracted from medical records and birth defects registries on pregnant people and their infants to understand outcomes associated with prenatal exposures. We developed an automated process to categorize possible birth defects for prenatal COVID-19, hepatitis C, and syphilis surveillance. By employing keyword searches, fuzzy matching, natural language processing (NLP), and machine learning (ML), we aimed to decrease the number of cases needing manual clinician review.

METHODS:

SET-NET captures International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes and free text describing birth defects. For unstructured data, we used keyword searches, and then conducted fuzzy matching with a cut-off match score of ≥90%. Finally, we employed NLP and ML by testing three predictive models to categorize birth defect data.
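
As an illustration of the first two steps (keyword search, then fuzzy matching with a ≥90% cut-off), the following is a minimal sketch only. The abstract does not name the software or keyword lists used, so the `KEYWORDS` table, the category labels, and the use of Python's standard-library `difflib.SequenceMatcher` as the similarity score are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical keyword-to-category table; NOT the SET-NET keyword list.
KEYWORDS = {
    "cleft lip": "orofacial cleft",
    "cleft palate": "orofacial cleft",
    "gastroschisis": "abdominal wall defect",
    "hypospadias": "genitourinary defect",
}

CUTOFF = 90.0  # minimum match score in percent, as stated in the abstract


def score(a: str, b: str) -> float:
    """Similarity of two strings on a 0-100 scale (assumed scoring method)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def categorize(text: str):
    """Return (category, method) for a free-text birth defect description."""
    cleaned = " ".join(text.lower().split())  # lowercase, collapse whitespace

    # Step 1: keyword search (exact substring match).
    for keyword, category in KEYWORDS.items():
        if keyword in cleaned:
            return category, "keyword"

    # Step 2: fuzzy matching against each keyword, keeping the best score.
    best = max(KEYWORDS, key=lambda keyword: score(cleaned, keyword))
    if score(cleaned, best) >= CUTOFF:
        return KEYWORDS[best], "fuzzy"

    # Anything left would go on to a predictive model or manual review.
    return None, "unmatched"


if __name__ == "__main__":
    for description in ["Bilateral cleft lip", "gastroschsis", "heart murmur"]:
        print(description, "->", categorize(description))
```

In this sketch the misspelling "gastroschsis" misses the keyword search but clears the 90% cut-off in the fuzzy step, while an unrelated description falls through as unmatched.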

RESULTS:

As of June 2023, 8326 observations containing data on possible birth defects were submitted to SET-NET. The majority (n = 6758 [81%]) were matched to an ICD-10-CM code; the remaining 1568 (19%) could not be matched. Through keyword searches and fuzzy matching, we categorized 1387/1568 possible birth defects. Of the remaining 181 unmatched observations, we correctly categorized 144 (80%) using a predictive model.

CONCLUSIONS:

Using automated approaches allowed for categorization of 99.6% of reported possible birth defects, which helps detect possible patterns requiring further investigation. Without employing these analytic approaches, manual review would have been needed for 1568 observations. These methods can be employed to quickly and accurately sift through data to inform public health responses.
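
The 99.6% figure can be reconciled with the counts reported in the Results. The short check below uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Counts as reported in the abstract.
total = 8326            # observations with possible birth defects
icd_matched = 6758      # matched to an ICD-10-CM code
keyword_fuzzy = 1387    # categorized by keyword search and fuzzy matching
model_correct = 144     # correctly categorized by the predictive model

unmatched = total - icd_matched          # observations without an ICD-10-CM match
remaining = unmatched - keyword_fuzzy    # left for the predictive model
categorized = icd_matched + keyword_fuzzy + model_correct
still_manual = total - categorized       # observations not categorized automatically

print(f"unmatched after ICD-10-CM: {unmatched}")
print(f"left for predictive model: {remaining}")
print(f"categorized overall: {100 * categorized / total:.1f}%")
print(f"not categorized automatically: {still_manual}")
```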

Full text: 1 Database: MEDLINE Main subject: Natural Language Processing / Medical Records Limits: Female / Humans / Infant / Pregnancy Language: En Year: 2024 Document type: Article