Natural language processing-driven state machines to extract social factors from unstructured clinical documentation.

Allen, Katie S; Hood, Dan R; Cummins, Jonathan; Kasturi, Suranga; Mendonca, Eneida A; Vest, Joshua R

Affiliation

Allen KS; Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA.
Hood DR; Department of Health Policy and Management, Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, Indiana, USA.
Cummins J; Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA.
Kasturi S; Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA.
Mendonca EA; Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA.
Vest JR; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA.

JAMIA Open ; 6(2): ooad024, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37081945

ABSTRACT

Objective: This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. To assess generalizability, finalized models were validated on data from a separate health system. Materials and Methods: Notes from 2 healthcare systems, representing a variety of note types, were used. To train models, the study used n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling used a set percentage of notes, determined by the prevalence of social need. Models were optimized over multiple training and evaluation cycles. Performance was measured using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. Results: PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Discussion: We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. Conclusion: The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may be a part of a strategy to measure social factors within an institution.
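For readers unfamiliar with the four metrics named in the abstract, the sketch below shows how they are conventionally derived from algorithm output compared against manual review. It is illustrative only and is not the authors' code: the names `Confusion`, `tally`, and `metrics`, and the toy labels in the usage example, are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing algorithm output with manual review."""
    tp: int  # flagged by the algorithm and confirmed by review
    fp: int  # flagged by the algorithm but not confirmed
    tn: int  # not flagged, and review agrees
    fn: int  # not flagged, but review found the social factor


def tally(predicted, reference):
    """Build a confusion matrix from two equal-length lists of booleans."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif not p and not r:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return None rather than dividing by zero when a metric is undefined.
    return numerator / denominator if denominator else None


def metrics(c):
    """Standard definitions of the four metrics listed in the abstract."""
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


if __name__ == "__main__":
    # Toy labels for five notes (hypothetical, not study data).
    algorithm = [True, True, False, False, True]
    reviewer = [True, False, False, True, True]
    print(metrics(tally(algorithm, reviewer)))
```

PPV is the share of notes flagged by the algorithm that manual review confirms, which is why it is the metric tracked across the training iterations reported above.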

Keywords

clinical data; natural language processing; social determinants of health; social factors

Full text: 1 Collection: 01-internacional Database: MEDLINE Study type: Guideline / Prognostic_studies / Risk_factors_studies Topic: Determinantes_sociais_saude / Equity_inequality Language: English Journal: JAMIA Open Year: 2023 Document type: Article Affiliation country: United States Publication country: United States
