Using Natural Language Processing to Identify Stigmatizing Language in Labor and Birth Clinical Notes.
Barcelona, Veronica; Scharp, Danielle; Moen, Hans; Davoudi, Anahita; Idnay, Betina R; Cato, Kenrick; Topaz, Maxim.
Affiliation
  • Barcelona V; School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA. vb2534@cumc.columbia.edu.
  • Scharp D; School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA.
  • Moen H; Department of Computer Science, Aalto University, Espoo, Finland.
  • Davoudi A; VNS Health, New York, NY, USA.
  • Idnay BR; Department of Biomedical Informatics, Columbia University, New York, NY, USA.
  • Cato K; School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA.
  • Topaz M; University of Pennsylvania, Philadelphia, PA, USA.
Matern Child Health J; 28(3): 578-586, 2024 Mar.
Article in En | MEDLINE | ID: mdl-38147277
ABSTRACT

INTRODUCTION:

Stigma and bias related to race and other minoritized statuses may underlie disparities in pregnancy and birth outcomes. One emerging method to identify bias is the study of stigmatizing language in the electronic health record. The objective of our study was to develop automated natural language processing (NLP) methods to accurately identify two types of stigmatizing language in labor and birth notes: marginalizing language and its complement, power/privilege language.

METHODS:

We analyzed notes for all birthing people > 20 weeks' gestation admitted for labor and birth at two hospitals during 2017. We preprocessed the note text, represented it with TF-IDF values, and used these as inputs to machine learning classification algorithms (Decision Trees, Random Forest, and Support Vector Machines) to identify stigmatizing and power/privilege language in clinical notes. We also applied a feature importance method (information gain) to identify the words most strongly associated with each language category.
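The authors' code is not part of this record; the following is a minimal sketch, not their implementation, of the general pipeline the Methods describe (TF-IDF features, the three classifier families, and an information-gain-style feature ranking), using scikit-learn with invented toy note snippets and labels:

```python
# Illustrative sketch only: TF-IDF features + three classifiers + mutual-information
# (information-gain-style) feature ranking. Notes and labels below are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.feature_selection import mutual_info_classif

notes = [
    "patient refused repositioning",
    "declined epidural, otherwise cooperative",
    "accompanied by supportive partner",
    "pleasant, follows plan of care",
]
labels = [1, 1, 0, 0]  # 1 = marginalizing language present (toy labels)

# Represent each note as a TF-IDF vector
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(notes)

# Compare classifiers by cross-validated F-score
for name, clf in [("Decision Tree", DecisionTreeClassifier()),
                  ("Random Forest", RandomForestClassifier()),
                  ("Linear SVM", LinearSVC())]:
    scores = cross_val_score(clf, X, labels, cv=2, scoring="f1")
    print(name, scores.mean())

# Rank words by mutual information with the label (information-gain-style)
mi = mutual_info_classif(X, labels)
top = sorted(zip(vectorizer.get_feature_names_out(), mi), key=lambda t: -t[1])[:5]
print(top)
```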

RESULTS:

For marginalizing language, Decision Trees yielded the best classification with an F-score of 0.73. For power/privilege language, Support Vector Machines performed optimally, achieving an F-score of 0.91. These results demonstrate the effectiveness of the selected machine learning methods in classifying language categories in clinical notes.
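For readers unfamiliar with the metric, the F-score (F1) reported above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A small illustrative check with invented labels (not the study's data):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 0, 0, 1, 0]  # toy gold labels
y_pred = [1, 0, 0, 0, 1, 1]  # toy predictions
p = precision_score(y_true, y_pred)  # 0.67
r = recall_score(y_true, y_pred)     # 0.67
print(f1_score(y_true, y_pred), 2 * p * r / (p + r))  # both ~0.67
```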

CONCLUSION:

We identified well-performing machine learning methods to automatically detect stigmatizing language in clinical notes. To our knowledge, this is the first study to use NLP performance metrics to evaluate machine learning methods for detecting stigmatizing language. Future studies should further refine and evaluate NLP methods, incorporating newer deep learning-based algorithms.
Subjects
Keywords

Full text: 1 Database: MEDLINE Main subject: Algorithms / Natural Language Processing Limits: Female / Humans Language: En Publication year: 2024 Document type: Article
