Auditing Learned Associations in Deep Learning Approaches to Extract Race and Ethnicity from Clinical Text.

Bear Don't Walk IV, Oliver J; Pichon, Adrienne; Nieva, Harry Reyes; Sun, Tony; Altosaar, Jaan; Natarajan, Karthik; Perotte, Adler; Tarczy-Hornoch, Peter; Demner-Fushman, Dina; Elhadad, Noémie

Affiliation

Bear Don't Walk IV OJ; University of Washington, Seattle, WA.
Pichon A; Columbia University, New York, New York.
Nieva HR; Columbia University, New York, New York.
Sun T; Harvard Medical School, Boston, Massachusetts.
Altosaar J; Columbia University, New York, New York.
Natarajan K; One Fact Foundation, Claymont, DE.
Perotte A; Columbia University, New York, New York.
Tarczy-Hornoch P; Columbia University, New York, New York.
Demner-Fushman D; University of Washington, Seattle, WA.
Elhadad N; US National Library of Medicine, Bethesda, Maryland.

AMIA Annu Symp Proc; 2023: 289-298, 2023.

Article in English | MEDLINE | ID: mdl-38222422

ABSTRACT

Complete and accurate race and ethnicity (RE) patient information is important for many areas of biomedical informatics research, such as defining and characterizing cohorts, performing quality assessments, and identifying health inequities. Patient-level RE data is often inaccurate or missing in structured sources, but can be supplemented through clinical notes and natural language processing (NLP). While NLP has advanced considerably in recent years with large language models, bias remains an often-unaddressed concern, with research showing that harmful and negative language is used more often for certain racial/ethnic groups than for others. We present an approach to audit the learned associations of models trained to identify RE information in clinical text by measuring the concordance between model-derived salient features and manually identified RE-related spans of text. We show that although the models perform well on the surface, they have learned concerning associations that, if left unaddressed, create potential for future harms from RE-identification models.
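
The abstract describes the audit only at a high level. As a minimal sketch, assuming token-level saliency scores (for example, from an attribution method) and manually annotated RE spans given as character offsets, one simple concordance measure is the share of the top-k salient tokens that fall inside an annotated span, paired with the share of annotated spans reached by those tokens. The function name `salient_span_concordance`, the top-k cutoff, the whitespace tokenization, and the example note and scores are all hypothetical illustrations, not the authors' metric or data.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open character ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def salient_span_concordance(
    token_offsets: List[Span],
    saliency: List[float],
    annotated_spans: List[Span],
    k: int = 5,
) -> Tuple[float, float]:
    """Hypothetical concordance measure (not the authors' metric).

    Returns (precision@k, span recall@k) between the k most salient tokens
    and the manually annotated RE spans.
    """
    if len(token_offsets) != len(saliency):
        raise ValueError("token_offsets and saliency must have the same length")
    # Rank tokens by saliency, highest first, and keep the top k.
    ranked = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    top = [token_offsets[i] for i in ranked[:k]]
    if not top:
        return 0.0, 0.0
    # Precision@k: share of top-k tokens that land inside an annotated span.
    hits = sum(any(overlaps(t, s) for s in annotated_spans) for t in top)
    precision = hits / len(top)
    if not annotated_spans:
        return precision, 0.0
    # Recall@k: share of annotated spans touched by at least one top-k token.
    covered = sum(any(overlaps(t, s) for t in top) for s in annotated_spans)
    return precision, covered / len(annotated_spans)


# Hypothetical note, whitespace tokens, and made-up saliency scores.
text = "45 yo Hispanic female lives alone"
token_offsets = [m.span() for m in re.finditer(r"\S+", text)]
saliency = [0.1, 0.05, 0.9, 0.3, 0.6, 0.2]
annotated_spans = [(6, 14)]  # the annotated RE mention "Hispanic"

print(salient_span_concordance(token_offsets, saliency, annotated_spans, k=2))
# (0.5, 1.0)
```

In this toy case the model's second most salient token ("lives") lies outside every annotated RE span, which is the kind of case such an audit would surface for review as a possible spurious association.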

Subject(s)

Deep Learning; Ethnicity; Humans; Language; Natural Language Processing

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Ethnicity / Deep Learning | Study type: Risk_factors_studies | Limit: Humans | Language: English | Journal: AMIA Annu Symp Proc | Journal subject: MEDICAL INFORMATICS | Year: 2023 | Document type: Article