Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation.
Sholle, Evan T; Pinheiro, Laura C; Adekkanattu, Prakash; Davila, Marcos A; Johnson, Stephen B; Pathak, Jyotishman; Sinha, Sanjai; Li, Cassidie; Lubansky, Stasi A; Safford, Monika M; Campion, Thomas R.
Affiliation
  • Sholle ET; Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA.
  • Pinheiro LC; Department of Medicine, Weill Cornell Medicine, New York, New York, USA.
  • Adekkanattu P; Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA.
  • Davila MA; Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA.
  • Johnson SB; Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA.
  • Pathak J; Department of Healthcare Policy & Research, Weill Cornell Medicine, New York, New York, USA.
  • Sinha S; Department of Medicine, Weill Cornell Medicine, New York, New York, USA.
  • Li C; Department of Medicine, Weill Cornell Medicine, New York, New York, USA.
  • Lubansky SA; Department of Medicine, Weill Cornell Medicine, New York, New York, USA.
  • Safford MM; Department of Medicine, Weill Cornell Medicine, New York, New York, USA.
  • Campion TR; Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA.
J Am Med Inform Assoc; 26(8-9): 722-729, 2019 Aug 01.
Article in En | MEDLINE | ID: mdl-31329882
OBJECTIVE: We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race/ethnicity data.

MATERIALS AND METHODS: Using EHR notes for 16 665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black/Hispanic. We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data.

RESULTS: For the sample of 16 665 patients, NLP identified 948 additional patients as black, a 26% increase, and 665 additional patients as Hispanic, a 20% increase. Compared with the patients identified as black or Hispanic in structured EHR data, patients identified as black or Hispanic via NLP only were older, more likely to be male, less likely to have commercial insurance, and more likely to have higher comorbidity.

DISCUSSION: Structured EHR data for race and ethnicity are subject to data quality issues. Supplementing structured EHR race data with NLP-derived race and ethnicity may allow researchers to better assess the demographic makeup of populations and draw more accurate conclusions about intergroup differences in health outcomes.

CONCLUSIONS: Black or Hispanic patients who are not documented as such in structured EHR race/ethnicity fields differ significantly from those who are. Relatively simple NLP can help address this limitation.
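To make the idea of a rule-based classifier evaluated against a gold standard concrete, the following is a minimal Python sketch. It is not the authors' algorithm: the abstract does not list their rules, so the patterns, the negation check, and the names `PATTERNS`, `classify_patient`, and `precision_recall` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword rules; the abstract does not publish the actual rule set.
PATTERNS = {
    "black": re.compile(
        r"\b(?:african[- ]american|black)\s+"
        r"(?:man|woman|male|female|patient|gentleman|lady)\b",
        re.IGNORECASE,
    ),
    "hispanic": re.compile(r"\b(?:hispanic|latino|latina)\b", re.IGNORECASE),
}

# Assumed negation cue: "non-" or "not" immediately before the matched term.
NEGATION = re.compile(r"\b(?:not|non)[\s-]*$", re.IGNORECASE)


def classify_note(text):
    """Return the set of labels whose rules fire on a single note."""
    labels = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            prefix = text[max(0, match.start() - 15):match.start()]
            if NEGATION.search(prefix):
                continue  # skip mentions such as "non-Hispanic"
            labels.add(label)
    return labels


def classify_patient(notes):
    """A patient receives a label if any of their notes triggers it."""
    labels = set()
    for note in notes:
        labels |= classify_note(note)
    return labels


def precision_recall(predicted_ids, gold_ids):
    """Compare predicted patient IDs with an annotated gold standard."""
    true_positives = len(predicted_ids & gold_ids)
    precision = true_positives / len(predicted_ids) if predicted_ids else 0.0
    recall = true_positives / len(gold_ids) if gold_ids else 0.0
    return precision, recall
```

In a sketch like this, patients flagged by the rules but lacking a structured race/ethnicity value would correspond to the "NLP only" group described in the abstract; a real system would need far more careful handling of context, such as family history or mentions of other people.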

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Black or African American / Natural Language Processing / Hispanic or Latino / Vulnerable Populations / Electronic Health Records
Type of study: Observational_studies / Prevalence_studies / Prognostic_studies / Qualitative_research / Risk_factors_studies
Aspects: Determinantes_sociais_saude
Limits: Female / Humans / Male
Language: En
Journal: J Am Med Inform Assoc
Journal subject: INFORMATICA MEDICA
Year: 2019
Document type: Article
Affiliation country: United States
Country of publication: United Kingdom
