Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes.
JCO Clin Cancer Inform; 7: e2200196, 2023 05.
Article in En | MEDLINE | ID: mdl-37235847
ABSTRACT
PURPOSE:
There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities.
METHODS:
This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019. We compared linguistic features among White versus non-White, low-income insurance versus other insurance, and male versus female patients' notes. Log odds ratios with an informative Dirichlet prior were calculated to compare words over-represented in each group. A variational autoencoder topic model was applied, and topic probability was compared between groups. The presence of machine-learnable bias was explored by developing statistical and neural demographic group classifiers.
RESULTS:
Terms associated with varied social contexts and needs were identified for all demographic group comparisons. For example, notes of non-White and low-income insurance patients were over-represented with terms associated with housing and transportation, whereas notes of White and other insurance patients were over-represented with terms related to physical activity. Topic models identified a social history topic, and topic probability varied significantly between the demographic group comparisons. Classification models performed poorly at classifying notes of non-White and low-income insurance patients (F1 of 0.30 and 0.23, respectively).
CONCLUSION:
Exploration of linguistic differences in clinical notes between patients of different race/ethnicity, insurance status, and sex identified social contexts and needs in patients with cancer and revealed high-level differences in notes. Future work is needed to validate whether these findings may play a role in cancer disparities.
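The log odds ratio with an informative Dirichlet prior named in the METHODS paragraph can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so everything below is an assumption: the function name `log_odds_dirichlet`, the toy token lists, and the choice of the pooled word counts as the prior are hypothetical and chosen only for illustration. The sketch uses the commonly cited z-scored form, in which each word's prior count is added to its observed count in both groups before the two log odds are compared.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Return a z-scored log odds ratio per word.

    Positive scores mean the word is over-represented in group A,
    negative scores mean it is over-represented in group B.
    `prior` maps each word to a pseudo-count (hypothetical choice here:
    counts from the pooled corpus).
    """
    n_a = sum(counts_a.values())   # total tokens in group A
    n_b = sum(counts_b.values())   # total tokens in group B
    a0 = sum(prior.values())       # total prior mass
    scores = {}
    for word, alpha in prior.items():
        if alpha <= 0:
            continue  # a non-positive prior count would break the logs
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log odds of the word within each group.
        lo_a = math.log((y_a + alpha) / (n_a + a0 - y_a - alpha))
        lo_b = math.log((y_b + alpha) / (n_b + a0 - y_b - alpha))
        # Approximate variance of the difference of the two log odds.
        var = 1.0 / (y_a + alpha) + 1.0 / (y_b + alpha)
        scores[word] = (lo_a - lo_b) / math.sqrt(var)
    return scores


# Toy data (invented for illustration, not taken from the study).
notes_a = "patient needs housing and transportation support".split()
notes_b = "patient reports regular exercise and walking".split()
counts_a = Counter(notes_a)
counts_b = Counter(notes_b)
prior = counts_a + counts_b  # assumption: pooled counts act as the prior

z = log_odds_dirichlet(counts_a, counts_b, prior)
for word, score in sorted(z.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:15s} {score:+.3f}")
```

In this toy example, words that appear only in the first list receive positive scores, words that appear only in the second receive negative scores, and words shared equally by both score zero. A real analysis would draw the prior from a much larger background corpus so that rare words are shrunk toward their expected frequency.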
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Natural Language Processing / Neoplasms
Type of study: Diagnostic_studies / Observational_studies / Prognostic_studies / Risk_factors_studies
Aspects: Social_determinants_of_health
Limits: Female / Humans / Male
Language: En
Journal: JCO Clin Cancer Inform
Year: 2023
Document type: Article
Affiliation country: