Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes.
JCO Clin Cancer Inform; 7: e2200196, 2023 May.
Article | En | MEDLINE | ID: mdl-37235847
PURPOSE: There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities.
METHODS: This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019. We compared linguistic features among White versus non-White, low-income insurance versus other insurance, and male versus female patients' notes. Log odds ratios with an informative Dirichlet prior were calculated to compare words over-represented in each group. A variational autoencoder topic model was applied, and topic probability was compared between groups. The presence of machine-learnable bias was explored by developing statistical and neural demographic group classifiers.
RESULTS: Terms associated with varied social contexts and needs were identified for all demographic group comparisons. For example, notes of non-White and low-income insurance patients were over-represented with terms associated with housing and transportation, whereas notes of White and other insurance patients were over-represented with terms related to physical activity. Topic models identified a social history topic, and topic probability varied significantly between the demographic group comparisons. Classification models performed poorly at classifying notes of non-White and low-income insurance patients (F1 of 0.30 and 0.23, respectively).
CONCLUSION: Exploration of linguistic differences in clinical notes between patients of different race/ethnicity, insurance status, and sex identified social contexts and needs in patients with cancer and revealed high-level differences in notes. Future work is needed to validate whether these findings may play a role in cancer disparities.
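The abstract names the word-comparison statistic but does not give its formula. The sketch below illustrates one commonly used formulation of log odds ratios with an informative Dirichlet prior (the z-scored variant usually attributed to Monroe et al.), and it is an assumption that the study used this exact form. The function name `log_odds_dirichlet`, the toy notes, and the choice of the pooled word counts as the prior are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Return a z-scored log odds ratio for every word in `prior`.

    Positive scores mark words over-represented in group A,
    negative scores mark words over-represented in group B.
    `prior` maps each word to a pseudo-count (assumed > 0).
    """
    n_a = sum(counts_a.values())   # total tokens in group A
    n_b = sum(counts_b.values())   # total tokens in group B
    a0 = sum(prior.values())       # total prior mass
    scores = {}
    for word, a_w in prior.items():
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log odds of the word within each group.
        log_odds_a = math.log((y_a + a_w) / (n_a + a0 - y_a - a_w))
        log_odds_b = math.log((y_b + a_w) / (n_b + a0 - y_b - a_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        scores[word] = delta / math.sqrt(variance)
    return scores


# Hypothetical toy notes, not taken from the study data.
notes_a = "patient needs housing and transportation assistance housing".split()
notes_b = "patient enjoys walking and exercise walking".split()

counts_a = Counter(notes_a)
counts_b = Counter(notes_b)
# Assumption: the pooled corpus serves as the informative prior.
prior = counts_a + counts_b

z_scores = log_odds_dirichlet(counts_a, counts_b, prior)
for word, z in sorted(z_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:15s} {z:+.2f}")
```

In this toy run, "housing" receives a positive score and "walking" a negative one, which mirrors the kind of group-level contrast the RESULTS describe. With real notes the prior would typically come from a larger background corpus rather than the two groups themselves.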
Full text:
1
Collection:
01-international
Database:
MEDLINE
Main subject:
Natural Language Processing
/
Neoplasms
Study type:
Diagnostic_studies
/
Observational_studies
/
Prognostic_studies
/
Risk_factors_studies
Aspect:
Social_determinants_of_health
Limits:
Female
/
Humans
/
Male
Language:
En
Journal:
JCO Clin Cancer Inform
Year:
2023
Document type:
Article