A Multi-Institutional Natural Language Processing Pipeline to Extract Performance Status From Electronic Health Records.
Maghsoudi, Arash; Sada, Yvonne H; Nowakowski, Sara; Guffey, Danielle; Zhu, Huili; Yarlagadda, Sudha R; Li, Ang; Razjouyan, Javad.
Affiliation
  • Maghsoudi A; Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA.
  • Sada YH; Department of Medicine, Baylor College of Medicine, Houston, TX, USA.
  • Nowakowski S; Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA.
  • Guffey D; Department of Medicine, Baylor College of Medicine, Houston, TX, USA.
  • Zhu H; Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, TX, USA.
  • Yarlagadda SR; Department of Medicine, Baylor College of Medicine, Houston, TX, USA.
  • Li A; Section of Hematology-Oncology, Baylor College of Medicine, Houston, TX, USA.
  • Razjouyan J; Department of Medicine, Baylor College of Medicine, Houston, TX, USA.
Cancer Control ; 31: 10732748241279518, 2024.
Article in English | MEDLINE | ID: mdl-39222957
ABSTRACT

PURPOSE:

Performance status (PS), an essential indicator of patients' functional abilities, is often documented in the clinical notes of patients with cancer. Natural language processing (NLP) has shown promise for extracting PS from electronic medical records (EMRs) to support clinical decision-making, patient monitoring, and research studies. We designed and validated a multi-institutional NLP pipeline to automatically extract PS from free-text patient notes.

PATIENTS AND METHODS:

We collected data from 19,481 patients in the Harris Health System (HHS) and 333,862 patients in the Veterans Affairs Corporate Data Warehouse (VA-CDW), and randomly selected 400 patients from each data source, using 50% to train and validate the proposed pipeline and 50% to test it. The pipeline combines an expert-derived rule-based approach with extensive post-processing. To demonstrate the pipeline's application, we assessed compliance with the PS documentation recommended by the American Society of Clinical Oncology (ASCO) Quality Metric and investigated potential disparities in PS reporting for stage IV non-small cell lung cancer (NSCLC). We used logistic regression, characterizing patients by race/ethnicity, spoken language, marital status, and gender.
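
A rule-based extraction step of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not list the expert-derived rules or name the PS scales, so the ECOG and Karnofsky (KPS) patterns, the 30-character window, and the function name `extract_ps` are all assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns: a scale keyword, a short run of non-digit text,
# then a value that is valid for that scale (ECOG 0-4, KPS 10-100 in tens).
ECOG_RE = re.compile(r"\bECOG\b[^\d\n]{0,30}?\b([0-4])\b", re.IGNORECASE)
KPS_RE = re.compile(
    r"\b(?:KPS|Karnofsky)\b[^\d\n]{0,30}?\b(100|[1-9]0)\b", re.IGNORECASE
)


def extract_ps(note: str) -> Optional[Tuple[str, int]]:
    """Return (scale, value) for the first PS mention in a note, or None."""
    match = ECOG_RE.search(note)
    if match:
        return ("ECOG", int(match.group(1)))
    match = KPS_RE.search(note)
    if match:
        return ("KPS", int(match.group(1)))
    return None


if __name__ == "__main__":
    for text in (
        "ECOG performance status: 1",
        "Karnofsky score 80%",
        "No functional assessment recorded.",
    ):
        print(text, "->", extract_ps(text))
```

In a real pipeline, post-processing would typically follow this step, for example to reconcile conflicting mentions within a note or across notes for the same patient; the abstract does not specify which post-processing rules were used.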

RESULTS:

On the test sets, the pipeline achieved 92% accuracy on the HHS cohort and 98.5% accuracy on the VA data. For patients with stage IV NSCLC, the proposed pipeline achieved an accuracy of 98.5%. Furthermore, our analysis revealed a PS documentation rate of over 85% among NSCLC patients, surpassing the ASCO Quality Metrics. No disparities were observed in the documentation of PS.

CONCLUSION:

Our proposed NLP pipeline shows promising results in extracting PS from free-text notes from various health institutions. It may be used in longitudinal cancer data registries.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Natural Language Processing / Electronic Health Records Limits: Female / Humans / Male / Middle aged Language: En Journal: Cancer Control Journal subject: NEOPLASMS Year: 2024 Document type: Article Affiliation country: United States of America Publication country: United States of America
