Natural Language Processing Algorithm to Extract Multiple Myeloma Stage From Oncology Notes in the Veterans Affairs Healthcare System.
Goryachev, Sergey D; Yildirim, Cenk; DuMontier, Clark; La, Jennifer; Dharne, Mayuri; Gaziano, J Michael; Brophy, Mary T; Munshi, Nikhil C; Driver, Jane A; Do, Nhan V; Fillmore, Nathanael R.
Affiliation
  • Goryachev SD; Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA.
  • Yildirim C; VA Boston Healthcare System, Boston, MA.
  • DuMontier C; VA Boston Cooperative Studies Program, Boston, MA.
  • La J; Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA.
  • Dharne M; VA Boston Healthcare System, Boston, MA.
  • Gaziano JM; VA Boston Cooperative Studies Program, Boston, MA.
  • Brophy MT; New England Geriatrics Research, Education and Clinical Center, VA Boston Healthcare System, Boston, MA.
  • Munshi NC; Division of Aging, Brigham and Women's Hospital, Boston, MA.
  • Driver JA; Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA.
  • Do NV; Harvard Medical School, Boston, MA.
  • Fillmore NR; Massachusetts Veterans Epidemiology Research and Information Center (MAVERIC), Boston, MA.
JCO Clin Cancer Inform ; 8: e2300197, 2024 Jul.
Article in En | MEDLINE | ID: mdl-39038255
ABSTRACT

PURPOSE:

Stage in multiple myeloma (MM) is an essential measure of disease risk, but its measurement in large databases is often lacking. We aimed to develop and validate a natural language processing (NLP) algorithm to extract oncologists' documentation of stage in the national Veterans Affairs (VA) Healthcare System.

METHODS:

Using nationwide electronic health record (EHR) and cancer registry data from the VA Corporate Data Warehouse, we developed and validated a rule-based NLP algorithm to extract oncologist-determined MM stage. To that end, a clinician annotated MM stage within over 5,000 short snippets of clinical notes, and annotated MM stage at MM treatment initiation for 200 patients. These were allocated into snippet- and patient-level development and validation sets. We developed MM stage extraction and roll-up algorithms within the development sets. After the algorithms were finalized, we validated them using standard measures in held-out validation sets.
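
As an illustration of the two steps described above (snippet-level extraction followed by patient-level roll-up), here is a minimal sketch. It is not the authors' algorithm: the regular expressions, the names `extract_stage` and `roll_up`, the 90-day window, and the majority-vote rule are all assumptions made for this example.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical patterns for illustration; the published rules are not reproduced here.
_STAGE = r"(?P<stage>III|II|I|3|2|1)[AB]?\b"   # optional A/B suffix, as used in Durie-Salmon
_SEP = r"(?:\s+stage)?\s*[:\-]?\s*"
NAMED_SYSTEMS = [
    ("R-ISS", re.compile(r"\bR-?ISS\b" + _SEP + _STAGE, re.I)),
    # The lookbehind keeps "ISS" inside "R-ISS" from being counted as plain ISS.
    ("ISS", re.compile(r"(?<!R-)\bISS\b" + _SEP + _STAGE, re.I)),
    ("DS", re.compile(r"\b(?:Durie[- ]Salmon|DS)\b" + _SEP + _STAGE, re.I)),
]
GENERIC = re.compile(r"\bstage\s*[:\-]?\s*" + _STAGE, re.I)
_TO_INT = {"I": 1, "II": 2, "III": 3, "1": 1, "2": 2, "3": 3}


def extract_stage(snippet):
    """Return a list of (system, stage) pairs found in one note snippet."""
    found = []
    for system, pattern in NAMED_SYSTEMS:
        match = pattern.search(snippet)
        if match:
            found.append((system, _TO_INT[match.group("stage").upper()]))
    if not found:
        # No named staging system: fall back to a bare "stage N" mention.
        match = GENERIC.search(snippet)
        if match:
            found.append(("unclear", _TO_INT[match.group("stage").upper()]))
    return found


def roll_up(mentions, treatment_start, window_days=90):
    """Pick one stage per system from dated mentions near treatment start.

    mentions: list of (note_date, system, stage) tuples.
    Assumed rule: majority vote among mentions within +/- window_days.
    """
    votes = {}
    for note_date, system, stage in mentions:
        if abs((note_date - treatment_start).days) <= window_days:
            votes.setdefault(system, Counter())[stage] += 1
    return {system: counts.most_common(1)[0][0] for system, counts in votes.items()}
```

For example, `extract_stage("R-ISS stage II.")` yields one R-ISS mention, and `roll_up` then reduces all dated mentions for a patient to a single stage per staging system. A production rule set would also need to handle negation, historical mentions, and staging of other cancers, which this sketch ignores.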

RESULTS:

We developed algorithms for three different MM staging systems that have been in widespread use (Revised International Staging System [R-ISS], International Staging System [ISS], and Durie-Salmon [DS]) and for stage reported without a clearly defined system. Precision and recall were uniformly high for MM stage at the snippet level, ranging from 0.92 to 0.99 for the different MM staging systems. Performance in identifying MM stage at treatment initiation at the patient level was also excellent, with precision of 0.92, 0.96, 0.90, and 0.86 and recall of 0.99, 0.98, 0.94, and 0.92 for R-ISS, ISS, DS, and unclear stage, respectively.
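
For readers unfamiliar with the metrics, precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below shows one way to compute them for a single staging system from clinician annotations and algorithm output. The counting scheme (a wrong stage counts as both a false positive and a false negative) and the function name `precision_recall` are assumptions; the abstract does not specify how errors were tallied.

```python
def precision_recall(gold, predicted):
    """Compute precision and recall for one staging system.

    gold, predicted: equal-length lists of stage labels, with None meaning
    "no stage documented" (gold) or "no stage extracted" (predicted).
    """
    pairs = list(zip(gold, predicted))
    # True positive: a stage was extracted and it matches the annotation.
    tp = sum(1 for g, p in pairs if p is not None and p == g)
    # False positive: a stage was extracted but it is absent or different in the annotation.
    fp = sum(1 for g, p in pairs if p is not None and p != g)
    # False negative: an annotated stage was missed or extracted incorrectly.
    fn = sum(1 for g, p in pairs if g is not None and p != g)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

With hypothetical labels `gold = [1, 2, 3, None, 2]` and `predicted = [1, 2, 2, None, None]`, there are two true positives, one false positive, and two false negatives.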

CONCLUSION:

Our MM stage extraction algorithm uses rule-based NLP and data aggregation to accurately measure MM stage documented in oncology notes and pathology reports in VA's national EHR system. It may be adapted to other systems where MM stage is recorded in clinical notes.
Subject(s)

Full text: 1 Database: MEDLINE Main subject: Algorithms / Natural Language Processing / United States Department of Veterans Affairs / Electronic Health Records / Multiple Myeloma / Neoplasm Staging Limits: Female / Humans / Male Country/Region as subject: North America Language: En Journal: JCO Clin Cancer Inform Year: 2024 Type: Article Affiliation country: United States