Machine learning-based natural language processing to extract PD-L1 expression levels from clinical notes.

Lin, Eric; Zwolinski, Robert; Wu, Julie Tsu-Yu; La, Jennifer; Goryachev, Sergey; Huhmann, Linden; Yildrim, Cenk; Tuck, David P; Elbers, Danne C; Brophy, Mary T; Do, Nhan V; Fillmore, Nathanael R

Affiliation

Lin E; VA Boston Healthcare System, Boston, MA, USA.
Zwolinski R; McLean Hospital, Institute for Technology in Psychiatry, Belmont, MA, USA.
Wu JT; VA Boston Healthcare System, Boston, MA, USA.
La J; VA Palo Alto Healthcare System, Palo Alto, CA, USA.
Goryachev S; Stanford University School of Medicine, Stanford, CA, USA.
Huhmann L; VA Boston Healthcare System, Boston, MA, USA.
Yildrim C; VA Boston Healthcare System, Boston, MA, USA.
Tuck DP; VA Boston Healthcare System, Boston, MA, USA.
Elbers DC; VA Boston Healthcare System, Boston, MA, USA.
Brophy MT; VA Boston Healthcare System, Boston, MA, USA.
Do NV; Boston University School of Medicine, Boston, MA, USA.
Fillmore NR; VA Boston Healthcare System, Boston, MA, USA.

Health Informatics J; 29(3): 14604582231198021, 2023.

Article in English | MEDLINE | ID: mdl-37635280

ABSTRACT

Introduction:

PD-L1 expression is used to determine oncology patients' response to and eligibility for immunologic treatments; however, PD-L1 expression status often exists only in unstructured clinical notes, which limits the ability to use it in population-level studies.

Methods:

We developed and evaluated a machine learning-based natural language processing (NLP) tool to extract PD-L1 expression values from the nationwide Veterans Affairs electronic health record system.

Results:

The model demonstrated strong evaluation performance across multiple levels of label granularity. Mean precision of the overall PD-L1 positive label was 0.859 (sd, 0.039), recall 0.994 (sd, 0.013), and F1 0.921 (sd, 0.024). When a numeric PD-L1 value was identified, the mean absolute error of the value was 0.537 on a scale of 0 to 100.
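
For readers less familiar with the metrics reported above, the following is a minimal sketch of how precision, recall, F1, and mean absolute error are conventionally computed. It is not the authors' evaluation code; the function names and the example counts and values are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = harmonic_mean(precision, recall)
    return precision, recall, f1


def harmonic_mean(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(true_values, predicted_values):
    """Average absolute difference between annotated and extracted values."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_values, predicted_values))
    return total / len(true_values)


# Hypothetical counts: 8 notes correctly labeled positive, 2 false alarms, 0 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=0)

# Hypothetical PD-L1 percentages (0 to 100): annotated vs. extracted.
mae = mean_absolute_error([50, 1, 90], [50, 0, 90])

# Harmonic mean of the reported mean precision and mean recall.
f1_of_means = harmonic_mean(0.859, 0.994)
```

The harmonic mean of the reported mean precision and recall comes to roughly 0.9216, close to the reported mean F1 of 0.921. The two need not match exactly, because an F1 averaged over several evaluation runs is not the same quantity as the F1 of the averaged precision and recall.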

Conclusion:

We presented an accurate NLP method for deriving PD-L1 status from clinical notes. By reducing the time and manual effort needed to review medical records, our work will enable future population-level studies in cancer immunotherapy.

Subject(s)

B7-H1 Antigen; Natural Language Processing; Humans; Medical Records; Software; Machine Learning; Electronic Health Records

Keywords

PD-L1; cancer; electronic health records; machine learning; natural language processing

Full text: 1; Collection: 01-international; Database: MEDLINE; Main subject: Natural Language Processing / B7-H1 Antigen; Type of study: Guideline; Limits: Humans; Language: En; Journal: Health Informatics J; Year: 2023; Document type: Article; Affiliation country: United States
