Automated Identification of Patients With Immune-Related Adverse Events From Clinical Notes Using Word Embedding and Machine Learning.

Gupta, Samir; Belouali, Anas; Shah, Neil J; Atkins, Michael B; Madhavan, Subha

Affiliations

Gupta S; Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC.
Belouali A; Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC.
Shah NJ; Memorial Sloan Kettering Cancer Center, Manhattan, New York, NY.
Atkins MB; Lombardi Comprehensive Cancer Center, MedStar Georgetown University Hospital, Washington, DC.
Madhavan S; Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC.

JCO Clin Cancer Inform; 5: 541-549, 2021 May.

Article in English | MEDLINE | ID: mdl-33989017

ABSTRACT

PURPOSE:

Although immune checkpoint inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies, they are associated with a unique spectrum of side effects termed immune-related adverse events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs. Retrospective analysis of electronic health record data can help characterize these toxicities. However, this information is not captured in a structured format within the electronic health record, so extracting it requires manual chart review.

MATERIALS AND METHODS:

In this work, we propose a natural language processing pipeline that can automatically annotate clinical notes and determine whether there is evidence that a patient developed an irAE. Seven hundred eighty-one cases were manually reviewed by clinicians and annotated for irAEs at the patient level. A dictionary of irAE keywords was used to perform text reduction on each patient's clinical notes; only sentences containing relevant expressions were kept. Word embeddings were then used to generate vector representations of the reduced text, which served as input to the machine learning classifiers. The output of the models was the presence or absence of any irAE. Additional models were built to classify skin-related toxicities, endocrine toxicities, and colitis.
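The keyword-based text reduction and embedding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword list is hypothetical, the sentence splitter is deliberately naive, and averaging word vectors is only one possible way of turning reduced text into a patient-level vector (the abstract does not say which aggregation was used).

```python
import re
from typing import Dict, List, Sequence

# Hypothetical keyword list for illustration only; the study's actual
# irAE dictionary is not reproduced in the abstract.
IRAE_KEYWORDS = ["colitis", "rash", "pruritus", "hypothyroidism", "pneumonitis", "hepatitis"]


def reduce_text(note: str, keywords: Sequence[str]) -> List[str]:
    """Keep only the sentences of a note that mention at least one keyword."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    # Whole-word, case-insensitive match against any keyword.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def patient_vector(
    notes: Sequence[str],
    keywords: Sequence[str],
    vectors: Dict[str, Sequence[float]],
    dim: int,
) -> List[float]:
    """Average the word vectors of in-vocabulary tokens in the reduced text.

    Assumption: mean pooling. Returns a zero vector if no sentence survives
    filtering or no surviving token has an embedding.
    """
    total = [0.0] * dim
    count = 0
    for note in notes:
        for sentence in reduce_text(note, keywords):
            for token in tokenize(sentence):
                vec = vectors.get(token)
                if vec is not None:
                    total = [t + v for t, v in zip(total, vec)]
                    count += 1
    return [t / count for t in total] if count else total
```

For example, with the toy vectors `{"colitis": [1.0, 0.0], "grade": [0.0, 1.0]}`, the note "Started pembrolizumab. Developed grade 2 colitis after cycle 3. Follow-up in two weeks." is reduced to its middle sentence and mapped to `[0.5, 0.5]`. In a real pipeline the vectors would come from a trained embedding model, and the per-patient vectors would be passed to a classifier (for instance scikit-learn's `LogisticRegression`; the choice of classifier here is an assumption).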

RESULTS:

The model for any irAE achieved an average F1-score of 0.75 and an area under the receiver operating characteristic curve of 0.85, outperforming a basic keyword filtering approach. Although the any-irAE classifier achieved good accuracy, classification of individual irAE types still has room for improvement.
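The two reported metrics can be computed from patient-level labels as sketched below. This is a minimal, dependency-free illustration: F1 uses hard (0/1) predictions, while the ROC AUC uses continuous scores and is computed as the probability that a randomly chosen positive patient scores higher than a randomly chosen negative one. The keyword baseline shown (flag a patient if any keyword appears anywhere in their notes) is a hypothetical stand-in; the abstract does not specify how the study's keyword filtering baseline was defined.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def keyword_baseline(text: str, keywords: Sequence[str]) -> int:
    """Hypothetical baseline: predict 1 if any keyword occurs as a substring."""
    lowered = text.lower()
    return int(any(k.lower() in lowered for k in keywords))
```

The pairwise AUC loop is quadratic in the number of patients, which is acceptable at the scale of a few hundred cases. An "average" F1-score such as the one reported would typically be the mean over cross-validation folds or repeated splits, but the averaging scheme is not stated in the abstract.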

CONCLUSION:

We demonstrate that patient-level annotations combined with a machine learning approach using keyword filtering and word embeddings can achieve promising accuracy in classifying irAEs in clinical notes. This model may facilitate the annotation and analysis of large irAE data sets.

Subjects

Machine Learning; Neoplasms; Electronic Health Records; Humans; Natural Language Processing; Neoplasms/therapy; Retrospective Studies

Full text: 1 | Collections: 01-internacional | Health context: 1_ASSA2030 | Database: MEDLINE | Main subject: Machine Learning / Neoplasms | Study type: Diagnostic_studies / Guideline / Observational_studies / Prognostic_studies | Limits: Humans | Language: English | Journal: JCO Clin Cancer Inform | Publication year: 2021 | Document type: Article