Early Diagnosis of Pancreatic Cancer via Machine Learning Analysis of a National Electronic Medical Record Database.

Matchaba, Siyabonga; Fellague-Chebra, Rafik; Purushottam, Purushottam; Johns, Adam

Affiliation

Matchaba S; Health Economics and Evidence Development, Novartis Oncology, East Hanover, NJ.
Fellague-Chebra R; Mendel, San Jose, CA.
Purushottam P; Novartis Pharma SAS, Rueil-Malmaison, Paris, France.
Johns A; Novartis Healthcare Private Limited, Hyderabad, India.

JCO Clin Cancer Inform; 7: e2300076, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37816199

ABSTRACT

PURPOSE:

Pancreatic cancer (PaC) is often diagnosed at advanced stages, resulting in one of the lowest survival rates among patients with cancer. The purpose of this study was to investigate whether machine learning (ML) models can predict with high sensitivity and specificity an increased risk for PaC ahead of clinical diagnosis.

METHODS:

The Optum deidentified electronic health record (EHR) data set was used to extract 1 year of data for each patient and to sample for PaC diagnosis, the number of interactions with the health care system, and unique demographic and clinical features. For patients with a PaC diagnosis, the data were collected between 1 and 2 years before the diagnosis. Standard binary classification ML models were fitted on the training data set and evaluated on the testing data set. Data analyses were performed using the scikit-learn package, version 1.0.1.
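As a rough illustration of the workflow described above, the sketch below fits an ensemble of standard binary classifiers with scikit-learn and scores it by AUC on a held-out split. It is not the authors' pipeline: the abstract does not name the base models or the ensembling method, so the soft-voting combination of logistic regression, random forest, and gradient boosting is an assumption, and the data are synthetic (generated with `make_classification`), not Optum EHR records.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patient-by-feature matrix (NOT the Optum data).
# 10 features and a ~56% positive class loosely mirror the abstract.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=6,
    weights=[0.44, 0.56],
    random_state=0,
)

# Stratified 80/20 split, similar in proportion to the reported train/test sizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumed ensemble: soft voting averages the predicted probabilities
# of three common binary classifiers.
ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

# Probability of the positive class on the held-out split, scored by AUC.
test_scores = ensemble.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
print(f"AUC on synthetic test split: {test_auc:.3f}")
```

The AUC printed here reflects only the synthetic data and says nothing about the performance reported in the study.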

RESULTS:

The data set consisted of 18,987 patient EHRs collected between December 31, 2007, and December 31, 2017. EHRs with 10 unique features and at least three health care interactions were used for model training (N = 15,189; n = 8,438 [56%] with PaC) and testing (N = 3,798; n = 2,127 [56%] with PaC). The ensemble model achieved an AUC of 0.89, a sensitivity of 85.61%, and a specificity of 76.18% on the testing data set and produced superior results compared with other binary classifiers. Increasing unique health care interactions to nine failed to improve the AUC score. When the testing data set was enlarged to 5,696 patients, the ensemble model achieved an AUC of 0.92 and a specificity of 93.21%, but the sensitivity was compromised.
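The counts and metrics reported above can be checked with a few lines of arithmetic. The sketch below verifies that the training and testing sizes sum to the full data set and that both splits have roughly 56% PaC cases, then shows how sensitivity and specificity are derived from a confusion matrix. The helper name `sensitivity_specificity` and the eight toy labels are illustrative assumptions, not data from the study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity


# Counts as reported in the abstract.
n_train, n_train_pac = 15_189, 8_438
n_test, n_test_pac = 3_798, 2_127

total = n_train + n_test
train_prev = n_train_pac / n_train
test_prev = n_test_pac / n_test
print(f"total records: {total}")
print(f"PaC share, training: {train_prev:.1%}; testing: {test_prev:.1%}")

# Toy labels (hypothetical) to show the metric definitions:
# 4 positives with 3 detected, 4 negatives with 3 correctly cleared.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"toy sensitivity: {sens:.2f}, toy specificity: {spec:.2f}")
```

Because both metrics depend on the decision threshold applied to the model's scores, raising the threshold generally trades sensitivity for specificity, which is consistent with the pattern reported for the enlarged testing data set.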

CONCLUSION:

The ensemble model exceeded the state-of-the-art level of performance for prediction of PaC ahead of clinical diagnosis with a minimal clinically guided input, providing a potential strategy for selection of high-risk patients for further screening.

Subject(s)

Electronic Health Records; Pancreatic Neoplasms; Humans; Early Detection of Cancer; Machine Learning; Pancreatic Neoplasms/diagnosis; Pancreatic Neoplasms/epidemiology

Full text: 1 Database: MEDLINE Main subject: Pancreatic Neoplasms / Electronic Health Records Type of study: Diagnostic_studies / Prognostic_studies / Screening_studies Limits: Humans Language: En Year: 2023 Document type: Article