Deep learning on time series laboratory test results from electronic health records for early detection of pancreatic cancer.

Park, Jiheum; Artin, Michael G; Lee, Kate E; Pumpalova, Yoanna S; Ingram, Myles A; May, Benjamin L; Park, Michael; Hur, Chin; Tatonetti, Nicholas P

Affiliation

Park J; Department of Medicine, Columbia University Irving Medical Center, New York, NY, United States.
Artin MG; Department of Medicine, Columbia University Irving Medical Center, New York, NY, United States.
Lee KE; Department of Medicine, Columbia University Irving Medical Center, New York, NY, United States.
Pumpalova YS; Department of Medicine, Columbia University Irving Medical Center, New York, NY, United States.
Ingram MA; Department of Medicine, Columbia University Irving Medical Center, New York, NY, United States.
May BL; Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, United States.
Park M; Applied Info Partners Inc, Worlds Fair Drive, Somerset, NJ, United States; X-Mechanics LLC, Cresskill, NJ, United States.
Hur C; Department of Medicine, Columbia University Irving Medical Center, New York, NY, United States. Electronic address: ch447@cumc.columbia.edu.
Tatonetti NP; Department of Biomedical Informatics, Columbia University, New York, NY, United States.

J Biomed Inform; 131: 104095, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35598881

ABSTRACT

The multi-modal and unstructured nature of observational data in Electronic Health Records (EHR) is currently a significant obstacle to applying machine learning to risk stratification. In this study, we develop a deep learning framework that incorporates longitudinal clinical data from EHR to infer risk for pancreatic cancer (PC). The framework includes a novel training protocol that enforces an emphasis on early detection by applying an independent Poisson-random mask to proximal-time measurements for each variable. Data fusion for irregular multivariate time-series features is enabled by a "grouped" neural network (GrpNN) architecture, which uses representation learning to generate a dimensionally reduced vector for each measurement set before making a final prediction. These models were evaluated using EHR data from Columbia University Irving Medical Center-New York Presbyterian Hospital. Our framework demonstrated better performance on early detection (AUROC 0.671, 95% CI 0.667-0.675, p < 0.001) at 12 months prior to diagnosis compared to logistic regression, XGBoost, and feedforward neural network baselines. We demonstrate that our masking strategy results in greater improvements at distal times prior to diagnosis, and that our GrpNN model improves generalizability by reducing overfitting relative to the feedforward baseline. The results were consistent across reported race. Our proposed algorithm is potentially generalizable to other diseases, including but not limited to cancer, where early detection can improve survival.
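The masking idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function name `poisson_proximal_mask`, the rate parameter `lam`, the regular time grid, and the choice to hide the most recent k steps of each variable with k drawn independently from a Poisson distribution. It is not the authors' implementation.

```python
import numpy as np

def poisson_proximal_mask(x, lam, rng):
    """Hide the most recent measurements of each variable (illustrative only).

    x   : array of shape (n_steps, n_vars), rows ordered oldest to most recent.
    lam : assumed Poisson rate controlling how many recent steps are hidden.
    rng : numpy.random.Generator used for reproducible sampling.

    Returns (masked_copy, k), where k[j] is the number of hidden steps
    for variable j. Hidden entries are set to NaN.
    """
    x = np.asarray(x, dtype=float)
    n_steps, n_vars = x.shape
    # One independent Poisson draw per variable, capped at the series length.
    k = np.minimum(rng.poisson(lam, size=n_vars), n_steps)
    # Step i of variable j is hidden when it falls in the last k[j] steps.
    step_idx = np.arange(n_steps)[:, None]
    hidden = step_idx >= (n_steps - k)[None, :]
    out = x.copy()
    out[hidden] = np.nan
    return out, k

rng = np.random.default_rng(0)
labs = np.ones((12, 4))  # 12 monthly steps, 4 hypothetical lab variables
masked, k = poisson_proximal_mask(labs, lam=3.0, rng=rng)
```

Training on inputs masked this way would deprive the model of the measurements closest to diagnosis, which is one plausible way to push it toward signals that appear earlier.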

Subjects

Deep Learning; Pancreatic Neoplasms; Early Detection of Cancer; Electronic Health Records; Humans; Pancreatic Neoplasms/diagnosis; Time Factors

Keywords

Early detection of cancer; Electronic Health Records; Machine learning; Pancreatic cancer

Full text: 1 | Database: MEDLINE | Main subject: Pancreatic Neoplasms / Deep Learning | Type of study: Diagnostic studies / Guideline / Prognostic studies / Screening studies | Limits: Humans | Language: English | Journal: J Biomed Inform | Journal subject: Medical Informatics | Year of publication: 2022 | Document type: Article | Country of affiliation: United States
