Prediction of incident myocardial infarction using machine learning applied to harmonized electronic health record data.

Mandair, Divneet; Tiwari, Premanand; Simon, Steven; Colborn, Kathryn L; Rosenberg, Michael A

Affiliation

Mandair D; Division of Internal Medicine, University of Colorado School of Medicine, Aurora, CO, USA.
Tiwari P; Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA.
Simon S; Division of Cardiology and Cardiac Electrophysiology, University of Colorado School of Medicine, 12631 E. 17th Avenue, Mail Stop B130, Aurora, CO, 80045, USA.
Colborn KL; Department of Surgery, University of Colorado School of Medicine, Aurora, CO, USA.
Rosenberg MA; Division of Internal Medicine, University of Colorado School of Medicine, Aurora, CO, USA. michael.a.rosenberg@cuanschutz.edu.

BMC Med Inform Decis Mak; 20(1): 252, 2020 Oct 02.

Article in En | MEDLINE | ID: mdl-33008368

ABSTRACT

BACKGROUND: With cardiovascular disease increasing, substantial research has focused on the development of prediction tools. We compare deep learning and machine learning models to a baseline logistic regression using only 'known' risk factors in predicting incident myocardial infarction (MI) from harmonized electronic health record (EHR) data. METHODS: Large-scale case-control study with an outcome of 6-month incident MI, conducted on 2.27 million patients within the UCHealth system using the top 800 of an initial 52,000 procedures, diagnoses, and medications, harmonized to the Observational Medical Outcomes Partnership common data model. We compared several over- and under-sampling techniques to address the class imbalance in the dataset. We compared regularized logistic regression, random forest, gradient boosting machines, and shallow and deep neural networks. The baseline model for comparison was a logistic regression using a limited set of 'known' risk factors for MI. Hyperparameters were identified using 10-fold cross-validation. RESULTS: A total of 20,591 patients were diagnosed with MI, compared with 2.25 million who were not. A deep neural network (DNN) with random undersampling provided superior classification compared with other methods. However, the benefit of the DNN was only moderate, with an F1 score of 0.092 and AUC of 0.835, compared to a logistic regression model using only 'known' risk factors. Calibration for all models was poor despite adequate discrimination, due to overfitting from the low frequency of the event of interest. CONCLUSIONS: Our study suggests that DNNs may not offer substantial benefit when trained on harmonized data, compared to traditional methods using established risk factors for MI.
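The abstract reports adequate discrimination (AUC) alongside a very low F1 score after random undersampling. The following is a minimal sketch of how that combination can arise when the event is rare. It uses synthetic Gaussian risk scores, not the study's data or models; every function name, the 1% prevalence, and the score shift of 1.4 (chosen so the theoretical AUC is roughly 0.84) are assumptions made for illustration only.

```python
import random


def random_undersample(labels, seed=0):
    """Keep every positive and an equally sized random sample of negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


def f1_score(labels, preds):
    """Harmonic mean of precision and recall for binary 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """Rank-based AUC: chance a random positive outscores a random negative."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of a tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n=20000, prevalence=0.01, shift=1.4, seed=0):
    """Synthetic labels and risk scores (assumed, not from the study)."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.gauss(shift if y == 1 else 0.0, 1.0) for y in labels]
    return labels, scores


def balanced_threshold(labels, scores, seed=0):
    """Midpoint of the class-mean scores on the undersampled subset."""
    idx = random_undersample(labels, seed)
    pos = [scores[i] for i in idx if labels[i] == 1]
    neg = [scores[i] for i in idx if labels[i] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


if __name__ == "__main__":
    labels, scores = simulate()
    cut = balanced_threshold(labels, scores)
    preds = [1 if s >= cut else 0 for s in scores]
    print("AUC:", round(auc(labels, scores), 3))
    print("F1 :", round(f1_score(labels, preds), 3))
```

In this toy setting the cutoff is chosen on a balanced subset but applied to the full, imbalanced population, so false positives from the much larger negative class dominate precision. The F1 score stays low even though the ranking of patients, which is what AUC measures, is reasonably good.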

Subjects

Electronic Health Records/statistics & numerical data; Machine Learning; Myocardial Infarction/epidemiology; Case-Control Studies; Female; Humans; Incidence; Myocardial Infarction/diagnosis; Predictive Value of Tests

Keywords

Electronic health records; Machine learning; Myocardial infarction

Full text: 1
Database: MEDLINE
Main subject: Electronic Health Records / Machine Learning / Myocardial Infarction
Type of study: Diagnostic_studies / Incidence_studies / Observational_studies / Prognostic_studies / Risk_factors_studies
Limits: Female / Humans
Language: En
Journal: BMC Med Inform Decis Mak
Journal subject: MEDICAL INFORMATICS
Year of publication: 2020
Document type: Article
Country of affiliation: United States
