ABSTRACT
BACKGROUND: Timely diagnosis of structural heart disease improves patient outcomes, yet many cases remain undiagnosed. Although population-wide screening with echocardiography is impractical, ECG-based prediction models can help target high-risk patients. We developed a novel ECG-based machine learning approach to predict multiple structural heart conditions, hypothesizing that a composite model would yield a higher label prevalence and higher positive predictive values, supporting meaningful recommendations for echocardiography. METHODS: Using 2 232 130 ECGs from 484 765 adults, acquired between 1984 and 2021 and linked to electronic health records and echocardiography reports, we trained machine learning models to predict the presence or absence of any of 7 echocardiography-confirmed diseases within 1 year. The composite label comprised moderate or severe valvular disease (aortic stenosis, aortic regurgitation, mitral stenosis, mitral regurgitation, or tricuspid regurgitation), reduced ejection fraction (<50%), or interventricular septal thickness >15 mm. We tested various combinations of input features (demographics, laboratory values, structured ECG data, ECG traces) and evaluated model performance using 5-fold cross-validation, multisite validation (training at 1 site and testing at 10 independent sites), and a simulated retrospective deployment (training on pre-2010 data and deploying in 2010). RESULTS: Our composite rECHOmmend model used age, sex, and ECG traces and achieved an area under the receiver operating characteristic curve of 0.91 and a positive predictive value of 42% at 90% sensitivity, with a composite label prevalence of 17.9%. Single-disease models had areas under the receiver operating characteristic curve of 0.86 to 0.93 but lower positive predictive values of 1% to 31%. Across input feature sets, areas under the receiver operating characteristic curve ranged from 0.80 to 0.93 and increased as features were added. Multisite validation gave results similar to cross-validation, with an aggregate area under the receiver operating characteristic curve of 0.91 across the independent test set of 10 clinical sites after training on a separate site. In the simulated retrospective deployment, among ECGs acquired in 2010 from patients without preexisting structural heart disease, 11% were classified as high risk, and 41% of these (4.5% of all patients) developed echocardiography-confirmed disease within 1 year. CONCLUSIONS: An ECG-based machine learning model using a composite end point can identify a population at high risk of undiagnosed, clinically significant structural heart disease, outperforming single-disease models and improving practical utility through higher positive predictive values. This approach can facilitate targeted echocardiographic screening to reduce underdiagnosis of structural heart disease.
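The rationale for the composite end point can be checked with Bayes' rule: at a fixed operating point, positive predictive value (PPV) falls as label prevalence falls. The Python sketch below backs out the specificity implied by the reported composite operating point (90% sensitivity, 42% PPV, 17.9% prevalence) and recomputes PPV at a hypothetical 1% prevalence; it also shows one way to choose the score threshold that reaches a target sensitivity. It is an illustration under stated assumptions, not the study's code: the helper names are hypothetical, and the single-disease models had their own operating points, so only the direction of the effect carries over.

```python
import math


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_specificity(sensitivity, ppv_value, prevalence):
    """Invert the PPV formula to recover specificity at a given operating point."""
    true_pos = sensitivity * prevalence
    false_pos = true_pos / ppv_value - true_pos
    return 1.0 - false_pos / (1.0 - prevalence)


def threshold_at_sensitivity(scores, labels, target=0.90):
    """Largest threshold (flag if score >= threshold) reaching `target` sensitivity."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    # Number of positives that must be flagged; rounding guards against float noise.
    k = math.ceil(round(target * len(positives), 9))
    return positives[k - 1]


def sensitivity_and_ppv(scores, labels, threshold):
    """Sensitivity and PPV when every score >= threshold is called high risk."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    tp = sum(flagged)
    return tp / sum(labels), tp / len(flagged)


# Specificity implied by the reported composite operating point.
spec = implied_specificity(0.90, 0.42, 0.179)
# Same sensitivity and specificity, but a rarer (hypothetical 1%) label.
print(round(spec, 3), round(ppv(0.90, spec, 0.179), 2), round(ppv(0.90, spec, 0.01), 3))
# prints: 0.729 0.42 0.032
print(round(0.11 * 0.41, 3))  # share of all patients both flagged and truly diseased
# prints: 0.045
```

Holding sensitivity and specificity fixed, PPV drops from 42% to about 3% when prevalence drops from 17.9% to 1%. The last line reproduces the deployment arithmetic: 41% of the 11% flagged as high risk is about 4.5% of all patients.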
Subject(s)
Heart Diseases; Machine Learning; Adult; Echocardiography; Electrocardiography; Heart Diseases/diagnostic imaging; Heart Diseases/epidemiology; Humans; Retrospective Studies
ABSTRACT
BACKGROUND: Atrial fibrillation (AF) is associated with substantial morbidity, especially when it goes undetected. If new-onset AF could be predicted, targeted screening could detect it early. We hypothesized that a deep neural network could predict new-onset AF from the resting 12-lead ECG and that this prediction could help identify patients at risk of AF-related stroke. METHODS: We used 1.6 million resting 12-lead digital ECG traces from 430 000 patients, collected from 1984 to 2019. Deep neural networks were trained to predict new-onset AF (within 1 year) in patients without a history of AF. Performance was evaluated using the areas under the receiver operating characteristic curve and the precision-recall curve. We performed an incidence-free survival analysis over the 30 years following the ECG, stratified by model prediction. To simulate real-world deployment, we trained a separate model on all ECGs acquired before 2010 and evaluated it on a test set of ECGs from 2010 through 2014 that were linked to our stroke registry. Among patients the model predicted to be at high risk for AF, we identified those at risk for AF-related stroke at different prediction thresholds. RESULTS: The areas under the receiver operating characteristic curve and the precision-recall curve were 0.85 and 0.22, respectively, for predicting new-onset AF within 1 year of an ECG. The hazard ratio for the predicted high- versus low-risk groups over a 30-year span was 7.2 (95% CI, 6.9-7.6). In the simulated deployment, the model predicted new-onset AF at 1 year with a sensitivity of 69% and a specificity of 81%. The number needed to screen to find 1 new case of AF was 9. Of all patients who experienced an AF-related stroke within 3 years of the index ECG, 62% had been classified by the model as high risk for new-onset AF. CONCLUSIONS: Deep learning can predict new-onset AF from the 12-lead ECG in patients with no previous history of AF. This prediction may help identify patients at risk for AF-related strokes.
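The simulated deployment depends on a strict temporal split, and the number needed to screen is the reciprocal of the positive predictive value among patients flagged as high risk. The Python sketch below illustrates both under stated assumptions: `EcgRecord` and its fields are a hypothetical schema, not the study's data model, and exclusion is assumed to happen ECG by ECG according to whether AF was documented before that ECG.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EcgRecord:
    """Hypothetical per-ECG record; field names are illustrative."""
    patient_id: str
    acquired: date
    prior_af: bool      # AF documented before this ECG was acquired
    af_within_1y: bool  # label: new-onset AF within 1 year of this ECG


def temporal_split(records, cutoff=date(2010, 1, 1), test_end=date(2014, 12, 31)):
    """Train on ECGs before `cutoff`; test on ECGs from `cutoff` through `test_end`.

    ECGs from patients with AF documented before the ECG are dropped from both
    sets, so every remaining label refers to new-onset AF.
    """
    eligible = [r for r in records if not r.prior_af]
    train = [r for r in eligible if r.acquired < cutoff]
    test = [r for r in eligible if cutoff <= r.acquired <= test_end]
    return train, test


def screening_summary(flagged, labels):
    """Sensitivity, specificity, and number needed to screen (NNS = 1 / PPV)."""
    pairs = list(zip(flagged, labels))
    tp = sum(1 for f, y in pairs if f and y)
    fp = sum(1 for f, y in pairs if f and not y)
    fn = sum(1 for f, y in pairs if not f and y)
    tn = sum(1 for f, y in pairs if not f and not y)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "nns": (tp + fp) / tp,  # flagged patients screened per true new case found
    }
```

By this definition, the reported number needed to screen of 9 corresponds to roughly 1 true new-onset AF case per 9 flagged patients, that is, a positive predictive value of about 11%.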
Subject(s)
Atrial Fibrillation/diagnosis; Deep Learning/standards; Stroke/etiology; Atrial Fibrillation/complications; Electrocardiography; Female; Humans; Male; Neural Networks, Computer; Stroke/mortality; Survival Analysis
ABSTRACT
BACKGROUND: Arrhythmogenic right ventricular cardiomyopathy (ARVC) is associated with variants in desmosome genes. Secondary findings of pathogenic or likely pathogenic variants, primarily loss-of-function (LOF) variants, are recommended for clinical reporting; however, their prevalence and associated phenotype in a general clinical population are not fully characterized. METHODS: From whole-exome sequencing of 61 019 individuals in the DiscovEHR cohort, we screened for putative loss-of-function variants in PKP2, DSC2, DSG2, and DSP. We evaluated measurements from prior clinical ECGs and echocardiograms, which were manually over-read to assess ARVC diagnostic criteria, and performed a phenome-wide association study (PheWAS). Finally, we estimated expected penetrance using Bayesian inference. RESULTS: One hundred forty individuals (0.23%; 59±18 years old at last encounter; 33% male) carried an ARVC variant (G+). None had an existing diagnosis of ARVC in the electronic health record, and their prior ECG and echocardiogram findings did not differ significantly from those of matched controls without variants. Several G+ individuals met major repolarization (n=4) and ventricular function (n=5) criteria, but at a prevalence similar to that in controls. PheWAS showed no significant associations with other heart disease diagnoses. Combining our best estimates of genetic and disease prevalence yields an estimated penetrance of 6.0%. CONCLUSIONS: The prevalence of ARVC loss-of-function variants is ≈1:435 in a general clinical population of predominantly European descent, but electronic health record-based evidence of phenotypic association in this population is limited, consistent with a low estimated penetrance. Prospective deep phenotyping and longitudinal follow-up of a large sequenced cohort are needed to determine the true clinical relevance of an incidentally identified ARVC loss-of-function variant.
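The penetrance estimate rests on Bayes' rule, P(disease | G+) = P(G+ | disease) × P(disease) / P(G+). The Python sketch below shows that calculation and one simple way to carry the uncertainty in the observed carrier frequency (140 of 61 019) through to penetrance. It is an assumption-laden illustration, not the study's model: the uniform Beta prior is an illustrative choice, only the carrier-frequency uncertainty is propagated, and the disease prevalence and the fraction of ARVC cases carrying these variants are inputs the abstract does not report, so any values passed for them are placeholders.

```python
import random
import statistics


def penetrance(p_disease, p_variant_given_disease, p_variant):
    """Bayes' rule: P(disease | variant) = P(variant | disease) * P(disease) / P(variant)."""
    return p_variant_given_disease * p_disease / p_variant


def penetrance_interval(carriers, cohort_size, p_disease, p_variant_given_disease,
                        draws=10_000, seed=0):
    """Median and 95% interval of penetrance, propagating carrier-frequency uncertainty.

    P(variant) is drawn from a Beta(carriers + 1, non-carriers + 1) posterior
    (uniform prior); the disease-side inputs are treated as fixed placeholders.
    """
    rng = random.Random(seed)
    alpha = carriers + 1
    beta = cohort_size - carriers + 1
    samples = sorted(
        penetrance(p_disease, p_variant_given_disease, rng.betavariate(alpha, beta))
        for _ in range(draws)
    )
    lo = samples[int(0.025 * draws)]
    hi = samples[int(0.975 * draws) - 1]
    return statistics.median(samples), (lo, hi)
```

For example, `penetrance_interval(140, 61019, p_disease, p_variant_given_disease)` reuses the cohort counts reported above; `p_disease` and `p_variant_given_disease` would have to come from external estimates.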