ABSTRACT
BACKGROUND: Pulmonary hypertension (PH) is life-threatening and often diagnosed late in its course. We aimed to evaluate whether an automated deep learning approach using electrocardiogram (ECG) data alone can detect PH and its clinically important subtypes. METHODS AND RESULTS: Adults with right heart catheterization or an echocardiogram within 90 days of an ECG at the University of California, San Francisco (2012-2019) were retrospectively identified as PH or non-PH. A deep convolutional neural network was trained on patients' 12-lead ECG voltage data. Patients were divided into training, development, and test sets in a ratio of 7:1:2. Overall, 5016 patients with PH and 19,454 patients without PH were used in the study. The mean age at the time of ECG was 62.29 ± 17.58 years and 49.88% were female. The mean interval between ECG and right heart catheterization or echocardiogram was 3.66 and 2.23 days for patients with PH and patients without PH, respectively. In the test dataset, the model achieved an area under the receiver operating characteristic curve, sensitivity, and specificity of 0.89, 0.79, and 0.84, respectively, to detect PH; 0.91, 0.83, and 0.84 to detect precapillary PH; 0.88, 0.81, and 0.81 to detect pulmonary arterial hypertension; and 0.80, 0.73, and 0.76 to detect group 3 PH. We additionally applied the trained model to ECGs from participants in the test dataset that were obtained up to 2 years before diagnosis of PH; the area under the receiver operating characteristic curve was 0.79 or greater. CONCLUSIONS: A deep learning ECG algorithm can detect PH and PH subtypes around the time of diagnosis and can detect PH using ECGs that were recorded up to 2 years before right heart catheterization/echocardiogram diagnosis. This approach has the potential to decrease diagnostic delays in PH.
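The 7:1:2 division described above can be illustrated with a minimal sketch. It assumes the split is random and performed on unique patient identifiers (the abstract says only that patients were divided); the function name `split_patients` and the integer IDs are hypothetical.

```python
import random

def split_patients(patient_ids, ratios=(7, 1, 2), seed=0):
    """Shuffle unique patient IDs and cut them into train/dev/test by ratio."""
    ids = sorted(set(patient_ids))        # de-duplicate: each patient lands in exactly one set
    random.Random(seed).shuffle(ids)      # seeded shuffle so the split is reproducible
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_dev = len(ids) * ratios[1] // total
    train = ids[:n_train]
    dev = ids[n_train:n_train + n_dev]
    test = ids[n_train + n_dev:]          # the remainder goes to the test set
    return train, dev, test

# Hypothetical integer IDs standing in for 5016 + 19,454 = 24,470 patients.
train, dev, test = split_patients(range(5016 + 19454))
print(len(train), len(dev), len(test))
```

Splitting on patient IDs rather than on individual ECGs keeps recordings from one patient out of more than one set, which would otherwise inflate test performance.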
Subject(s)
Deep Learning, Heart Failure, Pulmonary Hypertension, Adult, Humans, Female, Male, Pulmonary Hypertension/diagnosis, Retrospective Studies, Electrocardiography/methods
Subject(s)
Coronavirus Infections/epidemiology, Exercise, Viral Pneumonia/epidemiology, COVID-19, Coronavirus Infections/psychology, Exercise/psychology, Global Health/statistics & numerical data, Humans, Pandemics/statistics & numerical data, Viral Pneumonia/psychology
ABSTRACT
Chest pain is a common clinical complaint for which myocardial injury is the primary concern and is associated with significant morbidity and mortality. To aid providers' decision-making, we aimed to analyze the electrocardiogram (ECG) using a deep convolutional neural network (CNN) to predict serum troponin I (TnI) from ECGs. We developed a CNN using 64,728 ECGs from 32,479 patients who underwent ECG within 2 h prior to a serum TnI laboratory result at the University of California, San Francisco (UCSF). In our primary analysis, we classified patients into groups of TnI < 0.02 or ≥ 0.02 µg/L using 12-lead ECGs. This was repeated with an alternative threshold of 1.0 µg/L and with single-lead ECG inputs. We also performed multiclass prediction for a set of serum troponin ranges. Finally, we tested the CNN in a cohort of patients selected for coronary angiography, including 3038 ECGs from 672 patients. Cohort patients were 49.0% female, 42.8% white, and 59.3% (19,283) never had a positive TnI value (≥ 0.02 µg/L). CNNs accurately predicted elevated TnI, both at a threshold of 0.02 µg/L (AUC = 0.783, 95% CI 0.780-0.786) and at a threshold of 1.0 µg/L (AUC = 0.802, 0.795-0.809). Models using single-lead ECG data achieved significantly lower accuracy, with AUCs ranging from 0.740 to 0.773 with variation by lead. Accuracy of the multi-class model was lower for intermediate TnI value-ranges. Our models performed similarly on the cohort of patients who underwent coronary angiography. Biomarker-defined myocardial injury can be predicted by CNNs from 12-lead and single-lead ECGs.
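As a reminder of what the reported AUC values measure, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The scores and labels are made-up toy values, not study data, and the function name `auc` is illustrative.

```python
def auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: label 1 = elevated troponin, 0 = not elevated (hypothetical values).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc(toy_scores, toy_labels))
```

The pairwise loop is quadratic in the number of cases, so it suits small illustrations only; a rank-based formulation gives the same value for large datasets.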
Subject(s)
Deep Learning, Heart Injuries, Humans, Female, Male, Troponin I, Area Under Curve, Biomarkers, Electrocardiography, Heart Injuries/diagnosis
ABSTRACT
Importance: Understanding left ventricular ejection fraction (LVEF) during coronary angiography can assist in disease management. Objective: To develop an automated approach to predict LVEF from left coronary angiograms. Design, Setting, and Participants: This was a cross-sectional study with external validation using patient data from December 12, 2012, to December 31, 2019, from the University of California, San Francisco (UCSF). Data were randomly split into training, development, and test data sets. External validation data were obtained from the University of Ottawa Heart Institute. Included in the analysis were all patients 18 years or older who received a coronary angiogram and transthoracic echocardiogram (TTE) within 3 months before or 1 month after the angiogram. Exposure: A video-based deep neural network (DNN) called CathEF was used to discriminate (binary) reduced LVEF (≤40%) and to predict (continuous) LVEF percentage from standard angiogram videos of the left coronary artery. Guided class-discriminative gradient class activation mapping (GradCAM) was applied to visualize pixels in angiograms that contributed most to DNN LVEF prediction. Results: A total of 4042 adult angiograms with corresponding TTE LVEF from 3679 UCSF patients were included in the analysis. Mean (SD) patient age was 64.3 (13.3) years, and 2212 patients were male (65%). In the UCSF test data set (n = 813), the video-based DNN discriminated (binary) reduced LVEF (≤40%) with an area under the receiver operating characteristic curve (AUROC) of 0.911 (95% CI, 0.887-0.934); diagnostic odds ratio for reduced LVEF was 22.7 (95% CI, 14.0-37.0). DNN-predicted continuous LVEF had a mean absolute error (MAE) of 8.5% (95% CI, 8.1%-9.0%) compared with TTE LVEF. Although DNN-predicted continuous LVEF differed 5% or less compared with TTE LVEF in 38.0% (309 of 813) of test data set studies, differences greater than 15% were observed in 15.2% (124 of 813). 
In external validation (n = 776), the video-based DNN discriminated (binary) reduced LVEF (≤40%) with an AUROC of 0.906 (95% CI, 0.881-0.931), and DNN-predicted continuous LVEF had an MAE of 7.0% (95% CI, 6.6%-7.4%). The video-based DNN tended to overestimate low LVEFs and underestimate high LVEFs. Its performance was consistent across sex, body mass index, low estimated glomerular filtration rate (≤45), presence of acute coronary syndromes, obstructive coronary artery disease, and left ventricular hypertrophy. Conclusions and Relevance: This cross-sectional study represents an early demonstration of estimating LVEF from standard angiogram videos of the left coronary artery using video-based DNNs. Further research can improve accuracy and reduce the variability of DNNs to maximize their clinical utility.
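Two of the metrics reported above have simple definitions that a short sketch can make concrete: the diagnostic odds ratio, (TP × TN) / (FP × FN), and the mean absolute error between predicted and reference LVEF. The confusion-matrix counts and LVEF values below are hypothetical toy numbers, not taken from the study.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Odds of a positive test in diseased patients divided by the odds in non-diseased patients."""
    return (tp * tn) / (fp * fn)

def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired predictions and reference values."""
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

# Hypothetical counts for a binary reduced-LVEF classifier.
dor = diagnostic_odds_ratio(tp=40, fp=10, fn=10, tn=90)

# Hypothetical predicted vs echocardiographic LVEF, in percentage points.
mae = mean_absolute_error([35, 50, 62, 28], [30, 55, 60, 45])
print(dor, mae)
```

Because LVEF is itself a percentage, an MAE of 8.5% is read here as 8.5 percentage points of ejection fraction rather than a relative error; that reading is an interpretation of the abstract's wording.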
Subject(s)
Left Ventricular Dysfunction, Left Ventricular Function, Adult, Humans, Male, Middle Aged, Female, Left Ventricular Function/physiology, Coronary Angiography, Stroke Volume/physiology, Artificial Intelligence, Left Ventricular Dysfunction/diagnostic imaging, Cross-Sectional Studies, Algorithms
ABSTRACT
Coronary angiography is the primary procedure for diagnosis and management decisions in coronary artery disease (CAD), but ad-hoc visual assessment of angiograms has high variability. Here we report a fully automated approach to interpret angiographic coronary artery stenosis from standard coronary angiograms. Using 13,843 angiographic studies from 11,972 adult patients at University of California, San Francisco (UCSF), between April 1, 2008 and December 31, 2019, we train neural networks to accomplish four sequential tasks necessary for automatic coronary artery stenosis localization and estimation. Algorithms are internally validated against criterion-standard labels for each task in hold-out test datasets. Algorithms are then externally validated in real-world angiograms from the University of Ottawa Heart Institute (UOHI) and also retrained using quantitative coronary angiography (QCA) data from the Montreal Heart Institute (MHI) core lab. The CathAI system achieves state-of-the-art performance across all tasks on unselected, real-world angiograms. Positive predictive value, sensitivity and F1 score are all ≥90% to identify projection angle and ≥93% for left/right coronary artery angiogram detection. To predict obstructive CAD stenosis (≥70%), CathAI exhibits an AUC of 0.862 (95% CI: 0.843-0.880). In UOHI external validation, CathAI achieves an AUC of 0.869 (95% CI: 0.830-0.907) to predict obstructive CAD. In the MHI QCA dataset, CathAI achieves an AUC of 0.775 (95% CI: 0.594-0.955) after retraining. In conclusion, multiple purpose-built neural networks can function in sequence to accomplish automated analysis of real-world angiograms, which could increase standardization and reproducibility in angiographic coronary stenosis assessment.
ABSTRACT
Background: It remains difficult to definitively distinguish supraventricular tachycardia (SVT) mechanisms using a 12-lead electrocardiogram (ECG) alone. Machine learning may identify visually imperceptible changes on 12-lead ECGs and may improve the ability to determine SVT mechanisms. Objective: We sought to develop a convolutional neural network (CNN) that identifies the SVT mechanism according to the gold standard of SVT ablation and to compare CNN performance against experienced electrophysiologists among patients with atrioventricular nodal re-entrant tachycardia (AVNRT), atrioventricular reciprocating tachycardia (AVRT), and atrial tachycardia (AT). Methods: All patients with 12-lead surface ECGs recorded during sinus rhythm and SVT who underwent successful SVT ablation from 2013 to 2020 were included. A CNN was trained using data from 1505 surface ECGs that were split into 1287 training and 218 test ECG datasets. We compared the CNN performance against independent adjudication by 2 experienced cardiac electrophysiologists on the test dataset. Results: Our dataset comprised 1505 ECGs (368 AVNRT, 304 AVRT, 95 AT, and 738 sinus rhythm) from 725 patients. The CNN areas under the receiver-operating characteristic curve for AVNRT, AVRT, and AT were 0.909, 0.867, and 0.817, respectively. When fixing the specificity of the CNN to the electrophysiologist adjudicators' specificity, the CNN identified all SVT classes with higher sensitivity: (1) AVNRT (91.7% vs 65.9%), (2) AVRT (78.4% vs 63.6%), and (3) AT (61.5% vs 50.0%). Conclusion: A CNN can be trained to differentiate SVT mechanisms from surface 12-lead ECGs with high overall performance, achieving similar performance to experienced electrophysiologists at fixed specificities.
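Comparing a model with human readers "at fixed specificity" can be sketched as follows: pick the lowest score threshold at which the model's specificity reaches the readers' specificity, then read off the model's sensitivity at that threshold. This is a minimal binary (one class versus the rest) illustration with made-up scores; the abstract does not describe the exact procedure used, and the function name is hypothetical.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (threshold, sensitivity, specificity) at the lowest threshold
    whose specificity is at least the target. Predict positive when score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Specificity never decreases as the threshold rises, so the first
    # threshold that meets the target also gives the highest sensitivity.
    for thr in sorted(set(scores)):
        spec = sum(n < thr for n in neg) / len(neg)
        if spec >= target_specificity:
            sens = sum(p >= thr for p in pos) / len(pos)
            return thr, sens, spec
    # No observed score meets the target: call everything negative.
    return float("inf"), 0.0, 1.0

# Hypothetical model scores for one SVT class versus the rest.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(sensitivity_at_specificity(toy_scores, toy_labels, 0.75))
print(sensitivity_at_specificity(toy_scores, toy_labels, 1.0))
```

Matching specificity first makes the sensitivity figures directly comparable, since a model can otherwise gain sensitivity simply by accepting more false positives.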
ABSTRACT
BACKGROUND: Mitral valve prolapse (MVP) is a common valvulopathy, with a subset of patients developing sudden cardiac death or cardiac arrest. Complex ventricular ectopy (ComVE) is a marker of arrhythmic risk associated with myocardial fibrosis and increased mortality in MVP. OBJECTIVES: The authors sought to evaluate whether electrocardiogram (ECG)-based machine learning can identify MVP at risk for ComVE, death and/or myocardial fibrosis on cardiac magnetic resonance (CMR) imaging. METHODS: A deep convolutional neural network (CNN) was trained to detect ComVE using 6,916 12-lead ECGs from 569 MVP patients from the University of California-San Francisco between 2012 and 2020. A separate CNN was trained to detect late gadolinium enhancement (LGE) using 1,369 ECGs from 87 MVP patients with contrast CMR. RESULTS: The prevalence of ComVE was 28% (160/569). The area under the receiver operating characteristic curve (AUC) of the CNN to detect ComVE was 0.80 (95% CI: 0.77-0.83) and remained high after excluding patients with moderate-severe mitral regurgitation [0.80 (95% CI: 0.77-0.83)] or bileaflet MVP [0.81 (95% CI: 0.76-0.85)]. AUC to detect all-cause mortality was 0.82 (95% CI: 0.77-0.87). ECG segments relevant to ComVE prediction were related to ventricular depolarization/repolarization (early-mid ST-segment and QRS from V1, V3, and III). LGE in the papillary muscles or basal inferolateral wall was present in 24% of patients with available CMR; AUC for detection of LGE was 0.75 (95% CI: 0.68-0.82). CONCLUSIONS: CNN-analyzed 12-lead ECGs can detect MVP at risk for ventricular arrhythmias, death and/or fibrosis and can identify novel ECG correlates of arrhythmic risk. ECG-based CNNs may help select those MVP patients requiring closer follow-up and/or a CMR.
ABSTRACT
BACKGROUND: Artificial intelligence (AI) applied to 12-lead electrocardiograms (ECGs) can detect hypertrophic cardiomyopathy (HCM). OBJECTIVES: The purpose of this study was to determine if AI-enhanced ECG (AI-ECG) can track longitudinal therapeutic response and changes in cardiac structure, function, or hemodynamics in obstructive HCM during mavacamten treatment. METHODS: We applied 2 independently developed AI-ECG algorithms (University of California-San Francisco [UCSF] and Mayo Clinic) to serial ECGs (n = 216) from the phase 2 PIONEER-OLE trial of mavacamten for symptomatic obstructive HCM (n = 13 patients, mean age 57.8 years, 69.2% male). Control ECGs from 2,600 age- and sex-matched individuals without HCM were obtained. AI-ECG output was correlated longitudinally to echocardiographic and laboratory metrics of mavacamten treatment response. RESULTS: In the validation cohorts, both algorithms exhibited similar performance for HCM diagnosis, and exhibited mean HCM score decreases during mavacamten treatment: patient-level score reduction ranged from approximately 0.80 to 0.45 for the Mayo algorithm and 0.70 to 0.35 for the UCSF algorithm; 11 of 13 patients demonstrated absolute score reduction from start to end of follow-up for both algorithms. HCM scores were significantly associated with other HCM-relevant parameters, including left ventricular outflow tract gradient at rest, postexercise, and with Valsalva, and NT-proBNP level, independent of age and sex (all P < 0.01). For both algorithms, the strongest longitudinal correlation was between AI-ECG HCM score and left ventricular outflow tract gradient postexercise (slope estimate: UCSF 0.70 [95% CI: 0.45-0.96], P < 0.0001; Mayo 0.40 [95% CI: 0.11-0.68], P = 0.007). CONCLUSIONS: AI-ECG analysis longitudinally correlated with changes in echocardiographic and laboratory markers during mavacamten treatment in obstructive HCM. These results provide early evidence for a potential paradigm for monitoring HCM therapeutic response.
ABSTRACT
BACKGROUND: Valvular heart disease is an important contributor to cardiovascular morbidity and mortality and remains underdiagnosed. Deep learning analysis of electrocardiography (ECG) may be useful in detecting aortic stenosis (AS), aortic regurgitation (AR), and mitral regurgitation (MR). OBJECTIVES: This study aimed to develop ECG deep learning algorithms to identify moderate or severe AS, AR, and MR alone and in combination. METHODS: A total of 77,163 patients undergoing ECG within 1 year before echocardiography from 2005-2021 were identified and split into train (n = 43,165), validation (n = 12,950), and test sets (n = 21,048; 7.8% with any of AS, AR, or MR). Model performance was assessed using area under the receiver-operating characteristic (AU-ROC) and precision-recall curves. Outside validation was conducted on an independent data set. Test accuracy was modeled using different disease prevalence levels to simulate screening efficacy using the deep learning model. RESULTS: The deep learning algorithm model accuracy was as follows: AS (AU-ROC: 0.88), AR (AU-ROC: 0.77), MR (AU-ROC: 0.83), and any of AS, AR, or MR (AU-ROC: 0.84; sensitivity 78%, specificity 73%) with similar accuracy in external validation. In screening program modeling, test characteristics were dependent on underlying prevalence and selected sensitivity levels. At a prevalence of 7.8%, the positive and negative predictive values were 20% and 97.6%, respectively. CONCLUSIONS: Deep learning analysis of the ECG can accurately detect AS, AR, and MR in this multicenter cohort and may serve as the basis for the development of a valvular heart disease screening program.
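The dependence of predictive values on prevalence follows from Bayes' rule, and a short sketch shows it. Plugging in the reported sensitivity (78%), specificity (73%), and prevalence (7.8%) for any of AS, AR, or MR gives values close to the reported 20% and 97.6%; the small gap in the negative predictive value is presumably due to rounding of the published inputs. The function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    tp = sensitivity * prevalence                # diseased and test-positive
    fn = (1 - sensitivity) * prevalence          # diseased but test-negative
    tn = specificity * (1 - prevalence)          # healthy and test-negative
    fp = (1 - specificity) * (1 - prevalence)    # healthy but test-positive
    return tp / (tp + fp), tn / (tn + fn)

ppv, npv = predictive_values(sensitivity=0.78, specificity=0.73, prevalence=0.078)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")

# Same test characteristics at a lower, purely hypothetical prevalence of 2%.
ppv_low, npv_low = predictive_values(0.78, 0.73, 0.02)
print(f"PPV = {ppv_low:.3f}, NPV = {npv_low:.3f}")
```

At lower prevalence the positive predictive value falls while the negative predictive value rises, which is why screening performance has to be modeled separately for each target population.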
Subject(s)
Aortic Valve Insufficiency, Aortic Valve Stenosis, Deep Learning, Heart Valve Diseases, Mitral Valve Insufficiency, Aortic Valve Insufficiency/diagnosis, Aortic Valve Stenosis/diagnosis, Electrocardiography, Heart Valve Diseases/diagnosis, Heart Valve Diseases/epidemiology, Humans, Mitral Valve Insufficiency/diagnosis, Mitral Valve Insufficiency/epidemiology
ABSTRACT
Importance: Millions of clinicians rely daily on automated preliminary electrocardiogram (ECG) interpretation. Critical comparisons of machine learning-based automated analysis against clinically accepted standards of care are lacking. Objective: To use readily available 12-lead ECG data to train and apply an explainability technique to a convolutional neural network (CNN) that achieves high performance against clinical standards of care. Design, Setting, and Participants: This cross-sectional study was conducted using data from January 1, 2003, to December 31, 2018. Data were obtained in a commonly available 12-lead ECG format from a single-center tertiary care institution. All patients aged 18 years or older who received ECGs at the University of California, San Francisco, were included, yielding a total of 365 009 patients. Data were analyzed from January 1, 2019, to March 2, 2021. Exposures: A CNN was trained to predict the presence of 38 diagnostic classes in 5 categories from 12-lead ECG data. A CNN explainability technique called LIME (Local Interpretable Model-Agnostic Explanations) was used to visualize ECG segments contributing to CNN diagnoses. Main Outcomes and Measures: Area under the receiver operating characteristic curve (AUC), sensitivity, and specificity were calculated for the CNN in the holdout test data set against cardiologist clinical diagnoses. For a second validation, 3 electrophysiologists provided consensus committee diagnoses against which the CNN, cardiologist clinical diagnosis, and MUSE (GE Healthcare) automated analysis performance was compared using the F1 score; AUC, sensitivity, and specificity were also calculated for the CNN against the consensus committee. Results: A total of 992 748 ECGs from 365 009 adult patients (mean [SD] age, 56.2 [17.6] years; 183 600 women [50.3%]; and 175 277 White patients [48.0%]) were included in the analysis. 
In 91 440 test data set ECGs, the CNN demonstrated an AUC of at least 0.960 for 32 of 38 classes (84.2%). Against the consensus committee diagnoses, the CNN had higher frequency-weighted mean F1 scores than both cardiologists and MUSE in all 5 categories (CNN frequency-weighted F1 score for rhythm, 0.812; conduction, 0.729; chamber diagnosis, 0.598; infarct, 0.674; and other diagnosis, 0.875). For 32 of 38 classes (84.2%), the CNN had AUCs of at least 0.910 and demonstrated comparable F1 scores and higher sensitivity than cardiologists, except for atrial fibrillation (CNN F1 score, 0.847 vs cardiologist F1 score, 0.881), junctional rhythm (0.526 vs 0.727), premature ventricular complex (0.786 vs 0.800), and Wolff-Parkinson-White (0.800 vs 0.842). Compared with MUSE, the CNN had higher F1 scores for all classes except supraventricular tachycardia (CNN F1 score, 0.696 vs MUSE F1 score, 0.714). The LIME technique highlighted physiologically relevant ECG segments. Conclusions and Relevance: The results of this cross-sectional study suggest that readily available ECG data can be used to train a CNN algorithm to achieve comparable performance to clinical cardiologists and exceed the performance of MUSE automated analysis for most diagnoses, with some exceptions. The LIME explainability technique applied to CNNs highlights physiologically relevant ECG segments that contribute to the CNN's diagnoses.
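A frequency-weighted mean F1 score can be sketched as a per-class F1 averaged with weights proportional to how often each class truly occurs. The abstract does not define the weighting, so the use of true-instance counts (TP + FN) as weights is an assumption, and the class names and counts below are hypothetical.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def frequency_weighted_f1(per_class_counts):
    """per_class_counts maps class name -> (tp, fp, fn).
    Each class's F1 is weighted by its number of true instances (tp + fn)."""
    total_support = sum(tp + fn for tp, _, fn in per_class_counts.values())
    if total_support == 0:
        raise ValueError("no true instances in any class")
    weighted_sum = sum(
        f1_score(tp, fp, fn) * (tp + fn)
        for tp, fp, fn in per_class_counts.values()
    )
    return weighted_sum / total_support

# Hypothetical counts for two diagnostic classes within one category.
toy_counts = {
    "class_a": (24, 6, 6),   # F1 = 0.8, 30 true instances
    "class_b": (6, 4, 4),    # F1 = 0.6, 10 true instances
}
print(frequency_weighted_f1(toy_counts))
```

Weighting by frequency keeps rare classes from dominating a category-level summary, at the cost of hiding weak performance on those rare classes; per-class scores still need to be reported alongside it.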