Search | VHL Regional Portal

On the stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.

Helmer, Markus; Warrington, Shaun; Mohammadi-Nejad, Ali-Reza; Ji, Jie Lisa; Howell, Amber; Rosand, Benjamin; Anticevic, Alan; Sotiropoulos, Stamatios N; Murray, John D.

Commun Biol ; 7(1): 217, 2024 Feb 21.

Article in English | MEDLINE | ID: mdl-38383808

ABSTRACT

Associations between datasets can be discovered through multivariate methods like Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). A requisite property for interpretability and generalizability of CCA/PLS associations is stability of their feature patterns. However, stability of CCA/PLS in high-dimensional datasets is questionable, as found in empirical characterizations. To study these issues systematically, we developed a generative modeling framework to simulate synthetic datasets. We found that when sample size is relatively small, but comparable to typical studies, CCA/PLS associations are highly unstable and inaccurate, both in their magnitude and, importantly, in the feature pattern underlying the association. We confirmed these trends across two neuroimaging modalities and in independent datasets with n ≈ 1000 and n = 20,000, and found that only the latter comprised sufficient observations for stable mappings between imaging-derived and behavioral features. We further developed a power calculator to provide sample sizes required for stability and reliability of multivariate analyses. Collectively, we characterize how to limit detrimental effects of overfitting on CCA/PLS stability, and provide recommendations for future studies.

Subject(s)

Algorithms, Canonical Correlation Analysis, Least-Squares Analysis, Reproducibility of Results, Brain/diagnostic imaging
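
The small-sample instability described in the abstract above can be illustrated with a minimal sketch. This is not the authors' generative modeling framework or their power calculator. It assumes a deliberately simplified special case: a single behavioral variable `y`, for which the first canonical correlation equals the multiple correlation from least-squares regression of `y` on the features. All names and parameter values (`simulate`, `fit_weights`, `experiment`, `p=10`, `r_true=0.3`, the sample sizes) are hypothetical choices made for illustration.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (M[i][k] - sum(M[i][j] * w[j] for j in range(i + 1, k))) / M[i][i]
    return w


def simulate(n, p, r_true, rng):
    """X: n x p independent N(0, 1) features; y correlates with feature 0 at r_true."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    noise_sd = (1 - r_true ** 2) ** 0.5
    y = [r_true * row[0] + noise_sd * rng.gauss(0, 1) for row in X]
    return X, y


def fit_weights(X, y):
    """Weights maximizing in-sample corr(Xw, y), via centered normal equations."""
    p = len(X[0])
    mx = [statistics.fmean([row[j] for row in X]) for j in range(p)]
    my = statistics.fmean(y)
    Xc = [[row[j] - mx[j] for j in range(p)] for row in X]
    yc = [v - my for v in y]
    XtX = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    return solve(XtX, Xty)


def scores(X, w):
    """Project each sample onto the weight vector."""
    return [sum(a * b for a, b in zip(row, w)) for row in X]


def experiment(n, p=10, r_true=0.3, reps=20, n_test=2000, seed=0):
    """Mean in-sample r, out-of-sample r, and weight cosine similarity over reps."""
    rng = random.Random(seed)
    r_in, r_out, cos = [], [], []
    for _ in range(reps):
        X, y = simulate(n, p, r_true, rng)
        X_test, y_test = simulate(n_test, p, r_true, rng)
        w = fit_weights(X, y)
        r_in.append(pearson(scores(X, w), y))
        r_out.append(pearson(scores(X_test, w), y_test))
        # The true weight vector is (1, 0, ..., 0), so the cosine is w[0] / ||w||.
        cos.append(w[0] / sum(v * v for v in w) ** 0.5)
    return statistics.fmean(r_in), statistics.fmean(r_out), statistics.fmean(cos)


small = experiment(n=30)
large = experiment(n=3000, reps=5)
print("n=30:   in-sample r=%.2f, out-of-sample r=%.2f, weight cosine=%.2f" % small)
print("n=3000: in-sample r=%.2f, out-of-sample r=%.2f, weight cosine=%.2f" % large)
```

In this toy setting the in-sample association at the small sample size should come out well above the true value of 0.3, while the out-of-sample association and the similarity between estimated and true weights are both lower. At the larger sample size all three should settle near their true values. Real CCA/PLS with many variables on both sides is more involved, so this only conveys the direction of the effect.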

Automated Identification of Heart Failure with Reduced Ejection Fraction using Deep Learning-based Natural Language Processing.

Nargesi, Arash A; Adejumo, Philip; Dhingra, Lovedeep; Rosand, Benjamin; Hengartner, Astrid; Coppi, Andreas; Benigeri, Simon; Sen, Sounok; Ahmad, Tariq; Nadkarni, Girish N; Lin, Zhenqiu; Ahmad, Faraz S; Krumholz, Harlan M; Khera, Rohan.

medRxiv ; 2023 Sep 11.

Article in English | MEDLINE | ID: mdl-37745445

ABSTRACT

Background: The lack of automated tools for measuring care quality has limited the implementation of a national program to assess and improve guideline-directed care in heart failure with reduced ejection fraction (HFrEF). A key challenge for constructing such a tool has been an accurate, accessible approach for identifying patients with HFrEF at hospital discharge, an opportunity to evaluate and improve the quality of care. Methods: We developed a novel deep learning-based language model for identifying patients with HFrEF from discharge summaries using a semi-supervised learning framework. For this purpose, hospitalizations with heart failure at Yale New Haven Hospital (YNHH) between 2015 and 2019 were labeled as HFrEF if the left ventricular ejection fraction was under 40% on antecedent echocardiography. The model was internally validated with model-based net reclassification improvement (NRI) assessed against chart-based diagnosis codes. We externally validated the model on discharge summaries from hospitalizations with heart failure at Northwestern Medicine, community hospitals of Yale New Haven Health in Connecticut and Rhode Island, and the publicly accessible MIMIC-III database, confirmed with chart abstraction. Results: A total of 13,251 notes from 5,392 unique individuals (mean age 73 ± 14 years, 48% female), including 2,487 patients with HFrEF (46.1%), were used for model development (train/held-out test: 70/30%). The deep learning model achieved an area under the receiver operating characteristic curve (AUROC) of 0.97 and an area under the precision-recall curve (AUPRC) of 0.97 in detecting HFrEF on the held-out set. In external validation, the model had high performance in identifying HFrEF from discharge summaries with AUROC 0.94 and AUPRC 0.91 on 19,242 notes from Northwestern Medicine, AUROC 0.95 and AUPRC 0.96 on 139 manually abstracted notes from Yale community hospitals, and AUROC 0.91 and AUPRC 0.92 on 146 manually reviewed notes from MIMIC-III. Model-based prediction of HFrEF corresponded to an overall NRI of 60.2 ± 1.9% compared with the chart diagnosis codes (p-value < 0.001) and an increase in AUROC from 0.61 [95% CI: 0.60-0.63] to 0.91 [95% CI: 0.90-0.92]. Conclusions: We developed and externally validated a deep learning language model that automatically identifies HFrEF from clinical notes with high precision and accuracy, representing a key element in automating quality assessment and improvement for individuals with HFrEF.
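
The two headline metrics in the abstract above, AUROC and NRI, can be made concrete with a short sketch. It assumes the pairwise-ranking definition of AUROC and the standard two-category form of the NRI; the abstract does not state exactly how the study's "model-based" NRI was computed, so this may differ from the authors' procedure. The function names and the toy data are hypothetical and are not taken from the study.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def two_category_nri(labels, old_pred, new_pred):
    """Net reclassification improvement of binary new_pred over binary old_pred."""
    events = [(o, n) for y, o, n in zip(labels, old_pred, new_pred) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(labels, old_pred, new_pred) if y == 0]
    up_e = sum(n > o for o, n in events) / len(events)
    down_e = sum(n < o for o, n in events) / len(events)
    up_ne = sum(n > o for o, n in nonevents) / len(nonevents)
    down_ne = sum(n < o for o, n in nonevents) / len(nonevents)
    # Correct moves are "up" for events and "down" for non-events.
    return (up_e - down_e) + (down_ne - up_ne)


# Toy data (hypothetical): 1 = HFrEF, 0 = not HFrEF.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_score = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.1, 0.05]
code_flag = [1, 0, 0, 0, 0, 1, 0, 0]  # stand-in for a diagnosis-code classifier
model_flag = [int(s >= 0.5) for s in model_score]  # 0.5 is an arbitrary threshold

model_auc = auroc(labels, model_score)
code_auc = auroc(labels, code_flag)
nri = two_category_nri(labels, code_flag, model_flag)
print(model_auc, code_auc, nri)
```

For binary classifiers this NRI equals the gain in sensitivity plus the gain in specificity, which is why a model that corrects both missed cases and false flags can score well above zero.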
