On the stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.

Helmer, Markus; Warrington, Shaun; Mohammadi-Nejad, Ali-Reza; Ji, Jie Lisa; Howell, Amber; Rosand, Benjamin; Anticevic, Alan; Sotiropoulos, Stamatios N; Murray, John D

Affiliation

Helmer M; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.
Warrington S; Manifest Technologies, New Haven, CT, 06510, USA.
Mohammadi-Nejad AR; Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom.
Ji JL; Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom.
Howell A; National Institute for Health Research (NIHR) Nottingham Biomedical Research Ctr, Queens Medical Ctr, Nottingham, United Kingdom.
Rosand B; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.
Anticevic A; Manifest Technologies, New Haven, CT, 06510, USA.
Sotiropoulos SN; Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, 06511, USA.
Murray JD; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.

Commun Biol ; 7(1): 217, 2024 Feb 21.

Article in En | MEDLINE | ID: mdl-38383808

ABSTRACT

Associations between datasets can be discovered through multivariate methods like Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). A requisite property for interpretability and generalizability of CCA/PLS associations is stability of their feature patterns. However, stability of CCA/PLS in high-dimensional datasets is questionable, as found in empirical characterizations. To study these issues systematically, we developed a generative modeling framework to simulate synthetic datasets. We found that when sample size is relatively small, but comparable to typical studies, CCA/PLS associations are highly unstable and inaccurate, both in their magnitude and, importantly, in the feature pattern underlying the association. We confirmed these trends across two neuroimaging modalities and in independent datasets with n ≈ 1000 and n = 20,000, and found that only the latter comprised sufficient observations for stable mappings between imaging-derived and behavioral features. We further developed a power calculator to provide sample sizes required for stability and reliability of multivariate analyses. Collectively, we characterize how to limit detrimental effects of overfitting on CCA/PLS stability, and provide recommendations for future studies.
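
The overestimation of association strength at small sample sizes can be illustrated with a minimal simulation. The sketch below is not the authors' generative modeling framework or power calculator; it is a simplified stand-in that assumes standard-normal features, a single true canonical correlation between the first feature of each dataset, and illustrative values for the feature counts and the true correlation. All function names (`simulate`, `first_canonical_correlation`, `mean_estimate`) are hypothetical.

```python
import numpy as np


def simulate(n, px, py, r_true, rng):
    """Draw n samples; only X[:, 0] and Y[:, 0] are correlated (at r_true)."""
    X = rng.standard_normal((n, px))
    Y = rng.standard_normal((n, py))
    # Mix X[:, 0] into Y[:, 0] so their population correlation is r_true
    # while Y[:, 0] keeps unit variance.
    Y[:, 0] = r_true * X[:, 0] + np.sqrt(1.0 - r_true**2) * Y[:, 0]
    return X, Y


def first_canonical_correlation(X, Y):
    """In-sample first canonical correlation via QR + SVD (needs n > px, py)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the sample canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(s[0], 1.0))


def mean_estimate(n, px=10, py=10, r_true=0.3, reps=50, seed=0):
    """Average in-sample estimate over repeated synthetic datasets of size n."""
    rng = np.random.default_rng(seed)
    estimates = [
        first_canonical_correlation(*simulate(n, px, py, r_true, rng))
        for _ in range(reps)
    ]
    return float(np.mean(estimates))


if __name__ == "__main__":
    # Assumed illustrative sample sizes, not those analyzed in the paper.
    for n in (50, 200, 1000, 5000):
        print(n, round(mean_estimate(n), 3))
```

Under these assumptions, the in-sample estimate should sit well above the true value of 0.3 when n is small relative to the number of features and approach it as n grows. The sketch only covers the magnitude of the association; assessing the stability of the feature pattern would additionally require comparing estimated weight vectors across repeated draws.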

Subjects

Algorithms; Canonical Correlation Analysis; Least-Squares Analysis; Reproducibility of Results; Brain/diagnostic imaging

Full text: 1 Database: MEDLINE Main subject: Algorithms / Canonical Correlation Analysis Language: En Publication year: 2024 Document type: Article