Canonical correlation analysis for multi-omics: Application to cross-cohort analysis.

Jiang, Min-Zhi; Aguet, François; Ardlie, Kristin; Chen, Jiawen; Cornell, Elaine; Cruz, Dan; Durda, Peter; Gabriel, Stacey B; Gerszten, Robert E; Guo, Xiuqing; Johnson, Craig W; Kasela, Silva; Lange, Leslie A; Lappalainen, Tuuli; Liu, Yongmei; Reiner, Alex P; Smith, Josh; Sofer, Tamar; Taylor, Kent D; Tracy, Russell P; VanDenBerg, David J; Wilson, James G; Rich, Stephen S; Rotter, Jerome I; Love, Michael I; Raffield, Laura M; Li, Yun


Affiliations

Jiang MZ; Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
Aguet F; Illumina Artificial Intelligence Laboratory, Illumina, Inc., San Diego, California, United States of America.
Ardlie K; The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.
Chen J; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
Cornell E; Laboratory for Clinical Biochemistry Research, University of Vermont, Burlington, Vermont, United States of America.
Cruz D; Department of Medicine, Cardiology, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
Durda P; Department of Pathology & Laboratory Medicine, University of Vermont, Colchester, Vermont, United States of America.
Gabriel SB; The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.
Gerszten RE; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
Guo X; Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America.
Johnson CW; Department of Biostatistics, University of Washington at Seattle, Seattle, Washington, United States of America.
Kasela S; New York Genome Center, New York, New York, United States of America.
Lange LA; Department of Epidemiology, Department of Medicine, Division of Biomedical Informatics and Personalized Medicine, Lifecourse Epidemiology of Adiposity & Diabetes Center, Aurora, Colorado, United States of America.
Lappalainen T; New York Genome Center, New York, New York, United States of America.
Liu Y; Department of Medicine, Cardiology and Neurology, Duke University Medical Center, Durham, North Carolina, United States of America.
Reiner AP; Department of Epidemiology, University of Washington, Seattle, Washington, United States of America.
Smith J; Northwest Genomic Center, University of Washington, Seattle, Washington, United States of America.
Sofer T; Department of Biostatistics, Harvard Medical School, Medicine-Brigham and Women's Hospital, Boston, Massachusetts, United States of America.
Taylor KD; Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America.
Tracy RP; Department of Pathology & Laboratory Medicine, University of Vermont, Colchester, Vermont, United States of America.
VanDenBerg DJ; Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America.
Wilson JG; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America.
Rich SS; Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, United States of America.
Rotter JI; Department of Pediatrics, Genomic Outcomes, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, University of California at Los Angeles, Torrance, California, United States of America.
Love MI; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
Raffield LM; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.
Li Y; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America.

PLoS Genet; 19(5): e1010517, May 2023.

Article in English | MEDLINE | ID: mdl-37216410

ABSTRACT


Integrative approaches that simultaneously model multi-omics data have gained increasing popularity because they provide holistic systems-biology views of multiple or all components in a biological system of interest. Canonical correlation analysis (CCA) is a correlation-based integrative method designed to extract latent features shared between multiple assays by finding the linear combinations of features, referred to as canonical variables (CVs), within each assay that achieve maximal across-assay correlation. Although widely acknowledged as a powerful approach for multi-omics data, CCA has not been systematically applied to multi-omics data in large cohort studies, which have only recently become available. Here, we adapted sparse multiple CCA (SMCCA), a widely used derivative of CCA, to proteomics and methylomics data from the Multi-Ethnic Study of Atherosclerosis (MESA) and Jackson Heart Study (JHS). To tackle challenges encountered when applying SMCCA to MESA and JHS, our adaptations include the incorporation of the Gram-Schmidt (GS) algorithm with SMCCA to improve orthogonality among CVs, and the development of Sparse Supervised Multiple CCA (SSMCCA) to allow supervised integration analysis for more than two assays. Effective application of SMCCA to the two real datasets reveals important findings. Applying our SMCCA-GS to MESA and JHS, we identified strong associations between blood cell counts and protein abundance, suggesting that adjustment for blood cell composition should be considered in protein-based association studies. Importantly, CVs obtained from two independent cohorts also demonstrate transferability across the cohorts. For example, proteomic CVs learned from JHS, when transferred to MESA, explain similar amounts of blood cell count phenotypic variance in both cohorts: 39.0% to 50.0% of the variation in JHS and 38.9% to 49.1% in MESA. Similar transferability was observed for other omics-CV-trait pairs. This suggests that biologically meaningful and cohort-agnostic variation is captured by CVs. We anticipate that applying our SMCCA-GS and SSMCCA to various cohorts would help identify cohort-agnostic, biologically meaningful relationships between multi-omics data and phenotypic traits.
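
The abstract mentions using the Gram-Schmidt algorithm to improve orthogonality among CVs. The generic idea can be illustrated with the minimal sketch below. It is not the authors' SMCCA-GS implementation: the function name `gram_schmidt`, the `tol` parameter, and the simulated `scores` matrix (samples by CVs) are assumptions made purely for illustration, and the sketch only shows how each successive column can be made orthogonal to the ones before it.

```python
import numpy as np


def gram_schmidt(cvs, tol=1e-12):
    """Orthogonalize the columns of `cvs` (samples x n_cvs), in order.

    Hypothetical helper: a modified Gram-Schmidt pass that removes from
    each column its projection onto every previously processed column.
    """
    cvs = np.asarray(cvs, dtype=float)
    out = np.zeros_like(cvs)
    for j in range(cvs.shape[1]):
        v = cvs[:, j].copy()
        for i in range(j):
            q = out[:, i]
            denom = q @ q
            if denom > tol:  # skip columns that are numerically zero
                v -= (v @ q) / denom * q
        out[:, j] = v
    return out


# Simulated CV scores (assumed data): the second column is deliberately
# correlated with the first, mimicking non-orthogonal CVs.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 3))
scores[:, 1] += 0.8 * scores[:, 0]

ortho = gram_schmidt(scores)
gram = ortho.T @ ortho  # off-diagonal entries should be close to zero
```

In this sketch the first column is left unchanged and later columns are adjusted, so the ordering of CVs matters; whether the published method orthogonalizes scores or weights, and at which step, should be checked against the full text.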

Subjects

Canonical Correlation Analysis; Proteomics; Humans; Proteomics/methods; Multiomics; Cohort Studies


Full text: 1 Database: MEDLINE Main subject: Proteomics / Canonical Correlation Analysis Language: En Publication year: 2023 Document type: Article
