A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data.

Lin, Nan; Zhu, Yun; Fan, Ruzong; Xiong, Momiao

Affiliation

Lin N; Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America.
Zhu Y; Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, United States of America.
Fan R; Biostatistics and Bioinformatics Branch (BBB), Division of Intramural Population Health Research (DIPHR), Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, United States of America.
Xiong M; Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America.

PLoS Comput Biol; 13(10): e1005788, 2017 Oct.

Article in English | MEDLINE | ID: mdl-29040274

ABSTRACT

Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information for understanding the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (the number of phenotypes and genetic variants jointly analyzed) and depth (the hierarchical structure of phenotypes and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representations and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method for association analysis, referred to as quadratically regularized functional CCA (QRFCCA), which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has much higher power than ten competing statistics while maintaining appropriate type 1 error rates. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. Using QRFCCA, we identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits. The results show that the QRFCCA substantially outperforms the ten other statistics.
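
As a rough illustration of how two of the three ingredients named in the abstract can fit together, the sketch below reduces a genotype matrix with a quadratically regularized low-rank factorization and then computes canonical correlations between the reduced scores and a phenotype matrix. It is not the authors' implementation: the functional data analysis step is omitted, the function names (`quad_reg_pca`, `canonical_correlations`), the parameters `k`, `gamma` and `ridge`, and the simulated data are all assumptions made for illustration. The factorization uses the standard closed form for the problem min ||X - AB||_F^2 + gamma (||A||_F^2 + ||B||_F^2), which shrinks the leading singular values by gamma.

```python
import numpy as np


def quad_reg_pca(X, k, gamma):
    """Rank-k quadratically regularized factorization of column-centered X.

    Hypothetical helper: returns scores A (n x k) and loadings B (k x p)
    obtained by shrinking the top-k singular values by gamma.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Square root of the shrunken singular values, split evenly between A and B.
    root = np.sqrt(np.maximum(s[:k] - gamma, 0.0))
    A = U[:, :k] * root
    B = root[:, None] * Vt[:k]
    return A, B


def canonical_correlations(X, Y, ridge=1e-8):
    """Sample canonical correlations between X (n x p) and Y (n x q).

    A small ridge term (an assumption, for numerical stability) keeps the
    covariance matrices positive definite.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance: Lx^{-1} Sxy Ly^{-T}.
    W = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, W.T).T
    rho = np.linalg.svd(M, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)


# Simulated toy data (not from the study): one latent factor drives both blocks.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))
G = z @ rng.normal(size=(1, 30)) + 0.1 * rng.normal(size=(n, 30))  # "genotype" block
P = z @ rng.normal(size=(1, 5)) + 0.1 * rng.normal(size=(n, 5))    # "phenotype" block

scores, loadings = quad_reg_pca(G, k=3, gamma=1.0)
print(canonical_correlations(scores, P))
```

In this toy setting the leading canonical correlation should be large because both blocks share the latent factor; turning such correlations into an association test would require a test statistic and null distribution, which this record does not specify.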

Subjects

Computational Biology/methods; Databases, Genetic; Genetic Pleiotropy/genetics; Models, Statistical; Sequence Analysis, DNA; Algorithms; Computer Simulation; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; Humans; Phenotype; Polymorphism, Single Nucleotide/genetics; Principal Component Analysis

Full text: 1 Database: MEDLINE Main subject: Models, Statistical / Sequence Analysis, DNA / Computational Biology / Databases, Genetic / Genetic Pleiotropy Type of study: Prognostic_studies / Risk_factors_studies Limits: Humans Language: En Year of publication: 2017 Document type: Article