DOT: Gene-set analysis by combining decorrelated association statistics.

Vsevolozhskaya, Olga A; Shi, Min; Hu, Fengjiao; Zaykin, Dmitri V

Affiliation

Vsevolozhskaya OA; Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, Kentucky, United States of America.
Shi M; Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America.
Hu F; Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America.
Zaykin DV; Biostatistics and Computational Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America.

PLoS Comput Biol; 16(4): e1007819, 2020 Apr.

Article in English | MEDLINE | ID: mdl-32287273

ABSTRACT

Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges for accessing and sharing genotype-phenotype datasets, including the cost of data management and difficulties in consolidating records across research groups. These issues make methods based on SNP-level summary statistics particularly appealing. The most common way of combining statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating scores prior to their addition, resulting in remarkable power gains in the situations most commonly encountered in practice, namely, under heterogeneity of effect sizes and diversity among pairwise LD values. In these situations, the power of the traditional test, based on the added squared scores, quickly reaches a ceiling as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in any additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields a steady gain in power. We present theoretical and computational analyses of both approaches and reveal the causes behind the sometimes dramatic differences in their respective powers. We showcase DOT by analyzing breast cancer and cleft lip data, in which our method strengthened levels of previously reported associations and implied the possibility of multiple new alleles that jointly confer disease risk.
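
The contrast between adding raw squared scores and adding decorrelated squared scores can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the SNP-level scores are Z-scores with a known, positive-definite LD correlation matrix, and that under the null hypothesis the sum of the decorrelated squared scores can be referred to a chi-square distribution with as many degrees of freedom as there are SNPs. For simplicity the sketch decorrelates with a Cholesky factor instead of the symmetric orthogonal transformation named in the abstract; both give the same sum of squares, although the individual decorrelated scores differ. All function names are hypothetical.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with L * L^T == sigma (sigma must be positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = sigma[i][i] - s
                if d <= 0.0:
                    raise ValueError("LD matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def decorrelate(z, sigma):
    """Solve L y = z by forward substitution.

    Assumption: z has covariance sigma under the null, so y has identity covariance.
    This is a Cholesky whitening, swapped in for the symmetric transformation.
    """
    L = cholesky(sigma)
    y = []
    for i in range(len(z)):
        s = sum(L[i][k] * y[k] for k in range(i))
        y.append((z[i] - s) / L[i][i])
    return y


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        total = sum(h ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-h) * total
    total = sum(h ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


def naive_statistic(z):
    """Traditional combination: sum of raw squared scores (no decorrelation)."""
    return sum(v * v for v in z)


def decorrelated_test(z, sigma):
    """Sum of squared decorrelated scores and its chi-square p-value (df = number of SNPs)."""
    y = decorrelate(z, sigma)
    stat = sum(v * v for v in y)
    return stat, chi2_sf(stat, len(z))


# Hypothetical toy input: two SNPs in positive LD with scores of opposite sign.
z_scores = [2.0, -1.0]
ld = [[1.0, 0.8], [0.8, 1.0]]
naive = naive_statistic(z_scores)
dot_stat, dot_p = decorrelated_test(z_scores, ld)
```

In this toy input the raw sum of squares is 5.0, while the decorrelated sum is about 22.78, because scores of opposite sign are unexpected for two positively correlated SNPs. The p-value of the raw sum is not computed here: its null distribution is a weighted sum of chi-square variables and needs a separate approximation.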

Subjects

Computational Biology/methods; Genome-Wide Association Study/methods; Linkage Disequilibrium/genetics; Polymorphism, Single Nucleotide/genetics; Breast Neoplasms/genetics; Cleft Lip/genetics; Female; Genetic Markers/genetics; Genetic Predisposition to Disease/genetics; Humans; Models, Statistical

Full text: 1 | Database: MEDLINE | Main subject: Linkage Disequilibrium / Computational Biology / Polymorphism, Single Nucleotide / Genome-Wide Association Study | Study type: Prognostic_studies / Risk_factors_studies | Limits: Female / Humans | Language: En | Journal: PLoS Comput Biol | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Publication year: 2020 | Document type: Article | Affiliation country: United States
