gammaMAXT: a fast multiple-testing correction algorithm.

Lishout, François Van; Gadaleta, Francesco; Moore, Jason H; Wehenkel, Louis; Steen, Kristel Van

Affiliation

Lishout FV; Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium ; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium.
Gadaleta F; Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium ; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium.
Moore JH; Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, 19104-6021 PA USA.
Wehenkel L; Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium ; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium.
Steen KV; Systems and Modeling Unit, Montefiore Institute, University of Liège, Allée de la découverte 10, Liège, 4000 Belgium ; Bioinformatics and Modeling, GIGA-R, Avenue de l'Hôpital 1, Sart-Tilman, 4000 Belgium.

BioData Min ; 8: 36, 2015.

Article in En | MEDLINE | ID: mdl-26594243

ABSTRACT

BACKGROUND: The MaxT algorithm is a significance-testing procedure that controls the family-wise error rate (FWER) during simultaneous hypothesis testing. However, its computing time and memory requirements are proportional to the number of hypotheses investigated. The memory issue was solved in 2013 by Van Lishout's implementation of MaxT, which makes memory usage independent of the size of the dataset. That implementation is available in MBMDR-3.0.3, software that effectively identifies genetic interactions for a variety of SNP-SNP based epistasis models. It nevertheless turned out to be poorly suited to genome-wide interaction analysis studies because of its prohibitive computational burden.

RESULTS: In this work we introduce gammaMAXT, a novel implementation of the MaxT algorithm for multiple-testing correction. The algorithm is implemented in the software MBMDR-4.2.2, as part of the MB-MDR framework, to screen for SNP-SNP, SNP-environment or SNP-SNP-environment interactions at a genome-wide level. We show that, in the absence of interaction effects, test statistics produced by the MB-MDR methodology follow a mixture distribution with a point mass at zero and a shifted gamma distribution for the top 10% of the strictly positive values. We show that the gammaMAXT algorithm has power comparable to MaxT and maintains the FWER, while requiring less time and fewer computational resources. We analyze a dataset composed of 10^6 SNPs and 1000 individuals within one day on a 256-core computer cluster. The same analysis would take about 10^4 times longer with MBMDR-3.0.3.

CONCLUSIONS: These results are promising for future GWAIs. More broadly, the proposed gammaMAXT algorithm offers a general approach to significance assessment and multiple testing, applicable to any context that requires performing hundreds of thousands of tests. It offers new perspectives for fast and efficient permutation-based significance assessment in large-scale (integrated) omics studies.
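For readers unfamiliar with permutation-based FWER control, the following is a minimal sketch of the classical single-step maxT procedure. It is not the MB-MDR or gammaMAXT implementation. The test statistic (absolute difference in group means), the function names `abs_mean_diff` and `maxt_single_step`, and the toy data are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def abs_mean_diff(values, labels):
    """Toy test statistic (assumed): |mean(cases) - mean(controls)|."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return abs(mean(cases) - mean(controls))


def maxt_single_step(features, labels, n_perm=199, seed=0):
    """Return (observed statistics, single-step maxT adjusted p-values)."""
    rng = random.Random(seed)
    observed = [abs_mean_diff(f, labels) for f in features]
    exceed = [0] * len(features)
    permuted = list(labels)
    for _ in range(n_perm):
        rng.shuffle(permuted)  # break any label/feature association
        # Every hypothesis is re-tested for each permutation, so the cost
        # grows with the number of hypotheses.
        max_null = max(abs_mean_diff(f, permuted) for f in features)
        for j, t in enumerate(observed):
            if max_null >= t:
                exceed[j] += 1
    # The +1 terms count the observed labelling as one of the permutations.
    adjusted = [(1 + c) / (1 + n_perm) for c in exceed]
    return observed, adjusted


# Toy data (assumed): 20 individuals, 5 features, only feature 0 is associated.
data_rng = random.Random(42)
labels = [1] * 10 + [0] * 10
features = [[5.0 * lab + data_rng.gauss(0, 1) for lab in labels]]
features += [[data_rng.gauss(0, 1) for _ in labels] for _ in range(4)]

observed, adjusted = maxt_single_step(features, labels)
for j, (t, p) in enumerate(zip(observed, adjusted)):
    print(f"feature {j}: statistic={t:.3f}, adjusted p={p:.3f}")
```

Comparing each observed statistic with the permutation distribution of the maximum over all hypotheses is what gives FWER control. The inner pass over all features for every permutation illustrates why the cost scales with the number of hypotheses, which is the burden the abstract says gammaMAXT reduces.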

Keywords

3-order interactions; Algorithmic; Gamma distribution; Genome-wide interaction studies; MaxT; Multiple testing; SNP-environment interactions

Full text: 1 | Collections: 01-international | Database: MEDLINE | Study type: Prognostic_studies | Language: En | Publication year: 2015 | Document type: Article
