Search | Biblioteca Virtual em Saúde

Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure.

Abegaz, Fentaw; Van Lishout, François; Mahachie John, Jestinah M; Chiachoompu, Kridsadakorn; Bhardwaj, Archana; Duroux, Diane; Gusareva, Elena S; Wei, Zhi; Hakonarson, Hakon; Van Steen, Kristel.

BioData Min ; 14(1): 16, 2021 Feb 19.

Article in English | MEDLINE | ID: mdl-33608043

ABSTRACT

BACKGROUND: In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication, and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular one is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction association epistasis studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. METHODS: To identify causal variants in synergy, to improve interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. RESULTS: Simulation results comparing the performance of various approaches show that in the presence of population structure MBMDR-PC and MBMDR-PG consistently better control type I error rate at the nominal level than MBMDR-GC. Moreover, our proposed three methods of population structure correction outperform MDR-SP in terms of statistical power. CONCLUSION: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.
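The abstract above names Principal Component Analysis as the most popular protection against confounding by population structure. A minimal sketch of that general idea follows, assuming a single principal component and a quantitative trait: the trait is regressed on the sample scores of the first component and the residuals are carried forward. This is not the authors' MBMDR-PC procedure, whose details the abstract does not give; the function names `top_pc_scores` and `residualize` are hypothetical.

```python
import random
from math import sqrt


def top_pc_scores(genotypes, n_iter=200, seed=0):
    """Sample scores on the first principal component, via power iteration.

    genotypes: list of samples, each a list of allele counts (0/1/2) per SNP.
    Assumes at least one SNP varies across samples.
    """
    n, m = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    # Centre each SNP column so the component captures variation, not the mean.
    X = [[row[j] - col_means[j] for j in range(m)] for row in genotypes]
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(m)]  # starting SNP loadings
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in X]  # X v
        v = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def residualize(trait, covariate):
    """Residuals of the simple linear regression trait ~ covariate."""
    n = len(trait)
    mean_y, mean_c = sum(trait) / n, sum(covariate) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (y - mean_y) for c, y in zip(covariate, trait))
    beta = sxy / sxx
    return [y - mean_y - beta * (c - mean_c) for c, y in zip(covariate, trait)]


# Toy data (an assumption): two subpopulations with different allele
# frequencies and different trait means.
genotypes = [[0, 0, 1, 0]] * 10 + [[2, 2, 1, 2]] * 10
trait = [1.0] * 10 + [3.0] * 10
scores = top_pc_scores(genotypes)
adjusted_trait = residualize(trait, scores)
```

In this toy example the first component separates the two subpopulations, so the trait difference that is explained purely by ancestry disappears from `adjusted_trait`. Real analyses usually adjust for several components rather than one.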

Model-Based Multifactor Dimensionality Reduction for Rare Variant Association Analysis.

Fouladi, Ramouna; Bessonov, Kyrylo; Van Lishout, François; Van Steen, Kristel.

Hum Hered ; 79(3-4): 157-67, 2015.

Article in English | MEDLINE | ID: mdl-26201701

ABSTRACT

Genome-wide association studies have revealed a vast amount of common loci associated to human complex diseases. Still, a large proportion of heritability remains unexplained. The extent to which rare genetic variants (RVs) are able to explain a relevant portion of the genetic heritability for complex traits leaves room for several debates and paves the way to the collection of RV databases and the development of novel analytic tools to analyze these. To date, several statistical methods have been proposed to uncover the association of RVs with complex diseases, but none of them is the clear winner in all possible scenarios of study design and assumed underlying disease model. The latter may involve differences in the distributions of effect sizes, proportions of causal variants, and ratios of protective to deleterious variants at distinct regions throughout the genome. Therefore, there is a need for robust scalable methods with acceptable overall performance in terms of power and type I error under various realistic scenarios. In this paper, we propose a novel RV association analysis strategy, which satisfies several of the desired properties that a RV analysis tool should exhibit.

Subjects

Genetic Variation, Genome-Wide Association Study, Genetic Models, Multifactor Dimensionality Reduction, Human Chromosomes Pair 4/genetics, Humans

A robustness study of parametric and non-parametric tests in model-based multifactor dimensionality reduction for epistasis detection.

Mahachie John, Jestinah M; Van Lishout, François; Gusareva, Elena S; Van Steen, Kristel.

BioData Min ; 6(1): 9, 2013 Apr 25.

Article in English | MEDLINE | ID: mdl-23618370

ABSTRACT

BACKGROUND: Applying a statistical method implies identifying underlying (model) assumptions and checking their validity in the particular context. One of these contexts is association modeling for epistasis detection. Here, depending on the technique used, violation of model assumptions may result in increased type I error, power loss, or biased parameter estimates. Remedial measures for violated underlying conditions or assumptions include data transformation or selecting a more relaxed modeling or testing strategy. Model-Based Multifactor Dimensionality Reduction (MB-MDR) for epistasis detection relies on association testing between a trait and a factor consisting of multilocus genotype information. For quantitative traits, the framework is essentially Analysis of Variance (ANOVA) that decomposes the variability in the trait amongst the different factors. In this study, we assess through simulations, the cumulative effect of deviations from normality and homoscedasticity on the overall performance of quantitative Model-Based Multifactor Dimensionality Reduction (MB-MDR) to detect 2-locus epistasis signals in the absence of main effects. METHODOLOGY: Our simulation study focuses on pure epistasis models with varying degrees of genetic influence on a quantitative trait. Conditional on a multilocus genotype, we consider quantitative trait distributions that are normal, chi-square or Student's t with constant or non-constant phenotypic variances. All data are analyzed with MB-MDR using the built-in Student's t-test for association, as well as a novel MB-MDR implementation based on Welch's t-test. Traits are either left untransformed or are transformed into new traits via logarithmic, standardization or rank-based transformations, prior to MB-MDR modeling. RESULTS: Our simulation results show that MB-MDR controls type I error and false positive rates irrespective of the association test considered. 
Empirically-based MB-MDR power estimates for MB-MDR with Welch's t-tests are generally lower than those for MB-MDR with Student's t-tests. Trait transformations involving ranks tend to lead to increased power compared to the other considered data transformations. CONCLUSIONS: When performing MB-MDR screening for gene-gene interactions with quantitative traits, we recommend to first rank-transform traits to normality and then to apply MB-MDR modeling with Student's t-tests as internal tests for association.
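The recommendation above (rank-transform the trait to normality, then test with Student's t) can be sketched as follows. This is an illustrative sketch only, not the MB-MDR software: the function names are hypothetical, the `offset=0.5` convention for turning ranks into probabilities is one common choice and an assumption here, and the two groups stand for, say, individuals in one multilocus genotype cell versus the rest.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def rank_to_normal(values, offset=0.5):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]


def student_t(a, b):
    """Two-sample t statistic with a pooled variance (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def welch_t(a, b):
    """Two-sample t statistic without the equal-variance assumption."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Toy skewed trait values (an assumption), transformed before testing.
raw_trait = [0.2, 0.5, 0.9, 1.4, 2.0, 3.5, 7.1, 12.8]
z = rank_to_normal(raw_trait)
t_student = student_t(z[:3], z[3:])
t_welch = welch_t(z[:3], z[3:])
```

The two statistics coincide when both groups have the same size and differ otherwise, which is where the choice of internal test starts to matter.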

An efficient algorithm to perform multiple testing in epistasis screening.

Van Lishout, François; Mahachie John, Jestinah M; Gusareva, Elena S; Urrea, Victor; Cleynen, Isabelle; Théâtre, Emilie; Charloteaux, Benoît; Calle, Malu Luz; Wehenkel, Louis; Van Steen, Kristel.

BMC Bioinformatics ; 14: 138, 2013 Apr 24.

Article in English | MEDLINE | ID: mdl-23617239

ABSTRACT

BACKGROUND: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved translation efforts of statistical epistasis to biological epistasis and attempts to integrate different omics information sources into the epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the amount of hypothesis tests. Gene-gene interaction studies will require a memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent from the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn's disease. RESULTS: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions with a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, containing each four Quad-Core AMD Opteron(tm) Processor 2352 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn's disease (CD) data. 
CONCLUSIONS: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interactions problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn's disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
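To illustrate how a maxT-style permutation correction can run in memory that does not grow with the number of SNP pairs, here is a minimal sketch of a simplified single-step variant. It is not the algorithm implemented in MBMDR-3.0.3, which the abstract does not describe in detail. The idea shown: stream over all pairs, keep only the `n_top` largest observed statistics, and for each permutation keep only the maximum statistic. The function names and the toy interaction statistic are hypothetical.

```python
import heapq
import random
from itertools import combinations


def toy_statistic(g1, g2, trait):
    """Hypothetical stand-in for an interaction test statistic:
    absolute co-variation between the trait and the genotype product."""
    n = len(trait)
    prod = [a * b for a, b in zip(g1, g2)]
    mean_p, mean_t = sum(prod) / n, sum(trait) / n
    return abs(sum((p - mean_p) * (t - mean_t) for p, t in zip(prod, trait)))


def maxt_single_step(snps, trait, n_top=5, n_perm=99, seed=0):
    """Single-step maxT adjusted p-values for the n_top best SNP pairs.

    snps: list of SNPs, each a list of genotypes per individual.
    Extra memory is O(n_top): no per-pair array of statistics is stored.
    """
    rng = random.Random(seed)

    def all_stats(y):
        # Generator: statistics are produced one pair at a time.
        for i, j in combinations(range(len(snps)), 2):
            yield toy_statistic(snps[i], snps[j], y), (i, j)

    # Observed data: keep only the n_top largest statistics, in descending order.
    observed = heapq.nlargest(n_top, all_stats(trait))
    exceed = [0] * len(observed)
    permuted = list(trait)
    for _ in range(n_perm):
        rng.shuffle(permuted)
        perm_max = max(stat for stat, _ in all_stats(permuted))
        for k, (stat, _) in enumerate(observed):
            if perm_max >= stat:
                exceed[k] += 1
    # Adjusted p-value: share of permutations whose maximum reaches the statistic.
    return [(pair, stat, (exceed[k] + 1) / (n_perm + 1))
            for k, (stat, pair) in enumerate(observed)]
```

The trade-off is computation for memory: every permutation recomputes all pair statistics. For the 100,000 SNPs mentioned in the abstract there are 4,999,950,000 pairs, which is why storing per-pair values for every permutation becomes impractical.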

Subjects

Algorithms, Genetic Epistasis, Software, Crohn Disease/genetics, Genetic Association Studies, Humans, Genetic Models, Single Nucleotide Polymorphism

Model-Based Multifactor Dimensionality Reduction to detect epistasis for quantitative traits in the presence of error-free and noisy data.

Mahachie John, Jestinah M; Van Lishout, François; Van Steen, Kristel.

Eur J Hum Genet ; 19(6): 696-703, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21407267

ABSTRACT

Detecting gene-gene interactions or epistasis in studies of human complex diseases is a big challenge in the area of epidemiology. To address this problem, several methods have been developed, mainly in the context of data dimensionality reduction. One of these methods, Model-Based Multifactor Dimensionality Reduction, has so far mainly been applied to case-control studies. In this study, we evaluate the power of Model-Based Multifactor Dimensionality Reduction for quantitative traits to detect gene-gene interactions (epistasis) in the presence of error-free and noisy data. Considered sources of error are genotyping errors, missing genotypes, phenotypic mixtures and genetic heterogeneity. Our simulation study encompasses a variety of settings with varying minor allele frequencies and genetic variance for different epistasis models. On each simulated data, we have performed Model-Based Multifactor Dimensionality Reduction in two ways: with and without adjustment for main effects of (known) functional SNPs. In line with binary trait counterparts, our simulations show that the power is lowest in the presence of phenotypic mixtures or genetic heterogeneity compared to scenarios with missing genotypes or genotyping errors. In addition, empirical power estimates reduce even further with main effects corrections, but at the same time, false-positive percentages are reduced as well. In conclusion, phenotypic mixtures and genetic heterogeneity remain challenging for epistasis detection, and careful thought must be given to the way important lower-order effects are accounted for in the analysis.

Subjects

Genetic Epistasis, Genetic Models, Multifactor Dimensionality Reduction/methods, Multifactorial Inheritance/genetics, Algorithms, Case-Control Studies, Computer Simulation, Gene Frequency, Genetic Heterogeneity, Genotype, Humans, Single Nucleotide Polymorphism, Heritable Quantitative Trait, Software

Comparison of genetic association strategies in the presence of rare alleles.

Mahachie John, Jestinah M; Cattaert, Tom; De Lobel, Lizzy; Van Lishout, François; Empain, Alain; Van Steen, Kristel.

BMC Proc ; 5 Suppl 9: S32, 2011 Nov 29.

Article in English | MEDLINE | ID: mdl-22373505

ABSTRACT

In the quest for the missing heritability of most complex diseases, rare variants have received increased attention. Advances in large-scale sequencing have led to a shift from the common disease/common variant hypothesis to the common disease/rare variant hypothesis or have at least reopened the debate about the relevance and importance of rare variants for gene discoveries. The investigation of modeling and testing approaches to identify significant disease/rare variant associations is in full motion. New methods to better deal with parameter estimation instabilities, convergence problems, or multiple testing corrections in the presence of rare variants or effect modifiers of rare variants are in their infancy. Using a recently developed semiparametric strategy to detect causal variants, we investigate the performance of the model-based multifactor dimensionality reduction (MB-MDR) technique in terms of power and family-wise error rate (FWER) control in the presence of rare variants, using population-based and family-based data (FAM-MDR). We compare family-based results obtained from MB-MDR analyses to screening findings from a quantitative trait Pedigree-based association test (PBAT). Population-based data were further examined using penalized regression models. We restrict attention to all available single-nucleotide polymorphisms on chromosome 4 and consider Q1 as the outcome of interest. The considered family-based methods identified marker C4S4935 in the VEGFC gene with estimated power not exceeding 0.35 (FAM-MDR), when FWER was kept under control. The considered population-based methods gave rise to highly inflated FWERs (up to 90% for PBAT screening).

Model-based multifactor dimensionality reduction for detecting epistasis in case-control data in the presence of noise.

Cattaert, Tom; Calle, M Luz; Dudek, Scott M; Mahachie John, Jestinah M; Van Lishout, François; Urrea, Victor; Ritchie, Marylyn D; Van Steen, Kristel.

Ann Hum Genet ; 75(1): 78-89, 2011 Jan.

Article in English | MEDLINE | ID: mdl-21158747

ABSTRACT

Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspective, even using a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis, among them, the Multifactor Dimensionality Reduction Method (MDR) has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both nonparametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower order effects and important confounders, and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.
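The pooling step that the abstract refers to, where multilocus genotype cells are collapsed into two groups, can be sketched for a pair of SNPs and a case-control trait. This is a minimal sketch of the classic MDR labelling idea under stated assumptions: a cell is called high-risk when its case-to-control ratio exceeds a threshold, which defaults here to the overall case-to-control ratio. The function name `mdr_pool` is hypothetical, and the sketch does not reproduce how MB-MDR itself treats the cells.

```python
from collections import Counter


def mdr_pool(snp_a, snp_b, is_case, threshold=None):
    """Label each observed two-locus genotype cell as 'high' or 'low' risk.

    snp_a, snp_b: genotypes (0/1/2) per individual; is_case: truthy for cases.
    Assumes the sample contains at least one control.
    """
    cases, controls = Counter(), Counter()
    for ga, gb, case in zip(snp_a, snp_b, is_case):
        (cases if case else controls)[(ga, gb)] += 1
    if threshold is None:
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        # Cross-multiplied comparison avoids dividing by zero controls in a cell.
        high = cases[cell] > threshold * controls[cell]
        labels[cell] = "high" if high else "low"
    return labels


# Toy data (an assumption): six individuals, three cases and three controls.
labels = mdr_pool(
    snp_a=[0, 0, 1, 1, 2, 2],
    snp_b=[0, 0, 1, 1, 0, 0],
    is_case=[1, 1, 0, 0, 1, 0],
)
```

Because every cell must fall into one of only two groups, cells with weak or ambiguous evidence are forced to one side, which is the concern the abstract raises about highlighting epistatic effects.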

Subjects

Disease/genetics, Genetic Epistasis, Genetic Models, Multifactor Dimensionality Reduction, Case-Control Studies, Computer Simulation
