Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Testing for associations between loci and environmental gradients using latent factor mixed models.

Frichot, Eric; Schoville, Sean D; Bouchard, Guillaume; François, Olivier.

Mol Biol Evol ; 30(7): 1687-99, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23543094

ABSTRACT

Adaptation to local environments often occurs through natural selection acting on a large number of loci, each having a weak phenotypic effect. One way to detect these loci is to identify genetic polymorphisms that exhibit high correlation with environmental variables used as proxies for ecological pressures. Here, we propose new algorithms based on population genetics, ecological modeling, and statistical learning techniques to screen genomes for signatures of local adaptation. Implemented in the computer program "latent factor mixed model" (LFMM), these algorithms employ an approach in which population structure is introduced using unobserved variables. These fast and computationally efficient algorithms detect correlations between environmental and genetic variation while simultaneously inferring background levels of population structure. Comparing these new algorithms with related methods provides evidence that LFMM can efficiently estimate random effects due to population history and isolation-by-distance patterns when computing gene-environment correlations, and decrease the number of false-positive associations in genome scans. We then apply these models to plant and human genetic data, identifying several genes with functions related to development that exhibit strong correlations with climatic gradients.

Subjects

Physiological Adaptation/genetics, Population Genetics, Genetic Polymorphism, Genetic Selection/genetics, Algorithms, Ecology, Environment, Gene-Environment Interaction, Genetic Variation, Humans, Theoretical Models
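The LFMM abstract above describes testing loci against an environmental variable while accounting for population structure through unobserved (latent) variables. The sketch below is not the LFMM algorithm: it is a simpler two-step stand-in that first estimates K latent factors by singular value decomposition and then regresses each locus on the environmental variable with those factors as covariates, whereas LFMM infers both simultaneously. The function name, the choice of a t-statistic as the score, and the simulated data in the usage note are all assumptions made for illustration.

```python
import numpy as np

def env_association_scores(G, x, K):
    """Two-step stand-in for a latent factor model (hypothetical helper).

    G : (n, L) array of genotypes, one row per individual.
    x : (n,) array holding one environmental variable.
    K : number of latent factors used as structure covariates.
    Returns an (L,) array of t-statistics for the effect of x on each locus.
    """
    n, L = G.shape
    # Step 1: approximate population structure by the top K left singular
    # vectors of the column-centered genotype matrix.
    Gc = G - G.mean(axis=0)
    U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
    factors = U[:, :K]
    # Step 2: ordinary least squares for every locus at once, with an
    # intercept, the environmental variable, and the K factors.
    design = np.column_stack([np.ones(n), x, factors])
    coef, *_ = np.linalg.lstsq(design, G, rcond=None)
    resid = G - design @ coef
    dof = n - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    return coef[1] / se
```

As a usage example, one could simulate `G = U @ V + np.outer(x, effect) + noise` with a single nonzero entry in `effect` and check that the corresponding locus receives the largest absolute score. Because the factors are estimated before the regression, this sketch can lose power when the environmental variable is strongly confounded with structure, which is the situation the joint estimation described in the abstract is meant to address.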

Genome scan methods against more complex models: when and how much should we trust them?

de Villemereuil, Pierre; Frichot, Éric; Bazin, Éric; François, Olivier; Gaggiotti, Oscar E.

Mol Ecol ; 23(8): 2006-19, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24611968

ABSTRACT

The recent availability of next-generation sequencing (NGS) has made possible the use of dense genetic markers to identify regions of the genome that may be under the influence of selection. Several statistical methods have been developed recently for this purpose. Here, we present the results of an individual-based simulation study investigating the power and error rate of popular or recent genome scan methods: linear regression, Bayescan, BayEnv and LFMM. Contrary to previous studies, we focus on complex, hierarchical population structure and on polygenic selection. Additionally, we use a false discovery rate (FDR)-based framework, which provides a unified testing framework across frequentist and Bayesian methods. Finally, we investigate the influence of population allele frequencies versus individual genotype data specification for LFMM and the linear regression. The relative ranking between the methods is impacted by the consideration of polygenic selection, compared to a monogenic scenario. For strongly hierarchical scenarios with confounding effects between demography and environmental variables, the power of the methods can be very low. Except for one scenario, Bayescan exhibited moderate power and error rate. BayEnv performance was good under nonhierarchical scenarios, while LFMM provided the best compromise between power and error rate across scenarios. We found that it is possible to greatly reduce error rates by considering the results of all three methods when identifying outlier loci.

Subjects

Bayes Theorem, Population Genetics/methods, Genetic Models, Computer Simulation, Statistical Data Interpretation, Gene Frequency, Gene-Environment Interaction, Genotype, Linear Models, Single Nucleotide Polymorphism
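The genome-scan comparison above relies on a false discovery rate (FDR) framework. The abstract does not say which FDR procedure was used, so the sketch below shows the Benjamini-Hochberg step-up procedure purely as a common frequentist example; the function name and the default level of 0.05 are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha.

    Benjamini-Hochberg step-up rule: sort the m p-values, find the largest
    rank k with p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    # Indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank  # keep the largest rank that passes
    return sorted(order[:k_max])
```

For example, `benjamini_hochberg([0.6, 0.001, 0.03, 0.008])` rejects the loci at indices 1, 2 and 3. Note the step-up behavior: a p-value that fails its own threshold is still rejected if a larger p-value passes a later one.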

Fast and efficient estimation of individual ancestry coefficients.

Frichot, Eric; Mathieu, François; Trouillon, Théo; Bouchard, Guillaume; François, Olivier.

Genetics ; 196(4): 973-83, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24496008

ABSTRACT

Inference of individual ancestry coefficients, which is important for population genetic and association studies, is commonly performed using computer-intensive likelihood algorithms. With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. Reducing the computational burden of estimation algorithms remains, however, a major challenge. Here, we present a fast and efficient method for estimating individual ancestry coefficients based on sparse nonnegative matrix factorization algorithms. We implemented our method in the computer program sNMF and applied it to human and plant data sets. The performances of sNMF were then compared to the likelihood algorithm implemented in the computer program ADMIXTURE. Without loss of accuracy, sNMF computed estimates of ancestry coefficients with runtimes ~10-30 times shorter than those of ADMIXTURE.

Subjects

Algorithms, Population Genetics, Software, Computational Biology/methods, Gene Frequency, Genetic Association Studies, Genotype, Humans, Likelihood Functions, Plants/genetics, Population Groups, Time Factors
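The sNMF abstract above estimates ancestry coefficients through sparse nonnegative matrix factorization. The sketch below is not the sNMF program: it uses plain (non-sparse) multiplicative updates for the factorization X ≈ Q F, then rescales and row-normalizes Q so that each individual's coefficients sum to 1. The function name, the input encoding (a nonnegative matrix such as allele frequencies in [0, 1]), and the normalization step are all assumptions made for illustration.

```python
import numpy as np

def ancestry_nmf(X, K, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF sketch for ancestry-like coefficients (hypothetical helper).

    X : (n, L) nonnegative array, one row per individual.
    K : number of ancestral clusters.
    Returns (Q, F): Q is (n, K) with rows summing to 1, F is (K, L).
    """
    rng = np.random.default_rng(seed)
    n, L = X.shape
    Q = rng.random((n, K)) + eps
    F = rng.random((K, L)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared-error objective; they
        # keep every entry of Q and F nonnegative.
        Q *= (X @ F.T) / (Q @ F @ F.T + eps)
        F *= (Q.T @ X) / (Q.T @ Q @ F + eps)
    # Remove the scale ambiguity between Q and F: give each row of F a
    # maximum of 1 and move the scale into the matching column of Q.
    scale = np.maximum(F.max(axis=1), eps)
    F = F / scale[:, None]
    Q = Q * scale[None, :]
    # Report each individual's coefficients as proportions.
    Q = Q / np.maximum(Q.sum(axis=1, keepdims=True), eps)
    return Q, F
```

With two clearly differentiated groups of individuals and `K=2`, the rows of `Q` for the two groups should load on different columns. Results depend on the random initialization, so in practice several seeds would normally be compared.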

Correcting principal component maps for effects of spatial autocorrelation in population genetic data.

Frichot, Eric; Schoville, Sean; Bouchard, Guillaume; François, Olivier.

Front Genet ; 3: 254, 2012.

Article in English | MEDLINE | ID: mdl-23181073

ABSTRACT

In many species, spatial genetic variation displays patterns of "isolation-by-distance." Characterized by locally correlated allele frequencies, these patterns are known to create periodic shapes in geographic maps of principal components which confound signatures of specific migration events and influence interpretations of principal component analyses (PCA). In this study, we introduced models combining probabilistic PCA and kriging models to infer population genetic structure from genetic data while correcting for effects generated by spatial autocorrelation. The corresponding algorithms are based on singular value decomposition and low rank approximation of the genotypic data. As their complexity is close to that of PCA, these algorithms scale with the dimensions of the data. To illustrate the utility of these new models, we simulated isolation-by-distance patterns and broad-scale geographic variation using spatial coalescent models. Our methods remove the horseshoe patterns usually observed in PC maps and simplify interpretations of spatial genetic variation. We demonstrate our approach by analyzing single nucleotide polymorphism data from the Human Genome Diversity Panel, and provide comparisons with other recently introduced methods.
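The abstract above states that its algorithms build on singular value decomposition and low-rank approximation of the genotypic data. The sketch below shows only that building block, a rank-K approximation obtained by truncating the SVD of the column-centered matrix; it does not include the probabilistic PCA or kriging components, and the function name is an assumption.

```python
import numpy as np

def low_rank_approximation(G, K):
    """Rank-K approximation of a genotype matrix (hypothetical helper).

    G : (n, L) array, one row per individual.
    Returns (scores, approx): scores is (n, K), the principal component
    scores; approx is the (n, L) rank-K reconstruction of G.
    """
    col_means = G.mean(axis=0)
    # Thin SVD of the centered matrix; singular values come back sorted
    # in decreasing order, so the first K components are the largest.
    U, s, Vt = np.linalg.svd(G - col_means, full_matrices=False)
    scores = U[:, :K] * s[:K]
    approx = scores @ Vt[:K] + col_means
    return scores, approx
```

Plotting the columns of `scores` against sampling coordinates gives the kind of uncorrected PC map the abstract refers to.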
