1.
Stat Appl Genet Mol Biol ; 21(1), 2022 03 07.
Article in English | MEDLINE | ID: mdl-35245419

ABSTRACT

Association of phenotypes or exposures with genomic and epigenomic data faces important statistical challenges. One of these challenges is to account for variation due to unobserved confounding factors, such as individual ancestry or cell-type composition in tissues. This issue can be addressed with penalized latent factor regression models, where penalties are introduced to cope with the high dimensionality of the data. If a relatively small proportion of genomic or epigenomic markers correlate with the variable of interest, sparsity penalties may help to capture the relevant associations, but their improvement over non-sparse approaches has not yet been fully evaluated. Here, we present least-squares algorithms that jointly estimate effect sizes and confounding factors in sparse latent factor regression models. In simulated data, sparse latent factor regression models generally achieved higher statistical performance than other sparse methods, including the least absolute shrinkage and selection operator and a Bayesian sparse linear mixed model. In generative model simulations, their statistical performance was slightly lower than, though comparable to, that of non-sparse methods, but in simulations based on empirical data, sparse latent factor regression models were more robust to departures from the model than the non-sparse approaches. We applied sparse latent factor regression models to a genome-wide association study of a flowering trait in the plant Arabidopsis thaliana and to an epigenome-wide association study of smoking status in pregnant women. In both applications, sparse latent factor regression models facilitated the estimation of non-null effect sizes while overcoming multiple testing issues. The results were not only consistent with previous discoveries but also pinpointed new genes with functional annotations relevant to each application.


Subject(s)
Epigenome, Genome-Wide Association Study, Algorithms, Bayes Theorem, Female, Genome-Wide Association Study/methods, Humans, Least-Squares Analysis, Pregnancy
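To make "jointly estimate effect sizes and confounding factors" in the abstract above concrete, here is a minimal sketch assuming one centred exposure `x` and an objective 0.5·||Y − x bᵀ − W||² + lam·||b||₁ with rank(W) ≤ k. It alternates a truncated-SVD update of the confounder matrix W with a soft-thresholding (L1) update of the effect sizes b. This generic block-coordinate scheme was written for illustration and is not the authors' algorithm. The names `sparse_lfmm_sketch`, `lam` and the toy data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def rank_k_approx(A, k):
    """Best rank-k approximation of A in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def sparse_lfmm_sketch(Y, x, k, lam, n_iter=50):
    """Hypothetical sketch: minimise 0.5 * ||Y - x b^T - W||_F^2 + lam * ||b||_1
    over effect sizes b (length p) and a rank-k confounder matrix W (n x p),
    for one centred exposure vector x (length n) and markers Y (n x p)."""
    b = np.zeros(Y.shape[1])
    xx = x @ x
    for _ in range(n_iter):
        # Confounder step: rank-k fit of the part of Y not explained by x.
        W = rank_k_approx(Y - np.outer(x, b), k)
        # Effect-size step: with a single covariate, each marker's lasso
        # problem has a closed-form soft-thresholded solution.
        b = soft_threshold(x @ (Y - W), lam) / xx
    return b, W


# Toy data: 2 hidden confounders, 10 of 200 markers truly associated with x.
rng = np.random.default_rng(0)
n, p, k = 100, 200, 2
x = rng.normal(size=n)
x -= x.mean()
b_true = np.zeros(p)
b_true[:10] = 2.0
Y = np.outer(x, b_true) + rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
Y += 0.5 * rng.normal(size=(n, p))
b_hat, W_hat = sparse_lfmm_sketch(Y, x, k, lam=20.0)
```

The L1 penalty sets most null effects exactly to zero. A single-exposure model was chosen so that the effect-size step stays in closed form; several exposures would need an iterative lasso solver instead.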
2.
Mol Biol Evol ; 36(4): 852-860, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30657943

ABSTRACT

Gene-environment association (GEA) studies are essential to understand the past and ongoing adaptations of organisms to their environment, but those studies are complicated by confounding due to unobserved demographic factors. Although the confounding problem has recently received considerable attention, the proposed approaches do not scale to the high dimensionality of genomic data. Here, we present a new estimation method for latent factor mixed models (LFMMs) implemented in an upgraded version of the corresponding computer program. We developed a least-squares approach for confounder estimation that provides a single framework for several categories of genomic data, not restricted to genotypes. The new algorithm is several orders of magnitude faster than existing GEA approaches and than our previous version of the LFMM program. In addition, the new method outperforms other fast approaches based on principal component or surrogate variable analysis. We illustrate the use of the program with analyses of the 1000 Genomes Project data set, leading to new findings on the adaptation of humans to their environment, and with analyses of DNA methylation profiles, providing insights into how tobacco consumption could affect DNA methylation in patients with rheumatoid arthritis. Software availability: Software is available in the R package lfmm at https://bcm-uga.github.io/lfmm/.


Subject(s)
Adaptation, Biological/genetics, Algorithms, Genome-Wide Association Study, Software, Arthritis, Rheumatoid/genetics, Climate, DNA Methylation, Gene-Environment Interaction, Humans, Smoking/adverse effects
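The least-squares idea in the abstract above can be illustrated with a ridge-penalised latent factor objective, ||Y − X Bᵀ − W||² + lam·||B||² with rank(W) ≤ k, where the rank-k matrix W stands in for unobserved confounders. The sketch below minimises it by simple alternation. It is an assumption made for exposition and is not claimed to match the estimator or interface of the R package lfmm. The function name `ridge_lfmm_sketch`, the value of `lam` and the toy data are hypothetical.

```python
import numpy as np


def rank_k_approx(A, k):
    """Best rank-k approximation of A in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def ridge_lfmm_sketch(Y, X, k, lam=1e-3, n_iter=200, tol=1e-10):
    """Hypothetical sketch: minimise ||Y - X B^T - W||_F^2 + lam * ||B||_F^2
    for markers Y (n x p), variables X (n x d), effects B (p x d) and a
    rank-k latent matrix W (n x p) standing in for unobserved confounders."""
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)     # ridge normal-equation matrix (fixed)
    Bt = np.zeros((d, Y.shape[1]))    # B transposed, shape (d, p)
    prev = np.inf
    for _ in range(n_iter):
        W = rank_k_approx(Y - X @ Bt, k)          # confounder update
        Bt = np.linalg.solve(A, X.T @ (Y - W))    # ridge effect-size update
        loss = np.sum((Y - X @ Bt - W) ** 2) + lam * np.sum(Bt ** 2)
        if prev - loss <= tol * loss:             # objective stopped decreasing
            break
        prev = loss
    return Bt.T, W


# Toy data: 2 environmental variables, 3 hidden confounders, 5 true effects.
rng = np.random.default_rng(1)
n, p, d, k = 200, 100, 2, 3
X = rng.normal(size=(n, d))
B_true = np.zeros((p, d))
B_true[:5] = 1.5
Y = X @ B_true.T + rng.normal(size=(n, k)) @ rng.normal(size=(k, p))
Y += 0.5 * rng.normal(size=(n, p))
B_hat, W_hat = ridge_lfmm_sketch(Y, X, k)
```

Each step solves its subproblem exactly (Eckart-Young for W, ridge normal equations for B), so the objective cannot increase between iterations.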
3.
Mol Ecol ; 25(2): 454-69, 2016 01.
Article in English | MEDLINE | ID: mdl-26671840

ABSTRACT

Population differentiation (PD) and ecological association (EA) tests have recently emerged as prominent statistical methods to investigate signatures of local adaptation using population genomic data. Based on statistical models, these genomewide testing procedures have attracted considerable attention as tools to identify loci potentially targeted by natural selection. An important issue with PD and EA tests is that incorrect model specification can generate large numbers of false-positive associations. Spurious associations may indeed arise when shared demographic history, patterns of isolation by distance, cryptic relatedness or genetic background are ignored. Recent work on PD and EA tests has focused largely on improving test corrections for those confounding effects. Despite significant algorithmic improvements, a number of open questions remain on how to check that false discoveries are under control, how to implement test corrections, and how to combine statistical tests from multiple genome scan methods. This tutorial study provides a detailed answer to these questions. It clarifies the relationships between traditional methods based on allele frequency differentiation and EA methods, and provides a unified framework for their underlying statistical tests. We demonstrate how techniques developed in the area of genomewide association studies, such as inflation factors and linear mixed models, benefit genome scan methods, and we provide guidelines for good practice when conducting statistical tests in landscape and population genomic applications. Finally, we highlight how combining several well-calibrated statistical tests can increase the power to reject neutrality, improving our ability to infer patterns of local adaptation in large population genomic data sets.


Subject(s)
Ecology/methods, Genetics, Population, Genomics/methods, Selection, Genetic, Adaptation, Physiological/genetics, Algorithms, Arabidopsis/genetics, Gene Frequency, Genetic Association Studies, Models, Genetic, Models, Statistical, Polymorphism, Single Nucleotide
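The abstract above mentions inflation factors as a tool borrowed from genomewide association studies. A common version is the genomic-control factor: the median of the squared z-scores divided by the median of a chi-squared distribution with one degree of freedom (about 0.4549). The sketch below computes it with the Python standard library and recalibrates p-values by dividing the squared z-scores by that factor. It is a generic illustration, not the tutorial's code. The function names and the toy inflation of 1.2 are hypothetical.

```python
import math
import statistics

# Median of a chi-squared distribution with 1 degree of freedom (about 0.4549),
# i.e. the square of the 75th percentile of the standard normal.
CHI2_1_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(z_scores):
    """Genomic inflation factor: median of squared z-scores over the chi2(1) median."""
    return statistics.median(z * z for z in z_scores) / CHI2_1_MEDIAN


def calibrated_p_values(z_scores):
    """Two-sided p-values after dividing the squared z-scores by the inflation factor."""
    lam = inflation_factor(z_scores)
    # For a chi2(1) variable, P(X > x) = erfc(sqrt(x / 2)).
    return [math.erfc(math.sqrt(z * z / lam / 2.0)) for z in z_scores]


# Toy example: z-scores whose standard deviation is inflated by a factor 1.2,
# as could happen with uncorrected population structure.
z = [1.2 * v for v in statistics.NormalDist().samples(20000, seed=1)]
lam = inflation_factor(z)       # expected to be close to 1.2 ** 2 = 1.44
p_values = calibrated_p_values(z)
```

After recalibration, the median p-value returns to about 0.5, which is what a well-calibrated test yields under neutrality.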
4.
Mol Ecol ; 25(20): 5029-5042, 2016 10.
Article in English | MEDLINE | ID: mdl-27565448

ABSTRACT

Finding genetic signatures of local adaptation is of great interest for many population genetic studies. Common approaches to sorting selective loci from their genomic background focus on the extreme values of the fixation index, FST, across loci. However, computing the fixation index becomes challenging when the population is genetically continuous, when predefining subpopulations is difficult, and when the sample contains admixed individuals. In this study, we present a new method to identify loci under selection based on an extension of the FST statistic to samples with admixed individuals. In our approach, FST values are computed from the ancestry coefficients obtained with ancestry estimation programs. More specifically, we used factor models to estimate FST, and we compared our neutrality tests with those derived from a principal component analysis approach. The performance of the tests was illustrated using simulated data and by re-analysing genomic data from European lines of the plant species Arabidopsis thaliana and human genomic data from the population reference sample, POPRES.


Subject(s)
Genetics, Population/methods, Genomics/methods, Adaptation, Biological/genetics, Arabidopsis/genetics, Computer Simulation, Gene Frequency, Genetic Loci, Genome, Human, Humans, Models, Genetic, Polymorphism, Single Nucleotide, Selection, Genetic
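One way to turn ancestry coefficients into an FST value at a locus, as described in the abstract above, is to use the sample-averaged ancestry proportions q_k as weights. With ancestral allele frequencies f_k, the pooled frequency is f̄ = Σ q_k f_k, and FST = 1 − Σ q_k f_k (1 − f_k) / (f̄ (1 − f̄)). This formula is assumed here for illustration; the paper estimates the quantities with factor models, and its exact estimator may differ. The function names and toy values below are hypothetical.

```python
def mean_ancestry(Q):
    """Average individual ancestry coefficients (rows of Q) over the sample."""
    n = len(Q)
    return [sum(row[k] for row in Q) / n for k in range(len(Q[0]))]


def ancestral_fst(q, f):
    """Assumed FST at one locus from sample-averaged ancestry proportions q[k]
    and ancestral allele frequencies f[k] of the K ancestral populations."""
    if abs(sum(q) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    f_bar = sum(qk * fk for qk, fk in zip(q, f))      # pooled allele frequency
    # Heterozygosity terms; the usual factor 2 cancels in the ratio.
    h_total = f_bar * (1.0 - f_bar)
    if h_total == 0.0:
        return 0.0                                     # monomorphic locus
    h_within = sum(qk * fk * (1.0 - fk) for qk, fk in zip(q, f))
    return 1.0 - h_within / h_total


# Toy example: three admixed individuals, two ancestral populations.
Q = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
q = mean_ancestry(Q)                      # approximately [0.5, 0.5]
fst_diff = ancestral_fst(q, [0.1, 0.9])   # strongly differentiated: about 0.64
fst_same = ancestral_fst(q, [0.4, 0.4])   # no differentiation: about 0.0
```

Admixed individuals enter only through their averaged ancestry, so no discrete subpopulations need to be predefined.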
5.
Mol Ecol Resour ; 18(4): 789-797, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29673087

ABSTRACT

Association studies of polygenic traits are notoriously difficult when conducted at large geographic scales. The difficulty arises because genotype frequencies often vary across geographic space and across distinct environments. Such large-scale variation is known to yield false positives in standard association testing approaches. Although several methods alleviate this problem, no tools have been proposed to evaluate the power that association tests could achieve for a specific study design and set of genotypes. Our goal here is to present an R program fulfilling this objective by allowing users to simulate phenotypes from observed genotypes and to estimate upper bounds on achievable power. The simulation model can incorporate realistic features such as population structure and gene-by-environment interactions, and the package implements a gold-standard test that evaluates power using information on confounders. We illustrated the use of the program with example studies based on data for the plant species Arabidopsis thaliana. Simulated phenotypes were used to compare the ability of two recent association methods to correctly remove confounding factors, to evaluate power to detect causal variants, and to assess the influence of various parameters. For the simulated data, the new tests reached performance close to that of the gold-standard test and could reasonably be used with measured phenotypes. Power to detect causal variants was influenced by the number of variants and by the strength of their effect sizes, and specific thresholds were obtained from the simulation study. In conclusion, our program provides guidance on the choice of association tests, as well as useful knowledge on test performance in a user-specific context.


Subject(s)
Computer Simulation, Genetic Association Studies/methods, Software, Genotype, Phenotype
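The workflow in the abstract above, simulating phenotypes from observed genotypes and counting how often causal variants are detected, can be sketched in a toy form with the Python standard library. This is not the R program's interface. The additive model, the effect size, the noise level and the Bonferroni threshold are hypothetical choices. The marginal correlation test ignores confounding, so it corresponds neither to the gold-standard test nor to the association methods compared in the study.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def simulate_phenotype(genotypes, causal, effect, noise_sd, rng):
    """Additive model: y_i = effect * (sum of causal genotypes of i) + N(0, noise_sd^2)."""
    return [effect * sum(g[j] for j in causal) + rng.gauss(0.0, noise_sd)
            for g in genotypes]


def empirical_power(genotypes, n_causal, effect, noise_sd,
                    n_rep=20, alpha=0.05, seed=0):
    """Fraction of causal loci detected by a marginal test at a Bonferroni threshold."""
    rng = random.Random(seed)
    n, n_loci = len(genotypes), len(genotypes[0])
    threshold = alpha / n_loci
    hits = 0
    for _ in range(n_rep):
        causal = rng.sample(range(n_loci), n_causal)
        y = simulate_phenotype(genotypes, causal, effect, noise_sd, rng)
        for j in causal:
            r = pearson_r([g[j] for g in genotypes], y)
            t = r * math.sqrt((n - 2) / max(1.0 - r * r, 1e-12))
            # Normal approximation to the two-sided t-test p-value (large n).
            p = math.erfc(abs(t) / math.sqrt(2.0))
            hits += p < threshold
    return hits / (n_rep * n_causal)


# Toy genotypes (0/1/2) standing in for observed data: 300 individuals, 200 loci.
geno_rng = random.Random(42)
genotypes = [[geno_rng.randint(0, 1) + geno_rng.randint(0, 1) for _ in range(200)]
             for _ in range(300)]
power_null = empirical_power(genotypes, n_causal=5, effect=0.0, noise_sd=1.0)
power_strong = empirical_power(genotypes, n_causal=5, effect=1.0, noise_sd=1.0)
```

Varying `n_causal` and `effect` mirrors, in miniature, the abstract's finding that power depends on the number of variants and the strength of their effects.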
6.
Mol Ecol Resour ; 16(2): 540-8, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26417651

ABSTRACT

Geography and landscape are important determinants of genetic variation in natural populations, and several ancestry estimation methods have been proposed to investigate population structure using genetic and geographic data simultaneously. Those approaches are often based on computer-intensive stochastic simulations and do not scale to the dimensions of the data sets generated by high-throughput sequencing technologies. There is a growing demand for faster algorithms able to analyse genomewide patterns of population genetic variation in their geographic context. In this study, we present TESS3, a major update of the spatial ancestry estimation program TESS. By combining matrix factorization and spatial statistical methods, TESS3 provides estimates of ancestry coefficients with accuracy comparable to that of TESS and with run times much shorter than those of the Bayesian version. In addition, the TESS3 program can be used to perform genome scans for selection and to separate adaptive from nonadaptive genetic variation using ancestral allele frequency differentiation tests. The main features of TESS3 are illustrated using simulated data and by analysing genomic data from European lines of the plant species Arabidopsis thaliana.


Subject(s)
Computational Biology/methods, Genetic Variation, Genetics, Population/methods, Phylogeography/methods, Arabidopsis/classification, Arabidopsis/genetics, Europe, Genome, Plant
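The matrix factorization behind ancestry estimation in the abstract above can be sketched in a non-spatial form. An individuals-by-loci matrix P of allele frequencies (genotype / 2) is approximated by Q·G. Q holds ancestry coefficients, with each row projected onto the probability simplex, and G holds ancestral allele frequencies, clipped to [0, 1]. The sketch uses heuristic alternating projected least squares. It omits the spatial regularization, the genotype encoding and the solver used by TESS3. The names `factorize_ancestry` and `project_rows_to_simplex` and the toy data are hypothetical.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of V onto {w : w >= 0, sum(w) = 1}."""
    n, k = V.shape
    U = -np.sort(-V, axis=1)                        # rows sorted in decreasing order
    css = np.cumsum(U, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    cond = U - css / idx > 0                        # always True in column 0
    rho = k - 1 - np.argmax(cond[:, ::-1], axis=1)  # last column where cond holds
    theta = css[np.arange(n), rho] / (rho + 1)
    return np.maximum(V - theta[:, None], 0.0)


def factorize_ancestry(P, k, n_iter=200, seed=0):
    """Hypothetical non-spatial sketch: approximate P (n x L) by Q @ G, with
    ancestry coefficients Q (n x k) on the simplex and ancestral allele
    frequencies G (k x L) in [0, 1]. Projecting after an unconstrained
    least-squares solve is a heuristic, not an exact constrained minimisation."""
    rng = np.random.default_rng(seed)
    Q = project_rows_to_simplex(rng.random((P.shape[0], k)))
    for _ in range(n_iter):
        G = np.clip(np.linalg.lstsq(Q, P, rcond=None)[0], 0.0, 1.0)
        Q = project_rows_to_simplex(np.linalg.lstsq(G.T, P.T, rcond=None)[0].T)
    return Q, G


# Toy data: allele frequencies generated from 2 ancestral populations.
rng = np.random.default_rng(3)
Q_true = rng.dirichlet(np.ones(2), size=150)
G_true = rng.uniform(0.05, 0.95, size=(2, 300))
P = Q_true @ G_true
Q_hat, G_hat = factorize_ancestry(P, k=2)
```

Because each step is a plain least-squares solve, the cost grows only linearly with the number of loci, which is the kind of scaling that motivates factorization over stochastic simulation.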