ABSTRACT
Estimating admixture histories is crucial for understanding the genetic diversity we see in present-day populations. Allele frequency or phylogeny-based methods are excellent for inferring the existence of admixture or its proportions. However, estimating admixture times requires spatial information along admixed chromosomes, such as local ancestry or the decay of admixture linkage disequilibrium (ALD). One popular method, implemented in the programs ALDER and ROLLOFF, uses two-locus ALD to infer the time of a single admixture event, but with this summary statistic it can only estimate the time of the most recent admixture event. To address this limitation, we derive analytical expressions for the expected ALD in a three-locus system and provide a new statistical method based on these results that is able to resolve more complicated admixture histories. Using simulations, we evaluate the performance of this method on a range of different admixture histories. As an example, we apply the method to the Colombian and Mexican samples from the 1000 Genomes project. The implementation of our method is available at https://github.com/Genomics-HSE/LaNeta.
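The two-locus idea mentioned above can be illustrated with a minimal sketch. It assumes the textbook single-pulse approximation in which ALD decays with genetic distance d (in Morgans) as A * exp(-T * d), where T is the number of generations since admixture. The function name `estimate_admixture_time` and the synthetic values are hypothetical, and this is not the ALDER, ROLLOFF, or LaNeta implementation.

```python
import numpy as np

def estimate_admixture_time(distances_morgans, ald_values):
    """Fit ALD(d) = A * exp(-T * d) by least squares on log(ALD).

    Assumption: a single admixture pulse, so the log of the ALD curve
    is linear in genetic distance with slope -T.
    Returns (T_hat, A_hat).
    """
    d = np.asarray(distances_morgans, dtype=float)
    ald = np.asarray(ald_values, dtype=float)
    keep = ald > 0  # the logarithm is only defined for positive values
    slope, intercept = np.polyfit(d[keep], np.log(ald[keep]), 1)
    return float(-slope), float(np.exp(intercept))

# Synthetic, noise-free curve for a hypothetical pulse 10 generations ago.
d = np.linspace(0.005, 0.2, 40)       # genetic distance in Morgans
ald = 0.05 * np.exp(-10.0 * d)        # assumed amplitude 0.05
t_hat, a_hat = estimate_admixture_time(d, ald)
print(t_hat, a_hat)
```

With more than one admixture event, the curve becomes a mixture of exponentials and a single-rate fit like this one cannot separate the events, which is the limitation the abstract refers to.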
Subject(s)
Genetics, Population; Population Groups; Colombia; Gene Frequency/genetics; Humans; Linkage Disequilibrium; Models, Genetic; Population Groups/genetics
ABSTRACT
Estimation of admixture proportions has become one of the most commonly used computational tools in population genomics. However, there is remarkably little population genetic theory on statistical properties of these variables. We develop theoretical results that can accurately predict means and variances of admixture proportions within a population using models with recombination and genetic drift. Based on established theory on measures of multilocus disequilibrium, we show that there is a set of recurrence relations that can be used to derive expectations for higher moments of the admixture proportions distribution. We obtain closed form solutions for some special cases. Using these results, we develop a method for estimating admixture parameters from estimated admixture proportions obtained from programs such as Structure or Admixture. We apply this method to HapMap 3 data and find that the population history of African Americans, as expected, is not best explained by a single admixture event between people of European and African ancestry. The model of constant gene flow starting at 8 generations and ending at 2 generations before present gives the best fit.
Subject(s)
Gene Flow; Genetics, Population; Linkage Disequilibrium; Mathematical Concepts; Models, Genetic; White; Humans; Black or African American/genetics; Genetic Drift; Genetics, Population/statistics & numerical data; Recombination, Genetic; White/genetics
ABSTRACT
We present a new haplotype-based statistic (nSL) for detecting both soft and hard sweeps in population genomic data from a single population. We compare our new method with classic single-population haplotype and site frequency spectrum (SFS)-based methods and show that it is more robust, particularly to recombination rate variation. However, all statistics show some sensitivity to the assumptions of the demographic model. Additionally, we show that nSL has at least as much power as other methods under a number of different selection scenarios, most notably in the cases of sweeps from standing variation and incomplete sweeps. This conclusion holds up under a variety of demographic models. In many aspects, our new method is similar to the iHS statistic; however, it is generally more robust and does not require a genetic map. To illustrate the utility of our new method, we apply it to HapMap3 data and show that in the Yoruban population, there is strong evidence of selection on genes relating to lipid metabolism. This observation could be related to the known differences in cholesterol levels, and lipid metabolism more generally, between African Americans and other populations. We propose that the underlying causes for the selection on these genes are pleiotropic effects relating to blood parasites rather than their role in lipid metabolism.
Subject(s)
Genetics, Population/methods; Haplotypes; Models, Genetic; Selection, Genetic; Africa; Biostatistics; Black People/genetics; Computer Simulation; Evolution, Molecular; Gene Frequency; Genetics, Population/statistics & numerical data; HapMap Project; Humans; Lipid Metabolism/genetics; Mutation; Polymorphism, Single Nucleotide; Recombination, Genetic
ABSTRACT
Gaussian processes, a class of stochastic processes including Brownian motion and the Ornstein-Uhlenbeck process, are widely used to model continuous trait evolution in statistical phylogenetics. Under such processes, observations at the tips of a phylogenetic tree have a multivariate Gaussian distribution, which may lead to suboptimal model specification under certain evolutionary conditions, as supposed in models of punctuated equilibrium or adaptive radiation. To consider non-normally distributed continuous trait evolution, we introduce a method to compute posterior probabilities when modeling continuous trait evolution as a Lévy process. Through data simulation and model testing, we establish that single-rate Brownian motion (BM) and Lévy processes with jumps generate distinct patterns in comparative data. We then analyzed body mass and endocranial volume measurements for 126 primates. We rejected single-rate BM in favor of a Lévy process with jumps for each trait, with the lineage leading to the most recent common ancestor of great apes showing particularly strong evidence against single-rate BM.
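The contrast between single-rate BM and a Lévy process with jumps can be seen in a small simulation of trait change along one branch. The sketch below assumes a compound Poisson process with normally distributed jumps added to BM, which is only one example of a jump process and is not necessarily the model used in the paper. All names and parameter values are hypothetical.

```python
import numpy as np

def simulate_branch_changes(n, t, sigma2, jump_rate=0.0, jump_sd=0.0, seed=0):
    """Simulate n independent trait changes over a branch of length t.

    Diffusion part: Normal(0, sigma2 * t), i.e. single-rate BM.
    Jump part (assumed form): Poisson(jump_rate * t) jumps, each drawn
    from Normal(0, jump_sd**2). With jump_rate = 0 this is pure BM.
    """
    rng = np.random.default_rng(seed)
    diffusion = rng.normal(0.0, np.sqrt(sigma2 * t), size=n)
    n_jumps = rng.poisson(jump_rate * t, size=n)
    # The sum of k independent Normal(0, s^2) jumps is Normal(0, k * s^2).
    jumps = rng.normal(0.0, 1.0, size=n) * jump_sd * np.sqrt(n_jumps)
    return diffusion + jumps

def excess_kurtosis(x):
    """Sample excess kurtosis; close to 0 for Gaussian data."""
    c = np.asarray(x, dtype=float)
    c = c - c.mean()
    return float(np.mean(c**4) / np.mean(c**2) ** 2 - 3.0)

bm = simulate_branch_changes(200_000, t=1.0, sigma2=1.0)
jd = simulate_branch_changes(200_000, t=1.0, sigma2=1.0,
                             jump_rate=0.5, jump_sd=3.0, seed=1)
print(excess_kurtosis(bm), excess_kurtosis(jd))
```

Under these assumed parameters the BM changes are Gaussian, while the jump process produces a heavy-tailed distribution with clearly positive excess kurtosis. This is the kind of pattern that can distinguish the two models in comparative data.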
Subject(s)
Models, Biological; Phenotype; Phylogeny; Animals; Body Mass Index; Computer Simulation; Primates/classification; Primates/physiology
ABSTRACT
The distribution of admixture tract lengths has received considerable attention, in part because it can be used to infer the timing of past gene flow events between populations. It is commonly assumed that these lengths can be modeled as independently and identically distributed (iid) exponential random variables. This assumption is fundamental for many popular methods that analyze admixture using hidden Markov models. We compare the expected distribution of admixture tract lengths under a number of population-genetic models to the distribution predicted by the Wright-Fisher model with recombination. We show that under the latter model, the assumption of iid exponential tract lengths does not hold for recent or for ancient admixture events and that relying on this assumption can lead to false positives when inferring the number of admixture events. To further investigate the tract-length distribution, we develop a dyadic interval-based stochastic process for generating admixture tracts. This representation is useful for analyzing admixture tract-length distributions for populations with recent admixture, a scenario in which existing models perform poorly.
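For reference, the iid exponential assumption discussed above leads to a very simple time estimator, sketched below. The sketch assumes a single admixture pulse and the commonly used approximation that tracts from a source contributing proportion m have lengths that are exponential with rate (1 - m) * T per Morgan. The function name `estimate_pulse_time` and the example values are hypothetical. This is the assumption the abstract argues can fail, not a recommended method.

```python
import numpy as np

def estimate_pulse_time(tract_lengths_morgans, source_proportion):
    """Estimate generations since a single admixture pulse.

    Assumption: tract lengths are iid Exponential with rate
    (1 - m) * T per Morgan, where m is the proportion contributed by
    the source whose tracts are measured. The maximum-likelihood rate
    of an exponential sample is 1 / mean, so T_hat = rate / (1 - m).
    """
    lengths = np.asarray(tract_lengths_morgans, dtype=float)
    if lengths.size == 0 or np.any(lengths <= 0):
        raise ValueError("tract lengths must be positive and non-empty")
    if not 0.0 <= source_proportion < 1.0:
        raise ValueError("source_proportion must lie in [0, 1)")
    rate_hat = 1.0 / lengths.mean()
    return float(rate_hat / (1.0 - source_proportion))

# Hypothetical tracts with mean length 0.125 Morgans and m = 0.2.
lengths = [0.05, 0.10, 0.15, 0.20]
t_hat = estimate_pulse_time(lengths, source_proportion=0.2)
print(t_hat)
```

Because the estimate depends only on the mean tract length, any departure from the exponential form, as described for recent or ancient events, feeds directly into the inferred time.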
Subject(s)
Gene Pool; Computer Simulation; Humans; Likelihood Functions; Markov Chains; Models, Genetic; Recombination, Genetic/genetics
ABSTRACT
BACKGROUND: Rapa Nui (Easter Island), located in the easternmost corner of the Polynesian Triangle, is one of the most isolated locations on the planet inhabited by humans. Archaeological and genetic evidence suggests that the island was first colonized by Polynesians around AD 1200, during their eastward expansion. Although it remains contentious whether Polynesians reached South America, suggestive evidence has been brought forward supporting the possibility of Native American contact prior to the European "discovery" of the island in AD 1722. RESULTS: We generated genome-wide data for 27 Rapanui. We found a mostly Polynesian ancestry among Rapanui and detected genome-wide patterns consistent with Native American and European admixture. By considering the distribution of local ancestry tracts of eight unrelated Rapanui, we found statistical support for Native American admixture dating to AD 1280-1495 and European admixture dating to AD 1850-1895. CONCLUSIONS: These genetic results can be explained by one or more pre-European trans-Pacific contacts.