ABSTRACT
Genetic diversity is heterogeneously distributed among populations of the same species, due to the joint effects of multiple demographic processes, including range contractions and expansions and mating-system shifts. Here, we ask how both processes shape genomic diversity in space and time in the classical Primula vulgaris model. This perennial herb originated in the Caucasus region and was hypothesized to have expanded westward following glacial retreat in the Quaternary. Moreover, this species is a long-standing model for mating system transitions, exemplified by shifts from heterostyly to homostyly. Leveraging a high-quality reference genome of the closely related Primula veris and whole-genome resequencing data from both heterostylous and homostylous individuals from populations encompassing a wide distribution of P. vulgaris, we reconstructed the demographic history of P. vulgaris. Results are compatible with the previously proposed hypothesis of range expansion from the Caucasus region approximately 79,000 years ago and suggest later shifts to homostyly following rather than preceding postglacial colonization of England. Furthermore, in accordance with population genetic theoretical predictions, both processes are associated with reduced genetic diversity, increased linkage disequilibrium, and reduced efficacy of purifying selection. A novel result concerns the contrasting effects of range expansion versus shift to homostyly on transposable elements: the former process is associated with changes in transposable element genomic content, while the latter is not. Jointly, our results elucidate how the interactions among range expansion, transitions to selfing, and Quaternary climatic oscillations shape plant evolution.
Subject(s)
Genetic Variation, Plant Genome, Primula, Primula/genetics, Reproduction/genetics, Linkage Disequilibrium
ABSTRACT
Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Progress has been impeded by a conceptual and methodological divide between analyses that infer the demographic history of speciation and genome scans aimed at identifying locally maladaptive alleles, i.e., genomic barriers to gene flow. Here we implement genomewide IM blockwise likelihood estimation (gIMble), a composite likelihood approach for the quantification of barriers that bridges this divide. This analytic framework captures background selection and selection against barriers in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. gIMble includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied pair of sister species of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analyses uncover both large-effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of a polygenic barrier architecture.
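The windowed estimation over pre-computed likelihood grids can be sketched as follows. This is only a minimal illustration and not gIMble's actual interface: the function names, the toy grid values, and the block-configuration labels are all hypothetical, and a real analysis would obtain the grid probabilities from the IM model rather than typing them in.

```python
import math


def composite_log_likelihood(block_counts, config_probs):
    """Sum of count * log(probability) over block configurations."""
    return sum(n * math.log(config_probs[cfg]) for cfg, n in block_counts.items())


def best_grid_point(block_counts, grid):
    """Return the (Ne, me) grid point with the highest composite log-likelihood."""
    return max(
        grid,
        key=lambda params: composite_log_likelihood(block_counts, grid[params]),
    )


# Hypothetical pre-computed grid: (Ne, me) -> probability of each block configuration.
grid = {
    (1e5, 1e-7): {"no_variation": 0.70, "fixed_diff": 0.20, "shared_het": 0.10},
    (1e5, 1e-6): {"no_variation": 0.70, "fixed_diff": 0.10, "shared_het": 0.20},
    (5e5, 1e-7): {"no_variation": 0.50, "fixed_diff": 0.30, "shared_het": 0.20},
}

# Hypothetical counts of block configurations observed in two windows.
windows = {
    "window_1": {"no_variation": 68, "fixed_diff": 22, "shared_het": 10},
    "window_2": {"no_variation": 71, "fixed_diff": 9, "shared_het": 20},
}

# One grid lookup per window; no likelihood is recomputed from the model here.
estimates = {name: best_grid_point(counts, grid) for name, counts in windows.items()}
```

In this toy setting, a window whose best-fitting me is lower than that of its neighbours would be a candidate barrier region; the grid lookup is what keeps the per-window cost small.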
Subject(s)
Butterflies, Gene Flow, Animals, Likelihood Functions, Genetic Speciation, Butterflies/genetics, Biological Evolution
ABSTRACT
The extent of interspecific gene flow and its consequences for the initiation, maintenance, and breakdown of species barriers in natural systems remain poorly understood. Interspecific gene flow by hybridization may weaken adaptive divergence, but can be overcome by selection against hybrids, which may ultimately promote reinforcement. An informative step towards understanding the role of gene flow during speciation is to describe patterns of past gene flow among extant species. We investigate signals of admixture between allopatric and sympatric populations of the two closely related European dung fly species Sepsis cynipsea and S. neocynipsea (Diptera: Sepsidae). Based on microsatellite genotypes, we first inferred a baseline demographic history using Approximate Bayesian Computation. We then used genomic data from pooled DNA of natural and laboratory populations to test for past interspecific gene flow based on allelic configurations discordant with the inferred population tree (ABBA-BABA test with D-statistic). Comparing the detected signals of gene flow with the contemporary geographic relationship among interspecific pairs of populations (sympatric vs. allopatric), we made two contrasting observations. At one site in the French Cevennes, we detected an excess of past interspecific gene flow, while at two sites in Switzerland we observed lower signals of past gene flow among populations in sympatry compared to allopatric populations. These results suggest that the species boundaries between these two species depend on the past and/or present eco-geographic context in Europe, which indicates that there is no uniform link between contemporary geographic proximity and past interspecific gene flow in natural populations. Supplementary Information: The online version contains supplementary material available at 10.1007/s11692-023-09612-5.
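For readers unfamiliar with the ABBA-BABA test, the sketch below computes the D-statistic from per-site derived-allele frequencies, which is the form usually applied to pooled data. It is an illustration under stated assumptions, not the authors' pipeline: the function name and the toy frequencies are hypothetical, and the populations follow the usual (((P1, P2), P3), outgroup) labelling.

```python
def d_statistic(p1, p2, p3, p_out):
    """Patterson's D from derived-allele frequencies at each site.

    p1, p2, p3, p_out are equal-length sequences of frequencies in
    populations P1, P2, P3 and the outgroup. D > 0 indicates an excess of
    ABBA sites (P2 shares derived alleles with P3), D < 0 an excess of BABA.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, p_out):
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy frequencies at three sites (made up for illustration only).
p1 = [0.0, 0.1, 0.5]
p2 = [1.0, 0.9, 0.5]
p3 = [1.0, 1.0, 1.0]
p_out = [0.0, 0.0, 0.0]

d = d_statistic(p1, p2, p3, p_out)
```

A significance assessment (for example a block jackknife over genomic windows) would be needed before interpreting a nonzero D; that step is omitted here.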
ABSTRACT
The southernmost regions of South America harbor some of the earliest evidence of human presence in the Americas. However, connections with the rest of the continent and the contextualization of present-day indigenous ancestries remain poorly resolved. In this study, we analyze the genetic ancestry of one of the largest indigenous groups in South America: the Mapuche. We generate genome-wide data from 64 participants from three Mapuche populations in Southern Chile: Pehuenche, Lafkenche, and Huilliche. Broadly, we describe three main ancestry blocks with a common origin, which characterize the Southern Cone, the Central Andes, and Amazonia. Within the Southern Cone, ancestors of the Mapuche lineages differentiated from those of the Far South during the Middle Holocene and did not experience further migration waves from the north. We find that the deep genetic split between the Central and Southern Andes is followed by instances of gene flow, which may have accompanied the southward spread of cultural traits from the Central Andes, including crops and loanwords from Quechua into Mapudungun (the language of the Mapuche). Finally, we report close genetic relatedness between the three populations analyzed, with the Huilliche characterized additionally by intense recent exchanges with the Far South. Our findings add new perspectives on the genetic (pre)history of South America, from the first settlement through to the present-day indigenous presence. Follow-up fieldwork took these results back to the indigenous communities to contextualize the genetic narrative alongside indigenous knowledge and perspectives.
Subject(s)
Gene Flow, Population Groups, Humans, Chile, Peru, Population Genetics
ABSTRACT
Biogeographical ancestry (BGA) inference from ancestry-informative markers (AIMs) has strong potential to support forensic investigations. Over the past two decades, several forensic panels composed of AIMs have been developed to predict ancestry at a continental scale. These panels typically comprise fewer than 200 AIMs and have been designed and tested with a limited set of populations. How well these panels recover patterns of genetic diversity relative to larger sets of markers, and how accurately they infer ancestry of individuals and populations not included in their design, remain poorly understood. The lack of comparative studies addressing these aspects makes the selection of appropriate panels for forensic laboratories difficult. In this study, the model-based genetic clustering tool STRUCTURE was used to compare three popular forensic BGA panels, MAPlex, Precision ID Ancestry Panel (PIDAP), and VISAGE Basic Tool (VISAGE BT), relative to a genome-wide reference set of 10k SNPs. The genotypes for all these markers were obtained for a comprehensive set of 3957 individuals from 228 worldwide human populations. Our results indicate that at the broad continental scale (K=6) typically examined in forensic studies, all forensic panels produced similar genetic structure patterns compared to the reference set (G'≈90%) and had high classification performance across all regions (average AUC-PR > 97%). However, at K=7 and K=8, the forensic panels displayed some region-specific clustering deviations from the reference set, particularly in Europe and the region of East and South-East Asia, which may be attributed to differences in the design of the respective panels. Overall, the panel with the most consistent performance in all regions was VISAGE BT with an average weighted AUC̄_W score of 96.26% across the three scales of geographical resolution investigated.
Subject(s)
Population Genetics, Racial Groups, Humans, Racial Groups/genetics, Population Groups, Genotype, DNA Fingerprinting, Single Nucleotide Polymorphism
ABSTRACT
Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
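The conditioning step described above (recombination rate above 1.5 cM/Mb, and C↔G or A↔T mutation types only) amounts to a simple per-SNP filter. The sketch below is an illustration, not the authors' code: the `Snp` record, its field names, and the example variants are hypothetical, and it assumes each SNP has already been annotated with a local recombination rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int
    ref: str
    alt: str
    rec_rate_cm_per_mb: float  # local recombination rate, assumed pre-annotated


# Allele pairs that do not change GC content (C<->G and A<->T).
GC_CONSERVATIVE_PAIRS = {frozenset("CG"), frozenset("AT")}
MIN_REC_RATE = 1.5  # cM/Mb, threshold stated in the abstract


def is_retained(snp, min_rate=MIN_REC_RATE):
    """Keep SNPs in high-recombination regions with C<->G or A<->T alleles."""
    alleles = frozenset((snp.ref.upper(), snp.alt.upper()))
    return alleles in GC_CONSERVATIVE_PAIRS and snp.rec_rate_cm_per_mb > min_rate


# Made-up example variants.
snps = [
    Snp("chr1", 1000, "C", "G", 2.3),  # kept
    Snp("chr1", 2000, "A", "G", 2.3),  # dropped: A<->G changes GC content
    Snp("chr1", 3000, "T", "A", 0.4),  # dropped: recombination rate too low
    Snp("chr2", 4000, "a", "t", 1.8),  # kept (case-insensitive)
]

retained = [s for s in snps if is_retained(s)]
```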
Subject(s)
Molecular Evolution, Gene Conversion/genetics, Human Genome/genetics, Genetic Selection/genetics, Base Composition, Demography, Genomics, Humans, Genetic Models, Mutation, Genetic Recombination/genetics
ABSTRACT
The interplay of divergent selection and gene flow is key to understanding how populations adapt to local environments and how new species form. Here, we use DNA polymorphism data and genome-wide variation in recombination rate to jointly infer the strength and timing of selection, as well as the baseline level of gene flow under various demographic scenarios. We model how divergent selection leads to a genome-wide negative correlation between recombination rate and genetic differentiation among populations. Our theory shows that the selection density (i.e., the selection coefficient per base pair) is a key parameter underlying this relationship. We then develop a procedure for parameter estimation that accounts for the confounding effect of background selection. Applying this method to two datasets from Mimulus guttatus, we infer a strong signal of adaptive divergence in the face of gene flow between populations growing on and off phytotoxic serpentine soils. However, the genome-wide intensity of this selection is not exceptional compared with what M. guttatus populations may typically experience when adapting to local conditions. We also find that selection against genome-wide introgression from the selfing sister species M. nasutus has acted to maintain a barrier between these two species over at least the last 250 ky. Our study provides a theoretical framework for linking genome-wide patterns of divergence and recombination with the underlying evolutionary mechanisms that drive this differentiation.
Subject(s)
Gene Flow, Population Genetics, Mimulus/genetics, Genetic Polymorphism, Biological Evolution, California, Genetic Speciation, Genomics, Geography, Genetic Models, Phylogeny, Genetic Recombination, Reproductive Isolation, Genetic Selection, Species Specificity
ABSTRACT
Hybridization between humans and Neanderthals has resulted in a low level of Neanderthal ancestry scattered across the genomes of many modern-day humans. After hybridization, on average, selection appears to have removed Neanderthal alleles from the human population. Quantifying the strength and causes of this selection against Neanderthal ancestry is key to understanding our relationship to Neanderthals and, more broadly, how populations remain distinct after secondary contact. Here, we develop a novel method for estimating the genome-wide average strength of selection and the density of selected sites using estimates of Neanderthal allele frequency along the genomes of modern-day humans. We confirm that East Asians had somewhat higher initial levels of Neanderthal ancestry than Europeans even after accounting for selection. We find that the bulk of purifying selection against Neanderthal ancestry is best understood as acting on many weakly deleterious alleles. We propose that the majority of these alleles were effectively neutral (and segregating at high frequency) in Neanderthals, but became selected against after entering human populations of much larger effective size. While individually of small effect, these alleles potentially imposed a heavy genetic load on the early-generation human-Neanderthal hybrids. This work suggests that differences in effective population size may play a far more important role in shaping levels of introgression than previously thought.
Subject(s)
Population Genetics, Human Genome, Neanderthals/genetics, Genetic Selection/genetics, Alleles, Animals, Asian People/genetics, Gene Frequency, Haplotypes, Humans, Genetic Hybridization, Phylogeny, Single Nucleotide Polymorphism, White People
ABSTRACT
Genomic islands are clusters of loci with elevated divergence that are commonly found in population genomic studies of local adaptation and speciation. One explanation for their evolution is that linkage between selected alleles confers a benefit, which increases the establishment probability of new mutations that are linked to existing locally adapted polymorphisms. Previous theory suggested there is only limited potential for the evolution of islands via this mechanism, but involved some simplifying assumptions that may limit the accuracy of this inference. Here, we extend previous analytical approaches to study the effect of linkage on the establishment probability of new mutations and identify parameter regimes that are most likely to lead to evolution of islands via this mechanism. We show how the interplay between migration and selection affects the establishment probability of linked vs. unlinked alleles, the expected maximum size of genomic islands, and the expected time required for their evolution. Our results agree with previous studies, suggesting that this mechanism alone is unlikely to be a general explanation for the evolution of genomic islands. However, this mechanism could occur more readily if there were other pre-adaptations to reduce local rates of recombination or increase the local density of mutational targets within the region of the island. We also show that island formation via erosion following secondary contact is much more rapid than island formation from de novo mutations, suggesting that this mechanism may be more likely.
Subject(s)
Alleles, Molecular Evolution, Genetic Linkage, Genomic Islands, Genetic Models, Gene Flow, Genetic Speciation, Population Genetics, Mutation, Probability
ABSTRACT
We study invasion and survival of weakly beneficial mutations arising in linkage to an established migration-selection polymorphism. Our focus is on a continent-island model of migration, with selection at two biallelic loci for adaptation to the island environment. Combining branching and diffusion processes, we provide the theoretical basis for understanding the evolution of islands of divergence, the genetic architecture of locally adaptive traits, and the importance of so-called "divergence hitchhiking" relative to other mechanisms, such as "genomic hitchhiking", chromosomal inversions, or translocations. We derive approximations to the invasion probability and the extinction time of a de novo mutation. Interestingly, the invasion probability is maximized at a nonzero recombination rate if the focal mutation is sufficiently beneficial. If a proportion of migrants carries a beneficial background allele, the mutation is less likely to become established. Linked selection may increase the survival time by several orders of magnitude. By altering the timescale of stochastic loss, it can therefore affect the dynamics at the focal site to an extent that is of evolutionary importance, especially in small populations. We derive an effective migration rate experienced by the weakly beneficial mutation, which accounts for the reduction in gene flow imposed by linked selection. Using the concept of the effective migration rate, we also quantify the long-term effects on neutral variation embedded in a genome with arbitrarily many sites under selection. Patterns of neutral diversity change qualitatively and quantitatively as the position of the neutral locus is moved along the chromosome. This will be useful for population-genomic inference. Our results strengthen the emerging view that physically linked selection is biologically relevant if linkage is tight or if selection at the background locus is strong.
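To give a feel for how an effective migration rate depends on the position of a neutral locus, the sketch below uses the classical single-locus approximation m_e ≈ m·r / (s + r) for a neutral site at recombination distance r from a locus where immigrant alleles suffer a selective disadvantage s. This textbook approximation is an assumption made here for illustration; it is not the expression derived in the paper, and the function name and parameter values are hypothetical.

```python
def effective_migration_rate(m, s, r):
    """Classical approximation m_e = m * r / (s + r) for a linked neutral site.

    m: actual migration rate, s: selection against the immigrant allele at
    the linked locus, r: recombination rate between the two sites.
    """
    if not (0 <= m <= 1 and 0 <= s <= 1 and 0 <= r <= 0.5):
        raise ValueError("expected 0 <= m <= 1, 0 <= s <= 1, 0 <= r <= 0.5")
    if s + r == 0:
        return m  # no selection and no recombination: gene flow is unimpeded
    return m * r / (s + r)


# Effective gene flow recovers as the neutral site moves away from the
# selected locus (illustrative values only).
m, s = 0.01, 0.1
profile = [(r, effective_migration_rate(m, s, r)) for r in (0.0, 0.001, 0.01, 0.1, 0.5)]
```

Under this approximation the barrier effect is strongest when linkage is tight or selection is strong, in line with the qualitative conclusion of the abstract.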
Subject(s)
Molecular Evolution, Genetic Linkage, Mutation, Genetic Selection, Biological Adaptation/genetics, Environment, Gene Flow, Genetic Fitness, Human Migration, Humans, Genetic Polymorphism, Genetic Recombination/genetics
ABSTRACT
The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θ_anc = 4N_e u) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L2-loss performs best. Applying that method to the ibex data, we estimate θ_anc ≈ 1.288 and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10⁻⁴ and 3.5 × 10⁻³ per locus per generation. The proportion of males with access to matings is estimated as ω ≈ 0.21, which is in good agreement with recent independent estimates.
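As background, the sketch below shows plain rejection ABC with a single summary statistic, the basic scheme into which any choice of summary statistics is plugged. It does not implement the boosting-based selection proposed in the abstract: the toy model (normal data with unknown mean), the uniform prior, the tolerance, and all names are assumptions made for illustration.

```python
import random
import statistics


def simulate_summary(theta, n, rng):
    """Summary statistic (sample mean) of n draws from Normal(theta, 1)."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))


def abc_rejection(observed_summary, n, n_sims, tolerance, rng):
    """Keep prior draws whose simulated summary lies within the tolerance."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 5.0)  # draw from a Uniform(0, 5) prior
        if abs(simulate_summary(theta, n, rng) - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(1)  # fixed seed so the run is reproducible
posterior_sample = abc_rejection(
    observed_summary=2.0, n=50, n_sims=5000, tolerance=0.1, rng=rng
)
posterior_mean = statistics.fmean(posterior_sample)
```

Here the sample mean happens to be sufficient for the parameter; the trade-off discussed in the abstract arises when no low-dimensional sufficient statistic is available and candidate statistics must be selected or combined.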