ABSTRACT
Introgression is a common evolutionary phenomenon that results in shared genetic material across non-sister taxa. Existing statistical methods such as Patterson's D statistic can detect introgression by measuring an excess of shared derived alleles between populations. The D statistic is effective at detecting genome-wide patterns of introgression but can give spurious inferences of introgression when applied to local regions. We propose a new statistic, D+, that leverages both shared ancestral and derived alleles to infer local introgressed regions. Incorporating both shared derived and ancestral alleles increases the number of informative sites per region, improving our ability to identify local introgression. We use a coalescent framework to derive the expected value of this statistic as a function of different demographic parameters under an instantaneous admixture model and use coalescent simulations to compute the power and precision of D+. While the power of D and D+ is comparable, D+ has better precision than D. We apply D+ to empirical data from the 1000 Genomes Project and Heliconius butterflies to infer local targets of introgression in humans and in butterflies.
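The contrast between D and D+ described above can be illustrated with a minimal Python sketch. It assumes one sampled chromosome per population in the order (P1, P2, P3, outgroup), alleles coded 0 for ancestral and 1 for derived, and the usual site-pattern definitions: D uses only ABBA and BABA sites, while D+ also counts BAAA and ABAA sites, where P3 shares the ancestral allele with P2 or P1. The function names and input layout are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

# Site patterns in the order (P1, P2, P3, outgroup); 0 = ancestral, 1 = derived.
# These labels are assumptions made for this sketch.
PATTERNS = {
    (0, 1, 1, 0): "ABBA",  # P2 and P3 share the derived allele
    (1, 0, 1, 0): "BABA",  # P1 and P3 share the derived allele
    (1, 0, 0, 0): "BAAA",  # P2 and P3 share the ancestral allele
    (0, 1, 0, 0): "ABAA",  # P1 and P3 share the ancestral allele
}


def site_pattern_counts(sites):
    """Count informative site patterns in a region."""
    counts = Counter({name: 0 for name in PATTERNS.values()})
    for site in sites:
        name = PATTERNS.get(tuple(site))
        if name is not None:
            counts[name] += 1
    return counts


def d_statistic(counts):
    """Patterson's D: excess of shared derived alleles only."""
    denom = counts["ABBA"] + counts["BABA"]
    if denom == 0:
        return math.nan
    return (counts["ABBA"] - counts["BABA"]) / denom


def d_plus(counts):
    """D+: shared derived plus shared ancestral alleles."""
    num = (counts["ABBA"] - counts["BABA"]) + (counts["BAAA"] - counts["ABAA"])
    denom = counts["ABBA"] + counts["BABA"] + counts["BAAA"] + counts["ABAA"]
    if denom == 0:
        return math.nan
    return num / denom
```

In a short window, ABBA and BABA sites can be very sparse, so the denominator of D is small and the estimate is noisy; adding BAAA and ABAA sites enlarges the denominator, which is the intuition behind the improved precision reported for local regions.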
Subject(s)
Butterflies , Humans , Animals , Butterflies/genetics , Genome , Biological Evolution
ABSTRACT
Demographic models of Latin American populations often fail to fully capture their complex evolutionary history, which has been shaped by both recent admixture and deeper-in-time demographic events. To address this gap, we used high-coverage whole-genome data from Indigenous American ancestries in present-day Mexico and existing genomes from across Latin America to infer multiple demographic models that capture the impact of different timescales on genetic diversity. Our approach, which combines analyses of allele frequencies and ancestry tract length distributions, represents a significant improvement over current models in predicting patterns of genetic variation in admixed Latin American populations. We jointly modeled the contribution of European, African, East Asian, and Indigenous American ancestries into present-day Latin American populations. We infer that the ancestors of Indigenous Americans and East Asians diverged ~30 thousand years ago, and we characterize genetic contributions of recent migrations from East and Southeast Asia to Peru and Mexico. Our inferred demographic histories are consistent across different genomic regions and annotations, suggesting that our inferences are robust to the potential effects of linked selection. In conjunction with published distributions of fitness effects for new nonsynonymous mutations in humans, we show in large-scale simulations that our models recover important features of both neutral and deleterious variation. By providing a more realistic framework for understanding the evolutionary history of Latin American populations, our models can help address the historical under-representation of admixed groups in genomics research and can be a valuable resource for future studies of populations with complex admixture and demographic histories.
Subject(s)
Genetics, Population , Genome, Human , Humans , Latin America , Genome, Human/genetics , Demography , White
ABSTRACT
Multispecies interbreeding networks, or syngameons, have been increasingly reported in natural systems. However, the formation, structure, and maintenance of syngameons have received little attention. Through gene flow, syngameons can increase genetic diversity, facilitate the colonization of new environments, and contribute to hybrid speciation. In this study, we evaluated the history, patterns, and consequences of hybridization in a pinyon pine syngameon using morphological and genomic data to assess genetic structure, demographic history, and geographic and climatic data to determine niche differentiation. We demonstrated that Pinus edulis, a dominant species in the Southwestern US and a barometer of climate change, is a core participant in the syngameon, involved in the formation of two drought-adapted hybrid lineages including the parapatric and taxonomically controversial fallax-type. We found that species remain morphologically and genetically distinct at range cores, maintaining species boundaries while undergoing extensive gene flow in areas of sympatry at range peripheries. Our study shows that sequential hybridization may have caused relatively rapid speciation and facilitated the colonization of different niches, resulting in the rapid formation of two new lineages. Participation in the syngameon may allow adaptive traits to be introgressed across species barriers and provide the changes needed to survive future climate scenarios.
Subject(s)
Hybridization, Genetic , Pinus , Humans , Nucleic Acid Hybridization , Gene Flow , Genomics , Pinus/genetics
ABSTRACT
The grey wolf (Canis lupus) is one of the most widely distributed mammals in which a variety of distinct populations have been described. However, given their currently fragmented distribution and recent history of human-induced population decline, little is known about the events that led to their differentiation. Based on the analysis of whole canid genomes, we examined the divergence times between Southern European wolf populations and their ancient demographic history. We found that all present-day Eurasian wolves share a common ancestor ca 36 000 years ago, supporting the hypothesis that all extant wolves derive from a single population that subsequently expanded after the Last Glacial Maximum. We also estimated that the currently isolated European populations of the Iberian Peninsula, Italy and the Dinarics-Balkans diverged very closely in time, ca 10 500 years ago, and maintained negligible gene flow ever since. This indicates that the current genetic and morphological distinctiveness of Iberian and Italian wolves can be attributed to their isolation dating back to the end of the Pleistocene, predating the recent human-induced extinction of wolves in Central Europe by several millennia.
Subject(s)
Genetics, Population , Wolves/genetics , Animals , Europe , Gene Flow , Genome
ABSTRACT
The gray wolf (Canis lupus) is a widely distributed top predator and ancestor of the domestic dog. To address questions about wolf relationships to each other and dogs, we assembled and analyzed a data set of 34 canine genomes. The divergence between New and Old World wolves is the earliest branching event and is followed by the divergence of Old World wolves and dogs, confirming that the dog was domesticated in the Old World. However, no single wolf population is more closely related to dogs, supporting the hypothesis that dogs were derived from an extinct wolf population. All extant wolves have a surprisingly recent common ancestry and experienced a dramatic population decline beginning at least ~30 thousand years ago (kya). We suggest this crisis was related to the colonization of Eurasia by modern human hunter-gatherers, who competed with wolves for limited prey but also domesticated them, leading to a compensatory population expansion of dogs. We found extensive admixture between dogs and wolves, with up to 25% of Eurasian wolf genomes showing signs of dog ancestry. Dogs have influenced the recent history of wolves through admixture and vice versa, potentially enhancing adaptation. Simple scenarios of dog domestication are confounded by admixture, and studies that do not take admixture into account with specific demographic models are problematic.
Subject(s)
Dogs/genetics , Wolves/genetics , Animals , Bayes Theorem , DNA, Mitochondrial/genetics , Female , Genome , Hybridization, Genetic , Male , Markov Chains , Models, Genetic , Phylogeny , Polymorphism, Single Nucleotide , Principal Component Analysis , Sequence Analysis, DNA
ABSTRACT
The increasing abundance of DNA sequences obtained from fossils calls for new population genetics theory that takes account of both the temporal and spatial separation of samples. Here, we exploit the relationship between Wright's FST and average coalescence times to develop an analytic theory describing how FST depends on both the distance and time separating pairs of sampled genomes. We apply this theory to several simple models of population history. If there is a time series of samples, partial population replacement creates a discontinuity in pairwise FST values. The magnitude of the discontinuity depends on the extent of replacement. In stepping-stone models, pairwise FST values between archaic and present-day samples reflect both the spatial and temporal separation. At long distances, an isolation by distance pattern dominates. At short distances, the time separation dominates. Analytic predictions fit patterns generated by simulations. We illustrate our results with applications to archaic samples from European human populations. We compare present-day samples with a pair of archaic samples taken before and after a replacement event.
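The relationship between FST and average coalescence times that this abstract builds on can be sketched in a few lines of Python. The sketch assumes Slatkin's low-mutation approximation for a pair of samples, F_ST = (t_between - t_within) / (t_between + t_within), and takes the within-sample time as the mean of the two within-sample coalescence times. The function name and arguments are illustrative assumptions, and the abstract itself does not state this formula.

```python
def pairwise_fst(t_within_a, t_within_b, t_between):
    """Pairwise F_ST from mean coalescence times (low-mutation approximation).

    t_within_a, t_within_b: mean coalescence time of two lineages drawn
        from the same sample (sample A or sample B).
    t_between: mean coalescence time of one lineage from A and one from B.
    All times must be in the same units, for example generations.
    """
    t_within = 0.5 * (t_within_a + t_within_b)
    return (t_between - t_within) / (t_between + t_within)


# Worked example under an assumed model: two isolated diploid populations
# of size N that split T generations ago. Within-population pairs coalesce
# after 2N generations on average, between-population pairs after T + 2N.
N, T = 1000, 4000
example_fst = pairwise_fst(2 * N, 2 * N, T + 2 * N)  # equals T / (T + 4N)
```

For samples taken at different times, the time separating them is added to the between-sample coalescence time. This is why FST rises with temporal as well as spatial separation.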
Subject(s)
DNA, Ancient/analysis , Genetics, Population/history , Genome , Fossils/history , History, Ancient , Models, Genetic
ABSTRACT
Population bottlenecks, inbreeding, and artificial selection can all, in principle, influence levels of deleterious genetic variation. However, the relative importance of each of these effects on genome-wide patterns of deleterious variation remains controversial. Domestic and wild canids offer a powerful system to address the role of these factors in influencing deleterious variation because their history is dominated by known bottlenecks and intense artificial selection. Here, we assess genome-wide patterns of deleterious variation in 90 whole-genome sequences from breed dogs, village dogs, and gray wolves. We find that the ratio of amino acid changing heterozygosity to silent heterozygosity is higher in dogs than in wolves and, on average, dogs have 2-3% higher genetic load than gray wolves. Multiple lines of evidence indicate this pattern is driven by less efficient natural selection due to bottlenecks associated with domestication and breed formation, rather than recent inbreeding. Further, we find regions of the genome implicated in selective sweeps are enriched for amino acid changing variants and Mendelian disease genes. To our knowledge, these results provide the first quantitative estimates of the increased burden of deleterious variants directly associated with domestication and have important implications for selective breeding programs and the conservation of rare and endangered species. Specifically, they highlight the costs associated with selective breeding and question the practice favoring the breeding of individuals that best fit breed standards. Our results also suggest that maintaining a large population size, rather than just avoiding inbreeding, is a critical factor for preventing the accumulation of deleterious variants.
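The ratio of amino acid changing heterozygosity to silent heterozygosity mentioned above can be computed per individual as in the following rough sketch. It assumes diploid genotypes given as allele pairs with None for missing calls, and it assumes that sites have already been annotated as nonsynonymous or synonymous. The data layout and function names are hypothetical and do not describe the study's pipeline.

```python
import math


def heterozygosity(genotypes):
    """Fraction of called diploid genotypes that are heterozygous.

    genotypes: list of (allele1, allele2) tuples, or None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return math.nan
    return sum(a != b for a, b in called) / len(called)


def het_ratio(nonsyn_genotypes, syn_genotypes):
    """Ratio of nonsynonymous to synonymous heterozygosity for one individual."""
    het_syn = heterozygosity(syn_genotypes)
    if math.isnan(het_syn) or het_syn == 0:
        return math.nan
    return heterozygosity(nonsyn_genotypes) / het_syn
```

Because both classes of sites share the same demographic history, dividing by synonymous heterozygosity roughly normalizes for overall diversity, so a higher ratio points to less efficient removal of amino acid changing variants.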
Subject(s)
Animals, Domestic/genetics , Datasets as Topic , Dog Diseases/genetics , Dogs/genetics , Genetic Variation , Selective Breeding/genetics , Animals , Endangered Species , Genome/genetics , Heterozygote , Inbreeding , Population Density , Selection, Genetic , Wolves/genetics
ABSTRACT
Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, a finding supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlap between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers.
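One generic way to obtain a demography-aware false discovery rate, as described above, is to compare the observed scan statistics with statistics from neutral simulations under the inferred demographic model. The Python sketch below shows that calculation. The estimator, the function name, and the inputs are assumptions made for illustration, and the procedure used in the study may differ.

```python
def empirical_fdr(observed_stats, null_stats, threshold):
    """Estimate the FDR among observed regions scoring at or above `threshold`.

    observed_stats: selection-scan statistic for each region in the real data.
    null_stats: the same statistic from neutral simulations under the
        inferred demographic model (assumed to be supplied by the caller).
    """
    n_outliers = sum(s >= threshold for s in observed_stats)
    if n_outliers == 0:
        return 0.0
    # Fraction of neutral regions that would be called outliers by chance.
    null_fraction = sum(s >= threshold for s in null_stats) / len(null_stats)
    expected_false = null_fraction * len(observed_stats)
    return min(1.0, expected_false / n_outliers)
```

An empirical outlier approach simply takes a fixed top percentile of the observed distribution, so it reports the same number of outliers regardless of how many such values demography alone can produce.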
Subject(s)
Genetics, Population , Genomics , Lipid Metabolism/genetics , Selection, Genetic , Animals , Demography , Dogs , Genome , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Homoplasy affects demographic inference estimates. This effect has been recognized and corrective methods have been developed. However, no studies so far have defined what homoplasy metrics best describe the effects on demographic inference, or have attempted to estimate such metrics in real data. Here we study how homoplasy in chloroplast microsatellites (cpSSR) affects inference of population expansion time. cpSSRs are popular markers for inferring historical demography in plants due to their high mutation rate and limited recombination. RESULTS: In cpSSRs, homoplasy is usually quantified as the probability that two markers or haplotypes that are identical by state are not identical by descent (Homoplasy index, P). Here we propose a new measure of multi-locus homoplasy in linked SSR called Distance Homoplasy (DH), which measures the proportion of pairwise differences not observed due to homoplasy, and we compare it to P and its per cpSSR locus average, which we call Mean Size Homoplasy (MSH). We use simulations and analytical derivations to show that, out of the three homoplasy metrics analyzed, MSH and DH are more correlated to changes in the population expansion time and to the underestimation of that demographic parameter using cpSSR. We perform simulations to show that Approximate Bayesian Computation (ABC) can be used to obtain reasonable estimates of MSH and DH. Finally, we use ABC to estimate the expansion time, MSH and DH from a chloroplast SSR dataset in Pinus caribaea. To our knowledge, this is the first time that homoplasy has been estimated in population genetic data. CONCLUSIONS: We show that MSH and DH should be used to quantify how homoplasy affects estimates of population expansion time. We also demonstrate how ABC provides a methodology to estimate homoplasy in population genetic data.
Subject(s)
Chloroplasts/genetics , Microsatellite Repeats , Pinus/genetics , Bayes Theorem , Central America , Computer Simulation , Genetics, Population , Haplotypes , Models, Genetic , Pinus/classification
ABSTRACT
The Poisson Random Field (PRF) model has become an important tool in population genetics to study weakly deleterious genetic variation under complicated demographic scenarios. Currently, there are no freely available software applications that allow simulation of genetic variation data under this model. Here we present PReFerSim, an ANSI C program that performs forward simulations under the PRF model. PReFerSim models changes in population size, arbitrary amounts of inbreeding, dominance and distributions of selective effects. Users can track summaries of genetic variation over time and output trajectories of selected alleles. AVAILABILITY AND IMPLEMENTATION: PReFerSim is freely available at: https://github.com/LohmuellerLab/PReFerSim CONTACT: klohmueller@ucla.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genetics, Population , Software , Computer Simulation , Consanguinity , Demography , Humans
ABSTRACT
To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves, one from each of the three putative centers of dog domestication, two basal dog lineages (Basenji and Dingo) and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than estimated previously. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11-16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation for amylase copy number variation in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds, suggesting the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs, and, instead, the sampled wolves form a sister monophyletic clade.
This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.
Subject(s)
Amylases/genetics , Animals, Domestic/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Animals , DNA, Mitochondrial/genetics , Diet , Dogs , Genetic Variation , Phylogeny , Population Density , Wolves/classification , Wolves/genetics
ABSTRACT
The prediction of phenotypes from ancient humans has gained interest due to its potential to investigate the evolution of complex traits. These predictions are commonly performed using polygenic scores computed with DNA information from ancient humans along with genome-wide association studies (GWAS) data from present-day humans. However, numerous evolutionary processes could impact the prediction of phenotypes from ancient humans based on polygenic scores. In this work we investigate how natural selection impacts phenotypic predictions on ancient individuals using polygenic scores. We use simulations of an additive trait to analyze how natural selection impacts phenotypic predictions with polygenic scores. We simulate a trait evolving under neutrality, stabilizing selection and directional selection. We find that stabilizing and directional selection have contrasting effects on ancient phenotypic predictions. Stabilizing selection accelerates the loss of large-effect alleles contributing to trait variation. Conversely, directional selection accelerates the loss of small and large-effect alleles that drive individuals farther away from the optimal phenotypic value. These effects result in specific shared genetic variation patterns between ancient and modern populations which hamper the accuracy of polygenic scores to predict phenotypes. Furthermore, we conducted simulations that include realistic strengths of stabilizing selection and heritability estimates to show how natural selection could impact the predictive accuracy of ancient polygenic scores for two widely studied traits: height and body mass index. We emphasize the importance of considering how natural selection can decrease the reliability of ancient polygenic scores to perform phenotypic predictions on an ancient population.
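For an additive trait, the polygenic score discussed above is a weighted sum of effect-allele counts, as in the minimal Python sketch below. The variant identifiers, dictionary layout, and function name are hypothetical. The sketch also assumes that only variants with a present-day GWAS effect estimate can contribute to the score.

```python
def polygenic_score(genotypes, gwas_effects):
    """Additive polygenic score for one individual.

    genotypes: dict mapping variant id -> effect-allele count (0, 1 or 2)
        in the ancient individual.
    gwas_effects: dict mapping variant id -> per-allele effect size
        estimated in a present-day GWAS.
    Variants absent from the GWAS contribute nothing to the score.
    """
    shared = genotypes.keys() & gwas_effects.keys()
    return sum(gwas_effects[v] * genotypes[v] for v in shared)


# Hypothetical example: variant "v3" segregated in the ancient population
# but has no present-day effect estimate, so it is ignored.
ancient = {"v1": 2, "v2": 1, "v3": 2}
effects = {"v1": 0.5, "v2": -0.25}
score = polygenic_score(ancient, effects)
```

The example shows one reason accuracy can drop: causal alleles that selection removed before the present day have no GWAS effect estimate, so their contribution to the ancient phenotype is missing from the score.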
ABSTRACT
The ancestral recombination graph (ARG) is a structure that represents the history of coalescent and recombination events connecting a set of sequences (Hudson RR. In: Futuyma D, Antonovics J, editors. Gene genealogies and the coalescent process. In: Oxford Surveys in Evolutionary Biology; 1991. p. 1 to 44.). The full ARG can be represented as a set of genealogical trees at every locus in the genome, annotated with recombination events that change the topology of the trees between adjacent loci and the mutations that occurred along the branches of those trees (Griffiths RC, Marjoram P. An ancestral recombination graph. In: Donnelly P, Tavare S, editors. Progress in population genetics and human evolution. Springer; 1997. p. 257 to 270.). Valuable insights can be gained into past evolutionary processes, such as demographic events or the influence of natural selection, by studying the ARG. It is regarded as the "holy grail" of population genetics (Hubisz M, Siepel A. Inference of ancestral recombination graphs using ARGweaver. In: Dutheil JY, editors. Statistical population genomics. New York, NY: Springer US; 2020. p. 231-266.) since it encodes the processes that generate all patterns of allelic and haplotypic variation from which all commonly used summary statistics in population genetic research (e.g. heterozygosity and linkage disequilibrium) can be derived. Many previous evolutionary inferences relied on summary statistics extracted from the genotype matrix. Evolutionary inferences using the ARG represent a significant advancement as the ARG is a representation of the evolutionary history of a sample that shows the past history of recombination, coalescence, and mutation events across a particular sequence. This representation in theory contains as much information, if not more, than the combination of all independent summary statistics that could be derived from the genotype matrix. 
Consistent with this idea, some of the first ARG-based analyses have proven to be more powerful than summary statistic-based analyses (Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nat Genet. 2019:51(9):1321 to 1329.; Stern AJ, Wilton PR, Nielsen R. An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data. PLoS Genet. 2019:15(9):e1008384.; Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLoS Genet. 2020:16(8):e1008895.; Fan C, Mancuso N, Chiang CWK. A genealogical estimate of genetic relationships. Am J Hum Genet. 2022:109(5):812-824.; Fan C, Cahoon JL, Dinh BL, Ortega-Del Vecchyo D, Huber C, Edge MD, Mancuso N, Chiang CWK. A likelihood-based framework for demographic inference from genealogical trees. bioRxiv. 2023.10.10.561787. 2023.; Hejase HA, Mo Z, Campagna L, Siepel A. A deep-learning approach for inference of selective sweeps from the ancestral recombination graph. Mol Biol Evol. 2022:39(1):msab332.; Link V, Schraiber JG, Fan C, Dinh B, Mancuso N, Chiang CWK, Edge MD. Tree-based QTL mapping with expected local genetic relatedness matrices. bioRxiv. 2023.04.07.536093. 2023.; Zhang BC, Biddanda A, Gunnarsson ÁF, Cooper F, Palamara PF. Biobank-scale inference of ancestral recombination graphs enables genealogical analysis of complex traits. Nat Genet. 2023:55(5):768-776.). As such, there has been significant interest in the field to investigate 2 main problems related to the ARG: (i) How can we estimate the ARG based on genomic data, and (ii) how can we extract information of past evolutionary processes from the ARG? 
In this perspective, we highlight 3 topics that pertain to these main issues: The development of computational innovations that enable the estimation of the ARG; remaining challenges in estimating the ARG; and methodological advances for deducing evolutionary forces and mechanisms using the ARG. This perspective serves to introduce the readers to the types of questions that can be explored using the ARG and to highlight some of the most pressing issues that must be addressed in order to make ARG-based inference an indispensable tool for evolutionary research.
Subject(s)
Algorithms , Recombination, Genetic , Humans , Likelihood Functions , Chromosome Mapping , Mutation , Models, Genetic
ABSTRACT
Elucidating phylogenetic relationships and species boundaries within complex taxonomic groups is challenging for intrinsic and extrinsic (i.e., technical) reasons. Mexican pinyon pines are a complex group whose phylogenetic relationships and species boundaries have been widely studied but poorly resolved, partly due to intrinsic ecological and evolutionary features such as low morphological and genetic differentiation caused by recent divergence, hybridization and introgression. Extrinsic factors such as limited sampling and difficulty in selecting informative molecular markers have also impeded progress. Some of the Mexican pinyon pines are of conservation concern but others may remain unprotected because the species boundaries have not been established. In this study we combined approaches to resolve the phylogenetic relationships in this complex group and to establish species boundaries in four recently diverged taxa: P. discolor, P. johannis, P. culminicola and P. cembroides. We performed phylogenetic analyses using the chloroplast markers matK and psbA-trnH as well as complete and partial chloroplast genomes of species of Subsection Cembroides. Additionally, we performed a phylogeographic analysis combining genetic data (18 chloroplast markers), morphological data and geographical data to define species boundaries in four recently diverged taxa. Ecological divergence was supported by differences in climate among localities for distinct genetic lineages. Whereas the phylogenetic analysis inferred with matK and psbA-trnH was unable to resolve the relationships in this complex group, we obtained a resolved phylogeny with the use of the chloroplast genomes. The resolved phylogeny was concordant with a haplotype network obtained using chloroplast markers. 
In species with potential for recent divergence, hybridization or introgression, nonhierarchical network-based approaches are probably more appropriate to protect against misclassification due to incomplete lineage sorting. The boundaries among genetic lineages were delimited by the inclusion of morphological, geographical and ecological data in the haplotype network. These multiple lines of evidence help to assign species boundaries in this complex group. P. johannis, P. discolor, P. culminicola and P. cembroides are different species based on their genetic, morphological and ecological niche differences. We suggest a reevaluation of the conservation status of these species considering the information generated in this study.
Subject(s)
Evolution, Molecular , Phylogeny , Pinus/classification , Bayes Theorem , Conservation of Natural Resources , DNA, Chloroplast/genetics , DNA, Plant/genetics , Genome, Chloroplast , Haplotypes , Mexico , Microsatellite Repeats , Models, Genetic , Phylogeography , Pinus/genetics , Sequence Analysis, DNA
ABSTRACT
The demographic history of a population drives the pattern of genetic variation and is encoded in the gene-genealogical trees of the sampled alleles. However, existing methods to infer demographic history from genetic data tend to use relatively low-dimensional summaries of the genealogy, such as allele frequency spectra. As a step toward capturing more of the information encoded in the genome-wide sequence of genealogical trees, here we propose a novel framework called the genealogical likelihood (gLike), which derives the full likelihood of a genealogical tree under any hypothesized demographic history. Employing a graph-based structure, gLike summarizes across independent trees the relationships among all lineages in a tree with all possible trajectories of population memberships through time and efficiently computes the exact marginal probability under a parameterized demographic model. Through extensive simulations and empirical applications on populations that have experienced multiple admixtures, we showed that gLike can accurately estimate dozens of demographic parameters when the true genealogy is known, including ancestral population sizes, admixture timing, and admixture proportions. Moreover, when using genealogical trees inferred from genetic data, we showed that gLike outperformed conventional demographic inference methods that leverage only the allele-frequency spectrum and yielded parameter estimates that align with established historical knowledge of the past demographic histories for populations like Latino Americans and Native Hawaiians. Furthermore, our framework can trace ancestral histories by analyzing a sample from the admixed population without proxies for its source populations, removing the need to sample ancestral populations that may no longer exist. 
Taken together, our proposed gLike framework harnesses underutilized genealogical information to offer exceptional sensitivity and accuracy in inferring complex demographies for humans and other species, particularly as estimation of genome-wide genealogies improves.
ABSTRACT
Recent genome sequencing studies with large sample sizes in humans have discovered a vast quantity of low-frequency variants, providing an important source of information to analyze how selection is acting on human genetic variation. In order to estimate the strength of natural selection acting on low-frequency variants, we have developed a likelihood-based method that uses the lengths of pairwise identity-by-state between haplotypes carrying low-frequency variants. We show that in some nonequilibrium populations (such as those that have had recent population expansions) it is possible to distinguish between positive or negative selection acting on a set of variants. With our new framework, one can infer a fixed selection intensity acting on a set of variants at a particular frequency, or a distribution of selection coefficients for standing variants and new mutations. We show an application of our method to the UK10K phased haplotype dataset of individuals.
Subject(s)
Models, Genetic , Selection, Genetic , Chromosome Mapping , Haplotypes , Humans , Likelihood Functions , Sample Size
ABSTRACT
Climate changes, together with geographical barriers imposed by the Sierra Madre Oriental and the Chihuahuan Desert, have shaped the genetic diversity and spatial distribution of different species in northern Mexico. Pinus pinceana Gordon & Glend. tolerates extremely arid conditions. Northern Mexico became more arid during the Quaternary, modifying ecological communities. Here, we try to identify the processes underlying the demographic history of P. pinceana and characterize its genetic diversity using 3100 SNPs from genotyping by sequencing 90 adult individuals from 10 natural populations covering the species' entire geographic distribution. We inferred its population history and contrasted possible demographic scenarios of divergence that modeled the genetic diversity present in this restricted pinyon pine; in support, the past distribution was reconstructed using climate from the Last Glacial Maximum (LGM, 22 kya). We inferred that P. pinceana diverged into two lineages ~2.49 Ma (95% CI 3.28-1.62), colonizing two regions: the Sierra Madre Oriental (SMO) and the Chihuahuan Desert (ChD). Our results of population genomic analyses reveal the presence of heterozygous SNPs in all populations. In addition, low migration rates across regions are probably related to glacial-interglacial cycles, followed by the gradual aridification of the Chihuahuan Desert during the Holocene.
ABSTRACT
The spatial distribution of genetic variants is jointly determined by geography, past demographic processes, natural selection, and its interplay with environmental variation. A fraction of these genetic variants are "causal alleles" that affect the manifestation of a complex trait. The effect exerted by these causal alleles on complex traits can be independent or dependent on the environment. Understanding the evolutionary processes that shape the spatial structure of causal alleles is key to comprehend the spatial distribution of complex traits. Natural selection, past population size changes, range expansions, consanguinity, assortative mating, archaic introgression, admixture, and the environment can alter the frequencies, effect sizes, and heterozygosities of causal alleles. This provides a genetic axis along which complex traits can vary. However, complex traits also vary along biogeographical and sociocultural axes which are often correlated with genetic axes in complex ways. The purpose of this review is to consider these genetic and environmental axes in concert and examine the ways they can help us decipher the variation in complex traits that is visible in humans today. This initiative necessarily implies a discussion of populations, traits, the ability to infer and interpret "genetic" components of complex traits, and how these have been impacted by adaptive events. In this review, we provide a history-aware discussion on these topics using both the recent and more distant past of our academic discipline and its relevant contexts.
Subject(s)
Genetic Variation , Selection, Genetic , Alleles , Geography , Humans , Phenotype
ABSTRACT
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.