ABSTRACT
The infinitesimal model of quantitative genetics relies on the Central Limit Theorem to stipulate that under additive models of quantitative traits determined by many loci having similar effect size, the difference between an offspring's genetic trait component and the average of their two parents' genetic trait components is Normally distributed and independent of the parents' values. Here, we investigate how the assumption of similar effect sizes affects the model: if, alternatively, the tail of the effect size distribution is polynomial with exponent α<2, then a different Central Limit Theorem implies that sums of effects should be well-approximated by a "stable distribution", for which single large effects are often still important. Empirically, we first find tail exponents between 1 and 2 in effect sizes estimated by genome-wide association studies of many human disease-related traits. We then show that the independence of offspring trait deviations from parental averages in many cases implies the Gaussian aspect of the infinitesimal model, suggesting that non-Gaussian models of trait evolution must explicitly track the underlying genetics, at least for loci of large effect. We also characterize possible limiting trait distributions of the infinitesimal model with infinitely divisible noise distributions, and compare our results to simulations.
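As a rough illustration of why heavy tails matter here, the sketch below compares how much of a summed trait value comes from the single largest effect when effect sizes have tail exponent α = 1.5 versus α = 3. The Pareto form of the effect-size distribution, the parameter values, the restriction to positive effect magnitudes, and the function names are all assumptions made for illustration; none of them are taken from the paper.

```python
import random


def largest_effect_share(alpha, n_loci, rng):
    """Fraction of the summed effect sizes contributed by the single largest locus."""
    # Hypothetical effect sizes: Pareto-distributed magnitudes with tail exponent alpha.
    effects = [rng.paretovariate(alpha) for _ in range(n_loci)]
    return max(effects) / sum(effects)


def mean_share(alpha, n_loci=1000, n_reps=200, seed=1):
    """Average the largest-locus share over independent replicate genomes."""
    rng = random.Random(seed)
    total = sum(largest_effect_share(alpha, n_loci, rng) for _ in range(n_reps))
    return total / n_reps


heavy = mean_share(alpha=1.5)  # tail exponent below 2: stable-law regime
light = mean_share(alpha=3.0)  # finite variance: Gaussian regime
print(f"mean share of largest effect, alpha=1.5: {heavy:.4f}")
print(f"mean share of largest effect, alpha=3.0: {light:.4f}")
```

Under these assumptions the largest locus should account for a visibly larger share of the total when α < 2, which is the sense in which single large effects remain important.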
Subjects
Genome-Wide Association Study; Models, Genetic; Humans; Normal Distribution; Phenotype
ABSTRACT
Speciation genomic studies aim to interpret patterns of genome-wide variation in light of the processes that give rise to new species. However, interpreting the genomic "landscape" of speciation is difficult, because many evolutionary processes can impact levels of variation. Facilitated by the first chromosome-level assembly for the group, we use whole-genome sequencing and simulations to shed light on the processes that have shaped the genomic landscape during a radiation of monkeyflowers. After inferring the phylogenetic relationships among the 9 taxa in this radiation, we show that highly similar diversity (π) and differentiation (FST) landscapes have emerged across the group. Variation in these landscapes was strongly predicted by the local density of functional elements and the recombination rate, suggesting that the landscapes have been shaped by widespread natural selection. Using the varying divergence times between pairs of taxa, we show that the correlations between FST and genome features arose almost immediately after a population split and have become stronger over time. Simulations of genomic landscape evolution suggest that background selection (BGS; i.e., selection against deleterious mutations) alone is too subtle to generate the observed patterns, but scenarios that involve positive selection and genetic incompatibilities are plausible alternative explanations. Finally, tests for introgression among these taxa reveal widespread evidence of heterogeneous selection against gene flow during this radiation. Combined with previous evidence for adaptation in this system, we conclude that the correlation in FST among these taxa informs us about the processes contributing to adaptation and speciation during a rapid radiation.
Subjects
Gene Flow; Genetic Variation; Genome, Plant/genetics; Genomics/methods; Mimulus/genetics; Selection, Genetic; Adaptation, Physiological/genetics; Genetic Speciation; Genetics, Population/methods; Mimulus/classification; Phylogeny
ABSTRACT
Inference with population genetic data usually treats the population pedigree as a nuisance parameter, the unobserved product of a past history of random mating. However, the history of genetic relationships in a given population is a fixed, unobserved object, and so an alternative approach is to treat this network of relationships as a complex object we wish to learn about, by observing how genomes have been noisily passed down through it. This paper explores this point of view, showing how to translate questions about population genetic data into calculations with a Poisson process of mutations on all ancestral genomes. This method is applied to give a robust interpretation to the f4 statistic used to identify admixture, and to design a new statistic that measures covariances in mean times to most recent common ancestor between two pairs of sequences. The method more generally interprets population genetic statistics in terms of sums of specific functions over ancestral genomes, thereby providing concrete, broadly applicable interpretations for these statistics. This provides a method for describing demographic history without simplified demographic models. More generally, it brings into focus the population pedigree, which is averaged over in model-based demographic inference.
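For orientation, the f4 statistic is commonly estimated from allele frequencies as the average over sites of (p_A − p_B)(p_C − p_D) for four populations A, B, C, D. The sketch below uses that standard allele-frequency form, not the pedigree-based calculation developed in the paper; the function name and the toy frequencies are hypothetical.

```python
def f4(p_a, p_b, p_c, p_d):
    """Average over sites of (p_A - p_B) * (p_C - p_D), given per-site allele frequencies."""
    n_sites = len(p_a)
    if n_sites == 0 or not (len(p_b) == len(p_c) == len(p_d) == n_sites):
        raise ValueError("all four frequency lists must be non-empty and equally long")
    products = ((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d))
    return sum(products) / n_sites


# Toy data (made up): frequency differences A-B and C-D never co-occur at a site.
uncorrelated = f4([0.1, 0.5], [0.3, 0.5], [0.2, 0.4], [0.2, 0.8])

# Toy data (made up): A-B and C-D differ in the same direction at every site.
correlated = f4([0.1, 0.5], [0.3, 0.9], [0.2, 0.4], [0.4, 0.8])

print(uncorrelated, correlated)
```

A value near zero is what one expects when the drift separating A from B is independent of the drift separating C from D; a clearly nonzero value is the usual signal taken to suggest admixture.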
Subjects
Demography; Genetics, Population; Algorithms; Genetic Variation; Humans; Models, Genetic; Pedigree; Poisson Distribution; Population Density
ABSTRACT
In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly 'simplify' a tree sequence by removing this irrelevant history for a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and overall large efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains in large, whole-genome simulations of one to two orders of magnitude. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation's entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. 
The software for recording and processing tree sequences is named tskit.
Subjects
Computational Biology/methods; Genetic Variation; Genetics, Population; Software; Algorithms; Computer Simulation; Gene Frequency; Genome; Genotype; Humans; Models, Genetic; Pedigree; Polymorphism, Genetic
ABSTRACT
Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
Subjects
Gene Flow/genetics; Gene Frequency; Genetics, Population; Bayes Theorem; Geography; Humans
ABSTRACT
Species often encounter, and adapt to, many patches of similar environmental conditions across their range. Such adaptation can occur through convergent evolution if different alleles arise in different patches, or through the spread of shared alleles by migration acting to synchronize adaptation across the species. The tension between the two reflects the constraint imposed on evolution by the underlying genetic architecture versus how effectively selection and geographic isolation act to inhibit the geographic spread of locally adapted alleles. This paper studies the balance between these two routes to adaptation in a model of continuous environments with patchy selection pressures. We address the following questions: How long does it take for a novel allele to appear in a patch where it is locally adapted through mutation? Or, through migration from another, already adapted patch? Which is more likely to occur, as a function of distance between the patches? What population genetic signal is left by the spread of migrant alleles? To answer these questions we examine the family structure underlying migration-selection equilibrium surrounding an already adapted patch, treating those rare families that reach new patches as spatial branching processes. A main result is that patches further apart than a critical distance will likely evolve independent locally adapted alleles; this distance is proportional to the spatial scale of selection ([Formula: see text], where σ is the dispersal distance and sm is the selective disadvantage of these alleles between patches), and depends linearly on log(sm/µ), where µ is the mutation rate. This provides a way to understand the role of geographic separation between patches in promoting convergent adaptation and the genomic signals it leaves behind. We illustrate these ideas using the convergent evolution of cryptic coloration in the rock pocket mouse, Chaetodipus intermedius, as an empirical example.
Subjects
Adaptation, Physiological/genetics; Evolution, Molecular; Mutation; Selection, Genetic
ABSTRACT
The extent to which populations experiencing shared selective pressures adapt through a shared genetic response is relevant to many questions in evolutionary biology. In this article, we explore how standing genetic variation contributes to convergent genetic responses in a geographically spread population. Geographically limited dispersal slows the spread of each selected allele, hence allowing other alleles to spread before any one comes to dominate the population. When selectively equivalent alleles meet, their progress is substantially slowed, dividing the species range into a random tessellation, which can be well understood by analogy to a Poisson process model of crystallization. In this framework, we derive the geographic scale over which an allele dominates and the proportion of adaptive alleles that arise from standing variation. Finally, we explore how negative pleiotropic effects of alleles can bias the subset of alleles that contribute to the species' adaptive response. We apply the results to the malaria-resistance glucose-6-phosphate dehydrogenase-deficiency alleles, where the large mutational target size makes it a likely candidate for adaptation from deleterious standing variation. Our results suggest that convergent adaptation may be common. Therefore, caution must be exercised when arguing that strongly geographically restricted alleles are the outcome of local adaptation. We close by discussing the implications of these results for ideas of species coherence and the nature of divergence between species.
Subjects
Adaptation, Physiological/genetics; Biological Evolution; Genetic Variation; Alleles; Disease Resistance; Glucosephosphate Dehydrogenase/genetics; Humans; Malaria/enzymology; Malaria/genetics; Models, Genetic; Mutation; Phylogeography; Selection, Genetic
ABSTRACT
Coevolution between two species can lead to exaggerated phenotypes that vary in a correlated manner across space. However, the conditions under which we expect such spatially varying coevolutionary patterns in polygenic traits are not well-understood. We investigate the coevolutionary dynamics between two species undergoing reciprocal adaptation across space and time, using simulations inspired by the Taricha newt - Thamnophis garter snake system. One striking observation from this system is that newts in some areas carry much more tetrodotoxin than in other areas, and garter snakes that live near more toxic newts tend to be more resistant to this toxin, a correlation seen across several broad geographic areas. Furthermore, snakes seem to be "winning" the coevolutionary arms race, i.e., having a high level of resistance compared to local newt toxicity, despite substantial variation in both toxicity and resistance across the range. We explore how possible genetic architectures of the toxin and resistance traits would affect the coevolutionary dynamics by manipulating both mutation rate and effect size of mutations across many simulations. We find that coevolutionary dynamics alone were not sufficient in our simulations to produce the striking mosaic of levels of toxicity and resistance observed in nature, but simulations with ecological heterogeneity (in trait costliness or interaction rate) did produce such patterns. We also find that in simulations, newts tend to "win" across most combinations of genetic architectures, although the species with higher mutational genetic variance tends to have an advantage.
ABSTRACT
The often tight association between parasites and their hosts means that under certain scenarios, the evolutionary histories of the two species can become closely coupled both through time and across space. Using spatial genetic inference, we identify a potential signal of common dispersal patterns in the Anopheles gambiae and Plasmodium falciparum host-parasite system as seen through a between-species correlation of the differences between geographic sampling location and geographic location predicted from the genome. This correlation may be due to coupled dispersal dynamics between host and parasite but may also reflect statistical artifacts due to uneven spatial distribution of sampling locations. Using continuous-space population genetics simulations, we investigate the degree to which uneven distribution of sampling locations leads to bias in prediction of spatial location from genetic data and implement methods to counter this effect. We demonstrate that while algorithmic bias presents a problem in inference from spatio-genetic data, the correlation structure between A. gambiae and P. falciparum predictions cannot be attributed to spatial bias alone and is thus likely a genetic signal of co-dispersal in a host-parasite system.
Subjects
Anopheles; Malaria, Falciparum; Parasites; Plasmodium; Animals; Parasites/genetics; Anopheles/genetics; Anopheles/parasitology; Host-Parasite Interactions/genetics; Plasmodium/genetics; Plasmodium falciparum/genetics; Geography
ABSTRACT
For at least the past 5 decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modeling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions is driven by positive selection. This study provides a framework for modeling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
Subjects
Genetic Variation; Hominidae; Animals; Selection, Genetic; Hominidae/genetics; Mutation; Genomics
ABSTRACT
A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic-based methods like ours complement other, direct methods for estimating past and present demography, and we believe will serve as valuable tools for applications in conservation, ecology and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.
Subjects
Genetics, Population; Neural Networks, Computer; Polymorphism, Single Nucleotide; Animals; Genetics, Population/methods; Wolves/genetics; Wolves/classification; Population Density; Demography/methods; Genotype
ABSTRACT
A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and barriers to dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetic-based methods like ours complement other, direct methods for estimating past and present demography, and we believe will serve as valuable tools for applications in conservation, ecology, and evolutionary biology. An open source software package implementing our method is available from https://github.com/kr-colab/mapNN.
ABSTRACT
Individual-based simulation has become an increasingly crucial tool for many fields of population biology. However, implementing realistic and stable simulations in continuous space presents a variety of difficulties, from modeling choices to computational efficiency. This paper aims to be a practical guide to spatial simulation, helping researchers to implement realistic and efficient spatial, individual-based simulations and avoid common pitfalls. To do this, we delve into mechanisms of mating, reproduction, density-dependent feedback, and dispersal, all of which may vary across the landscape, discuss how these affect population dynamics, and describe how to parameterize simulations in convenient ways (for instance, to achieve a desired population density). We also demonstrate how to implement these models using the current version of the individual-based simulator, SLiM. Since SLiM has the capacity to simulate genomes, we also discuss natural selection - in particular, how genetic variation can affect demographic processes. Finally, we provide four short vignettes: simulations of pikas that shift their range up a mountain as temperatures rise; mosquitoes that live in rivers as juveniles and experience seasonally changing habitat; cane toads that expand across Australia, reaching 120 million individuals; and monarch butterflies whose populations are regulated by an explicitly modeled resource (milkweed).
ABSTRACT
Classical ecological theory predicts that environmental stochasticity increases extinction risk by reducing the average per-capita growth rate of populations. For sedentary populations in a spatially homogeneous yet temporally variable environment, a simple model of population growth is a stochastic differential equation $dZ(t) = \mu Z(t)\,dt + \sigma Z(t)\,dW(t)$, $t \ge 0$, where the conditional law of $Z(t+\Delta t) - Z(t)$ given $Z(t) = z$ has mean and variance approximately $z\mu\Delta t$ and $z^2\sigma^2\Delta t$ when the time increment $\Delta t$ is small. The long-term stochastic growth rate $\lim_{t\to\infty} t^{-1}\log Z(t)$ for such a population equals $\mu - \sigma^2/2$. Most populations, however, experience spatial as well as temporal variability. To understand the interactive effects of environmental stochasticity, spatial heterogeneity, and dispersal on population growth, we study an analogous model $X(t) = (X_1(t), \ldots, X_n(t))$, $t \ge 0$, for the population abundances in $n$ patches: the conditional law of $X(t+\Delta t)$ given $X(t) = x$ is such that the conditional mean of $X_i(t+\Delta t) - X_i(t)$ is approximately $[x_i\mu_i + \sum_j (x_j D_{ji} - x_i D_{ij})]\Delta t$, where $\mu_i$ is the per capita growth rate in the $i$th patch and $D_{ij}$ is the dispersal rate from the $i$th patch to the $j$th patch, and the conditional covariance of $X_i(t+\Delta t) - X_i(t)$ and $X_j(t+\Delta t) - X_j(t)$ is approximately $x_i x_j \sigma_{ij}\Delta t$ for some covariance matrix $\Sigma = (\sigma_{ij})$. We show for such a spatially extended population that if $S(t) = X_1(t) + \cdots + X_n(t)$ denotes the total population abundance, then $Y(t) = X(t)/S(t)$, the vector of patch proportions, converges in law to a random vector $Y(\infty)$ as $t \to \infty$, and the stochastic growth rate $\lim_{t\to\infty} t^{-1}\log S(t)$ equals the space-time average per-capita growth rate $\sum_i \mu_i E[Y_i(\infty)]$ experienced by the population minus half of the space-time average temporal variation $E[\sum_{i,j} \sigma_{ij} Y_i(\infty) Y_j(\infty)]$ experienced by the population. 
Using this characterization of the stochastic growth rate, we derive an explicit expression for the stochastic growth rate for populations living in two patches, determine which choices of the dispersal matrix D produce the maximal stochastic growth rate for a freely dispersing population, derive an analytic approximation of the stochastic growth rate for dispersal limited populations, and use group theoretic techniques to approximate the stochastic growth rate for populations living in multi-scale landscapes (e.g. insects on plants in meadows on islands). Our results provide fundamental insights into "ideal free" movement in the face of uncertainty, the persistence of coupled sink populations, the evolution of dispersal rates, and the single large or several small (SLOSS) debate in conservation biology. For example, our analysis implies that even in the absence of density-dependent feedbacks, ideal-free dispersers occupy multiple patches in spatially heterogeneous environments provided environmental fluctuations are sufficiently strong and sufficiently weakly correlated across space. In contrast, for diffusively dispersing populations living in similar environments, intermediate dispersal rates maximize their stochastic growth rate.
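The single-patch result quoted in this abstract, that the long-term stochastic growth rate equals µ − σ²/2, can be checked numerically. The following is a minimal sketch that steps the equation dZ = µZ dt + σZ dW with the Euler-Maruyama scheme; the parameter values, step size, time horizon, seed, and function name are assumptions chosen for illustration and do not come from the paper.

```python
import math
import random


def simulated_growth_rate(mu, sigma, total_time=2000.0, dt=0.01, seed=42):
    """Estimate (1/T) * log(Z(T)/Z(0)) using Euler-Maruyama steps of dZ = mu*Z*dt + sigma*Z*dW."""
    rng = random.Random(seed)
    n_steps = int(round(total_time / dt))
    sqrt_dt = math.sqrt(dt)
    log_z = 0.0  # accumulate log Z(t) with Z(0) = 1, which avoids overflow and underflow
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)  # Brownian increment over one time step
        # One Euler-Maruyama step multiplies Z by (1 + mu*dt + sigma*dw).
        log_z += math.log(1.0 + mu * dt + sigma * dw)
    return log_z / (n_steps * dt)


mu, sigma = 0.1, 0.5
estimate = simulated_growth_rate(mu, sigma)
predicted = mu - sigma ** 2 / 2
print(f"simulated growth rate: {estimate:.4f}")
print(f"mu - sigma^2/2:        {predicted:.4f}")
```

With these assumed values the mean growth rate µ is positive while µ − σ²/2 is negative, so the example also illustrates the opening claim that environmental stochasticity can push a population toward decline even when its average per-capita growth rate is positive.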
Subjects
Ecosystem; Models, Biological; Population Growth; Animals; Endangered Species; Stochastic Processes
ABSTRACT
For at least the past five decades population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modelling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions is driven by positive selection. This study provides a framework for modelling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
ABSTRACT
The often tight association between parasites and their hosts means that under certain scenarios, the evolutionary histories of the two species can become closely coupled both through time and across space. Using spatial genetic inference, we identify a potential signal of common dispersal patterns in the Anopheles gambiae and Plasmodium falciparum host-parasite system as seen through a between-species correlation of the differences between geographic sampling location and geographic location predicted from the genome. This correlation may be due to coupled dispersal dynamics between host and parasite, but may also reflect statistical artifacts due to uneven spatial distribution of sampling locations. Using continuous-space population genetics simulations, we investigate the degree to which uneven distribution of sampling locations leads to bias in prediction of spatial location from genetic data and implement methods to counter this effect. We demonstrate that while algorithmic bias presents a problem in inference from spatio-genetic data, the correlation structure between A. gambiae and P. falciparum predictions cannot be attributed to spatial bias alone, and is thus likely a genetic signal of co-dispersal in a host-parasite system.
ABSTRACT
The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training (including population density, demographic history, habitat size, and sampling area) and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus, genetic tools like ours complement direct methods for improving our understanding of dispersal.
Subjects
Ecosystem; Genetics, Population; Humans; Population Density; Computer Simulation; Neural Networks, Computer; Genetic Variation
ABSTRACT
One of the goals of population genetics is to understand how evolutionary forces shape patterns of genetic variation over time. However, because populations evolve across both time and space, most evolutionary processes also have an important spatial component, acting through phenomena such as isolation by distance, local mate choice, or uneven distribution of resources. This spatial dimension is often neglected, partly due to the lack of tools specifically designed for building and evaluating complex spatio-temporal population genetic models. To address this methodological gap, we present a new framework for simulating spatially-explicit genomic data, implemented in a new R package called slendr (www.slendr.net), which leverages a SLiM simulation back-end script bundled with the package. With this framework, the users can programmatically and visually encode spatial population ranges and their temporal dynamics (i.e., population displacements, expansions, and contractions) either on real Earth landscapes or on abstract custom maps, and schedule splits and gene-flow events between populations using a straightforward declarative language. Additionally, slendr can simulate data from traditional, non-spatial models, either with SLiM or using an alternative built-in coalescent msprime back end. Together with its R-idiomatic interface to the tskit library for tree-sequence processing and analysis, slendr opens up the possibility of performing efficient, reproducible simulations of spatio-temporal genomic data entirely within the R environment, leveraging its wealth of libraries for geospatial data analysis, statistics, and visualization. Here, we present the design of the slendr R package and demonstrate its features on several practical example workflows.
ABSTRACT
We introduce a broad class of mechanistic spatial models to describe how spatially heterogeneous populations live, die, and reproduce. Individuals are represented by points of a point measure, whose birth and death rates can depend both on spatial position and local population density, defined at a location to be the convolution of the point measure with a suitable non-negative integrable kernel centred on that location. We pass to three different scaling limits: an interacting superprocess, a nonlocal partial differential equation (PDE), and a classical PDE. The classical PDE is obtained both by a two-step convergence argument, in which we first scale time and population size and pass to the nonlocal PDE, and then scale the kernel that determines local population density; and in the important special case in which the limit is a reaction-diffusion equation, directly by simultaneously scaling the kernel width, timescale and population size in our individual-based model. A novelty of our model is that we explicitly model a juvenile phase. The number of juveniles produced by an individual depends on local population density at the location of the parent; these juvenile offspring are thrown off in a (possibly heterogeneous, anisotropic) Gaussian distribution around the location of the parent; they then reach (instant) maturity with a probability that can depend on the local population density at the location at which they land. Although we only record mature individuals, a trace of this two-step description remains in our population models, resulting in novel limits in which the spatial dynamics are governed by a nonlinear diffusion. Using a lookdown representation, we are able to retain information about genealogies relating individuals in our population and, in the case of deterministic limiting models, we use this to deduce the backward-in-time motion of the ancestral lineage of an individual sampled from the population.
We observe that knowing the history of the population density is not enough to determine the motion of ancestral lineages in our model. We also investigate (and contrast) the behaviour of lineages for three different deterministic models of a population expanding its range as a travelling wave: the Fisher-KPP equation, the Allen-Cahn equation, and a porous medium equation with logistic growth.
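As an illustration of the travelling-wave setting mentioned above, the following minimal sketch integrates the Fisher-KPP equation in its standard form, u_t = D u_xx + r u (1 - u), with an explicit finite-difference scheme and measures the speed of the invasion front. This is not the authors' method; the function names, the grid and time step, and the step-shaped initial condition are all assumptions chosen for illustration. The classical asymptotic front speed for this equation is 2 * sqrt(r * D), and the discretised estimate is expected to fall somewhat below it.

```python
import math


def front_position(u, dx, level=0.5):
    """Locate where the profile first drops below `level` (linear interpolation)."""
    for i in range(1, len(u)):
        if u[i] < level <= u[i - 1]:
            frac = (u[i - 1] - level) / (u[i - 1] - u[i])
            return (i - 1 + frac) * dx
    raise ValueError("front not found inside the domain")


def fisher_kpp_front_speed(D=1.0, r=1.0, dx=0.5, dt=0.05,
                           length=200.0, t_end=40.0):
    """Estimate the front speed over the second half of the run.

    Explicit Euler in time, central differences in space, zero-flux
    boundaries. Stability requires D * dt / dx**2 <= 0.5 (0.2 by default).
    """
    n = int(length / dx)
    # Assumed initial condition: occupied (u = 1) on the left, empty elsewhere.
    u = [1.0 if i * dx < 10.0 else 0.0 for i in range(n)]
    steps = int(round(t_end / dt))
    half = steps // 2
    x_half = None
    for step in range(1, steps + 1):
        new = [0.0] * n
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]        # zero-flux boundary
            right = u[i + 1] if i < n - 1 else u[i]   # zero-flux boundary
            lap = (left - 2.0 * u[i] + right) / (dx * dx)
            new[i] = u[i] + dt * (D * lap + r * u[i] * (1.0 - u[i]))
        u = new
        if step == half:
            x_half = front_position(u, dx)
    x_end = front_position(u, dx)
    return (x_end - x_half) / ((steps - half) * dt)


if __name__ == "__main__":
    print("estimated speed:", fisher_kpp_front_speed())
    print("asymptotic 2*sqrt(r*D):", 2.0 * math.sqrt(1.0 * 1.0))
```

The Allen-Cahn and porous medium cases would need a different reaction term or a density-dependent diffusion term, respectively, and are not covered by this sketch.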
ABSTRACT
Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations.
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.