1.
Theor Popul Biol ; 156: 117-129, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38423480

ABSTRACT

The infinitesimal model of quantitative genetics relies on the Central Limit Theorem to stipulate that under additive models of quantitative traits determined by many loci having similar effect size, the difference between an offspring's genetic trait component and the average of their two parents' genetic trait components is Normally distributed and independent of the parents' values. Here, we investigate how the assumption of similar effect sizes affects the model: if, alternatively, the tail of the effect size distribution is polynomial with exponent α<2, then a different Central Limit Theorem implies that sums of effects should be well-approximated by a "stable distribution", for which single large effects are often still important. Empirically, we first find tail exponents between 1 and 2 in effect sizes estimated by genome-wide association studies of many human disease-related traits. We then show that the independence of offspring trait deviations from parental averages in many cases implies the Gaussian aspect of the infinitesimal model, suggesting that non-Gaussian models of trait evolution must explicitly track the underlying genetics, at least for loci of large effect. We also characterize possible limiting trait distributions of the infinitesimal model with infinitely divisible noise distributions, and compare our results to simulations.
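The abstract reports tail exponents between 1 and 2 for GWAS effect sizes but does not say which estimator produced them. As an illustration only, the classical Hill estimator of a power-law tail exponent α from the k largest absolute values can be sketched as follows. The choice of estimator, the function name `hill_estimator`, the cutoff `k`, and the synthetic Pareto data are all assumptions, not the authors' procedure.

```python
import math
import random


def hill_estimator(values, k):
    """Hill estimate of a power-law tail exponent alpha from the k largest |values|.

    Assumes the upper tail behaves like P(|X| > x) ~ C * x**(-alpha).
    """
    x = sorted((abs(v) for v in values), reverse=True)
    if not 0 < k < len(x):
        raise ValueError("k must satisfy 0 < k < len(values)")
    threshold = x[k]  # the (k+1)-th largest value serves as the tail cutoff
    mean_log_excess = sum(math.log(x[i] / threshold) for i in range(k)) / k
    return 1.0 / mean_log_excess


# Synthetic check on Pareto draws with alpha = 1.5 (simulated, not GWAS estimates).
rng = random.Random(1)
effects = [rng.paretovariate(1.5) for _ in range(20000)]
print(round(hill_estimator(effects, k=2000), 2))
```

On exact Pareto draws the estimate should land near 1.5; on real effect-size estimates the result is sensitive to the choice of k and to estimation noise among small effects.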


Subject(s)
Genome-Wide Association Study; Models, Genetic; Humans; Normal Distribution; Phenotype
2.
PLoS Biol ; 17(7): e3000391, 2019 07.
Article in English | MEDLINE | ID: mdl-31339877

ABSTRACT

Speciation genomic studies aim to interpret patterns of genome-wide variation in light of the processes that give rise to new species. However, interpreting the genomic "landscape" of speciation is difficult, because many evolutionary processes can impact levels of variation. Facilitated by the first chromosome-level assembly for the group, we use whole-genome sequencing and simulations to shed light on the processes that have shaped the genomic landscape during a radiation of monkeyflowers. After inferring the phylogenetic relationships among the 9 taxa in this radiation, we show that highly similar diversity (π) and differentiation (FST) landscapes have emerged across the group. Variation in these landscapes was strongly predicted by the local density of functional elements and the recombination rate, suggesting that the landscapes have been shaped by widespread natural selection. Using the varying divergence times between pairs of taxa, we show that the correlations between FST and genome features arose almost immediately after a population split and have become stronger over time. Simulations of genomic landscape evolution suggest that background selection (BGS; i.e., selection against deleterious mutations) alone is too subtle to generate the observed patterns, but scenarios that involve positive selection and genetic incompatibilities are plausible alternative explanations. Finally, tests for introgression among these taxa reveal widespread evidence of heterogeneous selection against gene flow during this radiation. Combined with previous evidence for adaptation in this system, we conclude that the correlation in FST among these taxa informs us about the processes contributing to adaptation and speciation during a rapid radiation.
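The abstract does not specify which estimators of π and FST were used. A minimal sketch, assuming biallelic sites with derived-allele counts out of n sampled chromosomes per population, computes per-site π and a ratio-of-averages form of Hudson's FST (a common choice, not necessarily the authors'). The function names and toy counts are hypothetical; windowed values like these are what one would then correlate with recombination rate or the density of functional elements.

```python
def pi_site(count, n):
    """Unbiased within-population diversity at one biallelic site (count of n chromosomes)."""
    p = count / n
    return 2.0 * p * (1.0 - p) * n / (n - 1)


def hudson_fst(counts1, n1, counts2, n2):
    """Hudson's FST over a set of sites, summing numerators and denominators before dividing."""
    num = den = 0.0
    for c1, c2 in zip(counts1, counts2):
        p1, p2 = c1 / n1, c2 / n2
        within = p1 * (1 - p1) / (n1 - 1) + p2 * (1 - p2) / (n2 - 1)
        num += (p1 - p2) ** 2 - within          # between-population signal minus sampling noise
        den += p1 * (1 - p2) + p2 * (1 - p1)    # probability two chromosomes (one per pop) differ
    return num / den


# Toy window: derived-allele counts at six sites, 10 chromosomes sampled per population.
counts_a = [1, 5, 9, 0, 3, 10]
counts_b = [2, 5, 1, 0, 4, 0]
n = 10
pi_a = sum(pi_site(c, n) for c in counts_a) / len(counts_a)
print(round(pi_a, 3), round(hudson_fst(counts_a, n, counts_b, n), 3))
```

Averaging numerators and denominators separately, rather than averaging per-site ratios, keeps low-diversity sites from dominating the window estimate.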


Subject(s)
Gene Flow; Genetic Variation; Genome, Plant/genetics; Genomics/methods; Mimulus/genetics; Selection, Genetic; Adaptation, Physiological/genetics; Genetic Speciation; Genetics, Population/methods; Mimulus/classification; Phylogeny
3.
Theor Popul Biol ; 127: 91-101, 2019 06.
Article in English | MEDLINE | ID: mdl-30978307

ABSTRACT

Inference with population genetic data usually treats the population pedigree as a nuisance parameter, the unobserved product of a past history of random mating. However, the history of genetic relationships in a given population is a fixed, unobserved object, and so an alternative approach is to treat this network of relationships as a complex object we wish to learn about, by observing how genomes have been noisily passed down through it. This paper explores this point of view, showing how to translate questions about population genetic data into calculations with a Poisson process of mutations on all ancestral genomes. This method is applied to give a robust interpretation to the f4 statistic used to identify admixture, and to design a new statistic that measures covariances in mean times to most recent common ancestor between two pairs of sequences. The method more generally interprets population genetic statistics in terms of sums of specific functions over ancestral genomes, thereby providing concrete, broadly applicable interpretations of these statistics. This provides a method for describing demographic history without simplified demographic models. More generally, it brings into focus the population pedigree, which is averaged over in model-based demographic inference.
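For reference, the f4 statistic discussed above is, in its common allele-frequency form, the average over sites of (p_A − p_B)(p_C − p_D). The sketch below computes that quantity from toy per-site frequencies; the function name and data are hypothetical, and the paper's contribution is the reinterpretation of this statistic via mutations on ancestral genomes, not the formula itself.

```python
def f4(pa, pb, pc, pd):
    """Average of (pA - pB) * (pC - pD) over sites; inputs are per-site allele frequencies."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)


# Hypothetical allele frequencies at four sites in populations A, B, C, D.
pA = [0.1, 0.4, 0.8, 0.5]
pB = [0.2, 0.3, 0.6, 0.5]
pC = [0.0, 0.9, 0.7, 0.2]
pD = [0.3, 0.5, 0.4, 0.6]
print(f4(pA, pB, pC, pD))
```

Swapping A and B flips the sign, and if (A, B) and (C, D) form separate clades on a tree without admixture, the expectation of f4(A, B; C, D) is zero.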


Subject(s)
Demography; Genetics, Population; Algorithms; Genetic Variation; Humans; Models, Genetic; Pedigree; Poisson Distribution; Population Density
4.
PLoS Comput Biol ; 14(11): e1006581, 2018 11.
Article in English | MEDLINE | ID: mdl-30383757

ABSTRACT

In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly 'simplify' a tree sequence by removing this irrelevant history for a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and overall large efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains in large, whole-genome simulations of one to two orders of magnitude. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation's entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. 
The software for recording and processing tree sequences is named tskit.
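The core idea of simplification, discarding recorded history that is not ancestral to a chosen set of genomes, can be illustrated with a deliberately reduced toy: haploid, non-recombining genomes, so that every node has a single parent. This is not tskit's algorithm, which tracks genomic intervals and recombination and is exposed as `TableCollection.simplify()` and `TreeSequence.simplify()`; the function and pedigree below are hypothetical.

```python
def simplify(parent, samples):
    """Prune a clonal (single-parent) genealogy to the history of `samples`.

    parent: dict mapping each node to its parent node, or None for a root.
    Keeps the sample nodes plus every ancestor where two or more sampled
    lineages coalesce, and reconnects each kept node to its nearest kept
    ancestor. Everything else is discarded as irrelevant history.
    """
    # For each ancestor, record which of its children lie on a path to a sample.
    lineage_children = {}
    for s in samples:
        node = s
        while parent[node] is not None:
            lineage_children.setdefault(parent[node], set()).add(node)
            node = parent[node]
    keep = set(samples) | {n for n, kids in lineage_children.items() if len(kids) >= 2}
    simplified = {}
    for node in keep:
        p = parent[node]
        while p is not None and p not in keep:  # skip unary and non-ancestral nodes
            p = parent[p]
        simplified[node] = p
    return simplified


# Hypothetical record: nodes 0-2 are founders, 3-5 their offspring, 6-9 grandchildren.
parent = {0: None, 1: None, 2: None, 3: 0, 4: 0, 5: 1, 6: 3, 7: 4, 8: 4, 9: 5}
print(sorted(simplify(parent, samples=[6, 7, 8]).items()))
```

Of ten recorded nodes, only the three samples and their two coalescent ancestors survive, which is the kind of redundancy that periodic simplification removes.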


Subject(s)
Computational Biology/methods; Genetic Variation; Genetics, Population; Software; Algorithms; Computer Simulation; Gene Frequency; Genome; Genotype; Humans; Models, Genetic; Pedigree; Polymorphism, Genetic
5.
PLoS Genet ; 12(1): e1005703, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26771578

ABSTRACT

Geographic patterns of genetic variation within modern populations, produced by complex histories of migration, can be difficult to infer and visually summarize. A general consequence of geographically limited dispersal is that samples from nearby locations tend to be more closely related than samples from distant locations, and so genetic covariance often recapitulates geographic proximity. We use genome-wide polymorphism data to build "geogenetic maps," which, when applied to stationary populations, produce a map of the geographic positions of the populations, but with distances distorted to reflect historical rates of gene flow. In the underlying model, allele frequency covariance is a decreasing function of geogenetic distance, and nonlocal gene flow such as admixture can be identified as anomalously strong covariance over long distances. This admixture is explicitly co-estimated and depicted as arrows, from the source of admixture to the recipient, on the geogenetic map. We demonstrate the utility of this method on a circum-Tibetan sampling of the greenish warbler (Phylloscopus trochiloides), in which we find evidence for gene flow between the adjacent, terminal populations of the ring species. We also analyze a global sampling of human populations, for which we largely recover the geography of the sampling, with support for significant histories of admixture in many samples. This new tool for understanding and visualizing patterns of population structure is implemented in a Bayesian framework in the program SpaceMix.
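The abstract states only that allele frequency covariance decreases with geogenetic distance. As a hedged illustration, the sketch below builds such a covariance matrix from 2-D map coordinates using a hypothetical exponential-power kernel with parameters `alpha0`, `alpha_d`, `alpha2` and a diagonal `nugget`; SpaceMix's exact parameterization and its admixture terms are not reproduced here.

```python
import math


def geogenetic_covariance(coords, alpha0=1.0, alpha_d=1.0, alpha2=1.0, nugget=0.1):
    """Covariance matrix that decays with distance on a geogenetic map.

    coords: list of (x, y) map positions, one per population (hypothetical input).
    The kernel alpha0 * exp(-(alpha_d * d) ** alpha2) is an assumed form; the
    nugget adds population-specific variance on the diagonal only.
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            cov[i][j] = alpha0 * math.exp(-((alpha_d * d) ** alpha2))
            if i == j:
                cov[i][j] += nugget
    return cov


# Three populations on a line: the farther apart, the weaker the covariance.
pops = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
for row in geogenetic_covariance(pops):
    print([round(v, 3) for v in row])
```

In the paper's terms, admixture would show up as observed covariance between distant populations that is stronger than a kernel like this one predicts from their map positions.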


Subject(s)
Gene Flow/genetics; Gene Frequency; Genetics, Population; Bayes Theorem; Geography; Humans
6.
PLoS Genet ; 11(11): e1005630, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26571125

ABSTRACT

Species often encounter, and adapt to, many patches of similar environmental conditions across their range. Such adaptation can occur through convergent evolution if different alleles arise in different patches, or through the spread of shared alleles by migration acting to synchronize adaptation across the species. The tension between the two reflects the constraint imposed on evolution by the underlying genetic architecture versus how effectively selection and geographic isolation act to inhibit the geographic spread of locally adapted alleles. This paper studies the balance between these two routes to adaptation in a model of continuous environments with patchy selection pressures. We address the following questions: How long does it take for a novel allele to appear in a patch where it is locally adapted through mutation? Or, through migration from another, already adapted patch? Which is more likely to occur, as a function of distance between the patches? What population genetic signal is left by the spread of migrant alleles? To answer these questions we examine the family structure underlying migration-selection equilibrium surrounding an already adapted patch, treating those rare families that reach new patches as spatial branching processes. A main result is that patches further apart than a critical distance will likely evolve independent locally adapted alleles; this distance is proportional to the spatial scale of selection ([Formula: see text], where $\sigma$ is the dispersal distance and $s_m$ is the selective disadvantage of these alleles between patches), and depends linearly on $\log(s_m/\mu)$, where $\mu$ is the mutation rate. This provides a way to understand the role of geographic separation between patches in promoting convergent adaptation and the genomic signals it leaves behind. We illustrate these ideas using the convergent evolution of cryptic coloration in the rock pocket mouse, Chaetodipus intermedius, as an empirical example.


Subject(s)
Adaptation, Physiological/genetics; Evolution, Molecular; Mutation; Selection, Genetic
7.
Am Nat ; 186 Suppl 1: S5-23, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26656217

ABSTRACT

The extent to which populations experiencing shared selective pressures adapt through a shared genetic response is relevant to many questions in evolutionary biology. In this article, we explore how standing genetic variation contributes to convergent genetic responses in a geographically spread population. Geographically limited dispersal slows the spread of each selected allele, hence allowing other alleles to spread before any one comes to dominate the population. When selectively equivalent alleles meet, their progress is substantially slowed, dividing the species range into a random tessellation, which can be well understood by analogy to a Poisson process model of crystallization. In this framework, we derive the geographic scale over which an allele dominates and the proportion of adaptive alleles that arise from standing variation. Finally, we explore how negative pleiotropic effects of alleles can bias the subset of alleles that contribute to the species' adaptive response. We apply the results to the malaria-resistance glucose-6-phosphate dehydrogenase-deficiency alleles, where the large mutational target size makes it a likely candidate for adaptation from deleterious standing variation. Our results suggest that convergent adaptation may be common. Therefore, caution must be exercised when arguing that strongly geographically restricted alleles are the outcome of local adaptation. We close by discussing the implications of these results for ideas of species coherence and the nature of divergence between species.
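The crystallization analogy can be made concrete with a toy one-dimensional simulation: new adaptive alleles arise at Poisson-distributed times and places and spread outward at a constant speed, and each location is claimed by whichever allele reaches it first (a Johnson-Mehl-type tessellation). The rates, grid, speed, and function name are hypothetical, and this deterministic-spread sketch omits the stochastic establishment and standing-variation details treated in the paper.

```python
import random


def crystallize(length=100.0, rate=0.01, speed=1.0, grid=1000, seed=0):
    """Toy 1-D tessellation of a species range by independently arising alleles.

    Mutations arise as a space-time Poisson process with `rate` per unit length
    per unit time. A new allele establishes only if no earlier allele has already
    reached its location. Returns the established (time, position) origins and,
    for each grid point, the index of the allele that arrives there first.
    """
    rng = random.Random(seed)
    points = [length * (i + 0.5) / grid for i in range(grid)]
    origins = []

    def arrival(y):
        """Earliest time any established allele reaches location y."""
        return min((t0 + abs(y - x0) / speed for t0, x0 in origins), default=float("inf"))

    t = 0.0
    while True:
        t += rng.expovariate(rate * length)  # waiting time to the next mutation anywhere
        x = rng.uniform(0.0, length)
        if origins and t > max(arrival(y) for y in points):
            break  # the whole range is already occupied
        if arrival(x) > t:
            origins.append((t, x))
    winners = []
    for y in points:
        times = [t0 + abs(y - x0) / speed for t0, x0 in origins]
        winners.append(times.index(min(times)))
    return origins, winners


origins, winners = crystallize()
print(len(origins), "established alleles; mean domain width", 100.0 / len(set(winners)))
```

With equal spread speeds in one dimension, each allele's domain is a single contiguous interval; raising `rate` or lowering `speed` produces more, smaller domains, in line with the intuition that limited dispersal lets several alleles share the range.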


Subject(s)
Adaptation, Physiological/genetics; Biological Evolution; Genetic Variation; Alleles; Disease Resistance; Glucosephosphate Dehydrogenase/genetics; Humans; Malaria/enzymology; Malaria/genetics; Models, Genetic; Mutation; Phylogeography; Selection, Genetic
8.
bioRxiv ; 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38106105

ABSTRACT

Coevolution between two species can lead to exaggerated phenotypes that vary in a correlated manner across space. However, the conditions under which we expect such spatially varying coevolutionary patterns in polygenic traits are not well-understood. We investigate the coevolutionary dynamics between two species undergoing reciprocal adaptation across space and time, using simulations inspired by the Taricha newt–Thamnophis garter snake system. One striking observation from this system is that newts in some areas carry much more tetrodotoxin than in other areas, and garter snakes that live near more toxic newts tend to be more resistant to this toxin, a correlation seen across several broad geographic areas. Furthermore, snakes seem to be "winning" the coevolutionary arms race, i.e., having a high level of resistance compared to local newt toxicity, despite substantial variation in both toxicity and resistance across the range. We explore how possible genetic architectures of the toxin and resistance traits would affect the coevolutionary dynamics by manipulating both mutation rate and effect size of mutations across many simulations. We find that coevolutionary dynamics alone were not sufficient in our simulations to produce the striking mosaic of levels of toxicity and resistance observed in nature, but simulations with ecological heterogeneity (in trait costliness or interaction rate) did produce such patterns. We also find that in simulations, newts tend to "win" across most combinations of genetic architectures, although the species with higher mutational genetic variance tends to have an advantage.

9.
Genetics ; 226(4), 2024 04 03.
Article in English | MEDLINE | ID: mdl-38242701

ABSTRACT

For at least the past 5 decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modeling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions are driven by positive selection. This study provides a framework for modeling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.
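The study compares landscapes of diversity and divergence by correlating values computed in genomic windows. As a hedged illustration, Spearman's rank correlation between two windowed landscapes can be computed as below; the window values are invented, and the choice of Spearman rather than Pearson correlation is an assumption, not the authors' stated method.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical windowed diversity in one species and divergence to a sister species.
pi_windows = [0.0012, 0.0009, 0.0015, 0.0007, 0.0011]
dxy_windows = [0.0051, 0.0046, 0.0060, 0.0041, 0.0049]
print(round(spearman(pi_windows, dxy_windows), 3))
```

Rank correlation is insensitive to monotone rescaling of either landscape, which can be convenient when diversity and divergence differ in magnitude.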


Subject(s)
Genetic Variation; Hominidae; Animals; Selection, Genetic; Hominidae/genetics; Mutation; Genomics
10.
G3 (Bethesda) ; 14(3), 2024 03 06.
Article in English | MEDLINE | ID: mdl-38230808

ABSTRACT

The often tight association between parasites and their hosts means that under certain scenarios, the evolutionary histories of the two species can become closely coupled both through time and across space. Using spatial genetic inference, we identify a potential signal of common dispersal patterns in the Anopheles gambiae and Plasmodium falciparum host-parasite system as seen through a between-species correlation of the differences between geographic sampling location and geographic location predicted from the genome. This correlation may be due to coupled dispersal dynamics between host and parasite but may also reflect statistical artifacts due to uneven spatial distribution of sampling locations. Using continuous-space population genetics simulations, we investigate the degree to which uneven distribution of sampling locations leads to bias in prediction of spatial location from genetic data and implement methods to counter this effect. We demonstrate that while algorithmic bias presents a problem in inference from spatio-genetic data, the correlation structure between A. gambiae and P. falciparum predictions cannot be attributed to spatial bias alone and is thus likely a genetic signal of co-dispersal in a host-parasite system.


Subject(s)
Anopheles; Malaria, Falciparum; Parasites; Plasmodium; Animals; Parasites/genetics; Anopheles/genetics; Anopheles/parasitology; Host-Parasite Interactions/genetics; Plasmodium/genetics; Plasmodium falciparum/genetics; Geography
11.
bioRxiv ; 2024 Mar 17.
Article in English | MEDLINE | ID: mdl-38559192

ABSTRACT

A fundamental goal in population genetics is to understand how variation is arrayed over natural landscapes. From first principles we know that common features such as heterogeneous population densities and source-sink dynamics of dispersal should shape genetic variation over space; however, there are few tools currently available that can deal with these ubiquitous complexities. Geographically referenced single nucleotide polymorphism (SNP) data are increasingly accessible, presenting an opportunity to study genetic variation across geographic space in myriad species. We present a new inference method that uses geo-referenced SNPs and a deep neural network to estimate spatially heterogeneous maps of population density and dispersal rate. Our neural network trains on simulated input and output pairings, where the input consists of genotypes and sampling locations generated from a continuous-space population genetic simulator, and the output is a map of the true demographic parameters. We benchmark our tool against existing methods and discuss qualitative differences between the different approaches; in particular, our program is unique because it infers the magnitude of both dispersal and density as well as their variation over the landscape, and it does so using SNP data. Similar methods are constrained to estimating relative migration rates, or require identity-by-descent blocks as input. We applied our tool to empirical data from North American grey wolves, for which it estimated mostly reasonable demographic parameters, but was affected by incomplete spatial sampling. Genetics-based methods like ours complement other, direct methods for estimating past and present demography, and, we believe, will serve as valuable tools for applications in conservation, ecology, and evolutionary biology. An open-source software package implementing our method is available from https://github.com/kr-colab/mapNN.

12.
J Math Biol ; 66(3): 423-76, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22427143

ABSTRACT

Classical ecological theory predicts that environmental stochasticity increases extinction risk by reducing the average per-capita growth rate of populations. For sedentary populations in a spatially homogeneous yet temporally variable environment, a simple model of population growth is the stochastic differential equation $dZ(t) = \mu Z(t)\,dt + \sigma Z(t)\,dW(t)$, $t \ge 0$, where the conditional law of $Z(t+\Delta t) - Z(t)$ given $Z(t) = z$ has mean and variance approximately $z\mu\,\Delta t$ and $z^2\sigma^2\,\Delta t$ when the time increment $\Delta t$ is small. The long-term stochastic growth rate $\lim_{t\to\infty} t^{-1}\log Z(t)$ for such a population equals $\mu - \sigma^2/2$. Most populations, however, experience spatial as well as temporal variability. To understand the interactive effects of environmental stochasticity, spatial heterogeneity, and dispersal on population growth, we study an analogous model $X(t) = (X_1(t), \ldots, X_n(t))$, $t \ge 0$, for the population abundances in $n$ patches: the conditional law of $X(t+\Delta t)$ given $X(t) = x$ is such that the conditional mean of $X_i(t+\Delta t) - X_i(t)$ is approximately $\left[x_i\mu_i + \sum_j \left(x_j D_{ji} - x_i D_{ij}\right)\right]\Delta t$, where $\mu_i$ is the per-capita growth rate in the $i$th patch and $D_{ij}$ is the dispersal rate from the $i$th patch to the $j$th patch, and the conditional covariance of $X_i(t+\Delta t) - X_i(t)$ and $X_j(t+\Delta t) - X_j(t)$ is approximately $x_i x_j \sigma_{ij}\,\Delta t$ for some covariance matrix $\Sigma = (\sigma_{ij})$. We show for such a spatially extended population that if $S(t) = X_1(t) + \cdots + X_n(t)$ denotes the total population abundance, then $Y(t) = X(t)/S(t)$, the vector of patch proportions, converges in law to a random vector $Y(\infty)$ as $t \to \infty$, and the stochastic growth rate $\lim_{t\to\infty} t^{-1}\log S(t)$ equals the space-time average per-capita growth rate $\sum_i \mu_i\,\mathbb{E}[Y_i(\infty)]$ experienced by the population minus half of the space-time average temporal variation $\mathbb{E}\left[\sum_{i,j} \sigma_{ij} Y_i(\infty) Y_j(\infty)\right]$ experienced by the population.
Using this characterization of the stochastic growth rate, we derive an explicit expression for the stochastic growth rate for populations living in two patches, determine which choices of the dispersal matrix D produce the maximal stochastic growth rate for a freely dispersing population, derive an analytic approximation of the stochastic growth rate for dispersal limited populations, and use group theoretic techniques to approximate the stochastic growth rate for populations living in multi-scale landscapes (e.g. insects on plants in meadows on islands). Our results provide fundamental insights into "ideal free" movement in the face of uncertainty, the persistence of coupled sink populations, the evolution of dispersal rates, and the single large or several small (SLOSS) debate in conservation biology. For example, our analysis implies that even in the absence of density-dependent feedbacks, ideal-free dispersers occupy multiple patches in spatially heterogeneous environments provided environmental fluctuations are sufficiently strong and sufficiently weakly correlated across space. In contrast, for diffusively dispersing populations living in similar environments, intermediate dispersal rates maximize their stochastic growth rate.
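For the single-patch model, the growth rate μ − σ²/2 follows from Itô's formula, since d log Z = (μ − σ²/2) dt + σ dW. The quick Monte Carlo check below, with hypothetical parameter values and a hypothetical function name, integrates that log-scale equation and shows t⁻¹ log Z(t) settling near μ − σ²/2 rather than near μ.

```python
import math
import random


def simulate_log_growth(mu, sigma, t_max, dt=0.01, z0=1.0, seed=0):
    """Return t^-1 log Z(t) at t = t_max for dZ = mu Z dt + sigma Z dW.

    Uses the exact log-scale update
    log Z(t + dt) = log Z(t) + (mu - sigma**2 / 2) dt + sigma dW, with dW ~ Normal(0, dt).
    """
    rng = random.Random(seed)
    log_z = math.log(z0)
    steps = int(round(t_max / dt))
    for _ in range(steps):
        log_z += (mu - 0.5 * sigma ** 2) * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
    return (log_z - math.log(z0)) / (steps * dt)


mu, sigma = 0.05, 0.5
estimate = simulate_log_growth(mu, sigma, t_max=2000.0)
print(round(estimate, 3), "vs mu - sigma^2/2 =", mu - sigma ** 2 / 2)
```

Although μ > 0 here, the long-run growth rate is negative, which is the sense in which environmental stochasticity reduces growth; the sampling error of the estimate shrinks roughly like σ/√t.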


Subject(s)
Ecosystem; Models, Biological; Population Growth; Animals; Endangered Species; Stochastic Processes
13.
bioRxiv ; 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-36798346

ABSTRACT

For at least the past five decades, population genetics, as a field, has worked to describe the precise balance of forces that shape patterns of variation in genomes. The problem is challenging because modelling the interactions between evolutionary processes is difficult, and different processes can impact genetic variation in similar ways. In this paper, we describe how diversity and divergence between closely related species change with time, using correlations between landscapes of genetic variation as a tool to understand the interplay between evolutionary processes. We find strong correlations between landscapes of diversity and divergence in a well-sampled set of great ape genomes, and explore how various processes such as incomplete lineage sorting, mutation rate variation, GC-biased gene conversion and selection contribute to these correlations. Through highly realistic, chromosome-scale, forward-in-time simulations, we show that the landscapes of diversity and divergence in the great apes are too well correlated to be explained via strictly neutral processes alone. Our best fitting simulation includes both deleterious and beneficial mutations in functional portions of the genome, in which 9% of fixations within those regions are driven by positive selection. This study provides a framework for modelling genetic variation in closely related species, an approach which can shed light on the complex balance of forces that have shaped genetic variation.

14.
bioRxiv ; 2023 Nov 09.
Article in English | MEDLINE | ID: mdl-37503196

ABSTRACT

The often tight association between parasites and their hosts means that under certain scenarios, the evolutionary histories of the two species can become closely coupled both through time and across space. Using spatial genetic inference, we identify a potential signal of common dispersal patterns in the Anopheles gambiae and Plasmodium falciparum host-parasite system as seen through a between-species correlation of the differences between geographic sampling location and geographic location predicted from the genome. This correlation may be due to coupled dispersal dynamics between host and parasite, but may also reflect statistical artifacts due to uneven spatial distribution of sampling locations. Using continuous-space population genetics simulations, we investigate the degree to which uneven distribution of sampling locations leads to bias in prediction of spatial location from genetic data and implement methods to counter this effect. We demonstrate that while algorithmic bias presents a problem in inference from spatio-genetic data, the correlation structure between A. gambiae and P. falciparum predictions cannot be attributed to spatial bias alone, and is thus likely a genetic signal of co-dispersal in a host-parasite system.

15.
Genetics ; 224(2), 2023 05 26.
Article in English | MEDLINE | ID: mdl-37052957

ABSTRACT

The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training (including population density, demographic history, habitat size, and sampling area) and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole-genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus, genetic tools like ours complement direct methods for improving our understanding of dispersal.


Subject(s)
Ecosystem; Genetics, Population; Humans; Population Density; Computer Simulation; Neural Networks, Computer; Genetic Variation
16.
ArXiv ; 2023 Dec 30.
Article in English | MEDLINE | ID: mdl-37292478

ABSTRACT

We introduce a broad class of mechanistic spatial models to describe how spatially heterogeneous populations live, die, and reproduce. Individuals are represented by points of a point measure, whose birth and death rates can depend both on spatial position and local population density, defined at a location to be the convolution of the point measure with a suitable non-negative integrable kernel centred on that location. We pass to three different scaling limits: an interacting superprocess, a nonlocal partial differential equation (PDE), and a classical PDE. The classical PDE is obtained both by a two-step convergence argument, in which we first scale time and population size and pass to the nonlocal PDE, and then scale the kernel that determines local population density; and in the important special case in which the limit is a reaction-diffusion equation, directly by simultaneously scaling the kernel width, timescale and population size in our individual based model. A novelty of our model is that we explicitly model a juvenile phase. The number of juveniles produced by an individual depends on local population density at the location of the parent; these juvenile offspring are thrown off in a (possibly heterogeneous, anisotropic) Gaussian distribution around the location of the parent; they then reach (instant) maturity with a probability that can depend on the local population density at the location at which they land. Although we only record mature individuals, a trace of this two-step description remains in our population models, resulting in novel limits in which the spatial dynamics are governed by a nonlinear diffusion. Using a lookdown representation, we are able to retain information about genealogies relating individuals in our population and, in the case of deterministic limiting models, we use this to deduce the backwards in time motion of the ancestral lineage of an individual sampled from the population. 
We observe that knowing the history of the population density is not enough to determine the motion of ancestral lineages in our model. We also investigate (and contrast) the behaviour of lineages for three different deterministic models of a population expanding its range as a travelling wave: the Fisher-KPP equation, the Allen-Cahn equation, and a porous medium equation with logistic growth.
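The local population density in this model is the convolution of the point measure with a non-negative integrable kernel, so the density at x is a sum of kernel values centred on each individual. A minimal sketch, assuming a one-dimensional Gaussian kernel of width `h`, follows. The reproduction step is hypothetical: it uses a fixed number of juveniles per adult (a simplification of the density-dependent juvenile production described above), Gaussian dispersal of scale `sigma` around the parent, and an assumed maturation probability 1 / (1 + local density at the landing site).

```python
import math
import random


def local_density(x, positions, h):
    """Convolve the point measure (unit mass per individual) with a Gaussian
    kernel of width h and evaluate the result at location x."""
    norm = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in positions)


def one_generation(positions, h, sigma, fecundity, rng):
    """Hypothetical reproduction step: each adult throws `fecundity` juveniles at
    Gaussian displacements of scale sigma; a juvenile matures with probability
    1 / (1 + local density where it lands). Returns the new mature recruits."""
    recruits = []
    for xi in positions:
        for _ in range(fecundity):
            y = rng.gauss(xi, sigma)
            if rng.random() < 1.0 / (1.0 + local_density(y, positions, h)):
                recruits.append(y)
    return recruits


adults = [0.0, 0.5, 3.0]
# Because the kernel integrates to 1, the density integrates to the census size.
step = 0.01
total = sum(local_density(-5.0 + step * i, adults, h=0.3) for i in range(1301)) * step
print(round(total, 4))
print(len(one_generation(adults, h=0.3, sigma=1.0, fecundity=4, rng=random.Random(0))))
```

Because the kernel is normalized, shrinking `h` makes the density more local without changing its total mass, which is the kind of kernel scaling the abstract describes when passing from the nonlocal to the classical PDE.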

17.
Elife ; 12, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37342968

ABSTRACT

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations.
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.


Subject(s)
Genome , Software , Computer Simulation , Genetics, Population , Genomics
18.
PLoS Comput Biol ; 7(5): e1001136, 2011 May.
Article in English | MEDLINE | ID: mdl-21589887

ABSTRACT

Recent whole-genome polymerase binding assays in the Drosophila embryo have shown that a substantial proportion of uninduced genes have pre-assembled RNA polymerase-II transcription initiation complex (PIC) bound to their promoters. These constitute a subset of promoter-proximally paused genes for which mRNA elongation instead of promoter access is regulated. This difference can be described as a rearrangement of the regulatory topology to control the downstream transcriptional process of elongation rather than the upstream transcriptional initiation event. It has been shown experimentally that genes under elongation control tend to induce faster and more synchronously, and that promoter-proximal pausing is observed mainly in metazoans, in accord with a posited impact on synchrony. However, it has not been shown whether or not it is the change in the regulated step per se that is causal. We investigate this question by proposing and analyzing a continuous-time Markov chain model of PIC assembly regulated at one of two steps: initial polymerase association with DNA, or release from a paused, transcribing state. Our analysis demonstrates that, over a wide range of physical parameters, increased speed and synchrony are functional consequences of elongation control. Further, we make new predictions about the effect of elongation regulation on the consistent control of total transcript number between cells. We also identify which elements in the transcription induction pathway are most sensitive to molecular noise and thus possibly the most evolutionarily constrained. Our methods produce symbolic expressions for quantities of interest with reasonable computational effort and they can be used to explore the interplay between interaction topology and molecular noise in a broader class of biochemical networks. We provide general-purpose code implementing these methods.
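
The comparison between the two regulated steps can be illustrated with a toy continuous-time Markov chain in Python. This is a sketch, not the article's model or its published code. The number of steps, the rate values, and all function names are hypothetical. States 0..n-1 are transient steps of PIC assembly. fwd[i] moves from step i to i+1, bwd[i] moves back, and leaving the last state (pause release) produces a transcript. Under initiation control the chain is assumed to start in state 0 at induction. Under elongation control it is assumed to have equilibrated among the transient states with release switched off until induction. Mean first-passage times solve (-Q) m = 1 and second moments solve (-Q) s = 2m, where Q is the generator restricted to the transient states.

```python
# Toy first-passage comparison for a birth-death chain of PIC assembly steps.
# All rates and the number of steps are hypothetical illustrations.


def solve(a, rhs):
    """Solve a x = rhs for a small dense matrix by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def neg_generator(fwd, bwd):
    """Return -Q restricted to transient states 0..n-1.

    fwd[i] is the rate from state i to i+1 (from n-1 it leads to the absorbing
    'transcript released' state); bwd[i] is the rate from i to i-1 (bwd[0] unused).
    """
    n = len(fwd)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = fwd[i] + (bwd[i] if i > 0 else 0.0)  # total exit rate
        if i + 1 < n:
            a[i][i + 1] = -fwd[i]
        if i > 0:
            a[i][i - 1] = -bwd[i]
    return a


def passage_moments(fwd, bwd, p0):
    """Mean and variance of the time to absorption from initial distribution p0."""
    a = neg_generator(fwd, bwd)
    m = solve(a, [1.0] * len(fwd))        # (-Q) m = 1: mean times from each state
    s = solve(a, [2.0 * mi for mi in m])  # (-Q) s = 2m: second moments
    mean = sum(p * mi for p, mi in zip(p0, m))
    second = sum(p * si for p, si in zip(p0, s))
    return mean, second - mean ** 2


def paused_equilibrium(fwd, bwd):
    """Stationary distribution over transient states with release switched off.

    Uses detailed balance, pi[i+1] * bwd[i+1] = pi[i] * fwd[i]; needs bwd[i] > 0 for i >= 1.
    """
    w = [1.0]
    for i in range(len(fwd) - 1):
        w.append(w[-1] * fwd[i] / bwd[i + 1])
    total = sum(w)
    return [x / total for x in w]


fwd = [1.0, 2.0, 2.0, 0.5]  # hypothetical rates; fwd[-1] is pause release
bwd = [0.0, 0.5, 0.5, 0.5]  # hypothetical backward rates; bwd[0] is unused
start_initiation = [1.0] + [0.0] * (len(fwd) - 1)
start_elongation = paused_equilibrium(fwd, bwd)
for label, p0 in [("initiation control", start_initiation),
                  ("elongation control", start_elongation)]:
    mean, var = passage_moments(fwd, bwd, p0)
    print(f"{label}: mean={mean:.3f}, CV={var ** 0.5 / mean:.3f}")
```

Every path from state 0 to release passes through the later states, so the expected time from any later state is shorter. In this chain the elongation-controlled mean can therefore never exceed the initiation-controlled one. The sketch does not by itself guarantee that the coefficient of variation, a proxy for synchrony, also falls. The article examines that question over a range of physical parameters.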


Subject(s)
Enhancer Elements, Genetic , Models, Genetic , Promoter Regions, Genetic , Transcription, Genetic , Transcriptional Activation , Animals , Drosophila , Embryo, Nonmammalian , Markov Chains , RNA Polymerase II/chemistry , RNA Polymerase II/metabolism , RNA, Messenger/genetics , Transcription Initiation Site
19.
J Comput Biol ; 29(8): 802-824, 2022 08.
Article in English | MEDLINE | ID: mdl-35776513

ABSTRACT

Although the rates at which positions in the genome mutate are known to depend not only on the nucleotide to be mutated, but also on neighboring nucleotides, it remains challenging to do phylogenetic inference using models of context-dependent mutation. In these models, the effects of one mutation may in principle propagate to faraway locations, making it difficult to compute exact likelihoods. This article shows how to use bounds on the propagation of dependency to compute likelihoods of mutation of a given genomic segment by marginalizing over sufficiently long flanking sequence. This can be used for maximum likelihood or Bayesian inference. Protocols examining residuals and iterative model refinement are also discussed. Tools for efficiently working with these models are provided in an R package, which could be used in other applications. The method is used to examine context dependence of mutations since the common ancestor of humans and chimpanzees.
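
The simplest form of context dependence, and of marginalizing over unobserved flanking sequence, can be sketched in Python. This is not the article's method or its R package. It assumes a hypothetical nearest-neighbour model in which each base mutates at total rate 1.0, except CpG sites (a C followed by G, or a G preceded by C), which mutate at a hypothetical 10-fold rate. When a neighbour lies outside the segment, its identity is averaged over assumed base frequencies, uniform by default. The names `site_rate` and `segment_rates` are illustrative.

```python
# Instantaneous per-site mutation rates under a hypothetical nearest-neighbour
# (CpG-style) model, marginalizing unknown flanking bases over base frequencies.

BASES = "ACGT"


def site_rate(left, base, right, cpg_multiplier=10.0):
    """Hypothetical total mutation rate of `base` given its two neighbours."""
    if (base == "C" and right == "G") or (base == "G" and left == "C"):
        return cpg_multiplier  # elevated rate at a CpG site (illustrative value)
    return 1.0


def segment_rates(seq, flank_freqs=None, cpg_multiplier=10.0):
    """Expected rate of each site in `seq`; bases outside `seq` are averaged over `flank_freqs`."""
    if flank_freqs is None:
        flank_freqs = {b: 0.25 for b in BASES}  # assumed uniform base composition
    rates = []
    for i, base in enumerate(seq):
        # Observed neighbours are known with probability 1; missing ones are marginalized.
        lefts = {seq[i - 1]: 1.0} if i > 0 else flank_freqs
        rights = {seq[i + 1]: 1.0} if i + 1 < len(seq) else flank_freqs
        expected = 0.0
        for left, p_left in lefts.items():
            for right, p_right in rights.items():
                expected += p_left * p_right * site_rate(left, base, right, cpg_multiplier)
        rates.append(expected)
    return rates


print(segment_rates("ACGT"))
print(segment_rates("GA"))
```

Here only immediate neighbours enter the instantaneous rate, so one flanking base per side is enough. Over longer time spans, as the abstract notes, the effects of mutations can propagate further. That is why bounds on the propagation of dependency are needed to decide how much flanking sequence must be marginalized.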


Subject(s)
Genome , Models, Genetic , Bayes Theorem , Humans , Mutation , Phylogeny , Probability
20.
Evolution ; 76(2): 236-251, 2022 02.
Article in English | MEDLINE | ID: mdl-34529267

ABSTRACT

Even if a species' phenotype does not change over evolutionary time, the underlying mechanism may change, as distinct molecular pathways can realize identical phenotypes. Here we use linear system theory to explore the consequences of this idea, describing how a gene network underlying a conserved phenotype evolves, as the genetic drift of small changes to these molecular pathways causes a population to explore the set of mechanisms with identical phenotypes. To do this, we model an organism's internal state as a linear system of differential equations for which the environment provides input and the phenotype is the output, in which context there exists an exact characterization of the set of all mechanisms that give the same input-output relationship. This characterization implies that selectively neutral directions in genotype space should be common and that the evolutionary exploration of these distinct but equivalent mechanisms can lead to the reproductive incompatibility of independently evolving populations. This evolutionary exploration, or system drift, is expected to proceed at a rate proportional to the amount of intrapopulation genetic variation divided by the effective population size ($N_e$). At biologically reasonable parameter values this could lead to substantial interpopulation incompatibility, and thus speciation, on a time scale of $N_e$ generations. This model also naturally predicts Haldane's rule, thus providing a concrete explanation of why heterogametic hybrids tend to be disrupted more often than homogametes during the early stages of speciation.
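
A standard result of linear system theory makes the idea of equivalent mechanisms concrete. For dx/dt = Ax + Bu, y = Cx, any invertible change of coordinates V yields a different system (V A V^(-1), V B, C V^(-1)). This system has the same Markov parameters C A^k B, and hence the same impulse response C e^(At) B, so its input-output map is unchanged. For minimal realizations, all equivalent systems are related in this way. The Python sketch checks the invariance numerically for hypothetical 2×2 matrices. It illustrates the general principle and is not the article's code.

```python
# Two different linear "mechanisms" related by a change of coordinates V
# share every Markov parameter C A^k B, hence the same input-output map.


def matmul(x, y):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def inverse2(v):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = v
    det = a * d - b * c
    if det == 0:
        raise ValueError("V must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def markov_parameters(a, b, c, count):
    """Return C A^k B for k = 0..count-1; these determine the impulse response C e^(At) B."""
    out = []
    ak_b = b
    for _ in range(count):
        out.append(matmul(c, ak_b))
        ak_b = matmul(a, ak_b)
    return out


# Hypothetical network: internal state x (2 genes), input u (environment), output y (phenotype).
A = [[-1.0, 0.5], [0.2, -2.0]]
B = [[1.0], [0.0]]
C = [[0.0, 1.0]]

V = [[2.0, 1.0], [1.0, 1.0]]  # any invertible matrix; this one is arbitrary
V_inv = inverse2(V)
A2 = matmul(matmul(V, A), V_inv)  # a different interaction matrix...
B2 = matmul(V, B)
C2 = matmul(C, V_inv)

original = markov_parameters(A, B, C, 6)
transformed = markov_parameters(A2, B2, C2, 6)
print("A2 =", A2)
print(max(abs(p[0][0] - q[0][0]) for p, q in zip(original, transformed)))
```

Taking V close to the identity, V = I + εE for a small matrix E, traces out a continuum of equivalent mechanisms. This is one way to see why selectively neutral directions in genotype space should be common.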


Subject(s)
Biological Evolution , Gene Flow , Genetic Speciation , Genotype , Hybridization, Genetic , Models, Genetic , Population Density , Reproduction