Results 1-20 of 695
1.
Cell; 185(24): 4604-4620.e32, 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36423582

ABSTRACT

Natural and induced somatic mutations that accumulate in the genome during development record the phylogenetic relationships of cells; whether these lineage barcodes capture the complex dynamics of progenitor states remains unclear. We introduce quantitative fate mapping, an approach to reconstruct the hierarchy, commitment times, population sizes, and commitment biases of intermediate progenitor states during development based on a time-scaled phylogeny of their descendants. To reconstruct time-scaled phylogenies from lineage barcodes, we introduce Phylotime, a scalable maximum likelihood clustering approach based on a general barcoding mutagenesis model. We validate these approaches using realistic in silico and in vitro barcoding experiments. We further establish criteria for the number of cells that must be analyzed for robust quantitative fate mapping and a progenitor state coverage statistic to assess the robustness. This work demonstrates how lineage barcodes, natural or synthetic, enable analyzing progenitor fate and dynamics long after embryonic development in any organism.


Subjects
Embryonic Development, Cell Lineage/genetics, Retrospective Studies, Phylogeny, Mutagenesis
2.
Annu Rev Genet; 54: 213-236, 2020 Nov 23.
Article in English | MEDLINE | ID: mdl-32870729

ABSTRACT

Natural highly fecund populations abound. These range from viruses to gadids. Many highly fecund populations are economically important. Highly fecund populations provide an important contrast to the low-fecundity organisms that have traditionally been applied in evolutionary studies. A key question regarding high fecundity is whether large numbers of offspring are produced on a regular basis, by few individuals each time, in a sweepstakes mode of reproduction. Such reproduction characteristics are not incorporated into the classical Wright-Fisher model, the standard reference model of population genetics, or similar types of models, in which each individual can produce only small numbers of offspring relative to the population size. The expected genomic footprints of population genetic models of sweepstakes reproduction are very different from those of the Wright-Fisher model. A key, immediate issue involves identifying the footprints of sweepstakes reproduction in genomic data. Whole-genome sequencing data can be used to distinguish the patterns made by sweepstakes reproduction from the patterns made by population growth in a population evolving according to the Wright-Fisher model (or similar models). If the hypothesis of sweepstakes reproduction cannot be rejected, then models of sweepstakes reproduction and associated multiple-merger coalescents will become at least as relevant as the Wright-Fisher model (or similar models) and the Kingman coalescent, the cornerstones of mathematical population genetics, in further discussions of evolutionary genomics of highly fecund populations.


Subjects
Fertility/genetics, Biological Evolution, Population Genetics/methods, Genomics/methods, Humans, Genetic Models, Population Density, Population Growth, Reproduction/genetics
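Entry 2 contrasts the Kingman coalescent, in which only pairs of lineages merge, with multiple-merger coalescents. As a baseline, here is a minimal sketch of the standard Kingman expectations for a sample of n lineages. It assumes time is measured in coalescent units, so each pair of lineages merges at rate 1. The function name is hypothetical.

```python
from fractions import Fraction


def kingman_expectations(n: int) -> tuple[Fraction, Fraction]:
    """Expected TMRCA and expected total branch length under the Kingman coalescent.

    Time is in coalescent units: while k lineages remain, each pair merges at
    rate 1, so the waiting time to the next merger is exponential with rate
    k(k-1)/2 and mean 2 / (k(k-1)).
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    tmrca = Fraction(0)
    total_length = Fraction(0)
    for k in range(2, n + 1):
        mean_wait = Fraction(2, k * (k - 1))
        tmrca += mean_wait
        total_length += k * mean_wait  # k branches span this interval
    return tmrca, total_length
```

Using exact fractions, `kingman_expectations(10)` gives an expected TMRCA of 9/5. This agrees with the closed form 2(1 - 1/n). The expected total length equals 2(1 + 1/2 + ... + 1/(n-1)).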
3.
Proc Natl Acad Sci U S A; 121(40): e2404973121, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39302998

ABSTRACT

Replica symmetry breaking (RSB) for spin glasses predicts that the equilibrium configurations at two different magnetic fields are maximally decorrelated. We show that this theory makes quantitative predictions for this chaotic behavior under the application of a vanishing external magnetic field, in the crossover region where the field intensity scales proportionally to [Formula: see text], where N is the system size. We show that RSB theory provides universal predictions for chaotic behavior: they depend only on the zero-field overlap probability function [Formula: see text] and are independent of other system features. In the infinite volume limit, each spin-glass sample is characterized by an infinite number of states that have a tree-like structure. We generate the corresponding probability distribution through efficient sampling using a representation based on the Bolthausen-Sznitman coalescent. Using solely [Formula: see text] as input, we can analytically compute the statistics of the states in the region of vanishing magnetic field. In this way, we can compute the overlap probability distribution in the presence of a small vanishing field and the increase of chaoticity when increasing the field. To test our computations, we have simulated the Bethe lattice spin glass and the 4D Edwards-Anderson model, finding in both cases excellent agreement with the universal predictions.
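Entry 3 samples the tree of states through the Bolthausen-Sznitman coalescent, which is a multiple-merger (Λ-) coalescent. The sketch below shows its merger rates. It assumes the standard Λ-coalescent rate formula with Λ uniform on [0, 1]. The function names are hypothetical, and this is not the authors' sampler.

```python
from fractions import Fraction
from math import comb, factorial


def bs_rate(b: int, k: int) -> Fraction:
    """Rate at which one particular set of k of the b current blocks merges.

    For a Lambda-coalescent, lambda_{b,k} = integral of x^(k-2) (1-x)^(b-k) Lambda(dx).
    With Lambda uniform on [0, 1] this is the Beta integral
    B(k-1, b-k+1) = (k-2)! (b-k)! / (b-1)!.
    """
    if not 2 <= k <= b:
        raise ValueError("require 2 <= k <= b")
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def bs_total_rate(b: int) -> Fraction:
    """Total rate of any merger among b blocks (sum over all subset sizes)."""
    return sum((comb(b, k) * bs_rate(b, k) for k in range(2, b + 1)), Fraction(0))


def bs_merger_size_distribution(b: int) -> dict[int, Fraction]:
    """Probability that the next merger among b blocks involves exactly k blocks."""
    total = bs_total_rate(b)
    return {k: comb(b, k) * bs_rate(b, k) / total for k in range(2, b + 1)}
```

When b blocks are present, the total merger rate is b - 1. For the Kingman coalescent it is b(b - 1)/2. The probability that a merger involves only two blocks is b/(2(b - 1)), which tends to 1/2. So for large b, about half of all mergers involve three or more blocks.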

4.
Am J Hum Genet; 110(12): 2077-2091, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38065072

ABSTRACT

Understanding the genetic basis of complex phenotypes is a central pursuit of genetics. Genome-wide association studies (GWASs) are a powerful way to find genetic loci associated with phenotypes. GWASs are widely and successfully used, but they face challenges related to the fact that variants are tested for association with a phenotype independently, whereas in reality variants at different sites are correlated because of their shared evolutionary history. One way to model this shared history is through the ancestral recombination graph (ARG), which encodes a series of local coalescent trees. Recent computational and methodological breakthroughs have made it feasible to estimate approximate ARGs from large-scale samples. Here, we explore the potential of an ARG-based approach to quantitative-trait locus (QTL) mapping, echoing existing variance-components approaches. We propose a framework that relies on the conditional expectation of a local genetic relatedness matrix (local eGRM) given the ARG. Simulations show that our method is especially beneficial for finding QTLs in the presence of allelic heterogeneity. By framing QTL mapping in terms of the estimated ARG, we can also facilitate the detection of QTLs in understudied populations. We use local eGRM to analyze two chromosomes containing known body size loci in a sample of Native Hawaiians. Our investigations can provide intuition about the benefits of using estimated ARGs in population- and statistical-genetic methods in general.


Subjects
Population Genetics, Genome-Wide Association Study, Quantitative Trait Loci, Humans, Chromosome Mapping/methods, Genetic Models, Phenotype, Quantitative Trait Loci/genetics, Native Hawaiian or Other Pacific Islander/genetics
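Entry 4 defines the local eGRM as a conditional expectation of the genetic relatedness matrix given the ARG. The sketch below is a simplified illustration of that idea only. It computes the analogous expectation for a single local tree. It assumes one mutation is placed on a branch with probability proportional to branch length, and it ignores the genomic span of each tree. It is not the authors' implementation, and all names are hypothetical.

```python
import math


def expected_tree_grm(n, branches):
    """Expected single-variant GRM on one tree, averaging over mutation placement.

    n: number of sampled haplotypes, labelled 0..n-1.
    branches: list of (descendant_samples, length) pairs, one per branch;
        each branch must subtend between 1 and n-1 samples.

    A mutation on a branch makes exactly its descendant samples carriers.
    The GRM for that one variant is the outer product of the standardized
    genotype vector z_i = (x_i - p) / sqrt(p (1 - p)), with p the carrier
    frequency. Weighting each branch by its share of total branch length
    gives the expected GRM when one mutation falls uniformly on the tree.
    """
    total = sum(length for _, length in branches)
    if total <= 0:
        raise ValueError("total branch length must be positive")
    grm = [[0.0] * n for _ in range(n)]
    for desc, length in branches:
        members = set(desc)
        if not 0 < len(members) < n:
            raise ValueError("each branch must subtend 1..n-1 samples")
        p = len(members) / n
        scale = math.sqrt(p * (1.0 - p))
        z = [((1.0 if i in members else 0.0) - p) / scale for i in range(n)]
        weight = length / total
        for i in range(n):
            for j in range(n):
                grm[i][j] += weight * z[i] * z[j]
    return grm
```

As an example, take the tree ((0,1),2) with branches `[({0}, 1.0), ({1}, 1.0), ({2}, 2.0), ({0, 1}, 1.0)]`. The entry for samples 0 and 1 is -0.1, compared with -0.7 for samples 0 and 2. This reflects the more recent common ancestor of 0 and 1.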
5.
Proc Natl Acad Sci U S A; 120(44): e2310708120, 2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37871206

ABSTRACT

Analyses of genome sequence data have revealed pervasive interspecific gene flow and enriched our understanding of the role of gene flow in speciation and adaptation. Inference of gene flow using genomic data requires powerful statistical methods. Yet current likelihood-based methods involve heavy computation and are feasible for small datasets only. Here, we implement the multispecies-coalescent-with-migration model in the Bayesian program bpp, which can be used to test for gene flow and estimate migration rates, as well as species divergence times and population sizes. We develop Markov chain Monte Carlo algorithms for efficient sampling from the posterior, enabling the analysis of genome-scale datasets with thousands of loci. Implementation of both introgression and migration models in the same program allows us to test whether gene flow occurred continuously over time or in pulses. Analyses of genomic data from Anopheles mosquitoes demonstrate rich information in typical genomic datasets about the mode and rate of gene flow.


Subjects
Algorithms, Gene Flow, Animals, Phylogeny, Computer Simulation, Bayes Theorem, Likelihood Functions, Genetic Models
6.
Mol Biol Evol; 41(5), 2024 May 03.
Article in English | MEDLINE | ID: mdl-38630635

ABSTRACT

Bayesian coalescent skyline plot models are widely used to infer demographic histories. The first (non-Bayesian) coalescent skyline plot model assumed a known genealogy as data, while subsequent models and implementations jointly inferred the genealogy and demographic history from sequence data, including heterochronous samples. Overall, there exist multiple different Bayesian coalescent skyline plot models, which mainly differ in two key aspects: (i) how changes in population size are modeled through independent or autocorrelated prior distributions, and (ii) how many change-points in the demographic history are used, where they occur, and whether the number is pre-specified or inferred. The specific impact of each of these choices on the inferred demographic history is not known, for two reasons: first, not all models are implemented in the same software, and second, each model implementation makes specific choices that the biologist cannot influence. To facilitate a detailed evaluation of Bayesian coalescent skyline plot models, we implemented all currently described models, in a flexible design, in the software RevBayes. Furthermore, we evaluated models and choices on an empirical dataset of horses, supplemented by a small simulation study. We find that estimated demographic histories fall broadly into two groups depending on how change-points in the demographic history are specified (either independent of or at coalescent events). Our simulations suggest that models using change-points at coalescent events produce spurious variation near the present, while most models using independent change-points tend to over-smooth the inferred demographic history.


Subjects
Bayes Theorem, Population Genetics, Genetic Models, Animals, Population Genetics/methods, Horses, Population Density, Computer Simulation, Software, Demography
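Entry 6 notes that the first, non-Bayesian skyline plot took a known genealogy as data. That classic estimator can be sketched in a few lines. The sketch assumes all tips are sampled at the present (isochronous) and that node heights are known. For each interval of length t with k lineages, the population-size estimate is t·k(k - 1)/2. The function name is hypothetical. The Bayesian models compared in RevBayes are not shown.

```python
def classic_skyline(node_heights):
    """Classic (non-Bayesian) skyline estimates from a known genealogy.

    node_heights: times of the n - 1 coalescent events, measured backwards
        from the present, for n tips all sampled at time 0.
    Returns (k, start, end, estimate) for each interval during which k
    lineages are present; estimate = (end - start) * k * (k - 1) / 2.
    """
    heights = sorted(node_heights)
    k = len(heights) + 1  # number of lineages in the most recent interval
    start = 0.0
    intervals = []
    for end in heights:
        intervals.append((k, start, end, (end - start) * k * (k - 1) / 2))
        start = end
        k -= 1  # each coalescent event removes one lineage
    return intervals
```

For four tips with node heights 1, 3 and 6, the estimates are 6, 6 and 3, reading from the present backwards.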
7.
Syst Biol; 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39078610

ABSTRACT

Ancient DNA (aDNA) is increasingly being used to investigate questions such as the phylogenetic relationships and divergence times of extant and extinct species. If aDNA samples are sufficiently old, expected branch lengths (in units of nucleotide substitutions) are reduced relative to contemporary samples. This can be accounted for by incorporating sample ages into phylogenetic analyses. Existing methods that use tip (sample) dates infer gene trees rather than species trees, which can lead to incorrect or biased inferences of the species tree. Methods using a multispecies coalescent (MSC) model overcome these issues. We developed an MSC model with tip dates and implemented it in the program bpp. The method performed well for a range of biologically realistic scenarios, estimating calibrated divergence times and mutation rates precisely. Simulations suggest that estimation precision can be best improved by prioritizing sampling of many loci and more ancient samples. Incorrectly treating ancient samples as contemporary in analyzing simulated data, mimicking a common practice of empirical analyses, led to large systematic biases in model parameters, including divergence times. Two genomic datasets of mammoths and elephants were analyzed, demonstrating the method's empirical utility.

8.
Syst Biol; 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-39041315

ABSTRACT

Recent genomic analyses have highlighted the prevalence of speciation with gene flow in many taxa and have underscored the importance of accounting for these reticulate evolutionary processes when constructing species trees and generating parameter estimates. This is especially important for deepening our understanding of speciation in the sea, where fast-moving ocean currents, expanses of deep water, and periodic episodes of sea level rise and fall act as soft and temporary allopatric barriers that facilitate both divergence and secondary contact. Under these conditions, gene flow is not expected to cease completely, while contemporary distributions are expected to differ from historical ones. Here we conduct range-wide sampling for Pederson's cleaner shrimp (Ancylomenes pedersoni), a species complex from the Greater Caribbean that contains three clearly delimited mitochondrial lineages with both allopatric and sympatric distributions. Using mtDNA barcodes and a genomic ddRADseq approach, we combine classic phylogenetic analyses with extensive topology testing and demographic modeling (10 site frequency replicates × 45 evolutionary models × 50 model simulations/replicate = 22,500 simulations) to test species boundaries and reconstruct the evolutionary history of what was expected to be a simple case study. Instead, our results indicate a history of allopatric divergence, secondary contact, introgression, and endemic hybrid speciation that we hypothesize was driven by the final closure of the Isthmus of Panama and the strengthening of the Gulf Stream Current ~3.5 million years ago. The history of this species complex recovered by model-based methods that allow reticulation differs from that recovered by standard phylogenetic analyses and is unexpected given contemporary distributions.
The geologically and biologically meaningful insights gained by our model selection analyses illuminate what is likely a novel pathway of species formation not previously documented that resulted from one of the most biogeographically significant events in Earth's history.

9.
Syst Biol; 2024 Aug 24.
Article in English | MEDLINE | ID: mdl-39180155

ABSTRACT

The multispecies coalescent (MSC) model accommodates genealogical fluctuations across the genome and provides a natural framework for comparative analysis of genomic sequence data from closely related species to infer the history of species divergence and gene flow. Given a set of populations, hypotheses of species delimitation (and species phylogeny) may be formulated as instances of MSC models (e.g., MSC for one species versus MSC for two species) and compared using Bayesian model selection. This approach, implemented in the program bpp, has been found to be prone to over-splitting. Alternatively, heuristic criteria based on population parameters (such as population split times, population sizes, and migration rates) estimated from genomic data may be used to delimit species. Here we develop hierarchical merge and split algorithms for heuristic species delimitation based on the genealogical divergence index (gdi) and implement them in a Python pipeline called hhsd. We characterize the behavior of the gdi under a few simple scenarios of gene flow. We apply the new approaches to a dataset simulated under a model of isolation by distance as well as three empirical datasets. Our tests suggest that the new approaches produce sensible results and are less prone to over-splitting. We discuss possible strategies for accommodating paraphyletic species in the hierarchical algorithm, as well as the challenges of species delimitation based on heuristic criteria.
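The sketch below illustrates a gdi-based heuristic. It assumes the no-gene-flow MSC expression gdi = 1 - exp(-2τ/θ), where τ is the divergence time and θ is the population-size parameter of the focal population. The abstract characterizes gdi under gene flow, which this simple formula does not cover. The thresholds used here (gdi < 0.2 suggests one species, gdi > 0.7 suggests distinct species) are rule-of-thumb values from earlier gdi studies and are treated as assumptions. They are not necessarily the values hhsd uses. All names are hypothetical.

```python
import math


def gdi(tau: float, theta: float) -> float:
    """Genealogical divergence index of population A relative to B.

    Assumes no gene flow: gdi = 1 - exp(-2 tau / theta), with tau the A-B
    divergence time and theta the population-size parameter of A (both in
    expected mutations per site).
    """
    if tau < 0 or theta <= 0:
        raise ValueError("require tau >= 0 and theta > 0")
    return 1.0 - math.exp(-2.0 * tau / theta)


def classify(g: float, low: float = 0.2, high: float = 0.7) -> str:
    """Rule-of-thumb reading of a gdi value; the cut-offs are assumptions."""
    if g < low:
        return "same species"
    if g > high:
        return "distinct species"
    return "ambiguous"
```

Because gdi depends only on the ratio 2τ/θ, which is the divergence time in coalescent units, a deep split in a large population can score lower than a shallow split in a small one.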

10.
Syst Biol; 2024 Oct 10.
Article in English | MEDLINE | ID: mdl-39387633

ABSTRACT

While phylogenies have been essential in understanding how species evolve, they do not adequately describe some evolutionary processes. For instance, hybridization, a common phenomenon where interbreeding between two species leads to formation of a new species, must be depicted by a phylogenetic network, a structure that modifies a phylogenetic tree by allowing two branches to merge into one, resulting in reticulation. However, existing methods for estimating networks become computationally expensive as the dataset size and/or topological complexity increase. The lack of methods for scalable inference hampers phylogenetic networks from being widely used in practice, despite accumulating evidence that hybridization occurs frequently in nature. Here, we propose a novel method, PhyNEST (Phylogenetic Network Estimation using SiTe patterns), that estimates binary, level-1 phylogenetic networks with a fixed, user-specified number of reticulations directly from sequence data. By using the composite likelihood as the basis for inference, PhyNEST is able to use the full genomic data in a computationally tractable manner, eliminating the need to summarize the data as a set of gene trees prior to network estimation. To search network space, PhyNEST implements both hill climbing and simulated annealing algorithms. PhyNEST assumes that the data are composed of coalescent independent sites that evolve according to the Jukes-Cantor substitution model and that the network has a constant effective population size. Simulation studies demonstrate that PhyNEST is often more accurate than two existing composite likelihood summary methods (SNaQ and PhyloNet) and that it is robust to at least one form of model misspecification (assuming a less complex nucleotide substitution model than the true generating model). 
We applied PhyNEST to reconstruct the evolutionary relationships among Heliconius butterflies and Papionini primates, characterized by hybrid speciation and widespread introgression, respectively. PhyNEST is implemented in an open-source Julia package and is publicly available at https://github.com/sungsik-kong/PhyNEST.jl.
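PhyNEST assumes that sites evolve under the Jukes-Cantor model. The sketch below shows the Jukes-Cantor relationship between evolutionary distance and the expected proportion of differing sites for a pair of sequences. It does not reproduce PhyNEST's site-pattern composite likelihood, and the function names are hypothetical.

```python
import math


def jc_prob_differ(d: float) -> float:
    """Expected proportion of differing sites at JC distance d (substitutions/site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The two functions invert each other. For example, an observed 10% difference corresponds to a corrected distance of about 0.107 substitutions per site.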

11.
Syst Biol; 2024 May 11.
Article in English | MEDLINE | ID: mdl-38733563

ABSTRACT

Accurately reconstructing the reticulate histories of polyploids remains a central challenge for understanding plant evolution. Although phylogenetic networks can provide insights into relationships among polyploid lineages, inferring networks may be hindered by the complexities of homology determination in polyploid taxa. We use simulations to show that phasing alleles from allopolyploid individuals can improve phylogenetic network inference under the multispecies coalescent by obtaining the true network with fewer loci compared to haplotype consensus sequences or sequences with heterozygous bases represented as ambiguity codes. Phased allelic data can also improve divergence time estimates for networks, which is helpful for evaluating allopolyploid speciation hypotheses and proposing mechanisms of speciation. To achieve these outcomes in empirical data, we present a novel pipeline that leverages a recently developed phasing algorithm to reliably phase alleles from polyploids. This pipeline is especially appropriate for target enrichment data, where depth of coverage is typically high enough to phase entire loci. We provide an empirical example in the North American Dryopteris fern complex that demonstrates insights from phased data as well as the challenges of network inference. We establish that our pipeline (PATÉ: Phased Alleles from Target Enrichment data) is capable of recovering a high proportion of phased loci from both diploids and polyploids. These data may improve network estimates compared to using haplotype consensus assemblies by accurately inferring the direction of gene flow, but statistical non-identifiability of phylogenetic networks poses a barrier to inferring the evolutionary history of reticulate complexes.

12.
Mol Biol Evol; 40(7), 2023 Jul 05.
Article in English | MEDLINE | ID: mdl-37440530

ABSTRACT

Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (SpeciesTopology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies.


Subjects
Algorithms, Genetic Models, Phylogeny, Likelihood Functions
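Entry 12 adapts classical gene-tree topology tests to species trees. The Kishino-Hasegawa (KH) idea can be sketched as a paired comparison of per-locus log-likelihoods under two hypotheses. A normal approximation is used for the summed difference. This is a generic illustration with hypothetical names, not the SpeciesTopoTestR API. As with the original KH test, it is only valid when neither hypothesis was chosen using the same data.

```python
import math
from statistics import stdev


def kh_style_test(loglik_a, loglik_b):
    """Paired KH-style comparison of two hypotheses scored per locus.

    loglik_a, loglik_b: per-locus log-likelihoods under hypotheses A and B.
    Returns (delta, z, two_sided_p), where delta is the summed difference
    and z uses the standard error of that sum (sqrt(n) * per-locus SD).
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("inputs must have one value per locus")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two loci")
    delta = sum(diffs)
    se = stdev(diffs) * math.sqrt(n)  # standard error of the sum
    if se == 0:
        raise ValueError("per-locus differences have zero variance")
    z = delta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return delta, z, p
```

A small p-value indicates that one hypothesis fits the loci consistently better. A p-value near 1 means the per-locus differences cancel out.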
13.
Mol Biol Evol; 40(8), 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37552932

ABSTRACT

Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.


Subjects
Butterflies, Animals, Likelihood Functions, Bayes Theorem, Butterflies/genetics, Genome, Genomics, Gene Flow, Phylogeny, Genetic Hybridization
14.
New Phytol; 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39253771

ABSTRACT

Early studies of the textbook mixed-ploidy system Biscutella laevigata highlighted diploids restricted to never-glaciated lowlands and tetraploids at high elevations across the European Alps, promoting the hypothesis that whole-genome duplication (WGD) is advantageous under environmental changes. Here we addressed long-held hypotheses on the role of hybridisation at the origin of the tetraploids, their single vs multiple origins, and whether a shift in climatic niche accompanied WGD. Climatic niche modelling together with spatial genetics and coalescent modelling based on ddRAD-seq genotyping of 17 diploid and 19 tetraploid populations was used to revisit the evolution of this species complex in space and time. Diploids differentiated into four genetic lineages corresponding to allopatric glacial refugia at the onset of the last ice age, whereas tetraploids displaying tetrasomic inheritance formed a uniform group that originated from southern diploids before the last glacial maximum. Derived from diploids occurring at high elevation, autotetraploids likely inherited their adaptation to high elevation rather than having evolved it through or after WGD. They further presented considerable postglacial expansion across the Alps and underwent admixture with diploids. Although the underpinnings of the successful expansion of autotetraploids remain elusive, differentiation in B. laevigata was chiefly driven by the glacial history of the Alps.

15.
Mol Ecol; e17523, 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39248016

ABSTRACT

Genetic analyses of host-specific parasites can elucidate the evolutionary histories and biological features of their hosts. Here, we used population-genomic analyses of ectoparasitic seal lice (Echinophthirius horridus) to shed light on the postglacial history of seals in the Arctic Ocean and the Baltic Sea region. One key question was the enigmatic origin of relict landlocked ringed seal populations in lakes Saimaa and Ladoga in northern Europe. We found that lice of four postglacially diverged subspecies of the ringed seal (Pusa hispida) and Baltic gray seal (Halichoerus grypus), like their hosts, form genetically differentiated entities. Using coalescent-based demographic inference, we show that the sequence of divergences of the louse populations is consistent with the geological history of lake formation. In addition, local effective population sizes of the lice are generally proportional to the census sizes of their respective seal host populations. Genome-based reconstructions of long-term effective population sizes revealed clear differences among louse populations associated with gray versus ringed seals, with apparent links to Pleistocene and Holocene climatic variation as well as to the isolation histories of ringed seal subspecies. Interestingly, our analyses also revealed ancient gene flow between the lice of Baltic gray and ringed seals, suggesting that the distributions of Baltic seals overlapped to a greater extent in the past than is the case today. Taken together, our results demonstrate how genomic information from specialized parasites with higher mutation and substitution rates than their hosts can potentially illuminate finer-scale population genetic patterns than similar data from their hosts.

16.
Mol Phylogenet Evol; 199: 108158, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39025321

ABSTRACT

Incomplete lineage sorting (ILS) and introgression are two of the main factors causing incongruence between gene and species trees. Advances in phylogenomic studies have allowed us to overcome most of these issues, providing reliable phylogenetic hypotheses while revealing the underlying evolutionary scenario. Across the last century, many incongruent phylogenetic reconstructions were recovered for Drosophilidae, employing a limited sampling of genetic markers or species. In these studies, the monophyly and the phylogenetic positioning of the Zygothrica genus group stood out as one of the most controversial questions. Here, we therefore addressed these issues using a phylogenomic approach, while assessing the influence of ILS and introgression on the diversification of these species and addressing the spatio-temporal scenario associated with their evolution. For this task, the genomes of nine specimens from six Neotropical species belonging to the Zygothrica genus group were sequenced and evaluated in a phylogenetic framework encompassing 39 other species of Drosophilidae. Nucleotide and amino acid sequences recovered for a set of 2,534 single-copy genes by BUSCO were employed to reconstruct maximum likelihood (ML) concatenated and multispecies coalescent (MSC) trees. Likelihood mapping, quartet sampling, and reticulation tests were employed to infer the level and causes of incongruence. Lastly, a penalized-likelihood molecular clock strategy with fossil calibrations was used to infer divergence times. Taken together, our results recovered the subdivision of Drosophila into six different lineages, one of which clusters species of the Zygothrica genus group (except for H. duncani). The divergence of this lineage was dated to the Oligocene (~31 Mya) and seems to have occurred in the same timeframe as other key diversifications within Drosophila.
According to the concatenated and MSC strategies, this lineage is sister to the clade joining Drosophila (Siphlodora) with the Hawaiian Drosophila and Scaptomyza. Likelihood mapping, quartet sampling, reticulation reconstructions as well as introgression tests revealed that this lineage was the target of several hybridization events involving the ancestors of different Drosophila lineages. Thus, our results generally show introgression as a major source of previous incongruence. Nevertheless, the similar diversification times recovered for several of the Neotropical Drosophila lineages also support the scenario of multiple and simultaneous diversifications taking place at the base of Drosophilidae phylogeny, at least in the Neotropics.


Subjects
Drosophilidae, Phylogeny, Animals, Drosophilidae/genetics, Drosophilidae/classification, Insect Genome/genetics, Genomics
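Entry 16 attributes gene-tree/species-tree incongruence to ILS and introgression. For ILS alone, the classic three-taxon result under the multispecies coalescent gives a quick sense of how much discordance to expect from a short internal branch. The sketch below assumes no gene flow and a branch length measured in coalescent units. The function name is hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t: float) -> tuple[float, float]:
    """Gene-tree probabilities for a rooted three-taxon species tree ((A,B),C).

    t: internal branch length in coalescent units. Under the multispecies
    coalescent without gene flow, the matching topology has probability
    1 - (2/3) exp(-t) and each of the two discordant topologies has
    probability (1/3) exp(-t).
    Returns (concordant, each_discordant).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return 1.0 - 2.0 * discordant, discordant
```

As t approaches 0, all three topologies become equally likely. Under ILS alone, the two discordant topologies are always equally frequent. A consistent excess of one of them is therefore a common signal of introgression.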
17.
Theor Popul Biol; 155: 67-76, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38092137

ABSTRACT

Consider the diffusion process defined by the forward equation $u_t(t,x) = \frac{1}{2}\{x\,u(t,x)\}_{xx} - \alpha\{x\,u(t,x)\}_x$ for $t, x \ge 0$ and $-\infty < \alpha < \infty$, with initial condition $u(0,x) = \delta(x - x_0)$. This equation was introduced and solved by Feller to model the growth of a population of independently reproducing individuals. We explore important coalescent processes related to Feller's solution. For any $\alpha$ and $x_0 > 0$ we calculate the distribution of the random variable $A_n(s;t)$, defined as the finite number of ancestors at a time $s$ in the past of a sample of size $n$ taken from the infinite population of a Feller diffusion at a time $t$ since its initiation. In a subcritical diffusion we find the distribution of population and sample coalescent trees from time $t$ back, conditional on non-extinction as $t \to \infty$. In a supercritical diffusion we construct a coalescent tree which has a single founder and derive the distribution of coalescent times.
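Entry 17 conditions the subcritical case on non-extinction. The forward equation above corresponds to the diffusion dX = αX dt + √X dW. For this diffusion, the probability of extinction by time t has a closed form that follows from the branching property. The sketch below uses that standard result, which is not taken from the paper. The function name is hypothetical.

```python
import math


def feller_extinction_prob(x0: float, alpha: float, t: float) -> float:
    """P(X_t = 0) for dX = alpha X dt + sqrt(X) dW started at X_0 = x0.

    By the branching property, E[exp(-lam X_t)] = exp(-x0 v_t(lam)), where
    v solves the logistic ODE dv/dt = alpha v - v^2 / 2. Letting lam go to
    infinity gives P(X_t = 0) = exp(-x0 c(t)) with
    c(t) = 2 alpha / (1 - exp(-alpha t)), and c(t) = 2 / t when alpha = 0.
    """
    if x0 < 0 or t <= 0:
        raise ValueError("require x0 >= 0 and t > 0")
    if alpha == 0.0:
        c = 2.0 / t
    else:
        # expm1 keeps the denominator accurate when alpha * t is small
        c = 2.0 * alpha / (-math.expm1(-alpha * t))
    return math.exp(-x0 * c)
```

In the subcritical case (α < 0), the extinction probability tends to 1 as t grows. This is why the abstract conditions on non-extinction. In the supercritical case (α > 0), it tends to exp(-2αx₀).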

18.
Theor Popul Biol; 158: 150-169, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38880430

ABSTRACT

The coalescent is a stochastic process representing ancestral lineages in a population undergoing neutral genetic drift. Originally defined for a well-mixed population, the coalescent has been adapted in various ways to accommodate spatial, age, and class structure, along with other features of real-world populations. To further extend the range of population structures to which coalescent theory applies, we formulate a coalescent process for a broad class of neutral drift models with arbitrary (but fixed) spatial, age, sex, and class structure, haploid or diploid genetics, and any fixed mating pattern. Here, the coalescent is represented as a random sequence of mappings [Formula: see text] from a finite set $G$ to itself. The set $G$ represents the "sites" (in individuals, in particular locations and/or classes) at which these alleles can live. The state of the coalescent, $C_t: G \to G$, maps each site $g \in G$ to the site containing $g$'s ancestor, $t$ time-steps into the past. Using this representation, we define and analyze coalescence time, coalescence branch length, mutations prior to coalescence, and stationary probabilities of identity-by-descent and identity-by-state. For low mutation, we provide a recipe for computing identity-by-descent and identity-by-state probabilities via the coalescent. Applying our results to a diploid population with arbitrary sex ratio $r$, we find that measures of genetic dissimilarity, among any set of sites, are scaled by $4r(1-r)$ relative to the even sex ratio case.


Subjects
Genetic Drift, Population Genetics, Genetic Models, Mutation, Stochastic Processes, Humans, Diploidy
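The factor 4r(1 - r) in entry 18 has the same form as the ratio of Wright's classical effective population size to census size in a population with separate sexes. The sketch below assumes Wright's formula Ne = 4·Nm·Nf/(Nm + Nf), with r taken as the fraction of males. The factor is symmetric in r and 1 - r, so it does not matter which sex r refers to. Function names are hypothetical.

```python
def wright_effective_size(n_males: int, n_females: int) -> float:
    """Wright's effective size for separate sexes: 4 Nm Nf / (Nm + Nf)."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("need at least one individual of each sex")
    return 4.0 * n_males * n_females / (n_males + n_females)


def sex_ratio_factor(r: float) -> float:
    """4 r (1 - r): Ne relative to census size when a fraction r is male."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 4.0 * r * (1.0 - r)
```

With 20 males and 80 females, Ne is 64. This matches 100 × 4(0.2)(0.8). The factor reaches its maximum of 1 at an even sex ratio.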
19.
Theor Popul Biol; 158: 1-20, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38697365

ABSTRACT

We consider a single genetic locus with two alleles $A_1$ and $A_2$ in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts, when the count $n_1$ of allele $A_1$ is fixed, and when either or both the sample size $n$ and the selection strength $|\alpha|$ tend to infinity. Our study extends previous work under neutrality to the case of non-neutral rare alleles, asserting that when selection is not too strong relative to the sample size, even if it is strongly positive or strongly negative in the usual sense ($\alpha \to -\infty$ or $\alpha \to +\infty$), the number of latent mutations of the $n_1$ copies of allele $A_1$ follows the same distribution as the number of alleles in the Ewens sampling formula. On the other hand, very strong positive selection relative to the sample size leads to neutral gene genealogies with a single ancient latent mutation. We also demonstrate robustness of our asymptotic results against changing population sizes, when one of $|\alpha|$ or $n$ is large.


Subjects
Alleles, Gene Frequency, Genetic Models, Mutation, Genetic Selection, Humans, Population Genetics
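Entry 19 relates the number of latent mutations to the number of alleles in the Ewens sampling formula. That distribution is P(K = k) = c(n, k)·θᵏ / (θ(θ + 1)···(θ + n - 1)), where c(n, k) are the unsigned Stirling numbers of the first kind. The abstract does not state which value of θ applies in the paper's setting, so θ is left as an input in the sketch below. Function names are hypothetical.

```python
from fractions import Fraction


def unsigned_stirling_first(n: int) -> list[int]:
    """Row n of unsigned Stirling numbers of the first kind, c(n, k) for k = 0..n."""
    row = [1]  # c(0, 0) = 1
    for m in range(n):
        # recurrence: c(m+1, k) = m * c(m, k) + c(m, k-1)
        new = [0] * (m + 2)
        for k in range(m + 2):
            if k <= m:
                new[k] += m * row[k]
            if k >= 1:
                new[k] += row[k - 1]
        row = new
    return row


def ewens_allele_count_dist(n: int, theta) -> dict[int, Fraction]:
    """Exact distribution of the number of distinct alleles K_n under Ewens."""
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    c = unsigned_stirling_first(n)
    return {k: c[k] * theta**k / rising for k in range(1, n + 1)}
```

With exact fractions, the mean of this distribution equals the sum of θ/(θ + i) for i = 0, ..., n - 1, which is the familiar expected number of alleles.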
20.
Theor Popul Biol; 159: 91-107, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38490495

ABSTRACT

Motivated by the question of the impact of selective advantage in populations with skewed reproduction mechanisms, we study a Moran model with selection. We assume that there are two types of individuals, where the reproductive success of one type is larger than that of the other. The higher reproductive success may stem from either more frequent reproduction or larger numbers of offspring, and is encoded in a measure Λ for each of the two types. Here, Λ-reproduction means that a whole fraction of the population is replaced at a reproductive event. Our approach consists of constructing a Λ-asymmetric Moran model in which individuals of the two populations compete, rather than considering a Moran model for each population. Provided the measures are stochastically ordered, we can couple them. This allows us to construct the central object of this paper, the Λ-asymmetric ancestral selection graph, leading to a pathwise duality of the forward-in-time Λ-asymmetric Moran model with its ancestral process. We apply the ancestral selection graph to obtain scaling limits of the forward and backward processes, and note that the frequency process converges to the solution of an SDE with discontinuous paths. Finally, we derive a Griffiths representation for the generator of the SDE and use it to find a semi-explicit formula for the probability of fixation of the less beneficial of the two types.


Subjects
Genetic Selection, Reproduction, Theoretical Models, Humans, Population Dynamics, Genetic Models