ABSTRACT
Genome-wide genealogies of multiple species carry detailed information about demographic and selection processes on individual branches of the phylogeny. Here, we introduce TRAILS, a hidden Markov model that accurately infers time-resolved population genetics parameters, such as ancestral effective population sizes and speciation times, for ancestral branches using a multi-species alignment of three species and an outgroup. TRAILS leverages the information contained in incomplete lineage sorting fragments by modelling genealogies along the genome as rooted three-leaved trees, each with a topology and two coalescent events happening in discretized time intervals within the phylogeny. Posterior decoding of the hidden Markov model can be used to infer the ancestral recombination graph for the alignment and details on demographic changes within a branch. Since TRAILS performs posterior decoding at the base-pair level, genome-wide scans based on the posterior probabilities can be devised to detect deviations from neutrality. Using TRAILS on a human-chimp-gorilla-orangutan alignment, we recover speciation parameters and extract information about the topology and coalescent times at high resolution.
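Posterior decoding of a hidden Markov model, as mentioned above, can be illustrated with a generic forward-backward pass. The sketch below is not the TRAILS implementation: the function name `posterior_decode` and its arguments are hypothetical, and the hidden states stand in for whatever genealogies a real model would use.

```python
def posterior_decode(init, trans, emit, obs):
    """Return per-site posterior probabilities of each hidden state.

    init[i]     : probability of starting in state i (hypothetical toy input)
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    obs         : list of observed symbol indices, one per site
    """
    n_states, n_sites = len(init), len(obs)

    # Forward pass, rescaled at every site to avoid numerical underflow.
    fwd, scales = [], []
    for t in range(n_sites):
        if t == 0:
            row = [init[i] * emit[i][obs[0]] for i in range(n_states)]
        else:
            row = [
                emit[j][obs[t]]
                * sum(fwd[t - 1][i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        scale = sum(row)
        scales.append(scale)
        fwd.append([value / scale for value in row])

    # Backward pass, reusing the forward scaling factors.
    bwd = [[1.0] * n_states for _ in range(n_sites)]
    for t in range(n_sites - 2, -1, -1):
        for i in range(n_states):
            bwd[t][i] = sum(
                trans[i][j] * emit[j][obs[t + 1]] * bwd[t + 1][j]
                for j in range(n_states)
            ) / scales[t + 1]

    # The posterior at each site is proportional to forward times backward.
    posterior = []
    for t in range(n_sites):
        row = [fwd[t][i] * bwd[t][i] for i in range(n_states)]
        total = sum(row)
        posterior.append([value / total for value in row])
    return posterior
```

Calling `posterior_decode` with a two-state toy model returns one probability vector per site, which is the kind of per-base-pair quantity a genome-wide scan could be built on.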
Subjects
Genetic Speciation, Hominidae, Animals, Humans, Hominidae/genetics, Pan troglodytes/genetics, Phylogeny, Population Genetics, Genetic Models
ABSTRACT
In order to accommodate the empirical fact that population structures are rarely simple, modern studies of evolutionary dynamics allow for complicated and highly heterogeneous spatial structures. As a result, one of the most difficult obstacles lies in making analytical deductions, either qualitative or quantitative, about the long-term outcomes of evolution. The "structure-coefficient" theorem is a well-known approach to this problem for mutation-selection processes under weak selection, but a general method of evaluating the terms it comprises is lacking. Here, we provide such a method for populations of fixed (but arbitrary) size and structure, using easily interpretable demographic measures. This method encompasses a large family of evolutionary update mechanisms and extends the theorem to allow for asymmetric contests to provide a better understanding of the mutation-selection balance under more realistic circumstances. We apply the method to study social goods produced and distributed among individuals in spatially heterogeneous populations, where asymmetric interactions emerge naturally and the outcome of selection varies dramatically, depending on the nature of the social good, the spatial topology, and the frequency with which mutations arise.
Subjects
Biological Evolution, Game Theory, Animals, Population Genetics, Mutation
ABSTRACT
We consider a single genetic locus with two alleles A1 and A2 in a large haploid population. The locus is subject to selection and two-way, or recurrent, mutation. Assuming the allele frequencies follow a Wright-Fisher diffusion and have reached stationarity, we describe the asymptotic behaviors of the conditional gene genealogy and the latent mutations of a sample with known allele counts, when the count n1 of allele A1 is fixed, and when either or both the sample size n and the selection strength |α| tend to infinity. Our study extends previous work under neutrality to the case of non-neutral rare alleles, asserting that when selection is not too strong relative to the sample size, even if it is strongly positive or strongly negative in the usual sense (α → -∞ or α → +∞), the number of latent mutations of the n1 copies of allele A1 follows the same distribution as the number of alleles in the Ewens sampling formula. On the other hand, very strong positive selection relative to the sample size leads to neutral gene genealogies with a single ancient latent mutation. We also demonstrate robustness of our asymptotic results against changing population sizes, when one of |α| or n is large.
Subjects
Alleles, Gene Frequency, Genetic Models, Mutation, Genetic Selection, Humans, Population Genetics
ABSTRACT
We consider two-player iterated survival games in which players are able to switch from a more cooperative behavior to a less cooperative one at some step of an n-step game. Payoffs are survival probabilities and lone individuals have to finish the game on their own. We explore the potential of these games to support cooperation, focusing on the case in which each single step is a Prisoner's Dilemma. We find that incentives for or against cooperation depend on the number of defections at the end of the game, as opposed to the number of steps in the game. Broadly, cooperation is supported when the survival prospects of lone individuals are relatively bleak. Specifically, we find three critical values or cutoffs for the loner survival probability which, in concert with other survival parameters, determine the incentives for or against cooperation. One cutoff determines the existence of an optimal number of defections against a fully cooperative partner, one determines whether additional defections eventually become disfavored as the number of defections by the partner increases, and one determines whether additional cooperations eventually become favored as the number of defections by the partner increases. We obtain expressions for these switch-points and for optimal numbers of defections against partners with various strategies. These typically involve small numbers of defections even in very long games. We show that potentially long stretches of equilibria may exist, in which there is no incentive to defect more or cooperate more. We describe how individuals find equilibria in best-response walks among n-step strategies.
Subjects
Cooperative Behavior, Game Theory, Prisoner's Dilemma, Humans, Probability
ABSTRACT
We describe an iterated game between two players, in which the payoff is to survive a number of steps. Expected payoffs are probabilities of survival. A key feature of the game is that individuals have to survive on their own if their partner dies. We consider individuals with hardwired, unconditional behaviors or strategies. When both players are present, each step is a symmetric two-player game. The overall survival of the two individuals forms a Markov chain. As the number of iterations tends to infinity, all probabilities of survival decrease to zero. We obtain general, analytical results for n-step payoffs and use these to describe how the game changes as n increases. In order to predict changes in the frequency of a cooperative strategy over time, we embed the survival game in three different models of a large, well-mixed population. Two of these models are deterministic and one is stochastic. Offspring receive their parent's type without modification and fitnesses are determined by the game. Increasing the number of iterations changes the prospects for cooperation. All models become neutral in the limit (n → ∞). Further, if pairs of cooperative individuals survive together with high probability, specifically higher than for any other pair and for either type when it is alone, then cooperation becomes favored if the number of iterations is large enough. This holds regardless of the structure of pairwise interactions in a single step. Even if the single-step interaction is a Prisoner's Dilemma, the cooperative type becomes favored. Enhanced survival is crucial in these iterated evolutionary games: if players in pairs start the game with a fitness deficit relative to lone individuals, the prospects for cooperation can become even worse than in the case of a single-step game.
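The n-step survival payoff described above can be sketched as a simple recursion over the Markov chain. This is an illustration under assumed inputs, not the paper's derivation: the function name `pair_survival` and the per-step probabilities `both`, `only_me` and `alone` are hypothetical parameters introduced here.

```python
def pair_survival(n, both, only_me, alone):
    """Probability that a focal player survives n steps, starting in a pair.

    both    : per-step probability that both partners survive (assumed input)
    only_me : per-step probability that only the focal player survives
    alone   : per-step survival probability of a lone individual
    """
    surv = 1.0  # surviving zero steps is certain
    for k in range(1, n + 1):
        # First of k steps: either the pair stays intact (recursion on k - 1
        # steps) or the focal player is left to finish k - 1 steps alone.
        surv = both * surv + only_me * alone ** (k - 1)
    return surv
```

With any per-step probabilities below one, the returned value shrinks toward zero as `n` grows, in line with the statement that all survival probabilities decrease to zero in the limit.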
Subjects
Game Theory, Survival, Markov Chains, Population Dynamics/statistics & numerical data
ABSTRACT
This article consists of commentaries on a selected group of papers of Marc Feldman published in Theoretical Population Biology from 1970 to the present. The papers describe a diverse set of population-genetic models, covering topics such as cultural evolution, social evolution, and the evolution of recombination. The commentaries highlight Marc Feldman's role in providing mathematically rigorous formulations to explore qualitative hypotheses, in many cases generating surprising conclusions.
Subjects
Cultural Evolution, Population Genetics, Publications, Humans, Statistical Models, Genetic Recombination, Social Learning
ABSTRACT
Genetic variation among loci in the genomes of diploid biparental organisms is the result of mutation and genetic transmission through the genealogy, or population pedigree, of the species. We explore the consequences of this for patterns of variation at unlinked loci for two kinds of demographic events: the occurrence of a very large family or a strong selective sweep that occurred in the recent past. The results indicate that only rather extreme versions of such events can be expected to structure population pedigrees in such a way that unlinked loci will show deviations from the standard predictions of population genetics, which average over population pedigrees. The results also suggest that large samples of individuals and loci increase the chance of picking up signatures of these events, and that very large families may have a unique signature in terms of sample distributions of mutant alleles.
Subjects
Pedigree, Computer Simulation, Demography, Genetic Variation, Population Genetics, Humans, Genetic Models
ABSTRACT
The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene-conversion rates by using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population-size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased sequenced Dutch individuals and inferred a point mutation rate of 1.66 × 10⁻⁸ per base per generation and a rate of 1.26 × 10⁻⁹ for <20 bp indels. By quantifying how estimates varied as a function of allele frequency, we inferred the probability that a site is involved in non-crossover gene conversion as 5.99 × 10⁻⁶. We found that recombination does not have observable mutagenic effects after gene conversion is accounted for and that local gene-conversion rates reflect recombination rates. We detected a strong enrichment of recent deleterious variation among mismatching variants found within IBD regions and observed summary statistics of local sharing of IBD segments to closely match previously proposed metrics of background selection; however, we found no significant effects of selection on our mutation-rate estimates. We detected no evidence of strong variation of mutation rates in a number of genomic annotations obtained from several recent studies. Our analysis suggests that a mutation-rate estimate higher than that reported by recent pedigree-based studies should be adopted in the context of DNA-based demographic reconstruction.
Subjects
Human Genome, Germ-Line Mutation, Genetic Models, Mutation Rate, Alleles, Gene Frequency, Haplotypes, Humans, INDEL Mutation, Linear Models, Genetic Recombination
ABSTRACT
The population-scaled mutation rate, θ, is informative on the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, the variance of Tajima's estimator (θ̂), which is the average number of pairwise differences, does not vanish even as n → ∞. The non-zero variance of θ̂ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by gene genealogies at all loci. We derive the correlation coefficient under a diploid, discrete-time, Wright-Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small (Var(θ̂)/θ² ≈ O(Nₑ⁻¹)), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
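For two sequences, the estimator described above reduces to counting differences at each locus and averaging over loci. The following is a minimal sketch of that calculation; the function names and the input layout (one aligned pair of strings per locus) are assumptions made for illustration, and the estimate is per locus rather than per site.

```python
def pairwise_differences(seq_a, seq_b):
    """Count the positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def theta_hat(loci):
    """Average number of pairwise differences over loci.

    loci: list of (seq_a, seq_b) tuples, one aligned pair per unlinked locus
          (a hypothetical input format chosen for this sketch).
    """
    return sum(pairwise_differences(a, b) for a, b in loci) / len(loci)
```

For example, three loci with 1, 0 and 2 differences give an estimate of 1.0 differences per locus.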
Subjects
Genetic Loci, Population Genetics/methods, Genetic Models, Pedigree, Computer Simulation, Demography, Female, Genealogy and Heraldry, Genetic Variation, Heterozygote, Humans, Male, Mutation Rate, Population Density, Sequence Analysis
ABSTRACT
Many mathematical frameworks of evolutionary game dynamics assume that the total population size is constant and that selection affects only the relative frequency of strategies. Here, we consider evolutionary game dynamics in an extended Wright-Fisher process with variable population size. In such a scenario, it is possible that the entire population becomes extinct. Survival of the population may depend on which strategy prevails in the game dynamics. For cooperative dilemmas, a natural feature of such a model is that cooperators enable survival, while defectors drive extinction. Although defectors are favored for any mixed population, random drift could lead to their elimination and the resulting pure-cooperator population could survive. On the other hand, if the defectors remain, then the population will quickly go extinct because the frequency of cooperators steadily declines and defectors alone cannot survive. In a mutation-selection model, we find that (i) a steady supply of cooperators can enable long-term population survival, provided selection is sufficiently strong, and (ii) selection can increase the abundance of cooperators but reduce their relative frequency. Thus, evolutionary game dynamics in populations with variable size generate a multifaceted notion of what constitutes a trait's long-term success.
Subjects
Biological Models, Population Density, Population Dynamics, Biological Evolution, Biological Extinction, Game Theory, Humans, Mutation, Parents, Poisson Distribution
ABSTRACT
Contrary to what is often assumed in population genetics, independently segregating loci do not have completely independent ancestries, since all loci are inherited through a single, shared population pedigree. Previous work has shown that the non-independence between gene genealogies of independently segregating loci created by the population pedigree is weak in panmictic populations, and predictions made from standard coalescent theory are accurate for populations that are at least moderately sized. Here, we investigate patterns of coalescence in pedigrees of structured populations. We find that the pedigree creates deviations away from the predictions of the structured coalescent that persist on a longer timescale than in the case of panmictic populations. Nevertheless, we find that the structured coalescent provides a reasonable approximation for the coalescent process in structured population pedigrees so long as migration events are moderately frequent and there are no migration events in the recent pedigree of the sample. When there are migration events in the recent sample pedigree, we find that distributions of coalescence in the sample can be modeled as a mixture of distributions from different initial sample configurations. We use this observation to motivate a maximum-likelihood approach for inferring migration rates and mutation rates jointly with features of the pedigree such as recent migrant ancestry and recent relatedness. Using simulation, we show that our inference framework accurately recovers long-term migration rates in the presence of recent migration events in the sample pedigree.
Subjects
Population Genetics/methods, Likelihood Functions, Genetic Models, Pedigree, Genealogy and Heraldry, Humans
ABSTRACT
The evolution of drug resistance in HIV occurs by the fixation of specific, well-known, drug-resistance mutations, but the underlying population genetic processes are not well understood. By analyzing within-patient longitudinal sequence data, we make four observations that shed light on the underlying processes and allow us to infer the short-term effective population size of the viral population in a patient. Our first observation is that the evolution of drug resistance usually occurs by the fixation of one drug-resistance mutation at a time, as opposed to several changes simultaneously. Second, we find that these fixation events are accompanied by a reduction in genetic diversity in the region surrounding the fixed drug-resistance mutation, due to the hitchhiking effect. Third, we observe that the fixation of drug-resistance mutations involves both hard and soft selective sweeps. In a hard sweep, a resistance mutation arises in a single viral particle and drives all linked mutations with it when it spreads in the viral population, which dramatically reduces genetic diversity. On the other hand, in a soft sweep, a resistance mutation occurs multiple times on different genetic backgrounds, and the reduction of diversity is weak. Using the frequency of occurrence of hard and soft sweeps we estimate the effective population size of HIV to be 1.5 × 10⁵ (95% confidence interval [0.8 × 10⁵, 4.8 × 10⁵]). This number is much lower than the actual number of infected cells, but much larger than previous population size estimates based on synonymous diversity. We propose several explanations for the observed discrepancies. Finally, our fourth observation is that genetic diversity at non-synonymous sites recovers to its pre-fixation value within 18 months, whereas diversity at synonymous sites remains depressed after this time period. These results improve our understanding of HIV evolution and have potential implications for treatment strategies.
Subjects
Drug Resistance/genetics, Genetic Variation, HIV Infections/genetics, HIV/genetics, Biological Adaptation, Molecular Evolution, Population Genetics, HIV/pathogenicity, HIV Infections/virology, Humans, Mutation
ABSTRACT
A long genomic segment inherited by a pair of individuals from a single, recent common ancestor is said to be identical-by-descent (IBD). Shared IBD segments have numerous applications in genetics, from demographic inference to phasing, imputation, pedigree reconstruction, and disease mapping. Here, we provide a theoretical analysis of IBD sharing under Markovian approximations of the coalescent with recombination. We describe a general framework for the IBD process along the chromosome under the Markovian models (SMC/SMC'), as well as introduce and justify a new model, which we term the renewal approximation, under which lengths of successive segments are independent. Then, considering the infinite-chromosome limit of the IBD process, we recover previous results (for SMC) and derive new results (for SMC') for the mean number of shared segments longer than a cutoff and the fraction of the chromosome found in such segments. We then use renewal theory to derive an expression (in Laplace space) for the distribution of the number of shared segments and demonstrate implications for demographic inference. We also compute (again, in Laplace space) the distribution of the fraction of the chromosome in shared segments, from which we obtain explicit expressions for the first two moments. Finally, we generalize all results to populations with a variable effective size.
Subjects
Genetic Linkage/genetics, Population Genetics, Theoretical Models, Markov Chains, Genetic Models, Pedigree
ABSTRACT
Genetic data from two or more species provide information about the process of speciation. In their analysis of DNA from humans, chimpanzees, gorillas, orangutans and macaques (HCGOM), Patterson et al. suggest that the apparently short divergence time between humans and chimpanzees on the X chromosome is explained by a massive interspecific hybridization event in the ancestry of these two species. However, Patterson et al. do not statistically test their own null model of simple speciation before concluding that speciation was complex, and, even if the null model could be rejected, they do not consider other explanations of a short divergence time on the X chromosome. These include natural selection on the X chromosome in the common ancestor of humans and chimpanzees, changes in the ratio of male-to-female mutation rates over time, and less extreme versions of divergence with gene flow (see ref. 2, for example). I therefore believe that their claim of hybridization is unwarranted.
Subjects
Genetic Speciation, Genetic Models, Pan troglodytes/genetics, Animals, Mammalian Chromosomes/genetics, Female, Humans, Male, Mutagenesis/genetics, Phylogeny, Reproducibility of Results, Genetic Selection, Sex Characteristics, Time Factors, X Chromosome/genetics
ABSTRACT
We consider a simple diploid population-genetic model with potentially high variability of offspring numbers among individuals. Specifically, against a backdrop of Wright-Fisher reproduction and no selection, there is an additional probability that a big family occurs, meaning that a pair of individuals has a number of offspring on the order of the population size. We study how the pedigree of the population generated under this model affects the ancestral genetic process of a sample of size two at a single autosomal locus without recombination. Our population model is of the type for which multiple-merger coalescent processes have been described. We prove that the conditional distribution of the pairwise coalescence time given the random pedigree converges to a limit law as the population size tends to infinity. This limit law may or may not be the usual exponential distribution of the Kingman coalescent, depending on the frequency of big families. But because it includes the number and times of big families, it differs from the usual multiple-merger coalescent models. The usual multiple-merger coalescent models are seen as describing the ancestral process marginal to, or averaging over, the pedigree. In the limiting ancestral process conditional on the pedigree, the intervals between big families can be modeled using the Kingman coalescent but each big family causes a discrete jump in the probability of coalescence. Analogous results should hold for larger samples and other population models. We illustrate these results with simulations and additional analysis, highlighting their implications for inference and understanding of multilocus data.
Subjects
Population Genetics, Genetic Models, Pedigree, Humans, Population Density
ABSTRACT
Recurrent mutation produces multiple copies of the same allele which may be co-segregating in a population. Yet, most analyses of allele-frequency or site-frequency spectra assume that all observed copies of an allele trace back to a single mutation. We develop a sampling theory for the number of latent mutations in the ancestry of a rare variant, specifically a variant observed in relatively small count in a large sample. Our results follow from the statistical independence of low-count mutations, which we show to hold for the standard neutral coalescent or diffusion model of population genetics as well as for more general coalescent trees. For populations of constant size, these counts are distributed like the number of alleles in the Ewens sampling formula. We develop a Poisson sampling model for populations of varying size and illustrate it using new results for site-frequency spectra in an exponentially growing population. We apply our model to a large data set of human SNPs and use it to explain dramatic differences in site-frequency spectra across the range of mutation rates in the human genome.
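The distribution of the number of alleles under the Ewens sampling formula, which the abstract above invokes for constant-size populations, can be computed with a short recursion. This is a sketch of the standard result only: the function name is hypothetical, and how `theta` maps onto the mutation parameter for a particular rare variant is an assumption left to the reader.

```python
def ewens_allele_count_distribution(n, theta):
    """Return [P(K=0), P(K=1), ..., P(K=n)] for a sample of n copies.

    K is the number of distinct alleles under the Ewens sampling formula
    with mutation parameter theta (assumed to be supplied by the caller).
    """
    probs = [0.0, 1.0]  # a sample of one copy always holds exactly one allele
    for m in range(1, n):
        # Adding copy m + 1: it founds a new allele with probability
        # theta / (theta + m), otherwise it joins an existing allele.
        new = theta / (theta + m)
        nxt = [0.0] * (m + 2)
        for k in range(1, m + 1):
            nxt[k] += probs[k] * (1.0 - new)
            nxt[k + 1] += probs[k] * new
        probs = nxt
    return probs
```

The mean of the returned distribution equals the familiar sum of theta / (theta + i) for i from 0 to n - 1, which gives a quick consistency check.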
Subjects
Population Genetics, Genetic Models, Humans, Mutation, Gene Frequency, Mutation Rate, Alleles
ABSTRACT
We show that the number of lineages ancestral to a sample, as a function of time back into the past, which we call the number of lineages as a function of time (NLFT), is a nearly deterministic property of large-sample gene genealogies. We obtain analytic expressions for the NLFT for both constant-sized and exponentially growing populations. The low level of stochastic variation associated with the NLFT of a large sample suggests using the NLFT to make estimates of population parameters. Based on this, we develop a new computational method of inferring the size and growth rate of a population from a large sample of DNA sequences at a single locus. We apply our method first to a sample of 1,212 mitochondrial DNA (mtDNA) sequences from China, confirming a pattern of recent population growth previously identified using other techniques, but with much smaller confidence intervals for past population sizes due to the low variation of the NLFT. We further analyze a set of 63 mtDNA sequences from blue whales (BWs), concluding that the population grew in the past. This calls for reevaluation of previous studies that were based on the assumption that the BW population was fixed.
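To see why the NLFT is nearly deterministic for large samples, one can look at a common deterministic approximation for a constant-size population, obtained by solving dn/dt = -n(n - 1)/2 with time measured in coalescent units. The sketch below implements that approximation only; it is not necessarily the expression derived in the paper, and the function name is hypothetical.

```python
import math


def expected_lineages(n0, t):
    """Approximate number of ancestral lineages at time t in the past.

    n0 : sample size at time zero
    t  : time in coalescent units (each pair of lineages coalesces at rate 1)

    Solves dn/dt = -n(n - 1)/2 with n(0) = n0; an assumption of this sketch.
    """
    return n0 / (n0 - (n0 - 1) * math.exp(-t / 2.0))
```

The curve starts at the sample size, drops steeply at first because many pairs can coalesce, and then approaches a single lineage.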
Subjects
Genetic Variation, Population Dynamics, Population Growth, Algorithms, Animals, Asian People, Balaenoptera/genetics, Bayes Theorem, China, Computer Simulation, Mitochondrial Genome/genetics, Humans, Likelihood Functions, Genetic Models, Phylogeny, Sequence Alignment, Stochastic Processes
ABSTRACT
The emergence of cooperation in populations of selfish individuals is a fascinating topic that has inspired much work in theoretical biology. Here, we study the evolution of cooperation in a model where individuals are characterized by phenotypic properties that are visible to others. The population is well mixed in the sense that everyone is equally likely to interact with everyone else, but the behavioral strategies can depend on distance in phenotype space. We study the interaction of cooperators and defectors. In our model, cooperators cooperate with those who are similar and defect otherwise. Defectors always defect. Individuals mutate to nearby phenotypes, which generates a random walk of the population in phenotype space. Our analysis brings together ideas from coalescence theory and evolutionary game dynamics. We obtain a precise condition for natural selection to favor cooperators over defectors. Cooperation is favored when the phenotypic mutation rate is large and the strategy mutation rate is small. In the optimal case for cooperators, in a one-dimensional phenotype space and for large population size, the critical benefit-to-cost ratio is given by b/c = 1 + 2/√3. We also derive the fundamental condition for any two-strategy symmetric game and consider high-dimensional phenotype spaces.
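As a quick numerical check, the critical ratio quoted above evaluates to roughly 2.15. The helper below is a hypothetical convenience for this illustration and applies only to the optimal case stated in the abstract (one-dimensional phenotype space, large population).

```python
import math

# Critical benefit-to-cost ratio from the abstract: b/c = 1 + 2/sqrt(3).
CRITICAL_RATIO = 1 + 2 / math.sqrt(3)


def cooperation_favored(benefit, cost):
    """Return True when b/c exceeds the critical ratio (optimal case only)."""
    return benefit / cost > CRITICAL_RATIO
```

Under this threshold a benefit three times the cost favors cooperators, while a benefit only twice the cost does not.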
Subjects
Biological Evolution, Computational Biology, Computer Simulation, Game Theory, Genetic Models, Phenotype
ABSTRACT
A large number of statistical tests have been proposed to detect natural selection based on a sample of variation at a single genetic locus. These tests measure the deviation of the allelic frequency distribution observed within populations from the distribution expected under a set of assumptions that includes both neutral evolution and equilibrium population demography. The present study considers a new way to assess the statistical properties of these tests of selection, by their behavior in response to direct perturbations of the steady-state allelic frequency distribution, unconstrained by any particular nonequilibrium demographic scenario. Results from Monte Carlo computer simulations indicate that most tests of selection are more sensitive to perturbations of the allele frequency distribution that increase the variance in allele frequencies than to perturbations that decrease the variance. Simulations also demonstrate that it requires, on average, 4N generations (N is the diploid effective population size) for tests of selection to relax to their theoretical, steady-state distributions following different perturbations of the allele frequency distribution to its extremes. This relatively long relaxation time highlights the fact that these tests are not robust to violations of the other assumptions of the null model besides neutrality. Lastly, genetic variation arising under an example of a regularly cycling demographic scenario is simulated. Tests of selection performed on this last set of simulated data confirm the confounding nature of these tests for the inference of natural selection, under a demographic scenario that likely holds for many species. The utility of using empirical, genomic distributions of test statistics, instead of the theoretical steady-state distribution, is discussed as an alternative for improving the statistical inference of natural selection.