Results 1 - 20 of 26
1.
bioRxiv ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38915613

ABSTRACT

Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations will experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change. Predicting how much allele frequencies change under drift and selection remained an open problem well into the 21st century, even for alleles contributing to simple, monogenic traits. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of quantitative, highly polygenic traits that are commonly studied. Here we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. In particular, we derive analytic expressions for the transition probability (i.e., the probability that an allele will change in frequency from x to y in time t) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of allele frequency change to test for selection, as well as explore optimal design choices for evolve-and-resequence experiments to uncover the genetic architecture of polygenic traits under selection.
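To make the notion of a transition probability concrete, the sketch below computes it by brute force for a small population. It assumes the textbook discrete Wright-Fisher model with genic selection, which is not the path integral perturbation approximation the abstract describes; the function names and parameter values are hypothetical and chosen only for illustration.

```python
from math import comb


def wf_transition_matrix(n_copies, s=0.0):
    """One-generation Wright-Fisher transition matrix over allele counts 0..n_copies.

    s is a genic selection coefficient (assumed model); s = 0 gives pure drift.
    """
    matrix = []
    for i in range(n_copies + 1):
        x = i / n_copies
        # Deterministic frequency change due to selection, before sampling.
        x_sel = x * (1 + s) / (1 + s * x)
        # Binomial sampling of the next generation models random genetic drift.
        row = [
            comb(n_copies, j) * x_sel**j * (1 - x_sel) ** (n_copies - j)
            for j in range(n_copies + 1)
        ]
        matrix.append(row)
    return matrix


def transition_probabilities(n_copies, start_count, generations, s=0.0):
    """Distribution of the allele count after `generations`, starting at start_count."""
    step = wf_transition_matrix(n_copies, s)
    dist = [0.0] * (n_copies + 1)
    dist[start_count] = 1.0
    for _ in range(generations):
        dist = [
            sum(dist[i] * step[i][j] for i in range(n_copies + 1))
            for j in range(n_copies + 1)
        ]
    return dist


if __name__ == "__main__":
    # Probability of moving from frequency 5/20 to 10/20 in 10 generations.
    neutral = transition_probabilities(20, 5, 10)
    selected = transition_probabilities(20, 5, 10, s=0.1)
    print(neutral[10], selected[10])
```

Exact matrix powers like this scale poorly with population size, which is one reason analytic approximations of the transition probability are attractive.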

2.
Curr Biol ; 33(22): R1197-R1200, 2023 11 20.
Article in English | MEDLINE | ID: mdl-37989099

ABSTRACT

Human and Neanderthal populations met and mixed on multiple occasions over evolutionary time, resulting in the exchange of genetic material. New genomic analyses of diverse African populations reveal a history of bidirectional gene flow and selection acting on introgressed alleles.


Subject(s)
Evolution, Molecular , Genome, Human , Neanderthals , Animals , Humans , Alleles , Gene Flow , Genomics , Neanderthals/genetics , Selection, Genetic , African People
3.
Nat Commun ; 14(1): 5465, 2023 09 12.
Article in English | MEDLINE | ID: mdl-37699896

ABSTRACT

Twentieth century industrial whaling pushed several species to the brink of extinction, with fin whales being the most impacted. However, a small, resident population in the Gulf of California was not targeted by whaling. Here, we analyzed 50 whole-genomes from the Eastern North Pacific (ENP) and Gulf of California (GOC) fin whale populations to investigate their demographic history and the genomic effects of natural and human-induced bottlenecks. We show that the two populations diverged ~16,000 years ago, after which the ENP population expanded and then suffered a 99% reduction in effective size during the whaling period. In contrast, the GOC population remained small and isolated, receiving less than one migrant per generation. However, this low level of migration has been crucial for maintaining its viability. Our study exposes the severity of whaling, emphasizes the importance of migration, and demonstrates the use of genome-based analyses and simulations to inform conservation strategies.


Subject(s)
Fin Whale , Humans , Animals , Genomics , Industry
4.
Am J Hum Genet ; 110(10): 1804-1816, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37725976

ABSTRACT

Demographic models of Latin American populations often fail to fully capture their complex evolutionary history, which has been shaped by both recent admixture and deeper-in-time demographic events. To address this gap, we used high-coverage whole-genome data from Indigenous American ancestries in present-day Mexico and existing genomes from across Latin America to infer multiple demographic models that capture the impact of different timescales on genetic diversity. Our approach, which combines analyses of allele frequencies and ancestry tract length distributions, represents a significant improvement over current models in predicting patterns of genetic variation in admixed Latin American populations. We jointly modeled the contribution of European, African, East Asian, and Indigenous American ancestries into present-day Latin American populations. We infer that the ancestors of Indigenous Americans and East Asians diverged ∼30 thousand years ago, and we characterize genetic contributions of recent migrations from East and Southeast Asia to Peru and Mexico. Our inferred demographic histories are consistent across different genomic regions and annotations, suggesting that our inferences are robust to the potential effects of linked selection. In conjunction with published distributions of fitness effects for new nonsynonymous mutations in humans, we show in large-scale simulations that our models recover important features of both neutral and deleterious variation. By providing a more realistic framework for understanding the evolutionary history of Latin American populations, our models can help address the historical under-representation of admixed groups in genomics research and can be a valuable resource for future studies of populations with complex admixture and demographic histories.


Subject(s)
Genetics, Population , Genome, Human , Humans , Latin America , Genome, Human/genetics , Demography , White
6.
Mol Biol Evol ; 40(8), 2023 08 03.
Article in English | MEDLINE | ID: mdl-37450583

ABSTRACT

Wang et al. (2023) recently proposed an approach to infer the history of human generation intervals from changes in mutation profiles over time. As the relative proportions of different mutation types depend on the ages of parents, binning variants by the time they arose allows for the inference of changes in average paternal and maternal generation intervals. Applying this approach to published allele age estimates, Wang et al. (2023) inferred long-lasting sex differences in average generation times and surprisingly found that ancestral generation times of West African populations remained substantially higher than those of Eurasian populations extending tens of thousands of generations into the past. Here, we argue that the results and interpretations in Wang et al. (2023) are primarily driven by noise and biases in input data and a lack of validation using independent approaches for estimating allele ages. With the recent development of methods to reconstruct genome-wide gene genealogies, coalescence times, and allele ages, we caution that downstream analyses may be strongly influenced by uncharacterized biases in their output.


Subject(s)
Uncertainty , Humans , Female , Male , Mutation , Alleles
7.
Elife ; 12, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37342968

ABSTRACT

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.


Subject(s)
Genome , Software , Computer Simulation , Genetics, Population , Genomics
8.
Nature ; 617(7962): 755-763, 2023 05.
Article in English | MEDLINE | ID: mdl-37198480

ABSTRACT

Despite broad agreement that Homo sapiens originated in Africa, considerable uncertainty surrounds specific models of divergence and migration across the continent [1]. Progress is hampered by a shortage of fossil and genomic data, as well as variability in previous estimates of divergence times [1]. Here we seek to discriminate among such models by considering linkage disequilibrium and diversity-based statistics, optimized for rapid, complex demographic inference [2]. We infer detailed demographic models for populations across Africa, including eastern and western representatives, and newly sequenced whole genomes from 44 Nama (Khoe-San) individuals from southern Africa. We infer a reticulated African population history in which present-day population structure dates back to Marine Isotope Stage 5. The earliest population divergence among contemporary populations occurred 120,000 to 135,000 years ago and was preceded by links between two or more weakly differentiated ancestral Homo populations connected by gene flow over hundreds of thousands of years. Such weakly structured stem models explain patterns of polymorphism that had previously been attributed to contributions from archaic hominins in Africa [2-7]. In contrast to models with archaic introgression, we predict that fossil remains from coexisting ancestral populations should be genetically and morphologically similar, and that only an inferred 1-4% of genetic differentiation among contemporary human populations can be attributed to genetic drift between stem populations. We show that model misspecification explains the variation in previous estimates of divergence times, and argue that studying a range of models is key to making robust inferences about deep history.


Subject(s)
Genetics, Population , Human Migration , Phylogeny , Humans , Africa/ethnology , Fossils , Gene Flow , Genetic Drift , Genetic Introgression , Genome, Human , History, Ancient , Human Migration/history , Linkage Disequilibrium/genetics , Polymorphism, Genetic , Time Factors
9.
Genetics ; 222(3), 2022 11 01.
Article in English | MEDLINE | ID: mdl-36173327

ABSTRACT

Understanding the demographic history of populations is a key goal in population genetics, and with improving methods and data, ever more complex models are being proposed and tested. Demographic models of current interest typically consist of a set of discrete populations, their sizes and growth rates, and continuous and pulse migrations between those populations over a number of epochs, which can require dozens of parameters to fully describe. There is currently no standard format to define such models, significantly hampering progress in the field. In particular, the important task of translating the model descriptions in published work into input suitable for population genetic simulators is labor intensive and error prone. We propose the Demes data model and file format, built on widely used technologies, to alleviate these issues. Demes provides a well-defined and unambiguous model of populations and their properties that is straightforward to implement in software, and a text file format that is designed for simplicity and clarity. We provide thoroughly tested implementations of Demes parsers in multiple languages including Python and C, and showcase initial support in several simulators and inference methods. An introduction to the file format and a detailed specification are available at https://popsim-consortium.github.io/demes-spec-docs/.


Subject(s)
Genetics, Population , Software , Demography
10.
Genetics ; 221(4), 2022 07 30.
Article in English | MEDLINE | ID: mdl-35736370

ABSTRACT

Selected mutations interfere and interact with evolutionary processes at nearby loci, distorting allele frequency trajectories and creating correlations between pairs of mutations. Recent studies have used patterns of linkage disequilibrium between selected variants to test for selective interference and epistatic interactions, with some disagreement over interpreting observations from data. Interpretation is hindered by a lack of analytic or even numerical expectations for patterns of variation between pairs of loci under the combined effects of selection, dominance, epistasis, and demography. Here, I develop a numerical approach to compute the expected two-locus sampling distribution under diploid selection with arbitrary epistasis and dominance, recombination, and variable population size. I use this to explore how epistasis and dominance affect expected signed linkage disequilibrium, including for nonsteady-state demography relevant to human populations. Using whole-genome sequencing data from humans, I explore genome-wide patterns of linkage disequilibrium within protein-coding genes. I show that positive linkage disequilibrium between missense mutations within genes is driven by strong positive allele-frequency correlations between mutations that fall within the same annotated conserved domain, pointing to compensatory mutations or antagonistic epistasis as the prevailing mode of interaction within conserved genic elements. Linkage disequilibrium between missense mutations is reduced outside of conserved domains, as expected under Hill-Robertson interference. This variation in both mutational fitness effects and selective interactions within protein-coding genes calls for more refined inferences of the joint distribution of fitness and interactive effects, and the methods presented here should prove useful in that pursuit.


Subject(s)
Epistasis, Genetic , Models, Genetic , Biological Evolution , Gene Frequency , Humans , Linkage Disequilibrium , Selection, Genetic
11.
Genetics ; 220(3), 2022 03 03.
Article in English | MEDLINE | ID: mdl-34897427

ABSTRACT

Stochastic simulation is a key tool in population genetics, since the models involved are often analytically intractable and simulation is usually the only way of obtaining ground-truth data to evaluate inferences. Because of this, a large number of specialized simulation programs have been developed, each filling a particular niche, but with largely overlapping functionality and a substantial duplication of effort. Here, we introduce msprime version 1.0, which efficiently implements ancestry and mutation simulations based on the succinct tree sequence data structure and the tskit library. We summarize msprime's many features, and show that its performance is excellent, often many times faster and more memory efficient than specialized alternatives. These high-performance features have been thoroughly tested and validated, and built using a collaborative, open source development model, which reduces duplication of effort and promotes software quality via community engagement.


Subject(s)
Algorithms , Models, Genetic , Computer Simulation , Genetics, Population , Mutation , Software
12.
Am J Bot ; 108(11): 2269-2281, 2021 11.
Article in English | MEDLINE | ID: mdl-34636416

ABSTRACT

PREMISE: Polyploid species often have complex evolutionary histories that have, until recently, been intractable due to limitations of genomic resources. While recent work has further uncovered the evolutionary history of the octoploid strawberry (Fragaria L.), there are still open questions. Much is unknown about the evolutionary relationship of the wild octoploid species, Fragaria virginiana and Fragaria chiloensis, and gene flow within and among species after the formation of the octoploid genome. METHODS: We leveraged a collection of wild octoploid ecotypes of strawberry representing the recognized subspecies and ranging from Alaska to southern Chile, and a high-density SNP array to investigate wild octoploid strawberry evolution. Evolutionary relationships were interrogated with phylogenetic analysis and genetic clustering algorithms. Additionally, admixture among and within species was assessed with model-based and tree-based approaches. RESULTS: Phylogenetic analysis revealed that the two octoploid strawberry species are monophyletic sister lineages. The genetic clustering results show substructure between North and South American F. chiloensis populations. Additionally, model-based and tree-based methods support gene flow within and among the two octoploid species, including newly identified admixture in the Hawaiian F. chiloensis subsp. sandwicensis population. CONCLUSIONS: F. virginiana and F. chiloensis are supported as monophyletic and sister lineages. All but one of the subspecies show extensive paraphyly. Furthermore, phylogenetic relationships among F. chiloensis populations support a single population range expansion southward from North America. The inter- and intraspecific relationships of octoploid strawberry are complex and suggest substantial gene flow between sympatric populations among and within species.


Subject(s)
Fragaria , Americas , Fragaria/genetics , Genome, Plant , Phylogeny , Polyploidy
13.
Genet Epidemiol ; 45(6): 621-632, 2021 09.
Article in English | MEDLINE | ID: mdl-34157784

ABSTRACT

Linkage-Disequilibrium Score Regression (LDSC) is a popular framework for analyzing Genome-wide Association Studies (GWAS) summary statistics that allows for estimating single nucleotide polymorphism heritability, confounding, and functional enrichment of genetic variants with different annotations. Recent work has highlighted the influence of implicit and explicit assumptions of the model on the biological interpretation of the results. In this study, we explored a formulation of LDSC that replaces the r² measure of LD with a recently proposed unbiased estimator of the D² statistic. In addition to modest statistical differences across estimators, this derivation highlighted implicit and unrealistic assumptions about the relationship between allele frequency, effect size, and annotation status. We carry out a systematic comparison of alternative LDSC formulations by applying them to summary statistics from 47 GWAS traits. Our results show that commonly used models likely underestimate functional enrichment. These results highlight the importance of calibrating the LDSC model to achieve a more robust understanding of polygenic traits.


Subject(s)
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Linkage Disequilibrium , Models, Genetic , Polymorphism, Single Nucleotide
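As a rough illustration of the regression behind LDSC, the sketch below fits the standard expectation E[χ²_j] = N h² ℓ_j / M + N a + 1, where ℓ_j is the LD score of SNP j, N the sample size, M the number of SNPs, and a a confounding term. It uses plain unweighted least squares on made-up, noise-free numbers; practical LDSC analyses use a weighted regression, and this sketch does not implement the D²-based formulation studied in the abstract. All function names and values are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ldsc_heritability(ld_scores, chi2, n_individuals, n_snps):
    """Estimate SNP heritability from the regression slope.

    The slope equals N * h2 / M, so h2 = slope * M / N.
    The intercept estimates 1 + N * a (inflation from confounding).
    """
    slope, intercept = ols_fit(ld_scores, chi2)
    h2 = slope * n_snps / n_individuals
    return h2, intercept


if __name__ == "__main__":
    # Synthetic, noise-free example (illustrative values only).
    n_individuals, n_snps, true_h2, true_intercept = 50_000, 1_000_000, 0.4, 1.05
    ld_scores = [10.0, 50.0, 100.0, 200.0, 400.0]
    chi2 = [true_intercept + n_individuals * true_h2 / n_snps * l for l in ld_scores]
    print(ldsc_heritability(ld_scores, chi2, n_individuals, n_snps))
```

Because the heritability estimate is read off the slope, any change in how ℓ_j is defined (for example, r² versus D²) feeds directly into the estimate.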
14.
Mol Biol Evol ; 38(10): 4588-4602, 2021 09 27.
Article in English | MEDLINE | ID: mdl-34043790

ABSTRACT

The effect of a mutation on fitness may differ between populations depending on environmental and genetic context, but little is known about the factors that underlie such differences. To quantify genome-wide correlations in mutation fitness effects, we developed a novel concept called a joint distribution of fitness effects (DFE) between populations. We then proposed a new statistic w to measure the DFE correlation between populations. Using simulation, we showed that inferring the DFE correlation from the joint allele frequency spectrum is statistically precise and robust. Using population genomic data, we inferred DFE correlations of populations in humans, Drosophila melanogaster, and wild tomatoes. In these species, we found that the overall correlation of the joint DFE was inversely related to genetic differentiation. In humans and D. melanogaster, deleterious mutations had a lower DFE correlation than tolerated mutations, indicating a complex joint DFE. Altogether, the DFE correlation can be reliably inferred, and it offers extensive insight into the genetics of population divergence.


Subject(s)
Drosophila melanogaster , Genetic Fitness , Animals , Drosophila melanogaster/genetics , Gene Frequency , Genome , Models, Genetic , Mutation
15.
Proc Natl Acad Sci U S A ; 118(21), 2021 05 25.
Article in English | MEDLINE | ID: mdl-34016747

ABSTRACT

As populations boom and bust, the accumulation of genetic diversity is modulated, encoding histories of living populations in present-day variation. Many methods exist to decode these histories, and all must make strong model assumptions. It is typical to assume that mutations accumulate uniformly across the genome at a constant rate that does not vary between closely related populations. However, recent work shows that mutational processes in human and great ape populations vary across genomic regions and evolve over time. This perturbs the mutation spectrum (relative mutation rates in different local nucleotide contexts). Here, we develop theoretical tools in the framework of Kingman's coalescent to accommodate mutation spectrum dynamics. We present mutation spectrum history inference (mushi), a method to perform nonparametric inference of demographic and mutation spectrum histories from allele frequency data. We use mushi to reconstruct trajectories of effective population size and mutation spectrum divergence between human populations, identify mutation signatures and their dynamics in different human populations, and calibrate the timing of a previously reported mutational pulse in the ancestors of Europeans. We show that mutation spectrum histories can be placed in a well-studied theoretical setting and rigorously inferred from genomic variation data, like other features of evolutionary history.


Subject(s)
Gene Frequency/genetics , Genetics, Population/statistics & numerical data , Models, Genetic , Mutation/genetics , Animals , Genetic Variation/genetics , Genomics , Hominidae/genetics , Humans , Mutation Rate , Population Density
16.
Mol Biol Evol ; 38(8): 3358-3372, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33930151

ABSTRACT

The study of domestication contributes to our knowledge of evolution and crop genetic resources. Human selection has shaped wild Brassica rapa into diverse turnip, leafy, and oilseed crops. Despite its worldwide economic importance and potential as a model for understanding diversification under domestication, insights into the number of domestication events and initial crop(s) domesticated in B. rapa have been limited due to a lack of clarity about the wild or feral status of conspecific noncrop relatives. To address this gap and reconstruct the domestication history of B. rapa, we analyzed 68,468 genotyping-by-sequencing-derived single nucleotide polymorphisms for 416 samples in the largest diversity panel of domesticated and weedy B. rapa to date. To further understand the center of origin, we modeled the potential range of wild B. rapa during the mid-Holocene. Our analyses of genetic diversity across B. rapa morphotypes suggest that noncrop samples from the Caucasus, Siberia, and Italy may be truly wild, whereas those occurring in the Americas and much of Europe are feral. Clustering, tree-based analyses, and parameterized demographic inference further indicate that turnips were likely the first crop type domesticated, from which leafy types in East Asia and Europe were selected from distinct lineages. These findings clarify the domestication history and nature of wild crop genetic resources for B. rapa, which provides the first step toward investigating cases of possible parallel selection, the domestication and feralization syndrome, and novel germplasm for Brassica crop improvement.


Subject(s)
Brassica rapa/genetics , Crops, Agricultural/genetics , Domestication , Models, Genetic , Plant Weeds/genetics , Genetic Introgression , Genetic Variation , Genotyping Techniques , Phylogeography , Selection, Genetic
17.
Am J Hum Genet ; 107(4): 583-588, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33007197

ABSTRACT

Simulation plays a central role in population genomics studies. Recent years have seen rapid improvements in software efficiency that make it possible to simulate large genomic regions for many individuals sampled from large numbers of populations. As the complexity of the demographic models we study grows, however, there is an ever-increasing opportunity to introduce bugs in their implementation. Here, we describe two errors made in defining population genetic models using the msprime coalescent simulator that have found their way into the published record. We discuss how these errors have affected downstream analyses and give recommendations for software developers and users to reduce the risk of such errors.


Subject(s)
Genetics, Population/trends , Genome, Human , Models, Genetic , Software , Algorithms , Computer Simulation , Demography , Genetic Variation , Genetics, Population/history , History, Ancient , Human Migration/history , Human Migration/statistics & numerical data , Humans
18.
Elife ; 9, 2020 06 23.
Article in English | MEDLINE | ID: mdl-32573438

ABSTRACT

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.


Subject(s)
Genetics, Population , Genomic Library , Models, Genetic , Animals , Arabidopsis/genetics , Dogs/genetics , Drosophila melanogaster/genetics , Escherichia coli/genetics , Genetics, Population/methods , Genetics, Population/organization & administration , Genome/genetics , Genome, Human/genetics , Humans , Pongo abelii/genetics
19.
PLoS Genet ; 16(5): e1008619, 2020 05.
Article in English | MEDLINE | ID: mdl-32369493

ABSTRACT

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.


Subject(s)
Algorithms , Base Sequence/physiology , Genetics, Population/methods , Genome-Wide Association Study/methods , Models, Genetic , Cohort Studies , Computer Simulation , Evolution, Molecular , Genome/genetics , Genome-Wide Association Study/statistics & numerical data , Humans , Linkage Disequilibrium , Recombination, Genetic/physiology , Sample Size
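For readers unfamiliar with the "classic coalescent" mentioned in the abstract above, the following sketch simulates the time to the most recent common ancestor (TMRCA) of a sample under Kingman's coalescent and compares it with the known expectation 2(1 - 1/n) in coalescent time units. It is a minimal standard-library illustration, not msprime code and not the Wright-Fisher extension the abstract presents; the function names are hypothetical.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Draw one TMRCA under Kingman's coalescent.

    Time is measured in coalescent units, in which each pair of
    lineages merges at rate 1.
    """
    time = 0.0
    for k in range(n_samples, 1, -1):
        # With k lineages there are k * (k - 1) / 2 pairs that can coalesce.
        pair_rate = k * (k - 1) / 2
        time += rng.expovariate(pair_rate)
    return time


def mean_tmrca(n_samples, replicates, seed=1):
    """Average simulated TMRCA over independent replicates."""
    rng = random.Random(seed)
    total = sum(simulate_tmrca(n_samples, rng) for _ in range(replicates))
    return total / replicates


def expected_tmrca(n_samples):
    """Theoretical mean TMRCA: sum over k of 2 / (k (k - 1)) = 2 (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n_samples)


if __name__ == "__main__":
    print(mean_tmrca(10, 20_000), expected_tmrca(10))
```

The model allows only one pair of lineages to merge at a time, which is a reasonable approximation when the sample is small relative to the population; that is the assumption the abstract identifies as a source of bias for large samples and long genomic regions.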
20.
Mol Biol Evol ; 37(3): 923-932, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31697386

ABSTRACT

Linkage disequilibrium (LD) is used to infer evolutionary history, to identify genomic regions under selection, and to dissect the relationship between genotype and phenotype. In each case, we require accurate estimates of LD statistics from sequencing data. Unphased data present a challenge because multilocus haplotypes cannot be inferred exactly. Widely used estimators for the common statistics r² and D² exhibit large and variable upward biases that complicate interpretation and comparison across cohorts. Here, we show how to find unbiased estimators for a wide range of two-locus statistics, including D², for both single and multiple randomly mating populations. These unbiased statistics are particularly well suited to estimate effective population sizes from unlinked loci in small populations. We develop a simple inference pipeline and use it to refine estimates of recent effective population sizes of the threatened Channel Island Fox populations.


Subject(s)
Computational Biology/methods , Foxes/genetics , Animals , Gene Frequency , Genetics, Population , Genotype , Haplotypes , Linkage Disequilibrium , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide , Population Density , Selection, Genetic
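The two-locus statistics named in the abstract above can be illustrated with their textbook definitions, D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The sketch below computes naive plug-in values from phased haplotype counts; it assumes phase is known and does not implement the unbiased estimators for unphased data that the paper develops. The function name and the counts are hypothetical.

```python
def ld_statistics(n_AB, n_Ab, n_aB, n_ab):
    """Naive plug-in D, D^2 and r^2 from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total  # frequency of allele B at the second locus
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    # r^2 is undefined when either locus is monomorphic in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, d * d, r2


if __name__ == "__main__":
    print(ld_statistics(50, 0, 0, 50))    # perfectly associated loci
    print(ld_statistics(25, 25, 25, 25))  # independent loci
```

Plug-in values like these are computed from sample frequencies, so in small samples they are subject to the kind of bias the abstract describes.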