Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

Multiple Sources of Uncertainty Confound Inference of Historical Human Generation Times.

Ragsdale, Aaron P; Thornton, Kevin R.

Mol Biol Evol ; 40(8)2023 08 03.

Article in English | MEDLINE | ID: mdl-37450583

ABSTRACT

Wang et al. (2023) recently proposed an approach to infer the history of human generation intervals from changes in mutation profiles over time. As the relative proportions of different mutation types depend on the ages of parents, binning variants by the time they arose allows for the inference of changes in average paternal and maternal generation intervals. Applying this approach to published allele age estimates, Wang et al. (2023) inferred long-lasting sex differences in average generation times and surprisingly found that ancestral generation times of West African populations remained substantially higher than those of Eurasian populations extending tens of thousands of generations into the past. Here, we argue that the results and interpretations in Wang et al. (2023) are primarily driven by noise and biases in input data and a lack of validation using independent approaches for estimating allele ages. With the recent development of methods to reconstruct genome-wide gene genealogies, coalescence times, and allele ages, we caution that downstream analyses may be strongly influenced by uncharacterized biases in their output.

Subjects

Uncertainty; Humans; Female; Male; Mutation; Alleles

2.

Evidence of directional and stabilizing selection in contemporary humans.

Sanjak, Jaleal S; Sidorenko, Julia; Robinson, Matthew R; Thornton, Kevin R; Visscher, Peter M.

Proc Natl Acad Sci U S A ; 115(1): 151-156, 2018 01 02.

Article in English | MEDLINE | ID: mdl-29255044

ABSTRACT

Modern molecular genetic datasets, primarily collected to study the biology of human health and disease, can be used to directly measure the action of natural selection and reveal important features of contemporary human evolution. Here we leverage the UK Biobank data to test for the presence of linear and nonlinear natural selection in a contemporary population of the United Kingdom. We obtain phenotypic and genetic evidence consistent with the action of linear/directional selection. Phenotypic evidence suggests that stabilizing selection, which acts to reduce variance in the population without necessarily modifying the population mean, is widespread and relatively weak in comparison with estimates from other species.

Subjects

Biological Evolution; Models, Genetic; Phenotype; Selection, Genetic; Female; Humans; Male; Middle Aged; United Kingdom

3.

Tandem duplications lead to novel expression patterns through exon shuffling in Drosophila yakuba.

Rogers, Rebekah L; Shao, Ling; Thornton, Kevin R.

PLoS Genet ; 13(5): e1006795, 2017 May.

Article in English | MEDLINE | ID: mdl-28531189

ABSTRACT

One common hypothesis to explain the impacts of tandem duplications is that whole gene duplications commonly produce additive changes in gene expression due to copy number changes. Here, we use genome wide RNA-seq data from a population sample of Drosophila yakuba to test this 'gene dosage' hypothesis. We observe little evidence of expression changes in response to whole transcript duplication capturing 5' and 3' UTRs. Among whole gene duplications, we observe evidence that dosage sharing across copies is likely to be common. The lack of expression changes after whole gene duplication suggests that the majority of genes are subject to tight regulatory control and therefore not sensitive to changes in gene copy number. Rather, we observe changes in expression level due to both shuffling of regulatory elements and the creation of chimeric structures via tandem duplication. Additionally, we observe 30 de novo gene structures arising from tandem duplications, 23 of which form with expression in the testes. Thus, the value of tandem duplications is likely to be more intricate than simple changes in gene dosage. The common regulatory effects from chimeric gene formation after tandem duplication may explain their contribution to genome evolution.

Subjects

Drosophila/genetics; Exons; Gene Dosage; Gene Duplication; Tandem Repeat Sequences; 3' Untranslated Regions; 5' Untranslated Regions; Animals; Evolution, Molecular; Recombination, Genetic

4.

A Model of Compound Heterozygous, Loss-of-Function Alleles Is Broadly Consistent with Observations from Complex-Disease GWAS Datasets.

Sanjak, Jaleal S; Long, Anthony D; Thornton, Kevin R.

PLoS Genet ; 13(1): e1006573, 2017 01.

Article in English | MEDLINE | ID: mdl-28103232

ABSTRACT

The genetic component of complex disease risk in humans remains largely unexplained. A corollary is that the allelic spectrum of genetic variants contributing to complex disease risk is unknown. Theoretical models that relate population genetic processes to the maintenance of genetic variation for quantitative traits may suggest profitable avenues for future experimental design. Here we use forward simulation to model a genomic region evolving under a balance between recurrent deleterious mutation and Gaussian stabilizing selection. We consider multiple genetic and demographic models, and several different methods for identifying genomic regions harboring variants associated with complex disease risk. We demonstrate that the model of gene action, relating genotype to phenotype, has a qualitative effect on several relevant aspects of the population genetic architecture of a complex trait. In particular, the genetic model impacts genetic variance component partitioning across the allele frequency spectrum and the power of statistical tests. Models with partial recessivity closely match the minor allele frequency distribution of significant hits from empirical genome-wide association studies without requiring homozygous effect sizes to be small. We highlight a particular gene-based model of incomplete recessivity that is appealing from first principles. Under that model, deleterious mutations in a genomic region partially fail to complement one another. This model of gene-based recessivity predicts the empirically observed inconsistency between twin and SNP based estimates of dominance heritability. Furthermore, this model predicts considerable levels of unexplained variance associated with intralocus epistasis. Our results suggest a need for improved statistical tools for region based genetic association and heritability estimation.

Subjects

Gene Frequency; Genetic Predisposition to Disease; Genome, Human; Heterozygote; Models, Genetic; Epistasis, Genetic; Genome-Wide Association Study/standards; Genotype; Humans; Phenotype; Polymorphism, Single Nucleotide

5.

Efficient pedigree recording for fast population genetics simulation.

Kelleher, Jerome; Thornton, Kevin R; Ashander, Jaime; Ralph, Peter L.

PLoS Comput Biol ; 14(11): e1006581, 2018 11.

Article in English | MEDLINE | ID: mdl-30383757

ABSTRACT

In this paper we describe how to efficiently record the entire genetic history of a population in forwards-time, individual-based population genetics simulations with arbitrary breeding models, population structure and demography. This approach dramatically reduces the computational burden of tracking individual genomes by allowing us to simulate only those loci that may affect reproduction (those having non-neutral variants). The genetic history of the population is recorded as a succinct tree sequence as introduced in the software package msprime, on which neutral mutations can be quickly placed afterwards. Recording the results of each breeding event requires storage that grows linearly with time, but there is a great deal of redundancy in this information. We solve this storage problem by providing an algorithm to quickly 'simplify' a tree sequence by removing this irrelevant history for a given set of genomes. By periodically simplifying the history with respect to the extant population, we show that the total storage space required is modest and overall large efficiency gains can be made over classical forward-time simulations. We implement a general-purpose framework for recording and simplifying genealogical data, which can be used to make simulations of any population model more efficient. We modify two popular forwards-time simulation frameworks to use this new approach and observe efficiency gains in large, whole-genome simulations of one to two orders of magnitude. In addition to speed, our method for recording pedigrees has several advantages: (1) All marginal genealogies of the simulated individuals are recorded, rather than just genotypes. (2) A population of N individuals with M polymorphic sites can be stored in O(N log N + M) space, making it feasible to store a simulation's entire final generation as well as its history. (3) A simulation can easily be initialized with a more efficient coalescent simulation of deep history. The software for recording and processing tree sequences is named tskit.

Subjects

Computational Biology/methods; Genetic Variation; Genetics, Population; Software; Algorithms; Computer Simulation; Gene Frequency; Genome; Genotype; Humans; Models, Genetic; Pedigree; Polymorphism, Genetic

6.

The Drosophila melanogaster Genetic Reference Panel.

Mackay, Trudy F C; Richards, Stephen; Stone, Eric A; Barbadilla, Antonio; Ayroles, Julien F; Zhu, Dianhui; Casillas, Sònia; Han, Yi; Magwire, Michael M; Cridland, Julie M; Richardson, Mark F; Anholt, Robert R H; Barrón, Maite; Bess, Crystal; Blankenburg, Kerstin Petra; Carbone, Mary Anna; Castellano, David; Chaboub, Lesley; Duncan, Laura; Harris, Zeke; Javaid, Mehwish; Jayaseelan, Joy Christina; Jhangiani, Shalini N; Jordan, Katherine W; Lara, Fremiet; Lawrence, Faye; Lee, Sandra L; Librado, Pablo; Linheiro, Raquel S; Lyman, Richard F; Mackey, Aaron J; Munidasa, Mala; Muzny, Donna Marie; Nazareth, Lynne; Newsham, Irene; Perales, Lora; Pu, Ling-Ling; Qu, Carson; Ràmia, Miquel; Reid, Jeffrey G; Rollmann, Stephanie M; Rozas, Julio; Saada, Nehad; Turlapati, Lavanya; Worley, Kim C; Wu, Yuan-Qing; Yamamoto, Akihiko; Zhu, Yiming; Bergman, Casey M; Thornton, Kevin R.

Nature ; 482(7384): 173-8, 2012 Feb 08.

Article in English | MEDLINE | ID: mdl-22318601

ABSTRACT

A major challenge of biology is understanding the relationship between molecular genetic variation and variation in quantitative traits, including fitness. This relationship determines our ability to predict phenotypes from genotypes and to understand how evolutionary forces shape variation within and between species. Previous efforts to dissect the genotype-phenotype map were based on incomplete genotypic information. Here, we describe the Drosophila melanogaster Genetic Reference Panel (DGRP), a community resource for analysis of population genomics and quantitative traits. The DGRP consists of fully sequenced inbred lines derived from a natural population. Population genomic analyses reveal reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome. Many variants in novel genes, most at low frequency, are associated with quantitative traits and explain a large fraction of the phenotypic variance. The DGRP facilitates genotype-phenotype mapping using the power of Drosophila genetics.

Subjects

Drosophila melanogaster/genetics; Genome-Wide Association Study; Genomics; Quantitative Trait Loci/genetics; Alleles; Animals; Centromere/genetics; Chromosomes, Insect/genetics; Genotype; Phenotype; Polymorphism, Single Nucleotide/genetics; Selection, Genetic/genetics; Starvation/genetics; Telomere/genetics; X Chromosome/genetics

7.

A second-generation assembly of the Drosophila simulans genome provides new insights into patterns of lineage-specific divergence.

Hu, Tina T; Eisen, Michael B; Thornton, Kevin R; Andolfatto, Peter.

Genome Res ; 23(1): 89-98, 2013 Jan.

Article in English | MEDLINE | ID: mdl-22936249

ABSTRACT

We create a new assembly of the Drosophila simulans genome using 142 million paired short-read sequences and previously published data for strain w(501). Our assembly represents a higher-quality genomic sequence with greater coverage, fewer misassemblies, and, by several indexes, fewer sequence errors. Evolutionary analysis of this genome reference sequence reveals interesting patterns of lineage-specific divergence that are different from those previously reported. Specifically, we find that Drosophila melanogaster evolves faster than D. simulans at all annotated classes of sites, including putatively neutrally evolving sites found in minimal introns. While this may be partly explained by a higher mutation rate in D. melanogaster, we also find significant heterogeneity in rates of evolution across classes of sites, consistent with historical differences in the effective population size for the two species. Also contrary to previous findings, we find that the X chromosome is evolving significantly faster than autosomes for nonsynonymous and most noncoding DNA sites and significantly slower for synonymous sites. The absence of an X/A difference for putatively neutral sites and the robustness of the pattern to Gene Ontology and sex-biased expression suggest that partly recessive beneficial mutations may comprise a substantial fraction of noncoding DNA divergence observed between species. Our results have more general implications for the interpretation of evolutionary analyses of genomes of different quality.

Subjects

Drosophila/genetics; Evolution, Molecular; Genome, Insect; Animals; Chromosomes, Insect/genetics; Contig Mapping; Introns; Mutation Rate; Phylogeny; Population/genetics; X Chromosome/genetics

8.

Genome-wide analysis of a long-term evolution experiment with Drosophila.

Burke, Molly K; Dunham, Joseph P; Shahrestani, Parvin; Thornton, Kevin R; Rose, Michael R; Long, Anthony D.

Nature ; 467(7315): 587-90, 2010 Sep 30.

Article in English | MEDLINE | ID: mdl-20844486

ABSTRACT

Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast. Here we present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development. Flies in these selected populations develop from egg to adult ~20% faster than flies of ancestral control populations, and have evolved a number of other correlated phenotypes. On the basis of 688,520 intermediate-frequency, high-quality single nucleotide polymorphisms, we identify several dozen genomic regions that show strong allele frequency differentiation between a pooled sample of five replicate populations selected for accelerated development and pooled controls. On the basis of resequencing data from a single replicate population with accelerated development, as well as single nucleotide polymorphism data from individual flies from each replicate population, we infer little allele frequency differentiation between replicate populations within a selection treatment. Signatures of selection are qualitatively different than what has been observed in asexual species; in our sexual populations, adaptation is not associated with 'classic' sweeps whereby newly arising, unconditionally advantageous mutations become fixed. More parsimonious explanations include 'incomplete' sweep models, in which mutations have not had enough time to fix, and 'soft' sweep models, in which selection acts on pre-existing, common genetic variants. We conclude that, at least for life history characters such as development time, unconditionally advantageous alleles rarely arise, are associated with small net fitness gains or cannot fix because selection coefficients change over time.

Subjects

Biological Evolution; Drosophila melanogaster/genetics; Drosophila melanogaster/physiology; Genome, Insect/genetics; Selection, Genetic/genetics; Alleles; Animals; Drosophila melanogaster/embryology; Drosophila melanogaster/growth & development; Female; Gene Frequency/genetics; Genetic Fitness/genetics; Heterozygote; Phenotype; Polymorphism, Single Nucleotide/genetics; Sex

9.

Properties and modeling of GWAS when complex disease risk is due to non-complementing, deleterious mutations in genes of large effect.

Thornton, Kevin R; Foran, Andrew J; Long, Anthony D.

PLoS Genet ; 9(2): e1003258, 2013.

Article in English | MEDLINE | ID: mdl-23437004

ABSTRACT

Current genome-wide association studies (GWAS) have high power to detect intermediate frequency SNPs making modest contributions to complex disease, but they are underpowered to detect rare alleles of large effect (RALE). This has led to speculation that the bulk of variation for most complex diseases is due to RALE. One concern with existing models of RALE is that they do not make explicit assumptions about the evolution of a phenotype and its molecular basis. Rather, much of the existing literature relies on arbitrary mapping of phenotypes onto genotypes obtained either from standard population-genetic simulation tools or from non-genetic models. We introduce a novel simulation of a 100-kilobase gene region, based on the standard definition of a gene, in which mutations are unconditionally deleterious, are continuously arising, have partially recessive and non-complementing effects on phenotype (analogous to what is widely observed for most Mendelian disorders), and are interspersed with neutral markers that can be genotyped. Genes evolving according to this model exhibit a characteristic GWAS signature consisting of an excess of marginally significant markers. Existing tests for an excess burden of rare alleles in cases have low power while a simple new statistic has high power to identify disease genes evolving under our model. The structure of linkage disequilibrium between causative mutations and significantly associated markers under our model differs fundamentally from that seen when rare causative markers are assumed to be neutral. Rather than tagging single haplotypes bearing a large number of rare causative alleles, we find that significant SNPs in a GWAS tend to tag single causative mutations of small effect relative to other mutations in the same gene. Our results emphasize the importance of evaluating the power to detect associations under models that are genetically and evolutionarily motivated.

Subjects

Genetic Predisposition to Disease; Genome-Wide Association Study; Models, Genetic; Alleles; Databases, Genetic; Genetic Linkage; Haplotypes; Humans; Linkage Disequilibrium/genetics; Polymorphism, Single Nucleotide/genetics

10.

The power to detect quantitative trait loci using resequenced, experimentally evolved populations of diploid, sexual organisms.

Baldwin-Brown, James G; Long, Anthony D; Thornton, Kevin R.

Mol Biol Evol ; 31(4): 1040-55, 2014 Apr.

Article in English | MEDLINE | ID: mdl-24441104

ABSTRACT

A novel approach for dissecting complex traits is to experimentally evolve laboratory populations under a controlled environment shift, resequence the resulting populations, and identify single nucleotide polymorphisms (SNPs) and/or genomic regions highly diverged in allele frequency. To better understand the power and localization ability of such an evolve and resequence (E&R) approach, we carried out forward-in-time population genetics simulations of 1 Mb genomic regions under a large combination of experimental conditions, then attempted to detect significantly diverged SNPs. Our analysis indicates that the ability to detect differentiation between populations is primarily affected by selection coefficient, population size, number of replicate populations, and number of founding haplotypes. We estimate that E&R studies can detect and localize causative sites with 80% success or greater when the number of founder haplotypes is over 500, experimental populations are replicated at least 25-fold, population size is at least 1,000 diploid individuals, and the selection coefficient on the locus of interest is at least 0.1. More achievable experimental designs (less replicated, fewer founder haplotypes, smaller effective population size, and smaller selection coefficients) can have power of greater than 50% to identify a handful of SNPs of which one is likely causative. Similarly, in cases where s ≥ 0.2, less demanding experimental designs can yield high power.

Subjects

Models, Genetic; Quantitative Trait Loci; Animals; Computer Simulation; Diploidy; Drosophila melanogaster/genetics; Evolution, Molecular; Gene Frequency; Genes, Insect; Genetic Drift; Genetic Markers; Lod Score; Polymorphism, Single Nucleotide; Sequence Analysis, DNA

11.

Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans.

Rogers, Rebekah L; Cridland, Julie M; Shao, Ling; Hu, Tina T; Andolfatto, Peter; Thornton, Kevin R.

Mol Biol Evol ; 31(7): 1750-66, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24710518

ABSTRACT

We have used whole genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of Drosophila yakuba and 20 isofemale lines of D. simulans and performed genome wide validation with PacBio long molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting that deleterious impacts are common. Drosophila simulans shows larger numbers of whole gene duplications in comparison to larger proportions of gene fragments in D. yakuba. Drosophila simulans displays an excess of high-frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X or demographic forces driving duplicates to high frequency. We identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited noncoding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications that offers a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease, as well as to adaptive evolutionary change.

Subjects

Drosophila/classification; Drosophila/genetics; Gene Duplication; Tandem Repeat Sequences; Animals; Evolution, Molecular; Female; Genetic Variation; Genome; Genotype; Mutation Rate; Sequence Deletion

12.

Genome sequencing reveals complex speciation in the Drosophila simulans clade.

Garrigan, Daniel; Kingan, Sarah B; Geneva, Anthony J; Andolfatto, Peter; Clark, Andrew G; Thornton, Kevin R; Presgraves, Daven C.

Genome Res ; 22(8): 1499-511, 2012 Aug.

Article in English | MEDLINE | ID: mdl-22534282

ABSTRACT

The three species of the Drosophila simulans clade--the cosmopolitan species, D. simulans, and the two island endemic species, D. mauritiana and D. sechellia--are important models in speciation genetics, but some details of their phylogenetic and speciation history remain unresolved. The order and timing of speciation are disputed, and the existence, magnitude, and timing of gene flow among the three species remain unclear. Here we report on the analysis of a whole-genome four-species sequence alignment that includes all three D. simulans clade species as well as the D. melanogaster reference sequence. The alignment comprises novel, paired short-read sequence data from a single highly inbred line each from D. simulans, D. mauritiana, and D. sechellia. We are unable to reject a species phylogeny with a basal polytomy; the estimated age of the polytomy is 242,000 yr before the present. However, we also find that up to 4.6% of autosomal and 2.2% of X-linked regions have evolutionary histories consistent with recent gene flow between the mainland species (D. simulans) and the two island endemic species (D. mauritiana and D. sechellia). Our findings thus show that gene flow has occurred throughout the genomes of the D. simulans clade species despite considerable geographic, ecological, and intrinsic reproductive isolation. Last, our analysis of lineage-specific changes confirms that the D. sechellia genome has experienced a significant excess of slightly deleterious changes and a dearth of presumed favorable changes. The relatively reduced efficacy of natural selection in D. sechellia is consistent with its derived, persistently reduced historical effective population size.

Subjects

Drosophila/classification; Genetic Speciation; Genome, Insect; Animals; Base Sequence; Chromosomes, Insect/genetics; Drosophila/genetics; Evolution, Molecular; Gene Flow; Haplotypes; Phylogeny; Population Density; Reproductive Isolation; Selection, Genetic; Sequence Alignment; Sequence Analysis, DNA

13.

Abundance and distribution of transposable elements in two Drosophila QTL mapping resources.

Cridland, Julie M; Macdonald, Stuart J; Long, Anthony D; Thornton, Kevin R.

Mol Biol Evol ; 30(10): 2311-27, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23883524

ABSTRACT

Here we present computational machinery to efficiently and accurately identify transposable element (TE) insertions in 146 next-generation sequenced inbred strains of Drosophila melanogaster. The panel of lines we use in our study is composed of strains from a pair of genetic mapping resources: the Drosophila Genetic Reference Panel (DGRP) and the Drosophila Synthetic Population Resource (DSPR). We identified 23,087 TE insertions in these lines, of which 83.3% are found in only one line. There are marked differences in the distribution of elements over the genome, with TEs found at higher densities on the X chromosome, and in regions of low recombination. We also identified many more TEs per base pair of intronic sequence and fewer TEs per base pair of exonic sequence than expected if TEs are located at random locations in the euchromatic genome. There was substantial variation in TE load across genes. For example, the paralogs derailed and derailed-2 show a significant difference in the number of TE insertions, potentially reflecting differences in the selection acting on these loci. When considering TE families, we find a very weak effect of gene family size on TE insertions per gene, indicating that as gene family size increases the number of TE insertions in a given gene within that family also increases. TEs are known to be associated with certain phenotypes, and our data will allow investigators using the DGRP and DSPR to assess the functional role of TE insertions in complex trait variation more generally. Notably, because most TEs are very rare and often private to a single line, causative TEs resulting in phenotypic differences among individuals may typically fail to replicate across mapping panels since individual elements are unlikely to segregate in both panels. Our data suggest that "burden tests" that test for the effect of TEs as a class may be more fruitful.

Subjects

DNA Transposable Elements; Drosophila melanogaster/genetics; Quantitative Trait Loci; Animals; Computational Biology; Evolution, Molecular; Female; Genetic Fitness; Genome; Male; Models, Genetic; Multigene Family; Polymorphism, Single Nucleotide; Selection, Genetic; X Chromosome/genetics

14.

Demes: a standard format for demographic models.

Gower, Graham; Ragsdale, Aaron P; Bisschop, Gertjan; Gutenkunst, Ryan N; Hartfield, Matthew; Noskova, Ekaterina; Schiffels, Stephan; Struck, Travis J; Kelleher, Jerome; Thornton, Kevin R.

Genetics ; 222(3)2022 11 01.

Article in English | MEDLINE | ID: mdl-36173327

ABSTRACT

Understanding the demographic history of populations is a key goal in population genetics, and with improving methods and data, ever more complex models are being proposed and tested. Demographic models of current interest typically consist of a set of discrete populations, their sizes and growth rates, and continuous and pulse migrations between those populations over a number of epochs, which can require dozens of parameters to fully describe. There is currently no standard format to define such models, significantly hampering progress in the field. In particular, the important task of translating the model descriptions in published work into input suitable for population genetic simulators is labor intensive and error prone. We propose the Demes data model and file format, built on widely used technologies, to alleviate these issues. Demes provides a well-defined and unambiguous model of populations and their properties that is straightforward to implement in software, and a text file format that is designed for simplicity and clarity. We provide thoroughly tested implementations of Demes parsers in multiple languages including Python and C, and showcase initial support in several simulators and inference methods. An introduction to the file format and a detailed specification are available at https://popsim-consortium.github.io/demes-spec-docs/.

Subjects

Genetics, Population; Software; Demography

15.

An approximate Bayesian estimator suggests strong, recurrent selective sweeps in Drosophila.

Jensen, Jeffrey D; Thornton, Kevin R; Andolfatto, Peter.

PLoS Genet ; 4(9): e1000198, 2008 Sep 19.

Article in English | MEDLINE | ID: mdl-18802463

ABSTRACT

The recurrent fixation of newly arising, beneficial mutations in a species reduces levels of linked neutral variability. Models positing frequent weakly beneficial substitutions or, alternatively, rare, strongly selected substitutions predict similar average effects on linked neutral variability, if the product of the rate and strength of selection is held constant. We propose an approximate Bayesian (ABC) polymorphism-based estimator that can be used to distinguish between these models, and apply it to multi-locus data from Drosophila melanogaster. We investigate the extent to which inference about the strength of selection is sensitive to assumptions about the underlying distributions of the rates of substitution and recombination, the strength of selection, heterogeneity in mutation rate, as well as the population's demographic history. We show that assuming fixed values of selection parameters in estimation leads to overestimates of the strength of selection and underestimates of the rate. We estimate parameters for an African population of D. melanogaster (s approximately 2E-03, ) and compare these to previous estimates. Finally, we show that surveying larger genomic regions is expected to lend much more discriminatory power to the approach. It will thus be of great interest to apply this method to emerging whole-genome polymorphism data sets in many taxa.

Subjects

Drosophila melanogaster/genetics; Genetics, Population/methods; Selection, Genetic; Africa; Animals; Bayes Theorem; Computer Simulation; Demography; Evolution, Molecular; Genetic Variation; Genome, Insect; Models, Genetic; Models, Statistical; Polymorphism, Genetic

16.

A new approach for using genome scans to detect recent positive selection in the human genome.

Tang, Kun; Thornton, Kevin R; Stoneking, Mark.

PLoS Biol ; 5(7): e171, 2007 Jul.

Article in English | MEDLINE | ID: mdl-17579516

ABSTRACT

Genome-wide scanning for signals of recent positive selection is essential for a comprehensive and systematic understanding of human adaptation. Here, we present a genomic survey of recent local selective sweeps, especially aimed at those nearly or recently completed. A novel approach was developed for such signals, based on contrasting the extended haplotype homozygosity (EHH) profiles between populations. We applied this method to the genome single nucleotide polymorphism (SNP) data of both the International HapMap Project and Perlegen Sciences, and detected widespread signals of recent local selection across the genome, consisting of both complete and partial sweeps. A challenging problem of genomic scans of recent positive selection is to clearly distinguish selection from neutral effects, given the high sensitivity of the test statistics to departures from neutral demographic assumptions and the lack of a single, accurate neutral model of human history. We therefore developed a new procedure that is robust across a wide range of demographic and ascertainment models, one that indicates that certain portions of the genome clearly depart from neutrality. Simulations of positive selection showed that our tests have high power towards strong selection sweeps that have undergone fixation. Gene ontology analysis of the candidate regions revealed several new functional groups that might help explain some important interpopulation differences in phenotypic traits.
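As background for the EHH profiles mentioned above, the following is a minimal sketch of the standard extended haplotype homozygosity calculation: the probability that two chromosomes drawn at random from a sample are identical over the interval from a core site to a more distant site. It is not the authors' between-population test statistic. The function name `ehh`, the string encoding of haplotypes, and the index arguments are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity between sites `core` and `end`.

    haplotypes: equal-length strings, one per chromosome, all carrying the
    same core allele (hypothetical encoding, e.g. "0" ancestral, "1" derived).
    core, end: 0-based site indices; the interval is inclusive at both ends.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, end))
    # Count how many chromosomes share each distinct extended haplotype.
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    # Fraction of chromosome pairs that are identical over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)
```

Evaluating `ehh` at increasing distances from the core site gives an EHH profile for one population. Contrasting such profiles between populations is the idea the abstract builds on.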

Subjects

Human Genome , Genetic Selection , Alleles , Biometry , Genetic Databases , Genetic Techniques , Population Genetics , Haplotypes , Humans , Genetic Models , Single Nucleotide Polymorphism , Genetic Recombination , Time Factors

17.

Inferring selection in partially sequenced regions.

Jensen, Jeffrey D; Thornton, Kevin R; Aquadro, Charles F.

Mol Biol Evol ; 25(2): 438-46, 2008 Feb.

Article in English | MEDLINE | ID: mdl-18165259

ABSTRACT

A common approach for identifying loci influenced by positive selection involves scanning large portions of the genome for regions that are inconsistent with the neutral equilibrium model or represent outliers relative to the empirical distribution of some aspect of the data. Once identified, partial sequence is generated spanning this more localized region in order to quantify the site-frequency spectrum and evaluate the data with tests of neutrality and selection. This method is widely used as partial sequencing is less expensive with regard to both time and money. Here, we demonstrate that this approach can lead to biased maximum likelihood estimates of selection parameters and reduced rejection rates, with some parameter combinations resulting in clearly misleading results. Most significantly, for a commonly used sample size in Drosophila population genetics (i.e., n = 12), the estimate of the target of selection has a large mean square error and the strength of selection is severely underestimated when the true selected site has not been sampled. We propose sequencing approaches that are much more likely to accurately localize the target and estimate the strength of selection. Additionally, we examine the performance of a commonly used test of selection under a variety of recurrent and single sweep models.

Subjects

Algorithms , Genome , Genetic Models , Likelihood Functions

18.

Automating approximate Bayesian computation by local linear regression.

Thornton, Kevin R.

BMC Genet ; 10: 35, 2009 Jul 07.

Article in English | MEDLINE | ID: mdl-19583871

ABSTRACT

BACKGROUND: In several biological contexts, parameter inference often relies on computationally intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. RESULTS: The software package ABCreg implements the local linear regression approach to ABC. The advantages are: 1. The code is standalone and fully documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source and modular. Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. CONCLUSION: In practice, ABCreg simplifies implementing ABC based on local linear regression.
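The regression step described above can be sketched as follows for the simplest case of one parameter and one summary statistic. This is an illustration of the general local linear regression idea (rejection, kernel weighting, then a weighted linear fit used to shift accepted draws toward the observed statistic), not ABCreg's code or interface. The function name `abc_regression_adjust`, its arguments, the Epanechnikov kernel, and the absence of any parameter transformation are all assumptions of this sketch.

```python
def abc_regression_adjust(params, stats, observed, tolerance=0.1):
    """Regression-adjust accepted ABC draws for a single summary statistic.

    params, stats: equal-length lists holding each simulated parameter value
    and the summary statistic computed from that simulation (hypothetical).
    observed: the summary statistic computed from the real data.
    tolerance: fraction of simulations to accept (those closest to `observed`).
    Returns (adjusted_params, weights) for the accepted simulations.
    """
    if len(params) != len(stats) or len(params) < 2:
        raise ValueError("need two or more matching params and stats")
    n_keep = max(2, int(len(params) * tolerance))
    # Rejection step: keep simulations whose statistic is closest to the data.
    ranked = sorted(zip(params, stats), key=lambda ps: abs(ps[1] - observed))
    accepted = ranked[:n_keep]
    x = [s - observed for _, s in accepted]  # statistics centred on the data
    y = [p for p, _ in accepted]
    delta = max(abs(xi) for xi in x)  # largest accepted distance
    if delta == 0.0:
        return y, [1.0] * len(y)  # every draw matches exactly; nothing to fit
    # Epanechnikov kernel: closer simulations receive larger weights.
    weights = [1.0 - (xi / delta) ** 2 for xi in x]
    w_sum = sum(weights)
    if w_sum == 0.0:
        return y, weights
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_sum
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_sum
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    if sxx == 0.0:
        return y, weights  # regression is undefined; return unadjusted draws
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    # Shift each accepted draw to where it would sit if its statistic
    # had equalled the observed value.
    adjusted = [yi - slope * xi for xi, yi in zip(x, y)]
    return adjusted, weights
```

The weighted sample of adjusted values approximates the posterior distribution. With several summary statistics the same idea applies, using a multiple regression and a distance measured in the space of all the statistics.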

Subjects

Bayes Theorem , Computational Biology/methods , Linear Models , Software , Algorithms , Animals , Computer Simulation , Drosophila melanogaster/genetics , Population Genetics/methods

19.

Polygenic Adaptation to an Environmental Shift: Temporal Dynamics of Variation Under Gaussian Stabilizing Selection and Additive Effects on a Single Trait.

Thornton, Kevin R.

Genetics ; 213(4): 1513-1530, 2019 12.

Article in English | MEDLINE | ID: mdl-31653678

ABSTRACT

Predictions about the effect of natural selection on patterns of linked neutral variation are largely based on models involving the rapid fixation of unconditionally beneficial mutations. However, when phenotypes adapt to a new optimum trait value, the strength of selection on individual mutations decreases as the population adapts. Here, I use explicit forward simulations of a single trait with additive-effect mutations adapting to an "optimum shift." Detectable "hitchhiking" patterns are only apparent if (i) the optimum shifts are large with respect to equilibrium variation for the trait, (ii) mutation rates to large-effect mutations are low, and (iii) large-effect mutations rapidly increase in frequency and eventually reach fixation, which typically occurs after the population reaches the new optimum. For the parameters simulated here, partial sweeps do not appreciably affect patterns of linked variation, even when the mutations are strongly selected. The contribution of new mutations vs. standing variation to fixation depends on the mutation rate affecting trait values. Given the fixation of a strongly selected variant, patterns of hitchhiking are similar on average for the two classes of sweeps because sweeps from standing variation involving large-effect mutations are rare when the optimum shifts. The distribution of effect sizes of new mutations has little effect on the time to reach the new optimum, but reducing the mutational variance increases the magnitude of hitchhiking patterns. In general, populations reach the new optimum prior to the completion of any sweeps, and the times to fixation are longer for this model than for standard models of directional selection. The long fixation times are due to a combination of declining selection pressures during adaptation and the possibility of interference among weakly selected sites for traits with high mutation rates.

Subjects

Physiological Adaptation/genetics , Environment , Multifactorial Inheritance/genetics , Heritable Quantitative Trait , Genetic Selection , Computer Simulation , Genetic Loci , Genetic Variation , Haplotypes/genetics , Genetic Models , Mutation/genetics , Normal Distribution , Phenotype , Genetic Recombination/genetics , Time Factors

20.

The neutral coalescent process for recent gene duplications and copy-number variants.

Thornton, Kevin R.

Genetics ; 177(2): 987-1000, 2007 Oct.

Article in English | MEDLINE | ID: mdl-17720930

ABSTRACT

I describe a method for simulating samples from gene families of size two under a neutral coalescent process, for the case where the duplicate gene either has fixed recently in the population or is still segregating. When a duplicate locus has recently fixed by genetic drift, diversity in the new gene is expected to be reduced, and an excess of rare alleles is expected, relative to the predictions of the standard coalescent model. The expected patterns of polymorphism in segregating duplicates ("copy-number variants") depend both on the frequency of the duplicate in the sample and on the rate of crossing over between the two loci. When the crossover rate between the ancestral gene and the copy-number variant is low, the expected pattern of variability in the ancestral gene will be similar to the predictions of models of either balancing or positive selection, if the frequency of the duplicate in the sample is intermediate or high, respectively. Simulations are used to investigate the effect of crossing over between loci, and gene conversion between the duplicate loci, on levels of variability and the site-frequency spectrum.

Subjects

Gene Dosage/genetics , Gene Duplication , Genetic Variation/genetics , Genetic Crossing Over , Genetic Drift , Genetic Models , Genetic Polymorphism
