ABSTRACT
Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of "linked selection" on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1 Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of other modes of linked selection and of adaptation in particular. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
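The idea of combining per-site information by composite likelihood can be illustrated with a minimal Python sketch. It is not the authors' implementation: the site tuple `(b, k, m)`, the scaling parameter `pi0`, and the grid search are all assumptions made for illustration. Here `b` stands for a hypothetical predicted reduction in diversity at a site, and `k` of the `m` sampled pairs of chromosomes differ at that site.

```python
import math


def composite_log_likelihood(pi0, sites):
    """Sum per-site binomial log-likelihoods, treating sites as independent.

    sites: iterable of (b, k, m) tuples (hypothetical layout), where
      b = assumed predicted reduction in diversity at the site (0 < b <= 1),
      k = number of sampled pairs that differ at the site,
      m = number of sampled pairs examined.
    pi0: assumed diversity level in the absence of linked selection.
    """
    total = 0.0
    for b, k, m in sites:
        p = pi0 * b  # probability that a sampled pair differs at this site
        if not 0.0 < p < 1.0:
            return float("-inf")
        total += k * math.log(p) + (m - k) * math.log(1.0 - p)
    return total


def fit_pi0(sites, grid):
    """Return the grid value of pi0 with the highest composite likelihood."""
    return max(grid, key=lambda pi0: composite_log_likelihood(pi0, sites))


# Toy data: two sites, the second with diversity assumed halved by linked selection.
toy_sites = [(1.0, 2, 100), (0.5, 1, 100)]
toy_grid = [i / 1000 for i in range(1, 200)]
best_pi0 = fit_pi0(toy_sites, toy_grid)
```

A grid search keeps the sketch short; a real analysis would estimate several selection parameters jointly with a numerical optimizer, and would not treat `b` as known in advance.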
Subjects
Drosophila melanogaster/genetics , Molecular Evolution , Genetic Variation , Genetic Selection/genetics , Biological Adaptation/genetics , Amino Acid Substitution/genetics , Animals , Chromosome Mapping , Insect Genome , Genetic Models , Untranslated Regions/genetics
ABSTRACT
We create a new assembly of the Drosophila simulans genome using 142 million paired short-read sequences and previously published data for strain w(501). Our assembly represents a higher-quality genomic sequence with greater coverage, fewer misassemblies, and, by several indexes, fewer sequence errors. Evolutionary analysis of this genome reference sequence reveals interesting patterns of lineage-specific divergence that are different from those previously reported. Specifically, we find that Drosophila melanogaster evolves faster than D. simulans at all annotated classes of sites, including putatively neutrally evolving sites found in minimal introns. While this may be partly explained by a higher mutation rate in D. melanogaster, we also find significant heterogeneity in rates of evolution across classes of sites, consistent with historical differences in the effective population size for the two species. Also contrary to previous findings, we find that the X chromosome is evolving significantly faster than autosomes for nonsynonymous and most noncoding DNA sites and significantly slower for synonymous sites. The absence of an X/A difference for putatively neutral sites and the robustness of the pattern to Gene Ontology and sex-biased expression suggest that partly recessive beneficial mutations may comprise a substantial fraction of noncoding DNA divergence observed between species. Our results have more general implications for the interpretation of evolutionary analyses of genomes of different quality.
Subjects
Drosophila/genetics , Molecular Evolution , Insect Genome , Animals , Insect Chromosomes/genetics , Contig Mapping , Introns , Mutation Rate , Phylogeny , Population/genetics , X Chromosome/genetics
ABSTRACT
Plants can defend themselves against a wide array of enemies, from microbes to large animals, yet there is great variability in the effectiveness of such defences, both within and between species. Some of this variation can be explained by conflicting pressures from pathogens with different modes of attack. A second explanation comes from an evolutionary 'tug of war', in which pathogens adapt to evade detection, until the plant has evolved new recognition capabilities for pathogen invasion. If selection is, however, sufficiently strong, susceptible hosts should remain rare. That this is not the case is best explained by costs incurred from constitutive defences in a pest-free environment. Using a combination of forward genetics and genome-wide association analyses, we demonstrate that allelic diversity at a single locus, ACCELERATED CELL DEATH 6 (ACD6), underpins marked pleiotropic differences in both vegetative growth and resistance to microbial infection and herbivory among natural Arabidopsis thaliana strains. A hyperactive ACD6 allele, compared to the reference allele, strongly enhances resistance to a broad range of pathogens from different phyla, but at the same time slows the production of new leaves and greatly reduces the biomass of mature leaves. This allele segregates at intermediate frequency both throughout the worldwide range of A. thaliana and within local populations, consistent with this allele providing substantial fitness benefits despite its marked impact on growth.
Subjects
Alleles , Arabidopsis/genetics , Genetic Fitness/genetics , Genetic Variation/genetics , Ankyrins/genetics , Ankyrins/metabolism , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis/microbiology , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Biomass , Plant Gene Expression Regulation , Plant Genes , Genome-Wide Association Study , Molecular Sequence Data , Phenotype , Plant Diseases/genetics , Plant Diseases/microbiology , Plant Leaves/anatomy & histology , Plant Leaves/genetics , Plant Leaves/growth & development , Plant Leaves/parasitology , Quantitative Trait Loci
ABSTRACT
Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases, genome-wide association (GWA) studies have, owing to advances in genotyping and sequencing technology, become an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available, because once these lines have been genotyped they can be phenotyped multiple times, making it possible (as well as extremely cost effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly self-fertilizing model plant known to harbour considerable genetic variation for many adaptively important traits. Our results are dramatically different from those of human GWA studies, in that we identify many common alleles of major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure makes it difficult to distinguish true associations from false. However, a priori candidates are significantly over-represented among these associations as well, making many of them excellent candidates for follow-up experiments. Our study demonstrates the feasibility of GWA studies in A. thaliana and suggests that the approach will be appropriate for many other organisms.
Subjects
Arabidopsis/classification , Arabidopsis/genetics , Plant Genome/genetics , Genome-Wide Association Study , Phenotype , Alleles , Arabidopsis Proteins/genetics , Flowers/genetics , Plant Genes/genetics , Genetic Loci/genetics , Genotype , Innate Immunity/genetics , Inbreeding , Single Nucleotide Polymorphism/genetics
ABSTRACT
Linkage disequilibrium (LD) is a major aspect of the organization of genetic variation in natural populations. Here we describe the genome-wide pattern of LD in a sample of 19 Arabidopsis thaliana accessions using 341,602 non-singleton SNPs. LD decays within 10 kb on average, considerably faster than previously estimated. Tag SNP selection algorithms and 'hide-the-SNP' simulations suggest that genome-wide association mapping will require only 40%-50% of the observed SNPs, a reduction similar to estimates in a sample of African Americans. An Affymetrix genotyping array containing 250,000 SNPs has been designed based on these results; we demonstrate that it should have more than adequate coverage for genome-wide association mapping. The extent of LD is highly variable, and we find clear evidence of recombination hotspots, which seem to occur preferentially in intergenic regions. LD also reflects the action of selection, and it is more extensive between nonsynonymous polymorphisms than between synonymous polymorphisms.
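As a small illustration of how pairwise LD can be quantified, the sketch below computes the common r² statistic between two biallelic SNPs from phased 0/1 allele vectors. The abstract does not state which LD measure was used, so the choice of r², the function name `r_squared`, and the toy allele vectors are assumptions.

```python
def r_squared(snp_a, snp_b):
    """Return r^2 between two biallelic SNPs coded 0/1 on the same haplotypes."""
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("need two non-empty allele vectors of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at the first SNP
    p_b = sum(snp_b) / n  # frequency of allele 1 at the second SNP
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denom


# Toy haplotypes (hypothetical data for eight inbred accessions).
snp1 = [0, 0, 1, 1, 0, 1, 0, 1]
snp2 = [0, 0, 1, 1, 0, 1, 0, 1]  # identical to snp1: complete LD
snp3 = [0, 1, 0, 1, 0, 1, 0, 1]  # only partly associated with snp1
```

Averaging such values over SNP pairs binned by physical distance is one simple way to trace how quickly LD decays along a chromosome.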
Subjects
Arabidopsis/genetics , Linkage Disequilibrium , Genetic Recombination , Chromosome Mapping/methods , Plant Chromosomes/genetics , Plant Genome , Genotype , Haplotypes , Genetic Models , Single Nucleotide Polymorphism
ABSTRACT
We have used whole-genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of Drosophila yakuba and 20 isofemale lines of D. simulans and performed genome-wide validation with PacBio long molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting that deleterious impacts are common. Drosophila simulans shows larger numbers of whole gene duplications in comparison to larger proportions of gene fragments in D. yakuba. Drosophila simulans displays an excess of high-frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X or demographic forces driving duplicates to high frequency. We identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited noncoding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications that offers a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease, as well as to adaptive evolutionary change.
Subjects
Drosophila/classification , Drosophila/genetics , Gene Duplication , Tandem Repeat Sequences , Animals , Molecular Evolution , Female , Genetic Variation , Genome , Genotype , Mutation Rate , Sequence Deletion
ABSTRACT
We present a new approach to genotyping based on multiplexed shotgun sequencing that can identify recombination breakpoints in a large number of individuals simultaneously at a resolution sufficient for most mapping purposes, such as quantitative trait locus (QTL) mapping and mapping of induced mutations. We first describe a simple library construction protocol that uses just 10 ng of genomic DNA per individual and makes the approach accessible to any laboratory with standard molecular biology equipment. Sequencing this library results in a large number of sequence reads widely distributed across the genomes of multiplexed bar-coded individuals. We develop a Hidden Markov Model to estimate ancestry at all genomic locations in all individuals using these data. We demonstrate the utility of the approach by mapping a dominant marker allele in D. simulans to within 105 kb of its true position using 96 F1-backcross individuals genotyped in a single lane on an Illumina Genome Analyzer. We further demonstrate the utility of our method by genetically mapping more than 400 previously unassembled D. simulans contigs to linkage groups and by evaluating the quality of targeted introgression lines. At this level of multiplexing and divergence between strains, our method allows estimation of recombination breakpoints to a median of 38-kb intervals. Our analysis suggests that higher levels of multiplexing and/or use of strains with lower levels of divergence are practicable.
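To make the ancestry-inference step more concrete, here is a minimal two-state Viterbi sketch in Python. It is not the authors' Hidden Markov Model: the two states (`AA` for homozygous parental and `AB` for heterozygous, as might occur in a backcross), the per-marker `switch_prob`, the sequencing `error` rate, and the one-read-per-marker input are all assumptions chosen for illustration.

```python
import math


def viterbi_ancestry(reads, switch_prob=0.01, error=0.01):
    """Return the most likely ancestry path ('AA' or 'AB') along a chromosome.

    reads: list of 'A'/'B' allele calls, one per informative marker, in
           chromosomal order (hypothetical input format).
    """
    if not reads:
        return []
    states = ("AA", "AB")
    # Assumed emission model: AA yields B reads only through error;
    # AB yields either allele with equal probability.
    emit = {
        "AA": {"A": 1.0 - error, "B": error},
        "AB": {"A": 0.5, "B": 0.5},
    }
    stay, switch = math.log(1.0 - switch_prob), math.log(switch_prob)

    score = {s: math.log(0.5) + math.log(emit[s][reads[0]]) for s in states}
    backpointers = []
    for obs in reads[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best previous state for reaching state s at this marker.
            candidates = {p: score[p] + (stay if p == s else switch) for p in states}
            best_prev = max(candidates, key=candidates.get)
            new_score[s] = candidates[best_prev] + math.log(emit[s][obs])
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state.
    path = [max(score, key=score.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy read calls: ten 'A' reads, then a mixed stretch suggesting heterozygosity.
toy_reads = list("AAAAAAAAAA" + "BABABBABAB")
toy_path = viterbi_ancestry(toy_reads)
```

The position where the returned path changes state marks an inferred recombination breakpoint. With sparse shotgun reads, the distance between flanking informative markers limits how precisely that breakpoint can be placed.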
Subjects
Chromosome Mapping/methods , Molecular Typing/methods , DNA Sequence Analysis/methods , Animals , Chromosome Breakpoints , Computational Biology , Drosophila/genetics , Female , Dominant Genes/genetics , Genetic Markers , Genotype , Male , Phenotype , Quantitative Trait Loci/genetics , Genetic Recombination/genetics , Research Design
ABSTRACT
The detection of footprints of natural selection in genetic polymorphism data is fundamental to understanding the genetic basis of adaptation, and has important implications for human health. The standard approach has been to reject neutrality in favor of selection if the pattern of variation at a candidate locus was significantly different from the predictions of the standard neutral model. The problem is that the standard neutral model assumes more than just neutrality, and it is almost always possible to explain the data using an alternative neutral model with more complex demography. Today's wealth of genomic polymorphism data, however, makes it possible to dispense with models altogether by simply comparing the pattern observed at a candidate locus to the genomic pattern, and rejecting neutrality if the pattern is extreme. Here, we utilize this approach on a truly genomic scale, comparing a candidate locus to thousands of alleles throughout the Arabidopsis thaliana genome. We demonstrate that selection has acted to increase the frequency of early-flowering alleles at the vernalization requirement locus FRIGIDA. Selection seems to have occurred during the last several thousand years, possibly in response to the spread of agriculture. We introduce a novel test statistic based on haplotype sharing that embraces the problem of population structure, and so should be widely applicable.
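Comparing a candidate locus with the genome-wide pattern amounts to an empirical, model-free test, which can be sketched as follows. The function name, the use of a generic summary statistic, and the add-one correction are assumptions. The haplotype-sharing statistic introduced in the abstract is not reproduced here.

```python
def empirical_p_value(candidate, genome_wide):
    """Fraction of genome-wide values at least as large as the candidate.

    candidate:   statistic observed at the locus of interest.
    genome_wide: the same statistic computed at many other loci, used as
                 an empirical null distribution.
    The add-one correction keeps the estimate above zero.
    """
    if not genome_wide:
        raise ValueError("need at least one genome-wide value")
    as_extreme = sum(1 for value in genome_wide if value >= candidate)
    return (as_extreme + 1) / (len(genome_wide) + 1)


# Toy example: 99 background loci with scores 1..99 and a candidate score of 98.
background = list(range(1, 100))
p_value = empirical_p_value(98, background)
```

Because the null distribution comes from the genome itself, demographic history affects the candidate and the background alike, which is the motivation given in the abstract for dispensing with an explicit neutral model.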
Subjects
Arabidopsis/genetics , Arabidopsis/physiology , Flowers/physiology , Alleles , Arabidopsis Proteins/genetics , Biological Evolution , Flowers/genetics , Plant Genome , Genetic Polymorphism , Population Dynamics , Genetic Selection , Nonparametric Statistics , Time Factors
ABSTRACT
We resequenced 876 short fragments in a sample of 96 individuals of Arabidopsis thaliana that included stock center accessions as well as a hierarchical sample from natural populations. Although A. thaliana is a selfing weed, the pattern of polymorphism in general agrees with what is expected for a widely distributed, sexually reproducing species. Linkage disequilibrium decays rapidly, within 50 kb. Variation is shared worldwide, although population structure and isolation by distance are evident. The data fail to fit standard neutral models in several ways. There is a genome-wide excess of rare alleles, at least partially due to selection. There is too much variation between genomic regions in the level of polymorphism. The local level of polymorphism is negatively correlated with gene density and positively correlated with segmental duplications. Because the data do not fit theoretical null distributions, attempts to infer natural selection from polymorphism data will require genome-wide surveys of polymorphism in order to identify anomalous regions. Despite this, our data support the utility of A. thaliana as a model for evolutionary functional genomics.
Subjects
Arabidopsis/genetics , Genetic Polymorphism , Gene Frequency , Population Genetics , Single Nucleotide Polymorphism
ABSTRACT
Tandem duplications are an essential source of genetic novelty, and their variation in natural populations is expected to influence adaptive walks. Here, we describe evolutionary impacts of recently derived, segregating tandem duplications in Drosophila yakuba and Drosophila simulans. We observe an excess of duplicated genes involved in defense against pathogens, insecticide resistance, chorion development, cuticular peptides, and lipases or endopeptidases associated with the accessory glands across both species. The observed agreement is greater than expected by chance alone, suggesting large amounts of convergence across functional categories. We document evidence of widespread selection on the D. simulans X, suggesting adaptation through duplication is common on the X. Despite the evidence for positive selection, duplicates display an excess of low frequency variants consistent with largely detrimental impacts, limiting the variation that can effectively facilitate adaptation. Standing variation for tandem duplications spans less than 25% of the genome in D. yakuba and D. simulans, indicating that evolution will be strictly limited by mutation, even in organisms with large population sizes. Effective whole gene duplication rates are low at 1.17 × 10⁻⁹ per gene per generation in D. yakuba and 6.03 × 10⁻¹⁰ per gene per generation in D. simulans, suggesting long wait times for new mutations on the order of thousands of years for the establishment of sweeps. Hence, in cases where adaptation depends on individual tandem duplications, evolution will be severely limited by mutation. We observe low levels of parallel recruitment of the same duplicated gene in different species, suggesting that the span of standing variation will define evolutionary outcomes in spite of convergence across gene ontologies consistent with rapidly evolving phenotypes.
Subjects
Drosophila simulans/genetics , Drosophila/genetics , Insect Genome , Genetic Selection , Animals , Biological Evolution , Gene Duplication , Genetic Variation , Likelihood Functions , Single Nucleotide Polymorphism , X Chromosome
ABSTRACT
The shift from outcrossing to selfing is common in flowering plants, but the genomic consequences and the speed at which they emerge remain poorly understood. An excellent model for understanding the evolution of self fertilization is provided by Capsella rubella, which became self compatible <200,000 years ago. We report a C. rubella reference genome sequence and compare RNA expression and polymorphism patterns between C. rubella and its outcrossing progenitor Capsella grandiflora. We found a clear shift in the expression of genes associated with flowering phenotypes, similar to that seen in Arabidopsis, in which self fertilization evolved about 1 million years ago. Comparisons of the two Capsella species showed evidence of rapid genome-wide relaxation of purifying selection in C. rubella without a concomitant change in transposable element abundance. Overall we document that the transition to selfing may be typified by parallel shifts in gene expression, along with a measurable reduction of purifying selection.
Subjects
Capsella/genetics , Molecular Evolution , Fertilization/genetics , Plant Genome , Pollination/genetics , Arabidopsis/genetics , Fertilization/physiology , Plant Genes , Plant Genome/physiology , Molecular Sequence Data , Pollination/physiology , Self-Fertilization/genetics , DNA Sequence Analysis , Time Factors
ABSTRACT
We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis.
Subjects
Arabidopsis/genetics , Plant Genome , Arabidopsis/classification , Base Sequence , Centromere/genetics , Plant Chromosomes/genetics , Plant DNA/genetics , Molecular Evolution , Genetic Models , Molecular Sequence Data , Nucleic Acid Sequence Homology , Species Specificity
ABSTRACT
A powerful way to map functional genomic variation and reveal the genetic basis of local adaptation is to associate allele frequency across the genome with environmental conditions. Serpentine soils, characterized by high heavy-metal content and low calcium-to-magnesium ratios, are a classic context for studying adaptation of plants to local soil conditions. To investigate whether Arabidopsis lyrata is locally adapted to serpentine soil, and to map the polymorphisms responsible for such adaptation, we pooled DNA from individuals from serpentine and nonserpentine soils and sequenced each 'gene pool' with the Illumina Genome Analyzer. The polymorphisms that are most strongly associated with soil type are enriched at heavy-metal detoxification and calcium and magnesium transport loci, providing numerous candidate mutations for serpentine adaptation. Sequencing of three candidate loci in the European subspecies of A. lyrata indicates parallel differentiation of the same polymorphism at one locus, confirming ecological adaptation, and different polymorphisms at two other loci, which may indicate convergent evolution.
Subjects
Biological Adaptation/genetics , Arabidopsis/genetics , Serpentine Asbestos/pharmacology , DNA Sequence Analysis , Soil , Biological Adaptation/drug effects , Arabidopsis/physiology , Plant DNA/analysis , Plant DNA/drug effects , Molecular Evolution , Plant Genes/drug effects , Population Genetics , Plant Genome/drug effects , Geography , Heavy Metals/pharmacology , Microsatellite Repeats/genetics , Genetic Polymorphism , DNA Sequence Analysis/methods
ABSTRACT
Unlike most of its close relatives, Arabidopsis thaliana is capable of self-pollination. In other members of the mustard family, outcrossing is ensured by the complex self-incompatibility (S) locus, which harbors multiple diverged specificity haplotypes that effectively prevent selfing. We investigated the role of the S locus in the evolution of and transition to selfing in A. thaliana. We found that the S locus of A. thaliana harbored considerable diversity, which is an apparent remnant of polymorphism in the outcrossing ancestor. Thus, the fixation of a single inactivated S-locus allele cannot have been a key step in the transition to selfing. An analysis of the genome-wide pattern of linkage disequilibrium suggests that selfing most likely evolved roughly a million years ago or more.
Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Arabidopsis/physiology , Biological Evolution , Plant Genes , Nuclear Proteins/genetics , Plant Proteins/genetics , Protein Kinases/genetics , Pseudogenes , Alleles , Amino Acid Sequence , Bacterial Artificial Chromosomes , Genetic Drift , Haplotypes , Linkage Disequilibrium , Molecular Sequence Data , Polymerase Chain Reaction , Genetic Polymorphism , Reproduction/physiology
ABSTRACT
We used hybridization to the ATH1 gene expression array to interrogate genomic DNA diversity in 23 wild strains (accessions) of Arabidopsis thaliana (arabidopsis), in comparison with the reference strain Columbia (Col). At <1% false discovery rate, we detected 77,420 single-feature polymorphisms (SFPs) with distinct patterns of variation across the genome. Total and pair-wise diversity was higher near the centromeres and the heterochromatic knob region, but overall diversity was positively correlated with recombination rate (R² = 3.1%). The difference between total and pair-wise SFP diversity is a relative measure contrasting diversifying or frequency-dependent selection, similar to Tajima's D, and can be calibrated by the empirical genome-wide distribution. Each unique locus, centered on a gene, has a diversity and selection score that suggest a relative role in past evolutionary processes. Homologs of disease resistance (R) genes include members with especially high levels of diversity often showing frequency-dependent selection and occasionally evidence of a past selective sweep. Receptor-like and S-locus proteins also contained members with elevated levels of diversity and signatures of selection, whereas other gene families, bHLH, F-box, and RING finger proteins, showed more typical levels of diversity. SFPs identified with the gene expression array also provide an empirical hybridization polymorphism background for studies of gene expression polymorphism and are available through the genome browser http://signal.salk.edu/cgi-bin/AtSFP.
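For readers unfamiliar with the statistic that the SFP-based measure is compared to, the sketch below computes the classical Tajima's D from a sample size, a count of segregating sites, and the mean number of pairwise differences. It shows the standard sequence-based statistic, not the SFP-specific score described in the abstract. The function name and the input values are illustrative.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Classical Tajima's D.

    n:         number of sampled sequences (at least 2).
    seg_sites: number of segregating sites S (at least 1).
    pi:        mean number of pairwise differences.
    """
    if n < 2 or seg_sites < 1:
        raise ValueError("need n >= 2 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator based on S
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)


# Illustrative values only: 10 sequences, 16 segregating sites.
d_low_pi = tajimas_d(10, 16, 35 / 9)  # pairwise diversity below theta_w
```

Negative values indicate an excess of rare variants relative to pairwise diversity, and positive values an excess of intermediate-frequency variants. This is the contrast that the abstract calibrates against the empirical genome-wide distribution.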
Subjects
Arabidopsis/genetics , Plant Genome/genetics , Genetic Polymorphism , Plant Chromosomes/genetics , Plant Genes , Haplotypes , Genetic Selection
ABSTRACT
The genomes of individuals from the same species vary in sequence as a result of different evolutionary processes. To examine the patterns of, and the forces shaping, sequence variation in Arabidopsis thaliana, we performed high-density array resequencing of 20 diverse strains (accessions). More than 1 million nonredundant single-nucleotide polymorphisms (SNPs) were identified at moderate false discovery rates (FDRs), and approximately 4% of the genome was identified as being highly dissimilar or deleted relative to the reference genome sequence. Patterns of polymorphism are highly nonrandom among gene families, with genes mediating interaction with the biotic environment having exceptional polymorphism levels. At the chromosomal scale, regional variation in polymorphism was readily apparent. A scan for recent selective sweeps revealed several candidate regions, including a notable example in which almost all variation was removed in a 500-kilobase window. Analyzing the polymorphisms we describe in larger sets of accessions will enable a detailed understanding of forces shaping population-wide sequence variation in A. thaliana.