ABSTRACT
Chromosomal fusions are among the most common types of chromosomal rearrangements found in nature. Yet their role in shaping the genomic landscape of recombination, and hence genome evolution, remains largely unexplored. Here, we take advantage of wild mouse populations carrying chromosomal fusions to evaluate the effect of this type of structural variant on genomic landscapes of recombination and divergence. To this end, we combined cytological analysis of meiotic crossovers in primary spermatocytes with recombination rates inferred from linkage disequilibrium among single nucleotide polymorphisms. Our results suggest a combined effect of Robertsonian fusions and the allelic background of Prdm9, a gene involved in the formation of meiotic double-strand breaks and in postzygotic reproductive isolation, in reshaping genomic landscapes of recombination. In mice with Robertsonian fusions, meiotic recombination in metacentric chromosomes was redistributed toward telomeric regions compared with nonfused mice. This repatterning was accompanied by increased crossover interference and reduced estimated recombination rates between populations, together with high levels of genomic divergence. Interestingly, Prdm9 allelic background was a major determinant of recombination rates at the population level, whereas Robertsonian fusions had limited effects, restricted to centromeric regions of fused chromosomes. Altogether, our results provide new insights into the effect of Robertsonian fusions and Prdm9 background on meiotic recombination.
Subjects
Chromosomes , Genomics , Male , Animals , Mice , Alleles
ABSTRACT
Adaptive challenges that humans faced as they expanded across the globe left specific molecular footprints that can be decoded in present-day genomes. Different sets of metrics are used to identify genomic regions that have undergone selection. However, fewer methods can pinpoint the allele ultimately responsible for this selection. Here, we present PopHumanVar, an interactive online application designed to facilitate the exploration and thorough analysis of candidate genomic regions by integrating the functional and population genomics data currently available. PopHumanVar generates summary reports of prioritized variants that are putatively causal of recent selective sweeps. It compiles data and graphically represents different layers of information, including natural selection statistics, functional annotations and genealogical estimates of variant age, for biallelic single nucleotide variants (SNVs) of the 1000 Genomes Project phase 3. Specifically, PopHumanVar gathers SNV-based information from the GEVA, SnpEff, GWAS Catalog, ClinVar, RegulomeDB and DisGeNET resources, as well as accurate estimates of the iHS, nSL and iSAFE statistics. Notably, PopHumanVar successfully identifies known causal variants in frequently reported candidate selection regions, including EDAR in East Asians, ACKR1 (DARC) in Africans and LCT/MCM6 in Europeans. PopHumanVar is open and freely available at https://pophumanvar.uab.cat.
Subjects
Databases, Genetic , Genome, Human/genetics , Selection, Genetic/genetics , Software , Adaptation, Physiological/genetics , Computational Biology , Genomics , Humans , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Drosophila melanogaster is a leading model in population genetics and genomics, and a growing number of whole-genome data sets from natural populations of this species have been published in recent years. A major challenge is integrating disparate data sets, often generated with different sequencing technologies and bioinformatic pipelines, which hampers our ability to address questions about the evolution of this species. Here we address these issues by developing a bioinformatics pipeline that maps pooled sequencing (Pool-Seq) reads from D. melanogaster to a hologenome consisting of fly and symbiont genomes and estimates allele frequencies using either a heuristic (PoolSNP) or a probabilistic variant caller (SNAPE-pooled). We use this pipeline to generate the largest repository of genomic data available for D. melanogaster to date, encompassing 271 previously published and unpublished population samples from over 100 locations in more than 20 countries on four continents. Several of these locations have been sampled in different seasons across multiple years. This data set, which we call Drosophila Evolution over Space and Time (DEST), is coupled with sampling and environmental metadata. A web-based genome browser and web portal provide easy access to the SNP data set. We further provide guidelines on how to use Pool-Seq data for model-based demographic inference. Our aim is to provide this scalable platform as a community resource that can be easily extended by future efforts toward an even more extensive cosmopolitan data set. Our resource will enable population geneticists to analyze spatiotemporal genetic patterns and evolutionary dynamics of D. melanogaster populations in unprecedented detail.
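The abstract distinguishes a heuristic from a probabilistic caller without giving PoolSNP's rules. As a rough illustration of what a heuristic pooled caller does, the Python sketch below calls one biallelic site from the per-base read counts of a single pool. The function name `call_pooled_site` and all thresholds (coverage bounds, minimum allele count and frequency) are hypothetical and are not PoolSNP's actual criteria or defaults.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PooledCall:
    ref: str
    alt: Optional[str]  # None when no alternative allele passes the filters
    alt_freq: float     # alt reads / (ref + alt reads) in the pool


def call_pooled_site(counts: Dict[str, int], ref: str,
                     min_cov: int = 10, max_cov: int = 500,
                     min_count: int = 5, min_freq: float = 0.01) -> Optional[PooledCall]:
    """Heuristic biallelic call at one position from per-base read counts of one pool.

    All thresholds are hypothetical placeholders, not PoolSNP defaults.
    """
    cov = sum(counts.values())
    if cov < min_cov or cov > max_cov:
        return None  # masked: too little or suspiciously high coverage
    # Non-reference alleles supported by enough reads, in both count and frequency
    candidates = [b for b, c in counts.items()
                  if b != ref and c >= min_count and c / cov >= min_freq]
    if not candidates:
        return PooledCall(ref, None, 0.0)
    # Keep only the most common alternative allele (biallelic assumption)
    alt = max(candidates, key=lambda b: counts[b])
    biallelic_cov = counts.get(ref, 0) + counts[alt]
    return PooledCall(ref, alt, counts[alt] / biallelic_cov)


print(call_pooled_site({"A": 80, "G": 20, "C": 1, "T": 0}, ref="A"))
```

Requiring both an absolute read count and a minimum frequency is a common way to keep sequencing errors from being reported as rare alleles in deep pools; a probabilistic caller such as SNAPE-pooled instead models sequencing error explicitly.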
Subjects
Drosophila melanogaster , Metagenomics , Animals , Drosophila melanogaster/genetics , Gene Frequency , Genetics, Population , Genomics
ABSTRACT
Since the migrations that led humans to colonize the Earth, our species has faced frequent adaptive challenges that have left signatures in the landscape of genetic variation, signatures we can identify in present-day genomes. Here, we (i) apply an outlier approach to eight different population genetic statistics for 22 non-admixed human populations from Phase III of the 1000 Genomes Project to detect selective sweeps of different historical ages, as well as events of recurrent positive selection in the human lineage; and (ii) create PopHumanScan, an online catalog that compiles and annotates all candidate regions under selection to facilitate their validation and thorough analysis. Well-known examples of human genetic adaptation published elsewhere are included in the catalog, as well as hundreds of other attractive candidates that will require further investigation. Designed as a collaborative database, PopHumanScan aims to become a central repository to share information, guide future studies and help advance our understanding of how selection has shaped our genomes in response to changes in the environment or lifestyle of human populations. PopHumanScan is open and freely available at https://pophumanscan.uab.cat.
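The outlier approach can be illustrated with a minimal Python sketch, assuming each statistic has already been computed in genomic windows: a window is flagged when it falls in the most extreme tail of a statistic's empirical distribution, and windows flagged by several statistics can then be combined. The 1% tail, the `min_stats` rule and the function names are hypothetical choices for illustration; the abstract does not state PopHumanScan's thresholds or name the eight statistics.

```python
import numpy as np


def outlier_windows(values, top_fraction=0.01, high_is_extreme=True):
    """Indices of windows in the most extreme `top_fraction` of one statistic.

    NaN windows (e.g. too few SNPs) are never reported as outliers.
    """
    v = np.asarray(values, dtype=float)
    if not high_is_extreme:  # e.g. very negative Tajima's D is the sweep-like tail
        v = -v
    cutoff = np.nanquantile(v, 1.0 - top_fraction)
    return np.flatnonzero(v >= cutoff)  # comparisons with NaN are False


def combined_outliers(stats, top_fraction=0.01, min_stats=2):
    """Windows that are outliers for at least `min_stats` statistics.

    `stats` maps a statistic name to (per-window values, high_is_extreme).
    """
    n_windows = len(next(iter(stats.values()))[0])
    hits = np.zeros(n_windows, dtype=int)
    for values, high in stats.values():
        hits[outlier_windows(values, top_fraction, high)] += 1
    return np.flatnonzero(hits >= min_stats)
```

Requiring agreement between statistics trades sensitivity for specificity, since different statistics capture sweeps of different ages.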
Subjects
Adaptation, Physiological/genetics , Computational Biology/methods , Genetics, Population/methods , Genome, Human/genetics , Selection, Genetic , Databases, Genetic , Evolution, Molecular , Genomics/methods , Humans , Internet , Linkage Disequilibrium , Models, Genetic
ABSTRACT
The McDonald and Kreitman test (MKT) is one of the most powerful and widely used methods to detect and quantify recurrent natural selection using DNA sequence data. Here we present iMKT (integrative McDonald and Kreitman test), a novel web-based service that performs four distinct types of MKT. It allows the detection and estimation of four different selection regimes (adaptive, neutral, strongly deleterious and weakly deleterious) acting on any genomic sequence. iMKT can analyze both users' own population genomic data and preloaded Drosophila melanogaster and human sequences of protein-coding genes obtained from the largest population genomic datasets to date. Advanced options on the website allow testing complex hypotheses, such as the application example shown here: do genes located in high-recombination regions undergo higher rates of adaptation? We hope that iMKT will become a reference tool for the study of evolutionary adaptation in massive population genomics datasets, especially in Drosophila and humans. iMKT is a free online resource at https://imkt.uab.cat.
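For reference, the standard MKT summarizes nonsynonymous and synonymous polymorphisms (Pn, Ps) and divergences (Dn, Ds) and estimates the proportion of adaptive nonsynonymous substitutions as alpha = 1 - (Ds * Pn) / (Dn * Ps). The sketch below computes this estimate and a Fay, Wyckoff and Wu (FWW) style variant that discards low-frequency polymorphisms. It is an illustrative sketch, not iMKT's code; the 5% derived allele frequency cutoff and the function names are assumptions.

```python
def mk_alpha(pn, ps, dn, ds):
    """Standard MKT estimate of the fraction of adaptive nonsynonymous substitutions."""
    return 1.0 - (ds * pn) / (dn * ps)


def mk_alpha_fww(daf_n, daf_s, dn, ds, cutoff=0.05):
    """FWW-style correction: drop polymorphisms whose derived allele frequency is
    below `cutoff`, where slightly deleterious nonsynonymous variants are enriched.

    daf_n, daf_s: derived allele frequencies of each nonsynonymous / synonymous SNP.
    """
    pn = sum(f >= cutoff for f in daf_n)
    ps = sum(f >= cutoff for f in daf_s)
    return mk_alpha(pn, ps, dn, ds)


print(mk_alpha(20, 40, 30, 30))
print(mk_alpha_fww([0.01] * 10 + [0.3] * 10, [0.01] * 5 + [0.3] * 35, 30, 30))
```

With 20 nonsynonymous and 40 synonymous polymorphisms and 30 substitutions of each class, the standard estimate is 0.5; if half of the nonsynonymous but only 5 of the synonymous polymorphisms are rare, the FWW estimate rises to about 0.71, illustrating how segregating slightly deleterious variants bias the standard alpha downward.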
Subjects
Adaptation, Physiological/genetics , Drosophila melanogaster/genetics , Genome , Recombination, Genetic , Selection, Genetic , Sequence Analysis, DNA/statistics & numerical data , Alleles , Animals , Biological Evolution , Datasets as Topic , Gene Frequency , Humans , Metagenomics , Polymorphism, Genetic
ABSTRACT
The 1000 Genomes Project (1000GP) represents the most comprehensive worldwide nucleotide variation data set in humans so far, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting more than 84 million variants. These sequence data are an invaluable resource for population genomics studies of the human lineage, allowing the testing of molecular population genetics hypotheses and, eventually, an understanding of the evolutionary dynamics of genetic variation in human populations. Here we present PopHuman, a new population genomics-oriented genome browser based on JBrowse that allows the interactive visualization and retrieval of an extensive inventory of population genetics metrics. Efficient and reliable parameter estimates have been computed using a novel pipeline that addresses the unique features and limitations of the 1000GP data. They include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, and different tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP. PopHuman is open and freely available at http://pophuman.uab.cat.
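Tajima's D is a classic example of the neutrality tests computed in such windows. The sketch below implements the standard Tajima (1989) statistic from derived-allele counts and applies it in non-overlapping windows. It assumes complete data (every site typed in all n sequences) and ignores the missing data, accessibility masks and other 1000GP-specific issues that PopHuman's pipeline addresses; the function names are hypothetical.

```python
import math
from collections import defaultdict


def tajimas_d(derived_counts, n):
    """Tajima's D from derived-allele counts at the sites of one window (n sequences)."""
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    seg = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    s = len(seg)
    if s == 0:
        return math.nan
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in seg) / pairs  # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1  # Watterson's estimator
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


def windowed_tajimas_d(positions, derived_counts, n, window_size):
    """Tajima's D in non-overlapping windows keyed by window start; empty windows are omitted."""
    bins = defaultdict(list)
    for pos, k in zip(positions, derived_counts):
        bins[pos // window_size].append(k)
    return {w * window_size: tajimas_d(ks, n) for w, ks in sorted(bins.items())}
```

An excess of rare variants (many singletons) pushes pi below Watterson's estimator and D below zero, the pattern expected after a sweep or population expansion.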
Subjects
Databases, Genetic , Genetic Variation , Genetics, Population , Genome, Human , Chromosomes, Human , Genes , Genomics , Humans
ABSTRACT
Evolutionary rates for protein-coding genes are determined not only by natural selection but also by multiple genomic factors including mutation rates, recombination, gene expression levels, and chromosomal location. To investigate the joint effects of different genomic determinants on protein evolution, we compared the coding sequences of 9017 single-copy orthologs between 2 cactophilic species from the Drosophila subgenus, Drosophila mojavensis and D. buzzatii, whose genomes have been previously sequenced. We assessed the impact of 7 genomic determinants, that is, chromosome type, recombination, chromosomal inversions, expression breadth, expression level, gene length, and the number of exons, on divergence rates of protein-coding genes to understand patterns of evolutionary variation. Integrative analysis of these factors revealed that 1) X-linked and autosomal genes evolve at significantly different rates in agreement with the faster-X hypothesis, 2) genes located on the dot chromosome and pericentromeric regions have higher divergence rates, 3) genes located at chromosomes with more fixed inversions have higher pairwise divergence than those located at nearly collinear chromosomes, and 4) gene expression patterns can be considered the strongest determinant of protein evolution. In addition, the number of exons and protein length had a significant effect on pairwise divergence at synonymous sites. All in all, our results show the relative importance of each genomic factor on the rates of protein evolution and functional constraint in these 2 cactophilic Drosophila species.
Subjects
Drosophila Proteins/genetics , Drosophila/genetics , Evolution, Molecular , Genome, Insect , Animals , Recombination, Genetic , Species Specificity
ABSTRACT
SUMMARY: The recent compilation of over 1100 wild-derived Drosophila melanogaster genome sequences from around the world, reassembled using a standardized pipeline (Drosophila Genome Nexus, DGN), provides a unique resource for population genomic studies. A visual display of the estimated metrics describing genome-wide variation and selection patterns would provide a global view and understanding of the evolutionary forces shaping genome variation. AVAILABILITY AND IMPLEMENTATION: Here, we present PopFly, a population genomics-oriented genome browser based on the JBrowse software that contains a complete inventory of population genomic parameters estimated from DGN data. This browser is designed for the automatic analysis and display of genetic variation data within and between populations along the D. melanogaster genome. PopFly allows the visualization and retrieval of functional annotations, nucleotide diversity estimates, linkage disequilibrium statistics, recombination rates, a battery of neutrality tests, and population differentiation parameters at different window sizes across the euchromatic chromosomes. PopFly is open and freely available at http://popfly.uab.cat. CONTACT: sergi.hervas@uab.cat or antonio.barbadilla@uab.cat.
Subjects
Drosophila melanogaster/genetics , Genetic Variation , Genomics/methods , Sequence Analysis, DNA/methods , Software , Animals , Biological Evolution , Genome, Insect , Linkage Disequilibrium
ABSTRACT
A major challenge of biology is understanding the relationship between molecular genetic variation and variation in quantitative traits, including fitness. This relationship determines our ability to predict phenotypes from genotypes and to understand how evolutionary forces shape variation within and between species. Previous efforts to dissect the genotype-phenotype map were based on incomplete genotypic information. Here, we describe the Drosophila melanogaster Genetic Reference Panel (DGRP), a community resource for analysis of population genomics and quantitative traits. The DGRP consists of fully sequenced inbred lines derived from a natural population. Population genomic analyses reveal reduced polymorphism in centromeric autosomal regions and the X chromosome, evidence for positive and negative selection, and rapid evolution of the X chromosome. Many variants in novel genes, most at low frequency, are associated with quantitative traits and explain a large fraction of the phenotypic variance. The DGRP facilitates genotype-phenotype mapping using the power of Drosophila genetics.
Subjects
Drosophila melanogaster/genetics , Genome-Wide Association Study , Genomics , Quantitative Trait Loci/genetics , Alleles , Animals , Centromere/genetics , Chromosomes, Insect/genetics , Genotype , Phenotype , Polymorphism, Single Nucleotide/genetics , Selection, Genetic/genetics , Starvation/genetics , Telomere/genetics , X Chromosome/genetics
ABSTRACT
Recent genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with large amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Because of the complexity of this type of change and the resulting high false-positive discovery rate, all the available data must be integrated to obtain a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into distinct inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. InvFEST thus aims to represent the most reliable set of human inversions and to become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.
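One way to merge redundant predictions into candidate inversions is to cluster their intervals by reciprocal overlap. The Python sketch below performs single-linkage clustering with a 50% reciprocal-overlap threshold and reports one refined interval per cluster together with its number of supporting predictions. Both the threshold and the lower-median breakpoint rule are assumptions for illustration; the abstract does not describe InvFEST's actual merging and refinement criteria.

```python
from statistics import median_low
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), assuming end > start


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two fractions of each interval covered by their intersection."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_predictions(preds: List[Interval], min_ro: float = 0.5):
    """Single-linkage clustering of predictions; returns (chrom, start, end, n_support)."""
    parent = list(range(len(preds)))

    def find(i):  # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            if reciprocal_overlap(preds[i], preds[j]) >= min_ro:
                parent[find(i)] = find(j)
    clusters = {}
    for i, p in enumerate(preds):
        clusters.setdefault(find(i), []).append(p)
    merged = []
    for members in clusters.values():
        # Refined breakpoints: lower median of member coordinates (hypothetical rule)
        start = median_low([m[1] for m in members])
        end = median_low([m[2] for m in members])
        merged.append((members[0][0], start, end, len(members)))
    return sorted(merged)
```

Single linkage lets chains of partially overlapping predictions collapse into one inversion, which is why a manual-curation step on top of automatic merging is useful.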
Subjects
Databases, Nucleic Acid , Genome, Human , Sequence Inversion , Chromosome Breakpoints , Chromosome Inversion , Humans , Internet , Polymorphism, Genetic , Segmental Duplications, Genomic , Systems Integration
ABSTRACT
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in nonmodel species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to nonmodel genomes. We apply ABC-MK to the human proteome and a set of known virus interacting proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
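ABC-MK relies on approximate Bayesian computation (ABC). The core idea, rejection ABC, can be shown with a toy model: draw alpha from a prior, simulate MKT counts, and keep the draws whose simulated summary statistic is closest to the observed one. The Poisson generative model, its rates, the uniform prior, the single summary statistic and the function names below are hypothetical simplifications and do not reproduce the forward simulations or summary statistics used by ABC-MK.

```python
import numpy as np


def simulate_counts(alpha, rng, ratio=0.3, lam_s=400.0, mu_s=600.0):
    """Toy generative model (hypothetical): Poisson MKT counts for one lineage.

    `ratio` is the neutral nonsynonymous/synonymous rate ratio; adaptive
    substitutions inflate Dn so that a fraction `alpha` of Dn is adaptive.
    """
    ps = rng.poisson(lam_s)
    pn = rng.poisson(lam_s * ratio)
    ds = rng.poisson(mu_s)
    dn = rng.poisson(mu_s * ratio / (1.0 - alpha))
    return pn, ps, dn, ds


def mk_alpha(pn, ps, dn, ds):
    """Standard MKT alpha, used here as the single summary statistic."""
    return 1.0 - (ds * pn) / (dn * ps)


def abc_rejection(observed_alpha, n_sims=10_000, accept_frac=0.01, seed=1):
    """Rejection ABC: keep the prior draws whose simulated summary is closest."""
    rng = np.random.default_rng(seed)
    prior = rng.uniform(0.0, 0.9, size=n_sims)  # uniform prior on alpha
    summaries = np.array([mk_alpha(*simulate_counts(a, rng)) for a in prior])
    distance = np.abs(summaries - observed_alpha)
    n_keep = max(1, round(n_sims * accept_frac))
    return prior[np.argsort(distance)[:n_keep]]  # approximate posterior sample


posterior = abc_rejection(0.5)
print(np.median(posterior), posterior.std())
```

The accepted draws approximate the posterior of alpha; in a realistic setting the simulator would encode demography and the distribution of fitness effects, which is what makes the estimate robust to those confounders.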
Subjects
Biological Evolution , Selection, Genetic , Humans , Genome , Mutation
ABSTRACT
BACKGROUND: The only known albino gorilla, named Snowflake, was a wild-born male from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts, the genetic cause of his albinism remained unknown. Here, we use whole genome sequencing data to study the genetic cause of his albinism, and we find a higher inbreeding coefficient in Snowflake than in other gorillas. RESULTS: We identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that this amino acid replacement alters the membrane-spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, making this the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how whole genome sequencing can be used to link genotype and phenotype in non-model organisms and how, with the expected decrease in sequencing cost, it can become a powerful tool in conservation genetics (e.g., for assessing inbreeding and genetic diversity).
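Genome-wide autozygosity is often summarized as F_ROH, the fraction of the genome lying in runs of homozygosity (ROH). The Python sketch below finds ROH naively as heterozygosity-free stretches of at least 1 Mb and computes F_ROH. The 1 Mb threshold and the function names are assumptions; real ROH callers also handle missing data, genotyping errors and marker density, and this is not the procedure used in the study.

```python
from typing import Dict, List, Tuple


def runs_of_homozygosity(het_positions: List[int], chrom_length: int,
                         min_length: int = 1_000_000) -> List[Tuple[int, int]]:
    """Stretches without heterozygous calls that are at least `min_length` bp long.

    Naive: ignores missing data, genotyping errors and marker density.
    """
    bounds = [0] + sorted(het_positions) + [chrom_length]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b - a >= min_length]


def f_roh(rohs_by_chrom: Dict[str, List[Tuple[int, int]]], genome_length: int) -> float:
    """Inbreeding coefficient estimated as the fraction of the genome inside ROH."""
    covered = sum(end - start for rohs in rohs_by_chrom.values() for start, end in rohs)
    return covered / genome_length
```

Long ROH point to recent shared ancestry between the parents, because recombination has had few generations to break up the identical-by-descent haplotypes.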
Subjects
Genomics , Gorilla gorilla/genetics , High-Throughput Nucleotide Sequencing , Inbreeding , Amino Acid Sequence , Animals , Female , Heterozygote , Male , Membrane Transport Proteins/chemistry , Membrane Transport Proteins/genetics , Microsatellite Repeats/genetics , Molecular Sequence Data , Mutation , Sequence Analysis, DNA
ABSTRACT
MOTIVATION: The completion of 168 genome sequences from a single population of Drosophila melanogaster provides a global view of genomic variation and an understanding of the evolutionary forces shaping the patterns of DNA polymorphism and divergence along the genome. RESULTS: We present the 'Population Drosophila Browser' (PopDrowser), a new genome browser specially designed for the automatic analysis and representation of genetic variation across the D. melanogaster genome sequence. PopDrowser allows estimating and visualizing the values of a number of DNA polymorphism and divergence summary statistics, linkage disequilibrium parameters and several neutrality tests. PopDrowser also allows performing custom analyses on-the-fly using user-selected parameters. AVAILABILITY: PopDrowser is freely available from http://PopDrowser.uab.cat.
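As an example of the linkage disequilibrium parameters such a browser reports, the sketch below computes D, Lewontin's normalized D' and r^2 between two biallelic sites from phased haplotypes coded 0/1. The function name and the 0/1 coding are assumptions; PopDrowser's own estimators are not described in the abstract.

```python
def ld_stats(hap_a, hap_b):
    """Return (D, D', r^2) for two biallelic sites from phased 0/1 haplotypes."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both sites must be polymorphic")
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # Maximum |D| attainable given the allele frequencies, depending on the sign of D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d / d_max, r2
```

Many tools report |D'|; the sign is kept here to show whether the coupling (1-1) or repulsion (1-0) phase is in excess.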
Subjects
Drosophila melanogaster/genetics , Drosophila/genetics , Animals , Biological Evolution , Drosophila/classification , Drosophila Proteins/genetics , Genetic Variation , Genetics, Population , Internet , Linkage Disequilibrium , Polymorphism, Genetic
ABSTRACT
Inferring the effects of positive selection on genomes remains a critical step in characterizing the ultimate and proximate causes of adaptation across species, and quantifying positive selection remains a challenge due to the confounding effects of many other evolutionary processes. Robust and efficient approaches for adaptation inference could help characterize the rate and strength of adaptation in non-model species for which demographic history, mutational processes, and recombination patterns are not currently well-described. Here, we introduce an efficient and user-friendly extension of the McDonald-Kreitman test (ABC-MK) for quantifying long-term protein adaptation in specific lineages of interest. We characterize the performance of our approach with forward simulations and find that it is robust to many demographic perturbations and positive selection configurations, demonstrating its suitability for applications to non-model genomes. We apply ABC-MK to the human proteome and a set of known Virus Interacting Proteins (VIPs) to test the long-term adaptation in genes interacting with viruses. We find substantially stronger signatures of positive selection on RNA-VIPs than DNA-VIPs, suggesting that RNA viruses may be an important driver of human adaptation over deep evolutionary time scales.
ABSTRACT
The McDonald and Kreitman test is one of the most powerful and widely used methods to detect and quantify recurrent natural selection in DNA sequence data. One of its main limitations is the underestimation of positive selection due to the presence of slightly deleterious variants segregating at low frequencies. Although several approaches have been developed to overcome this limitation, most of them work on pooled-gene analyses. Here, we present the imputed McDonald and Kreitman test (impMKT), a new straightforward approach for detecting positive selection and other selection components of the distribution of fitness effects at the gene level. We compare the imputed McDonald and Kreitman test with other widely used McDonald and Kreitman test approaches, considering both simulated and empirical data. By applying the imputed McDonald and Kreitman test to human and Drosophila data at the gene level, we substantially increase the statistical evidence of positive selection with respect to previous approaches (e.g. by 50% and 157% compared with the McDonald and Kreitman test in Drosophila and humans, respectively). Finally, we review the minimum number of genes required to obtain a reliable estimate of the proportion of adaptive substitutions (α) in pooled-gene analyses when using the imputed McDonald and Kreitman test compared with other McDonald and Kreitman test implementations. Because of its simplicity and increased power to detect recurrent positive selection on genes, we propose the imputed McDonald and Kreitman test as the first straightforward approach for testing specific evolutionary hypotheses at the gene level. The software implementation and population genomics data are available at the web server imkt.uab.cat.
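As we understand the approach, impMKT imputes the number of weakly deleterious nonsynonymous polymorphisms (Pwd) as the excess of low-frequency nonsynonymous variants relative to what the synonymous frequency spectrum predicts, Pwd = Pn(low) - Pn(high) * Ps(low) / Ps(high), and then removes them from Pn: alpha = 1 - ((Pn - Pwd) / Ps) * (Ds / Dn). The sketch below implements this under an assumed 15% derived allele frequency cutoff; the function name is hypothetical, clamping Pwd at zero is a choice of this sketch, and the web-server implementation may differ.

```python
def imp_mkt_alpha(daf_n, daf_s, dn, ds, cutoff=0.15):
    """Sketch of an imputed MKT for one gene.

    daf_n, daf_s: derived allele frequencies of nonsynonymous / synonymous SNPs.
    Returns (alpha, imputed number of weakly deleterious nonsynonymous SNPs).
    """
    pn_low = sum(f < cutoff for f in daf_n)
    pn_high = len(daf_n) - pn_low
    ps_low = sum(f < cutoff for f in daf_s)
    ps_high = len(daf_s) - ps_low
    if ps_high == 0:
        raise ValueError("need synonymous SNPs above the frequency cutoff")
    # Low-frequency nonsynonymous SNPs expected if they followed the synonymous spectrum
    expected_pn_low = pn_high * ps_low / ps_high
    p_wd = max(0.0, pn_low - expected_pn_low)  # excess attributed to weak purifying selection
    alpha = 1.0 - ((len(daf_n) - p_wd) / len(daf_s)) * (ds / dn)
    return alpha, p_wd


print(imp_mkt_alpha([0.05] * 30 + [0.5] * 20, [0.05] * 20 + [0.5] * 80, dn=40, ds=60))
```

With these counts (Pn = 50, Ps = 100, Dn = 40, Ds = 60) the standard MKT would give alpha = 1 - (50/100)(60/40) = 0.25, whereas removing the 25 imputed weakly deleterious variants raises the estimate to 0.625, while still using all polymorphisms of a single gene.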
Subjects
Biological Evolution , Selection, Genetic , Animals , Drosophila/genetics , Evolution, Molecular , Humans , Metagenomics , Software
ABSTRACT
The McDonald and Kreitman test (MKT) is one of the most powerful and extensively used tests to detect the signature of natural selection at the molecular level. Here, we present the standard and generalized MKT website, a novel website that allows performing MKTs not only for synonymous and nonsynonymous changes, as the test was initially described, but also for other classes of regions and/or several loci. The website has three different interfaces: (i) the standard MKT, where users can analyze several types of sites in a coding region, (ii) the advanced MKT, where users can compare two closely linked regions in the genome that can be either coding or noncoding, and (iii) the multi-locus MKT, where users can analyze many separate loci in a single multi-locus test. The website has already been used to show that selection efficiency is positively correlated with effective population size in the Drosophila genus, and it has been applied to include estimates of selection in DPDB. This website is a timely resource that will presumably be widely used by researchers in the field and will help enlarge the catalogue of cases of adaptive evolution. It is available at http://mkt.uab.es.
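The standard MKT compares the ratio of nonsynonymous to synonymous changes within species (polymorphism) and between species (divergence) in a 2x2 table. A common way to test it is Fisher's exact test, together with the neutrality index NI = (Pn/Ps) / (Dn/Ds), where NI < 1 points to an excess of nonsynonymous divergence. The stdlib-only Python sketch below assumes Fisher's exact test; the website may use other tests, and the function names are hypothetical.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum all tables at most as probable as the observed one (small tolerance for ties)
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def mk_test(pn, ps, dn, ds):
    """Neutrality index and Fisher p-value for the MKT table
    [[Pn, Dn], [Ps, Ds]] (rows: nonsynonymous, synonymous)."""
    ni = (pn / ps) / (dn / ds)
    return ni, fisher_two_sided(pn, dn, ps, ds)


print(mk_test(pn=2, ps=42, dn=7, ds=17))
```

For example, the Adh counts from the original McDonald and Kreitman (1991) study (Pn = 2, Ps = 42, Dn = 7, Ds = 17) give an NI of about 0.12 and a Fisher p-value well below 0.05, consistent with adaptive protein evolution.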
Subjects
Genetic Variation , Selection, Genetic , Sequence Analysis, DNA , Software , Animals , Codon/chemistry , Drosophila/genetics , Internet
ABSTRACT
A main assumption of molecular population genetics is that genomic mutation rate does not depend on sequence function. Challenging this assumption, a recent study has found a reduction in the mutation rate in exons compared to introns in somatic cells, ascribed to an enhanced exonic mismatch repair system activity. If this reduction happens also in the germline, it can compromise studies of population genomics, including the detection of selection when using introns as proxies for neutrality. Here we compile and analyze published germline de novo mutation data to test if the exonic mutation rate is also reduced in germ cells. After controlling for sampling bias in datasets with diseased probands and extended nucleotide context dependency, we find no reduction in the mutation rate in exons compared to introns in the germline. Therefore, there is no evidence that enhanced exonic mismatch repair activity determines the mutation rate in germline cells.
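A simple way to ask whether exons mutate less than introns while controlling for sequence context is to estimate a context-specific mutation rate from introns, compute how many exonic de novo mutations that rate predicts given the exonic sequence composition, and compare this expectation with the observed count. The sketch below does this with a Poisson lower-tail p-value. The context keys (e.g. trinucleotides), the function names and the Poisson model are assumptions; the sketch treats the intronic rates as known and does not reproduce the study's correction for diseased probands or its extended context model.

```python
from math import exp, lgamma, log


def expected_exonic(exon_sites, intron_sites, intron_muts):
    """Exonic mutations expected if exons mutated at the intronic, context-specific rate.

    Each argument maps a sequence context (e.g. a trinucleotide) to a count.
    Returns the expectation and the set of contexts that could be used.
    """
    expected, used = 0.0, set()
    for ctx, n_sites in exon_sites.items():
        if intron_sites.get(ctx, 0) > 0:
            expected += intron_muts.get(ctx, 0) / intron_sites[ctx] * n_sites
            used.add(ctx)
    return expected, used


def poisson_lower_tail(k, mu):
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space to avoid underflow."""
    return sum(exp(i * log(mu) - mu - lgamma(i + 1)) for i in range(k + 1))


def exon_intron_test(exon_muts, exon_sites, intron_sites, intron_muts):
    """Observed/expected exonic mutations and the P-value of a deficit in exons."""
    expected, used = expected_exonic(exon_sites, intron_sites, intron_muts)
    if expected <= 0:
        raise ValueError("no usable contexts with a nonzero intronic rate")
    observed = sum(exon_muts.get(ctx, 0) for ctx in used)
    return observed / expected, poisson_lower_tail(observed, expected)
```

An observed/expected ratio close to 1 is what equal exonic and intronic rates would produce; a reduced exonic rate would instead give a ratio below 1 and a small lower-tail p-value.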
Subjects
Exons/genetics , Germ-Line Mutation , Introns/genetics , Mutation Rate , Algorithms , DNA Mismatch Repair/genetics , Evolution, Molecular , Germ Cells/metabolism , Humans , Models, Genetic , Mutation , Exome Sequencing/methods
ABSTRACT
Multi-locus and multi-species nucleotide diversity studies would benefit enormously from a public database encompassing high-quality haplotypic sequences with their associated genetic diversity measures. MamPol, 'Mammalia Polymorphism Database', is a website containing all the well-annotated polymorphic sequences available in GenBank for the Mammalia class grouped by name of organism and gene. Diversity measures of single nucleotide polymorphisms are provided for each set of haplotypic homologous sequences, including polymorphism at synonymous and non-synonymous sites, linkage disequilibrium and codon bias. Data gathering, calculation of diversity measures and daily updates are automatically performed using PDA software. The MamPol website includes several interfaces for browsing the contents of the database and making customizable comparative searches of different species or taxonomic groups. It also contains a set of tools for simple re-analysis of the available data and a statistics section that is updated daily and summarizes the contents of the database. MamPol is available at http://mampol.uab.es/ and can be downloaded via FTP.
Subjects
Databases, Nucleic Acid , Mammals/genetics , Polymorphism, Genetic , Animals , Internet , Polymorphism, Single Nucleotide , User-Computer Interface
ABSTRACT
Pipeline Diversity Analysis (PDA) is an open-source, web-based tool that allows the exploration of polymorphism in large datasets of heterogeneous DNA sequences, and can be used to create secondary polymorphism databases for different taxonomic groups, such as the Drosophila Polymorphism Database (DPDB). A new version of the pipeline presented here, PDA v.2, incorporates substantial improvements, including new methods for data mining and grouping sequences, new criteria for data quality assessment and a better user interface. PDA is a powerful tool to obtain and synthesize existing empirical evidence on genetic diversity in any species or species group. PDA v.2 is available on the web at http://pda.uab.es/.
Subjects
Databases, Nucleic Acid , Polymorphism, Genetic , Sequence Analysis, DNA/methods , Software , Algorithms , Animals , Drosophila/genetics , Internet , Quality Control , Sequence Alignment/standards , Software/standards , User-Computer Interface
ABSTRACT
Molecular population genetics aims to explain genetic variation and molecular evolution from population genetics principles. The field was born 50 years ago with the first measures of genetic variation in allozyme loci, continued with the nucleotide sequencing era, and is currently in the era of population genomics. During this period, molecular population genetics has been revolutionized by progress in data acquisition and theoretical developments. The conceptual elegance of the neutral theory of molecular evolution or the footprint carved by natural selection on the patterns of genetic variation are two examples of the vast number of inspiring findings of population genetics research. Since the inception of the field, Drosophila has been the prominent model species: molecular variation in populations was first described in Drosophila and most of the population genetics hypotheses were tested in Drosophila species. In this review, we describe the main concepts, methods, and landmarks of molecular population genetics, using the Drosophila model as a reference. We describe the different genetic data sets made available by advances in molecular technologies, and the theoretical developments fostered by these data. Finally, we review the results and new insights provided by the population genomics approach, and conclude by enumerating challenges and new lines of inquiry posed by increasingly large population scale sequence data.