Results 1 - 20 of 46
1.
PLoS Biol ; 20(9): e3001775, 2022 09.
Article in English | MEDLINE | ID: mdl-36099311

ABSTRACT

Understanding the dynamics of species adaptation to their environments has long been a central focus of the study of evolution. Theories of adaptation propose that populations evolve by "walking" in a fitness landscape. This "adaptive walk" is characterised by a pattern of diminishing returns, where populations further away from their fitness optimum take larger steps than those closer to their optimal conditions. Hence, we expect young genes to evolve faster and experience mutations with stronger fitness effects than older genes because they are further away from their fitness optimum. Testing this hypothesis, however, constitutes an arduous task. Young genes are small, encode proteins with a higher degree of intrinsic disorder, are expressed at lower levels, and are involved in species-specific adaptations. Since all these factors lead to increased protein evolutionary rates, they could be masking the effect of gene age. While controlling for these factors, we used population genomic data sets of Arabidopsis and Drosophila and estimated the rate of adaptive substitutions across genes from different phylostrata. We found that a gene's evolutionary age significantly impacts the molecular rate of adaptation. Moreover, we observed that substitutions in young genes tend to have larger physicochemical effects. Our study, therefore, provides strong evidence that molecular evolution follows an adaptive walk model across a large evolutionary timescale.


Subject(s)
Arabidopsis , Drosophila , Adaptation, Physiological/genetics , Animals , Arabidopsis/genetics , Drosophila/genetics , Evolution, Molecular , Models, Genetic
2.
Proc Biol Sci ; 291(2016): 20232308, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38320616

ABSTRACT

Migratory birds possess remarkable accuracy in orientation and navigation, which involves various compass systems including the magnetic compass. Identifying the primary magnetosensor remains a fundamental open question. Cryptochromes (Cry) have been shown to be magnetically sensitive, and Cry4a from a migratory songbird seems to show enhanced magnetic sensitivity in vitro compared to Cry4a from resident species. We investigate Cry and their potential involvement in magnetoreception in a phylogenetic framework, integrating molecular evolutionary analyses with protein dynamics modelling. Our analysis is based on 363 bird genomes and identifies different selection regimes in passerines. We show that Cry4a is characterized by strong positive selection and high variability, typical characteristics of sensor proteins. We identify key sites that are likely to have facilitated the evolution of an optimized sensory protein for night-time orientation in songbirds. Additionally, we show that Cry4 was lost in hummingbirds, parrots and Tyranni (Suboscines), and thus identified a gene deletion, which might facilitate testing the function of Cry4a in birds. In contrast, the other avian Cry (Cry1 and Cry2) were highly conserved across all species, indicating basal, non-sensory functions. Our results support a specialization or functional differentiation of Cry4 in songbirds which could be magnetosensation.


Subject(s)
Songbirds , Animals , Phylogeny , Songbirds/physiology , Cryptochromes/metabolism , Magnetic Fields , Animal Migration/physiology
3.
PLoS Comput Biol ; 19(4): e1010982, 2023 04.
Article in English | MEDLINE | ID: mdl-37079488

ABSTRACT

Expression noise, the variability of the amount of gene product among isogenic cells grown in identical conditions, originates from the inherent stochasticity of diffusion and binding of the molecular players involved in transcription and translation. It has been shown that expression noise is an evolvable trait and that central genes exhibit less noise than peripheral genes in gene networks. A possible explanation for this pattern is increased selective pressure on central genes since they propagate their noise to downstream targets, leading to noise amplification. To test this hypothesis, we developed a new gene regulatory network model with inheritable stochastic gene expression and simulated the evolution of gene-specific expression noise under constraint at the network level. Stabilizing selection was imposed on the expression level of all genes in the network and rounds of mutation, selection, replication and recombination were performed. We observed that local network features affect both the probability to respond to selection, and the strength of the selective pressure acting on individual genes. In particular, the reduction of gene-specific expression noise as a response to stabilizing selection on the gene expression level is higher in genes with higher centrality metrics. Furthermore, global topological structures such as network diameter, centralization and average degree affect the average expression variance and average selective pressure acting on constituent genes. Our results demonstrate that selection at the network level leads to differential selective pressure at the gene level, and local and global network characteristics are an essential component of gene-specific expression noise evolution.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Gene Regulatory Networks/genetics , Phenotype , Gene Expression
4.
Mol Biol Evol ; 39(4)2022 04 11.
Article in English | MEDLINE | ID: mdl-35349721

ABSTRACT

Compensatory substitutions happen when one mutation is advantageously selected because it restores the loss of fitness induced by a previous deleterious mutation. How frequently such mutations occur in evolution, and what structural and functional context permits their emergence, remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in the protein history) but widespread (the majority of proteins experienced at least one). Typical coevolving residues are evolving slowly, are located in the protein core outside secondary structure motifs, and are more often in contact than expected by chance, even after accounting for their evolutionary rate and solvent exposure. An exception to this general scheme is residues coevolving for charge compensation, which are evolving faster than noncoevolving sites, in contradiction with predictions from simple coevolutionary models, but similar to stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.


Subject(s)
Evolution, Molecular , Proteins , Amino Acid Motifs , Mutation , Phylogeny , Proteins/chemistry , Proteins/genetics , Sequence Alignment
5.
Mol Ecol ; 2023 May 08.
Article in English | MEDLINE | ID: mdl-37157166

ABSTRACT

Through its fermentative capacities, Saccharomyces cerevisiae was central in the development of civilisation during the Neolithic period, and the yeast remains of importance in industry and biotechnology, giving rise to bona fide domesticated populations. Here, we conduct a population genomic study of domesticated and wild populations of S. cerevisiae. Using coalescent analyses, we report that the effective population size of yeast populations decreased since the divergence with S. paradoxus. We fitted models of distributions of fitness effects to infer the rate of adaptive ($\omega_a$) and non-adaptive ($\omega_{na}$) non-synonymous substitutions in protein-coding genes. We report an overall limited contribution of positive selection to S. cerevisiae protein evolution, albeit with higher rates of adaptive evolution in wild compared to domesticated populations. Our analyses revealed the signature of background selection and possibly Hill-Robertson interference, as recombination was found to be negatively correlated with $\omega_{na}$ and positively correlated with $\omega_a$. However, the effect of recombination on $\omega_a$ was found to be labile, as it is only apparent after removing the impact of codon usage bias on the synonymous site frequency spectrum and disappears if we control for the correlation with $\omega_{na}$, suggesting that it could be an artefact of the decreasing population size. Furthermore, the rate of adaptive non-synonymous substitutions is significantly correlated with the residue solvent exposure, a relation that cannot be explained by the population's demography. Together, our results provide a detailed characterisation of adaptive mutations in protein-coding genes across S. cerevisiae populations.

6.
PLoS Genet ; 15(11): e1008449, 2019 11.
Article in English | MEDLINE | ID: mdl-31725722

ABSTRACT

Understanding the causes and consequences of recombination landscape evolution is a fundamental goal in genetics that requires recombination maps from across the tree of life. Such maps can be obtained from population genomic datasets, but require large sample sizes. Alternative methods are therefore necessary to research organisms where such datasets cannot be generated easily, such as non-model or ancient species. Here we extend the sequentially Markovian coalescent model to jointly infer demography and the spatial variation in recombination rate. Using extensive simulations and sequence data from humans, fruit-flies and a fungal pathogen, we demonstrate that iSMC accurately infers recombination maps under a wide range of scenarios, remarkably even from a single pair of unphased genomes. We exploit this possibility and reconstruct the recombination maps of ancient hominins. We report that the ancient and modern maps are correlated in a manner that reflects the established phylogeny of Neanderthals, Denisovans, and modern human populations.


Subject(s)
Genome, Human/genetics , Hominidae/genetics , Metagenomics , Recombination, Genetic/genetics , Animals , Chromosome Mapping , Genetic Variation/genetics , Humans , Markov Chains , Neanderthals/genetics , Paleontology/trends , Phylogeny
7.
PLoS Pathog ; 12(6): e1005697, 2016 06.
Article in English | MEDLINE | ID: mdl-27332891

ABSTRACT

The biotrophic basidiomycete fungus Ustilago maydis causes smut disease in maize. Hallmarks of the disease are large tumors that develop on all aerial parts of the host in which dark pigmented teliospores are formed. We have identified a member of the WOPR family of transcription factors, Ros1, as major regulator of spore formation in U. maydis. ros1 expression is induced only late during infection and hence Ros1 is neither involved in plant colonization of dikaryotic fungal hyphae nor in plant tumor formation. However, during late stages of infection Ros1 is essential for fungal karyogamy, massive proliferation of diploid fungal cells and spore formation. Premature expression of ros1 revealed that Ros1 counteracts the b-dependent filamentation program and induces morphological alterations resembling the early steps of sporogenesis. Transcriptional profiling and ChIP-seq analyses uncovered that Ros1 remodels expression of about 30% of all U. maydis genes with 40% of these being direct targets. In total the expression of 80 transcription factor genes is controlled by Ros1. Four of the upregulated transcription factor genes were deleted and two of the mutants were affected in spore development. A large number of b-dependent genes were differentially regulated by Ros1, suggesting substantial changes in this regulatory cascade that controls filamentation and pathogenic development. Interestingly, 128 genes encoding secreted effectors involved in the establishment of biotrophic development were downregulated by Ros1 while a set of 70 "late effectors" was upregulated. These results indicate that Ros1 is a master regulator of late development in U. maydis and show that the biotrophic interaction during sporogenesis involves a drastic shift in expression of the fungal effectome including the downregulation of effectors that are essential during early stages of infection.


Subject(s)
Fungal Proteins/metabolism , Gene Expression Regulation, Fungal/physiology , Ustilago/pathogenicity , Zea mays/microbiology , Chromatin Immunoprecipitation , Electrophoretic Mobility Shift Assay , Microscopy, Confocal , Mycoses/metabolism , Plant Tumors/microbiology , Polymerase Chain Reaction , Spores, Fungal , Transcription Factors , Ustilago/metabolism , Virulence/physiology , Virulence Factors/metabolism
8.
Nature ; 483(7388): 169-75, 2012 Mar 07.
Article in English | MEDLINE | ID: mdl-22398555

ABSTRACT

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Subject(s)
Evolution, Molecular , Genetic Speciation , Genome/genetics , Gorilla gorilla/genetics , Animals , Female , Gene Expression Regulation , Genetic Variation/genetics , Genomics , Humans , Macaca mulatta/genetics , Molecular Sequence Data , Pan troglodytes/genetics , Phylogeny , Pongo/genetics , Proteins/genetics , Sequence Alignment , Species Specificity , Transcription, Genetic
9.
PLoS Genet ; 11(8): e1005451, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26274919

ABSTRACT

The human and chimpanzee X chromosomes are less divergent than expected based on autosomal divergence. We study incomplete lineage sorting patterns between humans, chimpanzees and gorillas to show that this low divergence can be entirely explained by megabase-sized regions comprising one-third of the X chromosome, where polymorphism in the human-chimpanzee ancestral species was severely reduced. We show that background selection can explain at most 10% of this reduction of diversity in the ancestor. Instead, we show that several strong selective sweeps in the ancestral species can explain it. We also report evidence of population specific sweeps in extant humans that overlap the regions of low diversity in the ancestral species. These regions further correspond to chromosomal sections shown to be devoid of Neanderthal introgression into modern humans. This suggests that the same X-linked regions that undergo selective sweeps are among the first to form reproductive barriers between diverging species. We hypothesize that meiotic drive is the underlying mechanism causing these two observations.


Subject(s)
Chromosomes, Human, X/genetics , Animals , Female , Genetic Drift , Genetic Speciation , Genetic Variation , Humans , Male , Neanderthals , Recombination, Genetic , Selection, Genetic , Species Specificity
10.
Genome Res ; 24(3): 467-74, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24190946

ABSTRACT

Recombination is a major determinant of adaptive and nonadaptive evolution. Understanding how the recombination landscape has evolved in humans is thus key to the interpretation of human genomic evolution. Comparison of fine-scale recombination maps of human and chimpanzee has revealed large changes at fine genomic scales and conservation over large scales. Here we demonstrate how a fine-scale recombination map can be derived for the ancestor of human and chimpanzee, allowing us to study the changes that have occurred in human and chimpanzee since these species diverged. The map is produced from more than one million accurately determined recombination events. We find that this new recombination map is intermediate to the maps of human and chimpanzee but that the recombination landscape has evolved more rapidly in the human lineage than in the chimpanzee lineage. We use the map to show that recombination rate, through the effect of GC-biased gene conversion, is an even stronger determinant of base composition evolution than previously reported.


Subject(s)
Base Composition , Chromosomes, Mammalian , Gene Conversion , Pan troglodytes/genetics , Animals , Chromosome Mapping , Evolution, Molecular , Genetic Speciation , Genetic Variation , Genome , Humans , Phylogeny , Recombination, Genetic , Selection, Genetic
11.
Bioinformatics ; 32(16): 2554-5, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153632

ABSTRACT

MOTIVATION: In many organisms, including humans, recombination clusters within recombination hotspots. The standard method for de novo detection of recombinants at hotspots is sperm typing. This relies on allele-specific PCR at single nucleotide polymorphisms. Designing allele-specific primers by hand is time-consuming. We have therefore written a package to support hotspot detection and analysis. RESULTS: hotspot consists of four programs: asp looks up SNPs and designs allele-specific primers; aso constructs allele-specific oligos for mapping recombinants; xov implements a maximum-likelihood method for estimating the crossover rate; six, finally, simulates typing data. AVAILABILITY AND IMPLEMENTATION: hotspot is written in C. Sources are freely available under the GNU General Public License from http://github.com/evolbioinf/hotspot/ CONTACT: haubold@evolbio.mpg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Recombination, Genetic , Software , Spermatozoa , Alleles , Humans , Likelihood Functions , Male
12.
BMC Bioinformatics ; 16: 190, 2015 Jun 09.
Article in English | MEDLINE | ID: mdl-26055961

ABSTRACT

BACKGROUND: Comparative analysis of homologous sequences enables the understanding of evolutionary patterns at the molecular level, unraveling the functional constraints that shaped the underlying genes. Bioinformatic pipelines for comparative sequence analysis typically include procedures for (i) alignment quality assessment and (ii) control of sequence redundancy. An additional, underassessed step is the control of the amount and distribution of missing data in sequence alignments. While the number of sequences available for a given gene typically increases with time, the site-specific coverage of each alignment position remains highly variable because of differences in sequencing and annotation quality, or simply because of biological variation. For any given alignment-based analysis, the selection of sequences thus defines a trade-off between the species representation and the quantity of sites with sufficient coverage to be included in the subsequent analyses. RESULTS: We introduce an algorithm for the optimization of sequence alignments according to the number of sequences vs. number of sites trade-off. The algorithm uses a guide tree to compute scores for each bipartition of the alignment, allowing the recursive selection of sequence subsets with optimal combinations of sequence and site numbers. By applying our methods to two large data sets of several thousands of gene families, we show that significant site-specific coverage increases can be achieved while controlling for the species representation. CONCLUSIONS: The algorithm introduced in this work allows the control of the distribution of missing data in any sequence alignment by removing sequences to increase the number of sites with a defined minimum coverage. We advocate that our missing data optimization procedure is an important step which should be considered in comparative analysis pipelines, together with alignment quality assessment and control of sampled diversity.
An open source C++ implementation is available at http://bioweb.me/physamp.
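
The trade-off described in this abstract (number of sequences vs. number of sites with sufficient coverage) can be illustrated with a minimal Python sketch. This is not the published algorithm and not the API of the C++ implementation: it uses no guide tree or bipartition scores, and the names `covered_sites` and `drop_sequence`, the toy alignment, and the choice of `-` and `?` as missing-data characters are all assumptions made for illustration. It only shows that removing one gap-rich sequence can raise the number of sites that reach a given minimum coverage.

```python
from typing import Dict

# Assumption: '-' and '?' denote missing data in the alignment.
MISSING = {"-", "?"}


def covered_sites(alignment: Dict[str, str], min_coverage: float) -> int:
    """Count columns whose fraction of non-missing characters is >= min_coverage."""
    seqs = list(alignment.values())
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    count = 0
    for col in range(length):
        present = sum(1 for s in seqs if s[col] not in MISSING)
        if present / len(seqs) >= min_coverage:
            count += 1
    return count


def drop_sequence(alignment: Dict[str, str], name: str) -> Dict[str, str]:
    """Return a copy of the alignment without the named sequence."""
    return {k: v for k, v in alignment.items() if k != name}


# Hypothetical toy alignment: 'sp4' is mostly missing data.
alignment = {
    "sp1": "ACGTACGT",
    "sp2": "ACGTAC-T",
    "sp3": "AC--ACGT",
    "sp4": "A------T",
}

before = covered_sites(alignment, 1.0)                       # complete sites, 4 sequences
after = covered_sites(drop_sequence(alignment, "sp4"), 1.0)  # complete sites, 3 sequences
print(before, after)
```

In this toy case, dropping one sequence costs some species representation but increases the number of fully covered sites, which is the kind of exchange the abstract says the algorithm optimizes.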


Subject(s)
Algorithms , Computational Biology/methods , Databases, Factual , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Humans
13.
PLoS Genet ; 8(12): e1003125, 2012.
Article in English | MEDLINE | ID: mdl-23284294

ABSTRACT

We present a hidden Markov model (HMM) for inferring gradual isolation between two populations during speciation, modelled as a time interval with restricted gene flow. The HMM describes the history of adjacent nucleotides in two genomic sequences, such that the nucleotides can be separated by recombination, can migrate between populations, or can coalesce at variable time points, all dependent on the parameters of the model, which are the effective population sizes, splitting times, recombination rate, and migration rate. We show by extensive simulations that the HMM can accurately infer all parameters except the recombination rate, which is biased downwards. Inference is robust to variation in the mutation rate and the recombination rate over the sequence and also robust to unknown phase of genomes unless they are very closely related. We provide a test for whether divergence is gradual or instantaneous, and we apply the model to three key divergence processes in great apes: (a) the bonobo and common chimpanzee, (b) the eastern and western gorilla, and (c) the Sumatran and Bornean orang-utan. We find that the bonobo and chimpanzee appear to have undergone a clear split, whereas the divergence processes of the gorilla and orang-utan species occurred over several hundred thousand years with gene flow stopping quite recently. We also apply the model to the Homo/Pan speciation event and find that the most likely scenario involves an extended period of gene flow during speciation.


Subject(s)
Evolution, Molecular , Genetic Speciation , Genetic Variation , Genome , Animals , Gene Flow , Genetics, Population , Gorilla gorilla/genetics , Humans , Markov Chains , Models, Theoretical , Pan paniscus/genetics , Pan troglodytes/genetics , Phylogeny , Pongo/genetics , Population Density
14.
BMC Genomics ; 15: 53, 2014 Jan 22.
Article in English | MEDLINE | ID: mdl-24447531

ABSTRACT

BACKGROUND: Sequence alignments are the starting point for most evolutionary and comparative analyses. Full genome sequences can be compared to study patterns of within and between species variation. Genome sequence alignments are complex structures containing information such as coordinates, quality scores and synteny structure, which are stored in Multiple Alignment Format (MAF) files. Processing these alignments therefore involves parsing and manipulating typically large MAF files in an efficient way. RESULTS: MafFilter is a command-line driven program written in C++ that enables the processing of genome alignments stored in the Multiple Alignment Format in an efficient and extensible manner. It provides an extensive set of tools which can be parametrized and combined by the user via option files. We demonstrate the software's functionality and performance on several biological examples covering Primate genomics and fungal population genomics. Example analyses involve window-based alignment filtering, feature extractions and various statistics, phylogenetics and population genomics calculations. CONCLUSIONS: MafFilter is a highly efficient and flexible tool to analyse multiple genome alignments. By allowing the user to combine a large set of available methods, as well as designing his/her own, it enables the design of custom data filtering and analysis pipelines for genomic studies. MafFilter is an open source software available at http://bioweb.me/maffilter.


Subject(s)
Genomics/methods , Software , Chromosomes/genetics , Exons , Fungi/genetics , Genome , Genome, Fungal , Internet , User-Computer Interface
15.
Mol Biol Evol ; 30(8): 1745-50, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23699471

ABSTRACT

Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized (yet extensible) code, typically distributed as libraries or packages. This motivated the Bio++ project, aiming at developing a set of C++ libraries for sequence analysis, phylogenetics, population genetics, and molecular evolution. The main attractiveness of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising the computer-efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing reads libraries. More complex models of sequence evolution, such as mixture models and generic n-tuples alphabets, are also included.


Subject(s)
Computational Biology , Evolution, Molecular , Software , Algorithms , Computational Biology/methods , Genomics/methods , Humans , Internet
16.
Genome Res ; 21(3): 349-56, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21270173

ABSTRACT

We search the complete orangutan genome for regions where humans are more closely related to orangutans than to chimpanzees due to incomplete lineage sorting (ILS) in the ancestor of human and chimpanzees. The search uses our recently developed coalescent hidden Markov model (HMM) framework. We find ILS present in ∼1% of the genome, and that the ancestral species of human and chimpanzees never experienced a severe population bottleneck. The existence of ILS is validated with simulations, site pattern analysis, and analysis of rare genomic events. The existence of ILS allows us to disentangle the time of isolation of humans and orangutans (the speciation time) from the genetic divergence time, and we find speciation to be as recent as 9-13 million years ago (Mya; contingent on the calibration point). The analyses provide further support for a recent speciation of human and chimpanzee at ∼4 Mya and a diverse ancestor of human and chimpanzee with an effective population size of about 50,000 individuals. Posterior decoding infers ILS for each nucleotide in the genome, and we use this to deduce patterns of selection in the ancestral species. We demonstrate the effect of background selection in the common ancestor of humans and chimpanzees. In agreement with predictions from population genetics, ILS was found to be reduced in exons and gene-dense regions when we control for confounding factors such as GC content and recombination rate. Finally, we find the broad-scale recombination rate to be conserved through the complete ape phylogeny.


Subject(s)
Genetic Speciation , Nucleotides/analysis , Pan troglodytes/genetics , Phylogeny , Pongo/genetics , Animals , Base Composition , Base Sequence , Conserved Sequence/genetics , Genetic Drift , Genetic Variation , Genome , Humans , Models, Statistical , Molecular Sequence Data , Population Density , Recombination, Genetic , Selection, Genetic
17.
Genome Res ; 21(12): 2157-66, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21994252

ABSTRACT

The fungus Mycosphaerella graminicola emerged as a new pathogen of cultivated wheat during its domestication ~11,000 yr ago. We assembled 12 high-quality full genome sequences to investigate the genetic footprints of selection in this wheat pathogen and closely related sister species that infect wild grasses. We demonstrate a strong effect of natural selection in shaping the pathogen genomes with only ~3% of nonsynonymous mutations being effectively neutral. Forty percent of all fixed nonsynonymous substitutions, on the other hand, are driven by positive selection. Adaptive evolution has affected M. graminicola to the highest extent, consistent with recent host specialization. Positive selection has prominently altered genes encoding secreted proteins and putative pathogen effectors supporting the premise that molecular host-pathogen interaction is a strong driver of pathogen evolution. Recent divergence between pathogen sister species is attested by the high degree of incomplete lineage sorting (ILS) in their genomes. We exploit ILS to generate a genetic map of the species without any crossing data, document recent times of species divergence relative to genome divergence, and show that gene-rich regions or regions with low recombination experience stronger effects of natural selection on neutral diversity. Emergence of a new agricultural host selected a highly specialized and fast-evolving pathogen with unique evolutionary patterns compared with its wild relatives. The strong impact of natural selection that we document is at odds with the small effective population sizes estimated and suggests that population sizes were historically large but likely unstable.


Subject(s)
Ascomycota/genetics , Evolution, Molecular , Genome, Fungal , Plant Diseases/microbiology , Selection, Genetic , Triticum/microbiology
18.
Brief Bioinform ; 13(2): 228-43, 2012 Mar.
Article in English | MEDLINE | ID: mdl-21949241

ABSTRACT

Positions in a molecule that share a common constraint do not evolve independently, and therefore leave a signature in the patterns of homologous sequences. Exhibiting such positions with a coevolution pattern from a sequence alignment has great potential for predicting functional and structural properties of molecules through comparative analysis. This task is complicated by the existence of additional correlation sources, leading to false predictions. The nature of the data is a major source of noise correlation: sequences are taken from individuals with different degrees of relatedness, and who therefore are intrinsically correlated. This has led to several method developments in different fields that are potentially confusing for non-expert users interested in these methodologies. It also explains why coevolution detection methods are largely unemployed despite the importance of the biological questions they address. In this article, I focus on the role of shared ancestry for understanding molecular coevolution patterns. I review and classify existing coevolution detection methods according to their ability to handle shared ancestry. Using a ribosomal RNA benchmark data set, for which detailed knowledge of the structure and coevolution patterns is available, I demonstrate and explain why taking the underlying evolutionary history of sequences into account is the only way to extract the full coevolution signal in the data. I also evaluate, using rigorous statistical procedures, the best approaches to do so, and discuss several important biological aspects to consider when performing coevolution analyses.


Subject(s)
Computer Simulation , Evolution, Molecular , Phylogeny , RNA, Ribosomal/genetics , Sequence Alignment
19.
PLoS Genet ; 7(3): e1001319, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21408205

ABSTRACT

Due to genetic variation in the ancestor of two populations or two species, the divergence time for DNA sequences from two populations is variable along the genome. Within genomic segments all bases will share the same divergence (because they share a most recent common ancestor) when no recombination event has occurred to split them apart. The size of these segments of constant divergence depends on the recombination rate, but also on the speciation time, the effective population size of the ancestral population, as well as demographic effects and selection. Thus, inference of these parameters may be possible if we can decode the divergence times along a genomic alignment. Here, we present a new hidden Markov model that infers the changing divergence (coalescence) times along the genome alignment using a coalescent framework, in order to estimate the speciation time, the recombination rate, and the ancestral effective population size. The model is efficient enough to allow inference on whole-genome data sets. We first investigate the power and consistency of the model with coalescent simulations and then apply it to the whole-genome sequences of the two orangutan sub-species, Bornean (P. p. pygmaeus) and Sumatran (P. p. abelii) orangutans from the Orangutan Genome Project. We estimate the speciation time between the two sub-species to be thousand years ago and the effective population size of the ancestral orangutan species to be , consistent with recent results based on smaller data sets. We also report a negative correlation between chromosome size and ancestral effective population size, which we interpret as a signature of recombination increasing the efficacy of selection.


Subject(s)
Evolution, Molecular , Genetic Speciation , Genome , Pongo abelii/genetics , Pongo pygmaeus/genetics , Algorithms , Animals , Chromosomes/metabolism , Genetic Variation , Genetics, Population , Markov Chains , Models, Genetic , Models, Statistical , Population Density , Recombination, Genetic , Sequence Alignment , Sequence Homology, Nucleic Acid , Time Factors
20.
Genetics ; 227(2)2024 06 05.
Article in English | MEDLINE | ID: mdl-38565705

ABSTRACT

The rate at which recombination events occur in a population is an indicator of its effective population size and the organism's reproduction mode. It determines the extent of linkage disequilibrium along the genome and, thereby, the efficacy of both purifying and positive selection. The population recombination rate can be inferred using models of genome evolution in populations. Classic methods based on the patterns of linkage disequilibrium provide the most accurate estimates, provided that large sample sizes are used and the demography of the population is properly accounted for. Here, the capacity of approaches based on the sequentially Markov coalescent (SMC) to infer the genome-average recombination rate from as little as a single diploid genome is examined. SMC approaches provide highly accurate estimates even in the presence of changing population sizes, provided that (1) within-genome heterogeneity is accounted for and (2) classic maximum-likelihood optimization algorithms are employed to fit the model. SMC-based estimates proved sensitive to gene conversion, leading to an overestimation of the recombination rate if conversion events are frequent. Conversely, methods based on the correlation of heterozygosity succeed in disentangling the rate of crossing over from that of gene conversion events, but only when the population size is constant and the recombination landscape homogeneous. These results call for a convergence of these two methods to obtain accurate and comparable estimates of recombination rates between populations.


Subject(s)
Linkage Disequilibrium , Markov Chains , Models, Genetic , Recombination, Genetic , Genome , Algorithms , Genetics, Population/methods , Gene Conversion , Animals , Humans , Population Density