Results 1 - 20 of 2,617
1.
Nat Commun ; 15(1): 5644, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969648

ABSTRACT

Long-read sequencing, exemplified by PacBio, revolutionizes genomics, overcoming challenges like repetitive sequences. However, the high DNA requirement (>1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises broad adoption of PacBio sequencing, not only for mutational studies of flies and their symbionts but also for explorations of other small organisms or precious samples.


Subject(s)
DNA Transposable Elements , Drosophila melanogaster , Genome, Insect , Mutation , Wolbachia , Animals , Drosophila melanogaster/genetics , DNA Transposable Elements/genetics , Wolbachia/genetics , Genome, Insect/genetics , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genomics/methods , Gene Conversion
2.
Mol Biol Evol ; 41(6)2024 Jun 01.
Article in English | MEDLINE | ID: mdl-38829800

ABSTRACT

It is commonly thought that the long-term advantage of meiotic recombination is to dissipate genetic linkage, allowing natural selection to act independently on different loci. It is thus theoretically expected that genes with higher recombination rates evolve under more effective selection. On the other hand, recombination is often associated with GC-biased gene conversion (gBGC), which theoretically interferes with selection by promoting the fixation of deleterious GC alleles. To test these predictions, several studies assessed whether selection was more effective in highly recombining genes (due to dissipation of genetic linkage) or less effective (due to gBGC), assuming a fixed distribution of fitness effects (DFE) for all genes. In this study, I directly derive the DFE from a gene's evolutionary history (shaped by mutation, selection, drift, and gBGC) under empirical fitness landscapes. I show that genes that have experienced high levels of gBGC are less fit and thus have more opportunities for beneficial mutations. Only a small decrease in the genome-wide intensity of gBGC leads to the fixation of these beneficial mutations, particularly in highly recombining genes. This results in increased positive selection in highly recombining genes that is not caused by more effective selection. Additionally, I show that the death of a recombination hotspot can lead to a higher dN/dS than its birth, but with substitution patterns biased towards AT, and only at selected positions. This shows that controlling for a substitution bias towards GC is therefore not sufficient to rule out the contribution of gBGC to signatures of accelerated evolution. Finally, although gBGC does not affect the fixation probability of GC-conservative mutations, I show that by altering the DFE, gBGC can also significantly affect nonsynonymous GC-conservative substitution patterns.


Subject(s)
Evolution, Molecular , Gene Conversion , Models, Genetic , Recombination, Genetic , Selection, Genetic , Genetic Fitness , Mutation , Base Composition , Genetic Linkage
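The claim in the abstract above that gBGC leaves the fixation probability of GC-conservative mutations unchanged can be illustrated with the standard diffusion approximation, in which gBGC behaves like genic selection. The sketch below is a minimal illustration under that assumption, not the author's model; the function name and the convention that the scaled coefficient is S + B for AT-to-GC mutations, S - B for GC-to-AT mutations, and S alone for GC-conservative mutations (with S = 4*Ne*s and B = 4*Ne*b) are choices made here for illustration.

```python
import math


def relative_fixation_probability(scaled_coeff: float) -> float:
    """Fixation probability of a new mutation relative to a neutral one.

    Uses the diffusion approximation x / (1 - exp(-x)), where x is the
    population-scaled coefficient (assumed here to be S + B, S - B, or S
    depending on the mutation class; see the lead-in).
    """
    if abs(scaled_coeff) < 1e-12:
        return 1.0  # neutral limit of x / (1 - exp(-x))
    if scaled_coeff < -700.0:
        return 0.0  # avoids overflow in expm1; the true value is vanishingly small
    # -expm1(-x) equals 1 - exp(-x) and is numerically stable for small x
    return scaled_coeff / -math.expm1(-scaled_coeff)


if __name__ == "__main__":
    S, B = -1.0, 0.5  # hypothetical scaled selection and gBGC coefficients
    print("AT->GC:", relative_fixation_probability(S + B))
    print("GC->AT:", relative_fixation_probability(S - B))
    print("GC-conservative:", relative_fixation_probability(S))
```

With these hypothetical values, a mildly deleterious AT-to-GC mutation fixes more readily than the same mutation in the GC-to-AT direction, which is the sense in which gBGC can promote the fixation of deleterious GC alleles.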
3.
Genetics ; 227(3)2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38691577

ABSTRACT

Although gene conversion (GC) in Saccharomyces cerevisiae is the most error-free way to repair double-strand breaks (DSBs), the mutation rate during homologous recombination is 1,000 times greater than during replication. Many mutations involve dissociating a partially copied strand from its repair template and re-aligning with the same or another template, leading to -1 frameshifts in homonucleotide runs, quasipalindrome (QP)-associated mutations and microhomology-mediated interchromosomal template switches. We studied GC induced by HO endonuclease cleavage at MATα, repaired by an HMR::Kl-URA3 donor. We inserted into HMR::Kl-URA3 an 18-bp inverted repeat where one arm had a 4-bp insertion. Most GCs yield MAT::Kl-ura3::QP+4 (Ura-) outcomes, but template-switching produces Ura+ colonies, losing the 4-bp insertion. If the QP arm without the insertion is first encountered by repair DNA polymerase and is then (mis)used as a template, the palindrome is perfected. When the QP+4 arm is encountered first, Ura+ derivatives only occur after second-end capture and second-strand synthesis. QP+4 mutations are suppressed by mismatch repair (MMR) proteins Msh2, Msh3, and Mlh1, but not Msh6. Deleting Rdh54 significantly reduces QP mutations only when events creating Ura+ occur in the context of a D-loop but not during second-strand synthesis. A similar bias is found with a proofreading-defective DNA polymerase mutation (pol3-01). DSB-induced mutations differed in several genetic requirements from spontaneous events. We also created a +1 frameshift in the donor, expanding a run of 4 Cs to 5 Cs. Again, Ura+ recombinants markedly increased by disabling MMR, suggesting that MMR acts during GC but favors the unbroken, template strand.


Subject(s)
DNA Breaks, Double-Stranded , DNA Mismatch Repair , Frameshift Mutation , Mutagenesis , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Gene Conversion , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , MutS Homolog 2 Protein/genetics , MutS Homolog 2 Protein/metabolism , MutS Homolog 3 Protein/genetics , MutS Homolog 3 Protein/metabolism , MutL Protein Homolog 1
4.
Proc Natl Acad Sci U S A ; 121(23): e2401973121, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38809707

ABSTRACT

In many mammals, recombination events are concentrated in hotspots directed by a sequence-specific DNA-binding protein named PRDM9. Intriguingly, PRDM9 has been lost several times in vertebrates, and notably among mammals, it has been pseudogenized in the ancestor of canids. In the absence of PRDM9, recombination hotspots tend to occur in promoter-like features such as CpG islands. It has thus been proposed that one role of PRDM9 could be to direct recombination away from PRDM9-independent hotspots. However, the ability of PRDM9 to direct recombination hotspots has been assessed in only a handful of species, and a clear picture of how much recombination occurs outside of PRDM9-directed hotspots in mammals is still lacking. In this study, we derived an estimator of past recombination activity based on signatures of GC-biased gene conversion in substitution patterns. We quantified recombination activity in PRDM9-independent hotspots in 52 species of boreoeutherian mammals. We observe a wide range of recombination rates at these loci: several species (such as mice, humans, some felids, or cetaceans) show a deficit of recombination, while a majority of mammals display a clear peak of recombination. Our results demonstrate that PRDM9-directed and PRDM9-independent hotspots can coexist in mammals and that their coexistence appears to be the rule rather than the exception. Additionally, we show that the location of PRDM9-independent hotspots is relatively more stable than that of PRDM9-directed hotspots, but that PRDM9-independent hotspots nevertheless evolve slowly in concert with DNA hypomethylation.


Subject(s)
Histone-Lysine N-Methyltransferase , Recombination, Genetic , Animals , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/metabolism , Humans , Mammals/genetics , CpG Islands/genetics , Eutheria/genetics , Mice , Female , Gene Conversion , Evolution, Molecular
5.
PLoS Genet ; 20(5): e1011274, 2024 May.
Article in English | MEDLINE | ID: mdl-38768268

ABSTRACT

Molecular dissection of meiotic recombination in mammals, combined with population-genetic and comparative studies, has revealed a complex evolutionary dynamic characterized by short-lived recombination hotspots. Hotspots are chromosome positions containing DNA sequences where the protein PRDM9 can bind and cause crossing-over. To explain this fast evolutionary dynamic, a so-called intra-genomic Red Queen model has been proposed, based on the interplay between two antagonistic forces: biased gene conversion, mediated by double-strand breaks, resulting in hotspot extinction (the hotspot conversion paradox), followed by positive selection favoring mutant PRDM9 alleles recognizing new sequence motifs. Although this model predicts many empirical observations, the exact causes of the positive selection acting on new PRDM9 alleles are still not well understood. In this direction, experiments on mouse hybrids have suggested that, in addition to targeting double-strand breaks, PRDM9 has another role during meiosis. Specifically, PRDM9 symmetric binding (simultaneous binding at the same site on both homologues) would facilitate homology search and, as a result, the pairing of the homologues. Although discovered in hybrids, this second function of PRDM9 could also be involved in the evolutionary dynamic observed within populations. To address this point, here, we present a theoretical model of the evolutionary dynamic of meiotic recombination integrating current knowledge about the molecular function of PRDM9. Our modeling work gives important insights into the selective forces driving the turnover of recombination hotspots. Specifically, the reduced symmetrical binding of PRDM9 caused by the loss of high-affinity binding sites induces a net positive selection eliciting new PRDM9 alleles recognizing new targets. The model also offers new insights about the influence of the gene dosage of PRDM9, which can paradoxically result in negative selection on new PRDM9 alleles entering the population, driving their eviction and thus reducing standing variation at this locus.


Subject(s)
Evolution, Molecular , Histone-Lysine N-Methyltransferase , Meiosis , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/metabolism , Meiosis/genetics , Animals , Mice , Gene Conversion , DNA Breaks, Double-Stranded , Alleles , Models, Genetic , Humans , Recombination, Genetic
6.
Genetics ; 227(2)2024 06 05.
Article in English | MEDLINE | ID: mdl-38565705

ABSTRACT

The rate at which recombination events occur in a population is an indicator of its effective population size and the organism's reproduction mode. It determines the extent of linkage disequilibrium along the genome and, thereby, the efficacy of both purifying and positive selection. The population recombination rate can be inferred using models of genome evolution in populations. Classic methods based on the patterns of linkage disequilibrium provide the most accurate estimates, provided that large sample sizes are used and the demography of the population is properly accounted for. Here, the capacity of approaches based on the sequentially Markov coalescent (SMC) to infer the genome-average recombination rate from as little as a single diploid genome is examined. SMC approaches provide highly accurate estimates even in the presence of changing population sizes, provided that (1) within-genome heterogeneity is accounted for and (2) classic maximum-likelihood optimization algorithms are employed to fit the model. SMC-based estimates proved sensitive to gene conversion, leading to an overestimation of the recombination rate if conversion events are frequent. Conversely, methods based on the correlation of heterozygosity succeed in disentangling the rate of crossing over from that of gene conversion events, but only when the population size is constant and the recombination landscape homogeneous. These results call for a convergence of these two methods to obtain accurate and comparable estimates of recombination rates between populations.


Subject(s)
Linkage Disequilibrium , Markov Chains , Models, Genetic , Recombination, Genetic , Genome , Algorithms , Genetics, Population/methods , Gene Conversion , Animals , Humans , Population Density
7.
Mol Biol Evol ; 41(5)2024 May 03.
Article in English | MEDLINE | ID: mdl-38667829

ABSTRACT

Different frequencies amongst codons that encode the same amino acid (i.e. synonymous codons) have been observed in multiple species. Studies focused on uncovering the forces that drive such codon usage showed that a combined effect of mutational biases and translational selection works to produce different frequencies of synonymous codons. However, only a few have been able to measure and distinguish between these forces, which may leave similar traces on the coding regions. Here, we have developed a codon model that allows the disentangling of mutation, selection on amino acids and synonymous codons, and GC-biased gene conversion (gBGC), which we employed on an extensive dataset of 415 chordates and 191 arthropods. We found that chordates need 15 more synonymous codon categories than arthropods to explain the empirical codon frequencies, which suggests that the extent of codon usage can vary greatly between animal phyla. Moreover, methylation at CpG sites seems to partially explain these patterns of codon usage in chordates but not in arthropods. Despite the differences between the two phyla, our findings demonstrate that in both, GC-rich codons are disfavored when mutations are GC-biased, and the opposite is true when mutations are AT-biased. This indicates that selection on the genomic coding regions might act primarily to stabilize its GC/AT content on a genome-wide level. Our study shows that the degree of synonymous codon usage varies considerably among animals, but is likely governed by a common underlying dynamic.


Subject(s)
Arthropods , Codon Usage , Selection, Genetic , Animals , Arthropods/genetics , Chordata/genetics , Mutation , Evolution, Molecular , Codon , Models, Genetic , Base Composition , Gene Conversion
8.
Nat Commun ; 15(1): 1915, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38429336

ABSTRACT

Artificial biomolecular condensates are emerging as a versatile approach to organize molecular targets and reactions without the need for lipid membranes. Here we ask whether the temporal response of artificial condensates can be controlled via designed chemical reactions. We address this general question by considering a model problem in which a phase separating component participates in reactions that dynamically activate or deactivate its ability to self-attract. Through a theoretical model we illustrate the transient and equilibrium effects of reactions, linking condensate response and reaction parameters. We experimentally realize our model problem using star-shaped DNA motifs known as nanostars to generate condensates, and we take advantage of strand invasion and displacement reactions to kinetically control the capacity of nanostars to interact. We demonstrate reversible dissolution and growth of DNA condensates in the presence of specific DNA inputs, and we characterize the role of toehold domains, nanostar size, and nanostar valency. Our results will support the development of artificial biomolecular condensates that can adapt to environmental changes with prescribed temporal dynamics.


Subject(s)
Biomolecular Condensates , DNA Packaging , DNA Replication , Gene Conversion , Nucleotide Motifs
9.
Am J Hum Genet ; 111(4): 691-700, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38513668

ABSTRACT

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.


Subject(s)
Biological Specimen Banks , Gene Conversion , Humans , Software , Haplotypes/genetics , Chromosomes , Polymorphism, Single Nucleotide
10.
PLoS Biol ; 22(3): e3002507, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38451924

ABSTRACT

While the malaria parasite Plasmodium falciparum has low average genome-wide diversity levels, likely due to its recent introduction from a gorilla-infecting ancestor (approximately 10,000 to 50,000 years ago), some genes display extremely high diversity levels. In particular, certain proteins expressed on the surface of human red blood cell-infecting merozoites (merozoite surface proteins (MSPs)) possess exactly 2 deeply diverged lineages that have seemingly not recombined. While of considerable interest, the evolutionary origin of this phenomenon remains unknown. In this study, we analysed the genetic diversity of 2 of the most variable MSPs, DBLMSP and DBLMSP2, which are paralogs (descended from an ancestral duplication). Despite thousands of available Illumina WGS datasets from malaria-endemic countries, diversity in these genes has been hard to characterise as reads containing highly diverged alleles completely fail to align to the reference genome. To solve this, we developed a pipeline leveraging genome graphs, enabling us to genotype them at high accuracy and completeness. Using our newly resolved sequences, we found that both genes exhibit 2 deeply diverged lineages in a specific protein domain (DBL) and that one of the 2 lineages is shared across the genes. We identified clear evidence of nonallelic gene conversion between the 2 genes as the likely mechanism behind sharing, leading us to propose that gene conversion between diverged paralogs, and not recombination suppression, can generate this surprising genealogy; a model that is furthermore consistent with high diversity levels in these 2 genes despite the strong historical P. falciparum transmission bottleneck.


Subject(s)
Hominidae , Malaria, Falciparum , Malaria , Parasites , Animals , Humans , Plasmodium falciparum/metabolism , Parasites/metabolism , Gene Conversion , Antigens, Surface , Malaria/parasitology , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Genetic Variation
11.
HLA ; 103(2): e15386, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38342852

ABSTRACT

Identification of novel HLA-A*23:128 allele generated by interlocus gene conversion in Brazilian bone marrow donor.


Subject(s)
Bone Marrow , Gene Conversion , Humans , Brazil , Alleles , Tissue Donors , HLA-A Antigens/genetics
12.
J Evol Biol ; 37(4): 383-400, 2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38367009

ABSTRACT

Population genetic inference of selection on the nucleotide sequence level often proceeds by comparison to a reference sequence evolving only under mutation and population demography. Among the few candidates for such a reference sequence is the 5' part of short introns (5SI) in Drosophila. In addition to mutation and population demography, however, there is evidence for a weak force favouring GC bases, likely due to GC-biased gene conversion (gBGC), and for the effect of linked selection. Here, we use polymorphism and divergence data of Drosophila melanogaster to detect and describe the forces affecting the evolution of the 5SI. We separately analyse mutation classes, compare them between chromosomes, and relate them to recombination rate frequencies. GC-conservative mutations seem to be mainly influenced by mutation and drift, with linked selection mostly causing differences between the central and the peripheral (i.e., telomeric and centromeric) regions of the chromosome arms. Comparing GC-conservative mutation patterns between autosomes and the X chromosome showed differences in mutation rates, rather than linked selection, in the central chromosomal regions after accounting for differences in effective population sizes. On the other hand, GC-changing mutations show asymmetric site frequency spectra, indicating the presence of gBGC, varying among mutation classes and in intensity along chromosomes, but approximately equal in strength in autosomes and the X chromosome.


Subject(s)
Drosophila melanogaster , Gene Conversion , Animals , Drosophila melanogaster/genetics , Introns , Evolution, Molecular , Mutation , Drosophila/genetics , X Chromosome/genetics , Selection, Genetic
13.
BMC Plant Biol ; 23(1): 608, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38036992

ABSTRACT

BACKGROUND: Although GC variation constitutes a fundamental element of genome and species diversity, the precise mechanisms driving it remain unclear. The abundant sequence data available for the ITS2, a commonly employed phylogenetic marker in plants, offers an exceptional resource for exploring the GC variation across angiosperms. RESULTS: A comprehensive selection of 8666 species, comprising 165 genera, 63 families, and 30 orders, was used for the analyses. The alignment of ITS2 sequence-structures and partitioning of secondary structures into paired and unpaired regions were performed using 4SALE. Substitution rates and frequencies among GC base-pairs in the paired regions of ITS2 were calculated using RNA-specific models in the PHASE package. The results showed that the distribution of ITS2 GC contents on the angiosperm phylogeny was heterogeneous, but their increase was generally associated with ITS2 sequence homogenization, thereby supporting the occurrence of GC-biased gene conversion (gBGC) during the concerted evolution of ITS2. Additionally, the GC content in the paired regions of the ITS2 secondary structure was significantly higher than that of the unpaired regions, indicating the selection of GC for thermodynamic stability. Furthermore, the RNA substitution models demonstrated that base-pair transformations favored both the elevation and fixation of GC in the paired regions, providing further support for gBGC. CONCLUSIONS: Our findings highlight the significance of secondary structure in GC investigation and demonstrate that both gBGC and structure-based selection are influential factors driving angiosperm ITS2 GC content.


Subject(s)
Magnoliopsida , Humans , Magnoliopsida/genetics , Phylogeny , Gene Conversion , Base Composition , RNA , Evolution, Molecular
14.
Genome Res ; 33(10): 1673-1689, 2023 10.
Article in English | MEDLINE | ID: mdl-37884342

ABSTRACT

Ultraconserved elements (UCEs) are the most conserved regions among the genomes of evolutionarily distant species and are thought to play critical biological functions. However, some UCEs rapidly evolved in specific lineages, and whether they contributed to adaptive evolution is still controversial. Here, using an increased number of sequenced genomes with high taxonomic coverage, we identified 2191 mammalian UCEs and 5938 avian UCEs from 95 mammal and 94 bird genomes, respectively. Our results show that these UCEs are functionally constrained and that their adjacent genes are prone to widespread expression with low expression diversity across tissues. Functional enrichment of mammalian and avian UCEs shows different trends, indicating that UCEs may contribute to adaptive evolution of taxa. Focusing on lineage-specific accelerated evolution, we discover that the proportion of fast-evolving UCEs in nine mammalian and 10 avian test lineages ranges from 0.19% to 13.2%. Notably, up to 62.1% of fast-evolving UCEs in test lineages are much more likely to result from GC-biased gene conversion (gBGC). A single cervid-specific gBGC region embracing the uc.359 allele significantly alters the expression of Nova1 and other neural-related genes in the rat brain. Combined with the altered regulatory activity of ancient gBGC-induced fast-evolving UCEs in eutherians, our results provide evidence that synergy between gBGC and selection shaped lineage-specific substitution patterns, even in the most constrained regulatory elements. In summary, our results show that gBGC played an important role in facilitating lineage-specific accelerated evolution of UCEs, and further support the idea that a combination of multiple evolutionary forces shapes adaptive evolution.


Subject(s)
Gene Conversion , Mammals , Animals , Rats , Mammals/genetics , Alleles , Birds/genetics , Evolution, Molecular , Neuro-Oncological Ventral Antigen
15.
Mol Biol Evol ; 40(9)2023 09 01.
Article in English | MEDLINE | ID: mdl-37675606

ABSTRACT

Following a duplication, the resulting paralogs tend to diverge. While mutation and natural selection can accelerate this process, they can also slow it. Here, we quantify the paralog homogenization that is caused by point mutations and interlocus gene conversion (IGC). Among 164 duplicated teleost genes, the median percentage of postduplication codon substitutions that arise from IGC rather than point mutation is estimated to be between 7% and 8%. By differentiating between the nonsynonymous codon substitutions that homogenize the protein sequences of paralogs and the nonhomogenizing nonsynonymous substitutions, we estimate the homogenizing nonsynonymous rates to be higher for 163 of the 164 teleost data sets as well as for all 14 data sets of duplicated yeast ribosomal protein-coding genes that we consider. For all 14 yeast data sets, the estimated homogenizing nonsynonymous rates exceed the synonymous rates.


Subject(s)
Gene Conversion , Magnoliopsida , Saccharomyces cerevisiae , Amino Acid Sequence , Genes, Duplicate , Selection, Genetic
16.
Nat Commun ; 14(1): 5692, 2023 09 14.
Article in English | MEDLINE | ID: mdl-37709766

ABSTRACT

In the absence of recombination, the number of transposable elements (TEs) increases due to less efficient selection, but the dynamics of such TE accumulations are not well characterized. Leveraging a dataset of 21 independent events of recombination cessation of different ages in mating-type chromosomes of Microbotryum fungi, we show that TEs rapidly accumulated in regions lacking recombination, but that TE content reached a plateau at ca. 50% of occupied base pairs by 1.5 million years following recombination suppression. The same TE superfamilies have expanded in independently evolved non-recombining regions, in particular rolling-circle replication elements (Helitrons). Long-terminal repeat (LTR) retrotransposons of the Copia and Ty3 superfamilies also expanded, through transposition bursts (distinguished from gene conversion based on LTR divergence), with both non-recombining regions and autosomes affected, suggesting that non-recombining regions constitute TE reservoirs. This study improves our knowledge of genome evolution by showing that TEs can accumulate through bursts, following non-linear decelerating dynamics.


Subject(s)
DNA Transposable Elements , Reproduction , DNA Transposable Elements/genetics , Cell Communication , DNA Replication , Gene Conversion
17.
Genome Biol Evol ; 15(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37565492

ABSTRACT

Coding sequence evolution is influenced by both natural selection and neutral evolutionary forces. In many species, the effects of mutation bias, codon usage, and GC-biased gene conversion (gBGC) on gene sequence evolution have not been detailed. Quantification of how these forces shape substitution patterns is therefore necessary to understand the strength and direction of natural selection. Here, we used comparative genomics to investigate the association between base composition and codon usage bias on gene sequence evolution in butterflies and moths (Lepidoptera), including an in-depth analysis of underlying patterns and processes in one species, Leptidea sinapis. The data revealed significant G/C to A/T substitution bias at third codon position with some variation in the strength among different butterfly lineages. However, the substitution bias was lower than expected from previously estimated mutation rate ratios, partly due to the influence of gBGC. We found that A/T-ending codons were overrepresented in most species, but there was a positive association between the magnitude of codon usage bias and GC-content in third codon positions. In addition, the tRNA-gene population in L. sinapis showed higher GC-content at third codon positions compared to coding sequences in general and less overrepresentation of A/T-ending codons. There was an inverse relationship between synonymous substitutions and codon usage bias indicating selection on synonymous sites. We conclude that the evolutionary rate in Lepidoptera is affected by a complex interaction between underlying G/C -> A/T mutation bias and partly counteracting fixation biases, predominantly conferred by overall purifying selection, gBGC, and selection on codon usage.


Subject(s)
Butterflies , Animals , Butterflies/genetics , Codon Usage , Base Composition , Codon , Gene Conversion , Selection, Genetic , Evolution, Molecular
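The abstract above relies on GC-content at third codon positions (often called GC3). As a minimal sketch of how that quantity can be computed from a single in-frame coding sequence, the function below is offered for illustration only; its name, the handling of trailing incomplete codons, and the decision to count ambiguous bases in the denominator are assumptions made here, not details taken from the study.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G or C bases at third codon positions of an in-frame CDS.

    Assumptions (illustrative): the sequence starts in frame, any trailing
    incomplete codon is ignored, and ambiguous bases such as N count toward
    the denominator but not the numerator.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop a trailing partial codon
    third_positions = seq[2:usable:3]  # bases at indices 2, 5, 8, ...
    if not third_positions:
        raise ValueError("sequence must contain at least one complete codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


if __name__ == "__main__":
    # Codons ATG, GCC, TAA have third bases G, C, A
    print(gc3_content("ATGGCCTAA"))
```

Applied per gene across a genome, a value like this can be set against a codon usage bias statistic to examine the kind of association the abstract reports.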
18.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37535674

ABSTRACT

MOTIVATION: Meiotic recombination is the main driving force of human genetic diversity, along with mutations. Recombinations split into crossovers, separating large chromosomal regions originating from different homologous chromosomes, and non-crossovers (NCOs), where a small segment from one chromosome is embedded in a region originating from the homologous chromosome. NCOs are much less studied than mutations and crossovers as NCOs are short and can only be detected at markers heterozygous in the transmitting parent, leaving most of them undetectable. RESULTS: The detectable NCOs, known as gene conversions, hide information about NCOs, including their number and length, waiting to be unveiled. We introduce NCOurd, a software tool and algorithm based on expectation-maximization, to estimate the number of NCOs and their length distribution from gene conversion data. AVAILABILITY AND IMPLEMENTATION: https://github.com/DecodeGenetics/NCOurd.


Subject(s)
Crossing Over, Genetic , Gene Conversion , Humans , Heterozygote , Meiosis
19.
Theor Popul Biol ; 153: 69-90, 2023 10.
Article in English | MEDLINE | ID: mdl-37451508

ABSTRACT

Recombination often concentrates in small regions called recombination hotspots where recombination is much higher than the genome's average. In many vertebrates, including humans, the gene PRDM9 specifies which DNA motifs will be the target for breaks that initiate recombination, ultimately determining the location of recombination hotspots. Because the sequence that breaks (allowing recombination) is converted into the sequence that does not break (preventing recombination), the latter sequence is over-transmitted to future generations and recombination hotspots are self-destructive. Given their self-destructive nature, recombination hotspots should eventually become extinct in genomes where they are found. While empirical evidence shows that individual hotspots do become inactive over time (die), hotspots are abundant in many vertebrates: a contradiction called the Recombination Hotspot Paradox. What saves recombination hotspots from their foretold extinction? Here we formulate a co-evolutionary model of the interaction among sequence-specific gene conversion, fertility selection, and recurrent mutation. We find that allelic frequencies oscillate, leading to stable limit cycles. From a biological perspective this means that when fertility selection is weaker than gene conversion, it cannot stop individual hotspots from dying but can save them from extinction by driving their re-activation (resuscitation). In our model, mutation balances death and resuscitation of hotspots, thus maintaining their number over evolutionary time. Interestingly, we find that multiple alleles result in oscillations that are chaotic and multiple targets in oscillations that are asynchronous between targets, thus helping to maintain the average genomic recombination probability constant. Furthermore, we find that the level of expression of PRDM9 should control for the fraction of targets that are hotspots and the overall temperature of the genome. Therefore, our co-evolutionary model improves our understanding of how hotspots may be replaced, thus contributing to solving the Recombination Hotspot Paradox. From a more applied perspective, our work provides testable predictions regarding the relation between mutation probability and fertility selection with life expectancy of hotspots.


Subject(s)
Gene Conversion , Recombination, Genetic , Humans , Animals , Mutation , Gene Frequency , Models, Genetic , Histone-Lysine N-Methyltransferase/genetics
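The self-destructive step described in the abstract above, where the breaking sequence is converted into the non-breaking one and the latter is over-transmitted, can be sketched as a one-locus deterministic recursion. This is a deliberately reduced illustration, not the authors' co-evolutionary model: it omits fertility selection, mutation, and PRDM9 entirely, assumes random mating, and uses a hypothetical parameter `drive` for the fraction of gametes from heterozygotes in which the hot allele has been converted to the cold one.

```python
def hot_allele_trajectory(p0: float, drive: float, generations: int) -> list[float]:
    """Frequency of a hotspot ("hot") allele under conversion drive alone.

    Under random mating, heterozygotes transmit the hot allele with
    probability (1 - drive) / 2, which gives the recursion
    p' = p**2 + p * (1 - p) * (1 - drive) = p - drive * p * (1 - p).
    """
    if not 0.0 <= p0 <= 1.0 or not 0.0 <= drive <= 1.0:
        raise ValueError("p0 and drive must lie in [0, 1]")
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p - drive * p * (1.0 - p)  # loss of hot alleles in heterozygotes
        freqs.append(p)
    return freqs


if __name__ == "__main__":
    # Hypothetical values: hot allele starts at 50%, 10% conversion drive
    for generation, freq in enumerate(hot_allele_trajectory(0.5, 0.1, 5)):
        print(generation, round(freq, 4))
```

In this reduced setting the hot allele can only decline, which is the paradox in its barest form; the oscillations and resuscitation reported in the abstract arise only once fertility selection and recurrent mutation are added.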
20.
Nature ; 617(7960): 325-334, 2023 05.
Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences.


Subject(s)
Gene Conversion , Mutation , Segmental Duplications, Genomic , Humans , Gene Conversion/genetics , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Haplotypes/genetics , Exons/genetics , Cytosine/chemistry , Guanine/chemistry , CpG Islands/genetics