ABSTRACT
Experimental studies on DNA transposable elements (TEs) have been limited in scale, leading to a lack of understanding of the factors influencing transposition activity, evolutionary dynamics, and application potential as genome engineering tools. We predicted 130 active DNA TEs from 102 metazoan genomes and evaluated their activity in human cells. We identified 40 active (integration-competent) TEs, surpassing the cumulative number (20) of TEs found previously. With this unified comparative data, we found that the Tc1/mariner superfamily exhibits elevated activity, potentially explaining their pervasive horizontal transfers. Further functional characterization of TEs revealed additional divergence in features such as insertion bias. Remarkably, in CAR-T therapy for hematological and solid tumors, Mariner2_AG (MAG), the most active DNA TE identified, largely outperformed two widely used vectors, the lentiviral vector and the TE-based vector SB100X. Overall, this study highlights the varied transposition features and evolutionary dynamics of DNA TEs and increases the TE toolbox diversity.
Subject(s)
DNA Transposable Elements , Humans , DNA Transposable Elements/genetics , Genetic Engineering/methods , Human Genome , Animals , Molecular Evolution
ABSTRACT
Whether and how certain transposable elements with viral origins, such as endogenous retroviruses (ERVs) dormant in our genomes, can become awakened and contribute to the aging process is largely unknown. In human senescent cells, we found that HERVK (HML-2), the most recently integrated human ERVs, are unlocked to transcribe viral genes and produce retrovirus-like particles (RVLPs). These HERVK RVLPs constitute a transmissible message to elicit senescence phenotypes in young cells, which can be blocked by neutralizing antibodies. The activation of ERVs was also observed in organs of aged primates and mice as well as in human tissues and serum from the elderly. Their repression alleviates cellular senescence and tissue degeneration and, to some extent, organismal aging. These findings indicate that the resurrection of ERVs is a hallmark and driving force of cellular senescence and tissue aging.
Subject(s)
Aging , Endogenous Retroviruses , Aged , Animals , Humans , Mice , Aging/genetics , Aging/pathology , Cellular Senescence , Endogenous Retroviruses/genetics , Primates
ABSTRACT
In a broad range of taxa, genes can duplicate through an RNA intermediate in a process mediated by retrotransposons (retroposition). In mammals, L1 retrotransposons drive retroposition, but the elements responsible for retroposition in other animals have yet to be identified. Here, we examined young retrocopies from various animals that still retain the sequence features indicative of the underlying retroposition mechanism. In Drosophila melanogaster, we identified and de novo assembled 15 polymorphic retrocopies and found that all retroposed loci are chimeras of internal retrocopies flanked by discontinuous LTR retrotransposons. At the fusion points between the mRNAs and the LTR retrotransposons, we identified shared short similar sequences that suggest the involvement of microsimilarity-dependent template switches. By expanding our approach to mosquito, zebrafish, chicken, and mammals, we identified in all these species recently originated retrocopies with a similar chimeric structure and shared microsimilarities at the fusion points. We also identified several retrocopies that combine the sequences of two or more parental genes, demonstrating LTR-retroposition as a novel mechanism of exon shuffling. Finally, we found that LTR-mediated retrocopies are immediately cotranscribed with their flanking LTR retrotransposons. Transcriptional profiling coupled with sequence analyses revealed that the sense-strand transcription of the retrocopies often leads to the origination of in-frame proteins relative to the parental genes. Overall, our data show that LTR-mediated retroposition is highly conserved across a wide range of animal taxa; combined with previous work from plants and yeast, it represents an ancient and ongoing mechanism continuously shaping gene content evolution in eukaryotes.
Subject(s)
Gene Duplication , Gene Expression Profiling/methods , Messenger RNA/genetics , Terminal Repeat Sequences , Animals , Chickens/genetics , Culicidae/genetics , Drosophila melanogaster/genetics , Molecular Evolution , Humans , Mammals/genetics , Mice , Retroelements , Genomic Segmental Duplications , Zebrafish/genetics
ABSTRACT
Plant resistance genes (R genes) harbor tremendous allelic diversity, constituting a robust immune system effective against microbial pathogens. Nevertheless, few functional R genes have been identified for even the best-studied pathosystems. Does this limited repertoire reflect specificity, with most R genes having been defeated by former pests, or do plants harbor a rich diversity of functional R genes, the composite behavior of which is yet to be characterized? Here, we survey 332 NBS-LRR genes cloned from five resistant Oryza sativa (rice) cultivars for their ability to confer recognition of 12 rice blast isolates when transformed into susceptible cultivars. Our survey reveals that 48.5% of the 132 NBS-LRR loci tested contain functional rice blast R genes, with most R genes deriving from multi-copy clades containing especially diversified loci. Each R gene recognized, on average, 2.42 of the 12 isolates screened. The abundant R genes identified in resistant genomes provide extraordinary redundancy in the ability of host genotypes to recognize particular isolates. If the same is true for other pathogens, many extant NBS-LRR genes retain functionality. Our success at identifying rice blast R genes also validates a highly efficient cloning and screening strategy.
Subject(s)
Disease Resistance/genetics , Oryza/genetics , Plant Proteins/genetics , Genome-Wide Association Study , Magnaporthe/physiology , Oryza/microbiology , DNA Sequence Analysis
ABSTRACT
Next-generation sequencing (NGS), represented by Illumina platforms, has been an essential cornerstone of basic and applied research. However, the sequencing error rate of 1 per 1000 bp (10⁻³) represents a serious hurdle for research areas focusing on rare mutations, such as somatic mosaicism or microbe heterogeneity. By examining the high-fidelity sequencing methods developed in the past decade, we summarized three major factors underlying errors and the corresponding 12 strategies mitigating these errors. We then proposed a novel framework to classify 11 preexisting representative methods according to the corresponding combinatory strategies and identified three trends that emerged during methodological developments. We further extended this analysis to eight long-read sequencing methods, emphasizing error reduction strategies. Finally, we suggest two promising future directions that could achieve comparable or even higher accuracy with lower costs in both NGS and long-read sequencing.
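To illustrate why an error rate of 1 per 1000 bp hampers rare-mutation detection, the following is a minimal arithmetic sketch, not taken from the paper. It assumes errors are independent and uniform across bases; the depth and variant frequency used in the usage note are hypothetical values chosen for illustration.

```python
def expected_error_reads(depth, error_rate=1e-3):
    """Expected number of reads carrying a sequencing error at one site.

    Assumes independent, uniform per-base errors (a simplification).
    """
    return depth * error_rate


def expected_variant_reads(depth, variant_freq):
    """Expected number of reads carrying a true variant at one site."""
    return depth * variant_freq


def signal_to_noise(variant_freq, error_rate=1e-3):
    """Ratio of true-variant reads to erroneous reads; independent of depth."""
    return variant_freq / error_rate
```

Under these assumptions, at a hypothetical depth of 10,000 reads a site is expected to show about 10 erroneous reads, whereas a variant present at a frequency of 10⁻⁴ contributes about 1 read, so the true signal is roughly one tenth of the noise regardless of depth.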
Subject(s)
High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/economics , Humans , DNA Sequence Analysis/methods , DNA Sequence Analysis/economics , Mutation
ABSTRACT
Long-read sequencing, exemplified by PacBio, revolutionizes genomics, overcoming challenges like repetitive sequences. However, the high DNA requirement ( > 1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises a broad PacBio sequencing adoption for not only mutational studies of flies and their symbionts but also explorations of other small organisms or precious samples.
Subject(s)
DNA Transposable Elements , Drosophila melanogaster , Insect Genome , Mutation , Wolbachia , Animals , Drosophila melanogaster/genetics , DNA Transposable Elements/genetics , Wolbachia/genetics , Insect Genome/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Genomics/methods , Gene Conversion
ABSTRACT
Since the discovery of the first transposon by Dr. Barbara McClintock, the prevalence and diversity of transposable elements (TEs) have been gradually recognized. As fundamental genetic components, TEs drive organismal evolution not only by contributing functional sequences (e.g., regulatory elements or "controllers" as phrased by Dr. McClintock) but also by shuffling genomic sequences. In the latter respect, TE-mediated gene duplications have contributed to the origination of new genes and attracted extensive interest. In response to the development of this field, we herein attempt to provide an overview of TE-mediated duplication by focusing on common rules emerging across duplications generated by different TE types. Specifically, despite the huge divergence of transposition machinery across TEs, we identify three common features of various TE-mediated duplication mechanisms, including end bypass, template switching, and recurrent transposition. These three features lead to one common functional outcome, namely, TE-mediated duplicates tend to be subjected to exon shuffling and neofunctionalization. Therefore, the intrinsic properties of the mutational mechanism constrain the evolutionary trajectories of these duplicates. We finally discuss the future of this field including an in-depth characterization of both the duplication mechanisms and functions of TE-mediated duplicates.
Subject(s)
DNA Transposable Elements , Genomics , DNA Transposable Elements/genetics , Mutation , Nucleic Acid Regulatory Sequences , Molecular Evolution
ABSTRACT
Background: Acute ischemic stroke (AIS) and acute myocardial infarction (AMI) share several features on multiple levels. These two events may occur in conjunction or in rapid succession, and the occurrence of one event may increase the risk of the other. Owing to their similar pathophysiologies, we aimed to identify immune-related biomarkers common to AIS and AMI as potential therapeutic targets. Methods: We identified differentially expressed genes (DEGs) between the AIS and control groups, as well as AMI and control groups using microarray data (GSE16561 and GSE123342). A weighted gene co-expression network analysis (WGCNA) approach was used to identify hub genes associated with AIS and/or AMI progression. The intersection of the four gene sets identified key genes, which were subjected to functional enrichment and protein-protein interaction (PPI) network analyses. We confirmed the expression levels of hub genes using two sets of gene expression profiles (GSE58294 and GSE66360), and the ability of the genes to distinguish patients with AIS and/or AMI from control patients was assessed by calculating the receiver operating characteristic values. Finally, the investigation of transcription factor (TF)-, miRNA-, and drug-gene interactions led to the discovery of therapeutic candidates. Results: We identified 477 and 440 DEGs between the AIS and control groups and between the AMI and control groups, respectively. Using WGCNA, 2,776 and 2,811 genes in the key modules were identified for AIS and AMI, respectively. Sixty key genes were obtained from the intersection of the four gene sets, which were used to identify the 10 hub genes with the highest connection scores through PPI network analysis. Functional enrichment analysis revealed that the key genes were primarily involved in immunity-related processes. 
Finally, the upregulation of five hub genes was confirmed using two other datasets, and immune infiltration analysis revealed their correlation with certain immune cells. Regulatory network analyses indicated that GATA2 and hsa-mir-27a-3p might be important regulators of these genes. Conclusion: Using comprehensive bioinformatics analyses, we identified five immune-related biomarkers that significantly contributed to the pathophysiological mechanisms of both AIS and AMI. These biomarkers can be used to monitor and prevent AIS after AMI, or vice versa.
ABSTRACT
BACKGROUND: Gene presence/absence (P/A) polymorphisms are commonly observed in plants and are important in individual adaptation and species differentiation. Detecting their abundance, distribution and variation among individuals would help to understand the role played by these polymorphisms in a given species. The recently sequenced 80 Arabidopsis genomes provide an opportunity to address these questions. RESULTS: By systematically investigating these accessions, we identified 2,407 P/A genes (or 8.9%) absent in one or more genomes, averaging 444 absent genes per accession. 50.6% of P/A genes belonged to multi-copy gene families, or 31.0% to clustered genes. However, the highest proportion of P/A genes, outnumbered in singleton genes, was observed in the regions near centromeres. In addition, a significant correlation was observed between the P/A gene frequency among the 80 accessions and the diversity level at P/A loci. Furthermore, the proportion of P/A genes was different among functional gene categories. Finally, a P/A gene tree showed a diversified population structure in the worldwide Arabidopsis accessions. CONCLUSIONS: An estimate of P/A genes and their frequency distribution in the worldwide Arabidopsis accessions was obtained. Our results suggest that there are diverse mechanisms to generate or maintain P/A genes, by which individuals and functionally different genes can selectively maintain P/A polymorphisms for a specific adaptation.
Subject(s)
Arabidopsis/genetics , Gene Frequency , Plant Genes , Genetic Polymorphism , Population Genetics , Genotyping Techniques , Multigene Family
ABSTRACT
Trypsin participates in many fundamental biological processes, most notably in digesting food. The 12 species of Drosophila provide a great opportunity to analyze the duplication pattern of trypsins and their association with dietary changes. Here, we find that the trypsin family expands dramatically after speciation. The duplication events are strongly related to host preferences, with significantly higher copy numbers in species breeding on rotting fruits. Temporal analysis of the duplication events indicates that these events did not occur simultaneously, but rather correlate with ecological change or host shift. Furthermore, we find that specialists and generalists are under different adaptive selection, as revealed by dynamic duplication and/or deletion and relatively high Ka/Ks values for the duplication events in specialists. Our findings suggest that the duplication of trypsin genes has played an important role in the adaptation of Drosophila to diverse ecosystems.
Subject(s)
Drosophila/genetics , Gene Duplication , Genetic Variation , Trypsin/genetics , Animals , Drosophila/classification , Molecular Evolution
ABSTRACT
NBS-LRR (nucleotide-binding site-leucine-rich repeat), LRR-RLK (LRR-receptor-like kinase), and LRR-only are the three major LRR-encoding genes. Owing to the crucial role played by them in plant resistance, development, and growth, extensive studies have been performed on the NBS-LRR and LRR-RLK genes. However, few studies have focused on these genes collectively; they may co-vary as all of them contain LRR motifs. To investigate their common evolutionary patterns, all major classes of LRR-encoding genes were identified in 12 plant species, and particularly compared in two pairs of close relatives, Arabidopsis thaliana-A. lyrata (At-Al) and Zea mays-Sorghum bicolor. Our results showed that these genes co-vary significantly in terms of their numbers between species and that the genes with certain evolutionary parameters are most likely to have similar functions. The development-related genes have clear orthologous relationships between closely related species, as well as lower nucleotide divergence, and Ka/Ks ratio. In contrast, resistance-related genes have exactly opposite characteristics and favor 11-15 LRRs per gene. This association could be very useful in predicting the function of LRR-encoding genes. The presence of co-variation suggests that LRRs, combined with other domains, can work better in some common functions. In order to cooperate efficiently, there should be balanced gene numbers among the different gene classes.
Subject(s)
Arabidopsis/genetics , Plant Genes/genetics , Genetic Variation/genetics , Proteins/genetics , Zea mays/genetics , Arabidopsis/classification , Molecular Evolution , Leucine-Rich Repeat Proteins , Open Reading Frames/genetics , Phylogeny , Zea mays/classification
ABSTRACT
The XA21 protein confers broad-spectrum resistance against Xanthomonas oryzae pv. oryzae. Although Xa21-mediated immunity is well characterized, little is known about the origin and evolutionary history of this gene in grasses. Therefore, we analyzed all Xa21 gene homologs in eight whole-genome sequenced rice lines, as well as in four gramineous genomes, rice, Brachypodium, sorghum and maize; using Arabidopsis Xa21 homologs as outgroups, 17, 7, 7 and 3 Xa21 homologs were detected in these four grasses, respectively. Synteny and phylogenetic analysis showed that frequent gene translocation, duplication and/or loss have occurred at Xa21 homologous loci, suggesting that they have undergone or are undergoing rapid generation of copy number variations. Within the rice species, the high level of nucleotide diversity between Xa21-like orthologs showed a strong association with the presence/absence haplotypes, suggesting that the genetic structure of rice lines plays an important role in the variations between these Xa21-like orthologs. Strong positive selection was detected in the core region of the leucine-rich repeat domains of the Xa21 subclade among the rice lines, indicating that the rapid gene diversification of Xa21 homologs may be a strategy for a given species to adapt to the changing spectrum of species-specific pathogens.
Subject(s)
Biological Adaptation/genetics , Molecular Evolution , Plant Proteins/genetics , Poaceae/genetics , Arabidopsis/genetics , Plant Genome , Oryza/genetics , Phylogeny , Plant Diseases/genetics , Plant Diseases/immunology , Plant Proteins/immunology , Poaceae/classification , Genetic Polymorphism , Genetic Selection
ABSTRACT
Despite long being considered as "junk", transposable elements (TEs) are now accepted as catalysts of evolution. One example is Mutator-like elements (MULEs, one type of terminal inverted repeat DNA TEs, or TIR TEs) capturing sequences as Pack-MULEs in plants. However, their origination mechanism remains perplexing, and whether TIR TEs mediate duplication in animals is almost unexplored. Here we identify 370 Pack-TIRs in 100 animal reference genomes and one Pack-TIR (Ssk-FB4) family in fly populations. We find that single-copy Pack-TIRs are mostly generated via transposition-independent gap filling, and multicopy Pack-TIRs are likely generated by transposition after replication fork switching. We show that a proportion of Pack-TIRs are transcribed and often form chimeras with hosts. We also find that Ssk-FB4s represent a young protein family, as supported by proteomics and signatures of positive selection. Thus, TIR TEs catalyze new gene structures and new genes in animals via both transposition-independent and -dependent mechanisms.
Subject(s)
DNA Transposable Elements/genetics , Plant Genome/genetics , Terminal Repeat Sequences/genetics , Animals , Oryza/genetics
ABSTRACT
RNA-based duplicated genes or functional retrocopies (retrogenes) are known to drive phenotypic evolution. Retrogenes emerge via retroposition, which is mainly mediated by long interspersed nuclear element 1 (LINE-1 or L1) retrotransposons in mammals. By contrast, long terminal repeat (LTR) retrotransposons appear to be the major player in plants, although an L1-like mechanism has also been hypothesized to be involved in retroposition. We tested this hypothesis by searching for young retrocopies, as these still retain the sequence features associated with the underlying retroposition mechanism. Specifically, we identified polymorphic retrocopies (retroCNVs) by analyzing public Arabidopsis (Arabidopsis thaliana) resequencing data. Furthermore, we searched for recently originated retrocopies encoded by the reference genome of Arabidopsis and Manihot esculenta. Across these two datasets, we found cases with L1-like hallmarks, namely, the expected target site sequence, a polyA tail and target site duplications. Such data suggest that an L1-like mechanism could operate in plants, especially dicots.
Subject(s)
Gene Duplication , Long Interspersed Nucleotide Elements , RNA/genetics , Retroelements , Tracheophyta/genetics , Arabidopsis/genetics , Molecular Evolution , Gene Order , Plant Genome
ABSTRACT
Amino acid usage varies from species to species. A previous study found a universal trend in amino acid gain and loss in many taxa and proposed a one-way model of amino acid evolution in which the number of new amino acids increases as the number of old amino acids decreases. Later studies showed that this pattern of amino acid gain and loss is likely to be compatible with the neutral theory. The present work aimed to further study this problem by investigating the evolutionary patterns of amino acids in 8 primates (the nucleotide and protein alignments are available online at http://gattaca.nju.edu.cn/pub_data.html). First, the number of amino acids gained and lost was calculated and the evolutionary trend of each amino acid was inferred. These values were found to be closely related to the usage of each amino acid. Then we analyzed the mutational trend of amino acid substitution in human using SNPs; this trend is highly correlated with the fixation trend, only with greater variance. Finally, the trends in the evolution of the 20 amino acids were evaluated in human on different time scales: the rate of increase of 5 significantly increasing amino acids was found to decrease as a function of time elapsed since divergence, and the dS/dN ratio was also found to increase as a function of time elapsed since divergence. These results suggest that the observed amino acid substitution pattern is influenced by mutation and purifying selection. In conclusion, the present study shows that usage of amino acids is an important factor capable of influencing the observed pattern of amino acid evolution, and also presents evidence suggesting that the observed universal trend of amino acid gain and loss is compatible with neutral evolution.
Subject(s)
Amino Acids/genetics , Molecular Evolution , Animals , Callithrix/genetics , Codon , Gorilla gorilla/genetics , Humans , Hylobates/genetics , Macaca mulatta/genetics , Pan troglodytes/genetics , Pongo abelii/genetics
ABSTRACT
How the structure and base composition of genes changed with the evolution of vertebrates remains a puzzling question. Here we analyzed 895 orthologous protein-coding genes in six multicellular animals: human, chicken, zebrafish, sea squirt, fruit fly, and worm. Our analyses reveal that many gene regions, particularly introns and 3' UTRs, gradually expanded throughout the evolution of vertebrates from their invertebrate ancestors, and that the number of exons per gene increased. Studies based on all protein-coding genes in each genome provide consistent results. We also find that GC-content increased in many gene regions (especially the 5' UTR) in the evolution of endotherms, except in coding exons. Analysis of individual genomes shows that the 3' UTR showed a stronger correlation with introns in length and GC-content than the 5' UTR did, and that genes with large introns in all six species showed relatively similar GC-content. Our data indicate a great increase in complexity in vertebrate genes, and we propose that the requirement for morphological and functional changes is probably the driving force behind the evolution of structure and base composition complexity in multicellular animal genes.
Subject(s)
Base Composition/genetics , Molecular Evolution , Amino Acid Sequence Homology , Vertebrates/genetics , 3' Untranslated Regions/genetics , 5' Untranslated Regions/genetics , Animals , Computational Biology , Exons/genetics , GC Rich Sequence/genetics , Genome , Humans , Introns/genetics
ABSTRACT
Protein is an essential component of life, and its synthesis is directed by codons in all organisms on Earth. While some codons encode the same amino acid, their usage is often highly biased. Many factors can cause this bias, but the potential effect of mononucleotide repeats, which are known to be highly mutable, on codon usage and codon pair preference is largely unknown. In this study we performed a genomic survey of the relationship between mononucleotide repeats and codon pair bias in 53 bacteria, 68 archaea, and 13 eukaryotes. By distinguishing codon pair bias from codon usage bias, four general patterns were revealed: strong avoidance of mononucleotide repeats of five or six bases in codon pairs; a lower observed/expected (o/e) ratio for codon pairs with C or G repeats (C/G pairs) than for those with A or T repeats (A/T pairs); a negative correlation between genomic GC content and the o/e ratios, particularly for C/G pairs; and avoidance of C/G pairs in highly conserved genes. These results support natural selection against long mononucleotide repeats, which could induce frameshift mutations in coding sequences. The fact that these patterns are found in all kingdoms of life suggests that this is a general phenomenon in living organisms. Thus, long mononucleotide repeats may play an important role in the base composition and genetic stability of a gene and in gene function.
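The observed/expected (o/e) ratio mentioned above can be sketched as follows. This is a minimal illustration under a stated assumption, not the authors' method: the expected count of a codon pair is taken as the product of the two codons' individual frequencies times the total number of adjacent pairs, which is one common way to separate codon pair bias from codon usage bias. The function names `codon_pair_oe` and `longest_run` are hypothetical.

```python
from collections import Counter


def codon_pair_oe(seqs):
    """Return {(codon1, codon2): observed/expected} for adjacent codon pairs.

    Assumption: expected = freq(codon1) * freq(codon2) * total_pairs,
    i.e. what codon usage alone would predict.
    """
    codon_counts = Counter()
    pair_counts = Counter()
    for seq in seqs:
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))
    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    ratios = {}
    for (c1, c2), observed in pair_counts.items():
        expected = (codon_counts[c1] / n_codons) * (codon_counts[c2] / n_codons) * n_pairs
        ratios[(c1, c2)] = observed / expected
    return ratios


def longest_run(pair):
    """Length of the longest mononucleotide run in the 6-nt codon pair."""
    s = pair[0] + pair[1]
    best = run = 1
    for a, b in zip(s, s[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best
```

With such a table, pairs could be grouped by `longest_run` (for example, runs of five or six) and by repeat base to compare o/e ratios between C/G pairs and A/T pairs; a ratio below 1 indicates a pair seen less often than codon usage alone would predict.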