ABSTRACT
Nucleic acids are not only static carriers of genetic information but also play vital roles in controlling cellular lifecycles through their fascinating structural diversity [...].
Subject(s)
Computational Biology , DNA , Nucleic Acid Conformation , RNA , RNA/chemistry , RNA/metabolism , DNA/chemistry , DNA/metabolism , Computational Biology/methods , Humans
ABSTRACT
The dynamic processes operating on genomic DNA, such as gene expression and cellular division, lead inexorably to topological challenges in the form of entanglements, catenanes, knots, "bubbles", R-loops, and other outcomes of supercoiling and helical disruption. The resolution of toxic topological stress is the function attributed to DNA topoisomerases. A prominent example is the negative supercoiling (nsc) trailing processive enzymes such as DNA and RNA polymerases. The multiple equilibrium states that nscDNA can adopt by redistribution of helical twist and writhe include the left-handed double-helical conformation known as Z-DNA. Thirty years ago, one of our labs isolated a protein from Drosophila cells and embryos with a 100-fold greater affinity for Z-DNA than for B-DNA, and identified it as topoisomerase II (gene Top2, orthologous to the human UniProt proteins TOP2A and TOP2B). GTP increased the affinity and selectivity for Z-DNA even further and also led to inhibition of the isomerase enzymatic activity. An allosteric mechanism was proposed, in which topoII acts as a Z-DNA-binding protein (ZBP) to stabilize given states of topological (sub)domains and associated multiprotein complexes. We have now explored this possibility by comprehensive bioinformatic analyses of the available protein sequences of topoII representing organisms covering the whole tree of life. Multiple alignment of these sequences revealed an extremely high level of evolutionary conservation, including a winged-helix protein segment, here denoted as Zτ, constituting the putative structural homolog of Zα, the canonical Z-DNA/Z-RNA binding domain previously identified in the interferon-inducible RNA Adenosine-to-Inosine-editing deaminase, ADAR1p150. In contrast to Zα, which is separate from the protein segment responsible for catalysis, Zτ encompasses the active site tyrosine of topoII; a GTP-binding site and a GxxG sequence motif are in close proximity. 
Quantitative Zτ-Zα similarity comparisons and molecular docking with interaction scoring further supported the "B-Z-topoII hypothesis" and have led to an expanded mechanism for topoII function that incorporates the recognition of Z-DNA segments ("Z-flipons") as an inherent and essential element. We further propose that the two Zτ domains of the topoII homodimer exhibit a single-turnover "conformase" activity on given G(ate) B-DNA segments ("Z-flipins"), inducing their transition to the left-handed Z-conformation. Inasmuch as the topoII-Z-DNA complexes are isomerase-inactive, we infer that they fulfill important structural roles in key processes such as mitosis. Topoisomerases are preeminent targets of anti-cancer drug discovery, and we anticipate that detailed elucidation of their structural-functional interactions with Z-DNA and GTP will facilitate the design of novel, more potent and selective anti-cancer chemotherapeutic agents.
Subject(s)
DNA, B-Form , DNA, Z-Form , Humans , Molecular Docking Simulation , DNA/chemistry , DNA Topoisomerases, Type II/genetics , DNA Topoisomerases, Type II/metabolism , Guanosine Triphosphate , Adenosine Deaminase/metabolism
ABSTRACT
The opium poppy's ability to produce various alkaloids is both useful and problematic. Breeding new varieties with varying alkaloid content is therefore an important task. In this paper, we present a breeding technology for new low-morphine poppy genotypes based on a combination of a TILLING approach and single-molecule real-time NGS sequencing. Mutants in the TILLING population were verified using RT-PCR and HPLC. Of the eleven genes of the morphine pathway, only three single-copy genes were used to identify mutant genotypes. Point mutations were obtained in only one gene (CNMT), while an insertion was obtained in another (SalAT). Only a few of the expected G:C to A:T transition SNPs were obtained. In the low-morphine mutant genotype, morphine production decreased from 1.4% in the original variety to 0.1%. A comprehensive description of the breeding process, a basic characterization of the main alkaloid content, and a gene expression profile for the main alkaloid-producing genes are provided. Difficulties with the TILLING approach are also described and discussed.
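A minimal Python sketch of how detected point mutations can be tallied to see how many are the expected G:C to A:T transitions (the typical outcome of alkylating mutagens such as EMS; the abstract does not name the mutagen, so that link is an assumption). The (reference, alternative) input format and the function names are hypothetical.

```python
from collections import Counter

BASES = set("ACGT")

def classify_snp(ref: str, alt: str) -> str:
    """Classify a single-base substitution (hypothetical helper).

    G>A and C>T (the same G:C pair read on either strand) are G:C -> A:T
    transitions; A>G and T>C are A:T -> G:C transitions; the rest are
    transversions.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if (ref, alt) in {("G", "A"), ("C", "T")}:
        return "G:C>A:T transition"
    if (ref, alt) in {("A", "G"), ("T", "C")}:
        return "A:T>G:C transition"
    return "transversion"

def summarize_snps(snps):
    """Count substitution classes in an iterable of (ref, alt) pairs."""
    return Counter(classify_snp(ref, alt) for ref, alt in snps)

# Illustrative input only, not data from the study.
print(summarize_snps([("G", "A"), ("C", "T"), ("A", "C")]))
```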
ABSTRACT
Epigenetics deals with changes in gene expression that are not caused by modifications in the primary sequence of nucleic acids. Beyond the primary structure of nucleic acids, these changes include not only DNA/RNA methylation but also other reversible modifications, as well as histone modifications and RNA interference. In addition, under particular conditions (such as specific ion concentrations or protein-induced stabilization), the right-handed double-stranded DNA helix (B-DNA) can form noncanonical structures commonly described as "non-B DNA" structures. These include, for example, cruciforms, i-motifs, triplexes, and G-quadruplexes. Their formation often leads to significant differences in replication and transcription rates. Noncanonical RNA structures have also been documented to play important roles in translation regulation and in the biology of noncoding RNAs. In human and animal studies, the frequency and dynamics of noncanonical DNA and RNA structures are intensively investigated, especially in cancer and neurodegenerative disease research. In contrast, noncanonical DNA and RNA structures in plants have long remained on the fringes of interest, and only a few studies deal with their formation, regulation, and physiological importance for plant stress responses. Herein, we present a review focused on the main fields of epigenetics in plants and their possible roles in stress responses and signaling, with special attention to noncanonical DNA and RNA structures.
Subject(s)
G-Quadruplexes , Nucleic Acids , Animals , Humans , DNA/genetics , DNA/chemistry , Epigenesis, Genetic , RNA/genetics , RNA/chemistry , Plants/genetics
ABSTRACT
Sequences of nucleic acids with the potential to form four-stranded G-quadruplex structures are intensively studied, mainly in the context of human diseases, pathogens, and extremophile organisms; nonetheless, knowledge of their occurrence and putative role in plants is still limited. This work focuses on G-quadruplex-forming sites in two gene sets of interest in the model plant Arabidopsis thaliana: drought stress-responsive genes and genes related to the biosynthesis of phenolic compounds. In addition, 20 housekeeping genes were analyzed; these are expected to be constitutively expressed, with no need for precise regulation by internal or external factors. None of the tested gene sets differed significantly in the content of G-quadruplex-forming sites; however, the highest frequency of G-quadruplex-forming sites was found in the 5'-UTRs of phenolic compound biosynthesis genes, which suggests possible regulation at the mRNA level. In addition, G-quadruplex-forming sites were strongly underrepresented, mainly within introns and in the 1000 bp flanking regions downstream of genes. Finally, cluster analysis revealed similarities between particular genes in terms of their putative G-quadruplex sequence (PQS) characteristics. We believe that the original approach used in this study may become useful for further and more comprehensive bioinformatic studies in the field of G-quadruplex genomics.
ABSTRACT
Plant miRNAs are powerful regulators of gene expression at the post-transcriptional level, as has been repeatedly demonstrated in several model plant species. miRNAs are considered to be key regulators of many developmental, homeostatic, and immune processes in plants. However, despite a growing number of studies, our understanding of plant miRNAs is still limited. This systematic review aims to summarize current knowledge about miRNAs in spring barley (Hordeum vulgare), an agronomically important crop worldwide that also serves as a common monocot model for studying abiotic stress responses. This can help us understand the connection between plant miRNAs and stresses in general, not only abiotic ones. Finally, some future perspectives and open questions are summarized.
Subject(s)
Hordeum , MicroRNAs , Hordeum/genetics , Hordeum/metabolism , MicroRNAs/genetics , MicroRNAs/metabolism , Stress, Physiological/genetics , Plants/metabolism , Gene Expression Regulation, Plant
ABSTRACT
Photosynthetically active radiation (PAR) is an important environmental cue inducing the production of many secondary metabolites involved in plant oxidative stress avoidance and tolerance. To examine the complex role of PAR irradiance and specific spectral components in the accumulation of phenolic compounds (PheCs), we acclimated spring barley (Hordeum vulgare) to different spectral qualities (white, blue, green, red) at three irradiances (100, 200, 400 µmol m⁻² s⁻¹). We confirmed that blue light irradiance is essential for the accumulation of PheCs in secondary barley leaves (in the absence of UV radiation), which underscores the importance of photoreceptor signals (especially from cryptochrome). Increasing blue light irradiance most effectively induced the accumulation of B-ring-dihydroxylated flavonoids, probably due to the significantly enhanced expression of the F3'H gene. These changes in PheC metabolism led to a steeper increase in the antioxidant activity of PheC-containing leaf extracts than in epidermal UV-A shielding. In addition, we examined the possible role of miRNAs in the complex regulation of gene expression related to PheC biosynthesis.
Subject(s)
Hordeum , Ultraviolet Rays , Flavonoids/metabolism , Hordeum/genetics , Hordeum/metabolism , Light , Phenols/metabolism , Plant Leaves/genetics , Plant Leaves/metabolism
ABSTRACT
SARS-CoV-2 is a novel positive-sense single-stranded RNA virus from the Coronaviridae family (genus Betacoronavirus) that has been established as the cause of the COVID-19 pandemic. The SARS-CoV-2 genome is one of the largest among known RNA viruses, comprising at least 26 known protein-coding loci. Studies thus far have outlined the coding capacity of the positive-sense strand of the SARS-CoV-2 genome, which can be used directly for protein translation. However, it has recently been shown that negative-sense viral RNA intermediates, which are transcribed during genome replication of positive-sense viruses, can also code for proteins. No studies have yet explored whether negative-sense SARS-CoV-2 RNA intermediates contain protein-coding loci. Thus, using sequence- and structure-based bioinformatics methodologies, we investigated the presence and validity of putative negative-sense ORFs (nsORFs) in the SARS-CoV-2 genome. Nine nsORFs were found to contain strong eukaryotic translation initiation signals and high codon adaptability scores, and several of the nsORFs were predicted to interact with RNA-binding proteins. Evolutionary conservation analyses indicated that some of the nsORFs are deeply conserved among related coronaviruses. Three-dimensional protein modeling revealed higher-order folding in all putative SARS-CoV-2 nsORFs, and subsequent structural mimicry analyses suggest similarity of the nsORFs to DNA/RNA-binding proteins and to proteins involved in immune signaling pathways. Altogether, these results suggest the potential existence of still undescribed SARS-CoV-2 proteins, which may play an important role in the viral lifecycle and COVID-19 pathogenesis.
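The negative-sense intermediate is the reverse complement of the positive-sense genome, so candidate nsORFs can be located by scanning that strand in its three reading frames. A minimal Python sketch, assuming ATG starts, the standard stop codons, and an arbitrary minimum length (`min_codons`) chosen for illustration; the study's additional criteria (initiation context, codon adaptability, structure) are not modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA/RNA sequence (U is treated as T)."""
    return seq.upper().replace("U", "T").translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 30):
    """Find ATG-initiated ORFs in the three forward frames of `seq`.

    Returns (frame, start, end) tuples with 0-based, end-exclusive
    coordinates that include the stop codon. Only the first ATG before
    each stop is used, so nested in-frame ORFs are not reported separately.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

def negative_sense_orfs(genome: str, min_codons: int = 30):
    """Scan the negative-sense strand (reverse complement of the genome).

    Coordinates refer to the reverse-complement sequence.
    """
    return find_orfs(reverse_complement(genome), min_codons)
```

For example, the toy "genome" `TTATTTCAT` contains no ATG on its own strand, but its reverse complement `ATGAAATAA` carries a two-codon ORF, which `negative_sense_orfs("TTATTTCAT", min_codons=1)` reports in frame 0.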
Subject(s)
COVID-19 , SARS-CoV-2 , COVID-19/genetics , Genome, Viral , Humans , Pandemics , RNA, Viral/chemistry , RNA, Viral/genetics , RNA-Binding Proteins/genetics , SARS-CoV-2/genetics
ABSTRACT
Owing to the rapid global spread of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), prevention and treatment options are urgently needed to control infection-related morbidity, mortality, and economic losses. Although developing drugs and inactivated or attenuated virus vaccines can require significant time and resources, DNA and RNA vaccines offer a quick, simple, and cheap alternative, even when produced on a large scale. The spike protein, which has been shown to be the most antigenic SARS-CoV-2 protein, has been widely selected as the target of choice for DNA/RNA vaccines. Vaccination campaigns have reported high vaccination rates and protection, but numerous unintended effects, ranging from muscle pain to death, have led to concerns about the safety of RNA/DNA vaccines. In parallel, several open reading frames (ORFs) have been found to overlap SARS-CoV-2 accessory genes, two of which, ORF2b and ORF-Sh, overlap the spike protein sequence. Thus, the presence of these and potentially other ORFs in SARS-CoV-2 DNA/RNA vaccines could lead to the translation of undesired proteins during vaccination. Herein, we discuss the translation of overlapping genes in connection with DNA/RNA vaccines. Two publicly available mRNA vaccine spike protein sequences were compared with the wild-type sequence to uncover possible differences in putative overlapping ORFs. Notably, the Moderna mRNA-1273 vaccine sequence is predicted to contain no frameshifted ORFs on the positive-sense strand, which highlights the utility of codon optimization in DNA/RNA vaccine design for removing undesired overlapping ORFs. Since little information is available on ORF2b or ORF-Sh, we use structural bioinformatics techniques to investigate the structure-function relationship of these proteins.
The presence of putative ORFs on DNA/RNA vaccine candidates implies that overlapping genes may contribute to the translation of smaller peptides, potentially leading to unintended clinical outcomes, and that the protein-coding potential of DNA/RNA vaccines should be rigorously examined prior to administration.
Subject(s)
Genes, Overlapping , Genes, Viral , Vaccines, DNA/genetics , mRNA Vaccines/genetics , COVID-19 Vaccines/adverse effects , COVID-19 Vaccines/genetics , Codon , Humans , Nucleic Acid Conformation , Open Reading Frames , Protein Biosynthesis , Protein Domains , RNA, Messenger , Spike Glycoprotein, Coronavirus/genetics , Vaccines, DNA/adverse effects , mRNA Vaccines/adverse effects
ABSTRACT
Z-DNA and Z-RNA are functionally important left-handed structures of nucleic acids that play significant roles in several molecular and biological processes, including DNA replication, gene expression regulation, and viral nucleic acid sensing. Most proteins proven to interact with Z-DNA/Z-RNA contain the so-called Zα domain, which is structurally well conserved. To date, only eight proteins with a Zα domain have been described, in a few organisms (including human, mouse, Danio rerio, Trypanosoma brucei, and some viruses). Therefore, we searched for new Z-DNA/Z-RNA-binding proteins in the complete PDB structure database and among AlphaFold2 protein models. A structure-based similarity search found 14 experimentally determined proteins with a highly similar Zα domain structure and 185 proteins with a putative Zα domain among the AlphaFold2 models. Structure-based alignment and molecular docking confirmed high functional conservation of the amino acids involved in Z-DNA/Z-RNA binding, suggesting that Z-DNA/Z-RNA recognition may play an important role in a variety of cellular processes.
Subject(s)
DNA, Z-Form/chemistry , DNA-Binding Proteins/chemistry , Models, Molecular , Protein Interaction Domains and Motifs , RNA-Binding Proteins/chemistry , RNA/chemistry , Amino Acid Sequence , Binding Sites , DNA, Z-Form/metabolism , DNA-Binding Proteins/metabolism , Molecular Docking Simulation , Molecular Dynamics Simulation , Nucleic Acid Conformation , Protein Binding , Protein Conformation , RNA/metabolism , RNA-Binding Proteins/metabolism , Structure-Activity Relationship
ABSTRACT
Water deficiency is one of the most significant abiotic stresses; it negatively affects growth and reduces crop yields worldwide. Most research focuses on model plants and/or the most agriculturally important crops. In this research, drought stress was applied during the first week of germination to two varieties of Papaver somniferum (the opium poppy), a non-model plant species, that contrast in their responses to drought stress. After sowing, the poppy seedlings were immediately subjected to drought stress for 7 days. We conducted a large-scale transcriptomic and proteomic analysis of the drought stress response. First, we found that the transcriptomic and proteomic profiles differ significantly. However, the most significant findings are the identification of key genes and proteins with significantly different expression related to drought stress, e.g., the heat-shock protein family, dehydration-responsive element-binding transcription factors, ubiquitin E3 ligase, and others. In addition, metabolic pathway analysis showed that these genes and proteins are part of several biosynthetic pathways, most significantly those related to photosynthetic processes and oxidative stress responses. A future study will focus on a detailed analysis of key genes and the development of selection markers for identifying drought-resistant varieties and breeding new resistant lines.
ABSTRACT
Recently, the quest for the mythical fountain of youth has produced extensive research programs that aim to extend the healthy lifespan of humans. Despite advances in our understanding of the aging process, the surprisingly long lifespan and cancer resistance of some animal species remain unexplained. The p53 protein plays a crucial role in tumor suppression, tissue homeostasis, and aging. Long-lived, cancer-free African elephants have 20 copies of the TP53 gene, including 19 retrogenes (38 alleles), which are partially active, whereas humans possess only one copy of TP53 and have an estimated cancer mortality rate of 11-25%. The mechanism through which p53 contributes to the resolution of Peto's paradox in Animalia remains vague. Thus, in this work, we took advantage of available datasets and inspected the p53 amino acid sequences of phylogenetically related organisms that vary in lifespan. We discovered new correlations between specific amino acid deviations in p53 and lifespan across different animal species. We found that species with extended lifespans have certain characteristic amino acid substitutions in the p53 DNA-binding domain that alter its function, as inferred from the Phenotypic Annotation of p53 Mutations, the PROVEAN tool, or the SWISS-MODEL workflow. In addition, the loop 2 region of the human p53 DNA-binding domain was identified as the longest region associated with longevity. The 3D model revealed variations in the loop 2 structure of long-lived species compared with human p53. Our findings show a direct association between specific amino acid residues in the p53 protein, changes in p53 functionality, and extended animal lifespan, and further highlight the importance of p53 in aging.
Subject(s)
Databases, Genetic , Gene Dosage , Longevity , Models, Molecular , Animals , Protein Domains , Protein Structure, Secondary , Species Specificity , Tumor Suppressor Protein p53/chemistry , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
ABSTRACT
G-quadruplexes have long been perceived as rare and physiologically unimportant nucleic acid structures. However, several studies have revealed their importance in molecular processes, suggesting possible roles in replication and gene expression regulation. Pathways involving G-quadruplexes are intensively studied, especially in the context of human diseases, while their involvement in gene expression regulation in plants remains largely unexplored. Here, we conducted a bioinformatic study and performed circular dichroism measurements to identify a stable G-quadruplex in the gene RPB1, which codes for the large subunit of RNA polymerase II. We found that this G-quadruplex-forming locus is highly conserved among plants sensu lato (Archaeplastida), which share a common ancestor more than one billion years old. Finally, we discuss a new hypothesis in which G-quadruplexes interacting with UV light in plants could form an additional layer of the regulatory network.
Subject(s)
G-Quadruplexes , Plant Proteins/chemistry , Plants/chemistry , RNA Polymerase II/chemistry , Amino Acid Sequence , Arabidopsis/chemistry , Arabidopsis/genetics , Arabidopsis/radiation effects , Circular Dichroism , Computational Biology , Evolution, Molecular , G-Quadruplexes/radiation effects , Gene Expression Regulation, Plant/genetics , Glaucophyta/chemistry , Glaucophyta/genetics , Glaucophyta/radiation effects , Phylogeny , Plant Proteins/genetics , Plant Proteins/radiation effects , Plants/genetics , Plants/radiation effects , RNA Polymerase II/genetics , Rhodophyta/chemistry , Rhodophyta/genetics , Rhodophyta/radiation effects , Sequence Alignment , Ultraviolet Rays
ABSTRACT
In a recently published paper, we found that SARS-CoV-2 hot-spot mutations are significantly associated with inverted repeat loci and CG dinucleotides. Since then, however, fast-spreading strains with new mutations (the so-called mink farm, England, and Japan mutations) have been described. We used the new datasets to check the positions of mutation sites in the genomes of the new SARS-CoV-2 strains. Using the open-access Palindrome analyzer tool, we found the mutations in these new strains to be significantly enriched in inverted repeat loci.
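A naive Python sketch of perfect inverted-repeat detection (a stem, a spacer, then the reverse-complement stem), plus a helper that collects the positions covered by any repeat so that mutation sites can be checked against them. This is not the Palindrome analyzer implementation; the parameter defaults (stem 6-30 nt, spacer up to 10 nt, no mismatches) are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, min_stem=6, max_stem=30, max_spacer=10):
    """Brute-force search for perfect inverted repeats.

    Reports (start, stem_length, spacer_length) whenever seq[start:start+stem]
    is the reverse complement of the stem that follows after `spacer` bases.
    Overlapping and nested hits are all reported.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n):
        for stem in range(min_stem, max_stem + 1):
            left = seq[start:start + stem]
            for spacer in range(max_spacer + 1):
                right_start = start + stem + spacer
                if right_start + stem > n:
                    break  # larger spacers only move further past the end
                if left == revcomp(seq[right_start:right_start + stem]):
                    hits.append((start, stem, spacer))
    return hits

def positions_in_inverted_repeats(hits):
    """Set of 0-based positions covered by any reported repeat."""
    covered = set()
    for start, stem, spacer in hits:
        covered.update(range(start, start + 2 * stem + spacer))
    return covered
```

A mutation at 0-based position `p` then lies in an inverted repeat if `p in positions_in_inverted_repeats(hits)`; enrichment would still need to be tested against the fraction of the genome the repeats cover.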
Subject(s)
Mutation , SARS-CoV-2/genetics , COVID-19/virology , Genome, Viral , Humans
ABSTRACT
G-quadruplexes contribute to the regulation of key molecular processes, and their use in antiviral therapy is an emerging field of contemporary research. Here we present comprehensive analyses of the presence and localization of putative G-quadruplex forming sequences (PQS) in all viral genomes currently available in the NCBI database (including subviral agents). The G4Hunter algorithm was applied to a pool of 11,000 accessible viral genomes representing 350 Mbp in total. PQS frequencies differ across evolutionary groups of viruses, and PQS are enriched in repeats, replication origins, 5'UTRs, and 3'UTRs. Importantly, PQS presence and localization are connected to viral lifecycles and correspond to the type of viral infection rather than to nucleic acid type: viruses routinely causing persistent infections in metazoan hosts are enriched for PQS, whereas viruses causing acute infections are significantly depleted of PQS. The unique localization of PQS highlights the importance of G-quadruplex-based regulation of viral replication and life cycle, providing a tool for potential therapeutic targeting.
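G4Hunter scores each base by the run it belongs to: a G in a run of length n scores n (capped at 4), a C scores -n (capped at 4), and other bases score 0; the mean score is then computed in sliding windows, and windows whose absolute mean reaches a threshold are flagged. A minimal Python sketch, assuming a 25-nt window and a threshold of 1.2 (commonly used settings, not stated in this abstract); it is not the reference G4Hunter code.

```python
import re

def g4hunter_base_scores(seq: str) -> list[int]:
    """Per-base G4Hunter-style scores: G runs score +run length, C runs
    score -run length (both capped at 4); all other bases score 0."""
    scores = []
    for match in re.finditer(r"G+|C+|[^GC]+", seq.upper()):
        run = match.group()
        if run[0] == "G":
            scores.extend([min(len(run), 4)] * len(run))
        elif run[0] == "C":
            scores.extend([-min(len(run), 4)] * len(run))
        else:
            scores.extend([0] * len(run))
    return scores

def g4hunter_windows(seq: str, window: int = 25, threshold: float = 1.2):
    """Return (start, mean score) for every window with |mean| >= threshold."""
    scores = g4hunter_base_scores(seq)
    hits = []
    running = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # slide the window by one base: add the new base, drop the old one
            running += scores[start + window - 1] - scores[start - 1]
        mean = running / window
        if abs(mean) >= threshold:
            hits.append((start, round(mean, 2)))
    return hits
```

Negative means flag C-rich windows, i.e. G4-prone sequence on the opposite strand.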
Subject(s)
Databases, Nucleic Acid , G-Quadruplexes , Genome, Viral , Virus Diseases , Viruses , DNA, Viral/genetics , DNA, Viral/metabolism , Humans , Virus Diseases/genetics , Virus Diseases/metabolism , Viruses/genetics , Viruses/metabolism
ABSTRACT
The importance of G-quadruplex-based gene expression regulation in viruses may point to its potential use in therapeutic targeting. Here, we present analyses of the occurrence of putative G-quadruplex-forming sequences (PQS) in all reference viral dsDNA genomes and, using the G4Hunter tool, evaluate their dependence on PQS occurrence in host organisms. PQS frequencies differ across host taxa independently of GC content. Overlaying PQS with annotated regions reveals their localization in specific regions. While enrichment in some regions, such as repeat regions, is shared by all groups, others are unique: PQS are abundant within introns of eukaryote-infecting viruses but depleted in introns of bacteria-infecting viruses. We reveal a significant positive correlation between PQS frequencies in dsDNA viruses and their corresponding hosts from archaea, bacteria, and eukaryotes. A strong relationship between PQS in a virus and its host indicates their close coevolution and evolutionarily reciprocal mimicking of genome organization.
Subject(s)
Computational Biology/methods , DNA/genetics , G-Quadruplexes , Genome, Viral , Viral Proteins/genetics , Archaea/virology , Bacteria/virology , Gene Expression Regulation , Genome , Humans , Viruses/genetics
ABSTRACT
Nucleic acid-binding proteins are traditionally divided into two categories: those that bind DNA and those that bind RNA. In light of new knowledge, this categorization should be abandoned, because a large proportion of proteins can bind both DNA and RNA. Even more important features of nucleic acid-binding proteins are their so-called sequence or structure specificities. Proteins able to bind nucleic acids in a sequence-specific manner usually contain one or more well-defined structural motifs (zinc fingers, leucine zippers, helix-turn-helix, or helix-loop-helix). In contrast, many proteins do not recognize the nucleic acid sequence but rather local DNA or RNA structures (G-quadruplexes, i-motifs, triplexes, cruciforms, left-handed DNA/RNA forms, and others). Finally, some proteins recognize both sequence and local structural properties of nucleic acids (e.g., the well-known tumor suppressor p53). In this mini-review, we summarize current knowledge about the amino acid composition of various types of nucleic acid-binding proteins, with a special focus on significant enrichment and/or depletion in each category.
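Enrichment or depletion of an amino acid in a protein category can be expressed as the log2 ratio of its frequency in that category to its frequency in a reference set. A minimal Python sketch; the choice of reference set and the small pseudocount that avoids log(0) are assumptions, not the method used in the review.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seqs):
    """Relative frequency of the 20 standard amino acids across `seqs`."""
    counts = Counter()
    for seq in seqs:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def log2_enrichment(category_seqs, reference_seqs, pseudocount=1e-6):
    """log2(frequency in category / frequency in reference) per residue.

    Positive values indicate enrichment, negative values depletion.
    """
    cat = composition(category_seqs)
    ref = composition(reference_seqs)
    return {aa: math.log2((cat[aa] + pseudocount) / (ref[aa] + pseudocount))
            for aa in AMINO_ACIDS}
```

Whether a given log2 ratio is significant still requires a statistical test (for example, on the underlying counts), which this sketch omits.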
Subject(s)
DNA-Binding Proteins/genetics , DNA/ultrastructure , Nucleic Acid Conformation , RNA/ultrastructure , Amino Acid Sequence/genetics , Carrier Proteins/genetics , Carrier Proteins/ultrastructure , DNA/genetics , DNA, Z-Form , G-Quadruplexes , Humans , Leucine Zippers/genetics , Nucleoproteins/genetics , Nucleoproteins/ultrastructure , RNA/chemistry , Zinc Fingers/genetics
ABSTRACT
G-quadruplexes are four-stranded nucleic acid structures occurring in the genomes of all living organisms and viruses. It is increasingly evident that these structures play important molecular roles, generally by modulating gene expression and overall genome integrity. For a long time, G-quadruplexes were studied specifically in the context of human promoters, telomeres, and associated diseases (cancers, neurological disorders). Several G-quadruplex-binding proteins are known, providing promising targets for influencing G-quadruplex-related processes in organisms. In plants, however, only a small number of G-quadruplex-binding proteins have been described to date. We therefore inspected the available protein sequences bioinformatically to find the best protein candidates with the potential to bind G-quadruplexes. Two similar glycine- and arginine-rich G-quadruplex-binding motifs have been described in humans: the so-called "RGG motif" (RRGDGRRRGGGGRGQGGRGRGGGFKG) and the recently described "NIQI motif" (RGRGRGRGGGSGGSGGRGRG). Building on this knowledge, we searched for plant proteins containing these motifs using two independent approaches (BLASTp and FIMO scanning) and found many proteins containing the G4-binding motif(s). Our research also revealed the core proteins involved in G4 folding and resolving in green plants, algae, and the key plant model organism Arabidopsis thaliana. The discovered protein candidates were annotated using STRINGdb and sorted by their molecular and physiological roles in simple schemes. Our results point to a significant role of G4-binding proteins in the regulation of gene expression in plants.
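As a crude, hypothetical stand-in for motif scanning (it is neither BLASTp nor FIMO, which scores position weight matrices), RG-rich stretches can be flagged by counting RG dipeptides and glycine content in sliding windows. The window size and cut-offs below are assumptions, chosen so that both human motifs quoted in the abstract are detected.

```python
def rg_rich_windows(protein: str, window: int = 20, min_rg: int = 4,
                    min_gly_frac: float = 0.3):
    """Start positions of windows with >= min_rg 'RG' dipeptides and a
    glycine fraction >= min_gly_frac (hypothetical thresholds)."""
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - window + 1):
        w = protein[start:start + window]
        # 'RG' cannot overlap itself, so str.count gives the exact number
        if w.count("RG") >= min_rg and w.count("G") / window >= min_gly_frac:
            hits.append(start)
    return hits
```

Run over a proteome, adjacent hit positions would be merged into candidate regions before any further annotation; a PWM-based scanner such as FIMO additionally weighs each residue by position rather than using simple counts.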
ABSTRACT
SARS-CoV-2 is an intensively investigated virus from the order Nidovirales (Coronaviridae family) that causes COVID-19 in humans. Through enormous scientific effort, thousands of viral strains have been sequenced to date, providing a strong basis for in-depth bioinformatic studies of the SARS-CoV-2 genome. In this study, we inspected high-frequency mutations of SARS-CoV-2 and systematically analyzed their overlap with inverted repeat (IR) loci and CpG islands. The main conclusion of our study is that SARS-CoV-2 hot-spot mutations are significantly enriched within both IR and CpG island loci. This points to a role of these loci in genomic instability and may help predict further mutational drive in the SARS-CoV-2 genome. Moreover, CpG islands are strongly enriched upstream of viral ORFs and thus could play important roles in transcription and the viral life cycle. We hypothesize that hypermethylation of these loci will decrease the transcription of viral ORFs and could therefore limit the progression of the disease.
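CpG enrichment is usually judged by the observed/expected CpG ratio, (CpG count × length) / (C count × G count), combined with GC-content and length cut-offs; the classical Gardiner-Garden and Frommer thresholds are roughly 200 bp or longer, GC content of 50% or more, and an observed/expected ratio of 0.6 or more. A minimal Python sketch that tests a single window against those thresholds (the abstract does not state which CpG-island definition the study used):

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # 'CG' cannot overlap itself, so str.count gives the exact dinucleotide count
    return seq.count("CG") * len(seq) / (c * g)

def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply Gardiner-Garden & Frommer-style thresholds to one window."""
    seq = seq.upper()
    if len(seq) < min_len:
        return False
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return gc >= min_gc and cpg_obs_exp(seq) >= min_obs_exp
```

Genome-wide, the check would be applied in sliding windows and adjacent passing windows merged into islands.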
Subject(s)
COVID-19/virology , CpG Islands , Mutation , SARS-CoV-2/genetics , DNA Methylation , Genome, Viral , Humans , Protein Binding
ABSTRACT
The importance of unusual DNA structures in the regulation of basic cellular processes is an emerging field of research. Among local non-B DNA structures, G-quadruplexes (G4s) have gained popularity during the last decade, and their presence and functional relevance at the DNA and RNA levels have been demonstrated in a number of viral, bacterial, and eukaryotic genomes, including humans. Here, we performed the first systematic search for G4-forming sequences in all archaeal genomes available in the NCBI database, investigating the presence and locations of G-quadruplex forming sequences using the G4Hunter algorithm. G-quadruplex-prone sequences were identified in all archaeal species, with highly significant differences in frequency, from 0.037 to 15.31 potential quadruplex sequences per kb. While G4-forming sequences were extremely abundant in Hadesarchaea archaeon (strikingly, more than 50% of the Hadesarchaea archaeon isolate WYZ-LMO6 genome is potentially part of a G4 motif), they were very rare in the phylum Parvarchaeota. G-quadruplex forming sequences are not randomly distributed; they are over-represented in non-coding RNA, suggesting possible roles in ncRNA regulation. These data illustrate the unique and non-random localization of G-quadruplexes in Archaea.
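Frequencies such as "potential quadruplex sequences per kb" are typically obtained by merging overlapping above-threshold G4Hunter windows into single PQS and dividing the count by the genome length in kb. A minimal Python sketch, assuming the flagged window start positions have already been produced by a scorer and a 25-nt window; the function names are hypothetical.

```python
def merge_windows(starts, window=25):
    """Merge overlapping flagged windows [start, start + window) into regions."""
    regions = []
    for s in sorted(starts):
        end = s + window
        if regions and s < regions[-1][1]:  # overlaps the previous region
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([s, end])
    return [tuple(r) for r in regions]

def pqs_per_kb(n_pqs: int, genome_length_bp: int) -> float:
    """Frequency of putative quadruplex sequences per 1,000 bp."""
    return n_pqs / (genome_length_bp / 1000)
```

Here windows that merely touch (one ends where the next begins) are kept as separate PQS; merging them instead would be an equally defensible convention and would slightly lower the counts.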