ABSTRACT
Asgard archaea are of great interest as the progenitors of eukaryotes, but little is known about the mobile genetic elements (MGEs) that may shape their ongoing evolution. Here, we describe MGEs that replicate in Atabeyarchaeia, a wetland Asgard archaea lineage represented by two complete genomes. We used soil depth-resolved population metagenomic data sets to track 18 MGEs for which genome structures were defined and precise chromosome integration sites could be identified for confident host linkage. Additionally, we identified a complete 20.67 kbp circular plasmid and two family-level groups of viruses linked to Atabeyarchaeia via CRISPR spacer targeting. Closely related 40 kbp viruses possess a hypervariable genomic region encoding combinations of specific genes for small cysteine-rich proteins structurally similar to restriction-homing endonucleases. One 10.9 kbp integrative conjugative element (ICE) integrates into the Atabeyarchaeum deiterrae-1 chromosome and has a 2.5 kbp circularizable element integrated within it. The 10.9 kbp ICE encodes an expressed Type IIG restriction-modification system whose sequence specificity matches an active methylation motif identified by Pacific Biosciences (PacBio) high-accuracy long-read (HiFi) metagenomic sequencing. Restriction-modification in Atabeyarchaeia differs from that of another coexisting Asgard archaeal lineage, Freyarchaeia, which has few identified MGEs but possesses diverse defense mechanisms, including DISARM and Hachiman, not found in Atabeyarchaeia. Overall, defense systems and methylation mechanisms of Asgard archaea likely modulate their interactions with MGEs, and integration/excision and copy number variation of MGEs in turn enable host genetic versatility.
ABSTRACT
The amount of bacterial and archaeal genome sequence and methylome data has greatly increased over the last decade, enabling new insights into the functional roles of DNA methylation in these organisms. Methyltransferases (MTases), the enzymes responsible for DNA methylation, are exchanged between prokaryotes through horizontal gene transfer and can function either as part of restriction-modification systems or in apparent isolation as single (orphan) genes. The patterns of DNA methylation they confer on the host chromosome can have significant effects on gene expression, DNA replication, and other cellular processes. Some processes require very stable patterns of methylation, resulting in conservation of persistent MTases in a particular lineage. Other processes require patterns that are more dynamic yet more predictable than what is afforded by horizontal gene transfer and gene loss, resulting in phase-variable or recombination-driven MTase alleles. In this review, we discuss what is currently known about the functions of DNA methylation in prokaryotes in light of these evolutionary patterns.
Subject(s)
DNA Methylation , Epigenomics , DNA Restriction-Modification Enzymes/genetics , DNA Restriction-Modification Enzymes/metabolism , Methyltransferases/genetics , Methyltransferases/metabolism , Prokaryotic Cells/metabolism
ABSTRACT
REBASE is a comprehensive and extensively curated database of information about the components of restriction-modification (RM) systems. It is fully referenced and provides information about the recognition and cleavage sites for both restriction enzymes and DNA methyltransferases together with their commercial availability, methylation sensitivity, crystal and sequence data. All completely sequenced genomes and select shotgun sequences are analyzed for RM system components. When PacBio sequence data is available, the recognition sequences of many DNA methyltransferases (MTases) can be determined. This has led to an explosive growth in the number of well-characterized MTases in REBASE. The contents of REBASE may be browsed from the web rebase.neb.com and selected compilations can be downloaded by FTP (ftp.neb.com). Monthly updates are also available via email.
Subject(s)
DNA Methylation , DNA Modification Methylases , Databases, Factual , DNA Restriction Enzymes/metabolism , DNA Modification Methylases/metabolism , DNA/genetics , DNA Restriction-Modification Enzymes/genetics
ABSTRACT
How do we scale biological science to meet the demands of next-generation biology and medicine while keeping track of facts, predictions, and hypotheses? Enormous amounts of DNA sequence and other omics data are now generated. Because these data contain the blueprint for life, it is imperative that we interpret them accurately. The abundance of DNA sequence data is only one part of the challenge. Artificial intelligence (AI) and network methods routinely build on large screens, single-cell technologies, proteomics, and other modalities to infer or predict biological functions and phenotypes associated with proteins, pathways, and organisms. As a first step, how do we systematically trace the provenance of knowledge from experimental ground truth to gene function predictions and annotations? Here, we review the main challenges in tracking the evolution of biological knowledge and propose several specific solutions to provenance and computational tracing of evidence in functional linkage networks.
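The following is a minimal sketch of one way to record provenance. It assumes a hypothetical model in which every annotation stores its evidence type and the annotations it was derived from. The `Annotation` class, the `ground_truth_sources` function and the gene names are illustrative only and are not part of any existing resource. Under that assumption, tracing a prediction back to its experimental support reduces to a graph traversal:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity semantics: derivation graphs may share nodes
class Annotation:
    """Hypothetical record: a function assigned to a gene plus its evidence trail."""
    gene: str
    function: str
    evidence: str  # assumed vocabulary: "experimental" or "predicted"
    derived_from: list["Annotation"] = field(default_factory=list)


def ground_truth_sources(annotation: Annotation) -> list[Annotation]:
    """Walk the derivation graph depth-first and return the experimental
    annotations that ultimately support `annotation`, each listed once."""
    seen: set[int] = set()
    roots: list[Annotation] = []
    stack = [annotation]
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        if current.evidence == "experimental":
            roots.append(current)  # experimental ground truth ends this path
        else:
            stack.extend(current.derived_from)  # keep following predictions
    return roots


if __name__ == "__main__":
    assay_a = Annotation("geneA", "DNA methyltransferase", "experimental")
    assay_b = Annotation("geneB", "DNA methyltransferase", "experimental")
    homolog = Annotation("geneC", "DNA methyltransferase", "predicted", [assay_a])
    network = Annotation("geneD", "DNA methyltransferase", "predicted",
                         [homolog, assay_b, assay_a])
    print(sorted(a.gene for a in ground_truth_sources(network)))
```

A prediction with no path to experimental evidence returns an empty list. This makes unsupported annotations easy to flag.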
Subject(s)
Big Data , Gene Regulatory Networks , Genomics/statistics & numerical data , Algorithms , Artificial Intelligence , Computational Biology , Genetic Linkage , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Proteomics/statistics & numerical data , Synthetic Biology , Systems Biology
ABSTRACT
DNA methylation is widespread among eukaryotes and prokaryotes, where it modulates gene expression and confers viral resistance. 5-Methylcytosine (m5C) methylation has been described in the genomes of a large fraction of bacterial species as part of restriction-modification systems, each composed of a methyltransferase and a cognate restriction enzyme. Methylases are site-specific, and their target sequences vary across organisms. High-throughput methods such as bisulfite sequencing can identify m5C at base resolution but require specialized library preparations, while single-molecule real-time (SMRT) sequencing usually misses m5C. Here, we present a new method called RIMS-seq (rapid identification of methylase specificity) that simultaneously sequences bacterial genomes and determines m5C methylase specificities using a simple experimental protocol that closely resembles the Illumina DNA-seq protocol. Importantly, the resulting sequencing quality is identical to that of DNA-seq, so RIMS-seq can substitute for standard sequencing of bacterial genomes. Applied to bacteria and synthetic mixed communities, RIMS-seq reveals new methylase specificities, supporting routine study of m5C methylation while sequencing new genomes.
Subject(s)
5-Methylcytosine/metabolism , DNA Modification Methylases/metabolism , DNA Restriction Enzymes/metabolism , Escherichia coli K12/genetics , Genome, Bacterial , High-Throughput Nucleotide Sequencing/methods , Acinetobacter calcoaceticus/enzymology , Acinetobacter calcoaceticus/genetics , Aeromonas hydrophila/enzymology , Aeromonas hydrophila/genetics , Bacillus amyloliquefaciens/enzymology , Bacillus amyloliquefaciens/genetics , Base Sequence , Clostridium acetobutylicum/enzymology , Clostridium acetobutylicum/genetics , DNA Methylation , DNA Modification Methylases/genetics , DNA Restriction Enzymes/genetics , Escherichia coli K12/enzymology , Gene Expression Regulation, Bacterial , Haemophilus/enzymology , Haemophilus/genetics , Haemophilus influenzae/enzymology , Haemophilus influenzae/genetics , Humans , Microbiota/genetics , Sequence Analysis, DNA , Skin/microbiology
ABSTRACT
Analysis of genomic DNA from pathogenic strains of Burkholderia cenocepacia J2315 and Escherichia coli O104:H4 revealed the presence of two unusual MTase genes. Both are plasmid-borne ORFs, carried by pBCA072 for B. cenocepacia J2315 and pESBL for E. coli O104:H4. Pacific Biosciences SMRT sequencing was used to investigate DNA methyltransferases M.BceJIII and M.EcoGIX, using artificial constructs. Mating properties of engineered pESBL derivatives were also investigated. Both MTases yield promiscuous m6A modification of single strands, in the context SAY (where S = C or G and Y = C or T). Strikingly, this methylation is asymmetric in vivo, detected almost exclusively on one DNA strand, and is incomplete: typically, around 40% of susceptible motifs are modified. Genetic and biochemical studies suggest that enzyme action depends on replication mode: DNA Polymerase I (PolI)-dependent ColE1 and p15A origins support asymmetric modification, while the PolI-independent pSC101 origin does not. An MTase-PolI complex may enable discrimination of PolI-dependent and independent plasmid origins. M.EcoGIX helps to establish pESBL in new hosts by blocking the action of restriction enzymes, in an orientation-dependent fashion. Expression and action appear to occur on the entering single strand in the recipient, early in conjugal transfer, until lagging-strand replication creates the double-stranded form.
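The SAY context above uses IUPAC degenerate codes (S = C or G, Y = C or T). The minimal Python sketch below scans one strand for such a motif and scores the fraction of susceptible sites reported as methylated. The function names and the toy sequence are hypothetical. The per-site methylation calls are assumed to be supplied by an upstream caller.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes (subset).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'SAY' into a regular expression."""
    return "".join(IUPAC[base] for base in motif.upper())


def target_positions(seq: str, motif: str = "SAY", offset: int = 1) -> list[int]:
    """0-based positions of the target base (at `offset` within the motif) for
    every match on this strand only; the lookahead lets overlapping matches count."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() + offset for m in pattern.finditer(seq.upper())]


def fraction_modified(sites: list[int], methylated: set[int]) -> float:
    """Share of susceptible sites that appear in a set of methylation calls."""
    if not sites:
        return 0.0
    return sum(1 for s in sites if s in methylated) / len(sites)


sites = target_positions("GACTCAGGAT")
print(sites)                          # [1, 8]
print(fraction_modified(sites, {1}))  # 0.5
```

SAY is not palindromic, so SAY sites on the opposite strand appear as RTS on the strand being scanned. Scanning a single strand therefore mirrors the strand-asymmetric methylation described above.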
Subject(s)
DNA Methylation/genetics , DNA Polymerase I/genetics , DNA, Single-Stranded/genetics , Methyltransferases/genetics , Bacterial Proteins/genetics , Burkholderia cenocepacia/genetics , DNA Replication/genetics , Escherichia coli O104/genetics , Escherichia coli Proteins/genetics , Genome, Bacterial/genetics , Plasmids/genetics , Ribosomal Proteins/genetics
ABSTRACT
HhaI, a Type II restriction endonuclease, recognizes the symmetric sequence 5'-GCG↓C-3' in duplex DNA and cleaves it (at '↓') to produce fragments with 2-base 3'-overhangs. We determined the structure of HhaI in complex with cognate DNA at an ultra-high atomic resolution of 1.0 Å. Most restriction enzymes act as dimers with two catalytic sites and cleave the two strands of duplex DNA simultaneously, in a single binding event. HhaI, in contrast, acts as a monomer with only one catalytic site and cleaves the DNA strands sequentially, one after the other. HhaI comprises three domains, each consisting of a mixed five-stranded β-sheet with a defined function. The first domain contains the catalytic site, the second contains residues for sequence recognition, and the third contributes to non-specific DNA binding. The active site belongs to the 'PD-D/EXK' superfamily of nucleases and contains the motif SD-X11-EAK. The first two domains are similar in structure to two other monomeric restriction enzymes, HinP1I (G↓CGC) and MspI (C↓CGG), which produce fragments with 5'-overhangs. The third domain, present only in HhaI, shifts the positions of the recognition residues relative to the catalytic site, enabling this enzyme to cleave the recognition sequence at a different position. The structure of M.HhaI, the biological methyltransferase partner of HhaI, was determined earlier. Together, these two structures represent the first natural pair of restriction-modification enzymes to be characterized in atomic detail.
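Whether a cut leaves a 5' or a 3' overhang follows from where the top-strand cut sits in a palindromic site. By symmetry, the bottom strand is cut at the mirror position. A cut past the midpoint (HhaI, GCG↓C) therefore leaves a 3' overhang, and a cut before it (HinP1I, G↓CGC; MspI, C↓CGG) leaves a 5' overhang. The small Python sketch below applies that reasoning. It uses a hypothetical `cut_geometry` helper that takes a site written with `^` at the top-strand cut, and it covers palindromic sites only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_geometry(site: str) -> tuple[str, int, str]:
    """Return (overhang type, length, overhang bases on the top strand) for a
    palindromic site such as 'GCG^C', where '^' marks the top-strand cut."""
    top = site.index("^")          # bases of the site lying 5' of the top cut
    seq = site.replace("^", "").upper()
    if seq != revcomp(seq):
        raise ValueError("site must be palindromic for the mirror-cut rule")
    bottom = len(seq) - top        # bottom-strand cut, in top-strand coordinates
    if top == bottom:
        return ("blunt", 0, "")
    start, end = sorted((top, bottom))
    kind = "3'" if top > bottom else "5'"
    return (kind, end - start, seq[start:end])


for name, site in [("HhaI", "GCG^C"), ("HinP1I", "G^CGC"), ("MspI", "C^CGG")]:
    print(name, cut_geometry(site))
```

With these inputs, the sketch reports a 2-base 3' CG overhang for HhaI and 2-base 5' CG overhangs for HinP1I and MspI.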
Subject(s)
DNA/ultrastructure , Deoxyribonucleases, Type II Site-Specific/ultrastructure , Nucleic Acid Conformation , Protein Conformation , Catalytic Domain , Crystallography, X-Ray , DNA/chemistry , DNA/genetics , DNA Restriction Enzymes/chemistry , DNA Restriction Enzymes/genetics , DNA Restriction Enzymes/ultrastructure , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/ultrastructure , Deoxyribonucleases, Type II Site-Specific/chemistry , Deoxyribonucleases, Type II Site-Specific/genetics , Haemophilus/chemistry , Haemophilus/enzymology , Protein Binding/genetics
ABSTRACT
The genomes of gut Bacteroidales contain numerous invertible regions, many of which contain promoters that dictate phase-variable synthesis of surface molecules such as polysaccharides, fimbriae, and outer surface proteins. Here, we characterize a different type of phase-variable system of Bacteroides fragilis, a Type I restriction-modification (R-M) system. We show that reversible DNA inversions within this R-M locus lead to the generation of eight specificity proteins with distinct recognition sites. Bacteria grown in vitro have a different proportion of specificity gene combinations at the expression locus than bacteria isolated from the mammalian gut. By creating mutants, each able to produce only one specificity protein from this region, we identified the R-M recognition sites of four of these S-proteins using SMRT sequencing. Transcriptome analysis revealed that the locked specificity mutants, whether grown in vitro or isolated from the mammalian gut, have distinct transcriptional profiles, likely creating different phenotypes, one of which was confirmed. Genomic analyses of diverse strains of Bacteroidetes from both host-associated and environmental sources reveal the ubiquity of phase-variable R-M systems in this phylum.
Subject(s)
Bacterial Proteins/metabolism , Bacteroides fragilis/enzymology , DNA Restriction-Modification Enzymes/metabolism , Gastrointestinal Microbiome , Animals , Bacterial Proteins/genetics , DNA Restriction-Modification Enzymes/genetics , Humans , Mice , Mutation , Transcriptome
ABSTRACT
Type I restriction-modification (R-M) systems consist of a DNA endonuclease (HsdR, HsdM and HsdS subunits) and a methyltransferase (HsdM and HsdS subunits). In certain Type I R-M systems, hsdS sequences flanked by inverted repeats (referred to as epigenetic invertons) undergo invertase-catalyzed inversions. Previous studies in Streptococcus pneumoniae have shown that hsdS inversions within clonal populations produce subpopulations with profound differences in methylome, cellular physiology and virulence. In this study, we bioinformatically identified six major clades of tyrosine- and serine-family invertase homologs from 16 bacterial phyla that potentially catalyze hsdS inversions in epigenetic invertons. In particular, epigenetic invertons are highly enriched in host-associated bacteria. We further verified hsdS inversions in the Type I R-M systems of four representative host-associated bacteria and found that each of the resulting hsdS allelic variants specifies methylation of a unique DNA sequence. In addition, transcriptome analysis revealed that hsdS allelic variation in Enterococcus faecalis has a significant impact on gene expression. These findings indicate that invertase-driven epigenetic switches in epigenetic invertons operate broadly in host-associated bacteria and may contribute to bacterial host adaptation and virulence beyond the role of Type I R-M systems in defense against phage infection.
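The inversion itself can be pictured as reverse-complementing the segment between an inverted-repeat pair, which is why repeated switching restores the original allele. The Python sketch below models only that outcome on a toy sequence. The repeat `ACCGTA` and the helper name are hypothetical. The sketch does not model invertase chemistry, recombination within the repeats, or loci carrying several repeat pairs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_between_repeats(seq: str, repeat: str) -> str:
    """Reverse-complement the segment lying between `repeat` and its
    downstream inverted copy, leaving both repeats in place."""
    left = seq.find(repeat)
    if left == -1:
        raise ValueError("left repeat not found")
    inner_start = left + len(repeat)
    right = seq.find(revcomp(repeat), inner_start)
    if right == -1:
        raise ValueError("inverted copy of the repeat not found")
    return seq[:inner_start] + revcomp(seq[inner_start:right]) + seq[right:]


locus = "TT" + "ACCGTA" + "GGGCAA" + "TACGGT" + "CC"  # toy inverton
flipped = invert_between_repeats(locus, "ACCGTA")
print(flipped)                                            # TTACCGTATTGCCCTACGGTCC
print(invert_between_repeats(flipped, "ACCGTA") == locus)  # True
```

Applying the function twice restores the starting sequence, mirroring the reversibility of the switch.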
Subject(s)
Bacterial Proteins/genetics , DNA Restriction-Modification Enzymes/genetics , Epigenesis, Genetic , Gene Expression Regulation, Bacterial , Bacteroides fragilis/genetics , DNA Methylation , DNA, Bacterial/chemistry , Enterococcus faecalis/genetics , Inverted Repeat Sequences , Streptococcus agalactiae/genetics , Treponema denticola/genetics
ABSTRACT
We describe the cloning, expression and characterization of the first truly non-specific adenine DNA methyltransferase, M.EcoGII. It is encoded in the genome of the pathogenic strain Escherichia coli O104:H4 C227-11, where it appears to reside on a cryptic prophage, but it is not expressed. However, when the gene encoding M.EcoGII is expressed in vivo, using a high-copy pRRS plasmid vector and a methylation-deficient E. coli host, extensive in vivo adenine methylation activity is revealed. M.EcoGII methylates adenine residues in any DNA sequence context, and this activity extends to dA and rA bases in either strand of a DNA:RNA-hybrid oligonucleotide duplex and to rA bases in RNAs prepared by in vitro transcription. Using oligonucleotide and bacteriophage M13mp18 virion DNA substrates, we find that M.EcoGII also methylates single-stranded DNA in vitro and that this activity is only slightly less robust than that observed with equivalent double-stranded DNAs. In vitro assays using purified recombinant M.EcoGII enzyme demonstrate that up to 99% of dA bases in duplex DNA substrates can be methylated, rendering them insensitive to cleavage by multiple restriction endonucleases. These properties suggest that the enzyme could also be used for high-resolution mapping of protein-binding sites in DNA and RNA substrates.
Subject(s)
DNA Restriction Enzymes/metabolism , Escherichia coli/genetics , Prophages/enzymology , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism , Adenine/metabolism , Base Sequence , DNA Methylation , DNA Restriction Enzymes/genetics , DNA, Single-Stranded/genetics , DNA, Single-Stranded/metabolism , Electrophoresis, Polyacrylamide Gel , Escherichia coli/virology , Prophages/genetics , Protein Binding , RNA, Double-Stranded/genetics , RNA, Double-Stranded/metabolism , Site-Specific DNA-Methyltransferase (Adenine-Specific)/genetics , Substrate Specificity
ABSTRACT
The creation of restriction enzymes with programmable DNA-binding and -cleavage specificities has long been a goal of modern biology. The recently discovered Type IIL MmeI family of restriction-and-modification (RM) enzymes that possess a shared target recognition domain provides a framework for engineering such new specificities. However, a lack of structural information on Type IIL enzymes has limited the repertoire that can be rationally engineered. We report here a crystal structure of MmeI in complex with its DNA substrate and an S-adenosylmethionine analog (Sinefungin). The structure uncovers for the first time the interactions that underlie MmeI-DNA recognition and methylation (5'-TCCRAC-3'; R = purine) and provides a molecular basis for changing specificity at four of the six base pairs of the recognition sequence (5'-TCCRAC-3'). Surprisingly, the enzyme is resilient to specificity changes at the first position of the recognition sequence (5'-TCCRAC-3'). Collectively, the structure provides a basis for engineering further derivatives of MmeI and delineates which base pairs of the recognition sequence are more amenable to alterations than others.
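TCCRAC is asymmetric, so a genome scan for MmeI sites must search both strands. Sites on the bottom strand appear on the top strand as the reverse complement, GTYGGA, because IUPAC complementation swaps R (purine) and Y (pyrimidine). The Python sketch below uses hypothetical helper names and a toy sequence. It is an illustration under those assumptions, not an established tool.

```python
import re

IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}
# IUPAC complements: R<->Y and K<->M swap; S, W and N are self-complementary.
IUPAC_COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def revcomp(motif: str) -> str:
    """Reverse complement of an IUPAC motif."""
    return motif.upper().translate(IUPAC_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> list[tuple[int, str]]:
    """(0-based top-strand start, strand) for every match of an IUPAC motif."""
    queries = [("+", motif.upper())]
    if revcomp(motif) != motif.upper():  # palindromes would be reported twice
        queries.append(("-", revcomp(motif)))
    hits = []
    for strand, query in queries:
        regex = re.compile("(?=" + "".join(IUPAC_REGEX[b] for b in query) + ")")
        hits.extend((m.start(), strand) for m in regex.finditer(seq.upper()))
    return sorted(hits)


print(revcomp("TCCRAC"))                          # GTYGGA
print(find_sites("AATCCGACTTGTTGGAA", "TCCRAC"))  # [(2, '+'), (10, '-')]
```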
Subject(s)
DNA/chemistry , Deoxyribonucleases, Type II Site-Specific/chemistry , Base Sequence , DNA Methylation , Hydrolysis , Molecular Sequence Data
ABSTRACT
Staphylococcus aureus displays a clonal population structure in which horizontal gene transfer between different lineages is extremely rare. This is due, in part, to the presence of a Type I DNA restriction-modification (RM) system given the generic name of Sau1, which maintains different patterns of methylation on specific target sequences on the genomes of different lineages. We have determined the target sequences recognized by the Sau1 Type I RM systems present in a wide range of the most prevalent S. aureus lineages and assigned the sequences recognized to particular target recognition domains within the RM enzymes. We used a range of biochemical assays on purified enzymes and single molecule real-time sequencing on genomic DNA to determine these target sequences and their patterns of methylation. Knowledge of the main target sequences for Sau1 will facilitate the synthesis of new vectors for transformation of the most prevalent lineages of this 'untransformable' bacterium.
Subject(s)
DNA Modification Methylases/chemistry , DNA Modification Methylases/metabolism , Deoxyribonucleases, Type I Site-Specific/chemistry , Deoxyribonucleases, Type I Site-Specific/metabolism , Staphylococcus aureus/enzymology , Amino Acid Sequence , DNA/chemistry , DNA/metabolism , Protein Domains , Sequence Analysis, DNA , Staphylococcus aureus/genetics , Transformation, Bacterial
ABSTRACT
Two restriction-modification systems have been previously discovered in Thermus aquaticus YT-1. TaqI is a 263-amino acid (aa) Type IIP restriction enzyme that recognizes and cleaves within the symmetric sequence 5'-TCGA-3'. TaqII, in contrast, is a 1105-aa Type IIC restriction-and-modification enzyme, one of a family of Thermus homologs. TaqII was originally reported to recognize two different asymmetric sequences: 5'-GACCGA-3' and 5'-CACCCA-3'. We previously cloned the taqIIRM gene, purified the recombinant protein from Escherichia coli, and showed that TaqII recognizes the 5'-GACCGA-3' sequence only. Here, we report the discovery, isolation, and characterization of TaqIII, the third R-M system from T. aquaticus YT-1. TaqIII is a 1101-aa Type IIC/IIL enzyme and recognizes the 5'-CACCCA-3' sequence previously attributed to TaqII. The cleavage site is 11/9 nucleotides downstream of the A residue. The enzyme exhibits striking biochemical similarity to TaqII. The 93% identity between their aa sequences suggests that they have a common evolutionary origin. The genes are located on two separate plasmids, and are probably paralogs or pseudoparalogs. Putative positions and aa that specify DNA recognition were identified and recognition motifs for 6 uncharacterized Thermus-family enzymes were predicted.
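The "11/9" notation gives cut distances measured from the 3' end of the site (the final A of CACCCA): 11 nt on the top strand and 9 nt on the bottom strand. The top-strand cut lies 2 nt further downstream, so the top strand of the upstream fragment protrudes, giving a 2-nt 3' overhang. The Python sketch below works through that arithmetic. It assumes the site is matched on the top strand only and uses a hypothetical helper name and toy sequence.

```python
def downstream_cuts(seq: str, site: str, top: int = 11, bottom: int = 9):
    """For the first top-strand match of `site`, return
    (top-strand cut, bottom-strand cut, overhang type, overhang bases),
    with each cut given as the number of top-strand bases to its left.
    Returns None if the site is absent."""
    seq, site = seq.upper(), site.upper()
    start = seq.find(site)
    if start == -1:
        return None
    end = start + len(site)            # just past the site's 3'-terminal base
    top_cut, bottom_cut = end + top, end + bottom
    if max(top_cut, bottom_cut) > len(seq):
        raise ValueError("cut falls beyond the end of the sequence")
    if top_cut == bottom_cut:
        return (top_cut, bottom_cut, "blunt", "")
    kind = "3'" if top_cut > bottom_cut else "5'"
    lo, hi = sorted((top_cut, bottom_cut))
    return (top_cut, bottom_cut, kind, seq[lo:hi])


toy = "CACCCA" + "ACGTACGTAGC" + "TTTT"  # hypothetical sequence
print(downstream_cuts(toy, "CACCCA"))
```

For the toy sequence, the sketch places the top-strand cut 17 bases and the bottom-strand cut 15 bases from the left end. The result is a 2-nt 3' overhang.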
Subject(s)
Bacterial Proteins/genetics , Deoxyribonucleases, Type II Site-Specific/genetics , Nucleotide Motifs , Plasmids/metabolism , Thermus/enzymology , Amino Acid Sequence , Bacterial Proteins/metabolism , Cloning, Molecular , DNA Cleavage , Deoxyribonucleases, Type II Site-Specific/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression , Isoenzymes/genetics , Isoenzymes/metabolism , Molecular Weight , Plasmids/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , Substrate Specificity , Thermus/genetics
ABSTRACT
DNA methylation acts in concert with restriction enzymes to protect the integrity of prokaryotic genomes. Studies in a limited number of organisms suggest that methylation also contributes to prokaryotic genome regulation, but the prevalence and properties of such non-restriction-associated methylation systems remain poorly understood. Here, we used single-molecule real-time sequencing to map DNA modifications, including m6A, m4C, and m5C, across the genomes of 230 diverse bacterial and archaeal species. We observed DNA methylation in nearly all (93%) organisms examined and identified a total of 834 distinct reproducibly methylated motifs. These data enabled annotation of the DNA binding specificities of 620 DNA methyltransferases (MTases), doubling known specificities for previously hard-to-study Type I, IIG and III MTases and revealing their extraordinary diversity. Strikingly, 48% of organisms harbor active Type II MTases with no apparent cognate restriction enzyme. These active 'orphan' MTases are present in diverse bacterial and archaeal phyla and show motif specificities and methylation patterns consistent with functions in gene regulation and DNA replication. Our results reveal the pervasive presence of DNA methylation throughout the prokaryotic kingdoms, as well as the diversity of sequence specificities and potential functions of DNA methylation systems.
Subject(s)
Epigenomics , Prokaryotic Cells/metabolism , Conserved Sequence , DNA Methylation/genetics , DNA Replication/genetics , DNA Restriction-Modification Enzymes/classification , DNA Restriction-Modification Enzymes/metabolism , Evolution, Molecular , Gene Expression Regulation , Genome , Methyltransferases/metabolism , Molecular Sequence Annotation , Multigene Family , Nucleotide Motifs/genetics , Phylogeny , Substrate Specificity
ABSTRACT
A Gram-stain-positive, catalase-positive, pleomorphic rod-shaped organism was isolated from malted barley in Finland, classified initially by partial 16S rRNA gene sequencing and originally deposited in the VTT Culture Collection as a strain of Propionibacterium acidipropionici (currently Acidipropionibacterium acidipropionici). Subsequent comparison of the whole 16S rRNA gene with other representatives of the genus Acidipropionibacterium revealed that the strain belongs to a novel species, most closely related to Acidipropionibacterium microaerophilum and Acidipropionibacterium acidipropionici, with similarity values of 98.46% and 98.31%, respectively. Whole-genome sequencing on the PacBio RS II platform allowed further comparison of the genome with all other DNA sequences available for the type strains of Acidipropionibacterium species. These comparisons revealed the highest similarity of strain JS278T to A. acidipropionici, which was confirmed by average nucleotide identity analysis. At 3,432,872 bp, the genome of strain JS278T is intermediate in size between those of A. acidipropionici and Acidipropionibacterium jensenii; its G+C content is 68.4 mol%. The strain fermented a wide range of carbon sources and produced propionic acid as the major fermentation product. Apart from its poor ability to grow at 37 °C and its positive catalase reaction, the observed phenotype was almost indistinguishable from those of A. acidipropionici and A. jensenii. Based on our findings, we conclude that the organism represents a novel member of the genus Acidipropionibacterium, for which we propose the name Acidipropionibacterium virtanenii sp. nov. The type strain is JS278T (=VTT E-113202T=DSM 106790T).
Subject(s)
Hordeum/microbiology , Phylogeny , Propionibacterium/classification , Bacterial Typing Techniques , Base Composition , DNA, Bacterial/genetics , Fermentation , Finland , Propionibacterium/genetics , Propionibacterium/isolation & purification , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA
ABSTRACT
We identify a new subgroup of Type I restriction-modification enzymes that modify cytosine in one DNA strand and adenine in the opposite strand for host protection. Recognition specificity has been determined for ten systems using SMRT sequencing, and each recognizes a novel DNA sequence motif. Previously characterized Type I systems use two identical copies of a single methyltransferase (MTase) subunit, one bound at each half-site of the specificity (S) subunit, to form the MTase. The new m4C-producing Type I systems we describe have two separate yet highly similar MTase subunits that form a heterodimeric M1M2S MTase. The MTase subunits from these systems group into two families: one has NPPF in the highly conserved catalytic motif IV and modifies adenine to m6A, and the other has an NPPY catalytic motif IV and modifies cytosine to m4C. The high degree of similarity among their cytosine-recognizing components (MTase and S) suggests that they have recently evolved, most likely from the far more common m6A Type I systems. Type I enzymes that modify cytosine exclusively were formed by replacing the adenine target recognition domain (TRD) with a cytosine-recognizing TRD. These are the first examples of m4C modification in Type I RM systems.
Subject(s)
Cytosine/metabolism , DNA Restriction-Modification Enzymes/metabolism , DNA/metabolism , Adenine/metabolism , Amino Acid Sequence , Catalysis , Computational Biology/methods , DNA/chemistry , DNA Restriction-Modification Enzymes/chemistry , DNA Restriction-Modification Enzymes/genetics , Methylation , Methyltransferases/chemistry , Methyltransferases/metabolism , Mutation , Nucleotide Motifs , Protein Subunits/chemistry , Protein Subunits/metabolism , Substrate Specificity
ABSTRACT
The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ~3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system for helping users identify those proteins whose experimental determination of function would be most informative for predicting function for other proteins within protein families. The emphasis on documenting experimental evidence for function predictions, and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue.
Subject(s)
Archaeal Proteins/physiology , Bacterial Proteins/physiology , Databases, Protein , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Molecular Sequence Annotation
ABSTRACT
Modified DNA bases in mammalian genomes, such as 5-methylcytosine ((5m)C) and its oxidized forms, are implicated in important epigenetic regulation processes. In human or mouse, successive enzymatic conversion of (5m)C to its oxidized forms is carried out by the ten-eleven translocation (TET) proteins. Previously we reported the structure of a TET-like (5m)C oxygenase (NgTET1) from Naegleria gruberi, a single-celled protist evolutionarily distant from vertebrates. Here we show that NgTET1 is a 5-methylpyrimidine oxygenase, with activity on both (5m)C (major activity) and thymidine (T) (minor activity) in all DNA forms tested, and provide unprecedented evidence for the formation of 5-formyluridine ((5f)U) and 5-carboxyuridine ((5ca)U) in vitro. Mutagenesis studies reveal a delicate balance between choice of (5m)C or T as the preferred substrate. Furthermore, our results suggest a substrate preference of NgTET1 for (5m)CpG and TpG dinucleotide sites in DNA. Intriguingly, NgTET1 displays higher T-oxidation activity in vitro than mammalian TET1, supporting a closer evolutionary relationship between NgTET1 and the base J-binding proteins from trypanosomes. Finally, we demonstrate that NgTET1 can be readily used as a tool in (5m)C sequencing technologies such as single-molecule real-time sequencing to map (5m)C in bacterial genomes at base resolution.