ABSTRACT
Determination of eukaryotic transcription start sites (TSSs) has been based on methods that require the cap structure at the 5' end of transcripts derived from Pol II RNA polymerase. Consequently, these methods do not reveal TSSs derived from the other RNA polymerases that also play critical roles in various cell functions. To address this limitation, we developed ReCappable-seq, which comprehensively identifies TSS for both Pol II and non-Pol II transcripts at single-nucleotide resolution. The method relies on specific enzymatic exchange of 5' m7G caps and 5' triphosphates with a selectable tag. When applied to human transcriptomes, ReCappable-seq identifies Pol II TSSs that are in agreement with orthogonal methods such as CAGE. Additionally, ReCappable-seq reveals a rich landscape of TSSs associated with Pol III transcripts that have not previously been amenable to study at genome-wide scale. Novel TSS from non-Pol II transcription can be located in the nuclear and mitochondrial genomes. ReCappable-seq interrogates the regulatory landscape of coding and noncoding RNA concurrently and enables the classification of epigenetic profiles associated with Pol II and non-Pol II TSS.
Subject(s)
DNA-Directed RNA Polymerases , RNA Polymerase II , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA, Untranslated , Transcription Initiation Site , Transcriptome
ABSTRACT
Nanopore sequencing devices read individual RNA strands directly. This facilitates identification of exon linkages and nucleotide modifications; however, using conventional direct RNA nanopore sequencing, the 5' and 3' ends of poly(A) RNA cannot be identified unambiguously. This is due in part to RNA degradation in vivo and in vitro that can obscure transcription start and end sites. In this study, we aimed to identify individual full-length human RNA isoforms among ~4 million nanopore poly(A)-selected RNA reads. First, to identify RNA strands bearing 5' m7G caps, we exchanged the biological cap for a modified cap attached to a 45-nt oligomer. This oligomer adaptation method improved 5' end sequencing and ensured correct identification of the 5' m7G capped ends. Second, among these 5'-capped nanopore reads, we screened for features consistent with a 3' polyadenylation site. Combining these two steps, we identified 294,107 individual high-confidence full-length RNA scaffolds from human GM12878 cells, most of which (257,721) aligned to protein-coding genes. Of these, 4876 scaffolds indicated unannotated isoforms that were often internal to longer, previously identified RNA isoforms. Orthogonal data for m7G caps and open chromatin, such as CAGE and DNase-HS seq, confirmed the validity of these high-confidence RNA scaffolds.
Subject(s)
RNA Isoforms/chemistry , RNA, Messenger/chemistry , Cell Line, Tumor , Humans , Nanopore Sequencing/methods , RNA 3' Polyadenylation Signals , RNA Isoforms/genetics , RNA, Messenger/genetics , Transcriptome
ABSTRACT
The SARS-CoV-2 virus has a complex transcriptome characterised by multiple, nested subgenomic RNAs used to express structural and accessory proteins. Long-read sequencing technologies such as nanopore direct RNA sequencing can recover full-length transcripts, greatly simplifying the assembly of structurally complex RNAs. However, these techniques do not detect the 5' cap, thus preventing reliable identification and quantification of full-length, coding transcript models. Here we used Nanopore ReCappable Sequencing (NRCeq), a new technique that can identify capped full-length RNAs, to assemble a complete annotation of SARS-CoV-2 sgRNAs and annotate the location of capping sites across the viral genome. We obtained robust estimates of sgRNA expression across cell lines and viral isolates and identified novel canonical and non-canonical sgRNAs, including one that uses a previously un-annotated leader-to-body junction site. The data generated in this work constitute a useful resource for the scientific community and provide important insights into the mechanisms that regulate the transcription of SARS-CoV-2 sgRNAs.
Subject(s)
COVID-19 , Nanopores , RNA, Guide, Kinetoplastida/chemistry , COVID-19/genetics , Genome, Viral/genetics , Humans , RNA Caps , RNA, Viral/genetics , RNA, Viral/metabolism , SARS-CoV-2/genetics
ABSTRACT
BACKGROUND: The initiating nucleotide found at the 5' end of primary transcripts has a distinctive triphosphorylated end that distinguishes these transcripts from all other RNA species. Recognizing this distinction is key to deconvoluting the primary transcriptome from the plethora of processed transcripts that confound analysis of the transcriptome. The currently available methods do not use targeted enrichment for the 5' end of primary transcripts, but rather attempt to deplete non-targeted RNA. RESULTS: We developed a method, Cappable-seq, for directly enriching for the 5' end of primary transcripts and enabling determination of transcription start sites at single base resolution. This is achieved by enzymatically modifying the 5' triphosphorylated end of RNA with a selectable tag. We first applied Cappable-seq to E. coli, achieving up to 50 fold enrichment of primary transcripts and identifying an unprecedented 16539 transcription start sites (TSS) genome-wide at single base resolution. We also applied Cappable-seq to a mouse cecum sample and identified TSS in a microbiome. CONCLUSIONS: Cappable-seq allows for the first time the capture of the 5' end of primary transcripts. This enables a unique robust TSS determination in bacteria and microbiomes. In addition to and beyond TSS determination, Cappable-seq depletes ribosomal RNA and reduces the complexity of the transcriptome to a single quantifiable tag per transcript enabling digital profiling of gene expression in any microbiome.
Subject(s)
Escherichia coli/genetics , Gastrointestinal Microbiome/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Transcription Initiation Site , Animals , Female , Mice , Mice, Inbred C57BL , Promoter Regions, Genetic , RNA, Bacterial/genetics , Transcriptome
ABSTRACT
We report here the first occurrence of an adenosine deaminase-related growth factor (ADGF) that deaminates adenosine 5' monophosphate (AMP) in preference to adenosine. The ADGFs are a group of secreted deaminases found throughout the animal kingdom that affect the extracellular concentration of adenosine by converting it to inosine. The AMP deaminase studied here was first isolated and biochemically characterized from the roman snail Helix pomatia in 1983. Determination of the amino acid sequence of the AMP deaminase enabled sequence comparisons to protein databases and revealed it as a member of the ADGF family. Cloning and expression of its cDNA in Pichia pastoris allowed the comparison of the biochemical characteristics of the native and recombinant forms of the enzyme and confirmed they correspond to the previously reported activity. Uncharacteristically, the H. pomatia AMP deaminase was determined to be dissimilar to the AMP deaminase family by sequence comparison while demonstrating similarity to the ADGFs despite having AMP as its preferred substrate rather than adenosine.
Subject(s)
AMP Deaminase , Animals , Adenosine Deaminase/metabolism , Adenosine/metabolism , Mollusca/metabolism , Intercellular Signaling Peptides and Proteins , Adenosine Monophosphate
ABSTRACT
The luciferin sulfokinase (coelenterazine sulfotransferase) of Renilla was previously reported to activate the storage form, luciferyl sulfate (coelenterazine sulfate) to luciferin (coelenterazine), the substrate for the luciferase bioluminescence reaction. The gene coding for the coelenterazine sulfotransferase has not been identified. Here we used a combined proteomic/transcriptomic approach to identify and clone the sulfotransferase cDNA. Multiple isoforms of coelenterazine sulfotransferase were identified from the anthozoan Renilla muelleri by intersecting its transcriptome with the LC-MS/MS derived peptide sequences of coelenterazine sulfotransferase purified from Renilla. Two of the isoforms were expressed in E. coli, purified, and partially characterized. The encoded enzymes display sulfotransferase activity that is comparable to that of the native sulfotransferase isolated from Renilla reniformis that was reported in 1970. The bioluminescent assay for sensitive detection of 3'-phosphoadenosine 5'-phosphate (PAP) using the recombinant sulfotransferase is demonstrated.
Subject(s)
Escherichia coli , Proteomics , Animals , Arylsulfotransferase , Chromatography, Liquid , DNA, Complementary , Escherichia coli/genetics , Imidazoles , Luciferases/genetics , Luminescent Measurements , Pyrazines , Renilla/genetics , Sulfates , Sulfotransferases/genetics , Tandem Mass Spectrometry
ABSTRACT
Nucleic acids in living organisms are more complex than the simple combinations of the four canonical nucleotides. Recent advances in biomedical research have led to the discovery of numerous naturally occurring nucleotide modifications and enzymes responsible for the synthesis of such modifications. In turn, these enzymes can be leveraged towards toolkits for DNA and RNA manipulation for epigenetic sequencing or other biotechnological applications. Here, we present the protocol to obtain purified 5-hydroxymethylcytosine carbamoyltransferase enzymes and the associated assays to convert 5-hydroxymethylcytosine to 5-carbamoyloxymethylcytosine in vitro. We include detailed assays using DNA, RNA, and single nucleotide/deoxynucleotide as substrates. These assays can be combined with downstream applications for genetic/epigenetic regulatory mechanism studies and next-generation sequencing purposes.
ABSTRACT
Shotgun metagenomic sequencing is a powerful approach to study microbiomes in an unbiased manner and of increasing relevance for identifying novel enzymatic functions. However, the potential of metagenomics to relate from microbiome composition to function has thus far been underutilized. Here, we introduce the Metagenomics Genome-Phenome Association (MetaGPA) study framework, which allows linking genetic information in metagenomes with a dedicated functional phenotype. We applied MetaGPA to identify enzymes associated with cytosine modifications in environmental samples. From the 2365 genes that met our significance criteria, we confirm known pathways for cytosine modifications and proposed novel cytosine-modifying mechanisms. Specifically, we characterized and identified a novel nucleic acid-modifying enzyme, 5-hydroxymethylcytosine carbamoyltransferase, that catalyzes the formation of a previously unknown cytosine modification, 5-carbamoyloxymethylcytosine, in DNA and RNA. Our work introduces MetaGPA as a novel and versatile tool for advancing functional metagenomics.
Many industrial processes, such as starch processing and oil refinement, use chemicals that cause harm to the environment. These can often be switched to more sustainable biological processes that are powered by proteins called enzymes. Enzymes are micro-factories that speed up biochemical reactions in most living things. Communities of microorganisms (also known as microbiomes) are an amazing but often untapped resource for discovering enzymes that can be harnessed for industrial purposes. To gain a better picture of the microbes present within a population, researchers often extract and sequence the genetic material of all microorganisms in an environmental sample, also known as the metagenome. While current methods for analyzing the metagenome are good at identifying new species, they often provide limited information about the microorganism's functional role within the community. This makes it difficult to find new enzymes that may be useful for industry. Here, Yang, Lin et al. have developed a new technique called Metagenomics Genome-Phenome Association, or MetaGPA for short. The method works in a similar way to genome-wide association studies (GWAS) which are used to identify genes involved in human disease. However, instead of disease associated genes in humans, MetaGPA finds microbial genes that are associated with a biological process useful for biotechnology. Like GWAS, the new approach created by Yang, Lin et al. compares two groups: the first contains microorganisms that carry out a specific process, and the second contains all organisms in the microbiome. The metagenome of each group is extracted and a computational pipeline is then applied to identify genes, including those coding for enzymes, that are found more often in the group performing the desired task. To test the technique, Yang, Lin et al. used MetaGPA to find new enzymes involved in DNA modification.
Microbiome samples were collected from coastal water and sewage, and the computational pipeline was applied to discover genes that are associated with this process. Further analysis revealed that one of the identified genes codes for an enzyme that introduces a previously unknown change to DNA. MetaGPA could be applied to other processes and microbiomes, and, if successful, may help researchers to identify more diverse enzymes than is currently available. This could scale up the discovery of new enzymes that can be used to power industrial reactions.
Subject(s)
Cytosine/metabolism , DNA, Bacterial/metabolism , Escherichia coli K12/genetics , Genome, Bacterial , Microbiota/genetics , RNA, Bacterial/metabolism
ABSTRACT
Here we report a PCR-based DNA engineering technique for seamless assembly of recombinant molecules from multiple components. We create cloning vector and target molecules flanked with compatible single-stranded (ss) extensions. The vector contains a cassette with two inversely oriented nicking endonuclease sites separated by restriction endonuclease site(s). The spacer sequences between the nicking and restriction sites are tailored to create ss extensions of custom sequence. The vector is then linearized by digestion with nicking and restriction endonucleases. To generate target molecules, a single deoxyuridine (dU) residue is placed 6-10 nt away from the 5'-end of each PCR primer. 5' of dU the primer sequence is compatible either with an ss extension on the vector or with the ss extension of the next-in-line PCR product. After amplification, the dU is excised from the PCR products with the USER enzyme leaving PCR products flanked by 3' ss extensions. When mixed together, the linearized vector and PCR products directionally assemble into a recombinant molecule through complementary ss extensions. By varying the design of the PCR primers, the protocol is easily adapted to perform one or more simultaneous DNA manipulations such as directional cloning, site-specific mutagenesis, sequence insertion or deletion and sequence assembly.
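The primer rule described above (a single dU placed 6-10 nt from the 5' end) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `place_du` and the example primer are hypothetical, and the sketch assumes that the dU substitutes for an existing T so that the amplified sequence is unchanged. It is not the authors' design software.

```python
def place_du(primer, min_5p=6, max_5p=10):
    """Replace one T with U so that 6-10 nt lie 5' of the dU.

    Assumption (not from the abstract): the dU takes the place of a T
    already present in the primer. Returns (modified primer, 0-based index).
    """
    primer = primer.upper()
    for i in range(min_5p, max_5p + 1):
        # i is also the number of nucleotides located 5' of position i
        if i < len(primer) and primer[i] == "T":
            return primer[:i] + "U" + primer[i + 1:], i
    raise ValueError("no T found 6-10 nt from the 5' end; redesign the primer")


# Hypothetical primer, for illustration only
modified, position = place_du("GGAGACATCGTACCATGGCTAGC")
print(modified, position)
```

In this sketch the first suitable T is chosen; a real design would also check that the resulting extension is complementary to the vector or to the neighbouring PCR product.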
Subject(s)
Cloning, Molecular/methods , DNA, Recombinant/chemistry , Deoxyuridine/metabolism , Genetic Engineering/methods , DNA, Recombinant/metabolism , DNA, Single-Stranded/chemistry , Endonucleases/metabolism , Genetic Vectors , Plasmids/genetics , Polymerase Chain Reaction , Uracil-DNA Glycosidase/metabolism
ABSTRACT
Eukaryotic mRNAs are modified at their 5' end early during transcription by the addition of N7-methylguanosine (m7G), which forms the "cap" on the first 5' nucleotide. Identification of the 5' nucleotide on mRNA is necessary for determination of the Transcription Start Site (TSS). We explored the effect of various reaction conditions on the activity of the yeast scavenger mRNA decapping enzyme DcpS and examined decapping of 30 chemically distinct cap structures varying the state of methylation, sugar, phosphate linkage, and base composition on 25mer RNA oligonucleotides. Contrary to the generally accepted belief that DcpS enzymes only decap short oligonucleotides, we found that the yeast scavenger decapping enzyme decaps RNA transcripts as long as 1400 nucleotides. Further, we validated the application of yDcpS for enriching capped RNA using a strategy of specifically tagging the 5' end of capped RNA by first decapping and then recapping it with an affinity-tagged guanosine nucleotide.
Subject(s)
Endoribonucleases/metabolism , RNA Caps/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/enzymology , Diphosphates/metabolism , Humans , Hydrogen-Ion Concentration , Hydrolysis , Nucleic Acid Conformation , Osmolar Concentration , RNA Cap Analogs/metabolism , RNA Caps/chemistry , RNA, Messenger/chemistry , RNA, Messenger/metabolism
ABSTRACT
Bacteria respond to their environment by regulating mRNA synthesis, often by altering the genomic sites at which RNA polymerase initiates transcription. Here, we investigate genome-wide changes in transcription start site (TSS) usage by Clostridium phytofermentans, a model bacterium for fermentation of lignocellulosic biomass. We quantify expression of nearly 10,000 TSS at single base resolution by Capp-Switch sequencing, which combines capture of synthetically capped 5' mRNA fragments with template-switching reverse transcription. We find the locations and expression levels of TSS for hundreds of genes change during metabolism of different plant substrates. We show that TSS reveals riboswitches, non-coding RNA and novel transcription units. We identify sequence motifs associated with carbon source-specific TSS and use them for regulon discovery, implicating a LacI/GalR protein in control of pectin metabolism. We discuss how the high resolution and specificity of Capp-Switch enables study of condition-specific changes in transcription initiation in bacteria.
Subject(s)
Bacteria/genetics , Fermentation , Plants/microbiology , Transcription Initiation Site , Bacteria/metabolism , Clostridium/genetics , Clostridium/metabolism , Gene Expression Profiling , Genes, Bacterial/genetics , Pectins/metabolism , RNA, Messenger/genetics , Regulon/genetics , Sequence Analysis, DNA/methods , Transcription, Genetic
ABSTRACT
The majority of inteins are comprised of a protein splicing domain and a homing endonuclease domain. Experimental evidence has demonstrated that the splicing domain and the endonuclease domain in a bifunctional intein are largely independent of each other with respect to both structure and activity. Here, an artificial bifunctional intein has been created through the insertion of an existing homing endonuclease into a mini-intein that is naturally lacking this functionality. The gene for I-CreI, an intron-encoded homing endonuclease, was grafted into the monofunctional Mycobacterium xenopi GyrA intein at the putative site of the missing endonuclease. The resulting fusion protein was found to be capable of protein splicing similar to that of the parent intein. In addition, the protein demonstrated site-specific endonuclease activity that is characteristic of the I-CreI homing endonuclease. The function of each domain therefore remained unaffected by the presence of the other domain. This artificial fusion of the two domains is a potential novel mobile genetic element.
Subject(s)
DNA Gyrase/genetics , DNA Restriction Enzymes/genetics , Protein Splicing , Algal Proteins/genetics , Algal Proteins/metabolism , Animals , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Chlamydomonas reinhardtii/genetics , DNA Gyrase/metabolism , DNA Restriction Enzymes/metabolism , Mycobacterium xenopi/genetics , Protein Precursors/genetics , Protein Precursors/metabolism , Protein Structure, Tertiary , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism
ABSTRACT
Marine luciferases are increasingly used as reporters to study gene regulation. These luciferases have utility in bioluminescent assay development, although little has been reported on their catalytic properties in response to substrate concentration. Here, we report that the two marine luciferases from the copepods, Gaussia princeps (GLuc) and Metridia longa (MLuc) were found, surprisingly, to produce light in a cooperative manner with respect to their luciferin substrate concentration; as the substrate concentration was decreased 10 fold the rate of light production decreased 1000 fold. This positive cooperative effect is likely a result of allostery between the two proposed catalytic domains found in Gaussia and Metridia. In contrast, the marine luciferases from Renilla reniformis (RLuc) and Cypridina noctiluca (CLuc) demonstrate a linear relationship between the concentration of their respective luciferin and the rate of light produced. The consequences of these enzyme responses are discussed.
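The contrast between cooperative and linear responses can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that over the measured range the rate follows a power law, rate ∝ [substrate]^n; the exponent n is then the ratio of the log fold changes. The function name `apparent_order` is hypothetical, and the abstract itself does not report a fitted Hill coefficient.

```python
import math


def apparent_order(substrate_fold, rate_fold):
    """Apparent exponent n in rate ∝ [S]^n, from two fold changes.

    Assumes a simple power-law regime (an illustrative simplification).
    """
    return math.log10(rate_fold) / math.log10(substrate_fold)


# GLuc/MLuc: a 10-fold drop in substrate gives a 1000-fold drop in rate
n_cooperative = apparent_order(10, 1000)
# RLuc/CLuc: linear response, so a 10-fold drop gives a 10-fold drop
n_linear = apparent_order(10, 10)
print(n_cooperative, n_linear)
```

Under this assumption the copepod luciferases behave as if the rate scales with roughly the third power of luciferin concentration, whereas the Renilla and Cypridina enzymes scale with the first power.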
Subject(s)
Aquatic Organisms/enzymology , Luciferases/metabolism , Amino Acid Sequence , Animals , Aquatic Organisms/drug effects , Benzothiazoles/chemistry , Benzothiazoles/pharmacology , Copepoda/enzymology , Imidazoles/chemistry , Luciferases/chemistry , Molecular Sequence Data , Pyrazines/chemistry , Sequence Alignment , Substrate Specificity/drug effects
ABSTRACT
Many reactions in cells proceed via the sequestration of two DNA molecules in a synaptic complex. SfiI is a member of a growing family of restriction enzymes that can bind and cleave two DNA sites simultaneously. We present here the structures of tetrameric SfiI in complex with cognate DNA. The structures reveal two different binding states of SfiI: one with both DNA-binding sites fully occupied and the other with fully and partially occupied sites. These two states provide details on how SfiI recognizes and cleaves its target DNA sites, and give insight into sequential binding events. The SfiI recognition sequence (GGCCNNNN↓NGGCC) is a subset of the recognition sequence of BglI (GCCNNNN↓NGGC), and both enzymes cleave their target DNAs to leave 3-base 3' overhangs. We show that even though SfiI is a tetramer and BglI is a dimer, and there is little sequence similarity between the two enzymes, their modes of DNA recognition are unusually similar.
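The subset relationship between the two recognition sequences can be checked with a short script. This is a minimal sketch: the regular expressions translate the patterns quoted in the abstract (N = any base), while the helper name `find_sites` and the test sequence are invented for illustration.

```python
import re

# Recognition sequences from the abstract, with N written as [ACGT]
SFII = r"GGCC[ACGT]{5}GGCC"  # GGCCNNNN^NGGCC (13 bp)
BGLI = r"GCC[ACGT]{5}GGC"    # GCCNNNN^NGGC   (11 bp)


def find_sites(pattern, seq):
    """Return 0-based start positions of all matches, including overlaps."""
    lookahead = re.compile(f"(?={pattern})")
    return [m.start() for m in lookahead.finditer(seq.upper())]


# Made-up sequence containing one SfiI site and one BglI-only site
seq = "TTGGCCAAAATGGCCTTGCCTTTTAGGCAA"
sfii_sites = find_sites(SFII, seq)
bgli_sites = find_sites(BGLI, seq)
print(sfii_sites, bgli_sites)

# Every SfiI site contains a BglI site starting one base further in
print(all(start + 1 in bgli_sites for start in sfii_sites))
```

The lookahead form is used so that overlapping sites are not skipped; the converse check fails by design, since a BglI site is not necessarily flanked by the extra G and C that SfiI requires.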
Subject(s)
DNA/metabolism , Deoxyribonucleases, Type II Site-Specific/metabolism , Streptomyces/enzymology , Catalytic Domain , Crystallography, X-Ray , DNA/chemistry , Deoxyribonucleases, Type II Site-Specific/chemistry , Dimerization , Nucleic Acid Conformation , Protein Binding , Protein Structure, Quaternary , Protein Structure, Secondary , Protein Structure, Tertiary
ABSTRACT
The primary target of SgrAI restriction endonuclease is a multiple sequence of the form 5'-CPu/CCGGPyG. Previous work had indicated that SgrAI must bind two recognition sites simultaneously for catalysis [Bilcock, D. T., Daniels, L. E., Bath, A. J. & Halford, S. E. (1999) J. Biol. Chem. 274, 36379-36386]. In the present study, SgrAI is shown to cleave not only its canonical sequences, but also the sequences 5'-CPuCCGGPy(A,T,C) and 5'-CPuCCGGGG, both referred to as secondary sequences. On plasmid pSK7, SgrAI cleaves secondary sites 26-fold slower than the canonical site. However, the same plasmid, but without the canonical site, is cleaved 200-fold slower. We show that DNA termini generated by cleaving the canonical site for SgrAI assist in the cleavage of secondary sites. The SgrAI-termini in cis with respect to secondary site are markedly preferred over those in trans. The SgrAI-termini provided in a form of oligonucleotide duplex are also shown to stimulate canonical site cleavage. At a 40-fold molar excess of the SgrAI-termini over substrate, the SgrAI specificity is shown to improve by two orders of magnitude, because of concurrent 10-fold increase in the cleavage of canonical site and 50-fold decrease in the cleavage of secondary sites. The unconventional reaction pathway by which SgrAI utilizes the self-generated DNA termini to cleave its DNA targets has not been observed hitherto among type II restriction endonucleases. Based on our work and previous reports, a pathway of DNA binding and cleavage by the SgrAI restriction endonuclease is proposed.
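The fold changes quoted above can be combined in a short worked calculation. As an illustrative assumption (the abstract does not define it this way), take specificity S as the ratio of the canonical-site cleavage rate to the secondary-site cleavage rate, and let S' be its value in the presence of excess SgrAI-termini:

```latex
S = \frac{k_{\mathrm{can}}}{k_{\mathrm{sec}}}, \qquad
\frac{S'}{S}
  = \frac{10\,k_{\mathrm{can}}}{k_{\mathrm{sec}}/50}\cdot\frac{k_{\mathrm{sec}}}{k_{\mathrm{can}}}
  = 10 \times 50 = 500, \qquad
\log_{10} 500 \approx 2.7
```

Under this definition the combined effect is a 500-fold gain, consistent with the stated improvement of about two orders of magnitude.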
Subject(s)
DNA/biosynthesis , Deoxyribonucleases, Type II Site-Specific/metabolism , Base Sequence , DNA/chemistry , Kinetics , Nucleic Acid Conformation , Oligodeoxyribonucleotides/biosynthesis , Substrate Specificity
ABSTRACT
The SfiI endonuclease from Streptomyces fimbriatus (EC 3.1.21.4) is a tetrameric enzyme that binds simultaneously to two recognition sites and cleaves both sites concertedly. It serves as a good model system for studying both specificity and cooperative DNA binding. Crystals of the enzyme were obtained by the hanging-drop vapor-diffusion method in complex with a 21-mer oligonucleotide. The crystals are trigonal, with unit-cell parameters a = b = 85.7, c = 202.6 Å, and diffract to 2.6 Å resolution on a rotating-anode X-ray generator. Preliminary X-ray analysis reveals the space group to be either P3(1)21 or P3(2)21. Interestingly, the crystals change to space group P6(1)22, with unit-cell parameters a = b = 85.5, c = 419.6 Å, when the selenomethionyl (SeMet) derivative of the enzyme is co-crystallized with the same DNA. Phase information is currently being derived from this SeMet SfiI-DNA complex.