Abstract
The application of ribosome profiling has revealed an unexpected abundance of translation beyond that of previously annotated protein-coding regions. Multiple short sequences have been found to be translated within single RNA molecules, in both annotated protein-coding and noncoding regions. The biological significance of this translation is a matter of intensive investigation. However, current schematic or annotation-based representations of mRNA translation generally do not account for the apparent multitude of translated regions within the same molecules. They also do not take into account the stochasticity of the process that allows alternative translations of the same RNA molecules by different ribosomes. There is a need for formal representations of mRNA complexity that would enable the analysis of quantitative information on translation and more accurate models for predicting the phenotypic effects of genetic variants affecting translation. To address this, we developed a conceptually novel abstraction that we term ribosome decision graphs (RDGs). RDGs represent translation as multiple ribosome paths through untranslated and translated mRNA segments. We term the latter "translons." Nondeterministic events, such as initiation, reinitiation, selenocysteine insertion, or ribosomal frameshifting, are then represented as branching points. This approach captures the complexity of eukaryotic translation and focuses on locations critical for translation regulation. We show how RDGs can be used for depicting translated regions and for analyzing genetic variation and quantitative genome-wide data on translation for characterization of regulatory modulators of translation.
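The idea of ribosome paths with branching points can be made concrete with a small Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the graph layout, the node names, the function `enumerate_paths`, and every probability are hypothetical values chosen for the example.

```python
def enumerate_paths(graph, node, prob=1.0, path=()):
    """Yield (path, probability) for every route from `node` to a terminal node."""
    path = path + (node,)
    branches = graph.get(node, [])
    if not branches:  # terminal node: this ribosome path ends here
        yield path, prob
        return
    for next_node, p in branches:
        yield from enumerate_paths(graph, next_node, prob * p, path)

# Hypothetical transcript with one upstream translon and one main coding translon.
# Each entry maps a node to its branches as (next node, assumed probability).
rdg = {
    "5' leader": [("uORF translon", 0.3), ("leaky scanning", 0.7)],
    "uORF translon": [("reinitiation", 0.5), ("ribosome released", 0.5)],
    "leaky scanning": [("CDS translon", 0.9), ("no initiation", 0.1)],
    "reinitiation": [("CDS translon", 1.0)],
}

paths = list(enumerate_paths(rdg, "5' leader"))
# Fraction of ribosomes that end up translating the main coding translon.
cds_fraction = sum(p for path, p in paths if path[-1] == "CDS translon")
```

In this toy graph, a variant that changes one branching probability (for example, a stronger uORF start codon) changes `cds_fraction` directly, which is the kind of quantitative reasoning the abstract describes.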
Subjects
Protein Biosynthesis, Messenger RNA, Ribosomes, Ribosomes/metabolism, Ribosomes/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Humans, Open Reading Frames, Eukaryota/genetics
Abstract
RNA polyadenylation plays a central role in RNA maturation, fate, and stability. In response to developmental cues, polyA tail lengths can vary, affecting the translation efficiency and stability of mRNAs. Here we develop Nanopore 3' end-capture sequencing (Nano3P-seq), a method that relies on nanopore cDNA sequencing to simultaneously quantify RNA abundance, tail composition, and tail length dynamics at per-read resolution. By employing a template-switching-based sequencing protocol, Nano3P-seq can sequence any RNA molecule from its 3' end, regardless of its polyadenylation status, without the need for PCR amplification or ligation of RNA adapters. We demonstrate that Nano3P-seq provides quantitative estimates of RNA abundance and tail lengths, and captures a wide diversity of RNA biotypes. We find that, in addition to mRNA and long non-coding RNA, polyA tails can be identified in 16S mitochondrial ribosomal RNA in both mouse and zebrafish models. Moreover, we show that mRNA tail lengths are dynamically regulated during vertebrate embryogenesis at an isoform-specific level, correlating with mRNA decay. Finally, we demonstrate the ability of Nano3P-seq to capture non-A bases within polyA tails of various lengths, and reveal their distribution during vertebrate embryogenesis. Overall, Nano3P-seq is a simple and robust method for accurately estimating transcript levels, tail lengths, and tail composition heterogeneity in individual reads, with minimal library preparation biases, in both the coding and non-coding transcriptome.
Subjects
Nanopores, Transcriptome, Animals, Mice, Complementary DNA/genetics, Zebrafish/genetics, Zebrafish/metabolism, Poly A/genetics, Poly A/metabolism, Gene Expression Profiling, RNA/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, RNA Sequence Analysis/methods
Abstract
RNA molecules can form secondary and tertiary structures that can regulate their localization and function. Using enzymatic or chemical probing together with high-throughput sequencing, secondary structure can be mapped across the entire transcriptome. However, a limiting factor is that only population averages can be obtained since each read is an independent measurement. Although long-read sequencing has recently been used to determine RNA structure, these methods still used aggregate signals across the strands to detect structure. Averaging across the population also means that only limited information about structural heterogeneity across molecules or dependencies within each molecule can be obtained. Here, we present Single-Molecule Structure sequencing (SMS-seq) that combines structural probing with native RNA sequencing to provide non-amplified, structural profiles of individual molecules with novel analysis methods. Our new approach using mutual information enabled single molecule structural interrogation. Each RNA is probed at numerous bases enabling the discovery of dependencies and heterogeneity of structural features. We also show that SMS-seq can capture tertiary interactions, dynamics of riboswitch ligand binding, and mRNA structural features.
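To illustrate how mutual information can expose dependencies between positions probed on the same molecule, here is a minimal Python sketch. It is not the SMS-seq analysis code: the function name `mutual_information`, the binary encoding (1 = modification detected at that position in that read), and the toy data are assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length sequences of discrete values."""
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi

# Toy data: each row is one sequenced molecule, each column one probed position.
reads = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
    [0, 0, 1],
]
columns = list(zip(*reads))
mi_coupled = mutual_information(columns[0], columns[1])      # positions always agree
mi_independent = mutual_information(columns[0], columns[2])  # positions unrelated
```

Because each row comes from a single molecule, a high value for a pair of positions reflects co-occurrence within molecules, which population-averaged reactivities cannot show.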
Subjects
Nanopores, Nucleic Acid Conformation, RNA, RNA Sequence Analysis, Riboswitch, RNA/genetics, RNA/chemistry, RNA Sequence Analysis/methods, Transcriptome
Abstract
AIMS/HYPOTHESIS: Correctly diagnosing MODY is important, as individuals with this diagnosis can discontinue insulin injections; however, many people are misdiagnosed. We aimed to develop a robust approach for determining the pathogenicity of variants of uncertain significance in hepatocyte nuclear factor-1 alpha (HNF1A)-MODY and to obtain an accurate estimate of the prevalence of HNF1A-MODY in paediatric cases of diabetes. METHODS: We extended our previous screening of the Norwegian Childhood Diabetes Registry by 830 additional samples and comprehensively genotyped HNF1A variants in autoantibody-negative participants using next-generation sequencing. Carriers of pathogenic variants were treated by local healthcare providers, and participants with novel likely pathogenic variants and variants of uncertain significance were enrolled in an investigator-initiated, non-randomised, open-label pilot study (ClinicalTrials.gov registration no. NCT04239586). To identify variants associated with HNF1A-MODY, we functionally characterised their pathogenicity and assessed the carriers' phenotype and treatment response to sulfonylurea. RESULTS: In total, 615 autoantibody-negative participants among 4712 cases of paediatric diabetes underwent genetic sequencing, revealing 19 with HNF1A variants. We identified nine carriers with novel variants classified as variants of uncertain significance or likely to be pathogenic, while the remaining ten participants carried five pathogenic variants previously reported. Of the nine carriers with novel variants, six responded favourably to sulfonylurea. Functional investigations revealed their variants to be dysfunctional and demonstrated a correlation with the resulting phenotype, providing evidence for reclassifying these variants as pathogenic. CONCLUSIONS/INTERPRETATION: Based on this robust classification, we estimate that the prevalence of HNF1A-MODY is 0.3% in paediatric diabetes. 
Clinical phenotyping is challenging and functional investigations provide a strong complementary line of evidence. We demonstrate here that combining clinical phenotyping with functional protein studies provides a powerful tool to obtain a precise diagnosis of HNF1A-MODY.
Subjects
Type 2 Diabetes Mellitus, Humans, Child, Pilot Projects, Type 2 Diabetes Mellitus/metabolism, Phenotype, Autoantibodies/genetics, Hepatocyte Nuclear Factor 1-alpha/genetics, Hepatocyte Nuclear Factor 1-alpha/metabolism, Norway/epidemiology, Sulfonylurea Compounds, Mutation
Abstract
In many organisms, early embryonic development is driven by maternally provided factors until the controlled onset of transcription during zygotic genome activation. The regulation of chromatin accessibility and its relationship to gene activity during this transition remain poorly understood. Here, we generated chromatin accessibility maps with ATAC-seq from genome activation until the onset of lineage specification. During this period, chromatin accessibility increases at regulatory elements. This increase is independent of RNA polymerase II-mediated transcription, with the exception of the hypertranscribed miR-430 locus. Instead, accessibility often precedes the transcription of associated genes. Loss of the maternal transcription factors Pou5f3, Sox19b, and Nanog, which are known to be required for zebrafish genome activation, results in decreased accessibility at regulatory elements. Importantly, the accessibility of regulatory regions, especially when established by Pou5f3, Sox19b and Nanog, is predictive for future transcription. Our results show that the maternally provided transcription factors Pou5f3, Sox19b, and Nanog open up chromatin and prime genes for activity during zygotic genome activation in zebrafish.
Subjects
Chromatin Assembly and Disassembly, Developmental Gene Expression Regulation, Nanog Homeobox Protein/metabolism, Octamer Transcription Factor-3/metabolism, SOX Transcription Factors/metabolism, Zebrafish Proteins/metabolism, Animals, Chromatin/genetics, Nonmammalian Embryo/metabolism, Genomic Imprinting, Zebrafish
Abstract
We present ampliCan, an analysis tool for genome editing that unites highly precise quantification and visualization of genuine genome editing events. ampliCan features nuclease-optimized alignments, filtering of experimental artifacts, event-specific normalization, and off-target read detection, and it quantifies insertions, deletions, homology-directed repair (HDR), and targeted base editing. It is scalable to thousands of amplicon sequencing-based experiments from any genome editing experiment, including CRISPR. It enables automated integration of controls and accounts for biases at every step of the analysis. We benchmarked ampliCan on both real and simulated data sets against other leading tools, demonstrating that it outperformed all in the face of common confounding factors.
Subjects
CRISPR-Cas Systems/genetics, Gene Editing/methods, High-Throughput Nucleotide Sequencing/methods, Mutation Rate, Clustered Regularly Interspaced Short Palindromic Repeats, DNA End-Joining Repair/genetics, Recombinational DNA Repair/genetics, Sequence Alignment/methods, Software
Abstract
BACKGROUND: With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays. RESULTS: Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the processing, analysis, and visualization of the different steps of translation, with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5' UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5' UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions. CONCLUSION: In summary, ORFik introduces hundreds of tested, documented and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving speed and scope of many core Bioconductor functions, ORFik offers enhancements benefiting the entire Bioconductor environment. AVAILABILITY: http://bioconductor.org/packages/ORFik
Subjects
Protein Biosynthesis, Ribosomes, 5' Untranslated Regions, High-Throughput Nucleotide Sequencing, Open Reading Frames/genetics, Ribosomes/genetics, Ribosomes/metabolism
Abstract
Polyadenylation at the 3'-end is a major regulator of messenger RNA and its length is known to affect nuclear export, stability, and translation, among others. Only recently have strategies emerged that allow for genome-wide poly(A) length assessment. These methods identify genes connected to poly(A) tail measurements indirectly by short-read alignment to genetic 3'-ends. Concurrently, Oxford Nanopore Technologies (ONT) established full-length isoform-specific RNA sequencing containing the entire poly(A) tail. However, assessing poly(A) length through base-calling has so far not been possible due to the inability to resolve long homopolymeric stretches in ONT sequencing. Here we present tailfindr, an R package to estimate poly(A) tail length on ONT long-read sequencing data. tailfindr operates on unaligned, base-called data. It measures poly(A) tail length from both native RNA and DNA sequencing, which makes poly(A) tail studies by full-length cDNA approaches possible for the first time. We assess tailfindr's performance across different poly(A) lengths, demonstrating that tailfindr is a versatile tool providing poly(A) tail estimates across a wide range of sequencing conditions.
Subjects
Nanopores, Poly A/metabolism, DNA Sequence Analysis/methods, RNA Sequence Analysis/methods, Poly T/metabolism, Polyadenylation
Abstract
The CRISPR-Cas system is a powerful genome editing tool that functions in a diverse array of organisms and cell types. The technology was initially developed to induce targeted mutations in DNA, but CRISPR-Cas has now been adapted to target nucleic acids for a range of purposes. CHOPCHOP is a web tool for identifying CRISPR-Cas single guide RNA (sgRNA) targets. In this major update of CHOPCHOP, we expand our toolbox beyond knockouts. We introduce functionality for targeting RNA with Cas13, which includes support for alternative transcript isoforms and RNA accessibility predictions. We incorporate new DNA targeting modes, including CRISPR activation/repression, targeted enrichment of loci for long-read sequencing, and prediction of Cas9 repair outcomes. Finally, we expand our results page visualization to reveal alternative isoforms and downstream ATG sites, which will aid users in avoiding the expression of truncated proteins. The CHOPCHOP web tool now supports over 200 genomes and we have released a command-line script for running larger jobs and handling unsupported genomes. CHOPCHOP v3 can be found at https://chopchop.cbu.uib.no.
Subjects
CRISPR-Cas Systems/genetics, Genetic Databases, Gene Targeting, Genome/genetics, Kinetoplastida Guide RNA/genetics, Software, Animals, Gene Editing/methods, Humans
Abstract
Enhancers control the correct temporal and cell-type-specific activation of gene expression in multicellular eukaryotes. Knowing their properties, regulatory activity and targets is crucial to understand the regulation of differentiation and homeostasis. Here we use the FANTOM5 panel of samples, covering the majority of human tissues and cell types, to produce an atlas of active, in vivo-transcribed enhancers. We show that enhancers share properties with CpG-poor messenger RNA promoters but produce bidirectional, exosome-sensitive, relatively short unspliced RNAs, the generation of which is strongly related to enhancer activity. The atlas is used to compare regulatory programs between different cells at unprecedented depth, to identify disease-associated regulatory single nucleotide polymorphisms, and to classify cell-type-specific and ubiquitous enhancers. We further explore the utility of enhancer redundancy, which explains gene expression strength rather than expression patterns. The online FANTOM5 enhancer atlas represents a unique resource for studies on cell-type-specific enhancers and gene regulation.
Subjects
Atlases as Topic, Genetic Enhancer Elements/genetics, Gene Expression Regulation/genetics, Molecular Sequence Annotation, Organ Specificity, Cell Line, Cultured Cells, Cluster Analysis, Genetic Predisposition to Disease/genetics, HeLa Cells, Humans, Single Nucleotide Polymorphism/genetics, Genetic Promoter Regions/genetics, Messenger RNA/biosynthesis, Messenger RNA/genetics, Transcription Initiation Site, Genetic Transcription Initiation
Abstract
BACKGROUND: In phylogenetically diverse organisms, the 5' ends of a subset of mRNAs are trans-spliced with a spliced leader (SL) RNA. The functions of SL trans-splicing, however, remain largely enigmatic. RESULTS: We quantified translation genome-wide in the marine chordate, Oikopleura dioica, under inhibition of mTOR, a central growth regulator. Translation of trans-spliced TOP mRNAs was suppressed, consistent with a role of the SL sequence in nutrient-dependent translational control of growth-related mRNAs. Under crowded, nutrient-limiting conditions, O. dioica continued to filter-feed, but arrested growth until favorable conditions returned. Upon release from unfavorable conditions, initial recovery was independent of nutrient-responsive, trans-spliced genes, suggesting animal density sensing as a first trigger for resumption of development. CONCLUSION: Our results are consistent with a proposed role of trans-splicing in the coordinated translational down-regulation of nutrient-responsive genes under growth-limiting conditions.
Subjects
Gene Expression Regulation, Protein Biosynthesis, Messenger RNA/metabolism, TOR Serine-Threonine Kinases/metabolism, Trans-Splicing, Genetic Transcription, Animals, Caenorhabditis elegans/genetics, Caenorhabditis elegans/growth & development, Female, Mammals/genetics, Nucleotide Motifs, Oocytes/metabolism, Messenger RNA/chemistry, TOR Serine-Threonine Kinases/antagonists & inhibitors, Urochordata/genetics
Abstract
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
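The step of evaluating all possible ORFs can be pictured with a short Python sketch that enumerates start-to-stop ORFs. This is not REPARATION's code: the function `find_orfs`, the start codon set (ATG, GTG, TTG), the `min_codons` cutoff, and the restriction to the forward strand are all assumptions made to keep the example small.

```python
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # assumed set of candidate start codons

def find_orfs(seq, min_codons=2):
    """Return (start, end) half-open coordinates of every start-to-stop ORF
    on the forward strand, including nested ORFs that share a stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        open_starts = []  # in-frame start codons still waiting for a stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STARTS:
                open_starts.append(i)
            elif codon in STOPS:
                for start in open_starts:
                    # length counted in codons, stop codon excluded
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3))
                open_starts = []
    return sorted(orfs)

# Toy sequence with two candidate starts (ATG, GTG) sharing one stop codon (TAA).
orfs = find_orfs("ATGAAAGTGCCCTAA")
```

A candidate list of this kind would then be scored against ribosome profiling evidence; the scoring model itself is outside the scope of this sketch.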
Subjects
Bacillus subtilis/genetics, Computational Biology/methods, Escherichia coli K12/genetics, Bacterial Genome/genetics, Molecular Sequence Annotation/methods, Salmonella typhimurium/genetics, Algorithms, Chromosome Mapping, Machine Learning, Open Reading Frames/genetics, Ribosomes/genetics
Abstract
BACKGROUND: The emergence of ribosome profiling to map actively translating ribosomes has laid the foundation for a diverse range of studies on translational regulation. The data obtained with different variations of this assay is typically manually processed, which has created a need for tools that would streamline and standardize processing steps. RESULTS: We present Shoelaces, a toolkit for ribosome profiling experiments automating read selection and filtering to obtain genuine translating footprints. Based on periodicity, favoring enrichment over the coding regions, it determines the read lengths corresponding to bona fide ribosome protected fragments. The specific codon under translation (P-site) is determined by automatic offset calculations resulting in sub-codon resolution. Shoelaces provides both a user-friendly graphical interface for interactive visualisation in a genome browser-like fashion and a command line interface for integration into automated pipelines. We process 79 libraries and show that studies typically discard excessive amounts of quality data in their manual analysis pipelines. CONCLUSIONS: Shoelaces streamlines ribosome profiling analysis offering automation of the processing, a range of interactive visualization features and export of the data into standard formats. Shoelaces stores all processing steps performed in an XML file that can be used by other groups to exactly reproduce the processing of a given study. We therefore anticipate that Shoelaces can aid researchers by automating what is typically performed manually and contribute to the overall reproducibility of studies. The tool is freely distributed as a Python package, with additional instructions, tutorial and demo datasets available at https://bitbucket.org/valenlab/shoelaces .
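Selecting read lengths by periodicity can be sketched in Python as follows. This is a simplified stand-in, not the algorithm Shoelaces uses: the function `periodic_read_lengths`, the frame-fraction criterion, and the thresholds `min_fraction` and `min_reads` are assumptions for illustration.

```python
from collections import defaultdict

def periodic_read_lengths(reads, min_fraction=0.6, min_reads=5):
    """reads: iterable of (read_length, 5' end position relative to the CDS start).
    Returns {read_length: (dominant_frame, fraction)} for read lengths whose
    5' ends concentrate in a single reading frame."""
    frame_counts = defaultdict(lambda: [0, 0, 0])
    for length, pos in reads:
        frame_counts[length][pos % 3] += 1
    selected = {}
    for length, counts in frame_counts.items():
        total = sum(counts)
        if total < min_reads:
            continue  # too few reads to judge periodicity
        best = max(range(3), key=counts.__getitem__)
        fraction = counts[best] / total
        if fraction >= min_fraction:
            selected[length] = (best, fraction)
    return selected

# Toy data: 28-nt reads are strongly periodic, 35-nt reads are spread over all frames.
reads = [(28, p) for p in (0, 3, 6, 9, 12, 15, 18, 1)] + [(35, p) for p in range(6)]
selected = periodic_read_lengths(reads)
```

Read lengths that pass such a filter are the ones for which a single P-site offset per length is meaningful; lengths without a dominant frame are more likely to be non-ribosomal fragments.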
Subjects
Protein Biosynthesis, Ribosomes/metabolism, Software, Computer Graphics, Genomics/methods, Humans, Ribosomes/chemistry, Workflow
Abstract
In just 3 years CRISPR genome editing has transformed biology, and its popularity and potency continue to grow. New CRISPR effectors and rules for locating optimum targets continue to be reported, highlighting the need for computational CRISPR targeting tools to compile these rules and facilitate target selection and design. CHOPCHOP is one of the most widely used web tools for CRISPR- and TALEN-based genome editing. Its overarching principle is to provide an intuitive and powerful tool that can serve both novice and experienced users. In this major update we introduce tools for the next generation of CRISPR advances, including Cpf1 and Cas9 nickases. We support a number of new features that improve the targeting power, usability and efficiency of CHOPCHOP. To increase targeting range and specificity we provide support for custom length sgRNAs, and we evaluate the sequence composition of the whole sgRNA and its surrounding region using models compiled from multiple large-scale studies. These and other new features, coupled with an updated interface for increased usability and support for a continually growing list of organisms, maintain CHOPCHOP as one of the leading tools for CRISPR genome editing. CHOPCHOP v2 can be found at http://chopchop.cbu.uib.no.
Subjects
Bacterial Proteins/genetics, CRISPR-Cas Systems, Clustered Regularly Interspaced Short Palindromic Repeats, Endonucleases/genetics, Genome, Kinetoplastida Guide RNA/chemical synthesis, Software, Animals, Bacterial Proteins/metabolism, CRISPR-Associated Protein 9, Deoxyribonuclease I/genetics, Deoxyribonuclease I/metabolism, Endonucleases/metabolism, Gene Editing, Humans, Information Storage and Retrieval, Internet, Nucleotide Motifs, Kinetoplastida Guide RNA/genetics, Transcription Activator-Like Effector Nucleases/genetics, Transcription Activator-Like Effector Nucleases/metabolism
Abstract
BACKGROUND: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. RESULTS: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. CONCLUSIONS: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites.
Subjects
Bacteria/metabolism, Bacterial Proteins/metabolism, Post-Translational Protein Processing, Ribosomes/metabolism
Abstract
Epigenetic information is available from contemporary organisms, but is difficult to track back in evolutionary time. Here, we show that genome-wide epigenetic information can be gathered directly from next-generation sequence reads of DNA isolated from ancient remains. Using the genome sequence data generated from hair shafts of a 4000-yr-old Paleo-Eskimo belonging to the Saqqaq culture, we generate the first ancient nucleosome map coupled with a genome-wide survey of cytosine methylation levels. The validity of both the nucleosome map and the methylation levels was confirmed by the recovery of the expected signals at promoter regions, exon/intron boundaries, and CTCF sites. The top-scoring nucleosome calls revealed distinct DNA positioning biases, attesting to nucleotide-level accuracy. The ancient methylation levels exhibited high conservation over time, clustering closely with modern hair tissues. Using ancient methylation information, we estimated the age at death of the Saqqaq individual and illustrated how epigenetic information can be used to infer ancient gene expression. Similar epigenetic signatures were found in other fossil material, such as 110,000- to 130,000-yr-old bones, supporting the contention that ancient epigenomic information can be reconstructed from a deep past. Our findings lay the foundation for extracting epigenomic information from ancient samples, allowing shifts in epialleles to be tracked through evolutionary time, as well as providing an original window into modern epigenomics.
Subjects
Cytosine/metabolism, DNA Methylation, Human Genome, Inuit/genetics, Nucleosomes/genetics, Animals, Chromosome Mapping, Genetic Epigenesis, Epigenomics, Molecular Evolution, Gene Expression, Gene Expression Regulation, Humans, Phylogeny, Genetic Promoter Regions, DNA Sequence Analysis
Abstract
Over the past decade, high-throughput studies have identified many novel transcripts. While their existence is undisputed, their coding potential and functionality have remained controversial. Recent computational approaches guided by ribosome profiling have indicated that translation is far more pervasive than anticipated and takes place on many transcripts previously assumed to be non-coding. Some of these newly discovered translated transcripts encode short, functional proteins that had been missed in prior screens. Other transcripts are translated, but it might be the process of translation rather than the resulting peptides that serves a function. Here, we review annotation studies in zebrafish to discuss the challenges of placing RNAs onto the continuum that ranges from functional protein-encoding mRNAs to potentially non-functional peptide-producing RNAs to non-coding RNAs. As highlighted by the discovery of the novel signaling peptide Apela/ELABELA/Toddler, accurate annotations can give rise to exciting opportunities to identify the functions of previously uncharacterized transcripts.
Subjects
Peptides/metabolism, Untranslated RNA/genetics, Animals, Humans, Molecular Sequence Annotation, Open Reading Frames/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Zebrafish/genetics
Abstract
Large-scale genomics and computational approaches have identified thousands of putative long non-coding RNAs (lncRNAs). It has been controversial, however, as to what fraction of these RNAs is truly non-coding. Here, we combine ribosome profiling with a machine-learning approach to validate lncRNAs during zebrafish development in a high throughput manner. We find that dozens of proposed lncRNAs are protein-coding contaminants and that many lncRNAs have ribosome profiles that resemble the 5' leaders of coding RNAs. Analysis of ribosome profiling data from embryonic stem cells reveals similar properties for mammalian lncRNAs. These results clarify the annotation of developmental lncRNAs and suggest a potential role for translation in lncRNA regulation. In addition, our computational pipeline and ribosome profiling data provide a powerful resource for the identification of translated open reading frames during zebrafish development.
Subjects
Long Noncoding RNA/genetics, RNA/genetics, Ribosomes/genetics, Animals, Embryonic Development/genetics, Embryonic Development/physiology, Zebrafish/genetics, Zebrafish/growth & development
Abstract
Major advances in genome editing have recently been made possible with the development of the TALEN and CRISPR/Cas9 methods. The speed and ease of implementing these technologies has led to an explosion of mutant and transgenic organisms. A rate-limiting step in efficiently applying TALEN and CRISPR/Cas9 methods is the selection and design of targeting constructs. We have developed an online tool, CHOPCHOP (https://chopchop.rc.fas.harvard.edu), to expedite the design process. CHOPCHOP accepts a wide range of inputs (gene identifiers, genomic regions or pasted sequences) and provides an array of advanced options for target selection. It uses efficient sequence alignment algorithms to minimize search times, and rigorously predicts off-target binding of single-guide RNAs (sgRNAs) and TALENs. Each query produces an interactive visualization of the gene with candidate target sites displayed at their genomic positions and color-coded according to quality scores. In addition, for each possible target site, restriction sites and primer candidates are visualized, facilitating a streamlined pipeline of mutant generation and validation. The ease-of-use and speed of CHOPCHOP make it a valuable tool for genome engineering.
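As a rough picture of the first step any sgRNA design tool has to perform, the Python sketch below lists candidate Cas9 target sites in a pasted sequence. It is not CHOPCHOP's code: the function `find_cas9_targets`, the 20-nt default guide length, the NGG PAM (the canonical Streptococcus pyogenes Cas9 motif), and the forward-strand-only scan are assumptions, and no off-target or quality scoring is attempted.

```python
import re

def find_cas9_targets(seq, guide_len=20):
    """Return (start, protospacer, pam) for every guide_len-nt site that is
    immediately followed by an NGG PAM on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Toy input: one 20-nt protospacer followed by a TGG PAM.
targets = find_cas9_targets("GATTACAGATTACAGATTACTGGA")
```

A full design tool would also scan the reverse complement, rank each candidate, and search the genome for near-matches to estimate off-target binding.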