RESUMO
BACKGROUND: With the rapid growth in the use of high-throughput methods for characterizing translation and the continued expansion of multi-omics, there is a need for back-end functions and streamlined tools for processing, analyzing, and characterizing data produced by these assays. RESULTS: Here, we introduce ORFik, a user-friendly R/Bioconductor API and toolbox for studying translation and its regulation. It extends GenomicRanges from the genome to the transcriptome and implements a framework that integrates data from several sources. ORFik streamlines the steps to process, analyze, and visualize the different steps of translation with a particular focus on initiation and elongation. It accepts high-throughput sequencing data from ribosome profiling to quantify ribosome elongation or RCP-seq/TCP-seq to also quantify ribosome scanning. In addition, ORFik can use CAGE data to accurately determine 5'UTRs and RNA-seq for determining translation relative to RNA abundance. ORFik supports and calculates over 30 different translation-related features and metrics from the literature and can annotate translated regions such as proteins or upstream open reading frames (uORFs). As a use-case, we demonstrate using ORFik to rapidly annotate the dynamics of 5' UTRs across different tissues, detect their uORFs, and characterize their scanning and translation in the downstream protein-coding regions. CONCLUSION: In summary, ORFik introduces hundreds of tested, documented and optimized methods. ORFik is designed to be easily customizable, enabling users to create complete workflows from raw data to publication-ready figures for several types of sequencing data. Finally, by improving speed and scope of many core Bioconductor functions, ORFik offers enhancement benefiting the entire Bioconductor environment. AVAILABILITY: http://bioconductor.org/packages/ORFik .
Assuntos
Biossíntese de Proteínas , Ribossomos , Regiões 5' não Traduzidas , Sequenciamento de Nucleotídeos em Larga Escala , Fases de Leitura Aberta/genética , Ribossomos/genética , Ribossomos/metabolismoRESUMO
BACKGROUND: The emergence of ribosome profiling to map actively translating ribosomes has laid the foundation for a diverse range of studies on translational regulation. The data obtained with different variations of this assay is typically manually processed, which has created a need for tools that would streamline and standardize processing steps. RESULTS: We present Shoelaces, a toolkit for ribosome profiling experiments automating read selection and filtering to obtain genuine translating footprints. Based on periodicity, favoring enrichment over the coding regions, it determines the read lengths corresponding to bona fide ribosome protected fragments. The specific codon under translation (P-site) is determined by automatic offset calculations resulting in sub-codon resolution. Shoelaces provides both a user-friendly graphical interface for interactive visualisation in a genome browser-like fashion and a command line interface for integration into automated pipelines. We process 79 libraries and show that studies typically discard excessive amounts of quality data in their manual analysis pipelines. CONCLUSIONS: Shoelaces streamlines ribosome profiling analysis offering automation of the processing, a range of interactive visualization features and export of the data into standard formats. Shoelaces stores all processing steps performed in an XML file that can be used by other groups to exactly reproduce the processing of a given study. We therefore anticipate that Shoelaces can aid researchers by automating what is typically performed manually and contribute to the overall reproducibility of studies. The tool is freely distributed as a Python package, with additional instructions, tutorial and demo datasets available at https://bitbucket.org/valenlab/shoelaces .
Assuntos
Biossíntese de Proteínas , Ribossomos/metabolismo , Software , Gráficos por Computador , Genômica/métodos , Humanos , Ribossomos/química , Fluxo de TrabalhoRESUMO
BACKGROUND: While methods for annotation of genes are increasingly reliable, the exact identification of translation initiation sites remains a challenging problem. Since the N-termini of proteins often contain regulatory and targeting information, developing a robust method for start site identification is crucial. Ribosome profiling reads show distinct patterns of read length distributions around translation initiation sites. These patterns are typically lost in standard ribosome profiling analysis pipelines, when reads from footprints are adjusted to determine the specific codon being translated. RESULTS: Utilising these signatures in combination with nucleotide sequence information, we build a model capable of predicting translation initiation sites and demonstrate its high accuracy using N-terminal proteomics. Applying this to prokaryotic translatomes, we re-annotate translation initiation sites and provide evidence of N-terminal truncations and extensions of previously annotated coding sequences. These re-annotations are supported by the presence of structural and sequence-based features next to N-terminal peptide evidence. Finally, our model identifies 61 novel genes previously undiscovered in the Salmonella enterica genome. CONCLUSIONS: Signatures within ribosome profiling read length distributions can be used in combination with nucleotide sequence information to provide accurate genome-wide identification of translation initiation sites.
Assuntos
Bactérias/metabolismo , Proteínas de Bactérias/metabolismo , Processamento de Proteína Pós-Traducional , Ribossomos/metabolismoRESUMO
The rate of translation can vary depending on the mRNA template. During the elongation phase the ribosome can transiently pause or permanently stall. A pause can provide the nascent protein with the time to fold or be transported, while stalling can serve as quality control and trigger degradation of aberrant mRNA and peptide. Ribosome profiling has allowed for the genome-wide detection of such pauses and stalls, but due to library-specific biases, these predictions are often unreliable. Here, we take advantage of the deep conservation of protein synthesis machinery, hypothesizing that similar conservation could exist for functionally important locations of ribosome slowdown, here collectively called stall sites. We analyze multiple ribosome profiling datasets from phylogenetically diverse eukaryotes: yeast, fruit fly, zebrafish, mouse and human to identify conserved stall sites. We find thousands of stall sites across multiple species, with the enrichment of proline, glycine and negatively charged amino acids around conserved stalling. Many of the sites are found in RNA processing genes, suggesting that stalling might have a conserved role in RNA metabolism. In summary, our results provide a rich resource for the study of conserved stalling and indicate possible roles of stalling in gene regulation.