ABSTRACT
A large body of evidence indicates that genome annotation pipelines have biased our view of coding sequences because they generally undersample small proteins and peptides. The recent development of genome-wide translation profiling reveals the prevalence of small/short open reading frames (smORFs or sORFs), which are scattered over all classes of transcripts, including both mRNAs and presumptive long noncoding RNAs. Proteomic approaches further confirm an unexpected variety of smORF-encoded peptides (SEPs), representing an overlooked reservoir of bioactive molecules. Indeed, functional studies in a broad range of species from yeast to humans demonstrate that SEPs can harbor key activities for the control of development, differentiation, and physiology. Here we summarize recent advances in the discovery and functional characterization of smORF/SEPs and discuss why these small players can no longer be ignored with regard to genome function.
Subject(s)
Peptides/metabolism; Animals; Genome; Humans; Open Reading Frames/genetics; Protein Biosynthesis; RNA, Untranslated/genetics
ABSTRACT
The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.
Subject(s)
Deep Learning; Escherichia coli/genetics; Genome, Bacterial; Genomics/methods; Transcription Initiation Site; Base Sequence; Binding Sites; DNA, Bacterial/genetics; DNA, Bacterial/metabolism; Escherichia coli/metabolism; Promoter Regions, Genetic/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
ABSTRACT
MOTIVATION: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, outlining the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes. RESULTS: We adapt the transformer neural network architecture to operate on methylation matrices through combining axial attention with sliding window self-attention. The obtained CpG Transformer displays state-of-the-art performances on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget. AVAILABILITY AND IMPLEMENTATION: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
DNA Methylation; Epigenome; Base Sequence; Sequence Analysis, DNA/methods; Neural Networks, Computer
ABSTRACT
Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size increases. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate and the validation stringency in a proteogenomic setting.
Subject(s)
Proteogenomics/methods; Databases, Protein; HCT116 Cells; Humans; Machine Learning; RNA-Seq; Ribosomes
ABSTRACT
Translation initiation generally occurs at AUG codons in eukaryotes, although it has been shown that non-AUG or noncanonical translation initiation can also occur. However, the evidence for noncanonical translation initiation sites (TISs) is largely indirect and based on ribosome profiling (Ribo-seq) studies. Here, using a strategy specifically designed to enrich N termini of proteins, we demonstrate that many human proteins are translated at noncanonical TISs. The large majority of TISs that mapped to 5' untranslated regions were noncanonical and led to N-terminal extension of annotated proteins or translation of upstream small open reading frames (uORFs). It has been controversial whether the amino acid corresponding to the start codon is incorporated at the TIS or methionine is still incorporated. We found that methionine was incorporated at almost all noncanonical TISs identified in this study. Comparison of the TISs determined through mass spectrometry with ribosome profiling data revealed that about two-thirds of the novel annotations were indeed supported by the available ribosome profiling data. Sequence conservation across species and a higher abundance of noncanonical TISs than canonical ones in some cases suggest that the noncanonical TISs can have biological functions. Overall, this study provides evidence of protein translation initiation at noncanonical TISs and argues that further studies are required for elucidation of functional implications of such noncanonical translation initiation.
Subject(s)
5' Untranslated Regions; Mass Spectrometry; Open Reading Frames; Peptide Chain Initiation, Translational; Ribosomes/metabolism; HEK293 Cells; Human Umbilical Vein Endothelial Cells/metabolism; Humans; Protein Domains; Ribosomes/genetics
ABSTRACT
Growing evidence illustrates the shortcomings of the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields. Continuous development of techniques and tools has improved sORF discovery, with candidates originating from bioinformatics analyses, sequencing routines, or proteomics approaches. In this mini review, we discuss the ongoing trends in these three fields and suggest some strategies for further characterization of high-potential candidates.
Subject(s)
Computational Biology/statistics & numerical data; Neural Networks, Computer; Open Reading Frames; Protein Biosynthesis; Proteome/genetics; Ribosomes/genetics; Animals; Computational Biology/methods; High-Throughput Nucleotide Sequencing; Humans; Plants/genetics; Protein Sorting Signals/genetics; Proteome/classification; Proteome/metabolism; Ribosomes/classification; Ribosomes/metabolism; Software
ABSTRACT
PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat cells. The different proteoform calling strategies are used alongside one another and in the end combined with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field.
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.
Subject(s)
Proteogenomics/methods; Ribosomes/metabolism; Chromatography, Liquid; HCT116 Cells; Humans; Jurkat Cells; Tandem Mass Spectrometry
ABSTRACT
Annotation of gene expression in prokaryotes is often corrected because of small variations in the annotated gene regions observed between different (sub)species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network utilizing features extracted from ribosome profiling information and binding site sequence patterns that proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, adapting the information gained from both the high-throughput ribosome profiling data and the ribosome binding translation initiation sequence region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, used for the identification of open reading frames in prokaryotes without a priori knowledge of the translational landscape. Through extensive validation of the model trained on various sets of data, multiple species sequence similarity, mass spectrometry and Edman degradation verified proteins, the effectiveness of DeepRibo is highlighted.
Subject(s)
Algorithms; Molecular Sequence Annotation/methods; Prokaryotic Cells/metabolism; Protein Biosynthesis/physiology; Ribosomes/metabolism; Binding Sites; Computational Biology/methods; Datasets as Topic; High-Throughput Screening Assays/methods; Neural Networks, Computer; Open Reading Frames; Prokaryotic Cells/chemistry; Protein Processing, Post-Translational; Sequence Alignment/methods; Signal Transduction
ABSTRACT
sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update elaborates on the major improvements implemented since its initial release. sORFs.org now additionally supports three more species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase compared to the three that were processed in the initial release. Therefore, a novel pipeline was constructed that also enables sORF detection in RIBO-seq datasets comprising solely elongating RIBO-seq data while previously, matching initiating RIBO-seq data was necessary to delineate the sORFs. Furthermore, a novel noise filtering algorithm was designed, able to distinguish sORFs with true ribosomal activity from simulated noise, consequently reducing the false positive identification rate. The inclusion of other species also led to the development of an inner BLAST pipeline, assessing sequence similarity between sORFs in the repository. Building on the proof of concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline was now released, reprocessing publicly available MS-based proteomics PRIDE datasets, reporting on true translation events. Next to reporting those identified peptides, sORFs.org allows visual inspection of the annotated spectra within the Lorikeet MS/MS viewer, thus enabling detailed manual inspection and interpretation.
Subject(s)
Algorithms; Databases, Genetic; Open Reading Frames; Proteomics/methods; Ribosomes/genetics; Animals; Base Sequence; Caenorhabditis elegans/genetics; Caenorhabditis elegans/metabolism; Conserved Sequence; Datasets as Topic; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Humans; Internet; Mice; Protein Biosynthesis; Rats; Ribosomes/metabolism; Sequence Alignment; Signal-To-Noise Ratio; Software; Tandem Mass Spectrometry/statistics & numerical data; Zebrafish/genetics; Zebrafish/metabolism
ABSTRACT
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
Subject(s)
Proteomics/standards; Humans; Information Storage and Retrieval; Mass Spectrometry; Software
ABSTRACT
Neuropeptides constitute a vast and functionally diverse family of neurochemical signaling molecules and are widely involved in the regulation of various physiological processes. The nematode Caenorhabditis elegans is well-suited for the study of neuropeptide biochemistry and function, as neuropeptide biosynthesis enzymes are not essential for C. elegans viability. This permits the study of neuropeptide biosynthesis in mutants lacking certain neuropeptide-processing enzymes. Mass spectrometry has been used to study the effects of proprotein convertase and carboxypeptidase mutations on proteolytic processing of neuropeptide precursors and on the peptidome in C. elegans. However, the enzymes required for the last step in the production of many bioactive peptides, the carboxyl-terminal amidation reaction, have not been characterized in this manner. Here, we describe three genes that encode homologs of neuropeptide amidation enzymes in C. elegans and use tandem LC-MS to compare neuropeptides in WT animals with those in newly generated mutants for these putative amidation enzymes. We report that mutants lacking both a functional peptidylglycine α-hydroxylating monooxygenase and a peptidylglycine α-amidating monooxygenase had a severely altered neuropeptide profile and also a decreased number of offspring. Interestingly, single mutants of the amidation enzymes still expressed some fully processed amidated neuropeptides, indicating the existence of a redundant amidation mechanism in C. elegans. All MS data are available via ProteomeXchange with the identifier PXD008942. In summary, the key steps in neuropeptide processing in C. elegans seem to be executed by redundant enzymes, and loss of these enzymes severely affects brood size, supporting the need for amidated peptides in C. elegans reproduction.
Subject(s)
Amidine-Lyases/metabolism; Caenorhabditis elegans Proteins/metabolism; Caenorhabditis elegans/metabolism; Mixed Function Oxygenases/metabolism; Multienzyme Complexes/metabolism; Neuropeptides/metabolism; Amidine-Lyases/chemistry; Amidine-Lyases/genetics; Amino Acid Sequence; Animals; Biosynthetic Pathways; Caenorhabditis elegans/chemistry; Caenorhabditis elegans/genetics; Caenorhabditis elegans Proteins/chemistry; Caenorhabditis elegans Proteins/genetics; Copper/metabolism; Gene Deletion; Humans; Mixed Function Oxygenases/chemistry; Mixed Function Oxygenases/genetics; Multienzyme Complexes/chemistry; Multienzyme Complexes/genetics; Mutation; Neuropeptides/genetics; Sequence Alignment; Tandem Mass Spectrometry
ABSTRACT
Alternative translation initiation mechanisms such as leaky scanning and reinitiation potentiate the polycistronic nature of human transcripts. By allowing for reprogrammed translation, these mechanisms can mediate biological responses to stimuli. We combined proteomics with ribosome profiling and mRNA sequencing to identify the biological targets of translation control triggered by the eukaryotic translation initiation factor 1 (eIF1), a protein implicated in the stringency of start codon selection. We quantified expression changes of over 4000 proteins and 10 000 actively translated transcripts, leading to the identification of 245 transcripts undergoing translational control mediated by upstream open reading frames (uORFs) upon eIF1 deprivation. Here, the stringency of start codon selection and preference for an optimal nucleotide context were largely diminished leading to translational upregulation of uORFs with suboptimal start. Interestingly, genes affected by eIF1 deprivation were implicated in energy production and sensing of metabolic stress.
Subject(s)
Eukaryotic Initiation Factors/metabolism; Neoplasm Proteins/metabolism; Nerve Tissue Proteins/metabolism; Peptide Chain Initiation, Translational; Cell Line; Codon, Initiator; Energy Metabolism/genetics; Eukaryotic Initiation Factors/antagonists & inhibitors; Eukaryotic Initiation Factors/genetics; Gene Expression; Gene Knockdown Techniques; HCT116 Cells; Humans; Neoplasm Proteins/antagonists & inhibitors; Neoplasm Proteins/genetics; Nerve Tissue Proteins/antagonists & inhibitors; Nerve Tissue Proteins/genetics; Nucleic Acid Conformation; Open Reading Frames; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; Ribosomes/genetics; Ribosomes/metabolism; Stress, Physiological/genetics
ABSTRACT
Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.
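The idea of evaluating all possible ORFs in a genome can be illustrated with a minimal sketch. This is not the REPARATION implementation: the start codon set (ATG, GTG, TTG), the `min_codons` threshold, and all function names are assumptions chosen for illustration, and no Ribo-seq evidence or growth curve model is involved here.

```python
from typing import Iterator, NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


class Orf(NamedTuple):
    strand: str    # "+" or "-"
    start: int     # 0-based start on the scanned strand
    end: int       # exclusive end on the scanned strand, stop codon included
    sequence: str  # nucleotide sequence from start codon through stop codon


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq: str, strand: str, min_codons: int) -> Iterator[Orf]:
    """Yield every start-to-stop ORF in the three frames of one strand."""
    for frame in range(3):
        open_starts = []  # start codons seen since the last in-frame stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in START_CODONS:
                open_starts.append(pos)
            elif codon in STOP_CODONS:
                for start in open_starts:
                    # codon count includes the stop codon
                    if (pos + 3 - start) // 3 >= min_codons:
                        yield Orf(strand, start, pos + 3, seq[start:pos + 3])
                open_starts = []


def enumerate_orfs(genome: str, min_codons: int = 10) -> Iterator[Orf]:
    """Enumerate candidate ORFs on both strands (hypothetical helper)."""
    genome = genome.upper()
    yield from scan_strand(genome, "+", min_codons)
    yield from scan_strand(reverse_complement(genome), "-", min_codons)


if __name__ == "__main__":
    for orf in enumerate_orfs("CCATGAAATTGCCCTAAGG", min_codons=3):
        print(orf)
```

Every in-frame start codon upstream of a stop yields its own candidate, so nested variants of the same ORF are all reported; a real pipeline would then score these candidates against experimental evidence.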
Subject(s)
Bacillus subtilis/genetics; Computational Biology/methods; Escherichia coli K12/genetics; Genome, Bacterial/genetics; Molecular Sequence Annotation/methods; Salmonella typhimurium/genetics; Algorithms; Chromosome Mapping; Machine Learning; Open Reading Frames/genetics; Ribosomes/genetics
ABSTRACT
Bio-active peptides are involved in the regulation of most physiological processes in the body. Classical bio-active peptides (CBAPs) are cleaved from a larger precursor protein and stored in secretion vesicles from which they are released in the extracellular space. Recently, another non-classical type of bio-active peptides (NCBAPs) has gained interest. These typically are not secreted but instead appear to be translated from short open reading frames (sORFs) and released directly into the cytoplasm. In contrast to CBAPs, these peptides are involved in the regulation of intra-cellular processes such as transcriptional control, calcium handling and DNA repair. However, bio-chemical evidence for the translation of sORFs remains elusive. Comprehensive analysis of sORF-encoded polypeptides (SEPs) is hampered by a number of methodological and biological challenges: the low molecular mass (many 4-10 kDa), the low abundance, transient expression and complications in data analysis. We developed a strategy to address a number of these issues. Our strategy aims to exclude false-positive identifications. In the total sample, we identified 926 peptides originating from 37 known (neuro)peptide precursors in mouse striatum. In addition, four SEPs were identified, including NoBody, a SEP that was previously discovered in humans, and three novel SEPs from 5' untranslated transcript regions (UTRs).
ABSTRACT
The salivary protein repertoire released by the herbivorous pest Tetranychus urticae is assumed to hold keys to its success on diverse crops. We report on a spider mite-specific protein family that is expanded in T. urticae. The encoding genes have an expression pattern restricted to the anterior podocephalic glands, while peptide fragments were found in the T. urticae secretome, supporting the salivary nature of these proteins. As peptide fragments were identified in a host-dependent manner, we designated this family as the SHOT (secreted host-responsive protein of Tetranychidae) family. The proteins were divided in three groups based on sequence similarity. Unlike TuSHOT3 genes, TuSHOT1 and TuSHOT2 genes were highly expressed when feeding on a subset of family Fabaceae, while expression was depleted on other hosts. TuSHOT1 and TuSHOT2 expression was induced within 24 h after certain host transfers, pointing toward transcriptional plasticity rather than selection as the cause. Transfer from an 'inducer' to a 'noninducer' plant was associated with slow yet strong downregulation of TuSHOT1 and TuSHOT2, occurring over generations rather than hours. This asymmetric on and off regulation points toward host-specific effects of SHOT proteins, which is further supported by the diversity of SHOT genes identified in Tetranychidae with a distinct host repertoire.
Subject(s)
Host-Parasite Interactions/genetics; Multigene Family; Salivary Proteins and Peptides/genetics; Tetranychidae/genetics; Transcription, Genetic; Amino Acid Sequence; Animals; Gene Expression Regulation, Plant; Peptides/chemistry; Peptides/metabolism; Phylogeny; Plants/genetics; Plants/parasitology; Proteomics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Saliva/metabolism; Time Factors
ABSTRACT
Proteogenomics is a research area that combines areas such as proteomics and genomics in a multi-omics setup using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation or to unravel the proteome complexity. Mass spectrometry based identifications of matching or homologous peptides can further refine gene models. The identification of novel proteoforms is also made possible based on detection of novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation or novel (small) open reading frames in intergenic or un-translated genic regions by analyzing high-throughput sequencing data from RNAseq or ribosome profiling experiments. Other proteogenomics studies using a combination of proteomics and genomics techniques focus on antibody sequencing, the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases became available to help streamline these cross-omics studies. Some of these solutions only help in specific steps of the proteogenomics studies, e.g. building custom sequence databases (based on next generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years a handful of integrative tools also became available that can execute complete proteogenomics analyses. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review we aimed at sketching a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.
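As a small illustration of the cognate versus near-cognate distinction mentioned above, the sketch below assumes the usual definition of a near-cognate start codon as one that differs from AUG at exactly one nucleotide position. The function names are hypothetical and are not taken from any of the tools covered by the review.

```python
BASES = "ACGU"


def near_cognate_codons(cognate: str = "AUG") -> list:
    """Return all codons that differ from the cognate codon at one position."""
    variants = set()
    for i, original in enumerate(cognate):
        for base in BASES:
            if base != original:
                variants.add(cognate[:i] + base + cognate[i + 1:])
    return sorted(variants)


def classify_start(codon: str) -> str:
    """Label a candidate start codon (DNA or RNA alphabet) for illustration."""
    codon = codon.upper().replace("T", "U")
    if codon == "AUG":
        return "cognate"
    if codon in near_cognate_codons():
        return "near-cognate"
    return "other"


if __name__ == "__main__":
    print(near_cognate_codons())
    print(classify_start("CTG"))
```

Three positions with three alternative bases each give nine near-cognate codons in total.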
Subject(s)
Computational Biology/methods; Databases, Factual; Genomics/methods; Mass Spectrometry/methods; Proteomics/methods; Animals; Antibodies/genetics; Peptide Mapping/methods; Peptides/analysis; Peptides/genetics; Sequence Analysis, Protein/methods; Software; Venoms/analysis
ABSTRACT
N-terminal acetylation (Nt-acetylation) by N-terminal acetyltransferases (NATs) is one of the most common protein modifications in eukaryotes. The NatC complex represents one of three major NATs of which the substrate profile remains largely unexplored. Here, we defined the in vivo human NatC Nt-acetylome on a proteome-wide scale by combining knockdown of its catalytic subunit Naa30 with positional proteomics. We identified 46 human NatC substrates, expanding our current knowledge on the substrate repertoire of NatC which now includes proteins harboring Met-Leu, Met-Ile, Met-Phe, Met-Trp, Met-Val, Met-Met, Met-His and Met-Lys N termini. Upon Naa30 depletion the expression levels of several organellar proteins were found reduced, in particular mitochondrial proteins, some of which were found to be NatC substrates. Interestingly, knockdown of Naa30 induced the loss of mitochondrial membrane potential and fragmentation of mitochondria. In conclusion, NatC Nt-acetylates a large variety of proteins and is essential for mitochondrial integrity and function.
Subject(s)
Mitochondrial Proteins/metabolism; N-Terminal Acetyltransferase C/genetics; N-Terminal Acetyltransferase C/metabolism; Proteomics/methods; Acetylation; Cell Line, Tumor; Gene Knockdown Techniques; HeLa Cells; Humans; Protein Binding; Protein Interaction Maps; Substrate Specificity
ABSTRACT
The two-spotted spider mite Tetranychus urticae is an extremely polyphagous crop pest. Alongside an unparalleled detoxification potential for plant secondary metabolites, it has recently been shown that spider mites can attenuate or even suppress plant defenses. Salivary constituents, notably effectors, have been proposed to play an important role in manipulating plant defenses and might determine the outcome of plant-mite interactions. Here, the proteomic composition of saliva from T. urticae lines adapted to various host plants (bean, maize, soy, and tomato) was analyzed using a custom-developed feeding assay coupled with nano-LC tandem mass spectrometry. About 90 putative T. urticae salivary proteins were identified. Many are of unknown function and in numerous cases belong to multimembered gene families. RNAseq expression analysis revealed that many genes coding for these salivary proteins were highly expressed in the proterosoma, the mite body region that includes the salivary glands. A subset of genes encoding putative salivary proteins was selected for whole-mount in situ hybridization, and these genes were found to be expressed in the anterior and dorsal podocephalic glands. Strikingly, host plant dependent expression was evident for putative salivary proteins, and was further studied in detail by micro-array based genome-wide expression profiling. This meta-analysis revealed for the first time the salivary protein repertoire of a phytophagous chelicerate. The availability of this salivary proteome will assist in unraveling the molecular interface between phytophagous mites and their host plants, and may ultimately facilitate the development of mite-resistant crops. Furthermore, the technique used in this study is a time- and resource-efficient method to examine the salivary protein composition of other small arthropods for which saliva or salivary glands cannot be isolated easily.
Subject(s)
Crops, Agricultural/parasitology; Proteomics/methods; Salivary Proteins and Peptides/metabolism; Tetranychidae/physiology; Animals; Arthropod Proteins/metabolism; Chromatography, Liquid; Crops, Agricultural/genetics; Gene Expression Regulation; Host Specificity; Host-Parasite Interactions; Salivary Proteins and Peptides/genetics; Sequence Analysis, RNA/methods; Tandem Mass Spectrometry; Tetranychidae/metabolism; Tissue Distribution
ABSTRACT
With the advent of ribosome profiling, a next generation sequencing technique providing a "snapshot" of translated mRNA in a cell, many short open reading frames (sORFs) with ribosomal activity were identified. Follow-up studies revealed the existence of functional peptides, so-called micropeptides, translated from these sORFs, indicating a new class of bio-active peptides. Over the last few years, several micropeptides exhibiting important cellular functions were discovered. However, ribosome occupancy does not necessarily imply an actual function of the translated peptide, leading to the development of various tools assessing the coding potential of sORFs. Here, we introduce sORFs.org (http://www.sorfs.org), a novel database for sORFs identified using ribosome profiling. Starting from ribosome profiling, sORFs.org identifies sORFs, incorporates state-of-the-art tools and metrics and stores results in a public database. Two query interfaces are provided, a default one enabling quick lookup of sORFs and a BioMart interface providing advanced query and export possibilities. At present, sORFs.org harbors 263 354 sORFs that demonstrate ribosome occupancy, originating from three different cell lines: HCT116 (human), E14_mESC (mouse) and S2 (fruit fly). sORFs.org aims to provide an extensive sORFs database accessible to researchers with limited bioinformatics knowledge, thus enabling easy integration into personal projects.
Subject(s)
Databases, Genetic; Open Reading Frames; Animals; Base Sequence; Cell Line; Conserved Sequence; Drosophila melanogaster/genetics; High-Throughput Nucleotide Sequencing; Humans; Internet; Mass Spectrometry; Mice; Peptides/chemistry; RNA, Messenger/chemistry; Ribosomes/metabolism; Sequence Analysis, RNA
ABSTRACT
The introduction of new standard formats, proBAM and proBed, improves the integration of genomics and proteomics information, thus aiding proteogenomics applications. These novel formats enable peptide spectrum matches (PSMs) to be stored, inspected, and analyzed within the context of the genome. However, an easy-to-use and transparent tool to convert mass spectrometry identification files to these new formats is indispensable. proBAMconvert enables the conversion of common identification file formats (mzIdentML, mzTab, and pepXML) to proBAM/proBed using an intuitive interface. Furthermore, proBAMconvert enables information to be output at both the PSM and peptide levels and has a command line interface next to the graphical user interface. Detailed documentation and a completely worked-out tutorial are available at http://probam.biobix.be.