Results 1 - 20 of 61
1.
Annu Rev Cell Dev Biol ; 33: 391-416, 2017 10 06.
Article in English | MEDLINE | ID: mdl-28759257

ABSTRACT

A large body of evidence indicates that genome annotation pipelines have biased our view of coding sequences because they generally undersample small proteins and peptides. The recent development of genome-wide translation profiling reveals the prevalence of small/short open reading frames (smORFs or sORFs), which are scattered over all classes of transcripts, including both mRNAs and presumptive long noncoding RNAs. Proteomic approaches further confirm an unexpected variety of smORF-encoded peptides (SEPs), representing an overlooked reservoir of bioactive molecules. Indeed, functional studies in a broad range of species from yeast to humans demonstrate that SEPs can harbor key activities for the control of development, differentiation, and physiology. Here we summarize recent advances in the discovery and functional characterization of smORF/SEPs and discuss why these small players can no longer be ignored with regard to genome function.


Subjects
Peptides/metabolism; Animals; Genome; Humans; Open Reading Frames/genetics; Protein Biosynthesis; RNA, Untranslated/genetics
2.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33834200

ABSTRACT

The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has been successful in gathering insights on the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation purposes. We find that the majority of subunits (attention heads) of the model are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of the transcription process. With the specialization of the attention heads occurring automatically, we believe transformer models to be of high interest towards the creation of explainable neural networks in this field.


Subjects
Deep Learning; Escherichia coli/genetics; Genome, Bacterial; Genomics/methods; Transcription Initiation Site; Base Sequence; Binding Sites; DNA, Bacterial/genetics; DNA, Bacterial/metabolism; Escherichia coli/metabolism; Promoter Regions, Genetic/genetics; Transcription Factors/genetics; Transcription Factors/metabolism
3.
Bioinformatics ; 38(3): 597-603, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34718418

ABSTRACT

MOTIVATION: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, outlining the need for effective imputation techniques. The task of imputing single-cell (methylation) data requires models to build an understanding of underlying biological processes. RESULTS: We adapt the transformer neural network architecture to operate on methylation matrices by combining axial attention with sliding window self-attention. The obtained CpG Transformer displays state-of-the-art performance on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget. AVAILABILITY AND IMPLEMENTATION: CpG Transformer is freely available at https://github.com/gdewael/cpg-transformer. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA Methylation; Epigenome; Base Sequence; Sequence Analysis, DNA/methods; Neural Networks, Computer
4.
Mol Cell Proteomics ; 20: 100076, 2021.
Article in English | MEDLINE | ID: mdl-33823297

ABSTRACT

Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.


Subjects
Proteogenomics/methods; Databases, Protein; HCT116 Cells; Humans; Machine Learning; RNA-Seq; Ribosomes
5.
Genome Res ; 28(1): 25-36, 2018 01.
Article in English | MEDLINE | ID: mdl-29162641

ABSTRACT

Translation initiation generally occurs at AUG codons in eukaryotes, although it has been shown that non-AUG or noncanonical translation initiation can also occur. However, the evidence for noncanonical translation initiation sites (TISs) is largely indirect and based on ribosome profiling (Ribo-seq) studies. Here, using a strategy specifically designed to enrich N termini of proteins, we demonstrate that many human proteins are translated at noncanonical TISs. The large majority of TISs that mapped to 5' untranslated regions were noncanonical and led to N-terminal extension of annotated proteins or translation of upstream small open reading frames (uORFs). It has been controversial whether the amino acid corresponding to the start codon is incorporated at the TIS or methionine is still incorporated. We found that methionine was incorporated at almost all noncanonical TISs identified in this study. Comparison of the TISs determined through mass spectrometry with ribosome profiling data revealed that about two-thirds of the novel annotations were indeed supported by the available ribosome profiling data. Sequence conservation across species and, in some cases, a higher abundance of noncanonical TISs than canonical ones suggest that the noncanonical TISs can have biological functions. Overall, this study provides evidence of protein translation initiation at noncanonical TISs and argues that further studies are required for elucidation of functional implications of such noncanonical translation initiation.


Subjects
5' Untranslated Regions; Mass Spectrometry; Open Reading Frames; Peptide Chain Initiation, Translational; Ribosomes/metabolism; HEK293 Cells; Human Umbilical Vein Endothelial Cells/metabolism; Humans; Protein Domains; Ribosomes/genetics
6.
Exp Cell Res ; 391(1): 111923, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32135166

ABSTRACT

Growing evidence illustrates the shortcomings in the current understanding of the full complexity of the proteome. Previously overlooked small open reading frames (sORFs) and their encoded microproteins have filled important gaps, exerting their function as biologically relevant regulators. The characterization of the full small proteome has potential applications in many fields. Continuous development of techniques and tools has led to improved sORF discovery, with candidates originating from bioinformatics analyses, sequencing routines, or proteomics approaches. In this mini review, we discuss the ongoing trends in the three fields and suggest some strategies for further characterization of high-potential candidates.


Subjects
Computational Biology/statistics & numerical data; Neural Networks, Computer; Open Reading Frames; Protein Biosynthesis; Proteome/genetics; Ribosomes/genetics; Animals; Computational Biology/methods; High-Throughput Nucleotide Sequencing; Humans; Plants/genetics; Protein Sorting Signals/genetics; Proteome/classification; Proteome/metabolism; Ribosomes/classification; Ribosomes/metabolism; Software
7.
Mol Cell Proteomics ; 18(8 suppl 1): S126-S140, 2019 08 09.
Article in English | MEDLINE | ID: mdl-31040227

ABSTRACT

PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat cells. The different proteoform calling strategies are used alongside one another and in the end combined with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field.
More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.


Subjects
Proteogenomics/methods; Ribosomes/metabolism; Chromatography, Liquid; HCT116 Cells; Humans; Jurkat Cells; Tandem Mass Spectrometry
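
The P-site offset step mentioned in the abstract above can be illustrated with a minimal sketch. It only shows how a read-length-specific offset, once determined, might be applied to a mapped ribosome footprint. It is not PROTEOFORMER's code: the function name, the coordinate convention (leftmost mapped position) and the offset values in the table are all assumptions chosen for illustration.

```python
# Hypothetical read-length-specific P-site offsets (nucleotides from the
# 5' end of the footprint). Real offsets must be estimated per dataset.
P_SITE_OFFSETS = {28: 12, 29: 12, 30: 12, 31: 13}


def p_site_position(read_start, read_length, strand="+", offsets=P_SITE_OFFSETS):
    """Return the genomic coordinate of the P-site for one footprint.

    read_start is the leftmost mapped coordinate of the read. Reads whose
    length has no offset in the table are skipped by returning None.
    """
    offset = offsets.get(read_length)
    if offset is None:
        return None
    if strand == "+":
        # The 5' end is the leftmost coordinate on the plus strand.
        return read_start + offset
    # On the minus strand the 5' end is the rightmost coordinate.
    return read_start + read_length - 1 - offset


if __name__ == "__main__":
    print(p_site_position(100, 29))       # plus-strand footprint
    print(p_site_position(100, 29, "-"))  # minus-strand footprint
```

Assigning each footprint to a single P-site coordinate is what allows ribosome occupancy to be counted per codon and per reading frame.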
8.
Nucleic Acids Res ; 47(6): e36, 2019 04 08.
Article in English | MEDLINE | ID: mdl-30753697

ABSTRACT

Annotation of gene expression in prokaryotes often finds itself corrected due to small variations of the annotated gene regions observed between different (sub)-species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network utilizing features extracted from ribosome profiling information and binding site sequence patterns that proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, adapting the information gained from both the high-throughput ribosome profiling data and ribosome binding translation initiation sequence region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments, used for the identification of open reading frames in prokaryotes without a priori knowledge of the translational landscape. Through extensive validation of the model trained on various sets of data, multiple species sequence similarity, mass spectrometry and Edman degradation verified proteins, the effectiveness of DeepRibo is highlighted.


Subjects
Algorithms; Molecular Sequence Annotation/methods; Prokaryotic Cells/metabolism; Protein Biosynthesis/physiology; Ribosomes/metabolism; Binding Sites; Computational Biology/methods; Datasets as Topic; High-Throughput Screening Assays/methods; Neural Networks, Computer; Open Reading Frames; Prokaryotic Cells/chemistry; Protein Processing, Post-Translational; Sequence Alignment/methods; Signal Transduction
9.
Nucleic Acids Res ; 46(D1): D497-D502, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140531

ABSTRACT

sORFs.org (http://www.sorfs.org) is a public repository of small open reading frames (sORFs) identified by ribosome profiling (RIBO-seq). This update elaborates on the major improvements implemented since its initial release. sORFs.org now additionally supports three more species (zebrafish, rat and Caenorhabditis elegans) and currently includes 78 RIBO-seq datasets, a vast increase compared to the three that were processed in the initial release. Therefore, a novel pipeline was constructed that also enables sORF detection in RIBO-seq datasets comprising solely elongating RIBO-seq data while previously, matching initiating RIBO-seq data was necessary to delineate the sORFs. Furthermore, a novel noise filtering algorithm was designed, able to distinguish sORFs with true ribosomal activity from simulated noise, consequently reducing the false positive identification rate. The inclusion of other species also led to the development of an inner BLAST pipeline, assessing sequence similarity between sORFs in the repository. Building on the proof of concept model in the initial release of sORFs.org, a full PRIDE-ReSpin pipeline was now released, reprocessing publicly available MS-based proteomics PRIDE datasets, reporting on true translation events. Next to reporting those identified peptides, sORFs.org allows visual inspection of the annotated spectra within the Lorikeet MS/MS viewer, thus enabling detailed manual inspection and interpretation.


Subjects
Algorithms; Databases, Genetic; Open Reading Frames; Proteomics/methods; Ribosomes/genetics; Animals; Base Sequence; Caenorhabditis elegans/genetics; Caenorhabditis elegans/metabolism; Conserved Sequence; Datasets as Topic; Drosophila melanogaster/genetics; Drosophila melanogaster/metabolism; Humans; Internet; Mice; Protein Biosynthesis; Rats; Ribosomes/metabolism; Sequence Alignment; Signal-To-Noise Ratio; Software; Tandem Mass Spectrometry/statistics & numerical data; Zebrafish/genetics; Zebrafish/metabolism
10.
J Proteome Res ; 18(6): 2686-2692, 2019 06 07.
Article in English | MEDLINE | ID: mdl-31081335

ABSTRACT

Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.


Subjects
Proteomics/standards; Humans; Information Storage and Retrieval; Mass Spectrometry; Software
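
The backward-compatibility claim in the abstract above can be made concrete with a small sketch of a FASTA-style reader that tolerates PEFF input. It rests on two assumptions not stated in the abstract and to be checked against the PEFF specification: that file-level metadata lines start with `#`, and that per-entry metadata follows the identifier on the `>` line. The function names and the sample lines are hypothetical.

```python
def _finish(header, chunks):
    # Split the header into identifier and the remaining description text.
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def read_fasta_like(lines):
    """Return a list of (identifier, description, sequence) tuples.

    Lines starting with '#' are skipped (assumed PEFF file header block),
    so the same reader handles plain FASTA and PEFF-style input. The
    description is kept verbatim and not interpreted.
    """
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_finish(header, chunks))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_finish(header, chunks))
    return records


if __name__ == "__main__":
    # Hypothetical input, for illustration only.
    sample = ["# PEFF 1.0", ">sp:P1 \\PName=Example", "MKT", "AAL", ">sp:P2", "GG"]
    for record in read_fasta_like(sample):
        print(record)
```

A reader of this kind recovers identifiers and sequences but ignores the variant and PTM annotations, which is the limitation the abstract describes.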
11.
J Biol Chem ; 293(16): 6052-6063, 2018 04 20.
Article in English | MEDLINE | ID: mdl-29487130

ABSTRACT

Neuropeptides constitute a vast and functionally diverse family of neurochemical signaling molecules and are widely involved in the regulation of various physiological processes. The nematode Caenorhabditis elegans is well-suited for the study of neuropeptide biochemistry and function, as neuropeptide biosynthesis enzymes are not essential for C. elegans viability. This permits the study of neuropeptide biosynthesis in mutants lacking certain neuropeptide-processing enzymes. Mass spectrometry has been used to study the effects of proprotein convertase and carboxypeptidase mutations on proteolytic processing of neuropeptide precursors and on the peptidome in C. elegans. However, the enzymes required for the last step in the production of many bioactive peptides, the carboxyl-terminal amidation reaction, have not been characterized in this manner. Here, we describe three genes that encode homologs of neuropeptide amidation enzymes in C. elegans and used tandem LC-MS to compare neuropeptides in WT animals with those in newly generated mutants for these putative amidation enzymes. We report that mutants lacking both a functional peptidylglycine α-hydroxylating monooxygenase and a peptidylglycine α-amidating monooxygenase had a severely altered neuropeptide profile and also a decreased number of offspring. Interestingly, single mutants of the amidation enzymes still expressed some fully processed amidated neuropeptides, indicating the existence of a redundant amidation mechanism in C. elegans. All MS data are available via ProteomeXchange with the identifier PXD008942. In summary, the key steps in neuropeptide processing in C. elegans seem to be executed by redundant enzymes, and loss of these enzymes severely affects brood size, supporting the need of amidated peptides for C. elegans reproduction.


Subjects
Amidine-Lyases/metabolism; Caenorhabditis elegans Proteins/metabolism; Caenorhabditis elegans/metabolism; Mixed Function Oxygenases/metabolism; Multienzyme Complexes/metabolism; Neuropeptides/metabolism; Amidine-Lyases/chemistry; Amidine-Lyases/genetics; Amino Acid Sequence; Animals; Biosynthetic Pathways; Caenorhabditis elegans/chemistry; Caenorhabditis elegans/genetics; Caenorhabditis elegans Proteins/chemistry; Caenorhabditis elegans Proteins/genetics; Copper/metabolism; Gene Deletion; Humans; Mixed Function Oxygenases/chemistry; Mixed Function Oxygenases/genetics; Multienzyme Complexes/chemistry; Multienzyme Complexes/genetics; Mutation; Neuropeptides/genetics; Sequence Alignment; Tandem Mass Spectrometry
12.
Nucleic Acids Res ; 45(20): e168, 2017 Nov 16.
Article in English | MEDLINE | ID: mdl-28977509

ABSTRACT

Prokaryotic genome annotation is highly dependent on automated methods, as manual curation cannot keep up with the exponential growth of sequenced genomes. Current automated methods depend heavily on sequence composition and often underestimate the complexity of the proteome. We developed RibosomeE Profiling Assisted (re-)AnnotaTION (REPARATION), a de novo machine learning algorithm that takes advantage of experimental protein synthesis evidence from ribosome profiling (Ribo-seq) to delineate translated open reading frames (ORFs) in bacteria, independent of genome annotation (https://github.com/Biobix/REPARATION). REPARATION evaluates all possible ORFs in the genome and estimates minimum thresholds based on a growth curve model to screen for spurious ORFs. We applied REPARATION to three annotated bacterial species to obtain a more comprehensive mapping of their translation landscape in support of experimental data. In all cases, we identified hundreds of novel (small) ORFs including variants of previously annotated ORFs and >70% of all (variants of) annotated protein coding ORFs were predicted by REPARATION to be translated. Our predictions are supported by matching mass spectrometry proteomics data, sequence composition and conservation analysis. REPARATION is unique in that it makes use of experimental translation evidence to intrinsically perform a de novo ORF delineation in bacterial genomes irrespective of the sequence features linked to open reading frames.


Subjects
Bacillus subtilis/genetics; Computational Biology/methods; Escherichia coli K12/genetics; Genome, Bacterial/genetics; Molecular Sequence Annotation/methods; Salmonella typhimurium/genetics; Algorithms; Chromosome Mapping; Machine Learning; Open Reading Frames/genetics; Ribosomes/genetics
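
The first step described in the abstract above, evaluating all possible ORFs in a genome, can be sketched as a plain six-frame scan. This is an illustrative sketch only and not REPARATION's implementation: the start codon set (ATG, GTG, TTG), the minimum length filter and the function names are assumptions, and the Ribo-seq-based classification that follows in the real tool is not shown.

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=10):
    """Yield (strand, start, end, orf_sequence) for every start-to-stop ORF.

    Every in-frame start codon upstream of a stop is reported, so nested
    variants of the same ORF are all listed. Coordinates are 0-based,
    end-exclusive, and relative to the scanned strand (for '-' they refer
    to the reverse-complemented sequence). min_codons excludes the stop.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            open_starts = []
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if codon in START_CODONS:
                    open_starts.append(i)
                elif codon in STOP_CODONS:
                    for start in open_starts:
                        if (i - start) // 3 >= min_codons:
                            yield strand, start, i + 3, s[start:i + 3]
                    open_starts = []


if __name__ == "__main__":
    for orf in find_orfs("ATGAAATTGCCCTAA", min_codons=1):
        print(orf)
```

Reporting every in-frame start rather than only the longest ORF keeps the N-terminal variants that the abstract refers to as variants of annotated ORFs.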
13.
Nucleic Acids Res ; 45(13): 7997-8013, 2017 Jul 27.
Article in English | MEDLINE | ID: mdl-28541577

ABSTRACT

Alternative translation initiation mechanisms such as leaky scanning and reinitiation potentiate the polycistronic nature of human transcripts. By allowing for reprogrammed translation, these mechanisms can mediate biological responses to stimuli. We combined proteomics with ribosome profiling and mRNA sequencing to identify the biological targets of translation control triggered by the eukaryotic translation initiation factor 1 (eIF1), a protein implicated in the stringency of start codon selection. We quantified expression changes of over 4000 proteins and 10 000 actively translated transcripts, leading to the identification of 245 transcripts undergoing translational control mediated by upstream open reading frames (uORFs) upon eIF1 deprivation. Here, the stringency of start codon selection and preference for an optimal nucleotide context were largely diminished leading to translational upregulation of uORFs with suboptimal start. Interestingly, genes affected by eIF1 deprivation were implicated in energy production and sensing of metabolic stress.


Subjects
Eukaryotic Initiation Factors/metabolism; Neoplasm Proteins/metabolism; Nerve Tissue Proteins/metabolism; Peptide Chain Initiation, Translational; Cell Line; Codon, Initiator; Energy Metabolism/genetics; Eukaryotic Initiation Factors/antagonists & inhibitors; Eukaryotic Initiation Factors/genetics; Gene Expression; Gene Knockdown Techniques; HCT116 Cells; Humans; Neoplasm Proteins/antagonists & inhibitors; Neoplasm Proteins/genetics; Nerve Tissue Proteins/antagonists & inhibitors; Nerve Tissue Proteins/genetics; Nucleic Acid Conformation; Open Reading Frames; RNA, Messenger/chemistry; RNA, Messenger/genetics; RNA, Messenger/metabolism; Ribosomes/genetics; Ribosomes/metabolism; Stress, Physiological/genetics
14.
Proteomics ; 18(10): e1700218, 2018 05.
Article in English | MEDLINE | ID: mdl-29710410

ABSTRACT

Bio-active peptides are involved in the regulation of most physiological processes in the body. Classical bio-active peptides (CBAPs) are cleaved from a larger precursor protein and stored in secretion vesicles from which they are released in the extracellular space. Recently, another non-classical type of bio-active peptides (NCBAPs) has gained interest. These typically are not secreted but instead appear to be translated from short open reading frames (sORFs) and released directly into the cytoplasm. In contrast to CBAPs, these peptides are involved in the regulation of intra-cellular processes such as transcriptional control, calcium handling and DNA repair. However, bio-chemical evidence for the translation of sORFs remains elusive. Comprehensive analysis of sORF-encoded polypeptides (SEPs) is hampered by a number of methodological and biological challenges: the low molecular mass (many 4-10 kDa), the low abundance, transient expression and complications in data analysis. We developed a strategy to address a number of these issues. Our strategy is to exclude false positive identifications. In the total sample, we identified 926 peptides originating from 37 known (neuro)peptide precursors in mouse striatum. In addition, four SEPs were identified, including NoBody, a SEP that was previously discovered in humans, and three novel SEPs from 5' untranslated transcript regions (UTRs).

15.
Mol Plant Microbe Interact ; 31(1): 112-124, 2018 01.
Article in English | MEDLINE | ID: mdl-29094648

ABSTRACT

The salivary protein repertoire released by the herbivorous pest Tetranychus urticae is assumed to hold keys to its success on diverse crops. We report on a spider mite-specific protein family that is expanded in T. urticae. The encoding genes have an expression pattern restricted to the anterior podocephalic glands, while peptide fragments were found in the T. urticae secretome, supporting the salivary nature of these proteins. As peptide fragments were identified in a host-dependent manner, we designated this family as the SHOT (secreted host-responsive protein of Tetranychidae) family. The proteins were divided in three groups based on sequence similarity. Unlike TuSHOT3 genes, TuSHOT1 and TuSHOT2 genes were highly expressed when feeding on a subset of family Fabaceae, while expression was depleted on other hosts. TuSHOT1 and TuSHOT2 expression was induced within 24 h after certain host transfers, pointing toward transcriptional plasticity rather than selection as the cause. Transfer from an 'inducer' to a 'noninducer' plant was associated with slow yet strong downregulation of TuSHOT1 and TuSHOT2, occurring over generations rather than hours. This asymmetric on and off regulation points toward host-specific effects of SHOT proteins, which is further supported by the diversity of SHOT genes identified in Tetranychidae with a distinct host repertoire.


Subjects
Host-Parasite Interactions/genetics; Multigene Family; Salivary Proteins and Peptides/genetics; Tetranychidae/genetics; Transcription, Genetic; Amino Acid Sequence; Animals; Gene Expression Regulation, Plant; Peptides/chemistry; Peptides/metabolism; Phylogeny; Plants/genetics; Plants/parasitology; Proteomics; RNA, Messenger/genetics; RNA, Messenger/metabolism; Saliva/metabolism; Time Factors
16.
Mass Spectrom Rev ; 36(5): 584-599, 2017 09.
Article in English | MEDLINE | ID: mdl-26670565

ABSTRACT

Proteogenomics is a research area that combines areas such as proteomics and genomics in a multi-omics setup using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation or to unravel the proteome complexity. Mass spectrometry based identifications of matching or homologous peptides can further refine gene models. The identification of novel proteoforms is also made possible based on detection of novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation or novel (small) open reading frames in intergenic or un-translated genic regions by analyzing high-throughput sequencing data from RNAseq or ribosome profiling experiments. Other proteogenomics studies using a combination of proteomics and genomics techniques focus on antibody sequencing, the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases became available to help streamline these cross-omics studies. Some of these solutions only help in specific steps of the proteogenomics studies, e.g. building custom sequence databases (based on next generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years a handful of integrative tools also became available that can execute complete proteogenomics analyses. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review we aimed at sketching a comprehensive overview of all the bioinformatics solutions that are available for this growing research area. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:584-599, 2017.


Subjects
Computational Biology/methods; Databases, Factual; Genomics/methods; Mass Spectrometry/methods; Proteomics/methods; Animals; Antibodies/genetics; Peptide Mapping/methods; Peptides/analysis; Peptides/genetics; Sequence Analysis, Protein/methods; Software; Venoms/analysis
17.
Mol Cell Proteomics ; 15(11): 3361-3372, 2016 11.
Article in English | MEDLINE | ID: mdl-27694331

ABSTRACT

N-terminal acetylation (Nt-acetylation) by N-terminal acetyltransferases (NATs) is one of the most common protein modifications in eukaryotes. The NatC complex represents one of three major NATs of which the substrate profile remains largely unexplored. Here, we defined the in vivo human NatC Nt-acetylome on a proteome-wide scale by combining knockdown of its catalytic subunit Naa30 with positional proteomics. We identified 46 human NatC substrates, expanding our current knowledge on the substrate repertoire of NatC which now includes proteins harboring Met-Leu, Met-Ile, Met-Phe, Met-Trp, Met-Val, Met-Met, Met-His and Met-Lys N termini. Upon Naa30 depletion the expression levels of several organellar proteins were found reduced, in particular mitochondrial proteins, some of which were found to be NatC substrates. Interestingly, knockdown of Naa30 induced the loss of mitochondrial membrane potential and fragmentation of mitochondria. In conclusion, NatC Nt-acetylates a large variety of proteins and is essential for mitochondrial integrity and function.


Subjects
Mitochondrial Proteins/metabolism; N-Terminal Acetyltransferase C/genetics; N-Terminal Acetyltransferase C/metabolism; Proteomics/methods; Acetylation; Cell Line, Tumor; Gene Knockdown Techniques; HeLa Cells; Humans; Protein Binding; Protein Interaction Maps; Substrate Specificity
18.
Mol Cell Proteomics ; 15(12): 3594-3613, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27703040

ABSTRACT

The two-spotted spider mite Tetranychus urticae is an extremely polyphagous crop pest. Alongside an unparalleled detoxification potential for plant secondary metabolites, it has recently been shown that spider mites can attenuate or even suppress plant defenses. Salivary constituents, notably effectors, have been proposed to play an important role in manipulating plant defenses and might determine the outcome of plant-mite interactions. Here, the proteomic composition of saliva from T. urticae lines adapted to various host plants (bean, maize, soy, and tomato) was analyzed using a custom-developed feeding assay coupled with nano-LC tandem mass spectrometry. About 90 putative T. urticae salivary proteins were identified. Many are of unknown function, and in numerous cases belong to multimembered gene families. RNAseq expression analysis revealed that many genes coding for these salivary proteins were highly expressed in the proterosoma, the mite body region that includes the salivary glands. A subset of genes encoding putative salivary proteins was selected for whole-mount in situ hybridization, and these were found to be expressed in the anterior and dorsal podocephalic glands. Strikingly, host plant dependent expression was evident for putative salivary proteins, and was further studied in detail by micro-array based genome-wide expression profiling. This meta-analysis revealed for the first time the salivary protein repertoire of a phytophagous chelicerate. The availability of this salivary proteome will assist in unraveling the molecular interface between phytophagous mites and their host plants, and may ultimately facilitate the development of mite-resistant crops. Furthermore, the technique used in this study is a time- and resource-efficient method to examine the salivary protein composition of other small arthropods for which saliva or salivary glands cannot be isolated easily.


Subjects
Crops, Agricultural/parasitology , Proteomics/methods , Salivary Proteins and Peptides/metabolism , Tetranychidae/physiology , Animals , Arthropod Proteins/metabolism , Chromatography, Liquid , Crops, Agricultural/genetics , Gene Expression Regulation , Host Specificity , Host-Parasite Interactions , Salivary Proteins and Peptides/genetics , Sequence Analysis, RNA/methods , Tandem Mass Spectrometry , Tetranychidae/metabolism , Tissue Distribution
19.
Nucleic Acids Res ; 44(D1): D324-9, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26527729

ABSTRACT

With the advent of ribosome profiling, a next-generation sequencing technique providing a "snapshot" of translated mRNA in a cell, many short open reading frames (sORFs) with ribosomal activity were identified. Follow-up studies revealed the existence of functional peptides, so-called micropeptides, translated from these sORFs, indicating a new class of bioactive peptides. Over the last few years, several micropeptides exhibiting important cellular functions were discovered. However, ribosome occupancy does not necessarily imply an actual function of the translated peptide, which has led to the development of various tools for assessing the coding potential of sORFs. Here, we introduce sORFs.org (http://www.sorfs.org), a novel database for sORFs identified using ribosome profiling. Starting from ribosome profiling, sORFs.org identifies sORFs, incorporates state-of-the-art tools and metrics, and stores results in a public database. Two query interfaces are provided: a default one enabling quick lookup of sORFs and a BioMart interface providing advanced query and export possibilities. At present, sORFs.org harbors 263 354 sORFs that demonstrate ribosome occupancy, originating from three different cell lines: HCT116 (human), E14_mESC (mouse) and S2 (fruit fly). sORFs.org aims to provide an extensive sORFs database accessible to researchers with limited bioinformatics knowledge, thus enabling easy integration into personal projects.


Subjects
Databases, Genetic , Open Reading Frames , Animals , Base Sequence , Cell Line , Conserved Sequence , Drosophila melanogaster/genetics , High-Throughput Nucleotide Sequencing , Humans , Internet , Mass Spectrometry , Mice , Peptides/chemistry , RNA, Messenger/chemistry , Ribosomes/metabolism , Sequence Analysis, RNA
20.
J Proteome Res ; 16(7): 2639-2644, 2017 07 07.
Article in English | MEDLINE | ID: mdl-28573858

ABSTRACT

The introduction of new standard formats, proBAM and proBed, improves the integration of genomics and proteomics information, thus aiding proteogenomics applications. These novel formats enable peptide spectrum matches (PSMs) to be stored, inspected, and analyzed within the context of the genome. However, an easy-to-use and transparent tool to convert mass spectrometry identification files to these new formats is indispensable. proBAMconvert enables the conversion of common identification file formats (mzIdentML, mzTab, and pepXML) to proBAM/proBed using an intuitive interface. Furthermore, proBAMconvert enables information to be output at both the PSM and peptide levels and has a command line interface in addition to the graphical user interface. Detailed documentation and a completely worked-out tutorial are available at http://probam.biobix.be.


Subjects
Computational Biology/methods , Genome , Peptides/analysis , Proteogenomics/statistics & numerical data , User-Computer Interface , Algorithms , Animals , Chromosome Mapping/statistics & numerical data , Humans , Information Storage and Retrieval , Proteogenomics/methods