Results 1 - 12 of 12
1.
Nucleic Acids Res ; 48(D1): D328-D334, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31724716

ABSTRACT

The neXtProt knowledgebase (https://www.nextprot.org) is an integrative resource providing both data on human proteins and the tools to explore them. In order to provide comprehensive and up-to-date data, we evaluate and add new data sets. We describe the incorporation of three new data sets that provide expression, function, protein-protein binary interaction, post-translational modification (PTM) and variant information. New SPARQL query examples illustrating uses of the new data were added. neXtProt has continued to develop tools for proteomics. We have improved the peptide uniqueness checker and have implemented a new protein digestion tool. Together, these tools make it possible to determine which proteases can be used to identify trypsin-resistant proteins by mass spectrometry. In terms of usability, we have finished revamping our web interface and completely rewritten our API. Our SPARQL endpoint now supports federated queries. All the neXtProt data are available via our user interface, API, SPARQL endpoint and FTP site, including the new PEFF 1.0 format files. Finally, the data on our FTP site are now licensed under CC BY 4.0 to promote their reuse.


Subject(s)
Databases, Protein , Knowledge Bases , Humans , Internet , Mass Spectrometry , Peptides/chemistry , Protein Kinases/chemistry , Protein Kinases/metabolism , Protein Processing, Post-Translational , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Analysis, RNA , Software , Trypsin , User-Computer Interface
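
The abstract above mentions a protein digestion tool without describing how it works. As a rough illustration of what in silico digestion means, the sketch below applies the commonly used trypsin rule (cleave after K or R unless the next residue is P). This is not neXtProt's implementation; the function name `trypsin_digest` and the `missed_cleavages` parameter are assumptions made for this example.

```python
def trypsin_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico trypsin digestion.

    Assumed rule: cleave C-terminal to K or R, except when the next
    residue is P. `missed_cleavages` is the maximum number of skipped
    cleavage sites allowed inside one peptide.
    """
    if not sequence:
        return []

    # Collect cut positions, including the start and end of the sequence.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    # Join neighbouring fragments to model missed cleavages.
    peptides = []
    for start in range(len(sites) - 1):
        last = min(start + 1 + missed_cleavages, len(sites) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    print(trypsin_digest("AKRPGKC"))
```

A real digestion tool would typically also handle other proteases, peptide length limits and sequence variants; none of that is modelled here.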
2.
Rapid Commun Mass Spectrom ; 31(9): 753-761, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28199054

ABSTRACT

RATIONALE: In peptide quantification by liquid chromatography/mass spectrometry (LC/MS), the optimization of multiple reaction monitoring (MRM) parameters is essential for sensitive detection. We have compared different approaches to build MRM assays, based either on flow injection analysis (FIA) of isotopically labelled peptides, or on the knowledge and the prediction of the best settings for MRM transitions and collision energies (CE). In this context, we introduce MRMOptimizer, an open-source software tool that processes spectra and assists the user in selecting transitions in the FIA workflow. METHODS: MS/MS spectral libraries with CE voltages from 10 to 70 V are automatically acquired in FIA mode for isotopically labelled peptides. Then MRMOptimizer determines the optimal MRM settings for each peptide. To assess the quantitative performance of our approach, 155 peptides, representing 84 proteins, were analysed by LC/MRM-MS and the peak areas were compared between: (A) the MRMOptimizer-based workflow, (B1) the SRMAtlas transitions set used 'as-is'; (B2) the same SRMAtlas set with CE parameters optimized by Skyline. RESULTS: 51% of the three most intense transitions per peptide were shown to be common to both A and B1/B2 methods, and displayed similar sensitivity and peak area distributions. The peak areas obtained with MRMOptimizer for transitions sharing either the precursor ion charge state or the fragment ions with the SRMAtlas set at unique transitions were increased 1.8- to 2.3-fold. The gain in sensitivity using MRMOptimizer for transitions with different precursor ion charge state and fragment ions (8% of the total), reaches a ~11-fold increase. CONCLUSIONS: Isotopically labelled peptides can be used to optimize MRM transitions more efficiently in FIA than by searching databases. The MRMOptimizer software is MS independent and enables the post-acquisition selection of MRM parameters. Coefficients of variation for optimal CE values are lower than those obtained with the SRMAtlas approach (B2) and one additional peptide was detected. Copyright © 2017 John Wiley & Sons, Ltd.


Subject(s)
Chromatography, Liquid/methods , Peptide Fragments/analysis , Tandem Mass Spectrometry/methods , Cells, Cultured , Databases, Factual , Dendritic Cells/chemistry , Humans , Ions/analysis , Ions/chemistry , Linear Models , Peptide Fragments/chemistry , Reproducibility of Results , Sensitivity and Specificity , Trypsin
3.
Nucleic Acids Res ; 45(D1): D177-D182, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899619

ABSTRACT

The neXtProt human protein knowledgebase (https://www.nextprot.org) continues to add new content and tools, with a focus on proteomics and genetic variation data. neXtProt now has proteomics data for over 85% of the human proteins, as well as new tools tailored to the proteomics community. Moreover, the neXtProt release 2016-08-25 includes over 8000 phenotypic observations for over 4000 variations in a number of genes involved in hereditary cancers and channelopathies. These changes are presented in the current neXtProt update. All of the neXtProt data are available via our user interface and FTP site. We also provide API access and a SPARQL endpoint for more technical applications.


Subject(s)
Databases, Protein , Proteomics , Genetic Association Studies , Genetic Variation , Humans , Internet , Phenotype , Proteomics/methods , Software , Web Browser
4.
J Proteomics ; 129: 63-70, 2015 Nov 03.
Article in English | MEDLINE | ID: mdl-26141507

ABSTRACT

Mass spectrometry (MS) is a widely used and evolving technique for the high-throughput identification of molecules in biological samples. The need for sharing and reuse of code among bioinformaticians working with MS data prompted the design and implementation of MzJava, an open-source Java Application Programming Interface (API) for MS-related data processing. MzJava provides data structures and algorithms for representing and processing mass spectra and their associated biological molecules, such as metabolites, glycans and peptides. MzJava includes functionality to perform mass calculation, peak processing (e.g. centroiding, filtering, transforming), spectrum alignment and clustering, protein digestion, fragmentation of peptides and glycans as well as scoring functions for spectrum-spectrum and peptide/glycan-spectrum matches. For data import and export, MzJava implements readers and writers for commonly used data formats. For many classes, support for the Hadoop MapReduce (hadoop.apache.org) and Apache Spark (spark.apache.org) frameworks for cluster computing was implemented. The library has been developed applying best practices of software engineering. To ensure that MzJava contains code that is correct and easy to use, the library's API was carefully designed and thoroughly tested. MzJava is an open-source project distributed under the AGPL v3.0 licence. MzJava requires Java 1.7 or higher. Binaries, source code and documentation can be downloaded from http://mzjava.expasy.org and https://bitbucket.org/sib-pig/mzjava. This article is part of a Special Issue entitled: Computational Proteomics.


Subject(s)
Databases, Protein , Information Storage and Retrieval/methods , Mass Spectrometry/methods , Programming Languages , Proteins/chemistry , User-Computer Interface , Amino Acid Sequence , Database Management Systems , Molecular Sequence Data , Peptide Mapping/methods , Sequence Analysis, Protein/methods
5.
Proteomics ; 15(15): 2568-79, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25825003

ABSTRACT

Formalin-fixed paraffin-embedded (FFPE) tissue is considered an appropriate alternative to frozen/fresh tissue for proteomic analysis. Here we study formalin-induced alterations on a proteome-wide level. We compared LC-MS/MS data of FFPE and frozen human kidney tissues by two methods. First, clustering analysis revealed that the biological variation is higher than the variation introduced by the two sample processing techniques and clusters formed in accordance with the biological tissue origin and not with the sample preservation method. Second, we combined open modification search and spectral counting to find modifications that are more abundant in FFPE samples compared to frozen samples. This analysis revealed lysine methylation (+14 Da) as the most frequent modification induced by FFPE preservation. We also detected a slight increase in methylene (+12 Da) and methylol (+30 Da) adducts as well as a putative modification of +58 Da, but they contribute less to the overall modification count. Subsequent SEQUEST analysis and X!Tandem searches of different datasets confirmed these trends. However, the modifications due to FFPE sample processing are a minor disturbance affecting 2-6% of all peptide-spectrum matches and the peptide lists identified in FFPE and frozen tissues are still highly similar.


Subject(s)
Kidney/metabolism , Lysine/metabolism , Paraffin Embedding/methods , Proteome/metabolism , Proteomics/methods , Tissue Fixation/methods , Amino Acid Sequence , Chromatography, Liquid , Cluster Analysis , Fixatives/chemistry , Formaldehyde/chemistry , Frozen Sections/methods , Humans , Methylation , Proteome/classification , Reproducibility of Results , Tandem Mass Spectrometry
6.
J Am Soc Mass Spectrom ; 24(12): 1862-71, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24006250

ABSTRACT

Data-independent mass spectrometry activates all ion species isolated within a given mass-to-charge (m/z) window regardless of their abundance. This acquisition strategy overcomes the limitations of traditional data-dependent ion selection, boosting data reproducibility and sensitivity. However, several tandem mass (MS/MS) spectra of the same precursor ion are acquired during chromatographic elution, resulting in large data redundancy. Also, the significant number of chimeric spectra and the absence of accurate precursor ion masses hamper peptide identification. Here, we describe an algorithm to preprocess data-independent MS/MS spectra by filtering out noise peaks and clustering the spectra according to both the chromatographic elution profiles and the spectral similarity. In addition, we developed an approach to estimate the m/z value of precursor ions from clustered MS/MS spectra in order to improve database search performance. Data acquired using a small precursor mass window of 3 m/z units and multiple injections to cover an m/z range of 400-1400 were processed with our algorithm. It showed an improvement in the number of both peptide and protein identifications by 8% while reducing the number of submitted spectra by 18% and the number of peaks by 55%. We conclude that our clustering method is a valid approach for data analysis of these data-independent fragmentation spectra. The software including the source code is available for the scientific community.


Subject(s)
Proteins/chemistry , Proteomics/methods , Tandem Mass Spectrometry/methods , Algorithms , Cell Line , Cluster Analysis , Humans , Software
7.
J Proteomics ; 79: 146-60, 2013 Feb 21.
Article in English | MEDLINE | ID: mdl-23277275

ABSTRACT

High-throughput protein identification and quantification analyses based on mass spectrometry are fundamental steps in most proteomics projects. Here, we present EasyProt (available at http://easyprot.unige.ch), a new platform for mass spectrometry data processing, protein identification, quantification and unexpected post-translational modification characterization. EasyProt provides a fully integrated graphical experience to perform a large part of the proteomic data analysis workflow. Our goal was to develop a software platform that would fulfill the needs of scientists in the field, while emphasizing ease-of-use for non-bioinformatician users. Protein identification is based on OLAV scoring schemes and protein quantification is implemented for both isobaric labeling and label-free methods. Additional features are available, such as peak list processing, isotopic correction, spectra filtering, charge-state deconvolution and spectra merging. To illustrate the EasyProt platform, we present two identification and quantification workflows based on isobaric tagging and label-free methods.


Subject(s)
Proteomics/methods , Sequence Analysis, Protein/methods , Software , Mass Spectrometry/methods , Protein Processing, Post-Translational , Proteins/analysis
8.
Proteomics ; 11(20): 4085-95, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21898822

ABSTRACT

The relevance of libraries of annotated MS/MS spectra is growing with the amount of proteomic data generated in high-throughput experiments. These reference libraries provide a fast and accurate way to identify newly acquired MS/MS spectra. In the context of multiple hypothesis testing, the number of false-positive identifications expected in the final result list is controlled by calculating the false discovery rate (FDR). In a classical sequence search where experimental MS/MS spectra are compared with the theoretical peptide spectra calculated from a sequence database, the FDR is estimated by searching randomized or decoy sequence databases. Despite on-going discussion on how exactly the FDR has to be calculated, this method is widely accepted in the proteomic community. Recently, similar approaches to control the FDR of spectrum library searches were discussed. We present in this paper a detailed analysis of the similarity between spectra of distinct peptides to set the basis of our own solution for decoy library creation (DeLiberator). It differs from the previously published results in some key points, mainly in implementing new methods that prevent decoy spectra from being too similar to the original library spectra while keeping important features of real MS/MS spectra. Using different proteomic data sets and library creation methods, we evaluate our approach and compare it with alternative methods.


Subject(s)
Algorithms , Peptides/chemistry , Proteomics/methods , Software , Tandem Mass Spectrometry , Animals , Databases, Protein , Genetic Association Studies , Humans
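
The abstract above refers to estimating the FDR by searching decoy databases or decoy libraries. The sketch below shows one common variant of that estimate: the number of decoy matches at or above a score threshold divided by the number of target matches at or above it. As the abstract itself notes, the exact formula is debated, so this is only one possible choice and not the DeLiberator method; the name `estimate_fdr` and the `(score, is_decoy)` input layout are assumptions made for this example.

```python
def estimate_fdr(matches, threshold):
    """Estimate the FDR among target matches scoring >= threshold.

    `matches` is a list of (score, is_decoy) pairs from a combined
    target/decoy search. Assumed estimator: decoys / targets above
    the threshold, capped at 1.0.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        # No accepted target matches, so there is nothing to be wrong about.
        return 0.0
    return min(1.0, decoys / targets)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = [(0.9, False), (0.8, False), (0.7, True),
               (0.6, False), (0.5, True)]
    print(estimate_fdr(example, threshold=0.6))
```

The estimate is only as good as the decoys: if decoy spectra are too similar to real library spectra, or too unrealistic, the ratio no longer reflects the true error rate, which is the problem the decoy library creation discussed above is meant to address.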
9.
J Proteome Res ; 10(7): 2913-21, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21500769

ABSTRACT

MS2 library spectra are rich in reproducible information about peptide fragmentation patterns compared to theoretical spectra modeled by a sequence search tool. So far, spectrum library searches are mostly applied to detect peptides as they are present in the library. However, they also allow finding modified variants of the library peptides if the search is done with a large precursor mass window and an adapted Spectrum-Spectrum Match (SSM) scoring algorithm. We perform a thorough evaluation on the use of library spectra as opposed to theoretical peptide spectra for the identification of PTMs, analyzing spectra of a well-annotated modification-rich test data set compiled from public data repositories. These initial studies motivate the development of our modification tolerant spectrum library search tool QuickMod, designed to identify modified variants of the peptides listed in the spectrum library without any prior input from the user estimating the modifications present in the sample. We built the search algorithm of QuickMod after carefully testing different SSM similarity scores. The final spectrum scoring scheme uses a support vector machine (SVM) on a selection of scoring features to classify correct and incorrect SSM. After identification of a list of modified peptides at a given False Discovery Rate (FDR), the modifications need to be positioned on the peptide sequence. We present a rapid modification site assignment algorithm and evaluate its positioning accuracy. Finally, we demonstrate that QuickMod performs favorably in terms of speed and identification rate when compared to other software solutions for PTM analysis.


Subject(s)
Algorithms , Peptide Fragments/analysis , Peptide Library , Proteomics/methods , Acetylation , Databases, Protein , Humans , Mass Spectrometry , Oxidation-Reduction , Peptide Fragments/blood , Phosphorylation , Protein Processing, Post-Translational , Research Design , Sequence Analysis, Protein , Software
10.
Genome Inform ; 15(2): 266-75, 2004.
Article in English | MEDLINE | ID: mdl-15706512

ABSTRACT

We have studied the projection of protein family data onto single translated bacterial genomes as a way to visualise relationships between families restricted to bacterial sequences. Any member of any type of family as defined in the Pfam database (domains, signatures, etc.) is considered as a protein module. Our first goal is to discover rules correlating the occurrence of modules with biochemical properties. To achieve this goal we have developed a platform to quantify information found in protein databases and to support the analysis of the nature of modules, their position and corresponding frequencies of occurrence (in isolation or in combination) in association with pathway knowledge as found in KEGG. This paper focuses on two pathways that are partially but not completely documented: the two-component system and aminophosphonate metabolism. Proteins involved in those pathways were listed separately in each organism to analyse module composition, and rules constraining pathway interactions were identified. It is shown how these results can be used to update KEGG pathways and orthologue tables.


Subject(s)
Databases, Genetic , Databases, Protein , Genome , Proteins , Animals , Computational Biology , Computer Graphics , Gene Expression Profiling , Humans , Information Storage and Retrieval , Multigene Family , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Sequence Homology
11.
Comput Biol Chem ; 27(4-5): 481-95, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14642756

ABSTRACT

Protein-related information is more often accumulated than reduced to a synthetic view. Itemising properties of protein sequences is informative, as is the list of ingredients for cooking, but without a recipe, that is, quantification and chronology, understanding is incomplete. If the goal of accumulating information is to discover or reveal the function and related biochemical mechanisms, information has to be weighed and ordered. As a guideline, the weight of a piece of information should reflect how often it consistently occurs in various contexts. We propose a common sense approach to quantify and put data and information into perspective. Complete bacterial proteomes are individually mapped with the Pfam-A database of domains and protein family signatures in an attempt to assess the modularity of proteins at the level of a single proteome and the implications of a modular description of proteins for a functional interpretation. Poorly annotated proteins in the most documented bacteria (E. coli and B. subtilis) were considered in an attempt to formulate hypotheses on the basis of domain/module content.


Subject(s)
Bacterial Proteins/chemistry , Databases, Protein , Proteome/chemistry , Bacillus subtilis/genetics , Bacterial Proteins/classification , Bacterial Proteins/genetics , Escherichia coli/genetics , Genome, Bacterial , Proteome/classification , Proteome/genetics , Sequence Analysis, Protein , Sequence Homology, Amino Acid
12.
Comput Biol Chem ; 27(1): 29-35, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12798037

ABSTRACT

Proteomics enforces the reverse chronological order on the gene to protein dogma and imposes amino acid sequences as a starting point of an investigation relative to function. By this approach, proteomics data can confirm the presence of multiple forms of a protein. Notwithstanding variations attributed to specific individual features of organisms and tissues, from two to over ten protein forms can be identified in a given sample. The present work describes some guidelines for tracking the origin of alternative protein forms and attempts to tag the details of sequence data in the literature. Working via these guidelines we have uncovered a third alternative form of the Pim subfamily of oncogenes. The term "form" is here combined with the qualification "alternative" to describe any product of a given gene including closely related paralogs. This paper also emphasizes the need for consistency checks in annotation processes, such as gene clustering, to avoid losing important details describing protein alternative forms. By identifying alternative protein forms, we illustrate the fact that rationalizing protein function via the identification of protein-protein interactions should in reality be that of identifying (alternative) form-form interactions.


Subject(s)
Proteomics/standards , Proto-Oncogene Proteins/genetics , Amino Acid Sequence/genetics , Animals , Computational Biology/methods , Computational Biology/standards , DNA, Complementary/classification , DNA, Complementary/genetics , Databases, Protein/statistics & numerical data , Expressed Sequence Tags , Genetic Variation , Humans , Molecular Sequence Data , Multigene Family/genetics , Protein Serine-Threonine Kinases/chemistry , Protein Serine-Threonine Kinases/classification , Protein Serine-Threonine Kinases/genetics , Proteomics/methods , Proto-Oncogene Proteins/chemistry , Proto-Oncogene Proteins/classification , Proto-Oncogene Proteins c-pim-1 , Quality Control , Sequence Alignment , Sequence Homology, Amino Acid , Swine