Results 1 - 20 of 32
1.
J Proteome Res ; 23(1): 418-429, 2024 01 05.
Article in English | MEDLINE | ID: mdl-38038272

ABSTRACT

The inherent diversity of approaches in proteomics research has led to a wide range of software solutions for data analysis. These software solutions encompass multiple tools, each employing different algorithms for various tasks such as peptide-spectrum matching, protein inference, quantification, statistical analysis, and visualization. To enable an unbiased comparison of commonly used bottom-up label-free proteomics workflows, we introduce WOMBAT-P, a versatile platform designed for automated benchmarking and comparison. WOMBAT-P simplifies the processing of public data by utilizing the sample and data relationship format for proteomics (SDRF-Proteomics) as input. This feature streamlines the analysis of annotated local or public ProteomeXchange data sets, promoting efficient comparisons among diverse outputs. Through an evaluation using experimental ground truth data and a realistic biological data set, we uncover significant disparities and a limited overlap in the quantified proteins. WOMBAT-P not only enables rapid execution and seamless comparison of workflows but also provides valuable insights into the capabilities of different software solutions. These benchmarking metrics are a valuable resource for researchers in selecting the most suitable workflow for their specific data sets. The modular architecture of WOMBAT-P promotes extensibility and customization. The software is available at https://github.com/wombat-p/WOMBAT-Pipelines.


Subject(s)
Benchmarking , Proteomics , Workflow , Software , Proteins , Data Analysis
2.
J Proteome Res ; 22(10): 3190-3199, 2023 Oct 06.
Article in English | MEDLINE | ID: mdl-37656829

ABSTRACT

Precision medicine focuses on adapting care to the individual profile of patients, for example, accounting for their unique genetic makeup. Being able to account for the effect of genetic variation on the proteome holds great promise toward this goal. However, identifying the protein products of genetic variation using mass spectrometry has proven very challenging. Here we show that the identification of variant peptides can be improved by the integration of retention time and fragmentation predictors into a unified proteogenomic pipeline. By combining these intrinsic peptide characteristics using the search-engine post-processor Percolator, we demonstrate improved discrimination power between correct and incorrect peptide-spectrum matches. Our results demonstrate that the drop in performance that is induced when expanding a protein sequence database can be compensated, hence enabling efficient identification of genetic variation products in proteomics data. We anticipate that this enhancement of proteogenomic pipelines can provide a more refined picture of the unique proteome of patients and thereby contribute to improving patient care.

3.
Methods Mol Biol ; 2426: 67-89, 2023.
Article in English | MEDLINE | ID: mdl-36308685

ABSTRACT

In the proteomics field, the production and publication of reliable mass spectrometry (MS)-based label-free quantitative results is a major concern. Due to the intrinsic complexity of bottom-up proteomics experiments (requiring aggregation of data relating to both precursor and fragment peptide ions into protein information, and matching this data across samples), inaccuracies and errors can occur throughout the data-processing pipeline. In a classical label-free quantification workflow, the validation of identification results is critical since errors made at this first stage of the workflow may have an impact on the following steps and therefore on the final result. Although false discovery rate (FDR) of the identification is usually controlled by using the popular target-decoy method, it has been demonstrated that this method can sometimes lead to inaccurate FDR estimates. This protocol shows how Proline can be used to validate identification results by using the method based on the Benjamini-Hochberg procedure and then quantify the identified ions and proteins in a single software environment providing data curation capabilities and computational efficiency.
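As an illustration of the Benjamini-Hochberg procedure named in this abstract, the sketch below computes adjusted p-values for a list of identifications. It is the textbook procedure only: the abstract does not describe how Proline derives p-values from identification scores, so the input values and the function name `benjamini_hochberg` are assumptions made for this example.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five identifications (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.27]
qvals = benjamini_hochberg(pvals)
# Keep identifications whose adjusted p-value is at most 5%.
accepted = [p for p, q in zip(pvals, qvals) if q <= 0.05]
print(accepted)
```

Identifications whose adjusted value falls at or below the chosen threshold are retained; the 5% cutoff used here is an arbitrary choice for the example.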


Subject(s)
Proline , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Proteomics/methods , Software , Proteins/chemistry , Databases, Protein
4.
Sci Data ; 9(1): 126, 2022 03 30.
Article in English | MEDLINE | ID: mdl-35354825

ABSTRACT

In the last decade, a revolution in liquid chromatography-mass spectrometry (LC-MS) based proteomics has unfolded with the introduction of dozens of novel instruments that incorporate additional data dimensions through innovative acquisition methodologies, in turn inspiring specialized data analysis pipelines. Simultaneously, a growing number of proteomics datasets have been made publicly available through data repositories such as ProteomeXchange, Zenodo and Skyline Panorama. However, developing algorithms to mine this data and assessing the performance on different platforms is currently hampered by the lack of a single benchmark experimental design. Therefore, we acquired a hybrid proteome mixture on different instrument platforms and in all currently available families of data acquisition. Here, we present a comprehensive Data-Dependent and Data-Independent Acquisition (DDA/DIA) dataset acquired using several of the most commonly used current-day instrumental platforms. The dataset consists of over 700 LC-MS runs, including adequate replicates allowing robust statistics and covering nearly 10 different data formats, including scanning quadrupole and ion mobility enabled acquisitions. Datasets are available via ProteomeXchange (PXD028735).


Subject(s)
Benchmarking , Proteomics , Animals , Chromatography, Liquid/methods , Humans , Mass Spectrometry/methods , Proteome
5.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
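To make the idea of a tabular sample-to-file description concrete, the sketch below reads a small tab-delimited table and groups data files by sample. The column headers follow the MAGE-TAB style of bracketed attribute names, but the exact headers, sample names, and file names are illustrative assumptions, not a complete or validated file in the format the abstract proposes.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical, heavily simplified sample-to-file table (illustration only).
EXAMPLE = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun 1\tsample1_rep1.raw\n"
    "sample 1\tHomo sapiens\trun 2\tsample1_rep2.raw\n"
    "sample 2\tHomo sapiens\trun 3\tsample2_rep1.raw\n"
)


def files_per_sample(text: str) -> Dict[str, List[str]]:
    """Map each sample name to the data files recorded for it."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    mapping = defaultdict(list)
    for row in reader:
        mapping[row["source name"]].append(row["comment[data file]"])
    return dict(mapping)


print(files_per_sample(EXAMPLE))
```

Keeping one row per sample-to-file relationship is what lets a reanalysis tool recover the experimental design without manual curation.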


Subject(s)
Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome
6.
Rapid Commun Mass Spectrom ; : e9087, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33861485

ABSTRACT

The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS; eubic-ms.org) was founded in 2014 to unite European computational mass spectrometry researchers and proteomics bioinformaticians working in academia and industry. EuBIC-MS maintains educational resources (proteomics-academy.org) and organises workshops at national and international conferences on proteomics and mass spectrometry. Furthermore, EuBIC-MS is actively involved in several community initiatives such as the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI). Apart from these collaborations, EuBIC-MS has organised two Winter Schools and two Developers' Meetings that have contributed to the strengthening of the European mass spectrometry network and fostered international collaboration in this field, even beyond Europe. Moreover, EuBIC-MS is currently actively developing a community-driven standard dedicated to mass spectrometry data annotation (SDRF-Proteomics) that will facilitate data reuse and collaboration. This manuscript highlights what EuBIC-MS is, what it does, and what it already has achieved. A warm invitation is extended to new researchers at all career stages to join the EuBIC-MS community on its Slack channel (eubic.slack.com).

7.
Commun Biol ; 4(1): 269, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33649389

ABSTRACT

The success of cancer immunotherapy relies on the induction of an immunoprotective response targeting tumor antigens (TAs) presented on MHC-I molecules. We demonstrated that the splicing inhibitor isoginkgetin and its water-soluble and non-toxic derivative IP2 act at the production stage of the pioneer translation products (PTPs). We showed that IP2 increases PTP-derived antigen presentation in cancer cells in vitro and impairs tumor growth in vivo. IP2 action is long-lasting and dependent on the CD8+ T cell response against TAs. We observed that the antigen repertoire displayed on MHC-I molecules at the surface of MCA205 fibrosarcoma is modified upon treatment with IP2. In particular, IP2 enhances the presentation of an exon-derived epitope from the tumor suppressor nischarin. The combination of IP2 with a peptide vaccine targeting the nischarin-derived epitope showed a synergistic antitumor effect in vivo. These findings identify the spliceosome as a druggable target for the development of epitope-based immunotherapies.


Subject(s)
Adaptive Immunity/drug effects , Antigens, Neoplasm/metabolism , Antineoplastic Agents, Phytogenic/pharmacology , Biflavonoids/pharmacology , Cancer Vaccines/pharmacology , Fibrosarcoma/drug therapy , Lymphocytes, Tumor-Infiltrating/drug effects , T-Lymphocytes/drug effects , Animals , Cell Line, Tumor , Cell Proliferation/drug effects , Female , Fibrosarcoma/immunology , Fibrosarcoma/metabolism , Fibrosarcoma/pathology , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism , Imidazoline Receptors/immunology , Imidazoline Receptors/metabolism , Lymphocyte Activation/drug effects , Lymphocytes, Tumor-Infiltrating/immunology , Lymphocytes, Tumor-Infiltrating/metabolism , Mice , Mice, Inbred C57BL , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Tumor Burden/drug effects , Tumor Microenvironment
8.
Molecules ; 25(10), 2020 May 18.
Article in English | MEDLINE | ID: mdl-32443484

ABSTRACT

To date, Mycobacterium tuberculosis (Mtb) remains the world's greatest infectious killer. The rise of multidrug-resistant strains stresses the need to identify new therapeutic targets to fight the epidemic. We previously demonstrated that bacterial protein-O-mannosylation is crucial for Mtb infectiousness, renewing the interest of the bacterial-secreted mannoproteins as potential drug-targetable virulence factors. The difficulty of inventorying the mannoprotein repertoire expressed by Mtb led us to design a stringent multi-step workflow for the reliable identification of glycosylated peptides by large-scale mass spectrometry-based proteomics. Applied to the differential analyses of glycoproteins secreted by the wild-type Mtb strain, and by its derived mutant invalidated for the protein-O-mannosylating enzyme PMTub, this approach led to the identification of not only most already known mannoproteins, but also of yet-unknown mannosylated proteins. In addition, analysis of the glycoproteome expressed by the isogenic recombinant Mtb strain overexpressing the PMTub gene revealed an unexpected mannosylation of proteins, with predicted or demonstrated functions in Mtb growth and interaction with the host cell. Since in parallel, a transient increased expression of the PMTub gene has been observed in the wild-type bacilli when infecting macrophages, our results strongly suggest that the Mtb mannoproteome may undergo adaptive regulation during infection of the host cells. Overall, our results provide deeper insights into the complexity of the repertoire of mannosylated proteins expressed by Mtb, and open the way to novel opportunities to search for still-unexploited potential therapeutic targets.


Subject(s)
Glycoproteins/genetics , Membrane Glycoproteins/genetics , Mycobacterium tuberculosis/genetics , Tuberculosis/genetics , Humans , Macrophages/metabolism , Macrophages/pathology , Mass Spectrometry , Mycobacterium tuberculosis/pathogenicity , Proteomics/methods , Tuberculosis/microbiology , Tuberculosis/pathology , Virulence/genetics , Virulence Factors/genetics
9.
Molecules ; 25(5), 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-32138239

ABSTRACT

Assembly of eukaryotic ribosomal subunits is a very complex and sequential process that starts in the nucleolus and finishes in the cytoplasm with the formation of functional ribosomes. Over the past few years, characterization of the many molecular events underlying eukaryotic ribosome biogenesis has been drastically improved by the "resolution revolution" of cryo-electron microscopy (cryo-EM). However, while very early maturation events have been well characterized for both yeast ribosomal subunits, little is known regarding the final maturation steps of the small (40S) ribosomal subunit. To try to bridge this gap, we have used proteomics together with cryo-EM and single particle analysis to characterize yeast pre-40S particles containing the ribosome biogenesis factor Tsr1. Our analyses lead us to refine the timing of the early pre-40S particle maturation steps. Furthermore, we suggest that after an early and structurally stable stage, the beak and platform domains of pre-40S particles enter a "vibrating" or "wriggling" stage that might be involved in the final maturation of 18S rRNA as well as the fitting of late ribosomal proteins into their mature position.


Subject(s)
Proteomics/methods , Ribosomes/metabolism , Ribosomes/ultrastructure , Computational Biology , Cryoelectron Microscopy/methods , RNA, Ribosomal, 18S/metabolism , Ribosome Subunits, Small/metabolism , Ribosome Subunits, Small/ultrastructure , Tandem Mass Spectrometry
10.
Bioinformatics ; 36(10): 3148-3155, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32096818

ABSTRACT

MOTIVATION: The proteomics field requires the production and publication of reliable mass spectrometry-based identification and quantification results. Although many tools or algorithms exist, very few consider the importance of combining, in a unique software environment, efficient processing algorithms and a data management system to process and curate hundreds of datasets associated with a single proteomics study. RESULTS: Here, we present Proline, a robust software suite for analysis of MS-based proteomics data, which collects, processes and allows visualization and publication of proteomics datasets. We illustrate its ease of use for various steps in the validation and quantification workflow, its data curation capabilities and its computational efficiency. The DDA label-free quantification workflow efficiency was assessed by comparing results obtained with Proline to those obtained with a widely used software tool using a spiked-in sample. This assessment demonstrated Proline's ability to provide high quantification accuracy in a user-friendly interface for datasets of any size. AVAILABILITY AND IMPLEMENTATION: Proline is available for Windows and Linux under the CeCILL open-source license. It can be deployed in client-server mode or in standalone mode at http://proline.profiproteomics.fr/#downloads. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proline , Proteomics , Algorithms , Mass Spectrometry , Software
11.
J Proteome Res ; 19(3): 1338-1345, 2020 03 06.
Article in English | MEDLINE | ID: mdl-31975593

ABSTRACT

Phosphorylation-driven cell signaling governs most biological functions and is widely studied using mass-spectrometry-based phosphoproteomics. Identifying the peptides and localizing the phosphorylation sites within them from the raw data is challenging and can be performed by several algorithms that return scores that are not directly comparable. This increases the heterogeneity among published phosphoproteomics data sets and prevents their direct integration. Here we compare 22 pipelines implemented in the main software tools used for bottom-up phosphoproteomics analysis (MaxQuant, Proteome Discoverer, PeptideShaker). We test six search engines (Andromeda, Comet, Mascot, MS Amanda, SequestHT, and X!Tandem) in combination with several localization scoring algorithms (delta score, D-score, PTM-score, phosphoRS, and Ascore). We show that these follow very different score distributions, which can lead to different false localization rates for the same threshold. We provide a strategy to discriminate correctly from incorrectly localized phosphorylation sites in a consistent manner across the tested pipelines. The results presented here can help users choose the most appropriate pipeline and cutoffs for their phosphoproteomics analysis.


Subject(s)
Peptides , Proteomics , Algorithms , Mass Spectrometry , Phosphorylation , Software
12.
Bioinformatics ; 35(24): 5331-5333, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31287496

ABSTRACT

SUMMARY: With the advent of fully automated sample preparation robots for Hydrogen-Deuterium eXchange coupled to Mass Spectrometry (HDX-MS), this method has become paramount for ligand binding or epitope mapping screening, both in academic research and biopharmaceutical industries. However, bridging the gap between commercial HDX-MS software (for raw data interpretation) and molecular viewers (to map experiment results onto a 3D structure for biological interpretation) remains laborious and requires simple but sometimes limiting coding skills. We solved this bottleneck by developing HDX-Viewer, an open-source web-based application that facilitates and quickens HDX-MS data analysis. This user-friendly application automatically incorporates HDX-MS data from a custom template or commercial HDX-MS software in PDB files, and uploads them to an online 3D molecular viewer, thereby facilitating their visualization and biological interpretation. AVAILABILITY AND IMPLEMENTATION: The HDX-Viewer web application is released under the CeCILL (http://www.cecill.info) and GNU LGPL licenses and can be found at https://masstools.ipbs.fr/hdx-viewer. The source code is available at https://github.com/david-bouyssie/hdx-viewer.


Subject(s)
Deuterium Exchange Measurement , Deuterium , Hydrogen , Imaging, Three-Dimensional , Proteins
13.
EuPA Open Proteom ; 22-23: 4-7, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31890545

ABSTRACT

The 2019 European Bioinformatics Community (EuBIC) Winter School was held from January 15th to January 18th 2019 in Zakopane, Poland. This year's meeting was the third of its kind and gathered international researchers in the field of (computational) proteomics to discuss (mainly) challenges in proteomics quantification and data independent acquisition (DIA). Here, we present an overview of the scientific program of the 2019 EuBIC Winter School. Furthermore, we can already give a small outlook to the upcoming EuBIC 2020 Developer's Meeting.

14.
J Proteomics ; 187: 25-27, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29864591

ABSTRACT

The inaugural European Bioinformatics Community (EuBIC) developer's meeting was held from January 9th to January 12th 2018 in Ghent, Belgium. While the meeting kicked off with an interactive keynote session featuring four internationally renowned experts in the field of computational proteomics, its primary focus was the hands-on hackathon sessions, which featured six community-proposed projects revolving around three major topics. Here, we present an overview of the scientific program of the EuBIC developer's meeting and provide a starting point for follow-up on the covered projects.


Subject(s)
Computational Biology , Congresses as Topic , Proteomics , Algorithms , Community Networks , Computational Biology/methods , Computational Biology/organization & administration , Computational Biology/trends , Europe , Humans , Proteomics/methods , Proteomics/organization & administration , Proteomics/standards , Proteomics/trends , Quality Control , Workflow
15.
F1000Res ; 6, 2017.
Article in English | MEDLINE | ID: mdl-28713550

ABSTRACT

Computational approaches have been major drivers behind the progress of proteomics in recent years. The aim of this white paper is to provide a framework for integrating computational proteomics into ELIXIR in the near future, and thus to broaden the portfolio of omics technologies supported by this European distributed infrastructure. This white paper is the direct result of a strategy meeting on 'The Future of Proteomics in ELIXIR' that took place in March 2017 in Tübingen (Germany), and involved representatives of eleven ELIXIR nodes. These discussions led to a list of priority areas in computational proteomics that would complement existing activities and close gaps in the portfolio of tools and services offered by ELIXIR so far. We provide some suggestions on how these activities could be integrated into ELIXIR's existing platforms, and how it could lead to a new ELIXIR use case in proteomics. We also highlight connections to the related field of metabolomics, where similar activities are ongoing. This white paper could thus serve as a starting point for the integration of computational proteomics into ELIXIR. Over the next few months we will be working closely with all stakeholders involved, and in particular with other representatives of the proteomics community, to further refine this paper.

16.
J Proteomics ; 161: 78-80, 2017 05 24.
Article in English | MEDLINE | ID: mdl-28385664

ABSTRACT

The 2017 EuBIC Winter School was held from January 10th to January 13th 2017 in Semmering, Austria. This meeting gathered international researchers in the fields of bioinformatics and proteomics to discuss current challenges in data analysis and biological interpretation. This article outlines the scientific program and exchanges that took place on this occasion and presents the current challenges of this ever-growing field. BIOLOGICAL SIGNIFICANCE: The EUPA bioinformatics community (EuBIC) organized its first winter school in January 2017. This successful event illustrates the growing need of the bioinformatics community in proteomics to gather and discuss current and future challenges in the field. In addition to the organization of yearly meetings, the young and active EuBIC community aims to develop new collaborative open source projects, spread bioinformatics knowledge in Europe, and actively promote data sharing through public repositories.


Subject(s)
Computational Biology , Congresses as Topic , Proteomics , Austria , Computational Biology/education , Computational Biology/methods , Computational Biology/trends , Congresses as Topic/organization & administration , Europe , Proteomics/education , Proteomics/methods , Proteomics/trends , Societies, Scientific
17.
Data Brief ; 6: 286-94, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26862574

ABSTRACT

This data article describes a controlled, spiked proteomic dataset for which the "ground truth" of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, but also for testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier PXD001819. Starting from some raw files of this dataset, we also provide here some processed data obtained through various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and processed data are merged in a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.

18.
J Proteomics ; 132: 51-62, 2016 Jan 30.
Article in English | MEDLINE | ID: mdl-26585461

ABSTRACT

Proteomic workflows based on nanoLC-MS/MS data-dependent-acquisition analysis have progressed tremendously in recent years. High-resolution and fast sequencing instruments have enabled the use of label-free quantitative methods, based either on spectral counting or on MS signal analysis, which appear to be an attractive way to analyze differential protein expression in complex biological samples. However, the computational processing of the data for label-free quantification still remains a challenge. Here, we used a proteomic standard composed of an equimolar mixture of 48 human proteins (Sigma UPS1) spiked at different concentrations into a background of yeast cell lysate to benchmark several label-free quantitative workflows, involving different software packages developed in recent years. This experimental design allowed us to finely assess their performance in terms of sensitivity and false discovery rate, by measuring the number of true and false positives (UPS1 and yeast background proteins, respectively, found as differential). The spiked standard dataset has been deposited to the ProteomeXchange repository with the identifier PXD001819 and can be used to benchmark other label-free workflows, adjust software parameter settings, improve algorithms for extraction of the quantitative metrics from raw MS data, or evaluate downstream statistical methods. BIOLOGICAL SIGNIFICANCE: Bioinformatic pipelines for label-free quantitative analysis must be objectively evaluated for their ability to detect variant proteins with good sensitivity and low false discovery rate in large-scale proteomic studies. This can be done through the use of complex spiked samples, for which the "ground truth" of variant proteins is known, allowing a statistical evaluation of the performance of the data processing workflow. We provide here such a controlled standard dataset and used it to evaluate the performance of several label-free bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, for detection of variant proteins with different absolute expression levels and fold change values. The dataset presented here can be useful for tuning software tool parameters, and also for testing new algorithms for label-free quantitative analysis, or for evaluation of downstream statistical methods.
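The evaluation described in this abstract, counting spiked proteins and background proteins among those reported as differential, can be sketched as follows. This is a minimal illustration under stated assumptions: the protein identifiers are made up, and the function name `benchmark` and the exact definitions (sensitivity as the fraction of spiked proteins recovered, false discovery proportion as the fraction of reported proteins that are background) are choices for this example rather than the authors' exact scoring code.

```python
from typing import Iterable, Tuple


def benchmark(differential: Iterable[str], spiked: Iterable[str]) -> Tuple[float, float]:
    """Return (sensitivity, false discovery proportion) for one workflow."""
    differential = set(differential)
    spiked = set(spiked)
    true_pos = len(differential & spiked)   # spiked proteins found as differential
    false_pos = len(differential - spiked)  # background proteins found as differential
    sensitivity = true_pos / len(spiked) if spiked else 0.0
    fdp = false_pos / len(differential) if differential else 0.0
    return sensitivity, fdp


# Hypothetical identifiers: four spiked standards, one background false hit.
spiked_set = {"UPS_A", "UPS_B", "UPS_C", "UPS_D"}
reported = {"UPS_A", "UPS_B", "UPS_C", "YEAST_X"}
print(benchmark(reported, spiked_set))  # (0.75, 0.25)
```

Running the same function on the output of each workflow gives directly comparable pairs of numbers, which is the point of a ground-truth dataset.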


Subject(s)
Benchmarking/standards , Chromatography, Liquid/standards , Mass Spectrometry/standards , Proteome/analysis , Proteome/standards , Workflow , Benchmarking/methods , Reproducibility of Results , Sensitivity and Specificity , Software , Software Validation , Staining and Labeling
19.
J Proteome Res ; 14(9): 3621-34, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26132440

ABSTRACT

In the framework of the C-HPP, our Franco-Swiss consortium has adopted chromosomes 2 and 14, coding for a total of 382 missing proteins (proteins for which evidence is lacking at protein level). Over the last 4 years, the French proteomics infrastructure has collected high-quality data sets from 40 human samples, including a series of rarely studied cell lines, tissue types, and sample preparations. Here we describe a step-by-step strategy based on the use of bioinformatics screening and subsequent mass spectrometry (MS)-based validation to identify proteins that had been missing up to now in these data sets. Screening database search results (85,326 dat files) identified 58 of the missing proteins (36 on chromosome 2 and 22 on chromosome 14) by 83 unique peptides following the latest release of neXtProt (2014-09-19). PSMs corresponding to these peptides were thoroughly examined by applying two different MS-based criteria: peptide-level false discovery rate calculation and expert PSM quality assessment. Synthetic peptides were then produced and used to generate reference MS/MS spectra. A spectral similarity score was then calculated for each pair of reference-endogenous spectra and used as a third criterion for missing protein validation. Finally, LC-SRM assays were developed to target proteotypic peptides from four of the missing proteins detected in tissue/cell samples, which were still available and for which sample preparation could be reproduced. These LC-SRM assays unambiguously detected the endogenous unique peptide for three of the proteins. For two of these, identification was confirmed by additional proteotypic peptides. We concluded that of the initial set of 58 proteins detected by the bioinformatics screen, the consecutive MS-based validation criteria led us to propose the identification of 13 of these proteins (8 on chromosome 2 and 5 on chromosome 14) that passed at least two of the three MS-based criteria. Thus, a rigorous step-by-step approach combining bioinformatics screening and MS-based validation assays is particularly suitable for obtaining protein-level evidence for proteins previously considered as missing. All MS/MS data have been deposited in ProteomeXchange under identifier PXD002131.
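The abstract does not say which spectral similarity score was used. As one common possibility, the sketch below computes a cosine similarity between a reference (synthetic peptide) spectrum and an endogenous spectrum after binning peaks by m/z. The cosine measure, the 0.5 m/z bin width, and the peak lists are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _binned(peaks: List[Peak], bin_width: float) -> Dict[int, float]:
    """Sum intensities of peaks that fall in the same m/z bin."""
    vec: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec


def cosine_similarity(ref: List[Peak], obs: List[Peak], bin_width: float = 0.5) -> float:
    """Cosine of the angle between two binned spectra (0.0 if either is empty)."""
    a, b = _binned(ref, bin_width), _binned(obs, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fragment peaks (illustration only).
reference = [(175.1, 40.0), (262.2, 100.0), (375.3, 65.0)]
endogenous = [(175.1, 35.0), (262.2, 90.0), (375.3, 70.0), (501.4, 5.0)]
print(round(cosine_similarity(reference, endogenous), 3))
```

A score near 1 indicates that the endogenous spectrum reproduces the fragment pattern of the synthetic reference; where to set the acceptance cutoff is a separate decision that the sketch does not address.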


Subject(s)
Chromosomes, Human, Pair 14 , Chromosomes, Human, Pair 2 , Proteins/genetics , Tandem Mass Spectrometry/methods , Amino Acid Sequence , Chromatography, Liquid , Humans , Molecular Sequence Data , Proteins/chemistry
20.
Mol Cell Proteomics ; 14(3): 771-81, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25505153

ABSTRACT

The analysis and management of MS data, especially those generated by data independent MS acquisition, exemplified by SWATH-MS, pose significant challenges for proteomics bioinformatics. The large size and vast amount of information inherent to these data sets need to be properly structured to enable an efficient and straightforward extraction of the signals used to identify specific target peptides. Standard XML based formats are not well suited to large MS data files, for example, those generated by SWATH-MS, and compromise high-throughput data processing and storage. We developed mzDB, an efficient file format for large MS data sets. It relies on the SQLite software library and consists of a standardized and portable server-less single-file database. An optimized 3D indexing approach is adopted, where the LC-MS coordinates (retention time and m/z), along with the precursor m/z for SWATH-MS data, are used to query the database for data extraction. In comparison with XML formats, mzDB saves ∼25% of storage space and improves access times by a factor of 2 up to 2000, depending on the particular data access. Similarly, mzDB also shows slightly to significantly lower access times in comparison with other formats like mz5. Both C++ and Java implementations, converting raw or XML formats to mzDB and providing access methods, will be released under a permissive license. mzDB can be easily accessed by the SQLite C library and its drivers for all major languages, and browsed with existing dedicated GUIs. The mzDB described here can boost existing mass spectrometry data analysis pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.
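The idea of querying a single-file SQLite database by LC-MS coordinates can be sketched with Python's built-in `sqlite3` module. The table `peak_box`, its columns, and the stored values are hypothetical and are not the actual mzDB schema; the example also indexes only two dimensions (m/z and retention time) and leaves out the precursor m/z dimension mentioned for SWATH-MS data.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database of rectangular m/z x retention-time boxes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE peak_box ("
        " id INTEGER PRIMARY KEY,"
        " min_mz REAL, max_mz REAL,"
        " min_rt REAL, max_rt REAL,"
        " n_peaks INTEGER)"
    )
    conn.execute(
        "CREATE INDEX idx_box ON peak_box (min_mz, max_mz, min_rt, max_rt)"
    )
    rows = [
        (1, 400.0, 405.0, 0.0, 15.0, 120),
        (2, 400.0, 405.0, 15.0, 30.0, 98),
        (3, 405.0, 410.0, 0.0, 15.0, 143),
        (4, 405.0, 410.0, 15.0, 30.0, 87),
    ]
    conn.executemany("INSERT INTO peak_box VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def boxes_overlapping(conn: sqlite3.Connection, mz_lo: float, mz_hi: float,
                      rt_lo: float, rt_hi: float) -> List[int]:
    """Return ids of boxes that intersect the requested m/z and RT window."""
    cur = conn.execute(
        "SELECT id FROM peak_box"
        " WHERE max_mz >= ? AND min_mz <= ?"
        " AND max_rt >= ? AND min_rt <= ?"
        " ORDER BY id",
        (mz_lo, mz_hi, rt_lo, rt_hi),
    )
    return [row[0] for row in cur]


conn = build_demo_db()
# Signal extraction window around m/z 404.55, retention time 10 to 20.
print(boxes_overlapping(conn, 404.5, 404.6, 10.0, 20.0))  # [1, 2]
```

Only the boxes that intersect the window need to be read and decoded, which is the general reason a coordinate-indexed layout can be faster than scanning a sequential XML file.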


Subject(s)
Database Management Systems , Mass Spectrometry/methods , Datasets as Topic , Epithelial Cells/metabolism , Humans , Proteome/analysis