ABSTRACT
Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.
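To make the idea of a "virtual path" concrete, the sketch below splits a USI into its colon-separated parts. It is a simplified illustration, not a reference implementation: it assumes the layout mzspec:collection:run:index type:index, optionally followed by an interpretation, and it assumes the run name contains no colons. The class name, function name, and example identifier are illustrative choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class USI:
    collection: str                 # dataset identifier, e.g. a ProteomeXchange accession
    ms_run: str                     # name of the MS run (file) within the dataset
    index_type: str                 # how the spectrum is addressed, e.g. "scan"
    index: str                      # the spectrum's index under that scheme
    interpretation: Optional[str]   # optional peptidoform and charge


def parse_usi(usi: str) -> USI:
    """Split a USI into its parts (simplified: run names without colons)."""
    # maxsplit=5 keeps any colons inside the interpretation together.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(parts[1], parts[2], parts[3], parts[4], interpretation)


example = parse_usi(
    "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
)
print(example.collection, example.index_type, example.index, example.interpretation)
```

Because every part is explicit, a repository only needs the collection and run name to locate the file and the index to locate the spectrum within it.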
Subjects
Protein Databases, Mass Spectrometry/methods, Proteomics/methods, Computer-Assisted Signal Processing, Software, Algorithms
ABSTRACT
Unipept Desktop 2.0 is the most recent iteration of the Unipept Desktop tool that adds support for the analysis of metaproteogenomics datasets. Unipept Desktop now supports the automatic construction of targeted protein reference databases that only contain proteins (originating from the UniProtKB resource) associated with a predetermined list of taxa. This improves both the taxonomic and functional resolution of a metaproteomic analysis and yields several technical advantages. By limiting the proteins present in a reference database, it is also possible to perform (meta)proteogenomics analyses. Since the protein reference database resides on the user's local machine, they have complete control over the database used during an analysis. Data no longer need to be transmitted over the Internet, decreasing the time required for an analysis and better safeguarding privacy-sensitive data. As a proof of concept, we present a case study in which a human gut metaproteome dataset is analyzed with Unipept Desktop 2.0 using different targeted databases based on matched 16S rRNA gene sequencing data.
Subjects
Metagenomics, Proteins, Humans, Protein Databases, 16S Ribosomal RNA
ABSTRACT
The Human Proteome Organization (HUPO) Proteomics Standards Initiative (PSI) has been successfully developing guidelines, data formats, and controlled vocabularies (CVs) for the proteomics community and other fields supported by mass spectrometry since its inception 20 years ago. Here we describe the general operation of the PSI, including its leadership, working groups, yearly workshops, and the document process by which proposals are thoroughly and publicly reviewed in order to be ratified as PSI standards. We briefly describe the current state of the many existing PSI standards, some of which remain the same as when originally developed, some of which have undergone subsequent revisions, and some of which have become obsolete. Then the set of proposals currently being developed is described, with an open call to the community for participation in the forging of the next generation of standards. Finally, we describe some synergies and collaborations with other organizations and look to the future in how the PSI will continue to promote the open sharing of data and thus accelerate the progress of the field of proteomics.
Subjects
Proteome, Proteomics, Humans, Reference Standards, Controlled Vocabulary, Mass Spectrometry, Protein Databases
ABSTRACT
BACKGROUND: It is increasingly recognized that conventional food production systems are not able to meet the globally increasing protein needs, resulting in overexploitation and depletion of resources, and environmental degradation. In this context, microbial biomass has emerged as a promising sustainable protein alternative. Nevertheless, often no consideration is given to the fact that the cultivation conditions affect the composition of microbial cells, and hence their quality and nutritional value. Apart from the properties and nutritional quality of the produced microbial food (ingredient), this can also impact its sustainability. To qualitatively assess these aspects, here, we investigated the link between substrate availability, growth rate, cell composition and size of Cupriavidus necator and Komagataella phaffii. RESULTS: Biomass with decreased nucleic acid and increased protein content was produced at low growth rates. Conversely, high rates resulted in larger cells, which could enable more efficient biomass harvesting. The proteome allocation varied across the different growth rates, with more ribosomal proteins at higher rates, which could potentially affect the techno-functional properties of the biomass. Considering the distinct amino acid profiles established for the different cellular components, variations in their abundance impact the product quality, leading to higher cysteine and phenylalanine content at low growth rates. Therefore, we suggest that costly external amino acid supplementations that are often required to meet the nutritional needs could be avoided by carefully applying conditions that enable targeted growth rates. CONCLUSION: In summary, we demonstrate tradeoffs between nutritional quality and production rate, and we discuss the microbial biomass properties that vary according to the growth conditions.
Subjects
Amino Acids, Proteome, Biomass, Cysteine, Cell Size
ABSTRACT
BACKGROUND: Human cells and bacteria secrete extracellular vesicles (EV) which play a role in intercellular communication. EV from the host intestinal epithelium are involved in the regulation of bacterial gene expression and growth. Bacterial EV (bactEV) produced in the intestine can pass to various tissues where they deliver biomolecules to many kinds of cells, including neurons. Emerging data indicate that gut microbiota is altered in patients with psychotic disorders. We hypothesized that the amount and content of blood-borne EV from intestinal cells and bactEV in psychotic patients would differ from healthy controls. METHODS: We analyzed blood-borne EV for human intestinal proteins by proteomics, and for bactEV by metaproteomic analysis and by measuring the level of lipopolysaccharide (LPS). EV were obtained from patients with psychotic disorders (n = 25), tested twice, in the acute phase of psychosis and after improvement, and from age- and sex-matched healthy controls (n = 25). RESULTS: Patients with psychotic disorders had lower LPS levels in their EV compared to healthy controls (p = .027). Metaproteome analyses confirmed the LPS finding and identified Firmicutes and Bacteroidetes as the dominating phyla. The total amount of human intestinal proteins in EV isolated from blood was lower in patients compared to controls (p = .02). CONCLUSIONS: Our results suggest that bactEV and host intestinal EV are decreased in patients with psychosis and that this topic is worthy of further investigation given potential pathophysiological implications. Possible mechanisms involve dysregulation of the gut microbiota by host EV and altered translocation of bactEV to the systemic circulation, where bactEV can interact with both the brain and the immune system.
Subjects
Extracellular Vesicles, Psychotic Disorders, Humans, Lipopolysaccharides/metabolism, Intestines/microbiology, Bacteria/metabolism, Extracellular Vesicles/metabolism
ABSTRACT
In metaproteomics, the study of the collective proteome of microbial communities, the protein inference problem is more challenging than in single-species proteomics. Indeed, a peptide sequence can be present not only in multiple proteins or protein isoforms of the same species, but also in homologous proteins from closely related species. To assign the taxonomy and functions of the microbial species, specialized tools have been developed, such as Prophane. This tool, however, is not directly compatible with post-processing tools such as Percolator. In this manuscript we therefore present Pout2Prot, which takes Percolator Output (.pout) files from multiple experiments and creates protein group and protein subgroup output files (.tsv) that can be used directly with Prophane. We investigated different grouping strategies and compared existing protein grouping tools to develop an advanced protein grouping algorithm that offers a variety of different approaches, allows grouping for multiple files, and uses a weighted spectral count for protein (sub)groups to reflect abundance. Pout2Prot is available as a web application at https://pout2prot.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the Apache License 2.0 and is available at https://github.com/compomics/pout2prot.
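As a rough illustration of what protein grouping involves, the sketch below builds groups and subgroups from a peptide-to-protein mapping. The two rules used here are assumptions chosen for simplicity and are not claimed to be the strategies Pout2Prot implements: a group joins all proteins linked through at least one shared peptide, and a subgroup joins proteins supported by exactly the same peptide set. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Return (groups, subgroups) for a peptide -> proteins mapping.

    groups:    proteins connected through at least one shared peptide
    subgroups: proteins supported by exactly the same set of peptides
    """
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Union-find over proteins: merge all proteins that share a peptide.
    parent = {protein: protein for protein in protein_to_peptides}

    def find(protein):
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]  # path halving
            protein = parent[protein]
        return protein

    for proteins in peptide_to_proteins.values():
        proteins = list(proteins)
        for other in proteins[1:]:
            parent[find(other)] = find(proteins[0])

    groups = defaultdict(list)
    for protein in protein_to_peptides:
        groups[find(protein)].append(protein)

    subgroups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        subgroups[frozenset(peptides)].append(protein)

    def as_sorted(buckets):
        return sorted(sorted(members) for members in buckets.values())

    return as_sorted(groups), as_sorted(subgroups)


# Toy identifications: peptide sequence -> proteins it maps to.
toy = {
    "PEPA": ["P1", "P2", "P5"],
    "PEPB": ["P1", "P2", "P5"],
    "PEPC": ["P2", "P3"],
    "PEPD": ["P4"],
}
groups, subgroups = group_proteins(toy)
print(groups)
print(subgroups)
```

In this toy case P1 and P5 are indistinguishable and land in one subgroup, while P2 and P3 join the same group only because they share a peptide with other members.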
Subjects
Proteomics, Software, Algorithms, Protein Databases, Proteome
ABSTRACT
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we have used 40 data sets from 2 different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using 2 spectral library search tools, COSS and MSPepSearch with and without Percolator postprocessing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compomics/COSS.
Subjects
Proteomics, Search Engine, Algorithms, Protein Databases, Peptide Library, Proteomics/methods, Search Engine/methods, Software, Tandem Mass Spectrometry/methods
ABSTRACT
It is important for the proteomics community to have a standardized manner to represent all possible variations of a protein or peptide primary sequence, including natural, chemically induced, and artifactual modifications. The Human Proteome Organization Proteomics Standards Initiative in collaboration with several members of the Consortium for Top-Down Proteomics (CTDP) has developed a standard notation called ProForma 2.0, which is a substantial extension of the original ProForma notation developed by the CTDP. ProForma 2.0 aims to unify the representation of proteoforms and peptidoforms. ProForma 2.0 supports use cases needed for bottom-up and middle-/top-down proteomics approaches and allows the encoding of highly modified proteins and peptides using a human- and machine-readable string. ProForma 2.0 can be used to represent protein modifications in a specified or ambiguous location, designated by mass shifts, chemical formulas, or controlled vocabulary terms, including cross-links (natural and chemical) and atomic isotopes. Notational conventions are based on public controlled vocabularies and ontologies. The most up-to-date full specification document and information about software implementations are available at http://psidev.info/proforma.
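The following sketch shows how a human- and machine-readable string of this kind can be read by a program. It handles only one small part of the notation, namely square-bracket tags attached to individual residues (a name, an accession, or a mass shift); everything else in ProForma 2.0, such as terminal modifications, ambiguous locations, and cross-links, is deliberately left out. The function name and example strings are illustrative.

```python
import re

# One residue letter followed by zero or more [tag] modifications.
_RESIDUE = re.compile(r"([A-Z])((?:\[[^\]]+\])*)")
_TAG = re.compile(r"\[([^\]]+)\]")


def parse_simple_proforma(peptidoform):
    """Return a list of (residue, [modification tags]) tuples."""
    residues = []
    position = 0
    for match in _RESIDUE.finditer(peptidoform):
        if match.start() != position:
            raise ValueError(f"unsupported syntax at offset {position}")
        position = match.end()
        residues.append((match.group(1), _TAG.findall(match.group(2))))
    if position != len(peptidoform):
        raise ValueError(f"unsupported syntax at offset {position}")
    return residues


parsed = parse_simple_proforma("EM[Oxidation]EVEES[Phospho]PEK")
sequence = "".join(residue for residue, _ in parsed)
modified = {index: tags for index, (_, tags) in enumerate(parsed) if tags}
print(sequence)   # bare amino acid sequence
print(modified)   # zero-based residue index -> modification tags
```

The same function accepts mass-shift tags such as `EM[+15.9949]EVEES[+79.9663]PEK`, since it treats the tag content as an opaque string.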
Subjects
Proteome, Proteomics, Humans, Post-Translational Protein Processing, Proteome/genetics, Reference Standards, Software
ABSTRACT
Metaproteomics has become an important research tool to study microbial systems, which has resulted in increased metaproteomics data generation. However, efficient tools for processing the acquired data have lagged behind. One widely used tool for metaproteomics data interpretation is Unipept, a web-based tool that provides, among others, interactive and insightful visualizations. Due to its web-based implementation, however, the Unipept web application is limited in the amount of data that can be analyzed. In this manuscript we therefore present Unipept Desktop, a desktop application version of Unipept that is designed to drastically increase the throughput and capacity of metaproteomics data analysis. Moreover, it provides a novel comparative analysis pipeline and improves the organization of experimental data into projects, thus addressing the growing need for more efficient and versatile analysis tools for metaproteomics data.
Subjects
Data Analysis, Software
ABSTRACT
The study of microbiomes has gained in importance over the past few years and has led to the emergence of the fields of metagenomics, metatranscriptomics, and metaproteomics. While initially focused on the study of biodiversity within these communities, the emphasis has increasingly shifted to the study of (changes in) the complete set of functions available in these communities. A key tool to study this functional complement of a microbiome is Gene Ontology (GO) term analysis. However, comparing large sets of GO terms is not an easy task due to the deeply branched nature of GO, which limits the utility of exact term matching. To solve this problem, we here present MegaGO, a user-friendly tool that relies on semantic similarity between GO terms to compute the functional similarity between multiple data sets. MegaGO is high performing: Each set can contain thousands of GO terms, and results are calculated in a matter of seconds. MegaGO is available as a web application at https://megago.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the MIT license and is available at https://github.com/MEGA-GO/.
Subjects
Microbiota, Software, Computational Biology, Gene Ontology, Metagenomics, Semantics
ABSTRACT
SUMMARY: Unipept is an ecosystem of tools developed for fast metaproteomics data analysis consisting of a web application, a set of web services (application programming interface, API) and a command-line interface (CLI). After the successful introduction of version 4 of the Unipept web application, we here introduce version 2.0 of the API and CLI. Next to the existing taxonomic analysis, version 2.0 of the API and CLI provides access to Unipept's powerful functional analysis for metaproteomics samples. The functional analysis pipeline supports retrieval of Enzyme Commission numbers, Gene Ontology terms and InterPro entries for the individual peptides in a metaproteomics sample. This paves the way for other applications and developers to integrate these new information sources into their data processing pipelines, which greatly increases insight into the functions performed by the organisms in a specific environment. Both the API and CLI have also been expanded with the ability to render interactive visualizations from a list of taxon ids. These visualizations are automatically made available on a dedicated website and can easily be shared by users. AVAILABILITY AND IMPLEMENTATION: The API is available at http://api.unipept.ugent.be. Information regarding the CLI can be found at https://unipept.ugent.be/clidocs. Both interfaces are freely available and open-source under the MIT license. CONTACT: pieter.verschaffelt@ugent.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
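For developers who want to integrate such a web service, the sketch below only assembles a request URL and does not contact the server. The base URL is the one given in the abstract; the `/api/v2/` path, the `pept2funct` endpoint name, the `input[]` parameter, and the `equate_il` option are assumptions about the service layout and should be checked against the official API documentation before use. The helper name is hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "http://api.unipept.ugent.be"  # base URL as stated in the abstract


def build_request_url(endpoint, peptides, **options):
    """Assemble a GET URL for one endpoint and a list of peptides.

    The path layout and parameter names are assumptions, not taken
    from the abstract.
    """
    params = [("input[]", peptide) for peptide in peptides]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase booleans in the query
        params.append((name, value))
    return f"{API_BASE}/api/v2/{endpoint}.json?{urlencode(params)}"


url = build_request_url("pept2funct", ["AIPQLEVARPADAYETAEAYR"], equate_il=True)
print(url)
```

Keeping URL construction separate from the network call makes the request easy to inspect and to test offline; the resulting URL can then be passed to any HTTP client.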
Subjects
Ecosystem, Software, Data Analysis, Peptides
ABSTRACT
The European Bioinformatics Community for Mass Spectrometry (EuBIC-MS; eubic-ms.org) was founded in 2014 to unite European computational mass spectrometry researchers and proteomics bioinformaticians working in academia and industry. EuBIC-MS maintains educational resources (proteomics-academy.org) and organises workshops at national and international conferences on proteomics and mass spectrometry. Furthermore, EuBIC-MS is actively involved in several community initiatives such as the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI). Apart from these collaborations, EuBIC-MS has organised two Winter Schools and two Developers' Meetings that have contributed to the strengthening of the European mass spectrometry network and fostered international collaboration in this field, even beyond Europe. Moreover, EuBIC-MS is currently actively developing a community-driven standard dedicated to mass spectrometry data annotation (SDRF-Proteomics) that will facilitate data reuse and collaboration. This manuscript highlights what EuBIC-MS is, what it does, and what it already has achieved. A warm invitation is extended to new researchers at all career stages to join the EuBIC-MS community on its Slack channel (eubic.slack.com).
ABSTRACT
A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.
Subjects
Machine Learning, Proteomics, Workflow, Liquid Chromatography, Mass Spectrometry
ABSTRACT
Although metaproteomics, the study of the collective proteome of microbial communities, has become increasingly powerful and popular over the past few years, the field has lagged behind on the availability of user-friendly, end-to-end pipelines for data analysis. We therefore describe the connection from two commonly used metaproteomics data processing tools in the field, MetaProteomeAnalyzer and PeptideShaker, to Unipept for downstream analysis. Through these connections, direct end-to-end pipelines are built from database searching to taxonomic and functional annotation.
Subjects
Data Analysis, Microbiota, Proteome, Proteomics, Software
ABSTRACT
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs) in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. In order to facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require little or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, are available at http://www.psidev.info/peff .
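The near backward compatibility can be illustrated with a small header parser. The sketch assumes that a PEFF entry line starts with `>prefix:accession` and is followed by metadata written as backslash-prefixed `\Key=value` items; the example entry, its key names, and its values are made up for illustration and have not been validated against the specification.

```python
import re

_KEY = re.compile(r"\\(\w+)=")       # a backslash-prefixed key such as \Length=
_ITEM = re.compile(r"\(([^)]*)\)")   # one parenthesised item such as (3|A)


def parse_peff_header(line):
    """Split a PEFF-style entry header into prefix, accession and key/value fields."""
    if not line.startswith(">"):
        raise ValueError("entry headers start with '>'")
    identifier, _, rest = line[1:].partition(" ")
    prefix, _, accession = identifier.partition(":")
    # re.split with one capture group yields [text, key, value, key, value, ...]
    pieces = _KEY.split(rest)
    fields = {key: value.strip() for key, value in zip(pieces[1::2], pieces[2::2])}
    return prefix, accession, fields


# Hypothetical entry: identifiers, keys and values are illustrative only.
header = r">xx:P00001 \PName=Example protein \Length=10 \VariantSimple=(3|A)(7|K)"
prefix, accession, fields = parse_peff_header(header)
variants = [item.split("|") for item in _ITEM.findall(fields["VariantSimple"])]
print(prefix, accession, fields["Length"], variants)

# A plain FASTA reader would stop at the first space and still get a usable id.
fasta_id = header[1:].split()[0]
print(fasta_id)
```

The last two lines show why existing FASTA parsers need little or no change: the extra metadata sits entirely in the part of the header that such parsers already treat as free text.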
Subjects
Proteomics/standards, Humans, Information Storage and Retrieval, Mass Spectrometry, Software
ABSTRACT
INTRODUCTION: The study of microbial communities based on the combined analysis of genomic and proteomic data, called metaproteogenomics, has gained increased research attention in recent years. This relatively young field aims to elucidate the functional and taxonomic interplay of proteins in microbiomes and its implications for human health and the environment. AREAS COVERED: This article reviews bioinformatics methods and software tools dedicated to the analysis of data from metaproteomics and metaproteogenomics experiments. In particular, it focuses on the creation of tailored protein sequence databases, on the optimal use of database search algorithms including methods of error rate estimation, and finally on taxonomic and functional annotation of peptide and protein identifications. EXPERT OPINION: Recently, various promising strategies and software tools have been proposed for handling typical data analysis issues in metaproteomics. However, severe challenges remain that are highlighted and discussed in this article; these include: (i) robust false-positive assessment of peptide and protein identifications, (ii) complex protein inference against a background of highly redundant data, (iii) taxonomic and functional post-processing of identification data, and finally, (iv) the assessment and provision of metrics and tools for quantitative analysis.
Subjects
Data Analysis, Metagenomics, Proteomics, Protein Databases, Humans, Proteome/metabolism, Search Engine
ABSTRACT
Metaproteomics has become a crucial omics technology for studying microbiomes. In this area, the Unipept ecosystem, accessible at https://unipept.ugent.be , has emerged as a valuable resource for analyzing metaproteomic data. It offers in-depth insights into both taxonomic distributions and functional characteristics of complex ecosystems. This tutorial explains essential concepts like Lowest Common Ancestor (LCA) determination and the handling of peptides with missed cleavages. It also provides a detailed, step-by-step guide on using the Unipept Web application and Unipept Desktop for thorough metaproteomics analyses. By integrating theoretical principles with practical methodologies, this tutorial empowers researchers with the essential knowledge and tools needed to fully utilize metaproteomics in their microbiome studies.
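The idea behind Lowest Common Ancestor determination can be sketched in a few lines: given the lineages of all organisms in which a peptide occurs, the LCA is the deepest taxon that all of those lineages share. The version below assumes complete, root-first lineages and ignores refinements that a production tool may apply, such as missing or invalid ranks; the function name and the toy lineages are illustrative.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all root-first lineages, or None."""
    lca = None
    # zip stops at the shortest lineage, so we never read past a lineage's end.
    for taxa_at_depth in zip(*lineages):
        if len(set(taxa_at_depth)) != 1:
            break  # lineages diverge at this depth
        lca = taxa_at_depth[0]
    return lca


# Toy lineages for three organisms that all contain the same peptide.
lineage_a = ["root", "Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"]
lineage_b = ["root", "Bacteria", "Firmicutes", "Clostridia"]
lineage_c = ["root", "Bacteria", "Bacteroidetes"]

print(lowest_common_ancestor([lineage_a, lineage_b]))             # Firmicutes
print(lowest_common_ancestor([lineage_a, lineage_b, lineage_c]))  # Bacteria
```

The second call shows why widely shared peptides carry little taxonomic information: adding one distant organism pushes the LCA up towards the root.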
Subjects
Biodiversity, Microbiota, Proteomics, Software, Proteomics/methods, Microbiota/genetics, Humans, Computational Biology/methods, Metagenomics/methods
ABSTRACT
Deadwood provides habitat for fungi and serves diverse ecological functions in forests. We already have profound knowledge of fungal assembly processes, physiological and enzymatic activities, and resulting physico-chemical changes during deadwood decay. However, in situ detection and identification methods, fungal origins, and a mechanistic understanding of the main lignocellulolytic enzymes are lacking. This study used metaproteomics to detect the main extracellular lignocellulolytic enzymes in 12 tree species in a temperate forest that have decomposed for 8.5 years. Mainly white-rot (and a few brown-rot) Basidiomycota were identified as the main wood decomposers, with Armillaria as the dominant genus; additionally, several soft-rot xylariaceous Ascomycota were identified. The key enzymes involved in lignocellulolysis included manganese peroxidase, peroxide-producing alcohol oxidases, laccase, diverse glycoside hydrolases (cellulase, glucosidase, xylanase), esterases, and lytic polysaccharide monooxygenases. The fungal community and enzyme composition differed among the 12 tree species. Ascomycota species were more prevalent in angiosperm logs than in gymnosperm logs. Regarding lignocellulolysis as a function, the extracellular enzyme toolbox acted simultaneously and was interrelated (e.g. peroxidases and peroxide-producing enzymes were strongly correlated), highly functionally redundant, and present in all logs. In summary, our in situ study provides comprehensive and detailed insight into the enzymatic machinery of wood-inhabiting fungi in temperate tree species. These findings will allow us to relate changes in environmental factors to lignocellulolysis as an ecosystem function in the future.
Subjects
Ascomycota, Basidiomycota, Wood/microbiology, Ecosystem, Trees, Basidiomycota/physiology, Peroxides/metabolism, Fungi
ABSTRACT
Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intralaboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Quality Control (QC) working group of the Human Proteome Organization's Proteomics Standards Initiative has established the standard mzQC file format for reporting and exchanging information relating to data quality. mzQC is based on the JavaScript Object Notation (JSON) format and provides a lightweight yet versatile file format that can be easily implemented in software. Here, we present open-source software libraries to process mzQC data in three programming languages: Python, using pymzqc; R, using rmzqc; and Java, using jmzqc. The libraries follow a common data model and provide shared functionalities, including the (de)serialization and validation of mzQC files. We demonstrate use of the software libraries in a workflow for extracting, analyzing, and visualizing QC metrics from different sources. Additionally, we show how these libraries can be integrated with each other, with existing software tools, and in automated workflows for the QC of mass spectrometry data. All software libraries are available as open source under the MS-Quality-Hub organization on GitHub (https://github.com/MS-Quality-Hub).
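Because mzQC is JSON-based, its metrics can be read with nothing more than a standard JSON parser. The sketch below uses Python's built-in `json` module rather than pymzqc, and the embedded document is a hand-written fragment: the field names (`mzQC`, `runQualities`, `metadata`, `label`, `qualityMetrics`, `accession`, `name`, `value`) reflect an assumed layout, and the run label, accessions, and values are placeholders, so none of it has been validated against the mzQC schema.

```python
import json

# Hand-written fragment; accessions and values are placeholders.
SAMPLE = """
{
  "mzQC": {
    "version": "1.0.0",
    "runQualities": [
      {
        "metadata": {"label": "run_01"},
        "qualityMetrics": [
          {"accession": "QC:0000001", "name": "example spectrum count", "value": 1200},
          {"accession": "QC:0000002", "name": "example retention time range", "value": [0.5, 90.0]}
        ]
      }
    ]
  }
}
"""


def extract_metrics(mzqc_text):
    """Flatten run-level metrics into (run label, accession, name, value) rows."""
    document = json.loads(mzqc_text)["mzQC"]
    rows = []
    for run in document.get("runQualities", []):
        label = run.get("metadata", {}).get("label", "")
        for metric in run.get("qualityMetrics", []):
            rows.append((label, metric["accession"], metric["name"], metric.get("value")))
    return rows


rows = extract_metrics(SAMPLE)
for row in rows:
    print(row)
```

A dedicated library adds what this sketch lacks, in particular validation of the file and a typed data model shared across languages.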