ABSTRACT
MOTIVATION: Inferring taxonomy in mass spectrometry-based shotgun proteomics is a complex task. In multi-species or viral samples of unknown taxonomic origin, the presence of proteins and corresponding taxa must be inferred from a list of identified peptides, which is often complicated by protein homology: many proteins share peptides not only within a taxon but also between taxa. However, correct taxonomic inference is crucial when identifying different viral strains with high sequence homology (consider, e.g., the different epidemiological characteristics of the various strains of severe acute respiratory syndrome-related coronavirus-2). Additionally, many viruses mutate frequently, further complicating the correct identification of viral proteomic samples. RESULTS: We present PepGM, a probabilistic graphical model for the taxonomic assignment of virus proteomic samples with strain-level resolution and associated confidence scores. PepGM combines the results of a standard proteomic database search algorithm with belief propagation to calculate the marginal distributions, and thus confidence scores, for potential taxonomic assignments. We demonstrate the performance of PepGM on several publicly available virus proteomic datasets, showing its strain-level resolution. In two out of eight cases, the taxonomic assignments were correct only at the species level, which PepGM clearly indicates by lower confidence scores. AVAILABILITY AND IMPLEMENTATION: PepGM is written in Python and embedded into a Snakemake workflow. It is available at https://github.com/BAMeScience/PepGM.
Subject(s)
COVID-19, Viruses, Humans, Proteome, Proteomics/methods, Algorithms, Viruses/genetics, Peptides
ABSTRACT
The strong influence of microbiomes on areas such as ecology and human health has become widely recognized in recent years. Accordingly, various techniques for investigating the composition and function of microbial community samples have been developed. Metaproteomics, the comprehensive analysis of the proteins from microbial communities, allows for the investigation of not only the taxonomy but also the functional and quantitative composition of microbiome samples. Due to the complexity of the investigated communities, methods developed for single-organism proteomics cannot be readily applied to metaproteomic samples. Instead, methods specifically tailored to metaproteomics are required. In this work, a detailed overview of current bioinformatic solutions and protocols in metaproteomics is given. After an introduction to the proteomic database search, the metaproteomic post-processing steps are explained in detail. The overview focuses on ten specific bioinformatic software solutions, covering various steps including database-driven identification and quantification as well as taxonomic and functional assignment.
Subject(s)
Computational Biology, Microbiota, Proteomics, Software, Workflow, Proteomics/methods, Computational Biology/methods, Microbiota/genetics, Humans, Protein Databases, Metagenomics/methods
ABSTRACT
The 2023 European Bioinformatics Community for Mass Spectrometry (EuBIC-MS) Developers Meeting was held from January 15th to January 20th, 2023, at Congressi Stefano Franscini at Monte Verità in Ticino, Switzerland. The participants were scientists and developers working in computational mass spectrometry (MS), metabolomics, and proteomics. The 5-day program was split between introductory keynote lectures and parallel hackathon sessions focusing on "Artificial Intelligence in proteomics" to stimulate future directions in the MS-driven omics areas. During the latter, the participants developed bioinformatics tools and resources addressing outstanding needs in the community. The hackathons allowed less experienced participants to learn from more advanced computational MS experts and actively contribute to highly relevant research projects. We successfully produced several new tools for the proteomics community that improve data analysis and facilitate future research.