Results 1 - 20 of 26
1.
Methods Mol Biol ; 2817: 221-239, 2024.
Article in English | MEDLINE | ID: mdl-38907156

ABSTRACT

Single-cell proteomics can offer valuable insights into dynamic cellular interactions, but identifying proteins at this level is challenging due to their low abundance. In this chapter, we present a state-of-the-art bioinformatics pipeline for single-cell proteomics that combines the search engine Sage (via SearchGUI), identification rescoring with MS2Rescore, quantification through FlashLFQ, and differential expression analysis using MSqRob2. MS2Rescore leverages LC-MS/MS behavior predictors, such as MS2PIP and DeepLC, to recalibrate scores with Percolator or mokapot. Combining these tools into a unified pipeline, this approach improves the detection of low-abundance peptides, resulting in increased identifications while maintaining stringent FDR thresholds.


Subjects
Computational Biology , Proteomics , Single-Cell Analysis , Software , Tandem Mass Spectrometry , Single-Cell Analysis/methods , Computational Biology/methods , Proteomics/methods , Tandem Mass Spectrometry/methods , Humans , Chromatography, Liquid/methods , Search Engine , Proteome/analysis
2.
Talanta ; 274: 125970, 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38621320

ABSTRACT

The use of collision cross section (CCS) values derived from ion mobility studies is proving to be an increasingly important tool in the characterization and identification of molecules detected in complex mixtures. Here, a novel machine learning (ML) based method for predicting CCS integrating both molecular modeling (MM) and ML methodologies has been devised and shown to be able to accurately predict CCS values for singly charged small molecular weight molecules from a broad range of chemical classes. The model performed favorably compared to existing models, improving compound identifications for isobaric analytes in terms of ranking and assigning identification probability values to the annotation. Furthermore, charge localization was seen to be correlated with CCS prediction accuracy and with gas-phase proton affinity demonstrating the potential to provide a proxy for prediction error based on chemical structural properties. The presented approach and findings represent a further step towards accurate prediction and application of computationally generated CCS values.

3.
Proteomics ; 24(8): e2300144, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38629965

ABSTRACT

In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of cross-linked peptides with oligonucleotides of different length leads to a combinatorial increase in search space. We demonstrate that the peptide retention time prediction tasks can be transferred to the task of cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvement in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains, and merit further research.


Subjects
Nucleic Acids , RNA , Amino Acids , Mass Spectrometry , Peptides
4.
J Proteome Res ; 23(6): 2078-2089, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38666436

ABSTRACT

Data-independent acquisition (DIA) has become a well-established method for MS-based proteomics. However, the list of options to analyze this type of data is quite extensive, and the use of spectral libraries has become an important factor in DIA data analysis. More specifically, the use of in silico predicted libraries is gaining interest. By working with a differential spike-in of human standard proteins (UPS2) in a constant yeast tryptic digest background, we evaluated the sensitivity, precision, and accuracy of the use of in silico predicted libraries in DIA data analysis workflows compared to more established workflows. Three commonly used DIA software tools, DIA-NN, EncyclopeDIA, and Spectronaut, were each tested in spectral library mode and spectral library-free mode. In spectral library mode, we used the independent spectral library prediction tools PROSIT and MS2PIP together with DeepLC, next to classical data-dependent acquisition (DDA)-based spectral libraries. In total, we benchmarked 12 computational workflows for DIA. Our comparison showed that DIA-NN reached the highest sensitivity while maintaining a good compromise on the reproducibility and accuracy levels in either library-free mode or using in silico predicted libraries, pointing to a general benefit in using in silico predicted libraries.


Subjects
Computer Simulation , Proteomics , Software , Workflow , Proteomics/methods , Proteomics/statistics & numerical data , Humans , Reproducibility of Results , Data Analysis , Peptide Library
5.
J Proteome Res ; 23(8): 3200-3207, 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-38491990

ABSTRACT

Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.


Subjects
Algorithms , Peptides , Proteomics , Software , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Peptides/chemistry , Peptides/analysis , Proteomics/methods , User-Computer Interface , Humans , Search Engine , Single-Cell Analysis/methods , Databases, Protein
6.
Bioinformatics ; 40(2), 2024 02 01.
Article in English | MEDLINE | ID: mdl-38192003

ABSTRACT

MOTIVATION: Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Due to data sparseness and redundancy, functional association analysis becomes complex. RESULTS: To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real as well as simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's capability to offer novel perspectives on protein interactions. FAVA functions within the scverse ecosystem, employing AnnData as its input source. AVAILABILITY AND IMPLEMENTATION: Source code, documentation, and tutorials for FAVA are accessible on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI as well as via the scverse ecosystem https://github.com/scverse/ecosystem-packages/tree/main/packages/favapy.


Subjects
Proteomics , Single-Cell Gene Expression Analysis , Gene Expression Profiling , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Software
7.
Microb Cell Fact ; 22(1): 254, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-38072930

ABSTRACT

BACKGROUND: It is increasingly recognized that conventional food production systems are not able to meet the globally increasing protein needs, resulting in overexploitation and depletion of resources, and environmental degradation. In this context, microbial biomass has emerged as a promising sustainable protein alternative. Nevertheless, often no consideration is given to the fact that the cultivation conditions affect the composition of microbial cells, and hence their quality and nutritional value. Apart from the properties and nutritional quality of the produced microbial food (ingredient), this can also impact its sustainability. To qualitatively assess these aspects, here, we investigated the link between substrate availability, growth rate, cell composition and size of Cupriavidus necator and Komagataella phaffii. RESULTS: Biomass with decreased nucleic acid and increased protein content was produced at low growth rates. Conversely, high rates resulted in larger cells, which could enable more efficient biomass harvesting. The proteome allocation varied across the different growth rates, with more ribosomal proteins at higher rates, which could potentially affect the techno-functional properties of the biomass. Considering the distinct amino acid profiles established for the different cellular components, variations in their abundance impact the product quality, leading to higher cysteine and phenylalanine content at low growth rates. Therefore, we suggest that costly external amino acid supplementations that are often required to meet the nutritional needs could be avoided by carefully applying conditions that enable targeted growth rates. CONCLUSION: In summary, we demonstrate tradeoffs between nutritional quality and production rate, and we discuss the microbial biomass properties that vary according to the growth conditions.


Subjects
Amino Acids , Proteome , Biomass , Cysteine , Cell Size
8.
Nucleic Acids Res ; 51(W1): W338-W342, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37140039

ABSTRACT

Interest in the use of machine learning for peptide fragmentation spectrum prediction has been strongly on the rise over the past years, especially for applications in challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data independent acquisition spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease-of-use, and broad applicability. We here present a thoroughly updated version of the MS²PIP web server, which includes new and more performant prediction models for both tryptic- and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides. Additionally, we have also added new functionality to greatly facilitate the generation of proteome-wide predicted spectral libraries, requiring only a FASTA protein file as input. These libraries also include retention time predictions from DeepLC. Moreover, we now provide pre-built and ready-to-download spectral libraries for various model organisms in multiple DIA-compatible spectral library formats. Besides upgrading the back-end models, the user experience on the MS²PIP web server is thus also greatly enhanced, extending its applicability to new domains, including immunopeptidomics and MS3-based TMT quantification experiments. MS²PIP is freely available at https://iomics.ugent.be/ms2pip/.


Subjects
Proteome , Proteomics , Tandem Mass Spectrometry , Peptides/chemistry
9.
J Proteome Res ; 22(4): 1181-1192, 2023 04 07.
Article in English | MEDLINE | ID: mdl-36963412

ABSTRACT

Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue and cell-type specific protein patterns. PRIDE projects were searched with ionbot and tissue/cell type annotation was manually added. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, a one-vs-all classification and feature importance were used to analyze the most discriminating protein abundances per class. Based on protein abundance alone, the model was able to predict tissues with 98% accuracy, and cell types with 99% accuracy. The F-scores provide a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identification of the tissue of origin of proteins in complex samples such as liquid biopsies, to studying the proteome of tissue-like samples such as organoids and cell lines.


Subjects
Proteome , Proteomics , Humans , Proteomics/methods , Proteome/genetics , Proteome/metabolism , Algorithms , Machine Learning
10.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Subjects
Machine Learning , Proteomics , Proteomics/methods , Algorithms , Mass Spectrometry
12.
J Proteome Res ; 22(2): 632-636, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36693629

ABSTRACT

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.


Subjects
Algorithms , Proteomics , Proteomics/methods , Reproducibility of Results , Peptides/analysis , Mass Spectrometry/methods , Software
13.
J Proteome Res ; 22(2): 557-560, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36508242

ABSTRACT

A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists through a unified and user-friendly Python, command-line, and web interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at https://github.com/compomics/psm_utils.


Subjects
Proteomics , Software , Proteomics/methods , Peptides , Search Engine
14.
Mol Cell Proteomics ; 21(8): 100266, 2022 08.
Article in English | MEDLINE | ID: mdl-35803561

ABSTRACT

Immunopeptidomics aims to identify major histocompatibility complex (MHC)-presented peptides on almost all cells that can be used in anti-cancer vaccine development. However, existing immunopeptidomics data analysis pipelines suffer from the nontryptic nature of immunopeptides, complicating their identification. Previously, peak intensity predictions by MS2PIP and retention time predictions by DeepLC have been shown to improve tryptic peptide identifications when rescoring peptide-spectrum matches with Percolator. However, as MS2PIP was tailored toward tryptic peptides, we have here retrained MS2PIP to include nontryptic peptides. Interestingly, the new models not only greatly improve predictions for immunopeptides but also yield further improvements for tryptic peptides. We show that the integration of new MS2PIP models, DeepLC, and Percolator in one software package, MS2Rescore, increases spectrum identification rate and unique identified peptides by 46% and 36%, respectively, compared to standard Percolator rescoring at 1% FDR. Moreover, MS2Rescore also outperforms the current state-of-the-art in immunopeptide-specific identification approaches. Altogether, MS2Rescore thus allows substantially improved identification of novel epitopes from existing immunopeptidomics workflows.


Subjects
Proteomics , Tandem Mass Spectrometry , Algorithms , Peptides , Proteins
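
The 1% FDR threshold quoted in the abstract above is commonly estimated with a target-decoy approach. The following is a minimal, generic sketch of that idea, not the implementation used by Percolator or MS2Rescore. It assumes each PSM is a (score, is_decoy) pair with higher scores being better; the function names are hypothetical.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    psms: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    Assumption: the FDR at a score threshold is estimated as
    (#decoys above threshold) / (#targets above threshold).
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Estimated FDR when accepting every PSM down to each rank.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR reachable at this rank or any lower rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def count_accepted_targets(
    psms: List[Tuple[float, bool]], threshold: float = 0.01
) -> int:
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(
        1
        for _score, is_decoy, q in target_decoy_qvalues(psms)
        if not is_decoy and q <= threshold
    )
```

Under this sketch, a rescoring step that separates targets from decoys more cleanly raises the number returned by `count_accepted_targets` at the same threshold, which is the kind of gain the abstract reports.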
15.
J Proteome Res ; 21(5): 1365-1370, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35446579

ABSTRACT

Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we have used 40 data sets from 2 different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using 2 spectral library search tools, COSS and MSPepSearch with and without Percolator postprocessing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compomics/COSS.


Subjects
Proteomics , Search Engine , Algorithms , Databases, Protein , Peptide Library , Proteomics/methods , Search Engine/methods , Software , Tandem Mass Spectrometry/methods
16.
Bioinform Adv ; 2(1): vbac002, 2022.
Article in English | MEDLINE | ID: mdl-36699344

ABSTRACT

Summary: Proteins tend to bury hydrophobic residues inside their core during the folding process to provide stability to the protein structure and to prevent aggregation. Nevertheless, proteins do expose some 'sticky' hydrophobic residues to the solvent. These residues can play an important functional role, e.g. in protein-protein and membrane interactions. Here, we first investigate how hydrophobic protein surfaces are by providing three measures for surface hydrophobicity: the total hydrophobic surface area (THSA), the relative hydrophobic surface area (RHSA) and, using our MolPatch method, the largest hydrophobic patch (LHP). Secondly, we analyze how difficult it is to predict these measures from sequence: by adapting solvent accessibility predictions from NetSurfP2.0, we obtain well-performing prediction methods for the THSA and RHSA, while predicting LHP is more challenging. Finally, we analyze implications of exposed hydrophobic surfaces: we show that hydrophobic proteins typically have low expression, suggesting cells avoid an overabundance of sticky proteins. Availability and implementation: The data underlying this article are available in GitHub at https://github.com/ibivu/hydrophobic_patches. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
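
The first two surface measures named in the abstract above can be illustrated with a short sketch. It assumes per-residue solvent-accessible surface areas are already available (for example from a structure or a predictor), that the total hydrophobic surface area is the summed accessible area of hydrophobic residues, and that the relative measure is that sum divided by the total accessible area. The residue set and function name are assumptions for illustration, not the authors' definitions. The largest hydrophobic patch needs 3D coordinates and is not covered here.

```python
from typing import Sequence, Tuple

# Assumption: a simple hydrophobic residue set chosen for illustration only.
HYDROPHOBIC_RESIDUES = frozenset("AVLIMFWC")


def hydrophobic_surface(
    sequence: str, accessible_area: Sequence[float]
) -> Tuple[float, float]:
    """Return (total, relative) hydrophobic surface area.

    sequence: one-letter amino acid codes.
    accessible_area: solvent-accessible area per residue, same order and
    length as the sequence (units are whatever the input uses).
    """
    if len(sequence) != len(accessible_area):
        raise ValueError("sequence and accessible_area must have equal length")

    total_area = sum(accessible_area)
    hydrophobic_area = sum(
        area
        for residue, area in zip(sequence, accessible_area)
        if residue in HYDROPHOBIC_RESIDUES
    )
    relative = hydrophobic_area / total_area if total_area > 0 else 0.0
    return hydrophobic_area, relative
```

For the four-residue toy input `hydrophobic_surface("ALKD", [10.0, 30.0, 40.0, 20.0])`, only A and L count as hydrophobic under the assumed residue set.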

17.
Anal Chem ; 93(47): 15633-15641, 2021 11 30.
Article in English | MEDLINE | ID: mdl-34780168

ABSTRACT

Machine learning is a popular technique to predict the retention times of molecules based on descriptors. Descriptors and associated labels (e.g., retention times) of a set of molecules can be used to train a machine learning algorithm. However, descriptors are fixed molecular features which are not necessarily optimized for the given machine learning problem (e.g., to predict retention times). Recent advances in molecular machine learning make use of so-called graph convolutional networks (GCNs) to learn molecular representations from atoms and their bonds to adjacent atoms to optimize the molecular representation for the given problem. In this study, two GCNs were implemented to predict the retention times of molecules for three different chromatographic data sets and compared to seven benchmarks (including two state-of-the-art machine learning models). Additionally, saliency maps were computed from trained GCNs to better interpret the importance of certain molecular sub-structures in the data sets. Based on the overall observations of this study, the GCNs performed better than all benchmarks, either significantly outperforming them (5-25% lower mean absolute error) or performing similarly to them (<5% difference). Saliency maps revealed a significant difference in molecular sub-structures that are important for predictions of different chromatographic data sets (reversed-phase liquid chromatography vs hydrophilic interaction liquid chromatography).


Subjects
Chromatography, Reverse-Phase , Machine Learning , Algorithms , Chromatography, Liquid , Hydrophobic and Hydrophilic Interactions
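
The comparison in the abstract above is expressed as a percentage difference in mean absolute error (MAE). A minimal sketch of how such a figure could be computed follows. The function names are hypothetical, and the choice of the benchmark MAE as the denominator is an assumption, since the abstract does not state it.

```python
from typing import Sequence


def mean_absolute_error(
    observed: Sequence[float], predicted: Sequence[float]
) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def relative_mae_change(mae_model: float, mae_benchmark: float) -> float:
    """Percentage change of the model MAE relative to the benchmark MAE.

    Negative values mean the model has a lower (better) error.
    """
    if mae_benchmark <= 0:
        raise ValueError("benchmark MAE must be positive")
    return (mae_model - mae_benchmark) / mae_benchmark * 100.0
```

With these definitions, a model MAE of 0.8 against a benchmark MAE of 1.0 gives a change of about -20%, which would fall inside the "5-25% lower" range quoted above.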
18.
Nat Commun ; 12(1): 6414, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34741024

ABSTRACT

While transcriptome- and proteome-wide technologies to assess processes in protein biogenesis are now widely available, we still lack global approaches to assay post-ribosomal biogenesis events, in particular those occurring in the eukaryotic secretory system. We here develop a method, SECRiFY, to simultaneously assess the secretability of >10^5 protein fragments by two yeast species, S. cerevisiae and P. pastoris, using custom fragment libraries, surface display and a sequencing-based readout. Screening human proteome fragments with a median size of 50-100 amino acids, we generate datasets that enable datamining into protein features underlying secretability, revealing a striking role for intrinsic disorder and chain flexibility. The SECRiFY methodology generates sufficient amounts of annotated data for advanced machine learning methods to deduce secretability patterns. The finding that secretability is indeed a learnable feature of protein sequences provides a solid base for application-focused studies.


Subjects
Saccharomyces cerevisiae/metabolism , Humans , Proteome/genetics , Proteome/physiology , Transcriptome/genetics , Transcriptome/physiology
19.
Nat Methods ; 18(11): 1363-1369, 2021 11.
Article in English | MEDLINE | ID: mdl-34711972

ABSTRACT

The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.


Subjects
Algorithms , Deep Learning , Peptide Fragments/analysis , Protein Processing, Post-Translational , Proteins/analysis , Proteins/chemistry , Proteome/analysis , Datasets as Topic , Humans , Peptide Fragments/chemistry , Peptide Mapping
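
The abstract above attributes DeepLC's handling of unseen modifications to a peptide encoding based on atomic composition. The sketch below illustrates only the underlying idea and is not DeepLC's actual encoding, which the abstract does not detail. A modified peptide is reduced to element counts, so a modification absent from training data still maps onto familiar features. The residue formulas are standard chemistry; the small modification table and the function name are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Optional

# Elemental composition of amino acid residues (amino acid minus H2O).
RESIDUE_COMPOSITION: Dict[str, Dict[str, int]] = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "V": {"C": 5, "H": 9, "N": 1, "O": 1},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "C": {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1},
    "L": {"C": 6, "H": 11, "N": 1, "O": 1},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
    "H": {"C": 6, "H": 7, "N": 3, "O": 1},
    "F": {"C": 9, "H": 9, "N": 1, "O": 1},
    "R": {"C": 6, "H": 12, "N": 4, "O": 1},
    "Y": {"C": 9, "H": 9, "N": 1, "O": 2},
    "W": {"C": 11, "H": 10, "N": 2, "O": 1},
}

# Terminal H and OH of the intact peptide.
WATER = {"H": 2, "O": 1}

# Illustrative subset of modifications as elemental additions.
MODIFICATION_DELTA: Dict[str, Dict[str, int]] = {
    "Oxidation": {"O": 1},
    "Carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}


def atomic_composition(
    sequence: str, modifications: Optional[Dict[int, str]] = None
) -> Dict[str, int]:
    """Element counts for a peptide.

    modifications maps a 0-based residue index to a key of MODIFICATION_DELTA.
    """
    total: Counter = Counter(WATER)
    for residue in sequence:
        total.update(RESIDUE_COMPOSITION[residue])
    for position, name in (modifications or {}).items():
        if not 0 <= position < len(sequence):
            raise ValueError(f"modification position {position} out of range")
        total.update(MODIFICATION_DELTA[name])
    return dict(total)
```

Because every modification becomes a change in element counts, adding a new entry to `MODIFICATION_DELTA` requires no new feature columns, which is the property the abstract highlights.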
20.
JACS Au ; 1(6): 750-765, 2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34254058

ABSTRACT

Rising population density and global mobility are among the reasons why pathogens such as SARS-CoV-2, the virus that causes COVID-19, spread so rapidly across the globe. The policy response to such pandemics will always have to include accurate monitoring of the spread, as this provides one of the few alternatives to total lockdown. However, COVID-19 diagnosis is currently performed almost exclusively by reverse transcription polymerase chain reaction (RT-PCR). Although this is efficient, automatable, and acceptably cheap, reliance on one type of technology comes with serious caveats, as illustrated by recurring reagent and test shortages. We therefore developed an alternative diagnostic test that detects proteolytically digested SARS-CoV-2 proteins using mass spectrometry (MS). We established the Cov-MS consortium, consisting of 15 academic laboratories and several industrial partners to increase applicability, accessibility, sensitivity, and robustness of this kind of SARS-CoV-2 detection. This, in turn, gave rise to the Cov-MS Digital Incubator that allows other laboratories to join the effort, navigate, and share their optimizations and translate the assay into their clinic. As this test relies on viral proteins instead of RNA, it provides an orthogonal and complementary approach to RT-PCR using other reagents that are relatively inexpensive and widely available, as well as orthogonally skilled personnel and different instruments. Data are available via ProteomeXchange with identifier PXD022550.
