1.
Proteomics ; 24(8): e2300144, 2024 Apr.
Article En | MEDLINE | ID: mdl-38629965

In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes, which are then detected and analyzed in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. Identifying peptides cross-linked to oligonucleotides of different lengths leads to a combinatorial increase in the search space. We demonstrate that peptide retention time prediction can be transferred to cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvements in the encoding and fine-tuning of the retention time and intensity prediction models might lead to additional gains and merit further research.


Nucleic Acids , RNA , Amino Acids , Mass Spectrometry , Peptides
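The abstract does not spell out the encoding, so the sketch below assumes one plausible form: counts of the 20 canonical amino acids concatenated with counts of the four RNA nucleotides of the attached oligonucleotide. A linear model stands in for whatever regressor is actually trained, and its absolute error becomes one extra rescoring feature. All function names, the alphabets and the linear model are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Assumed alphabets for this sketch: 20 canonical amino acids and the 4 RNA bases.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUCLEOTIDES = "ACGU"


def composition_vector(peptide: str, oligo: str = "") -> list[int]:
    """Encode a cross-linked peptide as amino acid counts followed by nucleotide counts.

    Peptide and oligonucleotide are passed separately because A, C and G occur
    in both alphabets.
    """
    for alphabet, seq in ((AMINO_ACIDS, peptide), (NUCLEOTIDES, oligo)):
        unknown = set(seq) - set(alphabet)
        if unknown:
            raise ValueError(f"unsupported residues: {sorted(unknown)}")
    aa_counts = Counter(peptide)
    nt_counts = Counter(oligo)
    return [aa_counts[a] for a in AMINO_ACIDS] + [nt_counts[n] for n in NUCLEOTIDES]


def predict_rt(features: list[int], weights: list[float], bias: float = 0.0) -> float:
    """Hypothetical linear retention time model; weights would be learned from data."""
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors differ in length")
    return bias + sum(f * w for f, w in zip(features, weights))


def rt_error_feature(observed_rt: float, predicted_rt: float) -> float:
    """Absolute prediction error, usable as one extra feature during rescoring."""
    return abs(observed_rt - predicted_rt)
```

A composition encoding ignores residue order, which keeps the feature vector fixed-length regardless of peptide or oligonucleotide length; that property is what makes it easy to extend from plain peptides to peptide-RNA adducts.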
3.
J Biomol Struct Dyn ; : 1-19, 2023 Sep 27.
Article En | MEDLINE | ID: mdl-37753734

Neuroblastoma, the most common childhood solid tumor, originates from primitive sympathetic nervous system cells. Epoxyazadiradione (EAD) is a limonoid derived from Azadirachta indica, a member of the family Meliaceae. In this study, we isolated EAD from Azadirachta indica seeds and studied its anti-cancer potential against neuroblastoma. EAD demonstrated significant efficacy against neuroblastoma by suppressing cell proliferation, increasing the rate of apoptosis, and inducing cell cycle arrest at the SubG0 and G2/M phases. EAD enhanced the pro-apoptotic Caspase 3 and Caspase 9 and inhibited NF-κB translocation in a dose-dependent manner. To identify the specific EAD target, a dose-dependent, gel-free quantitative proteomics study of SH-SY5Y cells was performed using liquid chromatography with tandem mass spectrometry, followed by detailed bioinformatics analysis to identify effects on proteins. The proteomics data showed that Enolase1 and HSP90 were up-regulated in neuroblastoma. EAD inhibited the expression of Enolase1 and HSP90, as validated by mRNA expression, immunoblotting, Enolase1 and HSP90 kit assays, and a flow-cytometry-based bioassay. Molecular docking, molecular dynamics simulation, and molecular mechanics/Poisson-Boltzmann surface area analysis also suggested that EAD binds at the active sites of these proteins and that the complexes remained stable throughout the 100 ns molecular dynamics simulation. Overall, this study suggests that EAD exhibits anti-cancer activity against neuroblastoma by targeting the Enolase1 and HSP90 pathways. Communicated by Ramaswamy H. Sarma.

4.
Proteomics ; 23(20): e2300188, 2023 Oct.
Article En | MEDLINE | ID: mdl-37488995

Relative and absolute intensity-based protein quantification across cell lines, tissue atlases and tumour datasets is increasingly available in public datasets. These atlases enable researchers to explore fundamental biological questions, such as protein existence, expression location, quantity and correlation with RNA expression. Most studies provide MS1 feature-based label-free quantitative (LFQ) datasets; however, a growing number of isobaric tandem mass tag (TMT) datasets remain unexplored. Here, we compare traditional intensity-based absolute quantification (iBAQ) proteome abundance ranking with an analogous ranking based on reporter ion intensities, using data from an experiment in which LFQ and TMT were measured on the same samples. This new TMT method substitutes reporter ion intensities for MS1 feature intensities in the iBAQ framework. Additionally, we compared LFQ-iBAQ values with TMT-iBAQ values from two independent large-scale tissue atlas datasets (one LFQ and one TMT) using robust bottom-up proteomic identification, normalisation and quantitation workflows.
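iBAQ divides a protein's summed peptide intensity by its number of theoretically observable peptides; the TMT variant described above keeps that ratio but feeds in reporter ion intensities. The sketch below illustrates both under stated assumptions: a simplified trypsin rule with no missed cleavages, and a 6-30 residue length window for "observable" peptides (a commonly used choice, assumed here). The function names are hypothetical and not taken from the authors' workflow.

```python
import re

# Simplified trypsin rule (assumption): cleave after K or R unless followed by P,
# with no missed cleavages. Splitting on a zero-width match needs Python 3.7+.
TRYPSIN = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence: str) -> list[str]:
    return [p for p in TRYPSIN.split(sequence) if p]


def observable_peptide_count(sequence: str, min_len: int = 6, max_len: int = 30) -> int:
    """Number of theoretically observable peptides (the length window is an assumption)."""
    return sum(1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len)


def ibaq(peptide_intensities: list[float], sequence: str) -> float:
    """LFQ-iBAQ: summed MS1 peptide intensities divided by the observable peptide count."""
    n = observable_peptide_count(sequence)
    if n == 0:
        raise ValueError("protein has no observable peptides")
    return sum(peptide_intensities) / n


def tmt_ibaq(reporter_intensities: list[list[float]], sequence: str) -> list[float]:
    """TMT-iBAQ: the same ratio, with reporter ion intensities (one row per peptide,
    one column per TMT channel) in place of MS1 feature intensities."""
    n = observable_peptide_count(sequence)
    if n == 0:
        raise ValueError("protein has no observable peptides")
    channel_sums = [sum(channel) for channel in zip(*reporter_intensities)]
    return [s / n for s in channel_sums]
```

Because the denominator depends only on the protein sequence, LFQ-iBAQ and TMT-iBAQ differ solely in the numerator, which is what makes the two abundance rankings directly comparable.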

5.
J Cheminform ; 15(1): 52, 2023 May 12.
Article En | MEDLINE | ID: mdl-37173725

Metabolomics experiments generate highly complex datasets that are time- and labor-intensive, and sometimes error-prone, to inspect manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, molecular formula and structural predictions, and an integration with the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, as well as development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based Graphical User Interface for parameter optimization and processing of smaller-sized datasets. UmetaFlow was validated with in-house LC-MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards, and it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets.

6.
J Proteome Res ; 22(6): 2114-2123, 2023 06 02.
Article En | MEDLINE | ID: mdl-37220883

Testing for significant differences in quantities at the protein level is a common goal of many LFQ-based mass spectrometry proteomics experiments. Starting from a table of protein and/or peptide quantities from a given proteomics quantification software, many tools and R packages exist to perform the final tasks of imputation, summarization, normalization, and statistical testing. To evaluate how the choice of package and the settings of its substeps affect the final list of significant proteins, we studied several packages on three public data sets with known expected protein fold changes. We found that the results can vary significantly between packages and even across different parameters of the same package. In addition to usability aspects and feature/compatibility lists of different packages, this paper highlights sensitivity and specificity trade-offs that come with specific packages and settings.


Peptides , Software , Peptides/analysis , Proteins/analysis , Mass Spectrometry/methods , Proteomics/methods
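To make two of the substeps named above concrete, the sketch below shows one common choice for each: median normalization of log2 intensities and Welch's t statistic between two groups. It is an illustration under assumptions, not the behavior of any of the benchmarked packages; imputation and summarization are omitted, and the function names are hypothetical. As the abstract stresses, swapping any of these choices can change which proteins come out as significant.

```python
import math
import statistics


def median_center(log_matrix: list[list[float]]) -> list[list[float]]:
    """Median normalization: rows are proteins, columns are samples (log2 scale).

    Each column is shifted so that its median becomes zero.
    """
    n_cols = len(log_matrix[0])
    medians = [statistics.median([row[j] for row in log_matrix]) for j in range(n_cols)]
    return [[row[j] - medians[j] for j in range(n_cols)] for row in log_matrix]


def log2_fold_change(group_a: list[float], group_b: list[float]) -> float:
    """Difference of group means on the log2 scale."""
    return statistics.mean(group_a) - statistics.mean(group_b)


def welch_t(group_a: list[float], group_b: list[float]) -> float:
    """Welch's t statistic (unequal variances); needs at least 2 values per group.

    Converting it to a p-value requires a t-distribution CDF (e.g. from SciPy),
    which is left out to keep this sketch dependency-free.
    """
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return log2_fold_change(group_a, group_b) / se
```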
7.
Microbiome ; 11(1): 24, 2023 02 09.
Article En | MEDLINE | ID: mdl-36755313

BACKGROUND: Stable isotope probing (SIP) approaches are a critical tool in microbiome research to determine associations between species and substrates, as well as the activity of species. The application of these approaches ranges from studying microbial communities important for global biogeochemical cycling to host-microbiota interactions in the intestinal tract. Current SIP approaches, such as DNA-SIP or nanoSIMS, allow the analysis of stable isotope incorporation with high coverage of taxa in a community and at the single-cell level, respectively; however, they are limited in terms of sensitivity, resolution, or throughput. RESULTS: Here, we present an ultra-sensitive, high-throughput protein-based stable isotope probing approach (Protein-SIP), which cuts the cost of labeled substrates by 50-99% compared to other SIP and Protein-SIP approaches and thus enables isotope labeling experiments on much larger scales and with higher replication. The approach allows for the determination of isotope incorporation into microbiome members with species-level resolution using standard metaproteomics liquid chromatography-tandem mass spectrometry (LC-MS/MS) measurements. At the core of the approach are new algorithms to analyze the data, which have been implemented in open-source software (https://sourceforge.net/projects/calis-p/). We demonstrate sensitivity, precision and accuracy using bacterial cultures and mock communities with different labeling schemes. Furthermore, we benchmark our approach against two existing Protein-SIP approaches and show that, in the low labeling range used, our approach is the most sensitive and accurate. Finally, we measure translational activity using 18O heavy water labeling in a 63-species community derived from human fecal samples grown on media simulating two different diets.
Activity could be quantified on average for 27 species per sample, with 9 species showing significantly higher activity on a high protein diet, as compared to a high fiber diet. Surprisingly, among the species with increased activity on high protein were several Bacteroides species known as fiber consumers. Apparently, protein supply is a critical consideration when assessing growth of intestinal microbes on fiber, including fiber-based prebiotics. CONCLUSIONS: We demonstrate that our Protein-SIP approach allows for the ultra-sensitive (0.01 to 10% label) detection of stable isotopes of elements found in proteins, using standard metaproteomics data.


Microbiota , Tandem Mass Spectrometry , Humans , Carbon Isotopes/analysis , Carbon Isotopes/metabolism , Chromatography, Liquid , Tandem Mass Spectrometry/methods , DNA Probes
8.
J Proteome Res ; 22(2): 625-631, 2023 02 03.
Article En | MEDLINE | ID: mdl-36688502

spectrum_utils is a Python package for mass spectrometry data processing and visualization. Since its introduction, spectrum_utils has grown into a fundamental software solution that powers various applications in proteomics and metabolomics, ranging from spectrum preprocessing prior to spectrum identification and machine learning applications to spectrum plotting from online data repositories and assisting data analysis tasks for dozens of other projects. Here, we present updates to spectrum_utils, which include new functionality to integrate mass spectrometry community data standards, enhanced mass spectral data processing, and unified mass spectral data visualization in Python. spectrum_utils is freely available as open source at https://github.com/bittremieux/spectrum_utils.


Proteomics , Software , Mass Spectrometry , Proteomics/methods , Metabolomics , Machine Learning
9.
J Proteome Res ; 21(6): 1566-1574, 2022 06 03.
Article En | MEDLINE | ID: mdl-35549218

Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found to be the most reliable for consensus spectrum generation, including for data sets with post-translational modifications (PTMs) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.


Proteomics , Tandem Mass Spectrometry , Algorithms , Cluster Analysis , Consensus , Databases, Protein , Proteomics/methods , Software , Tandem Mass Spectrometry/methods
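The two methods reported as most reliable can be sketched in a few lines. The abstract names the methods but not their parameters, so the bin width, the minimum-support fraction and the averaging rule below are assumptions, and the function names are hypothetical; the benchmark's GitHub repository remains the authoritative reference for the actual implementations.

```python
from collections import defaultdict

Peak = tuple[float, float]  # (m/z, intensity)


def bin_consensus(spectra: list[list[Peak]], bin_width: float = 0.02,
                  min_fraction: float = 0.5) -> list[Peak]:
    """Binning-style consensus spectrum for one cluster (parameters are assumptions).

    Peaks from all member spectra are pooled into fixed-width m/z bins. A bin is
    kept if peaks from at least `min_fraction` of the spectra fall into it; its
    m/z is the intensity-weighted mean and its intensity the mean over all
    spectra in the cluster (spectra without a peak in the bin count as zero).
    """
    bins = defaultdict(list)  # bin index -> [(spectrum index, m/z, intensity), ...]
    for s_idx, spectrum in enumerate(spectra):
        for mz, intensity in spectrum:
            bins[int(mz // bin_width)].append((s_idx, mz, intensity))
    consensus = []
    for key in sorted(bins):
        members = bins[key]
        support = len({s_idx for s_idx, _, _ in members}) / len(spectra)
        total = sum(intensity for _, _, intensity in members)
        if support < min_fraction or total <= 0:
            continue
        mz = sum(m * i for _, m, i in members) / total
        consensus.append((mz, total / len(spectra)))
    return consensus


def best_identified(spectra: list[list[Peak]], scores: list[float]) -> list[Peak]:
    """BEST-style representative: the member spectrum with the highest identification score."""
    best_index = max(range(len(spectra)), key=lambda i: scores[i])
    return spectra[best_index]
```

A known weakness of fixed-width binning is that one fragment peak can straddle a bin boundary and be split in two; BEST avoids this entirely but depends on an identification score being available for every cluster member.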
10.
Bioinformatics ; 38(5): 1470-1472, 2022 02 07.
Article En | MEDLINE | ID: mdl-34904638

SUMMARY: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs and other non-canonical transcripts, such as those produced by alternative splicing events. It also includes exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tool enables the generation of variant protein sequences from multiple sources of genomic variants including COSMIC, cBioportal, gnomAD and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling including optimized target/decoy generation by the algorithm DecoyPyrat. Finally, we have reanalyzed six public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to >5% of the total number of peptides identified. AVAILABILITY AND IMPLEMENTATION: The software is freely available. pypgatk: https://github.com/bigbio/py-pgatk/ and pgdb: https://nf-co.re/pgdb. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Proteogenomics , Humans , Peptides/genetics , Software , Algorithms , Proteins
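The three-frame translation step described above can be illustrated with a small forward-strand sketch. It uses the standard genetic code (NCBI translation table 1); splitting at stop codons and the `min_length` cut-off are illustrative assumptions, and none of the function names below belong to pypgatk.

```python
# Standard genetic code (NCBI table 1), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate codon by codon; '*' marks stops, 'X' marks codons with ambiguous bases."""
    seq = nucleotides.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frame_translation(transcript: str) -> list[str]:
    """Translate the forward strand in reading frames 0, 1 and 2."""
    return [translate(transcript[frame:]) for frame in range(3)]


def candidate_sequences(transcript: str, min_length: int = 6) -> list[str]:
    """Split each frame at stop codons and keep stretches of at least min_length residues."""
    return [
        stretch
        for frame in three_frame_translation(transcript)
        for stretch in frame.split("*")
        if len(stretch) >= min_length
    ]
```

For example, `three_frame_translation("ATGGCCTAAG")` gives one translation per frame, and `candidate_sequences` then drops the stop symbols and any stretch shorter than the cut-off.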
11.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article En | MEDLINE | ID: mdl-34615866

The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose extending the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome
12.
J Proteome Res ; 20(7): 3758-3766, 2021 07 02.
Article En | MEDLINE | ID: mdl-34153189

Data-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. The main advantages include greater reproducibility, sensitivity, and dynamic range compared with data-dependent acquisition (DDA). However, the data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we present DIAproteomics, a multifunctional, automated, high-throughput pipeline implemented in the Nextflow workflow management system that allows one to easily process proteomics and peptidomics DIA data sets on diverse compute infrastructures. The central components are well-established tools such as the OpenSwathWorkflow for the DIA spectral library search and PyProphet for the false discovery rate assessment. In addition, it provides options to generate spectral libraries from existing DDA data and to carry out retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from the statistical postprocessing and computation of fold-changes across pairwise conditions, predefined in an experimental design. DIAproteomics is well-documented open-source software and is available under a permissive license to the scientific community at https://www.openms.de/diaproteomics/.


Data Analysis , Proteomics , Mass Spectrometry , Reproducibility of Results , Software
13.
Anal Chem ; 92(24): 15968-15974, 2020 12 15.
Article En | MEDLINE | ID: mdl-33269929

Technological advances in high-resolution mass spectrometry (MS) have vastly increased the number of samples that can be processed in a life science experiment, as well as the volume and complexity of the generated data. To address the bottleneck of high-throughput data processing, we present SmartPeak (https://github.com/AutoFlowResearch/SmartPeak), an application that encapsulates advanced algorithms to enable fast, accurate, and automated processing of capillary electrophoresis-, gas chromatography-, and liquid chromatography (LC)-MS(/MS) data and high-pressure LC data for targeted and semitargeted metabolomics, lipidomics, and fluxomics experiments. The application allows for an approximately 100-fold reduction in data processing time compared to manual processing while enhancing the quality and reproducibility of the results.


Electronic Data Processing/methods , Metabolomics/methods , Automation , Chromatography, Liquid , Electrophoresis, Capillary , Tandem Mass Spectrometry , Time Factors
14.
Mol Cell Proteomics ; 19(12): 2157-2168, 2020 12.
Article En | MEDLINE | ID: mdl-33067342

Cross-linking MS (XL-MS) has been recognized as an effective source of information about protein structures and interactions. In contrast to regular peptide identification, XL-MS has to deal with a quadratic search space, where peptides from every protein could potentially be cross-linked to any other protein. To cope with this search space, most tools apply different heuristics for search space reduction. We introduce a new open-source XL-MS database search algorithm, OpenPepXL, which offers increased sensitivity compared with other tools. OpenPepXL searches the full search space of an XL-MS experiment without using heuristics to reduce it. Because of efficient data structures and built-in parallelization, OpenPepXL achieves excellent runtimes and can also be deployed on large compute clusters and cloud services while maintaining a slim memory footprint. We compared OpenPepXL to several other commonly used tools for identification of noncleavable labeled and label-free cross-linkers on a diverse set of XL-MS experiments. In our first comparison, we used a data set from a fraction of a cell lysate with a protein database of 128 targets and 128 decoys. At 5% FDR, OpenPepXL finds from 7% to over 50% more unique residue pairs (URPs) than other tools. On data sets with available high-resolution structures for cross-link validation, OpenPepXL reports from 7% to over 40% more structurally validated URPs than other tools. Additionally, we used a synthetic peptide data set that allows objective validation of cross-links without relying on structural information and found that OpenPepXL reports at least 12% more validated URPs than other tools. It has been built as part of the OpenMS suite of tools and supports Windows, macOS, and Linux operating systems. OpenPepXL also supports the MzIdentML 1.2 format for XL-MS identification results. It is freely available under a three-clause BSD license at https://openms.org/openpepxl.


Cross-Linking Reagents/chemistry , Peptides/analysis , Software , Algorithms , Amino Acid Sequence , Databases, Protein , HEK293 Cells , Humans , Mass Spectrometry , Models, Molecular , Peptides/chemistry , Ribosomes/metabolism
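Two ideas from the abstract lend themselves to a short illustration: why the XL-MS search space is quadratic, and how results are thresholded at a fixed FDR. The sketch below counts candidate peptide pairs and applies the target-decoy estimator FDR = (TD - DD) / TT that is widely used for cross-links (for example in xProphet); whether OpenPepXL's own FDR step uses exactly this estimator is an assumption here. It also counts cross-link spectrum matches, not the unique residue pairs reported in the abstract, and all function names are hypothetical.

```python
def candidate_pairs(n_peptides: int) -> int:
    """Peptide pairs to consider when any peptide may link to any other or to itself.

    n * (n + 1) / 2 grows quadratically, which is what makes an exhaustive XL-MS
    search costly.
    """
    return n_peptides * (n_peptides + 1) // 2


def xl_fdr(n_tt: int, n_td: int, n_dd: int) -> float:
    """Target-decoy FDR estimate for cross-links: (TD - DD) / TT, floored at zero."""
    if n_tt <= 0:
        raise ValueError("FDR is undefined without target-target matches")
    return max(0.0, (n_td - n_dd) / n_tt)


def accepted_at_fdr(matches: list[tuple[float, str]], threshold: float = 0.05) -> int:
    """Target-target matches in the longest score-ranked prefix whose FDR is within threshold.

    Each match is (score, kind) with kind in {"TT", "TD", "DD"}.
    """
    counts = {"TT": 0, "TD": 0, "DD": 0}
    accepted = 0
    for _, kind in sorted(matches, key=lambda m: m[0], reverse=True):
        if kind not in counts:
            raise ValueError(f"unknown match kind: {kind}")
        counts[kind] += 1
        if counts["TT"] and xl_fdr(counts["TT"], counts["TD"], counts["DD"]) <= threshold:
            accepted = counts["TT"]
    return accepted
```

Subtracting DD from TD corrects for decoy-decoy pairs, which would otherwise be double-counted as evidence of random target-decoy matches.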
15.
Nat Commun ; 11(1): 5250, 2020 10 16.
Article En | MEDLINE | ID: mdl-33067435

Protein-DNA interactions are key to the functionality and stability of the genome. Identification and mapping of protein-DNA interaction interfaces and sites is crucial for understanding DNA-dependent processes. Here, we present a workflow that allows mass spectrometric (MS) identification of proteins in direct contact with DNA in reconstituted and native chromatin after cross-linking by ultraviolet (UV) light. Our approach enables the determination of contact interfaces at the amino-acid level. With the example of the chromatin-associated protein SCML2, we show that our technique allows differentiation of nucleosome-binding interfaces in distinct states. By UV cross-linking of isolated nuclei, we determined the cross-linking sites of several factors including chromatin-modifying enzymes, demonstrating that our workflow is not restricted to reconstituted materials. As our approach can distinguish between protein-RNA and protein-DNA interactions in a single experiment, we project that it will be possible to obtain insights into chromatin and its regulation in the future.


Chromatin/metabolism , DNA/metabolism , DNA/radiation effects , Proteins/metabolism , Chromatin/chemistry , Chromatin/genetics , DNA/chemistry , DNA/genetics , Humans , Mass Spectrometry , Nucleosomes/chemistry , Nucleosomes/genetics , Nucleosomes/metabolism , Polycomb-Group Proteins/chemistry , Polycomb-Group Proteins/genetics , Polycomb-Group Proteins/metabolism , Polycomb-Group Proteins/radiation effects , Protein Binding/radiation effects , Proteins/chemistry , Proteins/genetics , Proteins/radiation effects , Ultraviolet Rays
17.
J Proteomics ; 222: 103791, 2020 06 30.
Article En | MEDLINE | ID: mdl-32335296

Stable isotope probing (SIP) approaches are a suitable tool to identify active organisms in bacterial communities, but adding an isotopically labeled substrate can alter both the structure and the functionality of the community. Here, we validated and demonstrated a substrate-independent protein-SIP protocol using isotopically labeled water that captures the entire microbial activity of a community. We found that 18O yielded a higher incorporation rate into peptides and thus provided higher sensitivity. We then applied the method to an in vitro model of a human distal gut microbial ecosystem grown in two medium formulations to evaluate changes in microbial activity between a high-fiber and a high-protein diet. We showed that the community structure changed only slightly, whereas its functionality varied between the diets. In conclusion, our approach can detect species-specific metabolic activity in complex bacterial communities and, more specifically, quantify the amount of amino acid synthesis. Heavy water makes it possible to analyze the activity of bacterial communities for which adding isotopically labeled energy and nutrient sources is not easily feasible. SIGNIFICANCE: Heavy stable isotopes allow for the detection of active key players in complex ecosystems where many organisms are thought to be dormant. As opposed to labeling with energy or nutrient sources, heavy water could be a suitable replacement to trace activity, as has been shown for DNA and RNA. Here we validate, quantify and compare the incorporation of heavy water labeled with either deuterium or 18-oxygen into proteins of Escherichia coli K12 and of an in vitro model of a human gut microbial ecosystem. The significance of our research lies in providing a freely available pipeline to analyze the incorporation of deuterium and 18-oxygen into proteins, together with validation of the applicability of tracing heavy water as a proxy for activity.
Our approach unveils the relative functional contribution of microbiota in complex ecosystems, which will improve our understanding of both animal- and environment-associated microbiomes and in vitro models.


Microbiota , Proteins , Animals , Carbon Isotopes/analysis , Deuterium Oxide , Humans , Isotope Labeling
18.
Nat Commun ; 11(1): 926, 2020 02 17.
Article En | MEDLINE | ID: mdl-32066737

The field of epitranscriptomics continues to reveal how post-transcriptional modification of RNA affects a wide variety of biological phenomena. A pivotal challenge in this area is the identification of modified RNA residues within their sequence contexts. Mass spectrometry (MS) offers a comprehensive solution by using approaches analogous to shotgun proteomics. However, software support for the analysis of RNA MS data is inadequate at present and does not allow high-throughput processing. Existing software solutions lack the raw performance and statistical grounding to efficiently handle the numerous modifications found on RNA. We present a free and open-source database search engine for RNA MS data, called NucleicAcidSearchEngine (NASE), that addresses these shortcomings. We demonstrate the capability of NASE to reliably identify a wide range of modified RNA sequences in four original datasets of varying complexity. In human tRNA, we characterize over 20 different modification types simultaneously and find many cases of incomplete modification.


Epigenomics/methods , High-Throughput Screening Assays/methods , RNA Processing, Post-Transcriptional/genetics , Search Engine , Tandem Mass Spectrometry/methods , Base Sequence/genetics , Databases, Factual/statistics & numerical data , Datasets as Topic , Humans , Oligonucleotides/chemistry , Oligonucleotides/genetics , Oligonucleotides/metabolism , RNA, Transfer/chemistry , RNA, Transfer/genetics , RNA, Transfer/metabolism , Reproducibility of Results
19.
J Proteome Res ; 19(3): 1060-1072, 2020 03 06.
Article En | MEDLINE | ID: mdl-31975601

Accurate protein inference in the presence of shared peptides is still one of the key problems in bottom-up proteomics. Most protein inference tools employing simple heuristic inference strategies are efficient but exhibit reduced accuracy. More advanced probabilistic methods often exhibit better inference quality but tend to be too slow for large data sets. Here, we present a novel protein inference method, EPIFANY, combining a loopy belief propagation algorithm with convolution trees for efficient processing of Bayesian networks. We demonstrate that EPIFANY combines the reliable protein inference of Bayesian methods with significantly shorter runtimes. On the 2016 iPRG protein inference benchmark data, EPIFANY is the only tested method that finds all true-positive proteins at a 5% protein false discovery rate (FDR) without strict prefiltering on the peptide-spectrum match (PSM) level, yielding an increase in identification performance (+10% in the number of true positives and +14% in partial AUC) compared to previous approaches. Even very large data sets with hundreds of thousands of spectra (which are intractable with other Bayesian and some non-Bayesian tools) can be processed with EPIFANY within minutes. The increased inference quality including shared peptides results in better protein inference results and thus increased robustness of the biological hypotheses generated. EPIFANY is available as open-source software for all major platforms at https://OpenMS.de/epifany.


Algorithms , Proteomics , Bayes Theorem , Databases, Protein , Proteins , Software
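The abstract credits convolution trees for EPIFANY's speed. The sketch below shows only the upward pass that such a tree organizes: given independent presence probabilities for a peptide's parent proteins, it computes the distribution of how many parents are present by merging [1 - p, p] vectors pairwise, level by level. The function names are hypothetical, and the full method also passes messages back down the tree and embeds this inside loopy belief propagation, neither of which is shown here.

```python
def convolve(p: list[float], q: list[float]) -> list[float]:
    """Discrete convolution of two probability vectors over counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def count_distribution(presence_probs: list[float]) -> list[float]:
    """P(k of the proteins are present) for k = 0..n, assuming independence.

    Leaves [1 - p, p] are merged pairwise, level by level, like a balanced tree,
    so the vectors being combined stay short in the lower levels.
    """
    layer = [[1.0 - p, p] for p in presence_probs]
    if not layer:
        return [1.0]
    while len(layer) > 1:
        merged = [convolve(layer[i], layer[i + 1]) for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            merged.append(layer[-1])
        layer = merged
    return layer[0]
```

Quantities such as the probability that at least one parent protein is present, `1 - count_distribution(probs)[0]`, follow directly from the returned vector.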
20.
J Proteome Res ; 19(1): 537-542, 2020 01 03.
Article En | MEDLINE | ID: mdl-31755270

The field of computational proteomics is approaching the big data age, driven both by a continuous growth in the number of samples analyzed per experiment and by the growing amount of data obtained in each analytical run. To process these large amounts of data, it is increasingly necessary to use elastic compute resources such as Linux-based cluster environments and cloud infrastructures. Unfortunately, the vast majority of cross-platform proteomics tools are not able to operate directly on the proprietary formats generated by the diverse mass spectrometers. Here, we present ThermoRawFileParser, an open-source, cross-platform tool that converts Thermo RAW files into open file formats such as MGF and the HUPO-PSI standard file format mzML. To ensure the broadest possible availability and to increase integration capabilities with popular workflow systems such as Galaxy or Nextflow, we have also built a Conda package and a BioContainers container around ThermoRawFileParser. In addition, we implemented a user-friendly interface (ThermoRawFileParserGUI) for users not familiar with command-line tools. Finally, we performed a benchmark of ThermoRawFileParser and msconvert to verify that the converted mzML files contain reliable quantitative results.


Computational Biology/methods , Proteomics/methods , Software , Databases, Protein , Saccharomyces cerevisiae Proteins/metabolism , Workflow
...