ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
The ability of living systems to adapt to changing conditions originates from their capacity to change their molecular constitution. This is achieved by multiple mechanisms that modulate the quantitative composition and the diversity of the molecular inventory. Molecular diversification is particularly pronounced on the proteome level, at which multiple proteoforms derived from the same gene can in turn combinatorially form different protein complexes, thus expanding the repertoire of functional modules in the cell. The study of molecular and modular diversity and their involvement in responses to changing conditions has only recently become possible through the development of new 'omics'-based screening technologies. This Review explores our current knowledge of the mechanisms regulating functional diversification along the axis of gene expression, with a focus on the proteome and interactome. We explore the interdependence between different molecular levels and how this contributes to functional diversity. Finally, we highlight several recent techniques for studying molecular diversity, with specific focus on mass spectrometry-based analysis of the proteome and its organization into functional modules, and examine future directions for this rapidly growing field.
Subject(s)
Proteome/chemistry , Proteome/metabolism , Proteomics , Animals , Gene Regulatory Networks , Humans , Multiprotein Complexes , Protein Interaction Maps , Protein Isoforms , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Proteome/genetics , Transcriptome
ABSTRACT
Cellular functions are mediated by protein-protein interactions, and mapping the interactome provides fundamental insights into biological systems. Affinity purification coupled to mass spectrometry is an ideal tool for such mapping, but it has been difficult to identify low copy number complexes, membrane complexes and complexes that are disrupted by protein tagging. As a result, our current knowledge of the interactome is far from complete, and assessing the reliability of reported interactions is challenging. Here we develop a sensitive high-throughput method using highly reproducible affinity enrichment coupled to mass spectrometry combined with a quantitative two-dimensional analysis strategy to comprehensively map the interactome of Saccharomyces cerevisiae. Thousand-fold reduced volumes in 96-well format enabled replicate analysis of the endogenous GFP-tagged library covering the entire expressed yeast proteome (ref. 1). The 4,159 pull-downs generated a highly structured network of 3,927 proteins connected by 31,004 interactions, doubling the number of proteins and tripling the number of reliable interactions compared with existing interactome maps (ref. 2). This includes very-low-abundance epigenetic complexes, organellar membrane complexes and non-taggable complexes inferred by abundance correlation. This nearly saturated interactome reveals that the vast majority of yeast proteins are highly connected, with an average of 16 interactors. Similar to social networks between humans, the average shortest distance between proteins is 4.2 interactions. AlphaFold-Multimer provided novel insights into the functional roles of previously uncharacterized proteins in complexes. Our web portal (www.yeast-interactome.org) enables extensive exploration of the interactome dataset.
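The two network statistics quoted in this abstract (average number of interactors and average shortest distance between proteins) can be illustrated on a toy graph. The following is a minimal sketch, not the authors' pipeline: the edge list, the protein labels "A" to "D" and all function names are hypothetical, and the graph is assumed to be undirected and connected.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def average_degree(adjacency):
    """Mean number of interactors per protein."""
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)


def average_shortest_path(adjacency):
    """Mean shortest-path length over all reachable ordered pairs (BFS per node)."""
    total_distance = 0
    pair_count = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total_distance += sum(distance.values())
        pair_count += len(distance) - 1  # exclude the source itself
    return total_distance / pair_count


# Hypothetical toy interactome, for illustration only.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
toy_adjacency = build_adjacency(toy_edges)
print(average_degree(toy_adjacency))
print(average_shortest_path(toy_adjacency))
```

Breadth-first search is sufficient here because every interaction counts as one step; a weighted network would need a different shortest-path routine.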
Subject(s)
Protein Interaction Mapping , Protein Interaction Maps , Proteome , Saccharomyces cerevisiae Proteins , Saccharomyces cerevisiae , Mass Spectrometry , Protein Interaction Mapping/methods , Proteome/chemistry , Proteome/metabolism , Reproducibility of Results , Saccharomyces cerevisiae/chemistry , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Epigenesis, Genetic , Databases, Factual
ABSTRACT
The recent revolution in computational protein structure prediction provides folding models for entire proteomes, which can now be integrated with large-scale experimental data. Mass spectrometry (MS)-based proteomics has identified and quantified tens of thousands of posttranslational modifications (PTMs), most of them of uncertain functional relevance. In this study, we determine the structural context of these PTMs and investigate how this information can be leveraged to pinpoint potential regulatory sites. Our analysis uncovers global patterns of PTM occurrence across folded and intrinsically disordered regions. We found that this information can help to distinguish regulatory PTMs from those marking improperly folded proteins. Interestingly, the human proteome contains thousands of proteins that have large folded domains linked by short, disordered regions that are strongly enriched in regulatory phosphosites. These include well-known kinase activation loops that induce protein conformational changes upon phosphorylation. This regulatory mechanism appears to be widespread in kinases but also occurs in other protein families such as solute carriers. It is not limited to phosphorylation but includes ubiquitination and acetylation sites as well. Furthermore, we performed three-dimensional proximity analysis, which revealed examples of spatial coregulation of different PTM types and potential PTM crosstalk. To enable the community to build upon these first analyses, we provide tools for 3D visualization of proteomics data and PTMs as well as python libraries for data accession and processing.
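The three-dimensional proximity analysis mentioned in this abstract rests on measuring distances between modified residues in a structure model. The following is a minimal sketch of that idea and does not use the Python libraries released with the paper: the site labels, the coordinates (taken to be in ångström) and the cutoff are hypothetical values chosen for illustration.

```python
import math
from itertools import combinations


def proximal_pairs(sites, cutoff):
    """Return pairs of PTM sites whose 3D coordinates lie within `cutoff`."""
    pairs = []
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(sites.items(), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical residue coordinates, e.g. C-alpha positions from a predicted model.
toy_sites = {
    "S10_phospho": (0.0, 0.0, 0.0),
    "K45_acetyl": (3.0, 4.0, 0.0),
    "T200_phospho": (30.0, 0.0, 0.0),
}

# Sites 5 apart in space are reported even though they are 35 residues apart in sequence.
print(proximal_pairs(toy_sites, cutoff=10.0))
```

Comparing spatial distance with sequence distance in this way is what lets a 3D analysis find candidate crosstalk that a purely sequence-based window would miss.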
Subject(s)
Protein Processing, Post-Translational , Proteome , Humans , Mass Spectrometry/methods , Phosphorylation , Proteomics/methods
ABSTRACT
The molecular chaperone heat shock protein 90 (HSP90) works in concert with co-chaperones to stabilize its client proteins, which include multiple drivers of oncogenesis and malignant progression. Pharmacologic inhibitors of HSP90 have been observed to exert a wide range of effects on the proteome, including depletion of client proteins, induction of heat shock proteins, dissociation of co-chaperones from HSP90, disruption of client protein signaling networks, and recruitment of the protein ubiquitylation and degradation machinery, suggesting widespread remodeling of cellular protein complexes. However, proteomics studies to date have focused on inhibitor-induced changes in total protein levels, often overlooking protein complex alterations. Here, we use size-exclusion chromatography in combination with mass spectrometry (SEC-MS) to characterize the early changes in native protein complexes following treatment with the HSP90 inhibitor tanespimycin (17-AAG) for 8 h in the HT29 colon adenocarcinoma cell line. After confirming the signature cellular response to HSP90 inhibition (e.g., induction of heat shock proteins, decreased total levels of client proteins), we were surprised to find only modest perturbations to the global distribution of protein elution profiles in inhibitor-treated HT29 cells at this relatively early time-point. Similarly, co-chaperones that co-eluted with HSP90 displayed no clear difference between control and treated conditions. However, two distinct analysis strategies identified multiple inhibitor-induced changes, including known and unknown components of the HSP90-dependent proteome. We validate two of these, the actin-binding protein Anillin and the mitochondrial isocitrate dehydrogenase 3 complex, as novel HSP90 inhibitor-modulated proteins. We present this dataset as a resource for the HSP90, proteostasis, and cancer communities (https://www.bioinformatics.babraham.ac.uk/shiny/HSP90/SEC-MS/), laying the groundwork for future mechanistic and therapeutic studies related to HSP90 pharmacology. Data are available via ProteomeXchange with identifier PXD033459.
Subject(s)
Adenocarcinoma , Antineoplastic Agents , Colonic Neoplasms , Humans , Proteome/metabolism , Adenocarcinoma/drug therapy , Colonic Neoplasms/drug therapy , HSP90 Heat-Shock Proteins , Molecular Chaperones , Antineoplastic Agents/pharmacology , Mass Spectrometry , Chromatography, Gel
ABSTRACT
Despite the availability of methods for analyzing protein complexes, systematic analysis of complexes under multiple conditions remains challenging. Approaches based on biochemical fractionation of intact, native complexes and correlation of protein profiles have shown promise. However, most approaches for interpreting cofractionation datasets to yield complex composition and rearrangements between samples depend considerably on protein-protein interaction inference. We introduce PCprophet, a toolkit built on size exclusion chromatography-sequential window acquisition of all theoretical mass spectrometry (SEC-SWATH-MS) data to predict protein complexes and characterize their changes across experimental conditions. We demonstrate improved performance of PCprophet over state-of-the-art approaches and introduce a Bayesian approach to analyze altered protein-protein interactions across conditions. We provide both command-line and graphical interfaces to support the application of PCprophet to any cofractionation MS dataset, independent of separation or quantitative liquid chromatography-MS workflow, for the detection and quantitative tracking of protein complexes and their physiological dynamics.
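The "correlation of protein profiles" that cofractionation approaches build on can be illustrated with a plain Pearson correlation between elution profiles. The following is a minimal sketch of that general idea only; it is not PCprophet's prediction model, and the profile values and protein names are hypothetical.

```python
import math


def pearson(profile_x, profile_y):
    """Pearson correlation of two equally long elution profiles."""
    n = len(profile_x)
    mean_x = sum(profile_x) / n
    mean_y = sum(profile_y) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(profile_x, profile_y))
    spread_x = sum((x - mean_x) ** 2 for x in profile_x)
    spread_y = sum((y - mean_y) ** 2 for y in profile_y)
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat profile carries no co-elution information
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical intensities across eight SEC fractions.
subunit_a = [0, 1, 5, 9, 4, 1, 0, 0]
subunit_b = [0, 2, 10, 18, 8, 2, 0, 0]  # same peak shape, twice the abundance
unrelated = [0, 0, 0, 1, 2, 6, 9, 3]    # peaks in later fractions

print(pearson(subunit_a, subunit_b))
print(pearson(subunit_a, unrelated))
```

Proteins that elute together score close to 1 regardless of their absolute abundance, which is why correlation is a common first feature for inferring shared complex membership.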
Subject(s)
Machine Learning , Proteins/chemistry , Proteomics , Software , Bayes Theorem , Chromatography, Gel , Databases, Protein , Protein Conformation
ABSTRACT
SUMMARY: The widespread application of mass spectrometry (MS)-based proteomics in biomedical research increasingly requires robust, transparent, and streamlined solutions to extract statistically reliable insights. We have designed and implemented AlphaPeptStats, an inclusive Python package with broad functionalities for normalization, imputation, visualization, and statistical analysis of label-free proteomics data. It modularly builds on the established stack of Python scientific libraries and is accompanied by a rigorous testing framework with 98% test coverage. It imports the output of a range of popular search engines. Data can be filtered and normalized according to user specifications. At its heart, AlphaPeptStats provides a wide range of robust statistical algorithms such as t-tests, analysis of variance, principal component analysis, hierarchical clustering, and multiple covariate analysis, all in an automatable manner. Data visualization capabilities include heat maps, volcano plots, and scatter plots in publication-ready format. AlphaPeptStats advances proteomic research through its robust tools that enable researchers to manually or automatically explore complex datasets to identify interesting patterns and outliers. AVAILABILITY AND IMPLEMENTATION: AlphaPeptStats is implemented in Python and part of the AlphaPept framework. It is released under a permissive Apache license. The source code and one-click installers are freely available on GitHub at https://github.com/MannLabs/alphapeptstats.
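The two quantities behind a volcano plot, a log2 fold change and a test statistic per protein, can be sketched with the standard library alone. The following is a minimal illustration that does not use the AlphaPeptStats API: the function names and intensity values are hypothetical, and the inputs are assumed to be log2-transformed intensities of one protein in two groups.

```python
import math
import statistics


def log2_fold_change(group_a, group_b):
    """Difference of group means; inputs are assumed to be log2 intensities."""
    return statistics.mean(group_a) - statistics.mean(group_b)


def welch_t(group_a, group_b):
    """Welch's t statistic, which does not assume equal group variances."""
    variance_a = statistics.variance(group_a)
    variance_b = statistics.variance(group_b)
    standard_error = math.sqrt(variance_a / len(group_a) + variance_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


# Hypothetical log2 intensities of a single protein in two conditions.
treated = [11.0, 12.0, 10.0]
control = [8.0, 8.5, 7.5]

print(log2_fold_change(treated, control))
print(welch_t(treated, control))
```

A full analysis would convert the statistic into a p-value and correct for multiple testing across all proteins; those steps are left out of this sketch.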
Subject(s)
Proteomics , Software , Proteomics/methods , Mass Spectrometry/methods , Algorithms , Search Engine
ABSTRACT
Single-cell proteomics aims to characterize biological function and heterogeneity at the level of proteins in an unbiased manner. It is currently limited in proteomic depth, throughput, and robustness, which we address here by a streamlined multiplexed workflow using data-independent acquisition (mDIA). We demonstrate automated and complete dimethyl labeling of bulk or single-cell samples, without losing proteomic depth. Lys-N digestion enables five-plex quantification at MS1 and MS2 level. Because the multiplexed channels are quantitatively isolated from each other, mDIA accommodates a reference channel that does not interfere with the target channels. Our algorithm RefQuant takes advantage of this and confidently quantifies twice as many proteins per single cell compared to our previous work (Brunner et al, PMID 35226415), while our workflow currently allows routine analysis of 80 single cells per day. Finally, we combined mDIA with spatial proteomics to increase the throughput of Deep Visual Proteomics seven-fold for microdissection and four-fold for MS analysis. Applying this to primary cutaneous melanoma, we discovered proteomic signatures of cells within distinct tumor microenvironments, showcasing its potential for precision oncology.
Subject(s)
Melanoma , Skin Neoplasms , Humans , Proteome , Proteomics , Precision Medicine , Tumor Microenvironment
ABSTRACT
Protein complexes constitute the primary functional modules of cellular activity. To respond to perturbations, complexes undergo changes in their abundance, subunit composition, or state of modification. Understanding the function of biological systems requires global strategies to capture this contextual state information. Methods based on cofractionation paired with mass spectrometry have demonstrated the capability for deep biological insight, but the scope of studies using this approach has been limited by the large measurement time per biological sample and challenges with data analysis. There has been little uptake of this strategy into the broader life science community despite its rich biological information content. We present a rapid integrated experimental and computational workflow to assess the reorganization of protein complexes across multiple cellular states. The workflow combines short gradient chromatography and DIA/SWATH mass spectrometry with a data analysis toolset to quantify changes in complex organization. We applied the workflow to study the global protein complex rearrangements of THP-1 cells undergoing monocyte to macrophage differentiation and subsequent stimulation of macrophage cells with lipopolysaccharide. We observed substantial proteome reorganization on differentiation and less pronounced changes in macrophage stimulation. We establish our integrated differential pipeline for rapid and state-specific profiling of protein complex organization.
Subject(s)
Proteome , Proteome/analysis , Mass Spectrometry/methods , Cell Differentiation
ABSTRACT
In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.
Subject(s)
Machine Learning , Proteomics , Proteomics/methods , Algorithms , Mass Spectrometry
ABSTRACT
Data-independent acquisition modes isolate and concurrently fragment populations of different precursors by cycling through segments of a predefined precursor m/z range. Although these selection windows collectively cover the entire m/z range, overall, only a few per cent of all incoming ions are isolated for mass analysis. Here, we make use of the correlation of molecular weight and ion mobility in a trapped ion mobility device (timsTOF Pro) to devise a scan mode that samples up to 100% of the peptide precursor ion current in m/z and mobility windows. We extend an established targeted data extraction workflow by inclusion of the ion mobility dimension for both signal extraction and scoring and thereby increase the specificity for precursor identification. Data acquired from whole proteome digests and mixed organism samples demonstrate deep proteome coverage and a high degree of reproducibility as well as quantitative accuracy, even from 10 ng sample amounts.
Subject(s)
Data Science/methods , High-Throughput Screening Assays/methods , Ion Channels/metabolism , Ion Transport/physiology , Proteome/metabolism , Cell Line, Tumor , HeLa Cells , Humans , Ions/chemistry , Proteomics/methods , Reproducibility of Results , Tandem Mass Spectrometry/methods
ABSTRACT
SUMMARY: Integrating experimental information across proteomic datasets with the wealth of publicly available sequence annotations is a crucial part in many proteomic studies that currently lacks an automated analysis platform. Here, we present AlphaMap, a Python package that facilitates the visual exploration of peptide-level proteomics data. Identified peptides and post-translational modifications in proteomic datasets are mapped to their corresponding protein sequence and visualized together with prior knowledge from UniProt and with expected proteolytic cleavage sites. The functionality of AlphaMap can be accessed via an intuitive graphical user interface or, more flexibly, as a Python package that allows its integration into common analysis workflows for data visualization. AlphaMap produces publication-quality illustrations and can easily be customized to address a given research question. AVAILABILITY AND IMPLEMENTATION: AlphaMap is implemented in Python and released under an Apache license. The source code and one-click installers are freely available at https://github.com/MannLabs/alphamap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Proteomics , Software , Peptides , Amino Acid Sequence , Peptide Hydrolases
ABSTRACT
Genomic variation impacts on cellular networks by affecting the abundance (e.g., protein levels) and the functional states (e.g., protein phosphorylation) of their components. Previous work has focused on the former, while in this context, the functional states of proteins have largely remained neglected. Here, we generated high-quality transcriptome, proteome, and phosphoproteome data for a panel of 112 genomically well-defined yeast strains. Genetic effects on transcripts were generally transmitted to the protein layer, but specific gene groups, such as ribosomal proteins, showed diverging effects on protein levels compared with RNA levels. Phosphorylation states proved crucial to unravel genetic effects on signaling networks. Correspondingly, genetic variants that cause phosphorylation changes were mostly different from those causing abundance changes in the respective proteins. Underscoring their relevance for cell physiology, phosphorylation traits were more strongly correlated with cell physiological traits such as chemical compound resistance or cell morphology, compared with transcript or protein abundance. This study demonstrates how molecular networks mediate the effects of genomic variants to cellular traits and highlights the particular importance of protein phosphorylation.
Subject(s)
Genome , Genomics , Phosphorylation , Proteome/genetics , Saccharomyces cerevisiae/genetics
ABSTRACT
Mass spectrometry-based bottom-up proteomics is the main method to analyze proteomes comprehensively, and the rapid evolution of instrumentation and data analysis has made the technology widely available. Data visualization is an integral part of the analysis process and is crucial for the communication of results. This is a major challenge due to the immense complexity of MS data. In this review, we provide an overview of commonly used visualizations, starting with raw data of traditional and novel MS technologies, then basic peptide and protein level analyses, and finally visualization of highly complex datasets and networks. We specifically provide guidance on how to critically interpret and discuss the multitude of different proteomics data visualizations. Furthermore, we highlight Python-based libraries and other open science tools that can be applied for independent and transparent generation of customized visualizations. To further encourage programmatic data visualization, we provide the Python code used to generate all data figures in this review on GitHub (https://github.com/MannLabs/ProteomicsVisualization).
Subject(s)
Data Visualization , Proteomics , Mass Spectrometry , Peptides , Proteomics/methods , Software
ABSTRACT
Human erythropoiesis is an exquisitely controlled multistep developmental process, and its dysregulation leads to numerous human diseases. Transcriptome and epigenome studies provided insights into system-wide regulation, but we currently lack a global mechanistic view on the dynamics of proteome and post-translational regulation coordinating erythroid maturation. We established a mass spectrometry (MS)-based proteomics workflow to quantify and dynamically track 7,400 proteins and 27,000 phosphorylation sites of five distinct maturation stages of in vitro reconstituted erythropoiesis of CD34+ HSPCs. Our data reveal developmental regulation through drastic proteome remodeling across stages of erythroid maturation encompassing most protein classes. This includes various orchestrated changes in solute carriers indicating adjustments to altered metabolic requirements. To define the distinct proteome of each maturation stage, we developed a computational deconvolution approach which revealed stage-specific marker proteins. The dynamic phosphoproteomes combined with a kinome-targeted CRISPR/Cas9 screen uncovered coordinated networks of erythropoietic kinases and pinpointed downregulation of the c-Kit/MAPK signaling axis as a key driver of maturation. Our system-wide view establishes the functional dynamics of complex phosphosignaling networks and regulation through proteome remodeling in erythropoiesis.
Subject(s)
Erythropoiesis , Proteomics , Signal Transduction , Biomarkers/metabolism , CRISPR-Cas Systems/genetics , Cell Differentiation/genetics , Cell Line , Gene Ontology , Humans , Membrane Proteins/metabolism , Phosphorylation , Protein Kinases/metabolism , Proteome/metabolism , Reproducibility of Results
ABSTRACT
Protein complexes are the main functional modules in the cell that coordinate and perform the vast majority of molecular functions. The main approaches to identify and quantify the interactome to date are based on mass spectrometry (MS). Here I summarize the benefits and limitations of different MS-based interactome screens, with a focus on untargeted interactome acquisition, such as co-fractionation MS. Specific emphasis is given to the discussion of discovery- versus hypothesis-driven data analysis concepts and their applicability to large, proteome-wide interactome screens. Hypothesis-driven analysis approaches, i.e., complex- or network-centric, are highlighted as promising strategies for comparative studies. While these approaches require prior information from public databases, also reviewed herein, the available wealth of interactomic data continuously increases, thereby providing more exhaustive information for future studies. Finally, guidance on the selection of interactome acquisition and analysis methods is provided to aid the reader in the design of protein-protein interaction studies.
Subject(s)
Computational Biology/methods , Multiprotein Complexes/metabolism , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Proteins/metabolism , Proteome/analysis , Proteomics/methods , Algorithms , Humans , Proteome/metabolism
ABSTRACT
Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) is the main method for high-throughput identification and quantification of peptides and inferred proteins. Within this field, data-independent acquisition (DIA) combined with peptide-centric scoring, as exemplified by the technique SWATH-MS, has emerged as a scalable method to achieve deep and consistent proteome coverage across large-scale data sets. We demonstrate that statistical concepts developed for discovery proteomics based on spectrum-centric scoring can be adapted to large-scale DIA experiments that have been analyzed with peptide-centric scoring strategies, and we provide guidance on their application. We show that optimal tradeoffs between sensitivity and specificity require careful considerations of the relationship between proteins in the samples and proteins represented in the spectral library. We propose the application of a global analyte constraint to prevent the accumulation of false positives across large-scale data sets. Furthermore, to increase the quality and reproducibility of published proteomic results, well-established confidence criteria should be reported for the detected peptide queries, peptides and inferred proteins.
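The error-rate concepts referred to in this abstract are commonly estimated with a target-decoy strategy. The following is a minimal sketch of a generic target-decoy q-value calculation, given as an illustration under stated assumptions and not as the authors' implementation: the scores are hypothetical, higher scores are taken to be better, and the FDR at a threshold is estimated simply as the number of decoys divided by the number of targets above it.

```python
def target_decoy_qvalues(scored_queries):
    """Assign q-values to (score, is_decoy) tuples using a decoy/target ratio."""
    ordered = sorted(scored_queries, key=lambda item: item[0], reverse=True)

    # Estimated FDR when accepting everything down to each score.
    fdr_estimates = []
    targets = 0
    decoys = 0
    for _score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr_estimates.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at which this query would still be accepted.
    qvalues = []
    running_minimum = float("inf")
    for fdr in reversed(fdr_estimates):
        running_minimum = min(running_minimum, fdr)
        qvalues.append(running_minimum)
    qvalues.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ordered, qvalues)]


# Hypothetical peptide query scores; True marks a decoy.
toy_scores = [(9.0, False), (8.0, False), (7.0, True),
              (6.0, False), (5.0, False), (4.0, True)]
for score, is_decoy, q in target_decoy_qvalues(toy_scores):
    print(score, is_decoy, q)
```

Running the same calculation once over all runs of an experiment, rather than per run, is one simple way to keep false positives from accumulating as the number of samples grows.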
Subject(s)
Data Interpretation, Statistical , High-Throughput Screening Assays/methods , Mass Spectrometry/methods , Peptide Mapping/methods , Proteins/chemistry , Sequence Analysis, Protein/methods , Computer Simulation , Models, Statistical , Proteins/analysis , Reproducibility of Results , Sensitivity and Specificity
ABSTRACT
Proteins are major effectors and regulators of biological processes that can elicit multiple functions depending on their interaction with other proteins. The organization of proteins into macromolecular complexes and their quantitative distribution across these complexes is, therefore, of great biological and clinical significance. In this paper, we describe an integrated experimental and computational technique to quantify hundreds of protein complexes in a single operation. The method consists of size exclusion chromatography (SEC) to fractionate native protein complexes, SWATH/DIA mass spectrometry to precisely quantify the proteins in each SEC fraction, and the computational framework CCprofiler to detect and quantify protein complexes by error-controlled, complex-centric analysis using prior information from generic protein interaction maps. Our analysis of the HEK293 cell line proteome delineates 462 complexes composed of 2,127 protein subunits. The technique identifies novel sub-complexes and assembly intermediates of central regulatory complexes while assessing the quantitative subunit distribution across them. We make the toolset CCprofiler freely accessible and provide a web platform, SECexplorer, for custom exploration of the HEK293 proteome modularity.
Subject(s)
Chromatography, Gel/methods , Mass Spectrometry/methods , Multiprotein Complexes/analysis , Proteome/analysis , Proteomics/methods , Algorithms , Computational Biology/methods , HEK293 Cells , Humans , Multiprotein Complexes/metabolism , Protein Interaction Maps , Proteome/metabolism
ABSTRACT
Mitochondria are essential organelles whose dysfunction causes human pathologies that often manifest in a tissue-specific manner. Accordingly, mitochondrial fitness depends on versatile proteomes specialized to meet diverse tissue-specific requirements. Increasing evidence suggests that phosphorylation may play an important role in regulating tissue-specific mitochondrial functions and pathophysiology. Building on recent advances in mass spectrometry (MS)-based proteomics, we here quantitatively profile mitochondrial tissue proteomes along with their matching phosphoproteomes. We isolated mitochondria from mouse heart, skeletal muscle, brown adipose tissue, kidney, liver, brain, and spleen by differential centrifugation followed by separation on Percoll gradients and performed high-resolution MS analysis of the proteomes and phosphoproteomes. This in-depth map substantially quantifies known and predicted mitochondrial proteins and provides a resource of core and tissue-specific mitochondrial proteins (mitophos.de). Predicting kinase substrate associations for different mitochondrial compartments indicates tissue-specific regulation at the phosphoproteome level. Illustrating the functional value of our resource, we reproduce mitochondrial phosphorylation events on dynamin-related protein 1 responsible for its mitochondrial recruitment and fission initiation and describe phosphorylation clusters on MIGA2 linked to mitochondrial fusion.
Subject(s)
Mitochondria , Proteome , Mice , Animals , Humans , Proteome/metabolism , Mitochondria/metabolism , Phosphorylation , Mass Spectrometry , Mitochondrial Proteins/metabolism
ABSTRACT
In common with other omics technologies, mass spectrometry (MS)-based proteomics produces ever-increasing amounts of raw data, making efficient analysis a principal challenge. A plethora of different computational tools can process the MS data to derive peptide and protein identification and quantification. However, in recent years there has been dramatic progress in computer science, including collaboration tools that have transformed research and industry. To leverage these advances, we develop AlphaPept, a Python-based open-source framework for efficient processing of large high-resolution MS data sets. Numba for just-in-time compilation on CPU and GPU achieves hundred-fold speed improvements. AlphaPept uses the Python scientific stack of highly optimized packages, reducing the code base to domain-specific tasks while accessing the latest advances. We provide an easy on-ramp for community contributions through the concept of literate programming, implemented in Jupyter Notebooks. Large datasets can rapidly be processed as shown by the analysis of hundreds of proteomes in minutes per file, many-fold faster than acquisition. AlphaPept can be used to build automated processing pipelines with web-serving functionality and compatibility with downstream analysis tools. It provides easy access via one-click installation, a modular Python library for advanced users, and an open GitHub repository for developers.