ABSTRACT
Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that uses a nearest-neighbour or support vector machine system to integrate heterogeneous data sources and considerably improve the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms by evaluating five experimental datasets from four different species, in conjunction with four different auxiliary data sources, to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high-resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis.
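The idea of letting a primary (MS-based) data source and an auxiliary source both vote in a nearest-neighbour classifier can be sketched as follows. This is a minimal illustration, assuming that each source casts k-nearest-neighbour votes separately and that the two vote vectors are mixed with a class-specific weight `theta[c]` (1 means trust only the primary data, 0 only the auxiliary data). It is not the pRoloc implementation; the function names, toy profiles and weights are hypothetical, and in practice such weights would need to be tuned, for example by cross-validated grid search.

```python
import math
from collections import Counter


def knn_votes(query, data, labels, k):
    """Per-class vote proportions among the k labelled points nearest to `query`."""
    ranked = sorted((math.dist(query, x), y) for x, y in zip(data, labels))
    counts = Counter(y for _, y in ranked[:k])
    return {c: n / k for c, n in counts.items()}


def knn_tl_predict(q_primary, q_aux, primary, auxiliary, labels, theta, k=3):
    """Hypothetical transfer-learning kNN: mix primary and auxiliary votes per class.

    theta[c] in [0, 1] weights the primary source for class c and
    (1 - theta[c]) weights the auxiliary source. Returns the best class and all scores.
    """
    v_p = knn_votes(q_primary, primary, labels, k)
    v_a = knn_votes(q_aux, auxiliary, labels, k)
    scores = {
        c: theta[c] * v_p.get(c, 0.0) + (1 - theta[c]) * v_a.get(c, 0.0)
        for c in set(labels)
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy marker proteins: normalised MS profiles (primary) and binary
    # annotation-style features (auxiliary); all values are made up.
    labels = ["ER", "ER", "ER", "Mito", "Mito", "Mito"]
    primary = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.2, 0.8), (0.1, 0.9), (0.15, 0.85)]
    auxiliary = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1)]

    # The MS profile points to ER and the auxiliary features to Mito;
    # the class weights decide which source wins.
    for w in (0.8, 0.2):
        theta = {"ER": w, "Mito": w}
        print(w, knn_tl_predict((0.6, 0.4), (0, 1), primary, auxiliary, labels, theta))
```

With `theta` at 0.8 the query is labelled ER (the MS profile dominates); at 0.2 it is labelled Mito (the auxiliary source dominates), which is why the weighting scheme matters more than either single classifier.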
Subjects
Proteome/metabolism, Proteomics/statistics & numerical data, Algorithms, Animals, Arabidopsis, Computational Biology, Data Interpretation, Statistical, Drosophila, Embryonic Stem Cells/metabolism, Humans, Information Storage and Retrieval, Mass Spectrometry, Mice, Proteome/classification, Software, Subcellular Fractions/metabolism, Support Vector Machine
ABSTRACT
During mammalian preimplantation development, the cells of the blastocyst's inner cell mass differentiate into the epiblast and primitive endoderm lineages, which give rise to the fetus and extra-embryonic tissues, respectively. Extra-embryonic endoderm (XEN) differentiation can be modeled in vitro by induced expression of GATA transcription factors in mouse embryonic stem cells. Here, we use this GATA-inducible system to quantitatively monitor the dynamics of global proteomic changes during the early stages of this differentiation event and also investigate the fully differentiated phenotype, as represented by embryo-derived XEN cells. Using mass spectrometry-based quantitative proteomic profiling with multivariate data analysis tools, we reproducibly quantified 2,336 proteins across three biological replicates and have identified clusters of proteins characterized by distinct, dynamic temporal abundance profiles. We first used this approach to highlight novel marker candidates of the pluripotent state and XEN differentiation. Through functional annotation enrichment analysis, we have shown that the downregulation of chromatin-modifying enzymes, the reorganization of membrane trafficking machinery, and the breakdown of cell-cell adhesion are successive steps of the extra-embryonic differentiation process. Thus, applying a range of sophisticated clustering approaches to a time-resolved proteomic dataset has allowed the elucidation of complex biological processes which characterize stem cell differentiation and could establish a general paradigm for the investigation of these processes.
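Grouping proteins by the shape of their temporal abundance profiles can be illustrated with a generic approach: standardise each profile, then run k-means. This is a minimal sketch assuming plain Lloyd's k-means with caller-supplied starting centroids; it is not the specific set of clustering methods used in the study, and the example profiles are invented.

```python
import math
import statistics


def zscore(profile):
    """Centre and scale one protein's profile so clustering compares shape, not abundance level."""
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in profile]


def kmeans(points, init_centroids, n_iter=20):
    """Lloyd's k-means: alternately assign points to the nearest centroid and recompute means."""
    centroids = [list(c) for c in init_centroids]
    assign = []
    for _ in range(n_iter):
        assign = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    return assign, centroids


if __name__ == "__main__":
    # Hypothetical abundances at four time points of a differentiation time course.
    profiles = {
        "P1": [1, 2, 3, 4],      # rising
        "P2": [10, 20, 30, 40],  # rising, higher abundance
        "P3": [4, 3, 2, 1],      # falling
        "P4": [40, 30, 20, 10],  # falling, higher abundance
    }
    z = [zscore(v) for v in profiles.values()]
    assign, _ = kmeans(z, init_centroids=[z[0], z[2]])
    print(dict(zip(profiles, assign)))  # P1/P2 share one cluster, P3/P4 the other
```

Standardising first is the design choice that matters here: without it, P2 and P4 would tend to cluster together simply because both are highly abundant.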
Subjects
Cell Differentiation/physiology, Endoderm/physiology, Extraembryonic Membranes/physiology, Mouse Embryonic Stem Cells/physiology, Proteomics/methods, Animals, Cells, Cultured, Endoderm/cytology, Extraembryonic Membranes/cytology, Mice
ABSTRACT
Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis.
Subjects
Data Interpretation, Statistical, Proteomics/methods, Artificial Intelligence, Mass Spectrometry, Software, Sound
ABSTRACT
This review presents how R, the popular statistical environment and programming language, can be used in the frame of proteomics data analysis. A short introduction to R is given, with special emphasis on some of the features that make R and its add-on packages premium software for sound and reproducible data analysis. The reader is also advised on how to find relevant R software for proteomics. Several use cases are then presented, illustrating data input/output, quality control, quantitative proteomics and data analysis. Detailed code and additional links to extensive documentation are available in the freely available companion package RforProteomics. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
Subjects
Programming Languages, Proteomics, Amino Acid Sequence, Mass Spectrometry, Molecular Sequence Data, Phosphopyruvate Hydratase/chemistry, Quality Control
ABSTRACT
Despite the increasing popularity of data-independent acquisition workflows, data-dependent acquisition (DDA) is still the prevalent method of LC-MS-based proteomics. DDA is the basis of the isobaric mass tagging technique, a powerful MS2 quantification strategy that allows coanalysis of up to 10 proteomics samples. A well-documented limitation of DDA, however, is precursor coselection, whereby a target peptide is coisolated with other ions for fragmentation. Here, we investigated whether additional peptide purification by traveling wave ion mobility separation (TWIMS) can reduce precursor contamination, using a mixture of Saccharomyces cerevisiae and HeLa proteomes. In accordance with previous reports on FAIMS-Orbitrap instruments, we find that TWIMS provides a remarkable improvement (2.85-fold on average) in the signal-to-noise ratio for sequence ions. We also report that TWIMS reduces reporter ion contamination by around one-third (to 14-15% contamination), and even further (to 6-9%) when combined with a narrowed quadrupole isolation window. We discuss challenges associated with applying TWIMS purification to isobaric mass tagging experiments, including the correlation between ion m/z and drift time, which means that coselected peptides are expected to have similar mobility. We also demonstrate that labeling results in peptides having more uniform m/z and drift time distributions than observed for unlabeled peptides. Data are available via ProteomeXchange with identifier PXD001047.
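Precursor coselection, and why a narrower quadrupole window or an extra mobility dimension helps, can be illustrated with a toy calculation. This sketch assumes that contamination is the fraction of co-isolated precursor signal not contributed by the target; the `Ion` class, its fields and all numbers are hypothetical, and this precursor-level proxy is a simplification of how reporter-ion contamination was measured in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ion:
    mz: float         # precursor m/z
    intensity: float  # precursor signal
    drift: float      # ion-mobility drift time (arbitrary units, hypothetical)


def coisolation_contamination(target: Ion, ions: List[Ion], isolation_width: float,
                              drift_window: Optional[float] = None) -> float:
    """Fraction of the isolated signal that does not come from `target`.

    `ions` must include the target itself. Ions are kept if they fall inside the
    m/z isolation window and, when `drift_window` is given, inside the drift-time window.
    """
    kept = [i for i in ions if abs(i.mz - target.mz) <= isolation_width / 2]
    if drift_window is not None:
        kept = [i for i in kept if abs(i.drift - target.drift) <= drift_window / 2]
    total = sum(i.intensity for i in kept)
    return 1 - target.intensity / total


if __name__ == "__main__":
    target = Ion(600.30, 1.0e6, 5.0)
    ions = [target, Ion(600.80, 5.0e5, 5.1), Ion(601.20, 2.0e5, 7.0)]
    print(coisolation_contamination(target, ions, 2.0))       # ~0.41: both neighbours co-isolated
    print(coisolation_contamination(target, ions, 2.0, 1.0))  # ~0.33: drift filter drops the 601.20 ion
    print(coisolation_contamination(target, ions, 0.8))       # 0.0: narrow window keeps only the target
```

Note that the 600.80 ion survives the drift filter because its drift time is close to the target's, mirroring the point above that coselected peptides of similar m/z tend to have similar mobility.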
Subjects
Proteome/chemistry, Chromatography, Liquid, HeLa Cells, Humans, Molecular Weight, Proteome/isolation & purification, Saccharomyces cerevisiae Proteins/chemistry, Saccharomyces cerevisiae Proteins/isolation & purification, Signal-To-Noise Ratio, Tandem Mass Spectrometry
ABSTRACT
Immunomodulatory drugs (IMiDs) are key drugs for treating multiple myeloma and myelodysplastic syndrome with chromosome 5q deletion. IMiDs exert their pleiotropic effects through the interaction between cell-specific substrates and cereblon, a substrate receptor of the E3 ubiquitin ligase complex. Thus, identification of cell-specific substrates is important for understanding the effects of IMiDs. IMiDs increase the risk of thromboembolism, which sometimes results in fatal clinical outcomes. In this study, we sought to clarify the molecular mechanisms underlying IMiD-induced thrombosis. We investigated cereblon substrates in human megakaryocytes using liquid chromatography-mass spectrometry and found that thrombospondin-1 (THBS-1), an inhibitor of a disintegrin-like and metalloproteinase with thrombospondin type 1 motifs 13, functions as an endogenous substrate in human megakaryocytes. IMiDs inhibited the proteasomal degradation of THBS-1 by impairing the recruitment of cereblon to THBS-1, leading to aberrant accumulation of THBS-1. We observed a significant increase in THBS-1 in peripheral blood mononuclear cells and larger von Willebrand factor multimers in the plasma of IMiD-treated patients with myeloma. These results collectively suggest that THBS-1 is an endogenous substrate of cereblon, that this pairing is disrupted by IMiDs, and that the aberrant accumulation of THBS-1 plays an important role in the pathogenesis of IMiD-induced thromboembolism.
Subjects
Multiple Myeloma, Thromboembolism, Humans, Adaptor Proteins, Signal Transducing/metabolism, Immunomodulating Agents, Leukocytes, Mononuclear/metabolism, Multiple Myeloma/genetics, Thromboembolism/etiology, Thrombospondins/metabolism, Thrombospondins/therapeutic use
ABSTRACT
Isobaric tagging has proven to be a popular quantitative proteomics tool and has been rapidly adopted to study a wide range of biological questions in the few years since its commercialization. While the flexibility and multiplexing capacity afforded by this technology are clear attractions, it is not without its shortcomings. As the speed and sensitivity of mass spectrometers have improved and the application of isobaric tags to all manner of biological systems has increased, significant issues with quantitative accuracy and precision have come to light. Here we review the issues associated with the use of isobaric tagging methods and discuss the possible solutions which have been proposed to improve their precision and accuracy to approach the levels required within quantitative proteomics.
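One well-known accuracy issue with isobaric tags, ratio compression caused by co-isolated peptides, can be shown with a toy model. This sketch assumes that interfering peptides add reporter signal to both channels in equal amounts (i.e. they do not change between samples); the function name and numbers are illustrative only.

```python
import math


def observed_ratio(true_a: float, true_b: float,
                   interference_a: float, interference_b: float) -> float:
    """Reporter-ion ratio (channel A / channel B) when co-isolated ions add signal to each channel."""
    return (true_a + interference_a) / (true_b + interference_b)


if __name__ == "__main__":
    # A peptide that truly changes 10-fold between samples A and B.
    for background in (0, 2, 10):
        r = observed_ratio(10, 1, background, background)
        print(f"interference={background:>2}: ratio={r:.2f}, log2={math.log2(r):.2f}")
```

As the unchanging background grows, the observed ratio is pulled toward 1:1 (from 10 to 4 with a background of 2 per channel), which is why reducing co-isolation improves accuracy.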
Subjects
Proteins/chemistry, Proteomics/methods, Animals, Humans, Isotope Labeling/instrumentation, Isotope Labeling/methods, Mass Spectrometry/instrumentation, Mass Spectrometry/methods, Proteins/metabolism, Proteomics/instrumentation
ABSTRACT
Protein localisation and translocation between intracellular compartments underlie almost all physiological processes. The hyperLOPIT proteomics platform combines mass spectrometry with state-of-the-art machine learning to map the subcellular location of thousands of proteins simultaneously. We combine global proteome analysis with hyperLOPIT in a fully Bayesian framework to elucidate spatiotemporal proteomic changes during a lipopolysaccharide (LPS)-induced inflammatory response. We report a highly dynamic proteome in terms of both protein abundance and subcellular localisation, with alterations in the interferon response, endo-lysosomal system, plasma membrane reorganisation and cell migration. Proteins not previously associated with an LPS response were found to relocalise upon stimulation, although the functional consequences of this relocalisation remain unclear. Quantifying proteome-wide uncertainty through Bayesian modelling revealed a necessary role for protein relocalisation and the importance of taking a holistic view of the LPS-driven immune response. The data are showcased as an interactive application freely available to the scientific community.
Subjects
Inflammation/metabolism, Leukemia/metabolism, Leukemia/pathology, Lipopolysaccharides/pharmacology, Proteomics, Algorithms, Anti-Infective Agents/metabolism, Anti-Inflammatory Agents/metabolism, Antigen Presentation, Autophagosomes/metabolism, Bayes Theorem, Cell Cycle Checkpoints, Cell Membrane/metabolism, Cell Nucleus/metabolism, Cell Shape, Humans, Immunity, Inflammation/pathology, Leukemia/immunology, Lymphocyte Activation/immunology, Lysosomes/metabolism, Neoplasm Proteins/metabolism, Protein Transport, Proteome/metabolism, Signal Transduction, T-Lymphocytes/immunology, THP-1 Cells, Time Factors, Transport Vesicles/metabolism, Up-Regulation, rho GTP-Binding Proteins/metabolism
ABSTRACT
The organization of eukaryotic cells into distinct subcompartments is vital for all functional processes, and aberrant protein localization is a hallmark of many diseases. Microscopy methods, although powerful, are usually low-throughput and dependent on the availability of fluorescent fusion proteins or highly specific and sensitive antibodies. One method that provides a global picture of the cell is localization of organelle proteins by isotope tagging (LOPIT), which combines biochemical cell fractionation using density gradient ultracentrifugation with multiplexed quantitative proteomics mass spectrometry, allowing simultaneous determination of the steady-state distribution of hundreds of proteins within organelles. Proteins are assigned to organelles based on the similarity of their gradient distribution to those of well-annotated organelle marker proteins. We have substantially re-developed our original LOPIT protocol (published by Nature Protocols in 2006) to enable the subcellular localization of thousands of proteins per experiment (hyperLOPIT), including spatial resolution at the suborganelle and large protein complex level. This Protocol Extension article integrates all elements of the hyperLOPIT pipeline, including an additional enrichment strategy for chromatin, extended multiplexing capacity of isobaric mass tags, state-of-the-art mass spectrometry methods and multivariate machine-learning approaches for analysis of spatial proteomics data. We have also created an open-source infrastructure to support analysis of quantitative mass-spectrometry-based spatial proteomics data (http://bioconductor.org/packages/pRoloc) and an accompanying interactive visualization framework (http://www.bioconductor.org/packages/pRolocGUI). The procedure we outline here is applicable to any cell culture system and requires ~1 week to complete sample preparation steps, ~2 d for mass spectrometry data acquisition and 1-2 d for data analysis and downstream informatics.
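The marker-based assignment principle described in this protocol can be sketched with a deliberately simple classifier. This minimal example assumes each protein's reporter intensities are first normalised to sum to 1 across gradient fractions and then assigned to the organelle whose mean marker profile is nearest (Euclidean distance). The `max_dist` cut-off is a hypothetical stand-in for the "unknown" handling of real pipelines; the multivariate machine-learning methods referred to above are more elaborate, and all names and values here are invented.

```python
import math
from collections import defaultdict


def normalise(profile):
    """Scale reporter intensities across gradient fractions to sum to 1."""
    total = sum(profile)
    return [x / total for x in profile]


def marker_centroids(markers):
    """Mean normalised profile per organelle from (profile, organelle) marker pairs."""
    groups = defaultdict(list)
    for profile, organelle in markers:
        groups[organelle].append(normalise(profile))
    return {org: [sum(col) / len(col) for col in zip(*profs)] for org, profs in groups.items()}


def assign_organelle(profile, centroids, max_dist=None):
    """Nearest-centroid assignment; returns 'unknown' beyond an optional distance cut-off."""
    p = normalise(profile)
    org, d = min(((o, math.dist(p, c)) for o, c in centroids.items()), key=lambda t: t[1])
    if max_dist is not None and d > max_dist:
        return "unknown", d
    return org, d


if __name__ == "__main__":
    # Hypothetical reporter intensities across four density-gradient fractions.
    markers = [
        ([10, 30, 40, 20], "ER"), ([12, 28, 42, 18], "ER"),
        ([40, 30, 20, 10], "Mito"), ([38, 32, 18, 12], "Mito"),
    ]
    cents = marker_centroids(markers)
    print(assign_organelle([9, 31, 39, 21], cents))                 # close to the ER markers
    print(assign_organelle([25, 25, 25, 25], cents, max_dist=0.1))  # far from both: 'unknown'
```

Normalising before comparison means only the shape of the distribution along the gradient matters, not how abundant a protein is.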
Subjects
Proteome/analysis, Proteomics/methods, Spatial Analysis, Cell Fractionation/methods, Centrifugation, Density Gradient/methods, Eukaryotic Cells/chemistry, Mass Spectrometry/methods
ABSTRACT
Knowledge of the subcellular distribution of proteins is vital for understanding cellular mechanisms. Capturing the subcellular proteome in a single experiment has proven challenging, with studies focusing on specific compartments or assigning proteins to subcellular niches with low resolution and/or accuracy. Here we introduce hyperLOPIT, a method that couples extensive fractionation and quantitative high-resolution accurate mass spectrometry with multivariate data analysis. We apply hyperLOPIT to a pluripotent stem cell population whose subcellular proteome has not been extensively studied. We provide localization data on over 5,000 proteins with unprecedented spatial resolution, revealing the organization of organelles, sub-organellar compartments, protein complexes, functional networks and steady-state dynamics of proteins, as well as unexpected subcellular locations. The method paves the way for characterizing the impact of post-transcriptional and post-translational modification on protein location and for studies involving proteome-level locational changes on cellular perturbation. An interactive open-source resource is presented that enables exploration of these data.
Subjects
Intracellular Space/metabolism, Mouse Embryonic Stem Cells/metabolism, Proteome/metabolism, Animals, Cell Fractionation, Immunohistochemistry, Machine Learning, Mass Spectrometry, Mice, Multivariate Analysis, Pluripotent Stem Cells/metabolism, Proteomics/methods, Subcellular Fractions
ABSTRACT
Protein subcellular localization is a fundamental feature of posttranslational functional regulation. Traditional microscopy based approaches to study protein localization are typically of limited throughput, and dependent on the availability of antibodies with high specificity and sensitivity, or fluorescent fusion proteins. In this chapter we describe how Localization of Organelle Proteins by Isotope Tagging (LOPIT), a mass spectrometry based workflow coupling biochemical fractionation and iTRAQ™ 8-plex quantification, can be applied for the high-throughput characterization of protein localization in a mammalian cell culture line.
Subjects
Proteins/metabolism, Subcellular Fractions/metabolism, Animals, Cells, Cultured, Chromatography, Liquid, Mammals, Tandem Mass Spectrometry
ABSTRACT
Blood serum is one of the most easily accessible sources of biomarkers, and its proteome contains a significant proportion of immune system proteins. These proteins can provide not only biological insight but also diagnostic and drug-response information, regardless of the type of disease or condition in question. Shotgun mass spectrometry has contributed profoundly to proteome analysis and is currently considered an indispensable tool in the field of biomarker discovery. In addition, the multiplexing potential of isotopic labeling techniques such as iTRAQ can increase the statistical relevance and accuracy of proteomic data through the simultaneous analysis of different biological samples. Here, we describe a complete protocol using iTRAQ in a shotgun proteomics workflow, along with data analysis steps, customized for the challenges associated with the serum proteome.