ABSTRACT
Proteomics is the large-scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods. We cover topics from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this Review will serve as a handbook for researchers who are new to the field of bottom-up proteomics.
ABSTRACT
Although the subcellular dynamics of RNA and proteins are key determinants of cell homeostasis, their characterization is still challenging. Here we present an integrative framework to simultaneously interrogate the dynamics of the transcriptome and proteome at subcellular resolution by combining two methods: localization of RNA (LoRNA) and a streamlined density-based localization of proteins by isotope tagging (dLOPIT) to map RNA and protein to organelles (nucleus, endoplasmic reticulum and mitochondria) and membraneless compartments (cytosol, nucleolus and cytosolic granules). Interrogating all RNA subcellular locations at once enables system-wide quantification of the proportional distribution of RNA. We obtain a cell-wide overview of localization dynamics for 31,839 transcripts and 5,314 proteins during the unfolded protein response, revealing that endoplasmic reticulum-localized transcripts are more efficiently recruited to cytosolic granules than cytosolic RNAs, and that the translation initiation factor eIF3d is key to sustaining cytoskeletal function. Overall, we provide the most comprehensive overview so far of RNA and protein subcellular localization dynamics.
Subjects
Endoplasmic Reticulum, RNA, RNA/genetics, RNA/metabolism, Subcellular Fractions/metabolism, Endoplasmic Reticulum/metabolism, Proteome/analysis
ABSTRACT
Proteomics is the large-scale study of protein structure and function from biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of different experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging to understand for new practitioners. Here, we provide a comprehensive overview of different proteomics methods to aid both novice and experienced researchers. We cover topics from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.
ABSTRACT
Proteins often undergo structural perturbations upon binding to other proteins or ligands or when they are subjected to environmental changes. Hydrogen-deuterium exchange mass spectrometry (HDX-MS) can be used to explore conformational changes in proteins by examining differences in the rate of deuterium incorporation in different contexts. To determine deuterium incorporation rates, HDX-MS measurements are typically made over a time course. Recently introduced methods show that incorporating the temporal dimension into the statistical analysis improves power and interpretation. However, these approaches have technical assumptions that hinder their flexibility. Here, we propose a more flexible methodology by reframing these methods in a Bayesian framework. Our proposed framework has improved algorithmic stability, allows us to perform uncertainty quantification, and can calculate statistical quantities that are inaccessible to other approaches. We demonstrate the general applicability of the method by showing that it can perform rigorous model selection on a spike-in HDX-MS experiment, improve interpretation in an epitope mapping experiment, and increase sensitivity in a small-molecule case study. Bayesian analysis of an HDX experiment with an antibody dimer bound to an E3 ubiquitin ligase identifies at least two interaction interfaces where previous methods obtained confounding results due to the complexities of conformational changes on binding. Our findings are consistent with the cocrystal structure of these proteins, demonstrating that a Bayesian approach can identify important binding epitopes from HDX data. We also generate HDX-MS data of the bromodomain-containing protein BRD4 in complex with GSK1210151A to demonstrate the increased sensitivity of adopting a Bayesian approach.
Subjects
Deuterium Exchange Measurement, Hydrogen Deuterium Exchange-Mass Spectrometry, Bayes Theorem, Deuterium/chemistry, Deuterium Exchange Measurement/methods, Nuclear Proteins, Mass Spectrometry/methods, Transcription Factors
ABSTRACT
African trypanosomes are dixenous eukaryotic parasites that impose a significant human and veterinary disease burden on sub-Saharan Africa. Diversity between species and life-cycle stages is concomitant with distinct host and tissue tropisms within this group. Here, the spatial proteomes of two African trypanosome species, Trypanosoma brucei and Trypanosoma congolense, are mapped across two life-stages. The four resulting datasets provide evidence of expression of approximately 5500 proteins per cell-type. Over 2500 proteins per cell-type are classified to specific subcellular compartments, providing four comprehensive spatial proteomes. Comparative analysis reveals key routes of parasitic adaptation to different biological niches and provides insight into the molecular basis for diversity within and between these pathogen species.
Subjects
Trypanosoma brucei brucei, Trypanosoma congolense, African Trypanosomiasis, Tsetse Flies, Humans, Animals, African Trypanosomiasis/parasitology, Tsetse Flies/parasitology, Proteome, Proteomics
ABSTRACT
Cryptosporidium is a leading cause of diarrheal disease in children and an important contributor to early childhood mortality. The parasite invades and extensively remodels intestinal epithelial cells, building an elaborate interface structure. How this occurs at the molecular level and the contributing parasite factors are largely unknown. Here, we generated a whole-cell spatial proteome of the Cryptosporidium sporozoite and used genetic and cell biological experimentation to discover the Cryptosporidium-secreted effector proteome. These findings reveal multiple organelles, including an original secretory organelle, and generate numerous compartment markers by tagging native gene loci. We show that secreted proteins are delivered to the parasite-host interface, where they assemble into different structures including a ring that anchors the parasite into its unique epicellular niche. Cryptosporidium thus uses a complex set of secretion systems during and following invasion that act in concert to subjugate its host cell.
Subjects
Cryptosporidiosis, Cryptosporidium parvum, Cryptosporidium, Preschool Child, Child, Humans, Proteome, Organelles/metabolism, Protozoan Proteins/genetics, Protozoan Proteins/metabolism, Host-Parasite Interactions
ABSTRACT
Understanding sub-cellular protein localisation is an essential component in the analysis of context specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e. proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case-studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
ABSTRACT
The ability to identify the designer of engineered biological sequences, termed genetic engineering attribution (GEA), would help ensure due credit for biotechnological innovation, while holding designers accountable to the communities they affect. Here, we present the results of the first Genetic Engineering Attribution Challenge, a public data-science competition to advance GEA techniques. Top-scoring teams dramatically outperformed previous models at identifying the true lab-of-origin of engineered plasmid sequences, including an increase in top-1 and top-10 accuracy of 10 percentage points. A simple ensemble of prizewinning models further increased performance. New metrics, designed to assess a model's ability to confidently exclude candidate labs, also showed major improvements, especially for the ensemble. Most winning teams adopted CNN-based machine-learning approaches; however, one team achieved very high accuracy with an extremely fast neural-network-free approach. Future work, including future competitions, should further explore a wide diversity of approaches for bringing GEA technology into practical use.
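The top-1 and top-10 accuracy figures quoted in this abstract can be illustrated with a short sketch. The function below is not the challenge's scoring code. It is a minimal Python illustration that assumes each model returns, for every sequence, a list of candidate labs ordered from most to least likely. The lab names and rankings are hypothetical.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of sequences whose true lab appears among the first k ranked guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranking per true label")
    if not true_labels:
        raise ValueError("cannot score an empty set of sequences")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical rankings for three plasmid sequences (most likely lab first).
ranked = [
    ["lab_A", "lab_B", "lab_C"],
    ["lab_B", "lab_A", "lab_C"],
    ["lab_C", "lab_A", "lab_B"],
]
truth = ["lab_A", "lab_A", "lab_B"]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(ranked, truth, k):.3f}")
```

By construction the score cannot decrease as k grows, which is why top-10 accuracy is always at least as high as top-1 accuracy for the same model.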
Subjects
Biotechnology, Genetic Engineering, Social Perception, Molecular Cloning, Genetic Techniques
ABSTRACT
The steady-state localisation of proteins provides vital insight into their function. These localisations are context specific with proteins translocating between different subcellular niches upon perturbation of the subcellular environment. Differential localisation, that is a change in the steady-state subcellular location of a protein, provides a step towards mechanistic insight of subcellular protein dynamics. High-accuracy high-throughput mass spectrometry-based methods now exist to map the steady-state localisation and re-localisation of proteins. Here, we describe a principled Bayesian approach, BANDLE, that uses these data to compute the probability that a protein differentially localises upon cellular perturbation. Extensive simulation studies demonstrate that BANDLE reduces the number of both type I and type II errors compared to existing approaches. Application of BANDLE to several datasets recovers well-studied translocations. In an application to cytomegalovirus infection, we obtain insights into the rewiring of the host proteome. Integration of other high-throughput datasets allows us to provide the functional context of these data.
Subjects
Proteome, Proteomics, Bayes Theorem, Mass Spectrometry/methods, Proteome/metabolism, Proteomics/methods, Subcellular Fractions/metabolism
ABSTRACT
BACKGROUND: Cystic Fibrosis (CF) is a genetic disorder affecting around 1 in every 3000 newborns. In the most common mutation, F508del, the defective anion channel, CFTR, is prevented from reaching the plasma membrane (PM) by the quality control machinery of the cell. Little is known about how CFTR pharmacological rescue impacts the cell proteome. METHODS: We used high-resolution mass spectrometry, differential ultracentrifugation, machine learning and bioinformatics to investigate changes in both the expression and localization of the human bronchial epithelium CF model (F508del-CFTR CFBE41o-) proteome following treatment with VX-809 (Lumacaftor), a drug able to improve the trafficking of CFTR. RESULTS: The data suggested no stark changes in protein expression, yet subtle localization changes of proteins of the mitochondria and peroxisomes were detected. We then used high-content confocal microscopy to further investigate the morphological and compositional changes of peroxisomes and mitochondria under these conditions, as well as in patient-derived primary cells. We profiled several thousand proteins and determined subcellular localization for around 5000 of them using the LOPIT-DC spatial proteomics protocol. CONCLUSIONS: We observed that treatment with VX-809 induces extensive structural and functional remodelling of mitochondria and peroxisomes that resembles the phenotype of healthy cells. Our data suggest additional rescue mechanisms of VX-809 beyond the correction of aberrant folding of F508del-CFTR and subsequent trafficking to the PM.
Subjects
Cystic Fibrosis, Aminopyridines, Benzodioxoles, Cystic Fibrosis/metabolism, Cystic Fibrosis Transmembrane Conductance Regulator/metabolism, Epithelium/metabolism, Humans, Newborn Infant, Mitochondria/metabolism, Proteome/metabolism
ABSTRACT
Hydrogen deuterium exchange mass spectrometry (HDX-MS) is a technique to explore differential protein structure by examining the rate of deuterium incorporation for specific peptides. This rate will be altered upon structural perturbation, and detecting significant changes to this rate requires a statistical test. To determine rates of incorporation, HDX-MS measurements are frequently made over a time course. However, current statistical testing procedures ignore the correlations in the temporal dimension of the data. Using tools from functional data analysis, we develop a testing procedure that explicitly incorporates a model of hydrogen deuterium exchange. To further improve statistical power, we develop an empirical Bayes version of our method, allowing us to borrow information across peptides and stabilise variance estimates for low sample sizes. Our approach has increased power, reduces false positives and improves interpretation over linear model-based approaches. Due to the improved flexibility of our method, we can apply it to a multi-antibody epitope-mapping experiment where current approaches are inapplicable due to insufficient flexibility. Hence, our approach allows HDX-MS to be applied in more experimental scenarios and reduces the burden on experimentalists to produce excessive replicates. Our approach is implemented in the R-package "hdxstats": https://github.com/ococrook/hdxstats .
Subjects
Deuterium Exchange Measurement, Hydrogen Deuterium Exchange-Mass Spectrometry, Bayes Theorem, Deuterium/chemistry, Deuterium Exchange Measurement/methods, Mass Spectrometry/methods, Peptides
ABSTRACT
Tandem mass tags (TMTs) enable simple and accurate quantitative proteomics for multiplexed samples by relative quantification of tag reporter ions. Orbitrap quantification of reporter ions has been associated with a characteristic notch region in intensity distribution, within which few reporter intensities are recorded. This has been resolved in version 3 of the instrument acquisition software Tune. However, 47% of Orbitrap Fusion, Lumos, or Eclipse submissions to PRIDE were generated using prior software versions. To quantify the impact of the notch on existing quantitative proteomics data, we generated a mixed species benchmark and acquired quantitative data using Tune versions 2 and 3. Intensities below the notch are predominantly underestimated with Tune version 2, leading to overestimation of the true differences in intensities between samples. However, when summarizing reporter ion intensities to higher-level features, such as peptides and proteins, few features are significantly affected. Targeted removal of spectra with reporter ion intensities below the notch is not beneficial for differential peptide or protein testing. Overall, we find that the systematic quantification bias associated with the notch is not detrimental for a typical proteomics experiment.
ABSTRACT
Proteomics is a data-rich science with complex experimental designs and an intricate measurement process. To obtain insights from the large data sets produced, statistical methods, including machine learning, are routinely applied. For a quantity of interest, many of these approaches only produce a point estimate, such as a mean, leaving little room for more nuanced interpretations. By contrast, Bayesian statistics allows quantification of uncertainty through the use of probability distributions. These probability distributions enable scientists to ask complex questions of their proteomics data. Bayesian statistics also offers a modular framework for data analysis by making dependencies between data and parameters explicit. Hence, specifying complex hierarchies of parameter dependencies is straightforward in the Bayesian framework. This allows us to use a statistical methodology which equals, rather than neglects, the sophistication of experimental design and instrumentation present in proteomics. Here, we review Bayesian methods applied to proteomics, demonstrating their potential power, alongside the challenges posed by adopting this new statistical framework. To illustrate our review, we give a walk-through of the development of a Bayesian model for dynamic organic orthogonal phase-separation (OOPS) data.
Subjects
Machine Learning, Proteomics, Bayes Theorem, Probability, Uncertainty
ABSTRACT
Genome engineering is undergoing unprecedented development and is now becoming widely available. Genetic engineering attribution can make sequence-lab associations and assist forensic experts in ensuring responsible biotechnology innovation and reducing misuse of engineered DNA sequences. Here we propose a method based on metric learning to rank the most likely labs of origin while simultaneously generating embeddings for plasmid sequences and labs. These embeddings can be used to perform various downstream tasks, such as clustering DNA sequences and labs, as well as using them as features in machine learning models. Our approach employs a circular shift augmentation method and can correctly rank the lab of origin 90% of the time within its top-10 predictions. We also demonstrate that we can perform few-shot learning and obtain 76% top-10 accuracy using only 10% of the sequences. Finally, our approach can also extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
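The abstract does not spell out how the circular shift augmentation is implemented. The Python sketch below assumes it means rotating the start position of a circular plasmid sequence, so that the same molecule is presented to the model with different linear coordinates. The function names and the example sequence are hypothetical.

```python
import random
from typing import List


def circular_shift(sequence: str, offset: int) -> str:
    """Rotate a circular sequence so that position `offset` becomes the new start."""
    if not sequence:
        return sequence
    offset %= len(sequence)  # also handles negative and oversized offsets
    return sequence[offset:] + sequence[:offset]


def augment_with_shifts(sequence: str, n_copies: int, rng: random.Random) -> List[str]:
    """Return n_copies randomly rotated views of the same circular sequence."""
    if not sequence:
        return [sequence] * n_copies
    return [
        circular_shift(sequence, rng.randrange(len(sequence)))
        for _ in range(n_copies)
    ]


# Hypothetical toy plasmid; real plasmids are thousands of bases long.
plasmid = "ACGTTG"
print(circular_shift(plasmid, 2))
print(augment_with_shifts(plasmid, 3, random.Random(0)))
```

Every rotated copy contains the same bases in the same cyclic order, so the augmentation changes only where the linear representation begins.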
ABSTRACT
Protein localisation and translocation between intracellular compartments underlie almost all physiological processes. The hyperLOPIT proteomics platform combines mass spectrometry with state-of-the-art machine learning to map the subcellular location of thousands of proteins simultaneously. We combine global proteome analysis with hyperLOPIT in a fully Bayesian framework to elucidate spatiotemporal proteomic changes during a lipopolysaccharide (LPS)-induced inflammatory response. We report a highly dynamic proteome in terms of both protein abundance and subcellular localisation, with alterations in the interferon response, endo-lysosomal system, plasma membrane reorganisation and cell migration. Proteins not previously associated with an LPS response were found to relocalise upon stimulation, the functional consequences of which are still unclear. By quantifying proteome-wide uncertainty through Bayesian modelling, a necessary role for protein relocalisation and the importance of taking a holistic overview of the LPS-driven immune response have been revealed. The data are showcased as an interactive application freely available for the scientific community.
Subjects
Inflammation/metabolism, Leukemia/metabolism, Leukemia/pathology, Lipopolysaccharides/pharmacology, Proteomics, Algorithms, Anti-Infective Agents/metabolism, Anti-Inflammatory Agents/metabolism, Antigen Presentation, Autophagosomes/metabolism, Bayes Theorem, Cell Cycle Checkpoints, Cell Membrane/metabolism, Cell Nucleus/metabolism, Cell Shape, Humans, Immunity, Inflammation/pathology, Leukemia/immunology, Lymphocyte Activation/immunology, Lysosomes/metabolism, Neoplasm Proteins/metabolism, Protein Transport, Proteome/metabolism, Signal Transduction, T-Lymphocytes/immunology, THP-1 Cells, Time Factors, Transport Vesicles/metabolism, Up-Regulation, rho GTP-Binding Proteins/metabolism
ABSTRACT
The thermal stability of proteins can be altered when they interact with small molecules or other biomolecules, or when they are subject to post-translational modifications. Thus, monitoring the thermal stability of proteins under various cellular perturbations can provide insights into protein function, as well as potentially determine drug targets and off-targets. Thermal proteome profiling is a highly multiplexed mass spectrometry method for monitoring the melting behaviour of thousands of proteins in a single experiment. In essence, thermal proteome profiling assumes that proteins denature upon heating and hence become insoluble. Thus, by tracking the relative solubility of proteins at sequentially increasing temperatures, one can report on the thermal stability of a protein. Standard thermodynamics predicts a sigmoidal relationship between temperature and relative solubility, and this is the basis of current robust statistical procedures. However, current methods do not model deviations from this behaviour and they do not quantify uncertainty in the melting profiles. To overcome these challenges, we propose the application of Bayesian functional data analysis tools which allow complex temperature-solubility behaviours. Our methods have improved sensitivity over the state-of-the-art, identify new drug-protein associations and have less restrictive assumptions than current approaches. Our methods allow for comprehensive analysis of proteins that deviate from the predicted sigmoid behaviour, and we uncover potentially biphasic phenomena with a series of published datasets.
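The sigmoidal temperature-solubility relationship mentioned above can be made concrete with a small sketch. The logistic parameterisation below (melting point, slope and lower plateau) is an assumption chosen for illustration, not the model used in the paper, and all parameter values are hypothetical. The helper that reads off a melting point by linear interpolation is likewise only an illustration.

```python
import math
from typing import Optional, Sequence


def sigmoid_solubility(
    temperature: float, melting_point: float, slope: float, plateau: float = 0.0
) -> float:
    """Relative solubility: close to 1 at low temperature, decaying to `plateau`."""
    return plateau + (1.0 - plateau) / (
        1.0 + math.exp((temperature - melting_point) / slope)
    )


def interpolate_melting_point(
    temperatures: Sequence[float],
    solubilities: Sequence[float],
    threshold: float = 0.5,
) -> Optional[float]:
    """Temperature at which solubility first drops below `threshold`, or None."""
    points = list(zip(temperatures, solubilities))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 >= threshold > s1:
            # Linear interpolation between the two bracketing measurements.
            return t0 + (s0 - threshold) * (t1 - t0) / (s0 - s1)
    return None


# Hypothetical temperature gradient (degrees Celsius) and melting parameters.
temps = [37.0, 41.0, 45.0, 49.0, 53.0, 57.0, 61.0, 65.0]
curve = [sigmoid_solubility(t, melting_point=51.0, slope=2.0) for t in temps]
estimate = interpolate_melting_point(temps, curve)
print(f"estimated melting point: {estimate:.2f}")
```

A single sigmoid of this kind cannot represent a biphasic profile, which is the type of deviation the abstract says a more flexible functional model is meant to capture.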
Subjects
Bayes Theorem, Protein Stability, Proteome, Solubility, Temperature, Thermodynamics
ABSTRACT
Intracellular traffic between compartments of the secretory and endocytic pathways is mediated by vesicle-based carriers. The proteomes of carriers destined for many organelles are ill-defined because the vesicular intermediates are transient, low-abundance and difficult to purify. Here, we combine vesicle relocalisation with organelle proteomics and Bayesian analysis to define the content of different endosome-derived vesicles destined for the trans-Golgi network (TGN). The golgin coiled-coil proteins golgin-97 and GCC88, shown previously to capture endosome-derived vesicles at the TGN, were individually relocalised to mitochondria and the content of the subsequently re-routed vesicles was determined by organelle proteomics. Our findings reveal 45 integral and 51 peripheral membrane proteins re-routed by golgin-97, evidence for a distinct class of vesicles shared by golgin-97 and GCC88, and various cargoes specific to individual golgins. These results illustrate a general strategy for analysing intracellular sub-proteomes by combining acute cellular re-wiring with high-resolution spatial proteomics.
Subjects
Autoantigens/metabolism, Golgi Matrix Proteins/metabolism, Membrane Proteins/metabolism, trans-Golgi Network/metabolism, Autoantigens/genetics, Endosomes/metabolism, Gene Knockdown Techniques, Golgi Matrix Proteins/genetics, HEK293 Cells, HeLa Cells, Humans, Mitochondria/metabolism, Proteomics/methods, Spatial Analysis
ABSTRACT
The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry-based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.
Subjects
Bayes Theorem, Saccharomyces cerevisiae Proteins/metabolism, Subcellular Fractions/metabolism, Algorithms, Animals, Datasets as Topic, Humans, Machine Learning, Mass Spectrometry, Mice, Proteomics
ABSTRACT
Apicomplexan parasites cause major human disease and food insecurity. They owe their considerable success to highly specialized cell compartments and structures. These adaptations drive their recognition, nondestructive penetration, and elaborate reengineering of the host's cells to promote their growth, dissemination, and the countering of host defenses. The evolution of unique apicomplexan cellular compartments is concomitant with vast proteomic novelty. Consequently, half of apicomplexan proteins are unique and uncharacterized. Here, we determine the steady-state subcellular location of thousands of proteins simultaneously within the globally prevalent apicomplexan parasite Toxoplasma gondii. This provides unprecedented comprehensive molecular definition of these unicellular eukaryotes and their specialized compartments, and these data reveal the spatial organizations of protein expression and function, adaptation to hosts, and the underlying evolutionary trajectories of these pathogens.
Subjects
Proteome, Protozoan Proteins/metabolism, Toxoplasma/metabolism, Apicomplexa, Biological Evolution, Epitopes, Host-Pathogen Interactions, Humans, Proteomics, Protozoan Proteins/chemistry, Protozoan Proteins/genetics, Toxoplasma/genetics
ABSTRACT
The spatial subcellular proteome is a dynamic environment, one that can be perturbed by molecular cues and regulated by post-translational modifications. Compartmentalization of this environment and management of these biomolecular dynamics allow for an array of ancillary protein functions. Profiling spatial proteomics has proved to be a powerful technique in identifying the primary subcellular localization of proteins. The approach has also been refashioned to study multi-localization and localization dynamics. Here, the analytical approaches that have been applied to spatial proteomics thus far are critiqued, and challenges particularly associated with multi-localization and dynamic relocalization are identified. To meet some of the current limitations in analytical processing, it is suggested that Bayesian modeling has clear benefits over the methods applied to date and should be favored whenever possible. Careful consideration of the limitations and challenges, and development of robust statistical frameworks, will ensure that profiling spatial proteomics remains a valuable technique as its utility is expanded.