ABSTRACT
We introduce single cell Proteoform imaging Mass Spectrometry (scPiMS), which realizes the benefit of direct solvent extraction and MS detection of intact proteins from single cells dropcast onto glass slides. Sampling and detection of whole proteoforms by individual ion mass spectrometry enable a scalable approach to single cell proteomics. This new scPiMS platform addresses the throughput bottleneck in single cell proteomics and boosts the cell processing rate several-fold while accessing protein composition with higher coverage.
Subjects
Mass Spectrometry, Proteomics, Single-Cell Analysis, Single-Cell Analysis/methods, Proteomics/methods, Humans, Mass Spectrometry/methods, Proteome/analysis
ABSTRACT
An Orbitrap-based ion analysis procedure determines the direct charge for numerous individual protein ions to generate true mass spectra. This individual ion mass spectrometry (I2MS) method for charge detection enables the characterization of highly complicated mixtures of proteoforms and their complexes in both denatured and native modes of operation, revealing information not obtainable by typical measurements of ensembles of ions.
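Because I2MS reads each ion's charge directly instead of inferring it from isotope or charge-state patterns, the neutral mass of every ion follows from simple arithmetic, M = z(m/z) - z·m_p, and binning many such masses yields a mass-domain spectrum. A minimal sketch, assuming positive-mode ions whose only charge carriers are protons; the ion list and the 1 Da bin width are hypothetical, not values from the study.

```python
from collections import Counter

PROTON_MASS = 1.007276  # Da

def neutral_mass(mz, charge):
    """Neutral mass from an observed m/z and a directly measured charge.

    Assumes the ion carries `charge` extra protons: M = z * (m/z) - z * m_proton.
    """
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return charge * mz - charge * PROTON_MASS

def mass_histogram(ions, bin_width=1.0):
    """Bin the neutral masses of individual ions into a mass-domain histogram."""
    counts = Counter()
    for mz, z in ions:
        counts[round(neutral_mass(mz, z) / bin_width) * bin_width] += 1
    return dict(counts)

# Hypothetical individual ions: (m/z, directly measured charge).
ions = [(893.5, 12), (1072.0, 10), (977.2, 11)]
print(mass_histogram(ions))  # {10710.0: 2, 10738.0: 1}
```

Two ions observed at different m/z values and charges land in the same mass bin, which is how individual ion measurements add up to a "true" mass spectrum without charge-state deconvolution.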
Subjects
Mass Spectrometry/methods, Proteins/chemistry, Proteomics/methods, Humans
ABSTRACT
The effectiveness of any proteomics database search depends on the theoretical candidate information contained in the protein database. Unfortunately, candidate entries from protein databases such as UniProt rarely contain all the post-translational modifications (PTMs), disulfide bonds, or endogenous cleavages of interest to researchers. These omissions can limit discovery of novel and biologically important proteoforms. Conversely, searching for a specific proteoform becomes a computationally difficult task for heavily modified proteins. Both situations require updates to the database through user-annotated entries. Unfortunately, manually creating properly formatted UniProt Extensible Markup Language (XML) files is tedious and prone to errors. ProSight Annotator solves these issues by providing a graphical interface for adding user-defined features to UniProt-formatted XML files for better informed proteoform searches. It can be downloaded from http://prosightannotator.northwestern.edu.
Subjects
Language, Proteins, Protein Databases, Post-Translational Protein Processing, Proteins/chemistry, Proteomics, Software
ABSTRACT
Top-down proteomics studies intact proteoform mixtures and offers important advantages over more common bottom-up proteomics technologies, as it avoids the protein inference problem. However, achieving complete molecular characterization of investigated proteoforms using existing technologies remains a fundamental challenge for top-down proteomics. Here, we benchmark the performance of ultraviolet photodissociation (UVPD) using 213 nm photons generated by a solid-state laser applied to the study of intact proteoforms from three organisms. Notably, the described UVPD setup applies multiple laser pulses to induce ion dissociation, and this feature can be used to optimize the fragmentation outcome based on the molecular weight of the analyzed biomolecule. When applied to complex proteoform mixtures in high-throughput top-down proteomics, 213 nm UVPD demonstrated a high degree of complementarity with the most widely employed fragmentation method in proteomics studies, higher-energy collisional dissociation (HCD). UVPD at 213 nm offered higher average proteoform sequence coverage and degree of proteoform characterization (including localization of post-translational modifications) than HCD. However, previous studies have shown limitations in applying database search strategies developed for HCD fragmentation to UVPD spectra, which contain up to nine fragment ion types. We therefore performed an analysis of the different UVPD product ion type frequencies. From these data, we developed an ad hoc fragment matching strategy and determined the influence of each possible ion type on search outcomes. By paring down the number of ion types considered in high-throughput UVPD searches from all types to the four most abundant, we were ultimately able to achieve deeper proteome characterization with UVPD. Lastly, our detailed product ion analysis also revealed UVPD cleavage propensities and determined the presence of a product ion produced specifically by 213 nm photons. Altogether, these observations could be used to better elucidate UVPD dissociation mechanisms and improve the utility of the technique for proteomic applications.
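The ion-type pruning described above amounts to tallying how often each product ion type is matched and keeping only the most frequent types for later searches. A minimal sketch, assuming matched fragments are available as (ion type, cleavage position) pairs; the function name, labels, and counts are hypothetical illustration data, not the study's frequencies.

```python
from collections import Counter

def most_frequent_ion_types(matched_fragments, k=4):
    """Return the k most frequently matched fragment ion types.

    matched_fragments: iterable of (ion_type, cleavage_position) tuples
    produced by a fragment-matching step.
    """
    counts = Counter(ion_type for ion_type, _ in matched_fragments)
    return [ion_type for ion_type, _ in counts.most_common(k)]

# Hypothetical matches pooled across many UVPD spectra.
matches = (
    [("y", i) for i in range(6)]
    + [("a", i) for i in range(5)]
    + [("b", i) for i in range(4)]
    + [("x", i) for i in range(3)]
    + [("z", 1), ("z", 2), ("c", 7)]
)
print(most_frequent_ion_types(matches))  # ['y', 'a', 'b', 'x']
```

A subsequent search would then consider only the returned ion types, trading a few rare matches for a smaller, less noisy candidate space.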
Subjects
Proteomics/methods, Ultraviolet Rays, Animals, Carbonic Anhydrases, Cultured Cells, Liquid Chromatography, Fibroblasts, Fungal Proteins, Humans, Mice, Cardiac Myocytes, Myoglobin, Photons, Pseudomonas aeruginosa, Tandem Mass Spectrometry, Ubiquitin
ABSTRACT
Unraveling the complexity of biological systems relies on the development of new approaches for spatially resolved proteoform-specific analysis of the proteome. Herein, we employ nanospray desorption electrospray ionization mass spectrometry imaging (nano-DESI MSI) for the proteoform-selective imaging of biological tissues. Nano-DESI generates multiply charged protein ions, which is advantageous for their structural characterization using tandem mass spectrometry (MS/MS) directly on the tissue. Proof-of-concept experiments demonstrate that nano-DESI MSI combined with on-tissue top-down proteomics is ideally suited for the proteoform-selective imaging of tissue sections. Using rat brain tissue as a model system, we provide the first evidence of differential proteoform expression in different regions of the brain.
Subjects
Electrospray Ionization Mass Spectrometry, Tandem Mass Spectrometry, Animals, Ions, Proteome/analysis, Proteomics/methods, Rats, Electrospray Ionization Mass Spectrometry/methods, Tandem Mass Spectrometry/methods
ABSTRACT
Within the last several years, top-down proteomics has emerged as a high-throughput technique for protein and proteoform identification. This technique has the potential to identify and characterize thousands of proteoforms within a single study, but the absence of accurate false discovery rate (FDR) estimation could hinder the adoption and consistency of top-down proteomics in the future. In automated identification and characterization of proteoforms, FDR calculation strongly depends on the context of the search. The context includes MS data quality, the database being interrogated, the search engine, and the parameters of the search. Particular to top-down proteomics, there are four molecular levels of study: proteoform spectral match (PrSM), protein, isoform, and proteoform. Here, a context-dependent framework for calculating an accurate FDR at each level was designed, implemented, and validated against a manually curated training set with 546 confirmed proteoforms. We examined several search contexts and found that an FDR calculated at the PrSM level under-reported the true FDR at the protein level by an average of 24-fold. We present a new open-source tool, the TDCD_FDR_Calculator, which provides a scalable, context-dependent FDR calculation that can be applied post-search to enhance the quality of results in top-down proteomics from any search engine.
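To see how a PrSM-level FDR can under-report the protein-level FDR, consider the standard target-decoy estimate (decoy hits divided by target hits above a score threshold). A well-identified target protein often contributes many PrSMs, whereas random decoy matches tend to contribute one each, so collapsing to one entry per protein shrinks the target count far more than the decoy count. The sketch below is a simplified illustration under that assumption, with hypothetical scores and function names; it is not the TDCD_FDR_Calculator algorithm.

```python
def target_decoy_fdr(hits, threshold):
    """Estimate FDR as (#decoy hits) / (#target hits) among hits scoring >= threshold."""
    passing = [h for h in hits if h["score"] >= threshold]
    decoys = sum(1 for h in passing if h["decoy"])
    targets = len(passing) - decoys
    if targets == 0:
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)

def best_per_protein(prsms):
    """Collapse PrSMs to the single best-scoring hit per protein accession."""
    best = {}
    for h in prsms:
        acc = h["protein"]
        if acc not in best or h["score"] > best[acc]["score"]:
            best[acc] = h
    return list(best.values())

# Hypothetical search output: two well-identified targets with many PrSMs each.
prsms = (
    [{"protein": "P1", "score": s, "decoy": False} for s in (90, 85, 80, 75, 70, 65)]
    + [{"protein": "P2", "score": s, "decoy": False} for s in (60, 55, 50)]
    + [
        {"protein": "P3", "score": 20, "decoy": False},
        {"protein": "DECOY_1", "score": 25, "decoy": True},
        {"protein": "DECOY_2", "score": 15, "decoy": True},
    ]
)
print(target_decoy_fdr(prsms, 10))                            # 0.2
print(round(target_decoy_fdr(best_per_protein(prsms), 10), 3))  # 0.667
```

The same threshold yields a 20% estimate at the PrSM level but about 67% once redundant target PrSMs are collapsed, which is the qualitative effect the abstract quantifies.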
Subjects
Proteomics/methods, Algorithms, Protein Databases, Humans, Protein Isoforms/metabolism, Reproducibility of Results
ABSTRACT
Efforts to map the human protein interactome have resulted in information about thousands of multi-protein assemblies housed in public repositories, but the molecular characterization and stoichiometry of their protein subunits remains largely unknown. Here, we report a computational search strategy that supports hierarchical top-down analysis for precise identification and scoring of multi-proteoform complexes by native mass spectrometry.
Subjects
Data Mining/methods, Protein Databases, Mass Spectrometry/methods, Protein Interaction Mapping/methods, Proteome/metabolism, Protein Sequence Analysis/methods, Algorithms, Amino Acid Sequence, Binding Sites, Computer Simulation, Chemical Models, Molecular Sequence Data, Protein Binding
ABSTRACT
Bottom-up proteomics relies on the use of proteases and is the method of choice for identifying thousands of protein groups in complex samples. Top-down proteomics has been shown to be robust for direct analysis of small proteins and offers a solution to the "peptide-to-protein" inference problem inherent in bottom-up approaches. Here, we describe the first large-scale integration of genomic, bottom-up and top-down proteomic data for the comparative analysis of patient-derived mouse xenograft models of basal and luminal B human breast cancer, WHIM2 and WHIM16, respectively. Using these well-characterized xenograft models established by the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium, we compared and contrasted the performance of bottom-up and top-down proteomics to detect cancer-specific aberrations at the peptide and proteoform levels and to measure differential expression of proteins and proteoforms. Bottom-up proteomic analysis of the tumor xenografts detected almost 10 times as many coding nucleotide polymorphisms and peptides resulting from novel splice junctions as top-down did. For proteins in the range of 0-30 kDa, where quantitation was performed using both approaches, bottom-up proteomics quantified 3,519 protein groups from 49,185 peptides, while top-down proteomics quantified 982 proteoforms mapping to 358 proteins. Examples of both concordant and discordant quantitation were found in a ~60:40 ratio, providing a unique opportunity for top-down to fill in missing information. The two techniques showed complementary performance, with bottom-up yielding eight times more identifications of 0-30 kDa proteins in xenograft proteomes, but failing to detect differences in certain posttranslational modifications (PTMs), such as phosphorylation pattern changes of alpha-endosulfine. This work illustrates the power of a combined bottom-up and top-down proteomics approach to deepen our knowledge of cancer biology, especially when genomic data are available.
Subjects
Breast Neoplasms/metabolism, Heterografts/metabolism, Proteome/metabolism, Proteomics/methods, Animals, Breast Neoplasms/genetics, High Pressure Liquid Chromatography, Female, Genotype, Humans, Mice, Molecular Weight, Peptides/genetics, Peptides/metabolism, Single Nucleotide Polymorphism, Proteome/chemistry, Proteome/genetics, Tandem Mass Spectrometry, Heterologous Transplantation
ABSTRACT
Over the past decade, developments in high resolution mass spectrometry have enabled the high-throughput analysis of intact proteins from complex proteomes, leading to the identification of thousands of proteoforms. Several previous reports on top-down proteomics (TDP) relied on hybrid ion trap-Fourier transform mass spectrometers combined with data-dependent acquisition strategies. To further reduce TDP to practice, we use a quadrupole-Orbitrap instrument coupled with software for proteoform-dependent data acquisition to identify and characterize nearly 2000 proteoforms at a 1% false discovery rate from human fibroblasts. By combining a 3 m/z isolation window with short transients to improve specificity and signal-to-noise for proteoforms >30 kDa, we demonstrate improved proteome coverage, capturing 439 proteoforms in the 30-60 kDa range. Three different data acquisition strategies were compared and resulted in the identification of many proteoforms not observed in replicate data-dependent experiments. Notably, the data set is reported with updated metrics and tools, including a new viewer and the assignment of permanent proteoform record identifiers for inclusion of highly characterized proteoforms (i.e., those with C-scores >40) in a repository curated by the Consortium for Top-Down Proteomics.
Subjects
Mass Spectrometry/methods, Proteome/genetics, Proteomics/methods, Fibroblasts/metabolism, Humans, Proteome/metabolism, Software
ABSTRACT
A full description of the human proteome relies on the challenging task of detecting mature and changing forms of protein molecules in the body. Large-scale proteome analysis has routinely involved digesting intact proteins followed by inferred protein identification using mass spectrometry. This 'bottom-up' process affords a high number of identifications (not always unique to a single gene). However, complications arise from incomplete or ambiguous characterization of alternative splice forms, diverse modifications (for example, acetylation and methylation) and endogenous protein cleavages, especially when combinations of these create complex patterns of intact protein isoforms and species. 'Top-down' interrogation of whole proteins can overcome these problems for individual proteins, but has not been achieved on a proteome scale owing to the lack of intact protein fractionation methods that are well integrated with tandem mass spectrometry. Here we show, using a new four-dimensional separation system, identification of 1,043 gene products from human cells that are dispersed into more than 3,000 protein species created by post-translational modification (PTM), RNA splicing and proteolysis. The overall system produced greater than 20-fold increases in both separation power and proteome coverage, enabling the identification of proteins up to 105 kDa and those with up to 11 transmembrane helices. Many previously undetected isoforms of endogenous human proteins were mapped, including changes in multiply modified species in response to accelerated cellular ageing (senescence) induced by DNA damage. Integrated with the latest version of the Swiss-Prot database, the data provide precise correlations to individual genes and proof-of-concept for large-scale interrogation of whole protein molecules. The technology promises to improve the link between proteomics data and complex phenotypes in basic biology and disease research.
Subjects
Protein Isoforms/analysis, Protein Isoforms/chemistry, Proteome/analysis, Proteome/chemistry, Proteomics/methods, Alternative Splicing, Cell Line, Cellular Senescence/genetics, DNA Damage, Protein Databases, HMGA1a Protein/analysis, HMGA1b Protein/analysis, HeLa Cells, Humans, Phenotype, Post-Translational Protein Processing, Proteolysis, Proteomics/instrumentation
ABSTRACT
Many top-down proteomics experiments focus on identifying and localizing PTMs and other potential sources of "mass shift" on a known protein sequence. A simple application to match ion masses and facilitate the iterative hypothesis testing of PTM presence and location would assist with the data analysis in these experiments. ProSight Lite is a free software tool for matching a single candidate sequence against a set of mass spectrometric observations. Fixed or variable modifications, including both PTMs and a select number of glycosylations, can be applied to the amino acid sequence. The application reports multiple scores and a matching fragment list. Fragmentation maps can be exported for publication in either portable network graphic (PNG) or scalable vector graphic (SVG) format. ProSight Lite can be freely downloaded from http://prosightlite.northwestern.edu, installs and updates from the web, and requires Windows 7 or a higher version.
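Matching one candidate sequence against observed fragment masses, as described for ProSight Lite, rests on computing theoretical fragment ions and comparing them within a mass tolerance. A minimal sketch for singly protonated b and y ions using standard monoisotopic residue masses, with optional per-residue mass shifts standing in for fixed or variable modifications; the function names, the 10 ppm default tolerance, and the example masses are assumptions, and this does not reproduce ProSight Lite's scoring.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

def fragment_ions(sequence, mod_shifts=None):
    """Singly protonated b and y ion m/z values for a candidate sequence.

    mod_shifts (hypothetical parameter) maps a 0-based residue index to a
    mass shift in Da, e.g. {0: 42.010565} for N-terminal acetylation.
    """
    mod_shifts = mod_shifts or {}
    residues = [RESIDUE_MASS[aa] + mod_shifts.get(i, 0.0) for i, aa in enumerate(sequence)]
    ions = []
    for i in range(1, len(residues)):
        ions.append((f"b{i}", sum(residues[:i]) + PROTON))
        ions.append((f"y{i}", sum(residues[-i:]) + WATER + PROTON))
    return ions

def match_fragments(sequence, observed_mz, tol_ppm=10.0, mod_shifts=None):
    """Return (ion, theoretical, observed) for each theoretical ion matched within tol_ppm."""
    matches = []
    for name, theo in fragment_ions(sequence, mod_shifts):
        for obs in observed_mz:
            if abs(obs - theo) / theo * 1e6 <= tol_ppm:
                matches.append((name, theo, obs))
                break
    return matches

hits = match_fragments("GAK", [58.0288, 147.1130, 300.0])
print([name for name, _, _ in hits])  # ['b1', 'y1']
```

Testing a PTM hypothesis then means re-running the match with a different `mod_shifts` placement and comparing how many fragments explain the data.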
Subjects
Proteome/chemistry, Software, Amino Acid Sequence, Protein Databases, Humans, Molecular Sequence Annotation, Molecular Weight, Post-Translational Protein Processing, Proteomics, Tandem Mass Spectrometry
ABSTRACT
We developed a method for restricted enzymatic proteolysis using the outer membrane protease T (OmpT) to produce large peptides (>6.3 kDa on average) for mass spectrometry-based proteomics. Using this approach to analyze prefractionated high-mass HeLa proteins, we identified 3,697 unique peptides from 1,038 proteins. We demonstrated the ability of large OmpT peptides to differentiate closely related protein isoforms and to enable the detection of many post-translational modifications.
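OmpT is commonly described as cutting between two consecutive basic residues, which explains why its products are much larger than tryptic peptides. A minimal in-silico digest, assuming that simplified rule (cleave after K or R only when the next residue is also K or R) and ignoring the enzyme's finer sequence preferences and missed cleavages; the function name and example sequence are made up.

```python
def ompt_digest(sequence):
    """Simplified OmpT-like digest: cut between two consecutive basic residues (K/R, K/R)."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides

print(ompt_digest("MAGKRLLDEKVRRSTK"))  # ['MAGK', 'RLLDEKVR', 'RSTK']
```

Under the same rule, trypsin would cut after every K or R not followed by P, so the isolated K in "LLDEKV" shows how a dibasic requirement leaves longer peptides intact. Runs of three or more basic residues would produce single-residue fragments here, a limitation of the simplified rule.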
Subjects
Peptides/analysis, Peptides/metabolism, Proteolysis, Proteomics/methods, Serine Endopeptidases/metabolism, HeLa Cells, Humans, Mass Spectrometry, Peptides/chemistry, Protein Isoforms/chemistry, Protein Isoforms/metabolism, Post-Translational Protein Processing, Serine Endopeptidases/chemistry
ABSTRACT
Top-down proteomics is emerging as a viable method for the routine identification of hundreds to thousands of proteins. In this work we report the largest top-down study to date, with the identification of 1,220 proteins from the transformed human cell line H1299 at a false discovery rate of 1%. Multiple separation strategies were utilized, including the focused isolation of mitochondria, resulting in significantly improved proteome coverage relative to previous work. In all, 347 mitochondrial proteins were identified, including ~50% of the mitochondrial proteome below 30 kDa and over 75% of the subunits constituting the large complexes of oxidative phosphorylation. Three hundred of the identified proteins were found to be integral membrane proteins containing between 1 and 12 transmembrane helices, requiring no specific enrichment or modified LC-MS parameters. Over 5,000 proteoforms were observed, many harboring post-translational modifications, including over a dozen proteins containing lipid anchors (some previously unknown) and many others with phosphorylation and methylation modifications. Comparison between untreated and senescent H1299 cells revealed several changes to the proteome, including the hyperphosphorylation of HMGA2. This work illustrates the burgeoning ability of top-down proteomics to characterize large numbers of intact proteoforms in a high-throughput fashion.
Subjects
Cellular Senescence/genetics, Epithelial Cells/metabolism, Membrane Proteins/isolation & purification, Mitochondrial Proteins/isolation & purification, Post-Translational Protein Processing, Camptothecin/pharmacology, Cell Fractionation, Transformed Cell Line, Cellular Senescence/drug effects, Liquid Chromatography, Epithelial Cells/cytology, Epithelial Cells/drug effects, Gene Expression Regulation, High-Throughput Screening Assays, Humans, Membrane Proteins/genetics, Membrane Proteins/metabolism, Methylation, Mitochondria/chemistry, Mitochondria/metabolism, Mitochondrial Proteins/genetics, Mitochondrial Proteins/metabolism, Molecular Sequence Annotation, Multienzyme Complexes/genetics, Multienzyme Complexes/metabolism, Phosphorylation, Proteomics/methods, Signal Transduction, Tandem Mass Spectrometry
ABSTRACT
Pilot Project #1 (the identification and characterization of human histone H4 proteoforms by top-down MS) is the first project launched by the Consortium for Top-Down Proteomics (CTDP) to refine and validate top-down MS. In the initial results from seven participating laboratories, all reported the probability-based identification of human histone H4 (UniProt accession P62805) with expectation values ranging from 10^-13 to 10^-105. Regarding characterization, a total of 74 proteoforms were reported, 21 of them unambiguously; one new PTM, K79ac, was identified. Inter-laboratory comparison reveals aspects of the results that are consistent, such as the localization of individual PTMs and binary combinations, while other aspects are more variable, such as the accurate characterization of low-abundance proteoforms harboring >2 PTMs. An open-access tool and a discussion of proteoform scoring are included, along with a description of general challenges that lie ahead, including improved proteoform separations prior to mass spectrometric analysis, better instrumentation performance, and software development.
Subjects
Proteomics/methods, Liquid Chromatography/methods, Cluster Analysis, HeLa Cells, Histones/analysis, Histones/chemistry, Humans, Mass Spectrometry/methods, Pilot Projects, Post-Translational Protein Processing, Software
ABSTRACT
The automated processing of data generated by top down proteomics would benefit from improved scoring for protein identification and characterization of highly related protein forms (proteoforms). Here we propose the "C-score" (short for Characterization Score), a Bayesian approach to the proteoform identification and characterization problem, implemented within a framework to allow the infusion of expert knowledge into generative models that take advantage of known properties of proteins and top down analytical systems (e.g., fragmentation propensities, "off-by-1 Da" discontinuous errors, and intelligent weighting for site-specific modifications). The performance of the scoring system based on the initial generative models was compared to the current probability-based scoring system used within both ProSightPC and ProSightPTM on a manually curated set of 295 human proteoforms. The current implementation of the C-score framework generated a marked improvement over the existing scoring system as measured by the area under the curve on the resulting ROC chart (AUC of 0.99 versus 0.78).
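The AUC comparison can be computed without drawing the curve: the ROC AUC equals the probability that a randomly chosen correct assignment outscores a randomly chosen incorrect one, with ties counted as half. A minimal sketch, assuming scores for assignments already labeled correct or incorrect (for example against a manually curated set); it illustrates the metric only, not the C-score itself, and the function name and scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical scores for correct vs. incorrect proteoform assignments.
print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

This pairwise form is quadratic in the number of assignments, which is fine for a few hundred curated proteoforms; a rank-based (Mann-Whitney) formulation gives the same value in O(n log n).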
Subjects
Peptide Mapping/methods, Amino Acid Sequence, Area Under Curve, Bacterial Proteins/chemistry, Bayes Theorem, Statistical Data Interpretation, HeLa Cells, Humans, Molecular Sequence Data, Peptide Mapping/standards, Proteome/chemistry, Proteomics, Pseudomonas aeruginosa, ROC Curve, Tandem Mass Spectrometry
ABSTRACT
Integral membrane proteins (IMPs) are of great biophysical and clinical interest because of the key role they play in many cellular processes. Here, a comprehensive top down study of 152 IMPs and 277 soluble proteins from human H1299 cells, including 11 087 fragments obtained from collisionally activated dissociation (CAD), 6452 from higher-energy collisional dissociation (HCD), and 2981 from electron transfer dissociation (ETD), shows their great utility and complementarity for the identification and characterization of IMPs. A central finding is that ETD is ~2-fold more likely to cleave in soluble regions than threshold fragmentation methods, whereas the reverse is observed in transmembrane domains, with an observed ~4-fold bias toward CAD and HCD. The location of charges just prior to dissociation is consistent with this directed fragmentation: protons remain localized on basic residues during ETD but easily mobilize along the backbone during collisional activation. The fragmentation driven by these protons, which is most often observed in transmembrane domains, is both of higher yield and occurs over a greater number of backbone cleavage sites. Further, while threshold dissociation events in transmembrane domains are on average 10.1 (CAD) and 9.2 (HCD) residues distant from the nearest charge site (R, K, H, N-terminus), fragmentation is strongly influenced by the N- or C-terminal position relative to that site: the ratio of observed b- to y-fragments is ~1:3 if the cleavage occurs >7 residues N-terminal and ~3:1 if it occurs >7 residues C-terminal to the nearest basic site. Threshold dissociation products driven by a mobilized proton appear to depend strongly not only on the relative position of a charge site but also on the N- or C-terminal directionality of proton movement.
Subjects
Gases/chemistry, Membrane Proteins/chemistry, Amino Acid Sequence, Molecular Sequence Data
ABSTRACT
With the prospect of resolving whole protein molecules into their myriad proteoforms on a proteomic scale, the question of their quantitative analysis in discovery mode comes to the fore. Here, we demonstrate a robust pipeline for the identification and stringent scoring of abundance changes of whole protein forms <30 kDa in a complex system. The input is ~100-400 µg of total protein for each biological replicate, and the outputs are graphical displays depicting statistical confidence metrics for each proteoform (i.e., a volcano plot and representations of the technical and biological variation). A key part of the pipeline is the hierarchical linear model that is tailored to the original design of the study. Here, we apply this new pipeline to measure the proteoform-level effects of deleting a histone deacetylase (rpd3) in S. cerevisiae. Over 100 proteoform changes were detected above a 5% false positive threshold in WT vs the Δrpd3 mutant, including the validating observation of hyperacetylation of histone H4 and both H2B isoforms. Ultimately, this approach to label-free top down proteomics in discovery mode is a critical technical advance for testing the hypothesis that whole proteoforms can link more tightly to complex phenotypes in cell and disease biology than do peptides created in shotgun proteomics.
Subjects
Proteins/chemistry, Proteomics/methods, Histone Deacetylases/analysis, Histone Deacetylases/genetics, Mutation/genetics, Saccharomyces cerevisiae/enzymology, Saccharomyces cerevisiae/genetics
ABSTRACT
Intact protein characterization using mass spectrometry thus far has been achieved at the cost of throughput. Presented here is the application of 193 nm ultraviolet photodissociation (UVPD) for top down identification and characterization of proteins in complex mixtures in an online fashion. Liquid chromatographic separation at the intact protein level coupled with fast UVPD and high-resolution detection resulted in confident identification of 46 unique sequences compared to 44 using HCD from prepared Escherichia coli ribosomes. Importantly, nearly all proteins identified in both the UVPD and optimized HCD analyses demonstrated a substantial increase in confidence in identification (as defined by an average decrease in E value of ~40 orders of magnitude) due to the higher number of matched fragment ions. Also shown is the potential for high-throughput characterization of intact proteins via liquid chromatography (LC)-UVPD-MS of molecular weight-based fractions of a Saccharomyces cerevisiae lysate. In total, protein products from 215 genes were identified and found in 292 distinct proteoforms, 168 of which contained some type of post-translational modification.
Subjects
Photoelectron Spectroscopy/methods, Saccharomyces cerevisiae Proteins/analysis, Saccharomyces cerevisiae Proteins/genetics, Amino Acid Sequence, Animals, Liquid Chromatography/methods, Horses, Molecular Sequence Data, Tertiary Protein Structure, Time Factors
ABSTRACT
The top-down approach to proteomics offers compelling advantages due to the potential to provide complete characterization of protein sequence and post-translational modifications. Here we describe the implementation of 193 nm ultraviolet photodissociation (UVPD) in an Orbitrap mass spectrometer for characterization of intact proteins. Near-complete fragmentation of proteins up to 29 kDa is achieved with UVPD including the unambiguous localization of a single residue mutation and several protein modifications on Pin1 (Q13526), a protein implicated in the development of Alzheimer's disease and in cancer pathogenesis. The 5 ns, high-energy activation afforded by UVPD exhibits far less precursor ion-charge state dependence than conventional collision- and electron-based dissociation methods.
Subjects
Peptidylprolyl Isomerase/analysis, Proteomics, Ultraviolet Rays, Humans, Mass Spectrometry, Molecular Models, NIMA-Interacting Peptidylprolyl Isomerase, Peptidylprolyl Isomerase/genetics, Photochemical Processes
ABSTRACT
The interrogation of intact integral membrane proteins has long been a challenge for biological mass spectrometry. Here, we demonstrate the application of top down mass spectrometry to whole membrane proteins below 60 kDa with up to 8 transmembrane helices. Analysis of enriched mitochondrial membrane preparations from human cells yielded identification of 83 integral membrane proteins, along with 163 membrane-associated or soluble proteins, with a median q value of 3 × 10^-10. An analysis of matching fragment ions demonstrated that significantly more fragment ions were found within transmembrane domains than would be expected based upon the observed protein sequence. In total, 46 proteins from the complexes of oxidative phosphorylation were identified, which exemplifies the increasing ability of top down proteomics to provide extensive coverage in a biological network.
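One way to test a claim such as "more fragment ions in transmembrane domains than expected from the sequence" is a one-sided binomial test: if a fraction f of backbone bonds lies in transmembrane domains, about n·f of n matched fragments are expected there by chance. A minimal sketch, assuming independent cleavage events; the abstract does not state which statistical test was used, so this is illustrative only, and the function name and counts are hypothetical.

```python
from math import comb

def tm_enrichment(n_total, n_in_tm, tm_bond_fraction):
    """Expected TM fragment count and one-sided binomial p-value P(X >= n_in_tm).

    Assumes each of n_total fragments independently falls in a TM domain
    with probability tm_bond_fraction (fraction of backbone bonds in TM helices).
    """
    if not 0 <= n_in_tm <= n_total:
        raise ValueError("n_in_tm must lie between 0 and n_total")
    f = tm_bond_fraction
    expected = n_total * f
    p_value = sum(
        comb(n_total, k) * f**k * (1 - f) ** (n_total - k)
        for k in range(n_in_tm, n_total + 1)
    )
    return expected, p_value

# Hypothetical protein: 30% of bonds in TM helices, 40 of 100 matched fragments there.
expected, p = tm_enrichment(100, 40, 0.30)
print(f"expected {expected:.1f}, observed 40, p = {p:.3g}")
```

Pooling fragments across many proteins would require weighting each protein by its own TM bond fraction, which this single-protein sketch does not do.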