ABSTRACT
Mucosal-associated invariant T (MAIT) cells represent an abundant innate-like T cell subtype in the human liver. MAIT cells are assigned crucial roles in regulating immunity and inflammation, yet their role in liver cancer remains elusive. Here, we present a MAIT cell-centered profiling of hepatocellular carcinoma (HCC) using scRNA-seq, flow cytometry, and co-detection by indexing (CODEX) imaging of paired patient samples. These analyses highlight the heterogeneity and dysfunctionality of MAIT cells in HCC and their defective capacity to infiltrate liver tumors. Machine-learning tools were used to dissect the spatial cellular interaction network within the MAIT cell neighborhood. Co-localization in the adjacent liver and interaction between niche-occupying CSF1R+PD-L1+ tumor-associated macrophages (TAMs) and MAIT cells were identified as a key regulatory element of MAIT cell dysfunction. Perturbation of this cell-cell interaction in ex vivo co-culture studies using patient samples and murine models reinvigorated MAIT cell cytotoxicity. These studies suggest that αPD-1/αPD-L1 therapies target MAIT cells in HCC patients.
Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Mucosal-Associated Invariant T Cells , Animals , Humans , Mice , Carcinoma, Hepatocellular/immunology , Carcinoma, Hepatocellular/pathology , Liver Neoplasms/immunology , Liver Neoplasms/pathology , Mucosal-Associated Invariant T Cells/immunology , Mucosal-Associated Invariant T Cells/pathology , Tumor-Associated Macrophages
ABSTRACT
Immune profiling of COVID-19 patients has identified numerous alterations in both innate and adaptive immunity. However, whether those changes are specific to SARS-CoV-2 or driven by a general inflammatory response shared across severely ill pneumonia patients remains unknown. Here, we compared the immune profile of severe COVID-19 with non-SARS-CoV-2 pneumonia ICU patients using longitudinal, high-dimensional single-cell spectral cytometry and algorithm-guided analysis. COVID-19 and non-SARS-CoV-2 pneumonia both showed increased emergency myelopoiesis and displayed features of adaptive immune paralysis. However, pathological immune signatures suggestive of T cell exhaustion were exclusive to COVID-19. The integration of single-cell profiling with a predicted binding capacity of SARS-CoV-2 peptides to the patients' HLA profile further linked the COVID-19 immunopathology to impaired virus recognition. Toward clinical translation, circulating NKT cell frequency was identified as a predictive biomarker for patient outcome. Our comparative immune map serves to delineate treatment strategies to interfere with the immunopathologic cascade exclusive to severe COVID-19.
Subject(s)
COVID-19/immunology , SARS-CoV-2/pathogenicity , Adult , Angiotensin-Converting Enzyme 2/metabolism , Antigen Presentation , Biomarkers/blood , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , COVID-19/pathology , Female , HLA Antigens/genetics , HLA Antigens/immunology , Humans , Immunity, Innate , Immunophenotyping , Male , Middle Aged , Natural Killer T-Cells/immunology , Pneumonia/immunology , Pneumonia/pathology , SARS-CoV-2/immunology , Severity of Illness Index , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) approaches have transformed our ability to resolve cellular properties across systems, but are currently tailored toward large cell inputs (>1,000 cells). This renders them inefficient and costly when processing small, individual tissue samples, a problem that tends to be resolved by loading bulk samples, yielding confounded mosaic cell population read-outs. Here, we developed a deterministic, mRNA-capture bead and cell co-encapsulation dropleting system, DisCo, aimed at processing low-input samples (<500 cells). We demonstrate that DisCo enables precise particle and cell positioning and droplet sorting control through combined machine-vision and multilayer microfluidics, enabling continuous processing of low-input single-cell suspensions at high capture efficiency (>70%) and at speeds up to 350 cells per hour. To underscore DisCo's unique capabilities, we analyzed 31 individual intestinal organoids at varying developmental stages. This revealed extensive organoid heterogeneity, identifying distinct subtypes including a regenerative fetal-like Ly6a+ stem cell population that persists as symmetrical cysts, or spheroids, even under differentiation conditions, and an uncharacterized 'gobloid' subtype consisting predominantly of precursor and mature (Muc2+) goblet cells. To complement this dataset and to demonstrate DisCo's capacity to process low-input, in vivo-derived tissues, we also analyzed individual mouse intestinal crypts. This revealed the existence of crypts with a compositional similarity to spheroids, which consisted predominantly of regenerative stem cells, suggesting the existence of regenerating crypts in the homeostatic intestine. These findings demonstrate the unique power of DisCo in providing high-resolution snapshots of cellular heterogeneity in small, individual tissues.
Subject(s)
Organoids , Single-Cell Analysis , Animals , Cell Differentiation , Intestinal Mucosa , Mice , Stem Cells
ABSTRACT
Dimethyl fumarate (DMF) is an immunomodulatory treatment for multiple sclerosis (MS). Despite its wide clinical use, the mechanisms underlying clinical response are not understood. This study aimed to reveal immune markers of therapeutic response to DMF treatment in MS. For this purpose, we prospectively collected peripheral blood mononuclear cells (PBMCs) from a highly characterized cohort of 44 individuals with MS before and at 12 and 48 wk of DMF treatment. Single cells were profiled using high-dimensional mass cytometry. To capture the heterogeneity of different immune subsets, we adopted a bioinformatic multipanel approach that allowed cell population-cluster assignment of more than 50 different parameters, including lineage and activation markers as well as chemokine receptors and cytokines. Data were further analyzed in a semiunbiased fashion implementing a supervised representation learning approach to capture subtle longitudinal immune changes characteristic for therapy response. With this approach, we identified a population of memory T helper cells expressing high levels of neuroinflammatory cytokines (granulocyte-macrophage colony-stimulating factor [GM-CSF], interferon γ [IFNγ]) as well as CXCR3, whose abundance correlated with treatment response. Using spectral flow cytometry, we confirmed these findings in a second cohort of patients. Serum neurofilament light-chain levels confirmed the correlation of this immune cell signature with axonal damage. The identified cell population is expanded in peripheral blood under natalizumab treatment, substantiating a specific role in treatment response. We propose that depletion of GM-CSF-, IFNγ-, and CXCR3-expressing T helper cells is the main mechanism of action of DMF and allows monitoring of treatment response.
Subject(s)
Biomarkers, Pharmacological , Cytokines , Dimethyl Fumarate , Immunosuppressive Agents , Multiple Sclerosis , T-Lymphocytes, Helper-Inducer , Biomarkers, Pharmacological/metabolism , Cytokines/metabolism , Dimethyl Fumarate/pharmacology , Dimethyl Fumarate/therapeutic use , Granulocyte-Macrophage Colony-Stimulating Factor/metabolism , Humans , Immunosuppressive Agents/pharmacology , Immunosuppressive Agents/therapeutic use , Interferon-gamma/metabolism , Lymphocyte Depletion , Multiple Sclerosis/drug therapy , Multiple Sclerosis/immunology , Single-Cell Analysis , T-Lymphocytes, Helper-Inducer/drug effects , T-Lymphocytes, Helper-Inducer/immunology
ABSTRACT
OBJECTIVE: Liver metastases are often resistant to immune checkpoint inhibitor therapy (ICI) and portend a worse prognosis compared with metastases to other locations. Regulatory T cells (Tregs) are one of several immunosuppressive cells implicated in ICI resistance of liver tumours, but the role played by Tregs residing within the liver surrounding a tumour is unknown. DESIGN: Flow cytometry and single-cell RNA sequencing were used to characterise hepatic Tregs before and after ICI therapy. RESULTS: We found that the murine liver houses a Treg population that, unlike those found in other organs, is both highly proliferative and apoptotic at baseline. On administration of αPD-1, αPD-L1 or αCTLA4, the liver Treg population doubled regardless of the presence of an intrahepatic tumour. Remarkably, this change was not due to the preferential expansion of the subpopulation of Tregs that express PD-1. Instead, a subpopulation of CD29+ (Itgb1, integrin β1) Tregs that was highly proliferative at baseline doubled in size in response to αPD-1. Partial and full depletion of Tregs identified CD29+ Tregs as the prominent niche-filling subpopulation in the liver, and CD29+ Tregs demonstrated enhanced suppression in vitro when derived from the liver but not the spleen. We identified IL2 as a critical modulator of both CD29+ and CD29- hepatic Tregs, but expansion of the liver Treg population with αPD-1 driven by CD29+ Tregs was in part IL2-independent. CONCLUSION: We propose that CD29+ Tregs constitute a unique subpopulation of hepatic Tregs that are primed to respond to ICI agents and mediate resistance.
Subject(s)
Liver Neoplasms , T-Lymphocytes, Regulatory , Animals , Mice , Interleukin-2 , Integrin beta1 , Liver Neoplasms/drug therapy , Liver Neoplasms/pathology
ABSTRACT
MOTIVATION: Improvements in single-cell RNA-seq technologies mean that studies measuring multiple experimental conditions, such as time series, have become more common. At present, few computational methods exist to infer time series-specific transcriptome changes, and such studies have therefore typically used unsupervised pseudotime methods. While these methods identify cell subpopulations and the transitions between them, they are not appropriate for identifying the genes that vary coherently along the time series. In addition, the orderings they estimate are based only on the major sources of variation in the data, which may not correspond to the processes related to the time labels. RESULTS: We introduce psupertime, a supervised pseudotime approach based on a regression model, which explicitly uses time-series labels as input. It identifies genes that vary coherently along a time series, in addition to pseudotime values for individual cells, and a classifier that can be used to estimate labels for new data with unknown or differing labels. We show that psupertime outperforms benchmark classifiers in terms of identifying time-varying genes and provides better individual cell orderings than popular unsupervised pseudotime techniques. psupertime is applicable to any single-cell RNA-seq dataset with sequential labels (principally time series, but also drug dosage and disease progression) derived from the experimental design, and provides a fast, interpretable tool for targeted identification of genes varying along with specific biological processes. AVAILABILITY AND IMPLEMENTATION: R package available at github.com/wmacnair/psupertime and code for results reproduction at github.com/wmacnair/psupplementary. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Single-Cell Analysis , Software , Gene Expression Profiling/methods , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Time Factors , Transcriptome
ABSTRACT
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) allows studying the development of cells in unprecedented detail. Given that many cellular differentiation processes are hierarchical, their scRNA-seq data are expected to be approximately tree-shaped in gene expression space. Inference and representation of this tree structure in two dimensions is highly desirable for biological interpretation and exploratory analysis. RESULTS: Our two contributions are an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data, and a visualization method respecting the tree structure. We extract the tree structure by means of a density-based maximum spanning tree on a vector quantization of the data and show that it captures biological information well. We then introduce density-tree biased autoencoder (DTAE), a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space. We compare to other dimension reduction methods and demonstrate the success of our method both qualitatively and quantitatively on real and toy data. AVAILABILITY AND IMPLEMENTATION: Our implementation relying on PyTorch and Higra is available at github.com/hci-unihd/DTAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
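As an illustration of the spanning-tree step only, the following is a minimal sketch of a maximum spanning tree computed with Kruskal's algorithm. It assumes the vector quantization has already produced a set of centroids (nodes) and that each edge carries a precomputed density-like weight; the function name `maximum_spanning_tree` and the toy weights are hypothetical and are not taken from the DTAE implementation.

```python
def maximum_spanning_tree(n_nodes, edges):
    """Return the edges of a maximum spanning tree (Kruskal's algorithm).

    n_nodes: number of nodes, labelled 0 .. n_nodes - 1 (assumed: centroids).
    edges:   list of (u, v, weight); a higher weight is assumed to mean
             a denser connection between the two centroids.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Visit the heaviest edges first so the tree keeps the densest links.
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Toy example with four centroids and made-up density weights.
edges = [(0, 1, 5.0), (1, 2, 3.0), (0, 2, 4.0), (2, 3, 1.0)]
tree = maximum_spanning_tree(4, edges)
```

Sorting edges by descending weight is the only change from the usual minimum spanning tree variant; how the density weights are estimated is left open here.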
Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , Exome Sequencing
ABSTRACT
Treatment with dimethyl fumarate (DMF) leads to lymphopenia and infectious complications in a subset of patients with multiple sclerosis (MS). Here, we aimed to reveal immune markers of DMF-associated lymphopenia. This prospective observational study longitudinally assessed 31 individuals with MS by single-cell mass cytometry before and after 12 and 48 weeks of DMF therapy. Employing a neural network-based representation learning approach, we identified a CCR4-expressing T helper cell population negatively associated with relevant lymphopenia. CCR4-expressing T helper cells represent a candidate prognostic biomarker for the development of relevant lymphopenia in patients undergoing DMF treatment. ANN NEUROL 2022;91:676-681.
Subject(s)
Lymphopenia , Multiple Sclerosis, Relapsing-Remitting , Multiple Sclerosis , Dimethyl Fumarate/adverse effects , Humans , Immunosuppressive Agents/adverse effects , Lymphopenia/chemically induced , Multiple Sclerosis/chemically induced , Multiple Sclerosis/drug therapy , Multiple Sclerosis, Relapsing-Remitting/drug therapy , Prospective Studies
ABSTRACT
BACKGROUND: Comorbidities are risk factors for development of severe coronavirus disease 2019 (COVID-19). However, the extent to which an underlying comorbidity influences the immune response to severe acute respiratory syndrome coronavirus 2 remains unknown. OBJECTIVE: Our aim was to investigate the complex interrelations of comorbidities, the immune response, and patient outcome in COVID-19. METHODS: We used high-throughput, high-dimensional, single-cell mapping of peripheral blood leukocytes and algorithm-guided analysis. RESULTS: We discovered characteristic immune signatures associated not only with severe COVID-19 but also with the underlying medical condition. Different factors of the metabolic syndrome (obesity, hypertension, and diabetes) affected distinct immune populations, thereby additively increasing the immunodysregulatory effect when present in a single patient. Patients with disorders affecting the lung or heart, together with factors of metabolic syndrome, were clustered together, whereas immune disorder and chronic kidney disease displayed a distinct immune profile in COVID-19. In particular, severe acute respiratory syndrome coronavirus 2-infected patients with preexisting chronic kidney disease were characterized by the highest number of altered immune signatures of both lymphoid and myeloid immune branches. This overall major immune dysregulation could be the underlying mechanism for the estimated odds ratio of 16.3 for development of severe COVID-19 in this burdened cohort. CONCLUSION: The combinatorial systematic analysis of the immune signatures, comorbidities, and outcomes of patients with COVID-19 has provided the mechanistic immunologic underpinnings of comorbidity-driven patient risk and uncovered comorbidity-driven immune signatures.
Subject(s)
COVID-19 , Metabolic Syndrome , Renal Insufficiency, Chronic , Comorbidity , Humans , Immunity , Metabolic Syndrome/epidemiology , SARS-CoV-2
ABSTRACT
Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering has gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and challenging real-world tasks of clustering mouse organs from single-cell RNA-sequencing measurements and defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods.
Subject(s)
Single-Cell Analysis/statistics & numerical data , Animals , Cluster Analysis , Computational Biology , Deep Learning , Gene Expression Profiling/statistics & numerical data , Leukocytes, Mononuclear/classification , Mice , Models, Biological , Normal Distribution , Organ Specificity , Phenotype , RNA-Seq/statistics & numerical data
ABSTRACT
Recent high-dimensional single-cell technologies such as mass cytometry are enabling time series experiments to monitor the temporal evolution of cell state distributions and to identify dynamically important cell states, such as fate decision states in differentiation. However, these technologies are destructive, and require analysis approaches that temporally map between cell state distributions across time points. Current approaches to approximate the single-cell time series as a dynamical system suffer from too restrictive assumptions about the type of kinetics, or link together pairs of sequential measurements in a discontinuous fashion. We propose Dynamic Distribution Decomposition (DDD), an operator approximation approach to infer a continuous distribution map between time points. On the basis of single-cell snapshot time series data, DDD approximates the continuous time Perron-Frobenius operator by means of a finite set of basis functions. This procedure can be interpreted as a continuous time Markov chain over a continuum of states. By only assuming a memoryless Markov (autonomous) process, the types of dynamics represented are more general than those represented by other common models, e.g., chemical reaction networks, stochastic differential equations. Furthermore, we can a posteriori check whether the autonomy assumptions are valid by calculation of prediction error, which we show gives a measure of autonomy within the studied system. The continuity and autonomy assumptions ensure that the same dynamical system maps between all time points, not arbitrarily changing at each time point. We demonstrate the ability of DDD to reconstruct dynamically important cell states and their transitions both on synthetic data and on mass cytometry time series of iPSC reprogramming of a fibroblast system. We use DDD to find previously identified subpopulations of cells and to visualise differentiation trajectories.
Dynamic Distribution Decomposition allows interpretation of high-dimensional snapshot time series data as a low-dimensional Markov process, thereby enabling an interpretable dynamics analysis for a variety of biological processes by means of identifying their dynamically important cell states.
Subject(s)
Cellular Reprogramming/physiology , Computational Biology/methods , Induced Pluripotent Stem Cells/cytology , Single-Cell Analysis/methods , Algorithms , Animals , Cell Line , Markov Chains , Mice
ABSTRACT
Gene splicing profiles are frequently altered in cancer, and the splice variants of fibronectin (FN) that contain the extra-domains A (EDA) or B (EDB), referred to as EDA+FN or EDB+FN, are highly upregulated in tumor vasculature. Transforming growth factor β (TGF-β) signaling has been attributed a pivotal role in glioblastoma, with TGF-β promoting angiogenesis and vessel remodeling. By using immunohistochemistry staining, we observed that the oncofetal FN isoforms EDA+FN and EDB+FN are expressed in glioblastoma vasculature. Ex vivo single-cell gene expression profiling of tumors by using CD31 and α-smooth muscle actin (αSMA) as markers for endothelial cells, and pericytes and vascular smooth muscle cells (VSMCs), respectively, confirmed the predominant expression of FN, EDA+FN and EDB+FN in the vascular compartment of glioblastoma. Specifically, within the CD31-positive cell population, we identified a positive correlation between the expression of EDA+FN and EDB+FN, and of molecules associated with TGF-β signaling. Further, TGF-β induced EDA+FN and EDB+FN in human cerebral microvascular endothelial cells and glioblastoma-derived endothelial cells in a SMAD3- and SMAD4-dependent manner. In turn, we found that FN modulated TGF-β superfamily signaling in endothelial cells via the EDA and EDB, pointing towards a bidirectional influence of oncofetal FN and TGF-β superfamily signaling.
Subject(s)
Endothelial Cells/metabolism , Fibronectins/metabolism , Signal Transduction , Transforming Growth Factor beta/pharmacology , Alternative Splicing , Cells, Cultured , Gene Expression Profiling , Humans , Neovascularization, Pathologic , Protein Isoforms/metabolism , RNA, Messenger/genetics
ABSTRACT
We introduce TreeTop, an algorithm for single cell data analysis to identify and assign a branching score to branch points in biological processes which may have multi-level branching hierarchies. We demonstrate branch point identification for processes with varying topologies, including T-cell maturation, B-cell differentiation and hematopoiesis. Our analyses are consistent with recent experimental studies suggesting a shallower hierarchy of differentiation events in hematopoiesis, rather than the classical multi-level hierarchy.
Subject(s)
Algorithms , Cell Differentiation , Single-Cell Analysis/methods , B-Lymphocytes/physiology , Hematopoiesis , Humans , Models, Theoretical , T-Lymphocytes/physiology
ABSTRACT
In recent years, the number of large-scale metabolomics studies on various cellular processes in different organisms has increased drastically. However, it remains a major challenge to perform a systematic identification of mechanistic regulatory events that mediate the observed changes in metabolite levels, due to complex interdependencies within metabolic networks. We present the metabolic network segmentation (MNS) algorithm, a probabilistic graphical modeling approach that enables genome-scale, automated prediction of regulated metabolic reactions from differential or serial metabolomics data. The algorithm sections the metabolic network into modules of metabolites with consistent changes. Metabolic reactions that connect different modules are the most likely sites of metabolic regulation. In contrast to most state-of-the-art methods, the MNS algorithm is independent of arbitrary pathway definitions, and its probabilistic nature facilitates assessments of noisy and incomplete measurements. With serial (i.e., time-resolved) data, the MNS algorithm also indicates the sequential order of metabolic regulation. We demonstrated the power and flexibility of the MNS algorithm with three realistic case studies with bacterial and human cells. Thus, this approach enables the identification of mechanistic regulatory events from large-scale metabolomics data, and contributes to the understanding of metabolic processes and their interplay with cellular signaling and regulation processes.
Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation/physiology , Metabolic Flux Analysis/methods , Metabolic Networks and Pathways/physiology , Metabolome/physiology , Models, Statistical , Computer Graphics , Computer Simulation , Metabolomics/methods , Models, Biological , Proteome/metabolism
ABSTRACT
Stochastic chemical reaction networks constitute a model class to quantitatively describe dynamics and cell-to-cell variability in biological systems. The topology of these networks typically is only partially characterized due to experimental limitations. Current approaches for refining network topology are based on the explicit enumeration of alternative topologies and are therefore restricted to small problem instances with almost complete knowledge. We propose the reactionet lasso, a computational procedure that derives a stepwise sparse regression approach on the basis of the Chemical Master Equation, enabling large-scale structure learning for reaction networks by implicitly accounting for billions of topology variants. We have assessed the structure learning capabilities of the reactionet lasso on synthetic data for the complete TRAIL induced apoptosis signaling cascade comprising 70 reactions. We find that the reactionet lasso is able to efficiently recover the structure of these reaction systems, ab initio, with high sensitivity and specificity. With only < 1% false discoveries, the reactionet lasso is able to recover 45% of all true reactions ab initio among > 6000 possible reactions and over 10^2000 network topologies. In conjunction with information rich single cell technologies such as single cell RNA sequencing or mass cytometry, the reactionet lasso will enable large-scale structure learning, particularly in areas with partial network structure knowledge, such as cancer biology, and thereby enable the detection of pathological alterations of reaction networks. We provide software to allow for wide applicability of the reactionet lasso.
Subject(s)
Computational Biology/methods , Models, Biological , Single-Cell Analysis/methods , Apoptosis , Metabolic Networks and Pathways , Regression Analysis , Signal Transduction , Stochastic Processes , TNF-Related Apoptosis-Inducing Ligand
ABSTRACT
The mass spectrometric identification of chemically cross-linked peptides (CXMS) specifies spatial restraints of protein complexes; these values complement data obtained from common structure-determination techniques. Generic methods for determining false discovery rates of cross-linked peptide assignments are currently lacking, thus making data sets from CXMS studies inherently incomparable. Here we describe an automated target-decoy strategy and the software tool xProphet, which solve this problem for large multicomponent protein complexes.
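To make the target-decoy idea concrete, here is a minimal sketch of the generic estimate, in which the number of decoy matches above a score threshold approximates the number of false target matches. It is not xProphet's procedure: cross-linked peptide pairs also produce mixed target-decoy combinations that need dedicated handling, which this sketch ignores. The function name `target_decoy_fdr` and the toy scores are hypothetical.

```python
def target_decoy_fdr(matches, threshold):
    """Estimate the false discovery rate at a given score threshold.

    matches:   iterable of (score, is_decoy) pairs, one per assignment.
    threshold: assignments with score >= threshold count as accepted.

    Generic estimate (an assumption, not the xProphet formula):
    FDR ~ accepted decoys / accepted targets, capped at 1.
    """
    accepted = [(score, is_decoy) for score, is_decoy in matches
                if score >= threshold]
    n_decoys = sum(1 for _, is_decoy in accepted if is_decoy)
    n_targets = len(accepted) - n_decoys
    if n_targets == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, n_decoys / n_targets)


# Toy example: six scored assignments, two of them decoys.
matches = [(10, False), (9, False), (8, True),
           (7, False), (6, False), (5, True)]
fdr_loose = target_decoy_fdr(matches, threshold=6)   # 1 decoy / 4 targets
fdr_strict = target_decoy_fdr(matches, threshold=9)  # 0 decoys / 2 targets
```

Raising the threshold trades identifications for a lower estimated error rate, which is what makes results from different studies comparable at a fixed FDR.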
Subject(s)
Cross-Linking Reagents/chemistry , Mass Spectrometry/methods , Peptides/analysis , Peptides/chemistry , Proteomics/methods , Algorithms , Automation , Data Interpretation, Statistical , Databases, Protein , False Positive Reactions , Models, Molecular , Protein Conformation , Software
ABSTRACT
Single-cell technologies like mass cytometry enable researchers to comprehensively monitor signaling network responses in the context of heterogeneous cell populations. Cell-to-cell variability, the possibly nonlinear topology of signaling processes, and the destructive nature of mass cytometry necessitate nontrivial computational approaches to reconstruct and sensibly describe signaling dynamics. Modeling of signaling states depends on a set of coherent examples, that is, a set of cell events representing the same cell state. This requirement is frequently compromised by process asynchrony phenomena or nonlinear process topologies. We discuss various computational deconvolution approaches to define molecular process coordinates and enable compilation of coherent data sets for cell state inference. In addition to the conceptual presentation of these approaches, we discuss the application of these methods to modeling of TRAIL-induced apoptosis. Due to their generic applicability these computational approaches will contribute to the elucidation of dynamic intracellular signaling networks in various settings. The resulting signaling maps constitute a promising source for novel interventions and are expected to be particularly valuable in clinical settings.
Subject(s)
Cells/cytology , Cells/metabolism , Flow Cytometry , Signal Transduction , Animals , Apoptosis , Humans
ABSTRACT
Discovery or shotgun proteomics has emerged as the most powerful technique to comprehensively map out a proteome. Reconstruction of protein identities from the raw mass spectrometric data constitutes a cornerstone of any shotgun proteomics workflow. The inherent uncertainty of mass spectrometric data and the complexity of a proteome render protein inference and the statistical validation of protein identifications a non-trivial task that remains a subject of ongoing research. This review aims to survey the different conceptual approaches to the different tasks of inferring and statistically validating protein identifications and to discuss their implications on the scope of proteome exploration.
Subject(s)
Proteins/metabolism , Proteomics , Animals , Guidelines as Topic , Humans , Reproducibility of Results
ABSTRACT
For many research questions in modern molecular and systems biology, information about absolute protein quantities is imperative. This information includes, for example, kinetic modeling of processes, protein turnover determinations, stoichiometric investigations of protein complexes, or quantitative comparisons of different proteins within one sample or across samples. To date, the vast majority of proteomic studies are limited to providing relative quantitative comparisons of protein levels between limited numbers of samples. Here we describe and demonstrate the utility of a targeting MS technique for the estimation of absolute protein abundance in unlabeled and nonfractionated cell lysates. The method is based on selected reaction monitoring (SRM) mass spectrometry and the "best flyer" hypothesis, which assumes that the specific MS signal intensity of the most intense tryptic peptides per protein is approximately constant throughout a whole proteome. SRM-targeted best flyer peptides were selected for each protein from the peptide precursor ion signal intensities from directed MS data. The most intense transitions per peptide were selected from full MS/MS scans of crude synthetic analogs. We used Monte Carlo cross-validation to systematically investigate the accuracy of the technique as a function of the number of measured best flyer peptides and the number of SRM transitions per peptide. We found that a linear model based on the two most intense transitions of the three best flying peptides per protein (TopPep3/TopTra2) generated optimal results with a cross-correlated mean fold error of 1.8 and a squared Pearson coefficient R^2 of 0.88. Applying the optimized model to lysates of the microbe Leptospira interrogans, we detected significant protein abundance changes of 39 target proteins upon antibiotic treatment, which correlate well with literature values.
The described method is generally applicable and exploits the inherent performance advantages of SRM, such as high sensitivity, selectivity, reproducibility, and dynamic range, and estimates absolute protein concentrations of selected proteins at minimized costs.
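The TopPep3/TopTra2 summary can be sketched as follows, under stated assumptions: the two most intense transitions of each peptide are summed, the three peptides with the highest sums are averaged, and a straight line is fitted between the log-transformed signal and known log concentrations of anchor proteins. The aggregation choices (sum, then mean), the log transform, the anchor proteins, and the function names `top_signal` and `fit_line` are illustrative assumptions, not details given in the abstract.

```python
def top_signal(peptides, n_peptides=3, n_transitions=2):
    """Summarise one protein's SRM data as a single intensity value.

    peptides: dict mapping peptide sequence -> list of transition intensities.
    Sums the n_transitions most intense transitions per peptide, then
    averages the n_peptides largest sums (aggregation is an assumption).
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    per_peptide = [sum(sorted(intensities, reverse=True)[:n_transitions])
                   for intensities in peptides.values()]
    best = sorted(per_peptide, reverse=True)[:n_peptides]
    return sum(best) / len(best)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy protein with four peptides (made-up intensities).
protein = {"PEPA": [100, 50, 10], "PEPB": [80, 20],
           "PEPC": [30, 30, 30], "PEPD": [5, 1]}
signal = top_signal(protein)  # mean of 150, 100 and 60

# Toy calibration on log10 scale against hypothetical anchor proteins.
slope, intercept = fit_line([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Once the line is fitted on anchor proteins, the same slope and intercept would convert the summarised signal of any other protein into an abundance estimate.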