ABSTRACT
The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources such as the Gene Ontology and maintaining the resulting community resource network.
Subjects
Knowledge Bases, Metabolic Networks and Pathways, Signal Transduction, Humans, Metabolic Networks and Pathways/genetics, Proteome/genetics
ABSTRACT
MOTIVATION: ReactomeGSA is part of the Reactome knowledgebase and one of the leading multi-omics pathway analysis platforms. ReactomeGSA provides access to quantitative pathway analysis methods supporting different 'omics data types. Additionally, ReactomeGSA can process different datasets simultaneously, leading to a comparative pathway analysis that can also be performed across different species. RESULTS: We present a major update to the ReactomeGSA analysis platforms that greatly simplifies the reuse and direct integration of public data. In order to increase the number of available datasets, we developed the new grein_loader Python application that can directly fetch experiments from the GREIN resource. This enabled us to support both EMBL-EBI's Expression Atlas and GEO RNA-seq Experiments Interactive Navigator within ReactomeGSA. To further increase the visibility and simplify the reuse of public datasets, we integrated a novel search function into ReactomeGSA that enables users to search for public datasets across both supported resources. Finally, we completely re-developed ReactomeGSA's web frontend and R/Bioconductor package to support the new search and loading features, and greatly simplify the use of ReactomeGSA. AVAILABILITY AND IMPLEMENTATION: The new ReactomeGSA web frontend is available at https://www.reactome.org/gsa with a built-in, interactive tutorial. The ReactomeGSA R package (https://bioconductor.org/packages/release/bioc/html/ReactomeGSA.html) is available through Bioconductor and shipped with detailed documentation and vignettes. The grein_loader Python application is available through the Python Package Index (PyPI). The complete source code for all applications is available on GitHub at https://github.com/grisslab/grein_loader and https://github.com/reactome.
Subjects
Software, Humans, Computational Biology/methods, Knowledge Bases
ABSTRACT
BACKGROUND: Mycosis fungoides (MF), the most common cutaneous T-cell lymphoma, is often underdiagnosed in early stages because of similarities with benign dermatoses such as atopic dermatitis (AD). Furthermore, the delineation from what is called "parapsoriasis en plaque", a disease that can appear either in a small- or large-plaque form, is still controversial. OBJECTIVE: We sought to characterize the parapsoriasis disease spectrum. METHODS: We performed single-cell RNA sequencing of skin biopsies from patients within the parapsoriasis-to-early-stage MF spectrum, stratified for small and large plaques, and compared them to AD, psoriasis, and healthy control skin. RESULTS: Six of 8 large-plaque lesions harbored either an expanded alpha/beta or gamma/delta T-cell clone with downregulation of CD7 expression, consistent with a diagnosis of early-stage MF. In contrast, 6 of 7 small-plaque lesions were polyclonal in nature, thereby lacking a lymphomatous phenotype, and also revealed a less inflammatory microenvironment than early-stage MF or AD. Of note, polyclonal small- and large-plaque lesions characteristically harbored a population of NPY+ innate lymphoid cells and displayed a stromal signature of complement upregulation and antimicrobial hyperresponsiveness in fibroblasts and sweat gland cells, respectively. These conditions were clearly distinct from AD or psoriasis, which uniquely harbored CD3+CRTH2+ IL-13 expressing "TH2A" cells, or strong type 17 inflammation, respectively. CONCLUSION: These data position polyclonal small- and large-plaque parapsoriasis lesions as a separate disease entity that characteristically harbors a so far undescribed innate lymphoid cell population. We thus propose a new term, "polyclonal parapsoriasis en plaque", for this kind of lesion because they can be clearly differentiated from early- and advanced-stage MF, psoriasis, and AD on several cellular and molecular levels.
ABSTRACT
BACKGROUND: Malignant clones of primary cutaneous T-cell lymphomas (CTCL) can show a CD4, CD8 or TCR-γδ phenotype, but their individual impact on tumor biology and skin lesion formation remains ill-defined. OBJECTIVES: To perform a comprehensive molecular characterization of CD4+ vs. CD8+ and TCR-γ/δ+ CTCL lesions. METHODS: We performed scRNA-seq of 18 CTCL skin biopsies to compare classic CD4+ advanced-stage mycosis fungoides (MF) with TCR-γ/δ+MF and primary cutaneous CD8+ aggressive epidermotropic cytotoxic T-cell lymphoma (Berti's lymphoma). RESULTS: Malignant clones of TCR-γ/δ+MF and Berti's lymphoma showed similar clustering patterns distinct from CD4+MF, along with increased expression of cytotoxic markers such as NKG7, CTSW, GZMA, and GZMM. Only advanced-stage CD4+MF clones expressed central memory T-cell markers (SELL, CCR7, LEF1), alongside B1/B2 blood involvement, whereas TCR-γ/δ+MF and Berti's lymphoma harbored a more tissue-resident phenotype (CD69, CXCR4, NR4A1) without detectable cells in the blood. CD4+MF and TCR-γ/δ+MF skin lesions harbored strong type 2 immune activation across myeloid cells, while Berti's lymphoma was more skewed towards type 1 immune responses. Both CD4+MF and TCR-γ/δ+MF lesions showed upregulation of keratinocyte hyperactivation markers such as S100As and KRT16 genes. This increase was entirely absent in Berti's lymphoma, possibly reflecting an aberrant keratinocyte response to invading tumor cells, that could contribute to the formation of the typical ulcero-necrotic lesions within this entity. CONCLUSIONS: Our scRNAseq profiling study reveals specific molecular patterns associated with distinct CTCL subtypes.
ABSTRACT
The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.
Subjects
Antiviral Agents/pharmacology, Knowledge Bases, Proteins/metabolism, COVID-19/metabolism, Data Curation, Human Genome, Host-Pathogen Interactions, Humans, Proteins/genetics, Signal Transduction, Software
ABSTRACT
BACKGROUND: Chronic nodular prurigo (CNPG) is an inflammatory skin disease that is maintained by a chronic itch-scratch cycle likely rooted in neuroimmunological dysregulation. This condition may be associated with atopy in some patients, and there are now promising therapeutic results from blocking type 2 cytokines such as IL-4, IL-13, and IL-31. OBJECTIVES: This study aimed to improve the understanding of pathomechanisms underlying CNPG as well as molecular relationships between CNPG and atopic dermatitis (AD). METHODS: We profiled skin lesions from patients with CNPG in comparison with AD and healthy control individuals using single-cell RNA sequencing combined with T-cell receptor sequencing. RESULTS: We found type 2 immune skewing in both CNPG and AD, as evidenced by CD4+ helper T cells expressing IL13. However, only AD harbored an additional, oligoclonally expanded CD8A+IL9R+IL13+ cytotoxic T-cell population, and immune activation pathways were highly upregulated in AD, but less so in CNPG. Conversely, CNPG showed signatures of extracellular matrix organization, collagen synthesis, and fibrosis, including a unique population of CXCL14-IL24+ secretory papillary fibroblasts. Besides known itch mediators such as IL31 and oncostatin M, we also detected increased levels of neuromedin B in fibroblasts of CNPG lesions compared with AD and HC, with neuromedin B receptors detectable on some nerve endings. CONCLUSIONS: These data show that CNPG does not harbor the strong disease-specific immune activation pathways that are typically found in AD but is rather characterized by upregulated stromal remodeling mechanisms that might have a direct impact on itch fibers.
Subjects
Atopic Dermatitis, Prurigo, Humans, Prurigo/genetics, Interleukin-13, Pruritus, RNA Sequence Analysis
ABSTRACT
BACKGROUND AND OBJECTIVES: Mycosis fungoides (MF), the most common primary cutaneous T-cell lymphoma, is characterized by a variable clinical course, presenting either as indolent disease or showing fatal progression due to extracutaneous involvement. Importantly, the lack of prognostic models and predominantly palliative therapy settings hamper patient care. Here, we aimed to define survival rates, disease prediction accuracy, and treatment impact in MF. PATIENTS AND METHODS: One hundred forty MF patients were assessed retrospectively. Prognosis and disease progression/survival were analyzed using a univariate Cox proportional hazards regression model and Kaplan-Meier estimates. RESULTS: Skin tumors were linked to shorter progression-free and overall survival and a 3.48-fold increased risk for disease progression when compared to erythroderma. The Cutaneous Lymphoma International Prognostic Index identified patients at risk in early-stage disease only. Moreover, expression of Ki-67 >20%, CD30 >10%, CD20+, and CD7- were associated with a significantly worse outcome independent of disease stage. Only single-agent interferon-α and phototherapy combined with interferon-α or retinoids/bexarotene achieved long-term disease control in MF. CONCLUSIONS: Our data support predictive validity of prognostic factors and models in MF and identified further potential parameters associated with poor survival. Prospective studies on prognostic indices across disease stages and treatment modalities are needed to predict and improve survival.
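The Kaplan-Meier estimate named in the methods can be illustrated with a minimal sketch. This is a generic product-limit estimator in plain Python, not the authors' analysis code; the function name `kaplan_meier` and the toy follow-up data are hypothetical and not taken from the study.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. progression) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data (months); 0 marks a censored observation.
toy_times = [6, 12, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
for time, prob in kaplan_meier(toy_times, toy_events):
    print(f"t={time}: S(t)={prob:.2f}")
```

Censored patients reduce the number at risk without lowering the curve, which is why the estimator suits retrospective cohorts with unequal follow-up.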
Subjects
Mycosis Fungoides, Skin Neoplasms, Humans, Prognosis, Retrospective Studies, Prospective Studies, Mycosis Fungoides/diagnosis, Mycosis Fungoides/therapy, Treatment Outcome, Interferon-alpha, Disease Progression, Neoplasm Staging
ABSTRACT
BACKGROUND: Although ample knowledge exists about phenotype and function of cutaneous T lymphocytes, much less is known about the lymphocytic components of the skin's innate immune system. OBJECTIVE: To better understand the biologic role of cutaneous innate lymphoid cells (ILCs), we investigated their phenotypic and molecular features under physiologic (normal human skin [NHS]) and pathologic (lesional skin of patients with atopic dermatitis [AD]) conditions. METHODS: Skin punch biopsies and reduction sheets as well as blood specimens were obtained from either patients with AD or healthy individuals. Cell and/or tissue samples were analyzed by flow cytometry, immunohistochemistry, and single-cell RNA sequencing or subjected to in vitro/ex vivo culture. RESULTS: Notwithstanding substantial quantitative differences between NHS and AD skin, we found that the vast majority of cutaneous ILCs belong to the CRTH2+ subset and reside in the upper skin layers. Single-cell RNA sequencing of cutaneous ILC-enriched cell samples confirmed the predominance of biologically heterogeneous group 2 ILCs and, for the first time, demonstrated considerable ILC lineage infidelity (coexpression of genes typical of either type 2 [GATA3 and IL13] or type 3/17 [RORC, IL22, and IL26] immunity within individual cells) in lesional AD skin, and to a much lesser extent, in NHS. Similar events were demonstrated in ILCs from skin explant cultures and in vitro expanded ILCs from the peripheral blood. CONCLUSION: These findings support the concept that instead of being a stable entity with well-defined components, the skin immune system consists of a network of highly flexible cellular players that are capable of adjusting their function to the needs and challenges of the environment.
Subjects
Cell Lineage, Lymphocytes/immunology, Single-Cell Analysis/methods, Atopic Dermatitis/immunology, Flow Cytometry, Humans, Innate Immunity, Natural Killer Cells/immunology, RNA-Seq, Skin/immunology
ABSTRACT
BACKGROUND: Automatic cell type identification is essential to alleviate a key bottleneck in scRNA-seq data analysis. While most existing classification tools show good sensitivity and specificity, they often fail to adequately not-classify cells that are missing in the used reference. Additionally, many tools do not scale to the continuously increasing size of current scRNA-seq datasets. Therefore, additional tools are needed to solve these challenges. RESULTS: scAnnotatR is a novel R package that provides a complete framework to classify cells in scRNA-seq datasets using pre-trained classifiers. It supports both Seurat and Bioconductor's SingleCellExperiment and is thereby compatible with the vast majority of R-based analysis workflows. scAnnotatR uses hierarchically organised SVMs to distinguish a specific cell type versus all others. It shows comparable or even superior accuracy, sensitivity and specificity compared to existing tools while being able to not-classify unknown cell types. Moreover, scAnnotatR is the only of the best performing tools able to process datasets containing more than 600,000 cells. CONCLUSIONS: scAnnotatR is freely available on GitHub ( https://github.com/grisslab/scAnnotatR ) and through Bioconductor (from version 3.14). It is consistently among the best performing tools in terms of classification accuracy while scaling to the largest datasets.
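The idea of hierarchically organised one-versus-rest classifiers that can leave a cell unclassified can be sketched as follows. This is not scAnnotatR's API or implementation (the package is written in R and uses trained SVMs); the function `classify`, the 0.5 threshold, and the marker-based scoring functions standing in for trained classifiers are all assumptions made for illustration.

```python
def classify(cell, classifiers, threshold=0.5):
    """Assign every label whose classifier, and all of its ancestors, accept the cell.

    classifiers -- dict: label -> (parent_label_or_None, scoring_function)
    Returns ["unknown"] when no classifier accepts the cell.
    """
    accepted = {}

    def passes(label):
        if label not in accepted:
            parent, score = classifiers[label]
            parent_ok = parent is None or passes(parent)
            # A child classifier only counts if its parent accepted the cell.
            accepted[label] = parent_ok and score(cell) >= threshold
        return accepted[label]

    labels = [label for label in classifiers if passes(label)]
    return labels or ["unknown"]


# Hypothetical stand-ins for trained classifiers: each returns a score in [0, 1].
toy_classifiers = {
    "T cells": (None, lambda cell: cell.get("CD3E", 0.0)),
    "CD8 T cells": ("T cells", lambda cell: cell.get("CD8A", 0.0)),
}

print(classify({"CD3E": 0.9, "CD8A": 0.8}, toy_classifiers))
print(classify({"CD3E": 0.1, "CD8A": 0.9}, toy_classifiers))
```

Because each classifier answers only "this type or not", a cell that no classifier accepts falls through to "unknown" instead of being forced into the nearest reference type.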
Subjects
RNA, Single-Cell Analysis, RNA/genetics, RNA Sequence Analysis, Exome Sequencing
ABSTRACT
Spectrum clustering is a powerful strategy to minimize redundant mass spectra by grouping them based on similarity, with the aim of forming groups of mass spectra from the same repeatedly measured analytes. Each such group of near-identical spectra can be represented by its so-called consensus spectrum for downstream processing. Although several algorithms for spectrum clustering have been adequately benchmarked and tested, the influence of the consensus spectrum generation step is rarely evaluated. Here, we present an implementation and benchmark of common consensus spectrum algorithms, including spectrum averaging, spectrum binning, the most similar spectrum, and the best-identified spectrum. We have analyzed diverse public data sets using two different clustering algorithms (spectra-cluster and MaRaCluster) to evaluate how the consensus spectrum generation procedure influences downstream peptide identification. The BEST and BIN methods were found the most reliable methods for consensus spectrum generation, including for data sets with post-translational modifications (PTM) such as phosphorylation. All source code and data of the present study are freely available on GitHub at https://github.com/statisticalbiotechnology/representative-spectra-benchmark.
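A binning-style consensus spectrum can be sketched in a few lines. This is a simplified illustration, not the benchmarked implementation from the repository above; the function name `bin_consensus`, the 0.02 m/z bin width, and the 50% support threshold are assumptions.

```python
from collections import defaultdict


def bin_consensus(spectra, bin_width=0.02, min_fraction=0.5):
    """Build a consensus spectrum from a cluster of spectra.

    spectra -- list of spectra, each a list of (mz, intensity) peaks
    Peaks are grouped into fixed-width m/z bins; a bin is kept only if it
    holds peaks from at least min_fraction of the spectra.
    """
    bins = defaultdict(list)
    for index, spectrum in enumerate(spectra):
        for mz, intensity in spectrum:
            bins[int(mz / bin_width)].append((index, mz, intensity))

    consensus = []
    for key in sorted(bins):
        peaks = bins[key]
        support = len({index for index, _, _ in peaks})
        if support / len(spectra) >= min_fraction:
            mean_mz = sum(mz for _, mz, _ in peaks) / len(peaks)
            # Average over all spectra so poorly supported peaks are down-weighted.
            mean_intensity = sum(i for _, _, i in peaks) / len(spectra)
            consensus.append((mean_mz, mean_intensity))
    return consensus


# Hypothetical cluster of three near-identical spectra.
cluster = [
    [(100.001, 10.0), (200.0, 5.0)],
    [(100.003, 20.0)],
    [(100.002, 30.0), (300.0, 1.0)],
]
print(bin_consensus(cluster))
```

Fixed-width bins can split a peak that falls near a bin boundary, which is one reason the choice of consensus method may affect downstream identification.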
Subjects
Proteomics, Tandem Mass Spectrometry, Algorithms, Cluster Analysis, Consensus, Protein Databases, Proteomics/methods, Software, Tandem Mass Spectrometry/methods
ABSTRACT
The order of enzymatic activity across Golgi cisternae is essential for complex molecule biosynthesis. However, an inability to separate Golgi cisternae has meant that the cisternal distribution of most resident proteins, and their underlying localization mechanisms, are unknown. Here, we exploit differences in surface charge of intact cisternae to perform separation of early to late Golgi subcompartments. We determine protein and glycan abundance profiles across the Golgi; over 390 resident proteins are identified, including 136 new additions, with over 180 cisternal assignments. These assignments provide a means to better understand the functional roles of Golgi proteins and how they operate sequentially. Protein and glycan distributions are validated in vivo using high-resolution microscopy. Results reveal distinct functional compartmentalization among resident Golgi proteins. Analysis of transmembrane proteins shows several sequence-based characteristics relating to pI, hydrophobicity, Ser abundance, and Phe bilayer asymmetry that change across the Golgi. Overall, our results suggest that a continuum of transmembrane features, rather than discrete rules, guide proteins to earlier or later locations within the Golgi stack.
Subjects
Golgi Apparatus/metabolism, Plant Proteins/chemistry, Plant Proteins/metabolism, Golgi Apparatus/ultrastructure, Hydrophobic and Hydrophilic Interactions, Intracellular Membranes, Membrane Proteins/chemistry, Membrane Proteins/metabolism, Polysaccharides/chemistry, Polysaccharides/metabolism, Proteome
ABSTRACT
Pathway analyses are key methods to analyze 'omics experiments. Nevertheless, integrating data from different 'omics technologies and different species still requires considerable bioinformatics knowledge. Here we present the novel ReactomeGSA resource for comparative pathway analyses of multi-omics datasets. ReactomeGSA can be used through Reactome's existing web interface and the novel ReactomeGSA R Bioconductor package with explicit support for scRNA-seq data. Data from different species are automatically mapped to a common pathway space. Public data from ExpressionAtlas and Single Cell ExpressionAtlas can be directly integrated in the analysis. ReactomeGSA greatly reduces the technical barrier for multi-omics, cross-species, comparative pathway analyses. We used ReactomeGSA to characterize the role of B cells in anti-tumor immunity. We compared B cell-rich and B cell-poor human cancer samples from five of the Cancer Genome Atlas (TCGA) transcriptomics and two of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteomics studies. B cell-rich lung adenocarcinoma samples lacked the otherwise present activation through NFkappaB. This may be linked to the presence of a specific subset of tumor-associated IgG+ plasma cells that lack NFkappaB activation in scRNA-seq data from human melanoma. This showcases how ReactomeGSA can derive novel biomedical insights by integrating large multi-omics datasets.
Subjects
Genetic Databases, Proteomics, Software, B-Lymphocytes/immunology, Humans, Internet, User-Computer Interface
ABSTRACT
Prostate cancer (PCa) has a broad spectrum of clinical behavior; hence, biomarkers are urgently needed for risk stratification. Here, we aim to find potential biomarkers for risk stratification, by utilizing a gene co-expression network of transcriptomics data in addition to laser-microdissected proteomics from human and murine prostate FFPE samples. We show up-regulation of oxidative phosphorylation (OXPHOS) in PCa on the transcriptomic level and up-regulation of the TCA cycle/OXPHOS on the proteomic level, which is inversely correlated to STAT3 expression. We hereby identify gene expression of pyruvate dehydrogenase kinase 4 (PDK4), a key regulator of the TCA cycle, as a promising independent prognostic marker in PCa. PDK4 predicts disease recurrence independent of diagnostic risk factors such as grading, staging, and PSA level. Therefore, low PDK4 is a promising marker for PCa with dismal prognosis.
Subjects
Gene Expression Profiling/methods, Local Neoplasm Recurrence/genetics, Experimental Neoplasms/pathology, Prostatic Neoplasms/genetics, Proteomics/methods, Pyruvate Dehydrogenase Acetyl-Transferring Kinase/genetics, STAT3 Transcription Factor/genetics, Animals, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, Neoplastic Gene Expression Regulation, Humans, Laser Capture Microdissection, Male, Mice, Neoplasm Grading, Local Neoplasm Recurrence/metabolism, Local Neoplasm Recurrence/pathology, Experimental Neoplasms/genetics, Experimental Neoplasms/metabolism, Oxidative Phosphorylation, Prognosis, Prostatic Neoplasms/metabolism, Prostatic Neoplasms/pathology, Pyruvate Dehydrogenase Acetyl-Transferring Kinase/metabolism, STAT3 Transcription Factor/metabolism, Systems Biology, Young Adult
ABSTRACT
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data, and is one of the founding members of the global ProteomeXchange (PX) consortium. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update manuscript was published in Nucleic Acids Research in 2016. In the last 3 years, public data sharing through PRIDE (as part of PX) has definitely become the norm in the field. In parallel, re-use of public proteomics data has increased enormously, with multiple applications. We first describe the new architecture of PRIDE Archive, the archival component of PRIDE. PRIDE Archive and the related data submission framework have been further developed to support the increase in submitted data volumes and additional data types. A new scalable and fault-tolerant storage backend, Application Programming Interface and web interface have been implemented, as part of an ongoing process. Additionally, we emphasize the improved support for quantitative proteomics data through the mzTab format. Finally, we outline key statistics on the current data contents and volume of downloads, and how PRIDE data are starting to be disseminated to added-value resources including Ensembl, UniProt and Expression Atlas.
Subjects
Protein Databases, Mass Spectrometry, Proteomics, Peptides/chemistry, Software
ABSTRACT
Reproducibility has become a major concern in biomedical research. In proteomics, bioinformatic workflows can quickly consist of multiple software tools each with its own set of parameters. Their usage involves the definition of often hundreds of parameters as well as data operations to ensure tool interoperability. Hence, a manuscript's methods section is often insufficient to completely describe and reproduce a data analysis workflow. Here we present IsoProt: A complete and reproducible bioinformatic workflow deployed on a portable container environment to analyze data from isobarically labeled, quantitative proteomics experiments. The workflow uses only open source tools and provides a user-friendly and interactive browser interface to configure and execute the different operations. Once the workflow is executed, the results including the R code to perform statistical analyses can be downloaded as an HTML document providing a complete record of the performed analyses. IsoProt therefore represents a reproducible bioinformatics workflow that will yield identical results on any computer platform.
Subjects
Isotope Labeling, Proteome/analysis, Proteomics/methods, Software, Tandem Mass Spectrometry, Animals, Factual Databases, Cerebral Malaria/metabolism, Mice, Proteome/chemistry, Proteome/metabolism, Reproducibility of Results
ABSTRACT
Label-free quantification has become common practice in many mass spectrometry-based proteomics experiments. In recent years, we and others have shown that spectral clustering can considerably improve the analysis of (primarily large-scale) proteomics data sets. Here we show that spectral clustering can be used to infer additional peptide-spectrum matches and improve the quality of label-free quantitative proteomics data even in data sets containing only tens of MS runs. We analyzed four well-known public benchmark data sets that represent different experimental settings using spectral counting and peak-intensity-based label-free quantification. In both approaches, the peptide-spectrum matches additionally inferred through our spectra-cluster algorithm improved the detectability of low-abundance proteins while increasing the accuracy of the derived quantitative data, without increasing the data sets' noise. Additionally, we developed a Proteome Discoverer node for our spectra-cluster algorithm which allows anyone to rebuild our proposed pipeline using the free version of Proteome Discoverer.
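Inferring additional peptide-spectrum matches from a cluster can be sketched as a simple majority rule. This is an illustrative assumption about how such inference might work, not the spectra-cluster algorithm or its Proteome Discoverer node; the function `infer_psms` and both thresholds are hypothetical.

```python
from collections import Counter


def infer_psms(cluster, min_ratio=0.7, min_identified=2):
    """Propagate the dominant peptide of a cluster to its unidentified spectra.

    cluster -- list of (spectrum_id, peptide_or_None)
    Returns {spectrum_id: peptide} for the newly inferred matches only.
    """
    identified = [peptide for _, peptide in cluster if peptide is not None]
    if len(identified) < min_identified:
        return {}  # too little evidence to trust the cluster
    peptide, count = Counter(identified).most_common(1)[0]
    if count / len(identified) < min_ratio:
        return {}  # identifications disagree; do not propagate
    return {sid: peptide for sid, existing in cluster if existing is None}


# Hypothetical cluster: two agreeing identifications and one unidentified spectrum.
toy_cluster = [("s1", "PEPTIDEK"), ("s2", "PEPTIDEK"), ("s3", None)]
print(infer_psms(toy_cluster))
```

Requiring agreement among the identified members keeps mixed clusters from spreading a wrong identification, at the cost of inferring fewer matches.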
Subjects
Cluster Analysis, Mass Spectrometry/methods, Proteome/analysis, Proteomics/methods, Algorithms, Protein Databases, Humans
ABSTRACT
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The PRoteomics IDEntifications database (PRIDE) Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to consistently characterize three distinct groups of spectra: 1) incorrectly identified spectra, 2) spectra correctly identified but below the set scoring threshold, and 3) truly unidentified spectra. Using a multitude of complementary analysis approaches, we were able to identify less than 20% of the consistently unidentified spectra. The complete spectrum clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
ABSTRACT
In this article, current and future applications of spectral clustering are discussed in the context of mass spectrometry-based proteomics approaches. First, the main algorithms and tools that can currently be used to perform spectral clustering are introduced. In addition, its main applications and their use in current computational proteomics workflows are explained, including the generation of spectral libraries and spectral archives. Finally, possible future directions for spectral clustering are discussed, including its potential use to achieve a deeper coverage of the proteome and the discovery of novel post-translational modifications and single amino acid variants.
Subjects
Algorithms, Cluster Analysis, Proteomics/methods, Spectrum Analysis/methods, Protein Databases, Humans, Proteome/analysis
ABSTRACT
In the recent benchmarking article entitled "Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra", Rieder et al. compared several different approaches to cluster MS/MS spectra. While we certainly recognize the value of the manuscript, here, we report some shortcomings detected in the original analyses. For most analyses, the authors clustered only single MS/MS runs. In one of the reported analyses, three MS/MS runs were processed together, which already led to computational performance issues in many of the tested approaches. This fact highlights the difficulties of using many of the tested algorithms on the nowadays produced average proteomics data sets. Second, the authors only processed identified spectra when merging MS runs. Thereby, all unidentified spectra that are of lower quality were already removed from the data set and could not influence the clustering results. Next, we found that the authors did not analyze the effect of chimeric spectra on the clustering results. In our analysis, we found that 3% of the spectra in the used data sets were chimeric, and this had marked effects on the behavior of the different clustering algorithms tested. Finally, the authors' choice to evaluate the MS-Cluster and spectra-cluster algorithms using a precursor tolerance of 5 Da for high-resolution Orbitrap data only was, in our opinion, not adequate to assess the performance of MS/MS clustering approaches.
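The scale of a 5 Da precursor tolerance is easier to judge when expressed in parts per million. The conversion below is standard arithmetic; the example precursor m/z of 500 and the 10 ppm comparison window are hypothetical values chosen for illustration, not figures from the article.

```python
def ppm_to_da(mz, ppm):
    """Absolute tolerance in Da that corresponds to a relative tolerance in ppm."""
    return mz * ppm / 1e6


def da_to_ppm(mz, da):
    """Relative tolerance in ppm that corresponds to an absolute tolerance in Da."""
    return da / mz * 1e6


# Hypothetical precursor at m/z 500.
precursor_mz = 500.0
print(f"5 Da at m/z {precursor_mz} is {da_to_ppm(precursor_mz, 5.0):.0f} ppm")
print(f"10 ppm at m/z {precursor_mz} is {ppm_to_da(precursor_mz, 10.0):.4f} Da")
```

At this m/z, a 5 Da window is three orders of magnitude wider than a 10 ppm window, so many unrelated precursors become candidates for the same cluster.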
Subjects
Algorithms, Tandem Mass Spectrometry, Benchmarking, Cluster Analysis, Proteomics
ABSTRACT
The 2017 Dagstuhl Seminar on Computational Proteomics provided an opportunity for a broad discussion on the current state and future directions of the generation and use of peptide tandem mass spectrometry spectral libraries. Their use in proteomics is growing slowly, but there are multiple challenges in the field that must be addressed to further increase the adoption of spectral libraries and related techniques. The primary bottlenecks are the paucity of high quality and comprehensive libraries and the general difficulty of adopting spectral library searching into existing workflows. There are several existing spectral library formats, but none captures a satisfactory level of metadata; therefore, a logical next improvement is to design a more advanced, Proteomics Standards Initiative-approved spectral library format that can encode all of the desired metadata. The group discussed a series of metadata requirements organized into three designations of completeness or quality, tentatively dubbed bronze, silver, and gold. The metadata can be organized at four different levels of granularity: at the collection (library) level, at the individual entry (peptide ion) level, at the peak (fragment ion) level, and at the peak annotation level. Strategies for encoding mass modifications in a consistent manner and the requirement for encoding high-quality and commonly seen but as-yet-unidentified spectra were discussed. The group also discussed related topics, including strategies for comparing two spectra, techniques for generating representative spectra for a library, approaches for selection of optimal signature ions for targeted workflows, and issues surrounding the merging of two or more libraries into one. We present here a review of this field and the challenges that the community must address in order to accelerate the adoption of spectral libraries in routine analysis of proteomics datasets.
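The four levels of metadata granularity described above (library, entry, peak, peak annotation) map naturally onto a nested data structure. The sketch below is only one possible layout; all class and field names are hypothetical and do not represent the Proteomics Standards Initiative format or any existing library format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeakAnnotation:
    """Peak annotation level, e.g. a fragment ion label."""
    label: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Peak:
    """Peak (fragment ion) level."""
    mz: float
    intensity: float
    annotations: List[PeakAnnotation] = field(default_factory=list)


@dataclass
class LibraryEntry:
    """Individual entry (peptide ion) level."""
    peptide_ion: str
    peaks: List[Peak] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SpectralLibrary:
    """Collection (library) level."""
    name: str
    entries: List[LibraryEntry] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


# Hypothetical one-entry library with made-up values.
library = SpectralLibrary(
    name="demo-library",
    attributes={"completeness": "bronze"},
    entries=[
        LibraryEntry(
            peptide_ion="PEPTIDEK/2",
            peaks=[Peak(mz=175.119, intensity=100.0,
                        annotations=[PeakAnnotation(label="y1")])],
        )
    ],
)
print(library.entries[0].peaks[0].annotations[0].label)
```

Keeping free-form attributes at each level lets a library satisfy a lower completeness tier first and add metadata later without changing the structure.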