Results 1 - 20 of 46
1.
Mol Cell Proteomics ; 23(4): 100743, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38403075

ABSTRACT

Discovering noncanonical peptides has been a common application of proteogenomics. Recent studies suggest that certain noncanonical peptides, known as noncanonical major histocompatibility complex-I (MHC-I)-associated peptides (ncMAPs), that bind to MHC-I may make good immunotherapeutic targets. De novo peptide sequencing is a great way to find ncMAPs since it can detect peptide sequences from their tandem mass spectra without using any sequence databases. However, this strategy has not been widely applied for ncMAP identification because there is not a good way to estimate its false-positive rates. In order to completely and accurately identify immunopeptides using de novo peptide sequencing, we describe a unique pipeline called proteomics X genomics. In contrast to current pipelines, it makes use of genomic data, RNA-Seq abundance and sequencing quality, in addition to proteomic features to increase the sensitivity and specificity of peptide identification. We show that the peptide-spectrum match quality and genetic traits have a clear relationship, showing that they can be utilized to evaluate peptide-spectrum matches. From 10 samples, we found 24,449 canonical MHC-I-associated peptides and 956 ncMAPs by using a target-decoy competition. Three hundred eighty-seven ncMAPs and 1611 canonical MHC-I-associated peptides were new identifications that had not yet been published. We discovered 11 ncMAPs produced from a squirrel monkey retrovirus in human cell lines in addition to the two ncMAPs originating from a complementarity determining region 3 in an antibody thanks to the unrestricted search space assumed by de novo sequencing. These entirely new identifications show that proteomics X genomics can make the most of de novo peptide sequencing's advantages and its potential use in the search for new immunotherapeutic targets.


Subject(s)
Histocompatibility Antigens Class I, Peptides, Peptides/metabolism, Peptides/chemistry, Histocompatibility Antigens Class I/genetics, Histocompatibility Antigens Class I/metabolism, Humans, Proteomics/methods, RNA-Seq/methods, Animals
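
The target-decoy idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each spectrum has already been reduced to its single best match, recorded as a (score, is_decoy) pair with higher scores being better; the function names and the data layout are hypothetical and are not taken from the published pipeline.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among target matches scoring at or above `threshold`.

    `psms` is a list of (score, is_decoy) pairs, one per spectrum,
    where higher scores indicate better matches (an assumption of this sketch).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    # The number of accepted decoys approximates the number of false targets.
    return min(1.0, decoys / targets)


def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is <= max_fdr, or None."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best
```

Calling `score_cutoff_at_fdr(psms, 0.01)` on such a list gives the most permissive cutoff that still meets a 1% estimated FDR.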
2.
Bioinformatics ; 39(12), 2023 12 01.
Article in English | MEDLINE | ID: mdl-37995286

ABSTRACT

MOTIVATION: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures. RESULTS: Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups. AVAILABILITY AND IMPLEMENTATION: DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.


Subject(s)
Proteins, Software, Protein Conformation, Proteins/chemistry, Secondary Protein Structure, Protein Folding
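
The GDT-TS score quoted in the abstract above is conventionally the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of residues whose Cα atoms lie within the cutoff of the reference structure. The sketch below assumes the per-residue distances come from one fixed superposition; the full measure searches over many superpositions for the best fraction at each cutoff, which this simplified version does not attempt.

```python
def gdt_ts(distances):
    """Simplified GDT-TS from per-residue C-alpha distances (in angstroms).

    Assumes `distances` were measured after a single, fixed superposition
    of model and reference; this is a simplification of the full procedure.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    if n == 0:
        raise ValueError("at least one residue distance is required")
    # Fraction of residues within each cutoff, then the mean as a percentage.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```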
3.
Anal Chem ; 95(46): 16918-16926, 2023 11 21.
Article in English | MEDLINE | ID: mdl-37946317

ABSTRACT

To gain a better understanding of the complex human immune system, it is necessary to measure and interpret numerous cellular protein expressions at the single cell level. Mass cytometry is a relatively new technology that offers unprecedented information about the protein expression of a single cell. However, the analysis of high-dimensional and multiparametric mass cytometric data sets presents a new computational challenge. For instance, conventional "manual gating" analysis is inefficient and unreliable for multiparametric phenotyping of the heterogeneous immune cellular system; consequently, automated methods have been developed to address the high dimensionality of mass cytometry data and enhance the reproducibility of the analysis. Here, we present CyGate, a semiautomated method for classifying single cells into their respective cell types. CyGate learns a gating strategy from a reference data set, trains a model for cell classification, and then automatically analyzes additional data sets using the trained model. CyGate also supports the machine learning framework for the classification of "ungated" cells, which are typically disregarded by automated methods. CyGate's utility was demonstrated by its high performance in cell type classification and the lowest generalization error on various public data sets when compared to the state-of-the-art semiautomated methods. Notably, CyGate had the shortest execution time, allowing it to scale with a growing number of samples. CyGate is available at https://github.com/seungjinna/cygate.


Subject(s)
Computational Biology, Machine Learning, Humans, Flow Cytometry/methods, Reproducibility of Results, Computational Biology/methods, Algorithms
4.
Anal Chem ; 95(30): 11193-11200, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37459568

ABSTRACT

Predicting peptide detectability is useful in a variety of mass spectrometry (MS)-based proteomics applications, particularly targeted proteomics. However, most machine learning-based computational methods have relied solely on information from the peptide itself, such as its amino acid sequences or physicochemical properties, despite the fact that peptides detected by MS are dependent on many factors, including protein sample preparation, digestion, separation, ionization, and precursor selection during MS experiments. DbyDeep (Detectability by Deep learning) is an innovative end-to-end LSTM network model for peptide detectability prediction that incorporates sequence contexts of peptides and their cleavage sites (by protease). Utilizing the cleavage site contexts could improve the performance of prediction, and DbyDeep outperformed existing methods in predicting peptides recognizable from multiple MS/MS data sets with diverse species and MS instruments. We argue for the necessity of a learning model that encompasses several contexts associated with peptide detection, as opposed to depending just on peptide sequences. There is a Python implementation of DbyDeep at https://github.com/BISCodeRepo/DbyDeep.


Subject(s)
Deep Learning, Tandem Mass Spectrometry, Peptides/chemistry, Proteins, Amino Acid Sequence
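
To make the notion of cleavage-site context in the abstract above concrete, here is a minimal sketch of an in-silico tryptic digestion that keeps the residues flanking each peptide. The window size `flank`, the padding character, and the function name are illustrative assumptions; the actual input encoding used by DbyDeep is not described in the abstract and is not reproduced here.

```python
import re


def tryptic_peptides_with_context(protein, flank=3, pad="-"):
    """Yield (n_context, peptide, c_context) for fully tryptic peptides.

    Cleavage is assumed after K or R unless the next residue is P.
    `flank` residues on each side are kept as context, padded with `pad`
    at the protein termini (both are choices made for this sketch).
    """
    sites = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    if sites[-1] != len(protein):
        sites.append(len(protein))
    padded = pad * flank + protein + pad * flank
    for start, end in zip(sites, sites[1:]):
        peptide = protein[start:end]
        # Position i in `protein` is position i + flank in `padded`.
        n_context = padded[start:start + flank]
        c_context = padded[end + flank:end + 2 * flank]
        yield n_context, peptide, c_context
```

Each yielded triple could then be encoded and passed to a sequence model, so that the model sees the peptide together with the residues around both cleavage sites.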
6.
Nat Cancer ; 4(2): 290-307, 2023 02.
Article in English | MEDLINE | ID: mdl-36550235

ABSTRACT

We report a proteogenomic analysis of pancreatic ductal adenocarcinoma (PDAC). Mutation-phosphorylation correlations identified signaling pathways associated with somatic mutations in significantly mutated genes. Messenger RNA-protein abundance correlations revealed potential prognostic biomarkers correlated with patient survival. Integrated clustering of mRNA, protein and phosphorylation data identified six PDAC subtypes. Cellular pathways represented by mRNA and protein signatures, defining the subtypes and compositions of cell types in the subtypes, characterized them as classical progenitor (TS1), squamous (TS2-4), immunogenic progenitor (IS1) and exocrine-like (IS2) subtypes. Compared with the mRNA data, protein and phosphorylation data further classified the squamous subtypes into activated stroma-enriched (TS2), invasive (TS3) and invasive-proliferative (TS4) squamous subtypes. Orthotopic mouse PDAC models revealed a higher number of pro-tumorigenic immune cells in TS4, inhibiting T cell proliferation. Our proteogenomic analysis provides significantly mutated genes/biomarkers, cellular pathways and cell types as potential therapeutic targets to improve stratification of patients with PDAC.


Subject(s)
Pancreatic Ductal Carcinoma, Squamous Cell Carcinoma, Pancreatic Neoplasms, Proteogenomics, Animals, Mice, Humans, Pancreatic Neoplasms/genetics, Pancreatic Ductal Carcinoma/genetics, Biomarkers, Pancreatic Neoplasms
7.
Bioinformatics ; 38(11): 2980-2987, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35441674

ABSTRACT

MOTIVATION: Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS: With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia and breast cancer phosphoproteomes. AVAILABILITY AND IMPLEMENTATION: Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.


Subject(s)
Phosphopeptides, Tandem Mass Spectrometry, Humans, Phosphopeptides/analysis, Tandem Mass Spectrometry/methods, Algorithms, Factual Databases, Software, Protein Databases
8.
BMC Bioinformatics ; 23(1): 109, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-35354356

ABSTRACT

BACKGROUND: In shotgun proteomics, database search engines have been developed to assign peptides to tandem mass (MS/MS) spectra and at the same time post-processing (or rescoring) approaches over the search results have been proposed to increase the number of confident peptide identifications. The most popular post-processing approaches such as Percolator and PeptideProphet have improved rates of peptide identifications by combining multiple scores from database search engines while applying machine learning techniques. Existing post-processing approaches, however, are limited when dealing with results from new search engines because their features for machine learning must be optimized specifically for each search engine. RESULTS: We propose a universal post-processing tool, called TIDD, which supports confident peptide identifications regardless of the search engine adopted. TIDD can work for any (including newly developed) search engines because it calculates universal features that assess peptide-spectrum match quality while it allows additional features provided by search engines (or users) as well. Even though it relies on universal features independent of search tools, TIDD showed similar or better performance than Percolator in terms of peptide identification. TIDD identified 10.23-38.95% more PSMs than target-decoy estimation for MSFragger, which is not supported by Percolator. TIDD offers an easy-to-use simple graphical user interface for user convenience. CONCLUSIONS: TIDD successfully eliminated the requirement for an optimal feature engineering per database search tool, and thus, can be applied directly to any database search results including newly developed ones.


Subject(s)
Algorithms, Tandem Mass Spectrometry, Protein Databases, Machine Learning, Peptides, Tandem Mass Spectrometry/methods
9.
Int J Mol Sci ; 21(18), 2020 Sep 05.
Article in English | MEDLINE | ID: mdl-32899552

ABSTRACT

β/γ-Crystallins, the main structural proteins in the human lens, have a highly stable structure that keeps the lens transparent. Mutations in these proteins have been linked to cataracts. In this study, we identified 10 new mutations of β/γ-crystallins in a lens proteomic dataset from cataract patients using bioinformatics tools. Of these, two double mutants, S175G/H181Q of βB2-crystallin and P24S/S31G of γD-crystallin, carried mutations in the largest loop linking the distant β-sheets in the Greek key motif. We selected these double mutants to characterize the mutations, employing biochemical assays, identification of protein modifications with nanoUPLC-ESI-TOF tandem MS, and examination of their structural dynamics with hydrogen/deuterium exchange-mass spectrometry (HDX-MS). We found that both double mutations decrease protein stability and induce the aggregation of β/γ-crystallin, possibly causing cataracts. This finding suggests that both double mutants can serve as biomarkers of cataracts.


Subject(s)
Cataract/genetics, beta-Crystallin B Chain/genetics, gamma-Crystallins/genetics, Adolescent, Adult, Aged, Preschool Child, Humans, Newborn Infant, Crystalline Lens/metabolism, Mutation/genetics, Protein Aggregates/genetics, Protein Stability, Proteomics/methods, beta-Crystallin B Chain/metabolism, gamma-Crystallins/metabolism
10.
Bioinformatics ; 36(Suppl_1): i203-i209, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657416

ABSTRACT

MOTIVATION: Proteogenomics has proven its utility by integrating genomics and proteomics. Typical approaches use data from next-generation sequencing to infer proteins expressed. A sample-specific protein sequence database is often adopted to identify novel peptides from matched mass spectrometry-based proteomics; nevertheless, there is no software that can practically identify all possible forms of mutated peptides suggested by various genomic information sources. RESULTS: We propose MutCombinator, which enables us to practically identify mutated peptides from tandem mass spectra allowing combinatorial mutations during the database search. It uses an upgraded version of a variant graph, keeping track of frame information. The variant graph is indexed by nine nucleotides for fast access. Using MutCombinator, we could identify more mutated peptides than previous methods, because combinations of point mutations are considered and also because it can be practically applied together with a large mutation database such as COSMIC. Furthermore, MutCombinator supports in-frame search for coding regions and three-frame search for non-coding regions. AVAILABILITY AND IMPLEMENTATION: https://prix.hanyang.ac.kr/download/mutcombinator.jsp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Nucleotides, Peptides, Protein Databases, Mutation, Peptides/genetics, Proteomics, Software
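
The combinatorial aspect described in the abstract above, where several point mutations may occur in the same peptide, can be illustrated by brute-force enumeration. This is only a sketch of the search space: MutCombinator itself is described as using an indexed variant graph, which this example does not implement, and the function name and input format are hypothetical.

```python
from itertools import combinations, product


def mutated_peptides(peptide, substitutions, max_mutations=2):
    """Enumerate peptide variants carrying 1..max_mutations substitutions.

    `substitutions` maps a 0-based position to a list of alternative
    residues (a hypothetical input format chosen for this sketch).
    """
    variants = set()
    positions = sorted(p for p in substitutions if 0 <= p < len(peptide))
    for k in range(1, max_mutations + 1):
        for chosen in combinations(positions, k):
            # Try every combination of alternative residues at the chosen sites.
            for residues in product(*(substitutions[p] for p in chosen)):
                seq = list(peptide)
                for pos, res in zip(chosen, residues):
                    seq[pos] = res
                variants.add("".join(seq))
    variants.discard(peptide)  # drop any "variant" identical to the reference
    return sorted(variants)
```

The number of variants grows quickly with the number of mutable positions, which is one reason a graph representation is attractive for large mutation databases.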
11.
Comput Struct Biotechnol J ; 18: 1391-1402, 2020.
Article in English | MEDLINE | ID: mdl-32637038

ABSTRACT

Mass spectrometry (MS) has made enormous contributions to comprehensive protein identification and quantification in proteomics. MS is also gaining momentum for structural biology in a variety of ways, complementing conventional structural biology techniques. Here, we will review how MS-based techniques, such as hydrogen/deuterium exchange, covalent labeling, and chemical cross-linking, enable the characterization of protein structure, dynamics, and interactions, especially from a perspective of their data analyses. Structural information encoded by chemical probes in intact proteins is decoded by interpreting MS data at a peptide level, i.e., revealing conformational and dynamic changes in local regions of proteins. The structural MS data are not amenable to data analyses in traditional proteomics workflow, requiring dedicated software for each type of data. We first provide basic principles of data interpretation, including isotopic distribution and peptide sequencing. We then focus particularly on computational methods for structural MS data analyses and discuss outstanding challenges in a proteome-wide large scale analysis.

12.
Nat Commun ; 11(1): 3288, 2020 07 03.
Article in English | MEDLINE | ID: mdl-32620753

ABSTRACT

The prognostic and therapeutic relevance of molecular subtypes for the most aggressive isocitrate dehydrogenase 1/2 (IDH) wild-type glioblastoma (GBM) is currently limited due to high molecular heterogeneity of the tumors that impedes patient stratification. Here, we describe a distinct binary classification of IDH wild-type GBM tumors derived from a quantitative proteomic analysis of 39 IDH wild-type GBMs as well as IDH mutant and low-grade glioma controls. Specifically, GBM proteomic cluster 1 (GPC1) tumors exhibit Warburg-like features, neural stem-cell markers, immune checkpoint ligands, and a poor prognostic biomarker, FKBP prolyl isomerase 9 (FKBP9). Meanwhile, GPC2 tumors show elevated oxidative phosphorylation-related proteins, differentiated oligodendrocyte and astrocyte markers, and a favorable prognostic biomarker, phosphoglycerate dehydrogenase (PHGDH). Integrating these proteomic features with the pharmacological profiles of matched patient-derived cells (PDCs) reveals that the mTORC1/2 dual inhibitor AZD2014 is cytotoxic to the poor prognostic PDCs. Our analyses will guide GBM prognosis and precision treatment strategies.


Subject(s)
Tumor Biomarkers/metabolism, Brain Neoplasms/metabolism, Glioblastoma/metabolism, Isocitrate Dehydrogenase/genetics, Proteogenomics/methods, Proteomics/methods, Benzamides/pharmacology, Tumor Biomarkers/genetics, Brain Neoplasms/genetics, Brain Neoplasms/therapy, Tumor Cell Line, Cell Survival/drug effects, Cell Survival/genetics, Glioblastoma/genetics, Glioblastoma/therapy, Humans, Isocitrate Dehydrogenase/classification, Isocitrate Dehydrogenase/metabolism, Kaplan-Meier Estimate, Mechanistic Target of Rapamycin Complex 1/antagonists & inhibitors, Mechanistic Target of Rapamycin Complex 1/metabolism, Mechanistic Target of Rapamycin Complex 2/antagonists & inhibitors, Mechanistic Target of Rapamycin Complex 2/metabolism, Morpholines/pharmacology, Mutation, Prognosis, Protein Kinase Inhibitors/pharmacology, Pyrimidines/pharmacology
13.
J Proteome Res ; 19(1): 212-220, 2020 01 03.
Article in English | MEDLINE | ID: mdl-31714086

ABSTRACT

Recent sequencing technologies have highlighted translation of untranslated regions (UTRs) in genomes, although it remains unknown whether the translated products persist in a cell. Here, we propose a proteogenomic approach to UTR identification at the proteome level, which has been challenging due to the lack of corresponding sequences required for peptide-spectrum matching. We address the challenge by constructing a translated UTR (tUTR) database, consisting of all hypothetical sequences that can be translated from UTRs by assuming non-AUG initiation at near-cognate start codons and stop codon readthrough. In the analysis of the H1299 cell line tandem mass spectrometry (MS/MS) dataset, the tUTR DB-based proteogenomic approach enabled the detection of 52 5'-UTR and 9 3'-UTR peptides from 45 and 9 genes, respectively. The identified UTR peptides were validated via high spectral similarity with their synthetic peptides. The 5'-UTR peptides indicated alternative initiation sites with non-AUG start codons, which exactly conformed to Kozak contexts of annotated initiation sites. It is also worth noting that our approach can detect translated amino acid sequences as well as provide evidence for UTR translation, while ribosome profiling provides only the translation evidence. For previously reported stop codon readthrough in the MDH1 gene, we could confirm the amino acid inserted during the readthrough. Data are available via ProteomeXchange with identifier PXD016207.


Subject(s)
Proteogenomics, Initiator Codon, Peptides/genetics, Tandem Mass Spectrometry, Untranslated Regions
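
The construction of hypothetical translations from near-cognate start codons, as described in the abstract above, can be sketched as follows. Near-cognate codons are taken here to be those differing from ATG at exactly one position. Writing the initiating residue as M is an assumption of this sketch, as are the function names; stop codon readthrough is not modeled.

```python
BASES = "TCAG"
AMINOS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINOS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Codons that differ from ATG at exactly one nucleotide.
NEAR_COGNATE = {
    codon for codon in CODON_TABLE
    if sum(x != y for x, y in zip(codon, "ATG")) == 1
}


def near_cognate_starts(seq):
    """Return 0-based positions of near-cognate start codons in a DNA sequence."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] in NEAR_COGNATE]


def translate_from(seq, start):
    """Translate from `start` to the first in-frame stop codon or the sequence end.

    The first codon is written as M regardless of its sequence
    (an assumption of this sketch about non-AUG initiation).
    """
    residues = []
    for i in range(start, len(seq) - 2, 3):
        amino = CODON_TABLE[seq[i:i + 3]]
        if amino == "*":
            break
        residues.append("M" if i == start else amino)
    return "".join(residues)
```

Translating from every position returned by `near_cognate_starts` yields the kind of hypothetical sequence set that a database search could then be run against.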
14.
J Proteome Res ; 18(10): 3800-3806, 2019 10 04.
Article in English | MEDLINE | ID: mdl-31475827

ABSTRACT

We propose to use cRFP (common Repository of FBS Proteins) in the MS (mass spectrometry) raw data search of cell secretomes. cRFP is a small supplementary sequence list of highly abundant fetal bovine serum proteins added to the reference database in use. The aim behind using cRFP is to prevent the contaminant FBS proteins from being misidentified as other proteins in the reference database, just as we would use cRAP (common Repository of Adventitious Proteins) to prevent contaminant proteins present either by accident or through unavoidable contacts from being misidentified as other proteins. We expect it to be widely used in experiments where the proteins are obtained from serum-free media after thorough washing of the cells, from complex media such as SILAC, or from extracellular vesicles directly.


Subject(s)
Cultured Cells/metabolism, Proteome/analysis, Proteomics/methods, Serum/chemistry, Animals, Cattle, Culture Media/chemistry, Protein Databases, Humans, Mass Spectrometry
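
Adding a supplementary contaminant list to a reference database, as the abstract above proposes, amounts to merging two FASTA files before the search. The sketch below shows one way to do that in memory; the `cRFP_` header prefix and the rule of skipping sequences already present in the reference are assumptions made for illustration, not part of the published resource.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def append_contaminants(reference, contaminants, tag="cRFP_"):
    """Return reference records followed by tagged contaminant records.

    Contaminant sequences identical to a sequence already in the list
    are skipped; the tag makes contaminant hits easy to filter afterwards.
    """
    seen = {seq for _, seq in reference}
    merged = list(reference)
    for header, seq in contaminants:
        if seq not in seen:
            merged.append((tag + header, seq))
            seen.add(seq)
    return merged
```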
15.
Anal Chem ; 91(17): 11324-11333, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31365238

ABSTRACT

Post-translational modifications regulate various cellular processes and are of great biological interest. Unrestrictive searches of mass spectrometry data enable the detection of any type of modification. Here we propose MODplus, which makes practical unrestrictive searches possible by allowing (1) hundreds of modifications, (2) multiple modifications per peptide, (3) the whole proteome database, and (4) any tolerant values in search parameters. The utility of MODplus was demonstrated in large human data sets of HEK293 cells and TMT-labeled phosphorylation enrichment. Notably, MODplus supports identifying different modification types at multiple sites and reports real chemical and biological modifications, as it has been very labor intensive to link unrestrictive search results to real modifications. We also confirmed the presence of Missing Precursor (MP) spectra that were not identifiable using targeted precursor masses. The MP spectra mostly resulted in identifications of wrong modifications and negatively affected the overall performance, often by as much as 10%. MODplus can rapidly recognize MP spectra and correct their identifications, resulting in increased identification rate up to 70% in the HEK293 data set as well as improved reliability.


Subject(s)
Mass Spectrometry/methods, Post-Translational Protein Processing, Software, Protein Databases, Datasets as Topic/standards, HEK293 Cells, Humans, Proteomics/methods, Reproducibility of Results, Scientific Experimental Error
16.
Sci Rep ; 9(1): 3176, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30816214

ABSTRACT

Characterization of protein structural changes in response to protein modifications, ligand or chemical binding, or protein-protein interactions is essential for understanding protein function and its regulation. Amide hydrogen/deuterium exchange (HDX) coupled with mass spectrometry (MS) is one of the most favorable tools for characterizing protein dynamics and changes in protein conformation. However, the analysis of HDX-MS data currently falls short of its full potential because it still requires manual validation by mass spectrometry experts. With the advent of high-throughput technologies, data size grows every day, and an automated tool is essential for the analysis. Here, we introduce fully automated software, referred to as 'deMix', for HDX-MS data analysis. deMix works directly with the deuterated isotopic distributions rather than with their centroid masses, and it is designed to be robust to random noise. In addition, unlike existing approaches that can determine only a single state from an isotopic distribution, deMix can also detect a bimodal deuterated distribution arising from EX1 behavior or from heterogeneous peptides in conformational isomer proteins. Furthermore, deMix comes with visualization software to facilitate validation and representation of the analysis results.


Subject(s)
Hydrogen Deuterium Exchange-Mass Spectrometry/methods, Proteins/ultrastructure, Software, Protein Conformation, Proteins/chemistry
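
For contrast with the distribution-based analysis described in the abstract above, the following sketch shows the conventional centroid calculation that it sets itself apart from. This is not deMix's algorithm. Peaks are assumed to be (m/z, intensity) pairs for one peptide ion, and back-exchange correction is ignored.

```python
def centroid_mz(peaks):
    """Intensity-weighted mean m/z of an isotopic distribution."""
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(mz * intensity for mz, intensity in peaks) / total


def deuterium_uptake(undeuterated, deuterated, charge):
    """Centroid-based mass shift (in Da) between two isotopic distributions.

    The m/z shift is multiplied by the charge state to obtain a mass shift.
    """
    return (centroid_mz(deuterated) - centroid_mz(undeuterated)) * charge
```

Because the centroid is a single number, a bimodal distribution made of two separated populations and a unimodal distribution with the same mean give the same uptake value, which is the limitation that motivates working with the full distribution.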
17.
Cancer Cell ; 35(1): 111-124.e10, 2019 01 14.
Article in English | MEDLINE | ID: mdl-30645970

ABSTRACT

We report proteogenomic analysis of diffuse gastric cancers (GCs) in young populations. Phosphoproteome data elucidated signaling pathways associated with somatic mutations based on mutation-phosphorylation correlations. Moreover, correlations between mRNA and protein abundances provided potential oncogenes and tumor suppressors associated with patient survival. Furthermore, integrated clustering of mRNA, protein, phosphorylation, and N-glycosylation data identified four subtypes of diffuse GCs. Distinguishing these subtypes was possible by proteomic data. Four subtypes were associated with proliferation, immune response, metabolism, and invasion, respectively; and associations of the subtypes with immune- and invasion-related pathways were identified mainly by phosphorylation and N-glycosylation data. Therefore, our proteogenomic analysis provides additional information beyond genomic analyses, which can improve understanding of cancer biology and patient stratification in diffuse GCs.


Subject(s)
Gene Regulatory Networks, Mutation, Proteogenomics/methods, Stomach Neoplasms/genetics, Stomach Neoplasms/metabolism, Age of Onset, Female, Glycosylation, Humans, Male, Phosphorylation, Protein Interaction Maps, Survival Analysis, Exome Sequencing/methods
18.
J Proteome Res ; 17(10): 3593-3598, 2018 10 05.
Article in English | MEDLINE | ID: mdl-30033731

ABSTRACT

Most database search tools for proteomics have their own scoring parameter sets depending on experimental conditions such as fragmentation methods, instruments, digestion enzymes, and so on. These scoring parameter sets are usually predefined by tool developers and cannot be modified by users. The number of different experimental conditions grows as the technology develops, and the given set of scoring parameters could be suboptimal for tandem mass spectrometry data acquired using new sample preparation or fragmentation methods. Here we introduce a new approach to optimize scoring parameters in a data-dependent manner using a spectrum quality filter. The new approach conducts a preliminary search for the spectra selected by the spectrum quality filter. Search results from the preliminary search are used to generate data-dependent scoring parameters; then, the full search over the entire input spectra is conducted using the learned scoring parameters. We show that the new approach yields more and better peptide-spectrum matches than the conventional search using built-in scoring parameters when compared at the same 1% false discovery rate.


Subject(s)
Algorithms, Peptides/metabolism, Proteins/metabolism, Proteomics/methods, Tandem Mass Spectrometry/methods, Data Accuracy, Protein Databases, Humans, Reproducibility of Results, Search Engine/methods, Software
19.
Sci Rep ; 7(1): 6599, 2017 07 26.
Article in English | MEDLINE | ID: mdl-28747677

ABSTRACT

Various forms of protein (proteoforms) are generated by genetic variations, alternative splicing, alternative translation initiation, co- or post-translational modification and proteolysis. Different proteoforms are in part discovered by characterizing their N-terminal sequences. Here, we introduce an N-terminal-peptide-enrichment method, Nrich. Filter-aided negative selection formed the basis for the use of two N-blocking reagents and two endoproteases in this method. We identified 6,525 acetylated (or partially acetylated) and 6,570 free protein N-termini arising from 5,727 proteins in HEK293T human cells. The protein N-termini included translation initiation sites annotated in the UniProtKB database, putative alternative translational initiation sites, and N-terminal sites exposed after signal/transit/pro-peptide removal or unknown processing, revealing various proteoforms in cells. In addition, 46 novel protein N-termini were identified in 5' untranslated region (UTR) sequence with pseudo start codons. Our data showing the observation of N-terminal sequences of mature proteins constitutes a useful resource that may provide information for a better understanding of various proteoforms in cells.


Subject(s)
Epithelial Cells/chemistry, Protein Isoforms/analysis, HEK293 Cells, Humans, Protein Isoforms/isolation & purification
20.
J Proteome Res ; 16(6): 2231-2239, 2017 06 02.
Article in English | MEDLINE | ID: mdl-28452485

ABSTRACT

Proteogenomic searches are useful for novel peptide identification from tandem mass spectra. Usually, separate and multistage approaches are adopted to accurately control the false discovery rate (FDR) for proteogenomic search. Their performance on novel peptide identification has not been thoroughly evaluated, however, mainly due to the difficulty in confirming the existence of identified novel peptides. We simulated a proteogenomic search using a controlled, spike-in proteomic data set. After confirming that the results of the simulated proteogenomic search were similar to those of a real proteogenomic search using a human cell line data set, we evaluated the performance of six FDR control methods (global, separate, and multistage FDR estimation, each coupled to either a target-decoy search or a mixture model-based method) on novel peptide identification. The multistage approach showed the highest accuracy for FDR estimation. However, global and separate FDR estimation with the mixture model-based method showed higher sensitivities than others at the same true FDR. Furthermore, the mixture model-based method performed equally well when applied without or with a reduced set of decoy sequences. Considering different prior probabilities for novel and known protein identification, we recommend using mixture model-based methods with separate FDR estimation for sensitive and reliable identification of novel peptides from proteogenomic searches.


Subject(s)
Peptides/analysis, Proteogenomics/methods, Cell Line, Computer Simulation, False Positive Reactions, Humans, Methods, Theoretical Models, Tandem Mass Spectrometry
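
The difference between global and separate FDR estimation discussed in the abstract above can be shown with a small target-decoy sketch. Each match is assumed to be a (score, is_decoy, is_novel) tuple; this layout and the function names are hypothetical, and the mixture model-based and multistage variants evaluated in the paper are not implemented here.

```python
def decoy_fdr(psms):
    """Target-decoy FDR estimate for an already-accepted list of matches."""
    targets = sum(1 for _, is_decoy, _ in psms if not is_decoy)
    decoys = sum(1 for _, is_decoy, _ in psms if is_decoy)
    return decoys / targets if targets else 0.0


def global_and_separate_fdr(psms, threshold):
    """Compare the pooled FDR with per-class FDRs at one score threshold.

    `psms` holds (score, is_decoy, is_novel) tuples, higher scores better.
    """
    accepted = [p for p in psms if p[0] >= threshold]
    return {
        "global": decoy_fdr(accepted),
        "known": decoy_fdr([p for p in accepted if not p[2]]),
        "novel": decoy_fdr([p for p in accepted if p[2]]),
    }
```

When known peptides dominate the accepted matches, the pooled estimate can look acceptable while the error rate among novel peptides is much higher, which is why the two classes are often assessed separately.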