1.
Nat Commun ; 15(1): 4025, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38740804

ABSTRACT

Intracellular membranes composing organelles of eukaryotes include membrane proteins playing crucial roles in physiological functions. However, a comprehensive understanding of the cellular responses triggered by intracellular membrane-focused oxidative stress remains elusive. Herein, we report an amphiphilic photocatalyst localised in intracellular membranes to damage membrane proteins oxidatively, resulting in non-canonical pyroptosis. Our developed photocatalysis generates hydroxyl radicals and hydrogen peroxides via water oxidation, which is accelerated under hypoxia. Single-molecule magnetic tweezers reveal that photocatalysis-induced oxidation markedly destabilised membrane protein folding. In cell environment, label-free quantification reveals that oxidative damage occurs primarily in membrane proteins related to protein quality control, thereby aggravating mitochondrial and endoplasmic reticulum stress and inducing lytic cell death. Notably, the photocatalysis activates non-canonical inflammasome caspases, resulting in gasdermin D cleavage to its pore-forming fragment and subsequent pyroptosis. These findings suggest that the oxidation of intracellular membrane proteins triggers non-canonical pyroptosis.


Subject(s)
Inflammasomes; Membrane Proteins; Oxidation-Reduction; Pyroptosis; Humans; Inflammasomes/metabolism; Membrane Proteins/metabolism; Oxidative Stress; Catalysis; Endoplasmic Reticulum Stress; Hydrogen Peroxide/metabolism; Phosphate-Binding Proteins/metabolism; Hydroxyl Radical/metabolism; Mitochondria/metabolism; Intracellular Membranes/metabolism; Intracellular Signaling Peptides and Proteins/metabolism; Mice; Animals; Photochemical Processes; Protein Folding; Caspases/metabolism; Gasdermins
2.
Anal Chem ; 95(46): 16918-16926, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37946317

ABSTRACT

To gain a better understanding of the complex human immune system, it is necessary to measure and interpret numerous cellular protein expressions at the single cell level. Mass cytometry is a relatively new technology that offers unprecedented information about the protein expression of a single cell. Conversely, the analysis of high-dimensional and multiparametric mass cytometric data sets presents a new computational challenge. For instance, conventional "manual gating" analysis was inefficient and unreliable for multiparametric phenotyping of the heterogeneous immune cellular system; consequently, automated methods have been developed to address the high dimensionality of mass cytometry data and enhance the reproducibility of the analysis. Here, we present CyGate, a semiautomated method for classifying single cells into their respective cell types. CyGate learns a gating strategy from a reference data set, trains a model for cell classification, and then automatically analyzes additional data sets using the trained model. CyGate also supports the machine learning framework for the classification of "ungated" cells, which are typically disregarded by automated methods. CyGate's utility was demonstrated by its high performance in cell type classification and the lowest generalization error on various public data sets when compared to the state-of-the-art semiautomated methods. Notably, CyGate had the shortest execution time, allowing it to scale with a growing number of samples. CyGate is available at https://github.com/seungjinna/cygate.


Subject(s)
Computational Biology; Machine Learning; Humans; Flow Cytometry/methods; Reproducibility of Results; Computational Biology/methods; Algorithms
3.
Anal Chem ; 95(30): 11193-11200, 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37459568

ABSTRACT

Predicting peptide detectability is useful in a variety of mass spectrometry (MS)-based proteomics applications, particularly targeted proteomics. However, most machine learning-based computational methods have relied solely on information from the peptide itself, such as its amino acid sequences or physicochemical properties, despite the fact that peptides detected by MS are dependent on many factors, including protein sample preparation, digestion, separation, ionization, and precursor selection during MS experiments. DbyDeep (Detectability by Deep learning) is an innovative end-to-end LSTM network model for peptide detectability prediction that incorporates sequence contexts of peptides and their cleavage sites (by protease). Utilizing the cleavage site contexts could improve the performance of prediction, and DbyDeep outperformed existing methods in predicting peptides recognizable from multiple MS/MS data sets with diverse species and MS instruments. We argue for the necessity of a learning model that encompasses several contexts associated with peptide detection, as opposed to depending just on peptide sequences. There is a Python implementation of DbyDeep at https://github.com/BISCodeRepo/DbyDeep.


Subject(s)
Deep Learning; Tandem Mass Spectrometry; Peptides/chemistry; Proteins; Amino Acid Sequence
4.
Bioinformatics ; 38(11): 2980-2987, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35441674

ABSTRACT

MOTIVATION: Tandem mass tag (TMT)-based tandem mass spectrometry (MS/MS) has become the method of choice for the quantification of post-translational modifications in complex mixtures. Many cancer proteogenomic studies have highlighted the importance of large-scale phosphopeptide quantification coupled with TMT labeling. Herein, we propose a predicted Spectral DataBase (pSDB) search strategy called Deephos that can improve both sensitivity and specificity in identifying MS/MS spectra of TMT-labeled phosphopeptides. RESULTS: With deep learning-based fragment ion prediction, we compiled a pSDB of TMT-labeled phosphopeptides generated from ∼8000 human phosphoproteins annotated in UniProt. Deep learning could successfully recognize the fragmentation patterns altered by both TMT labeling and phosphorylation. In addition, we discuss the decoy spectra for false discovery rate (FDR) estimation in the pSDB search. We show that FDR could be inaccurately estimated by the existing decoy spectra generation methods and propose an innovative method to generate decoy spectra for more accurate FDR estimation. The utilities of Deephos were demonstrated in multi-stage analyses (coupled with database searches) of glioblastoma, acute myeloid leukemia and breast cancer phosphoproteomes. AVAILABILITY AND IMPLEMENTATION: Deephos pSDB and the search software are available at https://github.com/seungjinna/deephos.


Subject(s)
Phosphopeptides; Tandem Mass Spectrometry; Humans; Phosphopeptides/analysis; Tandem Mass Spectrometry/methods; Algorithms; Databases, Factual; Software; Databases, Protein
5.
BMC Bioinformatics ; 23(1): 109, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-35354356

ABSTRACT

BACKGROUND: In shotgun proteomics, database search engines have been developed to assign peptides to tandem mass (MS/MS) spectra and at the same time post-processing (or rescoring) approaches over the search results have been proposed to increase the number of confident peptide identifications. The most popular post-processing approaches such as Percolator and PeptideProphet have improved rates of peptide identifications by combining multiple scores from database search engines while applying machine learning techniques. Existing post-processing approaches, however, are limited when dealing with results from new search engines because their features for machine learning must be optimized specifically for each search engine. RESULTS: We propose a universal post-processing tool, called TIDD, which supports confident peptide identifications regardless of the search engine adopted. TIDD can work for any (including newly developed) search engines because it calculates universal features that assess peptide-spectrum match quality while it allows additional features provided by search engines (or users) as well. Even though it relies on universal features independent of search tools, TIDD showed similar or better performance than Percolator in terms of peptide identification. TIDD identified 10.23-38.95% more PSMs than target-decoy estimation for MSFragger, which is not supported by Percolator. TIDD offers an easy-to-use simple graphical user interface for user convenience. CONCLUSIONS: TIDD successfully eliminated the requirement for an optimal feature engineering per database search tool, and thus, can be applied directly to any database search results including newly developed ones.


Subject(s)
Algorithms; Tandem Mass Spectrometry; Databases, Protein; Machine Learning; Peptides; Tandem Mass Spectrometry/methods
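
The abstract above compares TIDD against "target-decoy estimation". As a rough illustration of that baseline only, the following minimal sketch counts how many target peptide-spectrum matches (PSMs) pass a given false discovery rate, estimating FDR as the ratio of decoy to target hits above a score cutoff. The function name `count_psms_at_fdr`, the `(score, is_decoy)` input layout, and the toy scores are assumptions made for this example; this is not TIDD's or Percolator's implementation.

```python
from typing import List, Tuple


def count_psms_at_fdr(psms: List[Tuple[float, bool]], fdr_threshold: float = 0.01) -> int:
    """Return how many target PSMs are accepted at the given FDR.

    psms: hypothetical (score, is_decoy) pairs, where a higher score is a better match.
    FDR at a cutoff is estimated as (#decoys / #targets) among PSMs at or above it.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    accepted = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the largest target count whose estimated FDR is still acceptable.
        if targets > 0 and decoys / targets <= fdr_threshold:
            accepted = targets
    return accepted


# Toy data (made up): five target hits and one decoy hit.
example = [(9.0, False), (8.5, False), (8.0, False), (7.0, True), (6.5, False), (6.0, False)]
print(count_psms_at_fdr(example, fdr_threshold=0.01))  # 3
print(count_psms_at_fdr(example, fdr_threshold=0.25))  # 5
```

Rescoring tools aim to reorder the PSMs so that more targets rank above the decoys, which raises the count returned at the same threshold.
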
6.
Int J Mol Sci ; 21(18)2020 Sep 05.
Article in English | MEDLINE | ID: mdl-32899552

ABSTRACT

β/γ-Crystallins, the main structural proteins in human lenses, have highly stable structures that keep the lens transparent. Mutations in them have been linked to cataracts. In this study, we identified 10 new mutations of β/γ-crystallins in a lens proteomic dataset from cataract patients using bioinformatics tools. Of these, two double mutants, S175G/H181Q of βB2-crystallin and P24S/S31G of γD-crystallin, carry mutations in the largest loop linking the distant β-sheets in the Greek key motif. We selected these double mutants to characterize the mutations, employing biochemical assays, identification of protein modifications with nanoUPLC-ESI-TOF tandem MS, and examination of their structural dynamics with hydrogen/deuterium exchange-mass spectrometry (HDX-MS). We found that both double mutations decrease protein stability and induce the aggregation of β/γ-crystallin, possibly causing cataracts. This finding suggests that both double mutants can serve as biomarkers of cataracts.


Subject(s)
Cataract/genetics; beta-Crystallin B Chain/genetics; gamma-Crystallins/genetics; Adolescent; Adult; Aged; Child, Preschool; Humans; Infant, Newborn; Lens, Crystalline/metabolism; Mutation/genetics; Protein Aggregates/genetics; Protein Stability; Proteomics/methods; beta-Crystallin B Chain/metabolism; gamma-Crystallins/metabolism
7.
Comput Struct Biotechnol J ; 18: 1391-1402, 2020.
Article in English | MEDLINE | ID: mdl-32637038

ABSTRACT

Mass spectrometry (MS) has made enormous contributions to comprehensive protein identification and quantification in proteomics. MS is also gaining momentum for structural biology in a variety of ways, complementing conventional structural biology techniques. Here, we will review how MS-based techniques, such as hydrogen/deuterium exchange, covalent labeling, and chemical cross-linking, enable the characterization of protein structure, dynamics, and interactions, especially from a perspective of their data analyses. Structural information encoded by chemical probes in intact proteins is decoded by interpreting MS data at a peptide level, i.e., revealing conformational and dynamic changes in local regions of proteins. The structural MS data are not amenable to data analyses in traditional proteomics workflow, requiring dedicated software for each type of data. We first provide basic principles of data interpretation, including isotopic distribution and peptide sequencing. We then focus particularly on computational methods for structural MS data analyses and discuss outstanding challenges in a proteome-wide large scale analysis.

8.
J Proteome Res ; 19(1): 212-220, 2020 Jan 03.
Article in English | MEDLINE | ID: mdl-31714086

ABSTRACT

Recent sequencing technologies have highlighted translation of untranslated regions (UTRs) in genomes, although it remains unknown whether the translated products persist in a cell. Here, we propose a proteogenomic approach to UTR identification at the proteome level, which has been challenging due to the lack of corresponding sequences required for peptide-spectrum matching. We address the challenge by constructing a translated UTR (tUTR) database, consisting of all hypothetical sequences that can be translated from UTRs by assuming non-AUG initiation at near-cognate start codons and stop codon readthrough. In the analysis of the H1299 cell line tandem mass spectrometry (MS/MS) dataset, the tUTR DB-based proteogenomic approach enabled the detection of 52 5'-UTR and 9 3'-UTR peptides from 45 and 9 genes, respectively. The identified UTR peptides were validated via high spectral similarity with their synthetic peptides. The 5'-UTR peptides pointed to alternative initiation sites with non-AUG start codons, which exactly conformed to Kozak contexts of annotated initiation sites. It is also worth noting that our approach can detect translated amino acid sequences as well as provide evidence for UTR translation, while ribosome profiling provides only the translation evidence. For the previously reported stop codon readthrough in the MDH1 gene, we could confirm the amino acid inserted during the readthrough. Data are available via ProteomeXchange with identifier PXD016207.


Subject(s)
Proteogenomics; Codon, Initiator; Peptides/genetics; Tandem Mass Spectrometry; Untranslated Regions
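
The tUTR database described in the abstract above enumerates hypothetical translations from near-cognate start codons. The following is a minimal sketch of that idea for a 5'-UTR only, assuming the standard genetic code, a DNA-alphabet sequence containing only A, C, G, and T, and near-cognate codons defined as those differing from ATG by one nucleotide. The function names are hypothetical, the start codon is translated literally (initiation at a near-cognate codon often still incorporates methionine), and stop codon readthrough is not modeled. It is not the authors' database construction code.

```python
from itertools import product
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

START = "ATG"
# Near-cognate start codons: one substitution away from ATG (nine codons).
NEAR_COGNATE = {
    START[:i] + b + START[i + 1:]
    for i in range(3)
    for b in BASES
    if b != START[i]
}


def translate_from(seq: str, start: int) -> str:
    """Translate seq in the frame of `start` until a stop codon or the sequence end."""
    residues = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def hypothetical_utr_orfs(utr5: str, downstream: str = "") -> List[Tuple[int, str]]:
    """List (position, peptide) for every near-cognate codon starting inside utr5."""
    seq = (utr5 + downstream).upper()
    orfs = []
    for pos in range(len(utr5) - 2):
        if seq[pos:pos + 3] in NEAR_COGNATE:
            orfs.append((pos, translate_from(seq, pos)))
    return orfs


# Toy 5'-UTR (made up): CTG at position 2 and ATA at position 7 are near-cognate.
print(hypothetical_utr_orfs("GGCTGAAATAGCC"))  # [(2, 'LK'), (7, 'IA')]
```

In a real search database the resulting sequences would be written out as FASTA entries and digested in silico by the search engine.
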
9.
J Proteome Res ; 18(10): 3800-3806, 2019 Oct 04.
Article in English | MEDLINE | ID: mdl-31475827

ABSTRACT

We propose to use cRFP (common Repository of FBS Proteins) in the MS (mass spectrometry) raw data search of cell secretomes. cRFP is a small supplementary sequence list of highly abundant fetal bovine serum (FBS) proteins that is added to the reference database in use. The aim of using cRFP is to prevent contaminant FBS proteins from being misidentified as other proteins in the reference database, just as cRAP (common Repository of Adventitious Proteins) is used to prevent contaminant proteins, present either by accident or through unavoidable contact, from being misidentified as other proteins. We expect it to be widely used in experiments where the proteins are obtained from serum-free media after thorough washing of the cells, from complex media such as SILAC media, or directly from extracellular vesicles.


Subject(s)
Cells, Cultured/metabolism; Proteome/analysis; Proteomics/methods; Serum/chemistry; Animals; Cattle; Culture Media/chemistry; Databases, Protein; Humans; Mass Spectrometry
10.
Anal Chem ; 91(17): 11324-11333, 2019 Sep 03.
Article in English | MEDLINE | ID: mdl-31365238

ABSTRACT

Post-translational modifications regulate various cellular processes and are of great biological interest. Unrestrictive searches of mass spectrometry data enable the detection of any type of modification. Here we propose MODplus, which makes practical unrestrictive searches possible by allowing (1) hundreds of modifications, (2) multiple modifications per peptide, (3) the whole proteome database, and (4) any tolerant values in search parameters. The utility of MODplus was demonstrated in large human data sets of HEK293 cells and TMT-labeled phosphorylation enrichment. Notably, MODplus supports identifying different modification types at multiple sites and reports real chemical and biological modifications, as it has been very labor intensive to link unrestrictive search results to real modifications. We also confirmed the presence of Missing Precursor (MP) spectra that were not identifiable using targeted precursor masses. The MP spectra mostly resulted in identifications of wrong modifications and negatively affected the overall performance, often by as much as 10%. MODplus can rapidly recognize MP spectra and correct their identifications, resulting in increased identification rate up to 70% in the HEK293 data set as well as improved reliability.


Subject(s)
Mass Spectrometry/methods; Protein Processing, Post-Translational; Software; Databases, Protein; Datasets as Topic/standards; HEK293 Cells; Humans; Proteomics/methods; Reproducibility of Results; Scientific Experimental Error
11.
Sci Rep ; 9(1): 3176, 2019 Feb 28.
Article in English | MEDLINE | ID: mdl-30816214

ABSTRACT

Characterization of protein structural changes in response to protein modifications, ligand or chemical binding, or protein-protein interactions is essential for understanding protein function and its regulation. Amide hydrogen/deuterium exchange (HDX) coupled with mass spectrometry (MS) is one of the most favorable tools for characterizing protein dynamics and changes of protein conformation. However, the analysis of HDX-MS data currently falls short of its full power because it still requires manual validation by mass spectrometry experts. In particular, with the advent of high-throughput technologies, the data size grows every day and an automated tool is essential for the analysis. Here, we introduce a fully automated software tool, referred to as 'deMix', for HDX-MS data analysis. deMix works directly with the deuterated isotopic distributions rather than their centroid masses, and is designed to be robust to random noise. In addition, unlike existing approaches that can only determine a single state from an isotopic distribution, deMix can also detect a bimodal deuterated distribution, arising from EX1 behavior or heterogeneous peptides in conformational isomer proteins. Furthermore, deMix comes with visualization software to facilitate validation and representation of the analysis results.


Subject(s)
Hydrogen Deuterium Exchange-Mass Spectrometry/methods; Proteins/ultrastructure; Software; Protein Conformation; Proteins/chemistry
12.
Mol Cell Proteomics ; 16(12): 2111-2124, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29046389

ABSTRACT

Immunotherapy is becoming increasingly important in the fight against cancers, using and manipulating the body's immune response to treat tumors. Understanding the immune repertoire (the collection of immunological proteins) of treated and untreated cells is possible at the genomic level but technically difficult at the protein level. Standard protein databases do not include the highly divergent sequences of somatically rearranged immunoglobulin genes, which may lead to missed identifications in a mass spectrometry search. We introduce a novel proteogenomic approach, AbScan, to identify these highly variable antibody peptides, by developing a customized antibody database construction method using RNA-seq reads aligned to immunoglobulin (Ig) genes. AbScan starts by filtering transcript (RNA-seq) reads that match the template for Ig genes. The retained reads are used to construct a repertoire graph using the "split" de Bruijn graph: a graph structure that improves on the standard de Bruijn graph to capture the high diversity of Ig genes in a compact manner. AbScan corrects for sequencing errors, and converts the graph to a format suitable for searching with MS/MS search tools. We used AbScan to create an antibody database from 90 RNA-seq colorectal tumor samples. Next, we used proteogenomic analysis to search MS/MS spectra of matched colorectal samples from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) against the AbScan-generated database. AbScan identified 1,940 distinct antibody peptides. Correlating with previously identified Single Amino-Acid Variants (SAAVs) in the tumor samples, we identified 163 pairs (antibody peptide, SAAV) with a significant co-occurrence pattern in the 90 samples. The presence of coexpressed antibody and mutated peptides was correlated with survival time of the individuals. Our results suggest that AbScan (https://github.com/csw407/AbScan.git) is an effective tool for a proteomic exploration of the immune response in cancers.


Subject(s)
Colorectal Neoplasms/immunology; Genomics/methods; Immunoglobulins/chemistry; Peptides/genetics; Proteomics/methods; Algorithms; Cell Line, Tumor; Colorectal Neoplasms/genetics; Databases, Genetic; Databases, Protein; Humans; Immunoglobulins/genetics; Peptides/chemistry; Sequence Analysis, RNA; Tandem Mass Spectrometry
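
The abstract above builds on the de Bruijn graph. As background only, here is a minimal sketch of the standard de Bruijn graph built from reads, in which each k-mer contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The abstract does not specify the "split" variant used by AbScan, so it is not reproduced here; the function name `de_bruijn` and the toy reads are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Dict[str, int]]:
    """Build a standard de Bruijn graph as nested dicts.

    graph[prefix][suffix] is the number of times the k-mer joining the two
    (k-1)-mers was observed across all reads.
    """
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    # Convert to plain dicts so that lookups do not create new nodes.
    return {node: dict(edges) for node, edges in graph.items()}


# Two toy reads (made up) that differ at one position, as variant Ig reads might.
graph = de_bruijn(["GATTACA", "GATTGCA"], k=4)
print(graph["GAT"])  # {'ATT': 2}: shared prefix, seen in both reads
print(graph["ATT"])  # {'TTA': 1, 'TTG': 1}: the graph branches at the variant
```

Shared sequence collapses into a single path while each variant adds a branch, which is why graph representations of this kind can store many similar sequences compactly.
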
13.
Mol Cell Proteomics ; 15(11): 3501-3512, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27609420

ABSTRACT

Peptide and protein identification remains challenging in organisms with poorly annotated or rapidly evolving genomes, as are commonly encountered in environmental or biofuels research. Such limitations render tandem mass spectrometry (MS/MS) database search algorithms ineffective as they lack corresponding sequences required for peptide-spectrum matching. We address this challenge with the spectral networks approach to (1) match spectra of orthologous peptides across multiple related species and then (2) propagate peptide annotations from identified to unidentified spectra. We here present algorithms to assess the statistical significance of spectral alignments (Align-GF), reduce the impurity in spectral networks, and accurately estimate the error rate in propagated identifications. Analyzing three related Cyanothece species, a model organism for biohydrogen production, spectral networks identified peptides from highly divergent sequences from networks with dozens of variant peptides, including thousands of peptides in species lacking a sequenced genome. Our analysis further detected the presence of many novel putative peptides even in genomically characterized species, thus suggesting the possibility of gaps in our understanding of their proteomic and genomic expression. A web-based pipeline for spectral networks analysis is available at http://proteomics.ucsd.edu/software.


Subject(s)
Cyanothece/metabolism; Peptides/analysis; Proteomics/methods; Algorithms; Bacterial Proteins/metabolism; Cluster Analysis; Cyanothece/classification; Databases, Protein; Genome, Bacterial; Sequence Analysis, Protein; Software; Tandem Mass Spectrometry/methods
14.
J Proteome Res ; 14(9): 3555-67, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26139413

ABSTRACT

Aiming toward an improved understanding of the regulation of proteins in cancer, recent studies from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) have focused on analyzing cancer tissue using proteomic technologies and workflows. Although many proteogenomics approaches for the study of cancer samples have been proposed, serious methodological challenges remain, especially in the identification of multiple mutational variants or structural variations such as fusion gene events. In addition, although immune system genes play an important role in cancer, identification of IgG peptides remains challenging in proteomic data sets. Here, we describe an integrative proteogenomic method that extends the limit of proteogenomic searches to identify multiple variant peptides as well as immunoglobulin gene variations/rearrangements using customized mining of RNA-seq data. Our results also provide the first extensive characterization of tumor immune response and demonstrate the potential of this method to improve the molecular characterization of tumor subtypes.


Subject(s)
Genomics; Immunoglobulins/chemistry; Mutation; Peptides/genetics; Proteomics; Alternative Splicing; Amino Acid Sequence; Databases, Protein; Humans; Molecular Sequence Data; Peptides/chemistry; Tandem Mass Spectrometry
15.
Mol Biosyst ; 11(4): 1156-64, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25703060

ABSTRACT

The identification of disulfide bonds provides critical information regarding the structure and function of a protein and is a key aspect in understanding signaling cascades in biological systems. Recent proteomic approaches using digestion enzymes have facilitated the characterization of disulfide-bonds and/or oxidized products from cysteine residues, although these methods have limitations in the application of MS/MS. For example, protein digestion to obtain the native form of disulfide bonds results in short lengths of amino acids, which can cause ambiguous MS/MS analysis due to false positive identifications. In this study we propose a new approach, termed planned digestion, to obtain sufficient amino acid lengths after cleavage for proteomic approaches. Application of the DBond software to planned digestion of specific proteins accurately identified disulfide-linked peptides. RNase A was used as a model protein in this study because the disulfide bonds of this protein have been well characterized. Application of this approach to peptides digested with Asp-N/C (chemical digestion) and trypsin under acid hydrolysis conditions identified the four native disulfide bonds of RNase A. Missed cleavages introduced by trypsin treatment for only 3 hours generated sufficient lengths of amino acids for identification of the disulfide bonds. Analysis using MS/MS successfully showed additional fragmentation patterns that are cleavage products of S-S and C-S bonds of disulfide-linkage peptides. These fragmentation patterns generate thioaldehydes, persulfide, and dehydroalanine. This approach of planned digestion with missed cleavages using the DBond algorithm could be applied to other proteins to determine their disulfide linkage and the oxidation patterns of cysteine residues.


Subject(s)
Disulfides/chemistry; Peptide Fragments/chemistry; Proteins/chemistry; Proteomics/methods; Sequence Analysis, Protein/methods; Tandem Mass Spectrometry/methods; Amino Acid Sequence; Disulfides/analysis; Molecular Sequence Data; Peptide Fragments/analysis; Peptide Fragments/metabolism; Proteins/analysis; Trypsin/metabolism
16.
Mass Spectrom Rev ; 34(2): 133-47, 2015.
Article in English | MEDLINE | ID: mdl-24889695

ABSTRACT

Post-translational modifications (PTMs) are critical to almost all aspects of complex processes of the cell. Identification of PTMs is one of the biggest challenges for proteomics, and there have been many computational studies for the analysis of PTMs from tandem mass spectrometry (MS/MS). Most early PTM identification studies have been performed by matching MS/MS data to protein databases, using database search tools, but they are prohibitively slow when a large number of PTMs is given as a search parameter. In this article, we present recent developments to search for more types of PTMs and to speed up the search, and discuss many computational issues and solutions in terms of identifying multiply modified peptides or searching for all possible modifications at once in unrestrictive mode. Apart from the most common type of PTMs involving covalent addition of functional groups to proteins, PTMs such as disulfide linkage require dedicated software for the analysis because they may involve cross-linking between two different parts of proteins. Finally, methods for identification of protein disulfide bonds are presented.


Subject(s)
Disulfides/analysis; Peptide Fragments/analysis; Protein Processing, Post-Translational; Proteins/metabolism; Software; Algorithms; Amino Acid Sequence; Databases, Protein; Disulfides/chemistry; Molecular Sequence Data; Oxidation-Reduction; Proteins/chemistry; Proteomics/instrumentation; Proteomics/methods; Tandem Mass Spectrometry
17.
Proteomics ; 14(23-24): 2719-30, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25263569

ABSTRACT

Cancer is driven by the acquisition of somatic DNA lesions. Distinguishing the early driver mutations from subsequent passenger mutations is key to molecular subtyping of cancers, understanding cancer progression, and the discovery of novel biomarkers. Advances in genomics technologies (whole-genome, exome, and transcript sequencing, collectively referred to as next-generation sequencing, NGS) have fueled recent studies on somatic mutation discovery. However, the vision is challenged by the complexity, redundancy, and errors in genomic data, and the difficulty of investigating the proteome translated portion of aberrant genes using only genomic approaches. Combinations of proteomic and genomic technologies are increasingly being employed. Various strategies have been employed to allow the usage of large-scale NGS data for conventional MS/MS searches. This paper provides a discussion of applying different strategies relating to large database search and FDR (false discovery rate)-based error control, and their implications for cancer proteogenomics. Moreover, it extends and develops the idea of a unified genomic variant database that can be searched by any MS sample. A total of 879 BAM files downloaded from the TCGA repository were used to create a 4.34 GB unified FASTA database that contained 2,787,062 novel splice junctions, 38,464 deletions, 1,105 insertions, and 182,302 substitutions. Proteomic data from a single ovarian carcinoma sample (439,858 spectra) were searched against the database. By applying the most conservative FDR measure, we have identified 524 novel peptides and 65,578 known peptides at a 1% FDR threshold. The novel peptides include interesting examples of doubly mutated peptides, frame-shifts, and nonsample-recruited mutations, which emphasize the strength of our approach.


Subject(s)
High-Throughput Nucleotide Sequencing/methods; Neoplasms/metabolism; Proteomics/methods; Databases, Protein; Humans; Neoplasms/genetics; Peptides/genetics
18.
PLoS One ; 8(12): e81734, 2013.
Article in English | MEDLINE | ID: mdl-24312579

ABSTRACT

Twenty different aminoacyl-tRNA synthetases (ARSs) link each amino acid to its cognate tRNA. Individual ARSs are also associated with various non-canonical activities involved in neuronal diseases, cancer and autoimmune diseases. Among them, eight ARSs (D, EP, I, K, L, M, Q and RARS), together with three ARS-interacting multifunctional proteins (AIMPs), are currently known to assemble the multi-synthetase complex (MSC). However, the cellular function and global topology of MSC remain unclear. In order to understand the complex interaction within MSC, we conducted affinity purification-mass spectrometry (AP-MS) using each of AIMP1, AIMP2 and KARS as a bait protein. Mass spectrometric data were funneled into SAINT software to distinguish true interactions from background contaminants. A total of 40, 134, and 101 proteins for the respective baits scored above 0.9 in SAINT probability in HEK 293T cells. Complex-forming ARSs, such as DARS, EPRS, IARS, KARS, LARS, MARS, QARS and RARS, were consistently found to interact with each bait. Variants such as AIMP2-DX2 and AIMP1 isoform 2 were found with specific peptides in KARS precipitates. Relative enrichment analysis of the mass spectrometric data demonstrated that TARSL2 (threonyl-tRNA synthetase like-2) was highly enriched with the ARS-core complex. The interaction was further confirmed by coimmunoprecipitation of TARSL2 with other ARS core-complex components. We suggest TARSL2 as a new component of the ARS core-complex.


Subject(s)
Amino Acyl-tRNA Synthetases/chemistry; Amino Acyl-tRNA Synthetases/metabolism; Chromatography, Affinity; Computational Biology/methods; Mass Spectrometry; Protein Interaction Mapping/methods; Threonine-tRNA Ligase/analysis; Threonine-tRNA Ligase/metabolism; Algorithms; Amino Acid Sequence; Carrier Proteins/chemistry; Carrier Proteins/metabolism; Cytokines/chemistry; Cytokines/metabolism; HEK293 Cells; Humans; Lysine-tRNA Ligase/metabolism; Molecular Sequence Data; Neoplasm Proteins/chemistry; Neoplasm Proteins/metabolism; Nuclear Proteins; Protein Processing, Post-Translational; RNA-Binding Proteins/chemistry; RNA-Binding Proteins/metabolism; Threonine-tRNA Ligase/isolation & purification
19.
J Proteome Res ; 11(9): 4488-98, 2012 Sep 07.
Article in English | MEDLINE | ID: mdl-22779694

ABSTRACT

Selenoproteins, which contain selenocysteine (Sec, U) as the 21st amino acid in the genetic code, are well conserved from bacteria to humans, except in yeast and higher plants, which lack the Sec insertion machinery. Determining Sec association is important for finding substrates and understanding the redox action of selenoproteins. While mass spectrometry (MS) has become a common and powerful tool for determining the amino acid sequence of a protein, identifying a protein sequence containing Sec by MS has not been easy because of the limited stability of Sec in selenoproteins. Se has six naturally occurring isotopes, 74Se, 76Se, 77Se, 78Se, 80Se, and 82Se, of which 80Se is the most abundant. These characteristics provide a good indicator for selenopeptides but make it difficult to detect selenopeptides using software analysis tools developed for common peptides. Thus, previous reports verified MS scans of selenopeptides by manual inspection. None of the fully automated algorithms have taken the isotopes of Se into account, leading to incorrect interpretation of selenopeptides. In this paper, we present an algorithm to determine monoisotopic masses of selenocysteine-containing polypeptides. Our algorithm is based on a theoretical model for the isotopic distribution of a selenopeptide, which treats the peak intensities in an isotopic distribution as determined by the natural abundances of C, H, N, O, S, and Se. Our algorithm uses two kinds of isotopic peak intensity ratios: one for two adjacent peaks and another for two distant peaks. We show with two LC-MS/MS data sets that our algorithm performs accurately for selenopeptides. Using this algorithm, we have successfully identified the Sec-Cys and Sec-Sec cross-linking of glutaredoxin 1 (GRX1) from mass spectra obtained on a UPLC-ESI-q-TOF instrument.
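To make the role of the selenium isotope pattern concrete, the following minimal sketch computes intensity ratios between pairs of isotopic peaks. It is not the published algorithm: it considers selenium alone and ignores the contributions of C, H, N, O and S that the authors' model includes. The abundance values are approximate standard natural abundances, and the names `SE_ABUNDANCE`, `se_pattern` and `peak_ratio` are assumptions made for illustration.

```python
# Approximate natural abundances of selenium isotopes, in percent,
# keyed by mass number. Values are standard reference figures,
# not numbers taken from the paper.
SE_ABUNDANCE = {74: 0.89, 76: 9.37, 77: 7.63, 78: 23.77, 80: 49.61, 82: 8.73}

def se_pattern():
    """Return relative intensities normalised to the most abundant isotope."""
    top = max(SE_ABUNDANCE.values())
    return {mass: abundance / top for mass, abundance in SE_ABUNDANCE.items()}

def peak_ratio(pattern, mass_a, mass_b):
    """Intensity ratio between two isotopic peaks of the pattern."""
    return pattern[mass_a] / pattern[mass_b]

pattern = se_pattern()
adjacent = peak_ratio(pattern, 77, 78)  # peaks one mass unit apart
distant = peak_ratio(pattern, 78, 80)   # peaks two mass units apart
```

Because 80Se rather than the lightest isotope dominates the pattern, the tallest peak of a selenopeptide does not sit where tools built for ordinary peptides expect it, which is the difficulty the abstract describes.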


Subject(s)
Algorithms , Mass Spectrometry/methods , Models, Chemical , Peptides/chemistry , Selenocysteine/chemistry , Selenoproteins/chemistry , Amino Acid Sequence , Isotopes/chemistry , Molecular Sequence Data
20.
Mol Cell Proteomics ; 11(4): M111.010199, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22186716

ABSTRACT

With great biological interest in post-translational modifications (PTMs), various approaches have been introduced to identify PTMs using MS/MS. Recent developments for PTM identification have focused on an unrestrictive approach that searches MS/MS spectra for all known and possibly even unknown types of PTMs at once. However, the resulting expanded search space requires much longer search time and also increases the number of false positives (incorrect identifications) and false negatives (missed true identifications), thus creating a bottleneck in high throughput analysis. Here we introduce MODa, a novel "multi-blind" spectral alignment algorithm that allows for fast unrestrictive PTM searches with no limitation on the number of modifications per peptide while featuring over an order of magnitude speedup in relation to existing approaches. We demonstrate the sensitivity of MODa on human shotgun proteomics data where it reveals multiple mutations, a wide range of modifications (including glycosylation), and evidence for several putative novel modifications. Based on the reported findings, we argue that the efficiency and sensitivity of MODa make it the first unrestrictive search tool with the potential to fully replace conventional restrictive identification of proteomics mass spectrometry data.
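The idea behind an unrestrictive search can be illustrated with a much simpler stand-in: compare the observed precursor mass with the unmodified peptide mass and check whether the difference matches a known modification. This is not MODa's multi-blind spectral alignment, which works on fragment spectra and allows several modifications per peptide; it is only a single-modification, precursor-level sketch. The residue and modification masses are standard monoisotopic values, while the function names and the 0.01 Da tolerance are assumptions.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# A few well-known modification mass shifts (Da); a real search would
# consult a far larger list or accept arbitrary shifts.
KNOWN_SHIFTS = {
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
}

def peptide_mass(seq):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER

def explain_delta(seq, observed_mass, tol=0.01):
    """Label the gap between observed and theoretical mass, if recognisable."""
    delta = observed_mass - peptide_mass(seq)
    if abs(delta) <= tol:
        return "unmodified", delta
    for name, shift in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tol:
            return name, delta
    return "unknown", delta
```

For example, `explain_delta("PEPTIDE", peptide_mass("PEPTIDE") + 79.96633)` labels the shift as phosphorylation, whereas an unlisted shift is reported as "unknown" rather than discarded, which is the behaviour that distinguishes a blind search from a restrictive one.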


Subject(s)
Algorithms , Protein Processing, Post-Translational , Proteins/metabolism , Proteomics/methods , Databases, Protein , HEK293 Cells , Humans , Lens, Crystalline/metabolism , Mutation , Proteins/genetics , Proteome , Tandem Mass Spectrometry