1.
Mol Cell Proteomics ; 23(2): 100713, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38184013

ABSTRACT

Optimizing data-independent acquisition methods for proteomics applications often requires balancing spectral resolution and acquisition speed. Here, we describe a real-time full mass range implementation of the phase-constrained spectrum deconvolution method (ΦSDM) for Orbitrap mass spectrometry that increases mass resolving power without increasing scan time. Comparing its performance to the standard enhanced Fourier transformation signal processing revealed that the increased resolving power of ΦSDM is beneficial in areas of high peptide density and comes with a greater ability to resolve low-abundance signals. In a standard 2 h analysis of a 200 ng HeLa digest, this resulted in an increase of 16% in the number of quantified peptides. As the acquisition speed becomes even more important when using fast chromatographic gradients, we further applied ΦSDM methods to a range of shorter gradient lengths (21, 12, and 5 min). While ΦSDM improved identification rates and spectral quality in all tested gradients, it proved particularly advantageous for the 5 min gradient. Here, the number of identified protein groups and peptides increased by >15% in comparison to enhanced Fourier transformation processing. In conclusion, ΦSDM is an alternative signal processing algorithm for processing Orbitrap data that can improve spectral quality and benefit quantitative accuracy in typical proteomics experiments, especially when using short gradients.


Subjects
Proteome , Tandem Mass Spectrometry , Humans , Proteome/metabolism , Tandem Mass Spectrometry/methods , Peptides/analysis , HeLa Cells , Proteomics/methods
2.
Mol Cell Proteomics ; 23(1): 100689, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38043703

ABSTRACT

Distinction of non-self from self is the major task of the immune system. Immunopeptidomics studies the peptide repertoire presented by the human leukocyte antigen (HLA) protein, usually on tissues. However, HLA peptides are also bound to plasma soluble HLA (sHLA), but little is known about their origin and potential for biomarker discovery in this readily available biofluid. Currently, immunopeptidomics is hampered by complex workflows and limited sensitivity, typically requiring several mL of plasma. Here, we take advantage of recent improvements in the throughput and sensitivity of mass spectrometry (MS)-based proteomics to develop a highly sensitive, automated, and economical workflow for HLA peptide analysis, termed Immunopeptidomics by Biotinylated Antibodies and Streptavidin (IMBAS). IMBAS-MS quantifies more than 5000 HLA class I peptides from only 200 µl of plasma, in just 30 min. Our technology revealed that the plasma immunopeptidome of healthy donors is remarkably stable throughout the year and strongly correlated between individuals with overlapping HLA types. Immunopeptides originating from diverse tissues, including the brain, are proportionately represented. We conclude that sHLAs are a promising avenue for immunology and potentially for precision oncology.


Subjects
Neoplasms , Humans , Streptavidin , Precision Medicine , Histocompatibility Antigens Class I/metabolism , HLA Antigens , Histocompatibility Antigens Class II , Peptides/metabolism , Mass Spectrometry , Antibodies
3.
Mol Syst Biol ; 19(9): e11503, 2023 09 12.
Article in English | MEDLINE | ID: mdl-37602975

ABSTRACT

Single-cell proteomics aims to characterize biological function and heterogeneity at the level of proteins in an unbiased manner. It is currently limited in proteomic depth, throughput, and robustness, which we address here by a streamlined multiplexed workflow using data-independent acquisition (mDIA). We demonstrate automated and complete dimethyl labeling of bulk or single-cell samples, without losing proteomic depth. Lys-N digestion enables five-plex quantification at MS1 and MS2 level. Because the multiplexed channels are quantitatively isolated from each other, mDIA accommodates a reference channel that does not interfere with the target channels. Our algorithm RefQuant takes advantage of this and confidently quantifies twice as many proteins per single cell compared to our previous work (Brunner et al, PMID 35226415), while our workflow currently allows routine analysis of 80 single cells per day. Finally, we combined mDIA with spatial proteomics to increase the throughput of Deep Visual Proteomics seven-fold for microdissection and four-fold for MS analysis. Applying this to primary cutaneous melanoma, we discovered proteomic signatures of cells within distinct tumor microenvironments, showcasing its potential for precision oncology.


Subjects
Melanoma , Skin Neoplasms , Humans , Proteome , Proteomics , Precision Medicine , Tumor Microenvironment
4.
Nat Commun ; 13(1): 7539, 2022 12 07.
Article in English | MEDLINE | ID: mdl-36477196

ABSTRACT

Large-scale intact glycopeptide identification has been advanced by software tools. However, tools for quantitative analysis still lag behind, which hinders the exploration of differential site-specific glycosylation. Here, we report pGlycoQuant, a generic tool for both primary and tandem mass spectrometry-based intact glycopeptide quantitation. pGlycoQuant advances glycopeptide matching by applying a deep learning model that reduces missing values by 19-89% compared with Byologic, MSFragger-Glyco, Skyline, and Proteome Discoverer, as well as a Match In Run algorithm for more glycopeptide coverage, greatly expanding the quantitative function of several widely used search engines, including pGlyco 2.0, pGlyco3, Byonic and MSFragger-Glyco. Further application of pGlycoQuant to the N-glycoproteomic study in three different metastatic HCC cell lines quantifies 6435 intact N-glycopeptides and, together with in vitro molecular biology experiments, illustrates site 979-core fucosylation of L1CAM as a potential regulator of HCC metastasis. We expect further applications of the freely available pGlycoQuant in glycoproteomic studies.


Subjects
Hepatocellular Carcinoma , Liver Neoplasms , Humans , Molecular Biology
5.
Nat Commun ; 13(1): 7238, 2022 11 24.
Article in English | MEDLINE | ID: mdl-36433986

ABSTRACT

Machine learning and in particular deep learning (DL) are increasingly important in mass spectrometry (MS)-based proteomics. Recent DL models can predict the retention time, ion mobility and fragment intensities of a peptide just from the amino acid sequence with good accuracy. However, DL is a very rapidly developing field with new neural network architectures frequently appearing, which are challenging to incorporate for proteomics researchers. Here we introduce AlphaPeptDeep, a modular Python framework built on the PyTorch DL library that learns and predicts the properties of peptides (https://github.com/MannLabs/alphapeptdeep). It features a model shop that enables non-specialists to create models in just a few lines of code. AlphaPeptDeep represents post-translational modifications in a generic manner, even if only the chemical composition is known. Extensive use of transfer learning obviates the need for large data sets to refine models for particular experimental conditions. The AlphaPeptDeep models for predicting retention time, collisional cross sections and fragment intensities are at least on par with existing tools. Additional sequence-based properties can also be predicted by AlphaPeptDeep, as demonstrated with an HLA peptide prediction model to improve HLA peptide identification for data-independent acquisition (https://github.com/MannLabs/PeptDeep-HLA).


Subjects
Deep Learning , Proteomics , Proteomics/methods , Peptides/chemistry , Amino Acid Sequence , Computer Neural Networks
6.
Brain Behav ; 12(2): e2470, 2022 02.
Article in English | MEDLINE | ID: mdl-35089644

ABSTRACT

BACKGROUND: High mobility group box 1 (HMGB1) released by neurons and microglia was demonstrated to be an important mediator in depressive-like behaviors induced by chronic unpredictable mild stress (CUMS), which could lead to the imbalance of two different metabolic approaches in kynurenine pathway (KP), thus enhancing glutamate transmission and exacerbating depressive-like behaviors. Evidence showed that HMGB1 signaling might be regulated by Connexin (Cx) 36 in inflammatory diseases of central nervous system (CNS). Our study aimed to further explore the role of Cx36 in depressive-like behaviors and its relationship with HMGB1. METHODS: After 4-week chronic stress, behavioral tests were conducted to evaluate depressive-like behaviors, including sucrose preference test (SPT), tail suspension test (TST), forced swimming test (FST), and open field test (OFT). Western blot analysis and immunofluorescence staining were used to observe the expression and location of Cx36. Enzyme-linked immunosorbent assay (ELISA) was adopted to detect the concentrations of inflammatory cytokines. The excitability and inward currents of hippocampal neurons were recorded by whole-cell patch clamping. RESULTS: The expression of Cx36 was significantly increased in hippocampal neurons of mice exposed to CUMS, while treatment with glycyrrhizinic acid (GZA) or quinine could both down-regulate Cx36 and alleviate depressive-like behaviors. The proinflammatory cytokines like HMGB1, tumor necrosis factor alpha (TNF-α), and interleukin-1β (IL-1β) were all elevated by CUMS, and application of GZA and quinine could decrease them. In addition, the enhanced excitability and inward currents of hippocampal neurons induced by lipopolysaccharide (LPS) could be reduced by either GZA or quinine. CONCLUSIONS: Inhibition of Cx36 in hippocampal neurons might attenuate HMGB1-mediated depressive-like behaviors induced by CUMS through down-regulation of the proinflammatory cytokines and reduction of the excitability and intracellular ion overload.


Subjects
HMGB1 Protein , Animals , Antidepressive Agents/pharmacology , Animal Behavior , Connexins/metabolism , Cytokines/metabolism , Depression/drug therapy , Depression/metabolism , Animal Disease Models , Hippocampus/metabolism , Mice , Quinine/metabolism , Psychological Stress/complications , Psychological Stress/metabolism , Gap Junction delta-2 Protein
7.
Mol Cell Proteomics ; 20: 100171, 2021.
Article in English | MEDLINE | ID: mdl-34737085

ABSTRACT

Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications due to the lack of appropriate evaluation metrics. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which provides high accuracy as previously reported; Delta RT is defined as the absolute retention time (RT) difference between RTs observed and predicted by AutoRT; and spectral similarity is defined as the Pearson's correlation coefficient between spectra observed and predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs) both when incorrect PSMs involved wrong peptide sequences and even when incorrect PSMs were caused by only incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking performance of the pipelines. The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
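
The two metrics defined in this abstract, Delta RT and spectral similarity, can be sketched in a few lines. This is a minimal illustration of the definitions only: the function names are hypothetical, the predicted values are assumed to come from some external predictor, and nothing here reproduces AutoRT, pDeep2, or the authors' pipeline.

```python
import math


def delta_rt(observed_rt, predicted_rt):
    """Absolute difference between observed and predicted retention time."""
    return abs(observed_rt - predicted_rt)


def pearson(observed, predicted):
    """Pearson correlation between two equal-length intensity vectors.

    Returns 0.0 when either vector is constant, because the correlation
    is undefined in that case (a simplifying assumption of this sketch).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_o = sum(observed) / len(observed)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0
    return cov / math.sqrt(var_o * var_p)
```

A small Delta RT together with a correlation close to 1 would then count as evidence for a match, in the sense described in the abstract.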


Subjects
Deep Learning , Phosphopeptides/analysis , Animals , Benchmarking , Cell Line , Humans , Mice , Phosphorylation , Proteomics/methods
8.
J Proteome Res ; 20(5): 2570-2582, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33821641

ABSTRACT

In cross-linking mass spectrometry, the identification of cross-linked peptide pairs heavily relies on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, in particular, on proteome scales. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used the transfer-learning technique because it facilitated the training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also achieved accurate prediction on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.


Subjects
Deep Learning , Tandem Mass Spectrometry , Algorithms , Computer Neural Networks , Peptides , Proteome
9.
Front Genet ; 12: 790888, 2021.
Article in English | MEDLINE | ID: mdl-34976022

ABSTRACT

Breast cancer (BRCA) is a heterogeneous malignancy closely related to the tumor microenvironment (TME) cell infiltration. N6-methyladenosine (m6A) modification of mRNA is a crucial regulator of the immune microenvironment of BRCA. Immunotherapy represents a paradigm shift in BRCA treatment; however, lack of an appropriate approach for treatment evaluation is a significant issue in this field. In this study, we attempted to establish a prognostic signature of BRCA based on m6A-related immune genes and to investigate the potential association between prognosis and immunotherapy. We comprehensively evaluated the m6A modification patterns of BRCA tissues and non-tumor tissues from The Cancer Genome Atlas and the modification patterns with TME cell-infiltrating characteristics. Overall, 1,977 TME-related genes were identified in the literature. Based on LASSO and Cox regression analyses, the m6A-related immune score (m6A-IS) was established to characterize the TME of BRCA and predict prognosis and efficacy associated with immunotherapy. We developed an m6A-IS to effectively predict immune infiltration and the prognosis of patients with BRCA. The prognostic score model showed robust predictive performance in both the training and validation cohorts. The low-m6A-IS group was characterized by enhanced antigen presentation and improved immune checkpoint expression, further indicating sensitivity to immunotherapy. Compared with the patients in the high-score group, the overall survival rate after treatment in the low-score group was significantly higher in the testing and validation cohorts. We constructed an m6A-IS system to examine the ability of the m6A signature to predict the infiltration of immune cells of the TME in BRCA, and the m6A-IS system acted as an independent prognostic biomarker that predicts the response of patients with BRCA to immunotherapy.

10.
Proteomics ; 20(21-22): e1900344, 2020 11.
Article in English | MEDLINE | ID: mdl-32643271

ABSTRACT

Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurate prediction of tumor and non-tumor samples, as well as of tumor types, has become a key step for biological and medical research, such as biomarker discovery, diagnosis, and monitoring of diseases. The traditional MS-based classification strategy mainly depends on the identification and quantification results of MS data, which has some inherent limitations, such as the low identification rate of MS data. Here, a deep learning-based tumor classifier directly using MS raw data is proposed, which is independent of the identification and quantification results of MS data. Potential precursors, with their intensities and retention times, are first detected and extracted from the MS data as input. Then, a deep learning-based classifier is trained, which can accurately distinguish between the tumor and non-tumor samples. Finally, it is demonstrated that the deep learning-based classifier performs well compared with other machine learning methods and may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.


Subjects
Deep Learning , Neoplasms , Proteomics , Humans , Mass Spectrometry , Proteome
11.
Bioinformatics ; 35(14): i183-i190, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510687

ABSTRACT

MOTIVATION: De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in the low precision of de novo sequencing. RESULTS: In order to solve this problem, we developed pNovo 3, which used a learning-to-rank framework to distinguish similar peptide candidates for each spectrum. Three metrics for measuring the similarity between each experimental spectrum and its corresponding theoretical spectrum were used as important features, in which the theoretical spectra can be precisely predicted by the pDeep algorithm using deep learning. On seven benchmark datasets from six diverse species, pNovo 3 recalled 29-102% more correct spectra, and the precision was 11-89% higher than three other state-of-the-art de novo sequencing algorithms. Furthermore, compared with the newly developed DeepNovo, which also used the deep learning approach, pNovo 3 still identified 21-50% more spectra on the nine datasets used in the study of DeepNovo. In summary, the deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such a machine learning framework is worth extending to other related research fields to distinguish similar sequences. AVAILABILITY AND IMPLEMENTATION: pNovo 3 can be freely downloaded from http://pfind.ict.ac.cn/software/pNovo/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Peptides , Proteomics , Protein Sequence Analysis , Algorithms , Software , Tandem Mass Spectrometry
12.
Nat Commun ; 10(1): 3404, 2019 07 30.
Article in English | MEDLINE | ID: mdl-31363125

ABSTRACT

We describe pLink 2, a search engine with higher speed and reliability for proteome-scale identification of cross-linked peptides. With a two-stage open search strategy facilitated by fragment indexing, pLink 2 is ~40 times faster than pLink 1 and 3~10 times faster than Kojak. Furthermore, using simulated datasets, synthetic datasets, 15N metabolically labeled datasets, and entrapment databases, four analysis methods were designed to evaluate the credibility of ten state-of-the-art search engines. This systematic evaluation shows that pLink 2 outperforms these methods in precision and sensitivity, especially at proteome scales. Lastly, re-analysis of four published proteome-scale cross-linking datasets with pLink 2 required only a fraction of the time used by pLink 1, with up to 27% more cross-linked residue pairs identified. pLink 2 is therefore an efficient and reliable tool for cross-linking mass spectrometry analysis, and the systematic evaluation methods described here will be useful for future software development.


Subjects
Peptides/chemistry , Proteome/chemistry , Search Engine/methods , Algorithms , Animals , Protein Databases , Humans , Proteomics , Software
13.
J Proteome Res ; 18(7): 2747-2758, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31244209

ABSTRACT

As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a threshold to estimate the false discovery rate and then filters those identifications beyond the threshold. However, the incorrect identifications within the threshold are still unknown and further validation methods are needed. In this study, we characterized a framework of validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and, further, proved that a validation method with lower FPR and FNR led to identifications with higher sensitivity and precision. Then we proposed a validation method named pValid that incorporated an open database search and a theoretical spectrum prediction strategy via a machine-learning technology. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an FPR of 0.03% and an FNR of 1.79% on average, both superior to the other four common validation methods. Tests on a synthetic peptide data set also indicated that the FPR and FNR of pValid were better than those of the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid successfully flagged the highest number of incorrect identifications among all five methods. Further considering its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
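
The target-decoy thresholding that this abstract takes as its starting point can be sketched as follows. This is a minimal illustration, not pValid: it assumes the simple estimator FDR = decoys / targets among matches at or above a score (several variants of this estimator are in use), assumes distinct scores, and uses hypothetical names throughout.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score whose estimated FDR stays within max_fdr.

    psms is a list of (score, is_decoy) pairs. The FDR among all matches
    scoring at least s is estimated as decoys / targets (an assumption of
    this sketch). Returns None when no threshold satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accept this score as a threshold only if the running estimate holds.
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```

Target matches that pass such a threshold can still be wrong, which is the gap that further validation methods of the kind studied in the abstract are meant to address.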


Subjects
Peptides/analysis , Proteomics/methods , Validation Studies as Topic , Humans , Proteome/analysis , Reproducibility of Results , Scientific Experimental Error , Sensitivity and Specificity
14.
Mol Cell Proteomics ; 18(4): 773-785, 2019 04.
Article in English | MEDLINE | ID: mdl-30622160

ABSTRACT

De novo peptide sequencing for large-scale proteomics remains challenging because of the lack of full coverage of ion series in tandem mass spectra. We developed a mirror protease of trypsin, acetylated LysargiNase (Ac-LysargiNase), with superior activity and stability. The mirror spectrum pairs derived from the Ac-LysargiNase- and trypsin-treated samples can generate full b and y ion series, which complement each other and allow us to develop a novel algorithm, pNovoM, for de novo sequencing. Using pNovoM to sequence peptides of purified proteins, the accuracy of the sequence was close to 100%. More importantly, from a large-scale yeast proteome sample digested with trypsin and Ac-LysargiNase individually, 48% of all tandem mass spectra formed mirror spectrum pairs, 97% of which contained full coverage of ion series, resulting in precision de novo sequencing of full-length peptides by pNovoM. This enabled pNovoM to successfully sequence 21,249 peptides from 3,753 proteins and to interpret 44-152% more spectra than pNovo+ and PEAKS at a 5% FDR at the spectrum level. Moreover, the mirror protease strategy had an obvious advantage in sequencing long peptides. We believe that the combination of mirror protease strategy and pNovoM will be an effective approach for precision de novo sequencing on both single proteins and proteome samples.


Subjects
Metalloproteases/metabolism , Peptides/metabolism , Proteomics/methods , Protein Sequence Analysis/methods , Trypsin/metabolism , Acetylation , Amino Acid Sequence , Monoclonal Antibodies/metabolism , Enzyme Stability , Peptides/chemistry , Proteome/metabolism
15.
Anal Chem ; 89(23): 12690-12697, 2017 12 05.
Article in English | MEDLINE | ID: mdl-29125736

ABSTRACT

In tandem mass spectrometry (MS/MS)-based proteomics, search engines rely on comparison between an experimental MS/MS spectrum and the theoretical spectra of the candidate peptides. Hence, accurate prediction of the theoretical spectra of peptides appears to be particularly important. Here, we present pDeep, a deep neural network-based model for the spectrum prediction of peptides. Using the bidirectional long short-term memory (BiLSTM), pDeep can predict higher-energy collisional dissociation, electron-transfer dissociation, and electron-transfer and higher-energy collision dissociation MS/MS spectra of peptides with >0.9 median Pearson correlation coefficients. Further, we showed that an intermediate layer of the neural network could reveal physicochemical properties of amino acids, for example the similarities of fragmentation behaviors between amino acids. We also showed the potential of pDeep to distinguish extremely similar peptides (peptides that contain isobaric amino acids, for example, GG = N, AG = Q, or even I = L), which were very difficult to distinguish using traditional search engines.
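
The isobaric cases named in this abstract (GG = N, AG = Q, I = L) follow directly from the elemental compositions of the amino acid residues. The sketch below checks that arithmetic; the table layout and function names are illustrative choices, and the code is unrelated to pDeep itself.

```python
from collections import Counter

# Elemental composition of each residue (the amino acid minus one water).
RESIDUE_FORMULA = {
    "G": Counter(C=2, H=3, N=1, O=1),
    "A": Counter(C=3, H=5, N=1, O=1),
    "N": Counter(C=4, H=6, N=2, O=2),
    "Q": Counter(C=5, H=8, N=2, O=2),
    "I": Counter(C=6, H=11, N=1, O=1),
    "L": Counter(C=6, H=11, N=1, O=1),
}

# Monoisotopic masses of the elements, in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def composition(residues):
    """Summed elemental composition of a string of one-letter residues."""
    total = Counter()
    for residue in residues:
        total += RESIDUE_FORMULA[residue]
    return total


def residue_mass(residues):
    """Monoisotopic mass of the summed residues, in daltons."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition(residues).items())
```

Because `composition("GG")` equals `composition("N")`, and likewise for AG and Q and for I and L, precursor and fragment masses alone cannot separate these alternatives, which is why intensity-level information of the kind the abstract describes is of interest.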


Subjects
Deep Learning , Peptides/chemistry , Tandem Mass Spectrometry , Protein Databases/statistics & numerical data , Proteome/chemistry , Proteomics/methods , Proteomics/statistics & numerical data , Tandem Mass Spectrometry/statistics & numerical data
16.
J Proteome Res ; 16(2): 645-654, 2017 02 03.
Article in English | MEDLINE | ID: mdl-28019094

ABSTRACT

De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo runs at a speed close to, or even ∼10 times faster than, the other three proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% of peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any results can be recalled by others. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in real biological applications.


Subjects
Amino Acid Sequence/genetics , Peptides/genetics , Post-Translational Protein Processing/genetics , Algorithms , Protein Databases , Protein Sequence Analysis , Software , Tandem Mass Spectrometry
17.
Bioinformatics ; 31(20): 3249-53, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26076724

ABSTRACT

MOTIVATION: Proteogenomics has been well accepted as a tool to discover novel genes. In most conventional proteogenomic studies, a global false discovery rate is used to filter out false positives for identifying credible novel peptides. However, it has been found that the actual level of false positives in novel peptides is often out of control and behaves differently for different genomes. RESULTS: To quantitatively model this problem, we theoretically analyze the subgroup false discovery rates of annotated and novel peptides. Our analysis shows that the annotation completeness ratio of a genome is the dominant factor influencing the subgroup FDR of novel peptides. Experimental results on two real datasets of Escherichia coli and Mycobacterium tuberculosis support our conjecture. CONTACT: yfu@amss.ac.cn or xupingghy@gmail.com or smhe@ict.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
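
The difference between a global and a subgroup false discovery rate can be made concrete with a small calculation. The counts below are invented purely for illustration and are not taken from the paper; the decoys / targets estimator and the assignment of decoys to subgroups are likewise assumptions of this sketch.

```python
def estimated_fdr(n_targets, n_decoys):
    """Estimate the FDR of an accepted set as decoys / targets."""
    if n_targets == 0:
        return 0.0
    return n_decoys / n_targets


# Hypothetical accepted identifications: (target count, decoy count).
accepted = {
    "annotated": (9900, 50),
    "novel": (100, 50),
}

total_targets = sum(t for t, _ in accepted.values())
total_decoys = sum(d for _, d in accepted.values())

global_fdr = estimated_fdr(total_targets, total_decoys)
subgroup_fdr = {name: estimated_fdr(t, d) for name, (t, d) in accepted.items()}
```

With these made-up counts the global estimate is 1%, while the estimate for the novel subgroup alone is 50%, which illustrates how a global filter can leave the novel peptides poorly controlled when they form a small fraction of all identifications.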


Subjects
Peptides/chemistry , Proteomics , Escherichia coli/genetics , Molecular Sequence Annotation , Mycobacterium tuberculosis/genetics
18.
Anal Chem ; 86(11): 5286-94, 2014 Jun 03.
Article in English | MEDLINE | ID: mdl-24799117

ABSTRACT

In relative protein abundance determination from peptide intensities recorded in full mass scans, a major complication that affects quantitation accuracy is signal interference from coeluting ions of similar m/z values. Here, we present pQuant, a quantitation software tool that solves this problem. pQuant detects interference signals and identifies for each peptide a pair of least interfered isotopic chromatograms: one for the light and one for the heavy isotope-labeled peptide. On the basis of these isotopic pairs, pQuant calculates the relative heavy/light peptide ratios along with their 99.75% confidence intervals (CIs). From the peptide ratios and their CIs, pQuant estimates the protein ratios and associated CIs by kernel density estimation. We tested pQuant, Census and MaxQuant on data sets obtained from mixtures (at varying mixing ratios from 10:1 to 1:10) of light- and heavy-SILAC labeled HeLa cells or (14)N- and (15)N-labeled Escherichia coli cells. pQuant quantitated more peptides with better accuracy than Census and MaxQuant in all 14 data sets. On the SILAC data sets, the nonquantified "NaN" (not a number) ratios generated by Census, MaxQuant, and pQuant accounted for 2.5-10.7%, 1.8-2.7%, and 0.01-0.5% of all ratios, respectively. On the (14)N/(15)N data sets, which cannot be quantified by MaxQuant, Census and pQuant produced 0.9-10.0% and 0.3-2.9% NaN ratios, respectively. Excluding these NaN results, the standard deviations of the numerical ratios calculated by Census or MaxQuant are 30-100% larger than those by pQuant. These results show that pQuant outperforms Census and MaxQuant in SILAC and (15)N-based quantitation.
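
One way to picture a protein ratio estimated from peptide ratios by kernel density estimation is to take the mode of a smoothed density of the log-transformed peptide ratios. The sketch below does that under explicit assumptions: a Gaussian kernel, a fixed bandwidth, log2 ratios as input, and a plain grid search. It ignores the per-peptide confidence intervals mentioned in the abstract and is not pQuant's actual procedure.

```python
import math


def kde_mode(log_ratios, bandwidth=0.2, grid_step=0.01):
    """Location of the highest point of a Gaussian kernel density estimate.

    log_ratios are log2 heavy/light peptide ratios. The density is
    evaluated on a regular grid spanning the data plus three bandwidths
    on each side; bandwidth and grid_step are assumed values.
    """
    if not log_ratios:
        raise ValueError("need at least one peptide ratio")
    lo = min(log_ratios) - 3 * bandwidth
    hi = max(log_ratios) + 3 * bandwidth
    n_steps = int(round((hi - lo) / grid_step))
    best_x, best_density = lo, -1.0
    for i in range(n_steps + 1):
        x = lo + i * grid_step
        # Unnormalized density: the normalizing constant does not move the mode.
        density = sum(math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in log_ratios)
        if density > best_density:
            best_x, best_density = x, density
    return best_x
```

For example, `kde_mode([1.0, 1.05, 0.95, 1.02, 3.0])` stays close to 1.0 despite the outlying peptide at 3.0, whereas a plain mean would be pulled toward it.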


Subjects
Peptides/chemistry , Proteins/chemistry , Escherichia coli/chemistry , HeLa Cells/chemistry , Humans , Isotopes , Mass Spectrometry , Nitrogen Isotopes , Nitrogen Radioisotopes , Software
19.
J Proteome Res ; 12(2): 615-25, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23272783

ABSTRACT

De novo peptide sequencing is the only tool for extracting peptide sequences directly from tandem mass spectrometry (MS) data without any protein database. However, neither the accuracy nor the efficiency of de novo sequencing has been satisfactory, mainly due to incomplete fragmentation information in experimental spectra. Recent advancement in MS technology has enabled acquisition of higher energy collisional dissociation (HCD) and electron transfer dissociation (ETD) spectra of the same precursor. These spectra contain complementary fragmentation information and can be collected with high resolution and high mass accuracy. Taking advantage of these properties, we have developed a new algorithm called pNovo+, which greatly improves the accuracy and speed of de novo sequencing. On tryptic peptides, 86% of the topmost candidate sequences deduced by pNovo+ from HCD + ETD spectral pairs matched the database search results, and the success rate reached 95% if the top three candidates were included, which was much higher than using only HCD (87%) or only ETD spectra (57%). On Asp-N, Glu-C, or Elastase digested peptides, 69-87% of the HCD + ETD spectral pairs were correctly identified by pNovo+ among the topmost candidates, or 84-95% among the top three. On average, it takes pNovo+ only 0.018 s to extract the sequence from a spectrum or spectral pair on a common personal computer. This is more than three times as fast as other de novo sequencing programs. The increase of speed is mainly due to pDAG, a component algorithm of pNovo+. pDAG finds the k longest paths in a directed acyclic graph without the antisymmetry restriction. We have verified that the antisymmetry restriction is unnecessary for high resolution, high mass accuracy data. The extensive use of HCD and ETD spectral information and the pDAG algorithm make pNovo+ an excellent de novo sequencing tool.
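
Finding the k longest paths in a directed acyclic graph, the task the abstract attributes to pDAG, can be sketched with a standard dynamic program that keeps the k best partial paths at every node. This is a generic textbook formulation under stated assumptions (nodes numbered in topological order, additive edge weights, hypothetical names); it is not the pDAG algorithm and does not model spectrum graphs or the antisymmetry question.

```python
import heapq


def k_longest_paths(n_nodes, edges, source, sink, k):
    """Return up to k highest-scoring source-to-sink paths in a DAG.

    Nodes are 0..n_nodes-1 and are assumed to be topologically ordered,
    so every edge (u, v, weight) must satisfy u < v. The result is a list
    of (score, path) tuples sorted by decreasing score.
    """
    incoming = [[] for _ in range(n_nodes)]
    for u, v, weight in edges:
        if not u < v:
            raise ValueError("edges must go from a lower to a higher node index")
        incoming[v].append((u, weight))

    # best[v] holds the k best (score, path) pairs that end at node v.
    best = [[] for _ in range(n_nodes)]
    best[source] = [(0.0, (source,))]
    for v in range(source + 1, sink + 1):
        candidates = [
            (score + weight, path + (v,))
            for u, weight in incoming[v]
            for score, path in best[u]
        ]
        best[v] = heapq.nlargest(k, candidates)
    return best[sink]
```

Keeping only k partial paths per node is sufficient because any of the k best complete paths through a node must extend one of the k best paths reaching that node.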


Subjects
Algorithms , Peptides/isolation & purification , Protein Sequence Analysis/standards , Tandem Mass Spectrometry/standards , Amino Acid Sequence , Animals , Protein Databases , Humans , Metalloendopeptidases/chemistry , Molecular Sequence Data , Pancreatic Elastase/chemistry , Peptides/chemistry , Sensitivity and Specificity , Protein Sequence Analysis/methods , Serine Endopeptidases/chemistry , Trypsin/chemistry