Results 1 - 20 of 40
1.
Mol Cell Proteomics ; 23(1): 100689, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38043703

ABSTRACT

Distinction of non-self from self is the major task of the immune system. Immunopeptidomics studies the peptide repertoire presented by the human leukocyte antigen (HLA) protein, usually on tissues. However, HLA peptides are also bound to plasma soluble HLA (sHLA), but little is known about their origin and potential for biomarker discovery in this readily available biofluid. Currently, immunopeptidomics is hampered by complex workflows and limited sensitivity, typically requiring several mL of plasma. Here, we take advantage of recent improvements in the throughput and sensitivity of mass spectrometry (MS)-based proteomics to develop a highly sensitive, automated, and economical workflow for HLA peptide analysis, termed Immunopeptidomics by Biotinylated Antibodies and Streptavidin (IMBAS). IMBAS-MS quantifies more than 5000 HLA class I peptides from only 200 µl of plasma, in just 30 min. Our technology revealed that the plasma immunopeptidome of healthy donors is remarkably stable throughout the year and strongly correlated between individuals with overlapping HLA types. Immunopeptides originating from diverse tissues, including the brain, are proportionately represented. We conclude that sHLAs are a promising avenue for immunology and potentially for precision oncology.


Subject(s)
Neoplasms , Humans , Streptavidin , Precision Medicine , Histocompatibility Antigens Class I/metabolism , HLA Antigens , Histocompatibility Antigens Class II , Peptides/metabolism , Mass Spectrometry , Antibodies
2.
Mol Cell Proteomics ; 23(2): 100713, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38184013

ABSTRACT

Optimizing data-independent acquisition methods for proteomics applications often requires balancing spectral resolution and acquisition speed. Here, we describe a real-time full mass range implementation of the phase-constrained spectrum deconvolution method (ΦSDM) for Orbitrap mass spectrometry that increases mass resolving power without increasing scan time. Comparing its performance to the standard enhanced Fourier transformation signal processing revealed that the increased resolving power of ΦSDM is beneficial in areas of high peptide density and comes with a greater ability to resolve low-abundance signals. In a standard 2 h analysis of a 200 ng HeLa digest, this resulted in an increase of 16% in the number of quantified peptides. As the acquisition speed becomes even more important when using fast chromatographic gradients, we further applied ΦSDM methods to a range of shorter gradient lengths (21, 12, and 5 min). While ΦSDM improved identification rates and spectral quality in all tested gradients, it proved particularly advantageous for the 5 min gradient. Here, the number of identified protein groups and peptides increased by >15% in comparison to enhanced Fourier transformation processing. In conclusion, ΦSDM is an alternative signal processing algorithm for processing Orbitrap data that can improve spectral quality and benefit quantitative accuracy in typical proteomics experiments, especially when using short gradients.


Subject(s)
Proteome , Tandem Mass Spectrometry , Humans , Proteome/metabolism , Tandem Mass Spectrometry/methods , Peptides/analysis , HeLa Cells , Proteomics/methods
3.
PLoS Biol ; 20(5): e3001636, 2022 05.
Article in English | MEDLINE | ID: mdl-35576205

ABSTRACT

The recent revolution in computational protein structure prediction provides folding models for entire proteomes, which can now be integrated with large-scale experimental data. Mass spectrometry (MS)-based proteomics has identified and quantified tens of thousands of posttranslational modifications (PTMs), most of them of uncertain functional relevance. In this study, we determine the structural context of these PTMs and investigate how this information can be leveraged to pinpoint potential regulatory sites. Our analysis uncovers global patterns of PTM occurrence across folded and intrinsically disordered regions. We found that this information can help to distinguish regulatory PTMs from those marking improperly folded proteins. Interestingly, the human proteome contains thousands of proteins that have large folded domains linked by short, disordered regions that are strongly enriched in regulatory phosphosites. These include well-known kinase activation loops that induce protein conformational changes upon phosphorylation. This regulatory mechanism appears to be widespread in kinases but also occurs in other protein families such as solute carriers. It is not limited to phosphorylation but includes ubiquitination and acetylation sites as well. Furthermore, we performed three-dimensional proximity analysis, which revealed examples of spatial coregulation of different PTM types and potential PTM crosstalk. To enable the community to build upon these first analyses, we provide tools for 3D visualization of proteomics data and PTMs as well as python libraries for data accession and processing.


Subject(s)
Protein Processing, Post-Translational , Proteome , Humans , Mass Spectrometry/methods , Phosphorylation , Proteomics/methods
4.
Nat Methods ; 18(12): 1515-1523, 2021 12.
Article in English | MEDLINE | ID: mdl-34824474

ABSTRACT

Great advances have been made in mass spectrometric data interpretation for intact glycopeptide analysis. However, accurate identification of intact glycopeptides and modified saccharide units at the site-specific level and with fast speed remains challenging. Here, we present a glycan-first glycopeptide search engine, pGlyco3, to comprehensively analyze intact N- and O-glycopeptides, including glycopeptides with modified saccharide units. A glycan ion-indexing algorithm developed for glycan-first search makes pGlyco3 5-40 times faster than other glycoproteomic search engines without decreasing accuracy or sensitivity. By combining electron-based dissociation spectra, pGlyco3 integrates a dynamic programming-based algorithm termed pGlycoSite for site-specific glycan localization. Our evaluation shows that the site-specific glycan localization probabilities estimated by pGlycoSite are suitable to localize site-specific glycans. With pGlyco3, we confidently identified N-glycopeptides and O-mannose glycopeptides that were extensively modified by ammonia adducts in yeast samples. The freely available pGlyco3 is an accurate and flexible tool that can be used to identify glycopeptides and modified saccharide units.


Subject(s)
Computational Biology/methods , Glycopeptides/chemistry , Proteome , Proteomics/methods , Algorithms , Animals , Fireflies , Glycosylation , HEK293 Cells , Humans , Mannose/chemistry , Polysaccharides/chemistry , Probability , Reproducibility of Results , Saccharomyces cerevisiae , Schizosaccharomyces , Software
5.
Mol Syst Biol ; 19(9): e11503, 2023 09 12.
Article in English | MEDLINE | ID: mdl-37602975

ABSTRACT

Single-cell proteomics aims to characterize biological function and heterogeneity at the level of proteins in an unbiased manner. It is currently limited in proteomic depth, throughput, and robustness, which we address here by a streamlined multiplexed workflow using data-independent acquisition (mDIA). We demonstrate automated and complete dimethyl labeling of bulk or single-cell samples, without losing proteomic depth. Lys-N digestion enables five-plex quantification at MS1 and MS2 level. Because the multiplexed channels are quantitatively isolated from each other, mDIA accommodates a reference channel that does not interfere with the target channels. Our algorithm RefQuant takes advantage of this and confidently quantifies twice as many proteins per single cell compared to our previous work (Brunner et al, PMID 35226415), while our workflow currently allows routine analysis of 80 single cells per day. Finally, we combined mDIA with spatial proteomics to increase the throughput of Deep Visual Proteomics seven-fold for microdissection and four-fold for MS analysis. Applying this to primary cutaneous melanoma, we discovered proteomic signatures of cells within distinct tumor microenvironments, showcasing its potential for precision oncology.


Subject(s)
Melanoma , Skin Neoplasms , Humans , Proteome , Proteomics , Precision Medicine , Tumor Microenvironment
6.
Mol Cell Proteomics ; 20: 100171, 2021.
Article in English | MEDLINE | ID: mdl-34737085

ABSTRACT

Tandem mass spectrometry (MS/MS)-based phosphoproteomics is a powerful technology for global phosphorylation analysis. However, applying four computational pipelines to a typical mass spectrometry (MS)-based phosphoproteomic dataset from a human cancer study, we observed a large discrepancy among the reported phosphopeptide identification and phosphosite localization results, underscoring a critical need for benchmarking. While efforts have been made to compare performance of computational pipelines using data from synthetic phosphopeptides, evaluations involving real application data have been largely limited to comparing the numbers of phosphopeptide identifications due to the lack of appropriate evaluation metrics. We investigated three deep-learning-derived features as potential evaluation metrics: phosphosite probability, Delta RT, and spectral similarity. Predicted phosphosite probability is computed by MusiteDeep, which provides high accuracy as previously reported; Delta RT is defined as the absolute retention time (RT) difference between RTs observed and predicted by AutoRT; and spectral similarity is defined as the Pearson's correlation coefficient between spectra observed and predicted by pDeep2. Using a synthetic peptide dataset, we found that both Delta RT and spectral similarity provided excellent discrimination between correct and incorrect peptide-spectrum matches (PSMs) both when incorrect PSMs involved wrong peptide sequences and even when incorrect PSMs were caused by only incorrect phosphosite localization. Based on these results, we used all three deep-learning-derived features as evaluation metrics to compare different computational pipelines on a diverse set of phosphoproteomic datasets and showed their utility in benchmarking performance of the pipelines. The benchmark metrics demonstrated in this study will enable users to select computational pipelines and parameters for routine analysis of phosphoproteomics data and will offer guidance for developers to improve computational methods.
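
As a rough illustration of two of the metrics defined in this abstract, the sketch below computes Delta RT as an absolute difference and spectral similarity as Pearson's correlation coefficient. The function names are hypothetical, and the predicted values would come from AutoRT and pDeep2 in the study; neither tool is called here.

```python
import math


def delta_rt(observed_rt, predicted_rt):
    """Absolute difference between observed and predicted retention time."""
    return abs(observed_rt - predicted_rt)


def pearson_r(observed, predicted):
    """Pearson correlation between two equal-length intensity vectors."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel).
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_o * var_p)
```

A PSM with a small Delta RT and a spectral similarity close to 1 agrees with both predictions; how such values are thresholded or compared across pipelines is described in the paper, not in this sketch.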


Subject(s)
Deep Learning , Phosphopeptides/analysis , Animals , Benchmarking , Cell Line , Humans , Mice , Phosphorylation , Proteomics/methods
7.
J Proteome Res ; 20(5): 2570-2582, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33821641

ABSTRACT

In cross-linking mass spectrometry, the identification of cross-linked peptide pairs heavily relies on the ability of a database search engine to measure the similarities between experimental and theoretical MS/MS spectra. However, the lack of accurate ion intensities in theoretical spectra impairs the performance of search engines, in particular, on proteome scales. Here we introduce pDeepXL, a deep neural network to predict MS/MS spectra of cross-linked peptide pairs. To train pDeepXL, we used the transfer-learning technique because it facilitated the training with limited benchmark data of cross-linked peptide pairs. Test results on more than ten data sets showed that pDeepXL accurately predicted the spectra of both noncleavable DSS/BS3/Leiker cross-linked peptide pairs (>80% of predicted spectra have Pearson's r values higher than 0.9) and cleavable DSSO/DSBU cross-linked peptide pairs (>75% of predicted spectra have Pearson's r values higher than 0.9). pDeepXL also achieved the accurate prediction on unseen data sets using an online fine-tuning technique. Lastly, integrating pDeepXL into a database search engine increased the number of identified cross-link spectra by 18% on average.


Subject(s)
Deep Learning , Tandem Mass Spectrometry , Algorithms , Neural Networks, Computer , Peptides , Proteome
8.
Anal Chem ; 93(14): 5815-5822, 2021 04 13.
Article in English | MEDLINE | ID: mdl-33797898

ABSTRACT

Spectrum prediction using deep learning has attracted a lot of attention in recent years. Although existing deep learning methods have dramatically increased the prediction accuracy, there is still considerable room for improvement, which is presently limited by differences in fragmentation types or instrument settings. In this work, we use the few-shot learning method to fit the data online to make up for the shortcoming. The method is evaluated using ten data sets, where the instruments include Velos, QE, Lumos, and Sciex, with collision energies being differently set. Experimental results show that few-shot learning can achieve higher prediction accuracy with almost negligible computing resources. For example, on the data set from an untrained instrument, Sciex-6600, within about 10 s, the prediction accuracy is increased from 69.7% to 86.4%; on the CID (collision-induced dissociation) data set, the prediction accuracy of the model trained by HCD (higher energy collision dissociation) spectra is increased from 48.0% to 83.9%. It is also shown that the method does not depend critically on data quality and is sufficiently efficient to fill the accuracy gap. The source code of pDeep3 is available at http://pfind.ict.ac.cn/software/pdeep3.

9.
Mol Cell Proteomics ; 18(4): 773-785, 2019 04.
Article in English | MEDLINE | ID: mdl-30622160

ABSTRACT

De novo peptide sequencing for large-scale proteomics remains challenging because of the lack of full coverage of ion series in tandem mass spectra. We developed a mirror protease of trypsin, acetylated LysargiNase (Ac-LysargiNase), with superior activity and stability. The mirror spectrum pairs derived from the Ac-LysargiNase and trypsin treated samples can generate full b and y ion series, which provide mutual complementarity of each other, and allow us to develop a novel algorithm, pNovoM, for de novo sequencing. Using pNovoM to sequence peptides of purified proteins, the accuracy of the sequence was close to 100%. More importantly, from a large-scale yeast proteome sample digested with trypsin and Ac-LysargiNase individually, 48% of all tandem mass spectra formed mirror spectrum pairs, 97% of which contained full coverage of ion series, resulting in precision de novo sequencing of full-length peptides by pNovoM. This enabled pNovoM to successfully sequence 21,249 peptides from 3,753 proteins and interpreted 44-152% more spectra than pNovo+ and PEAKS at a 5% FDR at the spectrum level. Moreover, the mirror protease strategy had an obvious advantage in sequencing long peptides. We believe that the combination of mirror protease strategy and pNovoM will be an effective approach for precision de novo sequencing on both single proteins and proteome samples.


Subject(s)
Metalloproteases/metabolism , Peptides/metabolism , Proteomics/methods , Sequence Analysis, Protein/methods , Trypsin/metabolism , Acetylation , Amino Acid Sequence , Antibodies, Monoclonal/metabolism , Enzyme Stability , Peptides/chemistry , Proteome/metabolism
10.
Proteomics ; 20(21-22): e1900344, 2020 11.
Article in English | MEDLINE | ID: mdl-32643271

ABSTRACT

Since the launch of the Chinese Human Proteome Project (CNHPP) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), large-scale mass spectrometry (MS)-based proteomic profiling of different kinds of human tumor samples has provided a huge amount of valuable data for both basic and clinical researchers. Accurate prediction of tumor and non-tumor samples, as well as of tumor types, has become a key step for biological and medical research, such as biomarker discovery, diagnosis, and monitoring of diseases. The traditional MS-based classification strategy mainly depends on the identification and quantification results of MS data, which has some inherent limitations, such as the low identification rate of MS data. Here, a deep learning-based tumor classifier directly using MS raw data is proposed, which is independent of the identification and quantification results of MS data. The potential precursors, with their intensities and retention times, are first detected and extracted from the MS data as input. Then, a deep learning-based classifier is trained, which can accurately distinguish between the tumor and non-tumor samples. Finally, it is demonstrated that the deep learning-based classifier performs well compared with other machine learning methods and may help researchers find potential biomarkers that are likely to be missed by the traditional strategy.


Subject(s)
Deep Learning , Neoplasms , Proteomics , Humans , Mass Spectrometry , Proteome
11.
Proteomics ; 20(21-22): e1900335, 2020 11.
Article in English | MEDLINE | ID: mdl-32939979

ABSTRACT

Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.


Subject(s)
Deep Learning , Proteomics , Algorithms , Protein Processing, Post-Translational , Tandem Mass Spectrometry
12.
Bioinformatics ; 35(14): i183-i190, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510687

ABSTRACT

MOTIVATION: De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in the low precision of de novo sequencing. RESULTS: In order to solve this problem, we developed pNovo 3, which used a learning-to-rank framework to distinguish similar peptide candidates for each spectrum. Three metrics for measuring the similarity between each experimental spectrum and its corresponding theoretical spectrum were used as important features, in which the theoretical spectra can be precisely predicted by the pDeep algorithm using deep learning. On seven benchmark datasets from six diverse species, pNovo 3 recalled 29-102% more correct spectra, and the precision was 11-89% higher than three other state-of-the-art de novo sequencing algorithms. Furthermore, compared with the newly developed DeepNovo, which also used the deep learning approach, pNovo 3 still identified 21-50% more spectra on the nine datasets used in the study of DeepNovo. In summary, the deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such machine learning framework is worth extending to other related research fields to distinguish the similar sequences. AVAILABILITY AND IMPLEMENTATION: pNovo 3 can be freely downloaded from http://pfind.ict.ac.cn/software/pNovo/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Peptides , Proteomics , Sequence Analysis, Protein , Algorithms , Software , Tandem Mass Spectrometry
13.
J Proteome Res ; 18(7): 2747-2758, 2019 07 05.
Article in English | MEDLINE | ID: mdl-31244209

ABSTRACT

As the de facto validation method in mass spectrometry-based proteomics, the target-decoy approach determines a threshold to estimate the false discovery rate and then filters those identifications beyond the threshold. However, the incorrect identifications within the threshold are still unknown and further validation methods are needed. In this study, we characterized a framework of validation and investigated a number of common and novel validation methods. We first defined the accuracy of a validation method by its false-positive rate (FPR) and false-negative rate (FNR) and, further, proved that a validation method with lower FPR and FNR led to identifications with higher sensitivity and precision. Then we proposed a validation method named pValid that incorporated an open database search and a theoretical spectrum prediction strategy via a machine-learning technology. pValid was compared with four common validation methods as well as a synthetic peptide validation method. Tests on three benchmark data sets indicated that pValid had an FPR of 0.03% and an FNR of 1.79% on average, both superior to the other four common validation methods. Tests on a synthetic peptide data set also indicated that the FPR and FNR of pValid were better than those of the synthetic peptide validation method. Tests on a large-scale human proteome data set indicated that pValid successfully flagged the highest number of incorrect identifications among all five methods. Further considering its cost-effectiveness, pValid has the potential to be a feasible validation tool for peptide identification.
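
The target-decoy step mentioned at the start of this abstract can be sketched as follows. This is a minimal illustration using one common estimator (decoy hits divided by target hits at or above a score threshold); the function names and the input layout are assumptions, and it is not the pValid method itself.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR as decoys / targets among PSMs scoring >= threshold.

    psms is a list of (score, is_decoy) pairs (hypothetical layout).
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def score_threshold(psms, max_fdr=0.01):
    """Return the lowest observed score whose estimated FDR is within max_fdr."""
    best = None
    for candidate in sorted({score for score, _ in psms}, reverse=True):
        if estimate_fdr(psms, candidate) <= max_fdr:
            best = candidate
    return best
```

Identifications that pass such a threshold can still be wrong, which is the gap the additional validation methods in the abstract are meant to address.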


Subject(s)
Peptides/analysis , Proteomics/methods , Validation Studies as Topic , Humans , Proteome/analysis , Reproducibility of Results , Scientific Experimental Error , Sensitivity and Specificity
14.
Anal Chem ; 91(15): 9724-9731, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31283184

ABSTRACT

In the past decade, tandem mass spectrometry (MS/MS)-based bottom-up proteomics has become the method of choice for analyzing post-translational modifications (PTMs) in complex mixtures. The key to the identification of the PTM-containing peptides and localization of the PTM-modified residues is to measure the similarities between the theoretical spectra and the experimental ones. An accurate prediction of the theoretical MS/MS spectra of the modified peptides will improve the similarity measurement. Here, we proposed the deep-learning-based pDeep2 model for PTMs. We used the transfer learning technique to train pDeep2, facilitating the training with a limited scale of benchmark PTM data. Using the public synthetic PTM data sets, including the synthetic phosphopeptides and 21 synthetic PTMs from ProteomeTools, we showed that the model trained by transfer learning was accurate (>80% Pearson correlation coefficients were higher than 0.9), and was significantly better than the models trained without transfer learning. We also showed that accurate prediction of the fragment ion intensities of the PTM neutral loss, for example, the phosphoric acid loss (-98 Da) of the phosphopeptide, will improve the discriminating power to distinguish the true phosphorylated residue from its adjacent candidate sites. pDeep2 is available at https://github.com/pFindStudio/pDeep/tree/master/pDeep2 .
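
The neutral loss mentioned in this abstract shifts a fragment ion's m/z by the mass of the lost group divided by the ion's charge. The sketch below works that arithmetic for the loss of phosphoric acid (H3PO4, monoisotopic mass about 97.9769 Da); the function name is hypothetical and the code only computes peak positions, not the intensities that pDeep2 predicts.

```python
H3PO4_MASS = 97.97690  # monoisotopic mass of phosphoric acid, in Da


def neutral_loss_mz(fragment_mz, charge=1, loss_mass=H3PO4_MASS):
    """m/z of a fragment ion after losing a neutral group of mass loss_mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # The neutral group carries no charge, so the shift in m/z is mass / charge.
    return fragment_mz - loss_mass / charge
```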

15.
J Proteome Res ; 17(1): 119-128, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29130300

ABSTRACT

MS-based de novo peptide sequencing has been improved remarkably with significant development of mass-spectrometry and computational approaches but still lacks quality-control methods. Here we proposed a novel algorithm, pSite, to evaluate the confidence of each amino acid rather than the full-length peptides obtained by de novo peptide sequencing. A semi-supervised learning approach was used to discriminate correct amino acids from random ones; then, an expectation-maximization algorithm was used to adaptively control the false amino-acid rate (FAR). On three test data sets, pSite recalled 86% more amino acids on average than PEAKS at the FAR of 5%. pSite also performed superiorly on the modification site localization problem, which is essentially a special case of amino acid confidence evaluation. On three phosphopeptide data sets, at the false localization rate of 1%, the average recall of pSite was 91% while those of Ascore and phosphoRS were 64 and 63%, respectively. pSite covered 98% of Ascore and phosphoRS results and contributed 21% more phosphorylation sites. Further analyses show that the use of distinct fragmentation features in high-resolution MS/MS spectra, such as neutral loss ions, played an important role in improving the precision of pSite. In summary, the effective and universal model together with the extensive use of spectral information makes pSite an excellent quality control tool for both de novo peptide sequencing and modification site localization.


Subject(s)
Binding Sites , Protein Processing, Post-Translational , Sequence Analysis, Protein/methods , Tandem Mass Spectrometry/methods , Algorithms , Amino Acids , Phosphorylation , Quality Control
16.
J Proteome Res ; 16(2): 645-654, 2017 02 03.
Article in English | MEDLINE | ID: mdl-28019094

ABSTRACT

De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications is still a challenging problem. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo is close to or even ∼10 times faster than the other three proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% of peptides with arbitrary modifications can be recalled by Open-pNovo, while hardly any results can be recalled by others. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in real biological applications.


Subject(s)
Amino Acid Sequence/genetics , Peptides/genetics , Protein Processing, Post-Translational/genetics , Algorithms , Databases, Protein , Sequence Analysis, Protein , Software , Tandem Mass Spectrometry
17.
Anal Chem ; 89(23): 12690-12697, 2017 12 05.
Article in English | MEDLINE | ID: mdl-29125736

ABSTRACT

In tandem mass spectrometry (MS/MS)-based proteomics, search engines rely on comparison between an experimental MS/MS spectrum and the theoretical spectra of the candidate peptides. Hence, accurate prediction of the theoretical spectra of peptides appears to be particularly important. Here, we present pDeep, a deep neural network-based model for the spectrum prediction of peptides. Using the bidirectional long short-term memory (BiLSTM), pDeep can predict higher-energy collisional dissociation, electron-transfer dissociation, and electron-transfer and higher-energy collision dissociation MS/MS spectra of peptides with >0.9 median Pearson correlation coefficients. Further, we showed that the intermediate layer of the neural network could reveal physicochemical properties of amino acids, for example the similarities of fragmentation behaviors between amino acids. We also showed the potential of pDeep to distinguish extremely similar peptides (peptides that contain isobaric amino acids, for example, GG = N, AG = Q, or even I = L), which were very difficult to distinguish using traditional search engines.
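
The isobaric cases named in this abstract (GG = N, AG = Q, I = L) can be checked with standard monoisotopic residue masses. The sketch below is only a mass-arithmetic illustration with hypothetical function names; it shows why mass alone cannot separate these sequences, not how pDeep distinguishes them.

```python
# Monoisotopic residue masses in Da (rounded to five decimals).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "N": 114.04293,
    "Q": 128.05858,
    "I": 113.08406,
    "L": 113.08406,
}


def residue_mass(sequence):
    """Sum of residue masses for a short amino acid sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def is_isobaric(seq_a, seq_b, tolerance=1e-3):
    """True when two sequences have the same residue mass within tolerance (Da)."""
    return abs(residue_mass(seq_a) - residue_mass(seq_b)) <= tolerance
```

Because GG and N, or AG and Q, share the same elemental composition, only differences in fragment ion patterns and intensities can tell them apart.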


Subject(s)
Deep Learning , Peptides/chemistry , Tandem Mass Spectrometry , Databases, Protein/statistics & numerical data , Proteome/chemistry , Proteomics/methods , Proteomics/statistics & numerical data , Tandem Mass Spectrometry/statistics & numerical data
18.
Anal Chem ; 88(6): 3082-90, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26844380

ABSTRACT

There has been tremendous progress in top-down proteomics (TDP) in the past 5 years, particularly in intact protein separation and high-resolution mass spectrometry. However, bioinformatics to deal with large-scale mass spectra has lagged behind, in both algorithmic research and software development. In this study, we developed pTop 1.0, a novel software tool to significantly improve the accuracy and efficiency of mass spectral data analysis in TDP. The precursor mass offers crucial clues to infer the potential post-translational modifications co-occurring on the protein, the reliability of which relies heavily on its mass accuracy. Concentrating on detecting the precursors more accurately, a machine-learning model incorporating a variety of spectral features was trained online in pTop via a support vector machine (SVM). pTop employs the sequence tags extracted from the MS/MS spectra and a dynamic programming algorithm to accelerate the search speed, especially for those spectra with multiple post-translational modifications. We tested pTop on three publicly available data sets and compared it with ProSight and MS-Align+ in terms of its recall, precision, running time, and so on. The results showed that pTop can, in general, outperform ProSight and MS-Align+. pTop recalled 22% more correct precursors, although it exported 30% fewer precursors than Xtract (in ProSight) from a human histone data set. The running speed of pTop was about 1 to 2 orders of magnitude faster than that of MS-Align+. This algorithmic advancement in pTop, including both accuracy and speed, will inspire the development of other similar software to analyze the mass spectra from the entire proteins.


Subject(s)
Databases, Protein , Information Storage and Retrieval , Proteins/analysis , Algorithms , Machine Learning , Software
19.
Bioinformatics ; 31(20): 3249-53, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26076724

ABSTRACT

MOTIVATION: Proteogenomics has been well accepted as a tool to discover novel genes. In most conventional proteogenomic studies, a global false discovery rate is used to filter out false positives for identifying credible novel peptides. However, it has been found that the actual level of false positives in novel peptides is often out of control and behaves differently for different genomes. RESULTS: To quantitatively model this problem, we theoretically analyze the subgroup false discovery rates of annotated and novel peptides. Our analysis shows that the annotation completeness ratio of a genome is the dominant factor influencing the subgroup FDR of novel peptides. Experimental results on two real datasets of Escherichia coli and Mycobacterium tuberculosis support our conjecture. CONTACT: yfu@amss.ac.cn or xupingghy@gmail.com or smhe@ict.ac.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
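
To make the contrast between a global and a subgroup false discovery rate concrete, the sketch below estimates both from target-decoy counts. The group labels, the input layout, and the decoy-over-target estimator are assumptions chosen for illustration; the paper's theoretical model of annotation completeness is not reproduced here.

```python
from collections import Counter


def global_fdr(hits):
    """Decoys / targets over all accepted hits; hits are (group, is_decoy) pairs."""
    decoys = sum(1 for _, is_decoy in hits if is_decoy)
    targets = sum(1 for _, is_decoy in hits if not is_decoy)
    return decoys / targets if targets else None


def subgroup_fdr(hits):
    """Decoys / targets computed separately for each group label."""
    targets, decoys = Counter(), Counter()
    for group, is_decoy in hits:
        if is_decoy:
            decoys[group] += 1
        else:
            targets[group] += 1
    groups = set(targets) | set(decoys)
    # None marks a group with no target hits, where the ratio is undefined.
    return {g: (decoys[g] / targets[g] if targets[g] else None) for g in groups}
```

With 990 annotated and 10 novel target hits and 5 decoys in each group, the global estimate is 1% while the novel subgroup estimate is 50%, which is the kind of discrepancy the abstract describes.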


Subject(s)
Peptides/chemistry , Proteomics , Escherichia coli/genetics , Molecular Sequence Annotation , Mycobacterium tuberculosis/genetics
20.
Anal Chem ; 86(11): 5286-94, 2014 Jun 03.
Article in English | MEDLINE | ID: mdl-24799117

ABSTRACT

In relative protein abundance determination from peptide intensities recorded in full mass scans, a major complication that affects quantitation accuracy is signal interference from coeluting ions of similar m/z values. Here, we present pQuant, a quantitation software tool that solves this problem. pQuant detects interference signals and identifies for each peptide a pair of least interfered isotopic chromatograms: one for the light and one for the heavy isotope-labeled peptide. On the basis of these isotopic pairs, pQuant calculates the relative heavy/light peptide ratios along with their 99.75% confidence intervals (CIs). From the peptide ratios and their CIs, pQuant estimates the protein ratios and associated CIs by kernel density estimation. We tested pQuant, Census and MaxQuant on data sets obtained from mixtures (at varying mixing ratios from 10:1 to 1:10) of light- and heavy-SILAC labeled HeLa cells or ¹⁴N- and ¹⁵N-labeled Escherichia coli cells. pQuant quantitated more peptides with better accuracy than Census and MaxQuant in all 14 data sets. On the SILAC data sets, the nonquantified "NaN" (not a number) ratios generated by Census, MaxQuant, and pQuant accounted for 2.5-10.7%, 1.8-2.7%, and 0.01-0.5% of all ratios, respectively. On the ¹⁴N/¹⁵N data sets, which cannot be quantified by MaxQuant, Census and pQuant produced 0.9-10.0% and 0.3-2.9% NaN ratios, respectively. Excluding these NaN results, the standard deviations of the numerical ratios calculated by Census or MaxQuant are 30-100% larger than those by pQuant. These results show that pQuant outperforms Census and MaxQuant in SILAC and ¹⁵N-based quantitation.
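
The idea of deriving a protein ratio from peptide ratios by kernel density estimation can be sketched as below. This is a simplified illustration under stated assumptions (a Gaussian kernel on log2 ratios, a fixed bandwidth, a grid search for the density peak, and no use of the per-peptide confidence intervals); it is not pQuant's actual procedure, and the function name is hypothetical.

```python
import math


def kde_mode_ratio(peptide_ratios, bandwidth=0.2, grid_points=401):
    """Protein ratio taken as the peak of a Gaussian KDE over log2 peptide ratios."""
    if not peptide_ratios or any(r <= 0 for r in peptide_ratios):
        raise ValueError("need at least one ratio, and all ratios must be positive")
    logs = [math.log2(r) for r in peptide_ratios]
    low = min(logs) - 3 * bandwidth
    high = max(logs) + 3 * bandwidth
    step = (high - low) / (grid_points - 1)
    best_x, best_density = low, -1.0
    for i in range(grid_points):
        x = low + i * step
        # Unnormalized density: the constant factor does not change the peak.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in logs)
        if density > best_density:
            best_x, best_density = x, density
    return 2 ** best_x
```

Taking the density peak rather than the mean keeps a single interfered peptide ratio from dominating the protein-level estimate.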


Subject(s)
Peptides/chemistry , Proteins/chemistry , Escherichia coli/chemistry , HeLa Cells/chemistry , Humans , Isotopes , Mass Spectrometry , Nitrogen Isotopes , Nitrogen Radioisotopes , Software