1.
Bioinformatics ; 40(Supplement_1): i410-i417, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940129

ABSTRACT

MOTIVATION: One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools. RESULTS: To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.


Subjects
Databases, Protein; Peptides; Peptides/chemistry; Machine Learning; Mass Spectrometry/methods; Algorithms; Sequence Analysis, Protein/methods; Tandem Mass Spectrometry/methods
2.
J Proteome Res ; 23(6): 1907-1914, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38687997

ABSTRACT

Traditional database search methods for the analysis of bottom-up proteomics tandem mass spectrometry (MS/MS) data are limited in their ability to detect peptides with post-translational modifications (PTMs). Recently, "open modification" database search strategies, in which the requirement that the mass of the database peptide closely matches the observed precursor mass is relaxed, have become popular as ways to find a wider variety of types of PTMs. Indeed, in one study, Kong et al. reported that the open modification search tool MSFragger can achieve higher statistical power to detect peptides than a traditional "narrow window" database search. We investigated this claim empirically and, in the process, uncovered a potential general problem with false discovery rate (FDR) control in the machine learning postprocessors Percolator and PeptideProphet. This problem might have contributed to Kong et al.'s report that their empirical results suggest that FDR control in the narrow window setting might generally be compromised. Indeed, reanalyzing the same data while using a more standard form of target-decoy competition-based FDR control, we found that, after accounting for chimeric spectra as well as for the inherent difference in the number of candidates in open and narrow searches, the data do not provide sufficient evidence that FDR control in proteomics MS/MS database search is inherently problematic.


Subjects
Databases, Protein; Protein Processing, Post-Translational; Proteomics; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Proteomics/methods; Peptides/analysis; Peptides/chemistry; Machine Learning; Humans; Algorithms; Software
3.
Proteomics ; 24(8): e2300084, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38380501

ABSTRACT

Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.
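
As a rough illustration of the spectrum-level TDC estimate described in this abstract, the Python sketch below ranks the winners of a target-decoy competition and converts decoy counts into FDR estimates and q-values. It does not use Crema's API. The function name `tdc_qvalues`, the input layout, and the (decoys + 1) / targets estimator are assumptions chosen for illustration.

```python
from typing import List, Tuple


def tdc_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Estimate q-values from target-decoy competition winners.

    psms: (score, is_target) pairs, one per spectrum, where each entry is the
    better of the spectrum's best target and best decoy match (higher is better).
    Returns (score, is_target, q_value) tuples sorted by decreasing score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each score threshold (assumed estimator: (D + 1) / T).
    fdrs = []
    targets = decoys = 0
    for _score, is_target in ranked:
        if is_target:
            targets += 1
        else:
            decoys += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value: smallest FDR estimate at this threshold or any more permissive one.
    qvals = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(score, is_target, q) for (score, is_target), q in zip(ranked, qvals)]
```

Accepting every target whose q-value is at or below a chosen threshold (for example 0.01) gives the discovery list at that estimated FDR.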


Subjects
Algorithms; Peptides; Databases, Protein; Peptides/chemistry; Proteins/analysis; Proteomics/methods
4.
J Proteome Res ; 22(2): 577-584, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36633229

ABSTRACT

The first step in the analysis of protein tandem mass spectrometry data typically involves searching the observed spectra against a protein database. During database search, the search engine must digest the proteins in the database into peptides, subject to digestion rules that are under user control. The choice of these digestion parameters, as well as selection of post-translational modifications (PTMs), can dramatically affect the size of the search space and hence the statistical power of the search. The Tide search engine separates the creation of the peptide index from the database search step, thereby saving time by allowing a peptide index to be reused in multiple searches. Here we describe an improved implementation of the indexing component of Tide that uses roughly one-quarter of the resources (CPU and RAM) of the previous version and can generate arbitrarily large peptide databases, limited only by the amount of available disk space. We use this improved implementation to explore the relationship between database size and the parameters controlling digestion and PTMs, as well as database size and statistical power. Our results can help guide practitioners in proper selection of these important parameters.
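
To make the link between digestion parameters and search-space size concrete, here is a minimal Python sketch of in-silico digestion. It is not Tide's indexing code. The function name `digest_trypsin`, its parameters, and the cleavage rule (cut after K or R unless the next residue is P, a common convention for trypsin) are assumptions for illustration.

```python
import re
from typing import List


def digest_trypsin(protein: str, missed_cleavages: int = 0,
                   min_len: int = 1, max_len: int = 50) -> List[str]:
    """Return peptides from a simplified tryptic digest of one protein sequence."""
    # Split at zero-width positions that follow K or R and do not precede P.
    # (Splitting on empty matches requires Python 3.7 or later.)
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]

    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` additional adjacent pieces.
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides
```

For the toy sequence "AKRPGKMR", allowing one missed cleavage raises the peptide count from three to five, which hints at how quickly the candidate set can grow across a whole proteome.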


Subjects
Algorithms; Peptides; Peptides/chemistry; Proteins/metabolism; Search Engine; Databases, Protein; Software
5.
Bioinformatics ; 39(2), 2023 02 03.
Article in English | MEDLINE | ID: mdl-36702456

ABSTRACT

MOTIVATION: Interpretation of newly acquired mass spectrometry data can be improved by identifying, from an online repository, previous mass spectrometry runs that resemble the new data. However, this retrieval task requires computing the similarity between an arbitrary pair of mass spectrometry runs. This is particularly challenging for runs acquired using different experimental protocols. RESULTS: We propose a method, MS1Connect, that calculates the similarity between a pair of runs by examining only the intact peptide (MS1) scans, and we show evidence that the MS1Connect score is accurate. Specifically, we show that MS1Connect outperforms several baseline methods on the task of predicting the species from which a given proteomics sample originated. In addition, we show that MS1Connect scores are highly correlated with similarities computed from fragment (MS2) scans, even though these data are not used by MS1Connect. AVAILABILITY AND IMPLEMENTATION: The MS1Connect software is available at https://github.com/bmx8177/MS1Connect. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Peptides; Software; Mass Spectrometry; Peptides/chemistry; Proteomics/methods
6.
Methods Mol Biol ; 2426: 25-34, 2023.
Article in English | MEDLINE | ID: mdl-36308683

ABSTRACT

Target-decoy competition has been commonly used for over a decade to control the false discovery rate when analyzing tandem mass spectrometry (MS/MS) data. We recently developed a framework that uses multiple decoys to increase the number of detected peptides in MS/MS data. Here, we present a pipeline of Apache licensed, open-source software that allows the user to readily take advantage of our framework.


Subjects
Proteomics; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Proteomics/methods; Peptides/chemistry; Software; Databases, Protein; Algorithms
7.
J Proteome Res ; 21(10): 2412-2420, 2022 Oct 07.
Article in English | MEDLINE | ID: mdl-36166314

ABSTRACT

The analysis of shotgun proteomics data often involves generating lists of inferred peptide-spectrum matches (PSMs) and/or of peptides. The canonical approach for generating these discovery lists is by controlling the false discovery rate (FDR), most commonly through target-decoy competition (TDC). At the PSM level, TDC is implemented by competing each spectrum's best-scoring target (real) peptide match with its best match against a decoy database. This PSM-level procedure can be adapted to the peptide level by selecting the top-scoring PSM per peptide prior to FDR estimation. Here, we first highlight and empirically augment a little known previous work by He et al., which showed that TDC-based PSM-level FDR estimates can be liberally biased. We thus propose that researchers instead focus on peptide-level analysis. We then investigate three ways to carry out peptide-level TDC and show that the most common method ("PSM-only") offers the lowest statistical power in practice. An alternative approach that carries out a double competition, first at the PSM and then at the peptide level ("PSM-and-peptide"), is the most powerful method, yielding an average increase of 17% more discovered peptides at 1% FDR threshold relative to the PSM-only method.
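
The double competition can be sketched in Python as below. This is one plausible reading of the "PSM-and-peptide" procedure, not the authors' implementation: the `PSM` record, the `decoy_to_target` pairing map, and the tie handling (first seen wins) are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class PSM(NamedTuple):
    spectrum: str
    peptide: str
    score: float
    is_target: bool


def psm_and_peptide(psms: List[PSM], decoy_to_target: Dict[str, str]) -> List[PSM]:
    """Return one winning PSM per target/decoy peptide pair, best score first."""
    # Step 1: PSM-level competition, keeping the best match for each spectrum.
    best_per_spectrum: Dict[str, PSM] = {}
    for psm in psms:
        current = best_per_spectrum.get(psm.spectrum)
        if current is None or psm.score > current.score:
            best_per_spectrum[psm.spectrum] = psm

    # Step 2: among the surviving PSMs, keep the best one for each peptide.
    best_per_peptide: Dict[str, PSM] = {}
    for psm in best_per_spectrum.values():
        current = best_per_peptide.get(psm.peptide)
        if current is None or psm.score > current.score:
            best_per_peptide[psm.peptide] = psm

    # Step 3: peptide-level competition between each target and its paired decoy.
    winners: Dict[str, PSM] = {}
    for psm in best_per_peptide.values():
        pair_key = psm.peptide if psm.is_target else decoy_to_target[psm.peptide]
        current = winners.get(pair_key)
        if current is None or psm.score > current.score:
            winners[pair_key] = psm

    return sorted(winners.values(), key=lambda p: p.score, reverse=True)
```

The returned list contains both target and decoy winners, so it can be passed to a decoy-counting FDR estimate to choose a score threshold.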


Subjects
Algorithms; Tandem Mass Spectrometry; Databases, Protein; Peptides/analysis; Proteomics/methods; Tandem Mass Spectrometry/methods
8.
Bioinformatics ; 38(Suppl_2): ii82-ii88, 2022 09 16.
Article in English | MEDLINE | ID: mdl-36124786

ABSTRACT

MOTIVATION: Target-decoy competition (TDC) is a commonly used method for false discovery rate (FDR) control in the analysis of tandem mass spectrometry data. This type of competition-based FDR control has recently gained significant popularity in other fields after Barber and Candès laid its theoretical foundation in a more general setting that included the feature selection problem. In both cases, the competition is based on a head-to-head comparison between an (observed) target score and a corresponding decoy (knockoff) score. However, the effectiveness of TDC depends on whether the data are homogeneous, which is often not the case: in many settings, the data consist of groups with different score profiles or different proportions of true nulls. In such cases, applying TDC while ignoring the group structure often yields imbalanced lists of discoveries, where some groups might include relatively many false discoveries and other groups include relatively few. On the other hand, as we show, the alternative approach of applying TDC separately to each group does not rigorously control the FDR. RESULTS: We developed Group-walk, a procedure that controls the FDR in the target-decoy/knockoff setting while taking into account a given group structure. Group-walk is derived from the recently developed AdaPT, a general framework for controlling the FDR with side-information. We show using simulated and real datasets that when the data naturally divide into groups with different characteristics, Group-walk can deliver consistent power gains that in some cases are substantial. These groupings include the precursor charge state (4% more discovered peptides at 1% FDR threshold), the peptide length (3.6% increase) and the mass difference due to modifications (26% increase). AVAILABILITY AND IMPLEMENTATION: Group-walk is available at https://cran.r-project.org/web/packages/groupwalk/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Proteomics; Tandem Mass Spectrometry; Peptides/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods
9.
J Proteome Res ; 21(7): 1771-1782, 2022 07 01.
Article in English | MEDLINE | ID: mdl-35696663

ABSTRACT

Quantitative mass spectrometry measurements of peptides necessarily incorporate sequence-specific biases that reflect the behavior of the peptide during enzymatic digestion and liquid chromatography and in a mass spectrometer. These sequence-specific effects impair quantification accuracy, yielding peptide quantities that are systematically under- or overestimated. We provide empirical evidence for the existence of such biases, and we use a deep neural network, called Pepper, to automatically identify and reduce these biases. The model generalizes to new proteins and new runs within a related set of tandem mass spectrometry experiments, and the learned coefficients themselves reflect expected physicochemical properties of the corresponding peptide sequences. The resulting adjusted abundance measurements are more correlated with mRNA-based gene expression measurements than the unadjusted measurements. Pepper is suitable for data generated on a variety of mass spectrometry instruments and can be used with labeled or label-free approaches and with data-independent or data-dependent acquisition.


Subjects
Peptides; Tandem Mass Spectrometry; Amino Acid Sequence; Bias; Machine Learning; Peptides/analysis; Tandem Mass Spectrometry/methods
10.
Nat Methods ; 19(6): 675-678, 2022 06.
Article in English | MEDLINE | ID: mdl-35637305

ABSTRACT

Computational methods that aim to exploit publicly available mass spectrometry repositories rely primarily on unsupervised clustering of spectra. Here we trained a deep neural network in a supervised fashion on the basis of previous assignments of peptides to spectra. The network, called 'GLEAMS', learns to embed spectra in a low-dimensional space in which spectra generated by the same peptide are close to one another. We applied GLEAMS for large-scale spectrum clustering, detecting groups of unidentified, proximal spectra representing the same peptide. We used these clusters to explore the dark proteome of repeatedly observed yet consistently unidentified mass spectra.


Subjects
Peptides; Tandem Mass Spectrometry; Algorithms; Cluster Analysis; Neural Networks, Computer; Peptides/chemistry; Proteome/analysis; Tandem Mass Spectrometry/methods
11.
Bioinformatics ; 37(Suppl_1): i434-i442, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252924

ABSTRACT

MOTIVATION: Tandem mass spectrometry data acquired using data independent acquisition (DIA) is challenging to interpret because the data exhibits complex structure along both the mass-to-charge (m/z) and time axes. The most common approach to analyzing this type of data makes use of a library of previously observed DIA data patterns (a 'spectral library'), but this approach is expensive because the libraries do not typically generalize well across laboratories. RESULTS: Here, we propose DIAmeter, a search engine that detects peptides in DIA data using only a peptide sequence database. Although some existing library-free DIA analysis methods (i) support data generated using both wide and narrow isolation windows, (ii) detect peptides containing post-translational modifications, (iii) analyze data from a variety of instrument platforms and (iv) are capable of detecting peptides even in the absence of detectable signal in the survey (MS1) scan, DIAmeter is the only method that offers all four capabilities in a single tool. AVAILABILITY AND IMPLEMENTATION: The open source, Apache licensed source code is available as part of the Crux mass spectrometry analysis toolkit (http://crux.ms). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Peptides; Tandem Mass Spectrometry; Protein Processing, Post-Translational; Software
12.
Nucleic Acids Res ; 48(5): 2303-2311, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32034421

ABSTRACT

Chromatin conformation assays such as Hi-C cannot directly measure differences in 3D architecture between cell types or cell states. For this purpose, two or more Hi-C experiments must be carried out, but direct comparison of the resulting Hi-C matrices is confounded by several features of Hi-C data. Most notably, the genomic distance effect, whereby contacts between pairs of genomic loci that are proximal along the chromosome exhibit many more Hi-C contacts than distal pairs of loci, dominates every Hi-C matrix. Furthermore, the form that this distance effect takes often varies between different Hi-C experiments, even between replicate experiments. Thus, a statistical confidence measure designed to identify differential Hi-C contacts must accurately account for the genomic distance effect or risk being misled by large-scale but artifactual differences. ACCOST (Altered Chromatin COnformation STatistics) accomplishes this goal by extending the statistical model employed by DEseq, re-purposing the 'size factors,' which were originally developed to account for differences in read depth between samples, to instead model the genomic distance effect. We show via analysis of simulated and real data that ACCOST provides unbiased statistical confidence estimates that compare favorably with competing methods such as diffHiC, FIND and HiCcompare. ACCOST is freely available with an Apache license at https://bitbucket.org/noblelab/accost.


Subjects
Chromatin/chemistry; DNA/chemistry; Genetic Loci; Genome; Software; Animals; Cell Line; Chromatin/metabolism; DNA/metabolism; Epistasis, Genetic; Epithelial Cells/cytology; Epithelial Cells/metabolism; Humans; Lymphocytes/cytology; Lymphocytes/metabolism; Mice; Molecular Conformation; Plasmodium falciparum/genetics; Sporozoites/genetics; Trophozoites/genetics
13.
J Proteome Res ; 18(10): 3792-3799, 2019 10 04.
Article in English | MEDLINE | ID: mdl-31448616

ABSTRACT

Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. On the basis of these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
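
The idea of feature hashing a high-resolution spectrum into a short vector can be sketched in Python as follows. This is an illustration only, not ANN-SoLo's code: the function name `hash_spectrum`, the bin width, the output dimension, and the use of CRC32 as the hash function are all assumptions.

```python
import math
import zlib
from typing import List, Tuple


def hash_spectrum(peaks: List[Tuple[float, float]],
                  bin_width: float = 0.05, dim: int = 800) -> List[float]:
    """Map (m/z, intensity) peaks to a unit-length vector with `dim` entries."""
    vector = [0.0] * dim
    for mz, intensity in peaks:
        # Fine-grained bin index preserves the high-resolution m/z information.
        bin_index = int(mz / bin_width)
        # Hash the bin index to a slot; distinct bins may collide in one slot.
        slot = zlib.crc32(str(bin_index).encode("ascii")) % dim
        vector[slot] += intensity

    # Normalize so that a dot product between two vectors acts like a cosine.
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else vector
```

Because the hash is deterministic, identical peaks always land in the same slot, so similar spectra yield similar vectors that a nearest-neighbor index can compare cheaply.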


Subjects
Peptide Library; Proteomics/methods; Software; Databases, Protein; Humans; Peptides/analysis; Protein Processing, Post-Translational
14.
J Proteome Res ; 18(9): 3353-3359, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31407580

ABSTRACT

The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.


Subjects
Peptides/genetics; Proteomics/methods; Software; Algorithms; Databases, Protein; Machine Learning; Peptides/classification; Peptides/isolation & purification; Tandem Mass Spectrometry/methods
15.
J Proteome Res ; 18(1): 86-94, 2019 01 04.
Article in English | MEDLINE | ID: mdl-30362768

ABSTRACT

In data independent acquisition (DIA) mass spectrometry, precursor scans are interleaved with wide-window fragmentation scans, resulting in complex fragmentation spectra containing multiple coeluting peptide species. In this setting, detecting the isotope distribution profiles of intact peptides in the precursor scans can be a critical initial step in accurate peptide detection and quantification. This peak detection step is particularly challenging when the isotope peaks associated with two different peptide species overlap, or interfere, with one another. We propose a regression model, called Siren, to detect isotopic peaks in precursor DIA data that can explicitly account for interference. We validate Siren's peak-calling performance on a variety of data sets by counting how many of the peaks Siren identifies are associated with confidently detected peptides. In particular, we demonstrate that substituting the Siren regression model in place of the existing peak-calling step in DIA-Umpire leads to improved overall rates of peptide detection.


Subjects
Mass Spectrometry/methods; Peptides/analysis; Proteomics/methods; Algorithms; Data Analysis; Isotopes/analysis; Regression Analysis
16.
J Proteome Res ; 17(11): 3644-3656, 2018 11 02.
Article in English | MEDLINE | ID: mdl-30221945

ABSTRACT

To achieve accurate assignment of peptide sequences to observed fragmentation spectra, a shotgun proteomics database search tool must make good use of the very high-resolution information produced by state-of-the-art mass spectrometers. However, making use of this information while also ensuring that the search engine's scores are well calibrated, that is, that the score assigned to one spectrum can be meaningfully compared to the score assigned to a different spectrum, has proven to be challenging. Here we describe a database search score function, the "residue evidence" (res-ev) score, that achieves both of these goals simultaneously. We also demonstrate how to combine calibrated res-ev scores with calibrated XCorr scores to produce a "combined p-value" score function. We provide a benchmark consisting of four mass spectrometry data sets, which we use to compare the combined p-value to the score functions used by several existing search engines. Our results suggest that the combined p-value achieves state-of-the-art performance, generally outperforming MS Amanda and Morpheus and performing comparably to MS-GF+. The res-ev and combined p-value score functions are freely available as part of the Tide search engine in the Crux mass spectrometry toolkit (http://crux.ms).


Subjects
Algorithms; Escherichia coli Proteins/chemistry; Peptide Mapping/statistics & numerical data; Peptides/chemistry; Protozoan Proteins/chemistry; Tandem Mass Spectrometry/statistics & numerical data; Adrenal Glands/chemistry; Amino Acid Sequence; Aquatic Organisms/chemistry; Benchmarking; Calibration; Complex Mixtures/chemistry; Databases, Protein; Datasets as Topic; Escherichia coli Proteins/classification; Escherichia coli Proteins/isolation & purification; Humans; Peptide Mapping/methods; Peptides/classification; Peptides/isolation & purification; Plasmodium falciparum/chemistry; Proteolysis; Proteomics/methods; Protozoan Proteins/classification; Protozoan Proteins/isolation & purification; Software; Tandem Mass Spectrometry/methods
17.
Nat Genet ; 50(10): 1388-1398, 2018 10.
Article in English | MEDLINE | ID: mdl-30202056

ABSTRACT

Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Furthermore, we observe widespread structural variation events affecting the functions of noncoding sequences, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel three-dimensional chromatin structural domains. Our results indicate that noncoding SVs may be underappreciated mutational drivers in cancer genomes.


Subjects
Genome, Human; Genomic Structural Variation; Neoplasms/genetics; Systems Biology/methods; A549 Cells; Cell Line, Tumor; Chromosome Mapping; DNA, Neoplasm/analysis; DNA, Neoplasm/genetics; Genes, Neoplasm; Genetic Variation; High-Throughput Nucleotide Sequencing/methods; Humans; K562 Cells; Linkage Disequilibrium; Sequence Analysis, DNA/methods; Systems Integration
18.
J Proteome Res ; 17(10): 3463-3474, 2018 10 05.
Article in English | MEDLINE | ID: mdl-30184435

ABSTRACT

Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. We present the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate as well as a shifted dot product score to sensitively match modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared with SpectraST. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
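
A shifted dot product can be illustrated with the simplified Python sketch below. It is not ANN-SoLo's scoring code. The function name `shifted_dot`, the greedy one-to-one peak assignment, the fixed tolerance, and the assumption that fragments are singly charged (so the precursor mass difference is applied directly on the m/z axis) are all simplifications made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def shifted_dot(query: List[Peak], library: List[Peak],
                mass_shift: float, tol: float = 0.02) -> float:
    """Score a query spectrum against a library spectrum, allowing shifted peaks."""
    # A library peak may match a query peak at the same m/z, or at the m/z
    # displaced by the precursor mass difference (a fragment carrying the
    # modification).
    candidates = []
    for qi, (q_mz, q_int) in enumerate(query):
        for li, (l_mz, l_int) in enumerate(library):
            direct = abs(q_mz - l_mz) <= tol
            shifted = abs(q_mz - (l_mz + mass_shift)) <= tol
            if direct or shifted:
                candidates.append((q_int * l_int, qi, li))

    # Greedy assignment: largest intensity products first, each peak used once.
    candidates.sort(reverse=True)
    used_query, used_library = set(), set()
    score = 0.0
    for product, qi, li in candidates:
        if qi not in used_query and li not in used_library:
            used_query.add(qi)
            used_library.add(li)
            score += product
    return score
```

With a zero shift the function reduces to an ordinary matched-peak dot product, so modified fragments that moved by the precursor mass difference contribute nothing. Supplying the shift lets them count toward the score.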


Subjects
Databases, Protein; Peptides/metabolism; Proteomics/methods; Search Engine/methods; Algorithms; Computational Biology/methods; HEK293 Cells; Humans; Peptide Library; Peptides/chemistry; Protein Processing, Post-Translational; Reproducibility of Results; Software; Tandem Mass Spectrometry; Time Factors
20.
J Proteome Res ; 14(8): 3148-61, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26152888

ABSTRACT

Interpreting the potentially vast number of hypotheses generated by a shotgun proteomics experiment requires a valid and accurate procedure for assigning statistical confidence estimates to identified tandem mass spectra. Despite the crucial role such procedures play in most high-throughput proteomics experiments, the scientific literature has not reached a consensus about the best confidence estimation methodology. In this work, we evaluate, using theoretical and empirical analysis, four previously proposed protocols for estimating the false discovery rate (FDR) associated with a set of identified tandem mass spectra: two variants of the target-decoy competition protocol (TDC) of Elias and Gygi and two variants of the separate target-decoy search protocol of Käll et al. Our analysis reveals significant biases in the two separate target-decoy search protocols. Moreover, the one TDC protocol that provides an unbiased FDR estimate among the target PSMs does so at the cost of forfeiting a random subset of high-scoring spectrum identifications. We therefore propose the mix-max procedure to provide unbiased, accurate FDR estimates in the presence of well-calibrated scores. The method avoids biases associated with the two separate target-decoy search protocols and also avoids the propensity for target-decoy competition to discard a random subset of high-scoring target identifications.


Subjects
Algorithms; Computational Biology; Peptides/metabolism; Proteomics/methods; Tandem Mass Spectrometry/methods; Amino Acid Sequence; Animals; Caenorhabditis elegans Proteins/metabolism; Computational Biology/methods; Databases, Protein; False Positive Reactions; Plasmodium falciparum/metabolism; Proteomics/standards; Protozoan Proteins/metabolism; Reproducibility of Results; Saccharomyces cerevisiae Proteins/metabolism; Tandem Mass Spectrometry/standards