Search | Secretaria de Estado da Saúde

Annotation of tandem mass spectrometry data using stochastic neural networks in shotgun proteomics.

Sulimov, Pavel; Voronkova, Anastasia; Kertész-Farkas, Attila.

Bioinformatics ; 36(12): 3781-3787, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32207518

ABSTRACT

MOTIVATION: The ability of score functions to separate correct from incorrect peptide-spectrum matches in database-search-based spectrum identification is hindered by many superfluous peaks belonging to unexpected fragmentation ions or by missing peaks of anticipated fragmentation ions. RESULTS: Here, we present a new method, called BoltzMatch, that learns score functions using a particular type of stochastic neural network, the restricted Boltzmann machine, in order to enhance their discrimination ability. BoltzMatch learns chemically explainable patterns among peak pairs in the spectrum data, and it can augment peaks depending on their semantic context or even reconstruct missing peaks of expected ions during its internal scoring mechanism. As a result, BoltzMatch achieved 50% and 33% more annotations on high- and low-resolution MS2 data, respectively, than XCorr at a 0.1% false discovery rate (FDR) in our benchmark; conversely, XCorr yielded the same number of spectrum annotations as BoltzMatch, albeit with 4-6 times more errors. In addition, BoltzMatch alone yields 14% more annotations than Prosit (which runs with Percolator), and BoltzMatch with Percolator yields 32% more annotations than Prosit at the 0.1% FDR level in our benchmark. AVAILABILITY AND IMPLEMENTATION: BoltzMatch is freely available at: https://github.com/kfattila/BoltzMatch. CONTACT: akerteszfarkas@hse.ru. SUPPORTING INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Proteomics; Tandem Mass Spectrometry; Algorithms; Neural Networks, Computer; Software
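
The BoltzMatch abstract above mentions that a restricted Boltzmann machine can reconstruct missing peaks from their context. The following is only a toy sketch of that general idea, not BoltzMatch's actual implementation: it assumes a spectrum reduced to a short binary vector of peak bins, and the weights `W` and biases `b`, `c` are made-up values chosen so that one hidden unit couples two bins. The function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used for RBM unit activation probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for each hidden unit; W[i][j] links visible i to hidden j."""
    return [
        sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    ]


def reconstruct(v, W, b, c):
    """One mean-field up-down pass: visible -> hidden -> visible probabilities."""
    h = hidden_probs(v, W, c)
    return [
        sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(b))
    ]


# Toy parameters (assumed, not learned): one hidden unit ties bin 0 to bin 1.
W = [[4.0], [4.0]]
b = [-2.0, -2.0]
c = [-2.0]

# Bin 0 is observed, bin 1 is missing; the reconstruction raises bin 1.
with_context = reconstruct([1, 0], W, b, c)
without_context = reconstruct([0, 0], W, b, c)
```

With these toy values, the reconstructed probability of the missing bin is higher when its partner peak is present than when the spectrum is empty, which is the kind of context-dependent behavior the abstract describes.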

Tailor: A Nonparametric and Rapid Score Calibration Method for Database Search-Based Peptide Identification in Shotgun Proteomics.

Sulimov, Pavel; Kertész-Farkas, Attila.

J Proteome Res ; 19(4): 1481-1490, 2020 04 03.

Article in English | MEDLINE | ID: mdl-32175744

ABSTRACT

Peptide-spectrum-match (PSM) scores used in database searching are calibrated to spectrum- or spectrum-peptide-specific null distributions. Some calibration methods rely on specific assumptions and use analytical models (e.g., binomial distributions), whereas other methods utilize exact empirical null distributions. The former may be inaccurate because of unjustified assumptions, while the latter are accurate, albeit computationally exhaustive. Here, we introduce a novel, nonparametric, heuristic PSM score calibration method, called Tailor, which calibrates PSM scores by dividing them by the top 100-quantile of the empirical, spectrum-specific null distributions (i.e., the score with an associated p-value of 0.01 at the tail, hence the name) observed during database searching. Tailor does not require any optimization steps or long calculations, and it does not rely on any assumptions about the form of the score distribution (e.g., whether it is binomial); however, it does rely on our empirical observation that the mean and the variance of the null distributions are correlated. In our benchmark, we re-calibrated the XCorr match scores from Crux, the HyperScore scores from X!Tandem, and the p-values from OMSSA with the Tailor method and obtained more spectrum annotations than with raw scores at any false discovery rate level. Moreover, Tailor provided slightly more annotations than the E-values of X!Tandem and OMSSA, and it approached the performance of the computationally exhaustive exact p-value method for XCorr on spectrum data sets containing low-resolution fragmentation information (MS2) while running around 20-150 times faster. On high-resolution MS2 data sets, the Tailor method with XCorr achieved state-of-the-art performance and produced more annotations than the well-calibrated residue-evidence (Res-ev) score while running around 50-80 times faster.

Subjects

Algorithms; Proteomics; Calibration; Databases, Protein; Peptides
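
The calibration rule stated in the Tailor abstract (divide each score by the score found at the top 1% tail of the spectrum-specific empirical null) can be sketched as follows. This is a minimal illustration rather than the published implementation: it assumes that all candidate-peptide scores collected for one spectrum serve as that spectrum's empirical null, and the function name and the `quantile_denominator` parameter are hypothetical.

```python
def tailor_calibrate(candidate_scores, quantile_denominator=100):
    """Divide every raw score of one spectrum by the score located at the
    top 1/quantile_denominator of that spectrum's candidate scores."""
    if not candidate_scores:
        raise ValueError("need at least one candidate score")
    ranked = sorted(candidate_scores, reverse=True)
    # Integer ceiling of n / quantile_denominator, converted to a 0-based
    # index; with few candidates this falls back to the best score.
    tail_count = -(-len(ranked) // quantile_denominator)
    reference = ranked[max(tail_count - 1, 0)]
    if reference <= 0:
        raise ValueError("reference quantile score must be positive")
    return [score / reference for score in candidate_scores]


# 200 hypothetical raw scores for one spectrum: 1.0, 2.0, ..., 200.0
raw_scores = [float(i) for i in range(1, 201)]
calibrated = tailor_calibrate(raw_scores)
```

Because the reference score is taken separately for each spectrum, calibrated scores from different spectra become more comparable without fitting any distribution.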

Bias in False Discovery Rate Estimation in Mass-Spectrometry-Based Peptide Identification.

Danilova, Yulia; Voronkova, Anastasia; Sulimov, Pavel; Kertész-Farkas, Attila.

J Proteome Res ; 18(5): 2354-2358, 2019 05 03.

Article in English | MEDLINE | ID: mdl-30983355

ABSTRACT

Accurate target-decoy-based false discovery rate (FDR) control of peptide identification from tandem mass-spectrometry data relies on an important but often neglected assumption that incorrect spectrum annotations are equally likely to receive either target or decoy peptides. Here we argue that this assumption is often violated in practice, even by popular methods. Preference can be given to target peptides by biased scoring functions, which result in liberal FDR estimations, or to decoy peptides by correlated spectra, which result in conservative estimations.

Subjects

Artifacts; Peptides/isolation & purification; Proteomics/standards; Tandem Mass Spectrometry/standards; Amino Acid Sequence; Bias; Humans; Plasmodium falciparum/chemistry; Proteomics/methods; Tandem Mass Spectrometry/methods
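
For context on the assumption discussed in the abstract above, a common form of the target-decoy FDR estimate is the number of decoy matches divided by the number of target matches above a score threshold. The sketch below shows that simple estimator only; it is not the analysis performed in the paper, and the function name and the `(score, is_decoy)` input layout are assumptions made for illustration.

```python
def target_decoy_fdr(psms, threshold):
    """Estimate the FDR among target PSMs scoring at or above `threshold`.

    psms: iterable of (score, is_decoy) pairs, one per spectrum.
    The estimate decoys / targets is only valid if an incorrect match is
    equally likely to hit a target or a decoy peptide.
    """
    accepted = [(score, is_decoy) for score, is_decoy in psms if score >= threshold]
    decoys = sum(1 for _, is_decoy in accepted if is_decoy)
    targets = len(accepted) - decoys
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


# Hypothetical search results: (score, is_decoy)
example_psms = [
    (10.0, False), (9.0, False), (8.0, True),
    (7.0, False), (6.0, True), (5.0, False),
]
fdr_at_7 = target_decoy_fdr(example_psms, 7.0)
```

If scoring favors targets, decoys under-count the incorrect target matches and the estimate becomes liberal; if decoys are favored, it becomes conservative, which is the bias the abstract describes.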

Dual network embedding for representing research interests in the link prediction problem on co-authorship networks.

Makarov, Ilya; Gerasimova, Olga; Sulimov, Pavel; Zhukov, Leonid E.

PeerJ Comput Sci ; 5: e172, 2019.

Article in English | MEDLINE | ID: mdl-33816825

ABSTRACT

We present a study on co-authorship network representation based on network embedding, combined with additional information from topic modeling of research papers and a new edge embedding operator. We use the link prediction (LP) model to construct a recommender system for finding collaborators with similar research interests. After extracting topics for each paper, we construct a keyword co-occurrence network and use its embedding to further generalize author attributes. Standard graph feature engineering and network embedding methods were combined to construct a co-author recommender system, formulated as an LP problem and as prediction of future graph structure. We evaluate our approach on a dataset containing temporal information on National Research University Higher School of Economics research articles indexed in the Russian Science Citation Index and Scopus over 25 years. Our model of network representation shows better performance for the stated binary classification tasks on several co-authorship networks.
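
The abstract above does not define its new edge embedding operator, so it is not reproduced here. As background only, the sketch below shows four widely used operators that turn two node embeddings into one edge feature vector for link prediction (average, Hadamard product, weighted L1, weighted L2). The function and dictionary names are hypothetical.

```python
def _same_length(u, v):
    """Guard against embeddings of different dimensionality."""
    if len(u) != len(v):
        raise ValueError("node embeddings must have the same dimension")


# Standard element-wise operators for combining two node embeddings.
EDGE_OPERATORS = {
    "average": lambda a, b: (a + b) / 2.0,
    "hadamard": lambda a, b: a * b,
    "weighted_l1": lambda a, b: abs(a - b),
    "weighted_l2": lambda a, b: (a - b) ** 2,
}


def edge_features(u, v, operator="hadamard"):
    """Build an edge feature vector from the embeddings of its two endpoints."""
    _same_length(u, v)
    combine = EDGE_OPERATORS[operator]
    return [combine(a, b) for a, b in zip(u, v)]


# Hypothetical 2-dimensional embeddings of two authors
author_a = [1.0, 2.0]
author_b = [3.0, 4.0]
features = edge_features(author_a, author_b, "hadamard")
```

The resulting vector for each candidate author pair would then be passed to a binary classifier that predicts whether a co-authorship link exists.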
