Generalized method for probability-based peptide and protein identification from tandem mass spectrometry data and sequence database searching.

Ramos-Fernández, Antonio; Paradela, Alberto; Navajas, Rosana; Albar, Juan Pablo

Affiliation

Ramos-Fernández A; Proteomics Facility, Centro Nacional de Biotecnología, Consejo Superior de Investigaciones Científicas, 28049 Madrid, Spain.

Mol Cell Proteomics ; 7(9): 1748-54, 2008 Sep.

Article in English | MEDLINE | ID: mdl-18515861

ABSTRACT

Tandem mass spectrometry-based proteomics is currently in great need of computational methods that help eliminate likely false positives in peptide and protein identification. In the last few years, a number of new peptide identification programs have been described, but the scores or other significance measures reported by these programs cannot always be translated directly into an easy-to-interpret error rate measure such as the false discovery rate. In this work we used generalized lambda distributions to model frequency distributions of database search scores computed by MASCOT, X!TANDEM with the k-score plug-in, OMSSA, and InsPecT. From these distributions, we estimated p values and false discovery rates with high accuracy. From the set of peptide assignments reported by any of these engines, we also defined a generic protein scoring scheme that enabled accurate estimation of protein-level p values by simulating random score distributions; this scheme was also found to yield good estimates of protein-level false discovery rates. The performance of these methods was evaluated by searching four freely available data sets ranging from 40,000 to 285,000 MS/MS spectra.
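The abstract does not state which parameterization or fitting procedure was used. As a minimal sketch, assuming the Ramberg-Schmeiser parameterization of the generalized lambda distribution (GLD) and parameters already fitted to scores of random (incorrect) matches, a score's p value can be obtained by numerically inverting the GLD quantile function, and the resulting p values converted into false discovery rate estimates. The Benjamini-Hochberg procedure below is a standard stand-in, not necessarily the authors' estimator; all function names and parameter values are hypothetical.

```python
"""Hypothetical sketch: GLD-based p values and BH-style FDR for search scores."""
from typing import List, Sequence


def gld_quantile(u: float, l1: float, l2: float, l3: float, l4: float) -> float:
    # Assumed Ramberg-Schmeiser (RS) parameterization of the GLD:
    # Q(u) = l1 + (u**l3 - (1 - u)**l4) / l2
    return l1 + (u ** l3 - (1.0 - u) ** l4) / l2


def gld_cdf(x: float, l1: float, l2: float, l3: float, l4: float,
            tol: float = 1e-12, max_iter: int = 200) -> float:
    # The GLD has no closed-form CDF, so solve Q(u) = x for u by bisection.
    # Assumes l2, l3, l4 > 0, which makes Q strictly increasing on [0, 1].
    if x <= gld_quantile(0.0, l1, l2, l3, l4):
        return 0.0
    if x >= gld_quantile(1.0, l1, l2, l3, l4):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gld_quantile(mid, l1, l2, l3, l4) < x:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def p_value(score: float, l1: float, l2: float, l3: float, l4: float) -> float:
    # Upper-tail probability of a score at least this high, assuming the GLD
    # describes scores of random (incorrect) peptide matches.
    return 1.0 - gld_cdf(score, l1, l2, l3, l4)


def bh_qvalues(pvals: Sequence[float]) -> List[float]:
    # Benjamini-Hochberg adjusted p values (q values), returned in input order.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical GLD parameters (l1, l2, l3, l4), not fitted to real data.
    params = (0.0, 1.0, 0.5, 0.5)
    scores = [0.95, 0.6, 0.1, -0.4]
    pvals = [p_value(s, *params) for s in scores]
    for s, p, q in zip(scores, pvals, bh_qvalues(pvals)):
        print(f"score={s:+.2f}  p={p:.4f}  q={q:.4f}")
```

Bisection is used because it only needs the quantile function to be monotone, which holds for the assumed positive shape parameters; other parameter regions would need a different root finder or validity check.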

Subjects

Computational Biology/methods; Peptides/chemistry; Proteins/chemistry; Proteomics/methods; Sequence Analysis, Protein/methods; Animals; Computational Biology/statistics & numerical data; Databases, Protein/statistics & numerical data; Humans; Mice; Probability; Proteomics/statistics & numerical data; Sequence Analysis, Protein/statistics & numerical data; Tandem Mass Spectrometry

Collections: 01-internacional | Database: MEDLINE | Main subject: Peptides / Proteins / Computational Biology / Sequence Analysis, Protein / Proteomics | Study type: Diagnostic_studies / Prognostic_studies | Limits: Animals / Humans | Language: English | Publication year: 2008 | Document type: Article