Re-fraction: a machine learning approach for deterministic identification of protein homologues and splice variants in large-scale MS-based proteomics.

Yang, Pengyi; Humphrey, Sean J; Fazakerley, Daniel J; Prior, Matthew J; Yang, Guang; James, David E; Yang, Jean Yee-Hwa

Affiliation

Yang P; School of Information Technologies, University of Sydney, NSW 2006, Australia.

J Proteome Res; 11(5): 3035-45, 2012 May 04.

Article in En | MEDLINE | ID: mdl-22428558

ABSTRACT

A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
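The abstract does not specify which physical properties or which learning model Re-Fraction uses, so the following is only a minimal sketch of the general idea under loud assumptions: molecular weight is the sole property, fractions are numbered gel slices, and a least-squares line on log10(molecular weight) stands in for the machine learning step. All function names, the tolerance parameter, and the toy values are hypothetical and are not taken from the Re-Fraction source code.

```python
import math

def fit_log_mw_to_fraction(training):
    """Fit fraction ~ intercept + slope * log10(MW) by ordinary least squares.

    `training` is a list of (molecular_weight_kda, fraction_index) pairs taken
    from proteins that were identified unambiguously (an assumption here).
    """
    xs = [math.log10(mw) for mw, _ in training]
    ys = [frac for _, frac in training]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_fraction(model, mw_kda):
    """Expected fraction index for a protein of the given molecular weight."""
    intercept, slope = model
    return intercept + slope * math.log10(mw_kda)

def resolve_group(model, candidates, observed_fraction, tolerance=1.0):
    """Pick the one group member whose expected fraction matches the observed one.

    Returns the candidate name if exactly one member falls within `tolerance`
    fractions of `observed_fraction`; otherwise returns None, leaving the
    protein group unresolved.
    """
    matches = [
        name
        for name, mw in candidates.items()
        if abs(predict_fraction(model, mw) - observed_fraction) <= tolerance
    ]
    return matches[0] if len(matches) == 1 else None

# Toy, made-up training data: (molecular weight in kDa, fraction index).
toy_training = [(200, 1), (100, 2), (50, 3), (25, 4), (12.5, 5)]
model = fit_log_mw_to_fraction(toy_training)

# A hypothetical protein group of two splice variants, with shared peptides
# observed in fraction 4.
group = {"isoform_long": 100, "isoform_short": 25}
resolved = resolve_group(model, group, observed_fraction=4)
```

Returning None when zero or several candidates match keeps the assignment conservative: a group is resolved only when the fraction evidence singles out one member.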

Subjects

Artificial Intelligence; Mass Spectrometry/methods; Protein Isoforms/isolation & purification; Proteomics/methods; Software; Algorithms; Animals; Computational Biology/methods; Databases, Protein; Electrophoresis, Polyacrylamide Gel; Mice; Models, Molecular; Peptides/chemistry; Protein Isoforms/chemistry; Proteome/analysis; Proteome/chemistry; Reproducibility of Results; Sensitivity and Specificity; Sequence Homology, Amino Acid

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Mass Spectrometry / Software / Artificial Intelligence / Protein Isoforms / Proteomics | Type of study: Diagnostic_studies / Prognostic_studies | Limits: Animals | Language: En | Year of publication: 2012 | Document type: Article
