A Sectioning and Database Enrichment Approach for Improved Peptide Spectrum Matching in Large, Genome-Guided Protein Sequence Databases.

Kumar, Praveen; Johnson, James E; Easterly, Caleb; Mehta, Subina; Sajulga, Ray; Nunn, Brook; Jagtap, Pratik D; Griffin, Timothy J

Affiliation

Kumar P; Bioinformatics and Computational Biology, University of Minnesota-Rochester, Rochester, Minnesota 55904, United States.
Johnson JE; Biochemistry Molecular Biology and Biophysics, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
Easterly C; Minnesota Supercomputing Institute, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
Mehta S; Biochemistry Molecular Biology and Biophysics, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
Sajulga R; Biochemistry Molecular Biology and Biophysics, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
Nunn B; Biochemistry Molecular Biology and Biophysics, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.
Jagtap PD; Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States.
Griffin TJ; Biochemistry Molecular Biology and Biophysics, University of Minnesota-Twin Cities, Minneapolis, Minnesota 55455, United States.

J Proteome Res; 19(7): 2772-2785, 2020 Jul 02.

Article in English | MEDLINE | ID: mdl-32396365

ABSTRACT

Multiomics approaches focused on mass spectrometry (MS)-based data, such as metaproteomics, utilize genomic and/or transcriptomic sequencing data to generate a comprehensive protein sequence database. These databases can be very large, containing millions of sequences, which reduces the sensitivity of matching tandem mass spectrometry (MS/MS) data to sequences to generate peptide spectrum matches (PSMs). Here, we describe and evaluate a sectioning method for generating a database enriched for those protein sequences that are most likely present in the sample. Our evaluation demonstrates how this method helps to increase the sensitivity of PSMs while maintaining acceptable false discovery rate statistics, offering a flexible alternative to traditional large database searching, as well as to previously described two-step database searching methods for large sequence database applications. Furthermore, implementation in the Galaxy platform provides access to an automated and customizable workflow for carrying out the method. Additionally, the results of this study provide valuable insights into the advantages and limitations offered by available methods aimed at addressing challenges of genome-guided, large database applications in proteomics. Relevant raw data have been made available at https://zenodo.org/ using data set identifier "3754789" and https://arcticdata.io/catalog using data set identifier "A2VX06340".
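
The abstract does not spell out the sectioning procedure, so the following Python sketch is only one plausible reading of it: the large database is split into sections, each section is searched on its own, and proteins that receive at least one match are pooled into an enriched database. All names here (`split_into_sections`, `build_enriched_database`, `toy_search`) are hypothetical, and the substring-based `toy_search` is a stand-in for a real MS/MS search engine; the final search against the enriched database and the false discovery rate estimation are not shown.

```python
from typing import Callable, Dict, List, Set

ProteinDB = Dict[str, str]  # protein id -> amino acid sequence


def split_into_sections(proteins: ProteinDB, n_sections: int) -> List[ProteinDB]:
    """Partition the database into up to n_sections non-empty sections (round-robin)."""
    if n_sections < 1:
        raise ValueError("n_sections must be >= 1")
    sections: List[ProteinDB] = [{} for _ in range(n_sections)]
    for i, (pid, seq) in enumerate(proteins.items()):
        sections[i % n_sections][pid] = seq
    return [s for s in sections if s]


def build_enriched_database(
    proteins: ProteinDB,
    n_sections: int,
    search_section: Callable[[ProteinDB], Set[str]],
) -> ProteinDB:
    """Search each section separately and pool the proteins that were matched.

    search_section is an assumed callback that returns the ids of proteins
    in the given section with at least one accepted match.
    """
    enriched: ProteinDB = {}
    for section in split_into_sections(proteins, n_sections):
        for pid in search_section(section):
            if pid in section:  # ignore ids the section does not contain
                enriched[pid] = section[pid]
    return enriched


def toy_search(observed_peptides: Set[str]) -> Callable[[ProteinDB], Set[str]]:
    """Toy stand-in for a search engine: a protein is 'matched' if it contains a peptide."""
    def search(section: ProteinDB) -> Set[str]:
        return {
            pid
            for pid, seq in section.items()
            if any(pep in seq for pep in observed_peptides)
        }
    return search


if __name__ == "__main__":
    db = {"P1": "MKTAYIAK", "P2": "GGGGSLLK", "P3": "AAAPEPTIDEK", "P4": "QQQQ"}
    enriched = build_enriched_database(db, 2, toy_search({"PEPTIDEK", "TAYIAK"}))
    print(sorted(enriched))
```

In this toy run only the proteins containing an observed peptide survive into the enriched database, which is then much smaller than the original; in practice the section size and the acceptance criterion would be tuning choices.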

Subjects

Proteomics; Tandem Mass Spectrometry; Databases, Protein; Genomics; Peptides/genetics; Software

Keywords

Galaxy; false discovery rate; metaproteomics; peptide spectrum match; proteogenomics; tandem mass spectrometry

Full text: 1 Collections: 01-international Themes: General Database: MEDLINE Main subject: Proteomics / Tandem Mass Spectrometry Language: En Journal: J Proteome Res Journal subject: BIOCHEMISTRY Year of publication: 2020 Document type: Article Country of affiliation: United States