A comprehensive and scalable database search system for metaproteomics.

Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

Affiliation

Chatterjee S; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.
Stupp GS; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.
Park SK; Department of Chemical Physiology, The Scripps Research Institute, La Jolla, USA.
Ducom JC; High Performance Computing Technology Core, The Scripps Research Institute, La Jolla, USA.
Yates JR; Department of Chemical Physiology, The Scripps Research Institute, La Jolla, USA.
Su AI; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA. asu@scripps.edu.
Wolan DW; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA. asu@scripps.edu.

BMC Genomics; 17(1): 642, 2016 Aug 16.

Article in English | MEDLINE | ID: mdl-27528457

ABSTRACT

BACKGROUND: Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of the sequence databases they can search, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations.

RESULTS: To overcome this limitation in metaproteomics, we designed a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. In a proof-of-principle analysis of human HEK293 lysate, a search against a ComPIL database derived from high-quality genomic libraries detected nearly all of the same peptides as a search against a human database (which contains ~500x fewer peptides), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples.

CONCLUSIONS: The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples of unknown composition or to proteomic samples in which unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.
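
The idea of organizing sequence databases for fast querying can be illustrated with a minimal Python sketch. This is not the ComPIL schema or the Blazmass code: the abstract does not describe either, and the function names, the tryptic cleavage rule, the minimum peptide length, and the 10 ppm tolerance below are all assumptions chosen for illustration. The sketch digests proteins in silico, records which proteins each peptide came from, and keeps peptides sorted by mass so that candidates for a measured precursor mass can be found by binary search rather than a full scan.

```python
import bisect
import re

# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def tryptic_peptides(sequence, min_len=6):
    """Cleave after K or R unless followed by P (assumed digestion rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Return (masses, entries): parallel lists sorted by peptide mass.

    Each entry is (peptide, sorted list of parent protein IDs).
    """
    parents = {}
    for protein_id, sequence in proteins.items():
        for peptide in tryptic_peptides(sequence):
            parents.setdefault(peptide, set()).add(protein_id)
    ordered = sorted(parents, key=peptide_mass)
    masses = [peptide_mass(p) for p in ordered]
    entries = [(p, sorted(parents[p])) for p in ordered]
    return masses, entries


def lookup(index, mass, tol_ppm=10.0):
    """Return all entries whose mass lies within tol_ppm of `mass`."""
    masses, entries = index
    delta = mass * tol_ppm / 1e6
    lo = bisect.bisect_left(masses, mass - delta)
    hi = bisect.bisect_right(masses, mass + delta)
    return entries[lo:hi]


# Hypothetical toy proteins that share one tryptic peptide.
proteins = {"protA": "AAAAAAKGGGGGGR", "protB": "GGGGGGRCCCCCCK"}
index = build_index(proteins)
hits = lookup(index, peptide_mass("GGGGGGR"))
```

In this toy example `hits` contains the shared peptide together with both parent proteins. Keeping the peptide-to-protein mapping separate from the mass ordering is one plausible way to keep lookups cheap as the number of source proteomes grows; a document store such as MongoDB (listed in the keywords) could hold the same two mappings as indexed collections.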

Subject(s)

Databases, Protein; Proteome; Proteomics/methods; Search Engine; Bacterial Proteins; Gastrointestinal Microbiome; Host-Pathogen Interactions; Humans; Peptides; Reproducibility of Results

Keywords

Database; Metaproteomics; Microbiome; MongoDB; Proteomic search engine; Proteomics

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Proteome / Databases, Protein / Proteomics / Search Engine | Study type: Prognostic_studies | Limits: Humans | Language: En | Journal: BMC Genomics | Journal subject: GENETICS | Year: 2016 | Document type: Article | Affiliation country: United States
