
MetaNovo: An open-source pipeline for probabilistic peptide discovery in complex metaproteomic datasets.

Potgieter, Matthys G; Nel, Andrew J M; Fortuin, Suereta; Garnett, Shaun; Wendoh, Jerome M; Tabb, David L; Mulder, Nicola J; Blackburn, Jonathan M.

PLoS Comput Biol ; 19(6): e1011163, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37327214

ABSTRACT

BACKGROUND: Microbiome research is providing important new insights into the metabolic interactions of complex microbial ecosystems involved in fields as diverse as the pathogenesis of human diseases, agriculture and climate change. Poor correlations typically observed between RNA and protein expression datasets make it hard to accurately infer microbial protein synthesis from metagenomic data. Additionally, mass spectrometry-based metaproteomic analyses typically rely on focused search sequence databases based on prior knowledge for protein identification that may not represent all the proteins present in a set of samples. Metagenomic 16S rRNA sequencing only targets the bacterial component, while whole genome sequencing is at best an indirect measure of expressed proteomes. Here we describe a novel approach, MetaNovo, that combines existing open-source software tools to perform scalable de novo sequence tag matching with a novel algorithm for probabilistic optimization of the entire UniProt knowledgebase to create tailored sequence databases for target-decoy searches directly at the proteome level, enabling metaproteomic analyses without prior expectation of sample composition or metagenomic data generation and compatible with standard downstream analysis pipelines.

RESULTS: We compared MetaNovo to published results from the MetaPro-IQ pipeline on 8 human mucosal-luminal interface samples, with comparable numbers of peptide and protein identifications, many shared peptide sequences and a similar bacterial taxonomic distribution compared to that found using a matched metagenome sequence database, but simultaneously identified many more non-bacterial peptides than the previous approaches. MetaNovo was also benchmarked on samples of known microbial composition against matched metagenomic and whole genomic sequence database workflows, yielding many more MS/MS identifications for the expected taxa, with improved taxonomic representation, while also highlighting previously described genome sequencing quality concerns for one of the organisms, and identifying an experimental sample contaminant without prior expectation.

CONCLUSIONS: By estimating taxonomic and peptide level information directly on microbiome samples from tandem mass spectrometry data, MetaNovo enables the simultaneous identification of peptides from all domains of life in metaproteome samples, bypassing the need for curated sequence databases to search. We show that the MetaNovo approach to mass spectrometry metaproteomics is more accurate than current gold standard approaches of tailored or matched genomic sequence database searches, can identify sample contaminants without prior expectation and yields insights into previously unidentified metaproteomic signals, building on the potential for complex mass spectrometry metaproteomic data to speak for itself.
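
The abstract above refers to target-decoy searches without spelling out the idea. As a rough illustration only, and not MetaNovo's actual code, the sketch below shows the general concept: decoy sequences are generated (here by simple reversal, one common convention), spectra are searched against targets and decoys together, and the false discovery rate above a score threshold is estimated as the ratio of decoy to target matches. The function names, the `(score, is_decoy)` tuple layout and the example scores are all assumptions made for this example.

```python
def make_decoy(protein_seq):
    """Build a decoy by reversing the target sequence (one common convention)."""
    return protein_seq[::-1]


def fdr_at_threshold(psms, threshold):
    """Estimate FDR as decoys / targets among matches scoring >= threshold.

    `psms` is a list of (score, is_decoy) tuples; this layout is hypothetical.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Made-up peptide-spectrum match scores, for illustration only.
example_psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(make_decoy("MKVLA"))                # reversed sequence used as a decoy
print(fdr_at_threshold(example_psms, 6))  # 1 decoy / 4 targets
```

Real pipelines typically apply further refinements (for example, keeping cleavage sites fixed when reversing, or converting the estimate to q-values), none of which are shown here.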

Subject(s)

Microbiota; Tandem Mass Spectrometry; Humans; RNA, Ribosomal, 16S/genetics; Databases, Protein; Peptides/genetics; Peptides/analysis; Microbiota/genetics; Bacteria/genetics; Proteome/genetics

Proteogenomic Analysis of Mycobacterium smegmatis Using High Resolution Mass Spectrometry.

Potgieter, Matthys G; Nakedi, Kehilwe C; Ambler, Jon M; Nel, Andrew J M; Garnett, Shaun; Soares, Nelson C; Mulder, Nicola; Blackburn, Jonathan M.

Front Microbiol ; 7: 427, 2016.

Article in English | MEDLINE | ID: mdl-27092112

ABSTRACT

Biochemical evidence is vital for accurate genome annotation. The integration of experimental data collected at the proteome level using high resolution mass spectrometry allows for improvements in genome annotation by providing evidence for novel gene models, while validating or modifying others. Here, we report the results of a proteogenomic analysis of a reference strain of Mycobacterium smegmatis (mc²155), a fast-growing model organism for the pathogenic Mycobacterium tuberculosis, the causative agent of tuberculosis. By integrating high-throughput LC/MS/MS proteomic data with genomic six frame translation and ab initio gene prediction databases, a total of 2887 ORFs were identified, including 2810 ORFs annotated to a Reference protein, and 63 ORFs not previously annotated to a Reference protein. Further, the translational start site (TSS) was validated for 558 Reference proteome gene models, while upstream translational evidence was identified for 81. In addition, N-terminus derived peptide identifications allowed for downstream TSS modification of a further 24 gene models. We validated the existence of six previously described interrupted coding sequences at the peptide level, and provide evidence for four novel frameshift positions. Analysis of peptide posterior error probability (PEP) scores indicates high-confidence novel peptide identifications and shows that the genome of M. smegmatis mc²155 is not yet fully annotated. Data are available via ProteomeXchange with identifier PXD003500.
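
For readers unfamiliar with the six frame translation mentioned in the abstract, the following is a minimal sketch of the general technique, not the authors' pipeline. It translates a DNA sequence in all three reading frames of both strands using the standard genetic code. The function names and the toy input sequence are assumptions for illustration; the bacterial translation table differs from the standard one mainly in its start codons, which does not affect this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Toy sequence, chosen only to make the frames easy to check by hand.
for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

In a proteogenomic workflow, the translated frames would usually be split at stop codons (`*`) into candidate ORFs and written to a FASTA file for the search engine; that step is omitted here.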
