JUMPg: An Integrative Proteogenomics Pipeline Identifying Unannotated Proteins in Human Brain and Cancer Cells.

Li, Yuxin; Wang, Xusheng; Cho, Ji-Hoon; Shaw, Timothy I; Wu, Zhiping; Bai, Bing; Wang, Hong; Zhou, Suiping; Beach, Thomas G; Wu, Gang; Zhang, Jinghui; Peng, Junmin

Affiliation

Wang H; Integrated Biomedical Sciences Program, University of Tennessee Health Science Center, 920 Madison Avenue, Memphis, Tennessee 38163, United States.
Beach TG; Banner Sun Health Research Institute, Sun City, Arizona 85351, United States.

J Proteome Res; 15(7): 2309-20, 2016 07 01.

Article in English | MEDLINE | ID: mdl-27225868

ABSTRACT

Proteogenomics is an emerging approach to improve gene annotation and interpretation of proteomics data. Here we present JUMPg, an integrative proteogenomics pipeline that includes customized database construction, tag-based database search, peptide-spectrum match filtering, and data visualization. JUMPg creates multiple databases of DNA polymorphisms, mutations, splice junctions, and partially tryptic peptides, as well as protein fragments translated in all six frames from the whole transcriptome after RNA-seq de novo assembly. We use a multistage strategy to search these databases sequentially; performance is optimized by re-searching only unmatched high-quality spectra and by reusing amino acid tags generated by the JUMP search engine. The identified peptides/proteins are displayed with gene loci using the UCSC genome browser. We then applied JUMPg to a label-free mass spectrometry data set of Alzheimer's disease postmortem brain, uncovering 496 new peptides arising from amino acid substitutions, alternative splicing, frameshifts, and "non-coding gene" translation. The novel protein PNMA6BL, which is specifically expressed in the brain, is highlighted. We also tested JUMPg on a stable-isotope-labeled data set of multiple myeloma cells, revealing 991 sample-specific peptides that include protein sequences in the immunoglobulin light chain variable region. Thus, the JUMPg program is an effective proteogenomics tool for multiomics data integration.
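The multistage strategy described in the abstract can be illustrated with a minimal Python sketch. This is not JUMPg's implementation: the `Spectrum` class, the `quality` score, the `min_quality` threshold, and the substring-based `search_stage` matcher are all hypothetical stand-ins chosen for illustration. The sketch shows only the control flow: databases are searched in order, tags are derived once per spectrum and reused at every stage, and only unmatched spectra that pass a quality cutoff are carried into the next stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical stand-in for an MS/MS spectrum."""
    scan_id: str
    quality: float    # assumed quality score in [0, 1]; not JUMPg's metric
    tags: frozenset   # assumed amino acid tags, computed once and reused


def search_stage(spectrum, database):
    """Toy matcher: return the first peptide containing any tag, else None."""
    for peptide in database:
        if any(tag in peptide for tag in spectrum.tags):
            return peptide
    return None


def multistage_search(spectra, stages, min_quality=0.5):
    """Search databases sequentially, re-searching only unmatched
    high-quality spectra at each subsequent stage.

    stages: ordered list of (stage_name, list_of_peptides).
    Returns (matches, leftover), where matches maps scan_id to
    (stage_name, peptide) and leftover lists spectra never matched.
    """
    matches = {}
    remaining = list(spectra)
    for stage_name, database in stages:
        unmatched = []
        for spectrum in remaining:
            peptide = search_stage(spectrum, database)
            if peptide is not None:
                matches[spectrum.scan_id] = (stage_name, peptide)
            else:
                unmatched.append(spectrum)
        # Low-quality unmatched spectra are dropped before the next stage.
        remaining = [s for s in unmatched if s.quality >= min_quality]
    return matches, remaining
```

Under these assumptions, a spectrum matched against an early database (for example a reference proteome) is never searched again, so the larger and noisier customized databases are queried with a smaller set of spectra.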

Subjects

Brain Chemistry; Neoplasm Proteins/analysis; Proteins/analysis; Proteogenomics/methods; Workflow; Alzheimer Disease/pathology; Data Mining; Humans; Multiple Myeloma/pathology; Neoplasms/chemistry; Peptides/analysis; Search Engine; Software

Keywords

RNA-seq; database search; genomics; mass spectrometry; multistage analysis; proteogenomics; proteomics; spectrum quality control

Full text: 1 | Database: MEDLINE | Main subject: Brain Chemistry / Proteins / Workflow / Proteogenomics / Neoplasm Proteins | Language: En | Publication year: 2016 | Document type: Article
