
1.

Proteogenomic characterization of difficult-to-treat breast cancer with tumor cells enriched through laser microdissection.

Raj-Kumar, Praveen-Kumar; Lin, Xiaoying; Liu, Tao; Sturtz, Lori A; Gritsenko, Marina A; Petyuk, Vladislav A; Sagendorf, Tyler J; Deyarmin, Brenda; Liu, Jianfang; Praveen-Kumar, Anupama; Wang, Guisong; McDermott, Jason E; Shukla, Anil K; Moore, Ronald J; Monroe, Matthew E; Webb-Robertson, Bobbie-Jo M; Hooke, Jeffrey A; Fantacone-Campbell, Leigh; Mostoller, Brad; Kvecher, Leonid; Kane, Jennifer; Melley, Jennifer; Somiari, Stella; Soon-Shiong, Patrick; Smith, Richard D; Mural, Richard J; Rodland, Karin D; Shriver, Craig D; Kovatich, Albert J; Hu, Hai.

Breast Cancer Res ; 26(1): 76, 2024 May 14.

Article En | MEDLINE | ID: mdl-38745208

BACKGROUND: Breast cancer (BC) is the most commonly diagnosed cancer and the leading cause of cancer death among women globally. Despite advances, there is considerable variation in clinical outcomes for patients with non-luminal A tumors, classified as difficult-to-treat breast cancers (DTBC). This study aims to delineate the proteogenomic landscape of DTBC tumors compared to luminal A (LumA) tumors. METHODS: We retrospectively collected a total of 117 untreated primary breast tumor specimens, focusing on DTBC subtypes. Breast tumors were processed by laser microdissection (LMD) to enrich tumor cells. DNA, RNA, and protein were simultaneously extracted from each tumor preparation, followed by whole genome sequencing, paired-end RNA sequencing, global proteomics and phosphoproteomics. Differential feature analysis, pathway analysis and survival analysis were performed to better understand DTBC and investigate biomarkers. RESULTS: We observed distinct variations in gene mutations, structural variations, and chromosomal alterations between DTBC and LumA breast tumors. DTBC tumors predominantly had more mutations in TP53, PLXNB3, and zinc finger genes, and fewer mutations in SDC2, CDH1, PIK3CA, SVIL, and PTEN. Notably, cytoband 1q21, which contains numerous cell proliferation-related genes, was significantly amplified in the DTBC tumors. LMD successfully minimized stromal components and increased RNA-protein concordance, as evidenced by stromal score comparisons and proteomic analysis. Distinct DTBC- and LumA-enriched clusters were observed by proteomic and phosphoproteomic clustering analysis, some with survival differences. Phosphoproteomics identified two distinct phosphoproteomic profiles for high relapse-risk and low relapse-risk basal-like tumors, involving several genes known to be associated with breast cancer oncogenesis and progression, including KIAA1522, DCK, FOXO3, MYO9B, ARID1A, EPRS, ZC3HAV1, and RBM14. Lastly, an integrated pathway analysis of multi-omics data highlighted a robust enrichment of proliferation pathways in DTBC tumors. CONCLUSIONS: This study provides an integrated proteogenomic characterization of DTBC vs LumA with tumor cells enriched through laser microdissection. We identified many common features of DTBC tumors and phosphopeptides that could serve as potential biomarkers for high/low relapse-risk basal-like BC and possibly guide treatment selection.

Biomarkers, Tumor , Breast Neoplasms , Proteogenomics , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Breast Neoplasms/metabolism , Breast Neoplasms/mortality , Biomarkers, Tumor/genetics , Proteogenomics/methods , Mutation , Laser Capture Microdissection , Middle Aged , Retrospective Studies , Aged , Adult , Proteomics/methods , Prognosis

2.

Proteogenomic Gene Structure Validation in the Pineapple Genome.

Ariffin, Norazrin; Newman, David Wells; Nelson, Michael G; O'cualain, Ronan; Hubbard, Simon J.

J Proteome Res ; 23(5): 1583-1592, 2024 May 03.

Article En | MEDLINE | ID: mdl-38651221

MD2 pineapple (Ananas comosus) is the second most important tropical crop that uses crassulacean acid metabolism (CAM), which has high water-use efficiency, and it is fast becoming the most consumed fresh fruit worldwide. Despite its environmental efficiency and popularity, until very recently its genome sequence had not been determined and no high-quality annotated proteome was available. Here, we have undertaken a pilot proteogenomic study, analyzing the proteome of MD2 pineapple leaves using liquid chromatography-mass spectrometry (LC-MS/MS), which validates 1781 predicted proteins in the annotated F153 (V3) genome. In addition, a further 603 peptide identifications are found that map exclusively to an independent MD2 transcriptome-derived database but are not found in the standard F153 (V3) annotated proteome. Peptide identifications derived from these MD2 transcripts are also cross-referenced to a more recent and complete MD2 genome annotation, resulting in 402 nonoverlapping peptides, which in turn support 30 high-quality gene candidates novel to both pineapple genomes. Many of the validated F153 (V3) genes are also supported by an independent proteomics data set collected for an ornamental pineapple variety. The contigs and peptides have been mapped to the current F153 genome build and are available as BED files to display a custom gene track on the Ensembl Plants region viewer. These analyses add to the knowledge of experimentally validated pineapple genes and demonstrate the utility of transcript-derived proteomics to discover both novel genes and genetic structure in a plant genome, adding value to its annotation.

Ananas , Genome, Plant , Plant Proteins , Proteogenomics , Tandem Mass Spectrometry , Ananas/genetics , Ananas/chemistry , Proteogenomics/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Chromatography, Liquid , Proteome/genetics , Proteome/analysis , Molecular Sequence Annotation , Plant Leaves/genetics , Plant Leaves/chemistry , Peptides/genetics , Peptides/analysis , Peptides/chemistry

3.

Proteogenomic Characterization Reveals Estrogen Signaling as a Target for Never-Smoker Lung Adenocarcinoma Patients without EGFR or ALK Alterations.

Park, Seung-Jin; Ju, Shinyeong; Goh, Sung-Ho; Yoon, Byoung-Ha; Park, Jong-Lyul; Kim, Jeong-Hwan; Lee, Seonjeong; Lee, Sang-Jin; Kwon, Yumi; Lee, Wonyeop; Park, Kyung Chan; Lee, Geon Kook; Park, Seog Yun; Kim, Sunshin; Kim, Seon-Young; Han, Ji-Youn; Lee, Cheolju.

Cancer Res ; 84(9): 1491-1503, 2024 May 02.

Article En | MEDLINE | ID: mdl-38607364

Never-smoker lung adenocarcinoma (NSLA) is prevalent in Asian populations, particularly in women. EGFR mutations and anaplastic lymphoma kinase (ALK) fusions are major genetic alterations observed in NSLA, and NSLA with these alterations have been well studied and can be treated with targeted therapies. To provide insights into the molecular profile of NSLA without EGFR and ALK alterations (NENA), we selected 141 NSLA tissues and performed proteogenomic characterization, including whole genome sequencing (WGS), transcriptomic, methylation EPIC array, total proteomic, and phosphoproteomic analyses. Forty patients with NSLA harboring EGFR and ALK alterations and seven patients with NENA with microsatellite instability were excluded. Genome analysis revealed that TP53 (25%), KRAS (22%), and SETD2 (11%) mutations and ROS1 fusions (14%) were the most frequent genetic alterations in NENA patients. Proteogenomic impact analysis revealed that STK11 and ERBB2 somatic mutations had broad effects on cancer-associated genes in NENA. DNA copy number alteration analysis identified 22 prognostic proteins that influenced transcriptomic and proteomic changes. Gene set enrichment analysis revealed estrogen signaling as the key pathway activated in NENA. Increased estrogen signaling was associated with proteogenomic alterations, such as copy number deletions in chromosomes 14 and 21, STK11 mutation, and DNA hypomethylation of LLGL2 and ST14. Finally, saracatinib, an Src inhibitor, was identified as a potential drug for targeting activated estrogen signaling in NENA and was experimentally validated in vitro. Collectively, this study enhanced our understanding of NENA NSLA by elucidating the proteogenomic landscape and proposed saracatinib as a potential treatment for this patient population that lacks effective targeted therapies. 
SIGNIFICANCE: The proteogenomic landscape in never-smoker lung cancer without known driver mutations reveals prognostic proteins and enhanced estrogen signaling that can be targeted as a potential therapeutic strategy to improve patient outcomes.

Adenocarcinoma of Lung , Anaplastic Lymphoma Kinase , ErbB Receptors , Estrogens , Lung Neoplasms , Mutation , Proteogenomics , Signal Transduction , Female , Humans , Male , Middle Aged , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/drug therapy , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/metabolism , Anaplastic Lymphoma Kinase/genetics , Anaplastic Lymphoma Kinase/metabolism , DNA Copy Number Variations , ErbB Receptors/genetics , ErbB Receptors/metabolism , Estrogens/metabolism , Lung Neoplasms/genetics , Lung Neoplasms/drug therapy , Lung Neoplasms/metabolism , Lung Neoplasms/pathology , Non-Smokers/statistics & numerical data , Prognosis , Proteogenomics/methods , Signal Transduction/genetics

4.

Nanoparticle enrichment mass-spectrometry proteomics identifies protein-altering variants for precise pQTL mapping.

Suhre, Karsten; Venkataraman, Guhan Ram; Guturu, Harendra; Halama, Anna; Stephan, Nisha; Thareja, Gaurav; Sarwath, Hina; Motamedchaboki, Khatereh; Donovan, Margaret K R; Siddiqui, Asim; Batzoglou, Serafim; Schmidt, Frank.

Nat Commun ; 15(1): 989, 2024 Feb 02.

Article En | MEDLINE | ID: mdl-38307861

Proteogenomics studies generate hypotheses on protein function and provide genetic evidence for drug target prioritization. Most previous work has been conducted using affinity-based proteomics approaches. These technologies face challenges, such as uncertainty regarding target identity, non-specific binding, and handling of variants that affect epitope affinity binding. Mass spectrometry-based proteomics can overcome some of these challenges. Here we report a pQTL study using the Proteograph™ Product Suite workflow (Seer, Inc.) where we quantify over 18,000 unique peptides from nearly 3000 proteins in more than 320 blood samples from a multi-ethnic cohort in a bottom-up, peptide-centric, mass spectrometry-based proteomics approach. We identify 184 protein-altering variants in 137 genes that are significantly associated with their corresponding variant peptides, confirming target specificity of co-associated affinity binders, identifying putatively causal cis-encoded proteins and providing experimental evidence for their presence in blood, including proteins that may be inaccessible to affinity-based proteomics.

Proteogenomics , Proteomics , Humans , Proteomics/methods , Mass Spectrometry/methods , Proteins/analysis , Peptides/analysis , Proteogenomics/methods , Mutant Proteins

5.

Integrated Proteogenomics Uncover Mechanisms of Glioblastoma Evolution, Pointing to Novel Therapeutic Targets.

Li, Jiabo; Shih, Ling-Kai; Brat, Daniel J.

Cancer Res ; 84(9): 1379-1381, 2024 May 02.

Article En | MEDLINE | ID: mdl-38330148

Nearly all glioblastoma (GBM) patients relapse following standard treatment and eventually succumb to disease. While large-scale, integrated multiomic studies have tremendously advanced the understanding of primary GBM at the cellular and molecular level, the posttherapeutic trajectory and biological properties of recurrent GBM remain poorly understood. This knowledge gap was addressed in a recent Cancer Cell article in which Kim and colleagues report on a highly integrative proteogenomic analysis performed on 123 matched primary and recurrent GBMs that uncovered a dramatic evolutionary shift from a proliferative state at initial diagnosis to the activation of neuronal and synaptogenic pathways at recurrence following therapy. Neuronal transition was characterized by posttranslational activation of WNT/PCP signaling and BRAF kinase, while many canonical oncogenic pathways, and EGFR in particular, were downregulated. Parallel multiomics analyses of patient-derived xenograft (PDX) models corroborated this evolutionary trajectory, allowing in vivo experiments for translational significance. Notably, targeting BRAF kinase disrupted both the neuronal transition and migration capabilities of recurrent gliomas, which were key characteristics of posttreatment progression. Furthermore, combining BRAF inhibitor vemurafenib with temozolomide prolonged survival in PDX models. Overall, the results reveal novel biological mechanisms of GBM evolution and therapy resistance, and suggest promising therapeutic intervention.

Brain Neoplasms , Glioblastoma , Proteogenomics , Humans , Glioblastoma/genetics , Glioblastoma/pathology , Glioblastoma/drug therapy , Glioblastoma/metabolism , Proteogenomics/methods , Brain Neoplasms/genetics , Brain Neoplasms/drug therapy , Brain Neoplasms/pathology , Brain Neoplasms/metabolism , Animals , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins B-raf/antagonists & inhibitors , Proto-Oncogene Proteins B-raf/metabolism , Neoplasm Recurrence, Local/pathology , Neoplasm Recurrence, Local/genetics , Neoplasm Recurrence, Local/drug therapy , Mice , Temozolomide/pharmacology

6.

A Massive Proteogenomic Screen Identifies Thousands of Novel Peptides From the Human "Dark" Proteome.

Cao, Xiaolong; Sun, Siqi; Xing, Jinchuan.

Mol Cell Proteomics ; 23(2): 100719, 2024 Feb.

Article En | MEDLINE | ID: mdl-38242438

Although human gene annotation has been continuously improved over the past 2 decades, numerous studies have demonstrated the existence of a "dark proteome", consisting of proteins that are critical for biological processes but are not included in widely used gene catalogs. The Genotype-Tissue Expression project generated more than 15,000 RNA-seq datasets from multiple tissues, which modeled 30 million transcripts in the human genome. To provide a resource of high-confidence novel proteins from the dark proteome, we screened 50,000 mass spectrometry runs from over 900 projects to identify proteins translated from the Genotype-Tissue Expression transcript model with proteomic support. We also integrated 3.8 million common genetic variants from the gnomAD database to improve peptide identification. As a result, we identified 170,529 novel peptides with proteomic evidence, of which 6048 passed the strictest standard we defined and were supported by PepQuery. We provided a user-friendly website (https://ncorf.genes.fun/) for researchers to check the evidence of novel peptides from their studies. The findings will improve our understanding of coding genes and facilitate genomic data interpretation in biomedical research.

Proteogenomics , Humans , Proteogenomics/methods , Proteome/metabolism , Proteomics/methods , Peptides/genetics , Genome, Human

7.

Multi-tissue proteogenomic analysis for mechanistic toxicology studies in non-model species.

Lin, M S; Varunjikar, M S; Lie, K K; Søfteland, L; Dellafiora, L; Ørnsrud, R; Sanden, M; Berntssen, M H G; Dorne, J L C M; Bafna, V; Rasinger, J D.

Environ Int ; 182: 108309, 2023 Dec.

Article En | MEDLINE | ID: mdl-37980879

New approach methodologies (NAM), including omics and in vitro approaches, are contributing to the implementation of 3R (reduction, refinement and replacement) strategies in regulatory science and risk assessment. In this study, we present an integrative transcriptomics and proteomics analysis workflow for the validation and revision of complex fish genomes and demonstrate how proteogenomics expression matrices can be used to support multi-level omics data integration in non-model species in vivo and in vitro. Using Atlantic salmon as an example, we constructed proteogenomic databases from publicly available transcriptomic data and in-house generated RNA-Seq and LC-MS/MS data. Our analysis identified ∼80,000 peptides, providing direct evidence of translation for over 40,000 RefSeq structures. The data also highlighted 183 co-located peptide groups that supported a single transcript each, and in each case, either corrected a previous annotation, supported Ensembl annotations not present in RefSeq, or identified novel previously unannotated genes. Proteogenomics data-derived expression matrices revealed distinct profiles for the different tissue types analyzed. Focusing on proteins involved in defense against xenobiotics, we detected distinct expression patterns across different salmon tissues and observed homology in the expression of chemical defense proteins between in vivo and in vitro liver systems. Our study demonstrates the potential of proteogenomic analyses in extending our understanding of complex fish genomes and provides an advanced bioinformatic toolkit to support the further development of NAMs and their application in regulatory science and (eco)toxicological studies of non-model species.

Proteogenomics , Animals , Proteogenomics/methods , Molecular Sequence Annotation , Chromatography, Liquid , Tandem Mass Spectrometry , Proteomics/methods , Peptides/analysis , Peptides/genetics , Peptides/metabolism

8.

Microproteins-Discovery, structure, and function.

Mohsen, Jessica J; Martel, Alina A; Slavoff, Sarah A.

Proteomics ; 23(23-24): e2100211, 2023 Dec.

Article En | MEDLINE | ID: mdl-37603371

Advances in proteogenomic technologies have revealed hundreds to thousands of translated small open reading frames (sORFs) that encode microproteins in genomes across evolutionary space. While many microproteins have now been shown to play critical roles in biology and human disease, a majority of recently identified microproteins have little or no experimental evidence regarding their functionality. Computational tools have some limitations in analyzing short, poorly conserved microprotein sequences, so additional approaches are needed to determine the role of each member of this recently discovered polypeptide class. A currently underexplored avenue in the study of microproteins is structure prediction and determination, which can yield detailed functional information. In this review, we provide a brief overview of microprotein discovery methods, then examine examples of microprotein structures (and, conversely, intrinsic disorder) that have been experimentally determined using crystallography, cryo-electron microscopy, and NMR, which provide insight into their molecular functions and mechanisms. Additionally, we discuss examples of predicted microprotein structures that have provided insight or context regarding their function. Analysis of microprotein structure at the angstrom level, and confirmation of predicted structures, therefore have the potential to identify translated microproteins that are of biological importance and to reveal the molecular mechanisms underlying their in vivo roles.

Micropeptides , Proteogenomics , Humans , Cryoelectron Microscopy , Peptides , Proteogenomics/methods , Open Reading Frames

9.

A proteogenomics data-driven knowledge base of human cancer.

Liao, Yuxing; Savage, Sara R; Dou, Yongchao; Shi, Zhiao; Yi, Xinpei; Jiang, Wen; Lei, Jonathan T; Zhang, Bing.

Cell Syst ; 14(9): 777-787.e5, 2023 09 20.

Article En | MEDLINE | ID: mdl-37619559

By combining mass-spectrometry-based proteomics and phosphoproteomics with genomics, epigenomics, and transcriptomics, proteogenomics provides comprehensive molecular characterization of cancer. Using this approach, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) has characterized over 1,000 primary tumors spanning 10 cancer types, many with matched normal tissues. Here, we present LinkedOmicsKB, a proteogenomics data-driven knowledge base that makes consistently processed and systematically precomputed CPTAC pan-cancer proteogenomics data available to the public through ∼40,000 gene-, protein-, mutation-, and phenotype-centric web pages. Visualization techniques facilitate efficient exploration of, and reasoning about, complex, interconnected data. Using three case studies, we illustrate the practical utility of LinkedOmicsKB in providing new insights into genes, phosphorylation sites, somatic mutations, and cancer phenotypes. With precomputed results of 19,701 coding genes, 125,969 phosphosites, and 256 genotypes and phenotypes, LinkedOmicsKB provides a comprehensive resource to accelerate proteogenomics data-driven discoveries to improve our understanding and treatment of human cancer. A record of this paper's transparent peer review process is included in the supplemental information.

Neoplasms , Proteogenomics , Humans , Proteomics , Proteogenomics/methods , Genomics , Neoplasms/genetics , Knowledge Bases

10.

Frontiers in mass spectrometry-based clinical proteomics for cancer diagnosis and treatment.

Haga, Yoshimi; Minegishi, Yuriko; Ueda, Koji.

Cancer Sci ; 114(5): 1783-1791, 2023 May.

Article En | MEDLINE | ID: mdl-36661476

Numerous omics studies, primarily genomic analyses, have been conducted to fully understand the molecular biological characteristics of cancer. In recent years, the depth of proteomic analysis, which comprehensively analyzes proteins, the molecules that function directly in vivo, has increased dramatically. Proteomics using mass spectrometry (MS) is a promising technology to directly examine proteoforms, including post-translational modifications and variants originating from genomic aberrations. Recent advances in MS-based proteomics have enabled direct, in-depth, and quantitative analysis of the expression levels of various cancer-related proteins, as well as their cancer-specific proteoforms, and of proteins that fluctuate with cancer initiation and progression in cell lines and tissue samples. Additionally, the integration of proteomic data with genomic, epigenomic, and transcriptomic data has formed the growing field of proteogenomics, which is already yielding new biological and diagnostic knowledge. Deep proteomic profiling provides clinically useful information in various aspects, including understanding the mechanisms of cancer development and progression and discovering targets for diagnosis and drug development. Furthermore, it is expected to make a significant contribution to the promotion of personalized medicine. In this review, recent advances and impacts in MS-based clinical proteomics are highlighted with a focus on oncology.

Neoplasms , Proteogenomics , Humans , Proteomics/methods , Genomics/methods , Proteogenomics/methods , Neoplasms/diagnosis , Neoplasms/genetics , Neoplasms/therapy , Mass Spectrometry/methods

11.

Proteogenomics reveals sex-biased aging genes and coordinated splicing in cardiac aging.

Han, Yu; Wennersten, Sara A; Wright, Julianna M; Ludwig, R W; Lau, Edward; Lam, Maggie P Y.

Am J Physiol Heart Circ Physiol ; 323(3): H538-H558, 2022 09 01.

Article En | MEDLINE | ID: mdl-35930447

The risks of heart diseases are significantly modulated by age and sex, but how these factors influence baseline cardiac gene expression remains incompletely understood. Here, we used RNA sequencing and mass spectrometry to compare gene expression in female and male young adult (4 mo) and early aging (20 mo) mouse hearts, identifying thousands of age- and sex-dependent gene expression signatures. Sexually dimorphic cardiac genes are broadly distributed, functioning in mitochondrial metabolism, translation, and other processes. In parallel, we found over 800 genes with differential aging responses between males and females, including genes in cAMP and PKA signaling. Analysis of the sex-adjusted aging cardiac transcriptome revealed a widespread remodeling of exon usage patterns that is largely independent of differential gene expression, concomitant with upstream changes in RNA-binding protein and splice factor transcripts. To evaluate the impact of the splicing events on cardiac proteoform composition, we applied an RNA-guided proteomics computational pipeline to analyze the mass spectrometry data and detected hundreds of putative splice variant proteins that have the potential to rewire the cardiac proteome. Taken together, the results here suggest that cardiac aging is associated with 1) widespread sex-biased aging genes and 2) a rewiring of RNA splicing programs, including sex- and age-dependent changes in exon usage and splice patterns that have the potential to influence cardiac protein structure and function. These changes contribute to the emerging evidence for considerable sexual dimorphism in the cardiac aging process that should be considered in the search for disease mechanisms.
NEW & NOTEWORTHY Han et al. used proteogenomics to compare male and female mouse hearts at 4 and 20 mo. Sex-biased cardiac genes function in mitochondrial metabolism, translation, autophagy, and other processes. Hundreds of cardiac genes show sex-by-age interactions, that is, sex-biased aging genes. Cardiac aging is accompanied by a remodeling of exon usage in functionally coordinated genes, concomitant with differential expression of RNA-binding proteins and splice factors. These features represent an underinvestigated aspect of cardiac aging that may be relevant to the search for disease mechanisms.

Proteogenomics , Aging/genetics , Alternative Splicing , Animals , Female , Male , Mice , Proteogenomics/methods , RNA Splicing , RNA-Binding Proteins/genetics

12.

An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics.

Fancello, Laura; Burger, Thomas.

Genome Biol ; 23(1): 132, 2022 06 20.

Article En | MEDLINE | ID: mdl-35725496

BACKGROUND: Proteogenomics aims to identify variant or unknown proteins in bottom-up proteomics, by searching transcriptome- or genome-derived custom protein databases. However, empirical observations reveal that these large proteogenomic databases produce lower-sensitivity peptide identifications. Various strategies have been proposed to avoid this, including the generation of reduced transcriptome-informed protein databases, which only contain proteins whose transcripts are detected in the sample-matched transcriptome. These were found to increase peptide identification sensitivity. Here, we present a detailed evaluation of this approach. RESULTS: We establish that the increased sensitivity in peptide identification is in fact a statistical artifact, directly resulting from the limited capability of target-decoy competition to accurately model incorrect target matches when using excessively small databases. As anti-conservative false discovery rates (FDRs) are likely to hamper the robustness of the resulting biological conclusions, we advocate for alternative FDR control methods that are less sensitive to database size. Nevertheless, reduced transcriptome-informed databases are useful, as they reduce the ambiguity of protein identifications, yielding fewer shared peptides. Furthermore, searching the reference database and subsequently filtering proteins whose transcripts are not expressed reduces protein identification ambiguity to a similar extent, but is more transparent and reproducible. CONCLUSIONS: In summary, using transcriptome information is an interesting strategy that has not been promoted for the right reasons. While the increase in peptide identifications from searching reduced transcriptome-informed databases is an artifact caused by the use of an FDR control method unsuitable to excessively small databases, transcriptome information can reduce the ambiguity of protein identifications.

Proteogenomics , Proteomics , Databases, Protein , Eukaryota , Peptides , Proteins , Proteogenomics/methods , Proteomics/methods , Transcriptome
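
The target-decoy competition (TDC) rule at the center of the abstract above can be illustrated with a minimal Python sketch. It assumes each spectrum already has a single best match labeled target or decoy, and it uses the plain decoys-to-targets ratio as the FDR estimate; the `PSM` class and both function names are hypothetical, and this is not the authors' code or any particular search engine's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """A best-scoring peptide-spectrum match kept after target-decoy competition."""
    score: float
    is_decoy: bool


def tdc_qvalues(psms: List[PSM]) -> List[float]:
    """Return q-values (aligned with the input order) using FDR ~ decoys / targets.

    Ties in score keep their input order (sorted() is stable), a simplification.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i].score, reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # walk from best to worst score
        if psms[i].is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the score threshold were placed at this PSM.
        fdr[i] = decoys / max(targets, 1)
    # A q-value is the smallest estimated FDR at which the PSM is still accepted,
    # so take a running minimum from the worst score back to the best.
    qvals = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvals[i] = running_min
    return qvals


def accepted_targets(psms: List[PSM], fdr_threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value passes the threshold."""
    q = tdc_qvalues(psms)
    return sum(1 for p, qv in zip(psms, q) if not p.is_decoy and qv <= fdr_threshold)


if __name__ == "__main__":
    demo = [PSM(10.0, False), PSM(9.0, False), PSM(8.0, True), PSM(7.0, False), PSM(6.0, True)]
    print(tdc_qvalues(demo))
    print(accepted_targets(demo))
```

With an excessively small database, the decoy counts behind this ratio may no longer model incorrect target matches well, which is the artifact the authors describe; some implementations also add 1 to the decoy count for a more conservative estimate.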

13.

False discovery rate: the Achilles' heel of proteogenomics.

Aggarwal, Suruchi; Raj, Anurag; Kumar, Dhirendra; Dash, Debasis; Yadav, Amit Kumar.

Brief Bioinform ; 23(5), 2022 09 20.

Article En | MEDLINE | ID: mdl-35534181

Proteogenomics refers to the integrated analysis of the genome and proteome that leverages mass-spectrometry (MS)-based proteomics data to improve genome annotations, understand gene expression control through proteoforms and find sequence variants to develop novel insights for disease classification and therapeutic strategies. However, proteogenomic studies often suffer from reduced sensitivity and specificity due to inflated database size. To control the error rates, proteogenomics depends on the target-decoy search strategy, the de-facto method for false discovery rate (FDR) estimation in proteomics. The proteogenomic databases constructed from three- or six-frame nucleotide database translation not only increase the search space and compute-time but also violate the equivalence of target and decoy databases. These searches result in poorer separation between target and decoy scores, leading to stringent FDR thresholds. Understanding these factors and applying modified strategies such as two-pass database search or peptide-class-specific FDR can result in a better interpretation of MS data without introducing additional statistical biases. Based on these considerations, a user can interpret the proteogenomics results appropriately and control false positives and negatives in a more informed manner. In this review, first, we briefly discuss the proteogenomic workflows and limitations in database construction, followed by various considerations that can influence potential novel discoveries in a proteogenomic study. We conclude with suggestions to counter these challenges for better proteogenomic data interpretation.

Proteogenomics , Databases, Protein , Nucleotides , Peptides/chemistry , Proteogenomics/methods , Proteome , Proteomics/methods
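
To make the database-size issue concrete, the following minimal Python sketch performs a six-frame translation of a nucleotide sequence with the standard genetic code (NCBI translation table 1) and splits each frame at stop codons. The function names and the `min_len` cutoff are illustrative assumptions, not part of any tool cited here; stops appear as `*`, codons with ambiguous bases as `X`, and a trailing partial codon is dropped.

```python
from typing import Dict, List

# Standard genetic code (NCBI translation table 1), codon bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(nt: str) -> str:
    """Translate an uppercase sequence codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt) - 2, 3))


def six_frame_translation(seq: str) -> Dict[str, str]:
    """Translate the three forward and three reverse-complement reading frames."""
    seq = seq.upper()
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev_comp[offset:])
    return frames


def candidate_peptide_regions(seq: str, min_len: int = 7) -> Dict[str, List[str]]:
    """Split each frame at stop codons, keeping stretches of at least min_len residues."""
    return {
        frame: [chunk for chunk in protein.split("*") if len(chunk) >= min_len]
        for frame, protein in six_frame_translation(seq).items()
    }


if __name__ == "__main__":
    print(six_frame_translation("ATGGCCTAA"))
```

Every input nucleotide contributes to six candidate protein sequences, most of which are unlikely to be translated in vivo, which is one way such databases inflate the search space relative to a curated reference proteome.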

14.

Validating Amino Acid Variants in Proteogenomics Using Sequence Coverage by Multiple Reads.

Levitsky, Lev I; Kuznetsova, Ksenia G; Kliuchnikova, Anna A; Ilina, Irina Y; Goncharov, Anton O; Lobas, Anna A; Ivanov, Mark V; Lazarev, Vassili N; Ziganshin, Rustam H; Gorshkov, Mikhail V; Moshkovskii, Sergei A.

J Proteome Res ; 21(6): 1438-1448, 2022 06 03.

Article En | MEDLINE | ID: mdl-35536917

Mass spectrometry-based proteome analysis involves matching the mass spectra of proteolytic peptides to amino acid sequences predicted from genomic sequences. Peptide variant identifications in proteogenomic studies often lack reliability. We propose interpreting shotgun proteomics results, specifically in data-dependent acquisition mode, as protein sequence coverage by multiple reads, as is done in nucleic acid sequencing for calling single nucleotide variants. Multiple reads for each sequence position can be provided by overlapping distinct peptides, thus confirming the presence of certain amino acid residues in the overlapping stretch with a lower false discovery rate. Overlapping distinct peptides originate from miscleaved tryptic peptides in combination with their properly cleaved counterparts, and from peptides generated by multiple proteases when the same specimen is subjected to parallel digestion and the digests are analyzed separately. We illustrate this approach using publicly available multiprotease data sets and our own data generated for HEK-293 cell line digests obtained using trypsin, LysC, and GluC proteases. In total, up to 30% of the whole proteome was covered by tryptic peptides, with up to 7% covered twofold or more. The proteogenomic analysis of the HEK-293 cell line revealed 36 single amino acid variants, seven of which were supported by multiple reads.

Proteogenomics , Amino Acids , HEK293 Cells , Humans , Peptide Hydrolases , Peptides/analysis , Proteogenomics/methods , Proteome/analysis , Reproducibility of Results
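
The "multiple reads" idea above can be sketched by counting, for each residue of a protein, how many distinct identified peptides cover it, analogous to read depth in nucleotide sequencing. This is a minimal Python illustration, not the authors' implementation; the example sequence, peptides, function names, and the two-read threshold are hypothetical, with a missed-cleavage peptide overlapping its fully cleaved counterparts supplying the second read.

```python
from typing import Iterable, List


def coverage_depth(protein: str, peptides: Iterable[str]) -> List[int]:
    """Count, for each residue, how many distinct identified peptides cover it."""
    depth = [0] * len(protein)
    for pep in set(peptides):  # each distinct peptide sequence counts as one "read"
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:  # a peptide may match more than one location
            for pos in range(start, start + len(pep)):
                depth[pos] += 1
            start = protein.find(pep, start + 1)
    return depth


def fraction_covered(depth: List[int], min_reads: int = 1) -> float:
    """Fraction of residues covered by at least min_reads distinct peptides."""
    return sum(d >= min_reads for d in depth) / len(depth) if depth else 0.0


def residue_supported(depth: List[int], position: int, min_reads: int = 2) -> bool:
    """True if a 0-based residue, e.g. a candidate variant site, has enough reads."""
    return depth[position] >= min_reads


if __name__ == "__main__":
    protein = "PEPTIDEKSAMPLERVARIANT"
    # Two fully cleaved tryptic peptides, one more, and a missed-cleavage peptide.
    peptides = ["PEPTIDEK", "SAMPLER", "VARIANT", "PEPTIDEKSAMPLER"]
    depth = coverage_depth(protein, peptides)
    print(depth)
    print(fraction_covered(depth), fraction_covered(depth, min_reads=2))
```

Here residues in the first 15 positions are covered twice and could support a variant call under a two-read rule, while the last seven are covered only once.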

15.

Cancer proteogenomics: current impact and future prospects.

Mani, D R; Krug, Karsten; Zhang, Bing; Satpathy, Shankha; Clauser, Karl R; Ding, Li; Ellis, Matthew; Gillette, Michael A; Carr, Steven A.

Nat Rev Cancer ; 22(5): 298-313, 2022 05.

Article En | MEDLINE | ID: mdl-35236940

Genomic analyses in cancer have been enormously impactful, leading to the identification of driver mutations and development of targeted therapies. But the functions of the vast majority of somatic mutations and copy number variants in tumours remain unknown, and the causes of resistance to targeted therapies and methods to overcome them are poorly defined. Recent improvements in mass spectrometry-based proteomics now enable direct examination of the consequences of genomic aberrations, providing deep and quantitative characterization of tumour tissues. Integration of proteins and their post-translational modifications with genomic, epigenomic and transcriptomic data constitutes the new field of proteogenomics, and is already leading to new biological and diagnostic knowledge with the potential to improve our understanding of malignant transformation and therapeutic outcomes. In this Review we describe recent developments in proteogenomics and key findings from the proteogenomic analysis of a wide range of cancers. Considerations relevant to the selection and use of samples for proteogenomics and the current technologies used to generate, analyse and integrate proteomic with genomic data are described. Applications of proteogenomics in translational studies and immuno-oncology are rapidly emerging, and the prospect for their full integration into therapeutic trials and clinical care seems bright.

Neoplasms , Proteogenomics , DNA Copy Number Variations , Genomics , Humans , Neoplasms/metabolism , Proteogenomics/methods , Proteomics

16.

µProteInS-a proteogenomics pipeline for finding novel bacterial microproteins encoded by small ORFs.

de Souza, Eduardo Vieira; Dalberto, Pedro Ferrari; Machado, Vinicius Pellisoli; Canedo, Adriana; Saghatelian, Alan; Machado, Pablo; Basso, Luiz Augusto; Bizarro, Cristiano Valim.

Bioinformatics ; 38(9): 2612-2614, 2022 04 28.

Article En | MEDLINE | ID: mdl-35188179

SUMMARY: Genome annotation pipelines traditionally exclude open reading frames (ORFs) shorter than 100 codons to avoid false identifications. However, studies have shown that these may encode functional microproteins with meaningful biological roles. We developed µProteInS, a proteogenomics pipeline that combines genomics, transcriptomics and proteomics to identify novel microproteins in bacteria. Our pipeline employs a model to filter out low-confidence spectra, avoiding the need to inspect mass spectrometry data manually. It also overcomes the shortcomings of traditional approaches that usually exclude overlapping genes, leaderless transcripts and non-conserved sequences, characteristics that are common among small ORFs (smORFs) and hamper their identification. AVAILABILITY AND IMPLEMENTATION: µProteInS is implemented in Python 3.8 within an Ubuntu 20.04 environment. It is open-source software distributed under the GNU General Public License v3, available as a command-line tool. It can be downloaded at https://github.com/Eduardo-vsouza/uproteins and either installed from source or executed as a Docker image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Proteogenomics , Open Reading Frames , Proteogenomics/methods , Software , Genomics/methods , Bacteria/genetics

17.

SMAP is a pipeline for sample matching in proteogenomics.

Li, Ling; Niu, Mingming; Erickson, Alyssa; Luo, Jie; Rowbotham, Kincaid; Guo, Kai; Huang, He; Li, Yuxin; Jiang, Yi; Hur, Junguk; Liu, Chunyu; Peng, Junmin; Wang, Xusheng.

Nat Commun ; 13(1): 744, 2022 02 08.

Article En | MEDLINE | ID: mdl-35136070

The integration of genomics and proteomics data (proteogenomics) holds the promise of furthering the in-depth understanding of human disease. However, sample mix-up is a pervasive problem in proteogenomics because of the complexity of sample processing. Here, we present a pipeline for Sample Matching in Proteogenomics (SMAP) to verify sample identity and ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS), and aligns the MS-based proteomic samples with genomic samples by two discriminant scores. Theoretical analysis with simulated data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale dataset generated by the PsychENCODE BrainGVEX project, 54 samples (19%) were corrected. The correction was further confirmed by ribosome profiling and chromatin accessibility sequencing (ATAC-seq) data from the same set of samples. Our results demonstrate that SMAP is an effective tool for sample verification in a large-scale MS-based proteogenomics study. SMAP is publicly available at https://github.com/UND-Wanglab/SMAP , and a web-based version can be accessed at https://smap.shinyapps.io/smap/ .

Datasets as Topic , Proteogenomics/methods , Chromatin Immunoprecipitation Sequencing , Data Analysis , Female , Humans , Male , Mass Spectrometry/methods , Mass Spectrometry/statistics & numerical data , Proteogenomics/statistics & numerical data , RNA-Seq , Software , Whole Genome Sequencing

18.

Pharmaco-proteogenomic profiling of pediatric diffuse midline glioma to inform future treatment strategies.

Findlay, Izac J; De Iuliis, Geoffry N; Duchatel, Ryan J; Jackson, Evangeline R; Vitanza, Nicholas A; Cain, Jason E; Waszak, Sebastian M; Dun, Matthew D.

Oncogene ; 41(4): 461-475, 2022 01.

Article En | MEDLINE | ID: mdl-34759345

Diffuse midline glioma (DMG) is a deadly pediatric and adolescent central nervous system (CNS) tumor localized along the midline structures of the brain atop the spinal cord. With a median overall survival (OS) of just 9-11 months, DMG is characterized by global hypomethylation of histone H3 at lysine 27 (H3K27me3), driven by recurring somatic mutations in H3 genes, including HIST1H3B/C (H3.1K27M) or H3F3A (H3.3K27M), or by overexpression of EZHIP in patients harboring wildtype H3. The recent World Health Organization's 5th Classification of CNS Tumors now designates DMG as 'H3 K27-altered', suggesting that global H3K27me3 hypomethylation is a ubiquitous feature of DMG and drives devastating transcriptional programs for which there are no treatments. H3 alterations co-segregate with various other somatic driver mutations, highlighting the high level of intertumoral heterogeneity of DMG. Furthermore, DMG is characterized by very high intratumoral diversity, with tumors harboring multiple subclones within each primary tumor. Each subclone contains its own combination of driver and passenger lesions that continually evolve, making precision-based medicine challenging to execute successfully. Whilst the intertumoral heterogeneity of DMG has been extensively investigated, this has yet to translate into an increase in patient survival. Conversely, our understanding of the non-genomic factors that drive the rapid growth and fatal nature of DMG, including endogenous and exogenous microenvironmental influences, neurological cues, and the posttranscriptional and posttranslational architecture of DMG, remains enigmatic or, at best, immature. However, these factors are likely to play a significant role in the complex biological sequelae that drive the disease. Here we summarize the heterogeneity of DMG and emphasize how analysis of the posttranslational architecture may improve treatment paradigms. We describe factors that contribute to treatment response and disease progression, and highlight the potential for pharmaco-proteogenomics (i.e., the integration of genomics, proteomics and pharmacology) in the management of this uniformly fatal cancer.

Brain Neoplasms/drug therapy , Brain Neoplasms/genetics , Glioma/drug therapy , Glioma/genetics , Proteogenomics/methods , Animals , Brain Neoplasms/mortality , Child , Child, Preschool , Female , Glioma/mortality , Humans , Male , Mice , Survival Analysis , Tumor Microenvironment

19.

Proteogenomic characterization identifies clinically relevant subgroups of intrahepatic cholangiocarcinoma.

Dong, Liangqing; Lu, Dayun; Chen, Ran; Lin, Youpei; Zhu, Hongwen; Zhang, Zhou; Cai, Shangli; Cui, Peng; Song, Guohe; Rao, Dongning; Yi, Xinpei; Wu, Yingcheng; Song, Nixue; Liu, Fen; Zou, Yunhao; Zhang, Shu; Zhang, Xiaoming; Wang, Xiaoying; Qiu, Shuangjian; Zhou, Jian; Wang, Shisheng; Zhang, Xu; Shi, Yongyong; Figeys, Daniel; Ding, Li; Wang, Pei; Zhang, Bing; Rodriguez, Henry; Gao, Qiang; Gao, Daming; Zhou, Hu; Fan, Jia.

Cancer Cell ; 40(1): 70-87.e15, 2022 01 10.

Article En | MEDLINE | ID: mdl-34971568

We performed proteogenomic characterization of intrahepatic cholangiocarcinoma (iCCA) using paired tumor and adjacent liver tissues from 262 patients. Integrated proteogenomic analyses prioritized genetic aberrations and revealed hallmarks of iCCA pathogenesis. Aflatoxin signature was associated with tumor initiation, proliferation, and immune suppression. Mutation-associated signaling profiles revealed that TP53 and KRAS co-mutations may contribute to iCCA metastasis via the integrin-FAK-SRC pathway. FGFR2 fusions activated the Rho GTPase pathway and could be a potential source of neoantigens. Proteomic profiling identified four patient subgroups (S1-S4) with subgroup-specific biomarkers. These proteomic subgroups had distinct features in prognosis, genetic alterations, microenvironment dysregulation, tumor microbiota composition, and potential therapeutics. SLC16A3 and HKDC1 were further identified as potential prognostic biomarkers associated with metabolic reprogramming of iCCA cells. This study provides a valuable resource for researchers and clinicians to further identify molecular pathogenesis and therapeutic opportunities in iCCA.

Bile Duct Neoplasms/pathology , Bile Ducts, Intrahepatic/pathology , Cholangiocarcinoma/pathology , Liver/pathology , Proteogenomics , Bile Duct Neoplasms/genetics , Cholangiocarcinoma/genetics , Humans , Mutation/genetics , Prognosis , Proteogenomics/methods , Proteomics , Tumor Microenvironment/immunology

20.

Proteogenomic discovery of neoantigens facilitates personalized multi-antigen targeted T cell immunotherapy for brain tumors.

Rivero-Hinojosa, Samuel; Grant, Melanie; Panigrahi, Aswini; Zhang, Huizhen; Caisova, Veronika; Bollard, Catherine M; Rood, Brian R.

Nat Commun ; 12(1): 6689, 2021 11 18.

Article En | MEDLINE | ID: mdl-34795224

Neoantigen discovery in pediatric brain tumors is hampered by their low mutational burden and scant tissue availability. Here we develop a proteogenomic approach combining tumor DNA/RNA sequencing and mass spectrometry proteomics to identify tumor-restricted (neoantigen) peptides arising from multiple genomic aberrations to generate a highly target-specific, autologous, personalized T cell immunotherapy. Our data indicate that aberrant splice junctions are the primary source of neoantigens in medulloblastoma, a common pediatric brain tumor. Proteogenomically identified tumor-specific peptides are immunogenic and generate MHC II-based T cell responses. Moreover, polyclonal and polyfunctional T cells specific for tumor-specific peptides effectively eliminate tumor cells in vitro. Targeting tumor-specific antigens obviates the issue of central immune tolerance while potentially providing a safety margin favoring combination with other immune-activating therapies. These findings demonstrate the proteogenomic discovery of immunogenic tumor-specific peptides and lay the groundwork for personalized targeted T cell therapies for children with brain tumors.

Antigens, Neoplasm/immunology , Brain Neoplasms/therapy , Immunotherapy/methods , Precision Medicine/methods , Proteogenomics/methods , T-Lymphocytes/immunology , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Cell Line, Tumor , Cells, Cultured , Cerebellar Neoplasms/genetics , Cerebellar Neoplasms/metabolism , Cerebellar Neoplasms/therapy , Child , Chromatography, Liquid/methods , Computational Biology/methods , Humans , Mass Spectrometry/methods , Medulloblastoma/genetics , Medulloblastoma/metabolism , Medulloblastoma/therapy , Mutation , Peptides/analysis , Peptides/immunology , RNA-Seq/methods