ABSTRACT
Fewer than 200 proteins are targeted by cancer drugs approved by the Food and Drug Administration (FDA). We integrate Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteogenomics data from 1,043 patients across 10 cancer types with additional public datasets to identify potential therapeutic targets. Pan-cancer analysis of 2,863 druggable proteins reveals a wide abundance range and identifies biological factors that affect mRNA-protein correlation. Integration of proteomic data from tumors and genetic screen data from cell lines identifies protein overexpression- or hyperactivation-driven druggable dependencies, enabling accurate predictions of effective drug targets. Proteogenomic identification of synthetic lethality provides a strategy to target tumor suppressor gene loss. Combining proteogenomic analysis and MHC binding prediction prioritizes mutant KRAS peptides as promising public neoantigens. Computational identification of shared tumor-associated antigens followed by experimental confirmation nominates peptides as immunotherapy targets. These analyses, summarized at https://targets.linkedomics.org, form a comprehensive landscape of protein and peptide targets for companion diagnostics, drug repurposing, and therapy development.
Subjects
Neoplasms, Proteogenomics, Humans, Proteogenomics/methods, Neoplasms/genetics, Neoplasms/drug therapy, Neoplasms/therapy, Neoplasms/metabolism, Molecular Targeted Therapy, Immunotherapy/methods, Neoplasm Antigens/metabolism, Neoplasm Antigens/genetics, Tumor Cell Line, Antineoplastic Agents/therapeutic use, Antineoplastic Agents/pharmacology, Peptides/metabolism, Proteomics, Proto-Oncogene Proteins p21(ras)/genetics, Proto-Oncogene Proteins p21(ras)/metabolism
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.
Subjects
Adenocarcinoma/genetics, Pancreatic Ductal Carcinoma/genetics, Pancreatic Neoplasms/genetics, Proteogenomics, Adenocarcinoma/diagnosis, Adult, Aged, Aged 80 and over, Algorithms, Pancreatic Ductal Carcinoma/diagnosis, Cohort Studies, Endothelial Cells/metabolism, Genetic Epigenesis, Female, Gene Dosage, Human Genome, Glycolysis, Glycoproteins/biosynthesis, Humans, Male, Middle Aged, Molecular Targeted Therapy, Pancreatic Neoplasms/diagnosis, Phenotype, Phosphoproteins/metabolism, Phosphorylation, Prognosis, Protein Kinases/metabolism, Proteome/metabolism, Substrate Specificity, Transcriptome/genetics
ABSTRACT
The integration of mass spectrometry-based proteomics with next-generation DNA and RNA sequencing profiles tumors more comprehensively. Here this "proteogenomics" approach was applied to 122 treatment-naive primary breast cancers accrued to preserve post-translational modifications, including protein phosphorylation and acetylation. Proteogenomics challenged standard breast cancer diagnoses, provided detailed analysis of the ERBB2 amplicon, defined tumor subsets that could benefit from immune checkpoint therapy, and allowed more accurate assessment of Rb status for prediction of CDK4/6 inhibitor responsiveness. Phosphoproteomics profiles uncovered novel associations between tumor suppressor loss and targetable kinases. Acetylproteome analysis highlighted acetylation on key nuclear proteins involved in the DNA damage response and revealed cross-talk between cytoplasmic and mitochondrial acetylation and metabolism. Our results underscore the potential of proteogenomics for clinical investigation of breast cancer through more accurate annotation of targetable pathways and biological features of this remarkably heterogeneous malignancy.
Subjects
Breast Neoplasms/genetics, Breast Neoplasms/pathology, Carcinogenesis/genetics, Carcinogenesis/pathology, Molecular Targeted Therapy, Proteogenomics, APOBEC Deaminases/metabolism, Adult, Aged, Aged 80 and over, Breast Neoplasms/immunology, Breast Neoplasms/therapy, Cohort Studies, DNA Damage, DNA Repair, Female, Humans, Immunotherapy, Metabolomics, Middle Aged, Mutagenesis/genetics, Phosphorylation, Protein Kinase Inhibitors/pharmacology, Protein Kinases/metabolism, ErbB-2 Receptor/metabolism, Retinoblastoma Protein/metabolism, Tumor Microenvironment/immunology
ABSTRACT
We undertook a comprehensive proteogenomic characterization of 95 prospectively collected endometrial carcinomas, comprising 83 endometrioid and 12 serous tumors. This analysis revealed possible new consequences of perturbations to the p53 and Wnt/β-catenin pathways, identified a potential role for circRNAs in the epithelial-mesenchymal transition, and provided new information about proteomic markers of clinical and genomic tumor subgroups, including relationships to known druggable pathways. An extensive genome-wide acetylation survey yielded insights into regulatory mechanisms linking Wnt signaling and histone acetylation. We also characterized aspects of the tumor immune landscape, including immunogenic alterations, neoantigens, common cancer/testis antigens, and the immune microenvironment, all of which can inform immunotherapy decisions. Collectively, our multi-omic analyses provide a valuable resource for researchers and clinicians, identify new molecular associations of potential mechanistic significance in the development of endometrial cancers, and suggest novel approaches for identifying potential therapeutic targets.
Subjects
Carcinoma/genetics, Endometrial Neoplasms/genetics, Neoplastic Gene Expression Regulation, Proteome/genetics, Transcriptome, Acetylation, Animals, Neoplasm Antigens/genetics, Carcinoma/immunology, Carcinoma/pathology, Endometrial Neoplasms/immunology, Endometrial Neoplasms/pathology, Epithelial-Mesenchymal Transition/genetics, Physiological Feedback, Female, Genomic Instability, Humans, Mice, MicroRNAs/genetics, MicroRNAs/metabolism, Microsatellite Repeats, Phosphorylation, Post-Translational Protein Processing, Proteome/metabolism, Signal Transduction
ABSTRACT
We performed the first proteogenomic study on a prospectively collected colon cancer cohort. Comparative proteomic and phosphoproteomic analysis of paired tumor and normal adjacent tissues produced a catalog of colon cancer-associated proteins and phosphosites, including known and putative new biomarkers, drug targets, and cancer/testis antigens. Proteogenomic integration not only prioritized genomically inferred targets, such as copy-number drivers and mutation-derived neoantigens, but also yielded novel findings. Phosphoproteomics data associated Rb phosphorylation with increased proliferation and decreased apoptosis in colon cancer, which explains why this classical tumor suppressor is amplified in colon tumors and suggests a rationale for targeting Rb phosphorylation in colon cancer. Proteomics identified an association between decreased CD8 T cell infiltration and increased glycolysis in microsatellite instability-high (MSI-H) tumors, suggesting glycolysis as a potential target to overcome the resistance of MSI-H tumors to immune checkpoint blockade. Proteogenomics presents new avenues for biological discoveries and therapeutic development.
Subjects
Colonic Neoplasms/genetics, Colonic Neoplasms/therapy, Proteogenomics/methods, Apoptosis/genetics, Tumor Biomarkers/genetics, Tumor Biomarkers/metabolism, CD8-Positive T-Lymphocytes, Cell Proliferation/genetics, Colonic Neoplasms/metabolism, Genomics/methods, Glycolysis, Humans, Microsatellite Instability, Mutation, Phosphorylation, Prospective Studies, Proteomics/methods, Retinoblastoma Protein/genetics, Retinoblastoma Protein/metabolism
ABSTRACT
Global phosphoproteomics experiments quantify tens of thousands of phosphorylation sites. However, data interpretation is hampered by our limited knowledge on functions, biological contexts, or precipitating enzymes of the phosphosites. This study establishes a repository of phosphosites with associated evidence in biomedical abstracts, using deep learning-based natural language processing techniques. Our model for illuminating the dark phosphoproteome through PubMed mining (IDPpub) was generated by fine-tuning BioBERT, a deep learning tool for biomedical text mining. Trained using sentences containing protein substrates and phosphorylation site positions from 3000 abstracts, the IDPpub model was then used to extract phosphorylation sites from all MEDLINE abstracts. The extracted proteins were normalized to gene symbols using the National Center for Biotechnology Information gene query, and sites were mapped to human UniProt sequences using ProtMapper and mouse UniProt sequences by direct match. Precision and recall were calculated using 150 curated abstracts, and utility was assessed by analyzing the CPTAC (Clinical Proteomics Tumor Analysis Consortium) pan-cancer phosphoproteomics datasets and the PhosphoSitePlus database. Using 10-fold cross validation, pairs of correct substrates and phosphosite positions were extracted with an average precision of 0.93 and recall of 0.94. After entity normalization and site mapping to human reference sequences, an independent validation achieved a precision of 0.91 and recall of 0.77. The IDPpub repository contains 18,458 unique human phosphorylation sites with evidence sentences from 58,227 abstracts and 5918 mouse sites in 14,610 abstracts. This included evidence sentences for 1803 sites identified in CPTAC studies that are not covered by manually curated functional information in PhosphoSitePlus. Evaluation results demonstrate the potential of IDPpub as an effective biomedical text mining tool for collecting phosphosites. 
Moreover, the repository (http://idppub.ptmax.org), which can be automatically updated, can serve as a powerful complement to existing resources.
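The precision and recall figures above can be combined into a single F1 score (the harmonic mean of the two). The abstract does not report F1, so the values computed here are derived from the stated numbers rather than taken from the study; the function name is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract (F1 itself is NOT reported there).
cross_validation_f1 = f1_score(0.93, 0.94)  # 10-fold cross validation
independent_f1 = f1_score(0.91, 0.77)       # after normalization and site mapping

print(f"cross-validation F1: {cross_validation_f1:.3f}")
print(f"independent validation F1: {independent_f1:.3f}")
```

The gap between the two derived scores comes almost entirely from the lower recall (0.77) after entity normalization and site mapping.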
Subjects
Data Mining, Natural Language Processing, Humans, Data Mining/methods, Factual Databases, PubMed
ABSTRACT
Enrichment analysis, crucial for interpreting genomic, transcriptomic, and proteomic data, is expanding into metabolomics. Furthermore, there is a rising demand for integrated enrichment analysis that combines data from different studies and omics platforms, as seen in meta-analysis and multi-omics research. To address these growing needs, we have updated WebGestalt to include enrichment analysis capabilities for both metabolites and multiple input lists of analytes. We have also significantly increased analysis speed, revamped the user interface, and introduced new pathway visualizations to accommodate these updates. Notably, the adoption of a Rust backend reduced gene set enrichment analysis time by 95% from 270.64 to 12.41 s and network topology-based analysis by 89% from 159.59 to 17.31 s in our evaluation. This performance improvement is also accessible in both the R package and a newly introduced Python package. Additionally, we have updated the data in the WebGestalt database to reflect the current status of each source and have expanded our collection of pathways, networks, and gene signatures. The 2024 WebGestalt update represents a significant leap forward, offering new support for metabolomics, streamlined multi-omics analysis capabilities, and remarkable performance enhancements. Discover these updates and more at https://www.webgestalt.org.
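As a quick check of the timing figures quoted above, the percentage reductions can be recomputed from the reported run times. This is plain arithmetic on the numbers in the abstract; the speed-up factors are derived here and are not stated in the source.

```python
def percent_reduction(before_s: float, after_s: float) -> float:
    """Run-time reduction as a percentage of the original time."""
    return 100.0 * (before_s - after_s) / before_s


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_s / after_s


# Run times (seconds) quoted in the abstract.
timings = {
    "gene set enrichment analysis": (270.64, 12.41),
    "network topology-based analysis": (159.59, 17.31),
}

for method, (before_s, after_s) in timings.items():
    print(
        f"{method}: {percent_reduction(before_s, after_s):.1f}% reduction, "
        f"{speedup_factor(before_s, after_s):.1f}x faster"
    )
```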
Subjects
Metabolomics, Software, Metabolomics/methods, Genomics/methods, Humans, Proteomics/methods, User-Computer Interface, Internet, Gene Regulatory Networks, Gene Expression Profiling/methods, Multiomics
ABSTRACT
Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms. 
In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms.
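The idea of picking one representative protein per coexpressed cluster can be illustrated with a minimal weighted k-medoids sketch. This is not the ProMS implementation: the alternating assign/update scheme, the initialization from the highest-weight items, and the interpretation of weights as univariate informativeness scores are all assumptions made for illustration, as are the toy positions and weights.

```python
from typing import List, Sequence, Tuple


def weighted_k_medoids(
    dist: Sequence[Sequence[float]],
    weights: Sequence[float],
    k: int,
    max_iter: int = 100,
) -> Tuple[List[int], List[int]]:
    """Return (medoid indices, cluster label per item) for a distance matrix."""
    n = len(weights)
    # Deterministic start: the k highest-weight items (an assumption of this sketch).
    medoids = sorted(range(n), key=lambda i: -weights[i])[:k]

    def assign(current: List[int]) -> List[int]:
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                updated.append(medoids[c])
                continue
            # New medoid: the member minimizing the weighted distance to its cluster.
            updated.append(
                min(members, key=lambda m: sum(weights[i] * dist[i][m] for i in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(medoids)


# Toy data: six hypothetical features on a line, forming two obvious groups.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.3]
dist = [[abs(a - b) for b in positions] for a in positions]
weights = [1.0, 1.0, 5.0, 1.0, 4.0, 1.0]  # hypothetical informativeness scores

medoids, labels = weighted_k_medoids(dist, weights, k=2)
print(medoids, labels)
```

In this framing, the other members of each cluster are the natural candidates when a selected marker has to be replaced on a verification platform.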
Subjects
Algorithms, Tumor Biomarkers/metabolism, Hepatocellular Carcinoma/metabolism, Colorectal Neoplasms/metabolism, Liver Neoplasms/metabolism, Neoplasm Proteins/metabolism, Proteomics/methods, Humans, Mass Spectrometry, Prognosis, Software
ABSTRACT
Gene set analysis plays a critical role in the functional interpretation of omics data. Although this is typically done for one omics experiment at a time, there is an increasing need to combine gene set analysis results from multiple experiments performed on the same or different omics platforms, such as in multi-omics studies. Integrating results from multiple experiments is challenging, and annotation redundancy between gene sets further obscures clear conclusions. We propose to use a weighted set cover algorithm to reduce redundancy of gene sets identified in a single experiment. Next, we use affinity propagation to consolidate similar gene sets identified from multiple experiments into clusters and to automatically determine the most representative gene set for each cluster. Using three examples from over representation analysis and gene set enrichment analysis, we showed that weighted set cover outperformed a previously published set cover method and reduced the number of gene sets by 52-77%. Focusing on overlapping genes between the list of input genes and the enriched gene sets in over-representation analysis and leading-edge genes in gene set enrichment analysis further reduced the number of gene sets. A use case combining enrichment analysis results from RNA-Seq and proteomics data comparing basal and luminal A breast cancer samples highlighted the known difference in proliferation and DNA damage response. Finally, we used these algorithms for a pan-cancer survival analysis. Our analysis clearly revealed prognosis-related pathways common to multiple cancer types or specific to individual cancer types, as well as pathways associated with prognosis in different directions in different cancer types. We implemented these two algorithms in an R package, Sumer, which generates tables and static and interactive plots for exploration and publication. Sumer is publicly available at https://github.com/bzhanglab/sumer.
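The redundancy-reduction step can be pictured with the classic greedy heuristic for weighted set cover: repeatedly pick the gene set with the lowest cost per newly covered gene. This is a generic sketch, not Sumer's code; the gene sets and the cost values are hypothetical, and how Sumer derives its weights is not stated in the abstract.

```python
from typing import Dict, List, Set


def greedy_weighted_set_cover(
    gene_sets: Dict[str, Set[str]], costs: Dict[str, float]
) -> List[str]:
    """Greedily select gene sets until every gene is covered."""
    uncovered: Set[str] = set().union(*gene_sets.values())
    selected: List[str] = []
    while uncovered:
        # Candidates must cover at least one still-uncovered gene.
        candidates = [name for name in gene_sets if gene_sets[name] & uncovered]
        # Lowest cost per newly covered gene wins; name breaks ties deterministically.
        best = min(
            candidates,
            key=lambda name: (costs[name] / len(gene_sets[name] & uncovered), name),
        )
        selected.append(best)
        uncovered -= gene_sets[best]
    return selected


# Hypothetical enriched gene sets with overlapping membership.
gene_sets = {
    "cell cycle": {"CDK1", "CCNB1", "MKI67", "TOP2A"},
    "mitotic cell cycle": {"CDK1", "CCNB1"},
    "DNA repair": {"BRCA1", "RAD51"},
    "DNA damage response": {"BRCA1", "RAD51", "TOP2A"},
}
# Hypothetical costs; lower cost means the set is preferred.
costs = {
    "cell cycle": 1.0,
    "mitotic cell cycle": 1.0,
    "DNA repair": 1.0,
    "DNA damage response": 2.0,
}

print(greedy_weighted_set_cover(gene_sets, costs))
```

Here the two selected sets cover all six genes, and the two redundant sets are dropped.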
Subjects
Algorithms, Genomics/methods, Breast Neoplasms/genetics, Colorectal Neoplasms/genetics, Female, Neoplastic Gene Expression Regulation, Humans, Neoplasm Proteins/genetics, RNA-Seq
ABSTRACT
WebGestalt is a popular tool for the interpretation of gene lists derived from large scale -omics studies. In the 2019 update, WebGestalt supports 12 organisms, 342 gene identifiers and 155 175 functional categories, as well as user-uploaded functional databases. To address the growing and unique need for phosphoproteomics data interpretation, we have implemented phosphosite set analysis to identify important kinases from phosphoproteomics data. We have completely redesigned result visualizations and user interfaces to improve user-friendliness and to provide multiple types of interactive and publication-ready figures. To facilitate comprehension of the enrichment results, we have implemented two methods to reduce redundancy between enriched gene sets. We introduced a web API for other applications to get data programmatically from the WebGestalt server or pass data to WebGestalt for analysis. We also wrapped the core computation into an R package called WebGestaltR for users to perform analysis locally or in third party workflows. WebGestalt can be freely accessed at http://www.webgestalt.org.
Subjects
Genetic Databases, Software, Datasets as Topic, User-Computer Interface, Web Browser
ABSTRACT
Proteomics, the study of all the proteins in biological systems, is becoming a data-rich science. Protein sequences and structures are comprehensively catalogued in online databases. With recent advancements in tandem mass spectrometry (MS) technology, protein expression and post-translational modifications (PTMs) can be studied in a variety of biological systems at the global scale. Sophisticated computational algorithms are needed to translate the vast amount of data into novel biological insights. Deep learning automatically extracts data representations at high levels of abstraction from data, and it thrives in data-rich scientific research domains. Here, a comprehensive overview of deep learning applications in proteomics, including retention time prediction, MS/MS spectrum prediction, de novo peptide sequencing, PTM prediction, major histocompatibility complex-peptide binding prediction, and protein structure prediction, is provided. Limitations and the future directions of deep learning in proteomics are also discussed. This review will provide readers an overview of deep learning and how it can be used to analyze proteomics data.
Subjects
Deep Learning, Proteomics, Algorithms, Post-Translational Protein Processing, Tandem Mass Spectrometry
ABSTRACT
Extensive genomic characterization of human cancers presents the problem of inference from genomic abnormalities to cancer phenotypes. To address this problem, we analysed proteomes of colon and rectal tumours characterized previously by The Cancer Genome Atlas (TCGA) and performed integrated proteogenomic analyses. Somatic variants displayed reduced protein abundance compared to germline variants. Messenger RNA transcript abundance did not reliably predict protein abundance differences between tumours. Proteomics identified five proteomic subtypes in the TCGA cohort, two of which overlapped with the TCGA 'microsatellite instability/CpG island methylation phenotype' transcriptomic subtype, but had distinct mutation, methylation and protein expression patterns associated with different clinical outcomes. Although copy number alterations showed strong cis- and trans-effects on mRNA abundance, relatively few of these extended to the protein level. Thus, proteomics data enabled prioritization of candidate driver genes. The chromosome 20q amplicon was associated with the largest global changes at both mRNA and protein levels; proteomics data highlighted potential 20q candidates, including HNF4A (hepatocyte nuclear factor 4, alpha), TOMM34 (translocase of outer mitochondrial membrane 34) and SRC (SRC proto-oncogene, non-receptor tyrosine kinase). Integrated proteogenomic analysis provides functional context to interpret genomic abnormalities and affords a new paradigm for understanding cancer biology.
Subjects
Colonic Neoplasms/genetics, Colonic Neoplasms/metabolism, Genomics, Proteome/metabolism, Rectal Neoplasms/genetics, Rectal Neoplasms/metabolism, Transcriptome/genetics, Human Chromosomes Pair 20/genetics, CpG Islands/genetics, DNA Copy Number Variations/genetics, DNA Methylation, Hepatocyte Nuclear Factor 4/genetics, Humans, Microsatellite Repeats/genetics, Mitochondrial Membrane Transport Proteins/genetics, Mitochondrial Precursor Protein Import Complex Proteins, Missense Mutation/genetics, Neoplasm Proteins/analysis, Neoplasm Proteins/genetics, Neoplasm Proteins/metabolism, Point Mutation/genetics, Proteome/analysis, Proteome/genetics, Proteomics, Proto-Oncogene Mas, Proto-Oncogene Proteins pp60(c-src)/genetics, Messenger RNA/analysis, Messenger RNA/genetics, Messenger RNA/metabolism, Neoplasm RNA/analysis, Neoplasm RNA/genetics, Neoplasm RNA/metabolism
ABSTRACT
Functional enrichment analysis has played a key role in the biological interpretation of high-throughput omics data. As a long-standing and widely used web application for functional enrichment analysis, WebGestalt has been constantly updated to satisfy the needs of biologists from different research areas. WebGestalt 2017 supports 12 organisms, 324 gene identifiers from various databases and technology platforms, and 150 937 functional categories from public databases and computational analyses. Omics data with gene identifiers not supported by WebGestalt and functional categories not included in the WebGestalt database can also be uploaded for enrichment analysis. In addition to the Over-Representation Analysis in the previous versions, Gene Set Enrichment Analysis and Network Topology-based Analysis have been added to WebGestalt 2017, providing complementary approaches to the interpretation of high-throughput omics data. The new user-friendly output interface and the GOView tool allow interactive and efficient exploration and comparison of enrichment results. Thus, WebGestalt 2017 enables more comprehensive, powerful, flexible and interactive functional enrichment analysis. It is freely available at http://www.webgestalt.org.
Subjects
Genes, Software, Animals, Cattle, Humans, Internet, Mice, Neoplasms/genetics, Rats, User-Computer Interface
ABSTRACT
BACKGROUND AND AIMS: Proteomics holds promise for individualizing cancer treatment. We analyzed to what extent the proteomic landscape of human colorectal cancer (CRC) is maintained in established CRC cell lines and the utility of proteomics for predicting therapeutic responses. METHODS: Proteomic and transcriptomic analyses were performed on 44 CRC cell lines, compared against primary CRCs (n=95) and normal tissues (n=60), and integrated with genomic and drug sensitivity data. RESULTS: Cell lines mirrored the proteomic aberrations of primary tumors, in particular for intrinsic programs. Tumor relationships of protein expression with DNA copy number aberrations and signatures of post-transcriptional regulation were recapitulated in cell lines. The 5 proteomic subtypes previously identified in tumors were represented among cell lines. Nonetheless, systematic differences between cell line and tumor proteomes were apparent, attributable to stroma, extrinsic signaling, and growth conditions. Contribution of tumor stroma obscured signatures of DNA mismatch repair identified in cell lines with a hypermutation phenotype. Global proteomic data showed improved utility for predicting both known drug-target relationships and overall drug sensitivity as compared with genomic or transcriptomic measurements. Inhibition of targetable proteins associated with drug responses further identified corresponding synergistic or antagonistic drug combinations. Our data provide evidence for CRC proteomic subtype-specific drug responses. CONCLUSIONS: Proteomes of established CRC cell lines are representative of primary tumors. Proteomic data tend to exhibit improved prediction of drug sensitivity as compared with genomic and transcriptomic profiles. Our integrative proteogenomic analysis highlights the potential of proteome profiling to inform personalized cancer medicine.
Subjects
Antineoplastic Agents/pharmacology, Tumor Biomarkers/metabolism, Colorectal Neoplasms/drug therapy, Colorectal Neoplasms/metabolism, Neoplasm Proteins/metabolism, Precision Medicine, Proteome, Tumor Biomarkers/genetics, Tumor Cell Line, Liquid Chromatography, Colorectal Neoplasms/genetics, Colorectal Neoplasms/pathology, Protein Databases, Drug Dose-Response Relationship, Antitumor Drug Screening Assays, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Humans, Mutation, Neoplasm Proteins/genetics, Patient Selection, Single Nucleotide Polymorphism, Proteomics/methods, Signal Transduction, Stromal Cells/metabolism, Tandem Mass Spectrometry, Transcriptome, Tumor Microenvironment
ABSTRACT
MOTIVATION: Recent completion of the global proteomic characterization of The Cancer Genome Atlas (TCGA) colorectal cancer (CRC) cohort resulted in the first tumor dataset with complete molecular measurements at DNA, RNA and protein levels. Using CRC as a paradigm, we describe the application of the NetGestalt framework to provide easy access and interpretation of multi-omics data. RESULTS: The NetGestalt CRC portal includes genomic, epigenomic, transcriptomic, proteomic and clinical data for the TCGA CRC cohort, data from other CRC tumor cohorts and cell lines, and existing knowledge on pathways and networks, giving a total of more than 17 million data points. The portal provides features for data query, upload, visualization and integration. These features can be flexibly combined to serve various needs of the users, maximizing the synergy among omics data, human visualization and quantitative analysis. Using three case studies, we demonstrate that the portal not only provides user-friendly data query and visualization but also enables efficient data integration within a single omics data type, across multiple omics data types, and over biological networks. AVAILABILITY AND IMPLEMENTATION: The NetGestalt CRC portal can be freely accessed at http://www.netgestalt.org. CONTACT: bing.zhang@vanderbilt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Colorectal Neoplasms/genetics, Genomics, Proteomics, Software, Colorectal Neoplasms/metabolism, Genetic Epigenesis, Gene Regulatory Networks, Gene Silencing, Humans, Neoplasms, Transcriptome
ABSTRACT
Functional enrichment analysis is an essential task for the interpretation of gene lists derived from large-scale genetic, transcriptomic and proteomic studies. WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) has become one of the popular software tools in this field since its publication in 2005. For the last 7 years, WebGestalt data holdings have grown substantially to satisfy the requirements of users from different research areas. The current version of WebGestalt supports 8 organisms and 201 gene identifiers from various databases and different technology platforms, making it directly available to the fast growing omics community. Meanwhile, by integrating functional categories derived from centrally and publicly curated databases as well as computational analyses, WebGestalt has significantly increased the coverage of functional categories in various biological contexts including Gene Ontology, pathway, network module, gene-phenotype association, gene-disease association, gene-drug association and chromosomal location, leading to a total of 78 612 functional categories. Finally, new interactive features, such as pathway map, hierarchical network visualization and phenotype ontology visualization have been added to WebGestalt to help users better understand the enrichment results. WebGestalt can be freely accessed through http://www.webgestalt.org or http://bioinfo.vanderbilt.edu/webgestalt/.
Subjects
Genes, Software, Gene Expression Profiling, Genomics, Humans, Internet, Protein Interaction Mapping, Proteomics
ABSTRACT
By combining mass-spectrometry-based proteomics and phosphoproteomics with genomics, epigenomics, and transcriptomics, proteogenomics provides comprehensive molecular characterization of cancer. Using this approach, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) has characterized over 1,000 primary tumors spanning 10 cancer types, many with matched normal tissues. Here, we present LinkedOmicsKB, a proteogenomics data-driven knowledge base that makes consistently processed and systematically precomputed CPTAC pan-cancer proteogenomics data available to the public through ~40,000 gene-, protein-, mutation-, and phenotype-centric web pages. Visualization techniques facilitate efficient exploration and reasoning of complex, interconnected data. Using three case studies, we illustrate the practical utility of LinkedOmicsKB in providing new insights into genes, phosphorylation sites, somatic mutations, and cancer phenotypes. With precomputed results of 19,701 coding genes, 125,969 phosphosites, and 256 genotypes and phenotypes, LinkedOmicsKB provides a comprehensive resource to accelerate proteogenomics data-driven discoveries to improve our understanding and treatment of human cancer. A record of this paper's transparent peer review process is included in the supplemental information.
Subjects
Neoplasms, Proteogenomics, Humans, Proteomics, Proteogenomics/methods, Genomics, Neoplasms/genetics, Knowledge Bases
ABSTRACT
BACKGROUND: Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors. RESULTS: We report GLAD4U (Gene List Automatically Derived For You), a new, free web-based gene retrieval and prioritization tool. GLAD4U takes advantage of existing resources of the NCBI to ensure computational efficiency. The quality of gene lists created by GLAD4U for three Gene Ontology (GO) terms and three disease terms was assessed using corresponding "gold standard" lists curated in public databases. For all queries, GLAD4U gene lists showed very high recall but low precision, leading to low F-measure. As a comparison, EBIMed's recall was consistently lower than GLAD4U, but its precision was higher. To present the most relevant genes at the top of a list, we studied two prioritization methods based on publication count and the hypergeometric test, and compared the ranked lists and those generated by EBIMed to the gold standards. Both GLAD4U methods outperformed EBIMed for all queries based on a variety of quality metrics. Moreover, the hypergeometric method allowed for a better performance by thresholding genes with low scores. In addition, manual examination suggests that many false-positives could be explained by the incompleteness of the gold standards. The GLAD4U user interface accepts any valid queries for PubMed, and its output page displays the ranked gene list and information associated with each gene, chronologically-ordered supporting publications, along with a summary of the run and links for file export and functional enrichment and protein interaction network analysis. CONCLUSIONS: GLAD4U has a high overall recall. 
Although precision is generally low, the prioritization methods successfully rank truly relevant genes at the top of the lists to facilitate efficient browsing. GLAD4U is simple to use, and its interface can be found at: http://bioinfo.vanderbilt.edu/glad4u.
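The contrast between the two prioritization methods can be sketched as follows. The abstract names the hypergeometric test but does not give its exact setup, so the contingency framing below (all indexed publications as the population, query hits as successes, a gene's publications as draws) is an assumption, and the gene names and counts are made up for illustration.

```python
from math import comb


def hypergeom_sf(x: int, population: int, successes: int, draws: int) -> float:
    """P(X >= x) for a hypergeometric variable, computed exactly with integers."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # comb(n, k) returns 0 when k > n, so impossible terms vanish on their own.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(x, upper + 1)
    )
    return tail / total


# Hypothetical corpus: 1000 publications, 50 of them retrieved by the query.
TOTAL_PUBS, QUERY_HITS = 1000, 50

# gene -> (publications linked to the gene, of which retrieved by the query)
genes = {"GENE_A": (10, 8), "GENE_B": (200, 12)}

p_values = {
    gene: hypergeom_sf(hits, TOTAL_PUBS, QUERY_HITS, linked)
    for gene, (linked, hits) in genes.items()
}

ranked_by_count = sorted(genes, key=lambda g: -genes[g][1])
ranked_by_p = sorted(genes, key=lambda g: p_values[g])

print("by publication count:", ranked_by_count)
print("by hypergeometric p-value:", ranked_by_p)
```

A heavily published gene can top the count-based list simply because it appears everywhere, whereas the hypergeometric score favors genes whose publications are concentrated in the query results; thresholding on that score is one way to drop low-scoring genes.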
Subjects
Software, Factual Databases, Information Storage and Retrieval, Internet, PubMed, User-Computer Interface
ABSTRACT
Microscaled proteogenomics was deployed to probe the molecular basis for differential response to neoadjuvant carboplatin and docetaxel combination chemotherapy for triple-negative breast cancer (TNBC). Proteomic analyses of pretreatment patient biopsies uniquely revealed metabolic pathways, including oxidative phosphorylation, adipogenesis, and fatty acid metabolism, that were associated with resistance. Both proteomics and transcriptomics revealed that sensitivity was marked by elevation of DNA repair, E2F targets, G2-M checkpoint, interferon-gamma signaling, and immune-checkpoint components. Proteogenomic analyses of somatic copy-number aberrations identified a resistance-associated 19q13.31-33 deletion where LIG1, POLD1, and XRCC1 are located. In orthogonal datasets, LIG1 (DNA ligase I) gene deletion and/or low mRNA expression levels were associated with lack of pathologic complete response, higher chromosomal instability index (CIN), and poor prognosis in TNBC, as well as carboplatin-selective resistance in TNBC preclinical models. Hemizygous loss of LIG1 was also associated with higher CIN and poor prognosis in other cancer types, demonstrating broader clinical implications. SIGNIFICANCE: Proteogenomic analysis of triple-negative breast tumors revealed a complex landscape of chemotherapy response associations, including a 19q13.31-33 somatic deletion encoding genes serving lagging-strand DNA synthesis (LIG1, POLD1, and XRCC1), that correlate with lack of pathologic response, carboplatin-selective resistance, and, in pan-cancer studies, poor prognosis and CIN. This article is highlighted in the In This Issue feature, p. 2483.