ABSTRACT
Cancer cells enter a reversible drug-tolerant persister (DTP) state to evade death from chemotherapy and targeted agents. It is increasingly appreciated that DTPs are important drivers of therapy failure and tumor relapse. We combined cellular barcoding and mathematical modeling in patient-derived colorectal cancer models to identify and characterize DTPs in response to chemotherapy. Barcode analysis revealed no loss of clonal complexity of tumors that entered the DTP state and recurred following treatment cessation. Our data fit a mathematical model where all cancer cells, and not a small subpopulation, possess an equipotent capacity to become DTPs. Mechanistically, we determined that DTPs display remarkable transcriptional and functional similarities to diapause, a reversible state of suspended embryonic development triggered by unfavorable environmental conditions. Our study provides insight into how cancer cells use a developmentally conserved mechanism to drive the DTP state, pointing to novel therapeutic opportunities to target DTPs.
Subject(s)
Antineoplastic Agents/therapeutic use , Colorectal Neoplasms/drug therapy , Diapause , Drug Resistance, Neoplasm , Animals , Antineoplastic Agents/pharmacology , Autophagy/drug effects , Autophagy/genetics , Cell Line, Tumor , Clone Cells , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Drug Resistance, Neoplasm/drug effects , Embryo, Mammalian/drug effects , Embryo, Mammalian/metabolism , Gene Expression Profiling , Gene Expression Regulation, Neoplastic/drug effects , Genetic Heterogeneity/drug effects , Humans , Irinotecan/pharmacology , Irinotecan/therapeutic use , Mice, Inbred NOD , Mice, SCID , Models, Biological , Signal Transduction/drug effects , Up-Regulation/drug effects , Up-Regulation/genetics , Xenograft Model Antitumor Assays
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) is a highly aggressive cancer with poor patient survival. Toward understanding the underlying molecular alterations that drive PDAC oncogenesis, we conducted comprehensive proteogenomic analysis of 140 pancreatic cancers, 67 normal adjacent tissues, and 9 normal pancreatic ductal tissues. Proteomic, phosphoproteomic, and glycoproteomic analyses were used to characterize proteins and their modifications. In addition, whole-genome sequencing, whole-exome sequencing, methylation, RNA sequencing (RNA-seq), and microRNA sequencing (miRNA-seq) were performed on the same tissues to facilitate an integrated proteogenomic analysis and determine the impact of genomic alterations on protein expression, signaling pathways, and post-translational modifications. To ensure robust downstream analyses, tumor neoplastic cellularity was assessed via multiple orthogonal strategies using molecular features and verified via pathological estimation of tumor cellularity based on histological review. This integrated proteogenomic characterization of PDAC will serve as a valuable resource for the community, paving the way for early detection and identification of novel therapeutic targets.
Subject(s)
Adenocarcinoma/genetics , Carcinoma, Pancreatic Ductal/genetics , Pancreatic Neoplasms/genetics , Proteogenomics , Adenocarcinoma/diagnosis , Adult , Aged , Aged, 80 and over , Algorithms , Carcinoma, Pancreatic Ductal/diagnosis , Cohort Studies , Endothelial Cells/metabolism , Epigenesis, Genetic , Female , Gene Dosage , Genome, Human , Glycolysis , Glycoproteins/biosynthesis , Humans , Male , Middle Aged , Molecular Targeted Therapy , Pancreatic Neoplasms/diagnosis , Phenotype , Phosphoproteins/metabolism , Phosphorylation , Prognosis , Protein Kinases/metabolism , Proteome/metabolism , Substrate Specificity , Transcriptome/genetics
ABSTRACT
Solitary fibrous tumor (SFT) is a rare mesenchymal neoplasm that can arise at any anatomic site and is characterized by recurrent NAB2::STAT6 fusions and metastatic progression in 10-30% of cases. The cell of origin has not been identified. Despite some progress in understanding the contribution of heterogeneous fusion types and secondary mutations to SFT biology, epigenetic alterations in extrameningeal SFT remain largely unexplored, and most sarcoma research to date has focused on the use of methylation profiling for tumor classification. We interrogated genome-wide DNA methylation in 79 SFTs to identify informative epigenetic changes. RNA-seq data from targeted panels and data from the Cancer Genome Atlas (TCGA) were used for orthogonal validation of selected findings. In unsupervised clustering analysis, the top 500 most variable CpGs segregated SFTs by primary anatomic site. Differentially methylated genes (DMGs) associated with primary SFT site included EGFR, TBX15, multiple HOX genes and their cofactors EBF1, EBF3, and PBX1, as well as RUNX1 and MEIS1. Of the 20 DMGs that were interrogated on the RNA-seq panel, twelve were significantly differentially expressed according to site. However, with the exception of TBX15, most of these also showed differential expression according to NAB2::STAT6 fusion type, suggesting that the fusion oncogene contributes to transcriptional regulation of these genes. Transcriptomic data confirmed an inverse correlation between gene methylation and the expression of TBX15 in both SFT and TCGA sarcomas. TBX15 also showed differential mRNA expression and 5' UTR methylation between tumors located in different anatomic sites in TCGA data. In all analyses, TBX15 methylation and mRNA expression retained the strongest association with tissue of origin in SFT and other sarcomas, suggesting a possible marker to distinguish metastatic tumors from new primaries without genomic profiling. Epigenetic signatures may further help to identify SFT progenitor cells at different anatomic sites.
ABSTRACT
Triple-negative breast cancer (TNBC) is the most aggressive breast cancer subtype with the worst prognosis and few effective therapies. Here we identified MS023, an inhibitor of type I protein arginine methyltransferases (PRMTs), which has antitumor growth activity in TNBC. Pathway analysis of TNBC cell lines indicates that the activation of interferon responses before and after MS023 treatment is a functional biomarker and determinant of response, and these observations extend to a panel of human-derived organoids. Inhibition of type I PRMT triggers an interferon response through the antiviral defense pathway with the induction of double-stranded RNA, which is derived, at least in part, from inverted repeat Alu elements. Together, our results represent a shift in understanding the antitumor mechanism of type I PRMT inhibitors and provide a rationale and biomarker approach for the clinical development of type I PRMT inhibitors.
Subject(s)
Protein-Arginine N-Methyltransferases , Triple Negative Breast Neoplasms , Biomarkers , Cell Line, Tumor , Humans , Interferons/therapeutic use , Protein-Arginine N-Methyltransferases/antagonists & inhibitors , Protein-Arginine N-Methyltransferases/metabolism , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/metabolism
ABSTRACT
Cancer pharmacogenomics studies provide valuable insights into disease progression and associations between genomic features and drug response. PharmacoDB integrates multiple cancer pharmacogenomics datasets profiling approved and investigational drugs across cell lines from diverse tissue types. The web application enables users to efficiently navigate across datasets and to view and compare dose-response data for a specific drug-cell line pair. In the new version of PharmacoDB (version 2.0, https://pharmacodb.ca/), we present (i) new datasets such as NCI-60, the Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) dataset, as well as updated data from the Genomics of Drug Sensitivity in Cancer (GDSC) and the Genentech Cell Line Screening Initiative (gCSI); (ii) implementation of FAIR data pipelines using ORCESTRA and PharmacoDI; (iii) enhancements to drug-response analysis such as tissue distribution of dose-response metrics and biomarker analysis; and (iv) improved connectivity to drug and cell line databases in the community. The web interface has been rewritten using a modern technology stack to ensure scalability and standardization to accommodate growing pharmacogenomics datasets. PharmacoDB 2.0 is a valuable tool for mining pharmacogenomics datasets and for comparing and assessing drug-response phenotypes of cancer models.
Subject(s)
Databases, Genetic , Pharmacogenetics/standards , Pharmacogenomic Testing/methods , Software , Genomics/methods , Humans
ABSTRACT
The goal of precision oncology is to tailor treatment for patients individually using the genomic profile of their tumors. Pharmacogenomics datasets such as cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task of precision oncology. Machine learning methods have been employed to predict drug sensitivity based on the multiple omics data available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines for different aspects of training gene expression-based predictors using cell line datasets. These guidelines provide extensive analysis of the generalization of drug sensitivity predictors and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity. The application of these guidelines in future studies will enable the development of more robust preclinical biomarkers.
Subject(s)
Drug Resistance, Neoplasm , Machine Learning , Pharmacogenetics , Algorithms , Cell Line, Tumor , Datasets as Topic , Humans
ABSTRACT
BACKGROUND: Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data has proven to be a particularly challenging field for identifying associations to inform patient treatment. RESULTS: To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index and the kernelized concordance index (rCI, kCI), which incorporate measurements about the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and introduce efficient implementations to compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare with Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. CONCLUSIONS: We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics.
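The abstract above does not define the rCI or describe its adaptive permutation scheme in detail, so the following is only a minimal sketch under stated assumptions. It assumes a "robust" concordance index that skips any pair whose difference in either variable falls within a noise threshold, and it uses a plain fixed-count permutation test rather than the adaptive one the authors describe. The names `concordance_index`, `permutation_p_value`, `delta_x`, and `delta_y` are hypothetical and are not taken from the authors' software.

```python
import random
from itertools import combinations


def concordance_index(x, y, delta_x=0.0, delta_y=0.0):
    """Fraction of usable pairs that are ordered the same way in x and y.

    Assumption: a pair is skipped when its difference in x is within
    delta_x or its difference in y is within delta_y (this also skips ties).
    With both thresholds at 0 this is an ordinary tie-excluding concordance index.
    """
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if abs(dx) <= delta_x or abs(dy) <= delta_y:
            continue  # difference is indistinguishable from noise
        if dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return concordant / total if total else float("nan")


def permutation_p_value(x, y, n_perm=1000, seed=0, **thresholds):
    """Two-sided p-value from a fixed number of random permutations of y.

    This is NOT the adaptive scheme mentioned in the abstract; it is a
    simple fixed-count permutation test used here for illustration.
    """
    rng = random.Random(seed)
    observed = concordance_index(x, y, **thresholds)
    y_perm = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        stat = concordance_index(x, y_perm, **thresholds)
        # 0.5 is the expected value of the index under no association
        if abs(stat - 0.5) >= abs(observed - 0.5):
            extreme += 1
    # add-one correction keeps the estimate strictly positive
    return (extreme + 1) / (n_perm + 1)
```

Setting both thresholds to zero recovers the ordinary index, so the effect of the noise threshold can be inspected by computing the statistic with and without it on the same data.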
Subject(s)
Models, Statistical , Computer Simulation , Drug Evaluation, Preclinical , Humans
ABSTRACT
Cellular identity relies on cell-type-specific gene expression controlled at the transcriptional level by cis-regulatory elements (CREs). CREs are unevenly distributed across the genome, giving rise to individual CREs and clusters of CREs (COREs). Technical and biological features hinder CORE identification. We addressed these issues by developing an unsupervised machine learning approach termed clustering of genomic regions analysis method (CREAM). CREAM automates CORE detection from chromatin accessibility profiles that are enriched in CREs strongly bound by master transcription regulators, proximal to highly expressed and essential genes, and discriminating cell identity. Although COREs share similarities with super-enhancers, we highlight differences in terms of the genomic distribution and structure of these cis-regulatory units. We further show the enhanced value of COREs over super-enhancers to identify master transcription regulators, highly expressed and essential genes defining cell identity. COREs enrich at topologically associated domain (TAD) boundaries. They are also preferentially bound by the chromatin looping factors CTCF and cohesin, in contrast to super-enhancers, forming clusters of CTCF and cohesin binding regions and defining homotypic clusters of transcription regulator binding regions (HCTs). Finally, we show the clinical utility of CREAM to identify COREs across chromatin accessibility profiles to stratify more than 400 tumor samples according to their cancer type and to delineate cancer type-specific active biological pathways. Collectively, our results support the utility of CREAM to delineate COREs underlying, with greater accuracy than individual CREs or super-enhancers, the cell-type-specific biological underpinning across a wide range of normal and cancer cell types.
Subject(s)
CCCTC-Binding Factor/genetics , Enhancer Elements, Genetic , Neoplasms/genetics , Regulatory Elements, Transcriptional/genetics , Cell Cycle Proteins/genetics , Cell Line, Tumor , Cell Lineage/genetics , Chromatin/genetics , Chromosomal Proteins, Non-Histone/genetics , Gene Expression Regulation/genetics , Genomics , Humans , Cohesins
ABSTRACT
Drug-combination data portals have recently been introduced to mine huge amounts of pharmacological data with the aim of improving current chemotherapy strategies. However, these portals have only been investigated for isolated datasets, and molecular profiles of cancer cell lines are lacking. Here we developed a cloud-based pharmacogenomics portal called SYNERGxDB (http://SYNERGxDB.ca/) that integrates multiple high-throughput drug-combination studies with molecular and pharmacological profiles of a large panel of cancer cell lines. This portal enables the identification of synergistic drug combinations through harmonization and unified computational analysis. We integrated nine of the largest drug combination datasets from both academic groups and pharmaceutical companies, resulting in 22 507 unique drug combinations (1977 unique compounds) screened against 151 cancer cell lines. This data compendium includes metabolomics, gene expression, copy number and mutation profiles of the cancer cell lines. In addition, SYNERGxDB provides analytical tools to discover effective therapeutic combinations and predictive biomarkers across cancer, including specific types. Combining molecular and pharmacological profiles, we systematically explored the large space of univariate predictors of drug synergism. SYNERGxDB constitutes a comprehensive resource that opens new avenues of research for exploring the mechanism of action for drug synergy with the potential of identifying new treatment strategies for cancer patients.
Subject(s)
Antineoplastic Combined Chemotherapy Protocols/pharmacology , Pharmacogenomic Testing , Software , Cell Line, Tumor , Drug Synergism , Gene Dosage , Genetic Variation , Humans , Metabolomics
ABSTRACT
In the past few decades, major initiatives have been launched around the world to address chemical safety testing. These efforts aim to innovate and improve the efficacy of existing methods with the long-term goal of developing new risk assessment paradigms. The transcriptomic and toxicological profiling of mammalian cells has resulted in the creation of multiple toxicogenomic datasets and corresponding tools for analysis. To enable easy access and analysis of these valuable toxicogenomic data, we have developed ToxicoDB (toxicodb.ca), a free and open cloud-based platform integrating data from large in vitro toxicogenomic studies, including gene expression profiles of primary human and rat hepatocytes treated with 231 potential toxicants. To efficiently mine these complex toxicogenomic data, ToxicoDB provides users with harmonized chemical annotations, time- and dose-dependent plots of compounds across datasets, as well as the toxicity-related pathway analysis. The data in ToxicoDB have been generated using our open-source R package, ToxicoGx (github.com/bhklab/ToxicoGx). Altogether, ToxicoDB provides a streamlined process for mining highly organized, curated, and accessible toxicogenomic data that can be ultimately applied to preclinical toxicity studies and further our understanding of adverse outcomes.
Subject(s)
Databases, Genetic , Software , Toxicogenetics/methods , Acetaminophen/toxicity , Animals , Computer Graphics , DNA/biosynthesis , Data Mining , Gene Expression/drug effects , Hepatocytes/drug effects , Hepatocytes/metabolism , Humans , Nucleic Acid Synthesis Inhibitors/toxicity , Rats
ABSTRACT
Drug combinations have been proposed as a promising therapeutic strategy to overcome drug resistance and improve efficacy of monotherapy regimens in cancer. This strategy aims at targeting multiple components of this complex disease. Despite the increasing number of drug combinations in use, many of them were empirically found in the clinic, and the molecular mechanisms underlying these drug combinations are often unclear. These challenges call for rational, systematic approaches for drug combination discovery. Although high-throughput screening of single-agent therapeutics has been successfully implemented, it is not feasible to test all possible drug combinations, even for a reduced subset of anticancer drugs. Hence, in vitro and in vivo screening of a large number of drug combinations is not practical. Therefore, devising computational methods to efficiently explore the space of drug combinations and to discover efficacious combinations has attracted considerable attention from the scientific community in the past few years. Nevertheless, in the absence of consensus regarding the computational approaches used to predict efficacious drug combinations, a plethora of methods, techniques and hypotheses have been developed to date, while the research field lacks an elaborate categorization of the existing computational methods and the available data sources. In this manuscript, we review and categorize the state-of-the-art computational approaches for drug combination prediction, and elaborate on the limitations of these methods and the existing challenges. We also discuss the recent pan-cancer drug combination data sets and their importance in revising the available methods or developing more performant approaches.
Subject(s)
Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Computational Biology/methods , Drug Discovery , Neoplasms/drug therapy , Animals , Humans
ABSTRACT
Large-scale perturbation databases, such as Connectivity Map (CMap) or Library of Integrated Network-based Cellular Signatures (LINCS), provide enormous opportunities for computational pharmacogenomics and drug design. One reason is that, in contrast to classical pharmacology, which focuses on one target at a time, the transcriptomic profiles provided by CMap and LINCS open the door to systems biology approaches at the pathway and network level. In this article, we review recent developments in computational pharmacogenomics with respect to CMap and LINCS and related applications.
Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Pharmacogenetics , Small Molecule Libraries/pharmacology , Transcriptome , Databases, Factual , Gene Regulatory Networks , Humans
ABSTRACT
MOTIVATION: Individualized drug response prediction is a fundamental part of personalized medicine for cancer. Great effort has been made to discover biomarkers or to develop machine learning methods for accurate drug response prediction in cancers. Incorporating prior knowledge of biological systems into these methods is a promising avenue to improve prediction performance. High-throughput cell line assays of drug-induced transcriptomic perturbation effects are a source of prior knowledge that has not yet been fully incorporated into drug response prediction models. RESULTS: We introduce a unified probabilistic approach, Drug Response Variational Autoencoder (Dr.VAE), that simultaneously models both drug response in terms of viability and transcriptomic perturbations. Dr.VAE is a deep generative model based on variational autoencoders. Our experimental results showed that Dr.VAE performs as well as or better than standard classification methods for 23 out of 26 tested Food and Drug Administration-approved drugs. In a series of ablation experiments we showed that the observed improvement of Dr.VAE can be credited to the incorporation of drug-induced perturbation effects with joint modeling of treatment sensitivity. AVAILABILITY AND IMPLEMENTATION: Processed data and software implementation using PyTorch (Paszke et al., 2017) are available at: https://github.com/rampasek/DrVAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Software , Humans , Machine Learning , Neoplasms , Precision Medicine
ABSTRACT
MOTIVATION: High-throughput molecular profiles of human cells have been used in predictive computational approaches for stratification of healthy and malignant phenotypes and identification of their biological states. In this regard, pathway activities have been used as biological features in unsupervised and supervised learning schemes. RESULTS: We developed SIGN (Similarity Identification in Gene expressioN), a flexible open-source R package facilitating the use of pathway activities and their expression patterns to identify similarities between biological samples. We defined a new measure, the transcriptional similarity coefficient, which captures similarity of gene expression patterns, instead of quantifying overall activity, in biological pathways between the samples. To demonstrate the utility of SIGN in biomedical research, we establish that SIGN discriminates subtypes of breast tumors and patients with good or poor overall survival. SIGN outperforms the best models in DREAM challenge in predicting survival of breast cancer patients using the data from the Molecular Taxonomy of Breast Cancer International Consortium. In summary, SIGN can be used as a new tool for interrogating pathway activity and gene expression patterns in unsupervised and supervised learning schemes to improve prognostic risk estimation for cancer patients by the biomedical research community. AVAILABILITY AND IMPLEMENTATION: An open-source R package is available (https://cran.r-project.org/web/packages/SIGN/).
Subject(s)
Gene Expression , Software , Breast Neoplasms , Humans
ABSTRACT
Pancreatic ductal adenocarcinoma (PDAC) has the worst prognosis among solid malignancies and improved therapeutic strategies are needed to improve outcomes. Patient-derived xenografts (PDX) and patient-derived organoids (PDO) serve as promising tools to identify new drugs with therapeutic potential in PDAC. For these preclinical disease models to be effective, they should both recapitulate the molecular heterogeneity of PDAC and validate patient-specific therapeutic sensitivities. To date however, deep characterization of the molecular heterogeneity of PDAC PDX and PDO models and comparison with matched human tumour remains largely unaddressed at the whole genome level. We conducted a comprehensive assessment of the genetic landscape of 16 whole-genome pairs of tumours and matched PDX, from primary PDAC and liver metastasis, including a unique cohort of 5 'trios' of matched primary tumour, PDX, and PDO. We developed a pipeline to score concordance between PDAC models and their paired human tumours for genomic events, including mutations, structural variations, and copy number variations. Tumour-model comparisons of mutations displayed single-gene concordance across major PDAC driver genes, but relatively poor agreement across the greater mutational load. Genome-wide and chromosome-centric analysis of structural variation (SV) events highlights previously unrecognized concordance across chromosomes that demonstrate clustered SV events. We found that polyploidy presented a major challenge when assessing copy number changes; however, ploidy-corrected copy number states suggest good agreement between donor-model pairs. Collectively, our investigations highlight that while PDXs and PDOs may serve as tractable and transplantable systems for probing the molecular properties of PDAC, these models may best serve selective analyses across different levels of genomic complexity.
Subject(s)
Carcinoma, Pancreatic Ductal/genetics , Genome/genetics , Models, Biological , Neoplasms, Experimental/genetics , Pancreatic Neoplasms/genetics , Animals , Biomedical Research/standards , Humans , Pancreas/pathology
ABSTRACT
Recent cancer pharmacogenomic studies profiled large panels of cell lines against hundreds of approved drugs and experimental chemical compounds. The overarching goal of these screens is to measure sensitivity of cell lines to chemical perturbations, correlate these measures to genomic features, and thereby develop novel predictors of drug response. However, leveraging these valuable data is challenging due to the lack of standards for annotating cell lines and chemical compounds, and quantifying drug response. Moreover, it has been recently shown that the complexity and complementarity of the experimental protocols used in the field result in high levels of technical and biological variation in the in vitro pharmacological profiles. There is therefore a need for new tools to facilitate rigorous comparison and integrative analysis of large-scale drug screening datasets. To address this issue, we have developed PharmacoDB (pharmacodb.pmgenomics.ca), a database integrating the largest cancer pharmacogenomic studies published to date. Here, we describe how the curation of cell line and chemical compound identifiers maximizes the overlap between datasets and how users can leverage such data to compare and extract robust drug phenotypes. PharmacoDB provides a unique resource to mine a compendium of curated cancer pharmacogenomic datasets that are otherwise disparate and difficult to integrate.
Subject(s)
Databases, Pharmaceutical , Drug Screening Assays, Antitumor , Pharmacogenomic Testing , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Data Mining , Dose-Response Relationship, Drug , Humans , User-Computer Interface
ABSTRACT
There has been a paradigm shift in translational oncology with the advent of novel molecular diagnostic tools in the clinic. However, several challenges are associated with the integration of these sophisticated tools into clinical oncology and daily practice. High-throughput profiling at the DNA, RNA and protein levels (omics) generates a massive amount of data. The analysis and interpretation of these data is non-trivial but will allow a more thorough understanding of cancer. Linear modelling of the data, as it is often used today, is likely to limit our understanding of cancer as a complex disease, and at times fails to capture a phenotype of interest. Network science and systems biology-based approaches that use machine learning and network science principles to integrate multiple data sources can uncover complex changes in a biological system. This approach will integrate a large number of potential biomarkers in preclinical studies to better inform therapeutic decisions and ultimately make substantial progress towards precision medicine. It will, however, require development of a new generation of clinical trials. Beyond discussing the challenges of high-throughput technologies, this review will develop a framework on how to implement a network science approach in new clinical trial designs in order to advance cancer care.
Subject(s)
Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Precision Medicine/methods , Animals , Clinical Trials as Topic , Humans , Medical Oncology/methods , Neoplasms/therapy
ABSTRACT
Artificial intelligence (AI) is currently regaining enormous interest due to the success of machine learning (ML), and in particular deep learning (DL). Image analysis, and thus radiomics, strongly benefits from this research. However, effectively and efficiently integrating diverse clinical, imaging, and molecular profile data is necessary to understand complex diseases, and to achieve accurate diagnosis in order to provide the best possible treatment. In addition to the need for sufficient computing resources, suitable algorithms, models, and data infrastructure, three important aspects are often neglected: (1) the need for multiple independent, sufficiently large and, above all, high-quality data sets; (2) the need for domain knowledge and ontologies; and (3) the requirement for multiple networks that provide relevant relationships among biological entities. While one will always get results out of high-dimensional data, all three aspects are essential to provide robust training and validation of ML models, to provide explainable hypotheses and results, and to achieve the necessary trust in AI and confidence for clinical applications.