ABSTRACT
Cancer pharmacogenomics studies provide valuable insights into disease progression and associations between genomic features and drug response. PharmacoDB integrates multiple cancer pharmacogenomics datasets profiling approved and investigational drugs across cell lines from diverse tissue types. The web-application enables users to efficiently navigate across datasets, view and compare drug dose-response data for a specific drug-cell line pair. In the new version of PharmacoDB (version 2.0, https://pharmacodb.ca/), we present (i) new datasets such as NCI-60, the Profiling Relative Inhibition Simultaneously in Mixtures (PRISM) dataset, as well as updated data from the Genomics of Drug Sensitivity in Cancer (GDSC) and the Genentech Cell Line Screening Initiative (gCSI); (ii) implementation of FAIR data pipelines using ORCESTRA and PharmacoDI; (iii) enhancements to drug-response analysis such as tissue distribution of dose-response metrics and biomarker analysis; and (iv) improved connectivity to drug and cell line databases in the community. The web interface has been rewritten using a modern technology stack to ensure scalability and standardization to accommodate growing pharmacogenomics datasets. PharmacoDB 2.0 is a valuable tool for mining pharmacogenomics datasets, comparing and assessing drug-response phenotypes of cancer models.
Subject(s)
Databases, Genetic , Pharmacogenetics/standards , Pharmacogenomic Testing/methods , Software , Genomics/methods , Humans
ABSTRACT
The goal of precision oncology is to tailor treatment for patients individually using the genomic profile of their tumors. Pharmacogenomics datasets such as cancer cell lines are among the most valuable resources for drug sensitivity prediction, a crucial task of precision oncology. Machine learning methods have been employed to predict drug sensitivity based on the multiple omics data available for large panels of cancer cell lines. However, there are no comprehensive guidelines on how to properly train and validate such machine learning models for drug sensitivity prediction. In this paper, we introduce a set of guidelines for different aspects of training gene expression-based predictors using cell line datasets. These guidelines provide extensive analysis of the generalization of drug sensitivity predictors and challenge many current practices in the community including the choice of training dataset and measure of drug sensitivity. The application of these guidelines in future studies will enable the development of more robust preclinical biomarkers.
Subject(s)
Drug Resistance, Neoplasm , Machine Learning , Pharmacogenetics , Algorithms , Cell Line, Tumor , Datasets as Topic , Humans
ABSTRACT
BACKGROUND: Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data has proven to be a particularly challenging field for identifying associations to inform patient treatment. RESULTS: To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index and the kernelized concordance index (rCI, kCI), which incorporate measurements about the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and introduce efficient implementations to compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare with Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. CONCLUSIONS: We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics.
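As a rough illustration of the quantities discussed above, the following sketch computes a plain pairwise concordance index and a permutation p-value for it. It is not the rCI or kCI, and it uses a fixed number of permutations rather than the adaptive scheme the abstract describes; the function names, the tie-handling rule (tied pairs are skipped), and the two-sided test around 0.5 are all assumptions made for this example.

```python
import random
from itertools import combinations


def concordance_index(x, y):
    """Fraction of comparable pairs that x and y order the same way.

    Assumption: pairs tied in either vector are skipped, which is one
    common convention but not the only one.
    """
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 or dy == 0:
            continue  # tied pair: not comparable under this convention
        comparable += 1
        if (dx > 0) == (dy > 0):
            concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the concordance index.

    Shuffles y repeatedly and counts how often |CI - 0.5| is at least as
    large as the observed value. Fixed n_perm (hypothetical default);
    an adaptive test would stop early once the outcome is clear.
    """
    rng = random.Random(seed)
    observed = abs(concordance_index(x, y) - 0.5)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(concordance_index(x, y_perm) - 0.5) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

Calling `concordance_index` on two identically ordered vectors returns 1.0 and on reversed vectors returns 0.0; `permutation_p_value` then estimates how surprising the observed value is under random pairing. The quadratic pair loop is kept for readability and would be slow for large panels.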
Subject(s)
Models, Statistical , Computer Simulation , Drug Evaluation, Preclinical , Humans
ABSTRACT
In the past few decades, major initiatives have been launched around the world to address chemical safety testing. These efforts aim to innovate and improve the efficacy of existing methods with the long-term goal of developing new risk assessment paradigms. The transcriptomic and toxicological profiling of mammalian cells has resulted in the creation of multiple toxicogenomic datasets and corresponding tools for analysis. To enable easy access and analysis of these valuable toxicogenomic data, we have developed ToxicoDB (toxicodb.ca), a free and open cloud-based platform integrating data from large in vitro toxicogenomic studies, including gene expression profiles of primary human and rat hepatocytes treated with 231 potential toxicants. To efficiently mine these complex toxicogenomic data, ToxicoDB provides users with harmonized chemical annotations, time- and dose-dependent plots of compounds across datasets, as well as the toxicity-related pathway analysis. The data in ToxicoDB have been generated using our open-source R package, ToxicoGx (github.com/bhklab/ToxicoGx). Altogether, ToxicoDB provides a streamlined process for mining highly organized, curated, and accessible toxicogenomic data that can be ultimately applied to preclinical toxicity studies and further our understanding of adverse outcomes.
Subject(s)
Databases, Genetic , Software , Toxicogenetics/methods , Acetaminophen/toxicity , Animals , Computer Graphics , DNA/biosynthesis , Data Mining , Gene Expression/drug effects , Hepatocytes/drug effects , Hepatocytes/metabolism , Humans , Nucleic Acid Synthesis Inhibitors/toxicity , Rats
ABSTRACT
MOTIVATION: Individualized drug response prediction is a fundamental part of personalized medicine for cancer. Great effort has been made to discover biomarkers or to develop machine learning methods for accurate drug response prediction in cancers. Incorporating prior knowledge of biological systems into these methods is a promising avenue to improve prediction performance. High-throughput cell line assays of drug-induced transcriptomic perturbation effects are a form of prior knowledge that has not yet been fully incorporated into drug response prediction models. RESULTS: We introduce a unified probabilistic approach, Drug Response Variational Autoencoder (Dr.VAE), that simultaneously models both drug response in terms of viability and transcriptomic perturbations. Dr.VAE is a deep generative model based on variational autoencoders. Our experimental results showed that Dr.VAE performed as well as or better than standard classification methods for 23 out of 26 tested Food and Drug Administration-approved drugs. In a series of ablation experiments we showed that the observed improvement of Dr.VAE can be credited to the incorporation of drug-induced perturbation effects with joint modeling of treatment sensitivity. AVAILABILITY AND IMPLEMENTATION: Processed data and software implementation using PyTorch (Paszke et al., 2017) are available at: https://github.com/rampasek/DrVAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Software , Humans , Machine Learning , Neoplasms , Precision Medicine
ABSTRACT
Recent cancer pharmacogenomic studies profiled large panels of cell lines against hundreds of approved drugs and experimental chemical compounds. The overarching goal of these screens is to measure sensitivity of cell lines to chemical perturbations, correlate these measures to genomic features, and thereby develop novel predictors of drug response. However, leveraging these valuable data is challenging due to the lack of standards for annotating cell lines and chemical compounds, and quantifying drug response. Moreover, it has been recently shown that the complexity and complementarity of the experimental protocols used in the field result in high levels of technical and biological variation in the in vitro pharmacological profiles. There is therefore a need for new tools to facilitate rigorous comparison and integrative analysis of large-scale drug screening datasets. To address this issue, we have developed PharmacoDB (pharmacodb.pmgenomics.ca), a database integrating the largest cancer pharmacogenomic studies published to date. Here, we describe how the curation of cell line and chemical compound identifiers maximizes the overlap between datasets and how users can leverage such data to compare and extract robust drug phenotypes. PharmacoDB provides a unique resource to mine a compendium of curated cancer pharmacogenomic datasets that are otherwise disparate and difficult to integrate.
Subject(s)
Databases, Pharmaceutical , Drug Screening Assays, Antitumor , Pharmacogenomic Testing , Antineoplastic Agents/pharmacology , Cell Line, Tumor , Data Mining , Dose-Response Relationship, Drug , Humans , User-Computer Interface
ABSTRACT
Pharmacogenomics holds great promise for the development of biomarkers of drug response and the design of new therapeutic options, which are key challenges in precision medicine. However, such data are scattered and lack standards for efficient access and analysis, consequently preventing the realization of the full potential of pharmacogenomics. To address these issues, we implemented PharmacoGx, an easy-to-use, open source package for integrative analysis of multiple pharmacogenomic datasets. We demonstrate the utility of our package in comparing large drug sensitivity datasets, such as the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia. Moreover, we show how to use our package to easily perform Connectivity Map analysis. With increasing availability of drug-related data, our package will open new avenues of research for meta-analysis of pharmacogenomic data. AVAILABILITY AND IMPLEMENTATION: PharmacoGx is implemented in R and can be easily installed on any system. The package is available from CRAN and its source code is available from GitHub. CONTACT: bhaibeka@uhnresearch.ca or benjamin.haibe.kains@utoronto.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Pharmacogenetics , Software , Genomics , Humans , Neoplasms , Programming Languages
ABSTRACT
Molecular subtyping is instrumental to the selection of model systems for fundamental research in tumor pathogenesis and to clinical patient assessment. Medulloblastoma (MB) is a highly heterogeneous, malignant brain tumor that is the most common cause of cancer-related deaths in children. Current MB classification schemes require large sample sizes and standard reference samples for subtype predictions. Such approaches are impractical in clinical settings with limited tumor biopsies, and unsuitable for model system predictions where standard reference samples are unavailable. Our Medullo-Model To Subtype (MM2S) classifier stratifies single MB gene expression profiles without reference samples or replicates. Our pathway-centric approach facilitates subtype predictions of patient samples and of model systems, including cell lines and mouse models. MM2S demonstrates >96% accuracy for patients of well-characterized normal cerebellum, WNT, or SHH subtypes, and lower accuracy for the less-characterized Group 4 (86%) and Group 3 (78.2%). MM2S also enables classification of MB cell lines and mouse models into their human counterparts.
Subject(s)
Cerebellar Neoplasms/classification , Cerebellar Neoplasms/diagnosis , Gene Expression Profiling , Medulloblastoma/classification , Medulloblastoma/diagnosis , Algorithms , Animals , Cell Line, Tumor , Classification/methods , Humans , Mice , Precision Medicine , Software
ABSTRACT
Breast cancer (BC) prognosis and outcome are adversely affected by obesity. Hyperinsulinemia, common in the obese state, is associated with higher risk of death and recurrence in BC. Up to 80% of BCs overexpress the insulin receptor (INSR), which correlates with worse prognosis. INSR's role in mammary tumorigenesis was tested by generating MMTV-driven polyoma middle T (PyMT) and ErbB2/Her2 BC mouse models, respectively, with coordinate mammary epithelium-restricted deletion of INSR. In both models, deletion of either one or both copies of INSR leads to a marked delay in tumor onset and burden. Longitudinal phenotypic characterization of mouse tumors and cells reveals that INSR deletion affects tumor initiation, not progression and metastasis. INSR upholds a bioenergetic phenotype in non-transformed mammary epithelial cells, independent of its kinase activity. The similarity of the phenotypes elicited by deletion of one or both copies of INSR suggests a dose-dependent threshold for INSR impact on mammary tumorigenesis.
Subject(s)
Mammary Neoplasms, Experimental , Receptor, Insulin , Mice , Animals , Receptor, Insulin/genetics , Neoplasm Recurrence, Local , Cell Transformation, Neoplastic/genetics , Epithelial Cells/pathology , Mammary Neoplasms, Experimental/genetics , Mammary Neoplasms, Experimental/pathology , Mice, Transgenic
ABSTRACT
Identifying biomarkers predictive of cancer cell response to drug treatment constitutes one of the main challenges in precision oncology. Recent large-scale cancer pharmacogenomic studies have opened new avenues of research to develop predictive biomarkers by profiling thousands of human cancer cell lines at the molecular level and screening them with hundreds of approved drugs and experimental chemical compounds. Many studies have leveraged these data to build predictive models of response using various statistical and machine learning methods. However, a common pitfall to these methods is the lack of interpretability as to how they make predictions, hindering the clinical translation of these models. To alleviate this issue, we used the recent logic modeling approach to develop a new machine learning pipeline that explores the space of bimodally expressed genes in multiple large in vitro pharmacogenomic studies and builds multivariate, nonlinear, yet interpretable logic-based models predictive of drug response. The performance of this approach was showcased in a compendium of the three largest in vitro pharmacogenomic datasets to build robust and interpretable models for 101 drugs that span 17 drug classes with high validation rates in independent datasets. These results along with in vivo and clinical validation support a better translation of gene expression biomarkers between model systems using bimodal gene expression. SIGNIFICANCE: A new machine learning pipeline exploits the bimodality of gene expression to provide a reliable set of candidate predictive biomarkers with a high potential for clinical translatability.
Subject(s)
Neoplasms , Biomarkers , Gene Expression , Humans , Neoplasms/drug therapy , Neoplasms/genetics , Pharmacogenetics , Precision Medicine
ABSTRACT
Multiple myeloma (MM) is a plasma cell malignancy that is often driven by chromosomal translocations. In particular, patients with t(4;14)-positive disease have worse prognosis compared to other MM subtypes. Herein, we demonstrated that t(4;14)-positive cells are highly dependent on the mevalonate (MVA) pathway for survival. Moreover, we showed that this metabolic vulnerability is immediately actionable, as inhibiting the MVA pathway with a statin preferentially induced apoptosis in t(4;14)-positive cells. In response to statin treatment, t(4;14)-positive cells activated the integrated stress response (ISR), which was augmented by co-treatment with bortezomib, a proteasome inhibitor. We identified that t(4;14)-positive cells depend on the MVA pathway for the synthesis of geranylgeranyl pyrophosphate (GGPP), as exogenous GGPP fully rescued statin-induced ISR activation and apoptosis. Inhibiting protein geranylgeranylation similarly induced the ISR in t(4;14)-positive cells, suggesting that this subtype of MM depends on GGPP, at least in part, for protein geranylgeranylation. Notably, fluvastatin treatment synergized with bortezomib to induce apoptosis in t(4;14)-positive cells and potentiated the anti-tumor activity of bortezomib in vivo. Our data implicate the t(4;14) translocation as a biomarker of statin sensitivity and warrant further clinical evaluation of a statin in combination with bortezomib for the treatment of t(4;14)-positive disease.
Subject(s)
Bortezomib/pharmacology , Fluvastatin/pharmacology , Gene Expression Regulation, Neoplastic/drug effects , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Mevalonic Acid/metabolism , Multiple Myeloma/drug therapy , Polyisoprenyl Phosphates/pharmacology , Animals , Antineoplastic Agents/pharmacology , Apoptosis , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Cell Proliferation , Chromosomes, Human, Pair 14 , Chromosomes, Human, Pair 4 , Female , Humans , Mice , Mice, Inbred NOD , Mice, SCID , Multiple Myeloma/genetics , Multiple Myeloma/metabolism , Multiple Myeloma/pathology , Translocation, Genetic , Tumor Cells, Cultured , Xenograft Model Antitumor Assays
ABSTRACT
Reproducibility is essential to open science, as findings that cannot be reproduced by independent research groups are of limited relevance, regardless of their validity. It is therefore crucial for scientists to describe their experiments in sufficient detail so they can be reproduced, scrutinized, challenged, and built upon. However, the intrinsic complexity and continuous growth of biomedical data make them increasingly difficult to process, analyze, and share with the community in a FAIR (findable, accessible, interoperable, and reusable) manner. To overcome these issues, we created a cloud-based platform called ORCESTRA (orcestra.ca), which provides a flexible framework for the reproducible processing of multimodal biomedical data. It enables processing of clinical, genomic and perturbation profiles of cancer samples through automated processing pipelines that are user-customizable. ORCESTRA creates integrated and fully documented data objects with persistent identifiers (DOI) and manages multiple dataset versions, which can be shared for future studies.
ABSTRACT
Cancer is a leading cause of death worldwide. Identifying the best treatment using computational models to personalize drug response prediction holds great promise for improving patients' chances of successful recovery. Unfortunately, the computational task of predicting drug response is very challenging, partially due to the limitations of the available data and partially due to algorithmic shortcomings. The recent advances in deep learning may open a new chapter in the search for computational drug response prediction models and ultimately result in more accurate tools for therapy response. This review provides an overview of the computational challenges and advances in drug response prediction, and focuses on comparing machine learning techniques so as to be of practical use to clinicians and machine learning non-experts. The incorporation of new data modalities such as single-cell profiling, along with techniques that rapidly find effective drug combinations, will likely be instrumental in improving cancer care.
ABSTRACT
Genomic instability affects the reproducibility of experiments that rely on cancer cell lines. However, measuring the genomic integrity of these cells throughout a study is a costly endeavor that is commonly forgone. Here, we validate the identity of cancer cell lines in three pharmacogenomic studies and screen for genetic drift within and between datasets. Using SNP data from these datasets encompassing 1,497 unique cell lines and 63 unique pharmacological compounds, we show that genetic drift is widely prevalent in almost all cell lines with a median of 4.5%-6.1% of the total genome size drifted between any two isogenic cell lines. This study highlights the need for molecular profiling of cell lines to minimize the effects of passaging or misidentification in biomedical studies. We developed the CCLid web application, available at www.cclid.ca, to allow users to screen the genomic profiles of their cell lines against these datasets. A record of this paper's transparent peer review process is included in the Supplemental Information.
Subject(s)
Genetic Drift , Pharmacogenetics/methods , Pharmacogenomic Testing/methods , Cell Line, Tumor , Genome/genetics , Genomics/methods , Humans , Reproducibility of Results
ABSTRACT
The field of pharmacogenomics presents great challenges for researchers who wish to make their studies reproducible and shareable. This difficulty stems from the generation of large volumes of high-throughput multimodal data and the lack of standardized workflows that are robust, scalable, and flexible enough to perform large-scale analyses. To address this issue, we developed pharmacogenomic workflows in the Common Workflow Language to process two breast cancer datasets in a reproducible and transparent manner. Our pipelines combine both pharmacological and molecular profiles into a portable data object that can be used for future analyses in cancer research. Our data objects and workflows are shared on Harvard Dataverse and Code Ocean, where each has been assigned a unique Digital Object Identifier, providing a level of data provenance and a persistent location to access and share our data with the community.
Subject(s)
Pharmacogenomic Testing , Software , Workflow , Computational Biology , Humans , Information Dissemination
ABSTRACT
This study is the first attempt to describe the ultrastructure and functional morphology of the dermal glands in Limnochares aquatica (L., 1758). The dermal glands were studied using light-optical, SEM and TEM microscopy methods during different stages of their activity. In contrast to the vast majority of other fresh water mites, dermal glands of the studied species are originally multiplied and scattered freely over the mite body surface. The opening of the glands is saddle-like, formed of several tight cuticular folds and oriented freely to the long axis of the mite body. Either a small cuticular spine or, rarely, a slim sensitive seta is placed on one pole of the opening. On the inside, the central gland portion is provided with a complex cuticular helicoid armature. The glands are composed of prismatic cells situated around the intra-alveolar lumen, variously present, and look like a fig-fruit with the basal surface facing the body cavity. The glands are provided with extremely numerous microtubules, frequently arranged in bundles, and totally devoid of synthetic apparatus such as RER cisterns and Golgi bodies. Three states of the gland morphology depending on their functional activity may be recognized: (i) glands without secretion with highly folded cell walls and numerous microtubules within the cytoplasm, (ii) glands with an electron-dense granular secretion in the expanded vacuoles and (iii) glands with the secretion totally extruded presenting giant empty vacuoles bordered with slim cytoplasmic strips on the periphery. Summer specimens usually show the first gland state, whereas winter specimens, conversely, more often demonstrate the second and the third states. This situation may depend on some factors like changes of the seasonal temperature, pH, or oxygenation of the ambient water. On the assumption of the morphological characters, dermal glands may be classified not as secretory but as a special additional excretory organ system of the body cavity. 
Although the glands lack cambial cells, restoration of function after release of the 'secretion' appears possible. The organization of the dermal glands is discussed in comparison with that of other water mites studied.
Subject(s)
Mites/anatomy & histology , Animals , Exocrine Glands/anatomy & histology , Exocrine Glands/ultrastructure , Microscopy, Electron, Scanning , Microscopy, Electron, Transmission , Mites/ultrastructure
ABSTRACT
Radiotherapy is integral to the care of a majority of patients with cancer. Despite differences in tumor responses to radiation (radioresponse), dose prescriptions are not currently tailored to individual patients. Recent large-scale cancer cell line databases hold the promise of unravelling the complex molecular arrangements underlying cellular response to radiation, which is critical for novel predictive biomarker discovery. Here, we present RadioGx, a computational platform for integrative analyses of radioresponse using radiogenomic databases. We fit the dose-response data within RadioGx to the linear-quadratic model. The imputed survival across a range of dose levels (AUC) was a robust radioresponse indicator that correlated with biological processes known to underpin the cellular response to radiation. Using AUC as a metric for further investigations, we found that radiation sensitivity was significantly associated with disruptive mutations in genes related to nonhomologous end joining. Next, by simulating the effects of different oxygen levels, we identified putative genes that may influence radioresponse specifically under hypoxic conditions. Furthermore, using transcriptomic data, we found evidence for tissue-specific determinants of radioresponse, suggesting that tumor type could influence the validity of putative predictive biomarkers of radioresponse. Finally, integrating radioresponse with drug response data, we found that drug classes impacting the cytoskeleton, DNA replication, and mitosis display similar therapeutic effects to ionizing radiation on cancer cell lines. In summary, RadioGx provides a unique computational toolbox for hypothesis generation to advance preclinical research for radiation oncology and precision medicine. SIGNIFICANCE: The RadioGx computational platform enables integrative analyses of cellular response to radiation with drug responses and genome-wide molecular data. 
GRAPHICAL ABSTRACT: http://cancerres.aacrjournals.org/content/canres/79/24/6227/F1.large.jpg. See related commentary by Spratt and Speers, p. 6076.
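The linear-quadratic fit and the AUC summary mentioned in the RadioGx abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the RadioGx implementation: the standard linear-quadratic form SF(D) = exp(-(alpha*D + beta*D^2)) is used, the fit is an ordinary least-squares fit on -ln(SF), and the 0 to 8 Gy integration range, the unnormalized trapezoid-rule AUC, and all function names are hypothetical choices for the example.

```python
import math


def lq_survival(dose, alpha, beta):
    """Surviving fraction under the linear-quadratic model."""
    return math.exp(-(alpha * dose + beta * dose ** 2))


def fit_lq(doses, survival):
    """Least-squares fit of -ln(SF) = alpha*D + beta*D^2 (no intercept).

    Solves the 2x2 normal equations in closed form. Assumes every
    surviving fraction is strictly positive.
    """
    z = [-math.log(s) for s in survival]
    s11 = sum(d ** 2 for d in doses)
    s12 = sum(d ** 3 for d in doses)
    s22 = sum(d ** 4 for d in doses)
    b1 = sum(d * zi for d, zi in zip(doses, z))
    b2 = sum(d ** 2 * zi for d, zi in zip(doses, z))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("need at least two distinct non-zero doses")
    alpha = (b1 * s22 - b2 * s12) / det
    beta = (s11 * b2 - s12 * b1) / det
    return alpha, beta


def survival_auc(alpha, beta, max_dose=8.0, steps=800):
    """Area under the fitted survival curve from 0 to max_dose.

    Trapezoid rule on an even grid; max_dose and steps are illustrative
    defaults, not values taken from the source.
    """
    h = max_dose / steps
    ys = [lq_survival(i * h, alpha, beta) for i in range(steps + 1)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
```

Under this convention a more radiosensitive cell line (larger alpha or beta) yields a smaller AUC, so the value can be compared across cell lines as long as the same dose range is used for all of them.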