RESUMO
In an effort to address a major challenge in chemical safety assessment, alternative approaches for characterizing systemic effect levels, a predictive model was developed. Systemic effect levels were curated from ToxRefDB, HESS-DB and COSMOS-DB from numerous study types totaling 4379 in vivo studies for 1247 chemicals. Observed systemic effects in mammalian models are a complex function of chemical dynamics, kinetics, and inter- and intra-individual variability. To address this complex problem, systemic effect levels were modeled at the study-level by leveraging study covariates (e.g., study type, strain, administration route) in addition to multiple descriptor sets, including chemical (ToxPrint, PaDEL, and Physchem), biological (ToxCast), and kinetic descriptors. Using random forest modeling with cross-validation and external validation procedures, study-level covariates alone accounted for approximately 15% of the variance reducing the root mean squared error (RMSE) from 0.96 log10 to 0.85 log10 mg/kg/day, providing a baseline performance metric (lower expectation of model performance). A consensus model developed using a combination of study-level covariates, chemical, biological, and kinetic descriptors explained a total of 43% of the variance with an RMSE of 0.69 log10 mg/kg/day. A benchmark model (upper expectation of model performance) was also developed with an RMSE of 0.5 log10 mg/kg/day by incorporating study-level covariates and the mean effect level per chemical. To achieve a representative chemical-level prediction, the minimum study-level predicted and observed effect level per chemical were compared reducing the RMSE from 1.0 to 0.73 log10 mg/kg/day, equivalent to 87% of predictions falling within an order-of-magnitude of the observed value. Although biological descriptors did not improve model performance, the final model was enriched for biological descriptors that indicated xenobiotic metabolism gene expression, oxidative stress, and cytotoxicity, demonstrating the importance of accounting for kinetics and non-specific bioactivity in predicting systemic effect levels. Herein, we generated an externally predictive model of systemic effect levels for use as a safety assessment tool and have generated forward predictions for over 30,000 chemicals.
Assuntos
Modelos Químicos , Testes de Toxicidade , Animais , Cosméticos/toxicidade , Bases de Dados de Compostos Químicos , Modelos Estatísticos , ToxicocinéticaRESUMO
Changes in the regulatory landscape of chemical safety assessment call for the use of New Approach Methodologies (NAMs) including read-across to fill data gaps. One critical aspect of analogue evaluation is the extent to which target and source analogues are metabolically similar. In this study, a set of 37 structurally diverse chemicals were compiled from the EPA ToxCast inventory to compare and contrast a selection of metabolism in silico tools, in terms of their coverage and performance relative to metabolism information reported in the literature. The aim was to build understanding of the scope and capabilities of these tools and how they could be utilised in a read-across assessment. The tools were Systematic Generation of Metabolites (SyGMa), Meteor Nexus, BioTransformer, Tissue Metabolism Simulator (TIMES), OECD Toolbox, and Chemical Transformation Simulator (CTS). Performance was characterised by sensitivity and precision determined by comparing predictions against literature reported metabolites (from 44 publications). A coverage score was derived to provide a relative quantitative comparison between the tools. Meteor, TIMES, Toolbox, and CTS predictions were run in batch mode, using default settings. SyGMa and BioTransformer were run with user-defined settings, (two passes of phase I and one pass of phase II). Hierarchical clustering revealed high similarity between TIMES and Toolbox. SyGMa had the highest coverage, matching an average of 38.63% of predictions generated by the other tools though was prone to significant overprediction. It generated 5,125 metabolites, which represented 54.67% of all predictions. Precision and sensitivity values ranged from 1.1-29% and 14.7-28.3% respectively. The Toolbox had the highest performance overall. A case study was presented for 3,4-Toluenediamine (3,4-TDA), assessed for the derivation of screening-level Provisional Peer Reviewed Toxicity Values (PPRTVs), was used to demonstrate the practical role in silico metabolism information can play in analogue evaluation as part of a read-across approach.
RESUMO
BACKGROUND: The advent of high-throughput transcriptomic screening technologies has resulted in a wealth of publicly available gene expression data associated with chemical treatments. From a regulatory perspective, data sets that cover a large chemical space and contain reference chemicals offer utility for the prediction of molecular initiating events associated with chemical exposure. Here, we integrate data from a large compendium of transcriptomic responses to chemical exposure with a comprehensive database of chemical-protein associations to train binary classifiers that predict mechanism(s) of action from transcriptomic responses. First, we linked reference chemicals present in the LINCS L1000 gene expression data collection to chemical identifiers in RefChemDB, a database of chemical-protein interactions. Next, we trained binary classifiers on MCF7 human breast cancer cell line derived gene expression profiles and chemical-protein labels using six classification algorithms to identify optimal analysis parameters. To validate classifier accuracy, we used holdout data sets, training-excluded reference chemicals, and empirical significance testing of null models derived from permuted chemical-protein associations. To identify classifiers that have variable predicting performance across training data derived from different cellular contexts, we trained a separate set of binary classifiers on the PC3 human prostate cancer cell line. RESULTS: We trained classifiers using expression data associated with chemical treatments linked to 51 molecular initiating events. This analysis identified and validated 9 high-performing classifiers with empirical p-values lower than 0.05 and internal accuracies ranging from 0.73 to 0.94 and holdout accuracies of 0.68 to 0.92. High-ranking predictions for training-excluded reference chemicals demonstrating that predictive accuracy extends beyond the set of chemicals used in classifier training. To explore differences in classifier performance as a function of training data cellular context, MCF7-trained classifier accuracies were compared to classifiers trained on the PC3 gene expression data for the same molecular initiating events. CONCLUSIONS: This methodology can offer insight in prioritizing candidate perturbagens of interest for targeted screens. This approach can also help guide the selection of relevant cellular contexts for screening classes of candidate perturbagens using cell line specific model performance.
RESUMO
BACKGROUND: Endocrine disrupting chemicals (EDCs) are xenobiotics that mimic the interaction of natural hormones and alter synthesis, transport, or metabolic pathways. The prospect of EDCs causing adverse health effects in humans and wildlife has led to the development of scientific and regulatory approaches for evaluating bioactivity. This need is being addressed using high-throughput screening (HTS) in vitro approaches and computational modeling. OBJECTIVES: In support of the Endocrine Disruptor Screening Program, the U.S. Environmental Protection Agency (EPA) led two worldwide consortiums to virtually screen chemicals for their potential estrogenic and androgenic activities. Here, we describe the Collaborative Modeling Project for Androgen Receptor Activity (CoMPARA) efforts, which follows the steps of the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP). METHODS: The CoMPARA list of screened chemicals built on CERAPP's list of 32,464 chemicals to include additional chemicals of interest, as well as simulated ToxCast™ metabolites, totaling 55,450 chemical structures. Computational toxicology scientists from 25 international groups contributed 91 predictive models for binding, agonist, and antagonist activity predictions. Models were underpinned by a common training set of 1,746 chemicals compiled from a combined data set of 11 ToxCast™/Tox21 HTS in vitro assays. RESULTS: The resulting models were evaluated using curated literature data extracted from different sources. To overcome the limitations of single-model approaches, CoMPARA predictions were combined into consensus models that provided averaged predictive accuracy of approximately 80% for the evaluation set. DISCUSSION: The strengths and limitations of the consensus predictions were discussed with example chemicals; then, the models were implemented into the free and open-source OPERA application to enable screening of new chemicals with a defined applicability domain and accuracy assessment. This implementation was used to screen the entire EPA DSSTox database of â¼875,000 chemicals, and their predicted AR activities have been made available on the EPA CompTox Chemicals dashboard and National Toxicology Program's Integrated Chemical Environment. https://doi.org/10.1289/EHP5580.
Assuntos
Simulação por Computador , Disruptores Endócrinos , Androgênios , Bases de Dados Factuais , Ensaios de Triagem em Larga Escala , Humanos , Receptores Androgênicos , Estados Unidos , United States Environmental Protection AgencyRESUMO
BACKGROUND: The logarithmic acid dissociation constant pKa reflects the ionization of a chemical, which affects lipophilicity, solubility, protein binding, and ability to pass through the plasma membrane. Thus, pKa affects chemical absorption, distribution, metabolism, excretion, and toxicity properties. Multiple proprietary software packages exist for the prediction of pKa, but to the best of our knowledge no free and open-source programs exist for this purpose. Using a freely available data set and three machine learning approaches, we developed open-source models for pKa prediction. METHODS: The experimental strongest acidic and strongest basic pKa values in water for 7912 chemicals were obtained from DataWarrior, a freely available software package. Chemical structures were curated and standardized for quantitative structure-activity relationship (QSAR) modeling using KNIME, and a subset comprising 79% of the initial set was used for modeling. To evaluate different approaches to modeling, several datasets were constructed based on different processing of chemical structures with acidic and/or basic pKas. Continuous molecular descriptors, binary fingerprints, and fragment counts were generated using PaDEL, and pKa prediction models were created using three machine learning methods, (1) support vector machines (SVM) combined with k-nearest neighbors (kNN), (2) extreme gradient boosting (XGB) and (3) deep neural networks (DNN). RESULTS: The three methods delivered comparable performances on the training and test sets with a root-mean-squared error (RMSE) around 1.5 and a coefficient of determination (R2) around 0.80. Two commercial pKa predictors from ACD/Labs and ChemAxon were used to benchmark the three best models developed in this work, and performance of our models compared favorably to the commercial products. CONCLUSIONS: This work provides multiple QSAR models to predict the strongest acidic and strongest basic pKas of chemicals, built using publicly available data, and provided as free and open-source software on GitHub.
RESUMO
Confident identification of unknown chemicals in high resolution mass spectrometry (HRMS) screening studies requires cohesive workflows and complementary data, tools, and software. Chemistry databases, screening libraries, and chemical metadata have become fixtures in identification workflows. To increase confidence in compound identifications, the use of structural fragmentation data collected via tandem mass spectrometry (MS/MS or MS2) is vital. However, the availability of empirically collected MS/MS data for identification of unknowns is limited. Researchers have therefore turned to in silico generation of MS/MS data for use in HRMS-based screening studies. This paper describes the generation en masse of predicted MS/MS spectra for the entirety of the US EPA's DSSTox database using competitive fragmentation modelling and a freely available open source tool, CFM-ID. The generated dataset comprises predicted MS/MS spectra for ~700,000 structures, and mappings between predicted spectra, structures, associated substances, and chemical metadata. Together, these resources facilitate improved compound identifications in HRMS screening studies. These data are accessible via an SQL database, a comma-separated export file (.csv), and EPA's CompTox Chemicals Dashboard.
RESUMO
The collection of chemical structure information and associated experimental data for quantitative structure-activity/property relationship (QSAR/QSPR) modeling is facilitated by an increasing number of public databases containing large amounts of useful data. However, the performance of QSAR models highly depends on the quality of the data and modeling methodology used. This study aims to develop robust QSAR/QSPR models for chemical properties of environmental interest that can be used for regulatory purposes. This study primarily uses data from the publicly available PHYSPROP database consisting of a set of 13 common physicochemical and environmental fate properties. These datasets have undergone extensive curation using an automated workflow to select only high-quality data, and the chemical structures were standardized prior to calculation of the molecular descriptors. The modeling procedure was developed based on the five Organization for Economic Cooperation and Development (OECD) principles for QSAR models. A weighted k-nearest neighbor approach was adopted using a minimum number of required descriptors calculated using PaDEL, an open-source software. The genetic algorithms selected only the most pertinent and mechanistically interpretable descriptors (2-15, with an average of 11 descriptors). The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-life to 14,050 chemicals for logP, with an average of 3222 chemicals across all endpoints. The optimal models were built on randomly selected training sets (75%) and validated using fivefold cross-validation (CV) and test sets (25%). The CV Q2 of the models varied from 0.72 to 0.95, with an average of 0.86 and an R2 test value from 0.71 to 0.96, with an average of 0.82. Modeling and performance details are described in QSAR model reporting format and were validated by the European Commission's Joint Research Center to be OECD compliant. All models are freely available as an open-source, command-line application called OPEn structure-activity/property Relationship App (OPERA). OPERA models were applied to more than 750,000 chemicals to produce freely available predicted data on the U.S. Environmental Protection Agency's CompTox Chemistry Dashboard.
RESUMO
Chemical database searching has become a fixture in many non-targeted identification workflows based on high-resolution mass spectrometry (HRMS). However, the form of a chemical structure observed in HRMS does not always match the form stored in a database (e.g., the neutral form versus a salt; one component of a mixture rather than the mixture form used in a consumer product). Linking the form of a structure observed via HRMS to its related form(s) within a database will enable the return of all relevant variants of a structure, as well as the related metadata, in a single query. A Konstanz Information Miner (KNIME) workflow has been developed to produce structural representations observed using HRMS ("MS-Ready structures") and links them to those stored in a database. These MS-Ready structures, and associated mappings to the full chemical representations, are surfaced via the US EPA's Chemistry Dashboard ( https://comptox.epa.gov/dashboard/ ). This article describes the workflow for the generation and linking of ~ 700,000 MS-Ready structures (derived from ~ 760,000 original structures) as well as download, search and export capabilities to serve structure identification using HRMS. The importance of this form of structural representation for HRMS is demonstrated with several examples, including integration with the in silico fragmentation software application MetFrag. The structures, search, download and export functionality are all available through the CompTox Chemistry Dashboard, while the MetFrag implementation can be viewed at https://msbi.ipb-halle.de/MetFragBeta/ .