ABSTRACT
Mitochondrial toxicity is a significant concern in the drug discovery process, as compounds that disrupt the function of these organelles can lead to serious side effects, including liver injury and cardiotoxicity. Different in vitro assays exist to detect mitochondrial toxicity at varying mechanistic levels: disruption of the respiratory chain, disruption of the membrane potential, or general mitochondrial dysfunction. In parallel, whole-cell imaging assays like Cell Painting provide a phenotypic overview of the cellular system upon treatment and enable the assessment of mitochondrial health from cell profiling features. In this study, we aim to establish machine learning models for the prediction of mitochondrial toxicity, making the best use of the available data. For this purpose, we first derived highly curated datasets of mitochondrial toxicity, including subsets for different mechanisms of action. Due to the limited amount of labeled data often associated with toxicological endpoints, we investigated the potential of using morphological features from a large Cell Painting screen to label additional compounds and enrich our dataset. Our results suggest that models incorporating morphological profiles perform better in predicting mitochondrial toxicity than those trained on chemical structures alone (up to +0.08 and +0.09 mean MCC in random and cluster cross-validation, respectively). Toxicity labels derived from Cell Painting images improved the predictions on an external test set by up to +0.08 MCC. However, we also found that further research is needed to improve the reliability of Cell Painting image labeling. Overall, our study provides insights into the importance of considering different mechanisms of action when predicting a complex endpoint like mitochondrial disruption, as well as into the challenges and opportunities of using Cell Painting data for toxicity prediction.
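The abstract reports gains as mean MCC under random and cluster cross-validation. A minimal sketch of both ingredients follows, assuming binary toxicity labels (1 = toxic) and precomputed cluster IDs from some structural clustering; the clustering method, the round-robin cluster-to-fold assignment, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = toxic, 0 = non-toxic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is empty; return 0 by convention
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def cluster_folds(cluster_ids, k):
    """Assign whole clusters to k folds (round robin), so members of one
    structural cluster never appear in both a training and a test fold."""
    unique = sorted(set(cluster_ids))
    fold_of_cluster = {c: i % k for i, c in enumerate(unique)}
    return [fold_of_cluster[c] for c in cluster_ids]

def mean_mcc(fold_results):
    """Average MCC over folds; fold_results is a list of (y_true, y_pred) pairs."""
    scores = [matthews_corrcoef(t, p) for t, p in fold_results]
    return sum(scores) / len(scores)
```

Cluster-based folds usually give lower, more pessimistic scores than random folds because test compounds are structurally less similar to the training data, which is why both settings are worth reporting.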
Subject(s)
Machine Learning , Mitochondria , Reproducibility of Results , Liver , Mitochondrial Membranes
ABSTRACT
Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and ultimately bioavailability in living species. We here present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years and over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral, and basic), representing the conditions of the stomach and intestinal tract for animals and humans, and of the phloem and xylem for plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model, with three solubility tasks using the pH-class-combined data from different assays and five helper tasks, achieves root mean square errors of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling compounds in pharmaceutical and agrochemical research. The model allows for the prediction of compound pH profiles with mean and median per-molecule RMSE of 0.62 and 0.56 log units, respectively.
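Merging solubility, lipophilicity, and melting point data for multi-task training means most compounds have measurements for only some tasks. One common way to train on such a sparse label matrix, shown here as an assumption because the abstract does not specify the loss, is to mask missing entries out of the loss. The sketch below computes a masked mean squared error and per-task RMSE in plain Python; the task column layout is hypothetical and this is not the authors' training code.

```python
import math

def masked_mse(predictions, targets):
    """Mean squared error over observed entries only.

    predictions, targets: one row per compound, one column per task
    (hypothetical layout, e.g. [acidic, neutral, basic, helper_1, ...]).
    Unmeasured entries are None in `targets` and are skipped, so each
    compound only contributes to the tasks it was actually measured in.
    """
    sq_err, n = 0.0, 0
    for pred_row, true_row in zip(predictions, targets):
        for p, t in zip(pred_row, true_row):
            if t is not None:
                sq_err += (p - t) ** 2
                n += 1
    return sq_err / n if n else 0.0

def per_task_rmse(predictions, targets):
    """RMSE per task (log units), again ignoring missing measurements."""
    results = []
    for j in range(len(targets[0])):
        errs = [(p[j] - t[j]) ** 2 for p, t in zip(predictions, targets) if t[j] is not None]
        results.append(math.sqrt(sum(errs) / len(errs)) if errs else float("nan"))
    return results
```

Masking lets helper tasks with many labels shape the shared representation without penalizing the network on solubility tasks for which a compound was never measured.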
Subject(s)
Neural Networks, Computer , Water , Humans , Animals , Solubility , Water/chemistry , Machine Learning , Hydrogen-Ion Concentration , Pharmaceutical Preparations
ABSTRACT
Simple physicochemical properties, like logD, solubility, or melting point, can reveal a great deal about how a compound under development might later behave. These data are typically measured for most compounds in drug discovery projects in a medium-throughput fashion. Collecting and assembling all the Bayer in-house data related to these properties allowed us to apply powerful machine learning techniques to predict the outcome of those assays for new compounds. In this paper, we report our finding that, especially for predicting physicochemical ADMET endpoints, a multitask graph convolutional approach appears to be a highly competitive choice. For seven endpoints of interest, we compared the performance of that approach to fully connected neural networks and different single-task models. The new model shows increased predictive performance compared to previous modeling methods and will allow early prioritization of compounds even before they are synthesized. In addition, our model follows the generalized solubility equation without being explicitly trained under this constraint.
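The last sentence refers to the general solubility equation (GSE) of Jain and Yalkowsky, usually written as log S = 0.5 - 0.01 (MP - 25) - log P. The sketch below only evaluates that equation; it makes no assumption about how the authors compared it with their model, and the function name is illustrative.

```python
def gse_log_solubility(log_p, melting_point_c):
    """General solubility equation (Jain and Yalkowsky):
        log10 S = 0.5 - 0.01 * (MP - 25) - log P
    S is the molar aqueous solubility (mol/L), MP the melting point in degrees
    Celsius, and log P the octanol/water partition coefficient. For liquids
    (MP below 25 C) the melting point term is set to zero.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p
```

A model that "follows" the GSE should, all else being equal, predict solubility falling by about one log unit per unit increase in log P and by about 0.01 log units per degree of melting point.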
Subject(s)
Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Algorithms , Machine Learning , Models, Chemical , Neural Networks, Computer , Pharmaceutical Preparations/chemical synthesis , Quantitative Structure-Activity Relationship
ABSTRACT
Chemical compound bioactivity and related data are nowadays easily available from open data sources and the open medicinal chemistry literature for many transmembrane proteins. Computational ligand-based modeling of transporters has therefore shifted from local (quantitative) models to more global, qualitative, predictive models. As the size and heterogeneity of the data set grow, careful data curation becomes even more important. This includes, for example, not only a tailored cutoff setting for the generation of binary classes, but also the proper assessment of the applicability domain. Powerful machine learning algorithms (such as multi-label classification) now allow the simultaneous prediction of multiple related targets. However, the more complex these models become, the less interpretable they are. We emphasize that transmembrane transporters are very peculiar targets, some of which act as off-targets rather than as real drug targets. Thus, careful selection of the right modeling technique is important, as is cautious interpretation of the results. We hope that, as more data become available, we will be able to improve and refine our models, coming closer to function elucidation and the development of safer medicines.
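Two of the curation steps named above, a potency cutoff for binary classes and an applicability-domain check, can be sketched as follows. The nearest-neighbour Tanimoto criterion is one common applicability-domain definition, chosen here as an assumption; fingerprints are represented as sets of on-bit indices, and both the 0.3 similarity threshold and the 10 µM cutoff are hypothetical placeholders rather than values from the text.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def in_applicability_domain(query_fp, training_fps, threshold=0.3):
    """True if the query's nearest training neighbour reaches the similarity threshold.
    The default threshold is an illustrative placeholder, not a recommended value."""
    return max(tanimoto(query_fp, fp) for fp in training_fps) >= threshold

def binarize_potency(ic50_um, cutoff_um=10.0):
    """Binary class from a potency value: 1 (active) if IC50 <= cutoff, else 0.
    The 10 uM default is hypothetical; in practice it should be tailored per target."""
    return 1 if ic50_um <= cutoff_um else 0
```

Predictions for compounds that fall outside the domain are not necessarily wrong, but they deserve less trust because no sufficiently similar training compound supports them.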
Subject(s)
Carrier Proteins/chemistry , Computer Simulation , Models, Molecular , Carrier Proteins/metabolism , Computational Biology/methods , Databases, Protein , Ligands , Protein Binding , Quantitative Structure-Activity Relationship
ABSTRACT
The bile salt export pump (BSEP) is an ABC transporter expressed at the canalicular membrane of hepatocytes. Its physiological role is to expel bile salts into the canaliculi, from where they drain into the bile duct. Inhibition of this transporter may lead to intrahepatic cholestasis. Predictive computational models of BSEP inhibition may allow for fast identification of potentially harmful compounds in large databases. This article presents a predictive in silico model based on physicochemical descriptors that is able to flag compounds as potential BSEP inhibitors. This model was built using a training set of 670 compounds with available BSEP inhibition potencies. It successfully predicted BSEP inhibition for two independent test sets and was subsequently used for a virtual screening experiment. After in vitro testing of selected candidates, a marketed drug, bromocriptine, was identified for the first time as a BSEP inhibitor. This demonstrates the usefulness of the model for identifying new BSEP inhibitors and therefore potential cholestasis perpetrators.
Subject(s)
ATP-Binding Cassette Transporters/antagonists & inhibitors , Bromocriptine/pharmacology , Animals , CHO Cells , Cell Line , Cholestasis/prevention & control , Computer Simulation , Cricetulus , Swine
ABSTRACT
ATP-driven transport across biological membranes is a key process for translocating solutes from the interior of the cell to the extracellular environment. In humans, ATP-binding cassette transporters are involved in absorption, distribution, metabolism, excretion, and toxicity, and also play a major role in anticancer drug resistance. Analogous transporters are also known to be involved in phytohormone translocation. These include the transport of auxin by ABCB1/19 in Arabidopsis thaliana, of abscisic acid by AtABCG25, and of strigolactone by the Petunia hybrida ABC transporter PDR1. In this article, we outline the current knowledge about plant ABC transporters with respect to their structure and function, and provide, for the first time, a protein homology model of the strigolactone transporter PDR1 from P. hybrida.
Subject(s)
ATP-Binding Cassette Transporters/metabolism , Plant Growth Regulators/metabolism , ATP-Binding Cassette Transporters/chemistry , Biological Transport , Models, Molecular , Protein Conformation
ABSTRACT
Existing computational methods for estimating pKa values in proteins rely on theoretical approximations and lengthy computations. In this work, we use a data set of 6 million theoretically determined pKa shifts to train deep learning models, which are shown to rival the physics-based predictors. These neural networks managed to infer the electrostatic contributions of different chemical groups and learned the importance of solvent exposure and close interactions, including hydrogen bonds. Although trained only using theoretical data, our pKAI+ model displayed the best accuracy in a test set of ~750 experimental values. Inference times allow speedups of more than 1000× compared to physics-based methods. By combining speed, accuracy, and a reasonable understanding of the underlying physics, our models provide a game-changing solution for fast estimations of macroscopic pKa values from ensembles of microscopic values as well as for many downstream applications such as molecular docking and constant-pH molecular dynamics simulations.
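pKa shifts are commonly defined relative to the pKa of the isolated group in water (a model compound), so a predicted shift becomes an absolute pKa by addition, and the Henderson-Hasselbalch relation then gives the protonated fraction at a given pH for a single, non-interacting site. This is a minimal sketch under those assumptions; it does not reproduce pKAI+ or the ensemble averaging mentioned in the abstract, and no particular reference pKa values are implied.

```python
def pka_from_shift(pka_model, delta_pka):
    """Residue pKa = reference (model-compound) pKa + environment-induced shift."""
    return pka_model + delta_pka

def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch fraction of the protonated form of one independent titratable site."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))
```

At pH equal to the pKa the site is half protonated, and each pH unit above the pKa lowers the protonated-to-deprotonated ratio tenfold, which is the quantity constant-pH simulations and docking setups ultimately need.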
Subject(s)
Deep Learning , Molecular Docking Simulation , Molecular Dynamics Simulation , Proteins/chemistry , Static Electricity
ABSTRACT
The introduction of machine learning to small molecule research, an inherently multidisciplinary field in which chemists and data scientists combine their expertise and collaborate, has been vital to making screening processes more efficient. In recent years, numerous models that predict pharmacokinetic properties or bioactivity have been published, and these are used on a daily basis by chemists to make decisions and prioritize ideas. The emerging field of explainable artificial intelligence is opening up new possibilities for understanding the reasoning that underlies a model. In small molecule research, this means relating contributions of substructures of compounds to their predicted properties, which in turn also allows the areas of the compounds that have the greatest influence on the outcome to be identified. However, there has been no interactive visualization tool that facilitates such interdisciplinary collaborations towards the interpretability of machine learning models for small molecules. To fill this gap, we present CIME (ChemInformatics Model Explorer), an interactive web-based system that allows users to inspect chemical data sets, visualize model explanations, compare interpretability techniques, and explore subgroups of compounds. The tool is model-agnostic and can be run on a server or a workstation.
ABSTRACT
The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures into a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model that combines a deep convolutional neural network, which learns from molecule depictions, with a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.
ABSTRACT
Over the past two decades, an in silico absorption, distribution, metabolism, excretion, and toxicity (ADMET) platform has been created at Bayer Pharma with the goal of generating models for a variety of pharmacokinetic and physicochemical endpoints in early drug discovery. These tools are accessible to all scientists within the company and can be useful in assisting with the selection and design of novel leads, as well as with lead optimization. Here, we discuss the development of machine-learning (ML) approaches with special emphasis on data, descriptors, and algorithms. We show that high-quality internal company data and tailored descriptors, as well as a thorough understanding of the experimental endpoints, are essential to the utility of our models. We discuss the recent impact of deep neural networks and show selected application examples.
Subject(s)
Machine Learning , Pharmacokinetics , Animals , Computer Simulation , Humans , Intestinal Absorption , Models, Theoretical , Pharmaceutical Preparations/metabolism
ABSTRACT
There has been a recent surge of interest in using machine learning across chemical space in order to predict properties of molecules or design molecules and materials with the desired properties. Most of this work relies on defining clever feature representations, in which the chemical graph structure is encoded in a uniform way such that predictions across chemical space can be made. In this work, we propose to exploit the powerful ability of deep neural networks to learn a feature representation from low-level encodings of a huge corpus of chemical structures. Our model borrows ideas from neural machine translation: it translates between two semantically equivalent but syntactically different representations of molecular structures, compressing the meaningful information both representations have in common in a low-dimensional representation vector. Once the model is trained, this representation can be extracted for any new molecule and utilized as a descriptor. In fair benchmarks with respect to various human-engineered molecular fingerprints and graph-convolution models, our method shows competitive performance in modelling quantitative structure-activity relationships in all analysed datasets. Additionally, we show that our descriptor significantly outperforms all baseline molecular fingerprints in two ligand-based virtual screening tasks. Overall, our descriptors show the most consistent performances in all experiments. The continuity of the descriptor space and the existence of the decoder that permits deducing a chemical structure from an embedding vector allow for exploration of the space and open up new opportunities for compound optimization and idea generation.
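Once continuous descriptor vectors have been extracted, a ligand-based virtual screen can rank a library by similarity to a known active. The sketch below uses cosine similarity as one common choice for continuous vectors; the abstract does not state which similarity measure was used, and the vectors in any usage are toy stand-ins, not embeddings produced by the model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_library(query_vec, library):
    """Rank (name, descriptor_vector) pairs by similarity to the query, most similar first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in library]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because the descriptor space is continuous, the same vectors can also be compared with other metrics (for example Euclidean distance); which works best is an empirical question for each screening task.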
ABSTRACT
One of the main challenges in small molecule drug discovery is finding novel chemical compounds with desirable properties. In this work, we propose a novel method that combines in silico prediction of molecular properties such as biological activity or pharmacokinetics with an in silico optimization algorithm, namely Particle Swarm Optimization. Our method takes a starting compound as input and proposes new molecules with more desirable (predicted) properties. It navigates a machine-learned continuous representation of a drug-like chemical space guided by a defined objective function. The objective function combines multiple in silico prediction models, defined desirability ranges, and substructure constraints. We demonstrate that our proposed method is able to consistently find more desirable molecules for the studied tasks in a relatively short time. We hope that our method can support medicinal chemists in accelerating and improving the lead optimization process.
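The search loop described above can be illustrated with a basic global-best Particle Swarm Optimization over a continuous vector space. In this sketch the "latent space" is a two-dimensional toy box, the two "property models" are hypothetical linear functions, and the objective is a product of desirability scores; decoding vectors back to molecules and substructure constraints are not shown, and none of the parameter values come from the paper.

```python
import math
import random

def desirability(value, low, high, scale=1.0):
    """1.0 inside the desired range [low, high], decaying smoothly (Gaussian) outside it."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return math.exp(-(distance / scale) ** 2)

def toy_objective(z):
    """Hypothetical stand-in for property models evaluated on a latent vector z."""
    prop_a = z[0] + z[1]  # pretend predicted property A, desired in [1, 2]
    prop_b = z[0] - z[1]  # pretend predicted property B, desired in [-0.5, 0.5]
    return desirability(prop_a, 1.0, 2.0) * desirability(prop_b, -0.5, 0.5)

def pso(objective, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, bound=3.0, seed=0):
    """Maximise `objective` over the box [-bound, bound]^dim with global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's own best + pull toward the swarm's best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep particles inside the search box
                pos[i][d] = min(bound, max(-bound, pos[i][d] + vel[i][d]))
            score = objective(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score
```

Calling `pso(toy_objective, dim=2)` returns the best vector found and its score. Multiplying desirabilities means a candidate only scores well if every property is in range; a substructure constraint could, under the same assumptions, be added as one more multiplicative factor.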
ABSTRACT
Transporters expressed in the liver play a major role in drug pharmacokinetics and are a key component of the physiological bile flow. Inhibition of these transporters may lead to drug-drug interactions or even drug-induced liver injury. Therefore, predicting the interaction profile of small molecules with transporters expressed in the liver may help medicinal chemists and toxicologists to prioritize compounds in an early phase of the drug development process. Based on a comprehensive analysis of the data available in the public domain, we developed a set of classification models that predict, for a small molecule, the inhibition of and transport by a set of liver transporters considered relevant by the FDA, the EMA, and the Japanese regulatory agency. The models were validated by cross-validation and external test sets and achieve cross-validated balanced accuracies in the range of 0.64-0.88. Finally, the models were implemented as an easy-to-use web service, which is freely available at https://livertox.univie.ac.at.
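Balanced accuracy, the metric quoted above, is the mean of sensitivity and specificity, which keeps it informative when inhibitors and non-inhibitors are unevenly represented. A minimal sketch for binary labels (1 = inhibitor or substrate) follows; the function name is illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2
```

For example, a model that labels every compound as a non-inhibitor on a set with 1 inhibitor and 9 non-inhibitors reaches 90% plain accuracy but only 0.5 balanced accuracy, the same as random guessing.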
ABSTRACT
Drug-induced liver injury (DILI) is a major issue for both patients and the pharmaceutical industry due to insufficient means of prevention and prediction. In the current work, we present a 2-class classification model for DILI, generated with Random Forest and 2D molecular descriptors on a dataset of 966 compounds. Predicted transporter inhibition profiles were also included in the models. The initially compiled dataset of 1773 compounds was reduced via a 2-step approach to 966 compounds, resulting in a significant increase (p-value < 0.05) in model performance. The models were validated via 10-fold cross-validation and against three external test sets of 921, 341, and 96 compounds, respectively. The final model showed an accuracy of 64% (AUC 68%) for 10-fold cross-validation (average of 50 iterations) and comparable values for two of the three test sets (AUC 59%, 71%, and 66%, respectively). In the study, we also examined whether the predictions of our in-house transporter inhibition models for BSEP, BCRP, P-glycoprotein, and OATP1B1 and 1B3 contributed to improving the DILI model. Finally, the model was implemented with open-source 2D RDKit descriptors so that it could be provided to the community as a Python script.
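The validation protocol above, 10-fold cross-validation averaged over 50 iterations with AUC as a metric, can be sketched as follows. The splits here are plain shuffled k-fold (not stratified), AUC is computed as the probability that a random positive is scored above a random negative, and the Random Forest itself is not shown; the paper's exact splitting procedure may differ.

```python
import random

def repeated_kfold_indices(n, k=10, repeats=50, seed=0):
    """Yield (train, test) index lists for `repeats` independent shuffles of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test

def roc_auc(y_true, scores):
    """ROC AUC via pairwise ranking: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))
```

Averaging over many reshuffles reduces the dependence of the reported numbers on one particular fold assignment, which matters for a dataset of a few hundred compounds.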
Subject(s)
Chemical and Drug Induced Liver Injury/etiology , Computer Simulation , Data Curation , Liver/drug effects , Membrane Transport Proteins/drug effects , Models, Statistical , Toxicity Tests/methods , Algorithms , Animals , Area Under Curve , Chemical and Drug Induced Liver Injury/metabolism , Chemical and Drug Induced Liver Injury/pathology , Data Mining , Databases, Factual , Humans , Liver/metabolism , Liver/pathology , Membrane Transport Proteins/metabolism , Reproducibility of Results , Risk Assessment
ABSTRACT
The breast cancer resistance protein (BCRP) is an ABC transporter playing a crucial role in the pharmacokinetics of drugs. The early identification of substrates and inhibitors of this efflux transporter can help to prevent or foresee drug-drug interactions. In this work, we built a ligand-based in silico classification model to predict the inhibitory potential of drugs toward BCRP. The model was applied as a virtual screening technique to identify potential inhibitors among the small-molecule subset of DrugBank. Ten compounds were selected and tested for their capacity to inhibit mitoxantrone efflux in BCRP-expressing PLB985 cells. The results identified cisapride (IC50 = 0.4 µM) and roflumilast (IC50 = 0.9 µM) as two new BCRP inhibitors. The in silico strategy proved useful to prefilter potential drug-drug interaction perpetrators among a database of small molecules and can reduce the number of compounds to test.
Subject(s)
ATP Binding Cassette Transporter, Subfamily G, Member 2/antagonists & inhibitors , Drug Evaluation, Preclinical , Neoplasm Proteins/antagonists & inhibitors , User-Computer Interface , ATP Binding Cassette Transporter, Subfamily G, Member 2/metabolism , Aminopyridines/chemistry , Aminopyridines/pharmacology , Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Benzamides/chemistry , Benzamides/pharmacology , Cell Line, Tumor , Cisapride/chemistry , Cisapride/pharmacology , Cyclopropanes/chemistry , Cyclopropanes/pharmacology , Humans , Inhibitory Concentration 50 , Logistic Models , Neoplasm Proteins/metabolism , Probability , ROC Curve , Reproducibility of Results
ABSTRACT
The first challenge in the 2014 competition launched by the Teach-Discover-Treat (TDT) initiative asked for the development of a tutorial for ligand-based virtual screening, based on data from a primary phenotypic high-throughput screen (HTS) against malaria. The resulting workflows were applied to select compounds from a commercial database, and a subset of those was purchased and tested experimentally for antimalarial activity. Here, we present the two most successful workflows, both using machine-learning approaches, and report the results for the 114 compounds tested in the follow-up screen. Excluding the two known antimalarials quinidine and amodiaquine and 31 compounds already present in the primary HTS, a high hit rate of 57% was found.
ABSTRACT
BACKGROUND: The human ATP-binding cassette transporters Breast Cancer Resistance Protein (BCRP) and Multidrug Resistance Protein 1 (P-gp) are co-expressed in many tissues and barriers, especially at the blood-brain barrier and at the hepatocyte canalicular membrane. Understanding their interplay in affecting the pharmacokinetics of drugs is of prime interest. In silico tools to predict inhibition and substrate profiles towards BCRP and P-gp might serve as early filters in the drug discovery and development process. However, building such models requires collecting pharmacological data for both targets, a tedious task that often involves manual and poorly reproducible steps. RESULTS: Compounds with inhibitory activity measured against BCRP and/or P-gp were retrieved by combining Open Data and manually curated data from the literature using a KNIME workflow. After determination of the compound overlap, machine learning approaches were used to establish multi-label classification models for BCRP/P-gp. Different ways of addressing multi-label problems are explored and compared: label powerset, binary relevance, and classifier chains. Label powerset revealed important molecular features for selective or polyspecific inhibitory activity. In our dataset, only two descriptors (the numbers of hydrophobic and aromatic atoms) were sufficient to separate selective BCRP inhibitors from selective P-gp inhibitors. Also, dual inhibitors share properties with both groups of selective inhibitors. Binary relevance and classifier chains improved the predictivity of the models. CONCLUSIONS: The KNIME workflow proved a useful tool to merge data from diverse sources. It could be used for building multi-label datasets for any set of pharmacological targets for which data are available either in the open domain or in-house. By applying various multi-label learning algorithms, important molecular features driving transporter selectivity could be retrieved.
Finally, using the dataset with missing annotations, predictive models can be derived in cases where no accurate dense dataset is available (insufficient data overlap or an imbalanced class distribution).
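The three multi-label strategies compared above differ mainly in how the label matrix is turned into learning problems. The sketch below shows only that data transformation for a hypothetical label layout [BCRP, P-gp] with 1 = inhibitor; no classifier is trained, missing annotations are not handled, and the class names are illustrative.

```python
POWERSET_NAMES = {(0, 0): "non-inhibitor", (1, 0): "BCRP-selective",
                  (0, 1): "P-gp-selective", (1, 1): "dual"}

def label_powerset(labels):
    """Turn each (BCRP, P-gp) label pair into a single multi-class label."""
    return [POWERSET_NAMES[tuple(pair)] for pair in labels]

def binary_relevance(labels):
    """One independent binary target vector per transporter (columns of the label matrix)."""
    return [list(column) for column in zip(*labels)]

def classifier_chain_data(features, labels, order=(0, 1)):
    """Training sets for a classifier chain: the model for the j-th label in `order`
    receives the descriptors plus all labels earlier in the chain. At prediction
    time the earlier labels would be replaced by the chain's own predictions."""
    datasets = []
    for position, j in enumerate(order):
        X = [list(row) + [lab[k] for k in order[:position]] for row, lab in zip(features, labels)]
        y = [lab[j] for lab in labels]
        datasets.append((X, y))
    return datasets
```

Label powerset makes selective versus dual inhibition explicit as classes, which is why it lends itself to feature analysis, while binary relevance and classifier chains keep one binary problem per transporter, with the chain additionally passing information from one label to the next.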
ABSTRACT
The transmembrane ABC transporters P-glycoprotein (P-gp) and breast cancer resistance protein (BCRP) are widely recognized for their role in cancer multidrug resistance and in the absorption and distribution of compounds. Furthermore, they are linked to drug-drug interactions and toxicity. Nevertheless, due to their polyspecificity, a molecular understanding of the ligand-transporter interaction that would allow the design of both selective and dual inhibitors is still in its infancy. This study combines synthesis, in silico prediction, and in vitro testing to identify molecular features triggering transporter selectivity. Synthesis and testing of a series of 15 propafenone analogues with varied rigidity and basicity of substituents reveal initial trends for selective and dual inhibitors. Results indicate that both the flexibility of the substituent at the nitrogen atom and the basicity of the nitrogen atom trigger transporter selectivity. Furthermore, the inhibitory activity of compounds at P-gp seems to be much more influenced by logP than that at BCRP. Exploiting these differences further should thus allow the design of specific inhibitors for these two polyspecific ABC transporters.
Subject(s)
ATP Binding Cassette Transporter, Subfamily B, Member 1/metabolism , ATP Binding Cassette Transporter, Subfamily G, Member 2/metabolism , ATP-Binding Cassette Transporters/metabolism , Neoplasm Proteins/metabolism , Propafenone/pharmacology , Female , Humans , In Vitro Techniques , Propafenone/analogs & derivatives , Structure-Activity Relationship
ABSTRACT
With the discovery of P-glycoprotein (P-gp), it became evident that ABC transporters play a vital role in the bioavailability and toxicity of drugs. They prevent the intracellular accumulation of toxic compounds, which makes them a major defense mechanism against xenotoxic compounds. Their expression in cells of all major barriers (intestine, blood-brain barrier, blood-placenta barrier) as well as in metabolic organs (liver, kidney) also explains their influence on the ADMET properties of drugs and drug candidates. Thus, in silico models that predict the probability of a compound interacting with P-gp or analogous transporters are of high value in the early phase of the drug discovery process. In this review, we highlight recent developments in the area, with a special focus on the molecular basis of drug-transporter interaction. In addition, with the recent availability of X-ray structures of several ABC transporters, structure-based design methods have also been applied; these are addressed as well.
Subject(s)
ATP-Binding Cassette Transporters/metabolism , Models, Biological , Pharmaceutical Preparations/metabolism , Animals , Humans , Ligands , Molecular Structure
ABSTRACT
Early prediction of safety issues in drug development is both highly desirable and highly challenging. Recent advances emphasize the importance of understanding the whole chain of causal events leading to observable toxic outcomes. Here we describe an integrative modeling strategy based on these ideas that guided the design of eTOXsys, the prediction system used by the eTOX project. Essentially, eTOXsys consists of a central server that marshals requests to a collection of independent prediction models and offers a single user interface to the whole system. Each such model lives in a self-contained virtual machine that is easy to install and maintain. All models produce toxicity-relevant predictions on their own, but the results of some can be further integrated and scaled up, yielding in vivo toxicity predictions. Technical aspects related to model implementation, maintenance, and documentation are also discussed here. Finally, the kinds of models currently implemented in eTOXsys are illustrated by three example models that use diverse methodologies (3D-QSAR and decision trees, Molecular Dynamics simulations and Linear Interaction Energy theory, and fingerprint-based QSAR).
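The architecture described above, a central server that forwards one request to many independent models and optionally combines their outputs, can be sketched as a small registry. This is a hypothetical in-process illustration: in eTOXsys each model runs in its own virtual machine, whereas here plain Python callables stand in for them, requests are assumed to be keyed by SMILES, and the weighted-sum integration is an arbitrary placeholder for a higher-level model.

```python
from typing import Callable, Dict, Iterable, Optional

Predictor = Callable[[str], float]

class PredictionHub:
    """Hypothetical central server that forwards one request to independently registered models."""

    def __init__(self) -> None:
        self._models: Dict[str, Predictor] = {}

    def register(self, name: str, predict: Predictor) -> None:
        # Each model is added without knowing about the others, mirroring self-contained deployments
        self._models[name] = predict

    def predict(self, smiles: str, names: Optional[Iterable[str]] = None) -> Dict[str, float]:
        # Query all registered models (or a chosen subset) and collect results in one response
        selected = sorted(self._models) if names is None else list(names)
        return {name: self._models[name](smiles) for name in selected}

def integrate(results: Dict[str, float], weights: Dict[str, float]) -> float:
    """Hypothetical higher-level model: a weighted combination of lower-level predictions."""
    return sum(weight * results[name] for name, weight in weights.items())
```

Keeping models behind a single dispatch interface means a model can be updated or replaced without touching the others, while integration logic only depends on the names and outputs it consumes.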