ABSTRACT
The endocannabinoid system (ECS) is a critical regulatory network composed of endogenous cannabinoids (eCBs), their synthesizing and degrading enzymes, and associated receptors. It is integral to maintaining homeostasis and orchestrating key functions within the central nervous and immune systems. Given its therapeutic significance, we have launched a series of drug discovery endeavors aimed at ECS targets, including peroxisome proliferator-activated receptors (PPARs), cannabinoid receptors types 1 (CB1R) and 2 (CB2R), and monoacylglycerol lipase (MAGL), addressing a wide array of medical needs. The pursuit of new therapeutic agents has been enhanced by the creation of specialized labeled chemical probes, which aid in target localization, mechanistic studies, assay development, and the establishment of biomarkers for target engagement. By fusing medicinal chemistry with chemical biology in a comprehensive, translational end-to-end drug discovery strategy, we have expedited the development of novel therapeutics. Additionally, this strategy promises to foster highly productive partnerships between industry and academia, as will be illustrated through various examples.
Subject(s)
Pharmaceutical Chemistry , Drug Discovery , Endocannabinoids , Endocannabinoids/metabolism , Endocannabinoids/chemistry , Humans , Drug Industry , Monoacylglycerol Lipases/metabolism , Monoacylglycerol Lipases/antagonists & inhibitors , Drug Development , Academia
ABSTRACT
Suzuki cross-coupling reactions are considered a valuable tool for constructing carbon-carbon bonds in small molecule drug discovery. However, the synthesis of chemical matter often represents a time-consuming and labour-intensive bottleneck. We demonstrate how machine learning methods trained on high-throughput experimentation (HTE) data can be leveraged to enable fast reaction condition selection for novel coupling partners. We show that the trained models support chemists in determining suitable catalyst-solvent-base combinations for individual transformations, including an evaluation of the need for HTE screening. We introduce an algorithm for designing 96-well plates optimized towards reaction yields and discuss the model performance of zero- and few-shot machine learning. The best-performing machine learning model achieved a three-category classification accuracy of 76.3% (±0.2%) and an F1-score for a binary classification of 79.1% (±0.9%). Validation on eight reactions revealed an area under the receiver operating characteristic curve (ROC-AUC) of 0.82 (±0.07) for few-shot machine learning. In contrast, zero-shot machine learning models achieved a mean ROC-AUC value of 0.63 (±0.16). This study supports the application of few-shot machine learning-guided reaction condition selection for HTE campaigns in medicinal chemistry and highlights practical applications as well as challenges associated with zero-shot machine learning.
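The two binary metrics quoted in this abstract (F1-score and ROC-AUC) can be illustrated with a minimal sketch. The function names and the toy labels and scores below are hypothetical and are not data or code from the study; the ROC-AUC is computed here with the standard pairwise-ranking definition, which is an assumption about how one might reproduce such a number, not a description of the authors' pipeline.

```python
def binary_f1(y_true, y_pred):
    """F1-score for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy example: 1 = reaction judged successful, 0 = not successful.
labels = [1, 1, 1, 0, 0]
predictions = [1, 1, 0, 1, 0]
print(binary_f1(labels, predictions))
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A ROC-AUC of 0.5 corresponds to random ranking, which gives some context for the reported gap between the few-shot (0.82) and zero-shot (0.63) models.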
ABSTRACT
Machine learning models support computer-aided molecular design and compound optimization. However, the initial phases of drug discovery often face a scarcity of training data for these models. Meta-learning has emerged as a potentially promising strategy, harnessing the wealth of structure-activity data available for known targets to facilitate efficient few-shot model training for the specific target of interest. In this study, we assessed the effectiveness of two different meta-learning methods, namely model-agnostic meta-learning (MAML) and adaptive deep kernel fitting (ADKF), specifically in the regression setting. We investigated how factors such as dataset size and the similarity of training tasks impact predictability. The results indicate that ADKF significantly outperformed both MAML and a single-task baseline model on the inhibition data. However, the performance of ADKF varied across different test tasks. Our findings suggest that considerable enhancements in performance can be anticipated primarily when the task of interest is similar to the tasks incorporated in the meta-learning process.
Subject(s)
Machine Learning , Structure-Activity Relationship , Humans , Drug Discovery
ABSTRACT
De novo drug design aims to generate molecules from scratch that possess specific chemical and pharmacological properties. We present a computational approach utilizing interactome-based deep learning for ligand- and structure-based generation of drug-like molecules. This method capitalizes on the unique strengths of both graph neural networks and chemical language models, offering an alternative to the need for application-specific reinforcement, transfer, or few-shot learning. It enables the "zero-shot" construction of compound libraries tailored to possess specific bioactivity, synthesizability, and structural novelty. In order to proactively evaluate the deep interactome learning framework for protein structure-based drug design, potential new ligands targeting the binding site of the human peroxisome proliferator-activated receptor (PPAR) subtype gamma are generated. The top-ranking designs are chemically synthesized and computationally, biophysically, and biochemically characterized. Potent PPAR partial agonists are identified, demonstrating favorable activity and the desired selectivity profiles for both nuclear receptors and off-target interactions. Crystal structure determination of the ligand-receptor complex confirms the anticipated binding mode. This successful outcome positively advocates interactome-based de novo design for application in bioorganic and medicinal chemistry, enabling the creation of innovative bioactive molecules.
Subject(s)
Deep Learning , Drug Design , PPAR gamma , Humans , Ligands , PPAR gamma/metabolism , PPAR gamma/agonists , PPAR gamma/chemistry , Binding Sites , Protein Binding
ABSTRACT
INTRODUCTION: Janus kinase (JAK) inhibitors were recently identified as promising drug candidates for repurposing in Alzheimer's disease (AD) due to their capacity to suppress inflammation via modulation of JAK/STAT signaling pathways. Besides interaction with primary therapeutic targets, JAK inhibitor drugs frequently interact with unintended, often unknown, biological off-targets, leading to associated effects. Nevertheless, the relevance of JAK inhibitors' off-target interactions in the context of AD remains unclear. METHODS: Putative off-targets of baricitinib and tofacitinib were predicted using a machine learning (ML) approach. After screening scientific literature, off-targets were filtered based on their relevance to AD. Targets that had not been previously identified as off-targets of baricitinib or tofacitinib were subsequently tested using biochemical or cell-based assays. From those, active concentrations were compared to bioavailable concentrations in the brain predicted by physiologically based pharmacokinetic (PBPK) modeling. RESULTS: With the aid of ML and in vitro activity assays, we identified two enzymes previously unknown to be inhibited by baricitinib, namely casein kinase 2 subunit alpha 2 (CK2-α2) and dual leucine zipper kinase (MAP3K12), both with binding constant (Kd) values of 5.8 µM. Predicted maximum concentrations of baricitinib in brain tissue using PBPK modeling range from 1.3 to 23 nM, which is two to three orders of magnitude below the corresponding binding constant. CONCLUSION: In this study, we extended the list of baricitinib off-targets that are potentially relevant for AD progression and predicted drug distribution in the brain. The results suggest a low likelihood of successful repurposing in AD due to low brain permeability, even at the maximum recommended daily dose.
While additional research is needed to evaluate the potential impact of the off-target interaction on AD, the combined approach of ML-based target prediction, in vitro confirmation, and PBPK modeling may help prioritize drugs with a high likelihood of being effectively repurposed for AD. Highlights: This study explored JAK inhibitors' off-targets in AD using a multidisciplinary approach. We combined machine learning, in vitro tests, and PBPK modeling to predict and validate new off-target interactions of tofacitinib and baricitinib in AD. Previously unknown inhibition of two enzymes (CK2-α2 and MAP3K12) by baricitinib was confirmed using in vitro experiments. Our PBPK model indicates that baricitinib's low brain permeability limits AD repurposing. The proposed multidisciplinary approach optimizes drug repurposing efforts in AD research.
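The gap between the measured binding constant and the predicted brain exposure can be checked with a short calculation. This is a minimal sketch using only the figures quoted in the abstract (Kd of 5.8 µM, predicted maximum brain concentrations of 1.3 to 23 nM); the function name is hypothetical.

```python
import math

KD_MICROMOLAR = 5.8                 # binding constant reported for both kinases
BRAIN_CMAX_NANOMOLAR = (1.3, 23.0)  # predicted range of maximum brain concentrations


def orders_of_magnitude_below(kd_micromolar, concentration_nanomolar):
    """Return log10(Kd / concentration) after converting Kd to nM."""
    kd_nanomolar = kd_micromolar * 1000.0
    return math.log10(kd_nanomolar / concentration_nanomolar)


for conc in BRAIN_CMAX_NANOMOLAR:
    gap = orders_of_magnitude_below(KD_MICROMOLAR, conc)
    print(f"{conc} nM is {gap:.1f} orders of magnitude below Kd")
```

The computed gaps are roughly 2.4 (at 23 nM) and 3.6 (at 1.3 nM) log units, in line with the abstract's statement that brain concentrations fall orders of magnitude short of the binding constant.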
ABSTRACT
Rational structure-based drug design relies on accurate predictions of protein-ligand binding affinity from structural molecular information. Although deep learning-based methods for predicting binding affinity have shown promise in computational drug design, certain approaches have faced criticism for their potential to inadequately capture the fundamental physical interactions between ligands and their macromolecular targets or for being susceptible to dataset biases. Herein, we propose to include bond-critical points based on the electron density of a protein-ligand complex as a fundamental physical representation of protein-ligand interactions. Employing a geometric deep learning model, we explore the usefulness of these bond-critical points to predict absolute binding affinities of protein-ligand complexes, benchmark model performance against existing methods, and provide a critical analysis of this new approach. The models achieved root-mean-squared errors of 1.4-1.8 log units on the PDBbind dataset, and 1.0-1.7 log units on the PDE10A dataset, not indicating significant advantages over benchmark methods, and thus rendering the utility of electron density for deep learning models context-dependent. The relationship between intermolecular electron density and corresponding binding affinity was analyzed, and Pearson correlation coefficients r > 0.7 were obtained for several macromolecular targets.
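The two evaluation statistics named in this abstract, root-mean-squared error (RMSE) and the Pearson correlation coefficient r, can be sketched in a few lines. The function names and toy values are hypothetical and are not taken from the PDBbind or PDE10A evaluations.

```python
import math


def rmse(predicted, observed):
    """Root-mean-squared error between two equally long sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical binding affinities in log units (e.g. pKd), for illustration only.
predicted_affinity = [6.1, 7.4, 5.0, 8.2]
measured_affinity = [6.5, 7.0, 5.8, 7.9]
print(rmse(predicted_affinity, measured_affinity))
print(pearson_r(predicted_affinity, measured_affinity))
```

Because affinities are expressed in log units, an RMSE of 1.4 corresponds to a typical error of more than one order of magnitude in the underlying binding constant.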
ABSTRACT
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Subject(s)
Deep Learning , Quantitative Structure-Activity Relationship , Humans , Artificial Intelligence , Computing Methodologies , Quantum Theory , Drug Discovery/methods , Drug Design
ABSTRACT
Late-stage functionalization is an economical approach to optimize the properties of drug candidates. However, the chemical complexity of drug molecules often makes late-stage diversification challenging. To address this problem, a late-stage functionalization platform based on geometric deep learning and high-throughput reaction screening was developed. Considering borylation as a critical step in late-stage functionalization, the computational model predicted reaction yields for diverse reaction conditions with a mean absolute error margin of 4-5%, while the reactivity of novel reactions with known and unknown substrates was classified with a balanced accuracy of 92% and 67%, respectively. The regioselectivity of the major products was accurately captured with a classifier F-score of 67%. When applied to 23 diverse commercial drug molecules, the platform successfully identified numerous opportunities for structural diversification. The influence of steric and electronic information on model performance was quantified, and a comprehensive, simple, user-friendly reaction format was introduced that proved to be a key enabler for seamlessly integrating deep learning and high-throughput experimentation for late-stage functionalization.
Subject(s)
Deep Learning , High-Throughput Screening Assays
ABSTRACT
Enhancing the properties of advanced drug candidates is aided by the direct incorporation of specific chemical groups, avoiding the need to construct the entire compound from the ground up. Nevertheless, their chemical intricacy often poses challenges in predicting reactivity for C-H activation reactions and planning their synthesis. We adopted a reaction screening approach that combines high-throughput experimentation (HTE) at a nanomolar scale with computational graph neural networks (GNNs). This approach aims to identify suitable substrates for late-stage C-H alkylation using Minisci-type chemistry. GNNs were trained using experimentally generated reactions derived from in-house HTE and literature data. These trained models were then used to predict, in a forward-looking manner, the coupling of 3180 advanced heterocyclic building blocks with a diverse set of sp3-rich carboxylic acids. This predictive approach aimed to explore the substrate landscape for Minisci-type alkylations. Promising candidates were chosen, their production was scaled up, and they were subsequently isolated and characterized. This process led to the creation of 30 novel, functionally modified molecules that hold potential for further refinement. These results positively advocate the application of HTE-based machine learning to virtual reaction screening.
ABSTRACT
Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefitted from compression, especially of long fingerprints (Morgan, RDK). However, their performance dropped rapidly for compression levels exceeding 90 %. Property co-learning positively influenced the predictive power of the compressed fingerprints, with a mean score improvement up to 20 %, suggesting that autoencoder compression with property co-learning biases the molecular representation toward the predicted target, facilitating downstream training.
Subject(s)
Algorithms , Computer Neural Networks , Machine Learning
ABSTRACT
Integrins are a family of cell surface receptors well-recognized for their therapeutic potential in a wide range of diseases. However, the development of integrin targeting medications has been impacted by unexpected downstream effects, reflecting originally unforeseen interference with the bidirectional signalling and cross-communication of integrins. Here, we selected one of the most severely affected target integrins, the integrin lymphocyte function-associated antigen-1 (LFA-1, αLβ2, CD11a/CD18), as a prototypic integrin to systematically assess and overcome these known shortcomings. We employed a two-tiered ligand-based virtual screening approach to identify a novel class of allosteric small molecule inhibitors targeting this integrin's αI domain. The newly discovered chemical scaffold was derivatized, yielding potent bis- and tris-aryl-bicyclic-succinimides which inhibit LFA-1 in vitro at low nanomolar concentrations. The characterisation of these compounds in comparison to earlier LFA-1 targeting modalities established that the allosteric LFA-1 inhibitors (i) are devoid of partial agonism, (ii) selectively bind LFA-1 versus other integrins, (iii) do not trigger internalization of LFA-1 itself or other integrins and (iv) display oral availability. This profile differentiates the new generation of allosteric LFA-1 inhibitors from previous ligand mimetic-based LFA-1 inhibitors and anti-LFA-1 antibodies, and is projected to support novel immune regulatory regimens selectively targeting the integrin LFA-1. The rigorous computational and experimental assessment schedule described here is designed to be adaptable to the preclinical discovery and development of novel allosterically acting compounds targeting integrins other than LFA-1, providing an exemplary approach for the early characterisation of next generation integrin inhibitors.
Subject(s)
Lymphocyte Function-Associated Antigen-1 , Signal Transduction , Lymphocyte Function-Associated Antigen-1/chemistry , Lymphocyte Function-Associated Antigen-1/metabolism , Ligands , Intercellular Adhesion Molecule-1/metabolism
ABSTRACT
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.
Subject(s)
Deep Learning , Drug Design , Computer Neural Networks , Drug Discovery/methods , Machine Learning , Ligands
ABSTRACT
Lipophilicity, as measured by the partition coefficient between octanol and water (log P), is a key parameter in early drug discovery research. However, measuring log P experimentally is difficult for specific compounds and log P ranges. The resulting lack of reliable experimental data impedes development of accurate in silico models for such compounds. In certain discovery projects at Novartis focused on such compounds, a quantum mechanics (QM)-based tool for log P estimation has emerged as a valuable supplement to experimental measurements and as a preferred alternative to existing empirical models. However, this QM-based approach incurs a substantial computational cost, limiting its applicability to small series and prohibiting quick, interactive ideation. This work explores a set of machine learning models (Random Forest, Lasso, XGBoost, Chemprop, and Chemprop3D) to learn calculated log P values on both a public data set and an in-house data set to obtain a computationally affordable, QM-based estimation of drug lipophilicity. The message-passing neural network model Chemprop emerged as the best performing model with mean absolute errors of 0.44 and 0.34 log units for scaffold split test sets of the public and in-house data sets, respectively. Analysis of learning curves suggests that a further decrease in the test set error can be achieved by increasing the training set size. While models directly trained on experimental data perform better at approximating experimentally determined log P values than models trained on calculated values, we discuss the potential advantages of using calculated log P values going beyond the limits of experimental quantitation. We analyze the impact of the data set splitting strategy and gain insights into model failure modes. Potential use cases for the presented models include pre-screening of large compound collections and prioritization of compounds for full QM calculations.
ABSTRACT
Generative chemical language models (CLMs) can be used for de novo molecular structure generation by learning from a textual representation of molecules. Here, we show that hybrid CLMs can additionally leverage the bioactivity information available for the training compounds. To computationally design ligands of phosphoinositide 3-kinase gamma (PI3Kγ), a collection of virtual molecules was created with a generative CLM. This virtual compound library was refined using a CLM-based classifier for bioactivity prediction. This second hybrid CLM was pretrained with patented molecular structures and fine-tuned with known PI3Kγ ligands. Several of the computer-generated molecular designs were commercially available, enabling fast prescreening and preliminary experimental validation. A new PI3Kγ ligand with sub-micromolar activity was identified, highlighting the method's scaffold-hopping potential. Chemical synthesis and biochemical testing of two of the top-ranked de novo designed molecules and their derivatives corroborated the model's ability to generate PI3Kγ ligands with medium to low nanomolar activity for hit-to-lead expansion. The most potent compounds led to pronounced inhibition of PI3K-dependent Akt phosphorylation in a medulloblastoma cell model, demonstrating efficacy of PI3Kγ ligands in PI3K/Akt pathway repression in human tumor cells. The results positively advocate hybrid CLMs for virtual compound screening and activity-focused molecular design.
Subject(s)
Phosphatidylinositol 3-Kinases , Proto-Oncogene Proteins c-akt , Humans , Molecular Structure , Ligands , Drug Design , Phosphatidylinositol 3-Kinase
ABSTRACT
PURPOSE: Aberrant activation of the fibroblast growth factor receptor (FGFR) family of receptor tyrosine kinases drives oncogenic signaling through its proximal adaptor protein FRS2. Precise disruption of this disease-causing signal transmission in metastatic cancers could stall tumor growth and progression. The purpose of this study was to identify a small molecule ligand of FRS2 to interrupt oncogenic signal transmission from activated FGFRs. METHODS: We used pharmacophore-based computational screening to identify potential small molecule ligands of the PTB domain of FRS2, which couples FRS2 to FGFRs. We confirmed PTB domain binding of the identified molecules with biophysical binding assays and validated compound activity in cell-based functional assays in vitro and in an ovarian cancer model in vivo. We used thermal proteome profiling to identify potential off-targets of the lead compound. RESULTS: We describe a small molecule ligand of the PTB domain of FRS2 that prevents FRS2 activation and interrupts FGFR signaling. This PTB-domain ligand displays on-target activity in cells and stalls FGFR-dependent matrix invasion in various cancer models. The small molecule ligand is detectable in the serum of mice at the effective concentration for a prolonged time and reduces growth of the ovarian cancer model in vivo. Using thermal proteome profiling, we furthermore identified potential off-targets of the lead compound that will guide further compound refinement and drug development. CONCLUSIONS: Our results illustrate a phenotype-guided drug discovery strategy that identified a novel mechanism to repress FGFR-driven invasiveness and growth in human cancers. The bioactive leads identified here, which target FGF signaling and cell dissemination, provide a novel structural basis for further development as a tumor-agnostic strategy to repress FGFR- and FRS2-driven tumors.
Subject(s)
Drug Discovery , Ovarian Neoplasms , Animals , Female , Humans , Mice , Signal Transducing Adaptor Proteins/chemistry , Signal Transducing Adaptor Proteins/metabolism , Ligands , Membrane Proteins/chemistry , Membrane Proteins/metabolism , Ovarian Neoplasms/drug therapy , Proteome/metabolism , Fibroblast Growth Factor Receptors/metabolism , Signal Transduction/physiology , Drug Discovery/methods
ABSTRACT
Computational methods in medicinal chemistry facilitate drug discovery and design. In particular, machine learning methodologies have recently gained increasing attention. This chapter provides a structured overview of the current state of computational chemistry and its applications for the interrogation of the endocannabinoid system (ECS), highlighting methods in structure-based drug design, virtual screening, ligand-based quantitative structure-activity relationship (QSAR) modeling, and de novo molecular design. We emphasize emerging methods in machine learning and offer a forecast of future opportunities of computational medicinal chemistry for the ECS.
Subject(s)
Computational Chemistry , Endocannabinoids , Drug Design , Ligands , Machine Learning , Quantitative Structure-Activity Relationship
ABSTRACT
Autoencoders are versatile tools in molecular informatics. These unsupervised neural networks serve diverse tasks such as data-driven molecular representation and constructive molecular design. This Review explores their algorithmic foundations and applications in drug discovery, highlighting the most active areas of development and the contributions autoencoder networks have made in advancing this field. We also explore the challenges and prospects concerning the utilization of autoencoders and the various adaptations of this neural network architecture in molecular design.
Subject(s)
Drug Discovery , Computer Neural Networks
ABSTRACT
Machine learning approaches in drug discovery, as well as in other areas of the chemical sciences, benefit from curated datasets of physical molecular properties. However, there currently is a lack of data collections featuring large bioactive molecules alongside first-principle quantum chemical information. The open-access QMugs (Quantum-Mechanical Properties of Drug-like Molecules) dataset fills this void. The QMugs collection comprises quantum mechanical properties of more than 665 k biologically and pharmacologically relevant molecules extracted from the ChEMBL database, totaling ~2 M conformers. QMugs contains optimized molecular geometries and thermodynamic data obtained via the semi-empirical method GFN2-xTB. Atomic and molecular properties are provided on both the GFN2-xTB and on the density-functional levels of theory (DFT, ωB97X-D/def2-SVP). QMugs features molecules of significantly larger size than previously-reported collections and comprises their respective quantum mechanical wave functions, including DFT density and orbital matrices. This dataset is intended to facilitate the development of models that learn from molecular data on different levels of theory while also providing insight into the corresponding relationships between molecular structure and biological activity.
Subject(s)
Drug Discovery , Machine Learning , Thermodynamics
ABSTRACT
As there are no clear on-target mechanisms that explain the increased risk for thrombosis and viral infection or reactivation associated with JAK inhibitors, the observed elevated risk may be a result of an off-target effect. Computational approaches combined with in vitro studies can be used to predict and validate the potential for an approved drug to interact with additional (often unwanted) targets and identify potential safety-related concerns. Potential off-targets of the JAK inhibitors baricitinib and tofacitinib were identified using two established machine learning approaches based on ligand similarity. The identified targets related to thrombosis or viral infection/reactivation were subsequently validated using in vitro assays. Inhibitory activity was identified for four drug-target pairs (PDE10A [baricitinib], TRPM6 [tofacitinib], PKN2 [baricitinib, tofacitinib]). Previously unknown off-target interactions of the two JAK inhibitors were identified. As the proposed pharmacological effects of these interactions include attenuation of pulmonary vascular remodeling, modulation of HCV response, and hypomagnesemia, the newly identified off-target interactions cannot explain an increased risk of thrombosis or viral infection/reactivation. While further evidence is required to explain both the elevated thrombosis and viral infection/reactivation risk, our results add to the evidence that these JAK inhibitors are promiscuous binders and highlight the potential for repurposing.
Subject(s)
Antirheumatic Agents , Janus Kinase Inhibitors , Thrombosis , Virus Diseases , Antirheumatic Agents/adverse effects , Azetidines , Humans , Janus Kinase Inhibitors/adverse effects , Machine Learning , Phosphoric Diester Hydrolases , Piperidines , Purines , Pyrazoles , Pyrimidines , Sulfonamides , Thrombosis/chemically induced
ABSTRACT
Identifying druggable ligand-binding sites on the surface of macromolecular targets is an important process in structure-based drug discovery. Deep-learning models have been shown to successfully predict ligand-binding sites of proteins. As a step toward predicting binding sites in RNA and RNA-protein complexes, we employ three-dimensional convolutional neural networks. We introduce a dataset splitting approach to minimize structure-related bias in training data, and investigate the influence of protein-based neural network pre-training before fine-tuning on RNA structures. Models that were pre-trained on proteins considerably outperformed the models that were trained exclusively on RNA structures. Overall, 71% of the known RNA binding sites were correctly located within 4 Å of their true centres.
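The success criterion in the last sentence (a predicted site counts as correct when it lies within 4 Å of the true centre) can be sketched as a simple distance check. The function name, the inclusive cutoff, and the toy coordinates are assumptions for illustration; the abstract does not specify how the authors implemented this evaluation.

```python
import math


def fraction_within_cutoff(predicted_centres, true_centres, cutoff=4.0):
    """Fraction of predictions whose Euclidean distance to the matching
    true centre is at most `cutoff` (coordinates and cutoff in angstroms)."""
    if len(predicted_centres) != len(true_centres) or not true_centres:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for pred, true in zip(predicted_centres, true_centres)
        if math.dist(pred, true) <= cutoff
    )
    return hits / len(true_centres)


# Hypothetical 3D coordinates in angstroms, one predicted centre per known site.
predicted = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
known = [(0.0, 3.0, 0.0), (0.0, 0.0, 0.0)]
print(fraction_within_cutoff(predicted, known))
```

In this toy case the first prediction is 3 Å from its true centre (a hit) and the second is 10 Å away (a miss), so half of the sites count as correctly located.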