Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Results 1 - 20 de 28
Filter
1.
Nat Commun ; 15(1): 5764, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38982061

ABSTRACT

Machine learning (ML) systems can model quantitative structure-property relationships (QSPR) using existing experimental data and make property predictions for new molecules. With the advent of modalities such as targeted protein degraders (TPD), the applicability of QSPR models is questioned and ML usage in TPD-centric projects remains limited. Herein, ML models are developed and evaluated for TPDs' property predictions, including passive permeability, metabolic clearance, cytochrome P450 inhibition, plasma protein binding, and lipophilicity. Interestingly, performance on TPDs is comparable to that of other modalities. Predictions for glues and heterobifunctionals often yield lower and higher errors, respectively. For permeability, CYP3A4 inhibition, and human and rat microsomal clearance, misclassification errors into high and low risk categories are lower than 4% for glues and 15% for heterobifunctionals. For all modalities, misclassification errors range from 0.8% to 8.1%. Investigated transfer learning strategies improve predictions for heterobifunctionals. This is the first comprehensive evaluation of ML for the prediction of absorption, distribution, metabolism, and excretion (ADME) and physicochemical properties of TPD molecules, including heterobifunctional and molecular glue sub-modalities. Taken together, our investigations show that ML-based QSPR models are applicable to TPDs and support ML usage for TPDs' design, to potentially accelerate drug discovery.


Subject(s)
Machine Learning , Humans , Rats , Animals , Quantitative Structure-Activity Relationship , Proteolysis , Cytochrome P-450 CYP3A/metabolism , Cytochrome P-450 CYP3A/chemistry , Protein Binding , Permeability
2.
Arch Toxicol ; 2024 Jul 18.
Article in English | MEDLINE | ID: mdl-39023798

ABSTRACT

Hepatic bile acid regulation is a multifaceted process modulated by several hepatic transporters and enzymes. Drug-induced cholestasis (DIC), a main type of drug-induced liver injury (DILI), denotes any drug-mediated condition in which hepatic bile flow is impaired. Our ability in translating preclinical toxicological findings to human DIC risk is currently very limited, mainly due to important interspecies differences. Accordingly, the anticipation of clinical DIC with available in vitro or in silico models is also challenging, due to the complexity of the bile acid homeostasis. Herein, we assessed the in vitro inhibition potential of 47 marketed drugs with various degrees of reported DILI severity towards all metabolic and transport mechanisms currently known to be involved in the hepatic regulation of bile acids. The reported DILI concern and/or cholestatic annotation correlated with the number of investigated processes being inhibited. Furthermore, we employed univariate and multivariate statistical methods to determine the important processes for DILI discrimination. We identified time-dependent inhibition (TDI) of cytochrome P450 (CYP) 3A4 and reversible inhibition of the organic anion transporting polypeptide (OATP) 1B1 as the major risk factors for DIC among the tested mechanisms related to bile acid transport and metabolism. These results were consistent across multiple statistical methods and DILI classification systems applied in our dataset. We anticipate that our assessment of the two most important processes in the development of cholestasis will enable a risk assessment for DIC to be efficiently integrated into the preclinical development process.

3.
Chem Res Toxicol ; 37(4): 549-560, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38501689

ABSTRACT

Most drugs are mainly metabolized by cytochrome P450 (CYP450), which can lead to drug-drug interactions (DDI). Specifically, time-dependent inhibition (TDI) of CYP3A4 isoenzyme has been associated with clinically relevant DDI. To overcome potential DDI issues, high-throughput in vitro assays were established to assess the TDI of CYP3A4 during the discovery and lead optimization phases. However, in silico machine learning models would enable an earlier and larger-scale assessment of TDI potential liabilities. For CYP inhibition, most modeling efforts have focused on highly imbalanced and small data sets. Moreover, assay variability is rarely considered, which is key to understand the model's quality and suitability for decision-making. In this work, machine learning models were built for the prediction of TDI of CYP3A4, evaluated prospectively, and compared to the variability of the experimental assay. Different modeling strategies were investigated to assess their influence on the model's performance. Through multitask learning, additional data sets were leveraged for model building, coming from public databases, in-house CYP-related assays, or other pharmaceutical companies (federated learning). Apart from the numerical prediction of inactivation rates of CYP3A4 TDI, three-class predictions were carried out, giving a negative (inactivation rate kobs < 0.01 min-1), weak positive (0.01 ≤ kobs ≤ 0.025 min-1), or positive (kobs > 0.025 min-1) output. The final multitask graph neural network model achieved misclassification rates of 8 and 7% for positive and negative TDI, respectively. Importantly, the presented deep learning-based predictions had a similar precision to the reproducibility of in vitro experiments and thus offered great opportunities for drug design, early derisk of DDI potential, and selection of experiments. To facilitate CYP inhibition modeling efforts in the public domain, the developed model was used to annotate ∼16 000 publicly available structures, and a surrogate data set is shared as Supporting Information.


Subject(s)
Cytochrome P-450 CYP3A , Deep Learning , Cytochrome P-450 CYP3A/metabolism , Reproducibility of Results , Cytochrome P-450 Enzyme System/metabolism , Drug Interactions , Models, Biological
4.
Drug Metab Dispos ; 52(5): 345-354, 2024 Apr 16.
Article in English | MEDLINE | ID: mdl-38360916

ABSTRACT

It is common practice in drug discovery and development to predict in vivo hepatic clearance from in vitro incubations with liver microsomes or hepatocytes using the well-stirred model (WSM). When applying the WSM to a set of approximately 3000 Novartis research compounds, 73% of neutral and basic compounds (extended clearance classification system [ECCS] class 2) were well-predicted within 3-fold. In contrast, only 44% (ECCS class 1A) or 34% (ECCS class 1B) of acids were predicted within 3-fold. To explore the hypothesis whether the higher degree of plasma protein binding for acids contributes to the in vitro-in vivo correlation (IVIVC) disconnect, 68 proprietary compounds were incubated with rat liver microsomes in the presence and absence of 5% plasma. A minor impact of plasma on clearance IVIVC was found for moderately bound compounds (fraction unbound in plasma [fup] ≥1%). However, addition of plasma significantly improved the IVIVC for highly bound compounds (fup <1%) as indicated by an increase of the average fold error from 0.10 to 0.36. Correlating fup with the scaled unbound intrinsic clearance ratio in the presence or absence of plasma allowed the establishment of an empirical, nonlinear correction equation that depends on fup Taken together, estimation of the metabolic clearance of highly bound compounds was enhanced by the addition of plasma to microsomal incubations. For standard incubations in buffer only, application of an empirical correction provided improved clearance predictions. SIGNIFICANCE STATEMENT: Application of the well-stirred liver model for clearance in vitro-in vivo extrapolation (IVIVE) in rat generally underpredicts the clearance of acids and the strong protein binding of acids is suspected to be one responsible factor. Unbound intrinsic in vitro clearance (CLint,u) determinations using rat liver microsomes supplemented with 5% plasma resulted in an improved IVIVE. An empirical equation was derived that can be applied to correct CLint,u-values in dependance of fraction unbound in plasma (fup) and measured CLint in buffer.


Subject(s)
Microsomes, Liver , Models, Biological , Animals , Rats , Microsomes, Liver/metabolism , Metabolic Clearance Rate , Liver/metabolism , Hepatocytes/metabolism , Blood Proteins/metabolism
5.
Mol Pharm ; 21(4): 1817-1826, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38373038

ABSTRACT

Medicinal chemistry and drug design efforts can be assisted by machine learning (ML) models that relate the molecular structure to compound properties. Such quantitative structure-property relationship models are generally trained on large data sets that include diverse chemical series (global models). In the pharmaceutical industry, these ML global models are available across discovery projects as an "out-of-the-box" solution to assist in drug design, synthesis prioritization, and experiment selection. However, drug discovery projects typically focus on confined parts of the chemical space (e.g., chemical series), where global models might not be applicable. Local ML models are sometimes generated to focus on specific projects or series. Herein, ML-based global models, local models, and hybrid global-local strategies were benchmarked. Analyses were done for more than 300 drug discovery projects at Novartis and ten absorption, distribution, metabolism, and excretion (ADME) assays. In this work, hybrid global-local strategies based on transfer learning approaches were proposed to leverage both historical ADME data (global) and project-specific data (local) to adapt model predictions. Fine-tuning a pretrained global ML model (used for weights' initialization, WI) was the top-performing method. Average improvements of mean absolute errors across all assays were 16% and 27% compared with global and local models, respectively. Interestingly, when the effect of training set size was analyzed, WI fine-tuning was found to be successful even in low-data scenarios (e.g., ∼10 molecules per project). Taken together, this work highlights the potential of domain adaptation in the field of molecular property predictions to refine existing pretrained models on a new compound data distribution.


Subject(s)
Deep Learning , Drug Discovery/methods , Drug Design , Machine Learning , Quantitative Structure-Activity Relationship
6.
J Cheminform ; 15(1): 67, 2023 Jul 25.
Article in English | MEDLINE | ID: mdl-37491407

ABSTRACT

Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.

7.
Mol Pharm ; 20(3): 1758-1767, 2023 03 06.
Article in English | MEDLINE | ID: mdl-36745394

ABSTRACT

Machine learning (ML) has become an indispensable tool to predict absorption, distribution, metabolism, and excretion (ADME) properties in pharmaceutical research. ML algorithms are trained on molecular structures and corresponding ADME assay data to develop quantitative structure-property relationship (QSPR) models. Traditional QSPR models were trained on compound sets of limited size. With the advent of more complex ML algorithms and data availability, training sets have become larger and more diverse. Most common training approaches consist in either training a model with a small set of similar compounds, namely, compounds designed for the same drug discovery project or chemical series (local model approach) or with a larger set of diverse compounds (global model approach). Global models are built with all experimental data available for an assay, combining compound data from different projects and disease areas. Despite the ML progress made so far, the choice of the appropriate data composition for building ML models is still unclear. Herein, a systematic evaluation of local and global ML models was performed for 10 different experimental assays and 112 drug discovery projects. Results show a consistent superior performance of global models for ADME property predictions. Diagnostic analyses were also carried out to investigate the influence of training set size, structural diversity, and data shift in the relative performance of local and global ML models. Training set and structural diversity did not have an impact in the relative performance on the methods. Instead, data shift helped to identify the projects with larger performance differences between local and global models. Results presented in this work can be leveraged to improve ML-based ADME properties predictions and thus decision-making in drug discovery projects.


Subject(s)
Drug Discovery , Machine Learning , Drug Discovery/methods , Algorithms , Molecular Structure , Quantitative Structure-Activity Relationship , Pharmaceutical Preparations , Pharmacokinetics
8.
Mol Pharm ; 20(1): 383-394, 2023 01 02.
Article in English | MEDLINE | ID: mdl-36437712

ABSTRACT

In pharmaceutical research, compounds are optimized for metabolic stability to avoid a too fast elimination of the drug. Intrinsic clearance (CLint) measured in liver microsomes or hepatocytes is an important parameter during lead optimization. In this work, machine learning models were developed to relate the compound structure to microsomal metabolic stability and predict CLint for new compounds. A multitask (MT) learning architecture was introduced to model the CLint of six species simultaneously, giving as a result a multispecies machine learning model. MT graph neural network (MT-GNN) regression was identified as the top-performing method, and an ensemble of 10 MT-GNN models was evaluated prospectively. Geometric mean fold errors were consistently smaller than 2-fold. Moreover, high precision values were obtained in the prediction of "high" (>300 µL/min/mg) and "low" (<100 µL/min/mg) CLint compounds. Precision values ranged from 80 to 94% for low CLint predictions and from 75 to 97% for high CLint predictions, depending on the species. Uncertainty on experimental values and model predictions was systematically quantified. Experimental variability (aleatoric uncertainty) of all historical Novartis in vitro clearance experiments was analyzed. Interestingly, MT-GNN models' performance approached assays' experimental variability. Moreover, uncertainty estimation in predictions (epistemic uncertainty) enabled identifying predictions associated with lower and higher error. Taken together, our manuscript combines a multispecies deep learning model and large-scale uncertainty analyses to improve CLint predictions and facilitate early informed decisions for compound prioritization.


Subject(s)
Hepatocytes , Microsomes, Liver , Metabolic Clearance Rate , Uncertainty , Hepatocytes/metabolism , Microsomes, Liver/metabolism , Kinetics
9.
J Cheminform ; 14(1): 82, 2022 Dec 02.
Article in English | MEDLINE | ID: mdl-36461094

ABSTRACT

We report the main conclusions of the first Chemoinformatics and Artificial Intelligence Colloquium, Mexico City, June 15-17, 2022. Fifteen lectures were presented during a virtual public event with speakers from industry, academia, and non-for-profit organizations. Twelve hundred and ninety students and academics from more than 60 countries. During the meeting, applications, challenges, and opportunities in drug discovery, de novo drug design, ADME-Tox (absorption, distribution, metabolism, excretion and toxicity) property predictions, organic chemistry, peptides, and antibiotic resistance were discussed. The program along with the recordings of all sessions are freely available at https://www.difacquim.com/english/events/2022-colloquium/ .

10.
iScience ; 25(10): 105043, 2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36134335

ABSTRACT

Graph neural networks (GNNs) recursively propagate signals along the edges of an input graph, integrate node feature information with graph structure, and learn object representations. Like other deep neural network models, GNNs have notorious black box character. For GNNs, only few approaches are available to rationalize model decisions. We introduce EdgeSHAPer, a generally applicable method for explaining GNN-based models. The approach is devised to assess edge importance for predictions. Therefore, EdgeSHAPer makes use of the Shapley value concept from game theory. For proof-of-concept, EdgeSHAPer is applied to compound activity prediction, a central task in drug discovery. EdgeSHAPer's edge centricity is relevant for molecular graphs where edges represent chemical bonds. Combined with feature mapping, EdgeSHAPer produces intuitive explanations for compound activity predictions. Compared to a popular node-centric and another edge-centric GNN explanation method, EdgeSHAPer reveals higher resolution in differentiating features determining predictions and identifies minimal pertinent positive feature sets.

11.
J Chem Inf Model ; 62(13): 3180-3190, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35738004

ABSTRACT

Assessing whether compounds penetrate the brain can become critical in drug discovery, either to prevent adverse events or to reach the biological target. Generally, pre-clinical in vivo studies measuring the ratio of brain and blood concentrations (Kp) are required to estimate the brain penetration potential of a new drug entity. In this work, we developed machine learning models to predict in vivo compound brain penetration (as LogKp) from chemical structure. Our results show the benefit of including in vitro experimental data as auxiliary tasks in multi-task graph neural network (MT-GNN) models. MT-GNNs outperformed single-task (ST) models solely trained on in vivo brain penetration data. The best-performing MT-GNN regression model achieved a coefficient of determination of 0.42 and a mean absolute error of 0.39 (2.5-fold) on a prospective validation set and outperformed all tested ST models. To facilitate decision-making, compounds were classified into brain-penetrant or non-penetrant, achieving a Matthew's correlation coefficient of 0.66. Taken together, our findings indicate that the inclusion of in vitro assay data as MT-GNN auxiliary tasks improves in vivo brain penetration predictions and prospective compound prioritization.


Subject(s)
Machine Learning , Neural Networks, Computer , Brain , Drug Discovery
12.
Annu Rev Biomed Data Sci ; 5: 43-65, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35440144

ABSTRACT

In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.


Subject(s)
Cheminformatics , Chemistry, Pharmaceutical , Algorithms , Chemistry, Pharmaceutical/methods , Machine Learning
13.
J Comput Aided Mol Des ; 36(5): 355-362, 2022 05.
Article in English | MEDLINE | ID: mdl-35304657

ABSTRACT

The support vector machine (SVM) algorithm is one of the most widely used machine learning (ML) methods for predicting active compounds and molecular properties. In chemoinformatics and drug discovery, SVM has been a state-of-the-art ML approach for more than a decade. A unique attribute of SVM is that it operates in feature spaces of increasing dimensionality. Hence, SVM conceptually departs from the paradigm of low dimensionality that applies to many other methods for chemical space navigation. The SVM approach is applicable to compound classification, and ranking, multi-class predictions, and -in algorithmically modified form- regression modeling. In the emerging era of deep learning (DL), SVM retains its relevance as one of the premier ML methods in chemoinformatics, for reasons discussed herein. We describe the SVM methodology including strengths and weaknesses and discuss selected applications that have contributed to the evolution of SVM as a premier approach for compound classification, property predictions, and virtual compound screening.


Subject(s)
Cheminformatics , Support Vector Machine , Algorithms , Drug Discovery
14.
J Med Chem ; 64(24): 17744-17752, 2021 12 23.
Article in English | MEDLINE | ID: mdl-34902252

ABSTRACT

The prediction of compound properties from chemical structure is a main task for machine learning (ML) in medicinal chemistry. ML is often applied to large data sets in applications such as compound screening, virtual library enumeration, or generative chemistry. Albeit desirable, a detailed understanding of ML model decisions is typically not required in these cases. By contrast, compound optimization efforts rely on small data sets to identify structural modifications leading to desired property profiles. In this situation, if ML is applied, one usually is reluctant to make decisions based on predictions that cannot be rationalized. Only few ML methods are interpretable. However, to yield insights into complex ML model decisions, explanatory approaches can be applied. Herein, methodologies for better understanding of ML models or explaining individual predictions are reviewed and current challenges in integrating ML into medicinal chemistry programs as well as future opportunities are discussed.


Subject(s)
Drug Discovery/methods , Machine Learning , Algorithms , Chemistry, Pharmaceutical , Libraries, Digital
15.
ACS Omega ; 6(49): 33293-33299, 2021 Dec 14.
Article in English | MEDLINE | ID: mdl-34926881

ABSTRACT

As in other areas, artificial intelligence (AI) is heavily promoted in different scientific fields, including chemistry. Although chemistry traditionally tends to be a conservative field and slower than others to adapt new concepts, AI is increasingly being investigated across chemical disciplines. In medicinal chemistry, supported by computer-aided drug design and cheminformatics, computational methods have long been employed to aid in the search for and optimization of active compounds. We are currently witnessing a multitude of AI-related publications in the medicinal-chemistry-relevant literature and anticipate that the numbers will further increase. Often, advances through AI promoted in such reports are difficult to reconcile or remain questionable, which hampers the acceptance of computational work in interdisciplinary environments. Herein we attempt to highlight selected investigations in which AI has shown promise to impact medicinal chemistry in areas such as compound design and synthesis.

16.
Sci Rep ; 11(1): 14245, 2021 07 09.
Article in English | MEDLINE | ID: mdl-34244588

ABSTRACT

Machine learning is widely applied in drug discovery research to predict molecular properties and aid in the identification of active compounds. Herein, we introduce a new approach that uses model-internal information from compound activity predictions to uncover relationships between target proteins. On the basis of a large-scale analysis generating and comparing machine learning models for more than 200 proteins, feature importance correlation analysis is shown to detect similar compound binding characteristics. Furthermore, rather unexpectedly, the analysis also reveals functional relationships between proteins that are independent of active compounds and binding characteristics. Feature importance correlation analysis does not depend on specific representations, algorithms, or metrics and is generally applicable as long as predictive models can be derived. Moreover, the approach does not require or involve explainable or interpretable machine learning, but only access to feature weights or importance values. On the basis of our findings, the approach represents a new facet of machine learning in drug discovery with potential for practical applications.

17.
ACS Omega ; 6(5): 4080-4089, 2021 Feb 09.
Article in English | MEDLINE | ID: mdl-33585783

ABSTRACT

Carbonic anhydrases (CAs) catalyze the physiological hydration of carbon dioxide and are among the most intensely studied pharmaceutical target enzymes. A hallmark of CA inhibition is the complexation of the catalytic zinc cation in the active site. Human (h) CA isoforms belonging to different families are implicated in a wide range of diseases and of very high interest for therapeutic intervention. Given the conserved catalytic mechanisms and high similarity of many hCA isoforms, a major challenge for CA-based therapy is achieving inhibitor selectivity for hCA isoforms that are associated with specific pathologies over other widely distributed isoforms such as hCA I or hCA II that are of critical relevance for the integrity of many physiological processes. To address this challenge, we have attempted to predict compounds that are selective for isoform hCA IX, which is a tumor-associated protein and implicated in metastasis, over hCA II on the basis of a carefully curated data set of selective and nonselective inhibitors. Machine learning achieved surprisingly high accuracy in predicting hCA IX-selective inhibitors. The results were further investigated, and compound features determining successful predictions were identified. These features were then studied on the basis of X-ray structures of hCA isoform-inhibitor complexes and found to include substructures that explain compound selectivity. Our findings lend credence to selectivity predictions and indicate that the machine learning models derived herein have considerable potential to aid in the identification of new hCA IX-selective compounds.

18.
J Comput Aided Mol Des ; 35(3): 285-295, 2021 03.
Article in English | MEDLINE | ID: mdl-33598870

ABSTRACT

Machine learning (ML) enables modeling of quantitative structure-activity relationships (QSAR) and compound potency predictions. Recently, multi-target QSAR models have been gaining increasing attention. Simultaneous compound potency predictions for multiple targets can be carried out using ensembles of independently derived target-based QSAR models or in a more integrated and advanced manner using multi-target deep neural networks (MT-DNNs). Herein, single-target and multi-target ML models were systematically compared on a large scale in compound potency value predictions for 270 human targets. By design, this large-magnitude evaluation has been a special feature of our study. To these ends, MT-DNN, single-target DNN (ST-DNN), support vector regression (SVR), and random forest regression (RFR) models were implemented. Different test systems were defined to benchmark these ML methods under conditions of varying complexity. Source compounds were divided into training and test sets in a compound- or analog series-based manner taking target information into account. Data partitioning approaches used for model training and evaluation were shown to influence the relative performance of ML methods, especially for the most challenging compound data sets. For example, the performance of MT-DNNs with per-target models yielded superior performance compared to single-target models. For a test compound or its analogs, the availability of potency measurements for multiple targets affected model performance, revealing the influence of ML synergies.


Subject(s)
Drug Evaluation, Preclinical/methods , Machine Learning , Neural Networks, Computer , Quantitative Structure-Activity Relationship , Regression Analysis
19.
J Comput Aided Mol Des ; 34(10): 1013-1026, 2020 10.
Article in English | MEDLINE | ID: mdl-32361862

ABSTRACT

Difficulties in interpreting machine learning (ML) models and their predictions limit the practical applicability of and confidence in ML in pharmaceutical research. There is a need for agnostic approaches aiding in the interpretation of ML models regardless of their complexity that is also applicable to deep neural network (DNN) architectures and model ensembles. To these ends, the SHapley Additive exPlanations (SHAP) methodology has recently been introduced. The SHAP approach enables the identification and prioritization of features that determine compound classification and activity prediction using any ML model. Herein, we further extend the evaluation of the SHAP methodology by investigating a variant for exact calculation of Shapley values for decision tree methods and systematically compare this variant in compound activity and potency value predictions with the model-independent SHAP method. Moreover, new applications of the SHAP analysis approach are presented including interpretation of DNN models for the generation of multi-target activity profiles and ensemble regression models for potency prediction.


Subject(s)
Drug Discovery , Machine Learning , Neural Networks, Computer , Pharmaceutical Preparations/standards , Humans , Models, Molecular , Pharmaceutical Preparations/metabolism , Structure-Activity Relationship , Therapeutic Equivalency
20.
J Cheminform ; 12(1): 36, 2020 May 24.
Article in English | MEDLINE | ID: mdl-33431025

ABSTRACT

For kinase inhibitors, X-ray crystallography has revealed different types of binding modes. Currently, more than 2000 kinase inhibitors with known binding modes are available, which makes it possible to derive and test machine learning models for the prediction of inhibitors with different binding modes. We have addressed this prediction task to evaluate and compare the information content of distinct molecular representations including protein-ligand interaction fingerprints (IFPs) and compound structure-based structural fingerprints (i.e., atom environment/fragment fingerprints). IFPs were designed to capture binding mode-specific interaction patterns at different resolution levels. Accurate predictions of kinase inhibitor binding modes were achieved with random forests using both representations. The performance of IFPs was consistently superior to atom environment fingerprints, albeit only by less than 10%. An active learning strategy applying information entropy-based selection of training instances was applied as a diagnostic approach to assess the relative information content of distinct representations. IFPs were found to capture more binding mode-relevant information than atom environment fingerprints, leading to highly predictive models even when training instances were randomly selected. By contrast, for atom environment fingerprints, the derivation of accurate models via active learning depended on entropy-based selection of informative training compounds. Notably, higher information content of IFPs confirmed by active learning only resulted in small improvements in global prediction accuracy compared to models derived using atom environment fingerprints. For practical applications, prediction of binding modes of new kinase inhibitors on the basis of chemical structure is highly attractive.

SELECTION OF CITATIONS
SEARCH DETAIL
...