Results 1 - 20 of 49
1.
bioRxiv ; 2024 Sep 10.
Article in English | MEDLINE | ID: mdl-39314436

ABSTRACT

Recent advances in machine learning (ML) are reshaping drug discovery. Structure-based ML methods use physically-inspired models to predict binding affinities from protein:ligand complexes. These methods promise to enable the integration of data for many related targets, which addresses issues related to data scarcity for single targets and could enable generalizable predictions for a broad range of targets, including mutants. In this work, we report our experiences in building KinoML, a novel framework for ML in target-based small molecule drug discovery with an emphasis on structure-enabled methods. KinoML focuses currently on kinases as the relative structural conservation of this protein superfamily, particularly in the kinase domain, means it is possible to leverage data from the entire superfamily to make structure-informed predictions about binding affinities, selectivities, and drug resistance. Some key lessons learned in building KinoML include: the importance of reproducible data collection and deposition, the harmonization of molecular data and featurization, and the choice of the right data format to ensure reusability and reproducibility of ML models. As a result, KinoML allows users to easily achieve three tasks: accessing and curating molecular data; featurizing this data with representations suitable for ML applications; and running reproducible ML experiments that require access to ligand, protein, and assay information to predict ligand affinity. Despite KinoML focusing on kinases, this framework can be applied to other proteins. The lessons reported here can help guide the development of platforms for structure-enabled ML in other areas of drug discovery.

2.
J Chem Inf Model ; 64(16): 6259-6280, 2024 Aug 26.
Article in English | MEDLINE | ID: mdl-39136669

ABSTRACT

Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pretraining data, optimal architecture selections, and promising pretraining objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.


Subject(s)
Machine Learning , Drug Discovery/methods , Deep Learning
3.
Front Toxicol ; 6: 1401036, 2024.
Article in English | MEDLINE | ID: mdl-39086553

ABSTRACT

The cell painting (CP) assay has emerged as a potent imaging-based high-throughput phenotypic profiling (HTPP) tool that provides comprehensive input data for in silico prediction of compound activities and potential hazards in drug discovery and toxicology. CP enables the rapid, multiplexed investigation of various molecular mechanisms for thousands of compounds at the single-cell level. The resulting large volumes of image data provide great opportunities but also pose challenges to image and data analysis routines as well as property prediction models. This review addresses the integration of CP-based phenotypic data, together with or in place of structural information from compounds, into machine learning (ML) and deep learning (DL) models to predict compound activities for various human-relevant disease endpoints and to identify the underlying modes-of-action (MoA) while avoiding unnecessary animal testing. The successful application of CP in combination with powerful ML/DL models promises further advances in understanding compound responses of cells, guiding therapeutic development and risk assessment. Therefore, this review highlights the importance of unlocking the potential of CP assays when combined with molecular fingerprints for compound evaluation and discusses the current challenges that are associated with this approach.

4.
J Chem Inf Model ; 64(10): 4009-4020, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38751014

ABSTRACT

Drug discovery pipelines nowadays rely on machine learning models to explore and evaluate large chemical spaces. While including 3D structural information is considered beneficial, structural models are hindered by the availability of protein-ligand complex structures. Exemplified for kinase drug discovery, we address this issue by generating kinase-ligand complex data using template docking for the kinase compound subset of available ChEMBL assay data. To evaluate the benefit of the created complex data, we use it to train a structure-based E(3)-invariant graph neural network. Our evaluation shows that binding affinities can be predicted with significantly higher precision by models that take synthetic binding poses into account compared to ligand- or drug-target interaction models alone.


Subject(s)
Machine Learning , Molecular Docking Simulation , Ligands , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/metabolism , Neural Networks, Computer , Protein Kinases/metabolism , Protein Kinases/chemistry , Drug Discovery/methods , Protein Binding , Protein Conformation , Phosphotransferases/metabolism , Phosphotransferases/chemistry , Phosphotransferases/antagonists & inhibitors
5.
Sci Rep ; 14(1): 12303, 2024 05 29.
Article in English | MEDLINE | ID: mdl-38811639

ABSTRACT

The application of machine learning (ML) to solve real-world problems bears not only great potential but also high risk. One fundamental challenge in risk mitigation is to ensure the reliability of the ML predictions, i.e., the model error should be minimized, and the prediction uncertainty should be estimated. Especially for medical applications, the importance of reliable predictions cannot be overstated. Here, we address this challenge for anti-cancer drug sensitivity prediction and prioritization. To this end, we present a novel drug sensitivity prediction and prioritization approach guaranteeing user-specified certainty levels. The developed conformal prediction approach is applicable to classification, regression, and simultaneous regression and classification. Additionally, we propose a novel drug sensitivity measure that is based on clinically relevant drug concentrations and enables a straightforward prioritization of drugs for a given cancer sample.


Subject(s)
Antineoplastic Agents , Machine Learning , Neoplasms , Antineoplastic Agents/therapeutic use , Antineoplastic Agents/pharmacology , Humans , Neoplasms/drug therapy , Reproducibility of Results
6.
Chem Commun (Camb) ; 60(7): 870-873, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38164786

ABSTRACT

Herein, we present the first application of target-directed dynamic combinatorial chemistry (tdDCC) to the whole complex of the highly dynamic transmembrane, energy-coupling factor (ECF) transporter ECF-PanT in Streptococcus pneumoniae. In addition, we successfully employed the tdDCC technique as a hit-identification and -optimization strategy that led to the identification of optimized ECF inhibitors with improved activity. We characterized the best compounds regarding cytotoxicity and performed computational modeling studies on the crystal structure of ECF-PanT to rationalize their binding mode. Notably, docking studies showed that the acylhydrazone linker is able to maintain the crucial interactions.


Subject(s)
Bacterial Proteins , Streptococcus pneumoniae , Models, Molecular , Bacterial Proteins/chemistry
7.
Nat Rev Drug Discov ; 22(11): 895-916, 2023 11.
Article in English | MEDLINE | ID: mdl-37697042

ABSTRACT

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.


Subject(s)
Artificial Intelligence , Biological Products , Humans , Algorithms , Machine Learning , Drug Discovery , Drug Design , Biological Products/pharmacology
8.
bioRxiv ; 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37745489

ABSTRACT

In recent years, machine learning has transformed many aspects of the drug discovery process, including small molecule design, for which the prediction of bioactivity is an integral part. Leveraging structural information about the interactions between a small molecule and its protein target has great potential for downstream machine learning scoring approaches, but is fundamentally limited by the accuracy with which protein:ligand complex structures can be predicted in a reliable and automated fashion. With the goal of finding practical approaches to generating useful kinase:inhibitor complex geometries for downstream machine learning scoring approaches, we present a kinase-centric docking benchmark assessing the performance of different classes of docking and pose selection strategies to assess how well experimentally observed binding modes are recapitulated in a realistic cross-docking scenario. The assembled benchmark data set focuses on the well-studied protein kinase family and comprises a subset of 589 protein structures co-crystallized with 423 ATP-competitive ligands. We find that the docking methods biased by the co-crystallized ligand (utilizing shape overlap with or without maximum common substructure matching) are more successful in recovering binding poses than standard physics-based docking alone. Also, docking into multiple structures significantly increases the chance to generate a low RMSD docking pose. Docking utilizing an approach that combines all three methods (Posit) into structures with the most similar co-crystallized ligands according to shape and electrostatics proved to be the most efficient way to reproduce binding poses, achieving a success rate of 66.9% across all included systems. The studied docking and pose selection strategies (which utilize the OpenEye Toolkit) were implemented into pipelines of the KinoML framework, allowing automated and reliable protein:ligand complex generation for future downstream machine learning tasks.
Although focused on protein kinases, we believe the general findings can also be transferred to other protein families.
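
The success-rate figure in the abstract above rests on comparing docked poses with the experimentally observed binding mode by RMSD. The following is a minimal sketch of that kind of evaluation, not the benchmark's actual code. It assumes that poses are lists of (x, y, z) heavy-atom coordinates in a shared reference frame with matched atom order, that no symmetry correction is needed, and that a pose counts as a success at or below 2.0 Å. That cutoff is a common convention and is an assumption here, since the abstract does not state one. All function names are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """In-place RMSD between two poses with identical atom ordering (no superposition)."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b)
    )
    return math.sqrt(squared / len(pose_a))


def best_rmsd_over_structures(reference, docked_poses):
    """Lowest RMSD among poses obtained by docking into several structures."""
    return min(rmsd(reference, pose) for pose in docked_poses)


def success_rate(best_rmsds, threshold=2.0):
    """Fraction of systems whose best pose lies within the (assumed) threshold in Å."""
    if not best_rmsds:
        raise ValueError("need at least one system")
    return sum(r <= threshold for r in best_rmsds) / len(best_rmsds)
```

Taking the minimum over several docked poses mirrors the abstract's observation that docking into multiple structures raises the chance of obtaining a low RMSD pose. A real evaluation would also need to handle symmetry-equivalent atoms.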

9.
Nature ; 615(7954): 913-919, 2023 03.
Article in English | MEDLINE | ID: mdl-36922589

ABSTRACT

Chromatin-binding proteins are critical regulators of cell state in haematopoiesis [1,2]. Acute leukaemias driven by rearrangement of the mixed lineage leukaemia 1 gene (KMT2Ar) or mutation of the nucleophosmin gene (NPM1) require the chromatin adapter protein menin, encoded by the MEN1 gene, to sustain aberrant leukaemogenic gene expression programs [3-5]. In a phase 1 first-in-human clinical trial, the menin inhibitor revumenib, which is designed to disrupt the menin-MLL1 interaction, induced clinical responses in patients with leukaemia with KMT2Ar or mutated NPM1 (ref. 6). Here we identified somatic mutations in MEN1 at the revumenib-menin interface in patients with acquired resistance to menin inhibition. Consistent with the genetic data in patients, inhibitor-menin interface mutations represent a conserved mechanism of therapeutic resistance in xenograft models and in an unbiased base-editor screen. These mutants attenuate drug-target binding by generating structural perturbations that impact small-molecule binding but not the interaction with the natural ligand MLL1, and prevent inhibitor-induced eviction of menin and MLL1 from chromatin. To our knowledge, this study is the first to demonstrate that a chromatin-targeting therapeutic drug exerts sufficient selection pressure in patients to drive the evolution of escape mutants that lead to sustained chromatin occupancy, suggesting a common mechanism of therapeutic resistance.


Subject(s)
Drug Resistance, Neoplasm , Leukemia , Mutation , Proto-Oncogene Proteins , Animals , Humans , Antineoplastic Agents/chemistry , Antineoplastic Agents/metabolism , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Binding Sites/drug effects , Binding Sites/genetics , Chromatin/genetics , Chromatin/metabolism , Drug Resistance, Neoplasm/genetics , Leukemia/drug therapy , Leukemia/genetics , Leukemia/metabolism , Protein Binding/drug effects , Proto-Oncogene Proteins/antagonists & inhibitors , Proto-Oncogene Proteins/chemistry , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism
10.
PLoS One ; 18(2): e0278325, 2023.
Article in English | MEDLINE | ID: mdl-36745631

ABSTRACT

Microglia are the immune effector cells of the central nervous system (CNS) and react to pathologic events with a complex process including the release of nitric oxide (NO). NO is a free radical, which is toxic for all cells at high concentrations. To target an exaggerated NO release, we tested a library of 16 544 chemical compounds for their effect on lipopolysaccharide (LPS)-induced NO release in cell line and primary neonatal microglia. We identified a compound (C1) which significantly reduced NO release in a dose-dependent manner, with a low IC50 (252 nM) and no toxic side effects in vitro or in vivo. Target finding strategies such as in silico modelling and mass spectroscopy hint towards a direct interaction between C1 and the nitric oxide synthase making C1 a great candidate for specific intra-cellular interaction with the NO producing machinery.


Subject(s)
Microglia , Nitric Oxide , Infant, Newborn , Humans , Microglia/metabolism , Nitric Oxide/metabolism , Neuroinflammatory Diseases , Nitric Oxide Synthase Type II/metabolism , Cell Line , Lipopolysaccharides/pharmacology , Lipopolysaccharides/metabolism
11.
Nat Rev Chem ; 6(4): 287-295, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35783295

ABSTRACT

One aspirational goal of computational chemistry is to predict potent and drug-like binders for any protein, such that only those that bind are synthesized. In this Roadmap, we describe the launch of Critical Assessment of Computational Hit-finding Experiments (CACHE), a public benchmarking project to compare and improve small molecule hit-finding algorithms through cycles of prediction and experimental testing. Participants will predict small molecule binders for new and biologically relevant protein targets representing different prediction scenarios. Predicted compounds will be tested rigorously in an experimental hub, and all predicted binders as well as all experimental screening data, including the chemical structures of experimentally tested compounds, will be made publicly available, and not subject to any intellectual property restrictions. The ability of a range of computational approaches to find novel binders will be evaluated, compared, and openly published. CACHE will launch 3 new benchmarking exercises every year. The outcomes will be better prediction methods, new small molecule binders for target proteins of importance for fundamental biology or drug discovery, and a major technological step towards achieving the goal of Target 2035, a global initiative to identify pharmacological probes for all human proteins.

12.
Sci Rep ; 12(1): 7244, 2022 05 04.
Article in English | MEDLINE | ID: mdl-35508546

ABSTRACT

Machine learning models are widely applied to predict molecular properties or the biological activity of small molecules on a specific protein. Models can be integrated in a conformal prediction (CP) framework which adds a calibration step to estimate the confidence of the predictions. CP models present the advantage of ensuring a predefined error rate under the assumption that test and calibration set are exchangeable. In cases where the test data have drifted away from the descriptor space of the training data, or where assay setups have changed, this assumption might not be fulfilled and the models are not guaranteed to be valid. In this study, the performance of internally valid CP models when applied to either newer time-split data or to external data was evaluated. In detail, temporal data drifts were analysed based on twelve datasets from the ChEMBL database. In addition, discrepancies between models trained on publicly-available data and applied to proprietary data for the liver toxicity and MNT in vivo endpoints were investigated. In most cases, a drastic decrease in the validity of the models was observed when applied to the time-split or external (holdout) test sets. To overcome the decrease in model validity, a strategy for updating the calibration set with data more similar to the holdout set was investigated. Updating the calibration set generally improved the validity, restoring it completely to its expected value in many cases. The restored validity is the first requisite for applying the CP models with confidence. However, the increased validity comes at the cost of a decrease in model efficiency, as more predictions are identified as inconclusive. This study presents a strategy to recalibrate CP models to mitigate the effects of data drifts. Updating the calibration sets without having to retrain the model has proven to be a useful approach to restore the validity of most models.
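
The calibration step and the recalibration strategy described above can be illustrated with a small class-conditional (Mondrian) inductive conformal classifier. This is a minimal sketch under stated assumptions, not the study's implementation: the nonconformity score is taken to be one minus the predicted probability of a label, efficiency is taken to be the fraction of single-label prediction sets, and all function names are hypothetical. The underlying model is assumed to be any classifier that returns a dict mapping each label to a probability.

```python
def nonconformity(prob_of_label):
    """Assumed nonconformity measure: low probability means high strangeness."""
    return 1.0 - prob_of_label


def calibrate(cal_probs, cal_labels):
    """Collect per-class nonconformity scores of the true labels in the calibration set."""
    scores = {}
    for probs, label in zip(cal_probs, cal_labels):
        scores.setdefault(label, []).append(nonconformity(probs[label]))
    return scores


def p_value(cal_scores, score):
    """Share of calibration scores at least as nonconforming as the test score."""
    greater_equal = sum(s >= score for s in cal_scores)
    return (greater_equal + 1) / (len(cal_scores) + 1)


def predict_set(scores_by_class, probs, significance):
    """Labels that cannot be rejected at the chosen significance level."""
    return {
        label
        for label, cal_scores in scores_by_class.items()
        if p_value(cal_scores, nonconformity(probs[label])) > significance
    }


def validity(pred_sets, true_labels):
    """Fraction of prediction sets that contain the true label."""
    return sum(y in s for s, y in zip(pred_sets, true_labels)) / len(true_labels)


def efficiency(pred_sets):
    """Fraction of prediction sets with exactly one label (assumed definition)."""
    return sum(len(s) == 1 for s in pred_sets) / len(pred_sets)
```

Under these assumptions, recalibrating amounts to calling `calibrate` again with the same model's probabilities on newer data and reusing `predict_set` unchanged, so the model itself is never retrained. Comparing `validity` on a holdout set with the expected value of one minus the significance level is one way to spot the drift the abstract describes.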


Subject(s)
Biological Assay , Machine Learning , Calibration , Molecular Conformation
13.
Nucleic Acids Res ; 50(W1): W753-W760, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524571

ABSTRACT

Computational pipelines have become a crucial part of modern drug discovery campaigns. Setting up and maintaining such pipelines, however, can be challenging and time-consuming, especially for novice scientists in this domain. TeachOpenCADD is a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects. We offer Python-based solutions for common tasks in cheminformatics and structural bioinformatics in the form of Jupyter notebooks, based on open source resources only. Including the 12 newly released additions, TeachOpenCADD now contains 22 notebooks that cover both theoretical background as well as hands-on programming. To promote reproducible and reusable research, we apply software best practices to our notebooks such as testing with automated continuous integration and adhering to the idiomatic Python style. The new TeachOpenCADD website is available at https://projects.volkamerlab.org/teachopencadd and all code is deposited on GitHub.


Subject(s)
Cheminformatics , Software , Computational Biology , Drug Discovery
14.
J Chem Inf Model ; 62(10): 2600-2616, 2022 05 23.
Article in English | MEDLINE | ID: mdl-35536589

ABSTRACT

Protein kinases are among the most important drug targets because their dysregulation can cause cancer, inflammatory and degenerative diseases, and many more. Developing selective inhibitors is challenging due to the highly conserved binding sites across the roughly 500 human kinases. Thus, detecting subtle similarities on a structural level can help explain and predict off-targets among the kinase family. Here, we present the kinase-focused, subpocket-enhanced KiSSim fingerprint (Kinase Structural Similarity). The fingerprint builds on the KLIFS pocket definition, composed of 85 residues aligned across all available protein kinase structures, which enables residue-by-residue comparison without a computationally expensive alignment. The residues' physicochemical and spatial properties are encoded within their structural context including key subpockets at the hinge region, the DFG motif, and the front pocket. Since structure was found to contain information complementary to sequence, we used the fingerprint to calculate all-against-all similarities within the structurally covered kinome. We could identify off-targets that are unexpected if solely considering the sequence-based kinome tree grouping; for example, Erlotinib's known kinase off-targets SLK and LOK show high similarities to the key target EGFR (TK group), although belonging to the STE group. KiSSim reflects profiling data better or at least as well as other approaches such as KLIFS pocket sequence identity, KLIFS interaction fingerprints (IFPs), or SiteAlign. To rationalize observed (dis)similarities, the fingerprint values can be visualized in 3D by coloring structures with residue and feature resolution. We believe that the KiSSim fingerprint is a valuable addition to the kinase research toolbox to guide off-target and polypharmacology prediction. The method is distributed as an open-source Python package on GitHub and as a conda package: https://github.com/volkamerlab/kissim.


Subject(s)
Protein Kinase Inhibitors , Protein Kinases , Binding Sites , Humans , Ligands , Polypharmacology , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Protein Kinases/metabolism
15.
Environ Int ; 158: 106947, 2022 01.
Article in English | MEDLINE | ID: mdl-34717173

ABSTRACT

BACKGROUND: Exposure to environmental chemicals that interfere with normal estrogen function can lead to adverse health effects, including cancer. High-throughput screening (HTS) approaches facilitate the efficient identification and characterization of such substances. OBJECTIVES: We recently described the development of the E-Morph Assay, which measures changes at adherens junctions as a clinically-relevant phenotypic readout for estrogen receptor (ER) alpha signaling activity. Here, we describe its further development and application for automated robotic HTS. METHODS: Using the advanced E-Morph Screening Assay, we screened a substance library comprising 430 toxicologically-relevant industrial chemicals, biocides, and plant protection products to identify novel substances with estrogenic activities. Based on the primary screening data and the publicly available ToxCast dataset, we performed an in silico similarity search to identify further substances with potential estrogenic activity for follow-up hit expansion screening, and built seven in silico ER models using the conformal prediction (CP) framework to evaluate the HTS results. RESULTS: The primary and hit confirmation screens identified 27 'known' estrogenic substances with potencies correlating very well with the published ToxCast ER Agonist Score (r=+0.95). We additionally detected potential 'novel' estrogenic activities for 10 primary hit substances and for another nine out of 20 structurally similar substances from in silico predictions and follow-up hit expansion screening. The concordance of the E-Morph Screening Assay with the ToxCast ER reference data and the generated CP ER models was 71% and 73%, respectively, with a high predictivity for ER active substances of up to 87%, which is particularly important for regulatory purposes.
DISCUSSION: These data provide a proof-of-concept for the combination of in vitro HTS approaches with in silico methods (similarity search, CP models) for efficient analysis of large substance libraries in order to prioritize substances with potential estrogenic activity for subsequent testing against higher tier human endpoints.


Subject(s)
Endocrine Disruptors , Biological Assay , Estrogens/toxicity , Estrone , High-Throughput Screening Assays , Humans
16.
J Chem Inf Model ; 61(7): 3255-3272, 2021 07 26.
Article in English | MEDLINE | ID: mdl-34153183

ABSTRACT

Computational methods such as machine learning approaches have a strong track record of success in predicting the outcomes of in vitro assays. In contrast, their ability to predict in vivo endpoints is more limited due to the high number of parameters and processes that may influence the outcome. Recent studies have shown that the combination of chemical and biological data can yield better models for in vivo endpoints. The ChemBioSim approach presented in this work aims to enhance the performance of conformal prediction models for in vivo endpoints by combining chemical information with (predicted) bioactivity assay outcomes. Three in vivo toxicological endpoints, capturing genotoxic (MNT), hepatic (DILI), and cardiological (DICC) issues, were selected for this study due to their high relevance for the registration and authorization of new compounds. Since the sparsity of available biological assay data is challenging for predictive modeling, predicted bioactivity descriptors were introduced instead. Thus, a machine learning model for each of the 373 collected biological assays was trained and applied on the compounds of the in vivo toxicity data sets. Besides the chemical descriptors (molecular fingerprints and physicochemical properties), these predicted bioactivities served as descriptors for the models of the three in vivo endpoints. For this study, a workflow based on a conformal prediction framework (a method for confidence estimation) built on random forest models was developed. Furthermore, the most relevant chemical and bioactivity descriptors for each in vivo endpoint were preselected with lasso models. The incorporation of bioactivity descriptors increased the mean F1 scores of the MNT model from 0.61 to 0.70 and for the DICC model from 0.72 to 0.82 while the mean efficiencies increased by roughly 0.10 for both endpoints. In contrast, for the DILI endpoint, no significant improvement in model performance was observed. 
Besides pure performance improvements, an analysis of the most important bioactivity features allowed detection of novel and less intuitive relationships between the predicted biological assay outcomes used as descriptors and the in vivo endpoints. This study presents how the prediction of in vivo toxicity endpoints can be improved by the incorporation of biological information (which is not necessarily captured by chemical descriptors) in an automated workflow without the need for adding experimental workload for the generation of bioactivity descriptors, as predicted outcomes of bioactivity assays were utilized. All bioactivity CP models for deriving the predicted bioactivities, as well as the in vivo toxicity CP models, can be freely downloaded from https://doi.org/10.5281/zenodo.4761225.


Subject(s)
Liver , Machine Learning , Biological Assay , Molecular Conformation
17.
Arch Pharm (Weinheim) ; 354(9): e2100123, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34008218

ABSTRACT

The bioactive components of Garcinia indica, garcinol (camboginol), and isogarcinol (cambogin), are suitable drug candidates for the treatment of various human diseases. HIV-1-RNase H assay was used to study the RNase H inhibition by garcinol and isogarcinol. Docking of garcinol into the active site of the enzyme was carried out to rationalize the difference in activities between the two compounds. Garcinol showed higher HIV-1-RNase H inhibition than the known inhibitor RDS1759 and retained full potency against the RNase H of a drug-resistant HIV-1 reverse transcriptase form. Isogarcinol was distinctly less active than garcinol, indicating the importance of the enolizable β-diketone moiety of garcinol for anti-RNase H activity. Docking calculations confirmed these findings and suggested this moiety to be involved in the chelation of metal ions of the active site. On the basis of its HIV-1 reverse transcriptase-associated RNase H inhibitory activity, garcinol is worth being further explored concerning its potential as a cost-effective treatment for HIV patients.


Subject(s)
Garcinia/chemistry , Reverse Transcriptase Inhibitors/pharmacology , Ribonuclease H, Human Immunodeficiency Virus/antagonists & inhibitors , Terpenes/pharmacology , HIV Infections/drug therapy , HIV Infections/virology , HIV-1/drug effects , HIV-1/enzymology , Molecular Docking Simulation , Reverse Transcriptase Inhibitors/isolation & purification , Terpenes/isolation & purification
18.
Int J Mol Sci ; 22(9)2021 Apr 23.
Article in English | MEDLINE | ID: mdl-33922714

ABSTRACT

Drug discovery is a cost- and time-intensive process that is often assisted by computational methods, such as virtual screening, to speed up and guide the design of new compounds. For many years, machine learning methods have been successfully applied in the context of computer-aided drug discovery. Recently, thanks to the rise of novel technologies as well as the increasing amount of available chemical and bioactivity data, deep learning has had a tremendous impact on rational active compound discovery. Herein, recent applications and developments of machine learning, with a focus on deep learning, in virtual screening for active compound design are reviewed. This includes introducing different compound and protein encodings, deep learning techniques as well as frequently used bioactivity and benchmark data sets for model training and testing. Finally, the present state of the art, including the current challenges and emerging problems, is examined and discussed.


Subject(s)
Computational Biology/methods , Deep Learning , Drug Design , Drug Discovery/methods , Neural Networks, Computer , Proteins/chemistry , Humans , Technology, Pharmaceutical
19.
J Cheminform ; 13(1): 35, 2021 Apr 29.
Article in English | MEDLINE | ID: mdl-33926567

ABSTRACT

Machine learning methods are widely used in drug discovery and toxicity prediction. While showing overall good performance in cross-validation studies, their predictive power (often) drops in cases where the query samples have drifted from the training data's descriptor space. Thus, the assumption for applying machine learning algorithms, that training and test data stem from the same distribution, might not always be fulfilled. In this work, conformal prediction is used to assess the calibration of the models. Deviations from the expected error may indicate that training and test data originate from different distributions. Exemplified on the Tox21 datasets, composed of chronologically released Tox21Train, Tox21Test and Tox21Score subsets, we observed that while internally valid models could be trained using cross-validation on Tox21Train, predictions on the external Tox21Score data resulted in higher error rates than expected. To improve the prediction on the external sets, a strategy exchanging the calibration set with more recent data, such as Tox21Test, has successfully been introduced. We conclude that conformal prediction can be used to diagnose data drifts and other issues related to model calibration. The proposed improvement strategy, exchanging the calibration data only, is convenient as it does not require retraining of the underlying model.

20.
Int J Mol Sci ; 22(5)2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33668139

ABSTRACT

New 2-(thien-2-yl)-acrylonitriles with putative kinase inhibitory activity were prepared and tested for their antineoplastic efficacy in hepatoma models. Four out of the 14 derivatives were shown to inhibit hepatoma cell proliferation at (sub-)micromolar concentrations with IC50 values below that of the clinically relevant multikinase inhibitor sorafenib, which served as a reference. Colony formation assays as well as primary in vivo examinations of hepatoma tumors grown on the chorioallantoic membrane of fertilized chicken eggs (CAM assay) confirmed the excellent antineoplastic efficacy of the new derivatives. Their mode of action included an induction of apoptotic caspase-3 activity, while no contribution of unspecific cytotoxic effects was observed in LDH-release measurements. Kinase profiling of cancer relevant protein kinases identified the two 3-aryl-2-(thien-2-yl)acrylonitrile derivatives 1b and 1c as (multi-)kinase inhibitors with a preferential activity against the VEGFR-2 tyrosine kinase. Additional bioinformatic analysis of the VEGFR-2 binding modes by docking and molecular dynamics calculations supported the experimental findings and indicated that the hydroxy group of 1c might be crucial for its distinct inhibitory potency against VEGFR-2. Forthcoming studies will further unveil the underlying mode of action of the promising new derivatives as well as their suitability as an urgently needed novel approach in HCC treatment.


Subject(s)
Acrylonitrile/chemistry , Carcinoma, Hepatocellular/drug therapy , Liver Neoplasms/drug therapy , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Thiophenes/pharmacology , Vascular Endothelial Growth Factor Receptor-2/antagonists & inhibitors , Carcinoma, Hepatocellular/metabolism , Carcinoma, Hepatocellular/pathology , Cell Proliferation , Dose-Response Relationship, Drug , Drug Screening Assays, Antitumor , Hep G2 Cells , Humans , Liver Neoplasms/metabolism , Liver Neoplasms/pathology , Molecular Docking Simulation , Molecular Structure , Structure-Activity Relationship , Thiophenes/chemistry