ABSTRACT
The widespread proliferation of artificial intelligence (AI) and machine learning (ML) methods has had a profound effect on the drug discovery process. However, many scientists are reluctant to use these powerful tools because of the steep learning curve typically associated with them. AIDDISON addresses this reluctance by offering a convenient, secure, web-based platform for drug discovery. By seamlessly integrating generative models, ADMET property predictions, searches in vast chemical spaces, and molecular docking, AIDDISON provides a sophisticated platform for modern drug discovery. It enables less computer-savvy scientists to use these tools in their daily activities, as demonstrated by an example of identifying a valuable set of molecules for lead optimization. With AIDDISON, the benefits of AI/ML in drug discovery are accessible to all.
Subjects
Artificial Intelligence, Machine Learning, Molecular Docking Simulation, Drug Discovery, Psychological Power, Internet
ABSTRACT
Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.
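The idea of extending multitask learning across partners can be illustrated with a minimal sketch. It assumes a generic setup in which each partner computes an update for a shared set of model parameters on its own private data and only these updates are combined. The function names `aggregate_shared_updates` and `apply_update` are hypothetical, and plain averaging stands in for the audited, privacy-preserving aggregation used in the project, whose details are not given in the abstract.

```python
from typing import List


def aggregate_shared_updates(partner_updates: List[List[float]]) -> List[float]:
    """Average the per-partner updates for the shared parameters.

    Assumption: every partner sends an update vector of the same length.
    Raw training data never leaves a partner; only these vectors are combined.
    """
    n_partners = len(partner_updates)
    return [sum(values) / n_partners for values in zip(*partner_updates)]


def apply_update(weights: List[float], update: List[float], lr: float) -> List[float]:
    """Take one gradient-descent step on the shared parameters."""
    return [w - lr * u for w, u in zip(weights, update)]


# Two hypothetical partners contribute updates for a two-parameter shared model.
shared_weights = [1.0, 1.0]
updates = [[1.0, 2.0], [3.0, 4.0]]
mean_update = aggregate_shared_updates(updates)
shared_weights = apply_update(shared_weights, mean_update, lr=0.5)
```

In such a scheme, partner-specific output layers would stay local, so each company keeps its own tasks private while benefiting from the jointly trained shared parameters.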
Subjects
Benchmarking, Quantitative Structure-Activity Relationship, Biological Assay, Machine Learning
ABSTRACT
We present the computational de novo design of synthetically accessible chemical entities that mimic the complex sesquiterpene natural product (-)-Englerin A. We synthesized lead-like probes from commercially available building blocks and profiled them for activity against a computationally predicted panel of macromolecular targets. Both the design template (-)-Englerin A and its low-molecular-weight mimetics presented nanomolar binding affinities and antagonized the transient receptor potential calcium channel TRPM8 in a cell-based assay, without showing target promiscuity or frequent-hitter properties. This proof-of-concept study outlines an expeditious solution to obtaining natural-product-inspired chemical matter with desirable properties.
ABSTRACT
Efficient prioritization of bioactive compounds from high throughput screening campaigns is a fundamental challenge for accelerating drug development efforts. In this study, we present the first data-driven approach to simultaneously detect assay interferents and prioritize true bioactive compounds. By analyzing the learning dynamics during training of a gradient boosting model on noisy high throughput screening data using a novel formulation of sample influence, we are able to distinguish between compounds exhibiting the desired biological response and those producing assay artifacts. Therefore, our method enables false positive and true positive detection without relying on prior screens or assay interference mechanisms, making it applicable to any high throughput screening campaign. We demonstrate that our approach consistently excludes assay interferents with different mechanisms and prioritizes biologically relevant compounds more efficiently than all tested baselines, including a retrospective case study simulating its use in a real drug discovery campaign. Finally, our tool is extremely computationally efficient, requiring less than 30 s per assay on low-resource hardware. As such, our findings show that our method is an ideal addition to existing false positive detection tools and can be used to guide further pharmacological optimization after high throughput screening campaigns.
ABSTRACT
Decision tree ensembles are among the most robust, high-performing and computationally efficient machine learning approaches for quantitative structure-activity relationship (QSAR) modeling. Among them, gradient boosting has recently garnered particular attention, for its performance in data science competitions, virtual screening campaigns, and bioactivity prediction. However, different variants of gradient boosting exist, the most popular being XGBoost, LightGBM and CatBoost. Our study provides the first comprehensive comparison of these approaches for QSAR. To this end, we trained 157,590 gradient boosting models, which were evaluated on 16 datasets and 94 endpoints, comprising 1.4 million compounds in total. Our results show that XGBoost generally achieves the best predictive performance, while LightGBM requires the least training time, especially for larger datasets. In terms of feature importance, the models surprisingly rank molecular features differently, reflecting differences in regularization techniques and decision tree structures. Thus, expert knowledge must always be employed when evaluating data-driven explanations of bioactivity. Furthermore, our results show that the relevance of each hyperparameter varies greatly across datasets and that it is crucial to optimize as many hyperparameters as possible to maximize the predictive performance. In conclusion, our study provides the first set of guidelines for cheminformatics practitioners to effectively train, optimize and evaluate gradient boosting models for virtual screening and QSAR applications.
ABSTRACT
Praziquantel (PZQ) is an essential anthelmintic drug recently established to be an activator of a Transient Receptor Potential Melastatin (TRPMPZQ) ion channel in trematode worms. Bioinformatic, mutagenesis and drug metabolism work indicates that the cyclohexyl ring of PZQ is a key pharmacophore for activation of trematode TRPMPZQ, as well as serving as the primary site of oxidative metabolism, which results in PZQ being a short-lived drug. Based on our recent findings, the hydrophobic cleft in schistosome TRPMPZQ defined by three hydrophobic residues surrounding the cyclohexyl ring has little tolerance for polarity. Here we evaluate the in vitro and in vivo activities of PZQ analogues with improved metabolic stability relative to the challenge of maintaining activity on the channel. Finally, an estimation of the respective contribution to the overall activity of both the parent and the main metabolite of PZQ in humans is reported.
Subjects
Anthelmintics, Parasites, TRPM Cation Channels, Transient Receptor Potential Channels, Humans, Animals, Praziquantel/pharmacology, Praziquantel/chemistry, Anthelmintics/pharmacology, Anthelmintics/therapeutic use, Schistosoma mansoni
ABSTRACT
Introduction: In this study, we demonstrate the feasibility of yeast surface display (YSD) and next-generation sequencing (NGS) in combination with artificial intelligence and machine learning methods (AI/ML) for the identification of de novo humanized single domain antibodies (sdAbs) with favorable early developability profiles. Methods: The display library was derived from a novel approach, in which VHH-based CDR3 regions obtained from a llama (Lama glama), immunized against NKp46, were grafted onto a humanized VHH backbone library that was diversified in CDR1 and CDR2. Following NGS analysis of sequence pools from two rounds of fluorescence-activated cell sorting, we focused on four sequence clusters based on NGS frequency and enrichment analysis as well as in silico developability assessment. For each cluster, long short-term memory (LSTM) based deep generative models were trained and used for the in silico sampling of new sequences. Sequences were subjected to sequence- and structure-based in silico developability assessment to select a set of fewer than 10 sequences per cluster for production. Results: As demonstrated by binding kinetics and early developability assessment, this procedure represents a general strategy for the rapid and efficient design of potent and automatically humanized sdAb hits from screening selections with favorable early developability profiles.
ABSTRACT
We report an analysis of the propensity of the antimalarial agent cabamiquine, a Plasmodium-specific eukaryotic elongation factor 2 inhibitor, to select for resistant Plasmodium falciparum parasites. Through in vitro studies of laboratory strains and clinical isolates, a humanized mouse model, and volunteer infection studies, we identified resistance-associated mutations at 11 amino acid positions. Of these, six (55%) were present in more than one infection model, indicating translatability across models. Mathematical modelling suggested that resistant mutants were likely pre-existent at the time of drug exposure across studies. Here, we estimated a wide range of frequencies of resistant mutants across the different infection models, much of which can be attributed to stochastic differences resulting from experimental design choices. Structural modelling implicates binding of cabamiquine to a shallow mRNA binding site adjacent to two of the most frequently identified resistance mutations.
Subjects
Antimalarials, Parasites, Animals, Mice, Antimalarials/pharmacology, Amino Acids, Binding Sites, Animal Disease Models
ABSTRACT
Although the number of available bioassay datasets has increased dramatically in recent years, many of them suffer from an extremely imbalanced distribution between active and inactive compounds. Thus, there is an urgent need for novel approaches to tackle class imbalance in drug discovery. Inspired by recent advances in computer vision, we investigated a panel of alternative loss functions for imbalanced classification in the context of Gradient Boosting and benchmarked them on six datasets from public and proprietary sources, for a total of 42 tasks and 2 million compounds. Our findings show that with these modifications, we achieve statistically significant improvements over the conventional cross-entropy loss function on five out of six datasets. Furthermore, by employing these bespoke loss functions we are able to push Gradient Boosting to match or outperform a wide variety of previously reported classifiers and neural networks. We also investigate the impact of changing the loss function on training time and find that it speeds up convergence by up to 8 times. As such, these results show that tuning the loss function for Gradient Boosting is a straightforward and computationally efficient method to achieve state-of-the-art performance on imbalanced bioassay datasets without compromising on interpretability and scalability.
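To make the idea of swapping the loss function concrete: gradient boosting implementations that accept custom objectives typically need the first and second derivatives of the loss with respect to the raw (pre-sigmoid) score. The sketch below derives these for a class-weighted cross-entropy. This particular loss and the names `weighted_ce_grad_hess` and `pos_weight` are assumptions chosen for illustration; the abstract does not name the loss functions that were benchmarked.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_ce_grad_hess(
    raw_scores: Sequence[float],
    labels: Sequence[int],
    pos_weight: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Gradient and Hessian of weighted cross-entropy w.r.t. the raw score.

    Loss per sample: -[w * y * log(p) + (1 - y) * log(1 - p)], p = sigmoid(z),
    where w = pos_weight up-weights the rare active class (y = 1).
    """
    grads, hessians = [], []
    for z, y in zip(raw_scores, labels):
        p = sigmoid(z)
        scale = pos_weight * y + (1 - y)      # w for actives, 1 for inactives
        grads.append(p * scale - pos_weight * y)
        hessians.append(p * (1.0 - p) * scale)
    return grads, hessians


# With pos_weight = 1 this reduces to the usual cross-entropy gradient p - y.
grads, hessians = weighted_ce_grad_hess([0.0, 0.0], [1, 0], pos_weight=4.0)
```

A function of this shape could be wrapped to match the custom-objective interface of a given gradient boosting library; the exact signature expected differs between libraries and should be checked in their documentation.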
ABSTRACT
The repertoire of natural products offers tremendous opportunities for chemical biology and drug discovery. Natural product-inspired synthetic molecules represent an ecologically and economically sustainable alternative to the direct utilization of natural products. De novo design with machine intelligence bridges the gap between the worlds of bioactive natural products and synthetic molecules. On employing the compound Marinopyrrole A from marine Streptomyces as a design template, the algorithm constructs innovative small molecules that can be synthesized in three steps, following the computationally suggested synthesis route. Computational activity prediction reveals cyclooxygenase (COX) as a putative target of both Marinopyrrole A and the de novo designs. The molecular designs are experimentally confirmed as selective COX-1 inhibitors with nanomolar potency. X-ray structure analysis reveals the binding of the most selective compound to COX-1. This molecular design approach provides a blueprint for natural product-inspired hit and lead identification for drug discovery with machine intelligence.
Subjects
Biological Products/chemistry, Cyclooxygenase Inhibitors/chemical synthesis, Drug Design/methods, Drug Discovery/methods, Pyrroles/chemistry, Artificial Intelligence, Cyclooxygenase Inhibitors/chemistry
ABSTRACT
Praziquantel (PZQ) is an essential medicine for treating parasitic flatworm infections such as schistosomiasis, which afflicts over 250 million people. However, PZQ is not universally effective, lacking activity against liver flukes of the Fasciola genus. The reason for this insensitivity is unclear, as the mechanism of PZQ action is unknown. Here, we use ligand- and target-based methods to demonstrate that PZQ activates a transient receptor potential melastatin ion channel (TRPMPZQ) in schistosomes by engaging a hydrophobic ligand binding pocket within the voltage sensor-like domain of the channel to cause calcium entry and worm paralysis. PZQ activates TRPMPZQ homologs in other PZQ-sensitive flukes, but not Fasciola hepatica. However, a single amino acid change in the F. hepatica TRPMPZQ binding pocket, to mimic schistosome TRPMPZQ, confers PZQ sensitivity. After decades of clinical use, the molecular basis of PZQ action at a druggable TRP channel is resolved.
Subjects
Anthelmintics, Platyhelminths, Animals, Anthelmintics/pharmacology, Anthelmintics/therapeutic use, Humans, Ion Channels/metabolism, Praziquantel/metabolism, Praziquantel/pharmacology, Praziquantel/therapeutic use, Schistosoma/metabolism
ABSTRACT
Molecular shape and pharmacological function are interconnected. To capture shape, the fractal dimensionality concept was employed, providing a natural similarity measure for the virtual screening of de novo generated small molecules mimicking the structurally complex natural product (-)-englerin A. Two of the top-ranking designs were synthesized and tested for their ability to modulate transient receptor potential (TRP) cation channels, which are cellular targets of (-)-englerin A. Intracellular calcium assays and electrophysiological whole-cell measurements of TRPC4 and TRPM8 channels revealed potent inhibitory effects of one of the computer-generated compounds. Four derivatives of this identified hit compound had comparable effects on TRPC4 and TRPM8. The results of this study corroborate the use of fractal dimensionality as an innovative shape-based molecular representation for molecular scaffold-hopping.
Subjects
Drug Design, Guaiane Sesquiterpenes/pharmacology, TRPC Cation Channels/antagonists & inhibitors, TRPM Cation Channels/antagonists & inhibitors, HEK293 Cells, Humans, Molecular Models, Molecular Structure, Guaiane Sesquiterpenes/chemical synthesis, Guaiane Sesquiterpenes/chemistry, TRPC Cation Channels/metabolism, TRPM Cation Channels/metabolism
ABSTRACT
Invited for this month's cover picture is the group of Prof. Dr. Gisbert Schneider from the Swiss Federal Institute of Technology (ETH) Zurich (Switzerland). The cover picture illustrates the application of machine-learning methods to expand the chemical space of farnesoid X receptor (FXR)-targeting small molecules, by employing an ensemble of three complementary machine-learning approaches (counter-propagation artificial neural network, k-nearest neighbor learner, and three-dimensional pharmacophore model). Read the full text of their Full Paper at 10.1002/open.201800156.
ABSTRACT
The bile acid-activated transcription factor farnesoid X receptor (FXR) has shown therapeutic potential as a molecular drug target for the treatment of hepatic and metabolic disorders. Despite strong efforts in FXR ligand development, the structural diversity among the known FXR modulators is limited. Only four molecular frameworks account for more than 50% of the FXR modulators annotated in ChEMBL. Here, we leverage machine learning methods to expand the chemical space of FXR-targeting small molecules by employing an ensemble of three complementary machine learning approaches. A counter-propagation artificial neural network, a k-nearest neighbor learner, and a three-dimensional pharmacophore descriptor were combined to retrieve novel FXR ligands from a collection of more than 3 million compounds. The ensemble machine learning model identified six new FXR modulators among ten top-ranked candidates. These active hits comprise both FXR activators and antagonists with micromolar potencies. With four novel FXR ligand scaffolds, these computationally identified bioactive compounds appreciably expand the chemical space of known FXR modulators and may serve as starting points for hit-to-lead expansion.
ABSTRACT
A virtual screening protocol based on machine learning models was used to identify mimetics of the natural product (-)-galantamine. This fully automated approach identified eight compounds with bioactivities on at least one of the macromolecular targets of (-)-galantamine, with different polypharmacological profiles. Two of the computer-generated hits possess an expanded spectrum of bioactivity on targets relevant to the treatment of Alzheimer's disease and are suitable for hit-to-lead expansion. These results advocate multitarget drug design by advanced virtual screening protocols based on chemically informed machine learning models.
Subjects
Alzheimer Disease/drug therapy, Biological Products/pharmacology, Cholinesterase Inhibitors/pharmacology, Drug Design, Galantamine/pharmacology, Machine Learning, Neuroprotective Agents/pharmacology, Acetylcholinesterase/metabolism, Alzheimer Disease/metabolism, Biological Products/chemical synthesis, Biological Products/chemistry, Tumor Cell Line, Cholinesterase Inhibitors/chemical synthesis, Cholinesterase Inhibitors/chemistry, Preclinical Drug Evaluation, Galantamine/chemical synthesis, Galantamine/chemistry, Humans, Ligands, Molecular Docking Simulation, Molecular Structure, Neuroprotective Agents/chemical synthesis, Neuroprotective Agents/chemistry, Stereoisomerism
ABSTRACT
Generative artificial intelligence offers a fresh view on molecular design. We present the first prospective application of a deep learning model for designing new drug-like compounds with desired activities. For this purpose, we trained a recurrent neural network to capture the constitution of a large set of known bioactive compounds represented as SMILES strings. By transfer learning, this general model was fine-tuned to recognize retinoid X and peroxisome proliferator-activated receptor agonists. We synthesized five top-ranking compounds designed by the generative model. Four of the compounds revealed nanomolar to low-micromolar receptor modulatory activity in cell-based assays. Apparently, the computational model intrinsically captured relevant chemical and biological knowledge without the need for explicit rules. The results of this study advocate generative artificial intelligence for prospective de novo molecular design, and demonstrate the potential of these methods for future medicinal chemistry.
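As an illustration of how SMILES strings can be prepared for a recurrent network, the sketch below turns each string into integer-encoded input and target sequences for next-token prediction. It assumes simple character-level tokenization with hypothetical start and end markers (`<s>`, `</s>`); the tokenization actually used in the study is not described in the abstract, and the helper names are illustrative only.

```python
from typing import Dict, Iterable, List, Tuple

START, END = "<s>", "</s>"  # hypothetical sequence markers


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into single characters, framed by markers.

    Simplification: multi-character atoms such as "Cl" become two tokens.
    """
    return [START] + list(smiles) + [END]


def build_vocab(smiles_list: Iterable[str]) -> Dict[str, int]:
    """Map every token seen in the data to an integer index."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {tok: i for i, tok in enumerate([START, END] + chars)}


def next_token_pairs(smiles: str, vocab: Dict[str, int]) -> Tuple[List[int], List[int]]:
    """Return (inputs, targets), where targets are inputs shifted by one."""
    ids = [vocab[t] for t in tokenize(smiles)]
    return ids[:-1], ids[1:]


vocab = build_vocab(["CCO", "c1ccccc1"])
inputs, targets = next_token_pairs("CCO", vocab)
```

Under this framing, transfer learning would amount to continuing training with pairs built from a smaller, target-focused set of molecules after the model has been trained on the large general set.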
Subjects
Deep Learning, Drug Design, Peroxisome Proliferator-Activated Receptors/agonists, Retinoid X Receptors/agonists, HEK293 Cells, Humans, Molecular Docking Simulation, Peroxisome Proliferator-Activated Receptors/chemistry, Quantitative Structure-Activity Relationship, Retinoid X Receptors/chemistry, Small Molecule Libraries/chemical synthesis, Small Molecule Libraries/pharmacology
ABSTRACT
The lack of potent subtype-selective modulators of retinoid X receptors (RXRs) has hindered their full exploitation as promising drug targets. Using computational similarity searching, target prediction and automated de novo design, we identified novel RXR ligands exhibiting innovative molecular frameworks, pronounced receptor-subtype preference and suitable properties for hit-to-lead expansion.
ABSTRACT
Natural products (NPs) are increasingly recognized as an invaluable source of pharmacological tools and lead structures. To enable NP-inspired retinoid X receptor (RXR) modulator design, three novel RXR-targeting NPs were computationally identified. Among them, valerenic acid was found to be selective for RXRβ, rendering it a unique pharmacological tool compound. The NPs then served as templates for automated, ligand-based de novo design of innovative, easily accessible mimetics that inherited the biological activities of their natural templates.
Subjects
Biological Products/chemistry, Computational Biology/methods, Indenes/pharmacology, Retinoid X Receptors/metabolism, Sesquiterpenes/pharmacology, Abietanes/chemistry, Abietanes/pharmacology, Carboxylic Acids/chemistry, Carboxylic Acids/pharmacology, Drug Discovery/methods, Preclinical Drug Evaluation/methods, Hep G2 Cells, Humans, Indenes/chemistry, Ligands, Phenanthrenes/chemistry, Phenanthrenes/pharmacology, Retinoid X Receptors/agonists, Retinoid X Receptors/chemistry, Sesquiterpenes/chemistry
ABSTRACT
Molecular descriptors capture diverse structural information of molecules and are a prerequisite for ligand-based similarity searching. In this study, we introduce topological matrix-based descriptors to virtual screening for hit discovery. We evaluated the usefulness of matrix-based descriptors in a retrospective setting and compared them with topological pharmacophore descriptors. Special attention was given to the influence of data pre-processing and the applied similarity metric on the virtual screening performance. Overall, the matrix-based descriptors showed a competitive and complementary performance to other descriptors. A prospective screen of a commercial compound library led to the discovery of a novel natural-product-derived cyclooxygenase-2 inhibitor predicted to interact differently with the target protein compared to the query compound ibuprofen. The results of our study motivate the use of matrix-based descriptors for molecular similarity-based virtual screening and scaffold hopping.
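The role of the similarity metric in ligand-based screening can be sketched as follows. The example ranks a small library of real-valued descriptor vectors against a query using either the continuous Tanimoto coefficient or a Euclidean-distance-based similarity. The descriptor values, compound names, and helper functions are invented for illustration and are not taken from the study.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def tanimoto(a: Vector, b: Vector) -> float:
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


def euclidean_similarity(a: Vector, b: Vector) -> float:
    """Map Euclidean distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def rank_library(
    query: Vector,
    library: Dict[str, Vector],
    metric: Callable[[Vector, Vector], float],
) -> List[str]:
    """Return compound names sorted from most to least similar to the query."""
    return sorted(library, key=lambda name: metric(query, library[name]), reverse=True)


# Hypothetical three-dimensional descriptors for a query and three compounds.
query = [1.0, 2.0, 0.0]
library = {"A": [2.0, 4.0, 0.0], "B": [1.0, 2.0, 1.0], "C": [0.0, 0.0, 3.0]}
by_tanimoto = rank_library(query, library, tanimoto)
by_euclidean = rank_library(query, library, euclidean_similarity)
```

Passing the metric as a function makes it easy to rerun the same ranking after a different pre-processing step (for example, scaling each descriptor dimension) and compare the resulting hit lists.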