Results 1 - 20 of 117
1.
J Chem Inf Model ; 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38950185

ABSTRACT

Machine-learning (ML) and deep-learning (DL) approaches to predict the molecular properties of small molecules are increasingly deployed within the design-make-test-analyze (DMTA) drug design cycle. Despite this uptake, there are only a few automated packages to aid their development and deployment that also support uncertainty estimation, model explainability, and other key aspects of model usage. This represents a key unmet need within the field, and the large number of molecular representations and algorithms (and associated parameters) means it is nontrivial to robustly optimize, evaluate, reproduce, and deploy models. Here, we present QSARtuna, a molecule property prediction modeling pipeline, written in Python and utilizing the Optuna, Scikit-learn, RDKit, and ChemProp packages, which enables the efficient and automated comparison between molecular representations and machine learning models. The platform was designed from the outset around the increasingly important aspects of model uncertainty quantification and explainability. We describe our framework and provide illustrative examples to demonstrate the capability of the software when applied to simple molecular property prediction, reaction/reactivity prediction, and DNA encoded library enrichment classification. We hope that the release of QSARtuna will further spur innovation in automatic ML modeling and provide a platform for education of best practices in molecular property modeling. The code for the QSARtuna framework is made freely available via GitHub.

2.
Nat Commun ; 15(1): 5640, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38965235

ABSTRACT

The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, like many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary, while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles, openly and at scale, and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design.


Subject(s)
Data Science , Drug Discovery , Machine Learning , Drug Discovery/methods , Data Science/methods , Humans , Artificial Intelligence , Information Dissemination/methods , Data Mining/methods , Cloud Computing , Databases, Factual
3.
J Cheminform ; 16(1): 57, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38778382

ABSTRACT

We present an updated overview of the AiZynthFinder package for retrosynthesis planning. Since the first version was released in 2020, we have added a substantial number of new features based on user feedback. Feature enhancements include policies for filter reactions, support for any one-step retrosynthesis model, a scoring framework and several additional search algorithms. To exemplify the typical use-cases of the software and highlight some learnings, we perform a large-scale analysis on several hundred thousand target molecules from diverse sources. This analysis looks at, for instance, route shape, stock usage and exploitation of reaction space, and points out strengths and weaknesses of our retrosynthesis approach. The software is released as open-source for educational purposes as well as to provide a reference implementation of the core algorithms for synthesis prediction. We hope that releasing the software as open-source will further facilitate innovation in developing novel methods for synthetic route prediction. AiZynthFinder is a fast, robust and extensible open-source software and can be downloaded from https://github.com/MolecularAI/aizynthfinder .

4.
Chem Sci ; 15(11): 4146-4160, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38487235

ABSTRACT

Reinforcement learning (RL) is a powerful and flexible paradigm for searching for solutions in high-dimensional action spaces. However, bridging the gap between playing computer games with thousands of simulated episodes and solving real scientific problems with complex and involved environments (up to actual laboratory experiments) requires improvements in terms of sample efficiency to make the most of expensive information. The discovery of new drugs is a major commercial application of RL, motivated by the very large nature of the chemical space and the need to perform multiparameter optimization (MPO) across different properties. In silico methods, such as virtual library screening (VS) and de novo molecular generation with RL, show great promise in accelerating this search. However, incorporation of increasingly complex computational models in these workflows requires increasing sample efficiency. Here, we introduce an active learning system linked with an RL model (RL-AL) for molecular design, which aims to improve the sample-efficiency of the optimization process. We identify and characterize unique challenges combining RL and AL, investigate the interplay between the systems, and develop a novel AL approach to solve the MPO problem. Our approach greatly expedites the search for novel solutions relative to baseline-RL for simple ligand- and structure-based oracle functions, with a 5-66-fold increase in hits generated for a fixed oracle budget and a 4-64-fold reduction in computational time to find a specific number of hits. Furthermore, compounds discovered through RL-AL display substantial enrichment of a multi-parameter scoring objective, indicating superior efficacy in curating high-scoring compounds, without a reduction in output diversity. This significant acceleration improves the feasibility of oracle functions that have largely been overlooked in RL due to high computational costs, for example free energy perturbation methods, and in principle is applicable to any RL domain.
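
The idea of spending a fixed oracle budget more carefully can be illustrated with a toy loop. This is only a minimal sketch of generic active learning, not the RL-AL system from the abstract above: the nearest-neighbour surrogate, the greedy selection rule, and all names (`active_learning_search`, `budget`, `batch_size`) are assumptions made for illustration.

```python
import random
from typing import Callable, Dict, Sequence


def nearest_neighbour_prediction(x: float, labelled: Dict[float, float]) -> float:
    """Cheap surrogate: reuse the oracle score of the closest labelled point."""
    nearest = min(labelled, key=lambda known: abs(known - x))
    return labelled[nearest]


def active_learning_search(
    pool: Sequence[float],
    oracle: Callable[[float], float],
    budget: int,
    batch_size: int,
    seed: int = 0,
) -> Dict[float, float]:
    """Label at most `budget` candidates, letting the surrogate pick each batch."""
    rng = random.Random(seed)
    remaining = list(pool)
    rng.shuffle(remaining)

    # Seed round: a random batch, because the surrogate has nothing to learn from yet.
    first = min(batch_size, budget)
    labelled = {x: oracle(x) for x in remaining[:first]}
    remaining = remaining[first:]

    while len(labelled) < budget and remaining:
        # Rank unlabelled candidates by the surrogate's predicted score (greedy choice).
        remaining.sort(
            key=lambda x: nearest_neighbour_prediction(x, labelled), reverse=True
        )
        take = min(batch_size, budget - len(labelled))
        batch, remaining = remaining[:take], remaining[take:]
        for x in batch:
            labelled[x] = oracle(x)  # the only expensive calls in the loop
    return labelled
```

In this sketch the expensive function is called exactly `budget` times (or fewer if the pool runs out), which is the quantity the abstract holds fixed when it compares hit counts.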

5.
J Cheminform ; 16(1): 20, 2024 Feb 21.
Article in English | MEDLINE | ID: mdl-38383444

ABSTRACT

REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.

6.
Drug Discov Today ; 29(3): 103886, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38244673

ABSTRACT

The European Lead Factory (ELF) is a consortium of universities and small and medium-sized enterprises (SMEs) dedicated to drug discovery, and the pharmaceutical industry. This unprecedented consortium provides high-throughput screening, triage, and hit validation, including to non-consortium members. The ELF library was created through a novel compound-sharing model between nine pharmaceutical companies and expanded through library synthesis by chemistry-specialized SMEs. The library has been screened against ∼270 different targets and 15 phenotypic assays, and hits have been developed to form the basis of patents and spin-off companies. Here, we review the outcome of screening campaigns of the ELF, including the performance and physicochemical properties of the library, identification of possible frequent hitter compounds, and the effectiveness of the compound-sharing model.


Subject(s)
Drug Discovery , Small Molecule Libraries , Small Molecule Libraries/chemistry , Drug Discovery/methods , High-Throughput Screening Assays/methods , Drug Industry , Universities
7.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subject(s)
Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
8.
Nat Commun ; 14(1): 4761, 2023 Aug 14.
Article in English | MEDLINE | ID: mdl-37580318

ABSTRACT

Genome editing, specifically CRISPR/Cas9 technology, has revolutionized biomedical research and offers potential cures for genetic diseases. Despite rapid progress, low efficiency of targeted DNA integration and generation of unintended mutations represent major limitations for genome editing applications caused by the interplay with DNA double-strand break repair pathways. To address this, we conduct a large-scale compound library screen to identify targets for enhancing targeted genome insertions. Our study reveals DNA-dependent protein kinase (DNA-PK) as the most effective target to improve CRISPR/Cas9-mediated insertions, confirming previous findings. We extensively characterize AZD7648, a selective DNA-PK inhibitor, and find it to significantly enhance precise gene editing. We further improve integration efficiency and precision by inhibiting DNA polymerase theta (Polθ). The combined treatment, named 2iHDR, boosts templated insertions to 80% efficiency with minimal unintended insertions and deletions. Notably, 2iHDR also reduces off-target effects of Cas9, greatly enhancing the fidelity and performance of CRISPR/Cas9 gene editing.


Subject(s)
CRISPR-Cas Systems , Gene Editing , CRISPR-Cas Systems/genetics , Protein Kinases/genetics , DNA Repair/genetics , DNA/genetics
10.
Small Methods ; 7(9): e2201695, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37317010

ABSTRACT

Poor understanding of intracellular delivery and targeting hinders development of nucleic acid-based therapeutics transported by nanoparticles. Utilizing an siRNA-targeting and small molecule profiling approach with advanced imaging and machine learning, biological insight is generated into the mechanism of lipid nanoparticle (MC3-LNP) delivery of mRNA. This workflow is termed Advanced Cellular and Endocytic profiling for Intracellular Delivery (ACE-ID). A cell-based imaging assay and perturbation of 178 targets relevant to intracellular trafficking is used to identify corresponding effects on functional mRNA delivery. Targets improving delivery are analyzed by extracting data-rich phenotypic fingerprints from images using advanced image analysis algorithms. Machine learning is used to determine key features correlating with enhanced delivery, identifying fluid-phase endocytosis as a productive cellular entry route. With this new knowledge, MC3-LNP is re-engineered to target macropinocytosis, and this significantly improves mRNA delivery in vitro and in vivo. The ACE-ID approach can be broadly applicable for optimizing nanomedicine-based intracellular delivery systems and has the potential to accelerate the development of delivery systems for nucleic acid-based therapeutics.


Subject(s)
Endocytosis , Nanoparticles , RNA, Messenger/genetics , Endocytosis/genetics , Biology
11.
Commun Chem ; 6(1): 82, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37106032

ABSTRACT

In drug discovery, computational methods are a key part of making informed design decisions and prioritising experiments. In particular, optimizing compound affinity is a central concern during the early stages of development. In the last 10 years, alchemical free energy (FE) calculations have transformed our ability to incorporate accurate in silico potency predictions in design decisions, and represent the 'gold standard' for augmenting experiment-driven drug discovery. However, relative FE calculations are complex to set up, require significant expert intervention to prepare the calculation and analyse the results or are provided only as closed-source software, not allowing for fine-grained control over the underlying settings. In this work, we introduce an end-to-end relative FE workflow based on the non-equilibrium switching approach that facilitates calculation of binding free energies starting from SMILES strings. The workflow is implemented using fully modular steps, allowing various components to be exchanged depending on licence availability. We further investigate the dependence of the calculated free energy accuracy on the initial ligand pose generated by various docking algorithms. We show that both commercial and open-source docking engines can be used to generate poses that lead to good correlation of free energies with experimental reference data.

12.
J Chem Inf Model ; 63(7): 1841-1846, 2023 Apr 10.
Article in English | MEDLINE | ID: mdl-36959737

ABSTRACT

We introduce the AiZynthTrain Python package for training synthesis models in a robust, reproducible, and extensible way. It contains two pipelines that create a template-based one-step retrosynthesis model and a RingBreaker model that can be straightforwardly integrated in retrosynthesis software. We train such models on the publicly available reaction data set from the U.S. Patent and Trademark Office (USPTO), and these are the first retrosynthesis models created in a completely reproducible end-to-end fashion, starting with the original reaction data source and ending with trained machine-learning models. In particular, we show that employing new heuristics implemented in the pipeline greatly improves the ability of the RingBreaker model for disconnecting ring systems. Furthermore, we demonstrate the robustness of the pipeline by training on a more diverse but proprietary data set. We envisage that this framework will be extended with other synthesis models in the future.


Subject(s)
Machine Learning , Software
13.
Curr Opin Struct Biol ; 80: 102575, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36966692

ABSTRACT

In this mini review, we capture the latest progress of applying artificial intelligence (AI) techniques based on deep learning architectures to molecular de novo design with a focus on integration with experimental validation. We will cover the progress and experimental validation of novel generative algorithms, the validation of QSAR models and how AI-based molecular de novo design is starting to become connected with chemistry automation. While progress has been made in the last few years, it is still early days. The experimental validations conducted thus far should be considered proof-of-principle, providing confidence that the field is moving in the right direction.


Subject(s)
Algorithms , Artificial Intelligence , Automation , Drug Design
14.
J Chem Inf Model ; 63(4): 1099-1113, 2023 Feb 27.
Article in English | MEDLINE | ID: mdl-36758178

ABSTRACT

Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.


Subject(s)
Deep Learning , Solubility , Neural Networks, Computer , Machine Learning , Algorithms
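
The solubility abstract above reports model quality as a root-mean-square error (RMSE) in log units. As a reminder of what that figure measures, here is a minimal sketch of the calculation; the numbers are hypothetical log-solubility values chosen for illustration and are not data from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical log-solubility values (log units); not taken from the challenge set.
observed_logs = [-2.0, -3.5, -1.0, -4.0]
predicted_logs = [-2.5, -3.0, -1.5, -4.5]
error = rmse(predicted_logs, observed_logs)  # every prediction is off by 0.5
```

Because solubility is modelled on a logarithmic scale, an RMSE of 1 log unit corresponds to a typical error of about a factor of ten in the solubility itself.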
15.
J Med Chem ; 66(2): 1221-1238, 2023 Jan 26.
Article in English | MEDLINE | ID: mdl-36607408

ABSTRACT

Probing multiple proprietary pharmaceutical libraries in parallel via virtual screening allowed rapid expansion of the structure-activity relationship (SAR) around hit compounds with moderate efficacy against Trypanosoma cruzi, the causative agent of Chagas Disease. A potency-improving scaffold hop, followed by elaboration of the SAR via design guided by the output of the phenotypic virtual screening efforts, identified two promising hit compounds 54 and 85, which were profiled further in pharmacokinetic studies and in an in vivo model of T. cruzi infection. Compound 85 demonstrated clear reduction of parasitemia in the in vivo setting, confirming the interest in this series of 2-(pyridin-2-yl)quinazolines as potential anti-trypanosome treatments.


Subject(s)
Chagas Disease , Trypanocidal Agents , Trypanosoma cruzi , Humans , Chagas Disease/drug therapy , Quinazolines/pharmacology , Quinazolines/therapeutic use , Structure-Activity Relationship , Trypanocidal Agents/therapeutic use , Trypanocidal Agents/pharmacokinetics
16.
J Cheminform ; 14(1): 86, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36578043

ABSTRACT

A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: In the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.
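
A multi-parameter optimization (MPO) scoring function of the kind discussed in the abstract above is often built by combining per-property desirability values into one number. The sketch below uses a weighted geometric mean, which is one common aggregation and an assumption here; the abstract does not state the functional form the authors use. The property names and weights are hypothetical; weights of this kind are the sort of parameters such a method could adjust from user feedback.

```python
import math
from typing import Mapping


def mpo_score(desirability: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted geometric mean of per-property desirabilities, each in [0, 1].

    Weights are assumed to be non-negative.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    log_sum = 0.0
    for name, weight in weights.items():
        value = desirability[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"desirability for {name!r} must lie in [0, 1]")
        if weight == 0:
            continue  # a zero weight switches the property off
        if value == 0.0:
            return 0.0  # a single failing property vetoes the molecule
        log_sum += weight * math.log(value)
    return math.exp(log_sum / total_weight)


# Hypothetical desirabilities for one molecule and two candidate weightings.
molecule = {"drug_likeness": 0.81, "potency": 0.25}
balanced = mpo_score(molecule, {"drug_likeness": 1.0, "potency": 1.0})
potency_ignored = mpo_score(molecule, {"drug_likeness": 1.0, "potency": 0.0})
```

With a geometric mean, a molecule cannot compensate for a very poor property with a very good one as easily as it could under a weighted sum, which is why this form is a frequent choice when every objective matters.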

17.
Commun Biol ; 5(1): 1291, 2022 Nov 24.
Article in English | MEDLINE | ID: mdl-36434048

ABSTRACT

The druggability of targets is a crucial consideration in drug target selection. Here, we adopt a stochastic semi-supervised ML framework to develop DrugnomeAI, which estimates the druggability likelihood for every protein-coding gene in the human exome. DrugnomeAI integrates gene-level properties from 15 sources resulting in 324 features. The tool generates exome-wide predictions based on labelled sets of known drug targets (median AUC: 0.97), highlighting features from protein-protein interaction networks as top predictors. DrugnomeAI provides generic as well as specialised models stratified by disease type or drug therapeutic modality. The top-ranking DrugnomeAI genes were significantly enriched for genes previously selected for clinical development programs (p value < 1 × 10⁻³⁰⁸) and for genes achieving genome-wide significance in phenome-wide association studies of 450 K UK Biobank exomes for binary (p value = 1.7 × 10⁻⁵) and quantitative traits (p value = 1.6 × 10⁻⁷). We accompany our method with a web application ( http://drugnomeai.public.cgr.astrazeneca.com ) to visualise the druggability predictions and the key features that define gene druggability, per disease type and modality.


Subject(s)
Machine Learning , Software , Humans , Drug Delivery Systems
18.
J Chem Inf Model ; 62(20): 4863-4872, 2022 Oct 24.
Article in English | MEDLINE | ID: mdl-36219571

ABSTRACT

Machine learning provides effective computational tools for exploring the chemical space via deep generative models. Here, we propose a new reinforcement learning scheme to fine-tune graph-based deep generative models for de novo molecular design tasks. We show how our computational framework can successfully guide a pretrained generative model toward the generation of molecules with a specific property profile, even when such molecules are not present in the training set and unlikely to be generated by the pretrained model. We explored the following tasks: generating molecules of decreasing/increasing size, increasing drug-likeness, and increasing bioactivity. Using the proposed approach, we achieve a model which generates diverse compounds with predicted DRD2 activity for 95% of sampled molecules, outperforming previously reported methods on this metric.


Subject(s)
Drug Design , Machine Learning
19.
Brief Bioinform ; 23(6), 2022 Nov 19.
Article in English | MEDLINE | ID: mdl-36151740

ABSTRACT

Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Of these, those that use Knowledge Graphs (KG) have promise in many tasks, including drug repurposing, drug toxicity prediction and target gene-disease prioritization. In a drug discovery KG, crucial elements including genes, diseases and drugs are represented as entities, while relationships between them indicate an interaction. However, to construct high-quality KGs, suitable data are required. In this review, we detail publicly available sources suitable for use in constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field, but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorized according to the primary type of information contained within and are considered based upon what information could be extracted to build a KG. We then present a comparative analysis of existing public drug discovery KGs and an evaluation of selected motivating case studies from the literature. Additionally, we raise numerous and unique challenges and issues associated with the domain and its datasets, while also highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.


Subject(s)
Machine Learning , Pattern Recognition, Automated , Drug Discovery , Knowledge , Information Storage and Retrieval
20.
Bioinformatics ; 38(21): 4951-4952, 2022 Oct 31.
Article in English | MEDLINE | ID: mdl-36073898

ABSTRACT

SUMMARY: We present Icolos, a workflow manager written in Python as a tool for automating complex structure-based workflows for drug design. Icolos can be used as a standalone tool, for example in virtual screening campaigns, or can be used in conjunction with deep learning-based molecular generation facilitated for example by REINVENT, a previously published molecular de novo design package. In this publication, we focus on the internal structure and general capabilities of Icolos, using molecular docking experiments as an illustrative example. AVAILABILITY AND IMPLEMENTATION: The source code is freely available at https://github.com/MolecularAI/Icolos under the Apache 2.0 license. Tutorial notebooks containing minimal working examples can be found at https://github.com/MolecularAI/IcolosCommunity. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Drug Design , Software , Workflow , Molecular Docking Simulation