Results 1 - 20 of 27
1.
J Cheminform ; 16(1): 100, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143631

ABSTRACT

One challenge that current de novo drug design models face is a disparity between the user's expectations and the actual output of the model in practical applications. Tailoring models to better align with chemists' implicit knowledge, expectations and preferences is key to overcoming this obstacle effectively. While interest in preference-based and human-in-the-loop machine learning in chemistry is continuously increasing, no tool currently exists that enables the collection of standardized and chemistry-specific feedback. Metis is a Python-based open-source graphical user interface (GUI), designed to solve this and enable the collection of chemists' detailed feedback on molecular structures. The GUI enables chemists to explore and evaluate molecules, offering a user-friendly interface for annotating preferences and specifying desired or undesired structural features. By giving chemists the opportunity to provide detailed feedback, the GUI allows researchers to capture the chemist's implicit knowledge and preferences more efficiently. This knowledge is crucial to align the chemist's idea with the de novo design agents. The GUI aims to enhance this collaboration between the human and the "machine" by providing an intuitive platform where chemists can interactively provide feedback on molecular structures, aiding in preference learning and refining de novo design strategies. Metis integrates with the existing de novo framework REINVENT, creating a closed-loop system where human expertise can continuously inform and refine the generative models.

Scientific contribution: We introduce a novel graphical user interface that allows chemists/researchers to give detailed feedback on substructures and properties of small molecules. This tool can be used to learn the preferences of chemists in order to align de novo drug design models with the chemist's ideas. The GUI can be customized to fit different needs and projects and enables direct integration into de novo REINVENT runs. We believe that Metis can facilitate the discussion and development of novel ways to integrate human feedback that goes beyond binary decisions of liking or disliking a molecule.

2.
J Comput Aided Mol Des ; 37(8): 373-394, 2023 08.
Article in English | MEDLINE | ID: mdl-37329395

ABSTRACT

Using generative deep learning models and reinforcement learning together can effectively generate new molecules with desired properties. By employing a multi-objective scoring function, thousands of high-scoring molecules can be generated, making this approach useful for drug discovery and material science. However, the application of these methods can be hindered by computationally expensive or time-consuming scoring procedures, particularly when a large number of function calls are required as feedback in the reinforcement learning optimization. Here, we propose the use of double-loop reinforcement learning with simplified molecular-input line-entry system (SMILES) augmentation to improve the efficiency and speed of the optimization. By adding an inner loop that augments the generated SMILES strings to non-canonical SMILES for use in additional reinforcement learning rounds, we can both reuse the scoring calculations on the molecular level, thereby speeding up the learning process, and offer additional protection against mode collapse. We find that employing between 5 and 10 augmentation repetitions is optimal for the scoring functions tested and is further associated with an increased diversity in the generated compounds, improved reproducibility of the sampling runs and the generation of molecules of higher similarity to known ligands.
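
The double-loop idea in this abstract can be sketched in a few lines. The snippet is only an illustration under stated assumptions: `score_fn`, `augment_fn` and `update_fn` are hypothetical callables standing in for the expensive scoring function, a SMILES randomizer and one reinforcement-learning update, none of which are specified in the abstract. It shows that each sampled string is scored once in the outer loop and that the cached score is reused for every augmented string in the inner loop.

```python
from typing import Callable, Dict, List, Tuple


def double_loop_step(
    sampled_smiles: List[str],
    score_fn: Callable[[str], float],  # hypothetical: expensive molecule-level score
    augment_fn: Callable[[str], str],  # hypothetical: returns a non-canonical SMILES
    update_fn: Callable[[List[Tuple[str, float]]], None],  # hypothetical: one RL update
    n_augment: int = 5,
) -> Dict[str, float]:
    """Run one outer-loop update followed by n_augment inner-loop updates."""
    # Outer loop: score every distinct sampled string exactly once.
    scores = {smi: score_fn(smi) for smi in dict.fromkeys(sampled_smiles)}
    update_fn(list(scores.items()))

    # Inner loop: rewrite each molecule as a different SMILES string and
    # reuse the cached score instead of calling score_fn again.
    for _ in range(n_augment):
        augmented = [(augment_fn(smi), score) for smi, score in scores.items()]
        update_fn(augmented)
    return scores
```

With `n_augment` repetitions, the model receives `n_augment + 1` updates for the cost of a single round of scoring calls; the default of 5 is taken from the lower end of the range reported in the abstract.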


Subjects
Drug Design , Neural Networks, Computer , Reproducibility of Results , Drug Discovery/methods
3.
J Cheminform ; 14(1): 86, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36578043

ABSTRACT

A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies: in the first use case, the parameters of an MPO are learned, and in the second, a non-parametric component of the scoring function is developed to capture human domain knowledge. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in fewer than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.

4.
Mol Inform ; 41(12): e2200043, 2022 12.
Article in English | MEDLINE | ID: mdl-35732584

ABSTRACT

Computer aided synthesis planning, suggesting synthetic routes for molecules of interest, is a rapidly growing field. The machine learning methods used are often dependent on access to large datasets for training, but finite experimental budgets limit how much data can be obtained from experiments. This suggests the use of schemes for data collection such as active learning, which identifies the data points of highest impact for model accuracy, and which has been used in recent studies with success. However, little has been done to explore the robustness of the methods predicting reaction yield when used together with active learning to reduce the amount of experimental data needed for training. This study aims to investigate the influence of machine learning algorithms and the number of initial data points on reaction yield prediction for two public high-throughput experimentation datasets. Our results show that active learning based on output margin reached a pre-defined AUROC faster than random sampling on both datasets. Analysis of feature importance of the trained machine learning models suggests active learning had a larger influence on the model accuracy when only a few features were important for the model prediction.
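
Margin-based selection, as mentioned in this abstract, can be illustrated with a short sketch. The abstract does not define the margin, so the code assumes a binary classifier that outputs the probability `p` of the positive class and takes the margin as `|2p - 1|`; the function name and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def select_by_margin(probabilities: Sequence[float], batch_size: int) -> List[int]:
    """Return indices of the pool items whose predictions are least certain."""
    # For a binary classifier the margin is |P(class 1) - P(class 0)| = |2p - 1|.
    margins = [abs(2.0 * p - 1.0) for p in probabilities]
    # Smallest margin first; ties keep pool order because sorted() is stable.
    ranked = sorted(range(len(margins)), key=margins.__getitem__)
    return ranked[:batch_size]


# Hypothetical predicted probabilities for five unlabeled reactions.
pool_probabilities = [0.95, 0.52, 0.10, 0.45, 0.70]
next_queries = select_by_margin(pool_probabilities, batch_size=2)
```

Here the two reactions with predictions closest to 0.5 (indices 1 and 3) would be sent to the experiment next, whereas random sampling would ignore the model's uncertainty.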


Subjects
Machine Learning
5.
J Cheminform ; 14(1): 18, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35346368

ABSTRACT

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the use of MMPs is a strategy widely adopted by medicinal chemists, it offers limited capability in terms of exploring the space of structural modifications and therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant), respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.
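
For readers unfamiliar with the Tanimoto similarity used to build one of the datasets, a minimal sketch follows. It assumes fingerprints are represented as sets of "on" bit indices; the abstract does not say which fingerprint or similarity threshold was used, so the example bit sets are purely hypothetical.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit sets."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / union


# Hypothetical fingerprints: 2 shared bits out of 5 distinct bits.
similarity = tanimoto({1, 2, 3, 4}, {3, 4, 5})
```

A pair of molecules would then be kept for training when its similarity exceeds a chosen cut-off, which allows several simultaneous modifications as long as the overall fingerprints stay close.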

6.
J Chem Inf Model ; 62(9): 2046-2063, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34460269

ABSTRACT

Because of the strong relationship between the desired molecular activity and its structural core, the screening of focused, core-sharing chemical libraries is a key step in lead optimization. Despite the plethora of current research focused on in silico methods for molecule generation, to our knowledge, no tool capable of designing such libraries has been proposed. In this work, we present a novel tool for de novo drug design called LibINVENT. It is capable of rapidly proposing chemical libraries of compounds sharing the same core while maximizing a range of desirable properties. To further help the process of designing focused libraries, the user can list specific chemical reactions that can be used for the library creation. LibINVENT is therefore a flexible tool for generating virtual chemical libraries for lead optimization in a broad range of scenarios. Additionally, the shared core ensures that the compounds in the library are similar, possess desirable properties, and can also be synthesized under the same or similar conditions. The LibINVENT code is freely available in our public repository at https://github.com/MolecularAI/Lib-INVENT. The code necessary for data preprocessing is further available at: https://github.com/MolecularAI/Lib-INVENT-dataset.


Subjects
Drug Design , Small Molecule Libraries , Small Molecule Libraries/chemistry
7.
J Chem Inf Model ; 62(9): 2093-2100, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34757744

ABSTRACT

Here, we explore the impact of different graph traversal algorithms on molecular graph generation. We do this by training a graph-based deep molecular generative model to build structures using a node order determined via either a breadth- or depth-first search algorithm. What we observe is that using a breadth-first traversal leads to better coverage of training data features compared to a depth-first traversal. We have quantified these differences using a variety of metrics on a data set of natural products. These metrics include percent validity, molecular coverage, and molecular shape. We also observe that by using either a breadth- or depth-first traversal it is possible to overtrain the generative models, at which point the results with either graph traversal algorithm are identical.
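
The difference between the two node orders compared in this abstract can be seen on a toy graph. The sketch below is an assumption-laden illustration, not the authors' code: the molecular graph is a plain adjacency dictionary for a five-atom chain numbered from its middle atom, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]


def bfs_order(graph: Graph, start: int) -> List[int]:
    """Visit atoms level by level, nearest neighbours first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def dfs_order(graph: Graph, start: int) -> List[int]:
    """Follow one branch to its end before returning to the next branch."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first-listed neighbour is visited first.
        stack.extend(n for n in reversed(graph[node]) if n not in seen)
    return order


# Toy chain 3-1-0-2-4, with atom 0 in the middle.
chain: Graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
breadth_first = bfs_order(chain, 0)  # [0, 1, 2, 3, 4]
depth_first = dfs_order(chain, 0)    # [0, 1, 3, 2, 4]
```

A generative model trained on the breadth-first order adds atoms shell by shell around the start atom, while the depth-first order completes one arm of the molecule before starting the other.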


Subjects
Algorithms
8.
J Chem Inf Model ; 61(8): 3899-3907, 2021 08 23.
Article in English | MEDLINE | ID: mdl-34342428

ABSTRACT

We present a novel algorithm to compute the distance between synthetic routes based on tree edit distances. Such distances can be used to cluster synthesis routes generated using a retrosynthesis prediction tool. We show that the clustering of selected routes from a retrosynthesis analysis is performed in less than 10 s on average and only constitutes seven percent of the total time (prediction + clustering). Furthermore, we are able to show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized and that the routes in a cluster tend to use similar chemistry. The algorithm is included in the latest version of open-source AiZynthFinder software (https://github.com/MolecularAI/aizynthfinder) and as a separate package (https://github.com/MolecularAI/route-distances).


Subjects
Software , Algorithms , Cluster Analysis
9.
Chem Sci ; 12(9): 3339-3349, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-34164104

ABSTRACT

Computer aided synthesis planning (CASP) is part of a suite of artificial intelligence (AI) based tools that are able to propose synthesis routes to a wide range of compounds. However, at present they are too slow to be used to screen the synthetic feasibility of millions of generated or enumerated compounds before identification of potential bioactivity by virtual screening (VS) workflows. Herein we report a machine learning (ML) based method capable of classifying whether a synthetic route can be identified for a particular compound or not by the CASP tool AiZynthFinder. The resulting ML models return a retrosynthetic accessibility score (RAscore) for any molecule of interest and compute at least 4500 times faster than retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic accessibility and produce higher quality databases for virtual screening of biological activity.

10.
J Cheminform ; 13(1): 26, 2021 Mar 20.
Article in English | MEDLINE | ID: mdl-33743817

ABSTRACT

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation with a graph-to-graph translation model, HierG2G, which has shown state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.

11.
J Cheminform ; 12(1): 70, 2020 Nov 17.
Article in English | MEDLINE | ID: mdl-33292482

ABSTRACT

We present the open-source AiZynthFinder software that can be readily used in retrosynthetic planning. The algorithm is based on a Monte Carlo tree search that recursively breaks down a molecule to purchasable precursors. The tree search is guided by an artificial neural network policy that suggests possible precursors by utilizing a library of known reaction templates. The software is fast and can typically find a solution in less than 10 s and perform a complete search in less than 1 min. Moreover, the development of the code was guided by a range of software engineering principles such as automatic testing, system design and continuous integration leading to robust software with high maintainability. Finally, the software is well documented to make it suitable for beginners. The software is available at http://www.github.com/MolecularAI/aizynthfinder.
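
The phrase "recursively breaks down a molecule to purchasable precursors" can be made concrete with a toy example. The sketch below deliberately replaces the Monte Carlo tree search and the neural network policy with a plain depth-first recursion over a hand-written lookup table, so it is not the AiZynthFinder algorithm or its API; `find_route`, the table of disconnections and the stock set are all hypothetical and only show the stopping criterion: a route is solved when every leaf is in stock.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: each molecule maps to alternative sets of precursors.
Disconnections = Dict[str, List[List[str]]]


def find_route(
    molecule: str,
    disconnections: Disconnections,
    stock: Set[str],
    max_depth: int = 5,
) -> Optional[List[str]]:
    """Return the purchasable leaves of the first route found, or None."""
    if molecule in stock:
        return [molecule]
    if max_depth == 0:
        return None
    for precursors in disconnections.get(molecule, []):
        leaves: List[str] = []
        for precursor in precursors:
            sub_route = find_route(precursor, disconnections, stock, max_depth - 1)
            if sub_route is None:
                break  # this disconnection fails, try the next alternative
            leaves.extend(sub_route)
        else:
            return leaves  # every precursor was traced back to the stock
    return None


toy_disconnections: Disconnections = {
    "target": [["intermediate", "reagent_a"]],
    "intermediate": [["reagent_b", "reagent_c"]],
}
toy_stock = {"reagent_a", "reagent_b", "reagent_c"}
route_leaves = find_route("target", toy_disconnections, toy_stock)
```

In the real software, the table lookup is replaced by template suggestions from the policy network, and the tree search decides which branch to expand next rather than exploring them in a fixed order.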

12.
J Med Chem ; 63(16): 8791-8808, 2020 08 27.
Article in English | MEDLINE | ID: mdl-32352286

ABSTRACT

Ring systems in pharmaceuticals, agrochemicals, and dyes are ubiquitous chemical motifs. While the synthesis of common ring systems is well described and novel ring systems can be readily and computationally enumerated, the synthetic accessibility of unprecedented ring systems remains a challenge. "Ring Breaker" uses a data-driven approach to enable the prediction of ring-forming reactions, for which we have demonstrated its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. We demonstrate the performance of the neural network on a range of ring fragments from the ZINC and DrugBank databases and highlight its potential for incorporation into computer aided synthesis planning tools. These approaches to ring formation and retrosynthetic disconnection offer opportunities for chemists to explore and select more efficient syntheses/synthetic routes.


Subjects
Chemistry Techniques, Synthetic/methods , Heterocyclic Compounds/chemical synthesis , Hydrocarbons, Cyclic/chemical synthesis , Neural Networks, Computer , Databases, Chemical
13.
Chem Sci ; 11(1): 154-168, 2020 Jan 07.
Article in English | MEDLINE | ID: mdl-32110367

ABSTRACT

Computer Assisted Synthesis Planning (CASP) has gained considerable interest as of late. Herein we investigate a template-based retrosynthetic planning tool, trained on a variety of datasets consisting of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN), and the publicly available United States Patent and Trademark Office (USPTO) extracts, are sufficient for the prediction of full synthetic routes to compounds of interest in medicinal chemistry. As such we have assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessment of the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes, be examined instead. To this end we found that the specificity of the templates comes at the cost of generalizability, and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.

14.
J Cheminform ; 12(1): 38, 2020 May 29.
Article in English | MEDLINE | ID: mdl-33431013

ABSTRACT

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorially obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the existing architectures for de novo molecular generation.
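
The combinatorial part of the pre-processing step described in this abstract can be sketched as follows. This is a simplified illustration under explicit assumptions: acyclic bonds are given as a list of integer bond indices, `max_cuts` is a hypothetical cap that the abstract does not mention, and the chemistry itself (checking that a set of cuts yields exactly one scaffold plus its decorations) is not modelled.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def enumerate_cuts(acyclic_bonds: Sequence[int], max_cuts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every non-empty combination of bonds to cut, smallest sets first."""
    for n_cuts in range(1, min(max_cuts, len(acyclic_bonds)) + 1):
        yield from combinations(acyclic_bonds, n_cuts)


# Hypothetical molecule with three acyclic bonds, sliced at most twice.
cut_sets = list(enumerate_cuts([0, 1, 2], max_cuts=2))
```

Because the number of cut sets grows combinatorially with the number of acyclic bonds, even a small training set produces many scaffold and decoration pairs, which is why the abstract also describes the procedure as a data augmentation technique.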

15.
Front Pharmacol ; 10: 1303, 2019.
Article in English | MEDLINE | ID: mdl-31749705

ABSTRACT

In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data at a much faster pace and with a higher quality than before. Besides the structure activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of Next Generation Sequencing or automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data that are generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and machine learning methods play a key role in cheminformatics and bio-image analytics fields to address activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis predictions, or high content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.

16.
J Chem Inf Model ; 59(3): 1182-1196, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30785751

ABSTRACT

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).


Subjects
Deep Learning , Drug Design , Catalytic Domain , Drug Evaluation, Preclinical , Ligands , Molecular Docking Simulation , Receptor, Adenosine A2A/chemistry , Receptor, Adenosine A2A/metabolism , Small Molecule Libraries/metabolism , Small Molecule Libraries/pharmacology
17.
Drug Discov Today Technol ; 32-33: 65-72, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386096

ABSTRACT

Application of AI technologies in synthesis prediction has developed very rapidly in recent years. We attempt here to give a comprehensive summary of the latest advances in retrosynthesis planning, forward synthesis prediction and quantum chemistry-based reaction prediction models. Besides an introduction to the AI/ML models for addressing various synthesis-related problems, the sources of the reaction datasets used in model building are also covered. In addition to the predictive models, robotics-based high-throughput experimentation technology will be another crucial factor for conducting synthesis in an automated fashion. Some state-of-the-art high-throughput experimentation practices carried out in the pharmaceutical industry are highlighted in this chapter to give the reader a sense of how future chemistry will be conducted to make compounds faster and cheaper.


Subjects
Artificial Intelligence , Computer-Aided Design , Synthetic Drugs/chemistry , Humans
18.
J Cheminform ; 11(1): 74, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-33430938

ABSTRACT

Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.

19.
J Cheminform ; 11(1): 71, 2019 Nov 21.
Article in English | MEDLINE | ID: mdl-33430971

ABSTRACT

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and they represent more accurately the target chemical space. Specifically, a model was trained with randomized SMILES that was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.

20.
Biomolecules ; 8(4)2018 10 30.
Article in English | MEDLINE | ID: mdl-30380783

ABSTRACT

Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for employment early in a drug discovery project. Here, it is shown that the choice of chemical representation, such as strings from the simplified molecular-input line-entry system (SMILES), has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory (LSTM) cells to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the code layer in quantitative structure-activity relationship (QSAR) models of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate of decoding to different molecules than encoded, a tendency that can be counteracted with more complex network architectures.


Subjects
Algorithms , Models, Molecular , Neural Networks, Computer , Probability , Quantitative Structure-Activity Relationship