Results 1 - 15 of 15
1.
J Comput Aided Mol Des ; 37(8): 373-394, 2023 08.
Article in English | MEDLINE | ID: mdl-37329395

ABSTRACT

Using generative deep learning models and reinforcement learning together can effectively generate new molecules with desired properties. By employing a multi-objective scoring function, thousands of high-scoring molecules can be generated, making this approach useful for drug discovery and material science. However, the application of these methods can be hindered by computationally expensive or time-consuming scoring procedures, particularly when a large number of function calls are required as feedback in the reinforcement learning optimization. Here, we propose the use of double-loop reinforcement learning with simplified molecular line entry system (SMILES) augmentation to improve the efficiency and speed of the optimization. By adding an inner loop that augments the generated SMILES strings to non-canonical SMILES for use in additional reinforcement learning rounds, we can both reuse the scoring calculations on the molecular level, thereby speeding up the learning process, as well as offer additional protection against mode collapse. We find that employing between 5 and 10 augmentation repetitions is optimal for the scoring functions tested and is further associated with an increased diversity in the generated compounds, improved reproducibility of the sampling runs and the generation of molecules of higher similarity to known ligands.
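The double-loop idea described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `sample`, `score`, `augment`, and `update` are hypothetical placeholders for the generator, the scoring function, a SMILES randomizer, and a reinforcement learning update. The point shown is only that the expensive scores are computed once per molecule and then reused for every augmented representation in the inner loop.

```python
def double_loop_step(sample, score, augment, update, n_augment=5):
    """One outer-loop step followed by n_augment inner-loop rounds.

    All four callables are hypothetical stand-ins (assumptions):
      sample()               -> list of SMILES strings from the generator
      score(smiles)          -> float, the expensive scoring call
      augment(smiles)        -> another SMILES string for the same molecule
      update(batch, scores)  -> one reinforcement learning update
    """
    batch = sample()
    # Expensive part: exactly one scoring call per generated molecule.
    scores = [score(smiles) for smiles in batch]
    update(batch, scores)  # outer-loop update on the sampled strings

    for _ in range(n_augment):
        # Inner loop: new string representations of the same molecules.
        augmented = [augment(smiles) for smiles in batch]
        # Scores belong to the molecule, so they are reused unchanged.
        update(augmented, scores)
    return scores


# Toy demonstration with stub callables that only count calls.
calls = {"score": 0, "update": 0}

def _sample():
    return ["CCO", "c1ccccc1"]

def _score(smiles):
    calls["score"] += 1
    return float(len(smiles))  # dummy score, not chemically meaningful

def _augment(smiles):
    return smiles  # placeholder: a real randomizer would return non-canonical SMILES

def _update(batch, scores):
    calls["update"] += 1

demo_scores = double_loop_step(_sample, _score, _augment, _update, n_augment=5)
```

In this toy run the scoring stub is called once per molecule, while the update stub is called once for the outer loop plus once per augmentation round.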


Subject(s)
Drug Design , Neural Networks, Computer , Reproducibility of Results , Drug Discovery/methods
2.
J Cheminform ; 14(1): 86, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36578043

ABSTRACT

A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: In the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.

3.
J Cheminform ; 14(1): 18, 2022 Mar 28.
Article in English | MEDLINE | ID: mdl-35346368

ABSTRACT

Molecular optimization aims to improve the drug profile of a starting molecule. It is a fundamental problem in drug discovery but challenging due to (i) the requirement of simultaneous optimization of multiple properties and (ii) the large chemical space to explore. Recently, deep learning methods have been proposed to solve this task by mimicking the chemist's intuition in terms of matched molecular pairs (MMPs). Although the use of MMPs is a strategy widely used by medicinal chemists, it offers limited capability in terms of exploring the space of structural modifications and therefore does not cover the complete space of solutions. Often more general transformations beyond the nature of MMPs are feasible and/or necessary, e.g. simultaneous modifications of the starting molecule at different places including the core scaffold. This study aims to provide a general methodology that offers more general structural modifications beyond MMPs. In particular, the same Transformer architecture is trained on different datasets. These datasets consist of a set of molecular pairs which reflect different types of transformations. Beyond MMP transformation, datasets reflecting general structural changes are constructed from ChEMBL based on two approaches: Tanimoto similarity (allows for multiple modifications) and scaffold matching (allows for multiple modifications but keeps the scaffold constant), respectively. We investigate how the model behavior can be altered by tailoring the dataset while using the same model architecture. Our results show that the models trained on differently prepared datasets transform a given starting molecule in a way that reflects the nature of the dataset used for training the model. These models could complement each other and unlock the capability for the chemists to pursue different options for improving a starting molecule.
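As an illustration of the Tanimoto-similarity criterion mentioned in this abstract, the following minimal sketch computes the Tanimoto coefficient between two fingerprints represented as sets of on-bit indices and keeps the molecule pairs at or above a threshold. The set-based fingerprint representation, the function names, the toy data, and the 0.5 threshold are assumptions for illustration; the abstract does not state which fingerprint or cutoff was used.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    # Convention assumed here: two empty fingerprints have similarity 0.
    return shared / union if union else 0.0


def similar_pairs(fingerprints, threshold=0.5):
    """Return name pairs whose Tanimoto similarity is at least the threshold."""
    pairs = []
    for name_a, name_b in combinations(sorted(fingerprints), 2):
        if tanimoto(fingerprints[name_a], fingerprints[name_b]) >= threshold:
            pairs.append((name_a, name_b))
    return pairs


# Toy fingerprints (hypothetical bit indices, not derived from real molecules).
toy = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {2, 3, 4, 5},
    "mol_c": {7, 8},
}
toy_pairs = similar_pairs(toy, threshold=0.5)
```

Here `mol_a` and `mol_b` share three of five distinct bits, so they are the only pair retained at this threshold.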

4.
J Chem Inf Model ; 62(9): 2046-2063, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34460269

ABSTRACT

Because of the strong relationship between the desired molecular activity and its structural core, the screening of focused, core-sharing chemical libraries is a key step in lead optimization. Despite the plethora of current research focused on in silico methods for molecule generation, to our knowledge, no tool capable of designing such libraries has been proposed. In this work, we present a novel tool for de novo drug design called LibINVENT. It is capable of rapidly proposing chemical libraries of compounds sharing the same core while maximizing a range of desirable properties. To further help the process of designing focused libraries, the user can list specific chemical reactions that can be used for the library creation. LibINVENT is therefore a flexible tool for generating virtual chemical libraries for lead optimization in a broad range of scenarios. Additionally, the shared core ensures that the compounds in the library are similar, possess desirable properties, and can also be synthesized under the same or similar conditions. The LibINVENT code is freely available in our public repository at https://github.com/MolecularAI/Lib-INVENT. The code necessary for data preprocessing is further available at: https://github.com/MolecularAI/Lib-INVENT-dataset.


Subject(s)
Drug Design , Small Molecule Libraries , Small Molecule Libraries/chemistry
5.
Chem Sci ; 12(9): 3339-3349, 2021 Jan 22.
Article in English | MEDLINE | ID: mdl-34164104

ABSTRACT

Computer aided synthesis planning (CASP) is part of a suite of artificial intelligence (AI) based tools that are able to propose synthesis routes to a wide range of compounds. However, at present they are too slow to be used to screen the synthetic feasibility of millions of generated or enumerated compounds before identification of potential bioactivity by virtual screening (VS) workflows. Herein we report a machine learning (ML) based method capable of classifying whether a synthetic route can be identified for a particular compound or not by the CASP tool AiZynthFinder. The resulting ML models return a retrosynthetic accessibility score (RAscore) for any molecule of interest and compute at least 4500 times faster than retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic accessibility and produce higher quality databases for virtual screening of biological activity.

6.
J Cheminform ; 13(1): 26, 2021 Mar 20.
Article in English | MEDLINE | ID: mdl-33743817

ABSTRACT

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists would use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism, and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation, with a graph-to-graph translation model HierG2G, which has shown the state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.

7.
J Med Chem ; 63(16): 8791-8808, 2020 08 27.
Article in English | MEDLINE | ID: mdl-32352286

ABSTRACT

Ring systems in pharmaceuticals, agrochemicals, and dyes are ubiquitous chemical motifs. While the synthesis of common ring systems is well described and novel ring systems can be readily and computationally enumerated, the synthetic accessibility of unprecedented ring systems remains a challenge. "Ring Breaker" uses a data-driven approach to enable the prediction of ring-forming reactions, for which we have demonstrated its utility on frequently found and unprecedented ring systems, in agreement with literature syntheses. We demonstrate the performance of the neural network on a range of ring fragments from the ZINC and DrugBank databases and highlight its potential for incorporation into computer aided synthesis planning tools. These approaches to ring formation and retrosynthetic disconnection offer opportunities for chemists to explore and select more efficient syntheses/synthetic routes.


Subject(s)
Chemistry Techniques, Synthetic/methods , Heterocyclic Compounds/chemical synthesis , Hydrocarbons, Cyclic/chemical synthesis , Neural Networks, Computer , Databases, Chemical
8.
Chem Sci ; 11(1): 154-168, 2020 Jan 07.
Article in English | MEDLINE | ID: mdl-32110367

ABSTRACT

Computer Assisted Synthesis Planning (CASP) has gained considerable interest as of late. Herein we investigate a template-based retrosynthetic planning tool, trained on a variety of datasets consisting of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN), and the publicly available United States Patent Office (USPTO) extracts, are sufficient for the prediction of full synthetic routes to compounds of interest in medicinal chemistry. As such we have assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessment of the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes be examined instead. To this end we found that the specificity of the templates comes at the cost of generalizability, and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.

9.
J Cheminform ; 12(1): 38, 2020 May 29.
Article in English | MEDLINE | ID: mdl-33431013

ABSTRACT

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.

10.
Front Pharmacol ; 10: 1303, 2019.
Article in English | MEDLINE | ID: mdl-31749705

ABSTRACT

In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data at a much faster pace and with a higher quality than before. Besides the structure activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of Next Generation Sequencing or automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data that are generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and machine learning methods play a key role in cheminformatics and bio-image analytics fields to address activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis predictions, or high content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.

11.
J Chem Inf Model ; 59(3): 1182-1196, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30785751

ABSTRACT

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).


Subject(s)
Deep Learning , Drug Design , Catalytic Domain , Drug Evaluation, Preclinical , Ligands , Molecular Docking Simulation , Receptor, Adenosine A2A/chemistry , Receptor, Adenosine A2A/metabolism , Small Molecule Libraries/metabolism , Small Molecule Libraries/pharmacology
12.
J Cheminform ; 11(1): 74, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-33430938

ABSTRACT

Deep learning methods applied to drug discovery have been used to generate novel structures. In this study, we propose a new deep learning architecture, LatentGAN, which combines an autoencoder and a generative adversarial neural network for de novo molecular design. We applied the method in two scenarios: one to generate random drug-like compounds and another to generate target-biased compounds. Our results show that the method works well in both cases. Sampled compounds from the trained model can largely occupy the same chemical space as the training set and also generate a substantial fraction of novel compounds. Moreover, the drug-likeness score of compounds sampled from LatentGAN is also similar to that of the training set. Lastly, generated compounds differ from those obtained with a Recurrent Neural Network-based generative model approach, indicating that both methods can be used complementarily.

13.
J Cheminform ; 11(1): 71, 2019 Nov 21.
Article in English | MEDLINE | ID: mdl-33430971

ABSTRACT

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and they represent more accurately the target chemical space. Specifically, a model was trained with randomized SMILES that was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES leads to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.

14.
Biomolecules ; 8(4)2018 10 30.
Article in English | MEDLINE | ID: mdl-30380783

ABSTRACT

Chemical autoencoders are attractive models as they combine chemical space navigation with possibilities for de novo molecule generation in areas of interest. This enables them to produce focused chemical libraries around a single lead compound for employment early in a drug discovery project. Here, it is shown that the choice of chemical representation, such as strings from the simplified molecular-input line-entry system (SMILES), has a large influence on the properties of the latent space. It is further explored to what extent translating between different chemical representations influences the latent space similarity to the SMILES strings or circular fingerprints. By employing SMILES enumeration for either the encoder or decoder, it is found that the decoder has the largest influence on the properties of the latent space. Training a sequence-to-sequence heteroencoder based on recurrent neural networks (RNNs) with long short-term memory cells (LSTM) to predict different enumerated SMILES strings from the same canonical SMILES string gives the largest similarity between latent space distance and molecular similarity measured as circular fingerprint similarity. Using the output from the code layer in quantitative structure activity relationship (QSAR) of five molecular datasets shows that heteroencoder-derived vectors markedly outperform autoencoder-derived vectors as well as models built using ECFP4 fingerprints, underlining the increased chemical relevance of the latent space. However, the use of enumeration during training of the decoder leads to a marked increase in the rate of decoding to different molecules than encoded, a tendency that can be counteracted with more complex network architectures.


Subject(s)
Algorithms , Models, Molecular , Neural Networks, Computer , Probability , Quantitative Structure-Activity Relationship
15.
J Hepatol ; 68(6): 1137-1143, 2018 06.
Article in English | MEDLINE | ID: mdl-29452205

ABSTRACT

BACKGROUND & AIMS: Liver failure results in hyperammonaemia, impaired regulation of cerebral microcirculation, encephalopathy, and death. However, the key mediator that alters cerebral microcirculation remains unidentified. In this study we show that topically applied ammonium significantly increases periarteriolar adenosine tone on the brain surface of healthy rats and is associated with a disturbed microcirculation. METHODS: Cranial windows were prepared in anaesthetized Wistar rats. The flow velocities were measured by speckle contrast imaging and compared before and after 30 min of exposure to 10 mM ammonium chloride applied on the brain surface. These flow velocities were compared with those for control groups exposed to artificial cerebrospinal fluid or ammonium plus an adenosine receptor antagonist. A flow preservation curve was obtained by analysis of flow responses to a haemorrhagic hypotensive challenge and during stepwise exsanguination. The periarteriolar adenosine concentration was measured with enzymatic biosensors inserted in the cortex. RESULTS: After ammonium exposure the arteriolar flow velocity increased by a median (interquartile range) of 21.7% (23.4%) vs. 7.2% (10.2%) in controls (n = 10 and n = 6, respectively, p <0.05), and the arteriolar surface area increased. There was a profound rise in the periarteriolar adenosine concentration. During the hypotensive challenge the flow decreased by 27.8% (14.9%) vs. 9.2% (14.9%) in controls (p <0.05). The lower limit of flow preservation remained unaffected, 27.7 (3.9) mmHg vs. 27.6 (6.4) mmHg, whereas the autoregulatory index increased, 0.29 (0.33) flow units per millimetre of mercury vs. 0.03 (0.21) flow units per millimetre of mercury (p <0.05). When ammonium exposure was combined with topical application of an adenosine receptor antagonist, the autoregulatory index was normalized. 
CONCLUSIONS: Vasodilation of the cerebral microcirculation during exposure to ammonium chloride is associated with an increase in the adenosine tone. Application of a specific adenosine receptor antagonist restores the regulation of the microcirculation. This indicates that adenosine could be a key mediator of the brain dysfunction seen during hyperammonaemia and is a potential therapeutic target. LAY SUMMARY: In patients with liver failure, disturbances in brain function are caused in part by ammonium toxicity. In our project we studied how ammonia, through adenosine release, affects the blood flow in the brain of rats. In our experimental model we demonstrated that the detrimental effect of ammonia on blood flow regulation was counteracted by blocking the adenosine receptors in the brain. With this observation we identified a novel potential treatment target. If we can confirm our findings in a future clinical study, this might help patients with liver failure and the severe condition called hepatic encephalopathy.


Subject(s)
Adenosine/metabolism , Ammonium Chloride/toxicity , Cerebral Cortex/metabolism , Cerebrovascular Circulation/physiology , Administration, Topical , Ammonium Chloride/administration & dosage , Animals , Arterioles/metabolism , Blood Flow Velocity/drug effects , Blood Flow Velocity/physiology , Cerebral Cortex/drug effects , Cerebrovascular Circulation/drug effects , Disease Models, Animal , Hepatic Encephalopathy/etiology , Hepatic Encephalopathy/physiopathology , Humans , Hyperammonemia/complications , Hyperammonemia/physiopathology , Liver Failure, Acute/complications , Liver Failure, Acute/physiopathology , Male , Microcirculation/drug effects , Microcirculation/physiology , Rats , Rats, Wistar , Vasodilation/drug effects , Vasodilation/physiology