Search | PAHO/WHO Uruguay

1.

QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach.

Zankov, Dmitry V; Matveieva, Mariia; Nikonenko, Aleksandra V; Nugmanov, Ramil I; Baskin, Igor I; Varnek, Alexandre; Polishchuk, Pavel; Madzhidov, Timur I.

J Chem Inf Model ; 61(10): 4913-4923, 2021 10 25.

Article in English | MEDLINE | ID: mdl-34554736

ABSTRACT

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors issued for a single lowest energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.

Subject(s)

Algorithms , Quantitative Structure-Activity Relationship , Databases, Factual , Drug Discovery , Molecular Conformation

2.

Correction: QSAR without borders.

Muratov, Eugene N; Bajorath, Jürgen; Sheridan, Robert P; Tetko, Igor V; Filimonov, Dmitry; Poroikov, Vladimir; Oprea, Tudor I; Baskin, Igor I; Varnek, Alexandre; Roitberg, Adrian; Isayev, Olexandr; Curtarolo, Stefano; Fourches, Denis; Cohen, Yoram; Aspuru-Guzik, Alan; Winkler, David A; Agrafiotis, Dimitris; Cherkasov, Artem; Tropsha, Alexander.

Chem Soc Rev ; 49(11): 3716, 2020 06 08.

Article in English | MEDLINE | ID: mdl-32441715

ABSTRACT

Correction for 'QSAR without borders' by Eugene N. Muratov et al., Chem. Soc. Rev., 2020, DOI: 10.1039/d0cs00098a.

3.

QSAR without borders.

Muratov, Eugene N; Bajorath, Jürgen; Sheridan, Robert P; Tetko, Igor V; Filimonov, Dmitry; Poroikov, Vladimir; Oprea, Tudor I; Baskin, Igor I; Varnek, Alexandre; Roitberg, Adrian; Isayev, Olexandr; Curtarolo, Stefano; Fourches, Denis; Cohen, Yoram; Aspuru-Guzik, Alan; Winkler, David A; Agrafiotis, Dimitris; Cherkasov, Artem; Tropsha, Alexander.

Chem Soc Rev ; 49(11): 3525-3564, 2020 06 07.

Article in English | MEDLINE | ID: mdl-32356548

ABSTRACT

Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and more recently, machine learning and artificial intelligence methods in chemical sciences. This field of research, broadly known as quantitative structure-activity relationships (QSAR) modeling, has developed many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry in the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside of traditional QSAR boundaries including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, the knowledge of robust data-driven modelling methods professed within the QSAR field can become essential for scientists working both within and outside of chemical research. We hope that this contribution highlighting the generalizable components of QSAR modeling will serve to address this challenge.

Subject(s)

Chemistry, Pharmaceutical/methods , Drug-Related Side Effects and Adverse Reactions/metabolism , Pharmaceutical Preparations/chemistry , Algorithms , Animals , Artificial Intelligence , Databases, Factual , Drug Design , History, 20th Century , History, 21st Century , Humans , Models, Molecular , Quantitative Structure-Activity Relationship , Quantum Theory , Reproducibility of Results

4.

Comprehensive Analysis of Applicability Domains of QSPR Models for Chemical Reactions.

Rakhimbekova, Assima; Madzhidov, Timur I; Nugmanov, Ramil I; Gimadiev, Timur R; Baskin, Igor I; Varnek, Alexandre.

Int J Mol Sci ; 21(15), 2020 Aug 03.

Article in English | MEDLINE | ID: mdl-32756326

ABSTRACT

Nowadays, the problem of the model's applicability domain (AD) definition is an active research topic in chemoinformatics. Although many various AD definitions for the models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) were described in the literature, no one for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. The ability to exclude wrong reaction types, increase coverage, improve the model performance and detect Y-outliers were tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.

Subject(s)

Cheminformatics/trends , Protein Domains , Quantitative Structure-Activity Relationship , Thermodynamics , Chemical Phenomena , Kinetics , Models, Molecular

5.

De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping.

Sattarov, Boris; Baskin, Igor I; Horvath, Dragos; Marcou, Gilles; Bjerrum, Esben Jannik; Varnek, Alexandre.

J Chem Inf Model ; 59(3): 1182-1196, 2019 03 25.

Article in English | MEDLINE | ID: mdl-30785751

ABSTRACT

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).

Subject(s)

Deep Learning , Drug Design , Catalytic Domain , Drug Evaluation, Preclinical , Ligands , Molecular Docking Simulation , Receptor, Adenosine A2A/chemistry , Receptor, Adenosine A2A/metabolism , Small Molecule Libraries/metabolism , Small Molecule Libraries/pharmacology

6.

Conjugated Quantitative Structure-Property Relationship Models: Application to Simultaneous Prediction of Tautomeric Equilibrium Constants and Acidity of Molecules.

Zankov, Dmitry V; Madzhidov, Timur I; Rakhimbekova, Assima; Gimadiev, Timur R; Nugmanov, Ramil I; Kazymova, Marina A; Baskin, Igor I; Varnek, Alexandre.

J Chem Inf Model ; 59(11): 4569-4576, 2019 11 25.

Article in English | MEDLINE | ID: mdl-31638794

ABSTRACT

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.

Subject(s)

Organic Chemicals/chemistry , Pharmaceutical Preparations/chemistry , Acids/chemistry , Algorithms , Drug Discovery , Models, Chemical , Molecular Structure , Neural Networks, Computer , Quantitative Structure-Activity Relationship , Solvents/chemistry , Stereoisomerism

7.

Predictive cartography of metal binders using generative topographic mapping.

Baskin, Igor I; Solov'ev, Vitaly P; Bagatur'yants, Alexander A; Varnek, Alexandre.

J Comput Aided Mol Des ; 31(8): 701-714, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28688089

ABSTRACT

Generative topographic mapping (GTM) approach is used to visualize the chemical space of organic molecules (L) with respect to binding a wide range of 41 different metal cations (M) and also to build predictive models for stability constants (logK) of 1:1 (M:L) complexes using "density maps," "activity landscapes," and "selectivity landscapes" techniques. A two-dimensional map describing the entire set of 2962 metal binders reveals the selectivity and promiscuity zones with respect to individual metals or groups of metals with similar chemical properties (lanthanides, transition metals, etc). The GTM-based global (for entire set) and local (for selected subsets) models demonstrate a good predictive performance in the cross-validation procedure. It is also shown that the data likelihood could be used as a definition of the applicability domain of GTM-based models. Thus, the GTM approach represents an efficient tool for the predictive cartography of metal binders, which can both visualize their chemical space and predict the affinity profile of metals for new ligands.

Subject(s)

Chelating Agents/chemistry , Coordination Complexes/chemistry , Metals/chemistry , Algorithms , Computer Simulation , Ligands , Likelihood Functions , Molecular Structure , Structure-Activity Relationship , Thermodynamics

8.

Stargate GTM: Bridging Descriptor and Activity Spaces.

Gaspar, Héléna A; Baskin, Igor I; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre.

J Chem Inf Model ; 55(11): 2403-10, 2015 Nov 23.

Article in English | MEDLINE | ID: mdl-26458083

ABSTRACT

Predicting the activity profile of a molecule or discovering structures possessing a specific activity profile are two important goals in chemoinformatics, which could be achieved by bridging activity and molecular descriptor spaces. In this paper, we introduce the "Stargate" version of the Generative Topographic Mapping approach (S-GTM) in which two different multidimensional spaces (e.g., structural descriptor space and activity space) are linked through a common 2D latent space. In the S-GTM algorithm, the manifolds are trained simultaneously in two initial spaces using the probabilities in the 2D latent space calculated as a weighted geometric mean of probability distributions in both spaces. S-GTM has the following interesting features: (1) activities are involved during the training procedure; therefore, the method is supervised, unlike conventional GTM; (2) using molecular descriptors of a given compound as input, the model predicts a whole activity profile, and (3) using an activity profile as input, areas populated by relevant chemical structures can be detected. To assess the performance of S-GTM prediction models, a descriptor space (ISIDA descriptors) of a set of 1325 GPCR ligands was related to a B-dimensional (B = 1 or 8) activity space corresponding to pKi values for eight different targets. S-GTM outperforms conventional GTM for individual activities and performs similarly to the Lasso multitask learning algorithm, although it is still slightly less accurate than the Random Forest method.
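
The abstract above states that S-GTM combines the two spaces through a weighted geometric mean of probability distributions. The following is only a minimal sketch of that one operation, not the authors' implementation: the function name, the `weight` parameter, and the final renormalization step are assumptions made for illustration.

```python
def weighted_geometric_mean(p_descriptor, p_activity, weight=0.5):
    """Combine two discrete distributions over the same latent nodes.

    Hypothetical helper: `weight` is the exponent given to the descriptor-space
    distribution; the activity-space distribution gets 1 - weight.
    """
    if len(p_descriptor) != len(p_activity):
        raise ValueError("both distributions must cover the same nodes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Node-wise weighted geometric mean.
    combined = [
        (a ** weight) * (b ** (1.0 - weight))
        for a, b in zip(p_descriptor, p_activity)
    ]
    total = sum(combined)
    if total == 0:
        raise ValueError("distributions have no common support")
    # Renormalize so the result sums to 1 (an assumption of this sketch).
    return [c / total for c in combined]
```

With `weight=1.0` the result reduces to the descriptor-space distribution, and with `weight=0.0` to the activity-space one, which is one way to see how the weight trades off the two spaces.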

Subject(s)

Algorithms , Computer-Aided Design , Drug Design , Artificial Intelligence , Humans , Probability , Quantitative Structure-Activity Relationship

9.

Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge.

Gaspar, Héléna A; Baskin, Igor I; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre.

J Chem Inf Model ; 55(1): 84-94, 2015 Jan 26.

Article in English | MEDLINE | ID: mdl-25423612

ABSTRACT

This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining data sets of 36 chemicals suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. The data sets similarity in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
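
The abstract above mentions that chemical space coverage was evaluated with the normalized Shannon entropy. The abstract does not give the formula, so the sketch below assumes one common definition: the entropy of the distribution of cumulated responsibilities over the map nodes, divided by the logarithm of the number of nodes. The function and argument names are hypothetical.

```python
import math


def normalized_shannon_entropy(cumulated_responsibilities):
    """Return the entropy of a node distribution scaled to the range [0, 1].

    Assumption of this sketch: the input holds one non-negative value per map
    node, and normalization divides by log(number of nodes).
    """
    n_nodes = len(cumulated_responsibilities)
    total = sum(cumulated_responsibilities)
    if n_nodes < 2 or total <= 0:
        raise ValueError("need at least two nodes and a positive total")
    probs = [r / total for r in cumulated_responsibilities]
    # Nodes with zero probability contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(n_nodes)
```

Under this definition a library spread evenly over all nodes scores close to 1, while a library concentrated on a single node scores 0.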

Subject(s)

Algorithms , Databases, Chemical , Entropy , Small Molecule Libraries , Solubility , User-Computer Interface

10.

Continuous indicator fields: a novel universal type of molecular fields.

Sitnikov, Gleb V; Zhokhova, Nelly I; Ustynyuk, Yury A; Varnek, Alexandre; Baskin, Igor I.

J Comput Aided Mol Des ; 29(3): 233-47, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25449975

ABSTRACT

A novel type of molecular fields, Continuous Indicator Fields (CIFs), is suggested to provide 3D structural description of molecules. The values of CIFs are calculated as the degree to which a point with given 3D coordinates belongs to an atom of a certain type. They can be used similarly to standard physicochemical fields for building 3D structure-activity models. One can build CIF-based 3D structure-activity models in the framework of the continuous molecular fields approach described earlier (J Comput-Aided Mol Des 27 (5):427-442, 2013) for the case of physicochemical molecular fields. CIFs are thought to complement and further extend traditional physicochemical fields. The models built with CIFs can be interpreted in terms of preferable and undesirable positions of certain types of atoms in space. This helps to understand which changes in chemical structure should be made in order to design a compound possessing desirable properties. We have demonstrated that CIFs can be considered as 3D analogues of 2D topological molecular fragments. The performance of this approach is demonstrated in structure-activity studies of thrombin inhibitors, multidentate N-heterocyclic ligands for Am(3+)/Eu(3+) separation, and coloring dyes.

Subject(s)

Molecular Conformation , Structure-Activity Relationship , Adsorption , Americium/chemistry , Antithrombins/chemistry , Antithrombins/metabolism , Antithrombins/pharmacology , Cations/chemistry , Coloring Agents/chemistry , Europium/chemistry , Heterocyclic Compounds/chemistry , Phenylalanine/chemistry , Quantitative Structure-Activity Relationship

11.

The continuous molecular fields approach to building 3D-QSAR models.

Baskin, Igor I; Zhokhova, Nelly I.

J Comput Aided Mol Des ; 27(5): 427-42, 2013 May.

Article in English | MEDLINE | ID: mdl-23719959

ABSTRACT

The continuous molecular fields (CMF) approach is based on the application of continuous functions for the description of molecular fields instead of finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure-activity models, visualizing chemical datasets and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for 8 datasets through the use of five types of molecular fields (the electrostatic, steric, hydrophobic, hydrogen-bond acceptor and donor ones), the linear convolution molecular kernel with the contribution of each atom approximated with a single isotropic Gaussian function, and the kernel ridge regression data analysis technique. It is shown that the CMF approach even in this simplest form provides either comparable or enhanced predictive performance in comparison with state-of-the-art 3D-QSAR methods.

Subject(s)

Databases, Protein , Models, Molecular , Quantitative Structure-Activity Relationship , Structure-Activity Relationship , Algorithms , Artificial Intelligence , Drug Design , Humans , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions

12.

One-class classification as a novel method of ligand-based virtual screening: the case of glycogen synthase kinase 3β inhibitors.

Karpov, Pavel V; Osolodkin, Dmitry I; Baskin, Igor I; Palyulin, Vladimir A; Zefirov, Nikolay S.

Bioorg Med Chem Lett ; 21(22): 6728-31, 2011 Nov 15.

Article in English | MEDLINE | ID: mdl-21983440

ABSTRACT

A virtual screening system based on one-class classification with molecular fingerprints as descriptors is developed and tested on a series of 1226 inhibitors and 209 noninhibitors of glycogen synthase kinase 3β (GSK-3β). The suggested system outperforms the ones based on pharmacophore hypothesis and molecular docking in a retrospective study. However, in a prospective study it should not be used as a sole classifier. The system is exceptionally useful for the identification of new scaffolds among the virtual screening results obtained with other methods.

Subject(s)

Drug Design , Enzyme Inhibitors/chemistry , Glycogen Synthase Kinase 3/antagonists & inhibitors , Computer-Aided Design , Enzyme Inhibitors/pharmacology , Glycogen Synthase Kinase 3/metabolism , Glycogen Synthase Kinase 3 beta , Humans , Ligands , Models, Molecular , Neural Networks, Computer

13.

Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information.

Sushko, Iurii; Novotarskyi, Sergii; Körner, Robert; Pandey, Anil Kumar; Rupp, Matthias; Teetz, Wolfram; Brandmaier, Stefan; Abdelaziz, Ahmed; Prokopenko, Volodymyr V; Tanchuk, Vsevolod Y; Todeschini, Roberto; Varnek, Alexandre; Marcou, Gilles; Ertl, Peter; Potemkin, Vladimir; Grishina, Maria; Gasteiger, Johann; Schwab, Christof; Baskin, Igor I; Palyulin, Vladimir A; Radchenko, Eugene V; Welsh, William J; Kholodovych, Vladyslav; Chekmarev, Dmitriy; Cherkasov, Artem; Aires-de-Sousa, Joao; Zhang, Qing-You; Bender, Andreas; Nigsch, Florian; Patiny, Luc; Williams, Antony; Tkachenko, Valery; Tetko, Igor V.

J Comput Aided Mol Des ; 25(6): 533-54, 2011 Jun.

Article in English | MEDLINE | ID: mdl-21660515

ABSTRACT

The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu.

Subject(s)

Databases, Factual , Internet , Models, Chemical , Information Dissemination , Information Management , Quantitative Structure-Activity Relationship , User-Computer Interface

14.

Synthesis and SAR requirements of adamantane-colchicine conjugates with both microtubule depolymerizing and tubulin clustering activities.

Zefirova, Olga N; Nurieva, Evgeniya V; Shishov, Dmitrii V; Baskin, Igor I; Fuchs, Fabian; Lemcke, Heiko; Schröder, Fabian; Weiss, Dieter G; Zefirov, Nikolay S; Kuznetsov, Sergei A.

Bioorg Med Chem ; 19(18): 5529-38, 2011 Sep 15.

Article in English | MEDLINE | ID: mdl-21873068

ABSTRACT

A series of analogues of conjugate 1, combining an adamantane-based paclitaxel (taxol) mimetic with colchicine was synthesized and tested for cytotoxicity in a cell-based assay with the human lung carcinoma cell line A549. The most active compounds (10 EC(50) 2 ± 1.0 nM, 23 EC(50) 6 ± 1.4 nM, 26 EC(50) 5 ± 1.8 nM, 28 EC(50) 11 ± 1.7 nM, 30 EC(50) 4.8 ± 0.5 nM) were found to interfere with the microtubule dynamics in an interesting manner. Treatment of the cells with these compounds promoted disassembly of microtubules followed by the formation of stable tubulin clusters. Structure-activity relationships for the analogues of 23 revealed the sensitivity of both cytotoxicity and tubulin clustering ability to the linker length. The presence of adamantane (or another bulky hydrophobic and non-aromatic moiety) in 23 was found to play an important role in the formation of tubulin clusters. Structural requirements for optimal activity have been partially explained by molecular modeling.

Subject(s)

Adamantane/pharmacology , Antineoplastic Agents/chemical synthesis , Antineoplastic Agents/pharmacology , Colchicine/pharmacology , Microtubules/drug effects , Tubulin/metabolism , Adamantane/chemistry , Antineoplastic Agents/chemistry , Cell Proliferation/drug effects , Colchicine/chemistry , Dose-Response Relationship, Drug , Drug Screening Assays, Antitumor , Humans , Microtubules/metabolism , Models, Molecular , Molecular Structure , Paclitaxel/chemistry , Paclitaxel/pharmacology , Stereoisomerism , Structure-Activity Relationship , Tumor Cells, Cultured

15.

Discovery of novel chemical reactions by deep generative recurrent neural network.

Bort, William; Baskin, Igor I; Gimadiev, Timur; Mukanov, Artem; Nugmanov, Ramil; Sidorov, Pavel; Marcou, Gilles; Horvath, Dragos; Klimchuk, Olga; Madzhidov, Timur; Varnek, Alexandre.

Sci Rep ; 11(1): 3178, 2021 02 04.

Article in English | MEDLINE | ID: mdl-33542271

ABSTRACT

The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.

16.

The power of deep learning to ligand-based novel drug discovery.

Baskin, Igor I.

Expert Opin Drug Discov ; 15(7): 755-764, 2020 07.

Article in English | MEDLINE | ID: mdl-32228116

ABSTRACT

INTRODUCTION: Deep discriminative and generative neural-network models are becoming an integral part of the modern approach to ligand-based novel drug discovery. The variety of different architectures of neural networks, the methods of their training, and the procedures of generating new molecules require expert knowledge to choose the most suitable approach. AREAS COVERED: Three different approaches to deep learning use in ligand-based drug discovery are considered: virtual screening, neural generative models, and mutation-based structure generation. Several architectures of neural networks for building either discriminative or generative models are considered in this paper, including deep multilayer neural networks, different kinds of convolutional neural networks, recurrent neural networks, and several types of autoencoders. Several kinds of learning frameworks are also considered, including adversarial learning and reinforcement learning. Different types of representations for generating molecules, including SMILES, graphs, and several alternative string representations are also considered. EXPERT OPINION: Two kinds of problem should be solved in order to make the models built using deep neural networks, especially generative models, a valuable option in ligand-based drug discovery: the issue of interpretability and explainability of deep-learning models and the issue of synthetic accessibility of novel compounds designed by deep-learning algorithms.

Subject(s)

Deep Learning , Drug Discovery/methods , Neural Networks, Computer , Algorithms , Drug Design , Humans , Ligands

17.

Parallel Generative Topographic Mapping: An Efficient Approach for Big Data Handling.

Lin, Arkadii; Baskin, Igor I; Marcou, Gilles; Horvath, Dragos; Beck, Bernd; Varnek, Alexandre.

Mol Inform ; 39(12): e2000009, 2020 12.

Article in English | MEDLINE | ID: mdl-32347666

ABSTRACT

Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the Frame set (FS) of compounds used for the manifold construction must well cover a given chemical space. Intuitively, the FS size must rise with the size and diversity of the target library. At the same time, the GTM training can be very slow or even become technically impossible at FS sizes of the order of 10^5 compounds, which is a very small number compared to today's commercially accessible compounds, and, especially, to the theoretically feasible molecules. In order to solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms a FS for the "final" manifold. In order to assess the efficiency of the new algorithm, 80 GTMs were built on the FSs of different sizes ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of few thousands of compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: a FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase of FS compound numbers has no beneficial impact on the predictive propensity of the above-mentioned 712 activity classification models. Parallel GTM may, however, be required to generate maps based on very large FS, that might improve chemical space cartography of big commercial and virtual libraries, approaching billions of compounds.

Subject(s)

Algorithms , Big Data , Benchmarking , Databases, Chemical , Entropy

18.

Application of the mol2vec Technology to Large-size Data Visualization and Analysis.

Shibayama, Shojiro; Marcou, Gilles; Horvath, Dragos; Baskin, Igor I; Funatsu, Kimito; Varnek, Alexandre.

Mol Inform ; 39(6): e1900170, 2020 06.

Article in English | MEDLINE | ID: mdl-32090493

ABSTRACT

Generative Topographic Mapping (GTM) is a dimensionality reduction method, which is widely used for both data visualization and structure-activity modeling. Large dimensionality of the initial data space may require significant computational resources and slow down the GTM construction. Therefore, it may be meaningful to reduce the number of descriptors used for encoding molecular structures. The Principal Component Analysis (PCA), a standard preprocessing tool, suffers from the information loss upon the dimensionality reduction. As an alternative, we propose to use substructure vector embedding provided by the mol2vec technique. In addition to the data dimensionality reduction, this technology also accounts for proximity of substructures in molecular graphs. In this study, dimensionality of large descriptor spaces of ISIDA fragment descriptors or Morgan fingerprints were reduced using either the PCA or the mol2vec method. The latter significantly speeds up GTM training without compromising its predictive power in bioactivity classification tasks.

Subject(s)

Algorithms , Data Analysis , Data Visualization , Principal Component Analysis

19.

Continuous molecular fields and the concept of molecular co-fields in structure-activity studies.

Baskin, Igor I; Zhokhova, Nelly I.

Future Med Chem ; 11(20): 2701-2713, 2019 10.

Article in English | MEDLINE | ID: mdl-31596146

ABSTRACT

The analysis of information on the spatial structure of molecules and the physical fields of their interactions with biological targets is extremely important for solving various problems in drug discovery. This mini-review article surveys the main features of the continuous molecular fields approach and its use for analyzing structure-activity relationships in 3D space, building 3D quantitative structure-activity models and conducting similarity based virtual screening. Particular attention is paid to the consideration of the concept of molecular co-fields and their use for the interpretation of 3D structure-activity models. The principles of molecular design based on the overlapping and the similarity of molecular fields with corresponding co-fields are formulated.

Subject(s)

Molecular Structure , Hydrogen Bonding , Hydrophobic and Hydrophilic Interactions , Models, Molecular , Structure-Activity Relationship

20.

Neural networks in building QSAR models.

Baskin, Igor I; Palyulin, Vladimir A; Zefirov, Nikolai S.

Methods Mol Biol ; 458: 137-58, 2008.

Article in English | MEDLINE | ID: mdl-19065809

ABSTRACT

This chapter critically reviews some of the important methods being used for building quantitative structure-activity relationship (QSAR) models using the artificial neural networks (ANNs). It attends predominantly to the use of multilayer ANNs in the regression analysis of structure-activity data. The highlighted topics cover the approximating ability of ANNs, the interpretability of the resulting models, the issues of generalization and memorization, the problems of overfitting and overtraining, the learning dynamics, regularization, and the use of neural network ensembles. The next part of the chapter focuses attention on the use of descriptors. It reviews different descriptor selection and preprocessing techniques; considers the use of the substituent, substructural, and superstructural descriptors in building common QSAR models; the use of molecular field descriptors in three-dimensional QSAR studies; along with the prospects of "direct" graph-based QSAR analysis. The chapter starts with a short historical survey of the main milestones in this area.

Subject(s)

Chemistry Techniques, Analytical/methods , Quantitative Structure-Activity Relationship , Algorithms , Artificial Intelligence , Biology/methods , Chemistry, Physical/methods , Cluster Analysis , Computers , Models, Statistical , Models, Theoretical , Neural Networks, Computer , Regression Analysis , Reproducibility of Results , Software
