Results 1 - 20 of 33
1.
J Chem Inf Model ; 63(21): 6629-6641, 2023 11 13.
Article in English | MEDLINE | ID: mdl-37902548

ABSTRACT

Computational design of chiral organic catalysts for asymmetric synthesis is a promising technology that can significantly reduce the material and human resources required for the preparation of enantiopure compounds. Herein, for the modeling of catalysts' enantioselectivity, we propose to use the multi-instance learning approach accounting for multiple catalyst conformers and requiring neither conformer selection nor their spatial alignment. A catalyst was represented by an ensemble of conformers, each encoded by three-dimensional (3D) pmapper descriptors. A catalyzed reactant transformation was converted into a single molecular graph, a condensed graph of reaction, encoded by 2D fragment descriptors. A whole chemical reaction was finally encoded by concatenated 3D catalyst and 2D transformation descriptors. The performance of the proposed method was demonstrated in the modeling of the enantioselectivity of homogeneous and phase-transfer reactions and compared with the state-of-the-art approaches.


Subject(s)
Catalysis
2.
J Chem Inf Model ; 62(9): 2015-2020, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34843251

ABSTRACT

This work introduces CGRdb2.0, an open-source database management system for molecules, reactions, and chemical data. CGRdb2.0 is a Python package connecting to a PostgreSQL database that enables native searches for molecules and reactions without complicated SQL syntax. The library provides out-of-the-box implementations for similarity and substructure searches for molecules, as well as similarity and substructure searches for reactions in two ways: based on reaction components and based on the Condensed Graph of Reaction approach, the latter significantly accelerating the performance. In benchmarking studies with the RDKit database cartridge, we demonstrate that CGRdb2.0 performs searches faster for smaller data sets, while allowing for interactive access to the retrieved data.


Subject(s)
Benchmarking , Database Management Systems , Databases, Factual
3.
J Chem Inf Model ; 62(15): 3524-3534, 2022 08 08.
Article in English | MEDLINE | ID: mdl-35876159

ABSTRACT

Graph-based architectures are becoming increasingly popular as a tool for structure generation. Here, we introduce a novel open-source architecture, HyFactor, in which, similar to the InChI linear notation, the number of hydrogens attached to the heavy atoms is considered instead of the bond types. HyFactor was benchmarked on the ZINC 250K, MOSES, and ChEMBL data sets against the conventional graph-based architecture ReFactor, our implementation of the DEFactor architecture reported in the literature. On average, HyFactor models contain some 20% fewer fitting parameters than those of ReFactor. The two architectures display similar validity, uniqueness, and reconstruction rates. Compared to the training set compounds, HyFactor generates more similar structures than ReFactor. This could be explained by the fact that the latter generates many open-chain analogues of cyclic structures in the training set. It has been demonstrated that the reconstruction error of heavy molecules can be significantly reduced using the data augmentation technique. The codes of HyFactor and ReFactor as well as all models obtained in this study are publicly available from our GitHub repository: https://github.com/Laboratoire-de-Chemoinformatique/HyFactor.


Subject(s)
Software
4.
J Chem Inf Model ; 62(22): 5471-5484, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36332178

ABSTRACT

In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.


Subject(s)
Quantitative Structure-Activity Relationship , Molecular Docking Simulation
5.
J Chem Inf Model ; 61(2): 554-559, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33502186

ABSTRACT

Presently, quantum chemical calculations are widely used to generate extensive data sets for machine learning applications; however, generally, these sets only include information on equilibrium structures and some close conformers. Exploration of potential energy surfaces provides important information on ground and transition states, but analysis of such data is complicated due to the number of possible reaction pathways. Here, we present RePathDB, a database system for managing 3D structural data for both ground and transition states resulting from quantum chemical calculations. Our tool allows one to store, assemble, and analyze reaction pathway data. It combines relational database CGR DB for handling compounds and reactions as molecular graphs with a graph database architecture for pathway analysis by graph algorithms. Original condensed graph of reaction technology is used to store any chemical reaction as a single graph.


Subject(s)
Algorithms , Database Management Systems , Databases, Factual
6.
J Chem Inf Model ; 61(10): 4913-4923, 2021 10 25.
Article in English | MEDLINE | ID: mdl-34554736

ABSTRACT

Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors issued for a single lowest energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
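
As a rough illustration of the multi-instance idea in the abstract above, the sketch below treats one molecule as a bag of conformer descriptor vectors. It shows two generic ways of handling such a bag; both are assumptions made for illustration and are not the algorithms evaluated in the paper. The names `mean_pool`, `instance_predict`, and `toy_model` are hypothetical.

```python
from statistics import fmean


def mean_pool(bag):
    """Collapse a bag of conformer descriptor vectors into one vector
    by taking the column-wise mean (bag-level wrapper)."""
    return [fmean(column) for column in zip(*bag)]


def instance_predict(bag, instance_model):
    """Score every conformer separately, then average the scores
    (instance-level wrapper). Also return the index of the highest
    scoring conformer as a crude 'most relevant conformation' guess."""
    scores = [instance_model(vector) for vector in bag]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return fmean(scores), best_index


def toy_model(vector):
    """Placeholder for a trained single-instance model (hypothetical)."""
    return sum(vector)


# One molecule represented by two conformers with two descriptors each.
bag = [[1.0, 2.0], [3.0, 4.0]]
pooled = mean_pool(bag)
bag_score, best_conformer = instance_predict(bag, toy_model)
```

Either wrapper avoids choosing a single conformation up front, which is the problem the abstract attributes to single-instance 3D QSAR.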


Subject(s)
Algorithms , Quantitative Structure-Activity Relationship , Databases, Factual , Drug Discovery , Molecular Conformation
7.
Int J Mol Sci ; 23(1)2021 Dec 27.
Article in English | MEDLINE | ID: mdl-35008674

ABSTRACT

The selection of experimental conditions leading to a reasonable yield is an essential element of the automated development of a synthesis plan and the subsequent synthesis of the target compound. The classical QSPR approach, requiring one-to-one correspondence between chemical structure and a target property, can be used for optimal reaction conditions prediction only on a limited scale when only one condition component (e.g., catalyst or solvent) is considered. However, a particular reaction can proceed under several different conditions. In this paper, we describe the Likelihood Ranking Model, an artificial neural network that outputs a list of different conditions ranked according to their suitability for a given chemical transformation. Benchmarking calculations demonstrated that our model outperformed some popular approaches to the theoretical assessment of reaction conditions, such as k Nearest Neighbors and a recurrent artificial neural network, in the prediction of condition components (reagents, solvents, catalysts, and temperature). The ability of the Likelihood Ranking Model, trained on a dataset of hydrogenation reactions (~42,000 reactions) from the Reaxys® database, to propose conditions that led to the desired product was validated experimentally on a set of three reactions with rich selectivity issues.


Subject(s)
Models, Chemical , Hydrogenation , Likelihood Functions , Stereoisomerism
8.
Int J Mol Sci ; 21(15)2020 Aug 03.
Article in English | MEDLINE | ID: mdl-32756326

ABSTRACT

Nowadays, the definition of a model's applicability domain (AD) is an active research topic in chemoinformatics. Although many AD definitions for models predicting properties of molecules (Quantitative Structure-Activity/Property Relationship (QSAR/QSPR) models) have been described in the literature, none for chemical reactions (Quantitative Reaction-Property Relationships (QRPR)) has been reported to date. The point is that a chemical reaction is a much more complex object than an individual molecule, and its yield, thermodynamic and kinetic characteristics depend not only on the structures of reactants and products but also on experimental conditions. The QRPR models' performance largely depends on the way that chemical transformation is encoded. In this study, various AD definition methods extensively used in QSAR/QSPR studies of individual molecules, as well as several novel approaches suggested in this work for reactions, were benchmarked on several reaction datasets. The ability to exclude wrong reaction types, increase coverage, improve the model performance, and detect Y-outliers was tested. As a result, several "best" AD definitions for the QRPR models predicting reaction characteristics have been revealed and tested on a previously published external dataset with a clear AD definition problem.


Subject(s)
Cheminformatics/trends , Protein Domains , Quantitative Structure-Activity Relationship , Thermodynamics , Chemical Phenomena , Kinetics , Models, Molecular
9.
Molecules ; 25(2)2020 Jan 17.
Article in English | MEDLINE | ID: mdl-31963467

ABSTRACT

Pharmacophore modeling is usually considered as a special type of virtual screening without a probabilistic nature. Correspondence of at least one conformation of a molecule to a pharmacophore is considered as evidence of its bioactivity. We show that pharmacophores can be treated as one-class machine learning models, and that a probability reflecting the model's confidence can be assigned to a pharmacophore on the basis of its precision in identifying active compounds on a calibration set. Two schemes (Max and Mean) of probability calculation for consensus prediction based on individual pharmacophore models were proposed. Both approaches to some extent correspond to commonly used consensus approaches like the common hit approach or the one based on a logical OR operation uniting hit lists of individual models. Unlike some known approaches, the proposed ones can rank compounds retrieved by multiple models. These approaches were benchmarked on multiple ChEMBL datasets used for ligand-based pharmacophore modeling and externally validated on corresponding DUD-E datasets. The influence of the complexity of pharmacophores and of their performance on a calibration set on the results of virtual screening was analyzed. It was shown that the Max and Mean approaches have superior early enrichment to the commonly used approaches. Thus, a well-performing, easy-to-implement, and probabilistic alternative to existing approaches for pharmacophore-based virtual screening was proposed.
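
The Max and Mean schemes named in the abstract above might be sketched as follows. The abstract does not give the exact formulas, so this is an assumption-laden illustration: each pharmacophore model carries its calibration-set precision as a probability, and the consensus is taken over the models that a compound matches. Whether the published Mean scheme averages over matched models only, as done here, is not stated in the abstract; `consensus_probability` and the model identifiers are hypothetical.

```python
def consensus_probability(matched, precision, scheme="max"):
    """Combine per-model probabilities for one compound.

    matched:   dict mapping model id -> True if the compound matched it
    precision: dict mapping model id -> calibration-set precision
    scheme:    "max" or "mean" (assumed here to run over matched models)
    """
    hits = [precision[model] for model, is_hit in matched.items() if is_hit]
    if not hits:
        return 0.0  # the compound was not retrieved by any model
    if scheme == "max":
        return max(hits)
    if scheme == "mean":
        return sum(hits) / len(hits)
    raise ValueError(f"unknown scheme: {scheme}")


# Toy calibration precisions for three pharmacophore models.
precision = {"ph1": 0.75, "ph2": 0.25, "ph3": 0.5}
# The compound matches ph1 and ph2 but not ph3.
matched = {"ph1": True, "ph2": True, "ph3": False}

p_max = consensus_probability(matched, precision, "max")
p_mean = consensus_probability(matched, precision, "mean")
```

Because both schemes return a number rather than a yes/no hit, compounds retrieved by several models can be ranked, which is the property the abstract highlights.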


Subject(s)
Drug Evaluation, Preclinical/methods , Pharmaceutical Preparations/analysis , Animals , Computer Simulation , Humans , Ligands , Machine Learning , Models, Chemical , Models, Molecular , Molecular Conformation , Protein Binding
10.
J Chem Inf Model ; 59(6): 2516-2521, 2019 06 24.
Article in English | MEDLINE | ID: mdl-31063394

ABSTRACT

CGRtools is an open-source Python library for handling molecular and reaction information. It is the only library developed so far that can handle condensed graphs of reaction (CGRs). CGR provides the possibility for advanced operations with reaction information and could be used for reaction descriptor calculation, structure-reactivity modeling, atom-to-atom mapping comparison and correction, reaction center extraction, reaction balancing, and some other related tasks. Unlike other popular libraries, CGRtools is fully written in Python, has minor dependencies on other libraries, and is cross-platform. Reaction, molecule, and CGR objects in CGRtools support native Python methods and are comparable with the help of operations "equal to", "less than", and "bigger than". CGRtools supports common structural formats. CGRtools is distributed via an L-GPL license and available on https://github.com/cimm-kzn/CGRtools .


Subject(s)
Cheminformatics/methods , Small Molecule Libraries/chemistry , Software , Chemical Phenomena , Models, Chemical
11.
J Chem Inf Model ; 59(11): 4569-4576, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31638794

ABSTRACT

Here, we describe a concept of conjugated models for several properties (activities) linked by a strict mathematical relationship. This relationship can be directly integrated analytically into the ridge regression (RR) algorithm or accounted for in a special case of "twin" neural networks (NN). Developed approaches were applied to the modeling of the logarithm of the prototropic tautomeric constant (logKT) which can be expressed as the difference between the acidity constants (pKa) of two related tautomers. Both conjugated and individual RR and NN models for logKT and pKa were developed. The modeling set included 639 tautomeric constants and 2371 acidity constants of organic molecules in various solvents. A descriptor vector for each reaction resulted from the concatenation of structural descriptors and some parameters for reaction conditions. For the former, atom-centered substructural fragments describing acid sites in tautomer molecules were used. The latter were automatically identified using the condensed graph of reaction approach. Conjugated models performed similarly to the best individual models for logKT and pKa. At the same time, the physically grounded relationship between logKT and pKa was respected only for conjugated but not individual models.
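
To make the idea of a relationship "integrated analytically" into a linear model more concrete, here is a minimal sketch under stated assumptions. It assumes that pKa is modeled linearly from a descriptor vector and that logKT is the difference of the two tautomers' pKa values; the order of subtraction is chosen arbitrarily here, and the function names are hypothetical. This is not the authors' implementation.

```python
def dot(weights, x):
    """Plain dot product of two equal-length sequences."""
    return sum(w * xi for w, xi in zip(weights, x))


def predict_pka(weights, bias, x):
    """Linear pKa model for one tautomer descriptor vector."""
    return bias + dot(weights, x)


def predict_log_kt(weights, bias, x_first, x_second):
    """logKT as a pKa difference using the SAME weights, so the
    relationship holds by construction (the bias cancels out)."""
    return predict_pka(weights, bias, x_first) - predict_pka(weights, bias, x_second)


def log_kt_design_row(x_first, x_second):
    """Row to use when fitting on a logKT data point: the descriptor
    difference, so pKa and logKT data can share one weight vector."""
    return [a - b for a, b in zip(x_first, x_second)]


weights, bias = [1.0, 2.0], 3.0
x_first, x_second = [1.0, 1.0], [0.0, 1.0]
log_kt = predict_log_kt(weights, bias, x_first, x_second)
same_value = dot(weights, log_kt_design_row(x_first, x_second))
```

Two independently fitted models for logKT and pKa carry no such guarantee, which matches the abstract's remark that only the conjugated models respected the relationship.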


Subject(s)
Organic Chemicals/chemistry , Pharmaceutical Preparations/chemistry , Acids/chemistry , Algorithms , Drug Discovery , Models, Chemical , Molecular Structure , Neural Networks, Computer , Quantitative Structure-Activity Relationship , Solvents/chemistry , Stereoisomerism
12.
Int J Mol Sci ; 20(23)2019 Nov 20.
Article in English | MEDLINE | ID: mdl-31757043

ABSTRACT

Pharmacophore models are widely used for the identification of promising primary hits in large compound libraries. Recent studies have demonstrated that pharmacophores retrieved from protein-ligand molecular dynamics trajectories outperform pharmacophores retrieved from a single crystal complex structure. However, the number of retrieved pharmacophores can be enormous, thus making it computationally inefficient to use all of them for virtual screening. In this study, we proposed selection of distinct representative pharmacophores by the removal of pharmacophores with identical three-dimensional (3D) pharmacophore hashes. We also proposed a new conformer coverage approach in order to rank compounds using all representative pharmacophores. Our results for four cyclin-dependent kinase 2 (CDK2) complexes with different ligands demonstrated that the proposed selection and ranking approaches outperformed the previously described common hits approach. We also demonstrated that ranking based on averaged predicted scores obtained from different complexes can outperform ranking based on scores from an individual complex. All developments were implemented in the open-source software pharmd.


Subject(s)
Cyclin-Dependent Kinase 2/chemistry , Drug Discovery/methods , Molecular Dynamics Simulation , Small Molecule Libraries/chemistry , Binding Sites , Computer Simulation , Cyclin-Dependent Kinase 2/metabolism , Humans , Ligands , Molecular Docking Simulation/methods , Protein Binding , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Small Molecule Libraries/pharmacology
13.
Molecules ; 24(6)2019 Mar 18.
Article in English | MEDLINE | ID: mdl-30934532

ABSTRACT

The authors would like to add the funding number to the published article [...].

14.
J Comput Chem ; 39(14): 821-826, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29283453

ABSTRACT

Hydration of the copper(II) bis-complexes with glycine, serine, lysine, and aspartic acid was studied by DFT and MD simulation methods. The distances between copper(II) and water molecules in the 1st and 2nd coordination shells, the average number of water molecules and their mean residence times in the hydration shells were calculated. Good agreement was observed between the values obtained and those found by DFT and NMR relaxation methods. The influence of the functional groups of the ligands and the cis-trans isomerism of the complexes on the structural and dynamical parameters of the hydration shells was shown and explained. Analysis of the MD trajectories reveals the competition for a copper(II) axial position between water molecules, or between water molecules and the functional chain groups of the ligands, and confirms the suggestion of the pentacoordination of copper(II) in such complexes. MD simulations show that essentially only one axial position of Cu(II) is occupied at each time step, while on average a coordination number greater than 5 is observed. © 2017 Wiley Periodicals, Inc.


Subject(s)
Amino Acids/chemistry , Copper/chemistry , Organometallic Compounds/chemistry , Water/chemistry , Density Functional Theory , Molecular Dynamics Simulation , Stereoisomerism
15.
Molecules ; 23(12)2018 Nov 27.
Article in English | MEDLINE | ID: mdl-30486389

ABSTRACT

Pharmacophore modeling is a widely used strategy for finding new hit molecules. Since not all protein targets have available 3D structures, ligand-based approaches are still useful. Currently, there are just a few free ligand-based pharmacophore modeling tools, and these have a lot of restrictions, e.g., using a template molecule for alignment. We developed a new approach to 3D pharmacophore representation and matching which does not require pharmacophore alignment. This representation can be used to quickly find identical pharmacophores in a given set. Based on this representation, a 3D pharmacophore ligand-based modeling approach to search for pharmacophores which preferably match active compounds and do not match inactive ones was developed. The approach searches for 3D pharmacophore models starting from 2D structures of available active and inactive compounds. The implemented approach was successfully applied for several retrospective studies. The results were compared to a 2D similarity search, demonstrating some of the advantages of the developed 3D pharmacophore models. Also, the generated 3D pharmacophore models were able to match the 3D poses of known ligands from their protein-ligand complexes, confirming the validity of the models. The developed approach is available as an open-source software tool: http://www.qsar4u.com/pages/pmapper.php and https://github.com/meddwl/psearch.


Subject(s)
Adenosine A2 Receptor Antagonists/chemistry , Cholinesterase Inhibitors/chemistry , Cytochrome P-450 CYP3A Inhibitors/chemistry , Models, Molecular , Ligands
16.
J Comput Aided Mol Des ; 31(9): 829-839, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28752345

ABSTRACT

We describe a novel approach to reaction representation as a combination of two mixtures: a mixture of reactants and a mixture of products. In turn, each mixture can be encoded using an earlier reported approach involving simplex descriptors (SiRMS). The feature vector representing these two mixtures results from either concatenated product and reactant descriptors or the difference between descriptors of products and reactants. This reaction representation doesn't need an explicit labeling of a reaction center. A rigorous "product-out" cross-validation (CV) strategy has been suggested. Unlike the naïve "reaction-out" CV approach based on a random selection of items, the proposed one provides a more realistic estimate of prediction accuracy for reactions resulting in novel products. The new methodology has been applied to model rate constants of E2 reactions. It has been demonstrated that the use of the fragment control applicability domain approach significantly increases the prediction accuracy of the models. The models obtained with the new "mixture" approach performed better than those requiring either explicit (Condensed Graph of Reaction) or implicit (reaction fingerprints) reaction center labeling.
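
The "product-out" strategy described above can be read as a grouped cross-validation in which all reactions sharing a product are kept in the same fold. The sketch below shows that reading under a simplifying assumption: each reaction is tagged with a single product key (for example a canonical structure string). The function name, the greedy balancing rule, and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def product_out_folds(reactions, n_folds):
    """Split reaction indices into folds so that no product appears
    in more than one fold.

    reactions: list of (reaction_id, product_key) tuples
    n_folds:   number of folds to build
    """
    by_product = defaultdict(list)
    for index, (_, product_key) in enumerate(reactions):
        by_product[product_key].append(index)

    folds = [[] for _ in range(n_folds)]
    # Place the largest product groups first, each into the fold that
    # is currently smallest, to keep fold sizes roughly balanced.
    ordered = sorted(by_product, key=lambda key: (-len(by_product[key]), key))
    for product_key in ordered:
        smallest = min(folds, key=len)
        smallest.extend(by_product[product_key])
    return folds


reactions = [
    ("r1", "P1"), ("r2", "P1"), ("r3", "P2"),
    ("r4", "P3"), ("r5", "P3"), ("r6", "P4"),
]
folds = product_out_folds(reactions, n_folds=2)
```

With a random "reaction-out" split, reactions r1 and r2 could land in different folds, so the model would be tested on a product it had already seen; grouping by product removes that leak.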


Subject(s)
Models, Molecular , Organic Chemicals/chemistry , Kinetics , Molecular Structure , Quantitative Structure-Activity Relationship
17.
J Chem Inf Model ; 56(11): 2140-2148, 2016 11 28.
Article in English | MEDLINE | ID: mdl-27783508

ABSTRACT

We report a new method to assess the reactivity of protective groups (PGs) as a function of reaction conditions (catalyst, solvent) using raw reaction data. It is based on an intuitive similarity principle for chemical reactions: similar reactions proceed under similar conditions. Technically, reaction similarity can be assessed using the Condensed Graph of Reaction (CGR) approach representing an ensemble of reactants and products as a single molecular graph, i.e., as a pseudomolecule for which molecular descriptors or fingerprints can be calculated. CGR-based in-house tools were used to process data for 142,111 catalytic hydrogenation reactions extracted from the Reaxys database. Our results reveal some contradictions with the famous Greene's Reactivity Charts based on manual expert analysis. Models developed in this study show high accuracy (ca. 90%) for predicting optimal experimental conditions of protective group deprotection.


Subject(s)
Informatics/methods , Automation , Catalysis , Databases, Factual , Hydroxides/chemistry , Models, Chemical , Phenols/chemistry , Solvents/chemistry
18.
J Phys Chem A ; 117(19): 4011-24, 2013 May 16.
Article in English | MEDLINE | ID: mdl-23590617

ABSTRACT

The first systematic theoretical study of the nature of intermolecular bonding of dimethylselenide as donor and IIIA group element halides as acceptors was made with the help of the approach of Quantum Theory of Atoms in Molecules. Density Functional Theory with "old" Sapporo triple-ζ basis sets was used to calculate geometry, thermodynamics, and wave function of Me2Se···AX3 complexes. The analysis of the electron density distribution and the Laplacian of the electron density allowed us to reveal and explain the tendencies in the influence of the central atom (A = B, Al, Ga, In) and halogen (X = F, Cl, Br, I) on the nature of Se···A bonding. Significant changes in properties of the selenium lone pair upon complexation were described by means of the analysis of the Laplacian of the charge density. Charge transfer characteristics and the contributions to it from electron localization and delocalization were analyzed in terms of localization and delocalization indexes. Common features of the complexation and differences in the nature of bonding were revealed. The analysis showed that gallium and indium halide complexes can be attributed to charge transfer-driven complexes; aluminum halide complexes seem to be mainly of an electrostatic nature. The nature of bonding in different boron halides essentially varies; these complexes are stabilized mainly by covalent Se···B interaction. In all the complexes under study, the covalence of the Se···A interaction is rather high.

19.
Mol Inform ; 42(10): e2200275, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37488968

ABSTRACT

Conjugated QSPR models for reactions integrate fundamental chemical laws expressed by mathematical equations with machine learning algorithms. Herein we present a methodology for building conjugated QSPR models integrated with the Arrhenius equation. Conjugated QSPR models were used to predict kinetic characteristics of cycloaddition reactions related by the Arrhenius equation: rate constant $\log k$, pre-exponential factor $\log A$, and activation energy $E_{\rm a}$. They were benchmarked against single-task (individual and equation-based models) and multi-task models. In individual models, all characteristics were modeled separately, while in multi-task models $\log k$, $\log A$, and $E_{\rm a}$ were treated cooperatively. An equation-based model assessed $\log k$ using the Arrhenius equation and $\log A$ and $E_{\rm a}$ values predicted by individual models. It has been demonstrated that the conjugated QSPR models can accurately predict the reaction rate constants at extreme temperatures, at which reaction rate constants can hardly be measured experimentally. Also, in the case of small training sets, conjugated models are more robust than related single-task approaches.
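
The equation-based baseline mentioned above combines separately predicted values through the Arrhenius equation. In decimal-logarithm form this is log k = log A - Ea / (ln(10) R T). The sketch below only evaluates that formula; the unit choices (Ea in J/mol, T in kelvin), the function name, and the example numbers are assumptions made for illustration, since the abstract does not state them.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def arrhenius_log_k(log_a, activation_energy, temperature):
    """Return log10(k) from log10(A), Ea in J/mol, and T in kelvin.

    Implements log k = log A - Ea / (ln(10) * R * T).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return log_a - activation_energy / (math.log(10) * GAS_CONSTANT * temperature)


# Made-up inputs standing in for values predicted by individual models.
log_k_room = arrhenius_log_k(log_a=10.0, activation_energy=50_000.0, temperature=298.15)
log_k_hot = arrhenius_log_k(log_a=10.0, activation_energy=50_000.0, temperature=500.0)
```

In the equation-based setup, errors in the predicted log A and Ea propagate directly into log k, whereas a conjugated model is trained with the equation built in.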

20.
J Control Release ; 353: 903-914, 2023 01.
Article in English | MEDLINE | ID: mdl-36402234

ABSTRACT

Active learning (AL) has become a subject of active recent research both in industry and academia as an efficient approach for rapid design and discovery of novel chemicals, materials, and polymers. Herein, we have assessed the applicability of AL for the discovery of polymeric micelle formulations for poorly soluble drugs. We were motivated by the key advantages of this approach, which make it a desirable strategy for rational design of drug delivery systems due to its ability to (i) employ relatively small datasets for model development, (ii) iterate between model development and model assessment using small external datasets that can be either generated in focused experimental studies or formed from subsets of the initial training data, and (iii) progressively evolve models towards increasingly more reliable predictions and the identification of novel chemicals with the desired properties. In this study, we compared various AL protocols for their effectiveness in finding biologically active molecules using synthetic datasets. We have investigated the dependency of AL performance on the size of the initial training set, the relative complexity of the task, and the choice of the initial training dataset. We found that AL techniques as applied to regression modeling offer no benefits over random search, while AL used for classification tasks performs better than models built for randomly selected training sets but is still quite far from perfect. Finally, the best performing AL approach was employed to discover and experimentally validate novel binding polymers for a case study of asialoglycoprotein receptor (ASGPR).


Subject(s)
Polymers , Problem-Based Learning , Polymers/chemistry , Micelles , Drug Delivery Systems , Peptides