Results 1 - 20 of 85
1.
Regul Toxicol Pharmacol ; 149: 105623, 2024 May.
Article in English | MEDLINE | ID: mdl-38631606

ABSTRACT

The Bone-Marrow derived Dendritic Cell (BMDC) test is a promising assay for identifying sensitizing chemicals based on the 3Rs (Replace, Reduce, Refine) principle. This study expanded the BMDC benchmarking to various in vitro, in chemico, and in silico assays targeting different key events (KE) in the skin sensitization pathway, using common substances datasets. Additionally, a Quantitative Structure-Activity Relationship (QSAR) model was developed to predict the BMDC test outcomes for sensitizing or non-sensitizing chemicals. The modeling workflow involved ISIDA (In Silico Design and Data Analysis) molecular fragment descriptors and the SVM (Support Vector Machine) machine-learning method. The BMDC model's performance was at least comparable to that of all ECVAM-validated models regardless of the KE considered. Compared with other tests targeting KE3, related to dendritic cell activation, BMDC assay was shown to have higher balanced accuracy and sensitivity concerning both the Local Lymph Node Assay (LLNA) and human labels, providing additional evidence for its reliability. The consensus QSAR model exhibits promising results, correlating well with observed sensitization potential. Integrated into a publicly available web service, the BMDC-based QSAR model may serve as a cost-effective and rapid alternative to lab experiments, providing preliminary screening for sensitization potential, compound prioritization, optimization and risk assessment.


Subject(s)
Benchmarking , Dendritic Cells , Quantitative Structure-Activity Relationship , Dendritic Cells/drug effects , Humans , Animals , Support Vector Machine , Computer Simulation , Dermatitis, Allergic Contact , Allergens/toxicity , Animal Testing Alternatives/methods , Bone Marrow Cells/drug effects , Local Lymph Node Assay , Mice
2.
J Chem Inf Model ; 63(17): 5571-5582, 2023 09 11.
Article in English | MEDLINE | ID: mdl-37602843

ABSTRACT

In chemical library analysis, it may be useful to describe libraries as individual items rather than collections of compounds. This is particularly true for ultra-large noncherry-pickable compound mixtures, such as DNA-encoded libraries (DELs). In this sense, the chemical library space (CLS) is useful for the management of a portfolio of libraries, just like chemical space (CS) helps manage a portfolio of molecules. Several possible CLSs were previously defined using vectorial library representations obtained from generative topographic mapping (GTM). Given the steadily growing number of DEL designs, the CLS becomes "crowded" and requires analysis tools beyond pairwise library comparison. Therefore, herein, we investigate the cartography of CLS on meta-(µ)GTMs ("meta" as a reminder that these are maps of the CLS, itself based on responsibility vectors issued by regular CS GTMs). A total of 2.5K DELs and ChEMBL (reference) were projected on the µGTM, producing landscapes of library-specific properties. These describe both interlibrary similarity and intrinsic library characteristics in the same view, thereby facilitating the selection of the best project-specific libraries.


Subject(s)
Small Molecule Libraries , Gene Library
3.
J Chem Inf Model ; 63(13): 4042-4055, 2023 07 10.
Article in English | MEDLINE | ID: mdl-37368824

ABSTRACT

The development of DNA-encoded library (DEL) technology introduced new challenges for the analysis of chemical libraries. It is often useful to consider a chemical library as a stand-alone chemoinformatic object─represented both as a collection of independent molecules, and yet an individual entity─in particular, when they are inseparable mixtures, like DELs. Herein, we introduce the concept of chemical library space (CLS), in which resident items are individual chemical libraries. We define and compare four vectorial library representations obtained using generative topographic mapping. These allow for an effective comparison of libraries, with the ability to tune and chemically interpret the similarity relationships. In particular, property-tuned CLS encodings enable to simultaneously compare libraries with respect to both property and chemotype distributions. We apply the various CLS encodings for the selection problem of DELs that optimally "match" a reference collection (here ChEMBL28), showing how the choice of the CLS descriptors may help to fine-tune the "matching" (overlap) criteria. Hence, the proposed CLS may represent a new efficient way for polyvalent analysis of thousands of chemical libraries. Selection of an easily accessible compound collection for drug discovery, as a substitute for a difficult to produce reference library, can be tuned for either primary or target-focused screening, also considering property distributions of compounds. Alternatively, selection of libraries covering novel regions of the chemical space with respect to a reference compound subspace may serve for library portfolio enrichment.


Subject(s)
DNA , Small Molecule Libraries , Small Molecule Libraries/chemistry , DNA/chemistry , Gene Library , Drug Discovery/methods
4.
J Chem Inf Model ; 63(16): 5107-5119, 2023 08 28.
Article in English | MEDLINE | ID: mdl-37556857

ABSTRACT

This study introduces a new de novo design algorithm called GENERA that combines the capabilities of a deep-learning algorithm for automated drug-like analogue design, called DeLA-Drug, with a genetic algorithm for generating molecules with desired target-oriented properties. Specifically, GENERA was applied to the angiotensin-converting enzyme 2 (ACE2) target, which is implicated in many pathological conditions, including COVID-19. The ability of GENERA to de novo design promising candidates for a specific target was assessed using two docking programs, PLANTS and GLIDE. A fitness function based on the Pareto dominance resulting from computed PLANTS and GLIDE scores was applied to demonstrate the algorithm's ability to perform multiobjective optimizations effectively. GENERA can quickly generate focused libraries that produce better scores compared to a starting set of known ACE-2 binders. This study is the first to utilize a DL-based algorithm designed for analogue generation as a mutational operator within a GA framework, representing an innovative approach to target-oriented de novo design.


Subject(s)
COVID-19 , Deep Learning , Humans , Algorithms , Drug Design
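
The abstract of entry 4 mentions a fitness function based on Pareto dominance over PLANTS and GLIDE docking scores. The following is only a minimal sketch of what Pareto dominance means for two objectives, not the GENERA implementation. It assumes that lower (more negative) scores are better for both programs, and the "count of dominated molecules" fitness and all function names are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (plants_score, glide_score); ASSUMPTION: lower is better for both objectives.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(population: List[Scores]) -> List[int]:
    """Indices of molecules that no other molecule dominates."""
    return [
        i
        for i, a in enumerate(population)
        if not any(dominates(b, a) for j, b in enumerate(population) if j != i)
    ]


def dominance_fitness(population: List[Scores]) -> List[int]:
    """Hypothetical fitness: how many other molecules each molecule dominates."""
    return [sum(dominates(a, b) for b in population) for a in population]


# Toy scores, invented for illustration only (not data from the study).
toy_population = [(-90.0, -8.0), (-80.0, -9.0), (-70.0, -7.0), (-95.0, -6.0)]
front = pareto_front(toy_population)
fitness = dominance_fitness(toy_population)
```

In this toy population the third molecule is worse than the first two on both scores, so it drops out of the front, while the other three each win on at least one objective and are mutually non-dominated.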
5.
J Chem Inf Model ; 62(18): 4537-4548, 2022 09 26.
Article in English | MEDLINE | ID: mdl-36103300

ABSTRACT

Nowadays, drug discovery is inevitably intertwined with the usage of large compound collections. Understanding of their chemotype composition and physicochemical property profiles is of the highest importance for successful hit identification. Efficient polyfunctional tools allowing multifaceted analysis of constantly growing chemical libraries must be Big Data-compatible. Here, we present the freely accessible ChemSpace Atlas (https://chematlas.chimie.unistra.fr), which includes almost 40K hierarchically organized Generative Topographic Maps (GTM) accommodating up to 500 M compounds covering fragment-like, lead-like, drug-like, PPI-like, and NP-like chemical subspaces. They allow users to navigate and analyze ZINC, ChEMBL, and COCONUT from multiple perspectives on different scales: from a bird's eye view of the entire library to structural pattern detection in small clusters. Around 20 physicochemical properties and almost 750 biological activities can be visualized (associated with map zones), supporting activity profiling and analogue search. Moreover, ChemSpace Atlas will be extended toward new chemical subspaces (e.g., DNA-encoded libraries and synthons) and functionalities (ADMETox profiling and property-guided de novo compound generation).


Subject(s)
Drug Discovery , Small Molecule Libraries , DNA/chemistry , Gene Library , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Zinc
6.
J Chem Inf Model ; 62(22): 5471-5484, 2022 11 28.
Article in English | MEDLINE | ID: mdl-36332178

ABSTRACT

In order to better formalize it, the notorious inverse-QSAR problem (finding structures of given QSAR-predicted properties) is considered in this paper as a two-step process including (i) finding "seed" descriptor vectors corresponding to user-constrained QSAR model output values and (ii) identifying the chemical structures best matching the "seed" vectors. The main development effort here was focused on the latter stage, proposing a new attention-based conditional variational autoencoder neural-network architecture based on recent developments in attention-based methods. The obtained results show that this workflow was capable of generating compounds predicted to display desired activity while being completely novel compared to the training database (ChEMBL). Moreover, the generated compounds show acceptable druglikeness and synthetic accessibility. Both pharmacophore and docking studies were carried out as "orthogonal" in silico validation methods, proving that some of the de novo structures are, beyond being predicted active by 2D-QSAR models, clearly able to match binding 3D pharmacophores and bind the protein pocket.


Subject(s)
Quantitative Structure-Activity Relationship , Molecular Docking Simulation
7.
J Chem Inf Model ; 62(9): 2151-2163, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34723532

ABSTRACT

Most of the existing computational tools for de novo library design are focused on the generation, rational selection, and combination of promising structural motifs to form members of the new library. However, the absence of a direct link between the chemical space of the retrosynthetically generated fragments and the pool of available reagents makes such approaches appear as rather theoretical and reality-disconnected. In this context, here we present Synthons Interpreter (SynthI), a new open-source toolkit for de novo library design that allows merging those two chemical spaces into a single synthons space. Here synthons are defined as actual fragments with valid valences and special labels, specifying the position and the nature of reactive centers. They can be issued from either the "breakup" of reference compounds according to 38 retrosynthetic rules or real reagents, after leaving group withdrawal or transformation. Such an approach not only enables the design of synthetically accessible libraries and analog generation but also facilitates reagents (building blocks) analysis in the medicinal chemistry context. SynthI code is publicly available at https://github.com/Laboratoire-de-Chemoinformatique/SynthI.


Subject(s)
Indicators and Reagents
8.
J Chem Inf Model ; 62(9): 2171-2185, 2022 05 09.
Article in English | MEDLINE | ID: mdl-34928600

ABSTRACT

The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BBs). In this work, we present a detailed analysis of the chemical space of 400 000 purchasable BBs. The chemical space was defined by corresponding synthons─fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BBs were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space that enables visualization of both libraries and analysis of their overlapped and library-specific regions.


Subject(s)
Chemistry, Pharmaceutical , Drug Discovery , Drug Discovery/methods , Indicators and Reagents
9.
Int J Mol Sci ; 23(11)2022 May 30.
Article in English | MEDLINE | ID: mdl-35682792

ABSTRACT

Molecular similarity is an impressively broad topic with many implications in several areas of chemistry. Its roots lie in the paradigm that 'similar molecules have similar properties'. For this reason, methods for determining molecular similarity find wide application in pharmaceutical companies, e.g., in the context of structure-activity relationships. The similarity evaluation is also used in the field of chemical legislation, specifically in the procedure to judge if a new molecule can obtain the status of orphan drug with the consequent financial benefits. For this procedure, the European Medicines Agency uses experts' judgments. It is clear that the perception of the similarity depends on the observer, so the development of models to reproduce the human perception is useful. In this paper, we built models using both 2D fingerprints and 3D descriptors, i.e., molecular shape and pharmacophore descriptors. The proposed models were also evaluated by constructing a dataset of pairs of molecules which was submitted to a group of experts for the similarity judgment. The proposed machine-learning models can be useful to reduce or assist human efforts in future evaluations. For this reason, the new molecules dataset and an online tool for molecular similarity estimation have been made freely available.


Subject(s)
Machine Learning , Receptors, Drug , Humans , Perception , Structure-Activity Relationship
10.
Int J Mol Sci ; 23(5)2022 Mar 03.
Article in English | MEDLINE | ID: mdl-35269934

ABSTRACT

Neuromyelitis optica spectrum disorder (NMOSD) and multiple sclerosis (MS) are both autoimmune inflammatory and demyelinating diseases of the central nervous system. NMOSD is a highly disabling disease and rapid introduction of the appropriate treatment at the acute phase is crucial to prevent sequelae. Specific criteria were established in 2015 and provide keys to distinguish NMOSD and MS. One of the most reliable criteria for NMOSD diagnosis is detection in patient's serum of an antibody that attacks the water channel aquaporin-4 (AQP-4). Another target in NMOSD is myelin oligodendrocyte glycoprotein (MOG), delineating a new spectrum of diseases called MOG-associated diseases. Lastly, patients with NMOSD can be negative for both AQP-4 and MOG antibodies. At disease onset, NMOSD symptoms are very similar to MS symptoms from a clinical and radiological perspective. Thus, at first episode, given the urgency of starting the anti-inflammatory treatment, there is an unmet need to differentiate NMOSD subtypes from MS. Here, we used Fourier transform infrared spectroscopy in combination with a machine learning algorithm with the aim of distinguishing the infrared signatures of sera of a first episode of NMOSD from those of a first episode of relapsing-remitting MS, as well as from those of healthy subjects and patients with chronic inflammatory demyelinating polyneuropathy. Our results showed that NMOSD patients were distinguished from MS patients and healthy subjects with a sensitivity of 100% and a specificity of 100%. We also discuss the distinction between the different NMOSD serostatuses. The coupling of infrared spectroscopy of sera to machine learning is a promising cost-effective, rapid and reliable differential diagnosis tool capable of helping to gain valuable time in patients' treatment.


Subject(s)
Multiple Sclerosis , Neuromyelitis Optica , Aquaporin 4 , Autoantibodies , Humans , Machine Learning , Multiple Sclerosis/diagnosis , Myelin-Oligodendrocyte Glycoprotein
11.
Molecules ; 27(17)2022 Aug 24.
Article in English | MEDLINE | ID: mdl-36080168

ABSTRACT

New models for ACE2 receptor binding, based on QSAR and docking algorithms were developed, using XRD structural data and ChEMBL 26 database hits as training sets. The selectivity of the potential ACE2-binding ligands towards Neprilysin (NEP) and ACE was evaluated. The Enamine screening collection (3.2 million compounds) was virtually screened according to the above models, in order to find possible ACE2-chemical probes, useful for the study of SARS-CoV2-induced neurological disorders. An enzymology inhibition assay for ACE2 was optimized, and the combined diversified set of predicted selective ACE2-binding molecules from QSAR modeling, docking, and ultrafast docking was screened in vitro. The in vitro hits included two novel chemotypes suitable for further optimization.


Subject(s)
Angiotensin-Converting Enzyme 2 , COVID-19 , Humans , Molecular Docking Simulation , Peptidyl-Dipeptidase A/metabolism , RNA, Viral , SARS-CoV-2
12.
J Chem Inf Model ; 61(1): 179-188, 2021 01 25.
Article in English | MEDLINE | ID: mdl-33334102

ABSTRACT

The days when medicinal chemistry was limited to a few series of compounds of therapeutic interest are long gone. Nowadays, no human may succeed to acquire a complete overview of more than a billion existing or feasible compounds within which the potential "blockbuster drugs" are well hidden and yet only a few mouse clicks away. To reach these "hidden treasures", we adapted the generative topographic mapping method to enable efficient navigation through the chemical space, from a global overview to a structural pattern detection, covering, for the first time, the complete ZINC library of purchasable compounds, relative to 1.6 million biologically relevant ChEMBL molecules. About 40 000 hierarchical maps of the chemical space were constructed. Structural motifs inherent to only one library were identified. Roughly 20 000 off-market ChEMBL compound families represent incentives to enrich commercial catalogs. Alternatively, 125 000 ZINC-specific compound classes, absent in structure-activity bases, are novel paths to explore in medicinal chemistry. The complete list of these chemotypes can be downloaded using the link https://forms.gle/B6bUJj82t9EfmttV6.


Subject(s)
Chemistry, Pharmaceutical
13.
Environ Sci Technol ; 55(22): 15542-15553, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34736317

ABSTRACT

The removal of CO2 from gases is an important industrial process in the transition to a low-carbon economy. The use of selective physical (co-)solvents is especially promising in cases when the amount of CO2 is large as it enables one to lower the energy requirements for solvent regeneration. However, only a few physical solvents have found industrial application and the design of new ones can pave the way to more efficient gas treatment techniques. Experimental screening of gas solubility is a labor-intensive process, and solubility modeling is a viable strategy to reduce the number of solvents subject to experimental measurements. In this paper, a chemoinformatics-based modeling workflow was applied to build a predictive model for the solubility of CO2 and four other industrially important gases (CO, CH4, H2, and N2). A dataset containing solubilities of gases in 280 solvents was collected from literature sources and supplemented with the new data for six solvents measured in the present study. A modeling workflow based on the usage of several state-of-the-art machine learning algorithms was applied to establish quantitative structure-solubility relationships. The best models were used to perform virtual screening of the industrially produced chemicals. It enabled the identification of compounds with high predicted CO2 solubility and selectivity toward other gases. The prediction for one of the compounds, 4-methylmorpholine, was confirmed experimentally.


Subject(s)
Carbon Dioxide , Cheminformatics , Gases , Solubility , Solvents
14.
Molecules ; 26(13)2021 Jun 28.
Article in English | MEDLINE | ID: mdl-34203441

ABSTRACT

In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules ("fragments") in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics.
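
The abstract of entry 14 reports a balanced accuracy of 0.78 in 5-fold cross-validation. As a reminder of how that metric is computed for a two-class problem, here is a minimal sketch: balanced accuracy is the mean of sensitivity and specificity. The labels below are toy values invented for illustration and are unrelated to the study's data; the function name is hypothetical.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # fraction of positives correctly recovered
    specificity = tn / (tn + fp)  # fraction of negatives correctly recovered
    return (sensitivity + specificity) / 2


# Toy example: 4 positives (3 found), 2 negatives (1 found).
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_ba = balanced_accuracy(toy_true, toy_pred)  # (0.75 + 0.5) / 2
```

Unlike plain accuracy, this average is not inflated when one class is much larger than the other, which is why it is commonly reported for imbalanced classification sets.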

15.
J Chem Inf Model ; 60(12): 6020-6032, 2020 12 28.
Article in English | MEDLINE | ID: mdl-33172272

ABSTRACT

In chemography, grid-based maps sample molecular descriptor space by injecting a set of nodes, and then linking them to some regular 2D grid representing the map. They include self-organizing maps (SOMs) and generative topographic maps (GTMs). Grid-based maps are predictive because any compound thereupon projected can "inherit" the properties of its residence node(s), node properties being themselves "inherited" from node-neighboring training set compounds. This Article proposes a formalism to define the trustworthiness of these nodes as "providers" of structure-activity information captured from training compounds. An empirical four-parameter node trustworthiness (NT) function of density (sparsely populated nodes are less trustworthy) and coherence (nodes with training set residents of divergent properties are less trustworthy) is proposed. Based upon it, a trustworthiness score T is used to delimit the applicability domain (AD) by means of a trustworthiness threshold TT. For each parameter setup, success of ensuing inside-AD predictions is monitored. It is seen that setup-specific success levels (averaged over large pools of prediction challenges) are highly covariant, irrespective of the targets of prediction challenges, of the (classification or regression) type of problems, of the specific parametrization, and even of the nature (GTM or SOM) of underlying maps. Thus, success levels determined on the basis of regression problems (445 target-specific affinity QSAR sets) on GTMs and levels returned by completely unrelated classification problems (319 target-specific active-/inactive-labeled sets) on SOMs were seen to correlate to a degree of 70%. Therefore, a common, general-purpose setup of the herein proposed parametric AD definition was shown to generally apply to grid-based map-driven property prediction problems.


Subject(s)
Algorithms
16.
J Chem Inf Model ; 60(6): 2951-2965, 2020 06 22.
Article in English | MEDLINE | ID: mdl-32374171

ABSTRACT

The Generalized Born (GB) solvent model offers the best accuracy/computing effort ratio yet requires drastic simplifications to estimate the Effective Born Radii (EBR) in bypassing a too expensive volume integration step. EBRs are a measure of the degree of burial of an atom and not very sensitive to small changes of geometry: in molecular dynamics, the costly EBR update procedure is not mandatory at every step. This work however aims at implementing a GB model into the Sampler for Multiple Protein-Ligand Entities (S4MPLE) evolutionary algorithm with mandatory EBR updates at each step triggering arbitrarily large geometric changes. Therefore, a quantitative structure-property relationship has been developed in order to express the EBRs as a linear function of both the topological neighborhood and geometric occupancy of the space around atoms. A training set of 810 molecular systems, starting from fragment-like to drug-like compounds, proteins, host-guest systems, and ligand-protein complexes, has been compiled. For each species, S4MPLE generated several hundreds of random conformers. For each atom in each geometry of each species, its "standard" EBR was calculated by numeric integration and associated to topological and geometric descriptors of the atom neighborhood. This training set (EBR, atom descriptors) involving >5 M entries was subjected to a boot-strapping multilinear regression process with descriptor selection. In parallel, the strategy was repurposed to also learn atomic solvent-accessible areas (SA) based on the same descriptors. Resulting linear equations were challenged to predict EBR and SA values for a similarly compiled external set of >2000 new molecular systems. Solvation energies calculated with estimated EBR and SA match "standard" energies within the typical error of a force-field-based approach (a few kilocalories per mole). Given the extreme diversity of molecular systems covered by the model, this simple EBR/SA estimator covers a vast applicability domain.


Subject(s)
Cheminformatics , Radius , Proteins , Solvents , Thermodynamics
17.
J Comput Aided Mol Des ; 34(7): 805-815, 2020 07.
Article in English | MEDLINE | ID: mdl-31407224

ABSTRACT

Generative topographic mapping was used to investigate the possibility to diversify the in-house compounds collection of Boehringer Ingelheim (BI). For this purpose, a 2D map covering the relevant chemical space was trained, and the BI compound library was compared to the Aldrich-Market Select (AMS) database of more than 8M purchasable compounds. In order to discover new (sub)structures, the "AutoZoom" tool was developed and applied in order to analyze chemotypes of molecules residing in heavily populated zones of a map and to extract the corresponding maximum common substructures. A set of 401K new structures from the AMS database was retrieved and checked for drug-likeness and biological activity.


Subject(s)
Drug Discovery/methods , Small Molecule Libraries , Algorithms , Computer-Aided Design/statistics & numerical data , Databases, Chemical/statistics & numerical data , Databases, Pharmaceutical/statistics & numerical data , Drug Design , Drug Development/statistics & numerical data , Drug Discovery/statistics & numerical data , Humans , Molecular Structure , Software , User-Computer Interface
18.
Analyst ; 144(15): 4647-4652, 2019 Aug 07.
Article in English | MEDLINE | ID: mdl-31257384

ABSTRACT

The challenging diagnosis and differentiation between multiple sclerosis and amyotrophic lateral sclerosis relies on the clinical assessment of the symptoms along with magnetic resonance imaging and sampling cerebrospinal fluid for the search of biomarkers for either disease. Despite the progress made in imaging techniques and biomarker identification, misdiagnosis still occurs. Here we used 2.5 µL of serum samples to obtain the infrared spectroscopic signatures of sera of multiple sclerosis and amyotrophic lateral sclerosis patients and compared them to those of healthy controls. The spectra are then classified with the help of a two-fold Random Forest cross-validation algorithm. This approach shows that infrared spectroscopy is powerful in discriminating between the two diseases and healthy controls by offering high specificity for multiple sclerosis (100%) and amyotrophic lateral sclerosis (98%). In addition, data after six and twelve months of treatment of the multiple sclerosis patients with biotin are discussed.


Subject(s)
Amyotrophic Lateral Sclerosis/diagnosis , Biomarkers/blood , Multiple Sclerosis/diagnosis , Adult , Aged , Aged, 80 and over , Algorithms , Amyotrophic Lateral Sclerosis/drug therapy , Biotin/therapeutic use , Decision Trees , Diagnosis, Differential , Female , Humans , Male , Middle Aged , Multiple Sclerosis/drug therapy , Pilot Projects , Spectroscopy, Fourier Transform Infrared/methods
19.
J Chem Inf Model ; 59(3): 1182-1196, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30785751

ABSTRACT

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).


Subject(s)
Deep Learning , Drug Design , Catalytic Domain , Drug Evaluation, Preclinical , Ligands , Molecular Docking Simulation , Receptor, Adenosine A2A/chemistry , Receptor, Adenosine A2A/metabolism , Small Molecule Libraries/metabolism , Small Molecule Libraries/pharmacology
20.
J Chem Inf Model ; 59(1): 564-572, 2019 01 28.
Article in English | MEDLINE | ID: mdl-30567430

ABSTRACT

Universal generative topographic maps (GTMs) provide two-dimensional representations of chemical space selected for their "polypharmacological competence", that is, the ability to simultaneously represent meaningful activity and property landscapes, associated with many distinct targets and properties. Several such GTMs can be generated, each based on a different initial descriptor vector, encoding distinct structural features. While their average polypharmacological competence may indeed be equivalent, they nevertheless significantly diverge with respect to the quality of each property-specific landscape. In this work, we show that distinct universal maps represent complementary and strongly synergistic views of biologically relevant chemical space. Eight universal GTMs were employed as support for predictive classification landscapes, using more than 600 active/inactive ligand series associated with as many targets from the ChEMBL database (v.23). For nine of these targets, it was possible to extract, from the Directory of Useful Decoys (DUD), truly external sets featuring sufficient "actives" and "decoys" not present in the landscape-defining ChEMBL ligand sets. For each such molecule, projected on every class landscape of a particular universal map, a probability of activity was estimated, in analogy to a virtual screening (VS) experiment. Cross-validated (CV) balanced accuracy on landscape-defining ChEMBL data was unable to predict the success of that landscape in VS. Thus, the universal map with best CV results for a given property should not be prioritized as the implicitly best predictor. For a given map, predictions for many DUD compounds are not trustworthy, according to applicability domain considerations. By contrast, simultaneous application of all universal maps, and rating of the likelihood of activity as the mean returned by all applicable maps, significantly improved prediction results. Performance measures in consensus VS using multiple maps were always superior or similar to those of the best individual map.


Subject(s)
Drug Evaluation, Preclinical/methods , Databases, Pharmaceutical , User-Computer Interface , Workflow
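
The abstract of entry 20 describes rating the likelihood of activity as the mean returned by all applicable maps. A minimal sketch of that averaging step follows. It assumes, purely for illustration, that each map reports either a probability of activity or `None` when the compound falls outside that map's applicability domain; this encoding and the function name are hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence


def consensus_probability(
    map_predictions: Sequence[Optional[float]],
) -> Optional[float]:
    """Mean probability of activity over the maps whose prediction is applicable.

    ASSUMPTION: None marks a map for which the compound is outside the
    applicability domain. Returns None when no map is applicable.
    """
    applicable = [p for p in map_predictions if p is not None]
    if not applicable:
        return None
    return sum(applicable) / len(applicable)


# Toy values for one compound across four maps (invented for illustration).
toy_consensus = consensus_probability([0.9, None, 0.5, 0.7])
no_consensus = consensus_probability([None, None])
```

Skipping inapplicable maps rather than counting them as zero keeps an out-of-domain map from dragging the consensus toward "inactive".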