Results 1 - 8 of 8
1.
J Chem Inf Model ; 64(10): 4031-4046, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38739465

ABSTRACT

Today, machine learning methods are widely employed in drug discovery. However, the chronic lack of data continues to hamper their further development, validation, and application. Several modern strategies aim to mitigate data scarcity by learning from data on related tasks. These knowledge-sharing approaches encompass transfer learning, multitask learning, and meta-learning. A key open question for these approaches concerns the extent to which their performance can benefit from the relatedness of the available source (training) tasks; in other words, how difficult ("hard") a test task is for a model, given the available source tasks. This study introduces a new method for quantifying and predicting the hardness of a bioactivity prediction task based on its relation to the available training tasks. The approach involves generating protein and chemical representations and calculating distances between the bioactivity prediction task and the available training tasks. Using meta-learning on the FS-Mol data set as an example, we demonstrate that the proposed task hardness metric is inversely correlated with performance (Pearson's correlation coefficient r = -0.72). The metric will be useful for estimating the task-specific gain in performance that can be achieved through meta-learning.


Subject(s)
Machine Learning; Drug Discovery/methods; Humans
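The abstract describes task hardness as a distance between a test task and the available training tasks, computed from protein and chemical representations. The Python sketch below illustrates that idea under loudly stated assumptions: each task is reduced to one hypothetical protein embedding and one hypothetical chemical embedding, hardness is taken as a weighted mean of Euclidean distances to the k closest training tasks, and `task_hardness`, `k`, and `w_protein` are illustrative names, not the authors' metric or code.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical task representation: {"protein": [...], "chem": [...]}
Task = Dict[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_distance(query: Sequence[float], refs: List[Sequence[float]], k: int) -> float:
    """Mean distance from `query` to its k nearest reference vectors."""
    dists = sorted(euclidean(query, r) for r in refs)
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def task_hardness(test: Task, train: List[Task], k: int = 3, w_protein: float = 0.5) -> float:
    """Assumed hardness score: larger values mean the test task lies farther
    from the training tasks in protein space and/or chemical space."""
    d_prot = knn_distance(test["protein"], [t["protein"] for t in train], k)
    d_chem = knn_distance(test["chem"], [t["chem"] for t in train], k)
    return w_protein * d_prot + (1.0 - w_protein) * d_chem


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient, e.g. between hardness and model performance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy usage with made-up 2D embeddings
train_tasks = [
    {"protein": [0.0, 0.0], "chem": [0.0, 0.0]},
    {"protein": [1.0, 0.0], "chem": [0.0, 1.0]},
]
near_task = {"protein": [0.1, 0.0], "chem": [0.0, 0.1]}
far_task = {"protein": [5.0, 5.0], "chem": [5.0, 5.0]}
h_near = task_hardness(near_task, train_tasks, k=1)
h_far = task_hardness(far_task, train_tasks, k=1)
```

Averaging over the k nearest tasks, rather than using only the single closest one, is a design choice meant to make the score less sensitive to one unusually similar training task; an inverse correlation with performance, as reported in the abstract, would then show up as a negative `pearson` value.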
2.
J Chem Inf Model ; 64(2): 348-358, 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38170877

ABSTRACT

The ability to determine and predict metabolically labile atom positions in a molecule (also called "sites of metabolism" or "SoMs") is of high interest for the design and optimization of bioactive compounds, such as drugs, agrochemicals, and cosmetics. In recent years, several in silico models for SoM prediction have become available, many of which include a machine learning component. The bottleneck in advancing these approaches is the limited coverage of distinct atom environments, and of rare and complex biotransformation events, by high-quality experimental data. Pharmaceutical companies typically have measured metabolism data available for several hundred to several thousand compounds. However, even for metabolism experts, interpreting these data and assigning SoMs are challenging and time-consuming. Therefore, a significant proportion of the potential of existing metabolism data, particularly for machine learning, remains untapped. Here, we report on the development and validation of an active learning approach that identifies the most informative atoms across molecular data sets for SoM annotation. The active learning approach, built on a highly efficient reimplementation of the SoM predictor FAME 3, enables experts to prioritize their experimental SoM measurements and annotation efforts on the most rewarding atom environments. We show that this active learning approach yields competitive SoM predictors while requiring the annotation of only 20% of the atom positions required by FAME 3. The source code of the approach presented in this work is publicly available.


Subject(s)
Machine Learning; Software
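The abstract does not spell out how the most informative atoms are chosen. A common active learning criterion is uncertainty sampling, which the Python sketch below uses as a stand-in: in each round, the atoms whose predicted SoM probability is closest to 0.5 are sent for annotation and the model is retrained. The `train` and `annotate` callables and the (molecule ID, atom index) keys are hypothetical placeholders, not part of FAME 3 or the authors' code.

```python
from typing import Callable, Dict, List, Tuple

Atom = Tuple[str, int]               # (molecule ID, atom index), hypothetical key
Predictor = Callable[[Atom], float]  # returns P(atom is a SoM)


def select_most_informative(pool: List[Atom], predict: Predictor, batch_size: int) -> List[Atom]:
    """Uncertainty sampling: pick the atoms whose SoM probability is closest to 0.5."""
    return sorted(pool, key=lambda atom: abs(predict(atom) - 0.5))[:batch_size]


def active_learning(
    pool: List[Atom],
    labeled: Dict[Atom, bool],
    train: Callable[[Dict[Atom, bool]], Predictor],
    annotate: Callable[[Atom], bool],
    batch_size: int,
    n_rounds: int,
) -> Tuple[Predictor, Dict[Atom, bool]]:
    """Iteratively query labels for the most uncertain atoms and retrain."""
    pool = list(pool)
    labeled = dict(labeled)
    model = train(labeled)
    for _ in range(n_rounds):
        if not pool:
            break
        batch = select_most_informative(pool, model, batch_size)
        for atom in batch:
            labeled[atom] = annotate(atom)  # stands in for expert annotation
            pool.remove(atom)
        model = train(labeled)
    return model, labeled


# Toy usage: a fixed probability table stands in for a trained SoM model
toy_probs = {("m1", 0): 0.90, ("m1", 1): 0.52, ("m2", 0): 0.10, ("m2", 3): 0.45}
first_batch = select_most_informative(list(toy_probs), toy_probs.__getitem__, batch_size=2)
model, labeled = active_learning(
    pool=list(toy_probs),
    labeled={},
    train=lambda data: toy_probs.__getitem__,
    annotate=lambda atom: toy_probs[atom] > 0.5,
    batch_size=2,
    n_rounds=1,
)
```

In this toy run the two atoms with probabilities 0.52 and 0.45 are queried first, while the confidently predicted atoms (0.90 and 0.10) stay unlabeled, which is the mechanism by which such a loop can save annotation effort.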
3.
Nat Prod Rep ; 39(8): 1544-1556, 2022 08 17.
Article in English | MEDLINE | ID: mdl-35708009

ABSTRACT

Covering: up to 2021

The structural core of most small-molecule drugs is formed by a ring system, often derived from natural products (NPs). However, despite the importance of NP ring systems in bioactive small molecules, a comprehensive overview and understanding of NP ring systems, and of how their full potential can be harnessed in drug discovery and related fields, is still lacking. Herein, we present a comprehensive cheminformatic analysis of the structural and physicochemical properties of 38 662 NP ring systems, and of the coverage of NP ring systems by readily purchasable synthetic compounds that are commonly explored in virtual screening and high-throughput screening. The analysis stands out for its use of comprehensive, curated data sets, its careful consideration of stereochemical information, and its robust analysis of the 3D molecular shape and electrostatic properties of ring systems. Key findings of this study include that only about 2% of the ring systems observed in NPs are present in approved drugs, but that approximately one in two NP ring systems is represented by ring systems with identical or related 3D shape and electrostatic properties in compounds typically used in (high-throughput) screening.


Subject(s)
Biological Products; Biological Products/chemistry; Biological Products/pharmacology; Drug Discovery
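The two headline numbers in this abstract are coverage fractions: the share of NP ring systems that occur identically in drugs, and the share that have a sufficiently similar counterpart among screening compounds. The Python sketch below shows that bookkeeping under clearly hypothetical assumptions: ring systems are keyed by a canonical identifier string, and a Tanimoto coefficient on toy feature sets stands in for the 3D shape and electrostatic similarity used in the study. The identifiers, feature sets, and the 0.6 threshold are illustrative only.

```python
from typing import Dict, FrozenSet

Features = FrozenSet[int]  # hypothetical feature set describing one ring system


def tanimoto(a: Features, b: Features) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def exact_coverage(np_rings: Dict[str, Features], other_rings: Dict[str, Features]) -> float:
    """Fraction of NP ring systems whose canonical identifier also occurs in `other_rings`."""
    return len(np_rings.keys() & other_rings.keys()) / len(np_rings)


def similarity_coverage(np_rings: Dict[str, Features], other_rings: Dict[str, Features],
                        threshold: float) -> float:
    """Fraction of NP ring systems with at least one ring system in `other_rings`
    whose (stand-in) similarity reaches `threshold`."""
    covered = sum(
        1 for fp in np_rings.values()
        if any(tanimoto(fp, other) >= threshold for other in other_rings.values())
    )
    return covered / len(np_rings)


# Toy data: hypothetical ring-system identifiers mapped to made-up feature sets
np_rings = {"ring_A": frozenset({1, 2, 3}), "ring_B": frozenset({4, 5}),
            "ring_C": frozenset({7, 8, 9, 10})}
screening_rings = {"ring_A": frozenset({1, 2, 3}), "ring_X": frozenset({4, 5, 6})}
exact = exact_coverage(np_rings, screening_rings)                    # 1 of 3 NP rings
similar = similarity_coverage(np_rings, screening_rings, threshold=0.6)  # 2 of 3 NP rings
```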
4.
Int J Mol Sci ; 23(14)2022 Jul 13.
Article in English | MEDLINE | ID: mdl-35887097

ABSTRACT

Methods for the pairwise comparison of 2D and 3D molecular structures are established approaches in virtual screening. In this work, we explored three strategies for maximizing the virtual screening performance of these methods: (i) merging the hit lists obtained from multi-compound screening using a single screening method, (ii) merging the hit lists obtained from 2D and 3D screening by parallel selection, and (iii) combining both of these strategies in an integrated approach. Each of these strategies boosted virtual screening performance, with the clearest advantages observed for the integrated approach. On virtual screening test sets covering 50 pharmaceutically relevant proteins, the integrated approach, using sets of five query molecules, yielded on average an area under the receiver operating characteristic curve (AUC) of 0.84, an early enrichment among the top 1% of ranked compounds (EF1%) of 53.82, and a scaffold recovery rate among the top 1% of ranked compounds (SRR1%) of 0.50. In comparison, the 2D and 3D methods on their own (using a single query molecule) yielded AUC values of 0.68 and 0.54, EF1% values of 19.96 and 17.52, and SRR1% values of 0.20 and 0.17, respectively. Based on these results, we recommend integrating 2D and 3D methods via a (balanced) parallel selection strategy, in particular in combination with multi-query screening.


Subject(s)
Proteins; Ligands; Molecular Conformation; ROC Curve
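The abstract names three ingredients: merging hit lists from several query molecules, merging 2D and 3D hit lists by parallel selection, and scoring rankings by early enrichment. The Python sketch below shows one plausible reading of each, under stated assumptions: multi-query lists are fused by keeping each compound's maximum similarity to any query (MAX fusion, a common choice that the abstract does not confirm), parallel selection alternates between the ranked lists and skips duplicates, and EF1% is the hit rate in the top 1% divided by the overall hit rate. All function names are hypothetical.

```python
from itertools import zip_longest
from typing import Dict, List, Sequence, Set


def max_fusion(scores_per_query: Sequence[Dict[str, float]]) -> List[str]:
    """Assumed multi-query fusion: rank compounds by their highest similarity
    to any of the query molecules."""
    best: Dict[str, float] = {}
    for scores in scores_per_query:
        for cpd, score in scores.items():
            best[cpd] = max(score, best.get(cpd, float("-inf")))
    return sorted(best, key=best.__getitem__, reverse=True)


def parallel_selection(*ranked_lists: Sequence[str]) -> List[str]:
    """Merge ranked lists by taking the next compound from each list in turn,
    skipping compounds that were already selected."""
    merged: List[str] = []
    seen: Set[str] = set()
    for tier in zip_longest(*ranked_lists):
        for cpd in tier:
            if cpd is not None and cpd not in seen:
                seen.add(cpd)
                merged.append(cpd)
    return merged


def enrichment_factor(ranking: Sequence[str], actives: Set[str], fraction: float = 0.01) -> float:
    """EF at the given fraction: hit rate among the top-ranked compounds
    divided by the hit rate expected from random selection."""
    n_total = len(ranking)
    n_top = max(1, round(fraction * n_total))
    hits_top = sum(1 for cpd in ranking[:n_top] if cpd in actives)
    return (hits_top / n_top) / (len(actives) / n_total)


# Toy usage
fused_2d = max_fusion([{"a": 0.2, "b": 0.9}, {"a": 0.95, "c": 0.5}])  # -> ["a", "b", "c"]
merged = parallel_selection(["a", "b", "c", "d"], ["c", "e", "a", "f"])
ef = enrichment_factor(merged, actives={"a", "e"}, fraction=0.5)
```

Alternating between lists gives each method an equal share of the top of the merged ranking, which is one way to read the "balanced" parallel selection recommended above; for real data sets `fraction=0.01` corresponds to EF1%.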
5.
Nat Rev Chem ; 8(5): 319-339, 2024 05.
Article in English | MEDLINE | ID: mdl-38622244

ABSTRACT

Biochemical and cell-based assays are essential for discovering and optimizing efficacious and safe drugs, agrochemicals, and cosmetics. However, false assay readouts stemming from colloidal aggregation, chemical reactivity, chelation, light signal attenuation and emission, membrane disruption, and other interference mechanisms remain a considerable challenge in screening synthetic compounds and natural products. To address assay interference, a range of powerful experimental approaches is available, and in silico methods are now gaining traction. This Review begins with an overview of the scope and limitations of experimental approaches for tackling assay interference. It then focuses on theoretical methods, discusses strategies for integrating them with experimental approaches, and provides recommendations for best practices. The Review closes with a summary of the critical facts and an outlook on potential future developments.


Subject(s)
Small Molecule Libraries; Humans; Biological Assay/methods
6.
Cells ; 11(8)2022 04 07.
Article in English | MEDLINE | ID: mdl-35455933

ABSTRACT

The pregnane X receptor (PXR) regulates the metabolism of many xenobiotic and endobiotic substances. Consequently, PXR decreases the efficacy of many small-molecule drugs and induces drug-drug interactions. Predicting PXR activators with theoretical approaches such as machine learning (ML) is challenging because of the ligand promiscuity of PXR, which is related to its large and flexible binding pocket. In this work, we demonstrate, using random forest models and support vector machines as examples, that classifiers generated following classical training procedures often fail to predict PXR activity for compounds that are dissimilar from those in the training set. We present a novel regularization technique that penalizes the gap between a model's training and validation performance. On a challenging test set, this technique improved Matthews correlation coefficients (MCCs) by up to 0.21. Using these regularized ML models, we selected 31 compounds that are structurally distinct from known PXR ligands for experimental validation. Twelve of them were confirmed as active in the cellular PXR ligand-binding domain assembly assay, and more hits were identified during follow-up studies. Comprehensive analysis of key features of PXR biology, conducted for three representative hits, confirmed their ability to activate PXR.


Subject(s)
Receptors, Steroid; Ligands; Machine Learning; Pregnane X Receptor; Receptors, Steroid/metabolism; Xenobiotics
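The abstract states that the regularization penalizes the gap between training and validation performance but does not give its exact form. The Python sketch below assumes one simple variant for illustration: during model selection, each candidate's validation MCC is reduced by `lam` times the absolute train/validation gap, so a strongly overfitted model can lose to a slightly weaker but more consistent one. The penalty form, `lam`, and the candidate names are assumptions, not the published technique.

```python
import math
from typing import Dict, Sequence, Tuple


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def gap_penalized_score(train_score: float, val_score: float, lam: float = 1.0) -> float:
    """Assumed objective: validation score minus a penalty on the train/validation gap."""
    return val_score - lam * abs(train_score - val_score)


def select_model(results: Dict[str, Tuple[float, float]], lam: float = 1.0) -> str:
    """Pick the candidate with the best gap-penalized score.
    `results` maps a hypothetical candidate name to (training MCC, validation MCC)."""
    return max(results, key=lambda name: gap_penalized_score(*results[name], lam=lam))


# Toy usage: an overfitted model vs. a more consistent one
candidates = {"deep_forest": (1.00, 0.55), "shallow_forest": (0.70, 0.52)}
unregularized_choice = select_model(candidates, lam=0.0)  # highest validation MCC wins
regularized_choice = select_model(candidates, lam=1.0)    # gap penalty changes the choice
```

Setting `lam=0.0` recovers ordinary selection by validation score, so the single parameter controls how strongly consistency between training and validation performance is favored.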
7.
Pharmaceuticals (Basel) ; 14(8)2021 Aug 11.
Article in English | MEDLINE | ID: mdl-34451887

ABSTRACT

In recent years, a number of machine learning models for predicting the skin sensitization potential of small organic molecules have been reported and made available. These models generally perform well within their applicability domains but, because they rely on molecular fingerprints and other non-intuitive descriptors, their interpretability is limited. The aim of this work is to develop a strategy to replace the non-intuitive features with predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model ("Skin Doctor CP:Bio") obtained an efficiency of 0.82 and an MCC of 0.52 (at a significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for developing more interpretable machine learning models for predicting the bioactivity and toxicity of small organic compounds.
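The "CP" in Skin Doctor CP:Bio refers to conformal prediction, which is why performance is reported as an efficiency at a significance level. The Python sketch below illustrates a class-conditional (Mondrian) inductive conformal predictor under assumptions: nonconformity scores come from some underlying classifier (for example, one minus the predicted probability of a label), a label is kept when its p-value exceeds the significance level, and efficiency is taken as the fraction of compounds that receive exactly one label. The function names and toy scores are hypothetical and not taken from Skin Doctor CP:Bio.

```python
from typing import Dict, Sequence, Set


def p_value(test_score: float, calibration_scores: Sequence[float]) -> float:
    """Conformal p-value: share of calibration examples (plus the test example itself)
    that are at least as nonconforming as the test example."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def prediction_set(test_scores: Dict[int, float],
                   calibration: Dict[int, Sequence[float]],
                   significance: float) -> Set[int]:
    """Mondrian ICP: compare each candidate label only with calibration examples
    of that class and keep the label if its p-value exceeds the significance level."""
    return {label for label, score in test_scores.items()
            if p_value(score, calibration[label]) > significance}


def efficiency(sets: Sequence[Set[int]]) -> float:
    """Fraction of compounds that receive a single-label prediction."""
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Toy calibration nonconformity scores per class (0 = non-sensitizer, 1 = sensitizer)
calibration = {0: [i / 10 for i in range(1, 10)], 1: [i / 10 for i in range(1, 10)]}
# Hypothetical nonconformity of each test compound under each candidate label
test_compounds = [{0: 0.15, 1: 0.95}, {0: 0.55, 1: 0.45}]
sets = [prediction_set(scores, calibration, significance=0.20) for scores in test_compounds]
eff = efficiency(sets)
```

Here the first compound receives the single label 0, while the second receives both labels (an uncertain prediction), giving an efficiency of 0.5; an MCC such as the one quoted in the abstract would then be computed on the single-label predictions.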

8.
Biomolecules ; 9(2)2019 01 24.
Article in English | MEDLINE | ID: mdl-30682850

ABSTRACT

Natural products (NPs) remain the most prolific resource for the development of small-molecule drugs. Here we report a new machine learning approach that identifies natural products with high accuracy. The method also generates similarity maps, which highlight atoms that contribute significantly to the classification of a small molecule as a natural product or a synthetic molecule. The method can hence be used to (i) identify natural products in large molecular libraries, (ii) quantify the natural product-likeness of small molecules, and (iii) visualize atoms in small molecules that are characteristic of natural products or synthetic molecules. The models are random forest classifiers trained on data sets consisting of more than 265,000 to 322,000 natural products and synthetic molecules. Two-dimensional molecular descriptors, MACCS keys, and Morgan2 fingerprints were explored as features. On an independent test set, the models reached areas under the receiver operating characteristic curve (AUC) of 0.997 and Matthews correlation coefficients (MCCs) of 0.954 and higher. The method was further tested on data from the Dictionary of Natural Products, ChEMBL, and other resources. The best-performing models are accessible as a free web service at http://npscout.zbh.uni-hamburg.de/npscout.


Subject(s)
Biological Products/chemistry; Machine Learning; Small Molecule Libraries/chemistry; Molecular Structure
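The AUC values reported here can be read as a probability: the chance that a randomly chosen natural product receives a higher NP-likeness score than a randomly chosen synthetic molecule. The Python sketch below computes AUC directly from that definition, with ties counting one half; the scores and labels are made up, and for data sets of the size used in this study a sort-based implementation such as scikit-learn's `roc_auc_score` would be preferable.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties contribute one half.
    O(n_pos * n_neg), intended for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy NP-likeness scores; label 1 = natural product, 0 = synthetic molecule
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])  # -> 0.75
```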