Results 1 - 20 of 43
1.
J Chem Inf Model ; 64(14): 5557-5569, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-38950192

ABSTRACT

Scaffold-hopped (SH) compounds are bioactive compounds structurally different from known active compounds. Identifying SH compounds with ligand-based approaches has been a central issue in medicinal chemistry, and various molecular representations for scaffold hopping have been proposed. However, which representations are appropriate for SH compound identification remains unclear. Herein, the ability of several representations to identify SH compounds was fairly evaluated based on retrospective validation and prospective demonstration. In the retrospective validation, the combinations of two screening algorithms and four two- and three-dimensional molecular representations were compared using controlled data sets for the early identification of SH compounds. We found that the combination of the support vector machine and the extended connectivity fingerprint with bond diameter 4 (SVM-ECFP4) and the combination of SVM and the rapid overlay of chemical structures (SVM-ROCS) showed relatively high performance. The compounds that were highly ranked by SVM-ROCS did not share substructures with the active training compounds, while those ranked by SVM-ECFP4 were mostly recombinant. In the prospective demonstration, 93 SH compounds were prepared by screening the Namiki database using SVM-ROCS, targeting ABL1 inhibitors. The primary screening using surface plasmon resonance suggested five active compounds; however, in the competitive binding assays with adenosine triphosphate, no hits were found.


Subject(s)
Support Vector Machine , Ligands , Humans , Models, Molecular , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Algorithms
2.
J Comput Aided Mol Des ; 36(3): 237-252, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35348984

ABSTRACT

The retrospective evaluation of virtual screening approaches and activity prediction models is important for methodological development. However, for fair comparison, evaluation data sets must be carefully prepared. In this research, we compiled structure-activity-relationship matrix-based data sets for 15 biological targets along with many diverse inactive compounds, assuming the early stage of structure-activity-relationship progression. To use a large number of diverse inactive compounds and a limited number of active compounds, similarity profiles (SPs) are proposed as a set of molecular descriptors. Using these highly imbalanced data sets, we evaluated various approaches including SPs, under-sampling, support vector machine (SVM), and message passing neural networks. We found that for the under-sampling approaches, cluster-based sampling is better than random sampling. For virtual screening, SPs with inactive reference compounds and the under-sampling SVM also perform well. For classification, SPs with many inactive references performed as well as the under-sampling SVM trained on a balanced data set. Although the performance of SPs and that of the under-sampling SVM were comparable, SPs with many inactive references were preferable for selecting compounds structurally distinct from the active training compounds.


Subject(s)
Support Vector Machine , Ligands , Retrospective Studies , Structure-Activity Relationship
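
The similarity-profile idea in the abstract above can be illustrated with a minimal sketch. It assumes, for illustration only, that fingerprints are stored as sets of "on" bit indices and that Tanimoto similarity is the comparison measure; the function names, the toy reference set, and the choice of similarity are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, List

# Hypothetical representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit-set fingerprints."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_profile(query: Fingerprint, references: List[Fingerprint]) -> List[float]:
    """Describe a compound by its similarity to each reference compound."""
    return [tanimoto(query, ref) for ref in references]


# Toy reference set (assumed data): two actives followed by three inactives.
references = [
    frozenset({1, 2, 3, 4}),
    frozenset({2, 3, 4, 5}),
    frozenset({10, 11}),
    frozenset({10, 12, 13}),
    frozenset({20, 21, 22}),
]
query = frozenset({1, 2, 3, 10})

# One descriptor value per reference compound.
profile = similarity_profile(query, references)
```

The resulting vector has one entry per reference compound, so adding more inactive references lengthens the descriptor rather than enlarging the training set, which is one way to read the abstract's motivation for the approach.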
3.
J Chem Inf Model ; 61(7): 3348-3360, 2021 Jul 26.
Article in English | MEDLINE | ID: mdl-34264667

ABSTRACT

The aim of scaffold hopping (SH) is to find compounds consisting of different scaffolds from those in already known active compounds, giving an opportunity to explore new regions of chemical space. We previously demonstrated the usefulness of pharmacophore graphs (PhGs) for this purpose through proof-of-concept virtual screening experiments. PhGs consist of nodes and edges corresponding to pharmacophoric features (PFs) and their topological distances. Although PhGs were effective in SH, they are hard to interpret as they are complete graphs. Herein, we introduce an intuitive representation of a molecule, termed sparse pharmacophore graphs (SPhGs), which keeps the topological distances among PFs as much as possible while reducing the number of edges in the graphs. Several benchmark calculations quantitatively confirmed the sparseness of the graphs and the preservation of topological distances among pharmacophoric points. As proof-of-concept applications, virtual screening (VS) trials for SH were conducted using active and inactive compounds from the ChEMBL and PubChem databases for three biological targets: thrombin, tyrosine kinase ABL1, and κ-opioid receptor. The VS performance was comparable to that obtained using fully connected PhGs. Furthermore, highly ranked SPhGs were interpretable for the three biological targets, in particular for thrombin, for which selected SPhGs were in agreement with the structure-based interpretation.


Subject(s)
Drug Design , Receptors, Drug
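
To make the contrast between a complete pharmacophore graph and a sparser one concrete, here is a small sketch. The sparsification rule used here (drop an edge when a third feature lies on a shortest path between its endpoints) is an assumption chosen for illustration and is not claimed to be the algorithm of the paper; the toy molecule, feature names, and function names are likewise hypothetical. The sketch also assumes each feature sits on a distinct atom, so all distances are positive.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple

Edges = Dict[Tuple[str, str], int]


def bond_distances(adjacency: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds from `source` to every atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def complete_phg(adjacency: Dict[int, List[int]], features: Dict[str, int]) -> Edges:
    """Fully connected graph: one edge per feature pair, weighted by bond distance."""
    edges: Edges = {}
    for a, b in combinations(sorted(features), 2):
        edges[(a, b)] = bond_distances(adjacency, features[a])[features[b]]
    return edges


def sparsify(edges: Edges) -> Edges:
    """Assumed rule: drop an edge if a third feature lies on a shortest path."""
    nodes = sorted({n for pair in edges for n in pair})

    def d(u: str, v: str) -> int:
        return edges[(u, v) if u < v else (v, u)]

    kept: Edges = {}
    for (a, b), w in edges.items():
        redundant = any(d(a, k) + d(k, b) == w for k in nodes if k not in (a, b))
        if not redundant:
            kept[(a, b)] = w
    return kept


# Toy molecule (assumed): a six-atom chain 0-1-2-3-4-5 with three features.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
features = {"donor": 0, "aromatic": 2, "acceptor": 5}

full = complete_phg(adjacency, features)
sparse = sparsify(full)
```

In this toy case the donor-acceptor edge is dropped because its distance can be recovered by summing the two remaining edges, so the sparse graph still encodes every pairwise topological distance.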
4.
J Comput Aided Mol Des ; 35(2): 179-193, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33392949

ABSTRACT

Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models predict biological activity and molecular properties based on the numerical relationship between chemical structures and activity (property) values. Molecular representations are of importance in QSAR/QSPR analysis. Topological information of molecular structures is usually utilized (2D representations) for this purpose. However, conformational information seems important because molecules exist in three-dimensional space. As a three-dimensional molecular representation applicable to diverse compounds, similarity between a test molecule and a set of reference molecules has been previously proposed. This 3D representation was found to be effective in virtual screening for early enrichment of active compounds. In this study, we introduced the 3D representation into QSAR/QSPR modeling (regression tasks). Furthermore, we investigated the relative merits of 3D representations over 2D in terms of the diversity of training data sets. For the prediction task of quantum mechanics-based properties, the 3D representations were superior to 2D. For predicting activity of small molecules against specific biological targets, no consistent trend was observed in the difference of performance using the two types of representations, irrespective of the diversity of training data sets.


Subject(s)
Organic Chemicals/chemistry , Databases, Factual , Drug Evaluation, Preclinical , Machine Learning , Models, Molecular , Molecular Conformation , Quantitative Structure-Activity Relationship , Regression Analysis
5.
Molecules ; 26(16), 2021 Aug 13.
Article in English | MEDLINE | ID: mdl-34443503

ABSTRACT

Activity cliffs (ACs) are formed by two structurally similar compounds with a large difference in potency. Accurate AC prediction is expected to help researchers' decisions in the early stages of drug discovery. Previously, predictive models based on matched molecular pair (MMP) cliffs have been proposed. However, the proposed methods face a challenge of interpretability due to the black-box character of the predictive models. In this study, we developed interpretable MMP fingerprints and modified a model-specific interpretation approach for models based on a support vector machine (SVM) and MMP kernel. We compared important features highlighted by this SVM-based interpretation approach and the SHapley Additive exPlanations (SHAP) as a major model-independent approach. The model-specific approach could capture the difference between AC and non-AC, while SHAP assigned high weights to the features not present in the test instances. For specific MMPs, the feature weights mapped by the SVM-based interpretation method were in agreement with the previously confirmed binding knowledge from X-ray co-crystal structures, indicating that this method is able to interpret the AC prediction model in a chemically intuitive manner.
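
The notion of an MMP-based activity cliff described above can be sketched in a few lines. This is only an illustration under stated assumptions: compounds are assumed to be already fragmented into a shared core and a single exchanged substituent, potency is given as a negative log value, and the 2 log unit (100-fold) threshold is an assumed cutoff, not one stated in the abstract. All names and data are hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Compound(NamedTuple):
    name: str
    core: str         # shared part after a single-cut fragmentation (assumed precomputed)
    substituent: str  # the exchanged fragment
    pki: float        # potency as a negative log value


def mmp_cliffs(compounds: List[Compound], min_delta: float = 2.0) -> List[Tuple[str, str]]:
    """Return matched molecular pairs whose potency differs by at least `min_delta`."""
    cliffs = []
    for a, b in combinations(compounds, 2):
        # A matched molecular pair: same core, different substituent.
        is_mmp = a.core == b.core and a.substituent != b.substituent
        if is_mmp and abs(a.pki - b.pki) >= min_delta:
            cliffs.append((a.name, b.name))
    return cliffs


# Toy data (assumed values).
compounds = [
    Compound("c1", "core_A", "OH", 8.5),
    Compound("c2", "core_A", "OMe", 6.1),
    Compound("c3", "core_A", "F", 8.0),
    Compound("c4", "core_B", "OH", 5.0),
]
cliffs = mmp_cliffs(compounds)
```

Only the pair c1/c2 qualifies here: it shares a core and differs by more than two log units, whereas c2/c3 falls just under the assumed threshold and c4 has a different core.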

6.
J Chem Inf Model ; 60(4): 2073-2081, 2020 Apr 27.
Article in English | MEDLINE | ID: mdl-32202780

ABSTRACT

The primary goal of ligand-based virtual screening is to identify active compounds consisting of a core scaffold that is not found in the current active compound pool. Scaffold hopping is the term used for this purpose. In the present study, topological representations of pharmacophore features on chemical graphs were investigated for scaffold hopping. Pharmacophore graphs (PhGs), which consist of pharmacophore features as nodes and their topological distances as edges, were used to represent the information important for compounds being active. We investigated ranking methods for prioritizing PhGs for scaffold hopping. The proposed method, NScaffold, which ranks PhGs based on the number of scaffolds covered by the PhGs, outperforms other conventional methods. As a demonstrative case, using a thrombin inhibitor data set, we interpreted the PhGs ranked highest by NScaffold from the protein-ligand interaction point of view. The NScaffold method successfully retrieved three known important interactions, showing the potential for identifying scaffold-hopped compounds with interpretable PhGs.


Subject(s)
Receptors, Drug , Ligands
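
The ranking criterion named in the abstract above, the number of scaffolds covered by a PhG, can be sketched as follows. The data layout (each PhG mapped to the active compounds that contain it, together with their scaffold labels), the tie-breaking rule, and all identifiers are assumptions made for illustration rather than details from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical input: PhG id -> (compound id, scaffold id) for each active
# compound in which that PhG is found.
Matches = Dict[str, List[Tuple[str, str]]]


def rank_by_scaffold_coverage(matches: Matches) -> List[Tuple[str, int]]:
    """Rank PhGs by the number of distinct scaffolds among their matching actives."""
    coverage = {
        phg: len({scaffold for _, scaffold in hits})
        for phg, hits in matches.items()
    }
    # Highest coverage first; ties broken by PhG id (an assumed convention).
    return sorted(coverage.items(), key=lambda item: (-item[1], item[0]))


# Toy data (assumed).
matches = {
    "PhG_1": [("cpd1", "s1"), ("cpd2", "s1"), ("cpd3", "s1")],
    "PhG_2": [("cpd1", "s1"), ("cpd4", "s2"), ("cpd5", "s3")],
    "PhG_3": [("cpd4", "s2"), ("cpd6", "s4")],
}
ranking = rank_by_scaffold_coverage(matches)
```

PhG_1 and PhG_2 each match three compounds, but PhG_1 covers a single scaffold and therefore ranks last, which shows how counting scaffolds differs from counting matching compounds.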
7.
Clin Exp Allergy ; 49(4): 474-483, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30431203

ABSTRACT

BACKGROUND: Chemokines are involved not only in regulating leucocyte recruitment but also in other activities. However, functions other than cell recruitment remain poorly understood. We have already shown that the production of CC chemokine ligand (CCL)17 and CCL22 by antigen-stimulated naïve CD4+ T cells was higher in asthmatic patients than in healthy controls. However, the role of these chemokines in stimulated naïve CD4+ T cells remains unclear. OBJECTIVE: To clarify the biological function of CCL17 and CCL22 on naïve CD4+ T cells, we examined the effects of these two chemokines on naïve CD4+ T cells expressing CC chemokine receptor (CCR)4 (a receptor for CCL17 and CCL22) during differentiation of Th2 cells in asthmatic patients as allergic subjects. METHODS: Naïve CD4+ T cells were prepared from healthy controls and patients with asthma. We analysed the effects of CCL17 and CCL22, and of blocking their receptor, on the differentiation of Th2 cells. RESULTS: Production of CCL17 and CCL22 by activated naïve CD4+ T cells under Th2 conditions was much greater in asthmatic patients than in healthy controls. Proliferation and survival of the Th2 differentiating cells and restimulation-induced IL-4 production were much greater in asthmatic patients than in healthy controls. These cell biological phenomena were inhibited by blockade of CCR4. The biological effects of exogenous CCL17 and CCL22 were apparently observed in both healthy controls and asthmatic patients. The effect of these chemokines on naïve CD4+ T cells from healthy controls was stronger than on those from asthmatic patients. We found that thymic stromal lymphopoietin (TSLP), a Th2 promoting chemokine, is involved in the activation of CD4+ naïve T cells via production of CCL17 and CCL22. CONCLUSIONS AND CLINICAL RELEVANCE: These data suggest that CCL17 and CCL22 produced by TSLP-primed naïve CD4+ T cells in asthma might contribute to an increase in Th2 cells via autocrine loops.


Subject(s)
Autocrine Communication , Cell Differentiation/immunology , Chemokines, CC/metabolism , Hypersensitivity, Immediate/immunology , Hypersensitivity, Immediate/metabolism , Th2 Cells/immunology , Th2 Cells/metabolism , Adult , Apoptosis/immunology , Asthma/diagnosis , Asthma/immunology , Asthma/metabolism , Biomarkers , CD4-Positive T-Lymphocytes/cytology , CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/metabolism , Case-Control Studies , Female , Humans , Immunoglobulin E/immunology , Immunophenotyping , Lymphocyte Activation/immunology , Lymphocyte Count , Male , T-Lymphocyte Subsets/cytology , T-Lymphocyte Subsets/immunology , T-Lymphocyte Subsets/metabolism , Th2 Cells/cytology
8.
J Chem Inf Model ; 59(6): 2626-2641, 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31058504

ABSTRACT

Identification of chemical compounds having desirable properties is a central goal of screening campaigns. Iterative screening is a means of surveying a set of compounds, during which their property values are determined and used as feedback for regression models. Quantitative models that assess the relationships between chemical structures and property/activity are repeatedly updated through this type of cycle, and the efficient sampling of compounds for the subsequent test is a key factor in the early identification of target compounds. Nevertheless, methodological approaches to comparisons and to establishing the degree of extrapolation of sampled compounds, including the effects of applicability domains, are still required. In the present study, we conducted a series of virtual experiments to assess the characteristics of different iterative screening methods. Genetic algorithm-based partial least-squares regression, support vector regression, Bayesian optimization with Gaussian Process (GP), and batch-based Bayesian optimization with GP (GP_batch) were all compared, based on the analysis of one million compounds extracted from the ZINC database. Our results show that, irrespective of the diversity of the initial set of compounds, it was possible to identify a compound having the desired property value using the appropriate screening method. However, overall, the GP_batch method was found to be preferable when evaluating properties that are difficult to predict or for which a key factor is present in the set of molecular descriptors.


Subject(s)
Drug Discovery/methods , Pharmaceutical Preparations/chemistry , Small Molecule Libraries/chemistry , Bayes Theorem , Humans , Least-Squares Analysis , Normal Distribution , Pharmacology , Quantitative Structure-Activity Relationship , Small Molecule Libraries/pharmacology , Support Vector Machine
9.
J Chem Inf Model ; 59(6): 2656-2663, 2019 Jun 24.
Article in English | MEDLINE | ID: mdl-31059251

ABSTRACT

Molecular fingerprints are indispensable in medicinal chemistry for quantifying chemical structures. Fingerprints can be calculated for substructures with attachment points, which are positions where a substructure and a corresponding core structure connect. Because structures with attachment points can be crucial for understanding structure-activity relationships, fingerprints specialized for representing this structural feature are required. R-group fingerprints and R-group descriptors were proposed previously for this purpose; however, these molecular representations have limitations. Current R-group fingerprints do not emphasize information about attachment points, and R-group descriptors are too sensitive to changes in the topological path length from an attachment point. In the present work, we developed novel R-group fingerprints, termed R-path fingerprints, which contain substituent information from an attachment point without being sensitive to small differences in topological distances. The concept of the R-path fingerprints is to describe a chemical substructure from the viewpoint of an attachment point, to distinguish atomistic information around the attachment point and other parts of the substructure. This was achieved by considering all the paths on the shortest path between the attachment point and each atom in a substituent. Benchmark testing was conducted, including comparisons of similarity distributions and potency prediction for R-group substituents. The results showed that R-path fingerprints should be useful for classifying and comparing substructures with attachment points.


Subject(s)
Drug Discovery/methods , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Algorithms , Drug Design , Humans , Molecular Structure , Pharmaceutical Preparations/chemistry , Structure-Activity Relationship
10.
J Chem Inf Model ; 59(3): 993-1004, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30485091

ABSTRACT

Activity landscapes (ALs) integrate structural and potency data of active compounds and provide graphical access to structure-activity relationships (SARs) contained in compound data sets. Three-dimensional (3D) ALs can be conceptualized as a two-dimensional (2D) projection of chemical space with an interpolated activity surface added as a third dimension. Such 3D ALs are particularly intuitive for SAR visualization. In this work, 3D ALs were generated on the basis of different projection methods and fingerprint descriptors, and their topologies were compared. Moreover, going beyond qualitative analysis, the use of 3D ALs for semiquantitative and quantitative potency predictions was investigated. NeuroScale, a neural network variant of multidimensional scaling, combined with Gaussian process regression (GPR) was identified as a preferred approach for generating 3D ALs that accounted for training compounds and their SAR characteristics with high accuracy. On the other hand, GPR-induced overfitting generally limited the accuracy of potency value predictions regardless of the projection method applied. However, 3D ALs enabled reliable mapping of test compounds with varying potency levels to corresponding AL regions. The most accurate mapping was achieved with NeuroScale models. Taken together, the results of our analysis indicate the high potential of 3D ALs for graphical SAR exploration and the identification of potent test compounds.


Subject(s)
Computer Simulation , Pharmaceutical Preparations/chemistry , Drug Design , Ligands , Molecular Structure , Normal Distribution , Quantitative Structure-Activity Relationship
11.
J Chem Inf Model ; 59(3): 983-992, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30547580

ABSTRACT

Support vector regression (SVR) is a premier approach for the prediction of compound potency. Given the conceptual link between support vector machine (SVM) and SVR modeling, SVR is capable of accounting for continuous and discontinuous structure-activity relationships (SARs) in potency prediction, which further extends the classical quantitative SAR (QSAR) paradigm. In the context of virtual compound screening, compound potency prediction can be applied to identify the most potent compounds that are available or enrich database selection sets with potent compounds. To these ends, we have evaluated new potency prediction strategies. Conventional (direct) potency prediction using SVR was compared to two-stage SVM-SVR modeling and potency prediction using SVR models trained in the presence of active and inactive compounds, a previously unconsidered approach. The latter models were found to maximize the recall of potent compounds but were least accurate in predicting high potency values. For this purpose, direct SVR predictions were preferred. However, the best balance between accurate potency predictions and enrichment of potent compounds in database selection sets was achieved by combined SVM-SVR modeling. Taken together, our findings further extend current approaches for compound potency prediction in virtual compound screening.


Subject(s)
Drug Evaluation, Preclinical/methods , Support Vector Machine , Quantitative Structure-Activity Relationship , Regression Analysis
12.
J Comput Aided Mol Des ; 33(8): 729-743, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31435894

ABSTRACT

In this work, computational compound screening strategies on the basis of two- and three-dimensional (2D and 3D) molecular representations were investigated, including similarity searching and support vector machine (SVM) ranking. Calculations based on topological fingerprints and molecular shape queries and features were compared. A unique aspect of the analysis, setting it apart from previous comparisons of 2D and 3D virtual screening approaches, was the design of compound reference, training, and test data sets with controlled incremental increases in intra-set structural diversity and different categories of structural relationships between reference/training and test sets. The use of these data sets made it possible to assess the relative performance of 2D and 3D screening strategies under increasingly challenging conditions, ultimately leading to the use of training and test sets with essentially unrelated structures. The results showed that 3D similarity searching had little advantage over 2D searching in identifying active compounds with remote structural relationships. However, 3D SVM models trained on the basis of shape features were superior to other approaches (including 2D SVM) when the detection of structure-activity relationships became increasingly challenging. Such 3D SVM methods have thus far been little investigated in virtual screening, providing a wealth of opportunities for further analyses.


Subject(s)
Computational Chemistry/methods , Structure-Activity Relationship , Support Vector Machine , User-Computer Interface , Machine Learning , Molecular Conformation , Protein Binding/genetics
13.
J Comput Aided Mol Des ; 32(7): 759-767, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29968097

ABSTRACT

Shape similarity searching is a popular approach for ligand-based virtual screening on the basis of three-dimensional reference compounds. It is generally thought that well-defined experimentally determined binding modes of active reference compounds provide the best possible basis for shape searching. Herein, we show that experimental binding modes are not essential for successful shape similarity searching. Furthermore, we show that ensembles of analogs of X-ray ligands (in the absence of these ligands) further improve the search performance of single crystallographic reference compounds. This is even the case if ensembles of virtually generated analogs are used whose activity status is unknown. Taken together, the results of our study indicate that analog ensembles representing fuzzy reference states are effective starting points for shape similarity searching.


Subject(s)
Models, Molecular , Organic Chemicals/chemistry , Proteins/chemistry , Algorithms , Binding Sites , Crystallization , Crystallography, X-Ray , Ligands , Molecular Structure , Protein Binding , Structure-Activity Relationship
14.
J Chem Inf Model ; 56(2): 286-99, 2016 Feb 22.
Article in English | MEDLINE | ID: mdl-26818135

ABSTRACT

Retrieving descriptor information (x information) from a value of an objective variable (y) is a fundamental problem in inverse quantitative structure-property relationship (inverse-QSPR) analysis but is challenging because of the complexity of the preimage function. Herein, we propose using a cluster-wise multiple linear regression (cMLR) model as a QSPR model for inverse-QSPR analysis. x information is acquired as a probability density function by combining cMLR and the prior distribution modeled with a mixture of Gaussians (GMMs). Three case studies were conducted to demonstrate various aspects of the potential of cMLR. It was found that the predictive power of cMLR was superior to that of MLR, especially for data with nonlinearity. Moreover, the applicability domain could be considered, since the posterior distribution inherits the prior distribution's feature (i.e., training data feature) and represents the possibility of having the desired property. Finally, a series of inverse analyses with the GMMs/cMLR was demonstrated with the aim of generating de novo structures having specific aqueous solubility.


Subject(s)
Quantitative Structure-Activity Relationship , Models, Chemical , Molecular Structure
15.
J Comput Aided Mol Des ; 30(5): 425-46, 2016 May.
Article in English | MEDLINE | ID: mdl-27299746

ABSTRACT

Generating chemical graphs in silico by combining building blocks is important and fundamental in virtual combinatorial chemistry. A premise in this area is that generated structures should be non-redundant as well as exhaustive. In this study, we develop structure generation algorithms for combining ring systems as well as atom fragments. The proposed algorithms consist of three parts. First, chemical structures are generated through a canonical construction path. During structure generation, ring systems can be treated as reduced graphs having fewer vertices than those in the original ones. Second, diversified structures are generated by a simple rule-based generation algorithm. Third, the number of structures to be generated can be estimated with adequate accuracy without actual exhaustive generation. The proposed algorithms were implemented in the structure generator Molgilla. As a practical application, Molgilla generated chemical structures mimicking rosiglitazone in terms of a two-dimensional pharmacophore pattern. The strength of the algorithms lies in their simplicity and flexibility. Therefore, they may be applied to various computer programs that generate structures by combining building blocks.


Subject(s)
Drug Design , Pharmaceutical Preparations/chemistry , Thiazolidinediones/chemistry , User-Computer Interface , Algorithms , Computer Simulation , Humans , Molecular Structure , Rosiglitazone
16.
Planta Med ; 81(6): 429-35, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25719940

ABSTRACT

We present the application of the generative topographic map algorithm to visualize the chemical space populated by natural products and synthetic drugs. Generative topographic maps may be used for nonlinear dimensionality reduction and probabilistic modeling. For compound mapping, we represented the molecules by two-dimensional pharmacophore features (chemically advanced template search descriptor). The results obtained suggest a close resemblance of synthetic drugs with natural products in terms of their pharmacophore features, despite pronounced differences in chemical structure. Generative topographic map-based cluster analysis revealed both known and new potential activities of natural products and drug-like compounds. We conclude that the generative topographic map method is suitable for inferring functional similarities between these two classes of compounds and predicting macromolecular targets of natural products.


Subject(s)
Biological Products/chemistry , Cluster Analysis , Principal Component Analysis , Probability
17.
ACS Omega ; 9(8): 9463-9474, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38434845

ABSTRACT

In the pursuit of optimal quantitative structure-activity relationship (QSAR) models, two key factors are paramount: the robustness of predictive ability and the interpretability of the model. Symbolic regression (SR) searches for the mathematical expressions that explain a training data set. Thus, the models provided by SR are globally interpretable. We previously proposed an SR method that can generate expressions interpretable by humans. This study introduces an enhanced symbolic regression method, termed filter-induced genetic programming 2 (FIGP2), as an extension of our previously proposed SR method. FIGP2 is designed to improve the generalizability of SR models and to be applicable to data sets in which cost-intensive descriptors are employed. The FIGP2 method incorporates two major improvements: a modified domain filter to eradicate diverging expressions based on optimal calculation and the introduction of a stability metric to penalize expressions that would lead to overfitting. Our retrospective comparative analysis using 12 structure-activity relationship data sets revealed that FIGP2 surpassed the previously proposed SR method and conventional modeling methods, such as support vector regression and multivariate linear regression, in terms of predictive performance. Mathematical expressions generated by FIGP2 were relatively simple and not divergent in the domain of the function. Taken together, FIGP2 can be used for making interpretable regression models with predictive ability.

18.
ACS Omega ; 9(39): 40907-40919, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39372005

ABSTRACT

The chemical reaction yield is an important factor to determine the reaction conditions. Recently, many data-driven models for yield prediction using high-throughput experimentation datasets have been reported. In this study, we propose a neural network architecture based on the chemical graphs of the reaction components to predict the reaction yield. The proposed model is the sequential combination of a message-passing neural network and a transformer encoder (MPNN-Transformer). The reaction components are converted to molecular matrices by the first network, followed by the interplay of the reaction components in the second network after adding the embeddings of the compound roles in the chemical reaction. The predictive ability of the proposed models was compared with state-of-the-art yield prediction models using two high-throughput experimental datasets: the Buchwald-Hartwig cross-coupling (BHC) and Suzuki-Miyaura cross-coupling (SMC) reaction datasets. Overall, the MPNN-Transformer models showed high prediction accuracy for the BHC reaction datasets and some of the extrapolation-oriented SMC reaction datasets. These models also performed well when the training dataset size was relatively large. Furthermore, analyzing the poorly predicted reactions for the BHC reaction dataset revealed a limitation of the data-driven yield prediction approach based on the chemical structural similarity.

19.
ACS Omega ; 9(37): 38957-38969, 2024 Sep 17.
Article in English | MEDLINE | ID: mdl-39310180

ABSTRACT

Ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS), and their combinations, are frequently conducted in modern drug discovery campaigns. As a form of combination, an amalgamation of methods from ligand- and structure-based information, termed hybrid VS approaches, has been extensively investigated, for example using interaction fingerprints (IFPs) in combination with machine learning (ML) models. This approach has the potential to prioritize active compounds in terms of protein-ligand binding and ligand structural characteristics, which is assumed to be difficult using either one of the approaches alone. Herein, we present an IFP, named the fragmented interaction fingerprint (FIFI), for hybrid VS approaches. FIFI is constructed from the extended connectivity fingerprint atom environments of a ligand proximal to the protein residues in the binding site. Each unique ligand substructure within each amino acid residue is encoded as a bit in FIFI while retaining sequence order. In the retrospective evaluation of activity prediction using a limited number and variety of active compounds for six biological targets, FIFI consistently showed higher prediction accuracy than previously proposed IFPs. For the same data sets, the screening performance of LBVS, SBVS, sequential VS, parallel VS, and other hybrid VS approaches was investigated. Compared to these approaches, FIFI in combination with ML showed overall stable and high prediction accuracy, except for one target: the kappa opioid receptor, where the extended connectivity fingerprint combined with ML models showed better performance than other approaches by wide margins.

20.
J Cheminform ; 15(1): 4, 2023 Jan 07.
Article in English | MEDLINE | ID: mdl-36611204

ABSTRACT

Activity cliffs (AC) are formed by pairs of structural analogues that are active against the same target but have a large difference in potency. While much of our knowledge about ACs has originated from the analysis and comparison of compounds and activity data, several studies have reported AC predictions over the past decade. Different from typical compound classification tasks, AC predictions must be carried out at the level of compound pairs representing ACs or nonACs. Most AC predictions reported so far have focused on individual methods or comparisons of two or three approaches and only investigated a few compound activity classes (from 2 to 10). Although promising prediction accuracy has been reported in most cases, different system set-ups, AC definitions, methods, and calculation conditions were used, precluding direct comparisons of these studies. Therefore, we have carried out a large-scale AC prediction campaign across 100 activity classes comparing machine learning methods of greatly varying complexity, ranging from pair-based nearest neighbor classifiers and decision tree or kernel methods to deep neural networks. The results of our systematic predictions revealed the level of accuracy that can be expected for AC predictions across many different compound classes. In addition, prediction accuracy did not scale with methodological complexity but was significantly influenced by memorization of compounds shared by different ACs or nonACs. In many instances, limited training data were sufficient for building accurate models using different methods and there was no detectable advantage of deep learning over simpler approaches for AC prediction. On a global scale, support vector machine models performed best, by only small margins compared to others including simple nearest neighbor classifiers.
