Results 1 - 20 of 64
1.
Chem Biol Drug Des ; 103(1): e14375, 2024 01.
Article in English | MEDLINE | ID: mdl-37849030

ABSTRACT

The epidermal growth factor receptor (EGFR) tyrosine kinase plays an important role in tumor formation and growth by mediating cell growth and other physiological processes. Therefore, EGFR is a promising target for the treatment of cancer. In this work, we combined ligand-based and structure-based virtual screening methods to identify novel EGFR inhibitors from a library of more than 103,000 compounds. We first obtained hundreds of compounds with similar physicochemical properties through 3D molecular shape and electrostatic similarity screening with the potent inhibitors AEE788 and Afatinib as queries. Next, we identified compounds with strong binding affinities to the EGFR pocket through molecular docking, which makes good use of the structural information of the receptor. After molecular scaffold analysis, our bioassay confirmed 13 compounds with EGFR inhibitory activity, three of which had IC50 values below 1000 nM. In addition, we collected 5371 EGFR inhibitors from online databases and clustered them into 7 groups by the K-means method using their ECFP4 fingerprints as input. Each cluster had typical molecular fragments and corresponding activity characteristics, which could guide the design of EGFR inhibitors, and we concluded that fragments from some of the hits are present in the highly active scaffolds.


Subject(s)
Antineoplastic Agents; Neoplasms; Humans; Molecular Docking Simulation; Protein Kinase Inhibitors/chemistry; Ligands; ErbB Receptors/metabolism; Afatinib/therapeutic use; Neoplasms/drug therapy; Antineoplastic Agents/pharmacology
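The clustering step in the abstract above (K-means on fingerprint vectors) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 8-bit vectors are toy stand-ins for real ECFP4 fingerprints, the function names are hypothetical, and the deterministic farthest-first initialisation is a convenience choice, not something the abstract describes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Farthest-first initialisation (an assumption, chosen for determinism).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy 8-bit "fingerprints": two visibly different groups of three molecules.
fingerprints = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
labels, centroids = kmeans(fingerprints, k=2)
print(labels)
```

In practice the fingerprints would come from a cheminformatics toolkit and the number of clusters would be chosen by inspection or a validity index; neither choice is specified in the abstract.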
2.
Molecules ; 28(19)2023 Sep 23.
Article in English | MEDLINE | ID: mdl-37836625

ABSTRACT

Cyclooxygenase-2 (COX-2) and microsomal prostaglandin E2 synthase (mPGES-1) are two key targets in anti-inflammatory therapy. Medicine and food homology (MFH) substances have both edible and medicinal properties, providing a valuable resource for the development of novel, safe, and efficient COX-2 and mPGES-1 inhibitors. In this study, we collected active ingredients from 503 MFH substances and constructed the first comprehensive MFH database containing 27,319 molecules. Subsequently, we performed Murcko scaffold analysis and K-means clustering to deeply analyze the composition of the constructed database and evaluate its structural diversity. Furthermore, we employed four supervised machine learning algorithms, including support vector machine (SVM), random forest (RF), deep neural networks (DNNs), and eXtreme Gradient Boosting (XGBoost), as well as ensemble learning, to establish 640 classification models and 160 regression models for COX-2 and mPGES-1 inhibitors. Among them, ModelA_ensemble_RF_1 emerged as the optimal classification model for COX-2 inhibitors, achieving predicted Matthews correlation coefficient (MCC) values of 0.802 and 0.603 on the test set and external validation set, respectively. ModelC_RDKIT_SVM_2 was identified as the best regression model based on COX-2 inhibitors, with root mean squared error (RMSE) values of 0.419 and 0.513 on the test set and external validation set, respectively. ModelD_ECFP_SVM_4 stood out as the top classification model for mPGES-1 inhibitors, attaining MCC values of 0.832 and 0.584 on the test set and external validation set, respectively. The optimal regression model for mPGES-1 inhibitors, ModelF_3D_SVM_1, exhibited predictive RMSE values of 0.253 and 0.35 on the test set and external validation set, respectively. 
Finally, we proposed a ligand-based cascade virtual screening strategy, which integrated the well-performing supervised machine learning models with unsupervised learning: the self-organized map (SOM) and molecular scaffold analysis. Using this virtual screening workflow, we discovered 10 potential COX-2 inhibitors and 15 potential mPGES-1 inhibitors from the MFH database. We further verified the candidates by molecular docking and investigated the interactions of the candidate molecules upon binding to COX-2 or mPGES-1. The constructed comprehensive MFH database has laid a solid foundation for further research on and utilization of MFH substances. The series of well-performing machine learning models can be employed to predict the COX-2 and mPGES-1 inhibitory capabilities of unknown compounds, thereby aiding in the discovery of anti-inflammatory medications. The potential COX-2 and mPGES-1 inhibitors identified through the cascade virtual screening approach provide insights and references for the design of highly effective and safe novel anti-inflammatory drugs.


Subject(s)
Anti-Inflammatory Agents; Cyclooxygenase 2 Inhibitors; Cyclooxygenase 2 Inhibitors/pharmacology; Cyclooxygenase 2; Molecular Docking Simulation; Algorithms; Machine Learning; Metabolic Networks and Pathways
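Several abstracts in this listing rank classification models by the Matthews correlation coefficient (MCC). As a reminder of what that number measures, here is a minimal sketch that computes MCC from binary labels. The label vectors are made up for illustration and the function names are hypothetical.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Made-up example: 4 actives, 4 inactives, one error in each class.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(mcc(*confusion_counts(y_true, y_pred)))  # 0.5
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often preferred over plain accuracy when classes are imbalanced.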
3.
Mol Divers ; 2023 Jul 21.
Article in English | MEDLINE | ID: mdl-37479824

ABSTRACT

In this study, we built classification models using machine learning techniques to predict the bioactivity of non-covalent inhibitors of Bruton's tyrosine kinase (BTK) and to provide interpretable and transparent explanations for these predictions. To achieve this, we gathered data on BTK inhibitors from the Reaxys and ChEMBL databases, removing compounds with covalent bonds and duplicates to obtain a dataset of 3895 non-covalent inhibitors. These inhibitors were characterized using MACCS fingerprints and Morgan fingerprints, and four traditional machine learning algorithms (decision trees (DT), random forests (RF), support vector machines (SVM), and extreme gradient boosting (XGBoost)) were used to build 16 classification models. In addition, four deep learning models were developed using deep neural networks (DNN). The best model, Model D_4, which was built using XGBoost and MACCS fingerprints, achieved an accuracy of 94.1% and a Matthews correlation coefficient (MCC) of 0.75 on the test set. To provide interpretable explanations, we employed the SHAP method to decompose the predicted values into the contributions of each feature. We also used K-means dimensionality reduction and hierarchical clustering to visualize the clustering effects of molecular structures of the inhibitors. The results of this study were validated using crystal structures, and we found that the interaction between the BTK amino acid residue and the important features of the clustered scaffolds was consistent with the known properties of the complex crystal structures. Overall, our models demonstrated high predictive ability, and SHAP allows a qualitative model to be converted to a quantitative one to some extent, making the models valuable for guiding the design of new BTK inhibitors with desired activity.
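The idea behind the SHAP step above, splitting one prediction into per-feature contributions, can be sketched with exact Shapley values on a toy model. This is not the SHAP library and not the authors' pipeline: the three-bit "fingerprint", the scoring function `toy_model`, and the all-zero baseline are all assumptions made for illustration, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(n):  # r = size of the coalition that excludes feature i
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for s in combinations(others, r):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

def toy_model(bits):
    """Hypothetical activity score with one interaction term (bits 0 and 2)."""
    return 0.2 + 0.5 * bits[0] + 0.1 * bits[1] + 0.3 * bits[0] * bits[2]

x = [1, 1, 1]         # molecule with all three substructure bits set
baseline = [0, 0, 0]  # reference molecule with none of them
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction and the baseline prediction, which is the additivity property that makes such decompositions readable; the interaction term is shared equally between the two bits involved.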

4.
Mol Divers ; 2023 May 05.
Article in English | MEDLINE | ID: mdl-37142889

ABSTRACT

FMS-like tyrosine kinase 3 (FLT3) is a type III receptor tyrosine kinase and an important target for anti-cancer therapy. In this work, we conducted a structure-activity relationship (SAR) study on 3867 FLT3 inhibitors that we collected. MACCS fingerprints, ECFP4 fingerprints, and TT fingerprints were used to represent the inhibitors in the dataset. A total of 36 classification models were built based on support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNN) algorithms. Model 3D_3, built by DNN with TT fingerprints, performed best on the test set, with the highest prediction accuracy of 85.83% and a Matthews correlation coefficient (MCC) of 0.72, and also performed well on the external test set. In addition, we clustered the 3867 inhibitors into 11 subsets by the K-Means algorithm to identify the structural characteristics of the reported FLT3 inhibitors. Finally, we analyzed the SAR of FLT3 inhibitors by the RF algorithm based on ECFP4 fingerprints. The results showed that 2-aminopyrimidine, 1-ethylpiperidine, 2,4-bis(methylamino)pyrimidine, amino-aromatic heterocycle, [(2E)-but-2-enyl]dimethylamine, but-2-enyl, and alkynyl were typical fragments among highly active inhibitors. Besides, three scaffolds in Subset_A (Subset 4), Subset_B, and Subset_C showed a significant relationship to inhibition activity targeting FLT3.

5.
Chem Res Toxicol ; 36(4): 617-629, 2023 04 17.
Article in English | MEDLINE | ID: mdl-37017429

ABSTRACT

Persistent contaminants from different industries have already caused significant risks to the environment and public health. In this study, a data set containing 1306 not readily biodegradable (NRB) and 622 readily biodegradable (RB) chemicals was collected and characterized by CORINA descriptors, MACCS fingerprints, and ECFP_4 fingerprints. We utilized decision tree (DT), support vector machine (SVM), random forest (RF), and deep neural network (DNN) to construct 34 classification models that could predict the biodegradability of compounds. The best model (model 5F) built using a Transformer-CNN algorithm had a balanced accuracy of 86.29% and a Matthews correlation coefficient of 0.71 on the test set. By analyzing the top 10 CORINA descriptors used for modeling, the properties containing solubility, π/σ atom charges, rotatable bonds number, lone pair/π/σ atom electronegativities, molecular weight, and number of nitrogen atom based hydrogen bonding acceptors were determined to be critical for biodegradability. The substructure investigations confirmed earlier studies that the presence of aromatic rings and nitrogen or halogen substitutions in a molecule will hinder the biodegradation of the compound, while the ester groups and carboxyl groups promote biodegradability. We also identified the representative fragments affecting biodegradability by analyzing the frequency differences of substructural fragments between the NRB and RB compounds. The results of the study can provide excellent guidance for the discovery and design of compounds with good chemical biodegradability.


Subject(s)
Algorithms; Machine Learning; Structure-Activity Relationship; Neural Networks, Computer; Support Vector Machine
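The abstract above reports balanced accuracy rather than plain accuracy, which makes sense for a data set with 1306 NRB and 622 RB chemicals. The sketch below shows the difference using those two class sizes; the "label everything NRB" classifier is a hypothetical worst case introduced here for illustration, not a model from the study.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity, so both classes weigh equally."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

# Hypothetical trivial classifier: every compound is called NRB (negative),
# so all 622 RB (positive) compounds are missed.
trivial = dict(tp=0, tn=1306, fp=0, fn=622)
print(accuracy(**trivial), balanced_accuracy(**trivial))
```

The trivial classifier is right about two thirds of the time by plain accuracy, yet its balanced accuracy is exactly 0.5, the same as guessing.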
6.
Chem Biol Drug Des ; 101(6): 1307-1321, 2023 06.
Article in English | MEDLINE | ID: mdl-36752697

ABSTRACT

There is strong interest in the development of microsomal prostaglandin E2 synthase-1 (mPGES-1) inhibitors because of their potential to safely and effectively treat inflammation. Herein, 70 QSAR models were built on a dataset of 735 mPGES-1 inhibitors characterized with RDKit descriptors, using multiple linear regression (MLR), support vector machine (SVM), random forest (RF), deep neural networks (DNN), and eXtreme Gradient Boosting (XGBoost). Three further regression models were built on the dataset represented by SMILES, using self-attention recurrent neural networks (RNN) and Graph Convolutional Networks (GCN). For the best model (Model C2), which was developed by SVM with RDKit descriptors, a coefficient of determination (R2) of 0.861 and a root mean squared error (RMSE) of 0.235 were achieved on the test set. Additionally, an R2 of 0.692 and an RMSE of 0.383 were obtained on the external test set. We investigated the applicability domain (AD) of Model C2 with the rivality index (RI); the predictions of Model C2 were reliable for 78.92% of the molecules in the test set and 78.33% of the molecules in the external test set. After dissecting the RDKit descriptors of Model C2, we found important physicochemical properties of highly active mPGES-1 inhibitors. Besides, by analyzing the attention weight of each atom of each inhibitor from the attention layer, we found that the benzamide group and the trifluoromethyl cyclohexane group are favorable substructures for mPGES-1 inhibitors.


Subject(s)
Algorithms; Quantitative Structure-Activity Relationship; Prostaglandin-E Synthases; Machine Learning; Support Vector Machine; Prostaglandins
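The two regression metrics quoted above, R2 and RMSE, can be computed as in the following minimal sketch. The pIC50-like values are invented for illustration, and R2 is taken here as 1 - SS_res/SS_tot; some QSAR papers report the squared Pearson correlation instead, and the abstract does not say which definition was used.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Invented activity values for four compounds.
observed = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]
mean_only = [2.5, 2.5, 2.5, 2.5]  # a model that always predicts the mean

print(r_squared(observed, perfect), rmse(observed, perfect))
print(r_squared(observed, mean_only), rmse(observed, mean_only))
```

A model that only ever predicts the mean scores R2 = 0, which is the reference point against which values such as 0.861 should be read.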
7.
Mol Divers ; 27(3): 1037-1051, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35737257

ABSTRACT

Histone deacetylase (HDAC) 1, a member of the histone deacetylase family, plays a pivotal role in various tumors. In this study, we collected 7313 human HDAC1 inhibitors with bioactivities to form a dataset. Then, the dataset was divided into a training set and a test set using two splitting methods: (1) Kohonen's self-organizing map and (2) random splitting. The molecular structures were represented by MACCS fingerprints, RDKit fingerprints, topological torsion fingerprints and ECFP4 fingerprints. A total of 80 classification models were built by using five machine learning methods, including decision tree (DT), random forest, support vector machine, eXtreme Gradient Boosting (XGBoost) and deep neural network. Model 15A_2, built by the XGBoost algorithm based on ECFP4 fingerprints, showed the best performance, with an accuracy of 88.08% and an MCC value of 0.76 on the test set. Finally, we clustered the 7313 HDAC1 inhibitors into 31 subsets, and the substructural features in each subset were investigated. Moreover, using the DT algorithm, we analyzed the structure-activity relationship of HDAC1 inhibitors. It may be concluded that some substructures have a significant effect on high activity, such as N-(2-amino-phenyl)-benzamide, benzimidazole, AR-42 analogues, hydroxamic acid with a middle chain alkyl and 4-aryl imidazole with a midchain of alkyl whose α carbon is chiral.


Subject(s)
Algorithms; Machine Learning; Humans; Structure-Activity Relationship; Molecular Structure; Support Vector Machine; Histone Deacetylase 1
8.
China CDC Wkly ; 5(52): 1167-1173, 2023 Dec 29.
Article in English | MEDLINE | ID: mdl-38164467

ABSTRACT

What is already known about this topic?: Campylobacter is a significant foodborne pathogen that leads to global outbreaks of acute gastroenteritis (AGE) usually affecting less than 30 individuals. Human sapovirus (HuSaV) is an enteric virus responsible for sporadic cases and outbreaks of AGE worldwide. In a study conducted in Beijing, HuSaV detection ranked second after norovirus. What is added by this report?: We present a discussion of the first large-scale outbreak of AGE caused by both Campylobacter coli (C. coli) and HuSaV. The outbreak involved a total of 996 patients and exhibited two distinct peaks over a period of 17 days. Through case-control studies, we identified exposure to raw water from a secondary water supply system as a significant risk factor. Among 83 patients, 49 samples tested positive for C. coli, 39 samples tested positive for HuSaV, and 27 samples tested positive for both pathogens using real-time polymerase chain reaction detection. Furthermore, whole-genome sequencing of 17 C. coli isolates obtained from 17 patients revealed that all isolates belonged to a highly clonal strain of C. coli. What are the implications for public health practice?: Outbreaks of AGE resulting from multiple pathogen infections warrant increased attention. This report emphasizes the significance of ensuring the safety of drinking water, particularly in secondary supply systems.

9.
J Cheminform ; 14(1): 52, 2022 Aug 04.
Article in English | MEDLINE | ID: mdl-35927691

ABSTRACT

Recently, graph neural networks (GNNs) have revolutionized the field of chemical property prediction and achieved state-of-the-art results on benchmark data sets. Compared with the traditional descriptor- and fingerprint-based QSAR models, GNNs can learn task related representations, which completely gets rid of the rules defined by experts. However, due to the lack of useful prior knowledge, the prediction performance and interpretability of the GNNs may be affected. In this study, we introduced a new GNN model called RG-MPNN for chemical property prediction that integrated pharmacophore information hierarchically into message-passing neural network (MPNN) architecture, specifically, in the way of pharmacophore-based reduced-graph (RG) pooling. RG-MPNN absorbed not only the information of atoms and bonds from the atom-level message-passing phase, but also the information of pharmacophores from the RG-level message-passing phase. Our experimental results on eleven benchmark and ten kinase data sets showed that our model consistently matched or outperformed other existing GNN models. Furthermore, we demonstrated that applying pharmacophore-based RG pooling to MPNN architecture can generally help GNN models improve the predictive power. The cluster analysis of RG-MPNN representations and the importance analysis of pharmacophore nodes will help chemists gain insights for hit discovery and lead optimization.

10.
Mol Divers ; 26(3): 1715-1730, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34636023

ABSTRACT

Epidermal growth factor receptor (EGFR) has received widespread attention because it is an important target for anticancer drug design. Mutations in EGFR, especially the T790M/L858R double mutation, have made cancer treatment more difficult. We herein built structure-activity relationship models of small-molecule inhibitors of wild-type and T790M/L858R double-mutant EGFR with a whole dataset of 379 compounds. For 2D classification models, we used ECFP4 fingerprints to build support vector machine and random forest models and used SMILES to build self-attention recurrent neural network models. All six models achieved an accuracy above 0.87 and a Matthews correlation coefficient above 0.76 on the test set. We concluded that inhibitors containing anilinoquinoline and methoxy or fluoro phenyl are highly active against wild-type EGFR. Substructures such as anilinopyrimidine, acrylamide, amino phenyl, methoxy phenyl, and thienopyrimidinyl amide appeared more often in highly active inhibitors of double-mutant EGFR. We also used a self-organizing map to cluster the inhibitors into six subsets based on ECFP4 fingerprints and analyzed the activity characteristics of different scaffolds in each subset. Among them, three datasets, which are based on the pteridin, anilinopyrimidine, and anilinoquinoline scaffolds, were selected to build 3D comparative molecular similarity analysis models individually. Models with a leave-one-out coefficient of determination (q2) above 0.65 were selected, and five descriptor types (steric, electrostatic, hydrophobic, donor, and acceptor) were used to study the effects of the side chains of inhibitors on the activity against wild-type and mutant-type EGFR.


Subject(s)
ErbB Receptors; Lung Neoplasms; Cell Line, Tumor; Drug Design; ErbB Receptors/genetics; Humans; Lung Neoplasms/drug therapy; Mutation; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Structure-Activity Relationship
11.
J Chem Inf Model ; 62(21): 5149-5164, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-34931847

ABSTRACT

The epidermal growth factor receptor (EGFR) signaling pathway plays an important role in cell growth, proliferation, differentiation, and other physiological processes, which makes the EGFR a promising target for anticancer therapies. The discovery of novel EGFR inhibitors may provide a solution to the problem of drug resistance. In this work, we performed a ligand-based virtual screening (LBVS) protocol for finding novel EGFR inhibitors from a 5.3 million compound library. First, the 3D shape-based similarity was used to obtain structurally novel EGFR inhibitors. In this study, we tried three queries; two were crystal structures and one was generated from deep generative models of graphs (DGMG). Next, we have built four structure-activity relationship (SAR) models and three quantitative structure-activity relationship (QSAR) models based on an SVM method for further screening of highly active EGFR inhibitors. Experimental validations led to the identification of nine hits out of 18 tested compounds. Among them, hit 1, hit 5, and hit 6 had IC50 values around 80 nM against EGFR whose interactions with EGFR were further investigated by molecular dynamics simulations.


Subject(s)
Protein Kinase Inhibitors; Quantitative Structure-Activity Relationship; Protein Kinase Inhibitors/chemistry; ErbB Receptors/chemistry; Ligands; Cell Proliferation; Molecular Docking Simulation
12.
Front Pediatr ; 9: 695610, 2021.
Article in English | MEDLINE | ID: mdl-34249820

ABSTRACT

Background: Pulmonary hypertension is one of the most common co-morbidities in infants with bronchopulmonary dysplasia (BPD), but its risk factors are unclear. The onset of pulmonary hypertension in BPD has been associated with poor morbidity- and mortality-related outcomes in infants. Two review and meta-analysis studies have evaluated the risk factors and outcomes associated with pulmonary hypertension in infants with BPD. However, the limitations in those studies and the publication of recent cohort studies warrant our up-to-date study. We designed a systematic review and meta-analysis to evaluate the risk factors and outcomes of pulmonary hypertension in infants with BPD. Objective: To systematically evaluate the risk factors and outcomes associated with pulmonary hypertension in infants with BPD. Methods: We systematically searched the academic literature according to the PRISMA guidelines across five databases (Web of Science, EMBASE, CENTRAL, Scopus, and MEDLINE). We conducted random-effects meta-analyses to evaluate the pulmonary hypertension risk factors in infants with BPD. We also evaluated the overall morbidity- and mortality-related outcomes in infants with BPD and pulmonary hypertension. Results: We found 15 eligible studies (from the initial 963 of the search result) representing data from 2,156 infants with BPD (mean age, 25.8 ± 0.71 weeks). The overall methodological quality of the included studies was high. Our meta-analysis in infants with severe BPD revealed increased risks of pulmonary hypertension [Odds ratio (OR) 11.2], sepsis (OR, 2.05), pre-eclampsia (OR, 1.62), and oligohydramnios (OR, 1.38) of being small for gestational age (3.31). Moreover, a comparative analysis found medium-to-large effects of pulmonary hypertension on the total duration of hospital stay (Hedge's g, 0.50), the total duration of oxygen received (g, 0.93), the cognitive score (g, -1.5), and the overall mortality (g, 0.83) in infants with BPD. 
Conclusion: We identified several possible risk factors (i.e., severe BPD, sepsis, small for gestational age, pre-eclampsia) that promoted the onset of pulmonary hypertension in infants with BPD. Moreover, our review sheds light on the morbidity- and mortality-related outcomes associated with pulmonary hypertension in these infants. Our present findings are in line with the existing literature. The findings from this research will be useful in the development of an efficient risk-based screening system that determines the outcomes associated with pulmonary hypertension in infants with BPD.
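The two effect sizes reported in this abstract, the odds ratio and the standardized mean difference g, can be computed per study as in the sketch below. All input numbers are invented, the function names are hypothetical, and the small-sample correction uses the common approximation J = 1 - 3/(4(n1 + n2) - 9). The random-effects pooling across studies that the authors describe is not shown.

```python
from math import sqrt

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    return (a * d) / (b * c)

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with small-sample bias correction."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd          # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)  # approximate correction factor J
    return d * correction

# Invented single-study inputs.
print(odds_ratio(20, 10, 10, 20))           # 4.0
print(hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20))
```

By the usual rule of thumb, g around 0.5 is read as a medium effect and 0.8 or more as a large one, which matches the "medium-to-large" wording in the abstract.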

13.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34151363

ABSTRACT

Three-dimensional (3D) molecular similarity, one major ligand-based virtual screening (VS) method, has been widely used in the drug discovery process. A variety of 3D molecular similarity tools have been developed in recent decades. In this study, we assessed a panel of 15 3D molecular similarity programs, including commercial ROCS and Phase, against the DUD-E and LIT-PCBA datasets in terms of screening power and scaffold-hopping power. The results revealed that (1) SHAFTS, LS-align, Phase Shape_Pharm and LIGSIFT showed the best VS capability in terms of screening power. Some 3D similarity tools available to academia can yield relatively better VS performance than the commercial ROCS and Phase software. (2) Current 3D similarity VS tools exhibit a considerable ability to capture actives with new chemotypes in terms of scaffold hopping. (3) Using multiple conformers rather than a single conformation generally improves VS performance for most 3D similarity tools: the improvement in area under the receiver operating characteristic curve values is marginal, whereas the enrichment factor in the top 1% and the hit rate in the top 1% show larger improvements. Moreover, redundancy and complementarity analyses of hit lists from different query seeds and different 3D similarity VS tools showed that the combination of different query seeds and/or different 3D similarity tools in VS campaigns retrieved more (and more diverse) active molecules. These findings provide useful information for guiding choices of the optimal 3D molecular similarity tools for VS practices and designing possible combination strategies to discover more diverse active compounds.


Subject(s)
Drug Discovery/methods; Models, Molecular; Molecular Conformation; Software; Area Under Curve; Benchmarking; Databases, Pharmaceutical; Drug Design; Drug Evaluation, Preclinical/methods; Ligands; Molecular Structure; ROC Curve; Web Browser
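The "enrichment factor in the top 1%" and "hit rate in the top 1%" mentioned above are early-recognition metrics for a ranked screening list. The sketch below uses one common definition (hit rate in the selected top fraction divided by the hit rate of the whole library); the benchmark's exact conventions, such as tie handling, are not given in the abstract, and the ranked library here is synthetic.

```python
def top_fraction_metrics(scores, is_active, fraction=0.01):
    """Return (enrichment_factor, hit_rate) for the top `fraction` by score."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    # Rank compounds from the highest to the lowest similarity score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    hit_rate = hits_top / n_top
    base_rate = sum(is_active) / n  # hit rate expected from random picking
    return hit_rate / base_rate, hit_rate

# Synthetic library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
scores = [1000 - i for i in range(1000)]  # already in descending order
active_positions = set(range(5)) | set(range(500, 505))
is_active = [1 if i in active_positions else 0 for i in range(1000)]

ef, hit_rate = top_fraction_metrics(scores, is_active, fraction=0.01)
print(ef, hit_rate)
```

An enrichment factor of 1 means the ranking does no better than random selection; the maximum attainable value at the top 1% is 100.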
14.
Mol Divers ; 25(3): 1597-1616, 2021 Aug.
Article in English | MEDLINE | ID: mdl-33534023

ABSTRACT

Cysteinyl leukotrienes 1 (CysLT1) receptor is a promising drug target for rhinitis or other allergic diseases. In our study, we built classification models to predict bioactivities of CysLT1 receptor antagonists. We built a dataset with 503 CysLT1 receptor antagonists which were divided into two groups: highly active molecules (IC50 < 1000 nM) and weakly active molecules (IC50 ≥ 1000 nM). The molecules were characterized by several descriptors including CORINA descriptors, MACCS fingerprints, Morgan fingerprint and molecular SMILES. For CORINA descriptors and two types of fingerprints, we used the random forests (RF) and deep neural networks (DNN) to build models. For molecular SMILES, we used recurrent neural networks (RNN) with the self-attention to build models. The accuracies of test sets for all models reached 85%, and the accuracy of the best model (Model 2C) was 93%. In addition, we made structure-activity relationship (SAR) analyses on CysLT1 receptor antagonists, which were based on the output from the random forest models and RNN model. It was found that highly active antagonists usually contained the common substructures such as tetrazoles, indoles and quinolines. These substructures may improve the bioactivity of the CysLT1 receptor antagonists.


Subject(s)
Algorithms; Leukotriene Antagonists/chemistry; Machine Learning; Models, Molecular; Receptors, Leukotriene/chemistry; Binding Sites; Cheminformatics/methods; Drug Discovery; Leukotriene Antagonists/pharmacology; Molecular Structure; Protein Binding; Quantitative Structure-Activity Relationship; ROC Curve; Reproducibility of Results
15.
Chem Biol Drug Des ; 96(3): 931-947, 2020 09.
Article in English | MEDLINE | ID: mdl-33058463

ABSTRACT

Inflammatory diseases can be treated by inhibiting 5-lipo-oxygenase activating protein (FLAP). In this study, a data set containing 2,112 FLAP inhibitors was collected. A total of 25 classification models were built by five machine learning algorithms with five different types of fingerprints. The best model, which was built by the support vector machine algorithm with the ECFP_4 fingerprint, had an accuracy of 0.862 and a Matthews correlation coefficient of 0.722 on the test set. The predicted results were further evaluated by the application domain dSTD-PRO (a distance between one compound and the models). Each compound had a dSTD-PRO value, which was calculated from the predicted probabilities obtained from all 25 models. The application domain results suggested that the reliability of predicted results depended mainly on the compounds themselves rather than on the algorithms or fingerprints. A customized 10-bit fingerprint was manually defined for clustering the molecular structures of the 2,112 FLAP inhibitors into eight subsets by K-Means. According to the clustering results, most of the inhibitors in two subsets (subsets 2 and 4) were highly active. We found that aryl oxadiazole/oxazole alkanes, biaryl amino-heteroarenes, two aromatic rings (often N-containing) linked by a cyclobutene group, and the 1,2,4-triazole group were typical fragments in highly active inhibitors.


Subject(s)
5-Lipoxygenase-Activating Proteins/drug effects; Computer Simulation; Algorithms; Cluster Analysis; Datasets as Topic; Machine Learning; Molecular Structure; Support Vector Machine
16.
Curr Comput Aided Drug Des ; 16(5): 654-666, 2020.
Article in English | MEDLINE | ID: mdl-31538902

ABSTRACT

BACKGROUND: HIV-1 Integrase (IN) is an important target for the development of new anti-AIDS drugs. HIV-1 LEDGF/p75 inhibitors, which block the integrase and LEDGF/p75 interaction, have been validated for reduction in HIV-1 viral replicative capacity. METHODS: In this work, computational Quantitative Structure-Activity Relationship (QSAR) models were developed for predicting the bioactivity of HIV-1 integrase LEDGF/p75 inhibitors. We collected 190 inhibitors and their bioactivities in this study and divided the inhibitors into nine scaffolds by the method of T-distributed Stochastic Neighbor Embedding (TSNE). These 190 inhibitors were split into a training set and a test set either according to the result of a Kohonen's self-organizing map (SOM) or randomly. Multiple Linear Regression (MLR) models, support vector machine (SVM) models and two consensus models were built on the training sets using 20 selected CORINA Symphony descriptors. RESULTS: All the models showed a good prediction of pIC50. The correlation coefficients of all the models were more than 0.7 on the test set. Consensus Model C1, which performed better than the other models, achieved a correlation coefficient (r) of 0.909 on the training set and 0.804 on the test set. CONCLUSION: The selected molecular descriptors show that hydrogen bond acceptors, atom charges and electronegativities (especially of π atoms) were important in predicting the activity of HIV-1 integrase LEDGF/p75-IN inhibitors.


Subject(s)
Anti-HIV Agents/chemistry; Drug Discovery/methods; HIV Integrase Inhibitors/chemistry; HIV-1/drug effects; Drug Design; Humans; Models, Molecular; Molecular Structure; Quantitative Structure-Activity Relationship; Structure-Activity Relationship
17.
J Chem Inf Model ; 59(5): 1988-2008, 2019 05 28.
Article in English | MEDLINE | ID: mdl-30762371

ABSTRACT

This work reports the classification study conducted on the biggest COX-2 inhibitor data set so far. Using 2925 diverse COX-2 inhibitors collected from 168 pieces of literature, we applied machine learning methods, support vector machine (SVM) and random forest (RF), to develop 12 classification models. The best SVM and RF models resulted in MCC values of 0.73 and 0.72, respectively. The 2925 COX-2 inhibitors were reduced to a data set of 1630 molecules by removing intermediately active inhibitors, and 12 new classification models were constructed, yielding MCC values above 0.72. The best MCC value of the external test set was predicted to be 0.68 by the RF model using ECFP_4 fingerprints. Moreover, the 2925 COX-2 inhibitors were clustered into eight subsets, and the structural features of each subset were investigated. We identified substructures important for activity including halogen, carboxyl, sulfonamide, and methanesulfonyl groups, as well as the aromatic nitrogen atoms. The models developed in this study could serve as useful tools for compound screening prior to lab tests.


Subject(s)
Cyclooxygenase 2 Inhibitors/classification; Support Vector Machine; Databases, Pharmaceutical
18.
Chem Biol Drug Des ; 93(5): 666-684, 2019 05.
Article in English | MEDLINE | ID: mdl-30582300

ABSTRACT

GIIA secreted phospholipase A2 (GIIA sPLA2) is a potent target for drug discovery. To distinguish the activity level of GIIA sPLA2 inhibitors, we built 24 classification models based on 452 compounds using three machine learning algorithms: support vector machine (SVM), decision tree (DT), and random forest (RF). The molecules were represented by CORINA descriptors, MACCS fingerprints, and ECFP4 fingerprints, respectively. The dataset was split into a training set containing 312 compounds and a test set containing 140 compounds by a Kohonen's self-organizing map (SOM) strategy and by a random strategy. A recursive feature elimination (RFE) method and an information gain (IG) method were used to select molecular descriptors. Three well-performing models were obtained. They were built by the SVM algorithm with CORINA descriptors (Models 1A and 2A) and ECFP4 fingerprints (Model 10A). For Model 10A, the accuracy on the test set reached 90.71%, and the Matthews correlation coefficient (MCC) reached 0.82. In addition, the 452 inhibitors were clustered into eight subsets by the K-Means algorithm to analyze their structural features. Highly active inhibitors were found to mainly contain an indole or indolizine scaffold and four side chains.
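The information gain (IG) criterion mentioned above scores a descriptor by how much it reduces the entropy of the class labels. The sketch below shows the textbook definition for a discrete feature such as a single fingerprint bit. It is not the authors' implementation; the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - conditional


# Toy example: one fingerprint bit and an active (1) / inactive (0) label.
bit = [1, 1, 1, 0, 0, 0]
activity = [1, 1, 0, 0, 0, 1]
print(round(information_gain(bit, activity), 3))
```

Descriptors would then be ranked by this score and the top-scoring ones retained; continuous descriptors need to be binned first, a step this sketch leaves out.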


Subject(s)
Enzyme Inhibitors/chemistry , Group II Phospholipases A2/antagonists & inhibitors , Machine Learning , Cluster Analysis , Enzyme Inhibitors/metabolism , Group II Phospholipases A2/metabolism , Humans , Principal Component Analysis , Structure-Activity Relationship
19.
ACS Omega ; 3(11): 15837-15849, 2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30556015

ABSTRACT

HIV-1 protease plays an important role in the course of viral infection and is an effective therapeutic target for the treatment of HIV-1. Our data set is based on a selection of 4855 HIV-1 protease inhibitors (PIs) from ChEMBL. A series of 15 classification models for predicting the active inhibitors were built by machine learning methods, including k-nearest neighbors (K-NN), decision tree (DT), random forest (RF), support vector machine (SVM), and deep neural network (DNN). The molecular structures were characterized by (1) fingerprint descriptors, including MACCS fingerprints and PubChem fingerprints, and (2) physicochemical descriptors calculated by CORINA Symphony. The prediction accuracies of all of the models are above 70% on the test set; the best accuracy of 83.07% was obtained by model 4A, which was built by the SVM method based on MACCS fingerprint descriptors. Nine consensus models were built with the three kinds of descriptors, combining all of the machine learning methods through a "consensus prediction". Model C3a, developed with MACCS fingerprint descriptors, showed the highest accuracy on both the training set (91.96%) and the test set (83.15%). An external validation set including 35,989 compounds from the DUD database and 239 active inhibitors from the recent literature was used to verify the performance of our models. The best prediction accuracy of 98.37% was obtained by model 3C, which was built by RF based on CORINA Symphony descriptors. In addition, the analysis of molecular descriptors shows that the aromatic system and atoms related to hydrogen bonding provide important contributions to the bioactivity of PIs.
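One simple way to realize a "consensus prediction" over several classifiers is a majority vote, sketched below. The abstract does not describe the combination rule actually used, so majority voting is an assumption, and the function name and example predictions are hypothetical.

```python
def consensus_vote(model_predictions):
    """Combine binary predictions (1 = active, 0 = inactive) by strict majority.

    model_predictions: one list of 0/1 predictions per model, all the same length.
    A compound is called active only if more than half of the models say so,
    so ties (possible with an even number of models) resolve to inactive.
    """
    n_models = len(model_predictions)
    return [
        1 if 2 * sum(votes) > n_models else 0
        for votes in zip(*model_predictions)
    ]


# Hypothetical predictions from five models (e.g. K-NN, DT, RF, SVM, DNN)
# for four compounds.
predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]
print(consensus_vote(predictions))
```

With five base models, as listed in the abstract, a strict majority never ties, which keeps the rule unambiguous.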

20.
J Chem Inf Model ; 58(1): 36-47, 2018 01 22.
Article in English | MEDLINE | ID: mdl-29202231

ABSTRACT

Aurora kinases are essential for cell mitosis and are amplified and overexpressed in various human malignancies. Aurora kinases have therefore been promising targets for anticancer therapies, which has prompted an intensive search for their small-molecule inhibitors. In this work, we performed a hierarchical and time-efficient virtual screening cascade for scaffold hopping, aiming to obtain structurally novel and highly potent hit compounds targeting Aurora kinases. The cascade consisted of a shape-based and an electrostatic-based protocol, combined with a QSAR-based selection protocol. This virtual screening cascade was used to screen two databases: a commercial database, the J&K database, containing about 5.2 million diverse molecules, and the Drugbank database. Experimental validation led to the identification of one structurally novel and highly potent hit compound (hit 1, with an IC50 of 8.1 and 19 nM for Aurora kinases A and B, respectively), which can be a promising starting point for further exploration. Additionally, Aurora kinases were identified as off-targets for hits 2-6 (Crizotinib, CI-1033, Dasatinib, Bosutinib, MLN-518), which are approved or investigational drugs listed in Drugbank, plausibly suggesting that targeting Aurora kinases may contribute to their mechanism of action.


Subject(s)
Aurora Kinase A/antagonists & inhibitors , High-Throughput Screening Assays/methods , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Databases, Chemical , Humans , Inhibitory Concentration 50 , Ligands , Models, Chemical , Molecular Docking Simulation , Molecular Structure , Quantitative Structure-Activity Relationship , Static Electricity , Support Vector Machine
...