ABSTRACT
Modern databases of small organic molecules contain tens of millions of structures. The size of theoretically available chemistry is even larger. However, despite the large amount of chemical information, the "big data" moment for chemistry has not yet provided the corresponding payoff of cheaper computer-predicted medicine or robust machine-learning models for the determination of efficacy and toxicity. Here, we present a study of the diversity of chemical datasets using a measure that is commonly used in socioeconomic studies. We demonstrate the use of this diversity measure on several datasets that were constructed to contain various congeneric subsets of molecules as well as randomly selected molecules. We also apply our method to a number of well-known databases that are frequently used for structure-activity relationship modeling. Our results show the poor diversity of the common sources of potential lead compounds compared to actual known drugs. © 2016 Wiley Periodicals, Inc.
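The abstract above does not name the diversity measure. As an illustration only, the sketch below assumes a measure like the Gini coefficient, a standard inequality index in socioeconomic studies. The choice of index, and the idea of applying it to the sizes of groups of similar molecules, are assumptions rather than details taken from the abstract.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means the values are perfectly even; the result approaches 1
    as a single value accounts for nearly all of the total.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted formula on ascending-sorted data
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical group sizes: an evenly spread library vs. one dominated by one series
even_library = [25, 25, 25, 25]
skewed_library = [97, 1, 1, 1]
print(gini(even_library), gini(skewed_library))
```

Under this assumption, a library dominated by one congeneric series would score close to 1, while an evenly spread library would score close to 0.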
ABSTRACT
The recent availability of large, publicly accessible databases of chemical compounds and their biological activities (PubChem, ChEMBL) has inspired us to develop a web-based tool for structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) modeling to add to the services provided by CHARMMing (www.charmming.org). This new module implements several recent machine learning algorithms, including Random Forest, Support Vector Machine, Stochastic Gradient Descent, and Gradient Tree Boosting. A user can import training data from PubChem BioAssay data collections directly from our interface, or upload their own SD files containing structures and activity information, to create new models (either categorical or numerical). The user can then track the model generation process and run models on new data to predict activity.
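To make the SD-file input mentioned above concrete, here is a minimal sketch of how structures and an activity field could be read from SD-file text using only the Python standard library. This is not CHARMMing's code. The function names and the `ACTIVITY` tag are hypothetical, and the sketch assumes each record has an `M  END` line and is closed by `$$$$`.

```python
import re


def _parse_record(lines, activity_tag):
    # The molblock runs up to and including the "M  END" line
    end = next(i for i, line in enumerate(lines) if line.startswith("M  END"))
    molblock = "\n".join(lines[: end + 1])
    fields, name = {}, None
    for line in lines[end + 1:]:
        header = re.match(r">.*<([^>]+)>", line)  # data header, e.g. "> <ACTIVITY>"
        if header:
            name = header.group(1)
            fields[name] = []
        elif line.strip() and name is not None:
            fields[name].append(line.strip())
        else:
            name = None  # a blank line ends the current data item
    value = fields.get(activity_tag)
    activity = float(value[0]) if value else None
    return molblock, activity


def parse_sd(text, activity_tag="ACTIVITY"):
    """Return a list of (molblock, activity) tuples from SD-file text.

    `activity_tag` is a hypothetical field name; real files use their own tags.
    """
    results, record = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record delimiter
            results.append(_parse_record(record, activity_tag))
            record = []
        else:
            record.append(line)
    return results
```

The activity values returned this way could serve as class labels (categorical models) or as regression targets (numerical models).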
Subjects
Internet, Quantitative Structure-Activity Relationship, User-Computer Interface, Algorithms, Artificial Intelligence, Factual Databases, Molecular Models, Software
ABSTRACT
Hepatitis C virus (HCV) is a global health challenge, affecting approximately 200 million people worldwide. In this study we developed SAR models with the machine learning classifiers Random Forest and k-Nearest Neighbor Simulated Annealing for 679 small molecules with measured inhibition activity against NS5B genotype 1b. The activity was expressed as a binary value (active/inactive), where molecules with IC50 ≤ 0.95 µM were considered active. We applied our SAR models to various drug-like databases and identified novel chemical scaffolds for NS5B inhibitors. Subsequent in vitro antiviral assays suggested a new activity for an existing prodrug, candesartan cilexetil, which is currently used to treat hypertension and heart failure but had not previously been tested for anti-HCV activity. We also identified NS5B inhibitors with two novel non-nucleoside chemical motifs.
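The binary labeling described above can be sketched in a few lines. The cutoff of 0.95 µM comes from the abstract. The function name, compound identifiers, and IC50 values are hypothetical and serve only as an illustration.

```python
ACTIVE_CUTOFF_UM = 0.95  # cutoff stated in the abstract (IC50 in µM)


def label_actives(ic50_by_compound, cutoff_um=ACTIVE_CUTOFF_UM):
    """Map each compound ID to 1 (active) or 0 (inactive)."""
    return {cid: int(ic50 <= cutoff_um) for cid, ic50 in ic50_by_compound.items()}


# Hypothetical measurements, for illustration only
example = {"cmpd_A": 0.12, "cmpd_B": 0.95, "cmpd_C": 3.4}
labels = label_actives(example)
print(labels)
```

Labels of this kind are what a binary classifier such as Random Forest would be trained on.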
Subjects
Antihypertensive Agents/chemistry, Antiviral Agents/chemistry, Benzimidazoles/chemistry, Biphenyl Compounds/chemistry, RNA-Dependent RNA Polymerase/antagonists & inhibitors, Tetrazoles/chemistry, Viral Nonstructural Proteins/antagonists & inhibitors, Artificial Intelligence, Chemical Compound Databases, Drug Discovery, Drug Repositioning, Hepacivirus/chemistry, Hepacivirus/enzymology, Molecular Docking Simulation, RNA-Dependent RNA Polymerase/chemistry, ROC Curve, Structure-Activity Relationship, Viral Nonstructural Proteins/chemistry
ABSTRACT
We present here a greatly updated version of an earlier study on the conformational energies of protein-ligand complexes in the Protein Data Bank (PDB) [Nicklaus et al. Bioorg. Med. Chem. 1995, 3, 411-428], with the goal of improving on all possible aspects such as number and selection of ligand instances, energy calculations performed, and additional analyses conducted. Starting from about 357,000 ligand instances deposited in the 2008 version of the Ligand Expo database of the experimental 3D coordinates of all small-molecule instances in the PDB, we created a "high-quality" subset of ligand instances by various filtering steps including application of crystallographic quality criteria and structural unambiguousness. Submission of 640 Gaussian 03 jobs yielded a set of about 415 successfully concluded runs. We used a stepwise optimization of internal degrees of freedom at the DFT level of theory using B3LYP/6-31G(d), with a single-point energy calculation at B3LYP/6-311++G(3df,2p) after each round of (partial) optimization, to separate energy changes due to bond length stretches vs bond angle changes vs torsion changes. Even for the most "conservative" choice of all the possible conformational energies (the energy difference between the conformation in which all internal degrees of freedom except torsions have been optimized and the fully optimized conformer), significant energy values were found. The range of 0 to ~25 kcal/mol was populated quite evenly and independently of the crystallographic resolution. A smaller number of "outliers" of yet higher energies were seen only at resolutions above 1.3 Å.
The energies showed some correlation with molecular size and flexibility but not with crystallographic quality metrics such as the Cruickshank diffraction-component precision index (DPI) and R(free)-R, or with the ligand instance-specific metrics such as occupancy-weighted B-factor (OWAB), real-space R factor (RSR), and real-space correlation coefficient (RSCC). We repeated these calculations with the solvent model IEFPCM, which yielded energy differences that were generally somewhat lower than the corresponding vacuum results but did not produce a qualitatively different picture. Torsional sampling around the crystal conformation at the molecular mechanics level using the MMFF94s force field typically led to an increase in energy.
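The "conservative" conformational energy described above is the difference between two single-point energies. A minimal sketch of that arithmetic follows, assuming the energies are available in hartree, the unit commonly reported by quantum chemistry programs. The function name and the numerical inputs are hypothetical and are not values from the study.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion factor


def conformational_energy(e_partial_hartree, e_full_hartree):
    """Energy of the partially optimized conformation relative to the
    fully optimized conformer, in kcal/mol."""
    return (e_partial_hartree - e_full_hartree) * HARTREE_TO_KCAL_PER_MOL


# Hypothetical single-point energies in hartree, for illustration only
e_torsions_frozen = -459.990  # all internal degrees of freedom optimized except torsions
e_fully_optimized = -460.000
print(round(conformational_energy(e_torsions_frozen, e_fully_optimized), 3))
```

A positive result means the torsion-constrained conformation lies above the fully optimized conformer, which is the sense in which the abstract reports values between 0 and about 25 kcal/mol.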
Subjects
Protein Databases, Molecular Conformation, Quantum Theory, X-Ray Crystallography, Ligands, Molecular Models, Solvents/chemistry, Thermodynamics
ABSTRACT
Human tyrosyl-DNA phosphodiesterase (hTdp1) inhibitors have become a major area of drug research and structure-based design since they have been shown to work synergistically with camptothecin (CPT) and selectively in cancer cells. The pharmacophore features of 14 hTdp1 inhibitors were used as a filter to screen the ChemNavigator iResearch Library of about 27 million purchasable samples. Docking of the inhibitors and hits obtained from virtual screening was performed into a structural model of hTdp1 based on a high-resolution X-ray crystal structure of human Tdp1 in complex with vanadate, DNA and a human topoisomerase I (Top1)-derived peptide (PDB code: 1NOP). A total of 46 compounds matching the three-dimensional arrangement of the pharmacophoric features were assayed. Using a high-throughput screening assay, we have identified a 1H-indol-3-yl-acetic acid derivative as a potent Tdp1 inhibitor with an IC50 value of 7.94 µM. The novel chemotype obtained may provide a new scaffold for developing inhibitors of Tdp1.
Subjects
Phosphodiesterase Inhibitors/chemistry, Phosphodiesterase Inhibitors/pharmacology, Phosphoric Diester Hydrolases/metabolism, X-Ray Crystallography, Drug Design, High-Throughput Screening Assays, Humans, Inhibitory Concentration 50, Ligands, Molecular Models, Phosphoric Diester Hydrolases/chemistry, Protein Binding, Structure-Activity Relationship
ABSTRACT
Human tyrosyl-DNA phosphodiesterase (hTdp1) inhibitors have become a major area of drug research and structure-based design since they have been shown to work synergistically with camptothecin (CPT) and selectively in cancer cells. The pharmacophore features of 14 hTdp1 inhibitors were used as a filter to screen the ChemNavigator iResearch Library of about 27 million purchasable samples. Docking of the inhibitors and hits obtained from virtual screening was performed into a structural model of hTdp1 based on a high-resolution X-ray crystal structure of human Tdp1 in complex with vanadate, DNA and a human topoisomerase I (Top1)-derived peptide (PDB code: 1NOP). We present and discuss in some detail 46 compounds matching the three-dimensional arrangement of the pharmacophoric features. The presented novel chemotypes may provide new scaffolds for developing inhibitors of Tdp1.