ABSTRACT
Modulation of protein-protein interactions (PPIs) has emerged as a new concept in rational drug design. Here, we present a computational protocol for identifying potential PPI inhibitors. Relevant regions of interfaces (epitopes) are predicted for three-dimensional protein models and serve as queries for virtual compound screening. The screening protocol incorporates two different pharmacophore models: one is based on the mathematical concept of autocorrelation vectors, and the other utilizes fuzzy labeled graphs. In a proof-of-concept study, we were able to identify serine protease inhibitors using a predicted trypsin epitope as query. Our virtual screening framework may be suited for rapid identification of PPI inhibitors and for suggesting bioactive tool compounds.
Subjects
Epitopes/chemistry, Molecular Mimicry, Proteins/chemistry, Molecular Models, Software
ABSTRACT
The text-based similarity searching method Pharmacophore Alignment Search Tool is grounded on pairwise comparisons of potential pharmacophoric points between a query and screening compounds. The underlying scoring matrix is of critical importance for successful virtual screening and hit retrieval from large compound libraries. Here, we compare three conceptually different computational methods for systematic deduction of scoring matrices: assignment-based, alignment-based, and stochastic optimization. All three methods resulted in optimized pharmacophore scoring matrices with significantly superior retrospective performance in comparison with simplistic scoring schemes. Computer-generated similarity matrices of pharmacophoric features turned out to agree well with a manually constructed matrix. We introduce the concept of position-specific scoring to text-based similarity searching so that knowledge about specific ligand-receptor binding patterns can be included and demonstrate its benefit for hit retrieval. The approach was also used for automated pharmacophore elucidation in agonists of peroxisome proliferator activated receptor gamma, successfully identifying key interactions for receptor activation.
Subjects
Preclinical Drug Evaluation/methods, Information Storage and Retrieval/methods, Amino Acid Sequence, Computational Biology/methods, Protein Databases, PPAR gamma/agonists, Protein Binding, Search Engine/methods, Sequence Alignment
ABSTRACT
Previously (Hähnke et al., J Comput Chem 2010, 31, 2810) we introduced the concept of nonlinear dimensionality reduction for canonization of two-dimensional layouts of molecular graphs as foundation for text-based similarity searching using our Pharmacophore Alignment Search Tool (PhAST), a ligand-based virtual screening method. Here we apply these methods to three-dimensional molecular conformations and investigate the impact of these additional degrees of freedom on virtual screening performance and assess differences in ranking behavior. Best-performing variants of PhAST are compared with 16 state-of-the-art screening methods with respect to significance estimates for differences in screening performance. We show that PhAST sorts new chemotypes on early ranks without sacrificing overall screening performance. We succeeded in combining PhAST with other virtual screening techniques by rank-based data fusion, significantly improving screening capabilities. We also present a parameterization of double dynamic programming for the problem of small molecule comparison, which allows for the calculation of structural similarity between compounds based on one-dimensional representations, opening the door to a holistic approach to molecule comparison based on textual representations.
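The abstract does not state which rank-based fusion rule was used. As an illustration only, the sketch below fuses two hypothetical score tables by summing per-method ranks (a common rank-sum scheme); the function name `fuse_by_rank_sum`, the compound identifiers, and the scores are all assumptions made for this example.

```python
def fuse_by_rank_sum(score_tables):
    """Fuse several screening results by summing per-method ranks.

    score_tables: list of dicts mapping compound id -> similarity score
    (higher is better). Every method must score the same compounds.
    Returns compound ids ordered from best to worst fused rank.
    """
    compounds = set(score_tables[0])
    rank_sum = {c: 0 for c in compounds}
    for scores in score_tables:
        if set(scores) != compounds:
            raise ValueError("all methods must score the same compounds")
        # Rank 1 is the best-scoring compound; ties are broken by id.
        ordered = sorted(scores, key=lambda c: (-scores[c], c))
        for rank, compound in enumerate(ordered, start=1):
            rank_sum[compound] += rank
    return sorted(compounds, key=lambda c: (rank_sum[c], c))


# Hypothetical scores from a text-based method and a second method.
text_based = {"c1": 0.9, "c2": 0.5, "c3": 0.1}
second_method = {"c1": 0.2, "c2": 0.8, "c3": 0.5}
fused = fuse_by_rank_sum([text_based, second_method])
```

Working on ranks rather than raw scores avoids having to put differently scaled similarity measures on a common scale.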
Subjects
Computer Graphics, Preclinical Drug Evaluation/methods, Small Molecule Libraries/chemistry, Molecular Conformation
ABSTRACT
Previously (Hähnke et al., J Comput Chem 2009, 30, 761), we presented the Pharmacophore Alignment Search Tool (PhAST), a ligand-based virtual screening technique that represents molecules as strings coding pharmacophoric features and compares them by global pairwise sequence alignment. To guarantee unambiguity during the reduction of two-dimensional molecular graphs to one-dimensional strings, PhAST employs a graph canonization step. Here, we compare 11 different algorithms for graph canonization with respect to their impact on virtual screening. Retrospective screenings of a drug-like data set were evaluated using the BEDROC metric, which yielded averaged values between 0.4 (best-performing canonization technique) and 0.14 (worst-performing canonization technique). We compared five scoring schemes for the alignments and found preferred combinations of canonization algorithms and scoring functions. Finally, we introduce a performance index that helps prioritize canonization approaches without the need for extensive retrospective evaluation.
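For readers unfamiliar with BEDROC, the following minimal sketch computes it from a ranked list of active/inactive flags, using the normalization BEDROC = (RIE - RIE_min) / (RIE_max - RIE_min). The early-recognition parameter `alpha=20.0` is an assumption; the abstract does not state which value was used.

```python
import math


def bedroc(ranked_is_active, alpha=20.0):
    """BEDROC for a ranked list (best-ranked compound first).

    ranked_is_active: sequence of booleans, True where the compound
    at that rank is a known active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(1 for flag in ranked_is_active if flag)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one inactive")
    ratio = n_actives / n_total

    # Exponentially weighted sum over the ranks of the actives.
    weighted = sum(
        math.exp(-alpha * rank / n_total)
        for rank, flag in enumerate(ranked_is_active, start=1)
        if flag
    )
    # Expected weight of a single active under a uniformly random ranking.
    random_weight = (1 - math.exp(-alpha)) / (
        n_total * (math.exp(alpha / n_total) - 1)
    )
    rie = weighted / (n_actives * random_weight)

    # RIE values for the best and worst possible rankings.
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


perfect = [True] * 5 + [False] * 95
worst = [False] * 95 + [True] * 5
```

The metric is bounded between 0 (all actives at the end of the list) and 1 (all actives at the top), and larger `alpha` values put more weight on the earliest ranks.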
Subjects
Computational Biology/methods, Factual Databases, Drug Discovery/methods, Sequence Alignment/methods, Small Molecule Libraries, Algorithms, Principal Component Analysis
ABSTRACT
Antimicrobial activity of trimethoprim/sulfamethoxazole (SXT) against Staphylococcus aureus (S. aureus) is antagonized by thymidine, which is abundant in infected or inflamed human tissue. To restore the antimicrobial activity of SXT in the presence of thymidine, we screened for small-molecule inhibitors of S. aureus thymidine kinase with non-nucleoside scaffolds. We present the successful application of an adaptive virtual screening protocol for novel antibiotics using a combination of ligand- and structure-based approaches. Two consecutive rounds of virtual screening and in vitro testing were performed that resulted in several non-nucleoside hits. The most potent compound exhibits substantial antimicrobial activity against both methicillin-resistant S. aureus strain ATCC 700699 and nonresistant strain ATCC 29213, when combined with SXT in the presence of thymidine. This study demonstrates how virtual screening can be used to guide hit finding in antibacterial screening campaigns with minimal experimental effort.
Subjects
High-Throughput Screening Assays/methods, Thymidine Kinase/antagonists & inhibitors, Amino Acid Sequence, Drug Design, Humans, Methicillin Resistance, Microbial Sensitivity Tests, Molecular Models, Molecular Sequence Data, Molecular Structure, Sequence Alignment, Staphylococcal Infections/drug therapy, Staphylococcus aureus/drug effects, Trimethoprim-Sulfamethoxazole Combination/pharmacology, Trimethoprim-Sulfamethoxazole Combination/therapeutic use
ABSTRACT
We present a ligand-based virtual screening technique (PhAST) for rapid hit and lead structure searching in large compound databases. Molecules are represented as strings encoding the distribution of pharmacophoric features on the molecular graph. In contrast to other text-based methods using SMILES strings, we introduce a new form of text representation that describes the pharmacophore of molecules. This string representation opens the opportunity for revealing functional similarity between molecules by sequence alignment techniques in analogy to homology searching in protein or nucleic acid sequence databases. We favorably compared PhAST with other current ligand-based virtual screening methods in a retrospective analysis using the BEDROC metric. In a prospective application, PhAST identified two novel inhibitors of 5-lipoxygenase product formation with minimal experimental effort. This outcome demonstrates the applicability of PhAST to drug discovery projects and provides an innovative concept of sequence-based compound screening with substantial scaffold hopping potential.
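The idea of comparing pharmacophore strings by global sequence alignment can be sketched with a plain Needleman-Wunsch scorer. The one-letter feature alphabet (`A` acceptor, `D` donor, `L` lipophilic, `R` aromatic) and the match, mismatch, and gap values below are hypothetical placeholders, not the PhAST alphabet or scoring matrix.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(
                max(
                    prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                    prev[j] + gap,               # gap in b
                    curr[j - 1] + gap,           # gap in a
                )
            )
        prev = curr
    return prev[-1]


# Rank a tiny, made-up library against a query string.
query = "ADLR"
library = {"mol1": "ADLR", "mol2": "ADR", "mol3": "LLLL"}
ranking = sorted(
    library,
    key=lambda name: global_alignment_score(query, library[name]),
    reverse=True,
)
```

Replacing the fixed `match`/`mismatch` values with a lookup into a feature-by-feature scoring matrix is the natural extension point for tuned scoring schemes.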
Subjects
Drug Discovery/methods, Sequence Alignment/methods, Enzyme Inhibitors/chemistry, Ligands, Lipoxygenase Inhibitors, Prospective Studies, Retrospective Studies, Small Molecule Libraries
ABSTRACT
BACKGROUND: PubChem is a chemical information repository, consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored into Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes them for their success rates, reasons that cause structures to be rejected, and modifications applied to structures during the standardization process. Furthermore, the PubChem standardization is compared to the structure normalization of the IUPAC International Chemical Identifier (InChI) software, as manifested by conversion of the InChI back into a chemical structure. RESULTS: The observed rejection rate for substances processed by PubChem standardization was 0.36%, which is predominantly attributed to structures with invalid atom valences that cannot be readily corrected without additional information from contributors. Of all structures that pass standardization, 44% are modified in the process, reducing the count of unique structures from 53,574,724 in substance to 45,808,881 in compound as identified by de-aromatized canonical isomeric SMILES. Even though the processing time is very low on average (only 0.4% of structures have individual standardization time above 0.1 s), total standardization time is completely dominated by edge cases: 90% of the time to standardize all structures in PubChem substance is spent on the 2.05% of structures with the highest individual standardization time. It is worth noting that 60% of the structures obtained from PubChem structure standardization are not identical to the chemical structure resulting from the InChI (primarily due to preferences for a different tautomeric form). 
CONCLUSIONS: Standardization of chemical structures is complicated by the diversity of chemical information and of the approaches used to represent it. The PubChem standardization is an effective and efficient tool to account for molecular diversity and to eliminate invalid/incomplete structures. Further development will concentrate on improved tautomer consideration and an expanded stereocenter definition. Modifications are difficult to thoroughly validate, with slight changes often affecting many thousands of structures and various edge cases. The PubChem structure standardization service is accessible as a public resource (https://pubchem.ncbi.nlm.nih.gov/standardize) and via programmatic interfaces.
ABSTRACT
BACKGROUND: Atom environments and fragments find widespread use in chemical information and cheminformatics. They are the basis of prediction models, an integral part of similarity searching, and employed in structure search techniques. Most of these methods were developed and evaluated on the relatively small sets of chemical structures available at the time. An analysis of fragment distributions representative of most known chemical structures was published in the 1970s using the Chemical Abstracts Service data system. More recently, advances in automated synthesis of chemicals allow millions of chemicals to be synthesized by a single organization. In addition, open chemical databases are readily available containing tens of millions of chemical structures from a multitude of data sources, including chemical vendors, patents, and the scientific literature, making it possible for scientists to readily access most known chemical structures. With this availability of information, one can now address interesting questions, such as: what chemical fragments are known today? How do these fragments compare to earlier studies? How unique are chemical fragments found in chemical structures? RESULTS: For our analysis, after hydrogen suppression, atoms were characterized by atomic number, formal charge, implicit hydrogen count, explicit degree (number of neighbors), valence (bond order sum), and aromaticity. Bonds were differentiated as single, double, triple or aromatic bonds. Atom environments were created in a circular manner focused on a central atom with radii from 0 (atom types) up to 3 (representative of ECFP_6 fragments). In total, combining atom types and atom environments that include up to three spheres of nearest neighbors, our investigation identified 28,462,319 unique fragments in the 46 million structures found in the PubChem Compound database as of January 2013.
We could identify several factors inflating the number of environments involving transition metals, with many seemingly due to erroneous interpretation of structures from patent data. Compared to fragmentation statistics published 40 years ago, the exponential growth in chemistry is mirrored in a nearly eightfold increase in the number of unique chemical fragments; however, this result is clearly an upper-bound estimate, as earlier studies employed structure sampling approaches and this study shows that a relatively high proportion of fragments is found in only a single chemical structure (singletons). In addition, the percentage of singletons grows as the size of the chemical fragment is increased. CONCLUSIONS: The observed growth of the numbers of unique fragments over time suggests that many chemically possible connections of atom types to larger fragments have yet to be explored by chemists. A dramatic drop in the relative rate of increase of atom environments from smaller to larger fragments shows that larger fragments mainly consist of diverse combinations of a limited subset of smaller fragments. This is further supported by the observed concomitant increase of singleton atom environments. Combined, these findings suggest that there is considerable opportunity for chemists to combine known fragments into novel chemical compounds. The comparison of PubChem to an older study of known chemical structures shows noticeable differences. The changes suggest advances in the synthetic capabilities of chemists to combine atoms in new patterns. Log-log plots of fragment incidence show that small numbers of fragments are found in many structures and that large numbers of fragments are found in very few structures, with nearly half being novel using the methods in this work. The relative decrease in the count of new fragments as a function of size further suggests that considerable opportunity for more novel chemicals exists.
Lastly, the differences in atom environment diversity between PubChem Substance and Compound showcase the effect of PubChem standardization protocols, but also indicate that a normalization procedure for atom types, functional groups, and tautomeric/resonance forms based on atom environments is possible. The complete sets of atom types and atom environments are supplied as supporting information.
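To make the notion of circular atom environments and singletons concrete, here is a toy sketch that grows environments around each atom by Morgan-style neighbor refinement and counts how many structures contain each one. It is an approximation for illustration, not the procedure used in the study: atoms are typed by element symbol only (the study also used charge, hydrogen count, degree, valence, and aromaticity), bond orders are plain integers, and the nested-tuple identifiers describe the unfolded neighborhood rather than a canonical substructure.

```python
from collections import defaultdict


def atom_environments(atoms, bonds, max_radius=3):
    """Return the set of (radius, identifier) pairs found in one molecule.

    atoms: list of atom-type labels (hydrogens suppressed).
    bonds: list of (i, j, order) tuples with integer bond orders.
    """
    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((order, j))
        neighbors[j].append((order, i))

    identifiers = list(atoms)  # radius 0: plain atom types
    found = {(0, ident) for ident in identifiers}
    for radius in range(1, max_radius + 1):
        # Extend each atom's identifier by one sphere of neighbors.
        identifiers = [
            (
                identifiers[i],
                tuple(sorted((order, identifiers[j]) for order, j in neighbors[i])),
            )
            for i in range(len(atoms))
        ]
        found.update((radius, ident) for ident in identifiers)
    return found


def count_environments(molecules, max_radius=3):
    """Return (number of unique environments, number of singletons)."""
    structures_per_env = defaultdict(int)
    for atoms, bonds in molecules:
        for env in atom_environments(atoms, bonds, max_radius):
            structures_per_env[env] += 1
    singletons = sum(1 for n in structures_per_env.values() if n == 1)
    return len(structures_per_env), singletons


ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
methanol = (["C", "O"], [(0, 1, 1)])
unique, singletons = count_environments([ethanol, methanol], max_radius=1)
```

In this two-molecule example the atom types and the oxygen environment are shared, while the three carbon environments at radius 1 each occur in only one structure and therefore count as singletons.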
ABSTRACT
BACKGROUND: Developing structure-activity relationships (SARs) of molecules is an important approach in facilitating hit exploration in the early stage of drug discovery. Although information on millions of compounds and their bioactivities is freely available to the public, it is very challenging to infer a meaningful and novel SAR from that information. RESULTS: Research discussed in the present paper employed a bioactivity-centered clustering approach to group 843,845 non-inactive compounds stored in PubChem according to both structural similarity and bioactivity similarity, with the aim of mining bioactivity data in PubChem for useful SAR information. The compounds were clustered in three bioactivity similarity contexts: (1) non-inactive in a given bioassay, (2) non-inactive against a given protein, and (3) non-inactive against proteins involved in a given pathway. In each context, these small molecules were clustered according to their two-dimensional (2-D) and three-dimensional (3-D) structural similarities. The resulting 18 million clusters, named "PubChem SAR clusters", were delivered in such a way that each cluster contains a group of small molecules similar to each other in both structure and bioactivity. CONCLUSIONS: The PubChem SAR clusters, pre-computed using publicly available bioactivity information, make it possible to quickly navigate and narrow down the compounds of interest. Each SAR cluster can be a useful resource in developing a meaningful SAR or enable one to design or expand compound libraries from the cluster. It can also help to predict the potential therapeutic effects and pharmacological actions of less-known compounds from those of well-known compounds (i.e., drugs) in the same cluster.
ABSTRACT
Previously, we proposed a ligand-based virtual screening technique (PhAST) based on global alignment of linearized interaction patterns. Here, we applied techniques developed for similarity assessment in local sequence alignments to our method resulting in p-values for chemical similarity. We compared two sampling strategies, a simple sampling strategy and a Markov Chain Monte Carlo (MCMC) method, and investigated the similarity of sampled distributions to Gaussian, Gumbel, modified Gumbel, and Gamma distributions. The Gumbel distribution with a Gaussian correction term was identified as the most similar to the observed empirical distributions. These techniques were applied in retrospective screenings on a drug-like dataset. Obtained p-values were adjusted to the size of the screening library with four different methods. Evaluation of E-value thresholds corroborated the Bonferroni correction as a preferred means to identify significant chemical similarity with PhAST. An online version of PhAST with significance estimation is available at http://modlab-cadd.ethz.ch/.
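As a rough illustration of how an alignment score can be turned into a library-adjusted significance estimate, the sketch below evaluates the upper tail of a plain Gumbel distribution and applies a Bonferroni correction. It deliberately omits the Gaussian correction term mentioned in the abstract, and the location, scale, score, and library size are made-up values; in practice the distribution parameters would be fitted to sampled random alignment scores.

```python
import math


def gumbel_p_value(score, location, scale):
    """Upper-tail probability P(S >= score) of a Gumbel (maximum) distribution."""
    z = (score - location) / scale
    if z < -700:
        # exp(-z) would overflow; the tail probability is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small p-values.
    return -math.expm1(-math.exp(-z))


def bonferroni(p_value, library_size):
    """Adjust a single-comparison p-value for the number of compounds screened."""
    return min(1.0, p_value * library_size)


# Hypothetical fitted parameters and screening setup.
p = gumbel_p_value(score=42.0, location=20.0, scale=2.0)
p_adjusted = bonferroni(p, library_size=10_000)
```

A compound would then be reported as significantly similar only if `p_adjusted` falls below the chosen threshold.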
ABSTRACT
BACKGROUND: Chemical similarity searching allows the retrieval of preferred screening molecules from a compound database. Candidates are ranked according to their similarity to a reference compound (query). Assessing the statistical significance of chemical similarity scores helps prioritize significant hits and identify cases where the database does not contain any promising compounds. METHOD: Our text-based similarity measure, Pharmacophore Alignment Search Tool (PhAST), employs pairwise sequence alignment. We adapted the concept of E-values as significance estimates and employed a sampling technique that incorporates the principle of importance sampling in a Markov chain Monte Carlo simulation to generate distributions of random alignment scores. These distributions were used to compute significance estimates for similarity scores in a preliminary prospective virtual screen for inhibitors of Aurora A kinase. CONCLUSION: Assessing the significance of compound similarity computed with PhAST allows for a statistically motivated identification of candidate screening compounds. Inhibitors of Aurora A kinase were retrieved from a large compound library.
Subjects
Protein Kinase Inhibitors/chemistry, Protein Serine-Threonine Kinases/antagonists & inhibitors, Aurora Kinases, Combinatorial Chemistry Techniques, Chemical Databases, Humans, Monte Carlo Method, Piperazines/chemistry, Protein Serine-Threonine Kinases/metabolism, Software
ABSTRACT
We present an integrated approach to identify and optimize a novel class of γ-secretase modulators (GSMs) with a unique pharmacological profile. Our strategy included (i) virtual screening through application of a recently developed protocol (PhAST), (ii) synthetic chemistry to discover structure-activity relationships, and (iii) detailed in vitro pharmacological characterization. GSMs are promising agents for treatment or prevention of Alzheimer's disease. They modulate the γ-secretase product spectrum (i.e., amyloid-β (Aβ) peptides of different length) and induce a shift from toxic Aβ42 to shorter Aβ species such as Aβ38 with no or minimal effect on the overall rate of γ-secretase cleavage. We describe the identification of a series of 4-hydroxypyridin-2-one derivatives, which display a novel type of γ-secretase modulation with equipotent inhibition of Aβ42 and Aβ38 peptide species.
Subjects
Alzheimer Disease/enzymology, Amyloid Precursor Protein Secretases/metabolism, Amyloid beta-Peptides/antagonists & inhibitors, Pyridines/chemistry, Pyridines/pharmacology, Alzheimer Disease/drug therapy, Amino Acid Sequence, Amyloid Precursor Protein Secretases/chemistry, Amyloid beta-Peptides/chemistry, Amyloid beta-Peptides/metabolism, Animals, CHO Cells, Cricetinae, Drug Design, Humans, Molecular Sequence Data, Pyridones, Structure-Activity Relationship
ABSTRACT
BACKGROUND: De novo design of drug-like compounds with a desired pharmacological activity profile has become feasible through innovative computer algorithms. Fragment-based design and simulated chemical reactions allow for the rapid generation of candidate compounds as blueprints for organic synthesis. METHODS: We used a combination of complementary virtual-screening tools for the analysis of de novo designed compounds that were generated with the aim to inhibit inactive polo-like kinase 1 (Plk1), a target for the development of cancer therapeutics. A homology model of the inactive state of Plk1 was constructed and the nucleotide binding pocket conformations in the DFG-in and DFG-out state were compared. The de novo-designed compounds were analyzed using pharmacophore matching, structure-activity landscape analysis, and automated ligand docking. One compound was synthesized and tested in vitro. RESULTS: The majority of the designed compounds possess a generic architecture present in known kinase inhibitors. Predictions favor kinases as targets of these compounds but also suggest potential off-target effects. Several bioisosteric replacements were suggested, and de novo designed compounds were assessed by automated docking for potential binding preference toward the inactive (type II inhibitors) over the active conformation (type I inhibitors) of the kinase ATP binding site. One selected compound was successfully synthesized as suggested by the software. The de novo-designed compound exhibited inhibitory activity against inactive Plk1 in vitro, but did not show significant inhibition of active Plk1 and 38 other kinases tested. CONCLUSIONS: Computer-based de novo design of screening candidates in combination with ligand- and receptor-based virtual screening generates motivated suggestions for focused library design in hit and lead discovery. 
Attractive, synthetically accessible compounds can be obtained together with predicted on- and off-target profiles and desired activities.