ABSTRACT
The growing size of make-on-demand chemical libraries poses new challenges to cheminformatics. These ultra-large libraries have become too large for exhaustive enumeration. With a combinatorial approach instead, resource requirements scale approximately with the number of synthons rather than with the number of molecules. This gives access to billions or trillions of compounds, organized as so-called chemical spaces, with moderate hardware and in a reasonable time frame. While extremely performant ligand-based 2D methods exist in this context, 3D methods still largely rely on exhaustive enumeration and therefore cannot be applied. Here, we present SpaceGrow, a novel shape-based 3D approach for ligand-based virtual screening of billions of compounds within hours on a single CPU. Compared to a conventional superposition tool, SpaceGrow shows comparable pose reproduction based on RMSD and superior ranking performance while being orders of magnitude faster. An assessment of results from two differently sized subsets of the eXplore space reveals a higher probability of finding superior results in larger spaces, highlighting the potential of searching ultra-large spaces. Furthermore, the application of SpaceGrow in a drug discovery workflow was investigated in four examples involving G protein-coupled receptors (GPCRs), with the aim of identifying compounds with similar binding capabilities and molecular novelty.
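A back-of-the-envelope illustration of why a combinatorial representation is so much cheaper: the sketch below counts the synthons that must be stored against the virtual products they span, for a few hypothetical reactions. The reaction names and synthon counts are invented, and the count ignores incompatible synthon pairs and duplicate products across reactions.

```python
from math import prod

# Hypothetical reactions mapped to the number of synthons available for each
# reaction component. Names and counts are invented for illustration only.
HYPOTHETICAL_REACTIONS = {
    "amide_coupling": [50_000, 40_000],
    "suzuki_coupling": [20_000, 15_000],
    "three_component": [1_000, 2_000, 500],
}


def space_size(reactions):
    """Virtual products: per reaction, the product of its synthon counts, summed.

    Simplification: every synthon combination is assumed valid and no two
    reactions are assumed to yield the same molecule.
    """
    return sum(prod(counts) for counts in reactions.values())


def synthon_count(reactions):
    """Synthons that actually have to be stored and processed."""
    return sum(sum(counts) for counts in reactions.values())


n_products = space_size(HYPOTHETICAL_REACTIONS)
n_synthons = synthon_count(HYPOTHETICAL_REACTIONS)
print(f"{n_products:,} products from {n_synthons:,} synthons")
```

With these invented numbers, 128,500 synthons span 3,300,000,000 products, so a method whose cost grows with synthons rather than products saves roughly four orders of magnitude.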
Subject(s)
Drug Discovery, Small Molecule Libraries, Ligands, Small Molecule Libraries/chemistry, Drug Discovery/methods
ABSTRACT
With the ever-increasing number of synthesis-on-demand compounds for drug lead discovery, there is a great need for efficient search technologies. We present the successful application of a virtual screening method that combines two advances: (1) it avoids full library enumeration, and (2) products are evaluated by molecular docking, leveraging protein structural information. Crucially, these advances enable a structure-based technique that can efficiently explore libraries with billions of molecules and beyond. We apply this method to identify inhibitors of ROCK1 from almost one billion commercially available compounds. Of 69 purchased compounds, 27 (39%) have Ki values < 10 µM. X-ray structures of two leads confirm their docked poses. This approach to docking scales roughly with the number of reagents that span a chemical space and is therefore multiple orders of magnitude faster than traditional docking.
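Why docking cost can scale with the number of reagents rather than the number of products can be sketched under a strong simplifying assumption: that a product's score is the sum of per-reagent scores. Real docking scores are not additive, so this is a toy illustration of the shortlist-and-combine idea, not the authors' protocol; the reagent ids, scores and the shortlist size `k` are hypothetical.

```python
import heapq
from itertools import product
from math import prod


def shortlist(scores, k):
    """The k reagents with the best (lowest) scores at one reaction position."""
    return heapq.nsmallest(k, scores.items(), key=lambda item: item[1])


def combinatorial_search(position_scores, k):
    """Score reagents once per position, then combine only the shortlists.

    position_scores: one dict per reaction position mapping a reagent id to a
    hypothetical per-reagent score (lower is better). The product score is
    assumed to be the sum of its reagent scores.
    """
    lists = [shortlist(scores, k) for scores in position_scores]
    results = []
    for combo in product(*lists):
        ids = tuple(reagent for reagent, _ in combo)
        results.append((sum(score for _, score in combo), ids))
    results.sort()  # best (lowest) combined score first
    return results


def evaluations(sizes, k):
    """Scoring calls: every reagent once, plus the shortlist combinations."""
    return sum(sizes) + prod(min(k, n) for n in sizes)


amines = {"a1": -5.0, "a2": -7.5, "a3": -6.0}  # hypothetical position 1
acids = {"b1": -4.0, "b2": -3.0}               # hypothetical position 2
best = combinatorial_search([amines, acids], k=2)[0]
print(best)
```

For two positions with 10,000 reagents each and k = 100, `evaluations([10_000, 10_000], 100)` gives 30,000 scoring calls instead of the 10^8 needed to score every product.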
Subject(s)
Protein Kinase Inhibitors, Proteins, Molecular Docking Simulation, Ligands, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/chemistry, Protein Binding
ABSTRACT
Protein adaptations to extreme environmental conditions are drivers of biotechnological process optimization and essential for unraveling the molecular limits of life. Most proteins with such desirable adaptations are found in extremophilic organisms inhabiting extreme environments. The deep sea is one such environment and a promising resource that imposes multiple extremes on its inhabitants. Conditions like high hydrostatic pressure and high or low temperature are prevalent, and many deep-sea organisms tolerate several of these extremes. While molecular adaptations to high temperature are comparatively well described, adaptations to other extremes, such as high pressure, are not yet well understood. To fully unravel the molecular mechanisms of individual adaptations, it is probably necessary to disentangle multifactorial adaptations. In this study, we evaluate differences between protein structures from deep-sea organisms and related proteins from non-deep-sea organisms. We created a data collection of 1281 experimental protein structures from 25 deep-sea organisms and paired them with orthologous proteins. We exhaustively evaluate differences between the protein pairs with machine learning and Shapley values to determine characteristic differences in sequence and structure. The results show a reasonable discrimination of deep-sea and non-deep-sea proteins, from which we distinguish correlations previously attributed to thermal stability from other signals potentially describing adaptations to high pressure. While some distinct correlations can be observed, the overall picture appears intricate.
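Shapley values distribute a model's output among its input features according to their average marginal contribution over all feature coalitions. The sketch below computes them exactly by enumeration, which is feasible only for a handful of features; the feature names and the toy value function are hypothetical, and analyses with many features typically rely on approximations such as those in the SHAP library.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players.

    players: list of feature names.
    value: function mapping a frozenset of features to a real number.
    """
    n = len(players)
    phi = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {player}) - value(s))
        phi[player] = total
    return phi


# Hypothetical per-feature contributions of a toy deep-sea classifier.
WEIGHTS = {"hydrophobic_surface": 1.0, "salt_bridges": 2.0, "cavity_volume": 0.5}


def toy_model(features):
    """Hypothetical model output when only `features` are switched on."""
    score = sum(WEIGHTS[f] for f in features)
    if {"hydrophobic_surface", "salt_bridges"} <= features:
        score += 1.0  # interaction term shared by the two features
    return score


phi = shapley_values(list(WEIGHTS), toy_model)
print(phi)
```

For this toy model the interaction bonus is split equally, giving 1.5, 2.5 and 0.5 for the three features, which sum to the full-model value of 4.5.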
Subject(s)
Adaptation, Physiological, Proteins, Cold Temperature, Hot Temperature, Hydrostatic Pressure, Proteins/metabolism
ABSTRACT
Chemical libraries are commonplace in computer-aided drug discovery, and assessing their overlap or complementarity is a routine task. For this purpose, different techniques are applied, ranging from exact matching to comparing physicochemical properties. However, these techniques are applicable only if the compound sets are not too big. Particularly for chemical spaces containing billions of compounds, alternative ways of assessment are required. Random subsets could be enumerated and compared one-to-one, but given the vast sizes of the chemical spaces assessed here, such samples can at best provide a rough estimate of any overlap. Here we describe a novel way to compare chemical spaces utilizing a panel of query compounds. We applied this technique to three different types of spaces and obtained insight into their structural overlap, their coverage of the chemical universe, and their density. As chemical feasibility of virtual compounds is particularly important, we included related in silico predictions in our assessment.
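One way to picture a query-panel comparison: search every panel compound in every space, record the best similarity found, and count which queries each space covers above a similarity threshold. The sketch below performs only that bookkeeping; the space names, similarity values and the 0.8 threshold are hypothetical, and the nearest-neighbour searches themselves (which must run without enumerating the spaces) are assumed to have been done already.

```python
def coverage(best_hits, threshold):
    """Queries for which a space returned a hit at or above the threshold."""
    return {q for q, sim in best_hits.items() if sim >= threshold}


def compare_spaces(results, threshold=0.8):
    """Summarize per-space coverage, pairwise shared coverage and unique coverage.

    results: {space_name: {query_id: best similarity found in that space}}.
    """
    covered = {space: coverage(hits, threshold) for space, hits in results.items()}
    names = sorted(covered)
    summary = {"coverage": {s: len(covered[s]) for s in names}, "shared": {}, "unique": {}}
    for i, a in enumerate(names):
        # Union of everything the other spaces cover (empty if only one space).
        others = set().union(*(covered[b] for b in names if b != a))
        summary["unique"][a] = len(covered[a] - others)
        for b in names[i + 1:]:
            summary["shared"][(a, b)] = len(covered[a] & covered[b])
    return summary


results = {
    "space_A": {"q1": 0.90, "q2": 0.70, "q3": 0.85},  # hypothetical
    "space_B": {"q1": 0.95, "q2": 0.82, "q3": 0.60},  # hypothetical
}
print(compare_spaces(results))
```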
ABSTRACT
We introduce SAR-by-Space, a concept to drastically accelerate structure-activity relationship (SAR) elucidation by synthesizing neighboring compounds that originate from vast chemical spaces. Space navigation is accomplished within minutes on affordable standard computer hardware using a tree-based molecule descriptor and dynamic programming. The synthetic accessibility of the computationally proposed compounds is maximized by a careful selection of building blocks combined with suitably chosen reactions; a decade of in-house quality control shows that this is a crucial part of the process. The REAL Space is the largest chemical space of commercially available compounds, comprising 11 billion molecules at the time of writing. It was used to mine actives against bromodomain 4 (BRD4). Before synthesis, compounds were docked into the binding site using a scoring function that incorporates intrinsic desolvation terms, thus avoiding time-consuming simulations. Five micromolar hits were identified and verified in less than six weeks, including the measurement of IC50 values. We conclude that this procedure is a substantial time-saver, accelerating both ligand- and structure-based approaches in the hit generation and lead optimization stages.
Subject(s)
Computational Biology/methods, Small Molecule Libraries/pharmacology, Transcription Factors/chemistry, Transcription Factors/metabolism, Binding Sites, Databases, Chemical, Drug Evaluation, Preclinical/methods, High-Throughput Screening Assays, Humans, Inhibitory Concentration 50, Molecular Docking Simulation, Molecular Structure, Protein Binding, Small Molecule Libraries/chemistry, Structure-Activity Relationship
ABSTRACT
The HYDE scoring function consistently describes hydrogen bonding, the hydrophobic effect and desolvation. It relies on HYdration and DEsolvation terms, which are calibrated using octanol/water partition coefficients of small molecules. Because no affinity data are used for calibration, HYDE is generally applicable to all protein targets. HYDE reflects the Gibbs free energy of binding while considering only the essential interactions of protein-ligand complexes. Its greatest benefit is a very intuitive atom-based score, which can be mapped onto the ligand and protein atoms. This allows direct visualization of the score and consequently facilitates the analysis of protein-ligand complexes during lead optimization. In this study, we validated the new scoring function in large-scale docking experiments. We successfully predicted the correct binding mode for 93% of the complexes in redocking calculations on the Astex diverse set, and virtual screening experiments on the DUD dataset showed significant enrichment, with a mean AUC of 0.77 across all protein targets with little or no structural defects. As part of these studies, we also carried out a very detailed analysis of the data, which revealed interesting pitfalls that we highlight here and that should be addressed in future benchmark datasets.
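A mean AUC such as the one reported for DUD summarizes per-target ROC curves. A minimal sketch of how such a value can be computed from docking scores, using the rank-based (Mann-Whitney) formulation; the score lists are invented, and lower scores are assumed to be better, as for free-energy-like scores.

```python
def roc_auc(active_scores, decoy_scores, lower_is_better=True):
    """ROC AUC as the probability that a random active outranks a random decoy.

    Ties count one half (Mann-Whitney formulation). O(n_actives * n_decoys),
    which is fine for benchmark-sized sets.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            better = a < d if lower_is_better else a > d
            if better:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def mean_auc(per_target):
    """Average AUC over targets; per_target maps target -> (actives, decoys)."""
    aucs = [roc_auc(actives, decoys) for actives, decoys in per_target.values()]
    return sum(aucs) / len(aucs)


# Hypothetical scores for one target (more negative = predicted stronger binding).
print(roc_auc([-12.0, -9.0], [-10.0, -5.0, -9.0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to every active scoring better than every decoy.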
Subject(s)
Algorithms, Proteins/chemistry, Thermodynamics, Water/chemistry, Binding Sites, Hydrogen Bonding, Hydrophobic and Hydrophilic Interactions, Ligands, Models, Molecular, Protein Binding
ABSTRACT
Ligand-based approaches are particularly important in the hit identification process of drug discovery when no structural information on the target is available. Pharmacophore descriptors that use a topological representation of the ligands are usually fast enough to screen large compound libraries effectively when seeking novel lead candidates. One example of this kind is the Feature Tree descriptor, a reduced graph representation implemented in the FTrees software. In this study, we tested the screening efficiency of FTrees in both retrospective and prospective screens using known histamine H4 antagonists and serotonin transporter (SERT) inhibitors as query molecules. Our results demonstrate that FTrees can effectively find actives. Particularly when combined with a subsequent 2D fingerprint-based diversity selection, FTrees was found to be extremely effective at discovering a diverse set of scaffolds. Prospective screening of our in-house compound deck provided several novel H4 and SERT ligands that could serve as suitable starting points for further optimization.
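The abstract does not specify how the 2D fingerprint-based diversity selection was carried out. As an illustration, the sketch below uses greedy MaxMin picking, one common choice, with fingerprints represented as sets of on-bits and Tanimoto similarity; all data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_pick, seed_index=0):
    """Greedy MaxMin diversity selection.

    Repeatedly adds the candidate whose most similar already-picked compound
    is least similar, i.e. the one maximizing the minimum distance.
    """
    n = len(fingerprints)
    if n_pick <= 0 or n == 0:
        return []
    picked = [seed_index]
    # Highest similarity of each compound to anything picked so far.
    nearest_sim = [tanimoto(fp, fingerprints[seed_index]) for fp in fingerprints]
    while len(picked) < min(n_pick, n):
        candidate = min(
            (i for i in range(n) if i not in picked),
            key=lambda i: nearest_sim[i],
        )
        picked.append(candidate)
        for i in range(n):
            sim = tanimoto(fingerprints[i], fingerprints[candidate])
            nearest_sim[i] = max(nearest_sim[i], sim)
    return picked


# Hypothetical fingerprints of FTrees hits.
hits = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_pick(hits, 3))
```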
Subject(s)
Histamine Agents/chemistry, Receptors, G-Protein-Coupled/chemistry, Receptors, Histamine/chemistry, Selective Serotonin Reuptake Inhibitors/chemistry, Serotonin Plasma Membrane Transport Proteins/chemistry, Software, Algorithms, Computer-Aided Design, Drug Discovery, Histamine Agents/pharmacology, Humans, Molecular Structure, Receptors, G-Protein-Coupled/antagonists & inhibitors, Receptors, Histamine H4, Selective Serotonin Reuptake Inhibitors/pharmacology, Small Molecule Libraries/chemistry, Small Molecule Libraries/pharmacology
ABSTRACT
For computational de novo design, general retrospective validation is a very challenging task. Here we propose a comprehensive de novo design workflow driven by the needs of computational and medicinal chemists and, at the same time, a general validation scheme for this technique. The study combined a suite of previously published programs developed within the framework of the NovoBench project, which involved three different pharmaceutical companies and four groups of developers. Based on 188 PDB protein-ligand complexes with diverse functions, the study involved reconstructing the ligands by means of a fragment-based de novo design approach. The structure-based de novo search engine FlexNovo reconstructed the native ligand in five out of eight cases and, in four of those five, ranked it within the first five candidates. The generated structures were ranked according to their synthetic accessibility as evaluated by the program SYLVIA. This investigation showed that the final candidate molecules have about the same synthetic complexity as the respective reference ligands. Furthermore, the plausibility of the candidates being true actives was assessed through literature searches.
Subject(s)
Computer-Aided Design, Drug Design, Proteins/chemistry, Small Molecule Libraries/chemistry, Algorithms, Humans, Ligands, Molecular Conformation, Protein Binding, Small Molecule Libraries/therapeutic use, Software
ABSTRACT
Large collections of combinatorial libraries are an integral element of today's pharmaceutical industry. It is of great interest to perform similarity searches against all virtual compounds that are synthetically accessible from any such library. Here we describe the successful application of a new software tool, CoLibri, to 358 combinatorial libraries based on validated reaction protocols, creating a single chemistry space containing over 10^12 possible products. Similarity searching with FTrees-FS allows systematic exploration of this space without the need to enumerate all product structures. The search result is a set of virtual hits that are synthetically accessible by one or more of the existing reaction protocols. Grouping these virtual hits by their synthetic protocols allows the rapid design and synthesis of multiple follow-up libraries. Such library ideas support hit-to-lead design efforts for tasks such as following up high-throughput screening hits or scaffold hopping from one hit to another attractive series.
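Grouping virtual hits by their reaction protocols is straightforward bookkeeping once each hit carries the ids of the protocols that can make it. A minimal sketch with hypothetical compound and protocol ids; a compound reachable by several protocols appears in each of their groups, and the largest groups come first as the natural candidates for follow-up libraries.

```python
from collections import defaultdict


def group_by_protocol(hits):
    """Group virtual hits by the reaction protocol(s) that can make them.

    hits: iterable of (compound_id, similarity, protocols) tuples, where
    protocols lists every (hypothetical) protocol id yielding the compound.
    Returns protocol -> [(compound_id, similarity), ...] sorted by decreasing
    similarity, with protocols ordered by how many hits they cover.
    """
    groups = defaultdict(list)
    for compound_id, similarity, protocols in hits:
        for protocol in protocols:
            groups[protocol].append((compound_id, similarity))
    for members in groups.values():
        members.sort(key=lambda m: m[1], reverse=True)
    return dict(sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True))


virtual_hits = [
    ("c1", 0.91, ["amide"]),
    ("c2", 0.88, ["amide", "reductive_amination"]),
    ("c3", 0.80, ["suzuki"]),
]
print(group_by_protocol(virtual_hits))
```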
Subject(s)
Combinatorial Chemistry Techniques, Chemistry, Pharmaceutical, Drug Design
ABSTRACT
There are several methods for virtual screening of databases of small organic compounds to find tight binders to a given protein target. Recent reviews in Drug Discovery Today have concentrated on screening by docking and by pharmacophore searching. Here, we complement these reviews by focusing on virtual screening methods that are based on analyzing ligand similarity on a structural level. Specifically, we concentrate on methods that exploit structural properties of the complete ligand molecules, as opposed to using just partial structural templates, such as pharmacophores. The in silico procedure of virtual screening (VS) and its relationship to the experimental procedure, high-throughput screening (HTS), are discussed; new developments in the field are summarized, and perspectives on future research are offered.
Subject(s)
Computer Simulation, Drug Design, Drug Evaluation, Preclinical/methods, Combinatorial Chemistry Techniques, Databases, Factual, Models, Molecular, Molecular Conformation, Structure-Activity Relationship
ABSTRACT
We present an integrated docking environment that allows iterative and interactive in-depth analysis of many docking solutions. All docking information is stored in an ORACLE database. New scoring schemes (e.g. target-specific scoring functions) as well as various types of filters can easily be defined and tested within this environment. As an example application, we investigated the validity of the following hypothesis: if a docking procedure can lead to enrichments significantly better than random, then a bias towards (partially) correct placements should be detectable. Such a bias, in terms of a preference for certain interacting groups within the active site, can be used to select a set of receptor-based pharmacophore constraints, which in turn might be used to enhance the docking procedure. As a proof of concept for this approach, we performed docking studies on three targets: thrombin, cyclin-dependent kinase 2 (CDK2) and angiotensin-converting enzyme (ACE). We docked a set of known active compounds with standard FlexX and derived three sets of target-specific receptor-based pharmacophore constraints by statistical analysis of the predicted placements. Applying these receptor-based constraints in a virtual screening protocol utilizing FlexX-Pharm led to significantly improved enrichments.
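The statistical step, counting which active-site groups the docked actives interact with and keeping the frequent ones as constraint candidates, might look like the following. The frequency threshold, the residue labels and the contact sets are invented for illustration; the abstract does not state the exact statistic used.

```python
from collections import Counter


def derive_constraints(pose_interactions, min_fraction=0.5):
    """Pick active-site groups that docked actives contact unusually often.

    pose_interactions: one set per docked active, holding the (hypothetical)
    identifiers of protein groups its top-ranked pose interacts with.
    Returns groups contacted by at least min_fraction of the actives, most
    frequent first (ties in arbitrary order).
    """
    if not pose_interactions:
        return []
    counts = Counter(g for groups in pose_interactions for g in set(groups))
    n = len(pose_interactions)
    return [g for g, c in counts.most_common() if c / n >= min_fraction]


# Invented contacts of four docked actives.
poses = [
    {"Asp189", "Gly216"},
    {"Asp189", "Ser195"},
    {"Asp189", "Gly216", "Tyr228"},
    {"Gly216"},
]
print(derive_constraints(poses))
```

A stricter variant would compare these frequencies with those observed for docked decoys, so that only groups preferred by actives, rather than by every docked molecule, become constraints.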
Subject(s)
Data Interpretation, Statistical, Databases, Factual, Receptors, Drug/chemistry, Calibration, Computer Simulation, Cyclin-Dependent Kinase 2/chemistry, Drug Evaluation, Preclinical, Ligands, Models, Molecular, Peptidyl-Dipeptidase A/chemistry, Thrombin/chemistry
ABSTRACT
This paper introduces Signal, a novel method for classifying activity against a small-molecule drug target. Signal creates an ensemble, or collection, of meaningful descriptors chosen from a much larger property space. The method works with a variety of descriptor types, including fingerprints that represent four-point pharmacophores or shape descriptors. It also exploits information from both active and inactive compounds and generates predictive models suitable for high-throughput screening data analysis. Given the fingerprints and activity data for a set of compounds, Signal is a two-step process. The first step is to Evaluate the Descriptors: for each descriptor in the fingerprint, quantify and rank the correlation between the activity of the compounds and the presence of that descriptor. The second step is to Create an Ensemble Model: use the high-ranking descriptors to create a model of activity against the biological target. For the first step, two ranking strategies were investigated: mutual information and chi-square. For the second step, two types of ensemble models were investigated: high ranking and a novel method called high ranking set cover. Of the four possible pairings, the combination of chi-square and high ranking set cover performed best on a thrombin data set.
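The two Signal steps (rank descriptors, then build an ensemble) can be sketched by pairing chi-square ranking with a greedy cover of the actives. The cover procedure is one plausible reading of "high ranking set cover", which the abstract names but does not define; the prediction rule and the toy data are likewise assumptions.

```python
def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: descriptor present, active      b: present, inactive
    c: descriptor absent, active       d: absent, inactive
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def rank_descriptors(fingerprints, active):
    """Step 1: rank descriptors over-represented in actives by chi-square."""
    n_act = sum(active)
    n_inact = len(active) - n_act
    scored = []
    for desc in set().union(*fingerprints):
        a = sum(1 for fp, act in zip(fingerprints, active) if act and desc in fp)
        b = sum(1 for fp, act in zip(fingerprints, active) if not act and desc in fp)
        c, d = n_act - a, n_inact - b
        if a * d > b * c:  # keep only positive association with activity
            scored.append((-chi_square(a, b, c, d), desc))
    scored.sort()  # highest chi-square first, ties broken by descriptor id
    return [desc for _, desc in scored]


def set_cover_model(fingerprints, active, ranked):
    """Step 2: walk the ranking, keeping each descriptor that covers at least
    one active compound not yet covered by earlier descriptors."""
    uncovered = {i for i, act in enumerate(active) if act}
    model = []
    for desc in ranked:
        if not uncovered:
            break
        hits = {i for i in uncovered if desc in fingerprints[i]}
        if hits:
            model.append(desc)
            uncovered -= hits
    return model


def predict_active(model, fp):
    """Hypothetical decision rule: active if any model descriptor is present."""
    return any(desc in fp for desc in model)


fps = [{1, 2}, {1, 3}, {2, 3}, {4}, {4, 5}, {5}]   # toy fingerprints
labels = [True, True, True, False, False, False]   # toy activity data
ranking = rank_descriptors(fps, labels)            # [1, 2, 3]
model = set_cover_model(fps, labels, ranking)      # [1, 2]
```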
Subject(s)
Drug Design, Pharmaceutical Preparations/classification, Algorithms, Artificial Intelligence, Databases as Topic, Hemostatics/chemistry, Models, Molecular, Quantitative Structure-Activity Relationship, Software, Terminology as Topic, Thrombin/chemistry, Thrombin/drug effects
ABSTRACT
Molecules with similar shapes and features often have similar biological activity. Several computational approaches search chemical databases for new leads or templates based on overall molecular shape similarity. However, active molecules often present critical subshapes that are required for binding, which may be missed by comparing overall shape similarity. We present a new approach to compare molecular shapes of different sizes and to calculate subshape similarity. We developed a skeletal representation of the shape which is topologically unrelated to covalent chemical connectivity. This simplifies rotational and translational sampling. We test initial possible alignments by matching similar triangles. This triangle-matching filter rapidly eliminates most geometrically impossible matches. Surviving matches are filtered further in successive stages. These stages involve direction, feature, and shape matching procedures. Our approach is applied to several situations demonstrating lead discovery and evolution.
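The triangle-matching filter can be sketched by describing each triangle of skeleton points by its sorted side lengths, which do not change under rotation or translation, and accepting pairs whose sides agree within a tolerance. Here the skeleton points are simply given as 3D coordinates (computing the skeleton itself is not reproduced), and the 0.5 tolerance is an arbitrary assumption; enumerating all triangles is cubic in the number of points, so a practical filter would likely index side lengths instead.

```python
from itertools import combinations
from math import dist


def triangle_key(points, idx):
    """Sorted side lengths of the triangle formed by three point indices."""
    i, j, k = idx
    return sorted((dist(points[i], points[j]),
                   dist(points[j], points[k]),
                   dist(points[i], points[k])))


def similar_triangles(query_pts, target_pts, tol=0.5):
    """Pairs (query triangle, target triangle) whose sorted sides agree within tol."""
    target_tris = [(idx, triangle_key(target_pts, idx))
                   for idx in combinations(range(len(target_pts)), 3)]
    matches = []
    for q_idx in combinations(range(len(query_pts)), 3):
        q_sides = triangle_key(query_pts, q_idx)
        for t_idx, t_sides in target_tris:
            if all(abs(a - b) <= tol for a, b in zip(q_sides, t_sides)):
                matches.append((q_idx, t_idx))
    return matches


# Hypothetical skeleton points: a 3-4-5 triangle, and a moved copy plus one extra point.
query = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
target = [(10, 10, 10), (10, 13, 10), (14, 10, 10), (20, 20, 20)]
print(similar_triangles(query, target))
```

Each surviving pair only proposes a candidate alignment; the later direction, feature and shape stages would decide whether it holds.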
ABSTRACT
We investigate the following data mining problem from computer-aided drug design: from a large collection of compounds, find those that bind to a target molecule in as few iterations of biochemical testing as possible. In each iteration, a comparatively small batch of compounds is screened for binding activity toward this target. We employed the so-called "active learning paradigm" from machine learning for selecting the successive batches. Our main selection strategy is based on the maximum margin hyperplane generated by support vector machines. This hyperplane separates the current set of active compounds from the inactive ones and has the largest possible distance from any labeled compound. We perform a thorough comparative study of various other selection strategies on data sets provided by DuPont Pharmaceuticals and show that the strategies based on the maximum margin hyperplane clearly outperform the simpler ones.
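Given a trained linear maximum-margin classifier with weight vector `w` and offset `b` (for example scikit-learn's `SVC(kernel="linear")`, whose `coef_` and `intercept_` attributes hold these values), batch selection reduces to ranking unscreened compounds by their signed distance to the hyperplane. The two strategy names below are illustrative labels, not taken from the study, and the descriptor vectors are invented.

```python
from math import sqrt


def decision_values(w, b, pool):
    """Signed distance of each unscreened compound to the hyperplane w.x + b = 0."""
    norm = sqrt(sum(wi * wi for wi in w))
    return [(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x in pool]


def select_batch(w, b, pool, batch_size, strategy="closest"):
    """Pick indices of the next compounds to screen.

    "closest": compounds nearest the hyperplane, where the current model is
    least certain. "largest": compounds deepest on the active side, i.e. the
    most likely hits under the current model.
    """
    values = decision_values(w, b, pool)
    if strategy == "closest":
        order = sorted(range(len(pool)), key=lambda i: abs(values[i]))
    elif strategy == "largest":
        order = sorted(range(len(pool)), key=lambda i: values[i], reverse=True)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return order[:batch_size]


w, b = [1.0, 0.0], -1.0                                # hypothetical trained hyperplane
pool = [[0.9, 5.0], [3.0, 0.0], [1.2, 1.0], [-2.0, 0.0]]  # invented descriptors
print(select_batch(w, b, pool, 2), select_batch(w, b, pool, 2, "largest"))
```

After each screened batch, the newly labeled compounds would be added to the training set and the hyperplane retrained before the next selection.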
Subject(s)
Computer-Aided Design, Drug Design
ABSTRACT
The shape and the chemical features of a ligand are both critical for biological activity. This paper presents a strategy that uses these descriptors to build a computational model for virtual screening of bioactive compounds. Molecules are represented in a binary shape-feature descriptor space as bit strings, and their relative activities are used to identify the subset of the bit string that is most relevant to bioactivity. This subset is used to score virtual libraries. We describe the computational details of the method and present an example validation experiment on thrombin inhibitors.
Subject(s)
Drug Evaluation, Preclinical/statistics & numerical data, Computer Simulation, Ligands, Models, Chemical, Molecular Conformation, Molecular Structure, User-Computer Interface
ABSTRACT
Protein structural information is combined with combinatorial library design in the following protocol. Active site maps are generated from protein structures. All possible 2-, 3- and 4-point pharmacophores are enumerated from the active site map and encoded as bit strings. The pharmacophores define a design space that can be used to select compounds using an informative library design tool. The method was evaluated against a collection of compounds assayed previously against a cyclin-dependent kinase target, CDK-2, starting with 23 X-ray co-crystal structures. Performance was assessed based on the number of active scaffolds selected after four rounds of iterative informative library design. The method selects compounds from 12 out of the 15 active scaffolds from the CDK-2 library and outperforms a two-dimensional similarity search and docking calculations.
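Enumerating all 2-, 3- and 4-point pharmacophores from an active site map and folding them into a bit string can be sketched as follows. The feature labels, the 1 Å distance bins, the 2048-bit length and the SHA-1 folding are all assumptions, since the abstract does not describe the encoding; the key built from sorted (type, type, distance bin) pairs does not depend on point order but is not a perfect canonical form, so distinct geometries can collide.

```python
import hashlib
from itertools import combinations
from math import dist


def bin_distance(d, width=1.0):
    """Discretize a distance (in Angstrom) into bins of the given width."""
    return int(d // width)


def pharmacophore_key(points):
    """Order-independent key: sorted (type, type, distance bin) for every pair."""
    pairs = []
    for (t1, p1), (t2, p2) in combinations(points, 2):
        a, b = sorted((t1, t2))
        pairs.append((a, b, bin_distance(dist(p1, p2))))
    return tuple(sorted(pairs))


def site_fingerprint(site_map, n_bits=2048, sizes=(2, 3, 4)):
    """Enumerate all k-point pharmacophores (k in sizes) and fold them into a bit set."""
    bits = set()
    for k in sizes:
        for combo in combinations(site_map, k):
            key = repr(pharmacophore_key(combo)).encode()
            digest = hashlib.sha1(key).digest()  # deterministic across runs
            bits.add(int.from_bytes(digest[:8], "big") % n_bits)
    return bits


# Hypothetical active site map: (feature type, xyz coordinates in Angstrom).
site_map = [
    ("donor", (0.0, 0.0, 0.0)),
    ("acceptor", (3.0, 0.0, 0.0)),
    ("hydrophobe", (0.0, 4.0, 0.0)),
    ("acceptor", (0.0, 0.0, 5.0)),
]
print(sorted(site_fingerprint(site_map)))
```

With four site points there are 6 + 4 + 1 = 11 pharmacophores, so at most 11 bits are set.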