Results 1 - 15 of 15
1.
J Chem Inf Model ; 63(12): 3688-3696, 2023 Jun 26.
Article in English | MEDLINE | ID: mdl-37294674

ABSTRACT

Protein kinases are a protein family that plays an important role in several complex diseases, such as cancer and cardiovascular and immunological diseases. Protein kinases have conserved ATP-binding sites, so inhibitors that target these sites can show similar activity against different kinases. This can be exploited to create multitarget drugs. On the other hand, selectivity (lack of similar activities) is desirable in order to avoid toxicity issues. There is a vast amount of protein kinase activity data in the public domain, which can be used in many different ways. Multitask machine learning models are expected to excel for these kinds of data sets because they can learn from implicit correlations between tasks (in this case, activities against a variety of kinases). However, multitask modeling of sparse data poses two major challenges: (i) creating a balanced train-test split without data leakage and (ii) handling missing data. In this work, we construct a protein kinase benchmark set composed of two balanced splits without data leakage, created with a random and a dissimilarity-driven cluster-based mechanism, respectively. This data set can be used for benchmarking and developing protein kinase activity prediction models. Overall, performance on the dissimilarity-driven cluster-based split is lower than on the random split for all models, indicating poor generalizability. Nevertheless, we show that multitask deep learning models, on this very sparse data set, outperform single-task deep learning and tree-based models. Finally, we demonstrate that data imputation does not improve the performance of (multitask) models on this benchmark set.


Subject(s)
Machine Learning; Proteins; Protein Kinases; Phosphorylation; Protein Processing, Post-Translational
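A minimal Python sketch can illustrate the dissimilarity-driven, cluster-based split described above. It is not the authors' procedure. It assumes each molecule is represented as a set of fingerprint bit indices, uses simple greedy leader clustering with a hypothetical Tanimoto threshold of 0.6, and moves whole clusters into the test set until a target fraction is reached. Keeping clusters intact reduces close analogues across the split but does not by itself eliminate them, and the sketch makes no attempt at per-kinase balancing.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def leader_clusters(fps, threshold):
    """Greedy leader clustering: each fingerprint joins the first cluster whose
    leader is at least `threshold` similar, otherwise it starts a new cluster."""
    leaders, clusters = [], []
    for i, fp in enumerate(fps):
        for c, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


def cluster_split(fps, test_fraction=0.2, threshold=0.6, seed=0):
    """Assign whole clusters to train or test, so members of one cluster never
    straddle the split. threshold=0.6 is a hypothetical choice."""
    clusters = leader_clusters(fps, threshold)
    random.Random(seed).shuffle(clusters)
    target = test_fraction * len(fps)
    train, test = [], []
    for members in clusters:
        # Fill the test set cluster by cluster until the target size is reached.
        (test if len(test) < target else train).extend(members)
    return sorted(train), sorted(test)


# Hypothetical fingerprints: two analogue pairs and one singleton.
fps = [frozenset({1, 2, 3}), frozenset({1, 2, 3, 4}),
       frozenset({10, 11}), frozenset({10, 11, 12}), frozenset({20})]
train_idx, test_idx = cluster_split(fps)
print(train_idx, test_idx)
```

A random split would assign the five compounds independently, so an analogue pair could end up on both sides. A stricter variant could also check that no test compound exceeds the threshold against any training compound.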
2.
J Comput Aided Mol Des ; 35(8): 901-909, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34273053

ABSTRACT

Accurate prediction of lipophilicity (logP) from molecular structure is a well-established field. Predictions of logP are often used to drive drug discovery projects forward. Motivated by the SAMPL7 challenge, in this manuscript we describe the steps taken to construct a novel machine learning model that predicts and generalizes well. The model is based on the recently described Directed Message Passing Neural Networks (D-MPNNs). Further enhancements included the addition of datasets from ChEMBL (RMSE improvement of 0.03) and the addition of helper tasks (RMSE improvement of 0.04). To the best of our knowledge, adding predictions from other models (Simulations Plus logP and logD@pH7.4, respectively) as helper tasks is novel and could be applied in a broader context. The final model, which we used to participate in the challenge, ranked second of 17 ranked submissions, with an RMSE of 0.66 and an MAE of 0.48 (submission: Chemprop). The model also works well on other datasets; in particular, applied retrospectively to the SAMPL6 challenge, it would have ranked first among all submissions (RMSE of 0.35). Although our model works well, we conclude with suggestions that are expected to improve it even further.


Subject(s)
Drug Discovery; Machine Learning; Models, Chemical; Models, Statistical; Neural Networks, Computer; Quantum Theory; Solvents/chemistry; Solubility; Thermodynamics
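Using other models' predictions as helper tasks, together with experimental labels that are missing for some molecules, amounts to a multitask loss that ignores unobserved entries. The sketch below assumes a plain weighted mean squared error over observed labels. It illustrates the idea only and is not Chemprop's actual loss or task weighting. All values are hypothetical. The sketch also reports RMSE and MAE on the main task, the two metrics quoted in the abstract.

```python
import math


def masked_mse(predictions, targets, task_weights=None):
    """Weighted mean squared error over observed labels only.

    `targets` holds one row per molecule; None marks a missing label,
    which contributes nothing to the loss."""
    n_tasks = len(targets[0])
    task_weights = task_weights or [1.0] * n_tasks
    total, weight_sum = 0.0, 0.0
    for pred_row, true_row in zip(predictions, targets):
        for j, (p, t) in enumerate(zip(pred_row, true_row)):
            if t is None:
                continue
            total += task_weights[j] * (p - t) ** 2
            weight_sum += task_weights[j]
    return total / weight_sum if weight_sum else 0.0


def main_task_rmse_mae(predictions, targets, task=0):
    """RMSE and MAE on one task (column 0 by default), skipping missing labels."""
    errors = [p[task] - t[task] for p, t in zip(predictions, targets)
              if t[task] is not None]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Hypothetical columns: [experimental logP, helper task 1, helper task 2].
# The helper columns stand in for another model's predictions, so they exist
# even where the experimental value is missing.
targets = [[1.5, 1.0, 0.5],
           [None, 2.0, 1.0]]
predictions = [[1.0, 1.0, 0.5],
               [2.0, 2.5, 1.0]]
print(masked_mse(predictions, targets))          # 0.1
print(main_task_rmse_mae(predictions, targets))  # (0.5, 0.5)
```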
3.
J Cheminform ; 16(1): 21, 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-38395961

ABSTRACT

The conversion of chemical structures into computer-readable descriptors able to capture key structural aspects is of pivotal importance in cheminformatics and computer-aided drug design. Molecular fingerprints are a widely employed class of descriptors; however, their generation is time-consuming for large databases and susceptible to bias. Therefore, descriptors that accurately detect predefined structural fragments and do not require lengthy generation procedures would be highly desirable. To meet additional needs, such descriptors should also be interpretable by medicinal chemists and suitable for indexing databases with trillions of compounds. To this end, we developed, as an integral part of EXSCALATE (Dompé's end-to-end drug discovery platform), the DompeKeys (DK), a new substructure-based descriptor set that encodes the chemical features characterizing compounds of pharmaceutical interest. DK are an exhaustive collection of curated SMARTS strings defining chemical features at different levels of complexity, from specific functional groups and structural patterns to simpler pharmacophoric points, corresponding to a network of hierarchically interconnected substructures. Because of their extended and hierarchical structure, DK can be used with good performance in different kinds of applications. In particular, we demonstrate that they are well suited for effective mapping of chemical space, as well as for substructure search and virtual screening. Notably, incorporating DK yields high-performing machine learning models for predicting both compound activity and the occurrence of metabolic reactions.
The protocol to generate the DK is freely available at https://dompekeys.exscalate.eu and is fully integrated with the Molecular Anatomy protocol for the generation and analysis of hierarchically interconnected molecular scaffolds and frameworks, thus providing a comprehensive and flexible tool for drug design applications.
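The hierarchical organization described for DK can be illustrated by closing a set of matched keys under a parent relation, so that a specific match also switches on the more general keys above it. In this sketch the key names and hierarchy are hypothetical and are not the real DompeKeys. The SMARTS matching step itself, which would need a cheminformatics toolkit, is assumed to have already produced the set of matched keys. Tanimoto similarity on the expanded key sets then shows how a hierarchy helps chemical-space mapping: related groups share their general ancestors.

```python
# Hypothetical key hierarchy (NOT the real DompeKeys): each key lists the more
# general keys it implies. A network allows several parents per key.
PARENTS = {
    "carboxylic_acid": ["carbonyl", "hydroxyl"],
    "amide": ["carbonyl"],
    "carbonyl": ["hbond_acceptor"],
    "hydroxyl": ["hbond_donor"],
    "hbond_acceptor": [],
    "hbond_donor": [],
}
KEY_ORDER = sorted(PARENTS)  # fixed bit order for the binary vector


def expand(matched):
    """Close a set of matched keys under the hierarchy: a specific match
    implies every more general key above it."""
    closed, stack = set(), list(matched)
    while stack:
        key = stack.pop()
        if key not in closed:
            closed.add(key)
            stack.extend(PARENTS[key])
    return closed


def to_bits(matched):
    """Binary key vector over all keys, in KEY_ORDER."""
    closed = expand(matched)
    return [1 if key in closed else 0 for key in KEY_ORDER]


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


acid, amide = expand({"carboxylic_acid"}), expand({"amide"})
print(to_bits({"amide"}))     # [1, 1, 0, 1, 0, 0]
print(tanimoto(acid, amide))  # shared: carbonyl, hbond_acceptor -> 2/6
```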

4.
Proteins ; 64(1): 60-7, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16568448

ABSTRACT

The interaction between beta-catenin and Tcf family members is crucial for the Wnt signal transduction pathway, which is commonly mutated in cancer. This interaction extends over a very large surface area (4800 Å²), and inhibiting such interactions with low-molecular-weight inhibitors is a challenge. However, protein surfaces frequently contain "hot spots," small patches that are the main mediators of binding affinity. By making tight interactions with a hot spot, a small molecule can compete with a protein. The Tcf3/Tcf4-binding surface on beta-catenin contains a well-defined hot spot around residues K435 and R469. A 17,700-compound subset of the Pharmacia corporate collection was docked to this hot spot with the QXP program; 22 of the best-scoring compounds were put into a biophysical (NMR and ITC) screening funnel, in which specific binding to beta-catenin, competition with Tcf4, and finally binding constants were determined. This process led to the discovery of three druglike, low-molecular-weight Tcf4-competitive compounds, the tightest binder having a K_D of 450 nM. Our approach can be used in several situations (e.g., when selecting compounds from external collections, when no biochemical functional assay is available, or when no HTS is envisioned), and it may be generally applicable to the identification of inhibitors of protein-protein interactions.


Subject(s)
Proteins/antagonists & inhibitors; Proteins/chemistry; beta Catenin/antagonists & inhibitors; Binding Sites; Crystallography, X-Ray; Humans; Models, Molecular; Mutation; Neoplasms/genetics; Protein Conformation; Software; User-Computer Interface; beta Catenin/genetics
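To put the reported K_D of 450 nM on an energy scale, the standard binding free energy follows from ΔG° = RT ln(K_D / 1 M). The sketch assumes 298.15 K and a 1 M standard state. The abstract does not state the measurement temperature.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy dG = RT ln(K_D / 1 M), in kcal/mol.
    The 1 M standard state and 298.15 K default are assumptions."""
    return R_KCAL * temperature_k * math.log(kd_molar)


dg = binding_free_energy(450e-9)  # K_D = 450 nM, as reported in the abstract
print(f"{dg:.2f} kcal/mol")       # -8.66 kcal/mol at the assumed 25 °C
```

Negative values indicate favorable binding. Each tenfold tightening of K_D adds about -1.36 kcal/mol at this temperature.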
5.
Biophys Chem ; 120(1): 55-61, 2006 Mar 01.
Article in English | MEDLINE | ID: mdl-16288953

ABSTRACT

One of the interesting puzzles of the amyloid beta-peptide of Alzheimer's disease (Abeta) is that it appears to polymerize into amyloid fibrils with a parallel beta-sheet topology, whereas smaller subsets of the peptide form anti-parallel beta sheets. To target potential weak points of amyloid fibrils in a rational drug design effort, it would be helpful to understand the forces that drive this change. We designed two peptides, CHQKLVFFAEDYNGKDEAFFVLKQHW and CHQKLVFFAEDYNGKHQKLVFFAEDW, that join two copies of the key amyloidogenic Abeta (14-23) sequence HQKLVFFAED in parallel and anti-parallel topologies, respectively. (Here, the word "parallel" refers only to residue sequence and not to backbone topology.) The N-termini of the hairpins were labeled with the fluorescent dye 5-((((2-iodoacetyl)amino)ethyl)amino)naphthalene-1-sulfonic acid (IAEDANS), forming a fluorescence energy transfer donor-acceptor pair with the C-terminal tryptophan. Circular dichroism (CD) results show that the anti-parallel hairpin adopts a beta-sheet conformation, while the parallel hairpin is disordered. Fluorescence resonance energy transfer (FRET) results show that the distance between the donor and the acceptor is significantly shorter in the anti-parallel topology than in the parallel topology. The fluorescence intensity of the anti-parallel hairpin also displays a linear concentration dependence, indicating that the FRET observed in the anti-parallel hairpin arises from intramolecular interactions. The results thus provide a quantitative estimate of the relative topological propensities of amyloidogenic peptides. Our FRET and CD results show that beta sheets involving the essential Abeta (14-23) fragment strongly prefer the anti-parallel topology. Moreover, we provide a quantitative estimate of the relative preference for these two topologies.
Such analysis can be repeated for larger subsets of Abeta to determine quantitatively the relative degree of preference for parallel/anti-parallel topologies in given fragments of Abeta.


Subject(s)
Amyloid beta-Peptides/chemistry; Amyloid/chemistry; Drug Design; Peptides/chemistry; Protein Structure, Secondary; Amino Acid Sequence; Fluorescence Resonance Energy Transfer; Molecular Sequence Data; Protein Conformation
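The FRET distance comparison rests on the Förster relation E = 1/(1 + (r/R0)^6), which can be inverted to estimate the donor-acceptor distance r from a measured transfer efficiency E. In this sketch the Förster radius R0 of 22 Å and the efficiencies are hypothetical placeholders. The real R0 depends on the dye pair, quantum yield, spectral overlap, and orientation factor, and the abstract reports no values.

```python
def fret_efficiency(r, r0):
    """Förster transfer efficiency at donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def fret_distance(efficiency, r0):
    """Invert the Förster equation: r = r0 * (1/E - 1)**(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


def efficiency_from_donor_quenching(f_donor_with_acceptor, f_donor_alone):
    """E = 1 - F_DA / F_D, from donor intensities with and without acceptor."""
    return 1.0 - f_donor_with_acceptor / f_donor_alone


R0 = 22.0  # hypothetical Förster radius in Å; pair- and environment-specific
for e in (0.3, 0.7):  # hypothetical efficiencies for two hairpins
    print(e, round(fret_distance(e, R0), 1))
```

The sixth-power dependence is why FRET is sensitive to distance changes near R0. A higher efficiency maps to a shorter distance, as reported for the anti-parallel hairpin.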
6.
Org Lett ; 18(4): 780-3, 2016 Feb 19.
Article in English | MEDLINE | ID: mdl-26849068

ABSTRACT

A conformational study of branimycin was performed using single-crystal X-ray crystallography to characterize the solid-state form, while a combination of NMR spectroscopy and molecular modeling was employed to gain information about the solution structure. Comparison of the crystal structure with its solution counterpart showed no significant differences in conformation, confirming the relative rigidity of the tricyclic system. However, these experiments revealed that the previously proposed stereochemistry of branimycin at C-17 should be revised.


Subject(s)
Macrolides/chemistry; Crystallography, X-Ray; Molecular Conformation; Molecular Structure; Nuclear Magnetic Resonance, Biomolecular; Stereoisomerism
7.
Proteins ; 60(4): 629-43, 2005 Sep 01.
Article in English | MEDLINE | ID: mdl-16028223

ABSTRACT

Docking programs can generate subsets of a compound collection with an increased percentage of actives against a target (enrichment) by predicting the compounds' binding modes (poses) and affinities (scores) and retrieving those with the highest scores. Using the QXP and GOLD programs, we compared the ability of six single scoring functions (PLP, Ligscore, Ludi, Jain, ChemScore, PMF) and four composite scoring models (Mean Rank: MR, Rank-by-Vote: Vt, Bayesian Statistics: BS, and PLS Discriminant Analysis: DA) to separate compounds that are active against CDK2 from inactives. We determined the enrichment for the entire set of actives (IC50 < 10 µM) and for three activity subsets. In all cases, the enrichment for each subset was lower than for the entire set of actives. QXP outperformed GOLD at pose prediction but yielded only moderately better enrichments. Five to six scoring functions yielded good enrichments with GOLD poses, while typically only two worked well with QXP poses. For each program, two scoring functions generally performed better than the others (Ligscore2 and Ludi for GOLD; QXP and Jain for QXP). Composite scoring functions yielded better results than single scoring functions. The consensus approaches MR and Vt worked best when separating micromolar inhibitors from inactives. The statistical approaches BS and DA, which require training data, performed best when distinguishing between low and high nanomolar inhibitors. The key observation that the hit-rate profiles for all four activity intervals, for all scoring schemes, and for both programs are significantly better than random is evidence that docking can be successfully applied to enrich compound collections.


Subject(s)
Cyclin-Dependent Kinase 2/antagonists & inhibitors; Cyclin-Dependent Kinase 2/chemistry; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Adenosine Triphosphate/chemistry; Adenosine Triphosphate/metabolism; Algorithms; Binding Sites; Hydrogen-Ion Concentration; Kinetics; Ligands; Models, Molecular; Models, Theoretical; Protein Conformation; User-Computer Interface
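The enrichment measure and the two consensus schemes named in the abstract (Mean Rank and Rank-by-Vote) can be sketched in a few functions. This is a generic illustration, not the authors' implementation. It assumes higher scores are better; scores where lower is better, such as many docking energies, would be negated first. Ties are broken by input order, and all scores are hypothetical. The Bayesian and PLS-DA models are not shown.

```python
def ranks(scores):
    """Rank 1 = best, assuming higher scores are better; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def mean_rank(score_lists):
    """Consensus: average rank across scoring functions (lower is better)."""
    all_ranks = [ranks(s) for s in score_lists]
    n = len(all_ranks[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]


def rank_by_vote(score_lists, top_fraction):
    """Consensus: number of scoring functions that place a compound in their
    top `top_fraction` of the list (higher is better)."""
    n = len(score_lists[0])
    cutoff = max(1, round(top_fraction * n))
    votes = [0] * n
    for scores in score_lists:
        for i, r in enumerate(ranks(scores)):
            if r <= cutoff:
                votes[i] += 1
    return votes


def enrichment_factor(is_active_in_rank_order, fraction):
    """Hit rate in the top `fraction` of a ranked list divided by the overall hit rate."""
    n = len(is_active_in_rank_order)
    n_selected = max(1, round(fraction * n))
    hit_rate_selected = sum(is_active_in_rank_order[:n_selected]) / n_selected
    hit_rate_all = sum(is_active_in_rank_order) / n
    return hit_rate_selected / hit_rate_all


# Hypothetical scores from two scoring functions for four compounds.
f1, f2 = [9.0, 7.0, 5.0, 1.0], [6.0, 8.0, 2.0, 4.0]
print(mean_rank([f1, f2]))          # [1.5, 1.5, 3.5, 3.5]
print(rank_by_vote([f1, f2], 0.5))  # [2, 2, 0, 0]
print(enrichment_factor([1, 0, 1, 0, 0, 0, 0, 0, 0, 0], 0.2))  # 2.5
```

An enrichment factor of 1 corresponds to random selection, so values above 1 indicate the "better than random" behavior the abstract reports.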
8.
J Med Chem ; 47(24): 6104-7, 2004 Nov 18.
Article in English | MEDLINE | ID: mdl-15537364

ABSTRACT

The relationship of rotatable bond count (N_rot) and polar surface area (PSA) to oral bioavailability in rats was examined for 434 Pharmacia compounds and compared with an earlier report from Veber et al. (J. Med. Chem. 2002, 45, 2615). N_rot and PSA were calculated with QikProp or Cerius2. The resulting correlations depended on the calculation method and on the therapeutic class within the data superset. These results underscore that such generalizations must be used with caution.


Subject(s)
Biological Availability; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Administration, Oral; Animals; Molecular Structure; Pharmaceutical Preparations/administration & dosage; Rats; Structure-Activity Relationship
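Veber et al. proposed that compounds with 10 or fewer rotatable bonds and a PSA of at most 140 Å² (or 12 or fewer hydrogen-bond donors and acceptors) tend to have good oral bioavailability in rats. The sketch below encodes only the rotatable-bond and PSA criteria. The descriptor values for the two tools are hypothetical. The example shows how the same compound can fall on different sides of the cutoffs depending on how N_rot and PSA are calculated, which is the kind of method dependence the abstract reports.

```python
def passes_veber(n_rot, psa, max_rot=10, max_psa=140.0):
    """Veber et al. (2002) criteria: at most 10 rotatable bonds and PSA <= 140 Å².
    The alternative H-bond donor/acceptor criterion is not implemented here."""
    return n_rot <= max_rot and psa <= max_psa


# Hypothetical (N_rot, PSA) values for ONE compound from two calculation tools.
descriptors = {"tool_a": (9, 136.8), "tool_b": (11, 143.2)}
for tool, (n_rot, psa) in descriptors.items():
    print(tool, passes_veber(n_rot, psa))
```

The two tools may count rotatable bonds differently (for example, in how amide bonds or terminal groups are treated). For a compound near the cutoffs, such differences decide the classification.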
9.
Curr Pharm Des ; 20(20): 3314-22, 2014.
Article in English | MEDLINE | ID: mdl-23947648

ABSTRACT

Compilation of an appropriate set of compounds is essential for the success of a small-molecule screen. When very little is known about the target and few or no ligands have been identified, the screening file is often made as diverse as possible. When structural information on the target or target family is available or ligands of the target are known, it is more efficient to apply a ligand- or target-focused bias, so as to predominantly screen compounds that can be expected to modulate the target. One way to achieve this is to select subsets of existing collections; another is to design and synthesize libraries specifically focused on a particular target, target family, or mechanism of action. Despite a number of success stories, designing such libraries is still challenging and requires specialized knowledge, especially in emerging target areas such as protein-protein interactions (PPI), epigenetics, and the ubiquitin proteasome pathway. BioFocus has successfully produced so-called SoftFocus® libraries for many years, extending their targets from kinases to GPCRs and ion channels, and on to difficult targets in the epigenetics and PPI fields. This article outlines several of the principles applied to SoftFocus library design, showcasing successes achieved by BioFocus' clients. In addition, screening results for a comprehensive set of BioFocus' kinase libraries against 20 kinase targets are used to demonstrate the power of the SoftFocus approach in delivering both selective and less-selective compounds and libraries against these targets. Trademarks: BioFocus®, SoftFocus®, HDRA™, FieldFocus™, Thematic Analysis™, ThemePair™ and ThemePair Fragment™ are trademarks of Galapagos NV and/or its affiliates.


Subject(s)
Protein Kinase Inhibitors/chemical synthesis; Small Molecule Libraries/chemical synthesis; Ligands; Molecular Structure; Protein Kinase Inhibitors/chemistry; Protein Kinase Inhibitors/pharmacology; Protein Kinases/metabolism; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Structure-Activity Relationship
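Kinase-panel screening data of the kind described can be summarized with a simple fraction-of-panel selectivity score. The abstract does not say how selectivity was quantified, so this is an assumed measure. The 50% inhibition threshold, the compound and kinase names, and the data are all hypothetical.

```python
def selectivity_score(percent_inhibition, threshold=50.0):
    """Fraction of the kinase panel inhibited at or above `threshold` percent.
    Lower values indicate a more selective compound. threshold is hypothetical."""
    hits = sum(1 for value in percent_inhibition.values() if value >= threshold)
    return hits / len(percent_inhibition)


def library_hit_counts(library, threshold=50.0):
    """For each kinase, count the library members that reach the threshold."""
    counts = {}
    for profile in library.values():
        for kinase, value in profile.items():
            counts[kinase] = counts.get(kinase, 0) + int(value >= threshold)
    return counts


# Hypothetical single-concentration data: two compounds on a four-kinase panel.
library = {
    "cpd_1": {"kinase_A": 92.0, "kinase_B": 12.0, "kinase_C": 8.0, "kinase_D": 4.0},
    "cpd_2": {"kinase_A": 85.0, "kinase_B": 71.0, "kinase_C": 66.0, "kinase_D": 40.0},
}
for name, profile in library.items():
    print(name, selectivity_score(profile))  # cpd_1 0.25, cpd_2 0.75
print(library_hit_counts(library))
```

In this toy panel, cpd_1 would count as selective and cpd_2 as less selective. A library can deliberately contain both kinds.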
10.
Comb Chem High Throughput Screen ; 14(6): 521-31, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21521154

ABSTRACT

Target-focused compound libraries are collections of compounds that are designed to interact with an individual protein target or, frequently, a family of related targets (such as kinases, voltage-gated ion channels, or serine/cysteine proteases). They are used for screening against therapeutic targets to find hit compounds that might be further developed into drugs. The design of such libraries generally uses structural information about the target or family of interest. In the absence of such structural information, a chemogenomic model that incorporates sequence and mutagenesis data to predict the properties of the binding site can be employed. A third option, usually pursued when no structural data are available, uses knowledge of the target's ligands to develop focused libraries via scaffold hopping. Consequently, the methods used to design target-focused libraries vary according to the quantity and quality of the structural or ligand data available for each target family. This article describes examples of each of these design approaches and illustrates them with case studies that highlight some of the issues and successes observed when screening target-focused libraries.


Subject(s)
Drug Design; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Animals; Humans; Ion Channels/metabolism; Models, Molecular; Protein Binding; Protein Interaction Mapping; Protein Kinases/metabolism; Receptors, G-Protein-Coupled/metabolism
11.
J Chem Inf Model ; 47(1): 85-91, 2007.
Article in English | MEDLINE | ID: mdl-17238252

ABSTRACT

The ability to accurately predict biological affinity on the basis of in silico docking to a protein target remains a challenging goal in computer-aided drug design (CADD). Typically, "standard" scoring functions have been employed that use the calculated docking result and a set of empirical parameters to calculate a predicted binding affinity. To improve on this, we are exploring novel strategies for rapidly developing and tuning "customized" scoring functions tailored to a specific need. In the present work, three such customized scoring functions were developed using a set of 129 high-resolution protein-ligand crystal structures with measured Ki values. The functions were parametrized using N-PLS (N-way partial least squares), a multivariate technique well known in the 3D quantitative structure-activity relationship field. The modest correlation between observed and calculated pKi values obtained with a standard scoring function (r² = 0.5) improved to r² = 0.8 when a customized scoring function was applied. To mimic a more realistic scenario, a second scoring function was developed that was based not on crystal structures but exclusively on several binding poses generated with the Flo+ docking program. Finally, a validation study was conducted by generating a third scoring function with 99 randomly selected complexes from the 129 as a training set and predicting pKi values for a test set comprising the remaining 30 complexes. Training and test set r² values were 0.77 and 0.78, respectively. These results indicate that, even without direct structural information, predictive customized scoring functions can be developed using N-PLS, and this approach holds significant potential as a general procedure for predicting binding affinity on the basis of in silico docking.


Subject(s)
Drug Design; Quantitative Structure-Activity Relationship; Artificial Intelligence; Computational Biology; Protein Binding; Software
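The validation design in this abstract (a random 99/30 split of 129 complexes, assessed by r²) can be sketched as follows. Fitting N-PLS itself is not shown, and the seed is arbitrary. Here r² is computed as the squared Pearson correlation between observed and calculated values. That is one common definition and an assumption about the one used.

```python
import random


def random_split(items, n_train, seed=0):
    """Shuffle a copy of `items` with a fixed seed and cut it into train/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def squared_pearson(x, y):
    """r² as the squared Pearson correlation between two value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


complex_ids = range(129)  # stand-ins for the 129 complexes
train, test = random_split(complex_ids, n_train=99)
print(len(train), len(test))  # 99 30
print(squared_pearson([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))  # 0.64
```

Fixing the seed makes the split reproducible. Repeating the analysis over several seeds shows how much test-set r² varies with the particular 30 complexes held out.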
12.
J Chem Inf Model ; 45(1): 170-6, 2005.
Article in English | MEDLINE | ID: mdl-15667142

ABSTRACT

Solubility data for 930 diverse compounds have been analyzed using linear and nonlinear partial least squares (PLS) methods, continuum regression (CR), and neural networks (NN). 1D and 2D descriptors from the MOE package, in combination with E-state or ISIS keys, have been used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a squared correlation coefficient (r²) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility units (log S_w). Validated on a test set of 177 compounds not included in the training set, the model has r² = 0.911 and RMSE = 0.475 log S_w. The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS and, because of the way in which cross-validation was done, it is expected to be a valuable prediction tool alongside the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for prediction when experimental measurements are difficult or impossible, for virtual screening, for combinatorial library design, and for efficient lead optimization.
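The best model in this study was linear PLS. For readers unfamiliar with the method, here is a textbook PLS1 (single-response) regression using the NIPALS algorithm in plain Python. It is a sketch, not the authors' software. Descriptors are only mean-centered (not scaled), and the data are a hypothetical toy set. The descriptor calculation (MOE, ISIS keys), nonlinear PLS, and continuum regression are not shown.

```python
import math


class PLS1:
    """Linear PLS regression for one response (PLS1, NIPALS algorithm).
    X is mean-centered but not scaled; scale descriptors beforehand if needed."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X, y):
        n, n_feat = len(X), len(X[0])
        self.x_mean = [sum(row[j] for row in X) / n for j in range(n_feat)]
        self.y_mean = sum(y) / n
        E = [[row[j] - self.x_mean[j] for j in range(n_feat)] for row in X]
        f = [v - self.y_mean for v in y]
        self.weights, self.loadings, self.q = [], [], []
        for _ in range(self.n_components):
            w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(n_feat)]
            norm = math.sqrt(sum(v * v for v in w))
            if norm < 1e-12:
                break  # residual response is (numerically) fully explained
            w = [v / norm for v in w]
            t = [sum(E[i][j] * w[j] for j in range(n_feat)) for i in range(n)]
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(n_feat)]
            q = sum(f[i] * t[i] for i in range(n)) / tt
            # Deflate X and y by the part explained by this latent component.
            E = [[E[i][j] - t[i] * p[j] for j in range(n_feat)] for i in range(n)]
            f = [f[i] - q * t[i] for i in range(n)]
            self.weights.append(w)
            self.loadings.append(p)
            self.q.append(q)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            e = [row[j] - self.x_mean[j] for j in range(len(row))]
            y_hat = self.y_mean
            # Apply the same score/deflation sequence as in fit().
            for w, p, q in zip(self.weights, self.loadings, self.q):
                t = sum(a * b for a, b in zip(e, w))
                y_hat += q * t
                e = [a - t * b for a, b in zip(e, p)]
            predictions.append(y_hat)
        return predictions


# Hypothetical toy data in which y = x1 + 2*x2 exactly.
X = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
y = [1, 2, 3, 4, 4]
model = PLS1(n_components=2).fit(X, y)
print(model.predict([[3, 3]]))  # close to [9.0]
```

With as many components as independent descriptors, PLS reproduces ordinary least squares. With fewer components, it regularizes the fit by keeping only the leading latent directions that maximize covariance with the response.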

13.
J Comput Aided Mol Des ; 19(2): 111-22, 2005 Feb.
Article in English | MEDLINE | ID: mdl-16075305

ABSTRACT

Cyclin-dependent kinases (CDKs) play a key role in regulating the cell cycle. The cyclins, their activating agents, and endogenous CDK inhibitors are frequently mutated in human cancers, making CDKs interesting targets for cancer chemotherapy. Our aim is the discovery of selective CDK4/cyclin D1 inhibitors. An ATP-competitive pyrazolopyrimidinone CDK inhibitor was identified by HTS and docked into a CDK4 homology model. The resulting binding model was consistent with available SAR and was validated by a subsequent CDK2/inhibitor crystal structure. An iterative cycle of chemistry and modeling led to a 70-fold improvement in potency. Small substituent changes resulted in large CDK4/CDK2 selectivity changes. The modeling revealed that selectivity is largely due to hydrogen-bonded interactions with only two kinase residues. This demonstrates that small differences between enzymes can efficiently be exploited in the design of selective inhibitors.


Subject(s)
CDC2-CDC28 Kinases/antagonists & inhibitors; Cyclin A/antagonists & inhibitors; Cyclin D1/antagonists & inhibitors; Cyclin-Dependent Kinases/antagonists & inhibitors; Enzyme Inhibitors/pharmacology; Proto-Oncogene Proteins/antagonists & inhibitors; Pyrimidinones/pharmacology; Amino Acid Sequence; CDC2-CDC28 Kinases/chemistry; Cyclin-Dependent Kinase 2; Cyclin-Dependent Kinase 4; Cyclin-Dependent Kinases/chemistry; Drug Evaluation, Preclinical; Enzyme Inhibitors/chemistry; Hydrogen Bonding; Models, Molecular; Molecular Sequence Data; Proto-Oncogene Proteins/chemistry; Pyrimidinones/chemistry; Sequence Homology, Amino Acid; Substrate Specificity
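Fold changes in potency are easier to compare in log units. The sketch converts the reported 70-fold improvement into pIC50 units. It also converts it into a free-energy difference at an assumed 298.15 K, under the assumption that IC50 ratios track binding-constant ratios (for example, identical assay and ATP conditions). The selectivity-ratio example uses hypothetical IC50 values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_to_log_units(fold):
    """A fold change in IC50 expressed in log10 (pIC50) units."""
    return math.log10(fold)


def fold_to_ddg(fold, temperature_k=298.15):
    """Free-energy difference RT ln(fold) in kcal/mol, ASSUMING the IC50 ratio
    equals the ratio of binding constants."""
    return R_KCAL * temperature_k * math.log(fold)


def selectivity_ratio(ic50_off_target, ic50_target):
    """How many times more potent the compound is on the target than off it."""
    return ic50_off_target / ic50_target


print(round(fold_to_log_units(70), 2))  # 1.85
print(round(fold_to_ddg(70), 2))        # 2.52
# Hypothetical IC50 values (µM): CDK2 as off-target, CDK4 as target.
print(selectivity_ratio(ic50_off_target=2.0, ic50_target=0.05))
```

A 70-fold gain therefore corresponds to nearly two pIC50 units, or roughly 2.5 kcal/mol under the stated assumptions.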
14.
J Chem Inf Comput Sci ; 44(3): 882-93, 2004.
Article in English | MEDLINE | ID: mdl-15154753

ABSTRACT

Novel scoring functions that predict the affinity of a ligand for its receptor have been developed. They were built with several statistical tools (partial least squares, genetic algorithms, neural networks) and trained on a data set of 100 crystal structures of receptor-ligand complexes, with affinities spanning 10 log units. The new scoring functions contain both descriptors generated by the QXP docking program and new descriptors that were developed in-house. These new descriptors are based on solvent-accessible surface areas and account for conformational entropy changes and desolvation effects of both ligand and receptor upon binding. The predictive r² values for a test set of 24 complexes are in the 0.712-0.741 range, and RMS prediction errors are in the 1.09-1.16 log K_d range. Inclusion of the new descriptors led to significant improvements in affinity prediction, compared to scoring functions based on QXP descriptors alone. However, the QXP descriptors by themselves perform better in binding mode prediction. The performance of the linear models is comparable to that of the neural networks. The new functions perform very well, but they still need to be validated as universal tools for the prediction of binding affinity.


Subject(s)
Proteins/chemistry; Algorithms; Protein Conformation; Thermodynamics
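The two test-set statistics quoted here, predictive r² and RMS prediction error, can be computed as below. Definitions of predictive r² vary (some use the training-set mean as the reference), so the formula 1 - PRESS/SS about the test-set mean is an assumption about the one used in the study. The data are hypothetical.

```python
import math


def rms_error(y_true, y_pred):
    """Root-mean-square prediction error, in the units of y (e.g. log K_d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def predictive_r2(y_true, y_pred, reference_mean=None):
    """1 - PRESS / SS, with SS taken about `reference_mean` (defaults to the
    test-set mean; some authors use the training-set mean instead)."""
    mean = sum(y_true) / len(y_true) if reference_mean is None else reference_mean
    press = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - press / ss


# Hypothetical measured and predicted log K_d values for four test complexes.
measured = [4.0, 6.0, 8.0, 10.0]
predicted = [5.0, 6.0, 7.0, 10.0]
print(round(rms_error(measured, predicted), 3))  # 0.707
print(predictive_r2(measured, predicted))        # 0.9
```

Because affinities in the training data span 10 log units, an RMS error near 1.1 log units can coexist with a predictive r² above 0.7. The two statistics measure different things and are best reported together.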
15.
J Chem Inf Comput Sci ; 44(3): 871-81, 2004.
Article in English | MEDLINE | ID: mdl-15154752

ABSTRACT

Six docking programs (FlexX, GOLD, ICM, LigandFit, the Northwestern University version of DOCK, and QXP) were evaluated in terms of their ability to reproduce experimentally observed binding modes (poses) of small-molecule ligands to macromolecular targets. The accuracy of a pose was assessed in two ways. First, the RMS deviation of the predicted pose from the crystal structure was calculated. Second, the predicted pose was compared with the experimentally observed one with regard to the presence of key interactions with the protein. The latter assessment is referred to as interactions-based accuracy classification (IBAC). In a number of cases, significant discrepancies were found between IBAC and RMSD-based classifications. Despite being more subjective, the IBAC proved to be a more meaningful measure of docking accuracy in all these cases.


Subject(s)
Crystallography, X-Ray/methods; Models, Molecular; Molecular Structure
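The two accuracy measures compared in this study can be sketched as follows. The RMSD function assumes both poses list the same atoms in the same order, in the receptor frame, with no superposition or symmetry correction. The IBAC in the paper is an expert, partly subjective classification. The set overlap of hypothetical interaction labels below is only a simplified automated proxy for it.

```python
import math


def pose_rmsd(predicted, reference):
    """RMSD (Å) between two poses given as matching lists of (x, y, z) atom
    coordinates in the receptor frame (no superposition, no symmetry handling)."""
    if len(predicted) != len(reference):
        raise ValueError("poses must list the same atoms in the same order")
    squared = sum((p[k] - r[k]) ** 2
                  for p, r in zip(predicted, reference) for k in range(3))
    return math.sqrt(squared / len(reference))


def interaction_recovery(predicted_contacts, reference_contacts):
    """Fraction of the reference pose's key interactions reproduced by the prediction."""
    reference = set(reference_contacts)
    return len(reference & set(predicted_contacts)) / len(reference)


reference_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
shifted_pose = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]  # whole ligand moved by 1 Å
print(pose_rmsd(shifted_pose, reference_pose))  # 1.0

# Hypothetical interaction labels: (type, protein residue, ligand atom).
ref_contacts = {("hbond", "residue_1", "N1"), ("hbond", "residue_2", "O3")}
pred_contacts = {("hbond", "residue_1", "N1"), ("hydrophobic", "residue_7", "C9")}
print(interaction_recovery(pred_contacts, ref_contacts))  # 0.5
```

A common convention treats poses within 2 Å RMSD of the crystal structure as correct. A pose can pass that test yet miss a key hydrogen bond, which is one way the two classifications can disagree.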