Results 1 - 20 of 20
1.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subject(s)
Benchmarking; Quantitative Structure-Activity Relationship; Biological Assay; Machine Learning
2.
J Cheminform ; 13(1): 96, 2021 Dec 07.
Article in English | MEDLINE | ID: mdl-34876230

ABSTRACT

With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal of this challenge is to have a realistic split of chemical structures (compounds) between training, validation and test set such that the performance on the test set is meaningful to infer the performance in a prospective application. This challenge is interesting and relevant in its own right, but it is even more complex in a federated machine learning approach, where multiple partners jointly train a model under privacy-preserving conditions in which chemical structures must not be shared between the participating parties. In this work we discuss three methods which provide a splitting of a data set and are applicable in a federated privacy-preserving setting, namely: (a) locality-sensitive hashing (LSH), (b) sphere exclusion clustering, and (c) scaffold-based binning (scaffold network). For evaluation of these splitting methods we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, and similarity distance between the test and training set compounds. The main findings of the paper are that (a) both sphere exclusion clustering and scaffold-based binning result in high-quality splitting of the data sets, and (b) in terms of compute cost, sphere exclusion clustering is very expensive in a federated privacy-preserving setting.
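
The idea of a split that every partner can compute locally, without revealing structures, can be illustrated with a minimal sketch. It is not the scaffold network method of the paper: it simply assumes that each partner has already reduced its compounds to scaffold strings and that all partners hold a common secret key (`SHARED_KEY`, `assign_fold` and `split_by_scaffold` are hypothetical names). A keyed hash of the scaffold then sends identical scaffolds to the same fold at every partner.

```python
import hashlib
import hmac

# Assumption: a secret agreed on by all partners; the value here is illustrative.
SHARED_KEY = b"example-shared-secret"


def assign_fold(scaffold: str, shared_key: bytes, n_folds: int = 5) -> int:
    """Map a scaffold string to a fold index using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(shared_key, scaffold.encode("utf-8"), hashlib.sha256).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a fold index.
    return int.from_bytes(digest[:8], "big") % n_folds


def split_by_scaffold(compound_scaffolds: dict, shared_key: bytes, n_folds: int = 5) -> dict:
    """Assign every compound to the fold of its scaffold."""
    return {
        compound_id: assign_fold(scaffold, shared_key, n_folds)
        for compound_id, scaffold in compound_scaffolds.items()
    }


# Two partners with private compound identifiers but one scaffold in common.
partner_a = {"a-001": "c1ccccc1", "a-002": "c1ccncc1"}
partner_b = {"b-001": "c1ccccc1"}
folds_a = split_by_scaffold(partner_a, SHARED_KEY)
folds_b = split_by_scaffold(partner_b, SHARED_KEY)
```

Because the fold depends only on the scaffold and the key, the benzene-scaffold compounds of both partners land in the same fold although neither partner sees the other's data.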

3.
Molecules ; 26(22), 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34834051

ABSTRACT

Machine learning models predicting the bioactivity of chemical compounds belong nowadays to the standard tools of cheminformaticians and computational medicinal chemists. Multi-task and federated learning are promising machine learning approaches that allow privacy-preserving usage of large amounts of data from diverse sources, which is crucial for achieving good generalization and high-performance results. Using large, real world data sets from six pharmaceutical companies, here we investigate different strategies for averaging weighted task loss functions to train multi-task bioactivity classification models. The weighting strategies shall be suitable for federated learning and ensure that learning efforts are well distributed even if data are diverse. Comparing several approaches using weights that depend on the number of sub-tasks per assay, task size, and class balance, respectively, we find that a simple sub-task weighting approach leads to robust model performance for all investigated data sets and is especially suited for federated learning.


Subject(s)
Drug Discovery/methods; Machine Learning; Drug Design; Humans; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology
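
A sub-task weighting scheme of the kind mentioned in the abstract above can be sketched as follows. The abstract does not give the formula, so this is an assumption: each sub-task (for example, one activity threshold of an assay) receives the weight 1 / (number of sub-tasks of its assay), so that every assay contributes equally to the averaged loss. The function names and the task-to-assay mapping are hypothetical.

```python
from collections import Counter


def subtask_weights(task_to_assay: dict) -> dict:
    """Weight each sub-task by the inverse of its assay's sub-task count."""
    counts = Counter(task_to_assay.values())
    return {task: 1.0 / counts[assay] for task, assay in task_to_assay.items()}


def weighted_mean_loss(task_losses: dict, weights: dict) -> float:
    """Average per-task losses using the given task weights."""
    total_weight = sum(weights[task] for task in task_losses)
    weighted_sum = sum(weights[task] * loss for task, loss in task_losses.items())
    return weighted_sum / total_weight


# Assay A is modeled with two thresholds (two sub-tasks), assay B with one.
task_to_assay = {"A_thr1": "A", "A_thr2": "A", "B_thr1": "B"}
weights = subtask_weights(task_to_assay)
loss = weighted_mean_loss({"A_thr1": 0.2, "A_thr2": 0.4, "B_thr1": 0.6}, weights)
```

With these numbers the two sub-tasks of assay A share one unit of weight, so the mean is (0.5 * 0.2 + 0.5 * 0.4 + 1.0 * 0.6) / 2.0.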
4.
Mol Inform ; 39(12): e2000009, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32347666

ABSTRACT

Generative Topographic Mapping (GTM) can be efficiently used to visualize, analyze and model large chemical data. The GTM manifold needs to span the chemical space deemed relevant for a given problem. Therefore, the frame set (FS) of compounds used for the manifold construction must cover a given chemical space well. Intuitively, the FS size must grow with the size and diversity of the target library. At the same time, GTM training can be very slow or even become technically impossible at FS sizes of the order of 10^5 compounds, which is a very small number compared to today's commercially accessible compounds and, especially, to the theoretically feasible molecules. In order to solve this problem, we propose a Parallel GTM algorithm based on the merging of "intermediate" manifolds constructed in parallel for different subsets of molecules. An ensemble of these subsets forms an FS for the "final" manifold. In order to assess the efficiency of the new algorithm, 80 GTMs were built on FSs of different sizes ranging from 10 to 1.8 M compounds selected from the ChEMBL database. Each GTM was challenged to build classification models for up to 712 biological activities (depending on the FS size). With the novel parallel GTM procedure, we could thus cover the entire spectrum of possible FS sizes, whereas previous studies were forced to rely on the working hypothesis that FS sizes of a few thousand compounds are sufficient to describe the ChEMBL chemical space. In fact, this study formally proves this to be true: an FS containing only 5000 randomly picked compounds is sufficient to represent the entire ChEMBL collection (1.8 M molecules), in the sense that a further increase of FS compound numbers has no beneficial impact on the predictive propensity of the above-mentioned 712 activity classification models. Parallel GTM may, however, be required to generate maps based on very large FSs, which might improve chemical space cartography of big commercial and virtual libraries approaching billions of compounds.


Subject(s)
Algorithms; Big Data; Benchmarking; Databases, Chemical; Entropy
5.
J Comput Aided Mol Des ; 34(7): 805-815, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31407224

ABSTRACT

Generative topographic mapping was used to investigate the possibility to diversify the in-house compounds collection of Boehringer Ingelheim (BI). For this purpose, a 2D map covering the relevant chemical space was trained, and the BI compound library was compared to the Aldrich-Market Select (AMS) database of more than 8M purchasable compounds. In order to discover new (sub)structures, the "AutoZoom" tool was developed and applied in order to analyze chemotypes of molecules residing in heavily populated zones of a map and to extract the corresponding maximum common substructures. A set of 401K new structures from the AMS database was retrieved and checked for drug-likeness and biological activity.


Subject(s)
Drug Discovery/methods; Small Molecule Libraries; Algorithms; Computer-Aided Design/statistics & numerical data; Databases, Chemical/statistics & numerical data; Databases, Pharmaceutical/statistics & numerical data; Drug Design; Drug Development/statistics & numerical data; Drug Discovery/statistics & numerical data; Humans; Molecular Structure; Software; User-Computer Interface
6.
Front Pharmacol ; 10: 1303, 2019.
Article in English | MEDLINE | ID: mdl-31749705

ABSTRACT

In recent years, the development of high-throughput screening (HTS) technologies and their establishment in an industrialized environment have given scientists the possibility to test millions of molecules and profile them against a multitude of biological targets in a short period of time, generating data at a much faster pace and with a higher quality than before. Besides the structure-activity data from traditional bioassays, more complex assays such as transcriptomics profiling or imaging have also been established as routine profiling experiments thanks to the advancement of Next Generation Sequencing or automated microscopy technologies. In industrial pharmaceutical research, these technologies are typically established in conjunction with automated platforms in order to enable efficient handling of screening collections of thousands to millions of compounds. To exploit the ever-growing amount of data that are generated by these approaches, computational techniques are constantly evolving. In this regard, artificial intelligence technologies such as deep learning and machine learning methods play a key role in the cheminformatics and bio-image analytics fields to address activity prediction, scaffold hopping, de novo molecule design, reaction/retrosynthesis predictions, or high content screening analysis. Herein we summarize the current state of analyzing large-scale compound data in industrial pharmaceutical research and describe the impact it has had on the drug discovery process over the last two decades, with a specific focus on deep-learning technologies.

7.
J Comput Aided Mol Des ; 33(3): 331-343, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30739238

ABSTRACT

The previously reported procedure to generate "universal" Generative Topographic Maps (GTMs) of the drug-like chemical space is in practice a multi-task learning process, in which both operational GTM parameters (example: map grid size) and hyperparameters (key example: the molecular descriptor space to be used) are chosen by an evolutionary process in order to fit/select "universal" GTM manifolds. After selection (a one-time task aimed at optimizing the compromise in terms of neighborhood behavior compliance, over a large pool of various biological targets), the manifolds are ready to provide "fit-free" predictive models for any further use. Using any structure-activity set (irrespective of whether the associated target served at the map fitting stage or not), the generation or "coloring" of a property landscape enables predicting the property for any external molecule, with zero additional fittable parameters involved. While previous works have signaled the excellent behavior of such models in aggressive three-fold cross-validation assessments of their predictive power, the present work explores their behavior in Virtual Screening (VS), here simulated using external DUD ligand and decoy series that are fully disjoint from the ChEMBL-extracted landscape coloring sets. Beyond the rather robust results of the universal GTM manifolds in this challenge, it could be shown that the descriptor spaces selected by the evolutionary multi-task learner were intrinsically able to serve as an excellent support for many other VS procedures, from parameter-free similarity searching, to local (target-specific) GTM models, to parameter-rich, nonlinear Random Forest and Neural Network approaches.


Subject(s)
Models, Molecular; Proteins/chemistry; Databases, Protein; Ligands; Neural Networks, Computer; Protein Binding; Protein Conformation; Structure-Activity Relationship
8.
J Comput Aided Mol Des ; 29(9): 911-21, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26409840

ABSTRACT

Data-driven decision making is a key element of today's pharmaceutical research, including early drug discovery. It comprises questions like which target to pursue, which chemical series to pursue, which compound to make next, or which compound to select for advanced profiling and promotion to pre-clinical development. In this paper we exemplify how data integrity, i.e. the context in which data are generated and the auxiliary information provided for individual result records, can influence decision making in early lead discovery programs. In addition, we describe some approaches that we pursue at Boehringer Ingelheim to reduce the risk of being misled.


Subject(s)
Data Accuracy; Decision Making; Drug Discovery; High-Throughput Screening Assays/methods; Artifacts; Chemistry, Pharmaceutical/methods; Chemistry, Pharmaceutical/standards; Chemistry, Pharmaceutical/statistics & numerical data; Computer Simulation; Databases, Factual; Drug Industry/methods; Drug Industry/organization & administration; Drug Industry/standards; False Positive Reactions; High-Throughput Screening Assays/standards; Inhibitory Concentration 50; Magnetic Resonance Spectroscopy; Mass Spectrometry/standards
9.
J Mol Model ; 20(7): 2322, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24972798

ABSTRACT

Quantitative structure activity relationship (QSAR) modeling has been in use for several decades now. One branch of it, in silico ADMET, became more and more important since the late 1990s as studies indicated that poor pharmacokinetics and toxicity were important causes of costly late-stage failures in drug development. In this paper we describe some of the available methods and best practice for the different stages of the in silico model building process. We also describe some more recent developments, like automated model building and the prediction probability. Finally we will discuss the use of in silico ADMET for "big data" and the importance and possible further development of interpretable models.

10.
J Chem Inf Model ; 54(4): 1093-102, 2014 Apr 28.
Article in English | MEDLINE | ID: mdl-24593681

ABSTRACT

Within this work, a methodological extension of the matched molecular pair analysis is presented. The method is based on a pharmacophore retyping of the molecular graph and a consecutive matched molecular pair analysis. The features of the new methodology are exemplified using a large data set on CYP inhibition. We show that Fuzzy Matched Pairs can be used to extract activity and selectivity determining pharmacophoric features. Based on the fuzzy pharmacophore description, the method clusters molecular transfers and offers new opportunities for the combination of data from different sources, namely public and industry datasets.


Subject(s)
Fuzzy Logic; Algorithms; Drug Discovery
11.
Bioorg Med Chem ; 20(18): 5428-35, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22633121

ABSTRACT

In the last 10-15 years, many new technologies and approaches have been implemented in research in the pharmaceutical industry; these include high-throughput screening and combinatorial chemistry, which result in a rapidly growing amount of biological assay and structural data in the corporate databases. Efficient use of the data from this growing data mountain is a key success factor: 'provide as much knowledge as possible as early as possible and therefore enable research teams to make the best possible decision whenever this decision can be supported by stored data'. Here, an approach which started several years ago to obtain as much information as possible out of historical assay data stored in the corporate database is described. It will be shown how important careful preprocessing of the stored data is to enhance its information content. Different possibilities for accessing and analyzing the preconditioned data are in place. Some of them are described in the examples.


Subject(s)
Databases, Pharmaceutical; Drug Discovery; Organic Chemicals/pharmacology; Pharmaceutical Preparations/chemistry; Biological Assay; Dose-Response Relationship, Drug; Molecular Structure; Organic Chemicals/chemistry; Structure-Activity Relationship
12.
J Chem Inf Model ; 50(3): 404-14, 2010 Mar 22.
Article in English | MEDLINE | ID: mdl-20088498

ABSTRACT

Insolubility is a crucial issue in drug design because insoluble compounds are often measured to be inactive although they might be active if they were soluble. We provide and analyze various insolubility classification models based on a recently published data set and compounds measured in-house at Boehringer-Ingelheim. The 2D descriptor sets from pharmacophore fingerprints and MOE and the 3D descriptor sets from ParaSurf and VolSurf were examined in conjunction with support vector machines, Bayesian regularized neural networks, and random forests. We introduce a classifier-fusion strategy, called metaclassifier, which improves upon the best single prediction and at the same time avoids descriptor selection, a potential source of overfitting. The metaclassifier strategy is compared to the simpler fusion strategies of maximum vote and highest probability picking. A prediction accuracy of 72.6% on a three class model is achieved with the metaclassifier, with nearly perfect separation of soluble and insoluble compounds and prediction as good as our calculated maximum possible agreement with experiment.


Subject(s)
Neural Networks, Computer; Pharmaceutical Preparations/chemistry; Models, Chemical; Probability; Solubility
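
The two simpler fusion strategies named in the abstract above, maximum vote and highest probability picking, can be sketched as follows. The metaclassifier itself is not described in the abstract and is not reproduced here. The input layout (one list of class probabilities per base classifier) and the function names are assumptions made for illustration.

```python
from collections import Counter


def predicted_class(probabilities: list) -> int:
    """Index of the most probable class for one classifier."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


def maximum_vote(per_classifier_probs: list) -> int:
    """Each classifier votes for its top class; the most frequent class wins."""
    votes = Counter(predicted_class(probs) for probs in per_classifier_probs)
    return votes.most_common(1)[0][0]


def highest_probability(per_classifier_probs: list) -> int:
    """Take the class from the single most confident classifier."""
    most_confident = max(per_classifier_probs, key=max)
    return predicted_class(most_confident)


# Three classifiers, three classes (e.g. insoluble, moderately soluble, soluble).
probs = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.05, 0.05, 0.90],
]
vote_result = maximum_vote(probs)
confidence_result = highest_probability(probs)
```

In this example the two strategies disagree: two of three classifiers vote for class 0, while the single most confident prediction (0.90) is for class 2.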
13.
J Chem Inf Model ; 50(3): 429-36, 2010 Mar 22.
Article in English | MEDLINE | ID: mdl-20108914

ABSTRACT

Log P(OW), the negative logarithm of the octanol-water partition coefficient, is omnipresent in computational drug design. Here, we present a surface-integral model for calculating log P(OW). The model is based on local properties calculated using AM1 semiempirical molecular orbital theory. These are the molecular electrostatic potential (MEP), local ionization energy (IE(L)), local electron affinity (EA(L)), local hardness (HARD), local polarizability (POL), and the local field normal to the surface (FN). We have developed a new scheme to calculate a local hydrophobicity based on binning the range of local surface properties instead of using polynomial expansions of the base terms. The model has been trained using approximately 9500 compounds available from the literature. It was validated on approximately 1350 compounds from the literature and an in-house validation set of 768 compounds from Boehringer-Ingelheim. The model performs similarly to or slightly better than the best commercially available models. We also introduce a model based purely on conformationally rigid compounds that performs well for flexible compounds if the Boltzmann-weighted predictions for the different conformers are used. This is the first 3D QSPR model based on such a large database that is able to benefit from using conformational ensembles.


Subject(s)
Pharmaceutical Preparations/chemistry; Algorithms; Hydrophobic and Hydrophilic Interactions; Models, Chemical; Octanols/chemistry; Solubility; Water/chemistry
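
Boltzmann weighting of per-conformer predictions, as mentioned in the abstract above, can be sketched as follows. The abstract does not state the temperature, the energy units, or whether the average is taken over log P or over P, so these are assumptions: conformer energies in kcal/mol, T = 298.15 K, and averaging of the predicted values directly. `boltzmann_average` is a hypothetical name.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872  # kcal/(mol*K)


def boltzmann_average(predictions: list, energies_kcal: list, temperature: float = 298.15) -> float:
    """Average per-conformer predictions with weights proportional to exp(-E / RT)."""
    # Energies are shifted by the minimum so the largest factor is exactly 1.
    e_min = min(energies_kcal)
    rt = GAS_CONSTANT_KCAL * temperature
    factors = [math.exp(-(energy - e_min) / rt) for energy in energies_kcal]
    total = sum(factors)
    return sum(f * p for f, p in zip(factors, predictions)) / total


# Two conformers with equal energy contribute equally.
equal_energy = boltzmann_average([1.0, 3.0], [0.0, 0.0])
# A conformer 100 kcal/mol higher in energy contributes essentially nothing.
dominated = boltzmann_average([1.0, 3.0], [0.0, 100.0])
```

Shifting by the minimum energy does not change the weights but avoids numerical underflow when absolute energies are large.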
14.
ChemMedChem ; 4(9): 1529-36, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19588473

ABSTRACT

Herein, we describe a new dataset of kinetic aqueous solubilities determined by nephelometry for 711 druglike compounds. The solubilities are reported in twelve classes ranging from <2 µg/mL to >250 µg/mL. The measurements were designed to provide the appropriate data for applications in the early phases of drug discovery. Three-class classification models (insoluble, moderately soluble, soluble) were built using the random forest algorithm and their performance for this dataset was analyzed.


Subject(s)
Drug Discovery; Algorithms; Nephelometry and Turbidimetry; Organic Chemicals/chemistry; Organic Chemicals/classification; Solubility
15.
J Chem Inf Model ; 49(1): 28-34, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19105731

ABSTRACT

Multiple linear regression is a major tool in computational chemistry. Although it has been used for more than 30 years, it has only recently been noted within the cheminformatics community that the standard F-values used to assess the significance of the resulting models are inappropriate in situations where the variables included in a model are chosen from a large pool of descriptors, due to an effect known in the statistical literature as selection bias. We have used Monte Carlo simulations to estimate the critical F-values for many combinations of sample size (n), model size (p), and descriptor pool size (k), using stepwise regression, one of the methods most commonly used to derive linear models from large sets of molecular descriptors. The values of n, p, and k represent cases appropriate to contemporary cheminformatics data sets. A formula for general n, p, and k values has been developed from the numerical estimates that approximates the critical stepwise F-values at 90%, 95%, and 99% significance levels. This approximation reproduces both the original simulated values and an interpolation test set (within the range of the training values) with an R^2 value greater than 0.995. For an extrapolation test set of cases outside the range of the training set, the approximation produced an R^2 above 0.93.
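
The selection-bias effect described above can be reproduced with a small Monte Carlo sketch. It is deliberately simplified and is not the authors' procedure: it covers only a one-variable model (p = 1), where the "stepwise" choice reduces to picking the best of k random descriptors for a random response. The function names, trial count, and seed are assumptions.

```python
import random


def max_single_descriptor_f(n: int, k: int, rng: random.Random) -> float:
    """F-value of the best of k random descriptors fitted to a random response."""
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y_mean = sum(y) / n
    y_centered = [value - y_mean for value in y]
    syy = sum(value * value for value in y_centered)
    best_r2 = 0.0
    for _ in range(k):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_mean = sum(x) / n
        x_centered = [value - x_mean for value in x]
        sxx = sum(value * value for value in x_centered)
        sxy = sum(a * b for a, b in zip(x_centered, y_centered))
        best_r2 = max(best_r2, sxy * sxy / (sxx * syy))
    # F statistic of a one-variable linear model with n - 2 residual degrees of freedom.
    return best_r2 * (n - 2) / (1.0 - best_r2)


def critical_f(n: int, k: int, level: float = 0.95, trials: int = 300, seed: int = 0) -> float:
    """Empirical quantile of the selected model's F-value under the null hypothesis."""
    rng = random.Random(seed)
    values = sorted(max_single_descriptor_f(n, k, rng) for _ in range(trials))
    index = min(trials - 1, int(level * trials))
    return values[index]


f_no_selection = critical_f(n=20, k=1)
f_with_selection = critical_f(n=20, k=20)
```

With k = 1 there is no selection and the estimate approximates the tabulated F threshold; with k = 20 candidate descriptors the threshold needed for the same significance level is markedly higher, which is the effect the abstract describes.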

16.
ChemMedChem ; 3(2): 254-65, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18061919

ABSTRACT

hERG blockade is one of the major toxicological problems in lead structure optimization. Reliable ligand-based in silico models for predicting hERG blockade therefore have considerable potential for saving time and money, as patch-clamp measurements are very expensive and no crystal structures of the hERG-encoded channel are available. Herein we present a predictive QSAR model for hERG blockade that differentiates between specific and nonspecific binding. Specific binders are identified by preliminary pharmacophore scanning. In addition to descriptor-based models for the compounds selected as hitting one of two different pharmacophores, we also use a model for nonspecific binding that reproduces blocking properties of molecules that do not fit either of the two pharmacophores. PLS and SVR models based on interpretable quantum mechanically derived descriptors on a literature dataset of 113 molecules reach overall R^2 values between 0.60 and 0.70 for independent validation sets and R^2 values between 0.39 and 0.76 after partitioning according to the pharmacophore search for the test sets. Our findings suggest that hERG blockade may occur through different types of binding, so that several different models may be necessary to assess hERG toxicity.


Subject(s)
Ether-A-Go-Go Potassium Channels/antagonists & inhibitors; Potassium Channel Blockers/pharmacology; Quantitative Structure-Activity Relationship; Crystallography, X-Ray; Ether-A-Go-Go Potassium Channels/chemistry; Humans; Inhibitory Concentration 50; Ligands; Models, Biological; Potassium Channel Blockers/chemistry; Protein Binding; Validation Studies as Topic
17.
J Comput Aided Mol Des ; 19(3): 189-201, 2005 Mar.
Article in English | MEDLINE | ID: mdl-16059671

ABSTRACT

The cytochrome P450 (CYP) enzyme superfamily plays a major role in the metabolism of commercially available drugs. Inhibition of these enzymes by a drug may result in a plasma level increase of another drug, thus leading to unwanted drug-drug interactions when two or more drugs are coadministered. Therefore, fast and reliable in silico methods predicting CYP inhibition from calculated molecular properties are an important tool which can be applied to assess both already synthesized and virtual compounds. We have studied the performance of support vector machines (SVMs) to classify compounds according to their potency to inhibit CYP3A4. The data set for model generation consists of more than 1300 structurally diverse drug-like research molecules which were divided into training and test sets. The predictive power of SVMs crucially depends on a careful selection of parameters specifying the kernel function and the penalty for misclassifications. In this study we have investigated a procedure to identify a valid set of SVM parameters which is based on a sampling of the parameter space on a regular grid. From this set of parameters, either single SVMs or SVM committees were trained to distinguish between strong and weak inhibitors or to achieve a more realistic three-class assignment, with one class representing medium inhibitors. This workflow was studied for several kernel functions and descriptor sets. All SVM models performed significantly better than PLS-DA models which were generated from the corresponding descriptor sets. As a very promising result, simple two-dimensional (2D) descriptors yield a three-class model which correctly classifies more than 70% of the test set. Our work illustrates that SVMs used in combination with simple 2D descriptors provide a very effective and reliable tool which allows a fast assessment of CYP3A4 inhibition potency in an early in silico filtering process.


Subject(s)
Cytochrome P-450 Enzyme Inhibitors; Enzyme Inhibitors/pharmacology; Computer-Aided Design; Cytochrome P-450 CYP3A; Drug Design; Enzyme Inhibitors/chemistry; Humans; Multivariate Analysis; Reproducibility of Results
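
Sampling a parameter space on a regular grid, as described in the abstract above, can be sketched without committing to a particular SVM library. The sketch assumes a two-parameter space (misclassification penalty C and an RBF-style kernel width gamma) sampled on a base-2 logarithmic grid, and a user-supplied scoring function such as a cross-validated accuracy. `toy_score` is a stand-in with a known optimum, not a real model evaluation, and all names are hypothetical.

```python
import itertools
import math


def regular_grid(log2_c_values, log2_gamma_values) -> list:
    """All (C, gamma) pairs on a regular grid in log2 space."""
    return [
        (2.0 ** log2_c, 2.0 ** log2_gamma)
        for log2_c, log2_gamma in itertools.product(log2_c_values, log2_gamma_values)
    ]


def select_parameters(grid: list, score_fn, top_n: int = 3) -> list:
    """Return the top_n parameter pairs ranked by score, best first."""
    return sorted(grid, key=lambda params: score_fn(*params), reverse=True)[:top_n]


def toy_score(c: float, gamma: float) -> float:
    """Stand-in for a validation score; peaks at C = 4, gamma = 0.25."""
    return -((math.log2(c) - 2.0) ** 2 + (math.log2(gamma) + 2.0) ** 2)


grid = regular_grid(range(-2, 5), range(-4, 1))
best_three = select_parameters(grid, toy_score, top_n=3)
```

Keeping several top-ranked parameter sets rather than one mirrors the option of training a committee of models instead of a single model.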
18.
Eur J Pharm Sci ; 24(5): 451-63, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15784335

ABSTRACT

In the early phases of current pharmaceutical research projects, huge numbers of compounds are tested for their biological activity with respect to a certain target by experimental or virtual screening campaigns. To reduce the attrition rate in later stages of a project, other relevant properties such as physicochemical and ADMET (absorption, distribution, metabolism, excretion, toxicity) properties should be assessed as early as possible in lead discovery and optimization. The present study describes the development of in silico models to predict the inhibition of human cytochrome P450 3A4 (CYP3A4) from calculated molecular descriptors. The models were trained and validated using a set of 967 structurally diverse drug-like research compounds with an experimentally determined CYP3A4 inhibition potency (IC50 value), which was carefully split into a training and a test set. For classification models, the data sets were further subdivided into strong, medium, and weak inhibitors. Different descriptor sets were used to cover various aspects of molecular properties, including properties derived from the 2D structure, the interaction of the molecule with its environment, and properties derived from quantum-mechanical calculations. The descriptors were related to the CYP3A4 inhibition potency by multivariate data analysis methods such as partial least-squares projection to latent structures (PLS), PLS discriminant analysis (PLS-DA), and soft independent class modeling (SIMCA). The squared correlation between experimental and predicted IC50 values of the previously unseen test set compounds was Q^2_ext = 0.6 for the best PLS models, corresponding to a root mean squared error (RMSE) of 0.45 (logarithm of IC50). The best PLS-DA models were able to correctly classify more than 60% of the test set compounds, whereas almost no strong inhibitors were wrongly classified as weak inhibitors and vice versa. Furthermore, relevant molecular properties were identified which are closely related to the CYP3A4 inhibition potency of a compound. The results presented here are very encouraging since our models could, for instance, serve to flag problematic compounds or to guide further synthesis efforts.


Subject(s)
Cytochrome P-450 Enzyme Inhibitors; Enzyme Inhibitors/chemistry; Cytochrome P-450 CYP3A; Models, Molecular; Multivariate Analysis
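
The two test-set statistics quoted in the abstract above can be computed as in the following sketch. It takes the abstract's wording literally and treats the external Q^2 as the squared Pearson correlation between experimental and predicted values; other definitions of Q^2 exist, so this reading is an assumption, as are the function names and the toy numbers.

```python
import math


def squared_correlation(observed: list, predicted: list) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return cov * cov / (var_obs * var_pred)


def rmse(observed: list, predicted: list) -> float:
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy log IC50 values: predictions are perfectly correlated but offset by 0.5.
observed_values = [1.0, 2.0, 3.0, 4.0]
predicted_values = [1.5, 2.5, 3.5, 4.5]
q2 = squared_correlation(observed_values, predicted_values)
error = rmse(observed_values, predicted_values)
```

The toy data show why both numbers are reported: a constant offset leaves the squared correlation at 1.0 while the RMSE is 0.5.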
19.
J Antibiot (Tokyo) ; 55(5): 480-94, 2002 May.
Article in English | MEDLINE | ID: mdl-12139017

ABSTRACT

Hormone-sensitive lipase (HSL) is a key enzyme of lipid metabolism and its control is therefore a target in the treatment of diabetes mellitus. Cultures of the Streptomyces species DSM 13381 have been shown to potently inhibit HSL. Ten inhibitors of HSL, termed cyclipostins, have been isolated from the mycelium of this microorganism and a further nine related compounds detected. Their structures were characterized by 2-D NMR experiments and by mass spectrometry and were found to comprise neutral cyclic enol phosphate esters with an additional γ-lactone ring. On account of their ester-bound fatty alcohol side chain, the cyclipostins have physicochemical properties similar to those of triglycerides. The outstanding characteristic of the cyclipostins is their strong anti-HSL activity, with IC50 values in the nanomolar range.


Subject(s)
Enzyme Inhibitors/isolation & purification; Sterol Esterase/antagonists & inhibitors; Streptomyces/metabolism; Adipocytes/drug effects; Animals; Chromatography, High Pressure Liquid; Enzyme Inhibitors/chemistry; Enzyme Inhibitors/pharmacology; Lipolysis/drug effects; Magnetic Resonance Spectroscopy; Male; Mass Spectrometry; Molecular Structure; Rats; Rats, Sprague-Dawley; Streptomyces/growth & development
20.
J Med Chem ; 45(16): 3345-55, 2002 Aug 01.
Article in English | MEDLINE | ID: mdl-12139446

ABSTRACT

We have investigated techniques for distinguishing between drugs and nondrugs using a set of molecular descriptors derived from semiempirical molecular orbital (AM1) calculations. The "drug" data set of 2105 compounds was derived from the World Drug Index (WDI) using a procedure designed to select real drugs. The "nondrug" data set was the Maybridge database. We have first investigated the dimensionality of physical properties space based on a set of 26 descriptors that we have used successfully to build absorption, distribution, metabolism, and excretion-related quantitative structure-property relationship models. We discuss the general nature of the descriptors for physical property space and the ability of these descriptors to distinguish between drugs and nondrugs. The third most significant principal component of this set of descriptors serves as a useful numerical index of drug-likeness, but no others are able to distinguish between drugs and nondrugs. We have therefore extended our set of descriptors to a total of 66 and have used recursive partitioning to identify the descriptors that can distinguish between drugs and nondrugs. This procedure pointed to two of the descriptors that play an important role in the principal component found above and one more from the set of 40 extra descriptors. These three descriptors were then used to train a Kohonen artificial neural net for the entire Maybridge data set. Projecting the drug database onto the map obtained resulted in a clear distinction not only between drugs and nondrugs but also, for instance, between hormones and other drugs. Projection of 42 131 compounds from the WDI onto the Kohonen map also revealed pronounced clustering in the regions of the map assigned as druglike.


Subject(s)
Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/classification; Chemical Phenomena; Chemistry, Physical; Databases, Factual; Drug Design; Neural Networks, Computer; Quantum Theory