Results 1 - 20 of 25
1.
J Chem Inf Model ; 64(12): 4687-4699, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38822782

ABSTRACT

The design of compounds during hit-to-lead often seeks to explore a vector from a core scaffold to form additional interactions with the target protein. A rational approach to this is to probe the region of a protein accessed by a vector with a systematic placement of pharmacophore features in 3D, particularly when bound structures are not available. Herein, we present bbSelect, an open-source tool built to map the placements of pharmacophore features in 3D Euclidean space from a library of R-groups, employing partitioning to drive a diverse and systematic selection to a user-defined size. An evaluation of bbSelect against established methods exemplified the superiority of bbSelect in its ability to perform diverse selections, achieving high levels of pharmacophore feature placement coverage with selection sizes of a fraction of the total set and without the introduction of excess complexity. bbSelect also reports visualizations and rationale to enable users to understand and interrogate results. This provides a tool for the drug discovery community to guide their hit-to-lead activities.


Subject(s)
Drug Discovery , Software , Drug Discovery/methods , Models, Molecular , Drug Design , Proteins/chemistry , Pharmacophore
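
The idea in entry 1 of partitioning 3D space and selecting R-groups for coverage of pharmacophore feature placements can be illustrated with a minimal sketch. This is not bbSelect's actual algorithm or API: the cubic grid, the greedy coverage rule, the function names, and the toy coordinates are all assumptions made for illustration.

```python
from math import floor

def cell_of(point, cell_size=2.0):
    """Map a 3D point to the integer index of the cubic grid cell containing it."""
    return tuple(floor(c / cell_size) for c in point)

def cells_covered(feature_points, cell_size=2.0):
    """Set of (feature_type, cell) pairs occupied by one R-group."""
    return {(ftype, cell_of(xyz, cell_size)) for ftype, xyz in feature_points}

def greedy_select(library, n_select, cell_size=2.0):
    """Pick up to n_select R-groups, each time taking the one that adds the most new cells."""
    coverage = {name: cells_covered(pts, cell_size) for name, pts in library.items()}
    chosen, covered = [], set()
    while len(chosen) < n_select:
        remaining = [name for name in coverage if name not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda name: len(coverage[name] - covered))
        if not coverage[best] - covered:
            break  # no remaining R-group places a feature in a new cell
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

# Hypothetical library: R-group name -> list of (feature type, xyz placement).
library = {
    "rg_a": [("donor", (1.0, 0.5, 0.2)), ("aromatic", (3.1, 0.0, 0.0))],
    "rg_b": [("donor", (1.2, 0.4, 0.1))],
    "rg_c": [("acceptor", (5.0, 2.5, 0.0)), ("aromatic", (3.3, 0.2, 0.1))],
    "rg_d": [("acceptor", (5.1, 2.6, 0.3))],
}
chosen, covered = greedy_select(library, n_select=2)
print(chosen, len(covered))
```

In this toy library, two of the four R-groups are enough to occupy every feature cell that the full set reaches, which mirrors the abstract's point about high coverage from a small selection.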
2.
J Chem Inf Model ; 63(4): 1099-1113, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36758178

ABSTRACT

Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.


Subject(s)
Deep Learning , Solubility , Neural Networks, Computer , Machine Learning , Algorithms
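
The RMSE figure quoted in entry 2 is a root-mean-square error over log-scale solubilities. The sketch below shows how such a value is computed; the function name and the four data points are hypothetical and are not taken from the challenge data.

```python
from math import sqrt

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))

# Hypothetical log-scale solubilities (log units), for illustration only.
observed_log_s = [-2.0, -3.5, -4.0, -5.5]
predicted_log_s = [-3.0, -3.5, -3.0, -5.5]

error = rmse(predicted_log_s, observed_log_s)
print(f"RMSE = {error:.2f} log units")

# An error on a log10 scale corresponds to a multiplicative factor in solubility.
fold_error_at_reported_rmse = 10 ** 0.86
print(f"0.86 log units is about a {fold_error_at_reported_rmse:.1f}-fold error")
```

Because the error is on a log10 scale, the reported 0.86 log units corresponds to predictions that are typically off by roughly a factor of seven in solubility.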
3.
J Chem Inf Model ; 62(6): 1458-1470, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35258972

ABSTRACT

Accurate and rapid prediction of the binding affinity of a compound to a target is one of the ultimate goals of computer-aided drug design. Alchemical approaches to free energy estimation follow the path from an initial state of the system to the final state through alchemical changes of the energy function during a molecular dynamics simulation. Herein, we explore the accuracy and efficiency of two such techniques: relative free energy perturbation (FEP) and multisite lambda dynamics (MSλD). These are applied to a series of inhibitors for the bromodomain-containing protein 4 (BRD4). We demonstrate a procedure for obtaining accurate relative binding free energies using MSλD when dealing with a change in the net charge of the ligand. This resulted in an impressive comparison with experiment, with an average difference of 0.4 ± 0.4 kcal mol⁻¹. In a benchmarking study for the relative FEP calculations, we found that using 20 lambda windows with 0.5 ns of equilibration and 1 ns of data collection for each window gave the optimal compromise between accuracy and speed. Overall, relative FEP and MSλD predicted binding free energies with comparable accuracy, an average of 0.6 kcal mol⁻¹ for each method. However, MSλD makes predictions for a larger molecular space over a much shorter time scale than relative FEP, with MSλD requiring a factor of 18 times less simulation time for the entire molecule space.


Subject(s)
Nuclear Proteins , Transcription Factors , Entropy , Ligands , Molecular Dynamics Simulation , Protein Binding , Thermodynamics
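
The relative FEP protocol quoted in entry 3 implies a fixed amount of sampling per set of lambda windows, which the short calculation below works through. The window count and per-window times come from the abstract; the assumption of two legs per perturbation (bound and unbound) is ours and is not stated in the abstract.

```python
def fep_sampling_ns(n_windows, equilibration_ns, production_ns):
    """Total simulated time for one set of lambda windows, in nanoseconds."""
    return n_windows * (equilibration_ns + production_ns)

# Values from the abstract: 20 windows, 0.5 ns equilibration, 1 ns data collection.
per_leg_ns = fep_sampling_ns(n_windows=20, equilibration_ns=0.5, production_ns=1.0)

# Assumption (not from the abstract): each perturbation is run in two legs.
ASSUMED_LEGS = 2
per_perturbation_ns = ASSUMED_LEGS * per_leg_ns

print(f"{per_leg_ns} ns per leg, {per_perturbation_ns} ns per perturbation")
```

Under these assumptions each leg needs 30 ns of simulation, so the cost of relative FEP grows linearly with the number of ligand pairs, which is the context for the 18-fold saving reported for MSλD.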
4.
Org Biomol Chem ; 19(25): 5632-5641, 2021 06 30.
Article in English | MEDLINE | ID: mdl-34105560

ABSTRACT

The bromodomain-containing protein 4 (BRD4), a member of the bromodomain and extra-terminal domain (BET) family, plays a key role in several diseases, especially cancers. With increased interest in BRD4 as a therapeutic target, many X-ray crystal structures of the protein in complex with small molecule inhibitors have become publicly available over the past decade. In this study, we use this structural information to investigate the conformations of the first bromodomain (BD1) of BRD4. Structural alignment of 297 BRD4-BD1 complexes shows a high level of similarity between the structures of BRD4-BD1, regardless of the bound ligand. We employ WONKA, a tool for detailed analyses of protein binding sites, to compare the active sites of over 100 of these crystal structures. The positions of key binding site residues show a high level of conformational similarity, with the exception of Trp81. A focused analysis of the highly conserved water network in the binding site of BRD4-BD1 is performed to identify the positions of these water molecules across the crystal structures. The importance of the water network is illustrated using molecular docking and absolute free energy perturbation simulations. 82% of the ligand poses were better predicted when including water molecules as part of the receptor. Our analysis provides guidance for the design of new BRD4-BD1 inhibitors and the selection of the best structure of BRD4-BD1 to use in structure-based drug design, an important approach for faster and more cost-efficient lead discovery.


Subject(s)
Cell Cycle Proteins , Transcription Factors
5.
J Med Chem ; 63(20): 11964-11971, 2020 10 22.
Article in English | MEDLINE | ID: mdl-32955254

ABSTRACT

Machine learning approaches promise to accelerate and improve success rates in medicinal chemistry programs by more effectively leveraging available data to guide molecular design. A key step of an automated computational design algorithm is molecule generation, where the machine is required to design high-quality, drug-like molecules within the appropriate chemical space. Many algorithms have been proposed for molecular generation; however, a challenge is how to assess the validity of the resulting molecules. Here, we report three Turing-inspired tests designed to evaluate the performance of molecular generators. Profound differences were observed between the performance of molecule generators in these tests, highlighting the importance of selecting the appropriate design algorithms for specific circumstances. One molecule generator, based on matched molecular pairs, performed excellently against all tests and thus provides a valuable component for machine-driven medicinal chemistry design workflows.


Subject(s)
Algorithms , Machine Learning , Chemistry, Pharmaceutical , Drug Design , Humans , Molecular Structure
6.
J Chem Inf Model ; 60(12): 5699-5713, 2020 12 28.
Article in English | MEDLINE | ID: mdl-32659085

ABSTRACT

Deep learning approaches have become popular in recent years in the field of de novo molecular design. While a variety of different methods are available, it is still a challenge to assess and compare their performance. A particularly promising approach for automated drug design is to use recurrent neural networks (RNNs) as SMILES generators and train them with the learning procedure called "transfer learning". This involves first training the initial model on a large generic data set of molecules to learn the general syntax of SMILES, followed by fine-tuning on a smaller set of molecules, coming from, e.g., a lead optimization program. To create a well-performing transfer learning application which can be automated, it is important to understand how the size of the second data set affects the training process. In addition, extensive postfiltering using similarity metrics of the molecules generated after transfer learning should be avoided, as it can introduce new biases toward the selection of drug candidates. Here, we present results from the application of a gated recurrent unit cell (GRU)-RNN to transfer learning on data sets of varying sizes and complexity. Analysis of the results has allowed us to provide some general guidelines for transfer learning. In particular, we show that data set sizes containing at least 190 molecules are needed for effective GRU-RNN-based molecular generation using transfer learning. The methods presented here should be applicable generally to the benchmarking of other deep learning methodologies for molecule generation.


Subject(s)
Drug Design , Neural Networks, Computer , Machine Learning
7.
J Chem Inf Model ; 59(3): 1136-1146, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30525594

ABSTRACT

A key component of automated molecular design is the generation of compound ideas for subsequent filtering and assessment. Recently, deep learning approaches have been explored as alternatives to traditional de novo molecular design techniques. Deep learning algorithms rely on learning from large pools of molecules represented as molecular graphs (generally SMILES), and several approaches can be used to tailor the generated molecules to defined regions of chemical space. Cheminformatics has developed alternative higher-level representations that capture the key properties of a set of molecules, and it would be of interest to understand whether such representations can be used to constrain the output of molecule generation algorithms. In this work we explore the use of one such representation, the Reduced Graph, as a definition of target chemical space for a deep learning molecule generator. The Reduced Graph replaces functional groups with superatoms representing the pharmacophoric features. Assigning these superatoms to specific nonorganic element types allows the Reduced Graph to be represented as a valid SMILES string. The mapping from standard SMILES to Reduced Graph SMILES is well-defined; however, the inverse is not, and this presents a particular challenge. Here we present the results of a novel seq-to-seq approach to molecule generation, where the one-to-many mapping of Reduced Graph to SMILES is learned on a large training set. This training needs to be performed only once. In a subsequent step, this model can be used to generate arbitrary numbers of compounds that have the same Reduced Graph as any input molecule. Through analysis of data sets in ChEMBL we show that the approach generates valid molecules and can extrapolate to Reduced Graphs unseen in the training set. The method offers an alternative deep learning approach to molecule generation that does not rely on transfer learning, latent space generation, or adversarial networks and is applicable to scaffold hopping and other cheminformatics applications in drug discovery.


Subject(s)
Deep Learning , Pharmaceutical Preparations/chemistry , Cheminformatics , Databases, Pharmaceutical , Drug Design , Models, Molecular , Molecular Structure
8.
SLAS Discov ; 23(6): 532-545, 2018 07.
Article in English | MEDLINE | ID: mdl-29699447

ABSTRACT

High-throughput screening (HTS) hits include compounds with undesirable properties. Many filters have been described to identify such hits. Notably, pan-assay interference compounds (PAINS) has been adopted by the community as the standard term to refer to such filters, and very useful guidelines have been adopted by the American Chemical Society (ACS) and subsequently triggered a healthy scientific debate about the pitfalls of draconian use of filters. Using an inhibitory frequency index, we have analyzed in detail the promiscuity profile of the whole GlaxoSmithKline (GSK) HTS collection comprising more than 2 million unique compounds that have been tested in hundreds of screening assays. We provide a comprehensive analysis of many previously published filters and newly described classes of nuisance structures that may serve as a useful source of empirical information to guide the design or growth of HTS collections and hit triaging strategies.


Subject(s)
Drug Discovery/methods , High-Throughput Screening Assays/methods , Small Molecule Libraries/chemistry , Biological Assay/methods
9.
J Med Chem ; 59(18): 8189-206, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-27124799

ABSTRACT

Fragment-based drug discovery (FBDD) is well suited for discovering both drug leads and chemical probes of protein function; it can cover broad swaths of chemical space and allows the use of creative chemistry. FBDD is widely implemented for lead discovery in industry but is sometimes used less systematically in academia. Design principles and implementation approaches for fragment libraries are continually evolving, and the lack of up-to-date guidance may prevent more effective application of FBDD in academia. This Perspective explores many of the theoretical, practical, and strategic considerations that occur within FBDD programs, including the optimal size, complexity, physicochemical profile, and shape profile of fragments in FBDD libraries, as well as compound storage, evaluation, and screening technologies. This compilation of industry experience in FBDD will hopefully be useful for those pursuing FBDD in academia.


Subject(s)
Drug Design , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Animals , Checkpoint Kinase 2/antagonists & inhibitors , HIV Integrase Inhibitors/chemistry , HIV Integrase Inhibitors/pharmacology , HSP90 Heat-Shock Proteins/antagonists & inhibitors , Humans , Matrix Metalloproteinase 12/metabolism , Matrix Metalloproteinase Inhibitors/chemistry , Matrix Metalloproteinase Inhibitors/pharmacology , Mitogen-Activated Protein Kinase 14/antagonists & inhibitors , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacology , Trypsin Inhibitors/chemistry , Trypsin Inhibitors/pharmacology
10.
J Med Chem ; 59(6): 2452-67, 2016 Mar 24.
Article in English | MEDLINE | ID: mdl-26938474

ABSTRACT

Inhibitors of mitochondrial branched chain aminotransferase (BCATm), identified using fragment screening, are described. This was carried out using a combination of STD-NMR, thermal melt (Tm), and biochemical assays to identify compounds that bound to BCATm, which were subsequently progressed to X-ray crystallography, where a number of exemplars showed significant diversity in their binding modes. The hits identified were supplemented by searching and screening of additional analogues, which enabled the gathering of further X-ray data where the original hits had not produced liganded structures. The fragment hits were optimized using structure-based design, with some transfer of information between series, which enabled the identification of ligand efficient lead molecules with micromolar levels of inhibition, cellular activity, and good solubility.


Subject(s)
Mitochondria/enzymology , Transaminases/antagonists & inhibitors , Adipocytes/drug effects , Adipocytes/enzymology , Crystallography, X-Ray , High-Throughput Screening Assays , Humans , Magnetic Resonance Spectroscopy , Models, Molecular , Peptide Fragments/chemistry , Peptide Fragments/pharmacology , Protein Binding , Structure-Activity Relationship
11.
Nat Rev Drug Discov ; 14(7): 475-86, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26091267

ABSTRACT

The pharmaceutical industry remains under huge pressure to address the high attrition rates in drug development. Attempts to reduce the number of efficacy- and safety-related failures by analysing possible links to the physicochemical properties of small-molecule drug candidates have been inconclusive because of the limited size of data sets from individual companies. Here, we describe the compilation and analysis of combined data on the attrition of drug candidates from AstraZeneca, Eli Lilly and Company, GlaxoSmithKline and Pfizer. The analysis reaffirms that control of physicochemical properties during compound optimization is beneficial in identifying compounds of candidate drug quality and indicates for the first time a link between the physicochemical properties of compounds and clinical failure due to safety issues. The results also suggest that further control of physicochemical properties is unlikely to have a significant effect on attrition rates and that additional work is required to address safety-related failures. Further cross-company collaborations will be crucial to future progress in this area.


Subject(s)
Drug Delivery Systems/methods , Drug Discovery/methods , Drug Industry/methods , Drugs, Investigational , Animals , Drug Delivery Systems/statistics & numerical data , Drug Delivery Systems/trends , Drug Discovery/statistics & numerical data , Drug Discovery/trends , Drug Evaluation, Preclinical/methods , Drug Evaluation, Preclinical/statistics & numerical data , Drug Evaluation, Preclinical/trends , Drug Industry/statistics & numerical data , Drug Industry/trends , Drugs, Investigational/administration & dosage , Humans , Statistics as Topic/methods , Statistics as Topic/trends
12.
J Med Chem ; 58(18): 7140-63, 2015 Sep 24.
Article in English | MEDLINE | ID: mdl-26090771

ABSTRACT

The hybridization of hits, identified by complementary fragment and high throughput screens, enabled the discovery of the first series of potent inhibitors of mitochondrial branched-chain aminotransferase (BCATm) based on a 2-benzylamino-pyrazolo[1,5-a]pyrimidinone-3-carbonitrile template. Structure-guided growth enabled rapid optimization of potency with maintenance of ligand efficiency, while the focus on physicochemical properties delivered compounds with excellent pharmacokinetic exposure that enabled a proof of concept experiment in mice. Oral administration of 2-((4-chloro-2,6-difluorobenzyl)amino)-7-oxo-5-propyl-4,7-dihydropyrazolo[1,5-a]pyrimidine-3-carbonitrile 61 significantly raised the circulating levels of the branched-chain amino acids leucine, isoleucine, and valine in this acute study.


Subject(s)
Mitochondrial Proteins/antagonists & inhibitors , Pyrazoles/chemistry , Pyrimidinones/chemistry , Transaminases/antagonists & inhibitors , Adipocytes/drug effects , Adipocytes/enzymology , Animals , Crystallography, X-Ray , Humans , Isoleucine/blood , Leucine/blood , Mice, Inbred BALB C , Mice, Inbred C57BL , Models, Molecular , Pyrazoles/chemical synthesis , Pyrazoles/pharmacology , Pyrimidinones/chemical synthesis , Pyrimidinones/pharmacology , Structure-Activity Relationship , Transaminases/chemistry , Valine/blood
13.
J Comput Aided Mol Des ; 27(4): 321-36, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23615761

ABSTRACT

We describe the QSAR Workbench, a system for the building and analysis of QSAR models. The system is built around the Pipeline Pilot workflow tool and provides access to a variety of model building algorithms for both continuous and categorical data. Traditionally, models are built on a one-by-one basis, and fully exploring the model space of algorithms and descriptor subsets is a time-consuming process. The QSAR Workbench provides a framework to allow for multiple models to be built over a number of modeling algorithms, descriptor combinations and data splits (training and test sets). Methods to analyze and compare models are provided, enabling the user to select the most appropriate model. The Workbench provides a consistent set of routines for data preparation and chemistry normalization that are also applied for predictions. The Workbench provides a large degree of automation with the ability to publish preconfigured model building workflows for a variety of problem domains, whilst providing experienced users full access to the underlying parameterization if required. Methods are provided to allow for publication of selected models as web services, thus providing integration with the chemistry desktop. We describe the design and implementation of the QSAR Workbench and demonstrate its utility through application to two public domain datasets.


Subject(s)
Drug Design , Models, Biological , Quantitative Structure-Activity Relationship , Algorithms , Databases, Pharmaceutical , Humans , Workflow
14.
ACS Med Chem Lett ; 2(1): 28-33, 2011 Jan 13.
Article in English | MEDLINE | ID: mdl-24900251

ABSTRACT

Traditional lead optimization projects involve long synthesis and testing cycles, favoring extensive structure-activity relationship (SAR) analysis and molecular design steps, in an attempt to limit the number of cycles that a project must run to optimize a development candidate. Microfluidic-based chemistry and biology platforms, with cycle times of minutes rather than weeks, lend themselves to unattended autonomous operation. The bottleneck in the lead optimization process is therefore shifted from synthesis or test to SAR analysis and design. As such, the way is open to an algorithm-directed process, without the need for detailed user data analysis. Here, we present results of two synthesis and screening experiments, undertaken using traditional methodology, to validate a genetic algorithm optimization process for future application to a microfluidic system. The algorithm has several novel features that are important for the intended application. For example, it is robust to missing data and can suggest compounds for retest to ensure reliability of optimization. The algorithm is first validated on a retrospective analysis of an in-house library embedded in a larger virtual array of presumed inactive compounds. In a second, prospective experiment with MMP-12 as the target protein, 140 compounds are submitted for synthesis over 10 cycles of optimization. Comparison is made to the results from the full combinatorial library that was synthesized manually and tested independently. The results show that compounds selected by the algorithm are heavily biased toward the more active regions of the library, while the algorithm is robust to both missing data (compounds where synthesis failed) and inactive compounds. This publication places the full combinatorial library and biological data into the public domain with the intention of advancing research into algorithm-directed lead optimization methods.

15.
Drug Discov Today ; 16(3-4): 164-71, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21129497

ABSTRACT

The impact of carboaromatic, heteroaromatic, carboaliphatic and heteroaliphatic ring counts and fused aromatic ring count on several developability measures (solubility, lipophilicity, protein binding, P450 inhibition and hERG binding) is the topic for this review article. Recent results indicate that increasing ring counts have detrimental effects on developability in the order carboaromatics ≫ heteroaromatics > carboaliphatics > heteroaliphatics, with heteroaliphatics exerting a beneficial effect in many cases. Increasing aromatic ring count exerts effects on several developability parameters that are lipophilicity- and size-independent, and fused aromatic systems have a beneficial effect relative to their nonfused counterparts. Increasing aromatic ring count has a detrimental effect on human bioavailability parameters, and heteroaromatic ring count (but not other ring counts) has increased over time in marketed oral drugs.


Subject(s)
Drug Design , Heterocyclic Compounds/chemistry , Hydrocarbons, Aromatic/chemistry , Pharmaceutical Preparations/chemistry , Administration, Oral , Heterocyclic Compounds/chemical synthesis , Humans , Hydrocarbons, Aromatic/chemical synthesis , Marketing/statistics & numerical data , Pharmaceutical Preparations/administration & dosage , Pharmaceutical Preparations/chemical synthesis , Pharmacokinetics , Solubility , Structure-Activity Relationship
16.
J Chem Inf Model ; 50(10): 1872-86, 2010 Oct 25.
Article in English | MEDLINE | ID: mdl-20873842

ABSTRACT

Previous studies of the analysis of molecular matched pairs (MMPs) have often assumed that the effect of a substructural transformation on a molecular property is independent of the context (i.e., the local structural environment in which that transformation occurs). Experiments with large sets of hERG, solubility, and lipophilicity data demonstrate that the inclusion of contextual information can enhance the predictive power of MMP analyses, with significant trends (both positive and negative) being identified that are not apparent when using conventional, context-independent approaches.


Subject(s)
Drug Design , Ether-A-Go-Go Potassium Channels/antagonists & inhibitors , Ether-A-Go-Go Potassium Channels/metabolism , Algorithms , Databases, Factual , Ether-A-Go-Go Potassium Channels/chemistry , Humans , Ligands , Lipids/chemistry , Molecular Structure , Solubility
17.
J Chem Inf Model ; 49(2): 195-208, 2009 Feb.
Article in English | MEDLINE | ID: mdl-19434823

ABSTRACT

Neighborhood behavior describes the extent to which small structural changes defined by a molecular descriptor are likely to lead to small property changes. This study evaluates two methods for the quantification of neighborhood behavior: the optimal diagonal method of Patterson et al. and the optimality criterion method of Horvath and Jeandenans. The methods are evaluated using twelve different types of fingerprint (both 2D and 3D) with screening data derived from several lead optimization projects at GlaxoSmithKline. The principal focus of the work is the design of chemical arrays during lead optimization, and the study hence considers not only biological activity but also important drug properties such as metabolic stability, permeability, and lipophilicity. Evidence is provided to suggest that the optimality criterion method may provide a better quantitative description of neighborhood behavior than the optimal diagonal method.


Subject(s)
Drug Design , Permeability
18.
J Chem Inf Model ; 48(8): 1543-57, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18630899

ABSTRACT

A new machine learning method is presented for extracting interpretable structure-activity relationships from screening data. The method is based on an evolutionary algorithm and reduced graphs and aims to evolve a reduced graph query (subgraph) that is present within the active compounds and absent from the inactives. The reduced graph representation enables heterogeneous compounds, such as those found in high-throughput screening data, to be captured in a single representation with the resulting query encoding structure-activity information in a form that is readily interpretable by a chemist. The application of the method is illustrated using data sets extracted from the well-known MDDR data set and GSK in-house screening data. Queries are evolved that are consistent with the known SARs, and they are also shown to be robust when applied to independent sets that were not used in training.


Subject(s)
Combinatorial Chemistry Techniques/methods , Algorithms , Chromosomes/genetics , Humans , Phenotype , Receptor, Serotonin, 5-HT1A/metabolism , Serotonin 5-HT1 Receptor Agonists , Structure-Activity Relationship
19.
J Chem Inf Model ; 48(8): 1558-70, 2008 Aug.
Article in English | MEDLINE | ID: mdl-18637673

ABSTRACT

A multiobjective evolutionary algorithm (MOEA) is described for evolving multiple structure-activity relationships (SARs). The SARs are encoded in easy-to-interpret reduced graph queries which describe features that are preferentially present in active compounds compared to inactives. The MOEA addresses a limitation associated with many machine learning methods; that is, the inherent tradeoff that exists in recall and precision which is usually handled by combining the two objectives into a single measure with a consequent loss of control. By simultaneously optimizing recall and precision, the MOEA generates a family of SARs that lie on the precision-recall (PR) curve. The user is then able to select a query with an appropriate balance in the two objectives: for example, a low recall-high precision query may be preferred when establishing the SAR, whereas a high recall-low precision query may be more appropriate in a virtual screening context. Each query on the PR curve aims at capturing the structure-activity information into a single representation, and each can be considered as an alternative (equally valid) solution. We then investigate combining individual queries into teams with the aim of capturing multiple SARs that may exist in a data set, for example, as is commonly seen in high-throughput screening data sets. Team formation is carried out iteratively as a postprocessing step following the evolution of the individual queries. The inclusion of uniqueness as a third objective within the MOEA provides an effective way of ensuring the queries are complementary in the active compounds they describe. Substantial improvements in both recall and precision are seen for some data sets. Furthermore, the resulting queries provide more detailed structure-activity information than is present in a single query.


Subject(s)
Models, Biological , Algorithms , Humans , Molecular Structure , Receptors, Serotonin, 5-HT1/metabolism , Serotonin 5-HT1 Receptor Agonists , Structure-Activity Relationship
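
The trade-off described in entry 19, where a family of queries lies along the precision-recall curve, can be illustrated by scoring candidate queries and keeping only the non-dominated ones. This is a minimal sketch of Pareto filtering, not the MOEA from the paper; the query names and the true-positive, false-positive, and false-negative counts are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pareto_front(scored):
    """Names of queries that no other query beats or ties on both objectives while beating on one."""
    front = []
    for name, (p, r) in scored.items():
        dominated = any(
            p2 >= p and r2 >= r and (p2 > p or r2 > r)
            for other, (p2, r2) in scored.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical (tp, fp, fn) counts for four candidate queries against 50 actives.
counts = {
    "q1": (10, 0, 40),   # high precision, low recall
    "q2": (30, 10, 20),  # balanced
    "q3": (45, 45, 5),   # low precision, high recall
    "q4": (20, 20, 30),  # worse than q2 on both objectives
}
scored = {name: precision_recall(*c) for name, c in counts.items()}
front = pareto_front(scored)
print(front)
```

A user could then pick from the surviving queries according to context, for example the high-precision query when establishing an SAR and the high-recall query for virtual screening, as the abstract suggests.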
20.
J Chem Inf Model ; 47(1): 219-27, 2007.
Article in English | MEDLINE | ID: mdl-17238267

ABSTRACT

We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.


Subject(s)
Models, Statistical , Quantitative Structure-Activity Relationship , Algorithms , Artificial Intelligence , Classification
...