1.
J Chem Inf Model ; 2024 Sep 24.
Article in English | MEDLINE | ID: mdl-39314089

ABSTRACT

We have analyzed 40 different databases ranging in size from a few thousand to nearly 100 million molecules, comprising a total of over 210 million structures, for their tautomeric conflicts. A tautomeric conflict is defined as an occurrence of two or more structures within a data set identified by the tautomeric rules applied as being tautomers of each other. We tested a total of 119 detailed tautomeric transform rules expressed as SMIRKS, out of which 79 yielded at least one conflict. These transformations include three types of tautomerism: prototropic, ring-chain, and valence tautomerism. The databases analyzed spanned a wide variety of types including large aggregating databases, drug collections, and structure collections based on experimental data. All databases analyzed showed intra-database tautomeric conflicts. The conflict rates as percentage of the database were typically in the few tenths of a percent range, which for the largest databases amounts to >100,000 cases per database.
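The conflict definition above can be illustrated with a minimal Python sketch. It assumes that some rule engine has already assigned each structure a tautomer-invariant key; the hand-written `TOY_TAUTOMER_KEY` table, the function names, and the choice to count one conflict "case" per group are illustrative assumptions, not the procedure or rule set used in the study.

```python
from collections import defaultdict

# Hypothetical lookup standing in for a real SMIRKS-based rule engine:
# structures that share a key are treated as tautomers of each other.
TOY_TAUTOMER_KEY = {
    "Oc1ccccn1": "pyridone-family",     # 2-hydroxypyridine
    "O=C1C=CC=CN1": "pyridone-family",  # 2-pyridone
    "CC(C)=O": "acetone-family",        # keto form
    "CC(O)=C": "acetone-family",        # enol form
    "CCO": "ethanol",
    "c1ccccc1": "benzene",
}


def find_conflicts(structures, key_of):
    """Return {key: sorted structures} for every key shared by 2+ structures."""
    groups = defaultdict(set)
    for structure in structures:
        groups[key_of(structure)].add(structure)
    return {key: sorted(members) for key, members in groups.items()
            if len(members) >= 2}


def conflict_rate_percent(structures, conflicts):
    """Conflict cases as a percentage of distinct structures (assumed metric)."""
    distinct = len(set(structures))
    return 100.0 * len(conflicts) / distinct if distinct else 0.0


database = list(TOY_TAUTOMER_KEY)
conflicts = find_conflicts(database, TOY_TAUTOMER_KEY.__getitem__)
rate = conflict_rate_percent(database, conflicts)
```

In this toy set two of the six structures' keys are shared, so two conflict cases are reported; real databases would need an actual tautomer enumeration step in place of the lookup table.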

2.
J Comput Aided Mol Des ; 38(1): 22, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38753096

ABSTRACT

Although the size of virtual libraries of synthesizable compounds is growing rapidly, we are still enumerating only tiny fractions of the drug-like chemical universe. Our capability to mine these newly generated libraries also lags their growth. That is why fragment-based approaches that utilize on-demand virtual combinatorial libraries are gaining popularity in drug discovery. These à la carte libraries utilize synthetic blocks found to be effective binders in parts of target protein pockets and a variety of reliable chemistries to connect them. There is, however, no data on the potential impact of the chemistries used for making on-demand libraries on the hit rates during virtual screening. There are also no rules to guide in the selection of these synthetic methods for production of custom libraries. We have used the SAVI (Synthetically Accessible Virtual Inventory) library, constructed using 53 reliable reaction types (transforms), to evaluate the impact of these chemistries on docking hit rates for 40 well-characterized protein pockets. The data shows that the virtual hit rates differ significantly for different chemistries with cross coupling reactions such as Sonogashira, Suzuki-Miyaura, Hiyama and Liebeskind-Srogl coupling producing the highest hit rates. Virtual hit rates appear to depend not only on the property of the formed chemical bond but also on the diversity of available building blocks and the scope of the reaction. The data identifies reactions that deserve wider use through increasing the number of corresponding building blocks and suggests the reactions that are more effective for pockets with certain physical and hydrogen bond-forming properties.


Subject(s)
Molecular Docking Simulation; Protein Binding; Proteins; Small Molecule Libraries; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Proteins/chemistry; Proteins/metabolism; Binding Sites; Drug Discovery/methods; Ligands; Drug Design; Humans
3.
J Am Chem Soc ; 144(11): 4925-4941, 2022 03 23.
Article in English | MEDLINE | ID: mdl-35282679

ABSTRACT

Germline antibodies, the initial set of antibodies produced by the immune system, are critical for host defense, and information about their binding properties can be useful for designing vaccines, understanding the origins of autoantibodies, and developing monoclonal antibodies. Numerous studies have found that germline antibodies are polyreactive with malleable, flexible binding pockets. While insightful, it remains unclear how broadly this model applies, as there are many families of antibodies that have not yet been studied. In addition, the methods used to obtain germline antibodies typically rely on assumptions and do not work well for many antibodies. Herein, we present a distinct approach for isolating germline antibodies that involves immunizing activation-induced cytidine deaminase (AID) knockout mice. This strategy amplifies antigen-specific B cells, but somatic hypermutation does not occur because AID is absent. Using synthetic haptens, glycoproteins, and whole cells, we obtained germline antibodies to an assortment of clinically important tumor-associated carbohydrate antigens, including Lewis Y, the Tn antigen, sialyl Lewis C, and Lewis X (CD15/SSEA-1). Through glycan microarray profiling and cell binding, we demonstrate that all but one of these germline antibodies had high selectivity for their glycan targets. Using molecular dynamics simulations, we provide insights into the structural basis of glycan recognition. The results have important implications for designing carbohydrate-based vaccines, developing anti-glycan monoclonal antibodies, and understanding antibody evolution within the immune system.


Subject(s)
Antibodies, Monoclonal; Antigens, Tumor-Associated, Carbohydrate; Animals; Antibodies, Monoclonal/chemistry; Biomarkers, Tumor; Carbohydrates; Germ Cells; Mice; Mice, Knockout; Polysaccharides/chemistry
4.
J Chem Inf Model ; 62(9): 2021-2034, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35421301

ABSTRACT

Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.


Subject(s)
Drug Discovery; Small Molecule Libraries; Drug Design; Drug Discovery/methods; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology
5.
J Chem Inf Model ; 61(2): 653-663, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33533614

ABSTRACT

Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.


Subject(s)
Deep Learning; Consensus; Databases, Factual; Neural Networks, Computer
6.
J Chem Inf Model ; 60(3): 1090-1100, 2020 03 23.
Article in English | MEDLINE | ID: mdl-32027495

ABSTRACT

We report a database of tautomeric structures that contains 2819 tautomeric tuples extracted from 171 publications. Each tautomeric entry has been annotated with experimental conditions reported in the respective publication, plus bibliographic details, structural identifiers (e.g., NCI/CADD identifiers FICTS, FICuS, uuuuu, and Standard InChI), and chemical information (e.g., SMILES, molecular weight). The majority of tautomeric tuples found were pairs; the remaining 10% were triples, quadruples, or quintuples, amounting to a total number of structures of 5977. The types of tautomerism were mainly prototropic tautomerism (79%), followed by ring-chain (13%) and valence tautomerism (8%). The experimental conditions reported in the publications included about 50 pure solvents and 9 solvent mixtures with 26 unique spectroscopic or nonspectroscopic methods. 1H and 13C NMR were the most frequently used methods. A total of 77 different tautomeric transform rules (SMIRKS) are covered by at least one example tuple in the database. This database is freely available as a spreadsheet at https://cactus.nci.nih.gov/download/tautomer/.


Subject(s)
Isomerism; Databases, Factual; Magnetic Resonance Spectroscopy
7.
J Chem Inf Model ; 60(7): 3336-3341, 2020 07 27.
Article in English | MEDLINE | ID: mdl-32539385

ABSTRACT

We have adopted and extended the CHMTRN language and used it for the knowledge base of a computer program to generate a large database of synthetically accessible, drug-like chemical structures, the Synthetically Accessible Virtual Inventory (SAVI) Database. CHMTRN is a powerful language originally developed in the LHASA (Logic and Heuristics Applied to Synthetic Analysis) project at Harvard University and used together with the chemical pattern description language, PATRAN, to describe chemical retro-reactions. The languages have proven to be useful beyond the design of retrosynthetic routes and have the potential for much wider use in chemistry; this paper describes CHMTRN and PATRAN as now reimplemented for the forward-synthetic SAVI project but able to describe both forward and retro-reactions.


Subject(s)
Combinatorial Chemistry Techniques; Software; Databases, Factual; Humans
8.
J Chem Inf Model ; 60(3): 1253-1275, 2020 03 23.
Article in English | MEDLINE | ID: mdl-32043883

ABSTRACT

We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring-chain) tautomerism, 21 for ring-chain tautomerism, and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules, covering the most well-known types of tautomerism such as keto-enol tautomerism, were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules' enumerated tautomer sets by InChI V.1.05, both in InChI's Standard and a Nonstandard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.


Subject(s)
Cheminformatics; Databases, Factual
9.
Molecules ; 25(23), 2020 Dec 02.
Article in English | MEDLINE | ID: mdl-33276504

ABSTRACT

Due to its antiangiogenic and anti-immunomodulatory activity, thalidomide continues to be of clinical interest despite its teratogenic actions, and efforts to synthesize safer, clinically active thalidomide analogs are continually underway. In this study, a cohort of 27 chemically diverse thalidomide analogs was evaluated for antiangiogenic activity in an ex vivo rat aorta ring assay. The protein cereblon has been identified as the target for thalidomide, and in silico pharmacophore analysis and molecular docking with a crystal structure of human cereblon were used to investigate the cereblon binding abilities of the thalidomide analogs. The results suggest that not all antiangiogenic thalidomide analogs can bind cereblon, and multiple targets and mechanisms of action may be involved.


Subject(s)
Adaptor Proteins, Signal Transducing/metabolism; Angiogenesis Inhibitors/pharmacology; Aorta/drug effects; Molecular Docking Simulation; Neovascularization, Physiologic/drug effects; Thalidomide/analogs & derivatives; Thalidomide/pharmacology; Ubiquitin-Protein Ligases/metabolism; Angiogenesis Inhibitors/chemistry; Animals; Computer Simulation; Humans; Male; Rats; Rats, Sprague-Dawley
10.
J Chem Inf Model ; 59(9): 3635-3644, 2019 09 23.
Article in English | MEDLINE | ID: mdl-31453694

ABSTRACT

A lot of high quality data on the biological activity of chemical compounds are required throughout the whole drug discovery process: from development of computational models of the structure-activity relationship to experimental testing of lead compounds and their validation in clinics. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Despite the existence of free and commercially available databases of biological activities of compounds, they usually lack unambiguous information about peculiarities of biological assays. On the other hand, scientific papers are the primary source of new data disclosed to the scientific community for the first time. In this study, we have developed and validated a data-mining approach for extraction of text fragments containing description of bioassays. We have used this approach to evaluate compounds and their biological activity reported in scientific publications. We have found that categorization of papers into relevant and irrelevant may be performed based on the machine-learning analysis of the abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the peculiarities of bioassays. We demonstrate the applicability of our approach to the comparison of the endpoint values of biological activity and cytotoxicity of reference compounds.


Subject(s)
Data Mining/methods; Drug Discovery/methods; Databases, Factual; HIV Infections/drug therapy; HIV Reverse Transcriptase/antagonists & inhibitors; HIV-1/drug effects; HIV-1/enzymology; Humans; PubMed; Reverse Transcriptase Inhibitors/pharmacology
11.
Molecules ; 25(1), 2019 Dec 25.
Article in English | MEDLINE | ID: mdl-31881687

ABSTRACT

Despite the achievements of antiretroviral therapy, discovery of new anti-HIV medicines remains an essential task because the existing drugs do not provide a complete cure for the infected patients, exhibit severe adverse effects, and lead to the appearance of resistant strains. To predict the interaction of drug-like compounds with multiple targets for HIV treatment, ligand-based drug design approach is widely applied. In this study, we evaluated the possibilities and limitations of (Q)SAR analysis aimed at the discovery of novel antiretroviral agents inhibiting the vital HIV enzymes. Local (Q)SAR models are based on the analysis of structure-activity relationships for molecules from the same chemical class, which significantly restrict their applicability domain. In contrast, global (Q)SAR models exploit data from heterogeneous sets of drug-like compounds, which allows their application to databases containing diverse structures. We compared the information for HIV-1 integrase, protease and reverse transcriptase inhibitors available in the EBI ChEMBL, NIAID HIV/OI/TB Therapeutics, and Clarivate Analytics Integrity databases as the sources for (Q)SAR training sets. Using the PASS and GUSAR software, we developed and validated a variety of (Q)SAR models, which can be further used for virtual screening of new antiretrovirals in the SAVI library. The developed models are implemented in the freely available web resource AntiHIV-Pred.


Subject(s)
Anti-HIV Agents/pharmacology; HIV-1/metabolism; Quantitative Structure-Activity Relationship; Viral Proteins/antagonists & inhibitors; Anti-HIV Agents/chemistry; Databases as Topic; HIV-1/drug effects; Humans; Inhibitory Concentration 50; Regression Analysis; Reproducibility of Results; Viral Proteins/metabolism
12.
J Mol Recognit ; 30(8), 2017 08.
Article in English | MEDLINE | ID: mdl-28233410

ABSTRACT

In this review, we address a fundamental question: What is the range of conformational energies seen in ligands in protein-ligand crystal structures? This value is important biophysically, for better understanding the protein-ligand binding process; and practically, for providing a parameter to be used in many computational drug design methods such as docking and pharmacophore searches. We synthesize a selection of previously reported conflicting results from computational studies of this issue and conclude that high ligand conformational energies really are present in some crystal structures. The main source of disagreement between different analyses appears to be due to divergent treatments of electrostatics and solvation. At the same time, however, for many ligands, a high conformational energy is in error, due to either crystal structure inaccuracies or incorrect determination of the reference state. Aside from simple chemistry mistakes, we argue that crystal structure error may mainly be because of the heuristic weighting of ligand stereochemical restraints relative to the fit of the structure to the electron density. This problem cannot be fixed with improvements to electron density fitting or with simple ligand geometry checks, though better metrics are needed for evaluating ligand and binding site chemistry in addition to geometry during structure refinement. The ultimate solution for accurately determining ligand conformational energies lies in ultrahigh-resolution crystal structures that can be refined without restraints.


Subject(s)
Protein Conformation; Proteins/chemistry; Thermodynamics; Animals; Binding Sites; Crystallography, X-Ray; Drug Design; Humans; Ligands; Molecular Docking Simulation; Protein Binding; Proteins/agonists; Proteins/antagonists & inhibitors; Solubility; Static Electricity
13.
J Chem Inf Model ; 62(9): 2009-2010, 2022 05 09.
Article in English | MEDLINE | ID: mdl-35527682

Subject(s)
Informatics
14.
Mol Pharm ; 13(2): 545-56, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26669717

ABSTRACT

Severe adverse drug reactions (ADRs) are the fourth leading cause of fatality in the U.S. with more than 100,000 deaths per year. As up to 30% of all ADRs are believed to be caused by drug-drug interactions (DDIs), typically mediated by cytochrome P450s, possibilities to predict DDIs from existing knowledge are important. We collected data from public sources on 1485, 2628, 4371, and 27,966 possible DDIs mediated by four cytochrome P450 isoforms 1A2, 2C9, 2D6, and 3A4 for 55, 73, 94, and 237 drugs, respectively. For each of these data sets, we developed and validated QSAR models for the prediction of DDIs. As a unique feature of our approach, the interacting drug pairs were represented as binary chemical mixtures in a 1:1 ratio. We used two types of chemical descriptors: quantitative neighborhoods of atoms (QNA) and simplex descriptors. Radial basis functions with self-consistent regression (RBF-SCR) and random forest (RF) were utilized to build QSAR models predicting the likelihood of DDIs for any pair of drug molecules. Our models showed balanced accuracy of 72-79% for the external test sets with a coverage of 81.36-100% when a conservative threshold for the model's applicability domain was applied. We generated virtually all possible binary combinations of marketed drugs and employed our models to identify drug pairs predicted to be instances of DDI. More than 4500 of these predicted DDIs that were not found in our training sets were confirmed by data from the DrugBank database.
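The pair counts quoted above follow from simple combinatorics: n drugs give n(n-1)/2 unordered pairs, which reproduces the four reported totals. The Python sketch below checks that arithmetic and shows one common definition of balanced accuracy (the mean of sensitivity and specificity). The function name and the toy labels are illustrative assumptions, and the abstract does not state which exact formula the authors used.

```python
from itertools import combinations
from math import comb

# Drug counts per cytochrome P450 isoform, as given in the abstract.
drug_counts = {"1A2": 55, "2C9": 73, "2D6": 94, "3A4": 237}

# Number of unordered drug pairs (possible DDIs) per isoform.
pair_counts = {isoform: comb(n, 2) for isoform, n in drug_counts.items()}

# Explicit enumeration for a tiny, made-up drug list.
toy_pairs = list(combinations(["drug_a", "drug_b", "drug_c"], 2))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = DDI)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy labels: sensitivity 1/2, specificity 3/4.
score = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Balanced accuracy is a reasonable choice when interacting and non-interacting pairs are unevenly represented, since plain accuracy would reward always predicting the majority class.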


Subject(s)
Algorithms; Cytochrome P-450 Enzyme System/chemistry; Cytochrome P-450 Enzyme System/metabolism; Drug Interactions; Models, Molecular; Quantitative Structure-Activity Relationship; Databases, Factual; Drug-Related Side Effects and Adverse Reactions; Humans; Models, Biological
15.
J Chem Inf Model ; 56(11): 2149-2161, 2016 11 28.
Article in English | MEDLINE | ID: mdl-27669079

ABSTRACT

We investigated how many cases of the same chemical sold as different products (at possibly different prices) occurred in a prototypical large aggregated database and simultaneously tested the tautomerism definitions in the chemoinformatics toolkit CACTVS. We applied the standard CACTVS tautomeric transforms plus a set of recently developed ring-chain transforms to the Aldrich Market Select (AMS) database of 6 million screening samples and building blocks. In 30 000 cases, two or more AMS products were found to be just different tautomeric forms of the same compound. We purchased and analyzed 166 such tautomer pairs and triplets by 1H and 13C NMR to determine whether the CACTVS transforms accurately predicted what is the same "stuff in the bottle". Essentially all prototropic transforms with examples in the AMS were confirmed. Some of the ring-chain transforms were found to be too "aggressive", i.e. to equate structures with one another that were different compounds.


Subject(s)
Databases, Factual; Informatics/methods; Organic Chemicals/chemistry; Databases, Factual/economics; Isomerism
16.
J Org Chem ; 80(20): 9900-9, 2015 Oct 16.
Article in English | MEDLINE | ID: mdl-26372257

ABSTRACT

Warfarin, an important anticoagulant drug, can exist in solution in 40 distinct tautomeric forms through both prototropic tautomerism and ring-chain tautomerism. We have investigated all warfarin tautomers with computational and NMR approaches. Relative energies calculated at the B3LYP/6-311G++(d,p) level of theory indicate that the 4-hydroxycoumarin cyclic hemiketal tautomer is the most stable tautomer in aqueous solution, followed by the 4-hydroxycoumarin open-chain tautomer. This is in agreement with our NMR experiments where the spectral assignments indicate that warfarin exists mainly as a mixture of cyclic hemiketal diastereomers, with an open-chain tautomer as a minor component. We present a diagram of the interconversion of warfarin created taking into account the calculated equilibrium constants (pK(T)) for all tautomeric reactions. These findings help with gaining further understanding of proton transfer and ring closure tautomerization processes. We also discuss the results in the context of chemoinformatics rules for handling tautomerism.


Subject(s)
Anticoagulants/chemistry; Molecular Dynamics Simulation; Quantum Theory; Warfarin/chemistry; Magnetic Resonance Spectroscopy; Molecular Structure; Stereoisomerism
17.
J Chem Inf Model ; 55(7): 1388-99, 2015 Jul 27.
Article in English | MEDLINE | ID: mdl-26046311

ABSTRACT

Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.
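The notion of inconsistency described above (diverging results for the same compound against the same target) and the idea of compiling sets per assay can be sketched in a few lines of Python. The record layout, the function names, the use of negative-log activity values, and the 1.0 log-unit threshold are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def flag_inconsistent(measurements, max_spread=1.0):
    """Return {(compound, target): spread} where repeated values diverge.

    measurements: iterable of (compound_id, target, assay_id, p_activity).
    max_spread is an assumed tolerance in log units.
    """
    values = defaultdict(list)
    for compound, target, _assay, p_activity in measurements:
        values[(compound, target)].append(p_activity)
    flagged = {}
    for key, vals in values.items():
        spread = max(vals) - min(vals)
        if len(vals) > 1 and spread > max_spread:
            flagged[key] = spread
    return flagged


def split_by_assay(measurements):
    """Group rows by assay so each modeling set comes from one method."""
    by_assay = defaultdict(list)
    for row in measurements:
        by_assay[row[2]].append(row)
    return dict(by_assay)


# Made-up records for two hypothetical compounds and assays.
toy_data = [
    ("cmpd-1", "HIV-1 RT", "assay-A", 7.2),
    ("cmpd-1", "HIV-1 RT", "assay-B", 5.1),
    ("cmpd-2", "HIV-1 RT", "assay-A", 6.0),
    ("cmpd-2", "HIV-1 RT", "assay-B", 6.3),
]
flagged = flag_inconsistent(toy_data)
per_assay_sets = split_by_assay(toy_data)
```

Here only the first compound is flagged, because its two values differ by about 2.1 log units while the second compound's values differ by 0.3.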


Subject(s)
Databases, Pharmaceutical; HIV Reverse Transcriptase/antagonists & inhibitors; HIV-1/enzymology; Models, Statistical; Quantitative Structure-Activity Relationship; Reverse Transcriptase Inhibitors/chemistry; Reverse Transcriptase Inhibitors/pharmacology; Algorithms; Drug Discovery; Drug Resistance, Viral; HIV-1/drug effects
18.
J Chem Inf Model ; 54(9): 2423-32, 2014 Sep 22.
Article in English | MEDLINE | ID: mdl-25158156

ABSTRACT

A compound exhibits (prototropic) tautomerism if it can be represented by two or more structures that are related by a formal intramolecular movement of a hydrogen atom from one heavy atom position to another. When the movement of the proton is accompanied by the opening or closing of a ring it is called ring-chain tautomerism. This type of tautomerism is well observed in carbohydrates, but it also occurs in other molecules such as warfarin. In this work, we present an approach that allows for the generation of all ring-chain tautomers of a given chemical structure. Based on Baldwin's Rules estimating the likelihood of ring closure reactions to occur, we have defined a set of transform rules covering the majority of ring-chain tautomerism cases. The rules automatically detect substructures in a given compound that can undergo a ring-chain tautomeric transformation. Each transformation is encoded in SMIRKS line notation. All work was implemented in the chemoinformatics toolkit CACTVS. We report on the application of our ring-chain tautomerism rules to a large database of commercially available screening samples in order to identify ring-chain tautomers.


Subject(s)
Molecular Conformation; Cyclization; Databases, Chemical
19.
J Chem Inf Model ; 54(3): 705-12, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24524735

ABSTRACT

Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and "biological" descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).


Subject(s)
Drug Evaluation, Preclinical/methods; High-Throughput Screening Assays/methods; Quantitative Structure-Activity Relationship; Small Molecule Libraries/chemistry; Small Molecule Libraries/pharmacology; Algorithms; Databases, Chemical; HEK293 Cells; Humans; Models, Biological; Software
20.
J Chem Inf Model ; 54(3): 713-9, 2014 Mar 24.
Article in English | MEDLINE | ID: mdl-24451033

ABSTRACT

We describe a novel approach to RBF approximation, which combines two new elements: (1) linear radial basis functions and (2) weighting the model by each descriptor's contribution. Linear radial basis functions allow one to achieve more accurate predictions for diverse data sets. Taking into account the contribution of each descriptor produces more accurate similarity values used for model development. The method was validated on 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. We also compared the new method with five different QSAR methods implemented in the EPA T.E.S.T. program. Our approach, implemented in the program GUSAR, showed a reasonable accuracy of prediction and high coverage for all external test sets, providing more accurate prediction results than the comparison methods and even the consensus of these methods. Using our new method, we have created models for physicochemical and toxicity endpoints, which we have made freely available in the form of an online service at http://cactus.nci.nih.gov/chemical/apps/cap.


Subject(s)
Algorithms; Models, Biological; Quantitative Structure-Activity Relationship; Software; Animals; Computer Simulation; Cyprinidae/physiology; Daphnia/drug effects; Daphnia/physiology; Databases, Factual; Internet; Neural Networks, Computer; Rats; Tetrahymena/drug effects; Tetrahymena/physiology; Toxicity Tests