Search | Biblioteca Virtual em Saúde Fiocruz

1.

Correlation of protein binding pocket properties with hits' chemistries used in generation of ultra-large virtual libraries.

Song, Robert X; Nicklaus, Marc C; Tarasova, Nadya I.

J Comput Aided Mol Des ; 38(1): 22, 2024 May 16.

Article in English | MEDLINE | ID: mdl-38753096

ABSTRACT

Although the size of virtual libraries of synthesizable compounds is growing rapidly, we are still enumerating only tiny fractions of the drug-like chemical universe. Our capability to mine these newly generated libraries also lags their growth. That is why fragment-based approaches that utilize on-demand virtual combinatorial libraries are gaining popularity in drug discovery. These à la carte libraries utilize synthetic blocks found to be effective binders in parts of target protein pockets and a variety of reliable chemistries to connect them. There is, however, no data on the potential impact of the chemistries used for making on-demand libraries on the hit rates during virtual screening. There are also no rules to guide in the selection of these synthetic methods for production of custom libraries. We have used the SAVI (Synthetically Accessible Virtual Inventory) library, constructed using 53 reliable reaction types (transforms), to evaluate the impact of these chemistries on docking hit rates for 40 well-characterized protein pockets. The data shows that the virtual hit rates differ significantly for different chemistries with cross coupling reactions such as Sonogashira, Suzuki-Miyaura, Hiyama and Liebeskind-Srogl coupling producing the highest hit rates. Virtual hit rates appear to depend not only on the property of the formed chemical bond but also on the diversity of available building blocks and the scope of the reaction. The data identifies reactions that deserve wider use through increasing the number of corresponding building blocks and suggests the reactions that are more effective for pockets with certain physical and hydrogen bond-forming properties.

Subjects

Molecular Docking Simulation , Protein Binding , Proteins , Small Molecule Libraries , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Proteins/chemistry , Proteins/metabolism , Binding Sites , Drug Discovery/methods , Ligands , Drug Design , Humans

2.

Selective Recognition of Carbohydrate Antigens by Germline Antibodies Isolated from AID Knockout Mice.

DeLaitsch, Andrew T; Pridgen, Jacey R; Tytla, Avery; Peach, Megan L; Hu, Rayleen; Farnsworth, David W; McMillan, Aislinn K; Flanagan, Natalie; Temme, J Sebastian; Nicklaus, Marc C; Gildersleeve, Jeffrey C.

J Am Chem Soc ; 144(11): 4925-4941, 2022 03 23.

Article in English | MEDLINE | ID: mdl-35282679

ABSTRACT

Germline antibodies, the initial set of antibodies produced by the immune system, are critical for host defense, and information about their binding properties can be useful for designing vaccines, understanding the origins of autoantibodies, and developing monoclonal antibodies. Numerous studies have found that germline antibodies are polyreactive with malleable, flexible binding pockets. While insightful, it remains unclear how broadly this model applies, as there are many families of antibodies that have not yet been studied. In addition, the methods used to obtain germline antibodies typically rely on assumptions and do not work well for many antibodies. Herein, we present a distinct approach for isolating germline antibodies that involves immunizing activation-induced cytidine deaminase (AID) knockout mice. This strategy amplifies antigen-specific B cells, but somatic hypermutation does not occur because AID is absent. Using synthetic haptens, glycoproteins, and whole cells, we obtained germline antibodies to an assortment of clinically important tumor-associated carbohydrate antigens, including Lewis Y, the Tn antigen, sialyl Lewis C, and Lewis X (CD15/SSEA-1). Through glycan microarray profiling and cell binding, we demonstrate that all but one of these germline antibodies had high selectivity for their glycan targets. Using molecular dynamics simulations, we provide insights into the structural basis of glycan recognition. The results have important implications for designing carbohydrate-based vaccines, developing anti-glycan monoclonal antibodies, and understanding antibody evolution within the immune system.

Subjects

Monoclonal Antibodies , Tumor-Associated Carbohydrate Antigens , Animals , Monoclonal Antibodies/chemistry , Tumor Biomarkers , Carbohydrates , Germ Cells , Mice , Knockout Mice , Polysaccharides/chemistry

3.

Exploration of Ultralarge Compound Collections for Drug Discovery.

Warr, Wendy A; Nicklaus, Marc C; Nicolaou, Christos A; Rarey, Matthias.

J Chem Inf Model ; 62(9): 2021-2034, 2022 05 09.

Article in English | MEDLINE | ID: mdl-35421301

ABSTRACT

Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.

Subjects

Drug Discovery , Small Molecule Libraries , Drug Design , Drug Discovery/methods , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology

4.

Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods.

Jain, Sankalp; Siramshetty, Vishal B; Alves, Vinicius M; Muratov, Eugene N; Kleinstreuer, Nicole; Tropsha, Alexander; Nicklaus, Marc C; Simeonov, Anton; Zakharov, Alexey V.

J Chem Inf Model ; 61(2): 653-663, 2021 02 22.

Article in English | MEDLINE | ID: mdl-33533614

ABSTRACT

Computational methods to predict molecular properties regarding safety and toxicology represent alternative approaches to expedite drug development, screen environmental chemicals, and thus significantly reduce associated time and costs. There is a strong need and interest in the development of computational methods that yield reliable predictions of toxicity, and many approaches, including the recently introduced deep neural networks, have been leveraged towards this goal. Herein, we report on the collection, curation, and integration of data from the public data sets that were the source of the ChemIDplus database for systemic acute toxicity. These efforts generated the largest publicly available such data set comprising > 80,000 compounds measured against a total of 59 acute systemic toxicity end points. This data was used for developing multiple single- and multitask models utilizing random forest, deep neural networks, convolutional, and graph convolutional neural network approaches. For the first time, we also reported the consensus models based on different multitask approaches. To the best of our knowledge, prediction models for 36 of the 59 end points have never been published before. Furthermore, our results demonstrated a significantly better performance of the consensus model obtained from three multitask learning approaches that particularly predicted the 29 smaller tasks (less than 300 compounds) better than other models developed in the study. The curated data set and the developed models have been made publicly available at https://github.com/ncats/ld50-multitask, https://predictor.ncats.io/, and https://cactus.nci.nih.gov/download/acute-toxicity-db (data set only) to support regulatory and research applications.
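The abstract does not say how the three multitask models were combined into a consensus. As a minimal sketch, assuming the consensus is a plain arithmetic mean of per-compound predictions, the idea can be illustrated as follows. The model names and prediction values are hypothetical and are not taken from the paper.

```python
from statistics import mean

def consensus(predictions_by_model: dict[str, list[float]]) -> list[float]:
    """Average the predictions of several models, compound by compound.

    Assumption: an unweighted mean; the paper may use a different scheme.
    """
    # zip(*...) regroups the per-model lists into per-compound tuples.
    per_compound = zip(*predictions_by_model.values())
    return [mean(values) for values in per_compound]

# Hypothetical predictions from three models for two compounds.
preds = {
    "model_a": [2.0, 3.0],
    "model_b": [2.5, 3.5],
    "model_c": [3.0, 2.5],
}
consensus_values = consensus(preds)
print(consensus_values)
```

A weighted mean (for example, weights derived from validation performance) would be a natural variation, but nothing in the abstract indicates which scheme was used.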

Subjects

Deep Learning , Consensus , Factual Databases , Computer Neural Networks

5.

Tautomer Database: A Comprehensive Resource for Tautomerism Analyses.

Dhaked, Devendra K; Guasch, Laura; Nicklaus, Marc C.

J Chem Inf Model ; 60(3): 1090-1100, 2020 03 23.

Article in English | MEDLINE | ID: mdl-32027495

ABSTRACT

We report a database of tautomeric structures that contains 2819 tautomeric tuples extracted from 171 publications. Each tautomeric entry has been annotated with experimental conditions reported in the respective publication, plus bibliographic details, structural identifiers (e.g., NCI/CADD identifiers FICTS, FICuS, uuuuu, and Standard InChI), and chemical information (e.g., SMILES, molecular weight). The majority of tautomeric tuples found were pairs; the remaining 10% were triples, quadruples, or quintuples, amounting to a total number of structures of 5977. The types of tautomerism were mainly prototropic tautomerism (79%), followed by ring-chain (13%) and valence tautomerism (8%). The experimental conditions reported in the publications included about 50 pure solvents and 9 solvent mixtures with 26 unique spectroscopic or nonspectroscopic methods. 1H and 13C NMR were the most frequently used methods. A total of 77 different tautomeric transform rules (SMIRKS) are covered by at least one example tuple in the database. This database is freely available as a spreadsheet at https://cactus.nci.nih.gov/download/tautomer/.

Subjects

Isomerism , Factual Databases , Magnetic Resonance Spectroscopy

6.

Adapting CHMTRN (CHeMistry TRaNslator) for a New Use.

Judson, Philip N; Ihlenfeldt, Wolf-Dietrich; Patel, Hitesh; Delannée, Victorien; Tarasova, Nadya; Nicklaus, Marc C.

J Chem Inf Model ; 60(7): 3336-3341, 2020 07 27.

Article in English | MEDLINE | ID: mdl-32539385

ABSTRACT

We have adopted and extended the CHMTRN language and used it for the knowledge base of a computer program to generate a large database of synthetically accessible, drug-like chemical structures, the Synthetically Accessible Virtual Inventory (SAVI) Database. CHMTRN is a powerful language originally developed in the LHASA (Logic and Heuristics Applied to Synthetic Analysis) project at Harvard University and used together with the chemical pattern description language, PATRAN, to describe chemical retro-reactions. The languages have proven to be useful beyond the design of retrosynthetic routes and have the potential for much wider use in chemistry; this paper describes CHMTRN and PATRAN as now reimplemented for the forward-synthetic SAVI project but able to describe both forward and retro-reactions.

Subjects

Combinatorial Chemistry Techniques , Software , Factual Databases , Humans

7.

Toward a Comprehensive Treatment of Tautomerism in Chemoinformatics Including in InChI V2.

Dhaked, Devendra K; Ihlenfeldt, Wolf-Dietrich; Patel, Hitesh; Delannée, Victorien; Nicklaus, Marc C.

J Chem Inf Model ; 60(3): 1253-1275, 2020 03 23.

Article in English | MEDLINE | ID: mdl-32043883

ABSTRACT

We have collected 86 different transforms of tautomeric interconversions. Out of those, 54 are for prototropic (non-ring-chain) tautomerism, 21 for ring-chain tautomerism, and 11 for valence tautomerism. The majority of these rules have been extracted from experimental literature. Twenty rules, covering the most well-known types of tautomerism such as keto-enol tautomerism, were taken from the default handling of tautomerism by the chemoinformatics toolkit CACTVS. The rules were analyzed against nine different databases totaling over 400 million (non-unique) structures as to their occurrence rates, mutual overlap in coverage, and recapitulation of the rules' enumerated tautomer sets by InChI V.1.05, both in InChI's Standard and a Nonstandard version with the increased tautomer-handling options 15T and KET turned on. These results and the background of this study are discussed in the context of the IUPAC InChI Project tasked with the redesign of handling of tautomerism for an InChI version 2. Applying the rules presented in this paper would approximately triple the number of compounds in typical small-molecule databases that would be affected by tautomeric interconversion by InChI V2. A web tool has been created to test these rules at https://cactus.nci.nih.gov/tautomerizer.

Subjects

Cheminformatics , Factual Databases

8.

Antiangiogenic Activity and in Silico Cereblon Binding Analysis of Novel Thalidomide Analogs.

Peach, Megan L; Beedie, Shaunna L; Chau, Cindy H; Collins, Matthew K; Markolovic, Suzana; Luo, Weiming; Tweedie, David; Steinebach, Christian; Greig, Nigel H; Gütschow, Michael; Vargesson, Neil; Nicklaus, Marc C; Figg, William D.

Molecules ; 25(23), 2020 Dec 02.

Article in English | MEDLINE | ID: mdl-33276504

ABSTRACT

Due to its antiangiogenic and anti-immunomodulatory activity, thalidomide continues to be of clinical interest despite its teratogenic actions, and efforts to synthesize safer, clinically active thalidomide analogs are continually underway. In this study, a cohort of 27 chemically diverse thalidomide analogs was evaluated for antiangiogenic activity in an ex vivo rat aorta ring assay. The protein cereblon has been identified as the target for thalidomide, and in silico pharmacophore analysis and molecular docking with a crystal structure of human cereblon were used to investigate the cereblon binding abilities of the thalidomide analogs. The results suggest that not all antiangiogenic thalidomide analogs can bind cereblon, and multiple targets and mechanisms of action may be involved.

Subjects

Signal Transducing Adaptor Proteins/metabolism , Angiogenesis Inhibitors/pharmacology , Aorta/drug effects , Molecular Docking Simulation , Physiologic Neovascularization/drug effects , Thalidomide/analogs & derivatives , Thalidomide/pharmacology , Ubiquitin-Protein Ligases/metabolism , Angiogenesis Inhibitors/chemistry , Animals , Computer Simulation , Humans , Male , Rats , Sprague-Dawley Rats

9.

Data Mining Approach for Extraction of Useful Information About Biologically Active Compounds from Publications.

Tarasova, Olga A; Biziukova, Nadezhda Yu; Filimonov, Dmitry A; Poroikov, Vladimir V; Nicklaus, Marc C.

J Chem Inf Model ; 59(9): 3635-3644, 2019 09 23.

Article in English | MEDLINE | ID: mdl-31453694

ABSTRACT

A lot of high quality data on the biological activity of chemical compounds are required throughout the whole drug discovery process: from development of computational models of the structure-activity relationship to experimental testing of lead compounds and their validation in clinics. Currently, a large amount of such data is available from databases, scientific publications, and patents. Biological data are characterized by incompleteness, uncertainty, and low reproducibility. Despite the existence of free and commercially available databases of biological activities of compounds, they usually lack unambiguous information about peculiarities of biological assays. On the other hand, scientific papers are the primary source of new data disclosed to the scientific community for the first time. In this study, we have developed and validated a data-mining approach for extraction of text fragments containing description of bioassays. We have used this approach to evaluate compounds and their biological activity reported in scientific publications. We have found that categorization of papers into relevant and irrelevant may be performed based on the machine-learning analysis of the abstracts. Text fragments extracted from the full texts of publications allow their further partitioning into several classes according to the peculiarities of bioassays. We demonstrate the applicability of our approach to the comparison of the endpoint values of biological activity and cytotoxicity of reference compounds.

Subjects

Data Mining/methods , Drug Discovery/methods , Factual Databases , HIV Infections/drug therapy , HIV Reverse Transcriptase/antagonists & inhibitors , HIV-1/drug effects , HIV-1/enzymology , Humans , PubMed , Reverse Transcriptase Inhibitors/pharmacology

10.

(Q)SAR Models of HIV-1 Protein Inhibition by Drug-Like Compounds.

Stolbov, Leonid A; Druzhilovskiy, Dmitry S; Filimonov, Dmitry A; Nicklaus, Marc C; Poroikov, Vladimir V.

Molecules ; 25(1), 2019 Dec 25.

Article in English | MEDLINE | ID: mdl-31881687

ABSTRACT

Despite the achievements of antiretroviral therapy, discovery of new anti-HIV medicines remains an essential task because the existing drugs do not provide a complete cure for the infected patients, exhibit severe adverse effects, and lead to the appearance of resistant strains. To predict the interaction of drug-like compounds with multiple targets for HIV treatment, ligand-based drug design approach is widely applied. In this study, we evaluated the possibilities and limitations of (Q)SAR analysis aimed at the discovery of novel antiretroviral agents inhibiting the vital HIV enzymes. Local (Q)SAR models are based on the analysis of structure-activity relationships for molecules from the same chemical class, which significantly restrict their applicability domain. In contrast, global (Q)SAR models exploit data from heterogeneous sets of drug-like compounds, which allows their application to databases containing diverse structures. We compared the information for HIV-1 integrase, protease and reverse transcriptase inhibitors available in the EBI ChEMBL, NIAID HIV/OI/TB Therapeutics, and Clarivate Analytics Integrity databases as the sources for (Q)SAR training sets. Using the PASS and GUSAR software, we developed and validated a variety of (Q)SAR models, which can be further used for virtual screening of new antiretrovirals in the SAVI library. The developed models are implemented in the freely available web resource AntiHIV-Pred.

Subjects

Anti-HIV Agents/pharmacology , HIV-1/metabolism , Quantitative Structure-Activity Relationship , Viral Proteins/antagonists & inhibitors , Anti-HIV Agents/chemistry , Databases as Topic , HIV-1/drug effects , Humans , Inhibitory Concentration 50 , Regression Analysis , Reproducibility of Results , Viral Proteins/metabolism

11.

Conformational energy range of ligands in protein crystal structures: The difficult quest for accurate understanding.

Peach, Megan L; Cachau, Raul E; Nicklaus, Marc C.

J Mol Recognit ; 30(8), 2017 08.

Article in English | MEDLINE | ID: mdl-28233410

ABSTRACT

In this review, we address a fundamental question: What is the range of conformational energies seen in ligands in protein-ligand crystal structures? This value is important biophysically, for better understanding the protein-ligand binding process; and practically, for providing a parameter to be used in many computational drug design methods such as docking and pharmacophore searches. We synthesize a selection of previously reported conflicting results from computational studies of this issue and conclude that high ligand conformational energies really are present in some crystal structures. The main source of disagreement between different analyses appears to be due to divergent treatments of electrostatics and solvation. At the same time, however, for many ligands, a high conformational energy is in error, due to either crystal structure inaccuracies or incorrect determination of the reference state. Aside from simple chemistry mistakes, we argue that crystal structure error may mainly be because of the heuristic weighting of ligand stereochemical restraints relative to the fit of the structure to the electron density. This problem cannot be fixed with improvements to electron density fitting or with simple ligand geometry checks, though better metrics are needed for evaluating ligand and binding site chemistry in addition to geometry during structure refinement. The ultimate solution for accurately determining ligand conformational energies lies in ultrahigh-resolution crystal structures that can be refined without restraints.

Subjects

Protein Conformation , Proteins/chemistry , Thermodynamics , Animals , Binding Sites , X-Ray Crystallography , Drug Design , Humans , Ligands , Molecular Docking Simulation , Protein Binding , Proteins/agonists , Proteins/antagonists & inhibitors , Solubility , Static Electricity

12.

Special Issue on Reaction Informatics and Chemical Space.

Rarey, Matthias; Nicklaus, Marc C; Warr, Wendy.

J Chem Inf Model ; 62(9): 2009-2010, 2022 05 09.

Article in English | MEDLINE | ID: mdl-35527682

Subjects

Informatics

13.

QSAR Modeling and Prediction of Drug-Drug Interactions.

Zakharov, Alexey V; Varlamova, Ekaterina V; Lagunin, Alexey A; Dmitriev, Alexander V; Muratov, Eugene N; Fourches, Denis; Kuz'min, Victor E; Poroikov, Vladimir V; Tropsha, Alexander; Nicklaus, Marc C.

Mol Pharm ; 13(2): 545-56, 2016 Feb 01.

Article in English | MEDLINE | ID: mdl-26669717

ABSTRACT

Severe adverse drug reactions (ADRs) are the fourth leading cause of fatality in the U.S. with more than 100,000 deaths per year. As up to 30% of all ADRs are believed to be caused by drug-drug interactions (DDIs), typically mediated by cytochrome P450s, possibilities to predict DDIs from existing knowledge are important. We collected data from public sources on 1485, 2628, 4371, and 27,966 possible DDIs mediated by four cytochrome P450 isoforms 1A2, 2C9, 2D6, and 3A4 for 55, 73, 94, and 237 drugs, respectively. For each of these data sets, we developed and validated QSAR models for the prediction of DDIs. As a unique feature of our approach, the interacting drug pairs were represented as binary chemical mixtures in a 1:1 ratio. We used two types of chemical descriptors: quantitative neighborhoods of atoms (QNA) and simplex descriptors. Radial basis functions with self-consistent regression (RBF-SCR) and random forest (RF) were utilized to build QSAR models predicting the likelihood of DDIs for any pair of drug molecules. Our models showed balanced accuracy of 72-79% for the external test sets with a coverage of 81.36-100% when a conservative threshold for the model's applicability domain was applied. We generated virtually all possible binary combinations of marketed drugs and employed our models to identify drug pairs predicted to be instances of DDI. More than 4500 of these predicted DDIs that were not found in our training sets were confirmed by data from the DrugBank database.
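The counts of possible DDIs quoted in the abstract coincide with the number of unordered drug pairs, n(n-1)/2, for each isoform's drug set; the abstract itself does not state this derivation, so treat it as an observation rather than the authors' stated method. The short sketch below checks that arithmetic and also shows the standard definition of balanced accuracy (the mean of sensitivity and specificity). The confusion-matrix numbers in the example are hypothetical.

```python
from math import comb

# Drug counts per CYP450 isoform and possible-DDI counts, as quoted in the abstract.
drugs_per_isoform = {"1A2": 55, "2C9": 73, "2D6": 94, "3A4": 237}
reported_pairs = {"1A2": 1485, "2C9": 2628, "2D6": 4371, "3A4": 27966}

def unordered_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n drugs: n * (n - 1) / 2."""
    return comb(n, 2)

def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for a binary classifier."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

for isoform, n in drugs_per_isoform.items():
    # Print the computed pair count next to the count quoted in the abstract.
    print(isoform, unordered_pairs(n), reported_pairs[isoform])

# Hypothetical confusion matrix, for illustration only.
example_ba = balanced_accuracy(tp=8, fn=2, tn=6, fp=4)
print(example_ba)
```

Balanced accuracy is a common choice when the classes (interacting versus non-interacting pairs) are unevenly represented, because it weights both classes equally.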

Subjects

Algorithms , Cytochrome P-450 Enzyme System/chemistry , Cytochrome P-450 Enzyme System/metabolism , Drug Interactions , Molecular Models , Quantitative Structure-Activity Relationship , Factual Databases , Drug-Related Side Effects and Adverse Reactions , Humans , Biological Models

14.

Experimental and Chemoinformatics Study of Tautomerism in a Database of Commercially Available Screening Samples.

Guasch, Laura; Yapamudiyansel, Waruna; Peach, Megan L; Kelley, James A; Barchi, Joseph J; Nicklaus, Marc C.

J Chem Inf Model ; 56(11): 2149-2161, 2016 11 28.

Article in English | MEDLINE | ID: mdl-27669079

ABSTRACT

We investigated how many cases of the same chemical sold as different products (at possibly different prices) occurred in a prototypical large aggregated database and simultaneously tested the tautomerism definitions in the chemoinformatics toolkit CACTVS. We applied the standard CACTVS tautomeric transforms plus a set of recently developed ring-chain transforms to the Aldrich Market Select (AMS) database of 6 million screening samples and building blocks. In 30,000 cases, two or more AMS products were found to be just different tautomeric forms of the same compound. We purchased and analyzed 166 such tautomer pairs and triplets by 1H and 13C NMR to determine whether the CACTVS transforms accurately predicted what is the same "stuff in the bottle". Essentially all prototropic transforms with examples in the AMS were confirmed. Some of the ring-chain transforms were found to be too "aggressive", i.e. to equate structures with one another that were different compounds.

Subjects

Factual Databases , Informatics/methods , Organic Chemicals/chemistry , Factual Databases/economics , Isomerism

15.

Tautomerism of Warfarin: Combined Chemoinformatics, Quantum Chemical, and NMR Investigation.

Guasch, Laura; Peach, Megan L; Nicklaus, Marc C.

J Org Chem ; 80(20): 9900-9, 2015 Oct 16.

Article in English | MEDLINE | ID: mdl-26372257

ABSTRACT

Warfarin, an important anticoagulant drug, can exist in solution in 40 distinct tautomeric forms through both prototropic tautomerism and ring-chain tautomerism. We have investigated all warfarin tautomers with computational and NMR approaches. Relative energies calculated at the B3LYP/6-311G++(d,p) level of theory indicate that the 4-hydroxycoumarin cyclic hemiketal tautomer is the most stable tautomer in aqueous solution, followed by the 4-hydroxycoumarin open-chain tautomer. This is in agreement with our NMR experiments where the spectral assignments indicate that warfarin exists mainly as a mixture of cyclic hemiketal diastereomers, with an open-chain tautomer as a minor component. We present a diagram of the interconversion of warfarin created taking into account the calculated equilibrium constants (pK(T)) for all tautomeric reactions. These findings help with gaining further understanding of proton transfer and ring closure tautomerization processes. We also discuss the results in the context of chemoinformatics rules for handling tautomerism.

Subjects

Anticoagulants/chemistry , Molecular Dynamics Simulation , Quantum Theory , Warfarin/chemistry , Magnetic Resonance Spectroscopy , Molecular Structure , Stereoisomerism

16.

QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.

Tarasova, Olga A; Urusova, Aleksandra F; Filimonov, Dmitry A; Nicklaus, Marc C; Zakharov, Alexey V; Poroikov, Vladimir V.

J Chem Inf Model ; 55(7): 1388-99, 2015 Jul 27.

Article in English | MEDLINE | ID: mdl-26046311

ABSTRACT

Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.

Subjects

Pharmaceutical Databases , HIV Reverse Transcriptase/antagonists & inhibitors , HIV-1/enzymology , Statistical Models , Quantitative Structure-Activity Relationship , Reverse Transcriptase Inhibitors/chemistry , Reverse Transcriptase Inhibitors/pharmacology , Algorithms , Drug Discovery , Viral Drug Resistance , HIV-1/drug effects

17.

Enumeration of ring-chain tautomers based on SMIRKS rules.

Guasch, Laura; Sitzmann, Markus; Nicklaus, Marc C.

J Chem Inf Model ; 54(9): 2423-32, 2014 Sep 22.

Article in English | MEDLINE | ID: mdl-25158156

ABSTRACT

A compound exhibits (prototropic) tautomerism if it can be represented by two or more structures that are related by a formal intramolecular movement of a hydrogen atom from one heavy atom position to another. When the movement of the proton is accompanied by the opening or closing of a ring it is called ring-chain tautomerism. This type of tautomerism is well observed in carbohydrates, but it also occurs in other molecules such as warfarin. In this work, we present an approach that allows for the generation of all ring-chain tautomers of a given chemical structure. Based on Baldwin's Rules estimating the likelihood of ring closure reactions to occur, we have defined a set of transform rules covering the majority of ring-chain tautomerism cases. The rules automatically detect substructures in a given compound that can undergo a ring-chain tautomeric transformation. Each transformation is encoded in SMIRKS line notation. All work was implemented in the chemoinformatics toolkit CACTVS. We report on the application of our ring-chain tautomerism rules to a large database of commercially available screening samples in order to identify ring-chain tautomers.

Subjects

Molecular Conformation , Cyclization , Chemical Databases

18.

QSAR modeling of imbalanced high-throughput screening data in PubChem.

Zakharov, Alexey V; Peach, Megan L; Sitzmann, Markus; Nicklaus, Marc C.

J Chem Inf Model ; 54(3): 705-12, 2014 Mar 24.

Article in English | MEDLINE | ID: mdl-24524735

ABSTRACT

Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and "biological" descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).

Subjects

Preclinical Drug Evaluation/methods , High-Throughput Screening Assays/methods , Quantitative Structure-Activity Relationship , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Algorithms , Chemical Databases , HEK293 Cells , Humans , Biological Models , Software

19.

A new approach to radial basis function approximation and its application to QSAR.

Zakharov, Alexey V; Peach, Megan L; Sitzmann, Markus; Nicklaus, Marc C.

J Chem Inf Model ; 54(3): 713-9, 2014 Mar 24.

Article in English | MEDLINE | ID: mdl-24451033

ABSTRACT

We describe a novel approach to RBF approximation, which combines two new elements: (1) linear radial basis functions and (2) weighting the model by each descriptor's contribution. Linear radial basis functions allow one to achieve more accurate predictions for diverse data sets. Taking into account the contribution of each descriptor produces more accurate similarity values used for model development. The method was validated on 14 public data sets comprising nine physicochemical properties and five toxicity endpoints. We also compared the new method with five different QSAR methods implemented in the EPA T.E.S.T. program. Our approach, implemented in the program GUSAR, showed a reasonable accuracy of prediction and high coverage for all external test sets, providing more accurate prediction results than the comparison methods and even the consensus of these methods. Using our new method, we have created models for physicochemical and toxicity endpoints, which we have made freely available in the form of an online service at http://cactus.nci.nih.gov/chemical/apps/cap.

Subjects

Algorithms , Biological Models , Quantitative Structure-Activity Relationship , Software , Animals , Computer Simulation , Cyprinidae/physiology , Daphnia/drug effects , Daphnia/physiology , Factual Databases , Internet , Computer Neural Networks , Rats , Tetrahymena/drug effects , Tetrahymena/physiology , Toxicity Tests

20.

Inhibitors for the hepatitis C virus RNA polymerase explored by SAR with advanced machine learning methods.

Weidlich, Iwona E; Filippov, Igor V; Brown, Jodian; Kaushik-Basu, Neerja; Krishnan, Ramalingam; Nicklaus, Marc C; Thorpe, Ian F.

Bioorg Med Chem ; 21(11): 3127-37, 2013 Jun 01.

Article in English | MEDLINE | ID: mdl-23608107

ABSTRACT

Hepatitis C virus (HCV) is a global health challenge, affecting approximately 200 million people worldwide. In this study we developed SAR models with advanced machine learning classifiers Random Forest and k Nearest Neighbor Simulated Annealing for 679 small molecules with measured inhibition activity for NS5B genotype 1b. The activity was expressed as a binary value (active/inactive), where actives were considered molecules with IC50 ≤0.95 µM. We applied our SAR models to various drug-like databases and identified novel chemical scaffolds for NS5B inhibitors. Subsequent in vitro antiviral assays suggested a new activity for an existing prodrug, Candesartan cilexetil, which is currently used to treat hypertension and heart failure but has not been previously tested for anti-HCV activity. We also identified NS5B inhibitors with two novel non-nucleoside chemical motifs.
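The binary activity labeling described in the abstract (active if IC50 ≤ 0.95 µM) can be sketched as below. Only the threshold comes from the abstract; the compound names and IC50 values are hypothetical placeholders.

```python
IC50_THRESHOLD_UM = 0.95  # cutoff quoted in the abstract, in micromolar

def label_active(ic50_um: float) -> int:
    """Return 1 (active) if IC50 is at or below the threshold, else 0 (inactive)."""
    return 1 if ic50_um <= IC50_THRESHOLD_UM else 0

# Hypothetical measurements, for illustration only.
hypothetical_ic50 = {"cmpd_A": 0.12, "cmpd_B": 0.95, "cmpd_C": 3.4}
labels = {name: label_active(value) for name, value in hypothetical_ic50.items()}
print(labels)
```

Note that the comparison is inclusive, so a compound measured at exactly 0.95 µM counts as active, matching the "≤" in the abstract.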

Subjects

Antihypertensive Agents/chemistry , Antiviral Agents/chemistry , Benzimidazoles/chemistry , Biphenyl Compounds/chemistry , RNA-Dependent RNA Polymerase/antagonists & inhibitors , Tetrazoles/chemistry , Viral Nonstructural Proteins/antagonists & inhibitors , Artificial Intelligence , Chemical Databases , Drug Discovery , Drug Repositioning , Hepacivirus/chemistry , Hepacivirus/enzymology , Molecular Docking Simulation , RNA-Dependent RNA Polymerase/chemistry , ROC Curve , Structure-Activity Relationship , Viral Nonstructural Proteins/chemistry
