ABSTRACT
In this paper, we show that combining theoretical and experimental NMR results can help to solve the molecular structure of peptides, using residue Leucine-67 in Desulfovibrio vulgaris flavodoxin as an example. We apply a computational protocol based on the leucine dipeptide which, using calculated and experimental spin-spin coupling constants, allows us to obtain the conformation of the amino acid side chain. The calculated results show that the best agreement is obtained when three conformers around the side-chain angle $\chi_1$ are considered or when the dynamic effect in the torsional angles is included. The population of each structure is estimated and analyzed according to the correlation between those two approaches. Independently of the approach, the estimated $\chi_1$ angle in solution is close to the staggered value of $-60^\circ$ and deviates significantly from the average X-ray angle of $-90^\circ$.
Subjects
Desulfovibrio vulgaris/chemistry , Flavodoxin/chemistry , Leucine/chemistry , Magnetic Resonance Spectroscopy/methods , Molecular Models , Amino Acid Sequence , Flavodoxin/isolation & purification , Peptides/chemistry , alpha-Helical Protein Conformation , beta-Strand Protein Conformation , Solutions , Solvents/chemistry , Water/chemistry
ABSTRACT
A new approach for generating Gaussian basis sets is reported and tested for atoms from H to Ne. The basis sets thus calculated, named SIGMA basis sets, range from DZ to QZ sizes and have the same composition per shell as Dunning basis sets but with different treatment of the contractions. The standard SIGMA basis sets and their augmented versions have proven to be very suitable for providing good results in atomic and molecular calculations. The performance of the new basis sets is analyzed in terms of total, correlation, and atomization energies, equilibrium distances, and vibrational frequencies in several molecules, and the results are compared at several computational levels with those obtained with the corresponding Dunning and other basis sets.
ABSTRACT
Theoretical relationships between the vicinal spin-spin coupling constants (SSCCs) and the χ1 torsion angles have been studied to predict the conformations of protein side chains. An efficient computational procedure is developed to obtain the conformation of dipeptides through theoretical and experimental SSCCs, Karplus equations, and quantum chemistry methods, and it is applied to three aliphatic hydrophobic residues (Val, Leu, and Ile). Three models are proposed: unimodal-static, trimodal-static-stepped, and trimodal-static-trigonal, in which the most important factors are incorporated (coupled nuclei, nature and orientation of the substituents, and local geometric properties). Our results are validated by comparison with NMR and X-ray empirical data described in the literature, with successful results for the 29 residues considered. Using our trimodal residue treatment, it is possible to detect and resolve residues with a single conformation and those with two or three staggered conformers. For four residues, a deeper analysis shows that they do not have a unique conformation and that the population of each conformation plays an important role.
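The link between a Karplus equation and conformer populations can be illustrated with a minimal sketch. It assumes the generic three-term Karplus form $J(\theta) = A\cos^2\theta + B\cos\theta + C$ and a population-weighted average over the three staggered χ1 rotamers. The coefficients `A`, `B`, `C`, the `phase` offset, and all function names are hypothetical placeholders chosen for illustration; they are not the parametrization or the models used in the work summarized above.

```python
import math

# Hypothetical Karplus coefficients in Hz (placeholders, not fitted values).
A, B, C = 9.5, -1.6, 1.8

# Ideal staggered chi1 rotamers in degrees.
STAGGERED = (-60.0, 60.0, 180.0)


def karplus(theta_deg, a=A, b=B, c=C):
    """Generic Karplus relation J = a*cos^2(theta) + b*cos(theta) + c."""
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t ** 2 + b * cos_t + c


def averaged_coupling(chi1_values, populations, phase=0.0):
    """Population-weighted coupling over a set of chi1 conformers.

    `phase` is the (assumed) offset between chi1 and the dihedral angle
    that actually separates the two coupled nuclei.
    """
    if len(chi1_values) != len(populations):
        raise ValueError("one population per conformer is required")
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    return sum(p * karplus(chi + phase)
               for chi, p in zip(chi1_values, populations))


if __name__ == "__main__":
    # A single conformer versus an equal mixture of the three rotamers.
    print(averaged_coupling(STAGGERED, (1.0, 0.0, 0.0)))
    print(averaged_coupling(STAGGERED, (1 / 3, 1 / 3, 1 / 3)))
```

In practice several couplings with different phase offsets would be fitted at once, since a single coupling cannot distinguish angles that share the same cosine.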
Subjects
Dipeptides , Proteins , Dipeptides/chemistry , Magnetic Resonance Imaging , Magnetic Resonance Spectroscopy , Protein Conformation , Proteins/chemistry
ABSTRACT
Imbalanced datasets, containing more inactive than active compounds, are a common challenge in ligand-based model building workflows for drug discovery. This is particularly true for neglected tropical diseases, since efforts to identify therapeutics for these diseases are often limited. In this report, we analyze the performance of several undersampling strategies in modeling the Dengue Virus 2 (DENV2) inhibitory activity, as well as the anti-flaviviral activities for the West Nile (WNV) and Zika (ZIKV) viruses. To this end, we build datasets comprising 1218 (159 actives and 1059 inactives), 1044 (132 actives and 912 inactives) and 302 (75 actives and 227 inactives) molecules with known DENV2, WNV and ZIKV inhibitory activity profiles, respectively. We develop ensemble classifiers for these endpoints and compare the performance of the different undersampling algorithms on external sets. Data pruning algorithms are observed to yield superior performance relative to data selection algorithms. The best overall performance is provided by the one-sided selection algorithm, with test set balanced accuracy (BACC) values of 0.84, 0.74 and 0.77 for the DENV2, WNV and ZIKV inhibitory activities, respectively. For the model building, we use the recently proposed GT-STAF information indices and compare the predictivity of 3 molecular fragmentation approaches: connected subgraphs, substructure and AlogP atom types, which are observed to show comparable performance. A combination of indices based on these fragmentation strategies, however, enhances the predictivity of the built ensembles. The built models could be useful for screening new molecules with possible DENV, WNV and ZIKV inhibitory activities. ADMET modelers are encouraged to adopt undersampling algorithms in their workflows when dealing with imbalanced datasets.
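To make the role of balanced accuracy concrete, the following minimal sketch computes BACC as the mean of sensitivity and specificity and applies plain random undersampling to a label vector with the DENV2 class counts quoted above (159 actives, 1059 inactives). Random undersampling is used here only as the simplest stand-in; it is not the one-sided selection algorithm reported as best, and the function names are illustrative assumptions.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


def random_undersample(labels, seed=0):
    """Keep every active and an equally sized random subset of inactives.

    Assumes the actives (label 1) are the minority class.
    Returns the sorted indices of the retained samples.
    """
    rng = random.Random(seed)
    actives = [i for i, y in enumerate(labels) if y == 1]
    inactives = [i for i, y in enumerate(labels) if y == 0]
    kept = rng.sample(inactives, min(len(actives), len(inactives)))
    return sorted(actives + kept)


if __name__ == "__main__":
    labels = [1] * 159 + [0] * 1059        # DENV2 class counts from the text
    all_inactive = [0] * len(labels)       # trivial majority-class predictor
    accuracy = sum(t == p for t, p in zip(labels, all_inactive)) / len(labels)
    print(accuracy, balanced_accuracy(labels, all_inactive))
    print(len(random_undersample(labels)))
```

The trivial predictor shows why plain accuracy is misleading on such data: it scores high on accuracy while its balanced accuracy stays at chance level.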
Subjects
Antiviral Agents/pharmacology , Drug Discovery/methods , Flaviviridae/drug effects , Support Vector Machine , Antiviral Agents/chemistry , Dengue Virus/drug effects , Flaviviridae Infections/drug therapy , Humans , West Nile virus/drug effects , Zika Virus/drug effects
ABSTRACT
Ionic liquids (ILs) play a key role in many chemical applications. From a theoretical standpoint, ILs pose added difficulties in calculations because of the composition of the ion pair and the fact that they are liquids. Although density functional theory (DFT) can treat this kind of system to predict physicochemical properties, common versions of these methods fail to give accurate predictions of geometries, interaction energies, dipole moments, and other properties related to the molecular structure. In these cases, dispersion and self-interaction error (SIE) corrections need to be introduced to improve DFT calculations involving ILs. We show that the inclusion of dispersion is needed to obtain good geometries and accurate interaction energies. SIE needs to be corrected to describe the charges and dipoles in the ion pair correctly. The use of range-separated functionals allows us to obtain interaction energies close to the CCSD(T) level. © 2017 Wiley Periodicals, Inc.
ABSTRACT
Cluster tendency assessment is an important stage in cluster analysis. In this context, a group of promising techniques named visual assessment of tendency (VAT) has emerged in the literature. The presence of clusters can be detected easily through direct observation of a dark block structure along the main diagonal of the intensity image. Alternatively, a Dunn's index greater than 1 for a single linkage partition is a good indication of the blocklike structure. In this report, Dunn's index is applied as a novel measure of tendency on 8 pharmacological data sets, represented by machine-learning-selected molecular descriptors. In all cases, the observed values are less than 1, indicating a weak tendency for the data to form compact clusters. Other results suggest that there is an increasing relationship between Dunn's index as a measure of cluster separability and the classification accuracy of various cluster algorithms tested on the same data sets.
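As a rough illustration of the criterion described above, the sketch below computes Dunn's index for a given partition as the smallest distance between points in different clusters divided by the largest within-cluster distance (the cluster diameter). It assumes Euclidean distances and a partition that is already available; the single linkage step and the descriptor selection of the original study are not reproduced, and the function name is an assumption.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn's index: minimum inter-cluster separation / maximum diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(a, b)
        for m1, m2 in combinations(clusters.values(), 2)
        for a in m1
        for b in m2
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter


if __name__ == "__main__":
    compact = [(0, 0), (0, 1), (10, 0), (10, 1)]
    diffuse = [(0, 0), (4, 0), (5, 0), (9, 0)]
    print(dunn_index(compact, [0, 0, 1, 1]))  # well separated, above 1
    print(dunn_index(diffuse, [0, 0, 1, 1]))  # poorly separated, below 1
```

A value above 1 means every cluster is narrower than the gap to its nearest neighbour, which is the situation that produces clear dark blocks in a VAT image.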
Subjects
Cluster Analysis , Statistical Data Interpretation , Factual Databases/statistics & numerical data , Pharmacology/statistics & numerical data , Humans , Software
ABSTRACT
Cluster algorithms play an important role in diversity-related tasks of modern chemoinformatics, with the widest applications being in pharmaceutical industry drug discovery programs. The performance of these grouping strategies depends on various factors such as molecular representation, mathematical method, algorithmic technique, and statistical distribution of the data. For this reason, the introduction and comparison of new methods are necessary in order to find the model that best fits the problem at hand. Earlier comparative studies report Ward's algorithm, using fingerprints for molecular description, as generally superior in this field. However, problems remain: other types of numerical descriptions have been little exploited, the current descriptor selection strategy is driven by trial and error, and no previous comparative studies have considered a broader domain of the combinatorial methods for grouping chemoinformatic data sets. In this work, a comparison between combinatorial methods is performed, with five of them being novel in cheminformatics. The experiments are carried out using eight data sets that are well established and validated in the medicinal chemistry literature. Each drug data set was represented by real molecular descriptors selected by machine learning techniques, which are consistent with the neighborhood principle. Statistical analysis of the results demonstrates that the pharmacological activities of the eight data sets can be modeled with a few families of 2D and 3D molecular descriptors, avoiding classification problems associated with the presence of nonrelevant features. Three of the five proposed cluster algorithms show superior performance over most classical algorithms and are similar (or slightly superior in the most optimistic sense) to Ward's algorithm. The usefulness of these algorithms is also assessed in a comparative experiment against powerful QSAR and machine learning classifiers, where they perform similarly in some cases.
Subjects
Statistical Models , Quantitative Structure-Activity Relationship , Algorithms , Artificial Intelligence , Cluster Analysis , Biological Models , Pharmaceutical Preparations/chemistry , Pharmacology
ABSTRACT
The great cost associated with the development of new anabolic-androgenic steroids (AASs) makes it necessary to develop computational methods that shorten the drug discovery pipeline. Toward this end, quantum and physicochemical molecular descriptors, together with linear discriminant analysis (LDA), were used to analyze the anabolic/androgenic activity of structurally diverse steroids, to discover novel AASs, and to give a structural interpretation of their anabolic-androgenic ratio (AAR). The obtained models correctly classify 91.67% and 86.27% of the AASs in the training and test sets, respectively. The results of predictions on the 10% full-out cross-validation test also demonstrate the robustness of the obtained model. Moreover, these classification functions are applied to an "in house" library of chemicals to find novel AASs. Two new AASs are synthesized and tested for in vivo activity. Although both AASs are less active than some commercial AASs, this result leaves the door open to a virtual variational study of the structure of the two compounds to improve their biological activity. The LDA-assisted QSAR models presented here could significantly reduce the number of AASs that need to be synthesized and tested, and could increase the chance of finding new chemical entities with higher AAR.
Subjects
Anabolic Agents/chemistry , Anabolic Agents/pharmacology , Automated Pattern Recognition/methods , Quantitative Structure-Activity Relationship , Steroids/chemistry , Steroids/pharmacology , Algorithms , Anabolic Agents/classification , Chemical Phenomena , Physical Chemistry , Cluster Analysis , Computer Simulation , Discriminant Analysis , Ligands , Molecular Structure , Quantum Theory , Reproducibility of Results , Steroids/classification
ABSTRACT
Predictive quantitative structure-activity relationship (QSAR) models of anabolic and androgenic activities for the testosterone and dihydrotestosterone steroid analogues were obtained by means of multiple linear regression using quantum and physicochemical molecular descriptors (MDs), with a genetic algorithm for the selection of the best subset of variables. The quantitative models found for describing the anabolic (androgenic) activity are statistically significant: $R^2$ of 0.84 (0.72 and 0.70). A leave-one-out cross-validation procedure revealed that the regression models had fairly good predictability [$q^2$ of 0.80 (0.60 and 0.59)]. In addition, other QSAR models were developed to predict anabolic/androgenic (A/A) ratios; the best regression equation explains 68% of the variance in the experimental values of the A/A ratio and has a rather adequate $q^2$ of 0.51. External validation using test sets was also carried out in each experiment in order to evaluate the predictive power of the obtained models. The results show that these QSARs have quite good predictive abilities ($R^2$ of 0.90, 0.72 (0.55), and 0.53) for anabolic activity, androgenic activity, and A/A ratios, respectively. Last, a Williams plot was used to define the domain of applicability of the models as a square area within a ±2 band for residuals and a leverage threshold of h = 0.16. No apparent outliers were detected, and the models can be used with high accuracy in this applicability domain. The MDs included in our QSAR models allow the structural interpretation of the biological process, highlighting the main role of molecular shape, hydrophobicity, and electronic properties. Attempts were made to include lipophilicity (octanol-water partition coefficient, logP) and electronic (hardness, $\eta$) values of the whole molecules in the multivariate relations. The study found that the logP of the molecules makes a positive contribution to the anabolic and androgenic activities, whereas high values of $\eta$ produce unfavorable effects. The MDs found can also be used efficiently in similarity studies based on cluster analysis. Our model for the anabolic/androgenic ratio (expressed by weight of levator ani muscle, LA, and seminal vesicle, SV, in mice) predicts that the 2-aminomethylene-17α-methyl-17β-hydroxy-5α-androstan-3-one (43) compound is the most potent anabolic steroid, and the 17α-methyl-2β,17β-dihydroxy-5α-androstane (31) compound is the least potent one of this series. The approach described in this report is an alternative for the discovery and optimization of lead anabolic compounds among steroids and analogues. It also assigns an important role to the electron exchange terms of molecular interactions in this kind of steroid activity.
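The leave-one-out statistic quoted above can be sketched for the simplest possible case. The example assumes a single descriptor and ordinary least squares, and uses one common definition, $q^2 = 1 - \mathrm{PRESS}/SS_{tot}$, where PRESS sums the squared errors of predictions made for each compound by a model fitted without it. The multiple linear regression, the genetic-algorithm variable selection, and the data of the original study are not reproduced; the numbers and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_leave_one_out(xs, ys):
    """Cross-validated q^2 = 1 - PRESS / SS_tot (one common definition)."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    # Hypothetical descriptor values and activities, for illustration only.
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
    activity = [1.0, 2.5, 2.0, 4.5, 4.0]
    print(q2_leave_one_out(descriptor, activity))
```

Because each prediction comes from a model that never saw the predicted compound, $q^2$ is normally lower than the fitted $R^2$, which matches the pattern of values reported above.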
Subjects
Anabolic Agents/chemistry , Anabolic Agents/pharmacology , Androgens/chemistry , Androgens/pharmacology , Dihydrotestosterone/analogs & derivatives , Chemical Models , Testosterone/analogs & derivatives , Algorithms , Androgens/genetics , Cluster Analysis , Computer Simulation , Humans , Male , Quantitative Structure-Activity Relationship , Testosterone/genetics
ABSTRACT
Research on similarity searching of cheminformatic data sets has focused on similarity measures using fingerprints. However, nominal scales are the least informative of all metric scales, which increases the number of tied similarity scores and decreases the effectiveness of the retrieval engines. Tanimoto's coefficient has been claimed to be the most prominent measure for this task. Nevertheless, this field is far from exhausted, since the no-free-lunch theorem from computer science predicts that "no similarity measure has overall superiority over the population of data sets". We introduce 12 relational agreement (RA) coefficients for seven metric scales, which are integrated within a group fusion-based similarity searching algorithm. These similarity measures are compared to a reference panel of 21 proximity quantifiers over 17 benchmark data sets (MUV), using informative descriptors, a feature selection stage, a suitable performance metric, and powerful comparison tests. In this stage, RA coefficients perform favourably with respect to the state-of-the-art proximity measures. Afterward, the RA-based method outperforms another four nearest neighbor searching algorithms over the same data domains. In a third validation stage, RA measures are successfully applied to the virtual screening of the NCI data set. Finally, we discuss a possible molecular interpretation for these similarity variants.
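The tied-score problem mentioned above is easy to reproduce with a minimal sketch of Tanimoto's coefficient on binary fingerprints, represented here as sets of "on" bit positions. The group fusion step uses the MAX rule, which is one common choice and is an assumption here, since the text does not state which fusion rule was used. The relational agreement coefficients themselves are not implemented, and the fingerprints are invented for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A & B| / |A | B| for sets of 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def max_fusion_score(references, candidate):
    """Group fusion with the MAX rule: best score over all references."""
    return max(tanimoto(ref, candidate) for ref in references)


def rank(references, database):
    """Database indices sorted by decreasing fused similarity score."""
    scores = [max_fusion_score(references, fp) for fp in database]
    order = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return order, scores


if __name__ == "__main__":
    # Hypothetical fingerprints, for illustration only.
    query = {1, 2, 3, 4}
    database = [{1, 2, 3, 5}, {1, 2, 4, 6}, {2, 3, 4, 7}, {8, 9}]
    order, scores = rank([query], database)
    print(order, scores)
    # Three structurally different candidates receive exactly the same score.
    print(len(set(scores[:3])))
```

With presence/absence bits only, different molecules can share the same count of common and total bits, so their ranks are arbitrary; descriptors on richer scales reduce the number of such ties.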
Subjects
Chemistry/methods , Chemical Databases , Informatics/methods , Algorithms , Data Mining
ABSTRACT
A computational strategy that combines time-dependent and time-independent approaches is exploited to accurately model molecular dynamics and solvent effects on the isotropic hyperfine coupling constants of the DMPO-H nitroxide. Our recent general force field for nitroxides, derived from AMBER ff99SB, is further extended to systems involving hydrogen atoms in β-positions with respect to the NO moiety. The resulting force field has been employed in a series of classical molecular dynamics simulations, comparing the EPR parameters computed from selected molecular configurations to the corresponding experimental data in different solvents. The effect of vibrational averaging on the spectroscopic parameters is also taken into account by second-order vibrational perturbation theory, involving semidiagonal third energy derivatives together with first and second property derivatives.
ABSTRACT
Parallel ligand- and structure-based virtual screenings of 269 steroids with anabolic activity evaluated in vivo were performed. The quantitative structure-activity relationship (QSAR) model, built from selected descriptors such as the octanol-water partition coefficient, the molar volume and the quantum mechanically calculated charges on atoms C1, C2, C5, C9, C10, C14 and C17 of the steroid skeleton, captures structural features of anabolic steroids (AS) that contribute to transport and to the steroid-receptor interaction. In addition, computational simulations of candidate ligands binding to a receptor (a "docking" procedure) predict the association of these AS with the human androgen receptor (AR). Fourteen compounds were identified as leads; the most potent was 7α-methylestr-4-en-3,17-dione. It was concluded that good anabolic activity requires hydrogen bonding interactions between both the Arg752 and Gln711 residues and the O3 atom in ring A of the steroid, and between either the Asn705 or the Thr877 residue and the O17 atom in ring D.
Subjects
Anabolic Agents/chemistry , Anabolic Agents/metabolism , Quantitative Structure-Activity Relationship , Steroids/chemistry , Steroids/metabolism , Cluster Analysis , Humans , Androgen Receptors/chemistry , Androgen Receptors/metabolism
ABSTRACT
The interaction of three different brassinosteroids with water was studied by the Multiple Minima Hypersurface (MMH) procedure to model molecular interactions explicitly. The resulting thermodynamic data give useful information on the properties of molecular association with water. This application can serve as a tool for future investigations and modelling of the interactions of brassinosteroids with receptor proteins in plants. DFT/B3LYP calculations were also performed in order to correlate and test the performance of the AM1 Hamiltonian calculations of these complexes, which are inherent to the MMH routine. The diol functionalities located in ring A and in the side chain appear as the sites with the highest affinity for water. The oxalactone group does not appear to be a key structural requirement for the association with water. Parallel calculations with a "polarizable continuum method" (PCM) agreed with the reported experimental order of biological activities, where brassinolide exhibited the best solubility features.
Subjects
Cholestanols/chemistry , Theoretical Models , Plant Growth Regulators/chemistry , Heterocyclic Steroids/chemistry , Water/chemistry , Brassinosteroids , Molecular Structure , Solubility , Thermodynamics
ABSTRACT
Drug development is an extremely complex task, but also a highly valued one because of the concern raised by the negative impact of diseases on modern society. This review addresses the general characteristics of the traditional paradigm of the drug development process. It then introduces virtual screening techniques based on the concept of molecular similarity as a rational and complementary alternative to the early stages of that process. In this regard, emphasis is placed on similarity searching and its essential components.