1.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subject(s)
Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
2.
J Comput Aided Mol Des ; 38(1): 7, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38294570

ABSTRACT

An important aspect in the development of small molecules as drugs or agrochemicals is their systemic availability after intravenous and oral administration. Predicting systemic availability from the chemical structure of a potential candidate is highly desirable, as it allows drug or agrochemical development to focus on compounds with a favorable kinetic profile. However, such predictions are challenging because availability results from the complex interplay between molecular properties, biology, and physiology, and training data are scarce. In this work we improve the hybrid model developed earlier (Schneckener in J Chem Inf Model 59:4893-4905, 2019). We reduce the median fold change error for the total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set and by improving the neural network architecture as well as the parametrization of the mechanistic model. Further, we extend our approach to predict additional endpoints and to handle different covariates, such as sex and dosage form. In contrast to a pure machine learning model, our model is able to predict new endpoints on which it has not been trained. We demonstrate this feature by predicting the exposure over the first 24 h, although the model has only been trained on the total exposure.
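
The abstract above reports a median fold change error but does not define it. As a minimal sketch, assuming the common definition in which the fold change error of one compound is the larger of predicted/observed and observed/predicted (so it is always at least 1), the metric could be computed as follows. The function names and the example values are hypothetical and are not taken from the paper.

```python
from statistics import median


def fold_change_error(predicted, observed):
    """Fold change error of one prediction (assumed definition).

    Returns max(predicted/observed, observed/predicted), so a perfect
    prediction gives 1.0 and over- and under-prediction are penalized
    symmetrically. Both values must be positive.
    """
    if predicted <= 0 or observed <= 0:
        raise ValueError("exposures must be positive")
    ratio = predicted / observed
    return max(ratio, 1.0 / ratio)


def median_fold_change_error(predicted, observed):
    """Median of the per-compound fold change errors."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return median(fold_change_error(p, o) for p, o in zip(predicted, observed))


# Hypothetical exposures in arbitrary units, for illustration only.
example_predicted = [2.0, 10.0, 1.0]
example_observed = [1.0, 5.0, 4.0]
example_mfce = median_fold_change_error(example_predicted, example_observed)
```

Under this assumed definition, a median fold change error of 2.35 would mean that half of the predictions fall within a factor of 2.35 of the measured value.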


Subject(s)
Machine Learning , Neural Networks, Computer , Animals , Rats , Kinetics
3.
J Comput Aided Mol Des ; 37(3): 129-145, 2023 03.
Article in English | MEDLINE | ID: mdl-36797399

ABSTRACT

Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and finally the bioavailability in living species. We here present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years and over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral, and basic), representing the conditions of stomach and intestinal tract for animals and humans, and phloem and xylem for plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model with three solubility tasks using the pH-class combined data from different assays and five helper tasks results in root mean square errors of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling of compounds in pharmaceutical and agrochemical research. The model allows for the prediction of compound pH profiles with mean and median RMSE per molecule of 0.62 and 0.56 log units.


Subject(s)
Neural Networks, Computer , Water , Humans , Animals , Solubility , Water/chemistry , Machine Learning , Hydrogen-Ion Concentration , Pharmaceutical Preparations
4.
J Comput Aided Mol Des ; 37(12): 765-789, 2023 12.
Article in English | MEDLINE | ID: mdl-37878216

ABSTRACT

In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial-liquid-state (super-cooled liquid), and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature T = T¯ ([Formula: see text]) and mixing the artificially liquid solute into the solvent ([Formula: see text]). In this approach [Formula: see text] is predicted using machine learning models, and the [Formula: see text] is obtained from COSMO-RS calculations; (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data is available at physiological pH of 6.5 and ambient temperature. We also evaluated our models using external datasets from a solubility challenge. Our models present great improvements compared to the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We are furthermore able to demonstrate the superiority of QM-derived descriptors compared to cheminformatics descriptors. We finally present low-cost alternative models using fragment-based COSMOquick calculations with only marginal reduction in the quality of predicted solubility.


Subject(s)
Models, Chemical , Water , Solubility , Water/chemistry , Machine Learning , Solvents/chemistry
5.
J Comput Aided Mol Des ; 36(11): 805-824, 2022 11.
Article in English | MEDLINE | ID: mdl-36319876

ABSTRACT

Accurate calculation of relative tautomer energies in different environments is a prerequisite to many parameters of relevance in drug discovery. This work provides a thorough benchmark of the semiempirical methods AM1, PM3 and GFN2-xTB, the force-field OPLS4, Hartree-Fock and HF-3c, the density functionals PBEh-3c, B97-3c, r2SCAN-3c, PBE, PBE0, TPSS, r2SCAN, ω-B97X-V, M06-2X, B3LYP, B2PLYP, and second-order perturbation theory MP2 versus the gold-standard coupled-cluster DLPNO-CCSD(T) using the def2-QZVPP basis set. The best-performing method identified is M06-2X, whereas r2SCAN-3c performs best among the cost-optimized methods. Application of the two methods to a challenging subset from the SAMPL2 challenge provides evidence that deviations from experiment are caused by deficiencies of current continuum solvation methods.


Subject(s)
Drug Discovery , Isomerism
6.
J Comput Aided Mol Des ; 35(4): 505-516, 2021 04.
Article in English | MEDLINE | ID: mdl-33094408

ABSTRACT

Selective progesterone receptor modulators are promising therapeutic options for the treatment of uterine fibroids. Vilaprisan, a new chemical entity that was discovered at Bayer, is currently in clinical development. In this study we present a combined experimental and quantum chemical approach that yielded the data allowing hydroxyestradienone to be presented as an acceptable starting material for drug substance synthesis. Hydroxyestradienone has four stereogenic centers leading to 8 diastereomers and 16 enantiomers, of which only six diastereomers were synthetically accessible and two were not. A computational multistep protocol resulting in density functional B2PLYP-D3(BJ)/def2-TZVPP Gibbs free energies and SMD solvation free energies led to a clear separation between the existing and the synthetically inaccessible enantiomers, whereas multiple geometry-based and cheminformatic descriptors were not able to explain the experimental findings.


Subject(s)
Estrenes/chemistry , Steroids/chemistry , Estrenes/chemical synthesis , Models, Molecular , Quantum Theory , Stereoisomerism , Steroids/chemical synthesis , Thermodynamics
7.
J Chem Inf Model ; 59(2): 668-672, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30694664

ABSTRACT

Pharmaceutical products are often synthesized by the use of reactive starting materials and intermediates. These can, either as impurities or through metabolic activation, bind to the DNA. Primary aromatic amines belong to the critical classes that are considered potentially mutagenic in the Ames test, so there is a great need for good prediction models for risk assessment. How primary aromatic amines exert their mutagenic potential can be rationalized by the widely accepted nitrenium ion hypothesis of covalent binding to the DNA of reactive electrophiles formed out of the aromatic amines. Since the reactive chemical species is different in chemical structure from the actual compound, it is difficult to achieve good predictions via classical descriptor or fingerprint-based machine learning. In this approach, we use a combination of different molecular and atomic descriptors that is able to describe different mechanistic aspects of the metabolic transformation leading from the primary aromatic amine to the reactive metabolite that binds to the DNA. Applied to a test set, the combination shows significantly better performance than models that only use one of these descriptors and complements the general internal Ames mutagenicity prediction model at Bayer.


Subject(s)
Amines/chemistry , Amines/toxicity , Cheminformatics/methods , Mutagenicity Tests , Mutagens/chemistry , Mutagens/toxicity , Models, Molecular , Molecular Conformation , Quantitative Structure-Activity Relationship
8.
J Chem Inf Model ; 59(11): 4893-4905, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31714067

ABSTRACT

Oral administration of drug products is a strict requirement in many medical indications. Therefore, bioavailability prediction models are of high importance for prioritization of compound candidates in the drug discovery process. However, oral exposure and bioavailability are difficult to predict, as they are the result of various highly complex factors and/or processes influenced by the physicochemical properties of a compound, such as solubility, lipophilicity, or charge state, as well as by interactions with the organism, for instance, metabolism or membrane permeation. In this study, we assess whether it is possible to predict intravenous (iv) or oral drug exposure and oral bioavailability in rats. As input parameters, we use (i) six experimentally determined in vitro and physicochemical endpoints, namely, membrane permeation, free fraction, metabolic stability, solubility, pKa value, and lipophilicity; (ii) the outputs of six in silico absorption, distribution, metabolism, and excretion models trained on the same endpoints, or (iii) the chemical structure encoded as fingerprints or simplified molecular input line entry system strings. The underlying data set for the models is an unprecedented collection of almost 1900 data points with high-quality in vivo experiments performed in rats. We find that drug exposure after iv administration can be predicted similarly well using hybrid models with in vitro- or in silico-predicted endpoints as inputs, with fold change errors (FCE) of 2.28 and 2.08, respectively. The FCEs for exposure after oral administration are higher, and here, the prediction from in vitro inputs performs significantly better in comparison to in silico-based models with FCEs of 3.49 and 2.40, respectively, most probably reflecting the higher complexity of oral bioavailability. Simplifying the prediction task to a binary alert for low oral bioavailability, based only on chemical structure, we achieve accuracy and precision close to 70%.


Subject(s)
Drug Discovery/methods , Hepatocytes/metabolism , Pharmaceutical Preparations/metabolism , Administration, Oral , Animals , Biological Availability , Caco-2 Cells , Computer Simulation , Humans , Machine Learning , Male , Models, Biological , Permeability , Pharmaceutical Preparations/chemistry , Rats , Rats, Wistar , Serum Albumin/metabolism , Solubility
9.
Drug Discov Today Technol ; 32-33: 37-43, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386093

ABSTRACT

This review provides an overview of descriptions of atoms applied to the understanding of phenomena like chemical reactivity and selectivity, pKa values, Site of Metabolism prediction, or hydrogen bond strengths, but also the substitution of quantum mechanical calculations by machine learning models for energies, forces or even spectroscopic properties and finally the fast calculation of atomic charges for force field parametrization. The descriptor space ranges from derivatives of the wavefunctions or electron density via quantum mechanics derived descriptors to classical descriptions of atoms and their embedding in a molecule. The common denominator for all approaches is the thorough understanding of the physics of the chemical problem that guided the design of the atom descriptor. Quantum mechanics (QM) and machine learning (ML) finally are converging to a new discipline, namely QM/ML.


Subject(s)
Drug Discovery , Machine Learning , Pharmaceutical Preparations/chemistry , Quantum Theory , Humans
10.
J Chem Inf Model ; 58(5): 1005-1020, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29717870

ABSTRACT

Prediction of compound properties from structure via quantitative structure-activity relationship and machine-learning approaches is an important computational chemistry task in small-molecule drug research. Though many such properties are dependent on three-dimensional structures or even conformer ensembles, the majority of models are based on descriptors derived from two-dimensional structures. Here we present results from a thorough benchmark study of force field, semiempirical, and density functional methods for the calculation of conformer energies in the gas phase and water solvation as a foundation for the correct identification of relevant low-energy conformers. We find that the tight-binding ansatz GFN-xTB shows the lowest error metrics and highest correlation to the benchmark PBE0-D3(BJ)/def2-TZVP in the gas phase for the computationally fast methods and that in solvent OPLS3 becomes comparable in performance. MMFF94, AM1, and DFTB+ perform worse, whereas the performance-optimized but far more expensive functional PBEh-3c yields energies almost perfectly correlated to the benchmark and should be used whenever affordable. On the basis of our findings, we have implemented a reliable and fast protocol for the identification of low-energy conformers of drug-like molecules in water that can be used for the quantification of strain energy and entropy contributions to target binding as well as for the derivation of conformer-ensemble-dependent molecular descriptors.


Subject(s)
Gases/chemistry , Informatics/methods , Machine Learning , Water/chemistry , Drug Discovery , Models, Molecular , Molecular Conformation , Pharmaceutical Preparations/chemistry , Quantitative Structure-Activity Relationship , Solvents/chemistry , Thermodynamics
11.
Chemphyschem ; 18(8): 898-905, 2017 Apr 19.
Article in English | MEDLINE | ID: mdl-28133881

ABSTRACT

Computational methods play a key role in modern drug design in the pharmaceutical industry but are mostly based on force fields, which are limited in accuracy when describing non-classical binding effects, proton transfer, or metal coordination. Here, we propose a general fully quantum mechanical (QM) scheme for the computation of protein-ligand affinities. It works on a single protein cutout (of about 1000 atoms) and evaluates all contributions (interaction energy, solvation, thermostatistical) to absolute binding free energy on the highest feasible QM level. The methodology is tested on two different protein targets: activated serine protease factor X (FXa) and tyrosine-protein kinase 2 (TYK2). We demonstrate that the geometry of the model systems can be efficiently energy-minimized by using general purpose graphics processing units, resulting in structures that are close to the co-crystallized protein-ligand structures. Our best calculations at a hybrid DFT level (PBEh-3c composite method) for the FXa ligand set result in an overall mean absolute deviation as low as 2.1 kcal mol⁻¹. Though very encouraging, an analysis of outliers indicates that the structure optimization level, conformational sampling, and solvation treatment require further improvement.


Subject(s)
Factor X/chemistry , Quantum Theory , Serine Endopeptidases/chemistry , TYK2 Kinase/chemistry , Binding Sites , Factor X/metabolism , Humans , Ligands , Serine Endopeptidases/metabolism , TYK2 Kinase/metabolism
13.
J Chem Inf Model ; 55(2): 389-97, 2015 Feb 23.
Article in English | MEDLINE | ID: mdl-25514239

ABSTRACT

In a unique collaboration between a software company and a pharmaceutical company, we were able to develop a new in silico pKa prediction tool with outstanding prediction quality. An existing pKa prediction method from Simulations Plus based on artificial neural network ensembles (ANNE), microstates analysis, and literature data was retrained with a large homogeneous data set of drug-like molecules from Bayer. The new model was thus built with curated sets of ∼14,000 literature pKa values (∼11,000 compounds, representing literature chemical space) and ∼19,500 pKa values experimentally determined at Bayer Pharma (∼16,000 compounds, representing industry chemical space). Model validation was performed with several test sets consisting of a total of ∼31,000 new pKa values measured at Bayer. For the largest and most difficult test set with >16,000 pKa values that were not used for training, the original model achieved a mean absolute error (MAE) of 0.72, root-mean-square error (RMSE) of 0.94, and squared correlation coefficient (R²) of 0.87. The new model achieves significantly improved prediction statistics, with MAE = 0.50, RMSE = 0.67, and R² = 0.93. It is commercially available as part of the Simulations Plus ADMET Predictor release 7.0. Good predictions are only of value when delivered effectively to those who can use them. The new pKa prediction model has been integrated into Pipeline Pilot and the PharmacophorInformatics (PIx) platform used by scientists at Bayer Pharma. Different output formats allow customized application by medicinal chemists, physical chemists, and computational chemists.
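
The three validation statistics named in the abstract above (MAE, RMSE, and the squared correlation coefficient) can be sketched in a few lines. This is an illustration using the standard textbook definitions, with the squared correlation coefficient taken as the squared Pearson correlation between predicted and measured values. The function names and the pKa values are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(predicted, measured):
    """Average absolute deviation between predictions and measurements."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


def root_mean_square_error(predicted, measured):
    """Square root of the mean squared deviation."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(measured))


def squared_correlation(predicted, measured):
    """Squared Pearson correlation coefficient between the two series."""
    n = len(measured)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return cov * cov / (var_p * var_m)


# Hypothetical pKa values, for illustration only.
example_predicted = [1.0, 2.0, 3.0]
example_measured = [1.0, 2.0, 5.0]
example_mae = mean_absolute_error(example_predicted, example_measured)
example_rmse = root_mean_square_error(example_predicted, example_measured)
example_r2 = squared_correlation(example_predicted, example_measured)
```

RMSE weights large deviations more heavily than MAE, which is why the two are often reported together, as in the abstract.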


Subject(s)
Computer Simulation , Databases, Factual , Models, Chemical , Algorithms , Computational Biology , Data Mining , Informatics , Neural Networks, Computer , Predictive Value of Tests , Structure-Activity Relationship
14.
Chemphyschem ; 15(17): 3824-31, 2014 Dec 01.
Article in English | MEDLINE | ID: mdl-25196062

ABSTRACT

Transient UV/Vis absorption spectroscopy is used to study the primary dynamics of the ring-A methyl imino ether of phycocyanobilin (PCB-AIE), which was shown to mimic the far-red absorbance of the Pfr chromophore in phytochromes (R. Micura, K. Grubmayr, Bioorg. Med. Chem. Lett. 1994, 4, 2517-2522). After excitation at 615 nm, the excited electronic state is found to decay with τ1 = 0.4 ps followed by electronic ground-state relaxation with τ2 = 1.2 and τ3 = 6.7 ps. Compared with phycocyanobilin (PCB), the initial kinetics of PCB-AIE is much faster. Thus, the lactim structure of PCB-AIE seems to be a suitable model that could not only explain the bathochromic shift in the ground-state absorption but also the short reaction of the Pfr as compared to the Pr chromophore in phytochrome. In addition, the equivalence of ring-A and ring-D lactim tautomers with respect to a red-shifted absorbance relative to the lactam tautomers is demonstrated by semiempirical calculations.

15.
Front Pharmacol ; 15: 1415266, 2024.
Article in English | MEDLINE | ID: mdl-39086387

ABSTRACT

N-nitrosamines and nitrosamine drug substance related impurities (NDSRIs) became a critical topic for the development and safety of small molecule medicines following the withdrawal of various pharmaceutical products from the market. To assess the mutagenic and carcinogenic potential of different N-nitrosamines lacking robust carcinogenicity data, several approaches are in use including the published carcinogenic potency categorization approach (CPCA), the Enhanced Ames Test (EAT), in vivo mutagenicity studies as well as read-across to analogue molecules with robust carcinogenicity data. We employ quantum chemical calculations as a pivotal tool providing insights into the likelihood of reactive ion formation and subsequent DNA alkylation for a selection of molecules including, e.g., carcinogenic N-nitrosopiperazine (NPZ), N-nitrosopiperidine (NPIP), together with N-nitrosodimethylamine (NDMA) as well as non-carcinogenic N-nitrosomethyl-tert-butylamine (NTBA) and bis(butan-2-yl)(nitros)amine (BBNA). In addition, a series of nitroso-methylaminopyridines is compared side-by-side. We draw comparisons between calculated reaction profiles for structures representing motifs common to NDSRIs and those of confirmed carcinogenic and non-carcinogenic molecules with in vivo data from cancer bioassays. Furthermore, our approach enables insights into reactivity and relative stability of intermediate species that can be formed upon activation of several nitrosamines. Most notably, we reveal consistent differences between the free energy profiles of carcinogenic and non-carcinogenic molecules. For the former, the intermediate diazonium ions mostly react, kinetically controlled, to the more stable DNA adducts and less to the water adducts via transition-states of similar heights. Non-carcinogenic molecules yield stable carbocations as intermediates that, thermodynamically controlled, more likely form the statistically preferred water adducts.
In conclusion, our data confirm that quantum chemical calculations can contribute to a weight of evidence approach for the risk assessment of nitrosamines.

16.
ACS Omega ; 8(6): 5901-5916, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36816707

ABSTRACT

Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce resource-demanding synthesis and testing of low-permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel-Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (k ≥ 0.57). Models retrained by combining the original training and the blinded test set performed equally well for a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the imbalanced nature of the CRBN datasets. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.

17.
ACS Omega ; 7(49): 45617-45623, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36530278

ABSTRACT

We present a quantum chemistry (QM)-based method that computes the relative energies of intermediates in the Heck reaction that relate to the regioselective reaction outcome: branched (α), linear (β), or a mix of the two. The calculations are done for two different reaction pathways (neutral and cationic) and are based on r2SCAN-3c single-point calculations on GFN2-xTB geometries that, in turn, derive from a GFNFF-xTB conformational search. The method is completely automated and is sufficiently efficient to allow for the calculation of thousands of reaction outcomes. The method can mostly reproduce systematic experimental studies where the ratios of regioisomers are carefully determined. For a larger dataset extracted from Reaxys, the results are somewhat worse with accuracies of 63% for β-selectivity using the neutral pathway and 29% for α-selectivity using the cationic pathway. Our analysis of the dataset suggests that only the major or desired regioisomer is reported in the literature in many cases, which makes accurate comparisons difficult. The code is freely available on GitHub under the MIT open-source license: https://github.com/jensengroup/HeckQM.

18.
Methods Mol Biol ; 2390: 61-101, 2022.
Article in English | MEDLINE | ID: mdl-34731464

ABSTRACT

The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars to build useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas are the reason for the regained interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data of many compounds. We put these in context with other digital drug discovery methods and present some application examples.


Subject(s)
Machine Learning , Algorithms , Drug Discovery , Quantitative Structure-Activity Relationship
19.
J Cheminform ; 13(1): 10, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33579374

ABSTRACT

We present RegioSQM20, a new version of RegioSQM (Chem Sci 9:660, 2018), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from the calculation of proton affinities. The following improvements have been made: The open source semiempirical tight binding program xtb is used instead of the closed source MOPAC program. Any low energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form. Finally, RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7 to 92.7%. RegioSQM20 is compared to two machine learning based models: one developed by Struble et al. (React Chem Eng 5:896, 2020) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent Sci 5:1572, 2019). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3-85.0%, depending on the data set. The code is freely available under the MIT open source license and will be made available as a webservice (regiosqm.org) in the near future.

20.
J Cheminform ; 13(1): 55, 2021 Jul 29.
Article in English | MEDLINE | ID: mdl-34325738

ABSTRACT

In this study we compare three algorithms for the generation of conformer ensembles, Biovia BEST, Schrödinger Prime macrocycle sampling (PMM), and Conformator (CONF) from the University of Hamburg, with ensembles derived from exhaustive molecular dynamics simulations, applied to a dataset of 7 small macrocycles in two charge states and three solvents. Ensemble completeness is a prerequisite to allow for the selection of relevant diverse conformers for many applications in computational chemistry. We apply conformation maps using principal component analysis based on ring torsions. Our major finding, critical for all applications of conformer ensembles in any computational study, is that maps derived from MD with explicit solvent are significantly distinct between macrocycles, charge states and solvents, whereas the maps for post-optimized conformers using implicit solvent models from all generator algorithms are very similar independent of the solvent. We apply three metrics for the quantification of the relative covered ensemble space, namely cluster overlap, variance statistics, and a novel metric, Mahalanobis distance, showing that post-optimized MD ensembles cover a significantly larger conformational space than the generator ensembles, with the ranking PMM > BEST >> CONF. Furthermore, we find that the distributions of 3D polar surface areas are very similar for all macrocycles independent of charge state and solvent, except for the smaller and more strained compound 7, and that there is also no obvious correlation between 3D PSA and intramolecular hydrogen bond count distributions.
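
The abstract above names the Mahalanobis distance as one of its coverage metrics but does not say how it is applied to the conformation maps. As a generic illustration only, the sketch below computes the textbook Mahalanobis distance of one point from a cloud of two-dimensional points, such as conformers projected onto two principal components. The restriction to two dimensions, the function name, and the coordinates are assumptions made for this example and do not reproduce the authors' protocol.

```python
import math


def mahalanobis_2d(point, samples):
    """Mahalanobis distance of a 2D point from the distribution of `samples`.

    The distance is sqrt(d^T S^-1 d), where d is the offset of the point
    from the sample mean and S is the sample covariance matrix. The 2x2
    inverse is written out in closed form.
    """
    n = len(samples)
    if n < 3:
        raise ValueError("need at least three samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Unbiased sample covariance entries.
    sxx = sum((x - mean_x) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - mean_x
    dy = point[1] - mean_y
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical 2D coordinates, for illustration only.
example_cloud = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
example_distance = mahalanobis_2d((3.0, 1.0), example_cloud)
```

Unlike the Euclidean distance, this measure scales each direction by the spread of the reference ensemble, so a point lying far along a direction in which the ensemble varies little counts as more distant.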
