Results 1 - 20 of 36
1.
J Comput Aided Mol Des ; 38(1): 7, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38294570

ABSTRACT

An important aspect in the development of small molecules as drugs or agrochemicals is their systemic availability after intravenous and oral administration. Predicting systemic availability from the chemical structure of a potential candidate is highly desirable, as it allows drug or agrochemical development to focus on compounds with a favorable kinetic profile. However, such predictions are challenging, because availability results from a complex interplay between molecular properties, biology, and physiology, and training data are scarce. In this work we improve the hybrid model developed earlier (Schneckener in J Chem Inf Model 59:4893-4905, 2019). We reduce the median fold change error for the total oral exposure from 2.85 to 2.35 and for intravenous administration from 1.95 to 1.62. This is achieved by training on a larger data set and by improving both the neural network architecture and the parametrization of the mechanistic model. Further, we extend our approach to predict additional endpoints and to handle different covariates, such as sex and dosage form. In contrast to a pure machine learning model, our model is able to predict new endpoints on which it has not been trained. We demonstrate this feature by predicting the exposure over the first 24 h, although the model has only been trained on the total exposure.


Subjects
Machine Learning , Neural Networks, Computer , Animals , Rats , Kinetics
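The first abstract above reports accuracy as a median fold change error (FCE). The abstract does not define the metric; as an assumption, the sketch below uses the common symmetric definition FCE = max(predicted/observed, observed/predicted), which equals 10^|log10(predicted/observed)|. The function names and exposure values are hypothetical and not taken from the paper.

```python
from statistics import median


def fold_change_error(predicted: float, observed: float) -> float:
    """Symmetric fold error, max(pred/obs, obs/pred); assumes positive values."""
    if predicted <= 0 or observed <= 0:
        raise ValueError("exposures must be positive")
    return max(predicted / observed, observed / predicted)


def median_fce(predicted, observed) -> float:
    """Median of the per-compound fold change errors."""
    return median(fold_change_error(p, o) for p, o in zip(predicted, observed))


# Hypothetical predicted vs. observed exposures (arbitrary units).
pred = [2.0, 10.0, 0.5, 4.0]
obs = [1.0, 10.0, 1.0, 1.0]
print(median_fce(pred, obs))  # per-compound FCEs 2, 1, 2, 4 -> median 2.0
```

Read this way, a median FCE of 2 means that half of the compounds are predicted within a factor of two of the measured exposure.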
2.
J Chem Inf Model ; 64(7): 2331-2344, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37642660

ABSTRACT

Federated multipartner machine learning has been touted as an appealing and efficient method to increase the effective training data volume and thereby the predictivity of models, particularly when the generation of training data is resource-intensive. In the landmark MELLODDY project, indeed, each of ten pharmaceutical companies realized aggregated improvements on its own classification or regression models through federated learning. To this end, they leveraged a novel implementation extending multitask learning across partners, on a platform audited for privacy and security. The experiments involved an unprecedented cross-pharma data set of 2.6+ billion confidential experimental activity data points, documenting 21+ million physical small molecules and 40+ thousand assays in on-target and secondary pharmacodynamics and pharmacokinetics. Appropriate complementary metrics were developed to evaluate the predictive performance in the federated setting. In addition to predictive performance increases in labeled space, the results point toward an extended applicability domain in federated learning. Increases in collective training data volume, including by means of auxiliary data resulting from single concentration high-throughput and imaging assays, continued to boost predictive performance, albeit with a saturating return. Markedly higher improvements were observed for the pharmacokinetics and safety panel assay-based task subsets.


Subjects
Benchmarking , Quantitative Structure-Activity Relationship , Biological Assay , Machine Learning
3.
J Comput Aided Mol Des ; 37(12): 765-789, 2023 12.
Article in English | MEDLINE | ID: mdl-37878216

ABSTRACT

In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial liquid state (supercooled liquid) and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature (free energy of fusion, $\Delta G_{\mathrm{fus}}$) and mixing the artificially liquid solute into the solvent (free energy of mixing, $\Delta G_{\mathrm{mix}}$), in which $\Delta G_{\mathrm{fus}}$ is predicted using machine learning models and $\Delta G_{\mathrm{mix}}$ is obtained from COSMO-RS calculations; and (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data are available at a physiological pH of 6.5 and ambient temperature. We also evaluated our models on external datasets from a solubility challenge. Our models show substantial improvements over the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We furthermore demonstrate the superiority of QM-derived descriptors over cheminformatics descriptors. Finally, we present low-cost alternative models using fragment-based COSMOquick calculations with only a marginal reduction in the quality of predicted solubility.


Subjects
Models, Chemical , Water , Solubility , Water/chemistry , Machine Learning , Solvents/chemistry
4.
J Comput Aided Mol Des ; 37(3): 129-145, 2023 03.
Article in English | MEDLINE | ID: mdl-36797399

ABSTRACT

Aqueous solubility is the most important physicochemical property for agrochemical and drug candidates and a prerequisite for uptake, distribution, transport, and ultimately bioavailability in living species. We here present the first-ever direct machine learning models for pH-dependent solubility in water. For this, we combined almost 300,000 data points from 11 solubility assays performed over 24 years and over one million data points from lipophilicity and melting point experiments. Data were split into three pH classes (acidic, neutral, and basic), representing the conditions of the stomach and intestinal tract for animals and humans, and of the phloem and xylem for plants. We find that multi-task neural networks using ECFP-6 fingerprints outperform baseline random forests and single-task neural networks on the individual tasks. Our final model, with three solubility tasks using the pH-class-combined data from different assays plus five helper tasks, achieves root mean square errors of 0.56 log units overall (acidic 0.61; neutral 0.52; basic 0.54) and Spearman rank correlations of 0.83 (acidic 0.78; neutral 0.86; basic 0.86), making it a valuable tool for profiling compounds in pharmaceutical and agrochemical research. The model allows prediction of compound pH profiles with a mean and median RMSE per molecule of 0.62 and 0.56 log units, respectively.


Subjects
Neural Networks, Computer , Water , Humans , Animals , Solubility , Water/chemistry , Machine Learning , Hydrogen-Ion Concentration , Pharmaceutical Preparations
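The pH-solubility abstract above reports its results per pH class as RMSE (in log units) and Spearman rank correlation. A minimal, standard-library sketch of both metrics follows; Spearman is computed as the Pearson correlation of ranks, with tied values given their average rank. The records are hypothetical (pH class, predicted log S, measured log S) triples and do not come from the paper's data or model.

```python
import math


def rmse(pred, obs):
    """Root mean square error in log units."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical (pH class, predicted log S, measured log S) records.
records = [
    ("acidic", -4.2, -4.0), ("acidic", -3.1, -3.5), ("acidic", -5.0, -5.2),
    ("neutral", -2.0, -2.3), ("neutral", -4.4, -4.1), ("neutral", -6.0, -5.8),
]
for cls in ("acidic", "neutral"):
    pred = [p for c, p, _ in records if c == cls]
    obs = [o for c, _, o in records if c == cls]
    print(cls, round(rmse(pred, obs), 2), round(spearman(pred, obs), 2))
```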
5.
ACS Omega ; 8(6): 5901-5916, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36816707

ABSTRACT

Approaches for predicting proteolysis targeting chimera (PROTAC) cell permeability are of major interest to reduce the resource-demanding synthesis and testing of poorly permeable PROTACs. We report a comprehensive investigation of the scope and limitations of machine learning-based binary classification models developed using 17 simple descriptors for large and structurally diverse sets of cereblon (CRBN) and von Hippel-Lindau (VHL) PROTACs. For the VHL PROTAC set, k-nearest neighbor and random forest models performed best and predicted the permeability of a blinded test set with >80% accuracy (κ ≥ 0.57). Models retrained on the combined original training and blinded test sets performed equally well on a second blinded VHL set. However, models for CRBN PROTACs were less successful, mainly due to the imbalanced nature of the CRBN datasets. All descriptors contributed to the models, but size and lipophilicity were the most important. We conclude that properly trained machine learning models can be integrated as effective filters in the PROTAC design process.
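The PROTAC abstract above names k-nearest neighbor classification and reports accuracy together with κ, presumably Cohen's kappa, i.e. agreement corrected for chance. A minimal standard-library sketch of both follows. The two-dimensional descriptor vectors and the "high"/"low" permeability labels are hypothetical stand-ins for the paper's 17 descriptors; in practice, descriptors would be standardized before Euclidean distances are used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean)."""
    neighbors = sorted(zip((math.dist(xi, x) for xi in train_X), train_y))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def cohen_kappa(y_true, y_pred):
    """(p_observed - p_expected) / (1 - p_expected); undefined if p_expected == 1."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    count_true, count_pred = Counter(y_true), Counter(y_pred)
    labels = set(count_true) | set(count_pred)
    # Chance agreement from the marginal label frequencies.
    p_exp = sum(count_true[c] * count_pred[c] for c in labels) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical scaled descriptors and permeability labels.
train_X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["high", "high", "low", "low"]
print(knn_predict(train_X, train_y, (0.2, 0.3)))  # -> high
print(cohen_kappa(["high", "high", "low", "low"],
                  ["high", "low", "low", "low"]))  # -> 0.5
```

In the second call, 75% raw agreement shrinks to κ = 0.5 because half of that agreement would already be expected by chance from the label frequencies.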

6.
ACS Omega ; 7(49): 45617-45623, 2022 Dec 13.
Article in English | MEDLINE | ID: mdl-36530278

ABSTRACT

We present a quantum chemistry (QM)-based method that computes the relative energies of intermediates in the Heck reaction that relate to the regioselective reaction outcome: branched (α), linear (β), or a mix of the two. The calculations are done for two different reaction pathways (neutral and cationic) and are based on r2SCAN-3c single-point calculations on GFN2-xTB geometries that, in turn, derive from a GFNFF-xTB conformational search. The method is completely automated and efficient enough to allow the calculation of thousands of reaction outcomes. The method largely reproduces systematic experimental studies in which the ratios of regioisomers are carefully determined. For a larger dataset extracted from Reaxys, the results are somewhat worse, with accuracies of 63% for β-selectivity using the neutral pathway and 29% for α-selectivity using the cationic pathway. Our analysis of the dataset suggests that in many cases only the major or desired regioisomer is reported in the literature, which makes accurate comparisons difficult. The code is freely available on GitHub under the MIT open-source license: https://github.com/jensengroup/HeckQM.

7.
J Comput Aided Mol Des ; 36(11): 805-824, 2022 11.
Article in English | MEDLINE | ID: mdl-36319876

ABSTRACT

Accurate calculation of relative tautomer energies in different environments is a prerequisite for computing many parameters of relevance in drug discovery. This work provides a thorough benchmark of the semiempirical methods AM1, PM3, and GFN2-xTB, the force field OPLS4, Hartree-Fock and HF-3c, the density functionals PBEh-3c, B97-3c, r2SCAN-3c, PBE, PBE0, TPSS, r2SCAN, ωB97X-V, M06-2X, B3LYP, and B2PLYP, and second-order perturbation theory (MP2) against the gold-standard coupled-cluster DLPNO-CCSD(T) method using the def2-QZVPP basis set. The best-performing method overall is M06-2X, whereas r2SCAN-3c performs best among the cost-optimized methods. Applying the two methods to a challenging subset from the SAMPL2 challenge provides evidence that deviations from experiment are caused by deficiencies of current continuum solvation methods.


Subjects
Drug Discovery , Isomerism
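The tautomer benchmark above compares each method's relative tautomer energies with DLPNO-CCSD(T) reference values. The abstract does not name its error statistic. The sketch below assumes absolute energies in hartree, expresses each tautomer relative to the first tautomer of its set, and uses the mean absolute deviation as the error measure. All energies and function names are hypothetical.

```python
HARTREE_TO_KCAL_MOL = 627.5095  # 1 hartree in kcal/mol


def relative_energies(energies_hartree):
    """Energies of each tautomer relative to the first one, in kcal/mol."""
    e_ref = energies_hartree[0]
    return [(e - e_ref) * HARTREE_TO_KCAL_MOL for e in energies_hartree]


def mean_absolute_deviation(method_sets, reference_sets):
    """MAD of relative tautomer energies versus the reference, over all sets.

    Each element is a list of absolute energies (hartree) for the tautomers
    of one molecule, in the same order for method and reference.
    """
    deviations = []
    for method, reference in zip(method_sets, reference_sets):
        rel_m, rel_r = relative_energies(method), relative_energies(reference)
        # Skip index 0, whose relative energy is zero by construction.
        deviations += [abs(a - b) for a, b in zip(rel_m[1:], rel_r[1:])]
    return sum(deviations) / len(deviations)


# Hypothetical tautomer pair: cheap method vs. reference (absolute energies differ,
# only the relative energies are compared).
method = [[-100.000, -99.998]]
reference = [[-200.000, -199.997]]
print(round(mean_absolute_deviation(method, reference), 3))
```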
8.
Methods Mol Biol ; 2390: 61-101, 2022.
Article in English | MEDLINE | ID: mdl-34731464

ABSTRACT

The well-known concept of quantitative structure-activity relationships (QSAR) has been gaining significant interest in recent years. Data, descriptors, and algorithms are the main pillars for building useful models that support more efficient drug discovery processes with in silico methods. Significant advances in all three areas explain the renewed interest in these models. In this book chapter we review various machine learning (ML) approaches that make use of measured in vitro/in vivo data for many compounds. We put these in context with other digital drug discovery methods and present some application examples.


Subjects
Machine Learning , Algorithms , Drug Discovery , Quantitative Structure-Activity Relationship
9.
J Cheminform ; 13(1): 55, 2021 Jul 29.
Article in English | MEDLINE | ID: mdl-34325738

ABSTRACT

In this study we compare three algorithms for generating conformer ensembles, Biovia BEST, Schrödinger Prime macrocycle sampling (PMM), and Conformator (CONF) from the University of Hamburg, with ensembles derived from exhaustive molecular dynamics (MD) simulations, applied to a dataset of 7 small macrocycles in two charge states and three solvents. Ensemble completeness is a prerequisite for selecting relevant, diverse conformers in many applications in computational chemistry. We apply conformation maps using principal component analysis based on ring torsions. Our major finding, critical for all applications of conformer ensembles in any computational study, is that maps derived from MD with explicit solvent are significantly distinct between macrocycles, charge states, and solvents, whereas the maps for post-optimized conformers using implicit solvent models from all generator algorithms are very similar, independent of the solvent. We apply three metrics to quantify the relative covered ensemble space, namely cluster overlap, variance statistics, and a novel metric based on the Mahalanobis distance, showing that post-optimized MD ensembles cover a significantly larger conformational space than the generator ensembles, with the ranking PMM > BEST >> CONF. Furthermore, we find that the distributions of 3D polar surface areas (3D PSA) are very similar for all macrocycles, independent of charge state and solvent, except for the smaller and more strained compound 7, and that there is no obvious correlation between 3D PSA and intramolecular hydrogen bond count distributions.
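The macrocycle abstract above combines PCA conformation maps built from ring torsions with a Mahalanobis distance. The abstract does not spell out how the Mahalanobis-based metric is defined. The NumPy sketch below only illustrates these ingredients: a cos/sin encoding of torsions (our assumption, so that -180° and 180° coincide), a two-component PCA map, and the Mahalanobis distance of one ensemble's map points from the distribution of another. The random torsion ensembles are hypothetical.

```python
import numpy as np


def torsion_features(torsions_deg):
    """Encode ring torsions (n_conformers x n_torsions, degrees) as cos/sin
    pairs so that -180 and 180 degrees map to the same point."""
    rad = np.deg2rad(np.asarray(torsions_deg, dtype=float))
    return np.hstack([np.cos(rad), np.sin(rad)])


def pca_fit(X, n_components=2):
    """Return the feature mean and the leading principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_project(X, mean, axes):
    """Coordinates of X on the principal axes (the conformation map)."""
    return (X - mean) @ axes.T


def mahalanobis(points, reference):
    """Mahalanobis distance of each point to the distribution of `reference`."""
    mu = reference.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = points - mu
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


# Hypothetical ring torsions: a broad "MD" ensemble and a narrower generator one.
rng = np.random.default_rng(0)
md = rng.normal(0.0, 60.0, size=(500, 6))
gen = rng.normal(0.0, 20.0, size=(50, 6))
mean, axes = pca_fit(torsion_features(md))
md_map = pca_project(torsion_features(md), mean, axes)
gen_map = pca_project(torsion_features(gen), mean, axes)
print(mahalanobis(gen_map, md_map).mean())
```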

10.
J Cheminform ; 13(1): 10, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33579374

ABSTRACT

We present RegioSQM20, a new version of RegioSQM (Chem Sci 9:660, 2018), which predicts the regioselectivities of electrophilic aromatic substitution (EAS) reactions from calculated proton affinities. The following improvements have been made: the open-source semiempirical tight-binding program xtb is used instead of the closed-source MOPAC program; any low-energy tautomeric forms of the input molecule are identified and regioselectivity predictions are made for each form; and RegioSQM20 offers a qualitative prediction of the reactivity of each tautomer (low, medium, or high) based on the reaction center with the highest proton affinity. The inclusion of tautomers increases the success rate from 90.7 to 92.7%. RegioSQM20 is compared to two machine learning-based models: one developed by Struble et al. (React Chem Eng 5:896, 2020) specifically for regioselectivity predictions of EAS reactions (WLN) and a more generally applicable reactivity predictor (IBM RXN) developed by Schwaller et al. (ACS Cent Sci 5:1572, 2019). RegioSQM20 and WLN offer roughly the same success rates for the entire data sets (without considering tautomers), while WLN is many orders of magnitude faster. The accuracy of the more general IBM RXN approach is somewhat lower: 76.3-85.0%, depending on the data set. The code is freely available under the MIT open-source license and will be made available as a web service (regiosqm.org) in the near future.

12.
J Comput Aided Mol Des ; 35(4): 505-516, 2021 04.
Article in English | MEDLINE | ID: mdl-33094408

ABSTRACT

Selective progesterone receptor modulators are promising therapeutic options for the treatment of uterine fibroids. Vilaprisan, a new chemical entity discovered at Bayer, is currently in clinical development. In this study we provide a combined experimental and quantum chemical approach that supplied the data needed to establish hydroxyestradienone as an acceptable starting material for drug substance synthesis. Hydroxyestradienone has four stereogenic centers, giving 8 diastereomers and 16 stereoisomers, of which six diastereomers were synthetically accessible and two were not. A computational multistep protocol yielding density functional B2PLYP-D3(BJ)/def2-TZVPP Gibbs free energies and SMD solvation free energies led to a clear separation between the accessible and the synthetically inaccessible stereoisomers, whereas multiple geometry-based and cheminformatic descriptors were unable to explain the experimental findings.


Subjects
Estrenes/chemistry , Steroids/chemistry , Estrenes/chemical synthesis , Models, Molecular , Quantum Theory , Stereoisomerism , Steroids/chemical synthesis , Thermodynamics
13.
Drug Discov Today ; 25(9): 1702-1709, 2020 09.
Article in English | MEDLINE | ID: mdl-32652309

ABSTRACT

Over the past two decades, an in silico absorption, distribution, metabolism, and excretion (ADMET) platform has been created at Bayer Pharma with the goal of generating models for a variety of pharmacokinetic and physicochemical endpoints in early drug discovery. These tools are accessible to all scientists within the company and can be useful in assisting with the selection and design of novel leads, as well as with lead optimization. Here, we discuss the development of machine-learning (ML) approaches with special emphasis on data, descriptors, and algorithms. We show that high internal data quality and tailored descriptors, as well as a thorough understanding of the experimental endpoints, are essential to the utility of our models. We discuss the recent impact of deep neural networks and show selected application examples.


Subjects
Machine Learning , Pharmacokinetics , Animals , Computer Simulation , Humans , Intestinal Absorption , Models, Theoretical , Pharmaceutical Preparations/metabolism
14.
J Med Chem ; 63(13): 6774-6783, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32453569

ABSTRACT

We herein report the first thorough analysis of the structure-permeability relationship of semipeptidic macrocycles. In total, 47 macrocycles were synthesized using a hybrid solid-phase/solution strategy, and then their passive and cellular permeability was assessed using the parallel artificial membrane permeability assay (PAMPA) and Caco-2 assay, respectively. The results indicate that semipeptidic macrocycles generally possess high passive permeability based on the PAMPA, yet their cellular permeability is governed by efflux, as reported in the Caco-2 assay. Structural variations led to tractable structure-permeability and structure-efflux relationships, wherein the linker length, stereoinversion, N-methylation, and peptoids site-specifically impact the permeability and efflux. Extensive nuclear magnetic resonance, molecular dynamics, and ensemble-based three-dimensional polar surface area (3D-PSA) studies showed that ensemble-based 3D-PSA is a good predictor of passive permeability.


Subjects
Macrocyclic Compounds/chemistry , Macrocyclic Compounds/metabolism , Peptides/chemistry , Caco-2 Cells , Humans , Membranes, Artificial , Permeability
15.
J Phys Chem B ; 124(18): 3636-3646, 2020 05 07.
Article in English | MEDLINE | ID: mdl-32275425

ABSTRACT

Special-purpose classical force fields (FFs) provide good accuracy at very low computational cost, but their application is limited to systems for which potential energy functions are available. This excludes most metal-containing proteins and proteins containing cofactors. In contrast, the GFN2-xTB semiempirical quantum chemical method is parametrized for almost the entire periodic table. The accuracy of GFN2-xTB is assessed for protein structures with respect to experimental X-ray data. Furthermore, the results are compared with those of two special-purpose FFs and of the HF-3c, PM6-D3H4X, and PM7 methods. The test sets include proteins without any prosthetic groups as well as metalloproteins. Crystal packing effects are examined for a set of smaller proteins to validate the molecular approach. For the proteins without prosthetic groups, the special-purpose FF OPLS-2005 yields the smallest overall RMSD to the X-ray data, but GFN2-xTB provides similarly good structures with even better bond-length distributions. For the metalloproteins with up to 5000 atoms, GFN2-xTB achieves good overall structural agreement. Full geometry optimizations of protein structures with on average 1000 atoms in wall times below 1 day establish GFN2-xTB as a versatile tool for the computational treatment of various biomolecules with a good accuracy/computational cost ratio.


Subjects
Metalloproteins , Peptides
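The protein-structure abstract above reports accuracy as RMSD to the X-ray data but does not state the exact protocol (atom selection, superposition). As an assumption, the sketch below uses a common choice: RMSD after optimal rigid-body superposition with the Kabsch algorithm, implemented in NumPy. The four-atom coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two N x 3 coordinate sets after optimal superposition
    (Kabsch algorithm). Rows must correspond to the same atoms."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # d = -1 would signal a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Hypothetical 4-atom example: rotate by 90 degrees about z and translate.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(round(kabsch_rmsd(P, Q), 6))  # 0.0 after superposition
```

The reflection check matters: without the sign correction, the SVD can return an improper rotation that would artificially lower the RMSD for mirror-image structures.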
16.
J Chem Inf Model ; 59(11): 4893-4905, 2019 11 25.
Article in English | MEDLINE | ID: mdl-31714067

ABSTRACT

Oral administration of drug products is a strict requirement in many medical indications. Therefore, bioavailability prediction models are of high importance for prioritizing compound candidates in the drug discovery process. However, oral exposure and bioavailability are difficult to predict, as they result from various highly complex factors and processes influenced by the physicochemical properties of a compound, such as solubility, lipophilicity, or charge state, as well as by interactions with the organism, for instance, metabolism or membrane permeation. In this study, we assess whether it is possible to predict intravenous (iv) or oral drug exposure and oral bioavailability in rats. As input parameters, we use (i) six experimentally determined in vitro and physicochemical endpoints, namely, membrane permeation, free fraction, metabolic stability, solubility, pKa value, and lipophilicity; (ii) the outputs of six in silico absorption, distribution, metabolism, and excretion models trained on the same endpoints; or (iii) the chemical structure encoded as fingerprints or simplified molecular input line entry system strings. The underlying data set for the models is an unprecedented collection of almost 1900 data points from high-quality in vivo experiments performed in rats. We find that drug exposure after iv administration can be predicted similarly well using hybrid models with in vitro- or in silico-predicted endpoints as inputs, with fold change errors (FCE) of 2.28 and 2.08, respectively. The FCEs for exposure after oral administration are higher, and here the in vitro- and in silico-based models differ significantly, with FCEs of 3.49 and 2.40, respectively, most probably reflecting the higher complexity of oral bioavailability. Simplifying the prediction task to a binary alert for low oral bioavailability based only on chemical structure, we achieve accuracy and precision close to 70%.


Subjects
Drug Discovery/methods , Hepatocytes/metabolism , Pharmaceutical Preparations/metabolism , Administration, Oral , Animals , Biological Availability , Caco-2 Cells , Computer Simulation , Humans , Machine Learning , Male , Models, Biological , Permeability , Pharmaceutical Preparations/chemistry , Rats , Rats, Wistar , Serum Albumin/metabolism , Solubility
17.
J Chem Inf Model ; 59(2): 668-672, 2019 02 25.
Article in English | MEDLINE | ID: mdl-30694664

ABSTRACT

Pharmaceutical products are often synthesized using reactive starting materials and intermediates. These can bind to DNA, either as impurities or through metabolic activation. Primary aromatic amines belong to the critical classes of compounds considered potentially mutagenic in the Ames test, so good prediction models for risk assessment are greatly needed. How primary aromatic amines exert their mutagenic potential can be rationalized by the widely accepted nitrenium ion hypothesis, in which reactive electrophiles formed from the aromatic amines bind covalently to DNA. Since the reactive chemical species differs in structure from the parent compound, good predictions are difficult to achieve with classical descriptor- or fingerprint-based machine learning. In this approach, we use a combination of molecular and atomic descriptors that describes different mechanistic aspects of the metabolic transformation from the primary aromatic amine to the reactive metabolite that binds to DNA. Applied to a test set, the combination performs significantly better than models that use only one of these descriptors, and it complemented the general internal Ames mutagenicity prediction model at Bayer.


Subjects
Amines/chemistry , Amines/toxicity , Cheminformatics/methods , Mutagenicity Tests , Mutagens/chemistry , Mutagens/toxicity , Models, Molecular , Molecular Conformation , Quantitative Structure-Activity Relationship
18.
Drug Discov Today Technol ; 32-33: 37-43, 2019 Dec.
Article in English | MEDLINE | ID: mdl-33386093

ABSTRACT

This review provides an overview of atom descriptions applied to understanding phenomena such as chemical reactivity and selectivity, pKa values, site-of-metabolism prediction, or hydrogen bond strengths, but also to replacing quantum mechanical calculations with machine learning models for energies, forces, or even spectroscopic properties, and finally to the fast calculation of atomic charges for force field parametrization. The descriptor space ranges from derivatives of the wavefunction or electron density, via quantum mechanics-derived descriptors, to classical descriptions of atoms and their embedding in a molecule. The common denominator of all approaches is a thorough understanding of the physics of the chemical problem, which guided the design of the atom descriptor. Quantum mechanics (QM) and machine learning (ML) are finally converging into a new discipline, namely QM/ML.


Subjects
Drug Discovery , Machine Learning , Pharmaceutical Preparations/chemistry , Quantum Theory , Humans
19.
J Cheminform ; 11(1): 59, 2019 Sep 11.
Article in English | MEDLINE | ID: mdl-33430967

ABSTRACT

We present machine learning (ML) models for hydrogen bond acceptor (HBA) and hydrogen bond donor (HBD) strengths. Quantum chemical (QC) free energies in solution for 1:1 hydrogen-bonded complex formation with the reference molecules 4-fluorophenol and acetone serve as our target values. Our acceptor and donor databases are the largest on record, with 4426 and 1036 data points, respectively. After scanning over radial atomic descriptors and ML methods, our final trained HBA and HBD ML models achieve RMSEs of 3.8 kJ mol⁻¹ (acceptors) and 2.3 kJ mol⁻¹ (donors) on experimental test sets. This performance is comparable with that of previous models trained on experimental hydrogen bonding free energies, indicating that molecular QC data can serve as a substitute for experiment. Potentially, this could lead to a full replacement of wet-lab chemistry by QC for HBA/HBD strength determination. As a possible chemical application of our ML models, we highlight our predicted HBA and HBD strengths as possible descriptors in two case studies on trends in intramolecular hydrogen bonding.

20.
Mol Inform ; 38(4): e1800115, 2019 04.
Article in English | MEDLINE | ID: mdl-30474291

ABSTRACT

We present two approaches for computing hydrogen bond acceptor strengths, one based on machine learning and one on a composite quantum-mechanical protocol, both built on the well-established pKBHX scale and dataset. After a necessary linear fit, the QM calculations reproduce the complexation free energies in solution with an RMSE of 2.6 kJ mol⁻¹, not far from the expected error of 2 kJ mol⁻¹ obtained by comparing experimental data from two different sources. The second approach is Gaussian process regression (GPR) machine learning. We describe the hydrogen bond acceptor atoms by a radial atomic reactivity descriptor that encodes their electronic and steric environment. The performance of the GPR model on an external test set corresponds to 3.3 kJ mol⁻¹, which is also close to the experimental error. We apply the GPR model built on experimental data to predict the strengths of a series of hydrogen bond acceptor sites in 10 phosphodiesterase 10A inhibitors. The predicted values correlate well with the experimentally measured IC50 values.


Subjects
Machine Learning , Quantitative Structure-Activity Relationship , Databases, Chemical , Hydrogen Bonding , Inhibitory Concentration 50 , Linear Models , Normal Distribution , Phosphodiesterase Inhibitors/chemistry , Phosphodiesterase Inhibitors/pharmacology
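The second approach in the last abstract above is Gaussian process regression. The abstract names neither the kernel nor the hyperparameters. The NumPy sketch below assumes a squared-exponential (RBF) kernel with fixed length scale and noise, which in practice would be optimized, and computes the standard GPR posterior mean and variance. The one-dimensional inputs stand in for the radial atomic descriptor, and all values are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-|a - b|^2 / (2 l^2))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)


def gpr_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """Posterior mean and latent variance of a GP fitted to centered targets."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_s = rbf_kernel(X_test, X_train, length_scale)
    mean = K_s @ alpha + y_mean
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - (v ** 2).sum(axis=0)  # prior variance k(x, x) = 1 for RBF
    return mean, var


# Hypothetical 1D descriptors and acceptor strengths (kJ/mol), for illustration.
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([-10.0, -15.0, -12.0])
mean, var = gpr_predict(X, y, np.array([[1.5], [10.0]]))
print(mean.round(2), var.round(3))
```

Far from the training data, the prediction falls back to the training mean and the variance approaches the prior value of 1. This predictive uncertainty is one practical reason GPR is attractive for descriptor-based property models.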