1.
Nat Commun ; 15(1): 8865, 2024 Oct 14.
Article in English | MEDLINE | ID: mdl-39402016

ABSTRACT

Identifying transition states (saddle points on the potential energy surface connecting reactant and product minima) is central to predicting kinetic barriers and understanding chemical reaction mechanisms. In this work, we train a fully differentiable equivariant neural network potential, NewtonNet, on thousands of organic reactions and derive the analytical Hessians. By reducing the computational cost by several orders of magnitude relative to the density functional theory (DFT) ab initio source, we can afford to use the learned Hessians at every step for the saddle point optimizations. We show that the full machine learned (ML) Hessian robustly finds the transition states of 240 unseen organic reactions, even when the quality of the initial guess structures is degraded, while reducing the number of optimization steps to convergence by 2-3× compared to the quasi-Newton DFT and ML methods. All data generation, NewtonNet model, and ML transition state finding methods are available in an automated workflow.
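
To make the role of a full Hessian at every step concrete, the sketch below runs plain Newton-Raphson iterations on a hypothetical two-dimensional model surface. The surface, the starting point, and all function names are illustrative assumptions. This is not NewtonNet or the optimizer used in the paper, only a toy showing how an exact Hessian can drive a search to a first-order saddle point.

```python
# Toy saddle-point search using an exact Hessian at every step.
# Assumed model surface (not from the paper):
#   f(x, y) = (x^2 - 1)^2 + y^2 + 0.5*x*y
# It has two minima near x = +/-1 and a first-order saddle at (0, 0).

def gradient(x, y):
    """Analytical gradient of the model surface."""
    return (4.0 * x**3 - 4.0 * x + 0.5 * y, 2.0 * y + 0.5 * x)

def hessian(x, y):
    """Analytical 2x2 Hessian of the model surface."""
    return ((12.0 * x**2 - 4.0, 0.5), (0.5, 2.0))

def newton_step(x, y):
    """Solve H d = -g for the 2x2 case and return the updated point."""
    gx, gy = gradient(x, y)
    (hxx, hxy), (hyx, hyy) = hessian(x, y)
    det = hxx * hyy - hxy * hyx
    dx = (-gx * hyy + gy * hxy) / det
    dy = (-gy * hxx + gx * hyx) / det
    return x + dx, y + dy

def find_saddle(x, y, tol=1e-10, max_steps=50):
    """Iterate Newton steps until the gradient norm falls below tol."""
    for step in range(max_steps):
        gx, gy = gradient(x, y)
        if (gx * gx + gy * gy) ** 0.5 < tol:
            return x, y, step
        x, y = newton_step(x, y)
    raise RuntimeError("Newton iteration did not converge")

def is_first_order_saddle(x, y):
    """A symmetric 2x2 Hessian with negative determinant has exactly one negative eigenvalue."""
    (hxx, hxy), (hyx, hyy) = hessian(x, y)
    return hxx * hyy - hxy * hyx < 0.0

if __name__ == "__main__":
    x, y, steps = find_saddle(0.3, 0.4)
    print(x, y, steps, is_first_order_saddle(x, y))
```

Plain Newton-Raphson converges to whichever stationary point is nearby, so the check on the Hessian's curvature at the end is what confirms that the result is a saddle rather than a minimum.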

2.
ArXiv ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39279844

ABSTRACT

In 1999 Wright and Dyson highlighted the fact that large sections of the proteome of all organisms are comprised of protein sequences that lack globular folded structures under physiological conditions. Since then the biophysics community has made significant strides in unraveling the intricate structural and dynamic characteristics of intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs). Unlike crystallographic beamlines and their role in streamlining acquisition of structures for folded proteins, an integrated experimental and computational approach aimed at IDPs/IDRs has emerged. In this Perspective we aim to provide a robust overview of current computational tools for IDPs and IDRs, and most recently their complexes and phase separated states, including statistical models, physics-based approaches, and machine learning methods that permit structural ensemble generation and validation against many solution experimental data types.

3.
J Chem Theory Comput ; 20(19): 8594-8608, 2024 Oct 08.
Article in English | MEDLINE | ID: mdl-39288266

ABSTRACT

We introduce a general framework for many-body force fields, the Completely Multipolar Model (CMM), that utilizes multipolar electrical moments modulated by exponential decay of electron density as a common functional form for all terms of an energy decomposition analysis (EDA) of intermolecular interactions. With this common functional form, the CMM establishes well-formulated damped tensors that reach the correct asymptotes at both long- and short-range while formally ensuring no short-range catastrophes. CMM describes the separable EDA terms of dispersion, exchange polarization, and Pauli repulsion with short-ranged anisotropy, polarization as intramolecular charge fluctuations and induced dipoles, while charge transfer describes explicit movement of charge between molecules, and naturally describes many-body charge transfer by coupling into the polarization equations. We also utilize a new one-body potential that accounts for intramolecular polarization by including an electric field-dependent correction to the Morse potential to ensure that CMM reproduces all physically relevant monomer properties including the dipole moment, molecular polarizability, and dipole and polarizability derivatives. The quality of CMM is illustrated through agreement of individual terms of the EDA and excellent extrapolation to energies and geometries of an extensive validation set of water cluster data.

4.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38995731

ABSTRACT

MOTIVATION: Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). RESULTS: In this work, we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro, Rosetta, and AlphaFold3 in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) for folded proteins, and combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions. AVAILABILITY AND IMPLEMENTATION: The codes for dihedral angle computations and library creation are available at https://github.com/THGLab/ptm_sc.git.


Subjects
Databases, Protein , Intrinsically Disordered Proteins , Protein Processing, Post-Translational , Proteins , Proteins/chemistry , Proteins/metabolism , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Algorithms , Protein Folding , Monte Carlo Method , Protein Conformation , Amino Acids/chemistry , Amino Acids/metabolism , Software

5.
Inorg Chem ; 63(31): 14609-14622, 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39049593

ABSTRACT

Metal-organic cages form well-defined microenvironments that can enhance the catalytic proficiency of encapsulated transition metal complexes (TMCs). We introduce a screening protocol to efficiently identify TMCs that are promising candidates for encapsulation in the Ga₄L₆¹²⁻ nanocage. We obtain TMCs from the Cambridge Structural Database with geometric and electronic characteristics amenable to encapsulation and mine the text of associated manuscripts to curate TMCs with documented catalytic functionality. By docking candidate TMCs inside the nanocage cavity and carrying out electronic structure calculations, we identify a subset of successfully optimized candidates (TMC-34) and observe that encapsulated guests occupy an average of 60% of the cavity volume, in line with previous observations. Notably, some guests occupy as much as 72% of the cavity as a result of linker rotation. Encapsulation has a universal effect on the electrostatic potential (ESP), systematically decreasing the ESP at the metal center of each TMC in the TMC-34 data set, while minimally altering TMC metal partial charges. Collectively, these observations support geometry-based screening of potential guests and suggest that encapsulation in Ga₄L₆¹²⁻ cages could electrostatically stabilize diverse cationic or electropositive intermediates. We highlight candidate guests with associated known reactivity and solubility most amenable for encapsulation in experimental follow-up studies.

6.
J Chem Inf Model ; 2024 Jun 06.
Article in English | MEDLINE | ID: mdl-38843070

ABSTRACT

Determining the viability of a new drug molecule is a time- and resource-intensive task that makes computer-aided assessments a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novelty in generative tasks for new molecules such as chemical similarity to a target ligand, molecules grown from known protein bound fragments, and creation of molecules that enforce interactions with target residues in the protein active site. The iMiner algorithm is embedded in a composite workflow that filters out Pan-assay interference compounds, Lipinski rule violations, uncommon structures in medicinal chemistry, and poor synthetic accessibility with options for cross-validation against other docking scoring functions and automation of a molecular dynamics simulation to measure pose stability. We also allow users to define a set of rules for the structures they would like to exclude during the training process and postfiltering steps. Because our approach relies only on the structure of the target protein, iMiner can be easily adapted for the future development of other inhibitors or small molecule therapeutics of any target protein.
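
As an illustration of one of the post-generation filters named above, the following sketch counts Lipinski rule-of-five violations from precomputed descriptors. The thresholds are the standard rule-of-five values. The `Descriptors` container, the candidate names and values, and the `max_violations` parameter are hypothetical and are not taken from iMiner; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one candidate molecule (hypothetical container)."""
    name: str
    mol_weight: float       # molecular weight, g/mol
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    return sum((
        d.mol_weight > 500.0,
        d.logp > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ))

def passes_filter(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a molecule only if it has at most max_violations violations."""
    return lipinski_violations(d) <= max_violations

# Made-up example candidates, for illustration only.
candidates = [
    Descriptors("candidate_a", 320.4, 2.1, 2, 5),
    Descriptors("candidate_b", 612.8, 6.3, 1, 9),
]
kept = [d.name for d in candidates if passes_filter(d)]

if __name__ == "__main__":
    print(kept)
```

Exposing the allowed number of violations as a parameter mirrors the abstract's point that users can define their own exclusion rules, although the actual rule interface of the workflow is not described here.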

7.
J Phys Chem Lett ; 15(26): 6712-6721, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38900596

ABSTRACT

Water is often the testing ground for new, advanced force fields. While advanced functional forms for intermolecular interactions have been integral to the development of accurate water models, less attention has been paid to a transferable model for intramolecular valence terms. In this work, we present a one-body energy and dipole moment surface model, named 1B-UCB, that is simple yet accurate and can be feasibly adapted for both standard and advanced potentials. 1B-UCB for water is comparable in accuracy to those with much more complex functional forms, despite having drastically fewer parameters. The parametrization protocol has been implemented as part of the Q-Force automated workflow and requires only a quantum mechanical Hessian calculation as reference data, hence allowing it to be easily extended to a variety of molecular systems beyond water, which we demonstrate on a selection of small molecules with different symmetries.

8.
ArXiv ; 2024 May 06.
Article in English | MEDLINE | ID: mdl-38764597

ABSTRACT

Sidechain rotamer libraries of the common amino acids of a protein are useful for folded protein structure determination and for generating ensembles of intrinsically disordered proteins (IDPs). However, much of protein function is modulated beyond the translated sequence through the introduction of post-translational modifications (PTMs). In this work we have provided a curated set of side chain rotamers for the most common PTMs derived from the RCSB PDB database, including phosphorylated, methylated, and acetylated sidechains. Our rotamer libraries improve upon existing methods such as SIDEpro and Rosetta in predicting the experimental structures for PTMs in folded proteins. In addition, we showcase our PTM libraries in full use by generating ensembles with the Monte Carlo Side Chain Entropy (MCSCE) for folded proteins, and combining MCSCE with the Local Disordered Region Sampling algorithms within IDPConformerGenerator for proteins with intrinsically disordered regions.

9.
Nat Commun ; 15(1): 3670, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38693110

ABSTRACT

In charged water microdroplets, which occur in nature or in the lab upon ultrasonication or in electrospray processes, the thermodynamics for reactive chemistry can be dramatically altered relative to the bulk phase. Here, we provide a theoretical basis for the observation of accelerated chemistry by simulating water droplets of increasing charge imbalance to create redox agents such as hydroxyl and hydrogen radicals and solvated electrons. We compute the hydration enthalpy of OH⁻ and H⁺ that controls the electron transfer process, and the corresponding changes in vertical ionization energy and vertical electron affinity of the ions, to create OH• and H• reactive species. We find that at ~20-50% of the Rayleigh limit of droplet charge the hydration enthalpies of both OH⁻ and H⁺ have decreased by >50 kcal/mol such that electron transfer becomes thermodynamically favorable, in correspondence with the more favorable vertical electron affinity of H⁺ and the lowered vertical ionization energy of OH⁻. We provide scaling arguments that show that the nanoscale calculations and conclusions extend to the experimental microdroplet length scale. The relevance of the droplet charge for chemical reactivity is illustrated for the formation of H₂O₂, and has clear implications for other redox reactions observed to occur with enhanced rates in microdroplets.
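
For orientation on the charge scale involved, the sketch below evaluates the textbook Rayleigh limit for a spherical droplet, q_R = 8π·sqrt(ε₀·γ·R³). The abstract does not state which expression or parameters the authors used, so the formula choice, the room-temperature water surface tension of 0.072 N/m, and the function names are assumptions made for illustration.

```python
import math

EPSILON_0 = 8.8541878128e-12          # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19   # C

def rayleigh_limit_charge(radius_m, surface_tension=0.072):
    """Rayleigh limit (C) of a spherical droplet: q_R = 8*pi*sqrt(eps0*gamma*R^3).

    surface_tension is in N/m; 0.072 is an assumed value for water near room temperature.
    """
    return 8.0 * math.pi * math.sqrt(EPSILON_0 * surface_tension * radius_m**3)

def charge_at_fraction(radius_m, fraction, surface_tension=0.072):
    """Net charge (C) at a given fraction of the Rayleigh limit."""
    return fraction * rayleigh_limit_charge(radius_m, surface_tension)

if __name__ == "__main__":
    for radius in (1e-8, 1e-6):  # a 10 nm and a 1 micrometer droplet
        for fraction in (0.2, 0.5):
            n = charge_at_fraction(radius, fraction) / ELEMENTARY_CHARGE
            print(f"R = {radius:.0e} m, {fraction:.0%} of limit: {n:.1f} elementary charges")
```

Because the limit grows as R^(3/2), the same fraction of the Rayleigh limit corresponds to very different absolute charges at the nanometer and micrometer scales, which is why a scaling argument is needed to connect simulated nanodroplets to experimental microdroplets.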

10.
J Chem Theory Comput ; 20(5): 2152-2166, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38331423

ABSTRACT

Theoretical predictions of NMR chemical shifts from first-principles can greatly facilitate experimental interpretation and structure identification of molecules in gas, solution, and solid-state phases. However, accurate prediction of chemical shifts using the gold-standard coupled cluster with singles, doubles, and perturbative triple excitations [CCSD(T)] method with a complete basis set (CBS) can be prohibitively expensive. By contrast, machine learning (ML) methods offer inexpensive alternatives for chemical shift predictions but are hampered by poor generalization to molecules outside the original training set. Here, we propose several new ideas in ML of the chemical shift prediction for H, C, N, and O that first introduce a novel feature representation, based on the atomic chemical shielding tensors within a molecular environment using an inexpensive quantum mechanics (QM) method, and train it to predict NMR chemical shieldings of a high-level composite theory that approaches the accuracy of CCSD(T)/CBS. In addition, we train the ML model through a new progressive active learning workflow that reduces the total number of expensive high-level composite calculations required while allowing the model to continuously improve on unseen data. Furthermore, the algorithm provides an error estimation, signaling potential unreliability in predictions if the error is large. Finally, we introduce a novel approach to keep the rotational invariance of the features using tensor environment vectors (TEVs) that yields an ML model with the highest accuracy compared to a similar model using data augmentation. We illustrate the predictive capacity of the resulting inexpensive shift machine learning (iShiftML) models across several benchmarks, including unseen molecules in the NS372 data set, gas-phase experimental chemical shifts for small organic molecules, and much larger and more complex natural products in which we can accurately differentiate between subtle diastereomers based on chemical shift assignments.

11.
Bioinformatics ; 39(12), 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38060268

ABSTRACT

SUMMARY: The Local Disordered Region Sampling (LDRS, pronounced loaders) tool is a new module developed for IDPConformerGenerator, a previously validated approach to model intrinsically disordered proteins (IDPs). The IDPConformerGenerator LDRS module provides a method for generating all-atom conformations of intrinsically disordered protein regions at the N- and C-termini of, and in loops or linkers between, folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures. Requiring only a pre-existing PDB or mmCIF formatted structural template of the protein with missing coordinates or with predicted confidence scores and its full-length primary sequence, LDRS will automatically generate physically meaningful conformational ensembles of the missing flexible regions to complete the full-length protein. The capabilities of the LDRS tool of IDPConformerGenerator include modeling phosphorylation sites using enhanced Monte Carlo-Side Chain Entropy, transmembrane proteins within an all-atom bilayer, and multi-chain complexes. The modeling capacity of LDRS capitalizes on the modularity, the ability to be used as a library and via command-line, and the computational speed of the IDPConformerGenerator platform. AVAILABILITY AND IMPLEMENTATION: The LDRS module is part of the IDPConformerGenerator modeling suite, which can be downloaded from GitHub at https://github.com/julie-forman-kay-lab/IDPConformerGenerator. IDPConformerGenerator is written in Python3 and works on Linux, Microsoft Windows, and Mac OS versions that support DSSP. Users can utilize LDRS's Python API for scripting the same way they can use any part of IDPConformerGenerator's API, by importing functions from the "idpconfgen.ldrs_helper" library. Otherwise, LDRS can be used as a command line interface application within IDPConformerGenerator. Full documentation is available within the command-line interface as well as on IDPConformerGenerator's official documentation pages (https://idpconformergenerator.readthedocs.io/en/latest/).


Subjects
Intrinsically Disordered Proteins , Software , Gene Library , Membrane Proteins , Documentation

12.
J Phys Chem Lett ; 14(51): 11742-11749, 2023 Dec 28.
Article in English | MEDLINE | ID: mdl-38116782

ABSTRACT

The Raman spectrum of liquid water is quite complex, reflecting its strong sensitivity to the local environment of the individual waters. The OH-stretch region of the spectrum, which captures the influence of hydrogen bonding, has only just begun to be unraveled. Here we develop a model for predicting the Raman spectra of the OH-stretch region by considering how local electric fields distort the energy surface of each water monomer. We find that our model is capable of reproducing the bimodal nature of the main peak, with the shoulder at 3250 cm⁻¹ resulting almost entirely from Fermi resonance. Furthermore, we capture the temperature and polarization dependence of the shoulder, which has proven to be difficult to obtain with previous methods, and analyze the origin of this dependence. We expect our model to be generally useful for understanding and predicting how Raman spectra change under different conditions and with different probe reporters beyond water.

13.
ACS Cent Sci ; 9(11): 2161-2170, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38033801

ABSTRACT

We leveraged the power of ChatGPT and Bayesian optimization in the development of a multi-AI-driven system, backed by seven large language model-based assistants and equipped with machine learning algorithms, that seamlessly orchestrates a multitude of research aspects in a chemistry laboratory (termed the ChatGPT Research Group). Our approach accelerated the discovery of optimal microwave synthesis conditions, enhancing the crystallinity of MOF-321, MOF-322, and COF-323 and achieving the desired porosity and water capacity. In this system, human researchers gained assistance from these diverse AI collaborators, each with a unique role within the laboratory environment, spanning strategy planning, literature search, coding, robotic operation, labware design, safety inspection, and data analysis. Such a comprehensive approach enables a single researcher working in concert with AI to achieve productivity levels analogous to those of an entire traditional scientific team. Furthermore, by reducing human biases in screening experimental conditions and deftly balancing the exploration and exploitation of synthesis parameters, our Bayesian search approach precisely zeroed in on optimal synthesis conditions from a pool of 6 million within a significantly shortened time scale. This work serves as a compelling proof of concept for an AI-driven revolution in the chemistry laboratory, painting a future where AI becomes an efficient collaborator, liberating us from routine tasks to focus on pushing the boundaries of innovation.

14.
J Am Chem Soc ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37917924

ABSTRACT

Accurate potential energy models of proteins must describe the many different types of noncovalent interactions that contribute to a protein's stability and structure. Pi-pi contacts are ubiquitous structural motifs in all proteins, occurring between aromatic and nonaromatic residues, and play a nontrivial role in protein folding and in the formation of biomolecular condensates. Guided by a geometric criterion for isolating pi-pi contacts from classical molecular dynamics simulations of proteins, we use quantum mechanical energy decomposition analysis to determine the molecular interactions that stabilize different pi-pi contact motifs. We find that neutral pi-pi interactions in proteins are dominated by Pauli repulsion and London dispersion rather than repulsive quadrupole electrostatics, which is central to the textbook Hunter-Sanders model. This results in a notable lack of variability in the interaction profiles of neutral pi-pi contacts even with extreme changes in the dielectric medium, explaining the prevalence of pi-stacked arrangements in and between proteins. We also find interactions involving pi-containing anions and cations to be extremely malleable, interacting like neutral pi-pi contacts in polar media and like typical ion-pi interactions in nonpolar environments. Like-charged pairs such as arginine-arginine contacts are particularly sensitive to the polarity of their immediate surroundings and exhibit canonical pi-pi stacking behavior only if the interaction is mediated by environmental effects, such as aqueous solvation.

15.
Chem Sci ; 14(39): 10934-10943, 2023 Oct 11.
Article in English | MEDLINE | ID: mdl-37829021

ABSTRACT

We present an investigation into the transferability of pseudopotentials (PPs) with a nonlinear core correction (NLCC) using the Goedecker, Teter, and Hutter (GTH) protocol across a range of pure GGA, meta-GGA and hybrid functionals, and their impact on thermochemical and non-thermochemical properties. The GTH-NLCC PP for the PBE density functional demonstrates remarkable transferability to the PBE0 and ωB97X-V exchange-correlation functionals, and relative to no NLCC, improves agreement significantly for thermochemical benchmarks compared to all-electron calculations. On the other hand, the B97M-rV meta-GGA functional performs poorly with the PBE-derived GTH-NLCC PP, which is mitigated by reoptimizing the NLCC parameters for this specific functional. The findings reveal that atomization energies exhibit the greatest improvements from use of the NLCC, which thus provides an important correction needed for covalent interactions relevant to applications involving chemical reactivity. Finally we test the NLCC-GTH PPs when combined with medium-size TZV2P molecularly optimized (MOLOPT) basis sets which are typically utilized in condensed phase simulations, and show that they lead to consistently good results when compared to all-electron calculations for atomization energies, ionization potentials, barrier heights, and non-covalent interactions, but lead to somewhat larger errors for electron affinities.

16.
J Phys Chem A ; 127(36): 7501-7509, 2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37669457

ABSTRACT

The rates of many chemical reactions are accelerated when carried out in micron-sized droplets, but the molecular origin of the rate acceleration remains unclear. One example is the condensation reaction of 1,2-diaminobenzene with formic acid to yield benzimidazole. The observed rate enhancements have been rationalized by invoking enhanced acidity at the surface of methanol solvent droplets with low water content to enable protonation of formic acid to generate a cationic species (protonated formic acid or PFA) formed by attachment of a proton to the neutral acid. Because PFA is the key feature in this reaction mechanism, vibrational spectra of cryogenically cooled, microhydrated PFA·(H₂O)ₙ (n = 1-6) were acquired to determine how the extent of charge localization depends on the degree of hydration. Analysis of these highly anharmonic spectra with path integral ab initio molecular dynamics simulations reveals the gradual displacement of the excess proton onto the water network in the microhydration regime at low temperatures with n = 3 as the tipping point for intra-cluster proton transfer.

17.
bioRxiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37546943

ABSTRACT

The Local Disordered Region Sampling (LDRS, pronounced loaders) tool, developed for the IDPConformerGenerator platform (Teixeira et al. 2022), provides a method for generating all-atom conformations of intrinsically disordered regions (IDRs) at the N- and C-termini of, and in loops or linkers between, folded regions of an existing protein structure. These disordered elements often lead to missing coordinates in experimental structures or low confidence in predicted structures. Requiring only a pre-existing PDB structure of the protein with missing coordinates or with predicted confidence scores and its full-length primary sequence, LDRS will automatically generate physically meaningful conformational ensembles of the missing flexible regions to complete the full-length protein. The capabilities of the LDRS tool of IDPConformerGenerator include modeling phosphorylation sites using enhanced Monte Carlo Side Chain Entropy (MC-SCE) (Bhowmick and Head-Gordon 2015), transmembrane proteins within an all-atom bilayer, and multi-chain complexes. The modeling capacity of LDRS capitalizes on the modularity, ability to be used as a library and via command-line, and computational speed of the IDPConformerGenerator platform.

18.
J Chem Theory Comput ; 19(17): 5872-5885, 2023 Sep 12.
Article in English | MEDLINE | ID: mdl-37585272

ABSTRACT

We use local diffusion maps to assess the quality of two types of collective variables (CVs) for a recently published hydrogen combustion benchmark dataset (ref. 1) that contains ab initio molecular dynamics (MD) trajectories and normal modes along minimum energy paths. This approach was recently advocated in ref. 2 for assessing CVs and analyzing reactions modeled by classical MD simulations. We report the effectiveness of this approach for molecular systems modeled by quantum ab initio MD. In addition to assessing the quality of CVs, we also use global diffusion maps to perform committor analysis as proposed in ref. 2. We show that the committor function obtained from the global diffusion map allows us to identify transition regions of interest in several hydrogen combustion reaction channels.

19.
ArXiv ; 2023 Aug 18.
Article in English | MEDLINE | ID: mdl-37645037

ABSTRACT

Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is controversial as to whether new SFs are actually improving since the general, refined, and core datasets of PDBBind are cross-contaminated with proteins and ligands with high similarity, and hence they may not perform comparably well in binding prediction of new protein-ligand complexes. In this work we have carefully prepared a cleaned PDBBind data set of non-covalent binders that is split into training, validation, and test datasets to control for data leakage. The resulting leak-proof (LP)-PDBBind data is used to retrain four popular SFs: AutoDock Vina, Random Forest (RF)-Score, InteractionGraphNet (IGN), and DeepDTA, to better test their capabilities when applied to new protein-ligand complexes. In particular we have formulated a new independent data set, BDB2020+, by matching high quality binding free energies from BindingDB with co-crystallized ligand-protein complexes from the PDB that have been deposited since 2020. Based on all the benchmark results, the retrained models using LP-PDBBind that rely on 3D information perform consistently among the best, with IGN especially being recommended for scoring and ranking applications for new protein-ligand systems.

20.
Mol Phys ; 121(9-10), 2023.
Article in English | MEDLINE | ID: mdl-37470065

ABSTRACT

We present a new software package called M-Chem that is designed from scratch in C++ and parallelized on shared-memory multi-core architectures to facilitate efficient molecular simulations. Currently, M-Chem is a fast molecular dynamics (MD) engine that supports the evaluation of energies and forces from two-body to many-body all-atom potentials, reactive force fields, coarse-grained models, combined quantum mechanics molecular mechanics (QM/MM) models, and external force drivers from machine learning, augmented by algorithms that are focused on gains in computational simulation times. M-Chem also includes a range of standard simulation capabilities including thermostats, barostats, multi-timestepping, and periodic cells, as well as newer methods such as fast extended Lagrangians and high quality electrostatic potential generation. At present M-Chem is a developer friendly environment in which we encourage new software contributors from diverse fields to build their algorithms, models, and methods in our modular framework. The long-term objective of M-Chem is to create an interdisciplinary platform for computational methods with applications ranging from biomolecular simulations, reactive chemistry, to materials research.
