ABSTRACT
Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of such potentials is often limited by the high effort required to construct them and by their insufficient robustness in simulations. Here, we introduce end-to-end AL for constructing robust, data-efficient potentials with an affordable investment of time and resources and minimal human intervention. Our AL protocol is based on physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is demonstrated by applications to quasi-classical molecular dynamics for simulating vibrational spectra, a conformer search of a key biochemical molecule, and the time-resolved mechanism of the Diels-Alder reaction. These investigations took days rather than the weeks that pure quantum chemical calculations would have required on a high-performance computing cluster.
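The AL loop described above (uncertainty quantification plus convergence monitoring) can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' protocol: a cheap 1D double-well function stands in for the quantum chemical reference, a hypothetical two-member committee (linear and nearest-neighbour interpolation) supplies the uncertainty, and physics-informed sampling and automatic initial-data selection are not modelled. Convergence is declared when no candidate point in the pool exceeds the uncertainty threshold.

```python
import bisect


def reference_energy(x):
    """Hypothetical stand-in for an expensive quantum chemical calculation (1D double well)."""
    return x**4 - 2.0 * x**2


def predict_linear(xs, ys, x):
    """Committee member 1: piecewise-linear interpolation over sorted training points."""
    i = bisect.bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    x0, x1 = xs[i - 1], xs[i]
    t = (x - x0) / (x1 - x0)
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def predict_nearest(xs, ys, x):
    """Committee member 2: nearest-neighbour prediction."""
    k = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[k]


def uncertainty(xs, ys, x):
    """Disagreement between the two committee members, used as the uncertainty estimate."""
    return abs(predict_linear(xs, ys, x) - predict_nearest(xs, ys, x))


def active_learning(pool, initial, threshold, max_iter=100):
    """Label the most uncertain pool point each iteration until none exceeds the threshold."""
    train = sorted(set(initial))
    labels = {x: reference_energy(x) for x in train}  # cache of "expensive" labels
    for iteration in range(max_iter):
        ys = [labels[x] for x in train]
        candidates = {x: uncertainty(train, ys, x) for x in pool if x not in labels}
        uncertain = {x: u for x, u in candidates.items() if u > threshold}
        if not uncertain:
            return train, iteration  # converged: every pool point is below the threshold
        worst = max(uncertain, key=uncertain.get)
        labels[worst] = reference_energy(worst)
        bisect.insort(train, worst)
    return train, max_iter  # not converged within max_iter


pool = [i / 20 for i in range(-30, 31)]
train, n_iter = active_learning(pool, initial=[-1.5, 0.0, 1.5], threshold=0.05)
print(len(train), "training points after", n_iter, "iterations")
```

Because each iteration labels only the most uncertain candidate, reference calls are spent where the committee disagrees most.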
ABSTRACT
We present the open-source MLatom@XACS software ecosystem for on-the-fly surface hopping nonadiabatic dynamics based on the Landau-Zener-Belyaev-Lebedev algorithm. The dynamics can be performed via the Python API with a wide range of quantum mechanical (QM) and machine learning (ML) methods, including ab initio QM (CASSCF and ADC(2)), semiempirical QM methods (e.g., AM1, PM3, OMx, and ODMx), and many types of ML potentials (e.g., KREG, ANI, and MACE). Combinations of QM and ML methods can also be used. While users can build their own combinations, we provide AIQM1, which is based on Δ-learning and can be used out of the box. We showcase how AIQM1 reproduces the isomerization quantum yield of trans-azobenzene at low cost. We provide example scripts that, in dozens of lines, produce the final population plots from nothing more than the initial geometry of a molecule: the scripts perform geometry optimization, normal mode calculations, initial condition sampling, parallel trajectory propagation, population analysis, and plotting of the final results. Because MLatom can also be used to train different ML models, this ecosystem can be seamlessly integrated into protocols for building ML models for nonadiabatic dynamics. In the future, a deeper and more efficient integration of MLatom with Newton-X will enable a vast range of surface hopping functionalities, such as fewest-switches surface hopping, to be used in similar workflows via the Python API.
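The core of the Landau-Zener-Belyaev-Lebedev scheme is a hopping probability computed from the adiabatic energy gap alone, without nonadiabatic couplings. A minimal sketch, assuming the commonly cited Belyaev-Lebedev form P = exp(-(π/2ħ)·sqrt(Z³/Z″)) evaluated at a local minimum of the gap Z(t), with Z″ estimated by a central finite difference over three consecutive time steps. The function names and gap values are hypothetical, and this is not MLatom's implementation.

```python
import math
import random


def lzbl_hop_probability(gap_prev, gap_now, gap_next, dt):
    """Belyaev-Lebedev Landau-Zener hopping probability (atomic units, hbar = 1).

    Returns 0.0 unless gap_now is a local minimum of the adiabatic energy gap Z(t);
    otherwise returns P = exp(-pi/2 * sqrt(Z**3 / Z'')).
    """
    if not (gap_now < gap_prev and gap_now < gap_next):
        return 0.0
    # Central finite-difference estimate of the second time derivative of the gap.
    d2_gap = (gap_prev - 2.0 * gap_now + gap_next) / dt**2
    return math.exp(-0.5 * math.pi * math.sqrt(gap_now**3 / d2_gap))


def attempt_hop(probability, rng):
    """Stochastic hop decision: hop if a uniform random number falls below P."""
    return rng.random() < probability


# Hypothetical gaps (hartree) at three consecutive steps around a gap minimum.
rng = random.Random(0)
p = lzbl_hop_probability(0.01025, 0.01, 0.01025, dt=0.5)
print(p, attempt_hop(p, rng))
```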
ABSTRACT
Machine learning potentials (MLPs) are widely applied as an efficient alternative for representing potential energy surfaces (PESs) in many chemical simulations. MLPs are often evaluated by the root-mean-square errors on a test set drawn from the same distribution as the training data. Here, we systematically investigate the relationship between such test errors and the accuracy of simulations with MLPs, using the example of a full-dimensional, global PES for the amino acid glycine. Our results show that test-set errors do not unambiguously reflect MLP performance in different simulation tasks, such as relative conformer energies, barriers, vibrational levels, and zero-point vibrational energies. We also offer an easily accessible solution for improving MLP quality in a simulation-oriented manner, which yields the most precise relative conformer energies and barriers. This solution also passed a stringent test with diffusion Monte Carlo simulations.
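The point that test-set RMSE does not uniquely determine simulation accuracy can be made with a toy example using hypothetical numbers (not data from the glycine study): two models with identical RMSE on four conformer energies, one with a constant offset and one with alternating errors, give very different errors in relative conformer energies.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error over a test set."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def max_relative_energy_error(pred, ref):
    """Largest error in energies taken relative to the first conformer."""
    return max(abs((p - pred[0]) - (r - ref[0])) for p, r in zip(pred, ref))


ref = [0.0, 1.0, 2.0, 3.0]       # hypothetical reference conformer energies
model_a = [0.5, 1.5, 2.5, 3.5]   # constant +0.5 offset
model_b = [0.5, 0.5, 2.5, 2.5]   # alternating +0.5 / -0.5 errors

for name, pred in (("A", model_a), ("B", model_b)):
    print(name, rmse(pred, ref), max_relative_energy_error(pred, ref))
# prints "A 0.5 0.0" and then "B 0.5 1.0"
```

A constant offset cancels in relative energies, while uncorrelated errors of the same size do not, so the same RMSE can hide very different behaviour in a simulation task.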
ABSTRACT
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package offers users plenty of choice: simulations can be run with command-line options, input files, or scripts using MLatom as a Python package, both on users' own computers and on the online XACS cloud computing service at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. Users can choose from an extensive library of methods containing pretrained ML models and quantum mechanical approximations such as AIQM1, which approaches coupled-cluster accuracy. Developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of interfaces to many state-of-the-art software packages and libraries.
ABSTRACT
Molecular dynamics (MD) is a widely used tool for simulating molecular and materials properties. It is common wisdom that MD simulations should obey physical laws, and hence much effort is put into ensuring that they are energy conserving. The emergence of machine learning (ML) potentials for MD has led to a growing realization that monitoring energy conservation during simulations is of low utility, because the dynamics is often unphysically dissociative. Other ML methods for MD are not based on a potential and provide only forces or trajectories, which are reasonable but not necessarily energy conserving. Here we propose to clearly distinguish between simulation-energy and true-energy conservation and highlight that simulations should focus on decreasing the degree of true-energy non-conservation. We introduce very simple, new criteria for evaluating the quality of MD by estimating the degree of true-energy non-conservation, and we demonstrate their practical utility using the example of infrared spectra simulations. These criteria are more important and intuitive than simply evaluating the quality of ML potential energies and forces, as is commonly done, and can be applied universally, e.g., even to trajectories with unknown or discontinuous potential energy. Such an approach introduces new standards for evaluating MD by focusing on true-energy conservation and can help in developing more accurate methods for simulating molecular and materials properties.
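One way to make true-energy non-conservation concrete is to re-evaluate every frame of a trajectory with a reference Hamiltonian and record the largest deviation from the initial total energy. The sketch below assumes a 1D harmonic oscillator as the hypothetical reference and uses the maximum absolute deviation as the metric; the exact criteria proposed in the paper are not reproduced here. Because only positions and velocities are needed, the same check applies to trajectories produced by force-only or trajectory-predicting ML models.

```python
import math


def true_total_energy(x, v, mass=1.0, k=1.0):
    """Hypothetical reference ("true") Hamiltonian: 1D harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def true_energy_deviation(trajectory, mass=1.0, k=1.0):
    """Max |E_true(t) - E_true(0)| along a trajectory of (position, velocity) pairs.

    The trajectory may come from any method; only the reference Hamiltonian is needed.
    """
    energies = [true_total_energy(x, v, mass, k) for x, v in trajectory]
    return max(abs(e - energies[0]) for e in energies)


times = [0.1 * i for i in range(101)]
exact = [(math.cos(t), -math.sin(t)) for t in times]
# A trajectory whose amplitude slowly grows, mimicking spurious energy gain.
drifting = [((1 + 0.01 * t) * math.cos(t), -(1 + 0.01 * t) * math.sin(t)) for t in times]
print(true_energy_deviation(exact), true_energy_deviation(drifting))
```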
ABSTRACT
We demonstrate that AI can learn atomistic systems in four-dimensional (4D) spacetime. For this, we introduce the 4D-spacetime GICnet model, which, for given initial conditions (nuclear positions and velocities at time zero), can predict nuclear positions and velocities as a continuous function of time up to the distant future. Such models of molecules can be unrolled in the time dimension to yield long, high-resolution molecular dynamics trajectories with high efficiency and accuracy. 4D-spacetime models can make predictions for different times in any order and do not need stepwise evaluation of forces and integration of the equations of motion at discretized time steps, which is a major advance over traditional, cost-inefficient molecular dynamics. These models can be used to speed up dynamics, simulate vibrational spectra, and obtain deeper insight into nuclear motions, as we demonstrate for a series of organic molecules.
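The contrast between a 4D-spacetime model and stepwise integration can be sketched with a 1D harmonic oscillator. Here the exact analytic solution plays the role of a hypothetical learned model x(t; x0, v0): it is queried at arbitrary times in any order, whereas velocity Verlet must step through every intermediate time. This illustrates the idea only; it is not GICnet.

```python
import math


def spacetime_model(x0, v0, t, omega=1.0):
    """Stand-in for a learned 4D-spacetime model: maps (x0, v0, t) directly to (x, v).

    For this hypothetical 1D harmonic oscillator the exact solution is used.
    """
    x = x0 * math.cos(omega * t) + (v0 / omega) * math.sin(omega * t)
    v = -x0 * omega * math.sin(omega * t) + v0 * math.cos(omega * t)
    return x, v


def velocity_verlet(x0, v0, t_end, dt, omega=1.0):
    """Conventional stepwise integration: evaluate forces and advance one step at a time."""
    x, v = x0, v0
    a = -omega ** 2 * x
    for _ in range(round(t_end / dt)):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -omega ** 2 * x
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x, v


# The spacetime model can be queried at arbitrary times, in any order.
for t in (5.0, 0.5, 2.0):
    print(t, spacetime_model(1.0, 0.0, t))

# Stepwise integration needs about 1000 force evaluations to reach t = 1.0 with dt = 0.001.
print(velocity_verlet(1.0, 0.0, 1.0, 1e-3), spacetime_model(1.0, 0.0, 1.0))
```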
ABSTRACT
The KREG and pKREG models have been shown to enable accurate learning of multidimensional single-molecule surfaces of quantum chemical properties such as ground-state potential energies, excitation energies, and oscillator strengths. These models are based on kernel ridge regression (KRR) with the Gaussian kernel function and employ a relative-to-equilibrium (RE) global molecular descriptor, while pKREG is designed to enforce invariance under atom permutations with a permutationally invariant kernel. Here we extend these two models to also explicitly include derivative information from the training data, which greatly improves their accuracy. We demonstrate on the example of learning potential energies and energy gradients that KREG and pKREG models are better than or on par with state-of-the-art machine learning models. We also found that, in challenging cases, both energy and energy gradient labels should be learned to properly model potential energy surfaces; learning only energies or only gradients is insufficient. The models' open-source implementation is freely available in the MLatom package for general-purpose atomistic machine learning simulations, which can also be performed on the MLatom@XACS cloud computing service.
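A minimal energy-only sketch of the KREG ingredients named above: the RE descriptor (the equilibrium distance r_ij divided by the current r_ij for every atom pair) and KRR with a Gaussian kernel, fitted by solving (K + λI)α = y. The toy triatomic geometry, energy labels, σ and λ are hypothetical; the gradient-including extension introduced in the paper and pKREG's permutationally invariant kernel are not shown, and none of the names are MLatom's API.

```python
import math
from itertools import combinations


def re_descriptor(coords, eq_coords):
    """Relative-to-equilibrium descriptor: r_ij(eq) / r_ij for every atom pair i < j."""
    pairs = combinations(range(len(coords)), 2)
    return [math.dist(eq_coords[i], eq_coords[j]) / math.dist(coords[i], coords[j])
            for i, j in pairs]


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


class EnergyOnlyKRR:
    """Kernel ridge regression on RE descriptors: solve (K + lambda I) alpha = y."""

    def __init__(self, sigma, lam=1e-8):
        self.sigma, self.lam = sigma, lam

    def fit(self, descriptors, energies):
        self.train = descriptors
        n = len(descriptors)
        k = [[gaussian_kernel(descriptors[i], descriptors[j], self.sigma)
              + (self.lam if i == j else 0.0) for j in range(n)] for i in range(n)]
        self.alpha = solve_linear(k, energies)
        return self

    def predict(self, descriptor):
        return sum(a * gaussian_kernel(d, descriptor, self.sigma)
                   for a, d in zip(self.alpha, self.train))


eq = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # hypothetical equilibrium geometry
scales = [0.9, 0.95, 1.0, 1.05, 1.1]
geoms = [[(s * x, s * y, s * z) for x, y, z in eq] for s in scales]
X = [re_descriptor(g, eq) for g in geoms]
y = [(s - 1.0) ** 2 for s in scales]  # hypothetical energy labels
model = EnergyOnlyKRR(sigma=0.05).fit(X, y)
print([round(model.predict(x), 6) for x in X])
```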
ABSTRACT
Artificial intelligence-enhanced quantum mechanical method 1 (AIQM1) is a general-purpose method that was shown to achieve high accuracy for many applications with a speed close to that of its baseline semiempirical quantum mechanical (SQM) method, ODM2*. Here, we evaluate the hitherto unknown performance of out-of-the-box AIQM1, without any refitting, for reaction barrier heights on eight datasets comprising a total of ∼24 thousand reactions. This evaluation shows that AIQM1's accuracy strongly depends on the type of transition state and ranges from excellent for rotation barriers to poor for, e.g., pericyclic reactions. AIQM1 clearly outperforms its baseline ODM2* method and, even more so, a popular universal potential, ANI-1ccx. Overall, however, AIQM1's accuracy remains largely similar to that of SQM methods (and of B3LYP/6-31G* for most reaction types), suggesting that future work should focus on improving AIQM1's performance for barrier heights. We also show that the built-in uncertainty quantification helps in identifying confident predictions. The accuracy of confident AIQM1 predictions approaches the level of popular density functional theory methods for most reaction types. Encouragingly, AIQM1 is rather robust for transition state optimizations, even for the type of reactions it struggles with the most. Single-point calculations with high-level methods on AIQM1-optimized geometries can be used to significantly improve barrier heights, which cannot be said for its baseline ODM2* method.
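Ensemble-based uncertainty of the kind referred to above can be sketched as follows, assuming (as an illustration) that the uncertainty is the spread of barrier heights predicted by the members of a neural-network ensemble and that a prediction counts as confident when the spread falls below a chosen threshold. The energies and the threshold are hypothetical and are not AIQM1 output.

```python
from statistics import mean, pstdev


def barrier_with_uncertainty(ts_energies, reactant_energies):
    """Barrier height from each ensemble member; return the ensemble mean and spread."""
    barriers = [ts - r for ts, r in zip(ts_energies, reactant_energies)]
    return mean(barriers), pstdev(barriers)


def is_confident(spread, threshold):
    """Flag a prediction as confident when the ensemble spread is below a threshold."""
    return spread < threshold


# Hypothetical per-member energies in kcal/mol (not AIQM1 output).
ts = [10.0, 10.2, 9.8]
reactant = [0.0, 0.0, 0.0]
barrier, spread = barrier_with_uncertainty(ts, reactant)
print(barrier, spread, is_confident(spread, threshold=0.5))
```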
ABSTRACT
Molecules with strong two-photon absorption (TPA) are important in many advanced applications such as upconversion lasing and photodynamic therapy, but their design is hampered by the high cost of experimental screening and of accurate quantum chemical (QC) calculations. Here, a systematic study is performed by collecting an experimental TPA database of ≈900 molecules, analyzing with interpretable machine learning (ML) the key molecular features that explain TPA magnitudes, and building a fast ML model for predictions. The ML model's prediction errors are of similar magnitude to the errors of experiments and of affordable QC methods, and the model has potential for high-throughput screening, as additionally validated by new experimental measurements. The ML feature analysis is generally consistent with common beliefs, which it quantifies and rectifies. The most important feature is conjugation length, followed by features reflecting the effects of donor and acceptor substitution and coplanarity.
ABSTRACT
Quantum chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise infeasible simulations. However, the surging number of such potentials raises the question of which one to choose, or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they can expect from each one.
ABSTRACT
Atomistic machine learning (AML) simulations are used in chemistry at an ever-increasing pace. A large number of AML models have been developed, but their implementations are scattered among different packages, each with its own conventions for input and output. Thus, here we give an overview of our MLatom 2 software package, which provides an integrative platform for a wide variety of AML simulations by implementing some state-of-the-art models from scratch and interfacing with existing software for others. These include kernel method-based model types such as KREG (native implementation), sGDML, and GAP-SOAP, as well as neural-network-based model types such as ANI, DeepPot-SE, and PhysNet. The theoretical foundations behind these methods are also overviewed. The modular structure of MLatom allows for easy extension to more AML model types. MLatom 2 also has many other capabilities useful for AML simulations, such as support for custom descriptors, farthest-point and structure-based sampling, hyperparameter optimization, model evaluation, and automatic learning curve generation. It can also be used for multi-step tasks such as Δ-learning, self-correction approaches, and absorption spectrum simulation within the machine-learning nuclear-ensemble approach. Several of these MLatom 2 capabilities are showcased in application examples.
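Farthest-point sampling, one of the MLatom 2 sampling options listed above, greedily picks the next training point as the candidate farthest from all points selected so far. A generic sketch, assuming Euclidean distances between descriptor vectors and a fixed starting index; it is not MLatom's implementation.

```python
import math


def farthest_point_sampling(points, n_select, start=0):
    """Greedy farthest-point sampling; returns indices of the selected points."""
    if n_select > len(points):
        raise ValueError("n_select exceeds the number of candidate points")
    selected = [start]
    # Distance from each candidate to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < n_select:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return selected


# Hypothetical 1D descriptors: the isolated point at 10.0 is picked right after the start.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
print(farthest_point_sampling(points, 3))
```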
Subjects
Computer Simulation, Hydrocarbons, Cyclic/chemistry, Machine Learning, Software, Molecular Structure
ABSTRACT
Fucosylation and its fucosidic linkage-specific motifs are believed to be essential for understanding their distinct roles in cellular behavior, but quantitative information on them has not yet been fully disclosed because ultrahigh sensitivity and selectivity are required. Herein, we report an approach that converts fucose (Fuc) into a stable europium (Eu) isotopic mass signal for hard-ionization inductively coupled plasma mass spectrometry (ICP-MS). Metabolically assembled azido-fucose on the cell surface allows us to tag it with an alkyne-customized, Eu-crafted bacteriophage MS2 capsid nanoparticle for Eu signal multiplication, resulting in a record-low detection limit of 4.2 zmol Fuc. A quantitative breakdown of the linkage-specific fucosylation motifs preserved in situ on single cancerous HepG2 and paracancerous HL7702 cells can thus be realized on a single-cell ICP-MS platform, specifying their roles during carcinogenesis. This approach was further applied to discriminate normal hepatocellular cells from highly, moderately, and poorly differentiated hepatoma cells collected from real hepatocellular carcinoma tissues.
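As a rough sense of scale (a simple unit conversion using the Avogadro constant, not a figure reported in the abstract), the 4.2 zmol detection limit corresponds to a few thousand fucose residues:

```latex
n = 4.2\ \text{zmol} = 4.2 \times 10^{-21}\ \text{mol}, \qquad
N = n N_\mathrm{A} = \left(4.2 \times 10^{-21}\ \text{mol}\right)\left(6.022 \times 10^{23}\ \text{mol}^{-1}\right) \approx 2.5 \times 10^{3}
```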
ABSTRACT
Millions of tons of collagen-rich bovine bone are produced as byproducts of beef consumption. Hydrolyzing bovine bone collagen (BBC) is an effective way both to increase its added value and to protect the environment. In this study, we successfully mined a recombinant bacterial collagenase from Bacillus cereus and applied it to hydrolyze BBC into collagen-soluble peptides (CPP). Response surface methodology (RSM) was applied to optimize the processing conditions for antioxidant CPP, attaining an outstanding ABTS free radical scavenging activity of 99.21 ± 0.35% while keeping the DPPH free radical scavenging activity and reducing power at high levels under the optimal conditions. Furthermore, we identified by LC-MS/MS five new antioxidant peptides within the CPP that contain the typical collagen repeating Gly-Xaa-Yaa sequence units. These results suggest that our recombinant collagenase is a powerful tool for degrading collagen and that the CPP are promising candidates for antioxidant and related functional food applications.
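RSM typically fits a second-order polynomial to the measured response (here, e.g., the ABTS scavenging activity) as a function of the coded process factors x_i, and locates the optimum at the stationary point of the fitted surface. The general form is given below as a reminder; the factors and fitted coefficients used in this study are not stated in the abstract and are not implied here.

```latex
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \varepsilon,
\qquad
\left.\frac{\partial \hat{y}}{\partial x_i}\right|_{\mathbf{x}^\ast} = 0 \quad (i = 1, \dots, k)
```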
Subjects
Bacillus cereus/enzymology, Bone and Bones/chemistry, Collagen/chemistry, Collagenases/metabolism, Peptide Fragments/metabolism, Peptide Fragments/pharmacology, Recombinant Proteins/metabolism, Animals, Antioxidants/metabolism, Antioxidants/pharmacology, Cattle, Hydrolysis
ABSTRACT
Ru(bpy)₃@SiO₂-COOH and Ru(bpy)₃@SiO₂@CD47-peptide nanoparticles (NPs) with fluorescent and mass spectrometric properties were designed and synthesized as models of drug nanocarriers. For the first time, their phagocytic internalization could be quantitatively measured using the more sensitive inductively coupled plasma mass spectrometry (ICPMS) (¹⁰²Ru), compared with a traditional laser confocal scanning microscope (λex/em = 458/600 nm). Modifying the NP surface with a self-signal-triggering CD47-peptide decreased internalization 10-fold: (2.79 ± 0.21) × 10⁴ Ru(bpy)₃@SiO₂-COOH NPs versus (0.28 ± 0.04) × 10⁴ Ru(bpy)₃@SiO₂@CD47-peptide NPs per RAW264.7 macrophage (n = 5). The alkynyl-linked CD47-peptide allowed us to quantify the number of CD47-peptides modified on each NP (2412 ± 250) and the total content of signal regulatory protein α (SIRPα) on the macrophage (5.14 ± 0.25 amol) by measuring the click-tagged Eu using ICPMS. Furthermore, the interaction between CD47-peptide and SIRPα, as well as the changes in the remaining free SIRPα during the internalization of Ru(bpy)₃@SiO₂@CD47-peptide NPs, were quantitatively evaluated, providing direct experimental evidence of the long-speculated crucial CD47-SIRPα interaction that allows drug nanocarriers to escape internalization by phagocytic cells. The remarkable difference in the internalization ratio with and without the protein corona, 12.3 ± 4.8 for Ru(bpy)₃@SiO₂-COOH NPs versus 4.3 ± 0.5 for Ru(bpy)₃@SiO₂@CD47-peptide NPs, indicated that the CD47-peptide still worked when the protein corona formed. Not limited to the NPs studied here, such a fluorescent and mass spectrometric approach is expected to be applicable to assessing other drug nanocarriers designed by chemists before their medical application.
Subjects
CD47 Antigen/metabolism, Mass Spectrometry/methods, Phagocytosis, Spectrometry, Fluorescence/methods, Amino Acid Sequence, Animals, CD47 Antigen/chemistry, Humans, Mice, RAW 264.7 Cells, Ruthenium Compounds/chemistry
ABSTRACT
We report an inhibitory covalent labeling and clickable-element-tagging strategy for measuring the absolute activity of a protease in cells using inductively coupled plasma mass spectrometry (ICPMS). Epoxysuccinyl-leucine-tyrosine-6-aminocaproic-lysine-amino-Boc-alkyne (epoxysuccinyl-LYK-alkyne) was designed and synthesized to irreversibly label the cysteine cathepsins, recording their momentary activities. L and Y help epoxysuccinyl-LYK-alkyne access the deprotonated thiolate (-S⁻) of Cys25, located at the bottom of the long cathepsin active domain. Quantitative Eu tagging followed, using azido-DOTA-Eu in a bioorthogonal 1:1 copper-catalyzed azide-alkyne cycloaddition click reaction. The Eu tag could be absolutely quantified using ¹⁵³Eu species-nonspecific isotope dilution ICPMS coupled with HPLC, which served as a Eu ruler and allowed us to simultaneously measure the pH-dependent activities of cathepsins B, L, and S as well as the pH in the lysosomal microenvironment of liver cancerous C7721 and paracancerous C7701 cells. Provided that suitable labeling molecules and elemental tags are designed and synthesized, we believe that such a tandem labeling and tagging ICPMS approach can be applied to measuring the activities of other proteases in cells, providing more accurate information on the proteases' biofunctions and thus enabling precise clinical diagnoses.
Subjects
Cathepsins/metabolism, Europium/chemistry, Hepatocytes/metabolism, Mass Spectrometry/methods, Cathepsins/chemistry, Cell Line, Chromatography, High Pressure Liquid, Click Chemistry, Cysteine/chemistry, Hepatocytes/chemistry, Humans, Hydrogen-Ion Concentration, Lysosomes/chemistry, Lysosomes/metabolism, Radioisotope Dilution Technique
ABSTRACT
Although rare cancerous cells are considered more objective indicators for the precise early diagnosis of cancers, counting them accurately remains a considerable challenge. We report a signal multiplication strategy that constructs element-tagged viruslike nanoparticles (VLNPs) with a precise number of atoms for more sensitive, membrane-biomarker-mediated cell counting using inductively coupled plasma mass spectrometry (ICPMS). The typical bacteriophage MS2 was used as an example to demonstrate the effectiveness of element-tagged VLNPs as signal multipliers. MS2 modified with dibenzylcyclooctyne-poly(ethylene glycol)-folate (DBCO-PEG-FA) and DOTA-Eu complex tags, (FA-PEG)₆₉-MS2-(DOTA-Eu)₉₆₅, targeted the folate receptor (FR) on KB cells; as little as subzeptomole FR could be quantified by ¹⁵³Eu species-unspecific isotope dilution ICPMS, allowing us to count as few as 5 KB cells. In contrast, more than 2197 KB cells were needed to give a significant ICPMS signal using FA-PEG-DOTA-Eu, demonstrating more than 2 orders of magnitude of signal multiplication and resulting in a total signal amplification of 4.0 × 10⁸ times relative to one KB cell. We believe that such a signal multiplication strategy can be expanded to quantify and count other membrane biomarkers and their host cells using various VLNPs modified with different kinds and precise numbers of elements and guiding groups. In this way, prescribed multiples of signal amplification can be realized for more accurate ICPMS-based quantitative bioanalysis, which is needed because targeted molecules or cells in a complicated biological system may span a concentration range of several orders of magnitude.
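The "more than 2 orders of magnitude" of signal multiplication follows from the ratio of the minimum cell numbers that gave a significant signal without and with the VLNP multiplier, using only the numbers quoted above:

```latex
\frac{N_\text{cells}^{\text{FA-PEG-DOTA-Eu}}}{N_\text{cells}^{\text{VLNP}}} = \frac{2197}{5} \approx 4.4 \times 10^{2},
\qquad \log_{10}\left(4.4 \times 10^{2}\right) \approx 2.6
```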