J Chem Theory Comput ; 20(5): 2152-2166, 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38331423


Theoretical predictions of NMR chemical shifts from first principles can greatly facilitate experimental interpretation and structure identification of molecules in gas, solution, and solid-state phases. However, accurate prediction of chemical shifts using the gold-standard coupled cluster with singles, doubles, and perturbative triple excitations [CCSD(T)] method with a complete basis set (CBS) can be prohibitively expensive. By contrast, machine learning (ML) methods offer inexpensive alternatives for chemical shift predictions but are hampered by poor generalization to molecules outside the original training set. Here, we propose several new ideas for ML prediction of chemical shifts for H, C, N, and O. First, we introduce a novel feature representation, based on the atomic chemical shielding tensors within a molecular environment computed with an inexpensive quantum mechanics (QM) method, and train the model to predict the NMR chemical shieldings of a high-level composite theory that approaches the accuracy of CCSD(T)/CBS. In addition, we train the ML model through a new progressive active learning workflow that reduces the total number of expensive high-level composite calculations required while allowing the model to continuously improve on unseen data. Furthermore, the algorithm provides an error estimate, signaling potential unreliability in predictions if the error is large. Finally, we introduce a novel approach to preserve the rotational invariance of the features using tensor environment vectors (TEVs), which yields an ML model with higher accuracy than a similar model using data augmentation.
We illustrate the predictive capacity of the resulting inexpensive shift machine learning (iShiftML) models across several benchmarks, including unseen molecules in the NS372 data set, gas-phase experimental chemical shifts for small organic molecules, and much larger and more complex natural products in which we can accurately differentiate between subtle diastereomers based on chemical shift assignments.

J Chem Theory Comput ; 19(21): 7704-7714, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37922416


This paper presents a novel theoretical measure, µEMD, based on the earth mover's distance (EMD), for quantifying the density shift caused by electronic excitations in molecules. As input, the EMD metric uses only the discretized ground- and excited-state electron densities in real space, rendering it compatible with almost all electronic structure methods used to calculate excited states. The EMD metric is compared against other popular theoretical metrics for describing the extent of electron-hole separation in a wide range of excited states (valence, Rydberg, charge transfer, etc.). The results showcase the EMD metric's effectiveness across all excitation types and suggest that it is useful as an additional tool to characterize electronic excitations. The study also reveals that µEMD can function as a promising diagnostic tool for predicting the failure of pure exchange-correlation functionals. Specifically, we show statistical relationships among the functional-driven errors, the exact exchange content within the functional, and the magnitude of µEMD values.
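As a rough illustration of how an earth mover's distance can be evaluated from discretized densities alone, the sketch below works in one dimension, where the EMD reduces to the integral of the absolute difference between the two cumulative distributions. This is not the definition of µEMD used in the paper, which operates on real-space electron densities; the Gaussian "densities" and the helper names (uniform_grid, toy_density, emd_1d) are assumptions made purely for illustration.

```python
import math
from itertools import accumulate


def uniform_grid(lo, hi, n_points):
    """Return n_points equally spaced grid points on [lo, hi] and the spacing."""
    step = (hi - lo) / (n_points - 1)
    return [lo + i * step for i in range(n_points)], step


def toy_density(grid, center, width):
    """Gaussian-shaped toy density, normalized so its grid values sum to 1."""
    raw = [math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid]
    total = sum(raw)
    return [value / total for value in raw]


def emd_1d(rho_a, rho_b, step):
    """1D earth mover's distance between two normalized densities on one uniform grid.

    In one dimension the EMD equals the integral of |CDF_a - CDF_b|.
    """
    cdf_a = accumulate(rho_a)
    cdf_b = accumulate(rho_b)
    return step * sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Hypothetical toy densities (arbitrary units), not data from the study.
grid, step = uniform_grid(-10.0, 10.0, 2001)
rho_ground = toy_density(grid, center=0.0, width=1.0)
rho_local = toy_density(grid, center=0.0, width=1.2)    # density barely moves
rho_shifted = toy_density(grid, center=3.0, width=1.0)  # density displaced by 3 units

emd_local = emd_1d(rho_ground, rho_local, step)
emd_shifted = emd_1d(rho_ground, rho_shifted, step)
print(f"local: {emd_local:.3f}, shifted: {emd_shifted:.3f}")
```

In this toy model the rigidly displaced density gives an EMD close to the displacement itself, while the slightly broadened density gives a much smaller value, which is the qualitative behavior one would want from a measure of density shift.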

J Phys Chem A ; 127(29): 5999-6011, 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37441795


The stability and distributions of small water clusters generated in a supersonic beam expansion are interrogated by tunable vacuum ultraviolet (VUV) radiation generated at a synchrotron. Time-of-flight mass spectrometry reveals enhanced population of various protonated water clusters (H+(H2O)n) based upon ionization energy and photoionization distance from source, suggesting there are "magic" numbers below the traditional n = 21 that predominates in the literature. These intensity distributions suggest that VUV threshold photoionization (11.0-11.5 eV) of neutral water clusters close to the nozzle exit leads to a different nonequilibrium state compared to a skimmed molecular beam. This results in the appearance of a new magic number at 14. Metadynamics conformer searches coupled with modern density functional calculations are used to identify the global minimum energy structures of protonated water clusters between n = 2 and 21, as well as the manifold of low-lying metastable minima. New lowest energy structures are reported for the cases of n = 5, 6, 11, 12, 16, and 18, and special stability is identified by several measures. These theoretical results are in agreement with the experiments performed in this work in that n = 14 is shown to exhibit additional stability, based on the computed second-order stabilization energy relative to most cluster sizes, though not to the extent of the well-known n = 21 cluster. Other cluster sizes that show some additional energetic stability are n = 7, 9, 12, 17, and 19. To gain insight into the balance between ion-water and water-water interactions as a function of the cluster size, an analysis of the effective two-body interactions (which sum exactly to the total interaction energy) was performed. This analysis reveals a crossover as a function of cluster size between a water-hydronium-dominated regime for small clusters and a water-water-dominated regime for larger clusters around n = 17.
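One common way to quantify the special stability of a cluster size is the second-order energy difference, E(n+1) + E(n-1) - 2E(n), which is positive when size n is more stable than the average of its neighbors. Whether the study uses exactly this form is an assumption here; the sketch below only illustrates the arithmetic, and the energies are made-up numbers rather than computed values.

```python
def second_order_differences(energies):
    """Map each size n with both neighbors present to E(n+1) + E(n-1) - 2*E(n).

    `energies` maps cluster size n to a total energy (any consistent unit).
    A positive value means size n is more stable than its neighbors on average.
    """
    return {
        n: energies[n + 1] + energies[n - 1] - 2.0 * energies[n]
        for n in sorted(energies)
        if (n - 1) in energies and (n + 1) in energies
    }


# Made-up energies: linear in n, with an extra stabilization of 0.5 at n = 3.
toy_energies = {1: -1.0, 2: -2.0, 3: -3.5, 4: -4.0, 5: -5.0}

delta2 = second_order_differences(toy_energies)
most_stable = max(delta2, key=delta2.get)
print(delta2, most_stable)
```

Sizes at the ends of the range are skipped because they lack one neighbor, so a scan over n = 2 to 21 would yield values only for n = 3 to 20.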

J Chem Phys ; 158(20)2023 May 28.
Article in English | MEDLINE | ID: mdl-37218699


VV10 is a powerful nonlocal density functional for long-range correlation that is used to include dispersion effects in many modern density functionals, such as the meta-generalized gradient approximation (mGGA) B97M-V, the hybrid GGA ωB97X-V, and the hybrid mGGA ωB97M-V. While energies and analytical gradients for VV10 are already widely available, this study reports the first derivation and efficient implementation of the analytical second derivatives of the VV10 energy. The additional compute cost of the VV10 contributions to analytical frequencies is shown to be small in all but the smallest basis sets for recommended grid sizes. This study also reports the assessment of VV10-containing functionals for predicting harmonic frequencies using the analytical second derivative code. The contribution of VV10 to simulating harmonic frequencies is shown to be small for small molecules but significant for systems where weak interactions are important, such as water clusters. In the latter cases, B97M-V, ωB97M-V, and ωB97X-V perform very well. The convergence of frequencies with respect to the grid size and atomic orbital basis set size is studied, and recommendations are reported. Finally, scaling factors to allow comparison of scaled harmonic frequencies with experimental fundamental frequencies and to predict zero-point vibrational energy are presented for some recently developed functionals (including r2SCAN, B97M-V, ωB97X-V, M06-SX, and ωB97M-V).
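The abstract does not state how the scaling factors are fitted. A widely used choice is the least-squares factor that minimizes the sum of squared deviations between scaled harmonic frequencies and experimental fundamentals, which has the closed form sum(w*v) / sum(w*w). The sketch below assumes that choice; the function name and the frequencies are hypothetical.

```python
def least_squares_scale_factor(harmonic, fundamental):
    """Scale factor c minimizing sum((c * w_i - v_i)**2).

    `harmonic` holds computed harmonic frequencies w_i and `fundamental`
    holds the matching experimental fundamentals v_i (same units, same order).
    """
    if len(harmonic) != len(fundamental):
        raise ValueError("frequency lists must have the same length")
    if not harmonic:
        raise ValueError("at least one frequency pair is required")
    numerator = sum(w * v for w, v in zip(harmonic, fundamental))
    denominator = sum(w * w for w in harmonic)
    return numerator / denominator


# Made-up frequencies in cm^-1, constructed so every fundamental is 5% lower.
toy_harmonic = [1000.0, 2000.0, 3000.0]
toy_fundamental = [950.0, 1900.0, 2850.0]

scale = least_squares_scale_factor(toy_harmonic, toy_fundamental)
print(f"scale factor: {scale:.4f}")
```

Because the fit weights each mode by the square of its frequency, high-frequency stretches dominate the result, which is one reason separate factors are often reported for zero-point energies.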

J Chem Phys ; 158(16)2023 Apr 28.
Article in English | MEDLINE | ID: mdl-37114707


We developed and implemented a method-independent, fully numerical, finite difference approach to calculating nuclear magnetic resonance shieldings, using gauge-including atomic orbitals. The resulting capability can be used to explore non-standard methods, given only the energy as a function of finite-applied magnetic fields and nuclear spins. For example, standard second-order Møller-Plesset theory (MP2) has well-known efficacy for 1H and 13C shieldings and known limitations for other nuclei such as 15N and 17O. It is, therefore, interesting to seek methods that offer good accuracy for 15N and 17O shieldings without greatly increased compute costs, as well as exploring whether such methods can further improve 1H and 13C shieldings. Using a small molecule test set of 28 species, we assessed two alternatives: κ regularized MP2 (κ-MP2), which provides energy-dependent damping of large amplitudes, and MP2.X, which includes a variable fraction, X, of third-order correlation (MP3). The aug-cc-pVTZ basis was used, and coupled cluster with singles and doubles and perturbative triples [CCSD(T)] results were taken as reference values. Our κ-MP2 results reveal significant improvements over MP2 for 13C and 15N, with the optimal κ value being element-specific. κ-MP2 with κ = 2 offers a 30% rms error reduction over MP2. For 15N, κ-MP2 with κ = 1.1 provides a 90% error reduction vs MP2 and a 60% error reduction vs CCSD. On the other hand, MP2.X with a scaling factor of 0.6 outperformed CCSD for all heavy nuclei. These results can be understood as providing renormalization of doubles amplitudes to partially account for neglected triple and higher substitutions and offer promising opportunities for future applications.
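The idea of obtaining a shielding from energies alone can be sketched with a central finite-difference stencil for the mixed second derivative of the energy with respect to the applied field and the nuclear magnetic moment. The example below is a toy model for a single tensor component, not the authors' implementation: toy_energy is a hypothetical stand-in for an electronic structure energy call, and the step sizes and the coupling constant are arbitrary assumptions.

```python
def mixed_second_derivative(energy, h_field, h_moment):
    """Central finite-difference estimate of d2E/(dB dm) at B = 0, m = 0.

    `energy(field, moment)` returns the energy for one field component and
    one nuclear-moment component; four energy evaluations are required.
    """
    e_pp = energy(+h_field, +h_moment)
    e_pm = energy(+h_field, -h_moment)
    e_mp = energy(-h_field, +h_moment)
    e_mm = energy(-h_field, -h_moment)
    return (e_pp - e_pm - e_mp + e_mm) / (4.0 * h_field * h_moment)


SIGMA_TOY = 2.5e-5  # made-up coupling constant for the toy model, not a real shielding


def toy_energy(field, moment):
    """Hypothetical energy model: constant + bilinear coupling + pure quadratic terms."""
    return -1.0 + SIGMA_TOY * field * moment - 0.3 * field**2 + 0.1 * moment**2


sigma_fd = mixed_second_derivative(toy_energy, h_field=1e-2, h_moment=1e-2)
print(f"finite-difference estimate: {sigma_fd:.3e}")
```

The pure quadratic terms cancel in the stencil, so only the bilinear coupling survives. With a real energy function, the step sizes would have to balance truncation error against numerical noise in the converged energies.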

J Chem Theory Comput ; 19(2): 514-523, 2023 Jan 24.
Article in English | MEDLINE | ID: mdl-36594660


This paper presents a systematic study of applying composite method approximations with locally dense basis sets (LDBS) to efficiently calculate NMR shielding constants in small and medium-sized molecules. The pcSseg-n series of basis sets is shown to have similar accuracy to the pcS-n series when n ≥ 1 and can slightly reduce computational costs. We identify two different LDBS partition schemes that perform very effectively for density functional calculations. We select a large subset of the recent NS372 database containing 290 H, C, N, and O shielding values evaluated by reference methods on 106 molecules to carefully assess methods of high, medium, and low computational cost and to make practical recommendations. Our assessment covers conventional electronic structure methods (density functional theory and wave function) with global basis calculations, as well as their use in one of the satisfactory LDBS approaches, and a range of composite approaches, also with and without LDBS. Altogether, 99 methods are evaluated. On this basis, we recommend different methods to reach three different levels of accuracy and time requirements across the four nuclei considered.

J Chem Theory Comput ; 18(6): 3460-3473, 2022 Jun 14.
Article in English | MEDLINE | ID: mdl-35533317


In this paper, the performance of more than 40 popular or recently developed density functionals is assessed for the calculation of 463 vertical excitation energies against the large and accurate QuestDB benchmark set. For this purpose, the Tamm-Dancoff approximation offers a good balance between computational efficiency and accuracy. The functionals ωB97X-D and BMK are found to offer the best performance overall with a root-mean-square error (RMSE) of around 0.27 eV, better than the computationally more demanding CIS(D) wave function method with an RMSE of 0.36 eV. The results also suggest that Jacob's ladder still holds for time-dependent density functional theory excitation energies, though hybrid meta generalized-gradient approximations (meta-GGAs) are not generally better than hybrid GGAs. Effects of basis set convergence, gauge invariance correction to meta-GGAs, and nonlocal correlation (VV10) are also studied, and practical basis set recommendations are provided.

Mol Phys ; 119(21-22)2021.
Article in English | MEDLINE | ID: mdl-35264815


Magnetic properties of molecules, such as magnetizabilities, represent second-order derivatives of the energy with respect to external perturbations. To avoid the need for analytic second derivatives and thereby permit evaluation of the performance of methods where they are not available, a new implementation of quantum chemistry calculations in finite applied magnetic fields is reported. This implementation is employed for a collection of small molecules with the aug-cc-pVTZ basis set to assess orbital-optimized (OO) MP2 and a recently proposed regularized variant of OOMP2, called κ-OOMP2. κ-OOMP2 performs significantly better than conventional second-order Møller-Plesset (MP2) theory by reducing MP2's exaggeration of electron correlation effects. As a chemical application, we revisit an old aromaticity criterion called magnetizability exaltation. In lieu of empirical tables or increment systems to generate references, we instead use straight-chain molecules with the same formal bond structure as the target cyclic planar conjugated molecules. This procedure is found to be useful for qualitative analysis, yielding exaltations that are typically negative for aromatic species and positive for antiaromatic molecules. One interesting species, N2S2, shows a positive exaltation despite having aromatic characteristics.