An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors.
Gallegos, Miguel; Isamura, Bienfait Kabuyaya; Popelier, Paul L A; Martín Pendás, Ángel.
Affiliation
  • Gallegos M; Department of Analytical and Physical Chemistry, University of Oviedo, Oviedo E-33006, Spain.
  • Isamura BK; Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
  • Popelier PLA; Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, U.K.
  • Martín Pendás Á; Department of Analytical and Physical Chemistry, University of Oviedo, Oviedo E-33006, Spain.
J Chem Inf Model ; 64(8): 3059-3079, 2024 Apr 22.
Article in English | MEDLINE | ID: mdl-38498942
ABSTRACT
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. It also opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
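As a rough illustration of the idea summarized in the abstract, the sketch below fits a one-dimensional Gaussian mixture to interatomic distances and turns each cluster into parameters of a standard Behler-Parrinello radial symmetry function, G2 = Σ_j exp(-η (R_ij - R_s)²) f_c(R_ij). It is not the authors' implementation. The synthetic distances, the plain EM routine, the mapping η = 1/(2σ²), R_s = μ, the 6.0 Å cutoff, and the ±σ shift used for the "lower and upper tail" variant are all assumptions made here for illustration, and every function name is hypothetical.

```python
import math
import random


def fit_gmm_1d(samples, n_components, n_iter=200):
    """Fit a 1D Gaussian mixture with plain EM; returns (weights, means, sigmas)."""
    data = sorted(samples)
    n = len(data)
    # Initialise the means at evenly spaced quantiles and share the global spread.
    means = [data[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    sigmas = [math.sqrt(var_all)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                 for w, m, s in zip(weights, means, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = math.sqrt(max(var, 1e-6))
    return weights, means, sigmas


def radial_acsf_params(means, sigmas, split_tails=False):
    """Map each cluster to (eta, r_s) pairs for a radial symmetry function.

    Assumption: eta = 1 / (2 sigma^2). With split_tails=True, two functions
    centred at mu - sigma and mu + sigma replace the single centred one.
    """
    params = []
    for mu, sigma in zip(means, sigmas):
        eta = 1.0 / (2.0 * sigma ** 2)
        if split_tails:
            params.extend([(eta, mu - sigma), (eta, mu + sigma)])
        else:
            params.append((eta, mu))
    return params


def g2(distances, eta, r_s, r_cut=6.0):
    """Radial symmetry function with a cosine cutoff (r_cut is an assumed value)."""
    total = 0.0
    for r in distances:
        if r < r_cut:
            f_cut = 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)
            total += math.exp(-eta * (r - r_s) ** 2) * f_cut
    return total


# Synthetic "sampled" distances in angstrom: two bond-like clusters (made-up data).
rng = random.Random(0)
distances = [rng.gauss(1.0, 0.03) for _ in range(300)]
distances += [rng.gauss(1.5, 0.05) for _ in range(300)]

weights, means, sigmas = fit_gmm_1d(distances, n_components=2)
centered = radial_acsf_params(means, sigmas)
split = radial_acsf_params(means, sigmas, split_tails=True)

# Feature vector for one hypothetical atomic environment with three neighbours.
neighbours = [1.02, 1.48, 2.4]
features = [g2(neighbours, eta, r_s) for eta, r_s in split]
print(centered)
print(features)
```

In practice the number of mixture components would itself need to be chosen (for instance with an information criterion), and angular clusters would require a separate treatment; neither is covered by this sketch.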

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Molecular Dynamics Simulation / Unsupervised Machine Learning | Language: En | Journal: J Chem Inf Model | Journal subject: MEDICAL INFORMATICS / CHEMISTRY | Year: 2024 | Document type: Article | Country of affiliation: Spain