Bias-Free Chemically Diverse Test Sets from Machine Learning.
Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S.
Affiliation
  • Swann ET; Data61 CSIRO, Molecular & Materials Modelling, Door 34, Goods Shed, Village Street, Docklands, Victoria 3008, Australia.
  • Fernandez M; Data61 CSIRO, Molecular & Materials Modelling, Door 34, Goods Shed, Village Street, Docklands, Victoria 3008, Australia.
  • Coote ML; ARC Centre of Excellence for Electromaterials Science, Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory 2601, Australia.
  • Barnard AS; Data61 CSIRO, Molecular & Materials Modelling, Door 34, Goods Shed, Village Street, Docklands, Victoria 3008, Australia.
ACS Comb Sci; 19(8): 544-554, 2017 08 14.
Article in English | MEDLINE | ID: mdl-28722399
ABSTRACT
Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques such as archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply the same method to find a diverse subset of organic molecules, demonstrating how the approach can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.
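The idea of picking prototypes from Coulomb matrix descriptors can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used Coulomb matrix definition (0.5·Z^2.4 on the diagonal, Z_i·Z_j/|R_i − R_j| off the diagonal), a sorted-eigenvalue descriptor, and a plain NumPy K-means in which each prototype is the molecule nearest a cluster centroid. The function names and the four toy diatomics (with rough bond lengths in ångström) are hypothetical and serve only as an illustration.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Coulomb matrix: 0.5*Z_i**2.4 on the diagonal, Z_i*Z_j/|R_i-R_j| off it."""
    z = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder; the diagonal is overwritten below
    m = np.outer(z, z) / dist
    np.fill_diagonal(m, 0.5 * z ** 2.4)
    return m

def eigen_descriptor(charges, coords, size):
    """Eigenvalues sorted by decreasing magnitude, zero-padded to `size`."""
    eig = np.linalg.eigvalsh(coulomb_matrix(charges, coords))
    eig = eig[np.argsort(-np.abs(eig))]
    out = np.zeros(size)
    out[:len(eig)] = eig
    return out

def kmeans_prototypes(X, k, n_iter=100, seed=0):
    """Lloyd's K-means; returns indices of the rows of X nearest each centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return sorted(set(d.argmin(axis=0).tolist()))

# Toy input (assumed, approximate geometries): H2, HF, N2, CO
molecules = [
    ([1, 1], [[0, 0, 0], [0, 0, 0.74]]),
    ([1, 9], [[0, 0, 0], [0, 0, 0.92]]),
    ([7, 7], [[0, 0, 0], [0, 0, 1.10]]),
    ([6, 8], [[0, 0, 0], [0, 0, 1.13]]),
]
X = np.array([eigen_descriptor(z, r, size=2) for z, r in molecules])
prototypes = kmeans_prototypes(X, k=2)
print("prototype molecule indices:", prototypes)
```

On a real data set the descriptor length would be set by the largest molecule, and the number of clusters would need to be chosen by some validation criterion, which this sketch does not address.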

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Databases, Chemical / Machine Learning / Models, Chemical Language: En Journal: ACS Comb Sci Year: 2017 Document type: Article Affiliation country: Australia