ABSTRACT
Analyzing machine learning models, especially nonlinear ones, poses significant challenges. In this context, centered kernel alignment (CKA) has emerged as a promising model analysis tool that assesses the similarity between two embeddings. CKA's efficacy depends on selecting a kernel that adequately captures the underlying properties of the compared models. CKA was designed for neural networks (NNs), with their invariance to rotations of the data in mind, and has been successfully employed in various scientific domains. However, CKA has rarely been adopted in cheminformatics, partly because of the popularity of the random forest (RF) algorithm, which is not rotationally invariant. In this work, we present an adaptation of CKA that builds on the RF kernel to match the properties of RF. As part of the method validation, we show that CKA with the RF kernel correlates well with the prediction similarity of RF models. Furthermore, we demonstrate how CKA with the RF kernel can be used to analyze and explain the behavior of RF models derived from molecular and rooted fingerprints.
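To illustrate how CKA can be computed with an RF kernel, here is a minimal sketch in Python with NumPy. It assumes the RF kernel is the common random-forest proximity (the fraction of trees in which two samples fall into the same leaf) and uses the standard kernel form of CKA (normalized HSIC on double-centered kernel matrices). The exact kernel and normalization used in this work may differ, and the function names and toy leaf indices are hypothetical.

```python
import numpy as np


def rf_kernel(leaves: np.ndarray) -> np.ndarray:
    """Random-forest proximity kernel built from a leaf-index matrix.

    leaves[i, t] is the leaf reached by sample i in tree t, e.g. the
    array returned by scikit-learn's ``forest.apply(X)``.
    K[i, j] is the fraction of trees in which samples i and j share a leaf.
    """
    # (n, 1, T) == (1, n, T) -> (n, n, T) boolean; average over the trees
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def center(K: np.ndarray) -> np.ndarray:
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    return H @ K @ H


def cka(K: np.ndarray, L: np.ndarray) -> float:
    """Centered kernel alignment <HKH, HLH>_F / (||HKH||_F * ||HLH||_F)."""
    Kc, Lc = center(K), center(L)
    return float(np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc)))


# Hypothetical leaf indices of four molecules in two small two-tree forests
leaves_a = np.array([[0, 1], [0, 2], [3, 2], [3, 1]])
leaves_b = np.array([[5, 7], [5, 7], [6, 8], [6, 8]])
print(cka(rf_kernel(leaves_a), rf_kernel(leaves_b)))  # ≈ 0.707
```

Only leaf equality within each tree enters the kernel, so the two forests may use unrelated leaf numbering; CKA then compares how similarly the two models group the same molecules.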
Subjects
Machine Learning; Neural Networks, Computer; Algorithms; Cheminformatics/methods; Models, Molecular
ABSTRACT
In this study, we investigate the influence of chiral and achiral cations on the enantiomerization of biphenylic anions in n-butyl methyl ether and water. In addition to the impact of the cations and solvent molecules on the free energy profile of the rotation, we explore whether chirality transfer between a chiral cation and the biphenylic anion is possible, i.e., whether pairing with a chiral cation can energetically favour one conformer of the anion through the formation of diastereomeric complexes. The quantum-mechanical calculations are complemented by polarizable molecular dynamics (MD) simulations with umbrella sampling to study the impact of solvents of different polarity in more detail. We also discuss how accurate polarizable force fields for biphenylic anions can be constructed from quantum-mechanical reference data.
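To make the umbrella-sampling setup and the chirality-transfer criterion more concrete, the following minimal sketch (Python, standard library only) shows a harmonic umbrella restraint on a periodic torsion angle and the Boltzmann population ratio implied by a free-energy difference. It assumes the reaction coordinate is the inter-ring dihedral of the biphenylic anion; the force constant, window spacing, temperature, and function names are illustrative assumptions, not values or code from the study. Recovering the unbiased free energy profile from the windows requires a reweighting method such as WHAM, which is not shown.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def wrap_deg(delta: float) -> float:
    """Map an angle difference in degrees onto [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def umbrella_bias(phi: float, phi0: float, k: float = 100.0) -> float:
    """Harmonic umbrella restraint 0.5 * k * (phi - phi0)^2 on a periodic dihedral.

    phi and phi0 are in degrees; k is in kJ/(mol rad^2) (illustrative value).
    """
    d = math.radians(wrap_deg(phi - phi0))
    return 0.5 * k * d * d


def window_centers(step: float = 10.0) -> list[float]:
    """Evenly spaced window centers covering one full rotation of the dihedral."""
    n = int(round(360.0 / step))
    return [-180.0 + i * step for i in range(n)]


def population_ratio(delta_g: float, temperature: float = 298.15) -> float:
    """Boltzmann ratio p_B / p_A for delta_g = G_B - G_A in kJ/mol."""
    return math.exp(-delta_g / (R_KJ * temperature))


centers = window_centers()
print(len(centers))                  # 36 windows
print(umbrella_bias(350.0, -10.0))   # 0.0: same angle after wrapping
print(round(population_ratio(-2.0), 2))  # ≈ 2.24
```

In this picture, chirality transfer would show up as a nonzero free-energy difference between the two diastereomeric ion pairs: a difference of about 2 kJ/mol at room temperature already roughly doubles the population of the favoured anion conformer.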