ABSTRACT
We present explainable machine learning approaches for the accurate prediction and understanding of solvation free energies, enthalpies, and entropies for different salts in various protic and aprotic solvents. As key input features, we use fundamental contributions from the conceptual density functional theory (DFT) of solutions. The models with the highest prediction accuracy on the experimental validation data set are decision tree-based approaches such as extreme gradient boosting and extra trees, which highlight the non-linear influence of feature values on target predictions. A detailed assessment of feature importance in terms of Gini importance criteria, Shapley Additive Explanations (SHAP), and permutation and reduction approaches underlines the prominent role of anion and cation solvation effects in combination with fundamental electronic properties of the solvents. These results are reasonably consistent with previous assumptions and provide a solid rationale for more recent theoretical approaches.
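Of the feature-importance measures named in the abstract, the permutation approach is the simplest to illustrate: shuffle one feature column at a time and record how much the prediction error grows. The sketch below is a minimal, model-agnostic version that uses only the Python standard library. The data, the stand-in `toy_model` function, and all other names are hypothetical and chosen for illustration; they are not the authors' features, models, or results.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a model over a list of feature rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=5, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other features untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained regressor; it uses feature 0 only."""
    return 2.0 * row[0]


# Hypothetical data: the target depends on feature 0; feature 1 is irrelevant.
rows = [[float(i), float((7 * i) % 5)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]

importances = permutation_importance(toy_model, rows, targets)
```

In this toy setting, shuffling the irrelevant feature leaves the error unchanged, while shuffling the informative one increases it. Gini importance and SHAP rest on different definitions and are not shown here.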
Subjects
Electronics, Machine Learning, Entropy, Salts, Solvents
ABSTRACT
The calculation of temporally varying upstream process outcomes is a challenging task. In recent years, several parametric, semi-parametric, and non-parametric approaches have been developed to provide reliable estimates for key process parameters. We present generic and product-specific recurrent neural network (RNN) models for the computation and study of growth and metabolite-related upstream process parameters as well as their temporal evolution. Our approach can be used for the control and study of single product-specific large-scale manufacturing runs as well as for generic small-scale evaluations of combined processes and products at the development stage. The computational results for the product titer, various major upstream outcomes, and relevant process parameters show a high degree of accuracy when compared to experimental data and, accordingly, a reasonable predictive capability of the RNN models. The calculated root-mean-square errors of prediction are significantly smaller than the experimental standard deviation for the considered process run ensembles, which highlights the broad applicability of our approach. As a specific benefit for platform processes, the generic RNN model is also used to simulate process outcomes for different temperatures, in good agreement with experimental results. The high level of accuracy and the straightforward usage of the approach, without sophisticated parameterization and recalibration procedures, highlight the benefits of the RNN models, which can be regarded as promising alternatives to existing parametric and semi-parametric methods.
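The accuracy criterion in the abstract compares the root-mean-square error of prediction (RMSEP) with the experimental standard deviation of the run ensemble. A minimal sketch of that comparison follows, using only the Python standard library. The titer values and variable names are hypothetical and serve only to show the arithmetic; they are not data from the study.

```python
import math
import statistics


def rmsep(predicted, observed):
    """Root-mean-square error of prediction for two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical titer values (arbitrary units) from four runs at one time point.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

error = rmsep(predicted, observed)
spread = statistics.stdev(observed)  # sample standard deviation of the runs

# The abstract's criterion: prediction error below the experimental spread.
within_spread = error < spread
```

Whether the sample or the population standard deviation is the appropriate reference is an assumption here; the abstract does not specify it.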