Interpretable Machine Learning Models for Phase Prediction in Polymerization-Induced Self-Assembly.
Lu, Yiwen; Yalcin, Dilek; Pigram, Paul J; Blackman, Lewis D; Boley, Mario.
Affiliation
  • Lu Y; Department for Data Science and AI, Monash University, Wellington Road, Clayton, VIC 3168, Australia.
  • Yalcin D; CSIRO, Manufacturing Business Unit, Research Way, Clayton, VIC 3168, Australia.
  • Pigram PJ; Centre for Materials and Surface Science, Department of Chemistry and Physics, La Trobe University, Melbourne, VIC 3086, Australia.
  • Blackman LD; Centre for Materials and Surface Science, Department of Chemistry and Physics, La Trobe University, Melbourne, VIC 3086, Australia.
  • Boley M; CSIRO, Manufacturing Business Unit, Research Way, Clayton, VIC 3168, Australia.
J Chem Inf Model; 63(11): 3288-3306, 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37208794
ABSTRACT
While polymerization-induced self-assembly (PISA) has become a preferred synthetic route toward amphiphilic block copolymer self-assemblies, predicting their phase behavior from experimental design is extremely challenging: it requires the time- and labor-intensive creation of empirical phase diagrams whenever self-assemblies of novel monomer pairs are sought for specific applications. To alleviate this burden, we develop here the first framework for a data-driven methodology for the probabilistic modeling of PISA morphologies, based on a selection and suitable adaptation of statistical machine learning methods. As the complexity of PISA precludes generating large volumes of training data with in silico simulations, we focus on interpretable low-variance methods that can be interrogated for conformity with chemical intuition and that promise to work well with only the 592 training data points that we curated from the PISA literature. We found that among the evaluated linear models, generalized additive models, and rule and tree ensembles, all but the linear models show decent interpolation performance, with an estimated error rate of around 0.2 and an expected cross-entropy loss (surprisal) of around 1 bit when predicting the mixture of morphologies formed from monomer pairs already encountered in the training data. When extrapolating to new monomer combinations, model performance is weaker, but the best model (random forest) still achieves highly nontrivial prediction performance (0.27 error rate, 1.6 bit surprisal), which makes it a good candidate to support the creation of empirical phase diagrams for new monomers and conditions. Indeed, we find in three case studies that, when used to actively learn phase diagrams, the model is able to select a smart set of experiments that leads to satisfactory phase diagrams after observing only relatively few data points (5-16) for the targeted conditions.
The data set as well as all model training and evaluation codes are publicly available through the GitHub repository of the last author.
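For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how an error rate and a mean surprisal in bits can be computed from predicted class probabilities. It is not the authors' evaluation code: the function names, the three placeholder classes, and the toy probabilities are all assumptions made for illustration, and the single-label setup ignores the mixture-of-morphologies aspect mentioned above.

```python
import math


def error_rate(probs, labels):
    """Fraction of samples whose most probable class differs from the true label."""
    wrong = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if predicted != y:
            wrong += 1
    return wrong / len(labels)


def mean_surprisal_bits(probs, labels, eps=1e-12):
    """Average of -log2(probability assigned to the true class), in bits."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log2(max(p[y], eps))  # eps guards against log2(0)
    return total / len(labels)


# Hypothetical predictions over three placeholder classes (not data from the paper).
probs = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.125, 0.125, 0.75],
    [0.5, 0.25, 0.25],
]
labels = [0, 1, 2, 1]

print("error rate:", error_rate(probs, labels))
print("mean surprisal (bits):", mean_surprisal_bits(probs, labels))
```

As a point of reference for the scale, a prediction that assigns probability 0.5 to the true outcome incurs exactly 1 bit of surprisal, the same as guessing a fair coin flip.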
Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Machine Learning | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: J Chem Inf Model | Journal subject: MEDICAL INFORMATICS / CHEMISTRY | Year of publication: 2023 | Document type: Article | Country of affiliation: Australia
