Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors.
Zhao, Zhi-Wen; Del Cueto, Marcos; Troisi, Alessandro.
Affiliation
  • Zhao ZW; Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK. m.del-cueto@liverpool.ac.uk.
  • Del Cueto M; Institute of Functional Material Chemistry, Faculty of Chemistry, Northeast Normal University, Changchun 130024, Jilin, P. R. China.
  • Troisi A; Department of Chemistry, University of Liverpool, Liverpool L69 3BX, UK. m.del-cueto@liverpool.ac.uk.
Digit Discov; 1(3): 266-276, 2022 Jun 13.
Article in English | MEDLINE | ID: mdl-35769202
ABSTRACT
We try to determine if machine learning (ML) methods, applied to the discovery of new materials on the basis of existing data sets, have the power to predict completely new classes of compounds (extrapolating) or perform well only when interpolating between known materials. We introduce the leave-one-group-out cross-validation, in which the ML model is trained to explicitly perform extrapolations of unseen chemical families. This approach can be used across materials science and chemistry problems to improve the added value of ML predictions, instead of using extrapolative ML models that were trained with a regular cross-validation. We consider as a case study the problem of the discovery of non-fullerene acceptors because novel classes of acceptors are naturally classified into distinct chemical families. We show that conventional ML methods are not useful in practice when attempting to predict the efficiency of a completely novel class of materials. The approach proposed in this work increases the accuracy of the predictions to enable at least the categorization of materials with a performance above and below the median value.
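The leave-one-group-out cross-validation named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`leave_one_group_out`, `logo_rmse`, `mean_baseline`), the family labels, and the efficiency values are all hypothetical, and a trivial mean predictor stands in for a real ML model. The point is only to show that each chemical family is held out in turn, so the validation error measures extrapolation to an unseen family rather than interpolation within known ones.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def leave_one_group_out(groups):
    """Yield (held_out_label, train_indices, test_indices) once per distinct group."""
    by_group = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)
    for label, test_idx in by_group.items():
        # Every sample from the held-out family is excluded from training.
        train_idx = [i for i, g in enumerate(groups) if g != label]
        yield label, train_idx, test_idx


def logo_rmse(targets, groups, fit_predict):
    """Root-mean-square error pooled over all held-out groups.

    fit_predict(train_idx, train_y, test_idx) is a hypothetical callback that
    trains on the training samples and returns one prediction per test index.
    """
    squared_errors = []
    for _, train_idx, test_idx in leave_one_group_out(groups):
        train_y = [targets[i] for i in train_idx]
        predictions = fit_predict(train_idx, train_y, test_idx)
        squared_errors.extend(
            (p - targets[i]) ** 2 for p, i in zip(predictions, test_idx)
        )
    return sqrt(mean(squared_errors))


def mean_baseline(train_idx, train_y, test_idx):
    """Placeholder model: predict the training-set mean for every test sample."""
    return [mean(train_y)] * len(test_idx)


if __name__ == "__main__":
    # Made-up family labels and efficiencies, for illustration only.
    families = ["fam1", "fam1", "fam2", "fam2", "fam3", "fam3"]
    efficiencies = [4.0, 5.0, 9.0, 10.0, 6.0, 7.0]
    for label, train_idx, test_idx in leave_one_group_out(families):
        print(label, "train:", train_idx, "test:", test_idx)
    print("LOGO RMSE:", logo_rmse(efficiencies, families, mean_baseline))
```

In a setting like the one the abstract describes, a score of this kind would presumably be the quantity minimized when tuning model hyperparameters, in place of the score from a regular (random-split) cross-validation.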

Full text: 1 Collection: 01-internacional Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: Digit Discov Year: 2022 Document type: Article