Evaluating molecular representations in machine learning models for drug response prediction and interpretability.

Baptista, Delora; Correia, João; Pereira, Bruno; Rocha, Miguel


Affiliation

Baptista D; Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal.
Correia J; Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal.
Pereira B; Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal.
Rocha M; Centre of Biological Engineering, University of Minho, Campus of Gualtar, Braga, Portugal.

J Integr Bioinform; 19(3), 2022 Sep 01.

Article in English | MEDLINE | ID: mdl-36017668

ABSTRACT

Machine learning (ML) is increasingly being used to guide drug discovery processes. When applying ML approaches to chemical datasets, molecular descriptors and fingerprints are typically used to represent compounds as numerical vectors. However, in recent years, end-to-end deep learning (DL) methods that can learn feature representations directly from line notations or molecular graphs have been proposed as alternatives to using precomputed features. This study set out to investigate which compound representation methods are the most suitable for drug sensitivity prediction in cancer cell lines. Twelve different representations were benchmarked on 5 compound screening datasets, using DeepMol, a new chemoinformatics package developed by our research group, to perform these analyses. The results of this study show that the predictive performance of end-to-end DL models is comparable to, and at times surpasses, that of models trained on molecular fingerprints, even when less training data is available. This study also found that combining several compound representation methods into an ensemble can improve performance. Finally, we show that a post hoc feature attribution method can boost the explainability of the DL models.
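The abstract does not say how the representations were combined into an ensemble. As a minimal sketch, assuming a simple unweighted average of per-sample predictions from models trained on different representations, the idea might look like the following. The function name, model names, and values are hypothetical illustrations; this is not the DeepMol API or the authors' implementation.

```python
from statistics import mean


def ensemble_predict(predictions_by_model):
    """Average per-sample predictions across models.

    Assumption: each model was trained on a different compound
    representation and predicts the same samples in the same order.
    """
    per_model = list(predictions_by_model.values())
    if not per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(per_model[0])
    if any(len(preds) != n_samples for preds in per_model):
        raise ValueError("all models must predict the same samples")
    # zip(*per_model) groups the predictions for each sample together.
    return [mean(sample_preds) for sample_preds in zip(*per_model)]


# Hypothetical predicted drug responses for three cell line/compound pairs.
predictions = {
    "fingerprint_model": [0.5, 1.0, 2.0],
    "graph_model": [1.5, 1.0, 4.0],
}
ensemble = ensemble_predict(predictions)
```

Averaging is only one possible combination rule; weighted averaging or stacking would be equally plausible readings of the abstract.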

Subjects

Drug Discovery; Machine Learning; Drug Discovery/methods

Keywords

cancer; deep learning; drug sensitivity; learned representations; molecular fingerprints


Full text: 1 Database: MEDLINE Main subject: Drug Discovery / Machine Learning Language: En Publication year: 2022 Document type: Article
