Your browser doesn't support javascript.
loading
Hybrid fragment-SMILES tokenization for ADMET prediction in drug discovery.
Aksamit, Nicholas; Tchagang, Alain; Li, Yifeng; Ombuki-Berman, Beatrice.
Afiliação
  • Aksamit N; Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada.
  • Tchagang A; Digital Technologies Research Centre, National Research Council Canada, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada.
  • Li Y; Department of Computer Science, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada. yli2@brocku.ca.
  • Ombuki-Berman B; Department of Biological Sciences, Brock University, 1812 Sir Isaac Brock Way, St. Catharines, ON, L2S 3A1, Canada. yli2@brocku.ca.
BMC Bioinformatics ; 25(1): 255, 2024 Aug 01.
Article em En | MEDLINE | ID: mdl-39090573
ABSTRACT

BACKGROUND:

Drug discovery and development is the extremely costly and time-consuming process of identifying new molecules that can interact with a biomarker target to interrupt the disease pathway of interest. In addition to binding the target, a drug candidate needs to satisfy multiple properties affecting absorption, distribution, metabolism, excretion, and toxicity (ADMET). Artificial intelligence approaches provide an opportunity to improve each step of the drug discovery and development process, in which the first question faced by us is how a molecule can be informatively represented such that the in-silico solutions are optimized.

RESULTS:

This study introduces a novel hybrid SMILES-fragment tokenization method, coupled with two pre-training strategies, utilizing a Transformer-based model. We investigate the efficacy of hybrid tokenization in improving the performance of ADMET prediction tasks. Our approach leverages MTL-BERT, an encoder-only Transformer model that achieves state-of-the-art ADMET predictions, and contrasts the standard SMILES tokenization with our hybrid method across a spectrum of fragment library cutoffs.

CONCLUSION:

The findings reveal that while an excess of fragments can impede performance, using hybrid tokenization with high frequency fragments enhances results beyond the base SMILES tokenization. This advancement underscores the potential of integrating fragment- and character-level molecular features within the training of Transformer models for ADMET property prediction.
Assuntos
Palavras-chave

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Descoberta de Drogas Idioma: En Revista: BMC Bioinformatics Assunto da revista: INFORMATICA MEDICA Ano de publicação: 2024 Tipo de documento: Article País de afiliação: Canadá País de publicação: Reino Unido

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Descoberta de Drogas Idioma: En Revista: BMC Bioinformatics Assunto da revista: INFORMATICA MEDICA Ano de publicação: 2024 Tipo de documento: Article País de afiliação: Canadá País de publicação: Reino Unido