Search | BVS IEC

MASSA Algorithm: an automated rational sampling of training and test subsets for QSAR modeling.

Veríssimo, Gabriel Corrêa; Pantaleão, Simone Queiroz; Fernandes, Philipe de Oliveira; Gertrudes, Jadson Castro; Kronenberger, Thales; Honorio, Kathia Maria; Maltarollo, Vinícius Gonçalves.

J Comput Aided Mol Des; 37(12): 735-754, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37804393

ABSTRACT

QSAR models capable of predicting biological, toxicity, and pharmacokinetic properties are widely used to search chemical databases for bioactive lead molecules. How the dataset used to build these models is prepared strongly influences the quality of the resulting models, and sampling requires dividing the original dataset into a training set (for model training) and a test set (for statistical evaluation). This division can be done randomly or rationally, but rational division is superior. In this paper, we present MASSA, a Python tool that automatically samples datasets by exploring the biological, physicochemical, and structural spaces of molecules using principal component analysis (PCA), hierarchical cluster analysis (HCA), and K-modes clustering. The proposed algorithm is especially useful when the variables used for QSAR are not available, or when multiple QSAR models must be built with the same training and test sets; the resulting models show lower variability and better values for validation metrics. These results held even when the descriptors used in the QSAR/QSPR models differed from those used to separate the training and test sets, indicating that the tool can support more than one QSAR/QSPR technique. Finally, the tool also generates graphical representations that can provide insights into the data.
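To make the idea of a rational split more concrete, here is a minimal, hypothetical sketch under stated assumptions; it is not MASSA's code and does not reproduce its exact procedure. It assumes each molecule has already received one cluster label per space (biological, physicochemical, structural), for example from a prior PCA + HCA step on each descriptor block. A plain K-modes clustering (Hamming distance, per-attribute modes) then combines the three labels into consensus groups, and the test set is drawn from every group so that each group stays represented in training. The names `k_modes`, `stratified_split`, `test_fraction`, and the toy labels are all illustrative.

```python
# Hypothetical sketch of a rational train/test split; not MASSA's actual code.
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Labels = Tuple[int, ...]  # one cluster label per chemical space for one molecule


def hamming(a: Labels, b: Labels) -> int:
    """Count the positions where two categorical label vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(points: Sequence[Labels], k: int, n_iter: int = 20, seed: int = 0) -> List[int]:
    """Plain K-modes: k-means with Hamming distance and per-attribute modes."""
    rng = random.Random(seed)
    # Initialise the k modes from k distinct, randomly chosen molecules.
    modes = [points[i] for i in rng.sample(range(len(points)), k)]
    assignment: List[int] = []
    for _ in range(n_iter):
        # Assign each molecule to its nearest mode (ties go to the lowest index).
        new_assignment = [min(range(k), key=lambda c: hamming(p, modes[c])) for p in points]
        if new_assignment == assignment:
            break  # converged: no molecule changed group
        assignment = new_assignment
        # Update each mode column by column to the most frequent label.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                modes[c] = tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
    return assignment


def stratified_split(
    cluster_of: Sequence[int], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Send about test_fraction of each group to the test set, keeping >= 1 in training."""
    rng = random.Random(seed)
    by_cluster: Dict[int, List[int]] = defaultdict(list)
    for idx, c in enumerate(cluster_of):
        by_cluster[c].append(idx)
    train: List[int] = []
    test: List[int] = []
    for c in sorted(by_cluster):
        members = by_cluster[c][:]
        rng.shuffle(members)
        # Never move a whole group to the test set.
        n_test = min(round(len(members) * test_fraction), len(members) - 1)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Toy (biological, physicochemical, structural) cluster labels for ten molecules,
# assumed to come from a prior PCA + HCA step on each descriptor block.
molecules = [(0, 1, 2), (0, 1, 2), (0, 1, 1), (1, 0, 0), (1, 0, 0),
             (1, 0, 1), (2, 2, 0), (2, 2, 0), (2, 2, 1), (0, 1, 2)]
clusters = k_modes(molecules, k=3)
train, test = stratified_split(clusters, test_fraction=0.3)
print(f"{len(train)} training / {len(test)} test molecules")
```

Keeping at least one member of every group in training is a design choice of this sketch: it ensures each test molecule has biologically, physicochemically, and structurally similar neighbors in the training set, which is the usual motivation for rational rather than random splitting.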

Subjects

Algorithms, Quantitative Structure-Activity Relationship, Chemical Compound Databases, Benchmarking
