Development of New Methods Needs Proper Evaluation-Benchmarking Sets for Machine Learning Experiments for Class A GPCRs.

Lesniak, Damian; Podlewska, Sabina; Jastrzebski, Stanislaw; Sieradzki, Igor; Bojarski, Andrzej J; Tabor, Jacek

Affiliation

Lesniak D; Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Lojasiewicza Street, 30-348 Kraków, Poland.
Podlewska S; Department of Technology and Biotechnology of Drugs, Jagiellonian University Medical College, 9 Medyczna Street, 30-688 Kraków, Poland.
Jastrzebski S; Maj Institute of Pharmacology, Polish Academy of Sciences, 12 Smetna Street, 31-343 Kraków, Poland.
Sieradzki I; Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Lojasiewicza Street, 30-348 Kraków, Poland.
Bojarski AJ; Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Lojasiewicza Street, 30-348 Kraków, Poland.
Tabor J; Maj Institute of Pharmacology, Polish Academy of Sciences, 12 Smetna Street, 31-343 Kraków, Poland.

J Chem Inf Model; 59(12): 4974-4992, 2019 Dec 23.

Article in English | MEDLINE | ID: mdl-31604014

ABSTRACT

New computational approaches for virtual screening applications are constantly being developed. However, before a particular tool is used to search for new active compounds, its effectiveness in that type of task must be examined. In this study, we conducted a detailed analysis of various aspects of preparing data sets for such an evaluation. We propose a protocol for fetching data from the ChEMBL database, examine various compound representations in terms of the possible bias resulting from the way they are generated, and define a new metric for comparing the structural similarity of compounds, which is in line with chemical intuition. The newly developed metric is also used to evaluate various approaches for dividing a data set into training and test parts, which are also examined in detail as a possible source of bias in the results. Finally, machine learning methods are applied in cross-validation studies of the data sets constructed within the paper, which constitute benchmarks for the assessment of computational methods developed for virtual screening tasks. Additionally, analogous data sets for class A G protein-coupled receptors (100 targets with the highest number of records) were prepared. They are available at http://gmum.net/benchmarks/, together with a script enabling reproduction of all results, available at https://github.com/lesniak43/ananas.
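
The concern about train/test division as a source of bias can be illustrated with a minimal sketch. It is not the authors' protocol and does not use the paper's new similarity metric, which the abstract does not define; the standard Tanimoto coefficient is swapped in as a stand-in. The fingerprints, the two compound "series", and all function names below are hypothetical toy constructs. The sketch compares, for each test compound, the similarity to its nearest training compound under a random split and under a split that keeps each series on one side.

```python
import random


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_train_similarity(train, test):
    """For each test fingerprint, similarity to its most similar training fingerprint."""
    return [max(tanimoto(t, s) for s in train) for t in test]


def random_split(items, test_fraction, seed=0):
    """Shuffle and cut: ignores any structural grouping of the compounds."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy data: two chemical series, each sharing a 4-bit "core"
# and differing in a single substituent bit.
series_a = [frozenset({1, 2, 3, 4, 100 + i}) for i in range(5)]
series_b = [frozenset({10, 11, 12, 13, 200 + i}) for i in range(5)]

# Random split: test compounds almost always have close analogues in training.
random_train, random_test = random_split(series_a + series_b, test_fraction=0.2)
random_sims = nearest_train_similarity(random_train, random_test)

# Grouped split: a whole series is held out, so no close analogue is in training.
grouped_sims = nearest_train_similarity(series_a, series_b)

print("random split :", [round(s, 2) for s in random_sims])
print("grouped split:", [round(s, 2) for s in grouped_sims])
```

In this toy setting every randomly held-out compound still has a same-series neighbour in the training set, whereas the grouped split leaves none, which is the kind of difference that can inflate or deflate measured performance depending on how a benchmark is divided.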

Subjects

Drug Evaluation, Preclinical/methods; Machine Learning; Receptors, G-Protein-Coupled/metabolism; Benchmarking; Ligands; User-Computer Interface

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Receptors, G-Protein-Coupled / Drug Evaluation, Preclinical / Machine Learning Language: En Year of publication: 2019 Document type: Article
