Improving (Q)SAR predictions by examining bias in the selection of compounds for experimental testing.

Pogodin, P V; Lagunin, A A; Filimonov, D A; Nicklaus, M C; Poroikov, V V

Affiliation

Pogodin PV; Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia.
Lagunin AA; Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia.
Filimonov DA; Department of Bioinformatics, Medical-Biological Department, Pirogov Russian National Research Medical University, Moscow, Russia.
Nicklaus MC; Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russia.
Poroikov VV; Computer-Aided Drug Design Group, Chemical Biology Laboratory, Center for Cancer Research, National Cancer Institute, NIH, NCI-Frederick, Frederick, MD, USA.

SAR QSAR Environ Res; 30(10): 759-773, 2019 Oct.

Article in English | MEDLINE | ID: mdl-31547686

ABSTRACT

Existing data on structures and biological activities are limited and unevenly distributed across molecular targets and chemical compounds. This raises the question of whether these data represent an unbiased sample of the general population of chemical-biological interactions. To answer this question, we analyzed ChEMBL data for 87,583 molecules tested against 919 protein targets using supervised and unsupervised approaches. Hierarchical clustering of Murcko frameworks generated with the Chemistry Development Kit showed that the available data form a large, diffuse cloud without apparent structure. In contrast, PASS-based classifiers could predict whether a compound had been tested against a particular molecular target, regardless of whether it was active. Thus, one may conclude that the selection of chemical compounds for testing against specific targets is biased, probably due to the influence of prior knowledge. We assessed whether this fact can be used to improve (Q)SAR predictions: PASS prediction of the interaction with a particular target is significantly more accurate for compounds predicted as tested against that target than for those predicted as untested (average ROC AUC of about 0.87 and 0.75, respectively). Thus, accounting for the existing bias in the training-set data may increase the performance of virtual screening.
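The stratified evaluation summarized in the abstract can be illustrated with a minimal sketch. It assumes each compound already carries two hypothetical scores: a predicted probability of having been tested against the target (`p_tested`) and a predicted activity score (`p_active`). Compounds are split at an assumed threshold of 0.5 on `p_tested`, and ROC AUC is computed separately in each subset with the rank-based (Mann-Whitney) formula. The names, the threshold, and the toy data are illustrative assumptions, not the authors' PASS implementation or their data.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney probability that a random active
    outranks a random inactive; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_auc(
    p_tested: Sequence[float],
    p_active: Sequence[float],
    active: Sequence[int],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Dict[str, float]:
    """Split compounds by predicted 'tested' status and score the
    activity predictions separately within each stratum."""
    groups: Dict[str, Tuple[List[float], List[int]]] = {
        "predicted_tested": ([], []),
        "predicted_untested": ([], []),
    }
    for t, s, y in zip(p_tested, p_active, active):
        key = "predicted_tested" if t >= threshold else "predicted_untested"
        groups[key][0].append(s)
        groups[key][1].append(y)
    # Raises ValueError if a stratum lacks either actives or inactives.
    return {key: roc_auc(s, y) for key, (s, y) in groups.items()}


# Toy, invented data for eight compounds (not from ChEMBL or the paper).
p_tested = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1, 0.4]
p_active = [0.9, 0.2, 0.8, 0.1, 0.6, 0.5, 0.4, 0.7]
active = [1, 0, 1, 0, 1, 0, 1, 0]

print(stratified_auc(p_tested, p_active, active))
# {'predicted_tested': 1.0, 'predicted_untested': 0.25}
```

The quadratic pair count keeps the sketch short and easy to check; for large compound sets a sort-based rank computation, or a library routine such as scikit-learn's `roc_auc_score`, would be the practical choice. The toy values carry no information about the AUCs reported in the paper.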

Subject(s)

Drug Discovery; Structure-Activity Relationship; Cluster Analysis; Computer Simulation; Quantitative Structure-Activity Relationship

Keywords

(Q)SAR; Ligand-target interaction; SAVI library; accuracy of prediction; applicability domain; bias; compound selection; training set; virtual screening

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Structure-Activity Relationship / Drug Discovery | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Journal: SAR QSAR Environ Res | Journal subject: ENVIRONMENTAL HEALTH | Year: 2019 | Document type: Article | Country of affiliation: Russia