Search | BVS Nicaragua

Modelling methods and cross-validation variants in QSAR: a multi-level analysis.

Rácz, A; Bajusz, D; Héberger, K.

SAR QSAR Environ Res; 29(9): 661-674, 2018 Sep.

Article in English | MEDLINE | ID: mdl-30160175

ABSTRACT

Prediction performance often depends on the cross- and test-validation protocols applied. Several combinations of cross-validation variants and model-building techniques were used to reveal the complexity of their interactions. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (in random, contiguous-block and Venetian blind forms) and leave-one-out cross-validation (CV). External test sets revealed the effects of, and differences between, the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made with the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be attributed to the MLR method and to contiguous-block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared on their basic performance parameters (r² and Q²). MLR produced the largest gap between these two parameters, while PCR gave the smallest. Although PCR is the best-validated and most balanced technique, SVM always outperformed the other methods when experimental values were the benchmark. Variable selection was advantageous, and the choice of modelling method had a larger influence than the choice of CV variant.
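The CV variants named in the abstract differ only in how samples are assigned to folds. The pure-Python sketch below illustrates them, together with the fitted r² and the cross-validated Q² whose gap the abstract compares. It is not the authors' code. It assumes that samples are kept in their given order (contiguous and Venetian blind folds depend on that order), that a single-descriptor least-squares line stands in for MLR, and that Q² = 1 - PRESS/SS, with SS taken about the mean of all responses. All function names are hypothetical.

```python
import random
from statistics import mean

def contiguous_folds(n, k):
    """Contiguous blocks: each fold holds a consecutive run of sample indices."""
    size, extra = divmod(n, k)
    folds, start = [], 0
    for j in range(k):
        stop = start + size + (1 if j < extra else 0)  # spread any remainder over the first folds
        folds.append(list(range(start, stop)))
        start = stop
    return folds

def venetian_blind_folds(n, k):
    """Venetian blinds: sample i goes to fold i mod k (interleaved assignment)."""
    return [list(range(j, n, k)) for j in range(k)]

def random_folds(n, k, seed=0):
    """Random: shuffle the indices, then cut them into contiguous blocks."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [[idx[i] for i in block] for block in contiguous_folds(n, k)]

def loo_folds(n):
    """Leave-one-out: every sample forms its own fold."""
    return [[i] for i in range(n)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (hypothetical stand-in for MLR)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def r_squared(x, y):
    """Fitted r2 = 1 - RSS / SS, with the model trained on all samples."""
    a, b = fit_line(x, y)
    my = mean(y)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - rss / sum((yi - my) ** 2 for yi in y)

def q2(x, y, folds):
    """Cross-validated Q2 = 1 - PRESS / SS; each fold is predicted by a model trained without it."""
    press = 0.0
    for test in folds:
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
    my = mean(y)
    return 1 - press / sum((yi - my) ** 2 for yi in y)

if __name__ == "__main__":
    # Toy data (illustrative only): a linear trend plus a small periodic perturbation
    x = [float(i) for i in range(10)]
    y = [2.0 * xi + (0.5 if i % 3 == 0 else -0.3) for i, xi in enumerate(x)]
    schemes = {
        "random": random_folds(10, 5),
        "contiguous": contiguous_folds(10, 5),
        "venetian blind": venetian_blind_folds(10, 5),
        "leave-one-out": loo_folds(10),
    }
    print("r2", round(r_squared(x, y), 3))
    for name, folds in schemes.items():
        print(name, "Q2", round(q2(x, y, folds), 3))
```

Because the contiguous and Venetian blind splits are deterministic, reordering the samples (for example, sorting them by response) changes which molecules are held out together. The random variant depends on the seed. For an ordinary least-squares model, PRESS can never fall below the fitted residual sum of squares, so in this sketch Q² never exceeds r².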

Subject(s)

Drug Discovery/methods , Models, Molecular , Quantitative Structure-Activity Relationship , Analysis of Variance , Databases, Chemical/statistics & numerical data , Toxicity Tests/statistics & numerical data

Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters.

Rácz, A; Bajusz, D; Héberger, K.

SAR QSAR Environ Res; 26(7-9): 683-700, 2015.

Article in English | MEDLINE | ID: mdl-26434574

ABSTRACT

Recent implementations of QSAR modelling software provide the user with numerous models and a wealth of information. In this work, we provide some guidance on how to interpret the results of QSAR modelling, compare and assess the resulting models, and select the best and most consistent ones. Two QSAR datasets are used as case studies for comparing model performance parameters and model selection methods. We demonstrate the capabilities of sum of ranking differences (SRD) in model selection and ranking, and identify the best performance indicators and models. While swapping the original training and (external) test sets does not affect the ranking of performance parameters, it yields improved models in certain cases (despite the smaller number of molecules in the training set). Performance parameters for external validation are substantially separated from the other performance merits in SRD analyses, highlighting their value in data fusion.
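The basic SRD calculation referred to in both abstracts can be sketched as follows. This is a minimal illustration of the generic procedure, not the authors' implementation. Rows are the objects being ranked (for example, models or molecules), columns are the methods or performance parameters being compared, and each column's ranking is compared with the ranking given by a reference column. The sketch assumes that the row average serves as the consensus reference and that all columns are already on a comparable scale and in a consistent direction; the randomization-based validation usually reported alongside SRD is omitted. The function names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks

def srd(table, reference=None):
    """Return (SRD, SRD as % of its maximum) for each column of `table` (a list of rows)."""
    n = len(table)
    if reference is None:
        reference = [mean(row) for row in table]  # assumed consensus: row average
    ref_ranks = average_ranks(reference)
    max_srd = n * n // 2  # largest possible sum of |rank differences| for n objects
    scores = []
    for j in range(len(table[0])):
        col_ranks = average_ranks([row[j] for row in table])
        s = sum(abs(c - r) for c, r in zip(col_ranks, ref_ranks))
        scores.append((s, 100.0 * s / max_srd))
    return scores

if __name__ == "__main__":
    # Three objects scored by three hypothetical methods; the last ranks them in reverse
    table = [[1, 1, 3], [2, 2, 2], [3, 3, 1]]
    print(srd(table))
```

A smaller SRD means that a column orders the objects more like the reference does. Normalizing by the maximum possible value makes results comparable across datasets with different numbers of objects.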

Subject(s)

Benzene Derivatives/chemistry , Maleimides/chemistry , Quantitative Structure-Activity Relationship , Amidohydrolases/antagonists & inhibitors , Amidohydrolases/chemistry , Animals , Benzene Derivatives/toxicity , Cyprinidae , Decision Support Techniques , Humans , Maleimides/toxicity , Models, Statistical , Molecular Docking Simulation , Monoacylglycerol Lipases/antagonists & inhibitors , Monoacylglycerol Lipases/chemistry , Software
