Paired evaluation of machine-learning models characterizes effects of confounders and outliers.

Nariya, Maulik K; Mills, Caitlin E; Sorger, Peter K; Sokolov, Artem

Affiliations

Nariya MK; Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA 02115, USA.
Mills CE; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.
Sorger PK; Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA 02115, USA.
Sokolov A; Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Science, Harvard Medical School, Boston, MA 02115, USA.

Patterns (N Y) ; 4(8): 100791, 2023 Aug 11.

Article in English | MEDLINE | ID: mdl-37602225

ABSTRACT

The true accuracy of a machine-learning model is a population-level statistic that cannot be observed directly. In practice, predictor performance is estimated against one or more test datasets, and the accuracy of this estimate depends strongly on how well the test sets represent all possible unseen datasets. Here we describe paired evaluation as a simple, robust approach for evaluating the performance of machine-learning models in small-sample biological and clinical studies. We use the method to evaluate predictors of drug response in breast cancer cell lines and of disease severity in patients with Alzheimer's disease, demonstrating that the choice of test data can cause estimates of performance to vary by as much as 20%. We show that paired evaluation makes it possible to identify outliers, improve the accuracy of performance estimates in the presence of known confounders, and assign statistical significance when comparing machine-learning models.
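The abstract does not spell out the procedure, so the following is only a minimal sketch under an assumption of ours: that "paired evaluation" means holding out one positive and one negative sample at a time (leave-pair-out cross-validation), training on the remaining samples, and recording whether the positive sample receives the higher score. The fraction of correctly ordered pairs then estimates the area under the ROC curve. The function names (`paired_evaluation`, `centroid_scorer`, `sign_test`), the toy nearest-centroid model, the optional `groups` argument that restricts pairs to a shared confounder level, and the sign test used to compare two models are all hypothetical illustrations, not the authors' implementation.

```python
from collections import defaultdict
from itertools import product
from math import comb


def centroid_scorer(X_train, y_train):
    """HYPOTHETICAL toy model: a higher score means the sample is closer to the
    positive-class centroid than to the negative-class centroid."""
    def centroid(label):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    c_pos, c_neg = centroid(1), centroid(0)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def paired_evaluation(X, y, train_fn, groups=None):
    """ASSUMED leave-pair-out scheme: for each (positive, negative) pair, train on
    all other samples and check whether the positive sample scores higher.
    If `groups` (hypothetical, e.g. a known confounder such as batch) is given,
    only pairs whose two samples share a group value are formed."""
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    outcomes = {}
    for i, j in product(pos, neg):
        if groups is not None and groups[i] != groups[j]:
            continue
        train = [k for k in range(len(y)) if k not in (i, j)]
        model = train_fn([X[k] for k in train], [y[k] for k in train])
        s_i, s_j = model(X[i]), model(X[j])
        # 1 = correctly ordered, 0.5 = tie, 0 = misordered (usual AUC convention).
        outcomes[(i, j)] = 1.0 if s_i > s_j else (0.5 if s_i == s_j else 0.0)
    if not outcomes:
        raise ValueError("no pairs to evaluate")

    auc = sum(outcomes.values()) / len(outcomes)

    # Per-sample error rate: share of that sample's pairs that were misordered.
    errors, counts = defaultdict(float), defaultdict(int)
    for (i, j), score in outcomes.items():
        for k in (i, j):
            errors[k] += 1.0 - score
            counts[k] += 1
    per_sample = {k: errors[k] / counts[k] for k in counts}
    return auc, outcomes, per_sample


def sign_test(outcomes_a, outcomes_b):
    """Two-sided exact sign test over pairs where models A and B disagree.
    Both dicts must cover the same pairs. Pairs share samples and are not
    independent, so this p-value is only approximate."""
    wins_a = sum(outcomes_a[p] > outcomes_b[p] for p in outcomes_a)
    wins_b = sum(outcomes_a[p] < outcomes_b[p] for p in outcomes_a)
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(wins_a, wins_b) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Usage on a small, made-up dataset with two well-separated classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.9, 1.1], [1.2, 1.0]]
y = [0, 0, 0, 1, 1, 1]
auc, outcomes, per_sample = paired_evaluation(X, y, centroid_scorer)
print(auc)  # 1.0
```

In this sketch the per-sample error rate is one simple way to flag candidate outliers: a sample that is misordered in most of its pairs stands out from the rest. Because overlapping pairs are correlated, a permutation or bootstrap procedure over samples may be more appropriate than the sign test for real comparisons.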

Keywords

Confounding Variables; Machine Learning; Model evaluation; Outlier Detection; Small-sample studies

Collections: 01-internacional | Database: MEDLINE | Study type: Prognostic_studies | Language: English | Journal: Patterns (N Y) | Year of publication: 2023 | Document type: Article | Country of affiliation: United States | Country of publication: United States
