Nonparametric variable importance assessment using machine learning techniques.
Williamson, Brian D; Gilbert, Peter B; Carone, Marco; Simon, Noah.
Affiliation
  • Williamson BD; Department of Biostatistics, University of Washington, Seattle, Washington, USA.
  • Gilbert PB; Department of Biostatistics, University of Washington, Seattle, Washington, USA.
  • Carone M; Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
  • Simon N; Department of Biostatistics, University of Washington, Seattle, Washington, USA.
Biometrics; 77(1): 9-22, March 2021.
Article in English | MEDLINE | ID: mdl-33043428
ABSTRACT
In a regression setting, it is often of interest to quantify the importance of various features in predicting the response. Commonly, the variable importance measure used is determined by the regression technique employed. For this reason, practitioners often only resort to one of a few regression techniques for which a variable importance measure is naturally defined. Unfortunately, these regression techniques are often suboptimal for predicting the response. Additionally, because the variable importance measures native to different regression techniques generally have a different interpretation, comparisons across techniques can be difficult. In this work, we study a variable importance measure that can be used with any regression technique, and whose interpretation is agnostic to the technique used. This measure is a property of the true data-generating mechanism. Specifically, we discuss a generalization of the analysis of variance variable importance measure and discuss how it facilitates the use of machine learning techniques to flexibly estimate the variable importance of a single feature or group of features. The importance of each feature or group of features in the data can then be described individually, using this measure. We describe how to construct an efficient estimator of this measure as well as a valid confidence interval. Through simulations, we show that our proposal has good practical operating characteristics, and we illustrate its use with data from a study of risk factors for cardiovascular disease in South Africa.
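As a rough illustration of the kind of measure the abstract describes, the sketch below assumes the analysis-of-variance style definition: the mean squared difference between the regression on all features and the regression with the feature(s) of interest removed, divided by the variance of the response. The simulated data, the helper names `fit_ols` and `importance`, and the use of ordinary least squares are assumptions made for this example. It is a naive plug-in calculation, not the efficient estimator or the confidence interval that the abstract refers to, and any regression technique could stand in for `fit_ols`.

```python
import random
import statistics


def fit_ols(rows, y):
    """Hypothetical helper: least squares with intercept; returns a predictor."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(d[i] * d[j] for d in design) for j in range(p)]
         + [sum(d[i] * t for d, t in zip(design, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    beta = [a[i][p] / a[i][i] for i in range(p)]
    return lambda r: beta[0] + sum(b * v for b, v in zip(beta[1:], r))


def importance(rows, y, drop):
    """Plug-in estimate for the feature indices in `drop`:
    mean squared gap between full and reduced fits, divided by Var(Y)."""
    reduced_rows = [[v for j, v in enumerate(r) if j not in drop] for r in rows]
    full = fit_ols(rows, y)
    reduced = fit_ols(reduced_rows, y)
    gaps = [(full(r) - reduced(q)) ** 2 for r, q in zip(rows, reduced_rows)]
    return statistics.fmean(gaps) / statistics.pvariance(y)


# Simulated data (an assumption for illustration): Y = 2*X1 + X2 + noise,
# with independent standard normal X1, X2, and noise, so Var(Y) = 6.
rng = random.Random(0)
n = 5000
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
y = [2 * x1 + x2 + rng.gauss(0, 1) for x1, x2 in rows]

vim_x1 = importance(rows, y, drop={0})       # population value is 4/6
vim_x2 = importance(rows, y, drop={1})       # population value is 1/6
vim_both = importance(rows, y, drop={0, 1})  # population value is 5/6
print(vim_x1, vim_x2, vim_both)
```

Because the quantity is defined through conditional means of the response rather than through model coefficients, the same `importance` call applies to a single feature or to a group of features, which mirrors the "single feature or group of features" usage described in the abstract.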

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Cardiovascular Diseases / Machine Learning | Type of study: Diagnostic_studies / Etiology_studies / Prognostic_studies / Risk_factors_studies | Limit: Humans | Language: En | Journal: Biometrics | Year: 2021 | Document type: Article | Affiliation country: United States
