Selective inference for k-means clustering.

Chen, Yiqun T; Witten, Daniela M

Affiliation

Chen YT; Data Science Institute and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.
Witten DM; Departments of Statistics and Biostatistics, University of Washington, Seattle, WA 98195-4322, USA.

J Mach Learn Res; 24, 2023 May.

Article in English | MEDLINE | ID: mdl-38264325

ABSTRACT

We consider the problem of testing for a difference in means between clusters of observations identified via k-means clustering. In this setting, classical hypothesis tests lead to an inflated Type I error rate. In recent work, Gao et al. (2022) considered a related problem in the context of hierarchical clustering. Unfortunately, their solution is highly tailored to hierarchical clustering and thus cannot be applied to k-means clustering. In this paper, we propose a p-value that conditions on all of the intermediate clustering assignments in the k-means algorithm. We show that, in finite samples, this p-value controls the selective Type I error for a test of the difference in means between a pair of clusters obtained using k-means clustering, and that it can be computed efficiently. We apply our proposal to handwritten digits data and to single-cell RNA-sequencing data.
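The inflation of the Type I error rate mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method or code: it only shows the naive approach failing. It assumes one-dimensional data drawn from a single N(0, 1) distribution (so there are no true clusters), a hand-written two-cluster Lloyd's algorithm, and a classical two-sample z-test with known unit variance. All function names (`two_means_1d`, `naive_p_value`, `naive_type1_rate`) are hypothetical.

```python
import math
import numpy as np


def two_means_1d(x, n_iter=50):
    """Lloyd's algorithm with k = 2 on 1-D data; returns 0/1 labels."""
    # Deterministic initialization at the extremes keeps both clusters non-empty.
    centers = np.array([x.min(), x.max()], dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each point to the nearer of the two centers.
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        new_centers = np.array([x[labels == j].mean() for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def naive_p_value(x, labels):
    """Classical two-sided z-test (variance assumed known and equal to 1).

    This ignores the fact that the two groups were chosen by clustering x.
    """
    a, b = x[labels == 0], x[labels == 1]
    z = (b.mean() - a.mean()) / math.sqrt(1.0 / len(a) + 1.0 / len(b))
    return math.erfc(abs(z) / math.sqrt(2.0))


def naive_type1_rate(n=100, reps=500, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.standard_normal(n)  # global null: a single Gaussian, no clusters
        rejections += naive_p_value(x, two_means_1d(x)) < alpha
    return rejections / reps


if __name__ == "__main__":
    print("naive rejection rate under the null:", naive_type1_rate())
```

A valid test would reject about 5% of the time in this setting. Because the naive test compares groups that were constructed to be far apart, its rejection rate is far above the nominal level, which is the problem a selective p-value is designed to correct.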

Keywords

Hypothesis testing; Post-selection inference; RNA-sequencing; Type I error; Unsupervised learning

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: J Mach Learn Res Year: 2023 Document type: Article Affiliation country: United States
