Clustering high-dimensional data via feature selection.
Liu, Tianqi; Lu, Yu; Zhu, Biqing; Zhao, Hongyu.
Affiliation
  • Liu T; Google Research, New York, New York, USA.
  • Lu Y; Two Sigma Investments, New York, New York, USA.
  • Zhu B; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
  • Zhao H; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
Biometrics; 79(2): 940-950, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35338489
ABSTRACT
High-dimensional clustering analysis is a challenging problem in statistics and machine learning, with broad applications such as the analysis of microarray data and RNA-seq data. In this paper, we propose a new clustering procedure called spectral clustering with feature selection (SC-FS), where we first obtain an initial estimate of labels via spectral clustering, then select a small fraction of features with the largest R-squared with these labels, that is, the proportion of variation explained by group labels, and conduct clustering again using selected features. Under mild conditions, we prove that the proposed method identifies all informative features with high probability and achieves the minimax optimal clustering error rate for the sparse Gaussian mixture model. Applications of SC-FS to four real-world datasets demonstrate its usefulness in clustering high-dimensional data.
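The three-step procedure described in the abstract (initial spectral clustering, R-squared feature screening, re-clustering on the selected features) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the spectral step is carried out or how many features are kept, so the sketch assumes k-means on the leading singular vectors of the centered data matrix and a user-supplied `n_keep`. The helper names `kmeans`, `spectral_labels`, `r_squared`, `sc_fs`, and `make_toy_data` are hypothetical, as is the toy sparse Gaussian mixture.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm with farthest-point initialisation (assumed choice)."""
    rng = np.random.default_rng(seed)
    centers = [points[rng.integers(len(points))]]
    for _ in range(1, k):
        # Distance of each point to its nearest already-chosen center.
        d2 = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def spectral_labels(X, k):
    """Assumed spectral step: k-means on the top-k left singular vectors."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return kmeans(U[:, :k] * s[:k], k)


def r_squared(X, labels):
    """Per-feature proportion of variation explained by the group labels."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for g in np.unique(labels):
        Xg = X[labels == g]
        within += ((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - within / np.maximum(total, 1e-12)


def sc_fs(X, k, n_keep):
    """Cluster, keep the n_keep features with the largest R-squared, cluster again."""
    initial = spectral_labels(X, k)
    scores = r_squared(X, initial)
    selected = np.sort(np.argsort(scores)[-n_keep:])
    final = spectral_labels(X[:, selected], k)
    return final, selected


def make_toy_data(n=200, p=500, n_informative=10, shift=2.0, seed=1):
    """Hypothetical two-cluster sparse Gaussian mixture for illustration only."""
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n // 2)
    X = rng.standard_normal((n, p))
    # Only the first n_informative features carry a mean difference.
    X[:, :n_informative] += np.where(truth[:, None] == 0, shift, -shift)
    return X, truth


if __name__ == "__main__":
    X, truth = make_toy_data()
    labels, selected = sc_fs(X, k=2, n_keep=10)
    # Cluster labels are only defined up to permutation.
    agreement = max(np.mean(labels == truth), np.mean(labels != truth))
    print("selected features:", selected)
    print("agreement with truth:", agreement)
```

In this sketch `n_keep` plays the role of the "small fraction of features" mentioned in the abstract; how that fraction should be chosen in practice is a tuning decision the abstract leaves open.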

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / Machine Learning Language: En Journal: Biometrics Year: 2023 Document type: Article Affiliation country: United States
