Pesquisa | BVS IEC

Semiparametric Clustering: A Robust Alternative to Parametric Clustering.

Pan, Binbin; Dong, Huaiqin; Chen, Wen-Sheng; Xu, Chen.

IEEE Trans Neural Netw Learn Syst ; 30(9): 2583-2597, 2019 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-30602425

RESUMO

Clustering aims at naturally grouping the data according to the underlying data distribution. The data distribution is often estimated using a parametric or nonparametric model, e.g., Gaussian mixture or kernel density estimation. Compared with nonparametric models, parametric models are statistically stable, i.e., a small perturbation of data points leads to a small change in the estimated density. However, parametric models are highly sensitive to outliers because the data distribution is far away from the parametric assumptions in the presence of outliers. Given a parametric clustering algorithm, this paper shows how to turn this algorithm into a robust one. The idea is to modify the original parametric density into a semiparametric one. The high-density data that form the core of each cluster are modeled with the original parametric density. The low-density data are often far away from the cluster cores and may have an arbitrary shape, thus are modeled using a nonparametric density. A combination of parametric and nonparametric clustering algorithms is used to group the data modeled as a semiparametric density. From the robust statistical point of view, the proposed method has good robustness properties. We test the proposed algorithm on several synthetic and 70 UCI data sets. The results indicate that the semiparametric method could significantly improve the clustering performance.

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA