Search | Secretaria de Estado da Saúde

An Efficient and Adaptive Granular-Ball Generation Method in Classification Problem.

Xia, Shuyin; Dai, Xiaochuan; Wang, Guoyin; Gao, Xinbo; Giem, Elisabeth.

IEEE Trans Neural Netw Learn Syst ; PP, 2022 Oct 05.

Article in English | MEDLINE | ID: mdl-36197862

ABSTRACT

Granular-ball computing (GBC) is an efficient, robust, and scalable learning method for granular computing, and granular-ball (GB) generation is its foundation. This article proposes a method that accelerates GB generation by replacing k-means with division. It significantly improves the efficiency of GB generation while maintaining accuracy similar to that of existing methods. In addition, a new adaptive GB generation method is proposed that accounts for eliminating GB overlap, among other factors. This makes GB generation parameter-free and fully adaptive. This study also provides the first mathematical models of GB covering. Experimental results on several real datasets show that both proposed GB generation methods achieve accuracy similar to that of the existing method in most cases, while providing acceleration or adaptiveness. All code has been released in the open-source GBC library at http://www.cquptshuyinxia.com/GBC.html or https://github.com/syxiaa/gbc.
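
The abstract does not specify the division rule that replaces k-means. As a minimal sketch, assuming a ball is summarized by the mean of its points (center), their mean distance to that center (radius), and the fraction carrying the majority label (purity), the Python below splits impure balls by assigning each point once to the nearest per-class centroid instead of iterating k-means. The functions `generate_balls`, `split_once`, and `ball_stats`, the purity threshold, and the minimum ball size are hypothetical illustrations, not the GBC library's API.

```python
import numpy as np


def ball_stats(X, y):
    """Center, radius, majority label, and purity of one granular ball."""
    center = X.mean(axis=0)
    radius = float(np.linalg.norm(X - center, axis=1).mean())  # assumed: mean distance
    labels, counts = np.unique(y, return_counts=True)
    return center, radius, labels[np.argmax(counts)], counts.max() / len(y)


def split_once(X, y):
    """Assumed one-shot division: each point goes to its nearest class centroid (no k-means loop)."""
    seeds = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    owner = np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2).argmin(axis=1)
    return [(X[owner == k], y[owner == k]) for k in range(len(seeds)) if np.any(owner == k)]


def generate_balls(X, y, purity_threshold=1.0, min_size=2):
    """Split balls until each is pure enough or too small; returns (center, radius, label, size)."""
    queue, balls = [(X, y)], []
    while queue:
        Xb, yb = queue.pop()
        center, radius, label, purity = ball_stats(Xb, yb)
        parts = [] if purity >= purity_threshold or len(yb) < min_size else split_once(Xb, yb)
        if len(parts) < 2:  # pure, tiny, or the division made no progress: keep the ball
            balls.append((center, radius, label, len(yb)))
        else:
            queue.extend(parts)  # every part is strictly smaller, so the loop terminates
    return balls
```

For two well-separated, differently labeled clusters, a single division already yields two pure balls. The time saving in this sketch comes from using one assignment pass per split rather than a full k-means run; whether this matches the paper's exact splitting rule is an assumption.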

Ball k-Means: Fast Adaptive Clustering With No Bounds.

Xia, Shuyin; Peng, Daowan; Meng, Deyu; Zhang, Changqing; Wang, Guoyin; Giem, Elisabeth; Wei, Wei; Chen, Zizhong.

IEEE Trans Pattern Anal Mach Intell ; 44(1): 87-99, 2022 Jan.

Article in English | MEDLINE | ID: mdl-32750814

ABSTRACT

This paper presents a novel accelerated exact k-means algorithm, called "Ball k-means," which uses a ball to describe each cluster and focuses on reducing point-centroid distance computations. Ball k-means can exactly find the neighbor clusters of each cluster, so distances are computed only between a point and the centroids of its cluster's neighbors rather than all centroids. Moreover, each cluster can be divided into a "stable area" and an "active area," and the latter is further divided into exact "annular areas." Points in the stable area keep their assignment, while points in each annular area are reassigned only among a few neighbor clusters. No upper or lower bounds are used anywhere in the process. Ball k-means also combines ball clusters and neighbor searching with several novel strategies for reducing centroid distance computations. Compared with the current state-of-the-art accelerated exact bounded methods, the Yinyang and Exponion algorithms, as well as other top-of-the-line tree-based and bounded methods, Ball k-means achieves higher performance and performs fewer distance calculations, especially for large-k problems. Its faster speed, lack of extra parameters, and simpler design make Ball k-means an all-around replacement for naive k-means.
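
The neighbor and stable-area pruning described above rests on the triangle inequality: if a point x of cluster i satisfies ||x - c_i|| <= 0.5 * ||c_i - c_j||, then ||x - c_j|| >= ||c_i - c_j|| - ||x - c_i|| >= ||x - c_i||, so c_j cannot be strictly closer than c_i. The Python below is a simplified sketch of one assignment step under that reading, assuming a cluster's radius is the largest distance from its centroid to its current points and that cluster j is a neighbor of cluster i when 0.5 * ||c_i - c_j|| < r_i. It omits the paper's annular-area refinement and its other strategies, and `ball_assign_step` is a hypothetical name.

```python
import numpy as np


def ball_assign_step(X, labels, centroids):
    """One exact reassignment step that skips provably unnecessary point-centroid distances.

    Returns (new_labels, n_dist), where n_dist counts point-centroid distances evaluated.
    """
    k = len(centroids)
    # Half of every pairwise centroid distance.
    half = 0.5 * np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    new_labels = labels.copy()
    n_dist = 0
    for i in range(k):
        idx = np.flatnonzero(labels == i)
        if idx.size == 0:
            continue
        d_own = np.linalg.norm(X[idx] - centroids[i], axis=1)
        n_dist += idx.size
        radius = d_own.max()  # assumed ball radius: farthest member
        # Clusters whose half-gap is at least the radius cannot attract any member of ball i.
        neighbors = np.array([j for j in range(k) if j != i and half[i, j] < radius], dtype=int)
        if neighbors.size == 0:
            continue  # the whole ball is stable
        # Points within half the gap to the nearest neighbor centroid cannot move.
        active_mask = d_own > half[i, neighbors].min()
        active = idx[active_mask]
        if active.size == 0:
            continue
        # Active points are compared only with their own centroid and the neighbor centroids.
        d_nb = np.linalg.norm(X[active][:, None, :] - centroids[neighbors][None, :, :], axis=2)
        n_dist += d_nb.size
        d_all = np.column_stack([d_own[active_mask], d_nb])  # own-centroid distance first
        candidates = np.concatenate(([i], neighbors))
        new_labels[active] = candidates[d_all.argmin(axis=1)]
    return new_labels, n_dist
```

Wrapped in the usual loop (recompute each centroid as the mean of its points, then call the step again until labels stop changing), this sketch gives the same assignments as plain Lloyd iterations, up to distance ties, while `n_dist` can fall well below `len(X) * k` once clusters settle.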

mCRF and mRD: Two Classification Methods Based on a Novel Multiclass Label Noise Filtering Learning Framework.

Xia, Shuyin; Chen, Baiyun; Wang, Guoyin; Zheng, Yong; Gao, Xinbo; Giem, Elisabeth; Chen, Zizhong.

IEEE Trans Neural Netw Learn Syst ; 33(7): 2916-2930, 2022 Jul.

Article in English | MEDLINE | ID: mdl-33428577

ABSTRACT

Mitigating label noise is a crucial problem in classification. Noise filtering is an effective way of dealing with label noise that requires neither estimating the noise rate nor relying on any loss function. However, most filtering methods focus on binary classification, leaving the more difficult problem of multiclass classification relatively unexplored. To remedy this, we present a definition of label noise in a multiclass setting and propose a general framework for a novel label noise filtering learning method for multiclass classification. Using this framework, we derive two noise filtering methods for multiclass classification from their binary counterparts: multiclass complete random forest (mCRF) and multiclass relative density (mRD). In addition, to optimize the NI_threshold hyperparameter in mCRF, we propose two new optimization methods: a voting cross-validation method and an adaptive method that employs a 2-means clustering algorithm. Furthermore, we incorporate SMOTE into our label noise filtering learning framework to handle the ubiquitous problem of imbalanced data in multiclass classification. Experiments on both synthetic data sets and UCI benchmarks demonstrate that our proposed methods are highly robust to label noise compared with state-of-the-art baselines. All code and data results are available at https://github.com/syxiaa/Multiclass-Label-Noise-Filtering-Learning.
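
The abstract names a 2-means clustering step for choosing NI_threshold adaptively but does not give its details. As a minimal sketch, assuming each training sample already has a scalar noise-intensity score (higher meaning more likely mislabeled), 1-D 2-means can split the scores into a low group and a high group and place the threshold midway between the two group means. The function `two_means_threshold` and this midpoint rule are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def two_means_threshold(scores, max_iter=100):
    """Split 1-D scores into two clusters with 2-means; return the midpoint of their means."""
    s = np.asarray(scores, dtype=float)
    lo, hi = s.min(), s.max()  # initial centers: the extreme scores
    if lo == hi:
        return float(hi)  # all scores equal: nothing to separate
    for _ in range(max_iter):
        t = (lo + hi) / 2.0  # in 1-D, points at or below the midpoint are nearer the low center
        low, high = s[s <= t], s[s > t]
        if low.size == 0 or high.size == 0:
            break
        new_lo, new_hi = low.mean(), high.mean()
        if new_lo == lo and new_hi == hi:
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return float((lo + hi) / 2.0)
```

Samples whose score exceeds the returned value would then be filtered as likely label noise before training, for example `keep = scores <= two_means_threshold(scores)`.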