Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Efficient Online Stream Clustering Based on Fast Peeling of Boundary Micro-Cluster.

Sun, Jiarui; Du, Mingjing; Sun, Chen; Dong, Yongquan.

IEEE Trans Neural Netw Learn Syst; PP, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38587953

ABSTRACT

A growing number of applications generate streaming data, making data stream mining a popular research topic. Classification-based streaming algorithms require pre-training on labeled data. Manually labeling a large number of samples in the data stream is impractical and cost-prohibitive. Stream clustering algorithms rely on unsupervised learning. They have been widely studied for their ability to effectively analyze high-speed data streams without prior knowledge. Stream clustering plays a key role in data stream mining. Currently, most data stream clustering algorithms adopt the online-offline framework. In the online stage, micro-clusters are maintained, and in the offline stage, they are clustered using an algorithm similar to density-based spatial clustering of applications with noise (DBSCAN). When data streams have clusters with varying densities and ambiguous boundaries, traditional data stream clustering algorithms may be less effective. To overcome the above limitations, this article proposes a fully online stream clustering algorithm called fast boundary peeling stream clustering (FBPStream). First, FBPStream defines a decay-based kernel density estimation (KDE). It can discover clusters with varying densities and identify the evolving trend of streams well. Then, FBPStream implements an efficient boundary micro-cluster peeling technique to identify the potential core micro-clusters. Finally, FBPStream employs a parallel clustering strategy to effectively cluster core and boundary micro-clusters. The proposed algorithm is compared with ten popular algorithms on 15 data streams. Experimental results show that FBPStream is competitive with the other ten popular algorithms.
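As an illustration of the decay-based kernel density idea mentioned in the abstract above, the following is a minimal sketch only. The abstract does not give FBPStream's actual definition, so the exponential fading weight `2 ** (-decay_rate * age)`, the Gaussian kernel, the lack of normalization, and all names (`decayed_kde`, `bandwidth`, `decay_rate`) are assumptions made for this example.

```python
import math


def decayed_kde(query, micro_clusters, now, bandwidth=1.0, decay_rate=0.1):
    """Decay-weighted Gaussian kernel density at `query` (unnormalized).

    micro_clusters: list of (center, last_update_time) pairs, where center
    is a sequence of floats. This layout is hypothetical, not the paper's.
    """
    density = 0.0
    for center, last_update in micro_clusters:
        # Assumed fading rule: the weight halves every 1 / decay_rate time units.
        weight = 2.0 ** (-decay_rate * (now - last_update))
        sq_dist = sum((q - c) ** 2 for q, c in zip(query, center))
        # Gaussian kernel on the distance between query and micro-cluster center.
        density += weight * math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return density
```

With this weighting, a micro-cluster that has not been updated recently contributes less to the density, which is one simple way to let the estimate follow the evolving trend of a stream.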

2.

Attention-Based Sequence-to-Sequence Model for Time Series Imputation.

Li, Yurui; Du, Mingjing; He, Sheng.

Entropy (Basel); 24(12), 2022 Dec 09.

Article in English | MEDLINE | ID: mdl-36554203

ABSTRACT

Time series data are usually characterized by missing values, high dimensionality, and large data volume. To handle high-dimensional time series with missing values, this paper proposes an attention-based sequence-to-sequence model for imputing missing values in time series (ASSM), which combines feature learning and data computation. The model consists of two parts, an encoder and a decoder. The encoder is a bidirectional GRU (BiGRU) recurrent neural network that incorporates a self-attention mechanism to make the model more capable of handling long-range time series; the decoder is a GRU recurrent neural network that incorporates a cross-attention mechanism to associate it with the encoder. The relationship weights between the generated sequences in the decoder and the known sequences in the encoder are calculated so that the model focuses on the sequences with a high degree of correlation. We conduct comparison experiments with four evaluation metrics and six models on four real datasets. The experimental results show that the proposed model outperforms the six comparison missing-value interpolation algorithms.
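The "relationship weights" between a decoder state and the encoder states can be illustrated with a small cross-attention sketch. The abstract does not state the scoring function used by ASSM, so scaled dot-product scoring is an assumption here, and the function names `attention_weights` and `attend` are hypothetical.

```python
import math


def attention_weights(decoder_state, encoder_states):
    """Softmax-normalized attention weights of one decoder state over all encoder states."""
    # Assumed scoring: dot product scaled by sqrt(dimension).
    scale = math.sqrt(len(decoder_state))
    scores = [
        sum(d * e for d, e in zip(decoder_state, state)) / scale
        for state in encoder_states
    ]
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attend(decoder_state, encoder_states):
    """Context vector: attention-weighted sum of the encoder states."""
    weights = attention_weights(decoder_state, encoder_states)
    dim = len(encoder_states[0])
    return [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
```

Encoder states that align more closely with the decoder state receive larger weights, so the context vector is dominated by the most strongly correlated positions.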

3.

Grid-Based Clustering Using Boundary Detection.

Du, Mingjing; Wu, Fuyu.

Entropy (Basel); 24(11), 2022 Nov 04.

Article in English | MEDLINE | ID: mdl-36359696

ABSTRACT

Clustering can be divided into five categories: partitioning, hierarchical, model-based, density-based, and grid-based algorithms. Among them, grid-based clustering is highly efficient in handling spatial data. However, the traditional grid-based clustering algorithms still face many problems: (1) Parameter tuning: density thresholds are difficult to adjust; (2) Data challenge: clusters with overlapping regions and varying densities are not well handled. We propose a new grid-based clustering algorithm named GCBD that can solve the above problems. Firstly, the density estimation of nodes is defined using the standard grid structure. Secondly, GCBD uses an iterative boundary detection strategy to distinguish core nodes from boundary nodes. Finally, two clustering strategies are combined to group core nodes and assign boundary nodes. Experiments on 18 datasets demonstrate that the proposed algorithm outperforms 6 grid-based competitors.
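To make the grid-based steps concrete, here is a rough sketch of per-cell density counting followed by a single pass of a simple boundary test. It is not GCBD itself: the abstract describes node densities and an iterative detection strategy whose details are not given, so the cell-count density, the neighbor rule, and the names `grid_density`, `split_core_boundary`, and `min_density` are all assumptions.

```python
from collections import Counter
from itertools import product


def grid_density(points, cell_size):
    """Count points per grid cell; a cell index is floor(coordinate / cell_size)."""
    return Counter(tuple(int(x // cell_size) for x in p) for p in points)


def split_core_boundary(density, min_density=1):
    """Label each occupied cell as core or boundary in one pass.

    Assumed rule: a cell is a boundary cell if any adjacent cell
    (including diagonals) has fewer than `min_density` points.
    """
    core, boundary = set(), set()
    for cell in density:
        # Enumerate all neighboring offsets except the all-zero offset.
        neighbors = (
            tuple(c + o for c, o in zip(cell, offset))
            for offset in product((-1, 0, 1), repeat=len(cell))
            if any(offset)
        )
        if any(density.get(n, 0) < min_density for n in neighbors):
            boundary.add(cell)
        else:
            core.add(cell)
    return core, boundary
```

An iterative variant could repeat this pass after removing the detected boundary cells, peeling the clusters layer by layer until only dense interiors remain.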

4.

M3W: Multistep Three-Way Clustering.

Du, Mingjing; Zhao, Jingqi; Sun, Jiarui; Dong, Yongquan.

IEEE Trans Neural Netw Learn Syst; PP, 2022 Sep 29.

Article in English | MEDLINE | ID: mdl-36173778

ABSTRACT

Three-way clustering has been an active research topic in the field of cluster analysis in recent years. Some efforts are focused on the technique due to its feasibility and rationality. We observe, however, that the existing three-way clustering algorithms struggle to obtain more information and limit the fault tolerance excessively. Moreover, although the one-step three-way allocation based on a pair of fixed, global thresholds is the most straightforward way to generate the three-way cluster representations, the clusters derived from a pair of global thresholds cannot exactly reveal the inherent clustering structure of the dataset, and the threshold values are often difficult to determine beforehand. Inspired by sequential three-way decisions, we propose an algorithm, called multistep three-way clustering (M3W), to address these issues. Specifically, we first use a progressive erosion strategy to construct a multilevel structure of data, so that lower levels (or external layers) can gather more available information from higher levels (or internal layers). Then, we further propose a multistep three-way allocation strategy, which sufficiently considers the neighborhood information of every eroded instance. We use the allocation strategy in combination with the multilevel structure to ensure that more information is gradually obtained to increase the probability of being assigned correctly, capturing adaptively the inherent clustering structure of the dataset. The proposed algorithm is compared with eight competitors using 18 benchmark datasets. Experimental results show that M3W achieves superior performance, verifying its advantages and effectiveness.
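The one-step allocation with a pair of fixed, global thresholds, which the abstract names as the baseline that M3W improves on, can be sketched as follows. This is only an illustration of that baseline, not of M3W. The threshold convention (core if the membership degree is at least `alpha`, outside if it is at most `beta`, fringe otherwise) follows common three-way decision practice and is an assumption, as are the function name and the default values.

```python
def three_way_allocate(memberships, alpha=0.7, beta=0.3):
    """Split instances into core, fringe, and outside regions of one cluster.

    memberships: dict mapping an instance id to its membership degree in [0, 1].
    alpha and beta are the fixed global thresholds (illustrative defaults).
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    core, fringe, outside = set(), set(), set()
    for item, degree in memberships.items():
        if degree >= alpha:
            core.add(item)      # definitely belongs to the cluster
        elif degree <= beta:
            outside.add(item)   # definitely does not belong
        else:
            fringe.add(item)    # decision deferred: may belong
    return core, fringe, outside
```

Because `alpha` and `beta` are the same for every instance and every cluster, the result depends heavily on values that are hard to choose in advance, which is the limitation the abstract points out.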

5.

Survey on granularity clustering.

Ding, Shifei; Du, Mingjing; Zhu, Hong.

Cogn Neurodyn ; 9(6): 561-72, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26557926

ABSTRACT

With the rapid development of uncertain artificial intelligence and the arrival of the big data era, conventional clustering analysis and granular computing fail to satisfy the requirements of intelligent information processing in this new setting. There is an essential relationship between granular computing and clustering analysis, so some researchers have tried to combine the two. Guided by the idea of granularity, researchers extend the research on clustering analysis and seek the best clustering results with the help of the basic theories and methods of granular computing. The resulting granularity clustering methods have attracted increasing attention. This paper first summarizes the background of granularity clustering and the intrinsic connection between granular computing and clustering analysis, and then reviews the research status and various methods of granularity clustering. Finally, we analyze existing problems and propose directions for further research.
