Results 1 - 20 of 195
1.
Article in English | MEDLINE | ID: mdl-38717888

ABSTRACT

Exploiting consistent structure from multiple graphs is vital for multi-view graph clustering. To achieve this goal, we propose an Efficient Balanced Multi-view Graph Clustering via Good Neighbor Fusion (EBMGC-GNF) model, which comprehensively extracts credible consistent neighbor information from multiple views by designing a Cross-view Good Neighbors Voting module. Moreover, a novel balanced regularization term based on a p-power function is introduced to adjust the balance property of clusters, which helps the model adapt to data with different distributions. To solve the optimization problem of EBMGC-GNF, we transform EBMGC-GNF into an efficient form with a graph coarsening method and optimize it with an accelerated coordinate descent algorithm. Extensive experimental results demonstrate that, in the majority of scenarios, our proposals outperform state-of-the-art methods in terms of both effectiveness and efficiency.
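
The abstract does not give the exact form of the balanced regularization term, so the following is only a toy sketch of why a p-power function of cluster sizes can steer balance. It assumes a penalty equal to the sum of cluster sizes raised to the power p; the function name `p_power_penalty` and the example labelings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def p_power_penalty(labels, p=2.0):
    """Sum of (cluster size) ** p over all clusters.

    Hypothetical illustrative penalty, not the EBMGC-GNF regularizer.
    """
    sizes = Counter(labels).values()
    return sum(size ** p for size in sizes)


# Two labelings of six samples into three clusters.
balanced = [0, 0, 1, 1, 2, 2]   # sizes 2, 2, 2
skewed = [0, 0, 0, 0, 1, 2]     # sizes 4, 1, 1

for p in (2.0, 0.5):
    print(p, p_power_penalty(balanced, p), p_power_penalty(skewed, p))
```

In this toy setting, minimizing the penalty with p > 1 favors equal cluster sizes, while p < 1 favors skewed ones, which is one plausible reading of how a single exponent could adapt to differently distributed data.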

2.
Article in English | MEDLINE | ID: mdl-38717885

ABSTRACT

Feature selection plays an important role in data analysis, yet traditional graph-based methods often produce suboptimal results. These methods typically follow a two-stage process: constructing a graph with data-to-data affinities or a bipartite graph with data-to-anchor affinities and independently selecting features based on their scores. In this article, a large-scale feature selection approach based on structured bipartite graph and row-sparse projection (RS²BLFS) is proposed to overcome this limitation. RS²BLFS integrates the construction of a structured bipartite graph consisting of c connected components into row-sparse projection learning with k nonzero rows. This integration allows for the joint selection of an optimal feature subset in an unsupervised manner. Notably, the c connected components of the structured bipartite graph correspond to c clusters, each with multiple subcluster centers. This feature makes RS²BLFS particularly effective for feature selection and clustering on nonspherical large-scale data. An algorithm with theoretical analysis is developed to solve the optimization problem involved in RS²BLFS. Experimental results on synthetic and real-world datasets confirm its effectiveness in feature selection tasks.

3.
Article in English | MEDLINE | ID: mdl-38700968

ABSTRACT

In existing multiview clustering research, simultaneous and comprehensive learning from multiview graph and feature spaces remains insufficient for achieving a consistent clustering structure. In addition, a postprocessing step is often required. In light of these considerations, a cross-view approximation on Grassmann manifold (CAGM) model is proposed to address inconsistencies within multiview adjacency matrices, feature matrices, and cross-view combinations from the two sources. The model uses a ratio-formed objective function, enabling parameter-free bidirectional fusion. Furthermore, the CAGM model incorporates a paired encoding mechanism to generate low-dimensional and orthogonal cross-view embeddings. Through the approximation of two measurable subspaces on the Grassmann manifold, the indicator matrix is obtained directly. An effective optimization algorithm corresponding to the CAGM model is also derived. Comprehensive experiments on four real-world datasets are conducted to substantiate the effectiveness of our proposed method.

4.
Article in English | MEDLINE | ID: mdl-38691433

ABSTRACT

The training process of a domain generalization (DG) model involves utilizing one or more interrelated source domains to attain optimal performance on an unseen target domain. Existing DG methods often use auxiliary networks or require high computational costs to improve the model's generalization ability by incorporating a diverse set of source domains. In contrast, this work proposes a method called Smooth-Guided Implicit Data Augmentation (SGIDA) that operates in the feature space to capture the diversity of source domains. To amplify the model's generalization capacity, a distance metric learning (DML) loss function is incorporated. Additionally, rather than depending on deep features, the suggested approach employs logits produced from cross entropy (CE) losses with infinite augmentations. A theoretical analysis shows that logits are effective in estimating distances defined on original features, and the proposed approach is thoroughly analyzed to provide a better understanding of why logits are beneficial for DG. Moreover, to increase the diversity of the source domain, a sampling-based method called smooth is introduced to obtain semantic directions from interclass relations. The effectiveness of the proposed approach is demonstrated through extensive experiments on widely used DG, object detection, and remote sensing datasets, where it achieves significant improvements over existing state-of-the-art methods across various backbone networks.

5.
Article in English | MEDLINE | ID: mdl-38517722

ABSTRACT

Recently, more and more real-world datasets have been composed of heterogeneous but related features from diverse views. Multiview clustering is a promising approach to partitioning such data according to heterogeneous information. However, most existing methods suffer from troublesome hyper-parameter tuning and high computational cost. Besides, there is still room for improvement in clustering performance. To this end, a novel multiview framework, called parameter-free multiview k-means clustering with coordinate descent method (PFMVKM), is presented to address the above problems. Specifically, PFMVKM is completely parameter-free and learns the weights via a self-weighted scheme, which avoids the intractable process of hyper-parameter tuning. Moreover, our model is capable of directly calculating the cluster indicator matrix, with no need to learn the cluster centroid matrix and the indicator matrix simultaneously, as previous multiview methods have to do. Furthermore, we propose an efficient optimization algorithm utilizing the idea of coordinate descent, which not only reduces the computational complexity but also improves the clustering performance. Extensive experiments on various types of real datasets illustrate that the proposed method outperforms existing state-of-the-art competitors and conforms well with the actual situation.

6.
Article in English | MEDLINE | ID: mdl-38376965

ABSTRACT

Clustering is a fundamental topic in machine learning, and various methods have been proposed, of which K-Means (KM) and min cut clustering are typical ones. However, they may produce empty or skewed clustering results, which are not as expected. For KM, constrained clustering methods have been fully studied, while for min cut clustering they still need to be developed. In this paper, we propose a parameter-insensitive min cut clustering with flexible size constraints. Specifically, we add lower limits on the number of samples for each cluster, which avoids the trivial solution in min cut clustering. To the best of our knowledge, this is the first attempt to directly incorporate size constraints into min cut. However, the resulting problem is NP-hard and difficult to solve. Upper limits are therefore also added, but the problem remains difficult to solve. Therefore, an additional variable that is equivalent to the label matrix is introduced, and the augmented Lagrangian multiplier (ALM) method is used to decouple the constraints. In the experiments, we find that our algorithm is less sensitive to the lower bound and is practical in image segmentation. A large number of experiments demonstrate the effectiveness of our proposed algorithm.
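
The trivial solution mentioned in this abstract can be seen on a tiny graph. The sketch below only evaluates candidate labelings; it does not implement the paper's ALM-based solver. The helper names `cut_value` and `satisfies_size_bounds`, the weight matrix, and the bounds are all hypothetical choices for illustration.

```python
def cut_value(weights, labels):
    """Total weight of edges whose endpoints fall in different clusters."""
    n = len(labels)
    return sum(
        weights[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] != labels[j]
    )


def satisfies_size_bounds(labels, n_clusters, lower, upper):
    """Check that every cluster holds between `lower` and `upper` samples."""
    counts = [0] * n_clusters
    for label in labels:
        counts[label] += 1
    return all(lower <= count <= upper for count in counts)


# Symmetric weights: two tightly linked pairs joined by one weak edge.
W = [
    [0, 5, 0, 0],
    [5, 0, 1, 0],
    [0, 1, 0, 5],
    [0, 0, 5, 0],
]

trivial = [0, 0, 0, 0]   # everything in one cluster, the other is empty
sensible = [0, 0, 1, 1]  # cuts only the weak edge

for labels in (trivial, sensible):
    print(labels, cut_value(W, labels), satisfies_size_bounds(labels, 2, 1, 3))
```

The unconstrained minimum of the cut is the trivial labeling with cut value 0; a lower bound of one sample per cluster rules it out and leaves the labeling that cuts only the weak edge.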

7.
Article in English | MEDLINE | ID: mdl-38356212

ABSTRACT

Due to their high computational complexity, graph-based methods have limited applicability in large-scale multiview clustering tasks. To address this issue, many accelerated algorithms, especially anchor graph-based methods and indicator learning-based methods, have been developed and have achieved great success. Nevertheless, owing to the restrictions of the optimization strategy, these accelerated methods still need to approximate the discrete graph-cutting problem as a continuous spectral embedding problem and utilize different discretization strategies to obtain discrete sample categories. To avoid the loss of effectiveness and efficiency caused by the approximation and discretization, we establish a discrete fast multiview anchor graph clustering (FMAGC) model that first constructs an anchor graph for each view and then generates a discrete cluster indicator matrix by solving the discrete multiview graph-cutting problem directly. Since this discrete model is hard to solve with gradient descent-based methods, we propose a fast coordinate descent-based optimization strategy with linear complexity to solve it without approximating it as a continuous one. Extensive experiments on widely used normal and large-scale multiview datasets show that FMAGC can improve clustering effectiveness and efficiency compared to other state-of-the-art baselines.

8.
Article in English | MEDLINE | ID: mdl-38381645

ABSTRACT

Linear discriminant analysis (LDA) is a classic tool for supervised dimensionality reduction. Because the projected samples can be classified effectively, LDA has been successfully applied in many applications. Among the variants of LDA, trace ratio LDA (TR-LDA) is a classic form due to its explicit meaning. Unfortunately, when the sample size is much smaller than the data dimension, the algorithm for solving TR-LDA does not converge. This so-called small sample size (SSS) problem severely limits the application of TR-LDA. To solve this problem, we propose a revised formulation of TR-LDA, which can be applied to datasets with different sizes in a unified form. Then, we present an optimization algorithm to solve the proposed method, explain why it can avoid the SSS problem, and analyze the convergence and computational complexity of the optimization algorithm. Next, based on the introduced theorems, we quantitatively elaborate on when the SSS problem will occur in TR-LDA. Finally, the experimental results on real-world datasets demonstrate the effectiveness of the proposed method.

9.
Article in English | MEDLINE | ID: mdl-38215321

ABSTRACT

The goal of balanced clustering is to partition data into distinct groups of equal size. Previous studies have attempted to address this problem by designing balanced regularizers or utilizing conventional clustering methods. However, these methods often rely solely on classic methods, which limits their performance and confines them primarily to low-dimensional data. Although neural networks exhibit effective performance on high-dimensional datasets, they struggle to effectively leverage prior knowledge for clustering with a balanced tendency. To overcome the above limitations, we propose deep semisupervised balanced clustering, which simultaneously learns clustering and generates balance-favorable representations. Our model is based on the autoencoder paradigm incorporating a semisupervised module. Specifically, we introduce a balance-oriented clustering loss and incorporate pairwise constraints into the penalty term as a pluggable module using the Lagrangian multiplier method. Theoretically, we ensure that the proposed model maintains a balanced orientation and provides a comprehensive optimization process. Empirically, we conducted extensive experiments on four datasets to demonstrate significant improvements in clustering performance and balanced measurements. Our code is available at https://github.com/DuannYu/BalancedSemi-TNNLS.

10.
IEEE Trans Cybern ; 54(4): 2420-2433, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37126629

ABSTRACT

Classification is a fundamental task in the field of data mining. Unfortunately, high-dimensional data often degrade the performance of classification. To solve this problem, dimensionality reduction is usually adopted as an essential preprocessing technique, which can be divided into feature extraction and feature selection. Due to its ability to obtain category discrimination, linear discriminant analysis (LDA) is recognized as a classic feature extraction method for classification. Compared with feature extraction, feature selection has plenty of advantages in many applications. If we can integrate the discrimination of LDA and the advantages of feature selection, the result is bound to play an important role in the classification of high-dimensional data. Motivated by this idea, we propose a supervised feature selection method for classification. It combines trace ratio LDA with l2,p-norm regularization and imposes the orthogonal constraint on the projection matrix. The learned row-sparse projection matrix can be used to select discriminative features. Then, we present an optimization algorithm to solve the proposed method. Finally, extensive experiments on both synthetic and real-world datasets indicate the effectiveness of the proposed method.
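
How a row-sparse projection matrix translates into selected features can be sketched as follows. This is not the paper's optimization algorithm; it only assumes the standard definition of the l2,p-norm (the p-th root of the sum of row-wise l2 norms raised to the power p) and the common practice of ranking features by row norm. The function names and the example matrix are hypothetical.

```python
import math


def row_norms(W):
    """Euclidean norm of each row of W (one row per original feature)."""
    return [math.sqrt(sum(value * value for value in row)) for row in W]


def l2p_norm_to_the_p(W, p):
    """Sum over rows of ||row||_2 ** p, i.e. the l2,p-norm raised to the power p."""
    return sum(norm ** p for norm in row_norms(W))


def select_features(W, k):
    """Indices of the k rows with the largest l2 norm, in ascending order."""
    norms = row_norms(W)
    ranked = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical projection matrix: 4 features projected to 2 dimensions.
W = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

print(row_norms(W))
print(l2p_norm_to_the_p(W, 1.0))
print(select_features(W, 2))
```

Rows that are entirely zero contribute nothing to the projection, so the corresponding features can be dropped; the regularizer encourages many rows to shrink toward zero.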

11.
IEEE Trans Pattern Anal Mach Intell ; 46(4): 1898-1912, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37747866

ABSTRACT

Clustering aims to partition a set of objects into different groups through the internal nature of these objects. Most existing methods face intractable hyper-parameter problems triggered by various regularization terms, which degrades the applicability of the models. Moreover, traditional graph clustering methods always incur expensive time overhead. To this end, we propose a Fast Clustering model with Anchor Guidance (FCAG). The proposed model not only avoids trivial solutions without extra regularization terms, but is also suitable for large-scale problems by utilizing the prior knowledge of the bipartite graph. Moreover, the proposed FCAG can cope with out-of-sample extension problems. Three optimization methods, the Projected Gradient Descent (PGD) method, the Iteratively Re-Weighted (IRW) algorithm, and the Coordinate Descent (CD) algorithm, are proposed to solve FCAG. Extensive experiments verify the superiority of the CD optimization method. Besides, compared with other bipartite graph models, FCAG achieves better performance at a lower time cost. In addition, we prove through theory and experiment that when the learning rate of PGD tends to infinity, PGD is equivalent to IRW.

12.
Article in English | MEDLINE | ID: mdl-38090873

ABSTRACT

Many recent research works on unsupervised feature selection (UFS) have focused on how to exploit autoencoders (AEs) to seek informative features. However, existing methods typically employ the squared error to estimate the data reconstruction, which amplifies the negative effect of outliers and can lead to performance degradation. Moreover, traditional AEs aim to extract latent features that capture intrinsic information of the data for accurate data recovery. Without incorporating explicit cluster structure-detecting objectives into the training criterion, AEs fail to capture the latent cluster structure of the data, which is essential for identifying discriminative features. Thus, the selected features lack strong discriminative power. To address these issues, we propose to jointly perform robust feature selection and k-means clustering in a unified framework. Concretely, we exploit an AE with an l2,1-norm as a basic model to seek informative features. To improve robustness against outliers, we introduce an adaptive weight vector for the data reconstruction terms of the AE, which assigns smaller weights to the data with larger errors to automatically reduce the influence of the outliers, and larger weights to the data with smaller errors to strengthen the influence of clean data. To enhance the discriminative power of the selected features, we incorporate k-means clustering into the representation learning of the AE. This allows the AE to continually explore cluster structure information, which can be used to discover more discriminative features. We also present an efficient approach to solve the objective of the corresponding problem. Extensive experiments on various benchmark datasets clearly demonstrate that the proposed method outperforms state-of-the-art methods.
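
The adaptive weighting idea (smaller weights for samples with larger reconstruction errors) can be illustrated with a simple inverse-error rule. The abstract does not state the paper's actual update formula, so the rule below, the function names, and the `eps` smoothing constant are assumptions made purely for illustration.

```python
def adaptive_weights(errors, eps=1e-8):
    """Weights proportional to 1 / (error + eps), normalized to sum to 1.

    Assumed inverse-error rule, not the update derived in the paper.
    """
    inverse = [1.0 / (error + eps) for error in errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def weighted_reconstruction_loss(errors, weights):
    """Weighted sum of per-sample reconstruction errors."""
    return sum(w * e for w, e in zip(weights, errors))


# Per-sample reconstruction errors; the last sample behaves like an outlier.
errors = [0.1, 0.1, 10.0]
weights = adaptive_weights(errors)

print(weights)
print(weighted_reconstruction_loss(errors, weights))
print(sum(errors) / len(errors))  # unweighted mean, dominated by the outlier
```

Under this rule the outlier receives a weight close to zero, so the weighted loss stays near the error level of the clean samples instead of being dominated by the outlier.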

13.
ACS Omega ; 8(44): 41943-41952, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37970020

ABSTRACT

Since the reagent dosage is manually adjusted according to work conditions, an event-triggered constrained model predictive control is proposed for rare earth extraction. First, the linear predictive system, based on a state space model, is established. Subsequently, the feedback correction link is fine-tuned to reduce the prediction error. Following this, an objective optimization function, incorporating input and output constraints, is introduced to calculate the appropriate reagent dosage. Finally, an event-triggering mechanism, underpinned by a designated threshold, is designed to update the controller. Simulation outcomes substantiate the efficacy of the proposed approach.
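
The event-triggering step can be sketched as a loop that recomputes the control input only when the measured output has drifted past a threshold since the last update. The trigger condition used here (absolute deviation from the output at the last trigger) is an assumption, and `compute_control` is a hypothetical stand-in for the constrained model predictive control solve, replaced below by a simple proportional rule.

```python
def run_event_triggered(measurements, setpoint, threshold, compute_control):
    """Apply a control law that is re-evaluated only when an event fires.

    An event fires when the output deviates from the value seen at the last
    update by more than `threshold` (assumed trigger rule).
    """
    control = compute_control(measurements[0], setpoint)
    last_triggered_output = measurements[0]
    applied = [control]
    updates = 1
    for y in measurements[1:]:
        if abs(y - last_triggered_output) > threshold:
            control = compute_control(y, setpoint)
            last_triggered_output = y
            updates += 1
        applied.append(control)
    return applied, updates


def proportional_rule(y, setpoint):
    """Hypothetical placeholder for the MPC optimization step."""
    return 0.5 * (setpoint - y)


outputs = [1.0, 1.02, 1.05, 1.5, 1.52]
applied, updates = run_event_triggered(outputs, 2.0, 0.1, proportional_rule)
print(applied, updates)
```

Small fluctuations leave the previous control input in place, so the controller is recomputed only twice over the five samples in this example.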

14.
Article in English | MEDLINE | ID: mdl-37847628

ABSTRACT

As one of the most popular supervised dimensionality reduction methods, linear discriminant analysis (LDA) has been widely studied in the machine learning community and applied to many scientific applications. Traditional LDA minimizes the ratio of squared l2 norms, which is vulnerable to adversarial examples. In recent studies, many l1-norm-based robust dimensionality reduction methods have been proposed to improve the robustness of the model. However, due to the difficulty of l1-norm ratio optimization and the weakness in defending against a large number of adversarial examples, few works have so far been proposed to utilize sparsity-inducing norms for the LDA objective. In this article, we propose a novel robust discriminative projections learning (rDPL) method based on the l1,2-norm trace-ratio minimization optimization algorithm. Minimizing the l1,2-norm ratio problem directly is much more challenging than the traditional methods, and there is no existing optimization algorithm to solve such a ratio problem with nonsmooth terms. We derive a new efficient algorithm to solve this challenging problem and provide a theoretical analysis of the convergence of our algorithm. The proposed algorithm is easy to implement and converges fast in practice. Extensive experiments on both synthetic data and several real benchmark datasets show the effectiveness of the proposed method in defending against the adversarial patch attack in comparison with many state-of-the-art robust dimensionality reduction methods.

15.
Neural Netw ; 168: 431-449, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37804746

ABSTRACT

A self-training algorithm is a classical semi-supervised learning algorithm that uses a small number of labeled samples and a large number of unlabeled samples to train a classifier. However, the existing self-training algorithms consider only the geometric distance between data while ignoring the data distribution when calculating the similarity between samples. In addition, misclassified samples can severely affect the performance of a self-training algorithm. To address the above two problems, this paper proposes a self-training algorithm based on data editing with mass-based dissimilarity (STDEMB). First, the mass matrix with the mass-based dissimilarity is obtained, and then the mass-based local density of each sample is determined based on its k nearest neighbors. Inspired by density peak clustering (DPC), this study designs a prototype tree based on the prototype concept. In addition, an efficient two-stage data editing algorithm is developed to edit misclassified samples and efficiently select high-confidence samples during the self-training process. The proposed STDEMB algorithm is verified by experiments using accuracy and F-score as evaluation metrics. The experimental results on 18 benchmark datasets demonstrate the effectiveness of the proposed STDEMB algorithm.


Subjects
Algorithms, Supervised Machine Learning, Cluster Analysis, Benchmarking
16.
Neural Netw ; 168: 560-568, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37837745

ABSTRACT

Graph-based multi-view clustering methods have achieved impressive success by exploring a complementary or independent low-dimensional graph embedding among multiple views. The majority of them, however, are shallow models with limited ability to learn the nonlinear information in multi-view data. To this end, we propose a novel deep graph reconstruction (DGR) framework for multi-view clustering, which contains three modules. Specifically, a Multi-graph Fusion Module (MFM) is employed to obtain the consensus graph. Then node representation is learned by the Graph Embedding Network (GEN). To assign clusters directly, the Clustering Assignment Module (CAM) is devised to obtain the final low-dimensional graph embedding, which can serve as the indicator matrix. In addition, a simple and powerful loss function is designed in the proposed DGR. Extensive experiments on seven real-world datasets have been conducted to verify the superior clustering performance and efficiency of DGR compared with the state-of-the-art methods.


Subjects
Learning, Cluster Analysis, Consensus
17.
IEEE Trans Cybern ; PP2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37883282

ABSTRACT

Dimensionality reduction (DR) aims to learn low-dimensional representations that improve the discriminability of data, which is essential for many downstream machine learning tasks, such as image classification and information clustering. The non-Gaussian issue, a long-standing challenge, poses many obstacles to the application of DR methods that are established on the Gaussian assumption. The mainstream way to address this issue is to explore the local structure of data via graph learning techniques. Methods based on this idea, however, suffer from a common weakness: exploring locality through pairwise points makes the optimal graph and subspace difficult to find, degrades the performance of downstream tasks, and increases the computational complexity. In this article, we first propose a novel self-evolution bipartite graph (SEBG) that uses anchor points as the landmarks of subclasses and learns anchor-based rather than pairwise relationships to improve the efficiency of locality exploration. In addition, we develop an efficient local coherent structure learning (ELCS) algorithm based on SEBG, which can automatically update the edges of the graph in the learned subspace. Finally, we also provide a multivariable iterative optimization algorithm to solve the proposed problem with strict theoretical proofs. Extensive experiments have verified the superiority of the proposed method over related SOTA methods in terms of performance and efficiency on several real-world benchmarks and large-scale image datasets with deep features.

18.
Article in English | MEDLINE | ID: mdl-37796670

ABSTRACT

In this article, we propose a new unsupervised feature selection method named pseudo-label guided structural discriminative subspace learning (PSDSL). Unlike previous methods that perform the two stages independently, it introduces the construction of the probability graph into the feature selection learning process as a unified general framework, so that the probability graph can be learned adaptively. Moreover, we design a pseudo-label guided learning mechanism and combine the graph-based method and the idea of maximizing the between-class scatter matrix with the trace ratio to construct an objective function that can improve the discrimination of the selected features. Besides, the main existing strategy for selecting features is to employ the l2,1-norm, but this faces the challenges of sparsity limitations and parameter tuning. To address this issue, we employ the l2,0-norm constraint on the learned subspace to ensure the row sparsity of the model and make the selected features more stable. An effective optimization strategy is given to solve this NP-hard problem, together with the determination of parameters and a theoretical complexity analysis. Finally, extensive experiments conducted on nine real-world datasets and three biological scRNA-seq gene datasets verify the effectiveness of the proposed method on the downstream data clustering task.

19.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 15154-15170, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37756170

ABSTRACT

In most existing graph-based multi-view clustering methods, the eigen-decomposition of the graph Laplacian matrix followed by a post-processing step is a standard configuration to obtain the target discrete cluster indicator matrix. However, the results obtained by this two-stage process naturally deviate from those obtained by directly solving the primal clustering problem. In addition, it is essential to properly integrate the information from different views to enhance the performance of multi-view clustering. To this end, we propose a concise model referred to as Multi-view Discrete Clustering (MDC), which aims at directly solving the primal problem of multi-view graph clustering. We automatically weigh the view-specific similarity matrices, and the discrete indicator matrix is directly obtained by performing clustering on the aggregated similarity matrix without any post-processing, to best serve graph clustering. More importantly, our model does not introduce any additive term, nor does it have any hyper-parameters to be tuned. An efficient optimization algorithm is designed to solve the resultant objective problem. Extensive experimental results on both synthetic and real benchmark datasets verify the superiority of the proposed model.

20.
IEEE Trans Pattern Anal Mach Intell ; 45(12): 14321-14336, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37669200

ABSTRACT

Spectral Clustering (SC) has been the subject of intensive research due to its remarkable clustering performance. Despite its successes, most existing SC methods suffer from several critical issues. First, they typically involve two independent stages, i.e., learning the continuous relaxation matrix followed by the discretization of the cluster indicator matrix. This two-stage approach can result in suboptimal solutions that negatively impact the clustering performance. Second, these methods struggle to maintain the balance property of clusters inherent in many real-world data, which restricts their practical applicability. Finally, these methods are computationally expensive and hence unable to handle large-scale datasets. In light of these limitations, we present a novel Discrete and Balanced Spectral Clustering with Scalability (DBSC) model that integrates the learning of the continuous relaxation matrix and the discrete cluster indicator matrix into a single step. Moreover, the proposed model keeps the size of each cluster approximately equal, thereby achieving soft-balanced clustering. Furthermore, the DBSC model incorporates an anchor-based strategy to improve its scalability to large-scale datasets. The experimental results demonstrate that our proposed model outperforms existing methods in terms of both clustering performance and balance performance. Specifically, the clustering accuracy of DBSC on CMUPIE data achieved a 17.93% improvement compared with that of the SOTA methods (LABIN, EBSC, etc.).
