ABSTRACT
Recently, graph-based unsupervised feature selection algorithms (GUFS) have been shown to handle prevalent high-dimensional unlabeled data efficiently. One common drawback of existing graph-based approaches is that they tend to be time-consuming and to require large storage, especially as data sets grow. Research has started using anchors to accelerate graph-based learning models for feature selection, but the hard linear constraint between the data matrix and the lower-dimensional representation is often overstrict in many applications. In this letter, we propose a flexible linearization model with an anchor graph and l2,1-norm regularization, which can deal with large-scale data sets and improve the performance of existing anchor-based methods. In addition, the anchor-based graph Laplacian is constructed to characterize the manifold embedding structure by means of a parameter-free adaptive neighbor assignment strategy. An efficient iterative algorithm is developed to solve the optimization problem, and we also prove the convergence of the algorithm. Experiments on several public data sets demonstrate the effectiveness and efficiency of the proposed method.
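To make the anchor strategy concrete, below is a minimal sketch of one common anchor-graph construction with parameter-free adaptive neighbor assignment, in the style of the closed-form neighbor weights used in the anchor-graph literature; the letter's exact construction and its flexible linearization term are not reproduced, and all function names are illustrative.

```python
import numpy as np

def anchor_graph(X, anchors, k=5):
    """Parameter-free adaptive assignment of each sample to its k nearest
    anchors (a common anchor-graph construction; assumed, not necessarily
    the letter's exact formula). X: (n, d) data; anchors: (m, d), m >= k + 1.
    Returns Z: (n, m) row-stochastic data-to-anchor affinities."""
    n, m = X.shape[0], anchors.shape[0]
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    Z = np.zeros((n, m))
    for i in range(n):
        idx = np.argsort(d2[i])[:k + 1]        # k nearest anchors plus one extra
        dk1 = d2[i, idx[k]]                    # (k+1)-th smallest squared distance
        Z[i, idx[:k]] = (dk1 - d2[i, idx[:k]]) / (k * dk1 - d2[i, idx[:k]].sum() + 1e-12)
    return Z

def anchor_laplacian(Z):
    """Anchor-based graph Laplacian L = I - Z diag(Z^T 1)^{-1} Z^T.
    Since rows of Z sum to one, S = Z diag(Z^T 1)^{-1} Z^T is doubly
    stochastic, so L = I - S is a normalized graph Laplacian."""
    delta = Z.sum(axis=0)                      # anchor degrees
    S = Z @ np.diag(1.0 / (delta + 1e-12)) @ Z.T
    return np.eye(Z.shape[0]) - S
```

Anchors are typically obtained as k-means centers or by random sampling, so building this Laplacian costs roughly O(nmk) instead of the O(n^2) required by a full data-to-data graph.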
ABSTRACT
Graph-based clustering methods perform clustering on a fixed input data graph, so the clustering results are sensitive to the particular graph construction. If this initial construction is of low quality, the resulting clustering may also be of low quality. We address this drawback by allowing the data graph itself to be adaptively adjusted during the clustering procedure. In particular, our proposed weight adaptive Laplacian (WAL) method learns a new data similarity matrix that can adaptively adjust the initial graph according to the similarity weights in the input data graph. We develop three versions of this method, based on the L2-norm, a fuzzy entropy regularizer, and an exponential-based weight strategy, which yield three new graph-based clustering objectives, and we derive optimization algorithms to solve them. Experimental results on synthetic data sets and real-world benchmark data sets exhibit the effectiveness of these new graph-based clustering methods.
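As an illustration of adjusting the graph inside the clustering loop, the sketch below implements one widely used adaptive-neighbor update, a per-row projection onto the probability simplex; WAL's L2, fuzzy-entropy, and exponential weight strategies are variations on this theme, and the regularization parameter gamma here is an assumption.

```python
import numpy as np

def simplex_proj(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1.0), 0.0)

def update_similarity(F, gamma=1.0):
    """One adaptive-graph step: each row of S solves
    min_{s_i in simplex} sum_j ||f_i - f_j||^2 s_ij + gamma ||s_i||^2,
    given the current embedding F of shape (n, k)."""
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, d2.max() + 1.0)       # discourage self-loops
    return np.vstack([simplex_proj(-d2[i] / (2.0 * gamma)) for i in range(F.shape[0])])
```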
ABSTRACT
Since combining features from heterogeneous data sources can significantly boost classification performance in many applications, it has attracted much research attention over the past few years. Most existing multiview feature analysis approaches learn features in each view separately, ignoring knowledge shared by multiple views. Different views of features may have intrinsic correlations that can benefit feature learning, so it is assumed that the views share subspaces from which common knowledge can be discovered. In this letter, we propose a new multiview feature learning algorithm that aims to exploit common features shared by different views. To achieve this goal, we learn features in a batch mode so that the correlations among different views are taken into account, with multiple transformation matrices for the different views learned simultaneously in a joint framework. In this way, our algorithm can exploit potential correlations among views as supplementary information that further improves performance. Since the proposed objective function is nonsmooth and difficult to solve directly, we propose an iterative algorithm for effective optimization. Extensive experiments on a number of real-world data sets demonstrate superior classification performance against all compared approaches, and the convergence guarantee is also validated experimentally.
ABSTRACT
Spectral clustering is a key research topic in machine learning and data mining. Most existing spectral clustering algorithms are built on Gaussian Laplacian matrices, which are sensitive to parameters. We propose a novel parameter-free, distance-consistent locally linear embedding (LLE); the proposed distance-consistent LLE guarantees that edges between closer data points carry larger weights. We also propose a novel improved spectral clustering via embedded label propagation, built on two advancements of the state of the art. The first is label propagation, which propagates a node's labels to neighboring nodes according to their proximity: we perform standard spectral clustering on the original data, assign to each cluster its k nearest data points, and then propagate labels through dense unlabeled data regions. The second is manifold learning, which has been widely used for its capacity to leverage the manifold structure of data points. Extensive experiments on various data sets validate the superiority of the proposed algorithm over state-of-the-art spectral algorithms.
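The label-propagation step can be summarized by the classic diffusion iteration of Zhou et al. ("Learning with Local and Global Consistency"), sketched below; the letter's embedded variant differs in that the seed labels come from an initial spectral clustering rather than from supervision.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.99, n_iter=100):
    """Zhou et al.-style propagation F <- alpha * S F + (1 - alpha) * Y,
    where S = D^{-1/2} W D^{-1/2}. W: (n, n) symmetric affinities;
    Y: (n, c) one-hot seed labels with all-zero rows for unlabeled points."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y    # diffuse, then re-inject seeds
    return F.argmax(axis=1)                        # hard cluster assignment
```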
ABSTRACT
Robust principal component analysis (PCA) is one of the most important dimension-reduction techniques for handling high-dimensional data with outliers. However, most existing robust PCA methods presuppose that the mean of the data is zero and incorrectly use the average of the data as the optimal mean of robust PCA. In fact, this assumption holds only for squared l2-norm-based traditional PCA. In this letter, we equivalently reformulate the objective of conventional PCA and learn the optimal projection directions by maximizing the sum of projected differences between each pair of instances based on the l2,1-norm. The proposed method is robust to outliers and invariant to rotation. More importantly, the reformulated objective not only automatically avoids calculating the optimal mean and makes the centered-data assumption unnecessary, but also theoretically connects to the minimization of reconstruction error. To solve the proposed nonsmooth problem, we exploit an efficient optimization algorithm that softens the contributions of outliers by reweighting each data point iteratively, and we theoretically analyze its convergence and computational complexity. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.
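The reweighting idea can be illustrated as follows: each pairwise difference is weighted inversely to its current projected norm, so outlier pairs contribute less at the next iteration. This is a hedged sketch of an iteratively reweighted scheme of the kind described above, not the letter's exact algorithm; it materializes all O(n^2) pairs, so it is meant only for small n.

```python
import numpy as np

def robust_pca_pairwise(X, k, n_iter=20, eps=1e-8):
    """Sketch: max_{W^T W = I} sum_{i<j} ||W^T (x_i - x_j)||_2 by
    iterative reweighting. X: (n, d); returns W: (d, k)."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    W, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random orthonormal start
    rows, cols = np.triu_indices(n, 1)
    diffs = X[rows] - X[cols]                          # all pairwise differences
    for _ in range(n_iter):
        s = 1.0 / (2.0 * np.maximum(np.linalg.norm(diffs @ W, axis=1), eps))
        M = (diffs * s[:, None]).T @ diffs             # reweighted difference scatter
        _, vecs = np.linalg.eigh(M)
        W = vecs[:, -k:]                               # top-k eigenvectors
    return W
```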
ABSTRACT
In recent years, unsupervised two-dimensional (2D) dimensionality reduction methods for unlabeled large-scale data have made progress. However, their performance degrades when the similarity matrix is learned only once, at the beginning of the dimensionality reduction process. A similarity matrix is used to reveal the underlying geometric structure of the data in unsupervised dimensionality reduction methods, and because of noisy data, it is difficult to learn the optimal similarity matrix up front. In this letter, we propose a new dimensionality reduction model for 2D image matrices: unsupervised 2D dimensionality reduction with adaptive structure learning (DRASL). Instead of using a predetermined similarity matrix to characterize the underlying geometric structure of the original 2D image space, our approach learns the similarity matrix within the dimensionality reduction procedure. To realize a desirable neighbor assignment after dimensionality reduction, we add a constraint to our model such that there are exactly c connected components in the final subspace, where c is the number of clusters. To accomplish these goals, we propose a unified objective function that integrates dimensionality reduction, the learning of the similarity matrix, and the adaptive learning of the neighbor assignment, along with an iterative optimization algorithm to solve it. We compare the proposed method with several 2D unsupervised dimensionality reduction methods, using k-means to evaluate clustering performance. Extensive experiments on the Coil20, AT&T, FERET, USPS, and Yale data sets verify the effectiveness of our proposed method.
ABSTRACT
Differential evolution (DE) is a powerful global optimization algorithm that has been applied to a wide range of practical engineering problems. The efficacy of DE is strongly influenced by its control parameters and mutation strategies. We therefore introduce a refined DE algorithm with adaptive parameters and dual mutation strategies (APDSDE). APDSDE uses an adaptive switching mechanism that alternates between two novel mutation strategies, DE/current-to-pBest-w/1 and DE/current-to-Amean-w/1. Furthermore, a parameter adaptation technique based on cosine similarity is established, with explicit calculation formulas derived for both the scaling-factor weight and the crossover-rate weight. To improve convergence speed while preserving population diversity, a nonlinear population size reduction method is proposed. The algorithms are evaluated on the CEC2017 benchmark functions, and the empirical results demonstrate the superior performance of APDSDE compared with a number of advanced DE variants.
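For reference, the sketch below shows the standard DE/current-to-pBest/1 mutation (introduced in JADE) on which DE/current-to-pBest-w/1 builds; APDSDE's per-term weights, the Amean variant, and the cosine-similarity parameter adaptation are not reproduced, and the parameter defaults are assumptions.

```python
import numpy as np

def current_to_pbest_1(pop, fitness, i, F=0.5, p=0.1, rng=None):
    """JADE-style mutation v_i = x_i + F (x_pbest - x_i) + F (x_r1 - x_r2).
    pop: (NP, D) population; fitness: (NP,), lower is better."""
    rng = rng or np.random.default_rng()
    NP = pop.shape[0]
    top = np.argsort(fitness)[:max(1, int(round(p * NP)))]
    pbest = pop[rng.choice(top)]                       # random member of the best p*NP
    r1, r2 = rng.choice([j for j in range(NP) if j != i], size=2, replace=False)
    return pop[i] + F * (pbest - pop[i]) + F * (pop[r1] - pop[r2])
```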
ABSTRACT
Clustering aims to partition a set of objects into different groups according to the internal nature of these objects. Most existing methods face intractable hyper-parameter problems triggered by various regularization terms, which degrades the applicability of the models. Moreover, traditional graph clustering methods often incur expensive time overhead. To this end, we propose a Fast Clustering model with Anchor Guidance (FCAG). The proposed model not only avoids trivial solutions without extra regularization terms but is also well suited to large-scale problems, as it exploits the prior knowledge of the bipartite graph. Moreover, FCAG can cope with out-of-sample extension problems. Three optimization methods, the Projected Gradient Descent (PGD) method, the Iteratively Re-Weighted (IRW) algorithm, and the Coordinate Descent (CD) algorithm, are proposed to solve FCAG. Extensive experiments verify the superiority of the CD optimization method. Besides, compared with other bipartite graph models, FCAG achieves better performance at lower time cost. In addition, we prove, both theoretically and experimentally, that as the learning rate of PGD tends to infinity, PGD becomes equivalent to IRW.
ABSTRACT
Linear discriminant analysis (LDA) is a classic tool for supervised dimensionality reduction. Because the projected samples can be classified effectively, LDA has been successfully applied in many applications. Among the variants of LDA, trace ratio LDA (TR-LDA) is a classic form due to its explicit meaning. Unfortunately, when the sample size is much smaller than the data dimension, the algorithm for solving TR-LDA does not converge. This so-called small sample size (SSS) problem severely limits the application of TR-LDA. To solve this problem, we propose a revised formulation of TR-LDA that can be applied to datasets of different sizes in a unified form. We then present an optimization algorithm to solve the proposed method, explain why it avoids the SSS problem, and analyze its convergence and computational complexity. Next, based on the introduced theorems, we quantitatively elaborate on when the SSS problem occurs in TR-LDA. Finally, experimental results on real-world datasets demonstrate the effectiveness of the proposed method.
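For context, the classic iterative trace-ratio solver is sketched below; when tr(W^T Sw W) can reach zero (the SSS regime), the ratio blows up, which is the failure mode the revised formulation is designed to remove. This is the standard algorithm, not the paper's revision, and the guard constant is an assumption.

```python
import numpy as np

def trace_ratio_lda(Sb, Sw, k, n_iter=50, tol=1e-8):
    """Classic solver for max_{W^T W = I} tr(W^T Sb W) / tr(W^T Sw W):
    alternate lambda = ratio(W) and W = top-k eigenvectors of Sb - lambda Sw."""
    d = Sb.shape[0]
    W, lam = np.eye(d)[:, :k], 0.0                 # orthonormal start
    for _ in range(n_iter):
        _, vecs = np.linalg.eigh(Sb - lam * Sw)
        W = vecs[:, -k:]
        denom = np.trace(W.T @ Sw @ W)             # can approach 0 in the SSS regime
        lam_new = np.trace(W.T @ Sb @ W) / max(denom, 1e-12)
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return W, lam
```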
ABSTRACT
Recently, more and more real-world datasets have been composed of heterogeneous but related features from diverse views. Multiview clustering is a promising approach to partitioning such data according to this heterogeneous information. However, most existing methods suffer from burdensome hyper-parameter tuning and high computational cost, and there is still room for improvement in clustering performance. To this end, a novel multiview framework, called parameter-free multiview k-means clustering with coordinate descent method (PFMVKM), is presented to address these problems. Specifically, PFMVKM is completely parameter-free and learns the view weights via a self-weighted scheme, which avoids the intractable process of hyper-parameter tuning. Moreover, our model can directly calculate the cluster indicator matrix, with no need to learn the cluster centroid matrix and the indicator matrix simultaneously, as previous multiview methods must. Furthermore, we propose an efficient optimization algorithm based on the idea of coordinate descent, which not only reduces the computational complexity but also improves the clustering performance. Extensive experiments on various types of real datasets illustrate that the proposed method outperforms existing state-of-the-art competitors and conforms well with the actual situation.
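To illustrate how a coordinate-descent update can operate directly on the cluster indicator, here is a single-view, MacQueen-style sketch that reassigns one sample at a time and updates the affected centroid sums immediately; PFMVKM's multiview self-weighting and its exact update rule are not reproduced.

```python
import numpy as np

def kmeans_cd(X, c, n_iter=30, rng=None):
    """Coordinate descent over the indicator: move each sample to the nearest
    incrementally maintained centroid, never emptying a cluster."""
    rng = rng or np.random.default_rng(0)
    n = X.shape[0]
    y = rng.integers(0, c, size=n)                     # random initial assignment
    sums = np.array([X[y == j].sum(axis=0) for j in range(c)])
    counts = np.bincount(y, minlength=c).astype(float)
    for _ in range(n_iter):
        changed = 0
        for i in range(n):
            j_old = y[i]
            if counts[j_old] <= 1:                     # keep every cluster nonempty
                continue
            centroids = sums / np.maximum(counts, 1.0)[:, None]
            j_new = int(np.argmin(((X[i] - centroids) ** 2).sum(axis=1)))
            if j_new != j_old:
                sums[j_old] -= X[i]; counts[j_old] -= 1
                sums[j_new] += X[i]; counts[j_new] += 1
                y[i] = j_new; changed += 1
        if changed == 0:
            break
    return y
```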
ABSTRACT
Clustering is a fundamental topic in machine learning, and various methods have been proposed, among which k-means (KM) and min cut clustering are typical. However, they may produce empty or skewed clusters, which is undesirable. Constrained clustering has been studied thoroughly for KM, but it remains underdeveloped for min cut clustering. In this paper, we propose a parameter-insensitive min cut clustering method with flexible size constraints. Specifically, we add lower limits on the number of samples in each cluster, which avoids the trivial solution of min cut clustering. To the best of our knowledge, this is the first attempt to incorporate size constraints directly into min cut. However, the resulting problem is NP-hard and difficult to solve. Thus, upper limits are also added, but the problem remains difficult. Therefore, an additional variable equivalent to the label matrix is introduced, and the augmented Lagrangian multiplier (ALM) method is used to decouple the constraints. In the experiments, we find that our algorithm is insensitive to the lower bound and is practical for image segmentation. A large number of experiments demonstrate the effectiveness of our proposed algorithm.
ABSTRACT
Spectral clustering has attracted increasing attention due to its well-defined framework and excellent performance. However, most traditional spectral clustering methods consist of two separate steps: 1) solving a relaxed optimization problem to learn continuous clustering labels, and 2) rounding the continuous labels into discrete ones. This relax-and-discretize strategy inevitably causes information loss and unsatisfactory clustering performance. Moreover, the similarity matrix constructed from the original data may not be optimal for clustering, since data usually contain noise and redundancy. To address these problems, we propose a novel and effective algorithm that directly optimizes the original spectral clustering model, called Direct Spectral Clustering (DSC). We theoretically prove that the original spectral clustering model can be solved by simultaneously learning a weighted discrete indicator matrix and a structured similarity matrix whose number of connected components equals the number of clusters. Both can be used to obtain the final clustering results directly, without any post-processing. Further, an effective iterative optimization algorithm is developed to solve the proposed method. Extensive experiments on synthetic and real-world datasets demonstrate the superiority and effectiveness of the proposed method compared to state-of-the-art algorithms.
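For contrast, the two-step relax-and-discretize pipeline that DSC avoids looks roughly like the sketch below: relax to the bottom eigenvectors of the normalized Laplacian, then round with k-means. This is a generic baseline for illustration, not part of DSC.

```python
import numpy as np

def two_step_spectral(W, c, rng=None):
    """Step 1: continuous relaxation via the normalized Laplacian's bottom
    eigenvectors. Step 2: discretize the embedding with plain k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    F = vecs[:, :c]                                    # continuous labels
    F = F / (np.linalg.norm(F, axis=1, keepdims=True) + 1e-12)
    rng = rng or np.random.default_rng(0)
    centers = F[rng.choice(len(F), c, replace=False)]  # k-means rounding
    for _ in range(50):
        y = np.argmin(((F[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([F[y == j].mean(axis=0) if (y == j).any() else centers[j]
                            for j in range(c)])
    return y
```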
ABSTRACT
Classification is a fundamental task in data mining. Unfortunately, high-dimensional data often degrade classification performance. To solve this problem, dimensionality reduction is usually adopted as an essential preprocessing technique; it can be divided into feature extraction and feature selection. Due to its ability to obtain category discrimination, linear discriminant analysis (LDA) is recognized as a classic feature extraction method for classification. Compared with feature extraction, feature selection has many advantages in practical applications. If we can integrate the discrimination of LDA with the advantages of feature selection, it is bound to play an important role in the classification of high-dimensional data. Motivated by this idea, we propose a supervised feature selection method for classification that combines trace ratio LDA with l2,p-norm regularization and imposes an orthogonal constraint on the projection matrix. The learned row-sparse projection matrix can be used to select discriminative features. We then present an optimization algorithm to solve the proposed method. Finally, extensive experiments on both synthetic and real-world datasets indicate the effectiveness of the proposed method.
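Once a row-sparse projection matrix W has been learned, the selection step itself is simple: rank features by the l2 norms of the rows of W, since rows driven toward zero by the l2,p regularizer mark discarded features. A minimal sketch of this standard ranking rule (the function name is illustrative):

```python
import numpy as np

def select_features(W, num_features):
    """Rank original features by the l2 norm of the corresponding row of the
    learned (d, k) projection matrix W and keep the top-scoring ones."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.argsort(row_norms)[::-1][:num_features]  # indices of selected features
```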
ABSTRACT
The affinity graph is regarded as a mathematical representation of the local manifold structure, and the performance of locality-preserving projections (LPPs) and their variants is tied to its quality. However, current approaches have two drawbacks. First, the pre-designed graph is inconsistent with the actual distribution of the data. Second, linear projection can damage the nonlinear manifold structure. In this article, we propose a nonlinear dimensionality reduction model, named deep locality-preserving projections (DLPPs), to solve both problems simultaneously. The model consists of two loss functions, each employing deep autoencoders (AEs) to extract discriminative features. In the first loss function, the affinity relationships among samples in the intermediate layer are determined adaptively according to the distances between samples. Since the features of samples are obtained by nonlinear mapping, the manifold structure can be kept in the low-dimensional space; additionally, the learned affinity graph avoids the influence of noisy and redundant features. In the second loss function, the affinity relationships among samples in the last layer (also called the reconstruction layer) are learned, which enables denoised samples to retain a good manifold structure. By integrating these two functions, our proposed model minimizes the mismatch of the manifold structure between samples in the denoising space and the low-dimensional space, while reducing sensitivity to the initial weights of the graph. Extensive experiments on toy and benchmark datasets verify the effectiveness of our proposed model.
ABSTRACT
Feature selection plays an important role in data analysis, yet traditional graph-based methods often produce suboptimal results. These methods typically follow a two-stage process: constructing a graph with data-to-data affinities or a bipartite graph with data-to-anchor affinities and independently selecting features based on their scores. In this article, a large-scale feature selection approach based on structured bipartite graph and row-sparse projection (RS2BLFS) is proposed to overcome this limitation. RS2BLFS integrates the construction of a structured bipartite graph consisting of c connected components into row-sparse projection learning with k nonzero rows. This integration allows for the joint selection of an optimal feature subset in an unsupervised manner. Notably, the c connected components of the structured bipartite graph correspond to c clusters, each with multiple subcluster centers. This feature makes RS2BLFS particularly effective for feature selection and clustering on nonspherical large-scale data. An algorithm with theoretical analysis is developed to solve the optimization problem involved in RS2BLFS. Experimental results on synthetic and real-world datasets confirm its effectiveness in feature selection tasks.
ABSTRACT
In existing multiview clustering research, comprehensive learning from multiview graph and feature spaces simultaneously remains insufficient for achieving a consistent clustering structure, and a postprocessing step is often required. In light of these considerations, a cross-view approximation on Grassmann manifold (CAGM) model is proposed to address inconsistencies within multiview adjacency matrices, feature matrices, and cross-view combinations from the two sources. The model uses a ratio-formed objective function, enabling parameter-free bidirectional fusion. Furthermore, the CAGM model incorporates a paired encoding mechanism to generate low-dimensional and orthogonal cross-view embeddings; through the approximation of two measurable subspaces on the Grassmann manifold, the indicator matrix is obtained directly. An effective optimization algorithm corresponding to the CAGM model is also derived. Comprehensive experiments on four real-world datasets substantiate the effectiveness of our proposed method.
ABSTRACT
Principal Component Analysis (PCA) aims to acquire a principal component space that contains the essential structure of the data, rather than directly mining and extracting that essential structure. In other words, the principal component space contains not only information related to the essential structure of the data but also some unrelated information. This frequently occurs when the intrinsic dimensionality of the data is unknown or when the data have complex distribution characteristics such as multi-modalities or manifolds. It is therefore unreasonable to identify noise and useful information based solely on reconstruction error, which makes PCA unsuitable as a preprocessing technique for many applications, especially in noisy environments. To solve this problem, this paper proposes robust PCA based on fuzzy local information reservation (FLIPCA). By analyzing the impact of reconstruction error on sample discriminability, FLIPCA provides a theoretical basis for noise identification and processing. This not only greatly improves its robustness but also extends its applicability and effectiveness as a data preprocessing technique. Meanwhile, FLIPCA maintains a mathematical description consistent with traditional PCA while having few adjustable hyperparameters and low algorithmic complexity. Finally, comprehensive experiments on synthetic and real-world datasets substantiate the superiority of our proposed algorithm.
ABSTRACT
Exploiting consistent structure from multiple graphs is vital for multi-view graph clustering. To achieve this goal, we propose an Efficient Balanced Multi-view Graph Clustering via Good Neighbor Fusion (EBMGC-GNF) model, which comprehensively extracts credible consistent neighbor information from multiple views through a Cross-view Good Neighbors Voting module. Moreover, a novel balanced regularization term based on the p-power function is introduced to adjust the balance property of clusters, which helps the model adapt to data with different distributions. To solve the optimization problem of EBMGC-GNF, we transform it into an efficient form via a graph coarsening method and optimize it with an accelerated coordinate descent algorithm. Extensive experimental results demonstrate that, in the majority of scenarios, our proposal outperforms state-of-the-art methods in terms of both effectiveness and efficiency.
ABSTRACT
The goal of balanced clustering is to partition data into distinct groups of equal size. Previous studies have attempted to address this problem by designing balanced regularizers or utilizing conventional clustering methods. However, these approaches rely solely on classic methods, which limits their performance and restricts them primarily to low-dimensional data. Although neural networks perform well on high-dimensional datasets, they struggle to leverage prior knowledge for clustering with a balanced tendency. To overcome these limitations, we propose deep semisupervised balanced clustering, which simultaneously learns clustering and generates balance-favorable representations. Our model is based on the autoencoder paradigm and incorporates a semisupervised module. Specifically, we introduce a balance-oriented clustering loss and incorporate pairwise constraints into the penalty term as a pluggable module using the Lagrangian multiplier method. Theoretically, we ensure that the proposed model maintains a balanced orientation and provide a comprehensive optimization process. Empirically, extensive experiments on four datasets demonstrate significant improvements in clustering performance and balance measures. Our code is available at https://github.com/DuannYu/BalancedSemi-TNNLS.
ABSTRACT
Multi-view learning has attracted increasing attention in recent years. However, traditional approaches focus only on the differences among views while ignoring their consistency, so views affected by abnormal or noisy data may become ineffective during view learning. Besides, current datasets are increasingly high-dimensional and large-scale. Therefore, this paper proposes a novel multi-view compressed subspace learning method via a low-rank tensor constraint, which incorporates the clustering process and multi-view learning into a unified framework. First, for each view, we use a subset of the samples to build a small dictionary, which greatly reduces both redundant information and computational cost. Then, to find the consistency and differences among views, we impose a low-rank tensor constraint on the representations and further design an auto-weighted mechanism to learn the optimal representation. Finally, since the learned representation is non-square, a bipartite graph is introduced, and under a structured constraint, the clustering results can be obtained directly from this graph without any post-processing. Extensive experiments on synthetic and real-world benchmark datasets demonstrate the efficacy and efficiency of our method, especially for views with noise or outliers.