Results 1 - 20 of 27
1.
IEEE Trans Neural Netw Learn Syst; 34(2): 973-986, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34432638

ABSTRACT

Most existing multiview clustering methods operate in the original feature space. However, feature redundancy and noise in the original feature space limit their clustering performance. To address this problem, some multiview clustering methods learn the latent data representation linearly, but their performance may decline when the relation between the latent representation and the original data is nonlinear. Other methods that learn the latent data representation nonlinearly usually conduct the representation learning and clustering separately, so the latent representation may not be well adapted to clustering. Furthermore, none of them model the intercluster relation and intracluster correlation of data points, which limits the quality of the learned latent representation and therefore the clustering performance. To solve these problems, this article proposes a novel multiview clustering method via proximity learning in latent representation space, named multiview latent proximity learning (MLPL). On the one hand, MLPL learns the latent data representation in a nonlinear manner while taking the intercluster relation and intracluster correlation into consideration simultaneously. On the other hand, by conducting latent representation learning and consensus proximity learning simultaneously, MLPL learns a consensus proximity matrix with k connected components that yields the clustering result directly. Extensive experiments on seven real-world datasets demonstrate the effectiveness and superiority of MLPL compared with state-of-the-art multiview clustering methods.
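
The "k connected components" property is what lets MLPL read off the clustering directly from the learned consensus proximity matrix. A minimal sketch of that final step (not the authors' optimization; the function name and threshold are illustrative) treats nonzero proximity entries as graph edges and labels each connected component as one cluster:

```python
# Sketch: read cluster labels from a proximity matrix whose graph has k
# connected components (illustrative; not the MLPL learning procedure).
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def labels_from_proximity(P, threshold=1e-8):
    """Treat entries of the symmetric proximity matrix P above `threshold`
    as edges and return one cluster label per connected component."""
    A = csr_matrix((np.abs(P) > threshold).astype(float))
    n_clusters, labels = connected_components(A, directed=False)
    return n_clusters, labels

# Toy proximity matrix with two disconnected blocks -> two clusters.
P = np.zeros((4, 4))
P[0, 1] = P[1, 0] = 0.9   # cluster {0, 1}
P[2, 3] = P[3, 2] = 0.8   # cluster {2, 3}
print(labels_from_proximity(P))  # 2 clusters, labels [0, 0, 1, 1]
```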

2.
IEEE Trans Neural Netw Learn Syst; 34(12): 9671-9684, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35324448

ABSTRACT

Session-based recommendation makes use of anonymous session data to deliver high-quality recommendations when user profiles and the complete historical behavioral data of a target user are unavailable. Previous works consider each session individually and try to capture user interests within that session. Despite their encouraging results, these models can only perceive intra-session items and cannot draw upon the massive historical relational information. To solve this problem, we propose a novel method named global graph guided session-based recommendation (G3SR). G3SR decomposes the session-based recommendation workflow into two steps. First, a global graph is built upon all session data, from which global item representations are learned in an unsupervised manner. Then, these representations are refined on session graphs with graph networks, and a readout function generates a representation for each session. Extensive experiments on two real-world benchmark datasets show remarkable and consistent improvements of G3SR over state-of-the-art methods, especially for cold items.
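
The readout step at the end of the G3SR pipeline can be illustrated with a small sketch (a hypothetical pooling rule; the actual readout in the paper is learned): pool the refined item embeddings of one session into a single session vector and score catalogue items against it.

```python
# Sketch of a session readout and next-item scoring (assumed interface,
# not the G3SR code): mix the session mean with the last-clicked item.
import numpy as np

def session_readout(item_embeddings, last_weight=0.5):
    """item_embeddings: (n_items_in_session, d) refined item embeddings."""
    mean_repr = item_embeddings.mean(axis=0)
    last_repr = item_embeddings[-1]
    return (1.0 - last_weight) * mean_repr + last_weight * last_repr

rng = np.random.default_rng(0)
session_items = rng.normal(size=(5, 16))      # 5 clicked items, 16-dim embeddings
candidate_items = rng.normal(size=(100, 16))  # catalogue embeddings
s = session_readout(session_items)
scores = candidate_items @ s                  # rank candidates for recommendation
print(scores.argsort()[::-1][:3])             # top-3 recommended item indices
```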

3.
IEEE Trans Pattern Anal Mach Intell; 45(1): 489-507, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35130146

ABSTRACT

Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in many popular applications, including life logging, health monitoring, and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. Relation modeling is important for this task, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons for egocentric action recognition, and they require prior knowledge or auxiliary data to localize the interacting persons. In this work, we model the relations in a weakly supervised manner, i.e., without annotations or prior knowledge about the interacting persons or objects. We form a weakly supervised framework that unifies automatic interactor localization and explicit relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a series of keypoints directly from video data, using only action labels and some constraints on these keypoints to localize the action-relevant regions. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections that captures the complex relations in egocentric videos, such as temporal, interactive, and contextual relations. In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections with a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. We conduct extensive experiments on egocentric video datasets to illustrate the effectiveness of our method.
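
The differentiable search over candidate connections can be sketched as a DARTS-style continuous relaxation (the candidate operations and names below are illustrative, not the paper's implementation): every candidate connection contributes to the output with a softmax weight over learnable architecture parameters, so the choice of connection becomes differentiable.

```python
# Sketch of a differentiable mixture over candidate connections
# (DARTS-style relaxation; toy operations, not the ego-relational LSTM).
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def mixed_connection(x, candidate_ops, alpha):
    """Continuous relaxation: softmax-weighted sum of candidate outputs."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, candidate_ops))

# Candidate "connections" between two feature streams (toy examples).
candidates = [
    lambda x: x,                 # identity connection
    lambda x: np.zeros_like(x),  # no connection
    lambda x: np.tanh(x),        # nonlinear connection
]
alpha = np.array([0.2, -1.0, 0.8])  # learnable architecture parameters
x = np.random.default_rng(1).normal(size=8)
y = mixed_connection(x, candidates, alpha)
# After training alpha, the connection with the largest weight is kept.
print(softmax(alpha).round(2))
```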


Subjects
Algorithms, Virtual Reality, Humans, Learning
4.
IEEE Trans Cybern; 53(10): 6636-6648, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37021985

ABSTRACT

Multiparty learning is an indispensable technique for improving learning performance by integrating data from multiple parties. Unfortunately, directly integrating multiparty data cannot meet privacy-preserving requirements, which has motivated the development of privacy-preserving machine learning (PPML), a key research task in multiparty learning. However, existing PPML methods generally cannot simultaneously satisfy multiple requirements, such as security, accuracy, efficiency, and application scope. To deal with these problems, this article presents a new PPML method based on a secure multiparty interactive protocol, named the multiparty secure broad learning system (MSBLS), and derives its security analysis. Specifically, the proposed method employs the interactive protocol and random mapping to generate mapped features of the data and then uses efficient broad learning to train a neural network classifier. To the best of our knowledge, this is the first privacy-preserving computing method that jointly combines secure multiparty computation and neural networks. Theoretically, the method ensures that model accuracy is not reduced by encryption, and the computation remains very fast. Three classical datasets are adopted to verify our conclusions.
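
A minimal sketch of the broad-learning half of the pipeline is given below (assumptions: the secure interactive protocol is omitted and the random mapping is applied locally; node counts and the ridge parameter are illustrative). The closed-form ridge solution for the output weights is what keeps training fast.

```python
# Sketch of broad-learning training on already-mapped features
# (illustrative; in MSBLS the mapped features come from the secure protocol).
import numpy as np

def broad_learning_fit(X, Y, n_feature_nodes=64, n_enhance_nodes=128, reg=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    Wf = rng.normal(size=(X.shape[1], n_feature_nodes))
    Z = np.tanh(X @ Wf)                      # random feature nodes
    We = rng.normal(size=(n_feature_nodes, n_enhance_nodes))
    H = np.tanh(Z @ We)                      # enhancement nodes
    A = np.hstack([Z, H])
    # Output weights by ridge regression (closed form, hence fast training).
    W_out = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ Y)
    return Wf, We, W_out

def broad_learning_predict(X, Wf, We, W_out):
    Z = np.tanh(X @ Wf)
    A = np.hstack([Z, np.tanh(Z @ We)])
    return A @ W_out

# Example: fit on random data with one-hot labels.
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(200, 10)), np.eye(2)[rng.integers(0, 2, 200)]
params = broad_learning_fit(X, Y)
print(broad_learning_predict(X, *params).argmax(axis=1)[:5])
```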

5.
IEEE Trans Cybern; 52(11): 12231-12244, 2022 Nov.
Article in English | MEDLINE | ID: mdl-33961570

ABSTRACT

The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, considerable effort in ensemble clustering has recently been devoted to different subspace-based techniques. However, besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. It remains a surprisingly open problem in ensemble clustering how to create and aggregate a large population of diversified metrics and, furthermore, how to jointly investigate the multilevel diversity in the large populations of metrics, subspaces, and clusters in a unified framework. To tackle this problem, this article proposes a novel multidiversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, which are then coupled with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings can be constructed. Furthermore, an entropy-based criterion is utilized to explore the cluster-wise diversity in ensembles, based on which three specific ensemble clustering algorithms are presented by incorporating three types of consensus functions. Extensive experiments are conducted on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, which demonstrate the superiority of our algorithms over the state of the art. The source code is available at https://github.com/huangdonghere/MDEC.
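
One diversified metric-subspace pair can be sketched as follows (the hyperparameter ranges are illustrative, not the paper's settings; see the GitHub link above for the actual implementation): sample a random feature subspace, then build a scaled exponential similarity kernel whose neighborhood size and scale factor are randomized, yielding one base similarity matrix per draw.

```python
# Sketch: one randomized metric-subspace pair -> one base similarity matrix.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def random_metric_subspace_similarity(X, subspace_ratio=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random subspace: sample a subset of features.
    idx = rng.choice(d, size=max(1, int(subspace_ratio * d)), replace=False)
    Xs = X[:, idx]
    # Randomized scaled exponential similarity kernel.
    D = squareform(pdist(Xs))                     # pairwise Euclidean distances
    k = rng.integers(5, 20)                       # randomized neighborhood size
    mu = rng.uniform(0.3, 0.8)                    # randomized scaling factor
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0
    return np.exp(-D ** 2 / (mu * eps + 1e-12))

X = np.random.default_rng(0).normal(size=(50, 40))
sims = [random_metric_subspace_similarity(X, seed=s) for s in range(10)]
# Clustering each similarity matrix yields one member of the diversified ensemble.
```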


Subjects
Benchmarking, Neoplasms, Algorithms, Cluster Analysis, Humans, Neoplasms/genetics, Software
6.
IEEE Trans Pattern Anal Mach Intell; 44(10): 6074-6093, 2022 Oct.
Article in English | MEDLINE | ID: mdl-34048336

ABSTRACT

In conventional person re-identification (re-id), the images used for model training in the training probe set and training gallery set are all assumed to be instance-level samples that are manually labeled from raw surveillance video (likely with the assistance of detection) in a frame-by-frame manner. This labeling across multiple non-overlapping camera views from raw surveillance video is expensive and time-consuming. To overcome these issues, we consider weakly supervised person re-id modeling that aims to find the raw video clips in which a given target person appears. In our weakly supervised setting, given a sample of a person captured in one camera view, we aim to train a re-id model without further instance-level labeling for this person in another camera view. The weak setting refers to matching a target person with an untrimmed gallery video where we only know that the identity appears in the video, without annotating the identity in any frame of the video during training. Weakly supervised person re-id is challenging because it not only suffers from the difficulties of conventional person re-id (e.g., visual ambiguity and appearance variations caused by occlusions, pose variations, background clutter, etc.), but, more importantly, is also challenged by the weak supervision, since instance-level labels and ground-truth locations for person instances (i.e., ground-truth bounding boxes) are absent. To solve the weakly supervised person re-id problem, we develop deep graph metric learning (DGML). On the one hand, DGML measures the consistency between intra-video spatial graphs of consecutive frames, where the spatial graph captures the neighborhood relationships among the detected person instances in each frame. On the other hand, DGML distinguishes the inter-video spatial graphs captured from different camera views at different sites simultaneously. To further embed the weak supervision explicitly into DGML, we introduce weakly supervised regularization (WSR), which utilizes multiple weak video-level labels to learn discriminative features by means of a weak identity loss and a cross-video alignment loss. We conduct extensive experiments to demonstrate the feasibility of the weakly supervised person re-id approach and its special cases (e.g., its bag-to-bag extension) and show that the proposed DGML is effective.
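
The intra-video spatial graph and its frame-to-frame consistency can be sketched as follows (a toy illustration that assumes the same detections appear in consecutive frames; DGML itself builds these graphs on deep features and learns the metric):

```python
# Sketch: spatial graph over detected person boxes and a toy consistency score.
import numpy as np

def spatial_graph(box_centers, sigma=100.0):
    """Adjacency over detected instances, weighted by spatial proximity.
    box_centers: (n_detections, 2) pixel coordinates of box centers."""
    diff = box_centers[:, None, :] - box_centers[None, :, :]
    dist2 = (diff ** 2).sum(-1)
    A = np.exp(-dist2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def graph_consistency(A1, A2):
    """Cosine similarity of the flattened adjacency matrices."""
    a, b = A1.ravel(), A2.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

frame_t  = np.array([[100., 200.], [300., 220.]])   # detected person centers
frame_t1 = np.array([[105., 205.], [295., 225.]])   # next frame, same people
print(graph_consistency(spatial_graph(frame_t), spatial_graph(frame_t1)))
```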


Subjects
Biometric Identification, Algorithms, Biometric Identification/methods, Humans
7.
IEEE Trans Pattern Anal Mach Intell; 44(7): 3386-3403, 2022 Jul.
Article in English | MEDLINE | ID: mdl-33571087

ABSTRACT

Despite the remarkable progress achieved in conventional instance segmentation, the problem of predicting instance segmentation results for unobserved future frames remains challenging due to the unobservability of future data. Existing methods mainly address this challenge by forecasting the features of future frames. However, these methods always treat features of multiple levels (e.g., coarse-to-fine pyramid features) independently and do not exploit them collaboratively, which results in inaccurate predictions for future frames; moreover, such a weakness partially hinders the self-adaptation of a future segmentation prediction model to different input samples. To solve this problem, we propose an adaptive aggregation approach called the Auto-Path Aggregation Network (APANet), where the spatio-temporal contextual information obtained in the features of each individual level is selectively aggregated using the developed "auto-path". The "auto-path" connects each pair of features extracted at different pyramid levels for task-specific hierarchical contextual information aggregation, which enables selective and adaptive aggregation of pyramid features in accordance with different videos/frames. APANet can be further optimized jointly with the Mask R-CNN head as a feature decoder and a Feature Pyramid Network (FPN) feature encoder, forming a joint learning system for future instance segmentation prediction. We experimentally show that the proposed method achieves state-of-the-art performance on three video-based instance segmentation benchmarks for future instance segmentation prediction.
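
The "auto-path" idea can be sketched as a gated mixture over pyramid levels (illustrative shapes and gating; not the APANet code, which learns the paths jointly with the Mask R-CNN head and the FPN encoder):

```python
# Sketch: softmax-gated aggregation of features across pyramid levels.
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def auto_path_aggregate(pyramid_feats, path_logits):
    """pyramid_feats: list of L feature maps resized to a common (C, H, W) shape.
    path_logits: (L, L) learnable scores for the path from source level j to target level i."""
    feats = np.stack(pyramid_feats)          # (L, C, H, W)
    gates = softmax(path_logits, axis=1)     # each target level mixes all source levels
    return np.einsum('ij,jchw->ichw', gates, feats)

L, C, H, W = 4, 8, 16, 16
rng = np.random.default_rng(0)
pyramid = [rng.normal(size=(C, H, W)) for _ in range(L)]
logits = rng.normal(size=(L, L))
print(auto_path_aggregate(pyramid, logits).shape)  # (4, 8, 16, 16)
```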


Subjects
Computer-Assisted Image Processing, Computer Neural Networks, Algorithms, Computer-Assisted Image Processing/methods, Learning
8.
IEEE Trans Cybern; 52(6): 5229-5241, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33156800

ABSTRACT

In recent years, recommender systems have been widely used in online platforms; they can extract useful information from vast volumes of data and recommend suitable items to the user according to user preferences. However, recommender systems usually suffer from sparsity and cold-start problems. Cross-domain recommendation, as a particular example of transfer learning, has been used to solve these problems. However, many existing cross-domain recommendation approaches are based on matrix factorization, which can only learn shallow, linear characteristics of users and items. Therefore, in this article, we propose a novel autoencoder framework with an attention mechanism (AAM) for cross-domain recommendation, which can transfer and fuse information between different domains and make more accurate rating predictions. The main idea of the proposed framework lies in utilizing an autoencoder, a multilayer perceptron, and self-attention to extract user and item features, learn the user- and item-latent factors, and fuse the user-latent factors from different domains, respectively. In addition, to learn the affinity of the user-latent factors between different domains at a multiaspect level, we strengthen the self-attention mechanism with multihead self-attention and propose AAM++. Experiments conducted on two real-world datasets empirically demonstrate that our proposed methods outperform state-of-the-art methods in cross-domain recommendation, and that AAM++ performs better than AAM on sparse and large-scale datasets.
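
The multihead self-attention fusion of user-latent factors across domains can be sketched as follows (projection matrices are omitted for brevity and the head count is illustrative; this is not the AAM++ code):

```python
# Sketch: fuse per-domain user-latent factors with multi-head self-attention.
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fuse(factors, n_heads=4):
    """factors: (n_domains, d) user-latent factors, one row per domain.
    Returns fused factors of the same shape (identity projections for brevity)."""
    n, d = factors.shape
    dh = d // n_heads
    heads = []
    for h in range(n_heads):
        Q = K = V = factors[:, h * dh:(h + 1) * dh]
        attn = softmax(Q @ K.T / np.sqrt(dh), axis=-1)  # (n_domains, n_domains)
        heads.append(attn @ V)
    return np.concatenate(heads, axis=-1)

user_book = np.random.default_rng(0).normal(size=(1, 32))   # book-domain factor
user_movie = np.random.default_rng(1).normal(size=(1, 32))  # movie-domain factor
fused = self_attention_fuse(np.vstack([user_book, user_movie]))
print(fused.shape)  # (2, 32): each domain's factor now attends to the other
```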


Subjects
Learning, Computer Neural Networks
9.
Article in English | MEDLINE | ID: mdl-35839201

ABSTRACT

As a challenging problem, incomplete multi-view clustering (MVC) has drawn much attention in recent years. Most existing methods inevitably include a feature-recovery step to obtain the clustering result on incomplete multi-view datasets. This extra objective of recovering the missing features in the original data space or a common subspace is difficult for unsupervised clustering tasks and can accumulate errors during optimization. Moreover, the biased error is not taken into consideration in previous graph-based methods. The biased error represents the unexpected change of incomplete graph structure, such as an increase in intra-class relation density or the missing local graph structure of boundary instances. It can mislead graph-based methods and degrade their final performance. To overcome these drawbacks, we propose a new graph-based method named Graph Structure Refining for Incomplete MVC (GSRIMC). GSRIMC avoids the feature-recovery step and instead fully explores the existing subgraphs of each view to produce superior clustering results. To handle the biased error, biased-error separation is the core step of GSRIMC. In detail, GSRIMC first extracts basic information from the precomputed subgraph of each view and then separates the refined graph structure from the biased error with the help of the tensor nuclear norm. In addition, cross-view graph learning is proposed to capture the missing local graph structure and complete the refined graph structure based on the complementary principle. Extensive experiments show that our method achieves better performance than other state-of-the-art baselines.
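
The tensor-nuclear-norm ingredient can be sketched as follows (an illustrative t-SVD-based definition; GSRIMC minimizes this quantity inside a larger optimization, which is not reproduced here): stack the per-view graphs into a third-order tensor and sum the singular values of its frontal slices in the Fourier domain.

```python
# Sketch: t-SVD-based tensor nuclear norm of a stack of per-view graphs.
import numpy as np

def tensor_nuclear_norm(G):
    """G: (n, n, V) tensor of per-view proximity graphs.
    Sum of singular values of each frontal slice after an FFT along the
    view dimension, averaged over views."""
    Gf = np.fft.fft(G, axis=2)
    total = 0.0
    for v in range(G.shape[2]):
        s = np.linalg.svd(Gf[:, :, v], compute_uv=False)
        total += s.sum()
    return total / G.shape[2]

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 20))
G = np.stack([low_rank + 0.01 * rng.normal(size=(20, 20)) for _ in range(4)], axis=2)
print(tensor_nuclear_norm(G))  # small when views share a common low-rank structure
```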

10.
IEEE Trans Neural Netw Learn Syst; 33(11): 6726-6736, 2022 Nov.
Article in English | MEDLINE | ID: mdl-34081589

ABSTRACT

To alleviate the sparsity issue, many recommender systems have been proposed that consider the review text as auxiliary information to improve recommendation quality. Despite their success, they use only the ratings as the ground truth for error backpropagation. However, rating information can only indicate a user's overall preference for an item, while the review text contains rich information about the user's preferences and the attributes of the item. In real life, reviews with the same rating may carry completely opposite semantic information. If only the ratings are used for error backpropagation, the latent factors of these reviews will tend to be consistent, resulting in the loss of a large amount of review information. In this article, we propose a novel deep model termed the deep rating and review neural network (DRRNN) for recommendation. Specifically, compared with existing models that adopt the review text as auxiliary information, DRRNN additionally considers both the target rating and the target review of the given user-item pair as ground truth for error backpropagation in the training stage. Therefore, we can retain more of the semantic information of the reviews while making rating predictions. Extensive experiments on four publicly available datasets demonstrate the effectiveness of the proposed DRRNN model in terms of rating prediction.
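
The joint training signal can be sketched as a two-term loss (the weighting and the review encoding are illustrative; DRRNN's actual review branch is a neural network): the rating term alone cannot separate two reviews with the same rating but opposite text, while the review term can.

```python
# Sketch: a joint rating + review reconstruction loss for backpropagation.
import numpy as np

def joint_loss(pred_rating, true_rating, pred_review_vec, true_review_vec, alpha=0.5):
    rating_loss = (pred_rating - true_rating) ** 2
    review_loss = np.mean((pred_review_vec - true_review_vec) ** 2)
    return rating_loss + alpha * review_loss

# Two reviews with the same rating but opposite text pull their latent
# factors apart through the review term, which a rating-only loss cannot do.
print(joint_loss(4.2, 4.0, np.array([0.1, 0.9]), np.array([0.8, 0.2])))
```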


Subjects
Computer Neural Networks, Semantics