Results 1 - 20 of 51
1.
Article in English | MEDLINE | ID: mdl-38536695

ABSTRACT

Few-shot image classification (FSIC) is beneficial for a variety of real-world scenarios, aiming to construct a recognition system with limited training data. In this article, we extend the original FSIC task by incorporating defense against malicious adversarial examples. This can be an arduous challenge because numerous deep learning-based approaches remain susceptible to adversarial examples, even when trained with ample amounts of data. Previous studies on this problem have predominantly concentrated on the meta-learning framework, which involves sampling numerous few-shot tasks during the training stage. In contrast, we propose a straightforward but effective baseline that learns robust and discriminative representations without tedious meta-task sampling and can further be generalized to unforeseen adversarial FSIC tasks. Specifically, we introduce an adversarial-aware (AA) mechanism that exploits feature-level distinctions between the legitimate and the adversarial domains to provide supplementary supervision. Moreover, we design a novel adversarial reweighting training strategy to ameliorate the imbalance among adversarial examples. To further enhance adversarial robustness without compromising discriminative features, we propose a cyclic feature purifier applied during the postprocessing projection, which reduces the interference of unforeseen adversarial examples. Furthermore, our method obtains robust feature embeddings that maintain superior transferability, even when facing cross-domain adversarial examples. Extensive experiments and systematic analyses demonstrate that our method outperforms adversarially robust FSIC algorithms on three standard benchmarks by a substantial margin, in terms of both robustness and natural performance.
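The threat being defended against can be illustrated with the standard fast gradient sign method (FGSM): a perturbation aligned with the sign of the input gradient of the loss, bounded in L-infinity norm. The tiny logistic model and the numbers below are illustrative assumptions, not the paper's setup — a minimal sketch of how a small gradient-aligned shift flips a classifier's decision.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """Fast Gradient Sign Method on a logistic classifier.

    x: input vector; (w, b): model weights; y: label in {0, 1};
    eps: L-infinity perturbation budget. Returns x shifted by eps
    in the direction that increases the cross-entropy loss.
    """
    z = float(np.dot(w, x) + b)
    p = 1.0 / (1.0 + np.exp(-z))        # sigmoid output
    grad_x = (p - y) * w                # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

# A clean point correctly classified as class 1 ...
w = np.array([2.0, -1.0]); b = 0.0
x = np.array([1.0, 0.5])
x_adv = fgsm_perturb(x, w, b, y=1.0, eps=0.6)
# ... crosses the decision boundary after a bounded shift.
score_clean = float(np.dot(w, x) + b)      # positive -> class 1
score_adv = float(np.dot(w, x_adv) + b)    # pushed toward class 0
```

The perturbation never exceeds `eps` per coordinate, yet the predicted class changes — exactly the kind of example a robust FSIC model must resist.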

2.
Article in English | MEDLINE | ID: mdl-38393853

ABSTRACT

Group re-identification (GReID) aims to correctly associate group images belonging to the same group identity, which is a crucial task for video surveillance. Existing methods only model the member feature representations inside each image (regarded as spatial members), which leads to potential failures in long-term video surveillance due to cloth-changing behaviors. Therefore, we focus on a new task called cloth-changing group re-identification (CCGReID), which needs to consider both group relationship modeling in GReID and group representations robust to cloth-changing members. In this paper, we propose the separable spatial-temporal residual graph (SSRG) for CCGReID. Unlike existing GReID methods, SSRG considers both spatial members inside each group image and temporal members among multiple group images with the same identity. Specifically, SSRG constructs full graphs for each group identity within the batched data, which are completely and non-redundantly separated into the spatial member graph (SMG) and the temporal member graph (TMG). SMG aims to extract group features from spatial members, and TMG improves robustness to cloth-changing members by feature propagation. The separability enables SSRG to be used at inference time rather than only to assist supervised training. The residual structure guarantees efficient learning of SMG and TMG. To expedite research in CCGReID, we develop two datasets, GroupPRCC and GroupVC, based on the existing CCReID datasets. The experimental results show that SSRG achieves state-of-the-art performance, including the best accuracy and low degradation (only 2.15% on GroupVC). Moreover, SSRG generalizes well to the GReID task. As a weakly supervised method, SSRG surpasses the performance of some supervised methods and even approaches the best performance on the CSG dataset.

3.
IEEE Trans Image Process ; 32: 3862-3872, 2023.
Article in English | MEDLINE | ID: mdl-37428673

ABSTRACT

Modern deep neural networks have made numerous breakthroughs in real-world applications, yet they remain vulnerable to imperceptible adversarial perturbations. These tailored perturbations can severely disrupt the inference of current deep learning-based methods and may pose security hazards to artificial intelligence applications. So far, adversarial training methods have achieved excellent robustness against various adversarial attacks by involving adversarial examples during the training stage. However, existing methods primarily rely on optimizing injective adversarial examples generated one-to-one from natural examples, ignoring potential adversaries elsewhere in the adversarial domain. This optimization bias can induce overfitting to a suboptimal decision boundary, which heavily jeopardizes adversarial robustness. To address this issue, we propose Adversarial Probabilistic Training (APT) to bridge the distribution gap between natural and adversarial examples by modeling the latent adversarial distribution. Instead of tedious and costly adversary sampling to form the probabilistic domain, we estimate the adversarial distribution parameters at the feature level for efficiency. Moreover, we decouple the distribution alignment based on the adversarial probability model and the original adversarial example. We then devise a novel reweighting mechanism for the distribution alignment that considers the adversarial strength and the domain uncertainty. Extensive experiments demonstrate the superiority of our adversarial probabilistic training method against various types of adversarial attacks across different datasets and scenarios.
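The reweighting idea can be sketched in isolation. The snippet below is only an assumed simplification — a softmax over per-example adversarial losses so that harder (higher-loss) adversaries receive larger weight — not APT's actual mechanism, which also accounts for domain uncertainty.

```python
import numpy as np

def adversarial_reweight(losses, temperature=1.0):
    """Toy reweighting: adversarial examples with higher loss
    ("stronger" adversaries) get larger weights via a softmax.
    Weights are positive and sum to 1."""
    losses = np.asarray(losses, dtype=float)
    z = (losses - losses.max()) / temperature   # numerically stable softmax
    w = np.exp(z)
    return w / w.sum()

# Three adversarial examples with different losses: the middle one
# is hardest and therefore dominates the weighted objective.
weights = adversarial_reweight([0.2, 1.5, 0.7])
```

The `temperature` parameter (an assumption here) controls how sharply the weighting concentrates on the hardest examples.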


Subject(s)
Artificial Intelligence; Neural Networks, Computer; Uncertainty
4.
IEEE Trans Image Process ; 32: 3806-3820, 2023.
Article in English | MEDLINE | ID: mdl-37418403

ABSTRACT

We are concerned with retrieving a query person from multiple videos captured by a non-overlapping camera network. Existing methods often rely on purely visual matching or consider temporal constraints but ignore the spatial information of the camera network. To address this issue, we propose a pedestrian retrieval framework based on cross-camera trajectory generation that integrates both temporal and spatial information. To obtain pedestrian trajectories, we propose a novel cross-camera spatio-temporal model that integrates pedestrians' walking habits and the path layout between cameras to form a joint probability distribution. Such a cross-camera spatio-temporal model can be specified using sparsely sampled pedestrian data. Based on the spatio-temporal model, cross-camera trajectories can be extracted by the conditional random field model and further optimised by restricted non-negative matrix factorization. Finally, a trajectory re-ranking technique is proposed to improve the pedestrian retrieval results. To verify the effectiveness of our method, we construct the first cross-camera pedestrian trajectory dataset, the Person Trajectory Dataset, in real surveillance scenarios. Extensive experiments verify the effectiveness and robustness of the proposed method.
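The spatio-temporal intuition — that a gap between two sightings is only plausible if it matches the typical transit time between the two cameras — can be sketched with a one-dimensional Gaussian fitted from sparse samples. This is a minimal illustration under assumed data, not the paper's joint model of walking habits and path layout.

```python
import math

def fit_transition(times):
    """Fit a Gaussian to observed camera-to-camera transit times
    (sparsely sampled pedestrian data)."""
    n = len(times)
    mu = sum(times) / n
    var = sum((t - mu) ** 2 for t in times) / n
    return mu, max(var, 1e-6)

def transition_score(dt, mu, var):
    """Density of a gap dt under the fitted transit-time model; high
    values support a genuine cross-camera trajectory link."""
    return math.exp(-(dt - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Illustrative transit times (seconds) between two cameras.
mu, var = fit_transition([30.0, 34.0, 29.0, 33.0])
plausible = transition_score(31.0, mu, var)     # near the typical gap
implausible = transition_score(300.0, mu, var)  # far too long a gap
```

Scores like these can then re-rank purely visual matches, which is the role the spatio-temporal model plays in the retrieval framework.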

5.
IEEE Trans Image Process ; 32: 4517-4528, 2023.
Article in English | MEDLINE | ID: mdl-37490374

ABSTRACT

Surface-defect detection aims to accurately locate and classify defect areas in images via pixel-level annotations. Different from the objects in traditional image segmentation, defect areas comprise a small group of pixels with random shapes, characterized by uncommon textures and edges that are inconsistent with the normal surface patterns of industrial products. This task-specific knowledge is hardly considered in the current methods. Therefore, we propose a two-stage "promotion-suppression" transformer (PST) framework, which explicitly adopts the wavelet features to guide the network to focus on the detailed features in the images. Specifically, in the promotion stage, we propose the Haar augmentation module to improve the backbone's sensitivity to high-frequency details. However, the background noise is inevitably amplified as well because it also constitutes high-frequency information. Therefore, a quadratic feature-fusion module (QFFM) is proposed in the suppression stage, which exploits the two properties of noise: independence and attenuation. The QFFM analyzes the similarities and differences between noise and defect features to achieve noise suppression. Compared with the traditional linear-fusion approach, the QFFM is more sensitive to high-frequency details; thus, it can afford highly discriminative features. Extensive experiments are conducted on three datasets, namely DAGM, MT, and CRACK500, which demonstrate the superiority of the proposed PST framework.
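The wavelet features mentioned above come from the Haar transform, whose high-frequency sub-bands isolate exactly the edges and textures that defect detection cares about. Below is a minimal single-level 2D Haar decomposition in numpy — a generic sketch of the transform itself, not the paper's Haar augmentation module.

```python
import numpy as np

def haar2d(img):
    """Single-level 2D Haar transform: returns (LL, LH, HL, HH).
    LL is the low-frequency approximation; LH/HL/HH carry the
    high-frequency detail (horizontal/vertical/diagonal edges)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a + b - c - d) / 4.0
    hl = (a - b + c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh

# A fine vertical stripe pattern: its energy lands in the HL
# (vertical-detail) sub-band, not in the smooth LL approximation.
img = np.zeros((4, 4))
img[:, 1::2] = 1.0
ll, lh, hl, hh = haar2d(img)
```

Amplifying sub-bands such as `hl` is what raises a backbone's sensitivity to high-frequency detail — and also why background noise, which is equally high-frequency, gets amplified and must be suppressed afterwards.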

6.
IEEE Trans Image Process ; 32: 3455-3464, 2023.
Article in English | MEDLINE | ID: mdl-37327095

ABSTRACT

We focus on the problem of shadow removal for a single image and aim to build a weakly supervised learning model that does not depend on pixelwise-paired training samples, instead using only samples with image-level labels indicating whether an image contains shadow or not. To this end, we propose a deep reciprocal learning model that interactively optimizes the shadow remover and the shadow detector to improve the overall capability of the model. On the one hand, shadow removal is modeled as an optimization problem with a latent variable, the detected shadow mask. On the other hand, a shadow detector can be trained using the prior from the shadow remover. A self-paced learning strategy is employed to avoid fitting to noisy intermediate annotations during the interactive optimization. Furthermore, a color-maintenance loss and a shadow-attention discriminator are designed to facilitate model optimization. Extensive experiments on the pairwise ISTD dataset, the SRD dataset, and the unpaired USR dataset demonstrate the superiority of the proposed deep reciprocal model.
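The self-paced strategy can be sketched in its classic form: train only on "easy" samples whose loss falls below a threshold, and raise the threshold each round so harder samples (those most likely to carry noisy intermediate annotations) are admitted only later. This is a generic sketch of self-paced sample selection, with made-up losses, not the paper's exact schedule.

```python
def self_paced_select(losses, threshold):
    """Return the indices of 'easy' samples whose current loss is
    below the pace threshold; harder samples join in later rounds
    as the threshold grows."""
    return [i for i, loss in enumerate(losses) if loss < threshold]

losses = [0.1, 0.9, 0.3, 2.0]            # illustrative per-sample losses
round1 = self_paced_select(losses, threshold=0.5)   # easy samples only
round2 = self_paced_select(losses, threshold=1.0)   # admits more samples
```

Each round retrains on the selected subset before re-scoring, so early optimization never fits the noisiest pseudo-labels.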

7.
Article in English | MEDLINE | ID: mdl-37318966

ABSTRACT

Pose Guided Person Image Generation (PGPIG) is the task of transforming a person's image from a source pose to a target pose. Existing PGPIG methods often tend to learn an end-to-end transformation between the source image and the target image, but do not seriously consider two issues: 1) the PGPIG is an ill-posed problem, and 2) the texture mapping requires effective supervision. To alleviate these two challenges, we propose a novel method that incorporates a Dual-task Pose Transformer Network and a Texture Affinity learning mechanism (DPTN-TA). To assist the ill-posed source-to-target task learning, DPTN-TA introduces an auxiliary task, i.e., a source-to-source task, via a Siamese structure, and further explores the dual-task correlation. Specifically, the correlation is built by the proposed Pose Transformer Module (PTM), which can adaptively capture the fine-grained mapping between sources and targets and can promote source texture transmission to enhance the details of the generated images. Moreover, we propose a novel texture affinity loss to better supervise the learning of texture mapping. In this way, the network is able to learn complex spatial transformations effectively. Extensive experiments show that our DPTN-TA can produce perceptually realistic person images under significant pose changes. Furthermore, our DPTN-TA is not limited to processing human bodies but can be flexibly extended to view synthesis of other objects, e.g., faces and chairs, outperforming state-of-the-art methods in terms of both LPIPS and FID. Our code is available at: https://github.com/PangzeCheung/Dual-task-Pose-Transformer-Network.

8.
IEEE Trans Image Process ; 32: 2580-2592, 2023.
Article in English | MEDLINE | ID: mdl-37126633

ABSTRACT

Attribute-based person search aims to find the target person from the gallery images based on the given query text. It often plays an important role in surveillance systems when visual information is not reliable, such as identifying a criminal from a few witnesses. Although recent works have made great progress, most of them neglect the attribute labeling problems that exist in the current datasets. Moreover, these problems also increase the risk of non-alignment between attribute texts and visual images, leading to large semantic gaps. To address these issues, in this paper, we propose Weak Semantic Embeddings (WSEs), which can modify the data distribution of the original attribute texts and thus improve the representability of attribute features. We also introduce feature graphs to learn more collaborative and calibrated information. Furthermore, the relationship modeled by our feature graphs between all semantic embeddings can reduce the semantic gap in text-to-image retrieval. Extensive evaluations on three challenging benchmarks - PETA, Market-1501 Attribute, and PA100K, demonstrate the effectiveness of the proposed WSEs, and our method outperforms existing state-of-the-art methods.

9.
IEEE Trans Cybern ; 53(10): 6636-6648, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37021985

ABSTRACT

Multiparty learning is an indispensable technique for improving learning performance by integrating data from multiple parties. Unfortunately, directly integrating multiparty data fails to meet privacy-preserving requirements, which has motivated the development of privacy-preserving machine learning (PPML), a key research task in multiparty learning. Despite this, existing PPML methods generally cannot simultaneously meet multiple requirements, such as security, accuracy, efficiency, and application scope. To deal with these problems, in this article we present a new PPML method based on a secure multiparty interactive protocol, namely the multiparty secure broad learning system (MSBLS), and derive its security analysis. Specifically, the proposed method employs the interactive protocol and random mapping to generate mapped features of the data, and then uses efficient broad learning to train a neural network classifier. To the best of our knowledge, this is the first privacy-preserving computing method that jointly combines secure multiparty computation with neural networks. Theoretically, the method ensures that model accuracy is not reduced by encryption, while computation remains fast. Three classical datasets are adopted to verify these conclusions.
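The abstract does not spell out the interactive protocol, so as background only, here is the standard building block of secure multiparty computation: additive secret sharing, where a value is split into random shares that individually reveal nothing but jointly reconstruct it. This is a generic sketch, not MSBLS itself.

```python
import random

MODULUS = 2**31 - 1  # a public modulus (illustrative choice)

def share(value, n_parties):
    """Split an integer into n additive shares modulo MODULUS.
    Any n-1 shares are uniformly random and reveal nothing alone."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    """Sum all shares modulo MODULUS to recover the secret."""
    return sum(shares) % MODULUS

secret = 123456
shares = share(secret, 3)        # distributed to three parties
recovered = reconstruct(shares)  # only possible with all shares
```

Because sharing is linear, parties can add their shares of two secrets locally and reconstruct the sum without ever exposing either input — the kind of property that lets mapped features be computed jointly without revealing raw data.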

10.
IEEE Trans Neural Netw Learn Syst ; 34(2): 973-986, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34432638

ABSTRACT

Most existing multiview clustering methods are based on the original feature space. However, feature redundancy and noise in the original feature space limit their clustering performance. To address this problem, some multiview clustering methods learn the latent data representation linearly, but their performance may decline if the relation between the latent representation and the original data is nonlinear. Other methods learn the latent representation nonlinearly but usually conduct representation learning and clustering separately, so the latent representation may not be well adapted to clustering. Furthermore, none of them model the intercluster relation and intracluster correlation of data points, which limits the quality of the learned latent representation and therefore the clustering performance. To solve these problems, this article proposes a novel multiview clustering method via proximity learning in latent representation space, named multiview latent proximity learning (MLPL). On the one hand, MLPL learns the latent data representation in a nonlinear manner, taking the intercluster relation and intracluster correlation into consideration simultaneously. On the other hand, by conducting latent representation learning and consensus proximity learning simultaneously, MLPL learns a consensus proximity matrix with k connected components to output the clustering result directly. Extensive experiments are conducted on seven real-world datasets to demonstrate the effectiveness and superiority of the MLPL method compared with state-of-the-art multiview clustering methods.
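Why does a proximity matrix with k connected components "output the clustering result directly"? Because the components themselves are the clusters — no k-means step is needed. The sketch below reads a clustering off a toy proximity matrix by graph traversal; the matrix and threshold are illustrative assumptions.

```python
def components_from_proximity(P, eps=0.0):
    """Read a clustering directly off a proximity matrix: points i, j
    are linked if P[i][j] > eps; clusters = connected components."""
    n = len(P)
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        stack = [start]          # depth-first traversal of one component
        labels[start] = cluster
        while stack:
            i = stack.pop()
            for j in range(n):
                if P[i][j] > eps and labels[j] == -1:
                    labels[j] = cluster
                    stack.append(j)
        cluster += 1
    return labels

# A block-diagonal proximity matrix with two components -> two clusters.
P = [[1.0, 0.8, 0.0, 0.0],
     [0.8, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.6],
     [0.0, 0.0, 0.6, 1.0]]
labels = components_from_proximity(P)
```

Learning a matrix constrained to have exactly k such blocks is what makes the proximity matrix itself the final clustering.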

11.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 489-507, 2023 01.
Article in English | MEDLINE | ID: mdl-35130146

ABSTRACT

Egocentric videos, which record the daily activities of individuals from a first-person point of view, have attracted increasing attention in recent years because of their growing use in many popular applications, including life logging, health monitoring, and virtual reality. As a fundamental problem in egocentric vision, egocentric action recognition aims to recognize the actions of the camera wearer from egocentric videos. In egocentric action recognition, relation modeling is important, because the interactions between the camera wearer and the recorded persons or objects form complex relations in egocentric videos. However, only a few existing methods model the relations between the camera wearer and the interacting persons for egocentric action recognition, and moreover, they require prior knowledge or auxiliary data to localize the interacting persons. In this work, we consider modeling the relations in a weakly supervised manner, i.e., without using annotations or prior knowledge about the interacting persons or objects, for egocentric action recognition. We form a weakly supervised framework by unifying automatic interactor localization and explicit relation modeling for the purpose of automatic relation modeling. First, we learn to automatically localize the interactors, i.e., the body parts of the camera wearer and the persons or objects that the camera wearer interacts with, by learning a series of keypoints directly from video data to localize the action-relevant regions with only action labels and some constraints on these keypoints. Second, and more importantly, to explicitly model the relations between the interactors, we develop an ego-relational LSTM (long short-term memory) network with several candidate connections to model the complex relations in egocentric videos, such as the temporal, interactive, and contextual relations. In particular, to reduce the human effort and manual intervention needed to construct an optimal ego-relational LSTM structure, we search for the optimal connections by employing a differentiable network architecture search mechanism, which automatically constructs the ego-relational LSTM network to explicitly model different relations for egocentric action recognition. We conduct extensive experiments on egocentric video datasets to illustrate the effectiveness of our method.
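The core trick of differentiable architecture search is continuous relaxation: instead of picking one candidate connection discretely, compute a softmax-weighted mixture of all candidates, so the architecture weights can be learned by gradient descent and the strongest connection kept afterwards. The candidate operations below are deliberately trivial stand-ins — a minimal sketch of the relaxation, not the ego-relational LSTM.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mixed_connection(x, alphas, ops):
    """Continuous relaxation of a discrete connection choice: the
    output is a softmax(alphas)-weighted sum over all candidate
    operations, making the choice differentiable in alphas."""
    w = softmax(alphas)
    return sum(wi * op(x) for wi, op in zip(w, ops))

ops = [lambda x: x,                  # identity connection
       lambda x: np.zeros_like(x),   # no connection
       lambda x: 2.0 * x]            # a scaled connection
x = np.array([1.0, -1.0])
# Architecture weights strongly favouring the identity connection:
out = mixed_connection(x, alphas=np.array([10.0, 0.0, 0.0]), ops=ops)
```

After optimization, the connection with the largest alpha is retained, yielding the final discrete network structure.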


Subject(s)
Algorithms; Virtual Reality; Humans; Learning
12.
IEEE Trans Neural Netw Learn Syst ; 34(12): 9671-9684, 2023 Dec.
Article in English | MEDLINE | ID: mdl-35324448

ABSTRACT

Session-based recommendation tries to make use of anonymous session data to deliver high-quality recommendations under the condition that user profiles and the complete historical behavioral data of a target user are unavailable. Previous works consider each session individually and try to capture user interests within a session. Despite their encouraging results, these models can only perceive intra-session items and cannot draw upon the massive historical relational information. To solve this problem, we propose a novel method named global graph guided session-based recommendation (G3SR). G3SR decomposes the session-based recommendation workflow into two steps. First, a global graph is built upon all session data, from which the global item representations are learned in an unsupervised manner. Then, these representations are refined on session graphs under the graph networks, and a readout function is used to generate session representations for each session. Extensive experiments on two real-world benchmark datasets show remarkable and consistent improvements of the G3SR method over the state-of-the-art methods, especially for cold items.
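The first step — building a global graph over all session data — can be sketched very simply: every consecutive item pair inside any session contributes a weighted directed edge. The session data below is invented for illustration; G3SR's actual graph construction and unsupervised representation learning are richer than this.

```python
from collections import defaultdict

def build_global_graph(sessions):
    """Aggregate all sessions into one weighted directed item graph:
    each consecutive pair (a -> b) inside any session increments the
    weight of edge (a, b)."""
    graph = defaultdict(int)
    for s in sessions:
        for a, b in zip(s, s[1:]):
            graph[(a, b)] += 1
    return dict(graph)

sessions = [["shoes", "socks", "hat"],
            ["socks", "hat"],
            ["shoes", "hat"]]
g = build_global_graph(sessions)
```

Because the graph pools evidence across all sessions, items that rarely appear in any single session (cold items) still accumulate informative neighbourhoods — the motivation for learning global item representations from it.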

13.
Article in English | MEDLINE | ID: mdl-35839201

ABSTRACT

As a challenging problem, incomplete multi-view clustering (MVC) has drawn much attention in recent years. Most existing methods inevitably include a feature-recovery step to obtain clustering results on incomplete multi-view datasets. The extra objective of recovering missing features in the original data space or a common subspace is difficult for unsupervised clustering tasks and can accumulate errors during optimization. Moreover, the biased error is not taken into consideration in previous graph-based methods. The biased error represents the unexpected change of the incomplete graph structure, such as the increase in intra-class relation density and the missing local graph structure of boundary instances. It can mislead graph-based methods and degrade their final performance. To overcome these drawbacks, we propose a new graph-based method named Graph Structure Refining for Incomplete MVC (GSRIMC). GSRIMC avoids the feature-recovery step and instead fully explores the existing subgraphs of each view to produce superior clustering results. To handle the biased error, biased-error separation is the core step of GSRIMC. In detail, GSRIMC first extracts basic information from the precomputed subgraph of each view and then separates the refined graph structure from the biased error with the help of the tensor nuclear norm. Besides, cross-view graph learning is proposed to capture the missing local graph structure and complete the refined graph structure based on the complementary principle. Extensive experiments show that our method achieves better performance than other state-of-the-art baselines.
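The tensor nuclear norm generalizes the matrix nuclear norm, whose proximal operator is singular value thresholding (SVT): shrink the singular values, keeping the dominant low-rank structure and discarding small perturbations. The matrix-level sketch below, on an invented rank-1 example corrupted in one entry, illustrates that separation step — it is not GSRIMC's tensor formulation.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink singular values by tau.
    This is the proximal operator of the matrix nuclear norm, a
    standard step for separating low-rank structure from small
    corruptions."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# A rank-1 "refined structure" plus a single-entry "biased error".
low_rank = np.outer([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
noisy = low_rank.copy()
noisy[0, 2] += 0.5                  # the corruption raises the rank to 2
denoised = svt(noisy, tau=0.4)      # thresholding restores rank 1
```

The threshold `tau` trades off how much structure survives versus how much corruption is removed; in the tensor setting the same shrinkage is applied to a tensor decomposition across views.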

14.
IEEE Trans Cybern ; 52(6): 5229-5241, 2022 Jun.
Article in English | MEDLINE | ID: mdl-33156800

ABSTRACT

In recent years, recommender systems have been widely used in online platforms; they extract useful information from vast volumes of data and recommend items suited to user preferences. However, recommender systems usually suffer from sparsity and cold-start problems. Cross-domain recommendation, as a particular example of transfer learning, has been used to solve these problems. However, many existing cross-domain recommendation approaches are based on matrix factorization, which can only learn shallow, linear characteristics of users and items. Therefore, in this article, we propose a novel autoencoder framework with an attention mechanism (AAM) for cross-domain recommendation, which can transfer and fuse information between different domains and make more accurate rating predictions. The main idea of the proposed framework lies in utilizing an autoencoder, a multilayer perceptron, and self-attention to extract user and item features, learn the user and item latent factors, and fuse the user latent factors from different domains, respectively. In addition, to learn the affinity of the user latent factors between different domains at a multi-aspect level, we strengthen the self-attention mechanism with multihead self-attention and propose AAM++. Experiments conducted on two real-world datasets empirically demonstrate that our proposed methods outperform state-of-the-art methods in cross-domain recommendation, and that AAM++ performs better than AAM on sparse and large-scale datasets.
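The self-attention used for fusing latent factors is standard scaled dot-product attention. The single-head numpy sketch below (random weights, tiny dimensions — all illustrative) shows the mechanics; multihead attention, as in AAM++, simply runs several such heads in parallel and concatenates their outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a set of
    latent factors X (one row per factor). Returns the fused
    representations and the attention matrix."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # scaled similarities
    A = softmax(scores, axis=-1)             # each row sums to 1
    return A @ V, A

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))              # 3 latent factors, dim 4
Wq, Wk, Wv = (rng.standard_normal((4, 4)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Each output row is a convex combination of the value vectors, so factors from one domain can selectively absorb information from the other.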


Subject(s)
Learning; Neural Networks, Computer
15.
IEEE Trans Pattern Anal Mach Intell ; 44(7): 3386-3403, 2022 07.
Article in English | MEDLINE | ID: mdl-33571087

ABSTRACT

Despite the remarkable progress achieved in conventional instance segmentation, the problem of predicting instance segmentation results for unobserved future frames remains challenging due to the unobservability of future data. Existing methods mainly address this challenge by forecasting features of future frames. However, these methods always treat features of multiple levels (e.g., coarse-to-fine pyramid features) independently and do not exploit them collaboratively, which results in inaccurate prediction for future frames; and moreover, such a weakness can partially hinder self-adaption of a future segmentation prediction model for different input samples. To solve this problem, we propose an adaptive aggregation approach called Auto-Path Aggregation Network (APANet), where the spatio-temporal contextual information obtained in the features of each individual level is selectively aggregated using the developed "auto-path". The "auto-path" connects each pair of features extracted at different pyramid levels for task-specific hierarchical contextual information aggregation, which enables selective and adaptive aggregation of pyramid features in accordance with different videos/frames. Our APANet can be further optimized jointly with the Mask R-CNN head as a feature decoder and a Feature Pyramid Network (FPN) feature encoder, forming a joint learning system for future instance segmentation prediction. We experimentally show that the proposed method can achieve state-of-the-art performance on three video-based instance segmentation benchmarks for future instance segmentation prediction.


Subject(s)
Image Processing, Computer-Assisted; Neural Networks, Computer; Algorithms; Image Processing, Computer-Assisted/methods; Learning
16.
IEEE Trans Cybern ; 52(11): 12231-12244, 2022 Nov.
Article in English | MEDLINE | ID: mdl-33961570

ABSTRACT

The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, considerable recent efforts in ensemble clustering have been made by means of different subspace-based techniques. However, besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. It remains an open problem in ensemble clustering how to create and aggregate a large population of diversified metrics and, furthermore, how to jointly investigate the multilevel diversity in large populations of metrics, subspaces, and clusters in a unified framework. To tackle this problem, this article proposes a novel multidiversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, which are then coupled with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings can thereby be constructed. Furthermore, an entropy-based criterion is utilized to explore the cluster-wise diversity in ensembles, based on which three specific ensemble clustering algorithms are presented by incorporating three types of consensus functions. Extensive experiments are conducted on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, which demonstrate the superiority of our algorithms over the state of the art. The source code is available at https://github.com/huangdonghere/MDEC.
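Randomizing an exponential similarity kernel can be sketched directly: draw the kernel bandwidth at random, so each draw turns the same distance matrix into a different similarity matrix, i.e., a different metric. The bandwidth range below is an assumed placeholder, not the paper's parameterization.

```python
import numpy as np

def random_exp_kernel(D, rng, sigma_range=(0.5, 2.0)):
    """Turn a pairwise-distance matrix D into a similarity matrix via
    an exponential kernel with a randomly drawn bandwidth sigma;
    repeated draws yield a population of diversified metrics."""
    sigma = rng.uniform(*sigma_range)
    return np.exp(-D ** 2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(42)
D = np.array([[0.0, 1.0],
              [1.0, 0.0]])
S1 = random_exp_kernel(D, rng)   # one randomized metric
S2 = random_exp_kernel(D, rng)   # a second, different metric
```

Pairing each such randomized kernel with a random feature subspace gives the metric-subspace pairs from which the diversified base clusterings are built.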


Subject(s)
Benchmarking; Neoplasms; Algorithms; Cluster Analysis; Humans; Neoplasms/genetics; Software
17.
IEEE Trans Neural Netw Learn Syst ; 33(11): 6726-6736, 2022 11.
Article in English | MEDLINE | ID: mdl-34081589

ABSTRACT

To alleviate the sparsity issue, many recommender systems have been proposed to consider the review text as the auxiliary information to improve the recommendation quality. Despite success, they only use the ratings as the ground truth for error backpropagation. However, the rating information can only indicate the users' overall preference for the items, while the review text contains rich information about the users' preferences and the attributes of the items. In real life, reviews with the same rating may have completely opposite semantic information. If only the ratings are used for error backpropagation, the latent factors of these reviews will tend to be consistent, resulting in the loss of a large amount of review information. In this article, we propose a novel deep model termed deep rating and review neural network (DRRNN) for recommendation. Specifically, compared with the existing models that adopt the review text as the auxiliary information, DRRNN additionally considers both the target rating and target review of the given user-item pair as ground truth for error backpropagation in the training stage. Therefore, we can keep more semantic information of the reviews while making rating predictions. Extensive experiments on four publicly available datasets demonstrate the effectiveness of the proposed DRRNN model in terms of rating prediction.


Subject(s)
Neural Networks, Computer; Semantics
18.
IEEE Trans Pattern Anal Mach Intell ; 44(10): 6074-6093, 2022 10.
Article in English | MEDLINE | ID: mdl-34048336

ABSTRACT

In conventional person re-identification (re-id), the images used for model training in the training probe set and training gallery set are all assumed to be instance-level samples that are manually labeled from raw surveillance video (likely with the assistance of detection) in a frame-by-frame manner. This labeling across multiple non-overlapping camera views from raw video surveillance is expensive and time-consuming. To overcome these issues, we consider weakly supervised person re-id modeling that aims to find the raw video clips where a given target person appears. In our weakly supervised setting, during training, given a sample of a person captured in one camera view, our approach aims to train a re-id model without further instance-level labeling for this person in another camera view. The weak setting refers to matching a target person with an untrimmed gallery video where we only know that the identity appears in the video, without the requirement of annotating the identity in any frame of the video during the training procedure. Weakly supervised person re-id is challenging because it not only suffers from the difficulties of conventional person re-id (e.g., visual ambiguity and appearance variations caused by occlusions, pose variations, background clutter, etc.), but, more importantly, is also challenged by weak supervision, because the instance-level labels and the ground-truth locations of person instances (i.e., their bounding boxes) are absent. To solve the weakly supervised person re-id problem, we develop deep graph metric learning (DGML). On the one hand, DGML measures the consistency between intra-video spatial graphs of consecutive frames, where the spatial graph captures the neighborhood relationships of the detected person instances in each frame. On the other hand, DGML distinguishes the inter-video spatial graphs captured from different camera views at different sites simultaneously. To further explicitly embed weak supervision into DGML, we introduce weakly supervised regularization (WSR), which utilizes multiple weak video-level labels to learn discriminative features by means of a weak identity loss and a cross-video alignment loss. We conduct extensive experiments to demonstrate the feasibility of the weakly supervised person re-id approach and its special cases (e.g., its bag-to-bag extension) and show that the proposed DGML is effective.


Subject(s)
Biometric Identification; Algorithms; Biometric Identification/methods; Humans
19.
IEEE Trans Image Process ; 31: 352-365, 2022.
Article in English | MEDLINE | ID: mdl-34807829

ABSTRACT

Learning discriminative and rich features is an important research task for person re-identification. Previous studies have attempted to capture global and local features at the same time and at the same layer of the model in a non-interactive manner, an approach we call synchronous learning. However, synchronous learning produces highly similar global and local features, which in turn degrades model performance. To this end, we propose asynchronous learning based on the human visual perception mechanism. Asynchronous learning emphasizes the time asynchrony and space asynchrony of feature learning and achieves mutual promotion and cyclical interaction in feature learning. Furthermore, we design a dynamic progressive refinement module to improve local features under the guidance of global features. The dynamic property allows this module to adaptively adjust the network parameters according to the input image, in both the training and testing stages. The progressive property narrows the semantic gap between the global and local features, owing to the guidance of global features. Finally, we conduct experiments on four datasets: Market1501, CUHK03, DukeMTMC-ReID, and MSMT17. The experimental results show that asynchronous learning can effectively improve feature discrimination and achieve strong performance.


Subject(s)
Algorithms; Image Processing, Computer-Assisted; Humans; Semantics
20.
IEEE Trans Image Process ; 30: 8019-8033, 2021.
Article in English | MEDLINE | ID: mdl-34534082

ABSTRACT

Person re-identification across visible and near-infrared cameras (VIS-NIR Re-ID) has widespread applications. The challenge of this task lies in heterogeneous image matching. Existing methods attempt to learn discriminative features via complex feature extraction strategies. Nevertheless, the distributions of visible and near-infrared features are disparate due to the modality gap, which significantly affects the feature metric and degrades the performance of existing models. To address this problem, we propose a novel approach from the perspective of metric learning. We conduct metric learning in a well-designed angular space. Geometrically, features are mapped from the original space to the hypersphere manifold, which eliminates variations in feature norm and concentrates on the angle between the feature and the target category. Specifically, we propose a cyclic projection network (CPN) that transforms features into an angle-related space while identity information is preserved. Furthermore, we propose three loss functions in angular space, AICAL, LAL, and DAL, for angular metric learning. Multiple experiments on two existing public datasets, SYSU-MM01 and RegDB, show that the performance of our method greatly exceeds state-of-the-art performance.
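The hypersphere mapping can be sketched in two lines: normalize each feature to unit length, then compare features by angle alone. The toy vectors below (a "visible" and a "near-infrared" feature of the same identity differing only in norm) are invented for illustration; the CPN and the three angular losses are not reproduced here.

```python
import numpy as np

def to_hypersphere(x):
    """Project a feature onto the unit hypersphere, discarding its
    norm so that only the angular component matters."""
    return x / np.linalg.norm(x)

def angle(a, b):
    """Angle (radians) between two features after projection."""
    cos = float(np.clip(np.dot(to_hypersphere(a), to_hypersphere(b)),
                        -1.0, 1.0))
    return float(np.arccos(cos))

vis = np.array([2.0, 0.0])     # e.g., a visible-light feature
nir = np.array([0.5, 0.0])     # same identity, very different norm
other = np.array([0.0, 3.0])   # a different identity
same_id = angle(vis, nir)      # zero: norm difference is eliminated
diff_id = angle(vis, other)    # large: identities stay separated
```

Because modality mostly perturbs feature magnitude, measuring only angles removes a large part of the VIS-NIR gap before any loss is applied.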


Subject(s)
Biometric Identification; Algorithms; Humans