Results 1 - 20 of 60
1.
Sensors (Basel) ; 24(4)2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38400397

ABSTRACT

Person re-identification is the task of recognizing the same subject across a network of non-overlapping cameras. This is typically achieved by extracting from the source image a vector of characteristic features of the specific person captured by the camera. Learning a good set of robust, invariant, and discriminative features is a complex task, often leveraging contrastive learning. In this article, we explore a different approach, learning the representation of an individual as the conditioning information required to generate images of the specific person starting from random noise. In this way, we decouple the identity of the individual from any other information relative to a specific instance (pose, background, etc.), allowing interesting transformations from one identity to another. As generative models, we use the recent diffusion models, which have already proven their sensitivity to conditioning in many different contexts. The results presented in this article serve as a proof of concept. While our current performance on common benchmarks is lower than that of state-of-the-art techniques, the approach is intriguing and rich in innovative insights, suggesting a wide range of potential improvements along various lines of investigation.

2.
Sensors (Basel) ; 24(7)2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38610439

ABSTRACT

Video-based person re-identification (ReID) aims to exploit relevant features from spatial and temporal knowledge. Widely used methods include part- and attention-based approaches for suppressing irrelevant spatial-temporal features. However, it is still challenging to overcome inconsistencies across video frames due to occlusion and imperfect detection. These mismatches make temporal processing ineffective and create an imbalance of crucial spatial information. To address these problems, we propose the Spatiotemporal Multi-Granularity Aggregation (ST-MGA) method, which is specifically designed to accumulate relevant features with spatiotemporally consistent cues. The proposed framework consists of three main stages: extraction, which extracts spatiotemporally consistent partial information; augmentation, which augments the partial information with different granularity levels; and aggregation, which effectively aggregates the augmented spatiotemporal information. We first introduce the consistent part-attention (CPA) module, which extracts spatiotemporally consistent and well-aligned attentive parts. Sub-parts derived from CPA provide temporally consistent semantic information, solving misalignment problems in videos due to occlusion or inaccurate detection, and maximize the efficiency of aggregation through uniform partial information. To enhance the diversity of spatial and temporal cues, we introduce the Multi-Attention Part Augmentation (MA-PA) block, which incorporates fine parts at various granular levels, and the Long-/Short-term Temporal Augmentation (LS-TA) block, designed to capture both long- and short-term temporal relations. Using densely separated part cues, ST-MGA fully exploits and aggregates the spatiotemporal multi-granular patterns by comparing relations between parts and scales. In the experiments, the proposed ST-MGA achieves state-of-the-art performance on several video-based ReID benchmarks (i.e., MARS, DukeMTMC-VideoReID, and LS-VID).
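The aggregation stage described above is not given in code form in the abstract; as a hedged illustration of the general idea of attention-weighted temporal aggregation (a simplification, not the authors' ST-MGA implementation), softmax-weighted pooling of per-frame features might look like:

```python
import math

def temporal_attention_pool(frame_feats, scores):
    """Aggregate per-frame feature vectors with softmax-normalized
    attention scores (equal scores reduce to a plain temporal mean)."""
    mx = max(scores)                       # subtract max for numerical stability
    w = [math.exp(s - mx) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    dim = len(frame_feats[0])
    return [sum(w[t] * frame_feats[t][k] for t in range(len(frame_feats)))
            for k in range(dim)]
```

With equal scores this is a temporal mean; a frame with a much higher score dominates the aggregated feature.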

3.
Entropy (Basel) ; 26(6)2024 May 21.
Article in English | MEDLINE | ID: mdl-38920445

ABSTRACT

To address challenges related to the inadequate representation and inaccurate discrimination of pedestrian attributes, we propose a novel method for person re-identification that leverages global feature learning and classification optimization. Specifically, this approach integrates a Normalization-based Channel Attention Module into the ResNet50 backbone, utilizing a scaling factor to prioritize and enhance key pedestrian feature information. Furthermore, dynamic activation functions are employed to adaptively modulate the parameters of ReLU based on the input convolutional feature maps, thereby bolstering the nonlinear expression capabilities of the network model. By incorporating ArcFace loss into the cross-entropy loss, the supervised model is trained to learn pedestrian features that exhibit significant inter-class variance while maintaining tight intra-class coherence. The evaluation of the enhanced model on two popular datasets, Market1501 and DukeMTMC-reID, reveals improvements in Rank-1 accuracy of 1.28% and 1.4%, respectively, along with corresponding gains in mean average precision (mAP) of 1.93% and 1.84%. These findings indicate that the proposed model is capable of extracting more robust pedestrian features, enhancing feature discriminability, and ultimately achieving superior recognition accuracy.
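The ArcFace term mentioned above adds an angular margin to the target-class logit before softmax cross-entropy, pushing same-identity features closer on the hypersphere. A minimal plain-Python sketch (not the paper's implementation; `s` and `m` are typical ArcFace hyperparameters, not necessarily those used here):

```python
import math

def arcface_logits(cos_sims, label, s=30.0, m=0.50):
    """Scale cosine similarities by s and add angular margin m to the
    target class: s*cos(theta + m) instead of s*cos(theta)."""
    out = []
    for j, c in enumerate(cos_sims):
        if j == label:
            theta = math.acos(max(-1.0, min(1.0, c)))  # clamp for acos domain
            out.append(s * math.cos(theta + m))
        else:
            out.append(s * c)
    return out

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one sample."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    return -math.log(exps[label] / sum(exps))
```

Because the margin shrinks the target-class logit, the loss stays high until the feature is well inside its class region, which is what yields the inter-class/intra-class behaviour described above.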

4.
Entropy (Basel) ; 26(8)2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39202151

ABSTRACT

To minimize the disparity between the visible and infrared modalities and enhance pedestrian feature representation, a cross-modality person re-identification method is proposed that integrates modality generation and feature enhancement. Specifically, a lightweight network is used for dimension reduction and augmentation of visible images, and intermediate modalities are generated to bridge the gap between visible and infrared images. The Convolutional Block Attention Module is embedded into the ResNet50 backbone network to selectively emphasize key features sequentially from both the channel and spatial dimensions. Additionally, the Gradient Centralization algorithm is introduced into the Stochastic Gradient Descent optimizer to accelerate convergence and improve the generalization capability of the network model. Experimental results on the SYSU-MM01 and RegDB datasets demonstrate that our improved network model achieves significant performance gains, with increases in Rank-1 accuracy of 7.12% and 6.34%, as well as improvements in mAP of 4.00% and 6.05%, respectively.
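Gradient Centralization, referenced above, simply removes the mean from each weight row's gradient before the SGD update. A minimal sketch of the general GC idea (an assumed form for illustration, not this paper's exact code):

```python
def centralize_gradients(grads):
    """grads: one flat gradient list per filter/row of a weight matrix.
    GC subtracts each row's mean, so every centralized row sums to zero."""
    out = []
    for g in grads:
        mu = sum(g) / len(g)
        out.append([x - mu for x in g])
    return out

def sgd_step(weights, grads, lr=0.1):
    """Plain SGD update applied after centralizing the gradients."""
    cg = centralize_gradients(grads)
    return [[w - lr * g for w, g in zip(wr, gr)]
            for wr, gr in zip(weights, cg)]
```

Constraining each gradient row to zero mean acts as an implicit regularizer on the weight space, which is the mechanism behind the faster convergence claimed above.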

5.
Sensors (Basel) ; 23(2)2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36679571

ABSTRACT

Person re-identification (Re-ID) plays an important role in the search for missing people and the tracking of suspects. Person re-identification based on deep learning has made great progress in recent years, and the use of the pedestrian contour feature has also received attention. In this study, we found that the pedestrian contour feature is insufficiently represented by CNNs. On this basis, to improve the recognition performance of the Re-ID network, we propose a contour information extraction module (CIEM) and a contour information embedding method so that the network can focus on more contour information. Our method is competitive in the experiments: on the Market1501 dataset, mAP reached 83.8% and Rank-1 reached 95.1%, while on the DukeMTMC-reID dataset, mAP reached 73.5% and Rank-1 reached 86.8%. The experimental results show that adding contour information to the network can improve the recognition rate and that good contour features play an important role in Re-ID research.


Subject(s)
Information Storage and Retrieval; Pedestrians; Humans; Recognition, Psychology; Records
6.
Sensors (Basel) ; 23(17)2023 Aug 24.
Article in English | MEDLINE | ID: mdl-37687838

ABSTRACT

The idea of the person re-identification (Re-ID) task is to find the person depicted in the query image among other images obtained from different cameras. Algorithms solving this task have important practical applications, such as illegal-action prevention and searching for missing persons through a smart city's video surveillance. In most of the papers devoted to this problem, the authors propose complex algorithms to achieve better person Re-ID quality. Some of these methods cannot be used in practice due to technical limitations. In this paper, we propose several approaches that can be used in almost all popular modern re-identification algorithms to improve the quality of the solution while adding practically no computational complexity. In real-world data, bad images can be fed into the input of the Re-ID algorithm; therefore, a new Filter Module is proposed in this paper, designed to pre-filter input data before feeding it to the main re-identification algorithm. The Filter Module improves the quality of the baseline by 2.6% according to the Rank1 metric and 3.4% according to the mAP metric on the Market-1501 dataset. Furthermore, a fully automated data collection strategy from surveillance cameras for self-supervised pre-training is proposed in order to increase the generality of neural networks on real-world data. The use of self-supervised pre-training on the data collected using the proposed strategy improves the quality of cross-domain upper-body Re-ID on the DukeMTMC-reID dataset by 1.0% according to the Rank1 and mAP metrics.
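The abstract does not specify how the Filter Module decides which images are "bad"; one common, cheap stand-in for such a pre-filter is the variance of a Laplacian response, which flags blurry frames (a hypothetical illustration, not the authors' module):

```python
def laplacian_variance(img, h, w):
    """img: flat row-major grayscale list. Returns the variance of a
    4-neighbour Laplacian response; low variance suggests a blurry frame."""
    vals = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            lap = (img[(r - 1) * w + c] + img[(r + 1) * w + c]
                   + img[r * w + c - 1] + img[r * w + c + 1]
                   - 4 * img[r * w + c])
            vals.append(lap)
    mu = sum(vals) / len(vals)
    return sum((v - mu) ** 2 for v in vals) / len(vals)

def keep_frame(img, h, w, thresh=1.0):
    """Pre-filter: drop frames whose sharpness falls below thresh."""
    return laplacian_variance(img, h, w) >= thresh
```

A flat (featureless or defocused) image scores 0 and would be discarded before reaching the main Re-ID network.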

7.
Sensors (Basel) ; 23(19)2023 Sep 28.
Article in English | MEDLINE | ID: mdl-37836968

ABSTRACT

Local feature extraction has been verified to be effective for person re-identification (re-ID) in the recent literature. However, existing methods usually rely on extracting local features from a single part of a pedestrian while neglecting the relationships among local features across different pedestrian images. As a result, local features contain limited information from one pedestrian image and cannot benefit from other pedestrian images. In this paper, we propose a novel approach named Local Relation-Aware Graph Convolutional Network (LRGCN) to learn the relationships of local features among different pedestrian images. To completely describe these relationships, we propose an overlap graph and a similarity graph. The overlap graph formulates the edge weight as the number of overlapping nodes in the node's neighborhoods so as to learn robust local features, while the similarity graph defines the edge weight as the similarity between nodes to learn discriminative local features. To propagate the information for different kinds of nodes effectively, we propose the Structural Graph Convolution (SGConv) operation. Different from traditional graph convolution operations, where all nodes share the same parameter matrix, SGConv learns different parameter matrices for the node itself and its neighbor nodes to improve the expressive power. We conduct comprehensive experiments to verify our method on four large-scale person re-ID databases, and the overall results show that LRGCN exceeds the state-of-the-art methods.
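The key point of SGConv above is that the node itself and its neighbors use different parameter matrices. A toy plain-Python sketch of that idea (shapes and structure assumed for illustration; not the paper's code):

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sgconv(X, A, W_self, W_neigh):
    """Structural graph convolution: each node transforms its own feature
    with W_self and aggregates neighbors (edge weights A[i][j]) with W_neigh."""
    out = []
    for i, x in enumerate(X):
        h = matvec(W_self, x)
        for j, xj in enumerate(X):
            if i != j and A[i][j] != 0.0:
                n = matvec(W_neigh, xj)
                h = [hi + A[i][j] * ni for hi, ni in zip(h, n)]
        out.append(h)
    return out
```

Setting `W_neigh` to zero reduces this to a per-node linear layer, which makes the role of the separate neighbor matrix easy to see.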

8.
Sensors (Basel) ; 23(3)2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36772466

ABSTRACT

Visible-infrared person re-identification (VIPR) has great potential for the intelligent transportation systems of smart cities, but it is challenging due to the huge modal discrepancy between visible and infrared images. Although visible and infrared data can appear to be two domains, VIPR is not identical to domain adaptation, which would massively eliminate modal discrepancies. Because VIPR has complete identity information on both the visible and infrared modalities, once domain adaptation is overemphasized, the discriminative appearance information in the visible and infrared domains would be drained. For that reason, we propose a novel margin-based modal adaptive learning (MMAL) method for VIPR in this paper. On each domain, we apply triplet and label-smoothing cross-entropy functions to learn appearance-discriminative features. Between the two domains, we design a simple yet effective marginal maximum mean discrepancy (M3D) loss function to avoid excessive suppression of modal discrepancies and thereby protect the features' discriminative ability on each domain. As a result, our MMAL method can learn modal-invariant yet appearance-discriminative features for improving VIPR. The experimental results show that our MMAL method achieves state-of-the-art VIPR performance; e.g., on the RegDB dataset in the visible-to-infrared retrieval mode, the rank-1 accuracy is 93.24% and the mean average precision is 83.77%.
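The M3D idea above, hinging an MMD penalty at a margin so modal discrepancies are reduced but not suppressed to zero, can be illustrated with a linear-kernel MMD. This is a hedged toy version (the paper's exact kernel and margin are not specified here):

```python
def marginal_mmd(X, Y, margin=0.1):
    """Linear-kernel MMD^2 between two feature sets, hinged at a margin so
    small modal discrepancies are not pushed all the way to zero."""
    d = len(X[0])
    mean_x = [sum(x[k] for x in X) / len(X) for k in range(d)]
    mean_y = [sum(y[k] for y in Y) / len(Y) for k in range(d)]
    mmd2 = sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))
    return max(0.0, mmd2 - margin)
```

Once the two modal means are within the margin of each other, the loss (and hence its gradient) vanishes, which is the "avoid excessive suppression" behaviour described above.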

9.
Sensors (Basel) ; 23(11)2023 May 23.
Article in English | MEDLINE | ID: mdl-37299715

ABSTRACT

Visible-infrared person re-identification aims to solve the matching problem between cross-camera and cross-modal person images. Existing methods strive for better cross-modal alignment but often neglect the critical importance of feature enhancement for achieving better performance. Therefore, we propose an effective method that combines both modal alignment and feature enhancement. Specifically, we introduce Visible-Infrared Modal Data Augmentation (VIMDA) for visible images to improve modal alignment. Margin MMD-ID Loss is also used to further enhance modal alignment and optimize model convergence. Then, we propose a Multi-Grain Feature Extraction (MGFE) structure for feature enhancement to further improve recognition performance. Extensive experiments have been carried out on SYSU-MM01 and RegDB. The results indicate that our method outperforms the current state-of-the-art methods for visible-infrared person re-identification. Ablation experiments verified the effectiveness of the proposed method.


Subject(s)
Edible Grain; Recognition, Psychology; Humans
10.
Sensors (Basel) ; 23(7)2023 Apr 02.
Article in English | MEDLINE | ID: mdl-37050738

ABSTRACT

Person re-identification (Re-ID) is a method for identifying the same individual across several non-overlapping cameras. Person Re-ID has been successfully applied to an assortment of computer vision applications. Due to the emergence of deep learning algorithms, person Re-ID techniques, which often involve an attention module, have gained remarkable success. Moreover, people's traits are mostly similar, which makes distinguishing between them complicated. This paper presents a novel approach for person Re-ID by introducing a multi-part feature network that combines the position attention module (PAM) and efficient channel attention (ECA). The goal is to enhance the accuracy and robustness of person Re-ID methods through the use of attention mechanisms. The proposed multi-part feature network employs the PAM to extract robust and discriminative features by utilizing channel, spatial, and temporal context information. The PAM learns the spatial interdependencies of features and extracts a greater variety of contextual information from local elements, hence enhancing their capacity for representation. The ECA captures local cross-channel interaction and reduces the model's complexity while maintaining accuracy. Extensive experiments were conducted on three publicly available person Re-ID datasets: Market-1501, DukeMTMC-reID, and CUHK03. The outcomes reveal that the suggested method outperforms existing state-of-the-art methods: rank-1 accuracy reaches 95.93%, 89.77%, and 73.21% on Market-1501, DukeMTMC-reID, and CUHK03, respectively, and reaches 96.41%, 94.08%, and 91.21% after re-ranking. The proposed method demonstrates high generalization capability and improves both quantitative and qualitative performance. Finally, the proposed multi-part feature network, with the combination of PAM and ECA, offers a promising solution for person Re-ID by combining the benefits of temporal, spatial, and channel information. The results of this study evidence the effectiveness and potential of the suggested method for person Re-ID in computer vision applications.


Subject(s)
Deep Learning; Humans; Algorithms; Phenotype
11.
Sensors (Basel) ; 23(10)2023 May 19.
Article in English | MEDLINE | ID: mdl-37430819

ABSTRACT

Pedestrian tracking is a challenging task in the area of visual object tracking research, and it is a vital component of various vision-based applications such as surveillance systems, human-following robots, and autonomous vehicles. In this paper, we propose a single pedestrian tracking (SPT) framework for identifying each instance of a person across all video frames through a tracking-by-detection paradigm that combines deep learning and metric learning-based approaches. The SPT framework comprises three main modules: detection, re-identification, and tracking. Our contribution is a significant improvement in the results, achieved by designing two compact metric learning-based models using a Siamese architecture in the pedestrian re-identification module and by combining one of the most robust re-identification models for data association with the pedestrian detector in the tracking module. We carried out several analyses to evaluate the performance of our SPT framework for single pedestrian tracking in videos. The results of the re-identification module validate that our two proposed re-identification models surpass existing state-of-the-art models, with increased accuracies of 79.2% and 83.9% on the large dataset and 92% and 96% on the small dataset. Moreover, the proposed SPT tracker, along with six state-of-the-art (SOTA) tracking models, has been tested on various indoor and outdoor video sequences. A qualitative analysis considering six major environmental factors verifies the effectiveness of our SPT tracker under illumination changes, appearance variations due to pose changes, changes in target position, and partial occlusions. In addition, quantitative analysis based on experimental results also demonstrates that our proposed SPT tracker outperforms the GOTURN, CSRT, KCF, and SiamFC trackers with a success rate of 79.7% while beating the DiamSiamRPN, SiamFC, CSRT, GOTURN, and SiamMask trackers with an average of 18 tracking frames per second.
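A core step in the tracking-by-detection paradigm described above is associating each new detection with the tracked identity via embedding similarity. A minimal greedy-association sketch (illustrative only, with an assumed cosine-similarity metric; not the SPT implementation):

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv + 1e-12)

def associate(track_feat, det_feats, thresh=0.5):
    """Return the index of the detection most similar to the tracked
    person's embedding, or None if nothing clears the threshold."""
    best, best_sim = None, thresh
    for i, d in enumerate(det_feats):
        s = cosine(track_feat, d)
        if s > best_sim:
            best, best_sim = i, s
    return best
```

Returning `None` when no detection clears the threshold is what lets a tracker coast through occlusions instead of latching onto an impostor.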

12.
Sensors (Basel) ; 23(2)2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36679613

ABSTRACT

Recently, person-following robots have been increasingly used in many real-world applications, and they require robust and accurate person identification for tracking. Recent works proposed using re-identification metrics for identifying the target person; however, these metrics suffer from poor generalization and from impostors in the nonlinear, multi-modal world. This work learns a domain-generic person re-identification metric to resolve real-world challenges and to identify the target person undergoing appearance changes when moving across different indoor and outdoor environments or domains. Our generic metric takes advantage of a novel attention mechanism to learn deep cross-representations that address pose, viewpoint, and illumination variations, while jointly tackling impostors and the style variations the target person randomly undergoes in various indoor and outdoor domains; thus, our generic metric attains higher recognition accuracy for target person identification in a complex multi-modal open-set world, attaining 80.73% and 64.44% Rank-1 identification in the multi-modal close-set PRID and VIPeR domains, respectively.


Subject(s)
Biometric Identification; Robotics; Humans; Pattern Recognition, Automated; Photic Stimulation; Benchmarking
13.
Entropy (Basel) ; 25(8)2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37628184

ABSTRACT

Person re-identification is a technology used to identify individuals across different cameras. Existing methods involve extracting features from an input image and using a single feature for matching. However, these features often provide a biased description of the person. To address this limitation, this paper introduces a new method called the Dual Descriptor Feature Enhancement (DDFE) network, which aims to emulate the multi-perspective observation abilities of humans. The DDFE network uses two independent sub-networks to extract descriptors from the same person image. These descriptors are subsequently combined to create a comprehensive multi-view representation, resulting in a significant improvement in recognition performance. To further enhance the discriminative capability of the DDFE network, a carefully designed training strategy is employed. Firstly, the CurricularFace loss is introduced to enhance the recognition accuracy of each sub-network. Secondly, the DropPath operation is incorporated to introduce randomness during sub-network training, promoting diversity between the descriptors. Additionally, an Integration Training Module (ITM) is devised to enhance the discriminability of the integrated features. Extensive experiments are conducted on the Market1501 and MSMT17 datasets. On the Market1501 dataset, the DDFE network achieves an mAP of 91.6% and a Rank1 of 96.1%; on the MSMT17 dataset, the network achieves an mAP of 69.9% and a Rank1 of 87.5%. These outcomes outperform most SOTA methods, highlighting the significant advancement and effectiveness of the DDFE network.

14.
Sensors (Basel) ; 22(16)2022 Aug 21.
Article in English | MEDLINE | ID: mdl-36016054

ABSTRACT

Person re-identification is essential to intelligent video analytics, whose results affect downstream tasks such as behavior and event analysis. However, most existing models only consider the accuracy, rather than the computational complexity, which is also an aspect to consider in practical deployment. We note that self-attention is a powerful technique for representation learning. It can work with convolution to learn more discriminative feature representations for re-identification. We propose an improved multi-scale feature learning structure, DM-OSNet, with better performance than the original OSNet. Our DM-OSNet replaces the 9×9 convolutional stream in OSNet with multi-head self-attention. To maintain model efficiency, we use double-layer multi-head self-attention to reduce the computational complexity of the original multi-head self-attention. The computational complexity is reduced from the original O((H×W)2) to O(H×W×G2). To further improve the model performance, we use SpCL to perform unsupervised pre-training on the large-scale unlabeled pedestrian dataset LUPerson. Finally, our DM-OSNet achieves an mAP of 87.36%, 78.26%, 72.96%, and 57.13% on the Market1501, DukeMTMC-reID, CUHK03, and MSMT17 datasets.


Subject(s)
Neural Networks, Computer; Pedestrians; Humans; Learning; Pattern Recognition, Automated/methods
15.
Sensors (Basel) ; 22(18)2022 Sep 15.
Article in English | MEDLINE | ID: mdl-36146326

ABSTRACT

Unsupervised person re-identification has attracted a lot of attention due to its strong potential to adapt to new environments without manual annotation, but learning to recognise features in disjoint camera views without annotation is still challenging. Existing studies tend to ignore the optimisation of feature extractors in the feature-extraction stage of this task, while the use of traditional losses in the unsupervised learning stage severely affects the performance of the model. Additionally, the contrast learning frameworks used in the latest methods rely on only a single cluster centre or on all instance features, without considering the correctness and diversity of the samples within a class, which affects the training of the model. Therefore, in this paper, we design an unsupervised person re-identification framework called attention-guided fine-grained feature network and symmetric contrast learning (AFF_SCL) to improve the two stages of the unsupervised person re-identification task. AFF_SCL learns recognition features through two key modules, namely the Attention-guided Fine-grained Feature network (AFF) and the Symmetric Contrast Learning module (SCL). Specifically, the attention-guided fine-grained feature network enhances the network's ability to discriminate pedestrians by performing further attention operations on fine-grained features to obtain detailed pedestrian features. The symmetric contrast learning module replaces the traditional loss function to exploit the information potential of multiple samples and maintains the stability and generalisation capability of the model. The performance of the USL and UDA methods is tested on the Market-1501 and DukeMTMC-reID datasets; the results demonstrate that the proposed method outperforms several existing methods, indicating the superiority of the framework.


Subject(s)
Biometric Identification; Pedestrians; Attention; Biometric Identification/methods; Humans; Image Processing, Computer-Assisted/methods; Maintenance
16.
Sensors (Basel) ; 22(24)2022 Dec 15.
Article in English | MEDLINE | ID: mdl-36560221

ABSTRACT

Person re-identification (re-ID) is one of the essential tasks for modern visual intelligent systems to identify a person from images or videos captured at different times, viewpoints, and spatial positions. In fact, it is easy to make an incorrect estimate for person re-ID in the presence of illumination change, low resolution, and pose differences. To provide a robust and accurate prediction, machine learning techniques are extensively used nowadays. However, learning-based approaches often face difficulties in data imbalance and distinguishing a person from others having strong appearance similarity. To improve the overall re-ID performance, false positives and false negatives should be part of the integral factors in the design of the loss function. In this work, we refine the well-known AGW baseline by incorporating a focal Tversky loss to address the data imbalance issue and facilitate the model to learn effectively from the hard examples. Experimental results show that the proposed re-ID method reaches rank-1 accuracy of 96.2% (with mAP: 94.5) and rank-1 accuracy of 93% (with mAP: 91.4) on Market1501 and DukeMTMC datasets, respectively, outperforming the state-of-the-art approaches.
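The focal Tversky loss mentioned above weights false negatives and false positives asymmetrically (via α and β) and focuses training on hard examples via the exponent γ. A minimal sketch for binary labels and probabilities (the α, β, γ values are common defaults, not necessarily the paper's; the paper applies the loss inside the AGW re-ID baseline):

```python
def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75):
    """Soft Tversky index over predicted probabilities, raised to gamma.
    alpha > beta penalizes false negatives more than false positives."""
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    ti = tp / (tp + alpha * fn + beta * fp + 1e-7)
    return (1.0 - ti) ** gamma
```

Confident correct predictions drive the loss toward zero, while hard (wrong) examples keep a large loss, which is how the imbalance between easy and hard samples is addressed.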


Subject(s)
Intelligence; Humans; Lighting; Machine Learning; Videotape Recording
17.
Sensors (Basel) ; 22(19)2022 Sep 30.
Article in English | MEDLINE | ID: mdl-36236528

ABSTRACT

Pedestrian origin-destination (O-D) estimates, which record traffic flows between origins and destinations, are essential for the management of pedestrian facilities, including pedestrian flow simulation in the planning phase and crowd control in the operation phase. However, current O-D data collection techniques such as surveys; mobile sensing using GPS, Wi-Fi, and Bluetooth; and smart card data have the disadvantage that they are either time-consuming and costly, or cannot provide complete O-D information for pedestrian facilities without entrances and exits or for pedestrian flow inside the facilities. Due to the full coverage of CCTV cameras and the huge potential of image processing techniques, we address the challenges of pedestrian O-D estimation and propose an image-based O-D estimation framework. By identifying the same person in disjoint camera views, the O-D trajectory of each identity can be accurately generated. Then, state-of-the-art deep neural networks (DNNs) for person re-ID at different congestion levels were compared and improved. Finally, an O-D matrix based on trajectories was generated and the residence time was calculated, which provides recommendations for pedestrian facility improvement. The factors that affect the accuracy of the framework are discussed in this paper, which we believe could provide new insights and stimulate further research into the application of the Internet of cameras to intelligent transport infrastructure management.
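Once each identity's trajectory through the facility is recovered, building the O-D matrix reduces to counting first-zone/last-zone pairs. A hedged toy sketch (zone names and the trajectory structure are assumed for illustration):

```python
def od_matrix(trajectories, zones):
    """Build an origin-destination count matrix: each trajectory contributes
    one trip from its first zone to its last zone."""
    idx = {z: i for i, z in enumerate(zones)}
    M = [[0] * len(zones) for _ in zones]
    for traj in trajectories:
        M[idx[traj[0]]][idx[traj[-1]]] += 1
    return M
```

Row sums then give total departures per origin and column sums give total arrivals per destination, the quantities a flow simulation would consume.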


Subject(s)
Pedestrians; Computer Simulation; Crowding; Humans; Image Processing, Computer-Assisted; Neural Networks, Computer
18.
Sensors (Basel) ; 22(23)2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36502088

ABSTRACT

To address the problem of inadequate feature extraction due to factors such as occlusion and illumination in person re-identification tasks, this paper proposes a person re-identification model that combines cross-consistency learning and multi-feature fusion. The attention mechanism and the mixed pooling module were first embedded in the residual network so that the model adaptively focuses on the more valid information in person images. Secondly, the dataset was randomly divided into two categories according to camera perspective, and a feature classifier was trained for each of the two types of datasets. Then, the two classifiers with specific knowledge were used to guide the model to extract features unrelated to the camera perspective for the two types of datasets, so that the obtained image features were endowed with domain invariance and the differences in perspective, pose, background, and other related information across images were alleviated. The multi-level features were then fused through a feature pyramid to attend to the more critical information in the image. Finally, a combination of Cosine Softmax loss, triplet loss, and cluster center loss was proposed to train the model, addressing the differences of the multiple losses in the optimization space. The Rank-1 accuracy of the proposed model reached 95.9% and 89.7% on the Market-1501 and DukeMTMC-reID datasets, respectively. The results indicate that the proposed model has good feature extraction capability.


Subject(s)
Knowledge; Learning; Humans; Lighting
19.
Sensors (Basel) ; 22(15)2022 Aug 05.
Article in English | MEDLINE | ID: mdl-35957414

ABSTRACT

Currently, the importance of autonomous operating devices is rising with the increasing number of applications that run on robotic platforms or self-driving cars. The context of social robotics assumes that robotic platforms operate autonomously in environments where people perform their daily activities. The ability to re-identify the same people through a sequence of images is a critical component for meaningful human-robot interactions. Considering the quick reactions required by a self-driving car for safety considerations, accurate real-time tracking and people trajectory prediction are mandatory. In this paper, we introduce a real-time people re-identification system based on a trajectory prediction method. We tackled the problem of trajectory prediction by introducing a system that combines semantic information from the environment with social influence from the other participants in the scene in order to predict the motion of each individual. We evaluated the system considering two possible case studies, social robotics and autonomous driving. In the context of social robotics, we integrated the proposed re-identification system as a module into the AMIRO framework that is designed for social robotic applications and assistive care scenarios. We performed multiple experiments in order to evaluate the performance of our proposed method, considering both the trajectory prediction component and the person re-identification system. We assessed the behaviour of our method on existing datasets and on real-time acquired data to obtain a quantitative evaluation of the system and a qualitative analysis. We report an improvement of over 5% for the MOTA metric when comparing our re-identification system with the existing module, on both evaluation scenarios, social robotics and autonomous driving.


Subject(s)
Robotics; Humans; Motion; Robotics/methods
20.
Sensors (Basel) ; 22(8)2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35459030

ABSTRACT

Partial occlusion and background clutter in camera video surveillance affect the accuracy of video-based person re-identification (re-ID). To address these problems, we propose a person re-ID method based on random erasure of frame sampling and temporal weight aggregation of mutual information of partial and global features. First, for the case in which the target person is interfered with or partially occluded, the frame sampling-random erasure (FSE) method is used for data enhancement to effectively alleviate the occlusion problem, improve the generalization ability of the model, and match persons more accurately. Second, to further improve the re-ID accuracy of video-based persons and learn more discriminative feature representations, we use a ResNet-50 network to extract global and partial features and fuse these features to obtain frame-level features. In the time dimension, based on a mutual information-temporal weight aggregation (MI-TWA) module, the partial features are added according to different weights while the global features are added with equal weights, and the results are connected to output sequence features. The proposed method is extensively tested on three public video datasets, MARS, DukeMTMC-VideoReID, and PRID-2011; the mean average precision (mAP) values are 82.4%, 94.1%, and 95.3% and the Rank-1 values are 86.4%, 94.8%, and 95.2%, respectively.
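The random-erasure half of the FSE step above can be illustrated with a minimal single-channel version of the generic random-erasing augmentation (a sketch of the general technique, not the paper's exact FSE code; the rectangle size and fill value are assumptions):

```python
import random

def random_erase(img, h, w, scale=0.25, p=1.0, fill=0.0):
    """Erase a random (scale*h) x (scale*w) rectangle from a flat row-major
    image with probability p, returning a new list (input is untouched)."""
    if random.random() > p:
        return img[:]
    eh = max(1, int(h * scale))
    ew = max(1, int(w * scale))
    top = random.randint(0, h - eh)
    left = random.randint(0, w - ew)
    out = img[:]
    for r in range(top, top + eh):
        for c in range(left, left + ew):
            out[r * w + c] = fill
    return out
```

Erasing random patches during training simulates partial occlusion, so the model learns not to depend on any single body region.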


Subject(s)
Video Recording; Humans; Video Recording/methods