Results 1 - 20 of 29
1.
Neural Netw ; 179: 106539, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39089149

ABSTRACT

Significant progress has been achieved in multi-object tracking (MOT) through the evolution of detection and re-identification (ReID) techniques. Despite these advancements, accurately tracking objects in scenarios with homogeneous appearance and heterogeneous motion remains a challenge. This challenge arises from two main factors: the insufficient discriminability of ReID features and the predominant utilization of linear motion models in MOT. In this context, we introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor that relies solely on object trajectory information. This predictor comprehensively integrates two levels of granularity in motion features to enhance the modeling of temporal dynamics and facilitate precise future motion prediction for individual objects. Specifically, the proposed approach adopts a self-attention mechanism to capture token-level information and a Dynamic MLP layer to model channel-level features. MotionTrack is a simple, online tracking approach. Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as Dancetrack and SportsMOT, characterized by highly complex object motion.
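To illustrate the limitation of linear motion models mentioned above, here is a minimal sketch of a constant-velocity predictor. It is not MotionTrack's learnable predictor; the function names and the toy trajectories are assumptions made only for illustration.

```python
def constant_velocity_predict(track):
    """Extrapolate the next (x, y) center from the last two observations."""
    (x_prev, y_prev), (x_last, y_last) = track[-2], track[-1]
    return (2 * x_last - x_prev, 2 * y_last - y_prev)


def l1_error(predicted, actual):
    """Sum of absolute coordinate differences between two points."""
    return abs(predicted[0] - actual[0]) + abs(predicted[1] - actual[1])


straight = [(0, 0), (1, 1), (2, 2)]  # steady motion: the linear model fits
turning = [(0, 0), (1, 0), (1, 1)]   # sharp turn: the linear model overshoots

print(constant_velocity_predict(straight))
# Hypothetical true next position after the turn is (0, 1).
print(l1_error(constant_velocity_predict(turning), (0, 1)))
```

The second call shows a non-zero error for the turning trajectory, which is the kind of heterogeneous motion a learned predictor is meant to handle.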


Subject(s)
Motion , Humans , Neural Networks, Computer , Algorithms , Machine Learning , Motion Perception/physiology
2.
Sensors (Basel) ; 24(11)2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38894408

ABSTRACT

Most logit-based knowledge distillation methods transfer soft labels from the teacher model to the student model via Kullback-Leibler divergence based on softmax, an exponential normalization function. However, the exponential nature of softmax tends to prioritize the largest class (the target class) while neglecting smaller ones (the non-target classes), so the significance of the non-target classes is overlooked. To address this issue, we propose Non-Target-Class-Enhanced Knowledge Distillation (NTCE-KD) to amplify the role of non-target classes in terms of both magnitude and diversity. Specifically, we present a magnitude-enhanced Kullback-Leibler (MKL) divergence multi-shrinking the target class to enhance the impact of non-target classes in terms of magnitude. Additionally, to enrich the diversity of non-target classes, we introduce a diversity-based data augmentation strategy (DDA), further enhancing overall performance. Extensive experimental results on the CIFAR-100 and ImageNet-1k datasets demonstrate that non-target classes are of great significance and that our method achieves state-of-the-art performance across a wide range of teacher-student pairs.
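To make the softmax-based objective concrete, the sketch below computes the standard temperature-softened Kullback-Leibler distillation loss and prints how much probability mass the target class takes. It shows the baseline that the abstract criticizes, not NTCE-KD or its MKL divergence; the logits, the temperature, and the function names are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Exponential normalization of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_kl_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2


teacher = [9.0, 3.0, 2.0, 1.0]  # hypothetical logits; class 0 is the target
student = [6.0, 4.0, 1.0, 2.0]  # hypothetical student logits

probs = softmax(teacher)
print("teacher target-class probability:", round(probs[0], 4))
print("teacher non-target mass:", round(1.0 - probs[0], 4))
print("distillation loss:", round(kd_kl_loss(teacher, student), 4))
```

With these logits almost all of the teacher's probability mass sits on the target class, so the non-target terms contribute little to the loss unless the temperature is raised.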

3.
Article in English | MEDLINE | ID: mdl-38691433

ABSTRACT

The training process of a domain generalization (DG) model involves utilizing one or more interrelated source domains to attain optimal performance on an unseen target domain. Existing DG methods often use auxiliary networks or require high computational costs to improve the model's generalization ability by incorporating a diverse set of source domains. In contrast, this work proposes a method called Smooth-Guided Implicit Data Augmentation (SGIDA) that operates in the feature space to capture the diversity of source domains. To amplify the model's generalization capacity, a distance metric learning (DML) loss function is incorporated. Additionally, rather than depending on deep features, the suggested approach employs logits produced from cross entropy (CE) losses with infinite augmentations. A theoretical analysis shows that logits are effective in estimating distances defined on original features, and the proposed approach is thoroughly analyzed to provide a better understanding of why logits are beneficial for DG. Moreover, to increase the diversity of the source domain, a sampling-based method called smooth is introduced to obtain semantic directions from interclass relations. The effectiveness of the proposed approach is demonstrated through extensive experiments on widely used DG, object detection, and remote sensing datasets, where it achieves significant improvements over existing state-of-the-art methods across various backbone networks.

4.
Article in English | MEDLINE | ID: mdl-38625773

ABSTRACT

Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by comparing our model generalization capabilities on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.

5.
IEEE Trans Pattern Anal Mach Intell ; 46(9): 6341-6354, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38546996

ABSTRACT

Given data with noisy labels, over-parameterized deep networks tend to overfit mislabeled data, resulting in poor generalization. The memorization effect of deep networks shows that although the networks can memorize all noisy data, they first memorize clean training data and then gradually memorize mislabeled training data. A simple and effective method that exploits the memorization effect to combat noisy labels is early stopping. However, early stopping cannot distinguish the memorization of clean data from that of mislabeled data, so the network still inevitably overfits mislabeled data in the early training stage. In this paper, to decouple the memorization of clean data and mislabeled data, and to further reduce the side effect of mislabeled data, we perform additive decomposition on network parameters. Namely, all parameters are additively decomposed into two groups, i.e., parameters w are decomposed as w=σ+γ. The parameters σ are then considered to memorize clean data, while the parameters γ are considered to memorize mislabeled data. Benefiting from the memorization effect, the updates of the parameters σ are encouraged to fully memorize clean data in early training and are then discouraged as the number of training epochs increases, to reduce interference from mislabeled data. The updates of the parameters γ follow the opposite pattern. In testing, only the parameters σ are employed to enhance generalization. Extensive experiments on both simulated and real-world benchmarks confirm the superior performance of our method.
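The decomposition w=σ+γ can be sketched as follows. The abstract does not say how the updates are encouraged or discouraged, so the linear epoch schedule, the learning rate, and all function names here are assumptions chosen only to show the idea of shifting updates from σ to γ over time.

```python
def decomposed_step(sigma, gamma, grad, epoch, total_epochs, lr=0.1):
    """One gradient step on w = sigma + gamma with epoch-dependent scaling.

    Because w = sigma + gamma, the gradient of the loss with respect to
    sigma and to gamma both equal the gradient with respect to w.
    The linear schedule is an illustrative assumption, not the paper's rule.
    """
    progress = epoch / max(total_epochs - 1, 1)  # 0.0 at start, 1.0 at end
    sigma_scale = 1.0 - progress  # sigma learns mostly in early epochs
    gamma_scale = progress        # gamma learns mostly in late epochs
    new_sigma = [s - lr * sigma_scale * g for s, g in zip(sigma, grad)]
    new_gamma = [c - lr * gamma_scale * g for c, g in zip(gamma, grad)]
    return new_sigma, new_gamma


def effective_weights(sigma, gamma, training=True):
    """Training uses w = sigma + gamma; testing keeps only sigma."""
    if training:
        return [s + c for s, c in zip(sigma, gamma)]
    return list(sigma)


sigma, gamma = [0.0, 0.0], [0.0, 0.0]
grad = [1.0, -2.0]  # hypothetical gradient with respect to w
early_sigma, early_gamma = decomposed_step(sigma, gamma, grad, 0, 10)
late_sigma, late_gamma = decomposed_step(sigma, gamma, grad, 9, 10)
print(early_sigma, early_gamma)
print(late_sigma, late_gamma)
```

In the first epoch only σ moves and in the last epoch only γ moves, while `effective_weights(..., training=False)` discards γ as the abstract describes for testing.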

6.
Neural Netw ; 170: 548-563, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38052151

ABSTRACT

Siamese tracking has witnessed tremendous progress as a tracking paradigm. However, its default box estimation pipeline still faces a crucial inconsistency issue: the bounding box selected by its classification score does not always have the best overlap with the ground truth, which harms performance. To this end, we explore a novel, simple tracking paradigm based on intersection over union (IoU) value prediction. To bypass this inconsistency issue, we first propose a concise target state predictor termed IoUformer, which, instead of using the default box estimation pipeline, directly predicts the IoU values related to tracking performance metrics. In detail, it extends the long-range dependency modeling ability of the transformer to jointly grasp target-aware interactions between the target template and the search region, as well as search sub-region interactions, thus neatly unifying global semantic interaction and target state prediction. Thanks to this joint strength, IoUformer can predict reliable IoU values that are near-linear with the ground truth, which paves a safe way for our new IoU-based siamese tracking paradigm. Since it is non-trivial to explore this paradigm with satisfactory efficacy and portability, we offer the respective network components and two alternative localization methods. Experimental results show that our IoUformer-based tracker achieves promising results with less training data. As for applicability, it can also serve as a refinement module that consistently boosts existing advanced trackers.
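The inconsistency described above can be shown with a small sketch: the candidate with the highest classification score is not necessarily the one with the highest IoU against the ground truth. The boxes and scores are hypothetical, and the code computes IoU from known boxes rather than predicting it as IoUformer does.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def best_index(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=values.__getitem__)


ground_truth = (0.0, 0.0, 10.0, 10.0)
candidates = [(2.0, 2.0, 12.0, 12.0), (1.0, 0.0, 10.0, 10.0)]
cls_scores = [0.9, 0.7]  # hypothetical classification scores

ious = [iou(ground_truth, box) for box in candidates]
print("picked by classification:", best_index(cls_scores))
print("picked by IoU:", best_index(ious))
```

Ranking candidates by an IoU estimate instead of the classification score selects the second box here, which overlaps the ground truth far better.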


Subject(s)
Benchmarking , Semantics
7.
Micromachines (Basel) ; 14(10)2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37893343

ABSTRACT

Piezoelectric ceramic actuators utilize the inverse piezoelectric effect to generate high-frequency vibration energy and are widely used in ultrasonic energy conversion circuits. This paper presents a novel drive circuit with input-current shaping (ICS) and soft-switching features, which consists of a front AC-DC full-wave bridge rectifier and a rear DC-AC circuit combining a stacked boost converter and a half-bridge resonant inverter for driving a piezoelectric ceramic actuator. To enable ICS functionality in the proposed drive circuit, the inductor of the stacked boost converter sub-circuit is designed to operate in boundary-conduction mode (BCM). To allow the two power switches in the proposed drive circuit to achieve zero-voltage switching (ZVS), the resonant circuit of the half-bridge resonant inverter sub-circuit is designed as an inductive load. A prototype drive circuit for supplying piezoelectric ceramic actuators was successfully implemented. Experimental results obtained at 110 V input utility voltage show that a high power factor (PF > 0.97), low input current total harmonic distortion (THD < 16%), and ZVS of the power switches were achieved in the prototype drive circuit.
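As a rough consistency check on the two reported figures, the textbook relation for a load fed by a sinusoidal line voltage is PF = DPF / sqrt(1 + THD^2), where DPF is the displacement factor (cosine of the phase shift of the fundamental current). The sketch below applies that relation; the unity displacement factor is an assumption for illustration and is not a value given in the abstract.

```python
import math


def true_power_factor(thd, displacement_factor=1.0):
    """Power factor from current THD (as a fraction) and displacement factor.

    Assumes a sinusoidal supply voltage, so only the fundamental current
    component carries real power.
    """
    return displacement_factor / math.sqrt(1.0 + thd ** 2)


thd_limit = 0.16  # THD bound reported in the abstract, as a fraction
pf_at_limit = true_power_factor(thd_limit)
print("PF at 16% THD with unity displacement factor:", round(pf_at_limit, 4))
```

Under that assumption a THD of 16% by itself still permits a power factor above 0.97, so the two reported bounds are mutually compatible.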

8.
IEEE Trans Image Process ; 32: 5992-6003, 2023.
Article in English | MEDLINE | ID: mdl-37903046

ABSTRACT

Video hashing learns compact representations by mapping videos into a low-dimensional Hamming space and has achieved promising performance in large-scale video retrieval. It is challenging to effectively exploit temporal and spatial structure in an unsupervised setting. To fill this gap, this paper proposes Contrastive Transformer Hashing (CTH) for effective video retrieval. Specifically, CTH develops a bidirectional transformer autoencoder, based on which a visual reconstruction loss is proposed. CTH captures bidirectional correlations among frames more effectively than conventional unidirectional models. In addition, CTH devises a multi-modality contrastive loss to reveal the intrinsic structure among videos: it constructs inter-modality and intra-modality triplet sets so that inter-modality and intra-modality similarities are exploited simultaneously. We perform video retrieval tasks on four benchmark datasets, i.e., UCF101, HMDB51, SVW30, and FCVID, using the learned compact hash representation, and extensive empirical results demonstrate that the proposed CTH outperforms several state-of-the-art video hashing methods.
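Retrieval in Hamming space, the step that makes hashing attractive at scale, can be sketched as below. This covers only binarization and ranking; it does not implement CTH's transformer autoencoder or contrastive loss, and the feature values, video names, and sign-threshold rule are illustrative assumptions.

```python
def binarize(features):
    """Sign-threshold real-valued features into a hash code packed in an int."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(code_a, code_b):
    """Number of differing bits between two hash codes."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code, database, top_k=2):
    """Return the names of the top_k database items closest to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: hamming(query_code, item[1]))
    return [name for name, _ in ranked[:top_k]]


database = {"video_a": 0b1011, "video_b": 0b0100, "video_c": 0b1001}
query = binarize([0.3, -1.2, 0.8, 0.1])  # hypothetical learned features
print(retrieve(query, database))
```

Because the distance is an XOR followed by a bit count, comparing a query against many stored codes is cheap, which is why compact binary codes suit large-scale retrieval.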

9.
Article in English | MEDLINE | ID: mdl-37410641

ABSTRACT

Supervised person re-identification (ReID) has attracted widespread attention in the computer vision community due to its great potential in real-world applications. However, the demand for human annotation heavily limits its application, as it is costly to annotate identical pedestrians appearing in different cameras. Thus, how to reduce the annotation cost while preserving performance remains challenging and has been studied extensively. In this article, we propose a tracklet-aware co-cooperative annotators' framework to reduce the demand for human annotation. Specifically, we partition the training samples into different clusters and associate adjacent images in each cluster to produce robust tracklets, which decreases the annotation requirements significantly. In addition, to further reduce the cost, we introduce a powerful teacher model in our framework to implement the active learning strategy and select the most informative tracklets for the human annotator. In our setting, the teacher model itself also acts as an annotator to label the relatively certain tracklets. Thus, our final model can be well trained with both confident pseudo-labels and human-given annotations. Extensive experiments on three popular person ReID datasets demonstrate that our approach achieves competitive performance compared with state-of-the-art methods in both active learning and unsupervised learning (USL) settings.

10.
Neural Netw ; 165: 705-720, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37385024

ABSTRACT

Much progress has been made in siamese tracking, primarily owing to ever-larger training data. However, very little attention has been paid to the role of huge training data in learning an effective siamese tracker. In this study, we undertake an in-depth analysis of this issue from a novel optimization perspective, and observe that training data is particularly adept at background suppression, thereby refining target representation. Inspired by this insight, we present a data-free siamese tracking algorithm named SiamDF, which requires only a pre-trained backbone and no further fine-tuning on additional training data. In particular, to suppress background distractors, we separately improve the two branches of siamese tracking by retaining the pure target region as the target input with the template background removed, and by exploring an efficient inverse transformation to maintain the constant aspect ratio of the target state in the search region. In addition, we further improve the center displacement prediction of the entire backbone by eliminating its spatial stride deviations caused by convolution-like quantization operations. Our experimental results on several popular benchmarks demonstrate that SiamDF, free from both offline fine-tuning and online update, achieves impressive performance compared to well-established unsupervised and supervised tracking methods.


Subject(s)
Algorithms , Learning , Benchmarking
11.
IEEE Trans Image Process ; 32: 3338-3353, 2023.
Article in English | MEDLINE | ID: mdl-37235471

ABSTRACT

Unsupervised person re-identification is a challenging and promising task in computer vision. Recent unsupervised person re-identification methods have made great progress by training with pseudo labels. However, how to purify feature and label noise has been less explicitly studied in the unsupervised setting. To purify the feature, we take into account two types of additional features from different local views to enrich the feature representation. The proposed multi-view features are carefully integrated into our cluster contrast learning to leverage more discriminative cues that the global feature tends to ignore or bias. To purify the label noise, we propose to take advantage of the knowledge of a teacher model in an offline scheme. Specifically, we first train a teacher model from noisy pseudo labels, and then use the teacher model to guide the learning of our student model. In our setting, the student model can converge quickly under the supervision of the teacher model, which reduces the interference of the noisy labels from which the teacher model greatly suffered. After carefully handling the noise and bias in the feature learning, our purification modules are shown to be very effective for unsupervised person re-identification. Extensive experiments on two popular person re-identification datasets demonstrate the superiority of our method. In particular, our approach achieves a state-of-the-art accuracy of 85.8% mAP and 94.5% Rank-1 on the challenging Market-1501 benchmark with ResNet-50 under the fully unsupervised setting. Code is available at: https://github.com/tengxiao14/Purification_ReID.


Subject(s)
Benchmarking , Learning , Humans
12.
Neural Netw ; 163: 86-96, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37030278

ABSTRACT

Off-Policy Actor-Critic methods can effectively exploit past experiences and thus they have achieved great success in various reinforcement learning tasks. In many image-based and multi-agent tasks, attention mechanism has been employed in Actor-Critic methods to improve their sampling efficiency. In this paper, we propose a meta attention method for state-based reinforcement learning tasks, which combines attention mechanism and meta-learning based on the Off-Policy Actor-Critic framework. Unlike previous attention-based work, our meta attention method introduces attention in the Actor and the Critic of the typical Actor-Critic framework, rather than in multiple pixels of an image or multiple information sources in specific image-based control tasks or multi-agent systems. In contrast to existing meta-learning methods, the proposed meta-attention approach is able to function in both the gradient-based training phase and the agent's decision-making process. The experimental results demonstrate the superiority of our meta-attention method in various continuous control tasks, which are based on the Off-Policy Actor-Critic methods including DDPG and TD3.


Subject(s)
Algorithms , Reinforcement, Psychology , Learning
13.
IEEE Trans Cybern ; 53(10): 6236-6247, 2023 Oct.
Article in English | MEDLINE | ID: mdl-35604988

ABSTRACT

Deep hashing reaps the benefits of deep learning and hashing technology, and has become the mainstream approach to large-scale image retrieval. It generally encodes an image into a hash code while preserving feature similarity, that is, geometric-structure preservation, and achieves promising retrieval results. In this article, we find that the existing manner of geometric-structure preservation inadequately ensures feature discrimination, while improving the feature discrimination of the hash code essentially determines hash learning retrieval performance. This fact principally spurs us to propose a discriminative geometric-structure-based deep hashing method (DGDH), which investigates three novel loss terms based on class centers to induce the so-called discriminative geometrical structure. In detail, the margin-aware center loss gathers samples of the same class around the corresponding class center for intraclass compactness, a linear classifier based on class centers then serves to boost interclass separability, and the radius loss further places different class centers on a hypersphere to tentatively reduce quantization errors. An efficient alternating optimization algorithm with guaranteed desirable convergence is proposed to optimize DGDH. We theoretically analyze the robustness and generalization of the proposed method. Experiments on five popular benchmark datasets demonstrate the superior image retrieval performance of the proposed DGDH over several state-of-the-art methods.

14.
Acta Physiologica Sinica ; (6): 379-389, 2023.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-981014

ABSTRACT

The present study aimed to compare the effectiveness of different induction methods for depression models. Kunming mice were randomly divided into a chronic unpredictable mild stress (CUMS) group, a corticosterone (CORT) group, and a CUMS+CORT (CC) group. The CUMS group received CUMS stimulation for 4 weeks, and the CORT group received a subcutaneous injection of 20 mg/kg CORT into the groin every day for 3 weeks. The CC group received both CUMS stimulation and CORT administration. Each group was assigned a control group. After modeling, the forced swimming test (FST), tail suspension test (TST) and sucrose preference test (SPT) were used to assess the behavioral changes of the mice, and the serum levels of brain-derived neurotrophic factor (BDNF), 5-hydroxytryptamine (5-HT) and CORT were measured with ELISA kits. Attenuated total reflection (ATR) spectra of mouse serum were collected and analyzed. HE staining was used to detect morphological changes in mouse brain tissue. The results showed that the weight of model mice in the CUMS and CC groups decreased significantly. There was no significant change in the immobility time of model mice from the three groups in the FST and TST, while the sucrose preference of model mice in the CUMS and CC groups was significantly reduced (P < 0.05). The serum 5-HT levels of model mice in the CORT and CC groups were significantly reduced, while the serum BDNF and CORT levels of model mice in the CUMS, CORT, and CC groups showed no significant changes. Compared with their respective control groups, the three groups showed no significant difference in the one-dimensional serum ATR spectrum. Difference spectrum analysis of the first derivative of the spectrogram showed that the CORT group differed most from its control group, followed by the CUMS group. The structures of the hippocampus in the model mice from the three groups were all destroyed.
These results suggest that both CORT and CC treatments can successfully construct a depression model, and that the CORT model is more effective than the CC model. Therefore, CORT induction can be used to establish a depression model in Kunming mice.


Subject(s)
Mice , Animals , Depression/etiology , Antidepressive Agents/pharmacology , Brain-Derived Neurotrophic Factor , Serotonin
15.
Neural Netw ; 154: 521-537, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35987063

ABSTRACT

How to obtain good retrieval performance with few-shot labeled samples is a current research focus of Person Re-Identification. To facilitate formal analysis, we formally put forward the concept of Pseudo-Supervised Learning (PSL) to represent a series of research works based on label generation under the few-shot condition. Through extensive investigations, we find that the main problem PSL needs to solve is how to improve the quality of pseudo-labels. To solve this problem, in this work we propose a simple yet effective Heterogeneous Pseudo-Supervised Learning (H-PSL) framework based on classical PSL to implement asynchronous match, which boosts feature expression and, in turn, yields better label prediction. Specifically, a novel isomer is constructed as the feature extractor and is trained with a much larger amount of pseudo-supervised data, i.e., samples with pseudo-labels. In this way, the isomer obtains advanced feature expression. We then deliberately implement a cross-level asynchronous match mechanism between the model and the pseudo-supervised data. As a result, the quality of pseudo-labels is greatly improved and the feature expression performance is also optimized accordingly. In addition, to make better use of pseudo-supervised data, we also design a knowledge fusion strategy to integrate the pseudo-labels and their confidence, which are easily obtained from the base model and the isomer. Encouragingly, the knowledge fusion strategy further removes noise-labeled samples from the candidate data. We conduct experiments on four popular datasets to fully verify the universality of the proposed method. The experimental results show that the proposed method improves the performance of all compared baseline works.

16.
Acta Pharmaceutica Sinica ; (12): 2445-2452, 2022.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-937059

ABSTRACT

The combination of Shuanghuanglian injection (SHLI) and ciprofloxacin injection (CIPI) is frequently prescribed in clinical practice, but the basis for the combination is weak. In this study, isothermal titration calorimetry and ultraviolet-visible absorption spectrometry were applied to identify the molecular interactions of SHLI and its main components, chlorogenic acid and neochlorogenic acid, with CIPI. Scanning electron microscopy, Fourier-transform infrared spectroscopy, and cold-spray ionization mass spectrometry were performed to confirm that this molecular interaction was related to the formation of self-assembled supramolecular systems induced by chlorogenic acid and neochlorogenic acid with CIPI through weak intermolecular bonds. The antibacterial activity toward Pseudomonas aeruginosa (P. aeruginosa) was evaluated via molecular interactions, and the inhibitory ability of SHLI, chlorogenic acid and neochlorogenic acid against P. aeruginosa was significantly reduced after interaction with CIPI. A molecular docking study demonstrated that the reduced antibacterial ability was closely related to the competitive binding of drug molecules to the same binding site of the DNA gyrase B (GyrB) subunit of P. aeruginosa. The present study uncovered the intermolecular interactions of SHLI and its main components, chlorogenic acid and neochlorogenic acid, with CIPI from the perspective of molecular self-assembly, and showed that these interactions contribute to the reduction of antibacterial ability, providing a basis for the clinical combination of SHLI and CIPI.

17.
Acta Pharmaceutica Sinica ; (12): 1471-1476, 2022.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-924753

ABSTRACT

The joint application of traditional Chinese medicine injections containing chlorogenic acid (CA) and cefotaxime sodium (CS) sometimes occurs in clinical practice, but the scientific basis of the molecular compatibility of the drugs is still weak. This study proposes a sequential analysis strategy based on isothermal titration calorimetry (ITC), cold-spray ionization mass spectrometry (CSI-MS) and an antibacterial activity test to evaluate the molecular interactions between CA and CS. The results of the ITC experiments showed that the Gibbs free energy ΔG < 0 and that the process was driven by enthalpy change when CA was titrated into CS, suggesting that CA could spontaneously react chemically with CS. Subsequently, the parent ion (m/z 808.1435) of the bound CA-CS molecule was detected by CSI-MS, indicating that CA could chemically bond with CS. Furthermore, the antibacterial experiments found that the antibacterial ability of CS against Klebsiella pneumoniae was significantly reduced (P < 0.01) by CA in mixed solution. Finally, molecular docking showed that CA and CS share a common target, penicillin binding protein 3 (PBP3), suggesting that the reduction of the antibacterial ability of CS by CA may be related to the competitive binding of the two components to PBP3. Our studies have shown that CA can spontaneously bond chemically to CS and reduce its antibacterial ability, providing scientific data for evaluating the molecular interaction of CA and CS.

18.
Micromachines (Basel) ; 12(10)2021 Oct 09.
Article in English | MEDLINE | ID: mdl-34683280

ABSTRACT

This paper proposes a novel and cost-effective drive circuit for supplying a piezoelectric ceramic actuator, which combines a dual boost AC-DC converter with a coupled inductor and a half-bridge resonant DC-AC inverter into a single-stage architecture with power-factor-correction (PFC) and soft-switching characteristics. The coupled inductor of the dual boost AC-DC converter sub-circuit is designed to work in discontinuous conduction mode (DCM), so the PFC function can be realized in the proposed drive circuit. The resonant tank of the half-bridge resonant inverter sub-circuit is designed as an inductive load, so that the two power switches in the presented drive circuit can achieve zero-voltage switching (ZVS). A 50 W-rated prototype drive circuit supplying a piezoelectric ceramic actuator has been successfully implemented. Experimental results at 110 V input utility-line voltage show that the drive circuit has a high power factor and a low input current total-harmonic-distortion factor, and that the two power switches achieve ZVS. The measured results therefore confirm the function of the proposed drive circuit.

19.
Entropy (Basel) ; 23(9)2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34573757

ABSTRACT

The aim of multi-agent reinforcement learning systems is to provide interacting agents with the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives its private observations, which provide a partial view of the true state of the environment. However, in realistic settings, the harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may suffice to make the current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems, considering the security issues that arise in the presence of agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the basis that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multihead attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results showed that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault tolerance ability and does not rely on prior knowledge about the noise intensity of the environment.
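The way attention lets an agent weight incoming messages can be sketched with single-head scaled dot-product attention. This is a generic illustration, not the FT-Attn architecture; the query, keys, and message values are hypothetical, and in a trained model they would come from learned projections.

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products between one query and several keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(query, keys, values):
    """Attention-weighted sum of the value vectors."""
    weights = attention_weights(query, keys)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


query = [1.0, 0.0]                               # what the agent looks for
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]     # one key per sender
values = [[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]]    # messages from the senders

weights = attention_weights(query, keys)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in aggregate(query, keys, values)])
```

The sender whose key aligns with the query dominates the aggregate, while the sender with an opposing key is almost ignored, which is the mechanism a fault-tolerant model can exploit to down-weight unreliable messages.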

20.
IEEE Trans Image Process ; 30: 2656-2668, 2021.
Article in English | MEDLINE | ID: mdl-33439844

ABSTRACT

Siamese trackers contain two core stages, i.e., first learning the features of both target and search inputs and then calculating response maps via the cross-correlation operation, which can also be used for regression and classification to construct a typical one-shot detection tracking framework. Although they have drawn continuous interest from the visual tracking community due to their good trade-off between accuracy and speed, both stages are sensitive to distractors in the search branch, which induces unreliable response positions. To fill this gap, we advance Siamese trackers with two novel non-local blocks, together named Nocal-Siam, which leverage the long-range dependency property of non-local attention in a supervised fashion from two aspects. First, a target-aware non-local block (T-Nocal) is proposed for learning the target-guided feature weights, which serve to refine the visual features of both target and search branches and thus effectively suppress noisy distractors. This block reinforces the interplay between the target and search branches in the first stage. Second, we further develop a location-aware non-local block (L-Nocal) to associate multiple response maps, which prevents them from inducing diverse candidate target positions in the upcoming frame. Experiments on five popular benchmarks show that Nocal-Siam performs favorably against well-behaved counterparts both quantitatively and qualitatively.
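The second stage described above, computing a response map by cross-correlation, can be sketched on raw 2D arrays. Real Siamese trackers correlate learned deep feature maps rather than raw values; the tiny template and search arrays here are assumptions for illustration only.

```python
def cross_correlate(search, template):
    """Slide the template over the search region and sum elementwise products."""
    search_h, search_w = len(search), len(search[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    response = []
    for i in range(search_h - tmpl_h + 1):
        row = []
        for j in range(search_w - tmpl_w + 1):
            row.append(sum(search[i + u][j + v] * template[u][v]
                           for u in range(tmpl_h) for v in range(tmpl_w)))
        response.append(row)
    return response


def peak(response):
    """(row, col) of the strongest response, i.e. the estimated target offset."""
    best = max((value, i, j)
               for i, row in enumerate(response)
               for j, value in enumerate(row))
    return best[1], best[2]


template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 3, 4, 0]]  # the template pattern sits at row 2, column 1

response = cross_correlate(search, template)
print(response)
print(peak(response))
```

A distractor that resembles the template would create a second strong peak in the same map, which is the failure mode the non-local blocks are designed to suppress.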
