DeforT: Deformable transformer for visual tracking.

Yang, Kai; Li, Qun; Tian, Chunwei; Zhang, Haijun; Shi, Aiwu; Li, Jinkai

Affiliation

Yang K; School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China; Hubei Luojia Laboratory, Wuhan 430200, China.
Li Q; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
Tian C; School of Software, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China; Yangtze River Delta Research Institute, Northwestern Polytechnical University, Taicang 215400, China.
Zhang H; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China.
Shi A; School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430200, China. Electronic address: saw@wtu.edu.cn.
Li J; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China. Electronic address: smilelijinkai@gmail.com.

Neural Netw; 176: 106380, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38754289

ABSTRACT

Most trackers formulate visual tracking as a pair of common classification and regression (i.e., bounding box regression) tasks. Correlation features computed through depth-wise convolution or channel-wise multiplication are fed into both the classification and regression branches for inference. However, this matching computation with a linear correlation method tends to lose semantic features and to reach only a local optimum. Moreover, these trackers rely on an unreliable ranking based on the classification score and use the intersection over union (IoU) loss for regression training, which degrades tracking performance. In this paper, we introduce a deformable transformer model that effectively computes the correlation features of the training and search sets. A new loss called the quality-aware focal loss (QAFL) is used to train the classification network; it efficiently alleviates the inconsistency between the classification and localization quality predictions. A new regression loss called α-GIoU is used to train the regression network, and it effectively improves localization accuracy. To further improve the tracker's robustness, the candidate object location is predicted by combining online learning scores from a transformer-assisted framework with classification scores. Extensive experiments on six test datasets demonstrate the effectiveness of our method. In particular, the proposed method attains a success score of 71.7% on the OTB-2015 dataset and an AUC score of 67.3% on the NFS30 dataset.
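
The abstract names the two training losses but does not give their formulas. The sketch below illustrates what losses of this kind typically look like, under two assumptions that are not confirmed by this record: that α-GIoU follows the power-generalized GIoU form 1 - IoU^α + ((|C| - |A ∪ B|) / |C|)^α, where C is the smallest box enclosing both boxes, and that QAFL resembles the quality focal loss from the Generalized Focal Loss literature, in which the classification target is a soft localization-quality score in [0, 1]. The function names and the defaults `alpha=3.0` and `beta=2.0` are illustrative choices, not values taken from the paper.

```python
import math


def alpha_giou_loss(box_a, box_b, alpha=3.0):
    """Assumed alpha-GIoU form: 1 - IoU**alpha + penalty**alpha.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # GIoU penalty: fraction of the enclosing box not covered by the union.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    penalty = (hull - union) / hull

    return 1.0 - iou ** alpha + penalty ** alpha


def quality_focal_loss(score, quality, beta=2.0, eps=1e-12):
    """Assumed QAFL-like loss: |quality - score|**beta times binary cross-entropy.

    score   -- predicted classification probability in (0, 1)
    quality -- soft target in [0, 1], e.g. the IoU of the predicted box
    """
    score = min(max(score, eps), 1.0 - eps)  # keep the logarithms finite
    cross_entropy = -(quality * math.log(score)
                      + (1.0 - quality) * math.log(1.0 - score))
    return abs(quality - score) ** beta * cross_entropy
```

With `alpha=1.0` the first function reduces to the ordinary GIoU loss, and the second function vanishes when the predicted score equals the quality target, which is the mechanism that would tie classification confidence to localization quality.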

Subject(s)

Neural Networks, Computer; Humans; Algorithms; Eye-Tracking Technology

Keywords

Classification network; Deformable transformer; Regression network; Visual tracking

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Neural Networks, Computer Limit: Humans Language: En Journal: Neural Netw Journal subject: NEUROLOGY Year: 2024 Document type: Article Affiliation country: China
